Article

Cross-Material Damage Detection and Analysis for Architectural Heritage Images

School of Architecture and Design, Beijing Jiaotong University, Beijing 100044, China
*
Author to whom correspondence should be addressed.
Buildings 2025, 15(17), 3100; https://doi.org/10.3390/buildings15173100
Submission received: 4 August 2025 / Revised: 21 August 2025 / Accepted: 26 August 2025 / Published: 28 August 2025

Abstract

This study addresses the strategic requirements of cultural heritage preservation alongside the needs of high-quality urban-rural development. It highlights the inefficiency, subjectivity, and limited intelligence of the traditional manual detection methods used in architectural heritage preservation. Consequently, this research explores intelligent damage detection and quantitative analysis through AI-based image detection technology. Firstly, a cross-material classification standard for architectural remnants is developed to facilitate data annotation for image detection. Secondly, a dense object detection algorithm tailored to architectural images is proposed to address challenges such as boundary ambiguity and high-density damage in architectural heritage, enabling intelligent detection and quantitative analysis of damage. On this basis, multiple datasets for architectural heritage damage detection are constructed from on-site collection. Experimental results on these datasets demonstrate that the proposed method surpasses comparative approaches across various metrics, including average precision, confirming its feasibility and effectiveness. Additionally, a software application for intelligent damage detection and quantitative analysis of architectural heritage images is developed, providing novel insights and support for architectural heritage preservation.

1. Introduction

Architecture not only fulfills basic daily requirements but also serves as a repository of a nation’s cultural heritage. Buildings from specific historical periods encapsulate the lifestyle, social structure, aesthetic values, and folk traditions of their era. Therefore, the preservation and restoration of architectural heritage are vital for maintaining historical continuity and reinforcing national identity.
Since the Renaissance, with the advancement of urban development, countries have gradually recognized the value of architectural heritage, introduced the concept of its preservation, and actively explored protective measures.
The 1931 Athens Charter for the Restoration of Historic Monuments marked the beginning of an international consensus on safeguarding cultural architectural heritage. Subsequently, the 1964 Venice Charter established a comprehensive framework for the conservation of architectural heritage, although its focus at that time was predominantly on the physical preservation of individual monuments. As the understanding of heritage conservation evolved, the 1987 Washington Charter introduced the concept of integrated conservation, stressing holistic protection of historic urban areas. Further expanding this scope, the 2011 Recommendation on the Historic Urban Landscape broadened the definition of architectural heritage to include the wider “historic urban landscape”, emphasizing the alignment of heritage preservation with urban development dynamics [1].
The approach to architectural heritage has evolved from solely protecting physical structures to integrating them harmoniously with socio-economic development and environmental conservation.
Currently, damage identification and analysis are key focus areas within the field of architectural heritage preservation. Traditional methods rely heavily on manual identification by researchers, which is often time-consuming, labor-intensive, and subject to varying accuracy due to differences in expertise. In recent years, the deep integration of artificial intelligence (AI) into this field has led to significant advancements in digital technologies such as 3D laser scanning, Building Information Modeling (BIM), Geographic Information Systems (GIS), and Virtual Reality (VR). These technologies have achieved substantial progress in architectural surveying, construction of historical information databases, creation of building information models, virtual reality scene reconstruction, and heritage safety monitoring, providing robust support for intelligent damage identification in architectural heritage.
However, conventional non-destructive testing (NDT) techniques based on acoustic and thermal principles encounter challenges of high cost and low efficiency when addressing large-scale and high-density architectural heritage damages. In this context, AI-based image object detection technology presents a viable solution for large-scale applications and the advancement of architectural heritage damage detection. Traditional object detection algorithms (e.g., DPM) suffer from high computational complexity and struggle to satisfy practical detection requirements given variations in target appearance, illumination conditions, and background diversity. With significant improvements in computational power and the exponential growth of image datasets, Deep Convolutional Neural Networks (DCNNs) have attracted considerable attention from researchers. Currently, mainstream object detection architectures primarily fall into two categories: two-stage detectors (e.g., R-CNN, which achieves higher detection accuracy) and one-stage detectors (e.g., the YOLO series, which offers faster detection speeds) [2].
Existing image object detection technologies primarily target common objects such as pedestrians and vehicles. However, architectural damage presents unique challenges due to its indistinct boundaries, widespread distribution on heritage surfaces, and the presence of overlapping and adjacent instances with diverse material characteristics. These complexities hinder existing techniques from effectively managing high-density targets. Consequently, there is an urgent need for precise task delineation and targeted algorithmic research in image object detection to address damage detection on architectural heritage surfaces, which is an essential requirement in the field of architectural heritage preservation.
Despite ongoing improvements in general object detection methods, intrinsic limitations remain in their core components (backbone, neck, and head) when applied to architectural heritage damage. For instance, backbones such as CSP-DarkNet53 [3] and popular feature extraction networks such as the Transformer [4] perform downsampling operations that reduce feature-map resolution, eroding the fine-grained information needed for small targets and material-specific textures in architectural images. Neck modules, including Feature Pyramid Networks (FPN) [14] and Path Aggregation Networks (PAN) [5], rely on hierarchical fusion mechanisms that integrate features inadequately, often neglecting the high-level features essential for small objects and producing an imbalance between semantic and spatial information. Moreover, these neck designs amplify noise in low-contrast boundary regions and adapt poorly to diverse material properties. In the head module, anchor-based designs frequently misalign with small objects because of their reliance on prior knowledge, while anchor-free variants struggle with regression stability for small objects. Additionally, current methods handle material classification and localization poorly, as their loss functions fail to accurately model boundary uncertainty. Collectively, these limitations prevent existing object detection methods from achieving high-resolution discrimination and uncertainty-aware modeling in complex scenarios.
This paper focuses on damage identification and analysis within architectural heritage images. It outlines the current demands in the field, refines the requirements for image-based cross-material damage detection tasks driven by artificial intelligence, and introduces a dense object detection method specifically designed for such applications. This enables the detection of cross-material damage in architectural heritage images, facilitating data collection and practical analysis.
The contributions of this paper are as follows:
1. The paper clarifies and defines cross-material damage detection tasks for architectural heritage images and proposes an AI-driven approach aimed at preserving architectural heritage.
2. A method for cross-material dense damage detection in architectural heritage images is introduced to effectively address the limitations of existing technologies in detecting blurred-boundary and high-density damage points, enabling large-scale and efficient detection of architectural damage.
3. A series of damage image datasets oriented towards architectural heritage preservation is constructed, providing a benchmark dataset for related research fields. Comparative experiments validate the effectiveness of the proposed method.

2. Related Work of Architectural Heritage Damage Detection

2.1. Digital Technologies for Architectural Heritage

Digital technologies in architectural heritage involve transforming existing physical structures into digital models, providing precise measurements that bolster preservation, restoration, and research efforts. In 2018, the Architectural Design Institute of Tianjin University demonstrated the significant value of digital technology across the lifecycle of historical architectural preservation through the restoration of Duan Qirui’s former residence [6]. The Notre-Dame de Paris fire in 2019 caused severe damage to the structure; Professor De Luca, Director of the French National Center for Scientific Research (CNRS), led a team that carried out detailed digital restoration work using digital mapping, visualization software, virtual reality, and cloud computing, achieving remarkable results in heritage restoration [7].
Common digital technologies include laser scanning, 3D reconstruction, and virtual reality. Laser scanning allows rapid and precise measurement and documentation through laser measuring equipment. This technology was utilized during the restoration of Beijing’s Forbidden City, delivering high-precision digital models that supported the restoration process [8]. Meanwhile, 3D reconstruction technology involves scanning, measuring, and simulating architectural heritage to generate high-resolution digital models. Virtual reality technology brings digital models to life, offering immersive experiences. Additionally, Professor Li Guo and colleagues applied integrated 3D laser scanning and close-range photogrammetry to acquire high-resolution spatial data, supporting a systematic analysis of the overall layout, structural composition, and landscape sequencing of the Three Su Ancestral Temple [9]. Currently, these technologies are mainly used for building heritage model databases in architectural conservation, while their application in damage identification remains insufficient.

2.2. Non-Destructive Testing (NDT) for Architectural Heritage

NDT technologies in architectural heritage facilitate the effective examination and testing of structural features without compromising the building’s integrity or materials. Current NDT methods in architecture employ techniques based on acoustic, electromagnetic, radioactive, penetrative, and thermal properties. These methods are applied to masonry, wood, and concrete structures for heritage conservation purposes. For instance, Professor Zhao Jiajin et al. utilized a self-designed active thermal stimulation device to detect detachment damage in cave temples, enabling reliable quantification of the air layer thickness associated with exfoliation configurations [10].
However, damage identification in architectural heritage often involves multiple heterogeneous materials, whereas most existing non-destructive testing (NDT) techniques are primarily designed for single-material analysis. Moreover, even within the same material category, variations in composition and internal microstructure still pose significant challenges, which can complicate comprehensive assessment and detection. These obstacles sometimes reduce the accuracy of damage identification, underscoring the need for further development and refinement in this area [11]. In recent years, researchers have explored integrating Geographic Information Systems (GIS) into cultural heritage conservation. This integration aims to precisely identify risk zones and provide a scientific basis for preventive preservation measures [12].

2.3. Image Object Detection Technology for Architectural Applications

Image object detection technology combines object localization and classification by employing image processing and machine learning techniques to identify objects of interest within images [13]. These technologies are categorized into traditional algorithms and those based on deep learning. Traditional algorithms, such as the SIFT-series and AdaBoost-series, are highly effective for specific objects but encounter difficulties when dealing with large and diverse classes. Deep learning-based algorithms, particularly the DCNN series, are inspired by biological visual systems and capitalize on the unique characteristics of deep learning. Their primary advantages include automatic feature learning, efficient parameter utilization, and strong nonlinear fitting capabilities. Currently, mainstream object detection architectures are divided into two categories: two-stage detection and one-stage detection.
Two-stage detection methods, such as R-CNN, separate the tasks of object localization and classification, offering higher detection accuracy but at the cost of slower inference speeds. Conversely, one-stage detection methods, exemplified by the YOLO series, predict class probabilities and bounding box coordinates in a single step, resulting in a more streamlined architecture. While one-stage detectors achieve faster detection speeds, their accuracy typically falls short of two-stage architectures [2].
However, both architectures face challenges in detecting small objects due to issues like weak feature representation, susceptibility to environmental interference, and difficulties in precise localization. To enhance small object detection accuracy, two-stage detectors have incorporated the Feature Pyramid Network (FPN) algorithm, though its slower speed limits practical deployment [14]. Meanwhile, one-stage detectors, particularly from the YOLOv8 to YOLOv11 iterations, have progressively narrowed the accuracy gap with two-stage methods while maintaining rapid inference speeds, making them the preferred choice for real-time applications [15].
In the field of architectural damage identification, there are practical requirements such as large-scale heritage detection, real-time dynamic monitoring, and on-site applications. Although two-stage algorithms offer higher recognition accuracy, their complex workflows and high computational costs limit their widespread adoption in architectural heritage conservation. In contrast, the YOLO series algorithms showcase greater practicality and versatility due to their efficient single-stage architecture, precise instance segmentation capabilities, and lightweight deployment advantages [16,17]. Notably, Professor Zhu Xundiao and colleagues developed a YOLOv8-based method for detecting and localizing surface damage in the masonry structure of the White Pagoda in Lanzhou, Gansu Province [18].

2.4. Summary of Research Status

Currently, damage detection in architectural heritage predominantly employs manual inspection and non-destructive testing (NDT). On the one hand, manual inspection requires significant time and labor investment, making it inefficient for processing large-scale data. On the other hand, non-destructive testing faces challenges due to variations in materials and internal structures, which can impair accuracy. Recent advancements in digital technologies and image object detection offer a robust theoretical and practical basis for developing comprehensive databases of architectural heritage damage and utilizing images for precise and efficient damage recognition. Among these methods, the YOLO series algorithms exhibit distinct advantages for architectural heritage damage detection, owing to their superior processing speed and progressively improved recognition accuracy.

3. Cross-Material Intensive Deterioration Detection Methodology for Architectural Heritage Imagery

3.1. Integrated Workflow for Heritage Deterioration Detection

The damage detection framework for architectural heritage developed in this study focuses on visual deep learning technology. It integrates cross-material damage classification standards with multi-modal data processing techniques to form an intelligent system that spans the entire process from data collection to model training and intelligent analysis. This provides a data-driven scientific solution for architectural heritage preservation. As illustrated in Figure 1, the framework consists of three core modules that are closely aligned with the cross-material damage classification system, addressing the subjectivity and single-material assessment limitations inherent in traditional manual detection methods.
Data Acquisition and Preprocessing: To address the diversity of materials (timber, brick, tile, earthen, stone, and lime-based structures) and the complex scenarios in architectural heritage, a multi-perspective approach was employed to acquire 7719 images from Shanxi, Beijing, and Hebei. These images encompass damage types found on roofs, walls, paving tiles, beams, and columns. Through a dual screening process combining manual and algorithmic methods, images with imaging defects (MTF < 0.3, tilt > 5°, etc.) or environmental interference were eliminated, yielding a curated set of 4581 high-quality images. The dataset was divided into a training set (3339 images) and a test set (1242 images) at roughly a 7:3 ratio (or 6:4 for small samples). Images were standardized to JPEG format and annotated with a “material-damage-grade” triplet based on the cross-material classification standard to establish a structured foundational database.
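As a hedged illustration, a 7:3 split of the kind described above can be sketched as follows. The `split_dataset` helper and the file names are hypothetical, not taken from the study's codebase, and the study's reported counts (3339/1242) deviate slightly from an exact 7:3 ratio.

```python
import random

def split_dataset(image_paths, train_ratio=0.7, seed=42):
    """Shuffle and split a list of image paths into train/test subsets."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # fixed seed for reproducibility
    cut = round(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]

# Placeholder file names standing in for the 4581 curated images.
images = [f"img_{i:04d}.jpg" for i in range(4581)]
train_set, test_set = split_dataset(images, train_ratio=0.7)
```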
Model Training and Optimization: With YOLO-v11 as the core algorithm, the network architecture was optimized for the multi-scale nature of damage targets, which enhances the extraction of subtle damage features and improves detection accuracy for small objects. A triple-classifier in the head network provides outputs for damage location, grade, and confidence. Collaboration on the LabelImg platform ensured precise grading annotations using quantitative metrics, such as crack size and spalling area, resulting in 3339 standardized training samples complete with spatial coordinates and semantic attributes. Sample imbalance issues were addressed through dynamic adjustment of class weights in the loss function. Transfer learning facilitated the loading of training weights on NVIDIA GTX 1660 GPUs (NVIDIA, Santa Clara, CA, USA), with model performance assessed via confusion matrices and mAP metrics. Subsequent ablation experiments further refined model optimization.
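The class-weight rebalancing mentioned above can be approximated with inverse-frequency weights. This is a minimal sketch of one common scheme, not the paper's exact formulation; the grade labels in the example are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count_c): rarer classes receive larger
    weights, and the weights equal 1 for a perfectly balanced label set."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Hypothetical grade labels for a small batch of annotations.
weights = inverse_frequency_weights(["I", "I", "I", "II", "II", "III"])
```

In training, such weights would scale each class's contribution to the classification loss so that under-represented damage grades are not ignored.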
Detection and Analysis: The system supports the import of multi-modal data, including documents and real-time captures, and automatically parses metadata across platforms. YOLO-v11 processes the input images to deliver millisecond-level detection results, with damage grades visually differentiated by color (Grade I: blue, Grade II: cyan, Grade III: white) and accompanied by confidence scores, enabling automated grading recognition. The quantitative analysis module extracts geometric parameters, such as crack orientation and defect area, and material-specific features, constructing a probability matrix of damage types to assist manual verification. Interactive visualization features allow for historical data comparison and generate reports with repair recommendations. The Peripheral Detection Module (under development) permits users to upload visual displays of adjacent material damage correlations, compile heat maps of detection results, produce structured reports containing data on damage locations, grade statistics, and repair recommendations, and support interactive displays. The historical data management system enables multi-dimensional retrieval, facilitating the tracking of damage evolution and providing temporal data support for long-term protection planning.
This framework is applicable to fields such as architectural restoration, urban renewal, and heritage conservation. It is capable of identifying risks like masonry weathering and timber fracturing while also providing quantitative safety assessments. Key technological innovations include interdisciplinary integration (embedding heritage conservation disease classification standards into visual detection), full-process standardization from data collection to decision-making, and improved generalization capabilities achieved by training across multiple materials and scenarios to enhance detection precision. This shift from experience-led to data-driven practices in architectural heritage protection establishes an intelligent detection system encompassing major material types, thereby offering a replicable technical paradigm for cultural heritage conservation.

3.2. Development of a Cross-Material Classification Standard for Building Deterioration

In response to the diversity of materials in architectural heritage and informed by literature reviews and professional references such as Cultural Relics Conservation Science [19] and Conservation and Restoration Techniques for Chinese Cultural Relics [20], a “Cross-Material Classification System for Building Deterioration Levels” was developed to form the foundational database. This system aligns different materials with their applicable architectural structures, providing a systematic framework for damage assessment and restoration. While a review of key international resources reveals no pre-existing multi-material classification framework, this system is grounded in globally recognized conservation principles from bodies like ICOMOS and UNESCO. It aligns with UNESCO’s 1972 Convention Concerning the Protection of the World Cultural and Natural Heritage [21], which emphasizes adopting appropriate scientific and technical measures for heritage identification, protection, conservation, and rehabilitation. Additionally, it reflects ICOMOS’ 2017 Principles for the Conservation of Wooden Built Heritage [22], which focuses on material-specific vulnerabilities—such as wood’s susceptibility to temperature changes, fungal or insect damage, and natural disasters. This adherence provides the system with a robust theoretical basis and ensures relevance within international practice.
The architectural damage analysis framework utilizes material as the primary classification criterion, dividing into six fundamental categories: timber structures, brick structures, tile structures, earthen structures, stone structures, and mortar structures. Each category is associated with specific deterioration characteristics, establishing secondary indicators for damage types. For instance, timber structures focus on biological and physical deterioration such as insect infestations, decay, and cracking. Brick structures concentrate on weathering/spalling, crack propagation, and salt efflorescence/powdering. Earth structures examine rain erosion, wall cracking, and foundation settlement. This classification method facilitates precise localization of damage issues across various material types within architectural heritage, ultimately categorizing common damage types into Slight Deterioration (Level I), Moderate Deterioration (Level II), and Severe Deterioration (Level III) (Figure 2).
By establishing a correlation matrix linking material type, deterioration type, severity level, and applicable building structure, foundational data support is provided for the development of computer vision-based intelligent detection technologies. This system not only delivers scientific evaluation criteria for architectural heritage preservation projects and overcomes the limitations of traditional single-material assessments but also establishes a systematic evaluation framework encompassing key material types. It promotes a transition from experience-led to data-driven preservation and restoration efforts through the formulation of quantitative standards. This approach facilitates the scientific and precise advancement of cultural heritage conservation, offering a unified technical language for protection planning, restoration design, and preventive conservation of diverse material architectural heritage. Moreover, it holds significant theoretical innovation and practical application value.
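The material/damage-type/severity hierarchy could be encoded as a simple lookup structure. The sketch below covers only an illustrative subset of the categories named in the text; the identifiers and the `make_annotation` helper are ours, not the paper's.

```python
# Illustrative subset of the cross-material classification system.
DETERIORATION_TAXONOMY = {
    "timber": ["insect infestation", "decay", "cracking"],
    "brick": ["weathering/spalling", "crack propagation",
              "salt efflorescence/powdering"],
    "earthen": ["rain erosion", "wall cracking", "foundation settlement"],
}
SEVERITY = {"I": "Slight Deterioration",
            "II": "Moderate Deterioration",
            "III": "Severe Deterioration"}

def make_annotation(material, damage, grade):
    """Return the 'material-damage-grade' triplet used for image annotation."""
    if damage not in DETERIORATION_TAXONOMY.get(material, []):
        raise ValueError(f"unknown damage type {damage!r} for {material!r}")
    if grade not in SEVERITY:
        raise ValueError(f"unknown grade {grade!r}")
    return (material, damage, grade)
```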

3.3. Intensive Object Detection Algorithm for Architectural Image

3.3.1. Overall Framework

An Intensive Object Detection Algorithm (IODA) is introduced specifically for analyzing architectural images. This algorithm incorporates YOLOv5s [23] as its foundational framework and introduces two novel loss functions aimed at enhancing the detection of intensive damage: Superpixel-based Loss for initial, coarse detection, and Graph-based Loss for more detailed, intensive detection. The architecture of YOLOv5 consists of three primary modules: the Backbone, the Neck, and the Head. Our algorithm utilizes CSP-DarkNet53 [3] as the backbone, which employs multiple convolutional and pooling layers to extract features. This configuration enhances the number of channels through slicing and concatenation, thereby increasing sensitivity to small objects by sampling feature maps. The Neck module, situated between the Backbone and the Head, is designed to further guide feature extraction and fusion. In the standard YOLOv5, the Neck architecture integrates Feature Pyramid Networks (FPN) [14] and Path Aggregation Network (PAN) [5]. The FPN effectively propagates higher-order semantic information to bolster low-level features, while the PAN supplies low-level spatial data to augment high-level features. This integration facilitates a more effective fusion approach, thereby improving the detection of intricate small objects. In the Backbone module, the Cross Stage Partial (CSP) [24] and attention layers improve feature extraction capabilities with minimal computational expense. For the Neck module, the combination of FPN and PAN provides enhanced guidance for multi-scale feature fusion and propagation. In the Head module, YOLOv5 offers several loss functions accompanied by efficient optimization strategies.
To improve the detection of intricate objects in architectural heritage images, Superpixel-based Loss and Graph-based Loss are proposed for coarse and intensive detection, respectively.

3.3.2. Superpixel-Based Loss for Coarse Detection

The presence of intensively damaged objects in architectural heritage images often leads to a high rate of missed detections during the Region of Interest (ROI) extraction phase of object detection algorithms. To address this issue, this paper introduces a Superpixel-based Loss [25] for coarse detection. This approach effectively captures the global coarse correlation among partially obscured and severely damaged objects, optimizing their confidence scores to recover missing anchors, thereby reducing the frequency of missed detections.
In standard ROI extraction processes, images of any resolution are analyzed to determine the locations and confidence scores of all potential objects they contain. Within the YOLOv5 network, a classical ResNet [26] serves as the core feature extraction network. The backbone employs two parallel 1 × 1 convolution layers: one for bounding box regression and the other for determining object presence within these boxes. As sliding windows move across the image, multiple regions may contain target objects indicative of architectural damage. The maximum number of potential damages per region is denoted as k. Consequently, the bounding box regression layer generates 4k coordinates for the k bounding boxes, while the classification layer provides binary outcomes indicating whether each proposed bounding box contains an object. Anchors are centrally positioned within sliding windows and come in various scales and aspect ratios, producing multiple anchors at each position. Specifically, three scales and three aspect ratios result in nine anchors per sliding position. For a feature map of resolution W × H, this amounts to W × H × k anchors. Within this context, each anchor, characterized by its scale and aspect ratio, is referred to as a Superpixel.
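The anchor enumeration described above can be sketched as follows; the scale and ratio values are illustrative defaults, not those used in the paper.

```python
def generate_anchors(feat_w, feat_h, scales=(64, 128, 256),
                     ratios=(0.5, 1.0, 2.0)):
    """Enumerate anchors (cx, cy, w, h) centred at every feature-map cell.
    len(scales) * len(ratios) = 9 shapes per cell gives W * H * 9 anchors."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            for s in scales:
                for r in ratios:
                    w = s * r ** 0.5  # aspect ratio r = w / h, area s^2
                    h = s / r ** 0.5
                    anchors.append((x + 0.5, y + 0.5, w, h))
    return anchors

anchors = generate_anchors(4, 3)  # 4 * 3 positions, 9 shapes each
```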
To effectively train the ROI extraction network, a binary classification label is constructed to indicate whether an anchor encapsulates target objects. An anchor box overlapping a ground-truth box with an Intersection over Union (IoU) greater than 0.7 is assigned a positive label; a single ground-truth box may thus assign positive labels to several anchor boxes. A non-positive anchor receives a negative label if its IoU with every ground-truth box falls below a certain threshold. Anchors that qualify as neither positive nor negative do not contribute to the training objective. The Superpixel-based Loss for coarse detection is defined as follows:
$$L_s = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^{*}) + \lambda \frac{1}{N_{reg}} \sum_i p_i^{*} L_{reg}(t_i, t_i^{*})$$
where $i$ denotes the index of an anchor within the mini-batch, and $p_i$ represents the predicted probability that the anchor is a target object. The ground-truth label $p_i^{*}$ is 1 if the anchor box is a positive sample and 0 otherwise. $t_i$ denotes the predicted bounding-box coordinates, and $t_i^{*}$ the coordinates of the ground-truth box associated with the positive anchor. An $L_1$-based smooth loss $L_{reg}(t_i, t_i^{*}) = R(t_i - t_i^{*})$ is employed for regression, where the factor $p_i^{*} L_{reg}$ indicates that the regression loss is activated solely for positive anchor boxes ($p_i^{*} = 1$) and disabled otherwise ($p_i^{*} = 0$).
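A minimal scalar sketch of the labeling rule and the loss above. In practice both run on batched tensors inside the detector; the 0.3 negative threshold and the normalization of the regression term over the positive-anchor count are our assumptions.

```python
import math

def iou(a, b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def assign_label(anchor, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """1 = positive, 0 = negative, -1 = ignored (neg_thr is an assumed value)."""
    best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
    return 1 if best > pos_thr else (0 if best < neg_thr else -1)

def smooth_l1(d):
    """L1-based smooth regression loss R(d)."""
    return 0.5 * d * d if abs(d) < 1.0 else abs(d) - 0.5

def superpixel_loss(p, p_star, t, t_star, lam=1.0):
    """Scalar version of L_s: binary cross-entropy over all anchors plus a
    smooth-L1 regression term active only for positive anchors."""
    eps = 1e-9
    cls = -sum(ps * math.log(pi + eps) + (1 - ps) * math.log(1 - pi + eps)
               for pi, ps in zip(p, p_star)) / len(p)
    n_reg = max(1, sum(p_star))  # assumption: normalize by positive count
    reg = sum(ps * sum(smooth_l1(a - b) for a, b in zip(ti, ts))
              for ps, ti, ts in zip(p_star, t, t_star)) / n_reg
    return cls + lam * reg
```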

3.3.3. Graph-Based Loss for Intensive Detection

Following the establishment of a Superpixel-based loss for coarse detection, a Graph-based Loss is introduced to enhance capabilities for intensive detection. This approach constructs a correlation-based graph to capture relational networks among objects within images and designs a graph-based loss specifically tailored to detect intensive damage. Traditional detection networks typically output candidate box locations along with their classification scores.
In a conventional object detection network, the outputs include location coordinates $p_{pos}$ and predicted classification scores $p_{cls}$ for candidate boxes. To comprehensively capture the correlations between objects, two distinct types of graph structures are constructed: spatial and semantic. These graphs are subsequently fused into a unified structure, with edges determined by the distance between features. The first graph type uses the absolute position of each candidate box, connecting each candidate box to its nearest neighbor to form a simple graph structure. Spatial correlations are represented by normalized distances between candidate boxes:
$$ d_{i,j}^{R} = (\Delta x, \Delta y, \Delta w, \Delta h) = \left( \frac{x_i - x_j}{w_i}, \frac{y_i - y_j}{h_i}, \frac{w_j}{w_i}, \frac{h_j}{h_i} \right) $$
where $w_i$ and $h_i$ denote the width and height of the $i$-th candidate bounding box. Based on these normalized distances, connections are established between each candidate box and its nearest neighbor, creating a spatially oriented graph.
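A minimal sketch of this spatial graph construction is given below, assuming the offset vector above and hypothetical candidate boxes in (x, y, w, h) form; the choice to rank neighbors by the Euclidean norm of the positional offsets $(\Delta x, \Delta y)$ is ours, as the text does not fix the norm.

```python
def spatial_offsets(box_i, box_j):
    """Normalized geometric offsets between two boxes, following
    d_{i,j} = ((x_i-x_j)/w_i, (y_i-y_j)/h_i, w_j/w_i, h_j/h_i)."""
    xi, yi, wi, hi = box_i
    xj, yj, wj, hj = box_j
    return ((xi - xj) / wi, (yi - yj) / hi, wj / wi, hj / hi)

def nearest_neighbor_edges(boxes):
    """Connect each candidate box to its single nearest neighbor,
    measured on the positional part (dx, dy) of the offset vector."""
    edges = []
    for i, bi in enumerate(boxes):
        best, best_d = None, float("inf")
        for j, bj in enumerate(boxes):
            if i == j:
                continue
            dx, dy = spatial_offsets(bi, bj)[:2]
            d = (dx * dx + dy * dy) ** 0.5
            if d < best_d:
                best, best_d = j, d
        edges.append((i, best))
    return edges

# three hypothetical candidate boxes: two adjacent, one far away
boxes = [(0, 0, 10, 10), (5, 0, 10, 10), (100, 100, 10, 10)]
edges = nearest_neighbor_edges(boxes)
```

Each box contributes exactly one directed edge, yielding the simple spatially oriented graph described above.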
The second graph type employs distances between feature vectors, using a k-nearest-neighbor algorithm to connect candidate boxes; the two graph types are subsequently concatenated into a unified structure. Assuming $K$ candidate boxes in the final output, with feature vectors denoted as $X = [x_1, \ldots, x_K]$, edges in the semantic graph are constructed using Euclidean distances between these features. Each vertex represents a visual object within the image, and each hyperedge connects a vertex to its $k$ nearest neighbors, so that every vertex generates one hyperedge involving $k$ neighbors. Ultimately, an incidence matrix $H \in \mathbb{R}^{N \times N}$ is generated, where $N$ is the number of vertices and each column corresponds to one hyperedge.
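The semantic k-NN construction can be illustrated as follows. This is a toy sketch with one-dimensional features (the detector uses learned feature vectors), and the convention that hyperedge $j$ contains vertex $j$ plus its $k$ nearest neighbors is an assumption consistent with the incidence-matrix description above.

```python
def knn_incidence(features, k=2):
    """Build an N x N incidence matrix H: hyperedge j (column j) groups
    vertex j together with its k nearest neighbors in Euclidean
    feature distance, so each column contains k + 1 ones."""
    n = len(features)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    H = [[0] * n for _ in range(n)]
    for j in range(n):
        # sort all vertices by distance to vertex j; the vertex itself
        # (distance 0) comes first, followed by its nearest neighbors
        order = sorted(range(n), key=lambda i: dist(features[i], features[j]))
        for i in order[:k + 1]:
            H[i][j] = 1
    return H

# four hypothetical candidates forming two tight feature clusters
feats = [[0.0], [0.1], [5.0], [5.1]]
H = knn_incidence(feats, k=1)
```

With `k=1`, each column of `H` groups a candidate with its single closest peer, so the two clusters never mix, which is the relational prior the graph-based loss exploits.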
The Graph-based Loss for Intensive Detection is defined as follows:
$$ L_g = -\frac{1}{N} \sum_i \sum_{c=1}^{M} y_{ic} \log p'_{ic} + \left\lVert p - p' \right\rVert^{2} $$
where $p'_{ic}$ denotes the confidence score for class $c$ of the $i$-th sample in the secondary classification. The regularization term $\lVert p - p' \rVert^{2}$ enforces consistency between the primary and secondary classifications, where $p$ refers to the initial predicted classification scores and $p'$ to the secondary predicted classification scores.
By implementing these graph structures, prior knowledge from extensive sample datasets can be leveraged to perform secondary corrective predictions on object detection categories, leading to more accurate identification and classification of object categories.
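The graph-based loss, a cross-entropy over the secondary (graph-refined) scores plus the primary/secondary consistency term, can be sketched with toy scores as follows; all numbers are hypothetical.

```python
import math

def graph_loss(y, p_prime, p_init):
    """Cross-entropy of the secondary scores p' against one-hot labels y,
    plus an L2 term tying p' back to the initial scores p."""
    n = len(y)
    eps = 1e-12
    ce = -sum(y[i][c] * math.log(p_prime[i][c] + eps)
              for i in range(n) for c in range(len(y[i]))) / n
    reg = sum((a - b) ** 2
              for pi, pp in zip(p_init, p_prime)
              for a, b in zip(pi, pp))
    return ce + reg

y       = [[1, 0], [0, 1]]          # one-hot damage-grade labels
p_init  = [[0.7, 0.3], [0.4, 0.6]]  # first-pass classification scores
p_prime = [[0.8, 0.2], [0.3, 0.7]]  # graph-corrected secondary scores
loss = graph_loss(y, p_prime, p_init)
```

The regularizer penalizes large corrections, so the graph refinement can only nudge, not overturn, the initial predictions.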

4. Experimental Analysis of Deterioration Detection for Architectural Heritage Imagery

4.1. Dataset for Heritage Deterioration Detection

4.1.1. Building Deterioration Data Collection

This study employs an original, self-created dataset derived from systematic data collection in typical architectural heritage sites, including Diji City and Shangzhuang Village in Shanxi, the main campus and residential area of Beijing Jiaotong University, the Juyongguan section of the Great Wall in Beijing, and the ancient Great Wall in Qian’an, Hebei. A total of 7719 images of multi-material architectural heritage were obtained. Specifically, 3468 images were collected from Shanxi’s Diji City and Shangzhuang Village, capturing tile roofs, brick and stone walls, brick and stone pavements, and wooden beams. From Beijing Jiaotong University, 1368 images of brick pavements were acquired. In the Juyongguan Great Wall area, 976 images of brick walls were captured. Additionally, 1035 images of brick walls and 872 images of brick pavements were collected from the ancient Great Wall in Qian’an, Hebei (Table 1).
For roof elements, aerial images were captured using DJI Air 3 drones, with the camera sensor parallel to the ground plane. For other structural elements, ground-level photographs were acquired using smartphones, with the lens plane parallel to the target surface in landscape orientation at a consistent distance of approximately 1.5 m. All images were acquired under natural daylight conditions (10:00–17:00 daily) to ensure uniform illumination while minimizing shadows. No image enhancement or preprocessing techniques such as sharpening, contrast adjustment, color correction, noise reduction, or geometric transformations were applied during data processing.
To address the heterogeneous nature of architectural heritage image data, a standardized preprocessing system was developed. This system comprises quality control, classification indexing, and dataset standardization to meet the requirements of intelligent detection model training. The implementation details are as follows:
In the data quality control phase, a dual screening mechanism, combining manual and algorithmic methods, was employed to exclude non-conforming images. This process involved removing images with quality defects such as focus inaccuracies (MTF < 0.3), compositional tilt (angle deviation > 5°), and motion blur (blur kernel diameter > 5 pixels), as well as those with environmental interference like significant shadowing, leaf obstructions, or debris covering key areas. These steps ensured that subsequent processing would be based on quality-assured images, establishing a foundation for classification indexing and dataset standardization.
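The algorithmic half of this dual screening can be sketched as a simple threshold filter. The thresholds are those stated above; how the metrics themselves are measured (MTF estimation, tilt angle, blur-kernel diameter) is assumed to happen upstream and is not shown, and the image records are hypothetical.

```python
def passes_quality_screen(mtf, tilt_deg, blur_kernel_px):
    """Reject images violating the stated thresholds:
    MTF < 0.3, tilt > 5 degrees, or blur kernel > 5 pixels."""
    return mtf >= 0.3 and abs(tilt_deg) <= 5.0 and blur_kernel_px <= 5

images = [
    {"id": "a", "mtf": 0.45, "tilt": 1.2, "blur": 2},  # sharp and level
    {"id": "b", "mtf": 0.25, "tilt": 0.0, "blur": 1},  # out of focus
    {"id": "c", "mtf": 0.50, "tilt": 7.5, "blur": 3},  # compositional tilt
]
kept = [im["id"] for im in images
        if passes_quality_screen(im["mtf"], im["tilt"], im["blur"])]
```

Environmental-interference cases (shadowing, obstructions) remain with the manual screening pass.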

4.1.2. Building Deterioration Data Annotation

During the dataset partitioning stage, stratified sampling was employed to divide 4581 images into a training set (3339 images) and a test set (1242 images) at a 7:3 ratio. For categories with fewer samples (≤200 per category), a 6:4 split was used, while large sample categories, such as 2593 images of brick pavements and 2727 of walls, ensured coverage through random sampling. All images were converted to JPEG format, suitable for machine learning inputs, and stored in a distributed file system to provide a standardized data interface for model training (Table 2).
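The stratified partitioning can be sketched as below, a minimal version assuming one category label per image; the 6:4 relaxation for categories with at most 200 samples follows the text, while the toy category counts are hypothetical.

```python
import random

def stratified_split(samples, labels, ratio=0.7, seed=0):
    """Per-category split: each category keeps roughly `ratio` of its
    samples for training (7:3 default, relaxed to 6:4 for rare classes)."""
    rng = random.Random(seed)
    by_cls = {}
    for s, c in zip(samples, labels):
        by_cls.setdefault(c, []).append(s)
    train, test = [], []
    for c, items in by_cls.items():
        rng.shuffle(items)                        # random sampling per class
        r = 0.6 if len(items) <= 200 else ratio   # small classes: 6:4
        cut = round(len(items) * r)
        train += items[:cut]
        test += items[cut:]
    return train, test

samples = list(range(1000))
labels = ["pavement"] * 600 + ["wall"] * 300 + ["beam"] * 100  # rare class
train, test = stratified_split(samples, labels)
```

Because the ratio is applied within each category, class proportions are preserved in both partitions rather than left to chance.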
The annotation process strictly adhered to the pre-established damage classification system, which includes the following: Level I—Slight Deterioration; Level II—Moderate Deterioration; Level III—Severe Deterioration. Through the LabelImg platform, precise bounding boxes and detailed annotations of damage regions were created. YOLO-v11 played a crucial auxiliary role due to its excellent feature extraction and object localization capabilities. By analyzing semantic features and spatial positioning of damage targets, both annotation efficiency and accuracy were significantly improved. For instance, roof tile images were annotated based on quantitative parameters such as crack geometry (length, width), displacement, and biological erosion, allowing for accurate determination and annotation of damage grades according to the classification standards (Figure 3).
Ultimately, 3339 training samples were finalized, each meticulously verified and annotated manually, and stored in standardized XML format. These files document target categories corresponding to damage grades, precise spatial coordinates (xmin, ymin, xmax, ymax), and semantic attribute features, providing highly structured data input for deep learning model training. The dataset emphasizes accuracy, consistency, and completeness, supporting the extraction of characteristic patterns across various types and damage levels of architectural heritage, establishing a solid data foundation for intelligent, high-precision damage detection (Figure 4).
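An annotation file of this kind can be read back with the standard library. The exact schema is an assumption: the sketch uses the Pascal VOC-style layout that LabelImg produces (matching the fields named above), and the category name `level_II_moderate` and coordinates are hypothetical.

```python
import xml.etree.ElementTree as ET

# hypothetical single-object annotation in the Pascal VOC layout
XML = """<annotation>
  <object>
    <name>level_II_moderate</name>
    <bndbox><xmin>34</xmin><ymin>50</ymin><xmax>180</xmax><ymax>210</ymax></bndbox>
  </object>
</annotation>"""

def parse_annotations(xml_text):
    """Extract (damage grade, (xmin, ymin, xmax, ymax)) per object."""
    root = ET.fromstring(xml_text)
    out = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(int(box.find(t).text)
                       for t in ("xmin", "ymin", "xmax", "ymax"))
        out.append((obj.find("name").text, coords))
    return out

records = parse_annotations(XML)
```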

4.1.3. Statistical Analysis of Dataset

To ensure the effectiveness and generalizability of the damage detection method for architectural heritage, a rigorous selection process was employed to curate 4581 images. These images were systematically divided into a training set comprising 3339 images and a test set containing 1242 images, thus providing robust data support for model training and evaluation. During the damage detection procedures applied to the training set images, an in-depth analysis was conducted on the positional and dimensional distribution characteristics of the targets. As shown in Figure 5, the top-left subgraph presents a two-dimensional coordinate (x, y) distribution of damage targets within image space, highlighting the extensive and diverse scene coverage of the training set. The bottom subgraphs illustrate the correlation distribution of width, height, and (x, y) coordinates, as well as one-dimensional histograms of width and height, systematically revealing the size characteristics and distribution patterns of damage targets. These visual analyses indicate that the post-detection images from the training set exhibit varied positional and dimensional attributes, encompassing spatial layout features of different material types such as wood, brick, stone, and tile structures, along with instances of multi-scale damage. The comprehensive and representative nature of this dataset is ensured by meticulous damage detection and detailed distribution feature analysis within the training set images.

4.2. Experimental Setup and Evaluation Metrics

4.2.1. Experimental Environment Configuration

The hardware environment for this experiment was configured with an Intel Core i5-10400 processor and an NVIDIA GTX 1660 GPU, supported by 16 GB of RAM and a 512 GB hard drive, providing stable computational support for model training and inference. Both development and runtime environments operated on Windows 10/11, using Visual Studio Code as the development platform, with Chrome as an additional supporting environment; programming was conducted in Python 3.8 and HTML. This configuration is closely aligned with the objectives of intelligent damage detection and quantitative analysis of architectural heritage, ensuring experimental efficiency and code compatibility.

4.2.2. Evaluation Metric Selection

To comprehensively evaluate the model’s performance in detecting damage in architectural heritage, the following core evaluation metrics were selected: Precision (P), Recall (R), mean Average Precision at an Intersection over Union (IoU) of 0.5 (mAP50), and mean Average Precision across IoUs ranging from 0.5 to 0.95 (mAP50-95). Precision measures the proportion of correctly identified damage targets among those detected by the model, while Recall indicates the extent to which actual damage targets are detected by the model. Average Precision (AP) calculates the area under the Precision-Recall curve for a specific Intersection over Union (IoU) threshold. Mean Average Precision (mAP) is the average AP over all object classes. Therefore, mAP50 provides a holistic assessment of the model’s accuracy at an IoU threshold of 0.5, while mAP50-95 evaluates its robustness by averaging the mAP across IoU thresholds from 0.5 to 0.95 in steps of 0.05.
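The AP computation can be sketched as the area under the precision-recall curve. This is a simplified all-point version for one class at a fixed IoU threshold; detection toolkits additionally handle greedy matching of predictions to ground truths and interpolation details, and the detections below are hypothetical.

```python
def average_precision(detections, n_gt):
    """AP for one class: area under the precision-recall curve.
    `detections` are (score, is_tp) pairs whose TP/FP status was decided
    at a fixed IoU threshold (e.g. 0.5 for mAP50); n_gt is the number of
    ground-truth objects of this class."""
    dets = sorted(detections, key=lambda d: -d[0])  # by descending score
    tp = fp = 0
    points = []
    for score, is_tp in dets:
        tp += is_tp
        fp += not is_tp
        points.append((tp / n_gt, tp / (tp + fp)))  # (recall, precision)
    # area under the curve: precision times each recall increment
    ap, prev_r = 0.0, 0.0
    for r, p in points:
        ap += (r - prev_r) * p
        prev_r = r
    return ap

# two ground truths; three detections, the middle one a false positive
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], n_gt=2)
```

mAP50 is then the mean of this quantity over all classes at IoU 0.5, and mAP50-95 additionally averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05.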

4.2.3. Initial Model Performance Presentation

The experiment employed the YOLOv11 model as the base detection model, with its structure and initial detection performance illustrated in the accompanying figures. The model comprises a 100-layer network with 9,413,961 parameters and requires 21.3 GFLOPs of computation; average inference time is 35.6 ms per image. In terms of detection performance metrics, Box (P) indicates bounding box precision, while Box (R) signifies bounding box recall. The statistical results show an overall precision rate of 0.432, a recall rate of 0.388, mAP50 of 0.344, and mAP50-95 of 0.161 across all categories. Specifically, for Class I (Mild Deterioration), the precision is 0.417, recall is 0.358, mAP50 is 0.321, and mAP50-95 is 0.156. For Class II (Moderate Deterioration), the precision is 0.43, recall is 0.374, mAP50 is 0.332, and mAP50-95 is 0.167. For Class III (Severe Deterioration), the precision is 0.45, recall is 0.433, mAP50 is 0.379, and mAP50-95 is 0.159. These data delineate the model’s initial performance characteristics in architectural heritage damage detection tasks, providing essential baselines for subsequent experimental configurations, comparative model performance evaluations, and ablation studies.
As illustrated in Figure 6, the model’s detection performance metrics reveal certain traits and areas for enhancement. Overall indicators such as precision (P = 0.432), recall (R = 0.388), and mAP50 (0.344) suggest that improvements are necessary to enhance accuracy and completeness in detecting damage within architectural heritage, especially in scenarios involving complex background interference and subtle damage feature capture. Category-specific metrics indicate that the model performs relatively better in detecting severe deterioration (Class III) with higher precision (0.45) and recall (0.433), reflecting effective detection of well-defined severe damage features. Conversely, the lower mAP50 for mild deterioration (Class I: 0.321) and moderate deterioration (Class II: 0.332) reflect the need to enhance detection capabilities for less pronounced features.
A comprehensive analysis of the model’s performance metrics reveals that the inconsistent effectiveness in detecting varying degrees of deterioration largely results from the model’s preferential perception of high-contrast, large-scale damage features over low-saliency targets. Specifically, the model demonstrates optimal recognition efficacy for Severe Deterioration (Class III) due to its pronounced destructive characteristics and well-defined boundaries. Performance for Moderate Deterioration (Class II) is intermediate, while Mild Deterioration (Class I) exhibits systematically suboptimal detection accuracy. This limitation arises from inherently subtle features often misclassified as natural building material textures, coupled with ambiguous damage boundaries that compromise annotation consistency.
These findings underscore the necessity for sustained training iterations and optimization efforts to enhance graded deterioration detection, particularly for mild damage scenarios. Future work should focus on refining feature discrimination mechanisms, particularly in improving feature extraction capabilities and adapting to various degrees of damage, aiming to boost the precision and reliability of architectural heritage damage detection.

4.3. Performance Comparison of Deterioration Detection Methods

In the task of damage detection for architectural heritage, model performance is a critical factor determining the reliability and practical applicability of the detection results. Figure 7 illustrates dynamic changes in loss functions and detection performance metrics during the model’s training and validation phases, providing essential insights for an in-depth analysis of model performance.
For the loss function part, the upper section depicts the training-phase losses, including bounding box loss (train/box_loss), classification loss (train/cls_loss), and distribution focal loss (train/dfl_loss). The lower section shows the corresponding validation-phase losses (val/box_loss, val/cls_loss, val/dfl_loss). Bounding box loss (box_loss) measures the accuracy of damage localization within architectural heritage, classification loss (cls_loss) reflects errors in damage categorization, and distribution focal loss (dfl_loss) focuses on the precision of prediction box distribution. Both the training and validation loss curves exhibit a significant downward trend with similar patterns, reflecting the model’s effective learning of architectural heritage damage features throughout training. This indicates continuous parameter optimization to minimize prediction errors while avoiding overfitting, thereby demonstrating robust stability and generalization capabilities. Such foundations are crucial for accurate damage detection in architectural heritage contexts.
The detection performance metric curves on the right side further depict the enhancement of the model’s detection capabilities. As training epochs progress, these metrics increase steadily and stabilize, indicating improved effectiveness in detecting deterioration. In complex scenarios involving diverse architectural materials and components, such as brick-tile walls in traditional villages or masonry structures in extensive linear heritage sites, the model successfully identifies deterioration targets while minimizing false positives and false negatives.
As shown in Figure 8, the Confusion Matrix Normalized and Confusion Matrix effectively present category prediction versus ground-truth matching from probability distribution and sample quantity dimensions.
The Confusion Matrix Normalized shows the proportionate distribution of predicted versus actual categories, where element values represent the probability of real category samples being predicted as each target category. Color depth maps the probability, with values approaching 1 indicating higher concentration in the prediction. The Confusion Matrix displays the numerical distribution of samples, with rows corresponding to true categories and columns to predicted categories. Matrix element values indicate the number of real category samples predicted as each category, offering an intuitive overview of prediction distribution details across categories.
With the first row as an example (assuming it represents a specific damage category), the normalized confusion matrix shows a correct prediction ratio of 0.36, indicating insufficient feature learning for this category by the model. A high misclassification rate of 0.57 indicates a tendency to incorrectly classify samples as background, reflecting vulnerability to background interference. This is corroborated by the raw confusion matrix, where 2216 true category samples were correctly predicted, yet 2705 were misclassified as background. Additionally, 151 and 25 samples were erroneously identified as other damage categories, highlighting significant challenges in distinguishing this category from both the background and other categories. In the second row, the normalized confusion matrix shows a correct prediction ratio of 0.37, with 0.32 misclassified as background and 0.02 and 0.05 as other categories. Similarly, the raw matrix reveals that 1345 samples were correctly classified, while 1537 were misclassified as background, and 188 were misidentified as other damage categories. These results indicate similar issues with background confusion and inter-category misclassification.
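The row normalization underlying the Confusion Matrix Normalized can be sketched as follows, with each row (true category) divided by its total sample count; the toy counts are hypothetical and much smaller than those reported above.

```python
def normalize_confusion(cm):
    """Convert raw counts to per-true-category proportions: each row
    (true category) is divided by its total, so rows sum to 1."""
    out = []
    for row in cm:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

# hypothetical 3-class raw counts (rows: true, columns: predicted)
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 1, 9]]
norm = normalize_confusion(cm)
```

The diagonal of the normalized matrix gives each category's correct-prediction ratio, which is the quantity discussed above.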

4.4. Establishment and Visualization of the Deterioration Detection System

The development of the PC-based intelligent detection system adhered rigorously to a comprehensive technical methodology [27], ensuring optimal functionality, usability, and stability. As depicted in the “Software Design Workflow” diagram, the system’s development initiated with requirements analysis, proceeded through architecture design, visual design, software development, and testing, concluding with store listing optimization. This sequence established a coherent development chain (Figure 9).
During the requirements analysis phase, the system was clearly defined as a professional-grade tool for architectural heritage conservation. Core functionalities, such as multi-modal data acquisition and intelligent deterioration identification, were specified alongside competitive product analysis and styling organization. In the architecture design phase, activities included information architecture design—planning software page hierarchies and navigation structures—and interaction design, which simulated operational workflows like data import and algorithm selection logic. These efforts ensured rational interface layouts and user-friendly operational flows.
In the software development phase, the frontend team constructed the user interface, while the backend provided support through multi-modal data fusion functionalities and implemented intelligent detection algorithms. This enabled the core workflow of the system: “Data Input → Intelligent Processing → Visual Analysis → Results Output”. During the software testing phase, system quality was ensured through functional testing (verifying module functionalities such as detection algorithm accuracy and report generation completeness), multi-platform compatibility testing (ensuring stable operation on Windows, Ubuntu, macOS), and UI testing (examining interface display smoothness and interaction responsiveness). Ultimately, the system underwent review and continuous refinement to ensure efficient and stable practical application.
The PC-based intelligent detection system is designed for intelligent detection and quantitative analysis of architectural heritage deterioration through a multi-module integrated design. Interface functionalities are divided into four primary zones: “Home”, “Explore”, “Messages”, and “Me”, enhancing user interactivity while refining core functions. The system provides an efficient, precise, and flexible intelligent solution for detecting architectural heritage deterioration, meeting diverse user needs such as those of cultural heritage institutions and researchers.
A notable development within the project is the “Deterioration Identification” module, which manages processes from data input to results output. Following a four-stage workflow, the system architecture includes four main functional modules: Data Acquisition and Input, Intelligent Detection and Analysis, Visualization and Interaction, and Report Generation and Management. Auxiliary functions, such as a Historical Database, Knowledge Base System, and Cloud Collaboration, supplement these modules, forming a comprehensive process management system. Within the Data Acquisition and Input Module, the system supports the import and preprocessing of multi-source heterogeneous data. Users can load image files (in formats such as JPEG/PNG/PDF) via the “Select File” menu or “Real-time Capture” option on the bottom toolbar. Subsequently, the system automatically reads image metadata, including capture time and sensor parameters (Figure 10).
The Intelligent Detection and Analysis Module incorporates multiple detection algorithms. After data input, users can click “Detection” on the bottom sidebar to view real-time progress and intermediate results, facilitating precise monitoring of analytical status for automated deterioration detection and quantitative analysis. Intermediate results are intuitively visualized, as shown in the right-side image of Figure 11. The system automatically identifies deteriorated sections of architectural heritage, annotates distinct deterioration grades with color-coded bounding boxes, and displays confidence values (e.g., 0.67, 0.75) to indicate the reliability of each detection outcome. This methodology achieves efficient automated detection and quantitative analysis of architectural heritage deterioration while maintaining accuracy and objectivity (Figure 11).
The Visualization and Interaction Module provides multi-dimensional analysis support. Users can select data from various periods and employ annotation tools (brush/rectangle) to manually label specific pathologies, add textual annotations, and document conservation recommendations to meet in-depth analysis requirements. Features under development include “Nearby Detection” and “Collaborative Knowledge Base Co-Construction for Deterioration”, aimed at enhancing data linkage and visual interactivity, generating thermal maps illustrating deterioration evolution, and facilitating user participation in technical optimization. This approach allows dynamic updates to the deterioration classification framework and database, enhancing the software’s accuracy in identifying deterioration.
The Report Generation and Management Module automates the creation and export of detection reports. With intelligent data extraction technology, the system automatically compiles detection results, accurately mapping content to corresponding templates and eliminating manual entry errors, thereby significantly improving report compilation efficiency. Additionally, the module implements a structured archival system for historical detection records, enabling users to quickly retrieve past inspections through multidimensional filters (e.g., building name, inspection date, deterioration type). This function facilitates longitudinal tracking and comparative analysis of deterioration progression in specific architectural heritage, establishing comprehensive data chains to support long-term monitoring and adaptive conservation strategies.

4.5. Partial Results of Deterioration Detection

As depicted in Figure 12, Figure 13, Figure 14, Figure 15, Figure 16 and Figure 17, the following photographs were captured in Shanxi, Beijing, and Hebei, shown alongside the deterioration detection results generated after input into the software.

4.6. Cross-Material Deterioration Detection

Beyond general performance, we also evaluated the cross-material generalization of the proposed method. Specifically, the network was trained on four distinct datasets from four different regions and validated on cross-region data. The cross-region detection precision was statistically analyzed, with experimental results detailed in Table 3. The models trained and tested on data from the same material achieve the best detection performance; however, cross-material detection also yields relatively stable results, indicating that our method exhibits strong cross-material generalization capabilities.

5. Conclusions and Future Work

This paper establishes a cross-material standard for grading architectural damage. By employing a multi-perspective approach, field investigations were carried out in Shanxi, Beijing, and Hebei to classify and annotate damage conditions across various building materials, thereby creating a training dataset. The core algorithm used for model training and optimization is YOLO-v11, which facilitates the automatic classification and identification of architectural damage. Subsequently, a mobile intelligent detection system was developed to ensure ease of use and stability.
The study identifies several limitations that warrant attention. Firstly, there are data collection constraints due to insufficient imagery of certain material deteriorations, influenced by factors such as site accessibility and environmental conditions. This limitation impedes model training and diminishes the software’s generalizability. Secondly, there are gaps in typological coverage. Although the research addresses ancient dwellings, modern structures, and linear heritage, further validation is necessary to assess applicability to industrial sites, religious architecture, and other heritage typologies. Future research should prioritize conducting expanded field surveys across underrepresented heritage categories, enhance database comprehensiveness to improve recognition accuracy, and develop adaptive algorithms for cross-typological damage detection.
While a cross-material classification standard for architectural damage has been established, practical applications reveal significant variation in damage characteristics across different regions and historical periods. Consequently, future investigations must refine these classification standards to enhance the model’s adaptability and accuracy within diverse contexts.
Despite the comprehensive software development process and functional interface of the mobile intelligent detection system, further enhancements in both system functionality and user interface (UI) design are crucial to optimize user experience. The software developed in this study enables rapid identification of damaged locations and severity levels, facilitating predictive analysis of potential structural issues. This capability empowers conservation professionals to implement targeted repair strategies, effectively prolonging the service life of architectural heritage while preserving its historical and cultural significance. Specifically, the system offers precise detection of facade detachment and structurally compromised buildings, provides data-driven insights for urban renewal planning, and presents a scientific basis for optimizing resource allocation. As vital components of urban historical fabric, architectural heritage requires technologically enhanced protection methodologies to ensure sustainable conservation and transmission of cultural values.
The proposed cross-material architectural heritage damage detection method not only enhances detection accuracy and reliability but also presents innovative technical solutions for heritage conservation. Continuous optimization of model performance and expansion of dataset coverage can further improve detection capabilities, advancing architectural heritage preservation practices.
Furthermore, the developed mobile intelligent detection system serves an educational purpose by disseminating knowledge on damage identification, raising awareness of heritage conservation, and encouraging broader societal participation in preservation efforts. It fosters a collaborative environment for long-term, sustainable heritage protection. This integrated approach significantly contributes to cultivating a conservation-conscious society while promoting technological empowerment in safeguarding cultural heritage.

Author Contributions

Conceptualization, L.X.; Data curation, Q.Y. and X.Y.; Formal analysis, Q.Y. and X.Y.; Funding acquisition, L.X.; Investigation, Q.Y. and X.Y.; Methodology, Q.Y., X.Y. and L.X.; Project administration, L.X.; Resources, L.X.; Software, Q.Y., X.Y. and L.X.; Supervision, L.X.; Validation, Q.Y. and X.Y.; Visualization, Q.Y.; Writing—original draft, Q.Y. and X.Y.; Writing—review and editing, L.X. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (Grant No. 52008019).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available due to policy and qualification restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liang, W.; Ahmad, Y.; Mohidin, H.H.B. The Development of the Concept of Architectural Heritage Conservation and Its Inspiration. Built Herit. 2023, 7, 21. [Google Scholar] [CrossRef]
  2. Xiao, Y.; Tian, Z.; Yu, J.; Zhang, Y.; Liu, S.; Du, S.; Lan, X. A Review of Object Detection Based on Deep Learning. Multimed. Tools Appl. 2020, 79, 23729–23791. [Google Scholar] [CrossRef]
  3. Guo, X.; Jiang, F.; Chen, Q.; Wang, Y.; Sha, K.; Chen, J. Deep Learning-Enhanced Environment Perception for Autonomous Driving: MDNet with CSP-DarkNet53. Pattern Recognit. 2025, 160, 111174. [Google Scholar] [CrossRef]
Figure 1. Schematic visualization of the deterioration detection workflow using image object detection.
Figure 2. Integrated workflow for multi-material building deterioration.
Figure 3. Visualization of annotation and grading methodology for roof tile deterioration: (a) input photographs; (b) damaged area identification; (c) damage severity grading.
Figure 4. Visual interface for model training workflow execution samples.
Figure 5. Visualization of target detection in training images: matrix cell color intensity reflects label correlation strength (darker = stronger), while subplots show spatial (x/y coordinates) and dimensional (width/height) distributions.
Figure 6. Visualization of the YOLOv11 architectural framework and performance evaluation metrics.
Figure 7. Training and validation metrics for object detection: (a) training bounding box loss; (b) training classification loss; (c) training distributed focal loss; (d) bounding box detection precision; (e) bounding box detection recall; (f) validation bounding box localization loss; (g) validation classification loss; (h) validation distributed focal loss; (i) mean average precision@IoU = 0.5 (mAP50); (j) mean average precision@IoU = 0.5:0.95 (mAP50-95).
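All of the detection metrics tracked in Figure 7 rest on the intersection-over-union (IoU) criterion between predicted and ground-truth bounding boxes. As a point of reference (a generic sketch of the standard definition, not the paper's training code), mAP50 counts a prediction as a true positive when IoU ≥ 0.5, while mAP50-95 averages precision over thresholds from 0.5 to 0.95 in steps of 0.05:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Illustrative boxes: 30x30 overlap = 900, union = 1600 + 1600 - 900 = 2300.
pred, gt = (10, 10, 50, 50), (20, 20, 60, 60)
print(iou(pred, gt))  # ≈ 0.391, below the 0.5 threshold used by mAP50
```

At a 0.5 threshold this pair would therefore count as a false positive, which is why localization loss and mAP50 move together during training.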
Figure 8. Confusion matrix analysis for heritage pathology classification: (a) normalized confusion matrix; (b) absolute confusion matrix.
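The relationship between the two panels of Figure 8 is the conventional one: the normalized matrix in panel (a) is obtained by dividing each row of the absolute matrix in panel (b) by its row sum, so each true-class row sums to 1. A minimal sketch with hypothetical two-class counts (not the paper's data):

```python
def normalize_rows(matrix):
    """Row-normalize an absolute confusion matrix; rows = true class, columns = predicted."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

# Hypothetical counts for two damage classes.
absolute = [[80, 20],
            [10, 90]]
print(normalize_rows(absolute))  # [[0.8, 0.2], [0.1, 0.9]]
```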
Figure 9. Visualization of the software implementation workflow for architectural heritage conservation.
Figure 10. Visualization of file import operations for the “deterioration identification” module interface.
Figure 11. Functionality demonstration of the deterioration identification module: (a) input image (pre-detection state); (b) output with identified pathology markers and quantitative grading.
Figure 12. Brick paving tile deterioration assessment at Diji Fortress, Shanxi: (a) on-site photographed example 1 of brick paving tile; (b) software processing output result for brick paving tile example 1 photo; (c) on-site photographed example 2 of brick paving tile; (d) software processing output result for brick paving tile example 2 photo.
Figure 13. Brick wall surface defects documentation at Diji Fortress, Shanxi: (a) on-site photographed example 1 of brick wall surface; (b) software processing output result for brick wall surface example 1 photo; (c) on-site photographed example 2 of brick wall surface; (d) software processing output result for brick wall surface example 2 photo.
Figure 14. Tile roof degradation analysis at Shangzhuang Village, Shanxi: (a) on-site photographed example 1 of tile roof; (b) software processing output result for tile roof example 1 photo; (c) on-site photographed example 2 of tile roof; (d) software processing output result for tile roof example 2 photo.
Figure 15. Timber structural member decay evaluation at Shangzhuang Village, Shanxi: (a) on-site photographed example 1 of timber structural member; (b) software processing output result for timber structural member example 1 photo; (c) on-site photographed example 2 of timber structural member; (d) software processing output result for timber structural member example 2 photo.
Figure 16. Brick pavement deterioration at Beijing Jiaotong University campus: (a) on-site photographed example 1 of brick pavement; (b) software processing output result for brick pavement example 1 photo; (c) on-site photographed example 2 of brick pavement; (d) software processing output result for brick pavement example 2 photo.
Figure 17. Brick fortification pavement deterioration in Qian’an Ancient Great Wall, Hebei: (a) on-site photographed example 1 of brick fortification pavement; (b) software processing output result for brick fortification pavement example 1 photo; (c) on-site photographed example 2 of brick fortification pavement; (d) software processing output result for brick fortification pavement example 2 photo.
Table 1. Structured documentation and archiving of building deterioration data.
| Survey Period | Location | Investigated Deterioration Types | Images Recorded |
|---|---|---|---|
| June 2024 | Diji Fortress and Shangzhuang Village, Shanxi | Tile roof degradation | 670 |
| | | Brick wall surface defects | 1248 |
| | | Stone wall surface defects | 178 |
| | | Brick paving tile deterioration | 450 |
| | | Stone paving tile deterioration | 160 |
| | | Timber structural member decay | 762 |
| September 2024 | Beijing Jiaotong University Campus and Residential Area | Brick paving tile deterioration | 1368 |
| October 2024 | Juyongguan Pass Great Wall Section, Beijing | Masonry surface deterioration of brick fortification walls | 976 |
| October 2024 | Qian’an Ancient Great Wall Section, Hebei | Masonry surface deterioration of brick fortification walls | 1035 |
| | | Masonry surface deterioration of brick fortification pavements | 872 |
Table 2. Dataset curation and classification protocol for architectural deterioration analysis.
| Location | Structural Element | Training Set | Validation Set |
|---|---|---|---|
| Shanxi Traditional Villages | Ceramic Roof Tiles | 142 | 59 |
| | Brick Masonry Walls | 416 | 162 |
| | Stone Masonry Walls | (Insufficient Samples) | - |
| | Brick Paving | 150 | 117 |
| | Stone Paving | (Insufficient Samples) | - |
| | Timber Structural Members | 200 | 160 |
| Beijing Jiaotong University | Brick Paving | 777 | 200 |
| Juyongguan Pass, Great Wall | Fortification Brick Walls | 599 | 196 |
| Qian’an Ancient Great Wall | Fortification Brick Walls | 665 | 196 |
| | Brick Paving | 390 | 152 |
Table 3. Precision of Cross-Material Detection Performance.
| Training \ Validation Datasets | Shanxi Villages | Beijing Jiaotong Univ. | Juyongguan Pass |
|---|---|---|---|
| Shanxi Villages | 0.489 | 0.407 | 0.412 |
| Beijing Jiaotong Univ. | 0.398 | 0.423 | 0.383 |
| Juyongguan Pass | 0.340 | 0.349 | 0.379 |
| Qian’an Ancient | 0.339 | 0.327 | 0.354 |
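Table 3 is read row-wise: diagonal entries (a model validated on its own site) measure in-domain precision, while off-diagonal entries measure cross-material transfer. A small sketch of that summary using the values from the table (note the Qian’an row has no matching validation column, so all of its entries are cross-domain):

```python
# Precision values from Table 3: rows = training dataset, columns = validation dataset.
rows = ["Shanxi Villages", "Beijing Jiaotong Univ.", "Juyongguan Pass", "Qian'an Ancient"]
cols = ["Shanxi Villages", "Beijing Jiaotong Univ.", "Juyongguan Pass"]
precision = [
    [0.489, 0.407, 0.412],
    [0.398, 0.423, 0.383],
    [0.340, 0.349, 0.379],
    [0.339, 0.327, 0.354],
]

# Split entries into in-domain (train site == validation site) and cross-domain.
in_domain, cross_domain = [], []
for i, train in enumerate(rows):
    for j, val in enumerate(cols):
        (in_domain if train == val else cross_domain).append(precision[i][j])

print(f"in-domain mean:    {sum(in_domain) / len(in_domain):.3f}")      # 0.430
print(f"cross-domain mean: {sum(cross_domain) / len(cross_domain):.3f}")  # 0.368
```

The roughly 0.06 gap between the two means quantifies the cross-material generalization cost that the table illustrates.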