Research on Identification, Evaluation, and Digitization of Historical Buildings Based on Deep Learning Algorithms: A Case Study of Quanzhou World Cultural Heritage Site

Wang, Siqi; Zhang, Jiahao; Tun, Aung Nyan; Sein, Kyi

doi:10.3390/buildings15111843

¹ School of Architecture, Huaqiao University, No. 668 Jimei Avenue, Jimei District, Xiamen 361021, China

² Urban and Rural Architectural Heritage Protection Technology Key Laboratory of Fujian Province, No. 668 Jimei Avenue, Jimei District, Xiamen 361021, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Buildings 2025, 15(11), 1843; https://doi.org/10.3390/buildings15111843

Submission received: 5 May 2025 / Revised: 21 May 2025 / Accepted: 26 May 2025 / Published: 27 May 2025

(This article belongs to the Special Issue New Insights on the Intelligent Preservation of Architectural Heritage)

Abstract

Historical buildings have important historical and social value, but they are generally difficult to identify, complicated to evaluate, and insufficiently addressed in digitization research. On 25 July 2021, Quanzhou successfully applied for World Heritage status. In this case study, Qiaonan Village in the Quanzhou World Heritage Area is selected, and a deep learning algorithm is proposed for the identification, evaluation, and digitization of historical buildings. By comparing multiple models, the optimal model is selected for intelligent identification and classification of building elevations. Combined with GIS, a distribution map of the village buildings is created for digitization research. An intelligent monitoring platform is built to enable dynamic monitoring and hierarchical protection of the buildings, aiding in the protection of historical structures and the sustainable development of the tourism industry. In the future, we will continue to optimize the integration of YOLO and GIS and explore a more universal model for the intelligent protection of historical buildings.

Keywords:

deep learning algorithm; historical buildings; GIS; intelligent recognition; dynamic monitoring

1. Introduction

Quanzhou, located on the southeastern coast of Fujian Province, is an important port city in China and was a hub of world trade and cultural exchange during the Song and Yuan dynasties. Economic prosperity promoted the fusion of Chinese and foreign cultures, shaping Quanzhou’s unique architectural style and leaving behind a rich cultural heritage for China.

On 25 July 2021, Quanzhou was successfully nominated as a World Heritage Site [1]. The Quanzhou heritage consists of 22 representative monuments, sites, and their surrounding environments. Luoyang Bridge, one of the listed monuments, is the oldest surviving beam-type stone bridge across the sea in China and is a national key cultural relics protection unit [2].

Located at the mouth of the Luoyang River, Qiaonan Village forms a symbiotic relationship with Luoyang Bridge. Within the village, many ancient Minnan Gu-Cuo and ancestral temples have been preserved. Some of the buildings combine traditional Minnan architecture with Western architectural elements, forming a unique East-meets-West style that constitutes an important part of China’s cultural heritage.

In the process of digitizing historical buildings in Qiaonan Village in the Quanzhou World Heritage Area, several problems were encountered:

Since the reform and opening up in 1978, China’s urbanization has developed rapidly, and many villagers in Qiaonan Village have moved to towns for employment or settlement. A large number of new residential buildings have been constructed in the village, disconnecting it from its traditional architectural culture [3,4,5]. The phenomenon of “empty nesting” of historical buildings is serious, causing some historical buildings to fall into disrepair, suffer elevation damage, or even become dangerous. Some historical buildings are used as chicken or duck houses, which is highly detrimental to their preservation [3,4,5].
Qiaonan Village has a long history and rich culture. However, over time, due to natural changes and the lack of systematic planning and management, the village’s current layout has become chaotic, with new and historical buildings intermingled. This disorganized arrangement hinders the protection and unified management of historical buildings [3].
The absence of a building dataset necessitates the collection of information on village structures. However, because the buildings are closely spaced and the village paths are winding and narrow, it is difficult to scan building elevations effectively. The dense layout also poses potential fire safety risks.

Artificial intelligence has undergone one of the greatest transformations in recent years, and its ability to categorize and make decisions has, in some cases, surpassed that of humans. Deep learning algorithms have been shown to outperform earlier machine learning techniques in several areas [6]. Although deep learning algorithms are widely used in many fields, their application in the preservation of historic buildings has lagged behind [7,8].

The aim of this paper is to digitize historical buildings in Qiaonan Village through deep learning algorithms, exploring a more efficient and accurate method of classifying and archiving historical buildings, and to realize the identification, evaluation, and digitization of these structures.

The categorization of architectural styles in Chinese traditional settlements (CTSs) is crucial for the development and conservation of traditional villages. Categorizing the architectural styles of Qiaonan Village will aid in the conservation and management of historical buildings, promote local development and tourism, and help preserve local architectural characteristics [8].

The evolution of building types depends on the social, economic, political, and religious conditions of each region, and is also influenced by building materials and climate [9]. Visual inspection is a simple and trustworthy method for categorizing building elevations, but it is subjective and inefficient. Therefore, there is a need to explore effective image processing methods to improve the efficiency of building elevation classification and archiving [10].

Currently, deep learning-based object detection algorithms are mainly divided into two types: region-based two-stage detection algorithms and regression-based single-stage detection algorithms [11]. The main representatives of two-stage detection algorithms are the R-CNN series (R-CNN, Fast R-CNN, and Faster R-CNN), which first generate candidate regions (in Faster R-CNN, through a region proposal network, RPN) and then perform fine classification and bounding box regression [12]. The main representative of single-stage detection algorithms is the YOLO (You Only Look Once) series, which directly performs dense prediction, completing target classification and localization in a single step, thereby improving detection speed [4].

YOLO (You Only Look Once) is a widely recognized object detection and image segmentation model, known for its efficiency, accuracy, and real-time performance capabilities [13]. Over the years, the YOLO series has undergone continuous iterations, resulting in substantial improvements in both accuracy and inference speed through the introduction of novel architectural modules and optimization techniques [14].

The latest version, YOLOv11, incorporates several advanced modules that enhance its ability to detect fine-grained and occluded targets in complex environments. These include the C3K2 module (Cross Stage Partial with kernel size 2), SPPF (Fast Spatial Pyramid Pooling), and C2PSA (Convolutional Spatial Attention Mechanism) [15]. These components improve multi-scale feature extraction and spatial attention, enabling the model to localize small or partially occluded objects with greater precision.

Compared to its predecessors, YOLOv11 achieves higher detection accuracy and faster inference speed while maintaining efficient parameter usage. This is particularly valuable for historical building analysis, where elevations often contain intricate decorative elements, small architectural features (e.g., window frames, cornices), and partial occlusions from vegetation or neighboring structures. YOLOv11’s enhanced attention mechanism and multi-scale fusion capabilities make it highly effective for accurately detecting and classifying these architectural details, making it an ideal model for historic building elevation recognition tasks [15].

The application of deep learning algorithms has become a significant area of innovation in historic building preservation. For example, Wei et al. (2023) improved BFD-YOLO, based on YOLOv7, for building elevation defect detection, addressing YOLO’s challenges in complex backgrounds and small target detection, and significantly improving detection accuracy [10]. Deep learning algorithms are also used for architectural style classification: R-CNN is suitable for identifying national affiliations of historical buildings through region proposals and CNN feature extraction, while YOLO excels at recognizing structural components through grid-based object localization [7,16].

Siountri et al. (2023) proposed a deep learning method based on YOLO for the automated classification and archiving of historical buildings in Athens [9]. Abraham et al. (2017) introduced sparse features and trained a CNN to recognize the architectural styles of Mexican buildings [17]. Han et al. (2022) applied transfer learning (TL) and AutoAugment techniques to CNNs for the classification of Chinese traditional settlement styles [8,18]. Li et al. (2024) used an improved Faster R-CNN to recognize nine types of riding styles [19]. Nils et al. (2021) used Mask R-CNN for window detection from elevation images [20]. Llamas et al. (2017) demonstrated CNN’s effectiveness by creating the AHE_Dataset with 10 classes of architectural elements [21]. Jia Liu (2023) combined Vision Transformer and CNN to propose the CViT method for chronological style recognition [22]. Gao et al. (2023) applied an improved YOLOv8 model for detecting and preserving traditional gardens in Jiangnan [23]. Qiu et al. (2024) used an improved YOLOv8 model for roof damage detection in Fujian’s traditional village buildings [4]. Although previous versions of YOLO have been applied to historic building conservation with promising results, the latest iteration—YOLOv11—has not yet been widely explored in this field. Therefore, this study investigates the application of YOLOv11 and evaluates its feasibility for the digitalization and intelligent classification of historic buildings.

Although these AI-based explorations lay a foundation for this study, related research remains mostly experimental, and there is a lack of post-identification conservation and management studies. Historic buildings are facing severe challenges, including rapid disappearance due to natural and human factors [24], and the high costs associated with maintenance and management. In the preservation field, there is increasing demand for documentation and visualization of historic buildings [25].

This paper focuses on applying deep learning algorithms to the digitalization of historic buildings, achieving intelligent recognition and classification, and establishing a database to support hierarchical protection and management.

Geographic Information System (GIS) is a critical tool in spatial analysis, widely applied in planning, environmental protection, shoreline zoning, public health improvement, and building distribution analysis [26,27]. GIS has also been employed internationally in historic building conservation research [28,29,30]. For instance, Noardo et al. (2015) integrated PostgreSQL, PostGIS, and QGIS for building information management and spatial data visualization [30,31].

Combining GIS with multi-source data and spatial analysis can significantly enhance the efficiency of historic building preservation. Amorim et al. (2013) applied GIS with panoramic spherical photogrammetry (PSP) to improve data collection accuracy in historic buildings in Bahia, Brazil [24,32]. Recently, 3D-GIS has been used in urban modeling and analysis, such as in Almeida et al. (2016)’s project to tackle urban decay in Portugal through a 3D city model integrating spatial analysis and BIM [33]. Braik et al. (2024) combined satellite imagery, GIS, and CNN models for post-disaster building damage assessment using the XBD dataset [34].

However, most studies focus on macroscopic architectural applications [34,35]. Micro-level applications of GIS to individual historic buildings are still limited. Therefore, this paper explores the detailed application of GIS for protecting historic buildings by building an intelligent detection platform that integrates GIS and deep learning.

Intelligent monitoring platforms have become essential tools in urban and heritage management, integrating diverse measurement technologies, sensor systems, and numerical modeling platforms—such as Historical Building Information Modeling (HBIM)—to support data-driven conservation efforts. In recent years, the integration of 3D models and 2D datasets for the conservation and management of historic buildings has emerged as a significant trend.

Building on the principles of Building Information Modeling (BIM), Murphy et al. (2009) introduced the concept of HBIM as a framework for deriving engineering drawings and 3D models from laser scanning and image-based surveys, enabling the systematic collection, storage, modeling, and interpretation of historic building data in Europe [36,37,38].

Further advancing this field, Dabrowski et al. (2025) proposed a novel approach for symmetry assessment within HBIM datasets by combining terrestrial laser scanning (TLS) and unmanned aerial vehicle (UAV)-based LiDAR (ALS) technologies. Using the Gediminas Tower in Vilnius, Lithuania, as a case study, they demonstrated the feasibility of assessing architectural symmetry through integrated 3D measurement techniques [38].

In addition, photogrammetry has become a standard method for acquiring 3D data, with recent technological advances improving both the speed and precision of point cloud generation. Laser scanning, in particular, has proven highly effective in capturing detailed 3D geometric features of historic buildings, offering high-density data collection—with some scans yielding millions of points per scene—which significantly enhances the accuracy of heritage documentation and modeling processes [39].

In this study, CNN, Fast R-CNN, YOLOv8, and YOLOv11 are compared for identifying and classifying historic buildings. A Qiaonan Village building dataset is established, the classification results are integrated into GIS to generate 2D maps and 3D models, and a historical building database is created for monitoring and hierarchical protection.

The main contributions of this paper are as follows:

Identifying and evaluating historical buildings in Qiaonan Village, establishing a database to address the lack of digitized research on Quanzhou’s historical buildings.
Combining YOLOv11’s object detection and GIS for categorizing and archiving Qiaonan Village buildings, replacing traditional methods with a more advanced, economical, and efficient process [40].
Developing a technical route for building an intelligent monitoring platform based on deep learning and GIS for the hierarchical protection of historic buildings.

2. Methods and Data

2.1. Study Area

This study focuses on Qiaonan Village, a typical representative of traditional settlements in southern Fujian, China. The village features well-preserved historical buildings alongside modern complexes. Locals refer to the traditional architecture as red-brick Gu-Cuo. Notably, more than 80% of the existing historical buildings maintain intact masonry and wood structure systems.

2.2. Multi-Model Comparison for Intelligent Classification of Historic Buildings

This study proposes an intelligent classification method for historic buildings through a multi-model comparison framework based on deep learning algorithms. Specifically, we evaluate the performance of four widely used models—CNN, Fast R-CNN, YOLOv8, and the YOLOv11 series—by comparing key performance metrics, including mAP@0.5, mAP@0.5:0.95, GFLOPs, and parameter count (Params). The objective is to identify the most suitable model for the automatic recognition and classification of historical building elevations, particularly under complex architectural and environmental conditions (see Section 3.1 for details). The technical workflow consists of three phases: data acquisition, model training and intelligent recognition, and GIS annotation and visualization (Figure 1).

Since the YOLO series is widely recognized for its high efficiency and accuracy in building elevation detection [10], and CNN and Fast R-CNN have proven effective in traditional building classification [7,12], this study compares different deep learning models to identify the most suitable approach.

The process includes:

Phase 1: Acquiring building elevation images of known ages in Qiaonan Village using cameras and smartphones. Images are annotated using LabelImg to build a labeled dataset.
Phase 2: Training multiple models (CNN, Fast R-CNN, YOLOv8, and YOLOv11) and comparing their performances in building classification tasks to select the optimal model. Model robustness is validated under complex scene conditions.
Phase 3: Integrating classification results into a GIS platform to generate a building distribution map based on geographic coordinates. Historical buildings are further scanned to create 3D point cloud models, where 2D photos, 3D models, and chronological labels are linked to their real-world locations. An intelligent management platform is built to support the hierarchical protection of historical buildings.

2.3. Historic Building Facade Data Collection

2.3.1. Elevation Data Set Construction

We selected 21 buildings in Qiaonan Village with known construction dates, including 4 buildings from the Qing Dynasty (QD), 3 buildings from the Republic of China (RC) period, and 14 buildings from the People’s Republic of China (PRC) period.

A total of 89 high-precision elevation images were captured using a Sony A6000 camera (with a resolution of 0.05 m) and smartphones under varying light and angle conditions. Building elevations were labeled with bounding boxes and annotated with their corresponding chronological labels (QD, RC, PRC) using LabelImg.
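
As an illustration of what one such annotation could look like, the sketch below converts a pixel-space bounding box into the normalized text format that LabelImg can export for YOLO training. The export format used in this study is not stated, so the YOLO text format, the image size, and the box coordinates are assumptions; the class indices follow the QD (0), RC (1), PRC (2) encoding reported in Section 3.3.1.

```python
# Class indices follow the label encoding given in Section 3.3.1.
CLASS_IDS = {"QD": 0, "RC": 1, "PRC": 2}


def to_yolo_line(label, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line.

    YOLO text labels store: class x_center y_center width height,
    with the four geometry values normalized to the range [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{CLASS_IDS[label]} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Hypothetical example: a Qing Dynasty elevation in a 6000 x 4000 pixel photo.
line = to_yolo_line("QD", (1200, 800, 4800, 3200), 6000, 4000)
print(line)
```

Normalizing by the image size keeps the labels valid when the photographs are later resized to the network input resolution.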

To address occlusion problems caused by the dense building layout and narrow trails in Qiaonan Village, the dataset was expanded using photometric and geometric transformations, namely HSV color space enhancement, image translation, affine transformation, and perspective transformation.

Additionally, probability-controlled data augmentation methods were applied to enhance the model’s ability to generalize to occluded targets by introducing multi-dimensional feature perturbations. As a result, the dataset was expanded to a total of 3150 images. The labels of each chronology are shown in Table 1.
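
A minimal sketch of probability-controlled augmentation is given below. It is not the authors' pipeline: the gain values, the probabilities, and the sample data are hypothetical, the image is reduced to a short list of RGB pixels, and the translation is applied to the label box only (an actual pipeline would shift the image pixels by the same offset).

```python
import colorsys
import random


def hsv_jitter(pixels, rng, h_gain=0.015, s_gain=0.7, v_gain=0.4):
    """Perturb hue, saturation, and value of RGB pixels with values in [0, 1]."""
    dh = rng.uniform(-h_gain, h_gain)
    fs = 1 + rng.uniform(-s_gain, s_gain)
    fv = 1 + rng.uniform(-v_gain, v_gain)
    out = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        out.append(colorsys.hsv_to_rgb((h + dh) % 1.0,
                                       min(s * fs, 1.0),
                                       min(v * fv, 1.0)))
    return out


def translate_box(box, rng, max_shift=0.1):
    """Shift a normalized (x_c, y_c, w, h) box, keeping its center inside the image."""
    x_c, y_c, w, h = box
    x_c = min(max(x_c + rng.uniform(-max_shift, max_shift), 0.0), 1.0)
    y_c = min(max(y_c + rng.uniform(-max_shift, max_shift), 0.0), 1.0)
    return (x_c, y_c, w, h)


def augment(sample, rng, p_hsv=0.5, p_translate=0.5):
    """Apply each transform independently with its own probability."""
    pixels, box = sample["pixels"], sample["box"]
    if rng.random() < p_hsv:
        pixels = hsv_jitter(pixels, rng)
    if rng.random() < p_translate:
        box = translate_box(box, rng)
    return {"pixels": pixels, "box": box}


# Hypothetical sample: two pixels and one elevation box.
rng = random.Random(0)
original = {"pixels": [(0.8, 0.3, 0.2), (0.5, 0.5, 0.5)],
            "box": (0.5, 0.5, 0.6, 0.6)}
augmented = [augment(original, rng) for _ in range(5)]
```

Because each transform fires independently, repeated calls on the same photograph yield different combinations of perturbations, which is how a small set of originals can be expanded into a much larger training set.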

2.3.2. Elevation Data Set Segmentation

The dataset was divided into training and validation sets using a 7:3 ratio, resulting in 66 images for training and 27 images for validation. Care was taken to maintain a balanced chronological distribution among the three categories: Qing Dynasty (23%), Republic of China (9%), and People’s Republic of China (68%).
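
One way to keep the chronological distribution balanced across the two subsets is a stratified split, sketched below under stated assumptions. The function name, the random seed, and the per-period image counts are hypothetical and are not the study's data.

```python
import random
from collections import defaultdict


def stratified_split(items, train_ratio=0.7, seed=0):
    """Split (image_id, label) pairs so each label keeps roughly the same ratio."""
    by_label = defaultdict(list)
    for image_id, label in items:
        by_label[label].append(image_id)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        cut = round(len(ids) * train_ratio)
        train += [(i, label) for i in ids[:cut]]
        val += [(i, label) for i in ids[cut:]]
    return train, val


# Hypothetical image list: the counts per period are illustrative only.
items = ([(f"qd_{i}", "QD") for i in range(20)]
         + [(f"rc_{i}", "RC") for i in range(10)]
         + [(f"prc_{i}", "PRC") for i in range(60)])
train, val = stratified_split(items)
print(len(train), len(val))
```

Splitting each period separately prevents a small class such as RC from ending up almost entirely in one subset.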

2.4. Deep Learning Algorithm Model Network Structure

YOLOv11 is an advanced iteration built upon the YOLOv8 framework, incorporating three key architectural innovations—C3K2, SPPF, and C2PSA—to improve detection accuracy and computational efficiency, particularly in complex architectural environments.

C3K2 Module: This module replaces the C2f block of YOLOv8, using two smaller convolutional kernels in place of a single large one, which reduces computational load and enhances processing speed. This design improves the model’s sensitivity to fine-scale features commonly found in historic building elevations, such as windows, cornices, and decorative elements.
SPPF (Spatial Pyramid Pooling—Fast): This module facilitates the fusion of large-scale contextual information with fine-grained details by applying multiple spatial pooling layers in parallel. It improves the model’s ability to detect both global and localized architectural features within a single image.
C2PSA (Convolutional Coordinate-aware Position and Spatial Attention): The C2PSA module introduces a spatial attention mechanism that enables the model to dynamically emphasize important regions in an image. This selective focus enhances detection performance, especially in visually complex or partially occluded scenes, which are common in historic village environments.

These enhancements collectively make YOLOv11 well suited for the intelligent recognition of historic building elevations, where architectural detail, occlusion, and scale variation present substantial challenges. The model architecture consists of three main components: the backbone network, feature pyramid network (FPN), and detection head (Figure 2).

The backbone network employs a lightweight C3K2 module as the core feature extractor. It reduces computational complexity by using compact 3 × 3 convolutional kernels, decreasing GFLOPs by 18%. Combined with the Cross Stage Partial (CSP) structure, it preserves detailed elevation features such as material textures and structural morphology [13]. A dynamic convolution block further improves gradient propagation via residual concatenation [41], adapting well to high-resolution UAV imagery.

The FPN integrates the SPPF module with a bidirectional cross-scale fusion strategy (PANet++), applying multiple pooling kernels (5 × 5, 9 × 9, and 13 × 13) to extract multiresolution features in parallel. This enhances sensitivity to small-scale architectural elements such as cornice decorations. Channel compression is also applied, reducing computational overhead by 12% compared to traditional SPP modules.
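
The relationship between the three pooling sizes can be illustrated with a small one-dimensional example. In common SPPF implementations, a single 5-wide stride-1 max pool is chained three times, and the chained outputs match what parallel 5-, 9-, and 13-wide pools would produce; whether the model used here follows exactly this arrangement is an assumption. The sketch uses plain Python lists rather than feature maps, and the input values are arbitrary.

```python
def max_pool_1d(values, kernel):
    """Stride-1 max pooling with 'same' padding (out-of-range cells are ignored)."""
    r = kernel // 2
    n = len(values)
    return [max(values[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]

# SPP style: three pooling branches applied to the same input in parallel.
parallel = [max_pool_1d(signal, k) for k in (5, 9, 13)]

# SPPF style: one 5-wide pool applied three times in sequence.
p1 = max_pool_1d(signal, 5)
p2 = max_pool_1d(p1, 5)
p3 = max_pool_1d(p2, 5)
sequential = [p1, p2, p3]

print(parallel == sequential)
```

The same argument carries over to two dimensions because max pooling over a square window can be computed row-wise and then column-wise.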

The detection head features a decoupled structure using the C2PSA attention mechanism, which dynamically weights occluded region features using spatial attention [42]. This decoupling helps optimize classification and localization independently. The multi-scale prediction head outputs feature maps at three levels—P3 (small targets < 50 × 50 pixels), P4 (medium targets 50 × 50–200 × 200 pixels), and P5 (large targets > 200 × 200 pixels)—with PANet++ enhancing robustness across varying scales [43] (see Figure 3).

2.5. Training Strategies and Parameters

The YOLOv11L model was trained in a high-performance computing environment equipped with an NVIDIA RTX 3080Ti GPU (16 GB VRAM), 32 GB RAM, and the Ubuntu 20.04 operating system. This configuration provided sufficient computational power and memory to support efficient training, particularly for high-resolution images and complex architectural details common in historical building elevations.

An input image resolution of 640 × 640 pixels was selected to balance feature resolution and computational efficiency, which is particularly effective for the detection of small targets and intricate structural components. A batch size of 16 was used to optimize memory utilization while ensuring training stability and computational speed. Larger batch sizes can enhance gradient stability and improve batch normalization statistics, but they were not used in order to avoid memory overflow.

For optimization, the initial learning rate was set at 0.01, and a cosine annealing learning rate schedule was employed to gradually reduce it to a minimum of 0.001. This strategy promotes smooth convergence in the later training stages, enhances model generalization, and reduces the risk of overfitting.

The training process was configured for a maximum of 400 epochs, with an early stopping mechanism implemented: if the validation loss failed to decrease for 30 consecutive epochs, training was terminated to conserve computational resources and maintain training efficiency.
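
The learning rate schedule and stopping rule described above can be sketched as follows. This is a framework-independent illustration, not the training code used in the study: the function and class names are hypothetical, the schedule is assumed to span the full 400 epochs, and the validation losses are synthetic.

```python
import math


def cosine_lr(epoch, total_epochs=400, lr_max=0.01, lr_min=0.001):
    """Cosine annealing from lr_max at epoch 0 down to lr_min at total_epochs."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))


class EarlyStopper:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=30):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Synthetic validation losses (hypothetical): improve for 50 epochs, then plateau.
stopper = EarlyStopper(patience=30)
stopped_at = None
for epoch in range(400):
    lr = cosine_lr(epoch)
    val_loss = max(1.0 - 0.01 * epoch, 0.5)
    if stopper.step(val_loss):
        stopped_at = epoch
        break
print(stopped_at)
```

The cosine shape keeps the learning rate close to its initial value early on and flattens it near the minimum, so the final epochs make only small parameter updates.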

3. Results

3.1. Comparison of Building Recognition Results Based on Deep Learning Algorithms

This study evaluated the performance of several deep learning models—including CNN, Fast R-CNN, YOLOv8, and various versions of YOLOv11—on a dataset of 93 high-resolution images representing 21 historical buildings of known age in Qiaonan Village.

To comprehensively assess YOLOv11’s effectiveness in elevation classification tasks, five variants were tested (see Table 2).

Each model was evaluated using four key metrics:

mAP@0.5 (mean average precision at IoU = 0.5): The average precision, averaged over classes, where a detection counts as correct if its intersection-over-union with the ground truth is ≥0.5; higher values indicate better detection accuracy.
mAP@0.5:0.95: A more stringent metric averaged over IoU thresholds from 0.5 to 0.95 (at 0.05 intervals); indicates bounding box localization accuracy.
GFLOPs (Giga floating point operations): Reflects the model’s computational demand during inference; higher GFLOPs imply higher hardware requirements and less real-time efficiency.
Params (parameters): Total number of trainable parameters; a measure of model complexity.
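
To make the two mAP definitions concrete, the sketch below computes the IoU of one predicted box against its ground truth and checks it against the ten thresholds that mAP@0.5:0.95 averages over. The box coordinates are hypothetical, and the full average-precision computation over ranked detections is omitted.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged by mAP@0.5:0.95.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Hypothetical boxes: a ground-truth elevation and a slightly shifted prediction.
truth = (100, 100, 300, 300)
pred = (120, 110, 320, 310)
score = iou(truth, pred)
hits = [t for t in THRESHOLDS if score >= t]
print(round(score, 3), hits)
```

A prediction like this one counts as correct for mAP@0.5 but fails the stricter thresholds, which is why mAP@0.5:0.95 is the more sensitive indicator of localization quality.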

The performance metrics for each model are shown in Table 3.

The results of the experiment show that:

The superior overall performance of the YOLOv11 series confirms its advantage over the other deep learning models in the digital recognition and classification of historic building elevations. The average mAP@0.5 of YOLOv11 reached 0.964, an improvement of 53.02% over Fast R-CNN and of 82.3% over YOLOv8, and its average mAP@0.5:0.95 of 0.783 was significantly better than that of the other models. The comparison of the object detection models shows that the YOLOv11 variants had similar mAP@0.5 values. YOLOv11L had significantly higher parameter and GFLOPs values than the other YOLOv11 variants but also the highest overall performance, which suggests that it is suited to historical building preservation tasks with high precision requirements. YOLOv11L was therefore selected as the final training model for the study.

3.2. Training Process and Model Convergence Analysis

The training process of the YOLOv11L model exhibited excellent convergence behavior, as illustrated by the loss curve in Figure 4. Initially, both the bounding box loss (box_loss) and classification loss (cls_loss) were high, reflecting the model’s limited ability to extract detailed features. However, as training progressed, a steady decline in both metrics was observed.

By around epoch 500, the box_loss stabilized at approximately 0.15, and the cls_loss decreased to around 0.28, indicating effective learning of spatial localization and class differentiation. The precision–recall curve in Figure 5 further demonstrates the model’s stability: even at recall levels above 0.90, precision remained higher than 0.95, highlighting its robustness in practical scenarios. This shows that the model is highly accurate in the building detection task and that it balances missed detections against false detections well in practical applications. The hardware environment and hyperparameter settings, such as batch size, also balanced computational efficiency against memory use, which kept the training process efficient.

3.3. Classification and Post-Processing of Unknown Building Dates

3.3.1. Method Flow and Parameter Setting

Using the trained YOLOv11L model (best.pt), automated classification was performed on 178 buildings of unknown age around Qiaonan Village. To ensure reliable results, the following settings were applied:

Confidence threshold: 0.6 (predictions below this are discarded);
Non-maximum suppression (NMS): IoU threshold of 0.5 to eliminate redundant detections;
Output categories: QD (0), RC (1), PRC (2), matching training label encoding.

Finally, batch inference was conducted for all images, producing bounding boxes, predicted categories, and confidence scores for each building.
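
The two post-processing settings can be illustrated with a minimal greedy implementation. This is a sketch rather than the inference code used in the study: suppression is applied across all classes (an assumption, chosen because each building should receive a single label), and the detections are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, conf_thres=0.6, iou_thres=0.5):
    """Drop low-confidence detections, then apply greedy non-maximum suppression.

    Each detection is a tuple (box, class_id, confidence).
    """
    candidates = sorted((d for d in dets if d[2] >= conf_thres),
                        key=lambda d: d[2], reverse=True)
    kept = []
    for det in candidates:
        # Keep a box only if it does not overlap a stronger kept box too much.
        if all(iou(det[0], k[0]) <= iou_thres for k in kept):
            kept.append(det)
    return kept


# Hypothetical raw detections; class ids follow QD (0), RC (1), PRC (2).
dets = [
    ((100, 100, 300, 300), 0, 0.92),
    ((110, 105, 310, 305), 0, 0.75),  # near-duplicate of the first box
    ((400, 120, 600, 320), 2, 0.88),
    ((50, 400, 150, 500), 1, 0.41),   # below the confidence threshold
]
kept = filter_detections(dets)
print([(class_id, conf) for _, class_id, conf in kept])
```

Raising the confidence threshold trades recall for precision, while the IoU threshold controls how aggressively overlapping boxes on the same elevation are merged.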

3.3.2. Analysis of Classification Results

The classification results for the 178 scanned unknown buildings are presented in Table 4.

The classification results show:

Among the 178 unknown buildings, the model identified 15 architectural heritage buildings, including 12 QD buildings and 3 RC buildings, with an average confidence level of 0.90, which indicates that the model has a strong ability to recognize architectural heritage features. The architectural heritage is mainly distributed in the core area of the village, which is highly consistent with the positioning of the “Ming and Qing Dynasty Ancient Village” recorded in the local history (see Figure 6).
PRC buildings accounted for 91.6% of the total, the largest share, and were concentrated in the periphery of the village, reflecting the pattern of spatial expansion during urbanization. The average confidence level was 0.90, which verifies that the model accurately recognized the characteristics of PRC buildings (e.g., flat roofs, tiled elevations).
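
The reported shares can be checked from the counts given above. The short sketch below assumes that each of the 178 buildings received exactly one label, so the PRC count is derived as the remainder rather than read from Table 4.

```python
total = 178
counts = {"QD": 12, "RC": 3}

# Assumption: every building not flagged as heritage was labeled PRC.
counts["PRC"] = total - sum(counts.values())

shares = {period: round(100 * n / total, 1) for period, n in counts.items()}
heritage = counts["QD"] + counts["RC"]
print(counts, shares, heritage)
```

Under this assumption the derived PRC share agrees with the 91.6% reported in the text.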

The results demonstrate that the application of YOLOv11L in historic building preservation significantly enhances the efficiency of intelligent recognition and classification, while reducing the time and labor costs traditionally associated with documentation. This study introduces a novel deep learning-based method for the digitalization of rapidly vanishing historic structures, offering a scalable approach to heritage conservation.

3.4. GIS Annotation and Visualisation of Historic Buildings

To better apply the identification results to the conservation of historic buildings and the management of the heritage area, GIS was introduced in the final step of the study to annotate the geographic locations of buildings in the heritage area and to integrate the historic building information, as shown in Figure 7.

The 15 newly identified heritage buildings were combined with the 13 known ones, totaling 28 historical buildings.
Laser scanning was conducted to generate 3D point cloud models of these structures.
Spatial coordinates, 2D imagery, and 3D model data were integrated into GIS to create an interactive building distribution map.
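
One possible way to hand such records to a GIS is a GeoJSON file, which desktop tools such as QGIS can load as a point layer. The sketch below is an assumption about the data layout, not the platform's actual schema: the property names, building identifier, coordinates, and file paths are all placeholders.

```python
import json


def building_feature(building_id, lon, lat, period, confidence, photo, point_cloud):
    """Build one GeoJSON Feature linking a building's location to its records."""
    return {
        "type": "Feature",
        # GeoJSON stores coordinates in longitude, latitude order.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "id": building_id,
            "period": period,
            "confidence": confidence,
            "heritage": period in ("QD", "RC"),
            "photo": photo,
            "point_cloud": point_cloud,
        },
    }


# Hypothetical record: identifier, coordinates, and file names are placeholders.
features = [
    building_feature("B001", 118.68, 24.95, "QD", 0.93,
                     "photos/B001.jpg", "scans/B001.las"),
]
collection = {"type": "FeatureCollection", "features": features}
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

Storing the period label and file references as feature properties lets the map be filtered by chronology while each point still links back to its 2D photograph and 3D scan.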

This integration enables dynamic monitoring and intelligent management of the historic buildings in Qiaonan Village through a unified database platform, as shown in Figure 8.

By integrating 2D imagery, 3D models, and spatial data through GIS, this study realized the visualization and digital archiving of historic buildings within traditional villages and heritage zones. The approach not only facilitates intelligent monitoring and hierarchical protection, but also supports long-term conservation research and management of architectural heritage.

4. Discussion

4.1. Research Results and Contributions

This study proposes a deep learning-based method for identifying and classifying historical buildings. It integrates building information, 2D images, and 3D model data with GIS to establish a historical building database, which provides data support for research on the digital management of historical buildings. The proposed method aims to address the challenges of identifying, evaluating, and preserving historic buildings in heritage areas or traditional villages affected by self-built construction, and to help fill the gap in digitalization research on historic buildings.

To achieve this goal, the study was divided into three phases. In the first phase, data on building elevations in the village were collected. In the second phase, object detection was carried out on the historical buildings, and the buildings in the village were categorized into three periods according to style and age. In the third phase, based on the identification results, the historical buildings were scanned to generate 3D point cloud models, a building distribution map of Qiaonan Village was produced using GIS, and the building information, 2D images, and 3D models were integrated into a database.

In the recognition stage, the YOLOv11L target detection model detected historical buildings with an accuracy of 98.5% (mAP50 of 0.985; Table 3), which is an acceptable level of accuracy. Comparison experiments showed that the model outperformed the other deep learning models in terms of accuracy and speed. In the database stage, the 2D images and 3D models made the database more complete, and GIS visualized the distribution and location of the historical buildings more intuitively, which represents an important contribution to digital research on historical buildings.
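To make the link between detection output and period labels concrete, here is a minimal sketch of how per-facade detections might be reduced to a single period label. It is an assumption-laden illustration rather than the authors' post-processing: the input format (a list of class index and confidence pairs), the function name `label_facade`, and the 0.5 threshold are hypothetical; only the class indices (QD = 0, RC = 1, PRC = 2) follow Table 4.

```python
# Class indices as listed in Table 4
CLASS_NAMES = {0: "QD", 1: "RC", 2: "PRC"}

def label_facade(detections, min_conf=0.5):
    """Pick one period label for a facade image.

    detections: list of (class_id, confidence) pairs for one image.
    min_conf:   hypothetical cut-off below which detections are ignored.
    Returns (label, confidence) for the most confident remaining detection,
    or None if nothing passes the cut-off.
    """
    kept = [d for d in detections if d[1] >= min_conf]
    if not kept:
        return None
    class_id, conf = max(kept, key=lambda d: d[1])
    return CLASS_NAMES[class_id], conf

# Hypothetical detections for one image
example = [(2, 0.91), (0, 0.34)]
print(label_facade(example))  # ('PRC', 0.91)
```

Returning `None` for low-confidence images leaves room for manual review of ambiguous facades instead of forcing a label.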

The main contributions of this study are as follows:

Recognizing building elevations with deep learning algorithms confirms the feasibility of applying AI to the value assessment of historic buildings, improves the efficiency of identifying and categorizing them, and makes the protection system for historic buildings in Quanzhou more complete.
Integrating building information, 2D images, and 3D models into a historical building database with temporal and spatial attributes enables full-dimensional archiving and intelligent monitoring, which is an innovation in the protection and management of historical buildings. The database can also promote tourism in the heritage area by letting tourists experience the local culture more directly, supporting the shift from sightseeing tourism to in-depth cultural experience.
Applying artificial intelligence to the identification, assessment, and digital management of historical buildings in Qiaonan Village supports the future protection of the Quanzhou World Cultural Heritage Site and the construction of an intelligent monitoring system, promotes cross-disciplinary collaboration in historical building protection, and helps build an intelligent historical building database and management platform.

4.2. Limitations of the Study

Despite the promising outcomes, several limitations are acknowledged:

The geographic locations of historical buildings must be labeled manually in GIS, and 2D images and 3D models must be converted into links and attached to the corresponding distribution points, which significantly increases the cost of data preprocessing and reduces the efficiency of data fusion.
Building elevations in traditional villages under autonomous construction pose various complex problems. Disordered building layouts and narrow spacing between buildings hinder the collection of elevation data, and some elevations are misidentified because of their materials, building forms, or damage; the current data set cannot fully cover these cases.
Although the method has shown satisfactory results in digitizing the historic buildings of Qiaonan Village, the study still focuses on selected villages in southern Fujian and lacks data from other historic building preservation zones in China, so it does not yet adequately represent the public and historical value of historic buildings.

5. Conclusions

This study validates the feasibility and effectiveness of deep learning algorithms—particularly the YOLOv11L model—in the identification and classification of historical buildings, using Qiaonan Village as a case study. Through a multi-step pipeline integrating deep learning, 3D scanning, and GIS visualization, the study demonstrates how AI can enhance the intelligent monitoring and hierarchical protection of architectural heritage in traditional villages. The findings offer solutions to the persistent challenges of inadequate protection and inefficient management in heritage preservation.

The proposed method not only supports the development of a comprehensive digital protection system for historical buildings in Quanzhou but also promotes sustainable tourism and the interdisciplinary integration of AI, architecture, cultural heritage, and spatial information science.

Future outlook:

Toward an Integrated Research Model: This study marks an initial advancement in the digital preservation of historical buildings. Future work will focus on enhancing the interoperability of deep learning algorithms with GIS and other spatial technologies, aiming to establish a “data collection–intelligent recognition–data integration” workflow. This model would support real-time monitoring and intelligent management, addressing the limitations of traditional preservation methods. We envision this framework serving as a blueprint for the intelligent protection of cultural heritage both in China and globally.
Building a Smart Cultural Heritage Platform: The construction of an intelligent management system will not only assist in the preservation of the Quanzhou World Heritage Site but also enable dynamic assessments of tourism carrying capacity. This integration supports a balanced approach to cultural tourism development and heritage conservation, facilitating a transformation from passive sightseeing to active cultural engagement.
Expanding Scope and Generalizability: The YOLOv11L-based detection system has proven effective in autonomously identifying and evaluating traditional village architecture, significantly reducing manual labor and increasing efficiency. In the next stage, the research will be expanded to include adjacent regions and broader national contexts, thereby strengthening the digital foundation of China’s historical building protection network and contributing to its systematic, intelligent evolution.

Author Contributions

Conceptualization, S.W., J.Z. and A.N.T.; methodology, S.W., J.Z. and A.N.T.; software, A.N.T. and K.S.; validation, S.W. and A.N.T.; formal analysis, S.W. and A.N.T.; investigation, S.W. and A.N.T.; resources, S.W. and A.N.T.; data curation, S.W. and A.N.T.; writing—original draft preparation, S.W.; writing—review and editing, S.W. and J.Z.; visualization, A.N.T. and K.S.; supervision, S.W. and J.Z.; project administration, J.Z.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The project is supported by the National Natural Science Foundation of China (52008175) and the National Social Science Foundation of China (24VJXT013).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We are grateful to the School of Architecture of Huaqiao University for providing us with equipment support, and the local villagers for their help and support during our research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

YOLO	You Only Look Once (deep convolutional neural network)
QD	Qing Dynasty
PRC	People’s Republic of China
RC	Republic of China
GIS	Geographic Information System
CTSs	Chinese traditional settlements
CNN	Convolutional neural network
R-CNN	Region-based convolutional neural network
CViT	Convolutional vision transformer
AI	Artificial intelligence
QGIS	Quantum Geographic Information System (cross-platform free and open-source desktop GIS application)
PSP	Panoramic spherical photogrammetry
BIM	Building Information Modeling
2D	Two-dimensional
3D	Three-dimensional
HSV	Hue, saturation, value
FPN	Feature pyramid networks
GFLOPs	Giga floating-point operations (a measure of model computational cost)
CSP	Cross Stage Partial
PANet	Path aggregation network
SPP	Spatial Pyramid Pooling
mAP	Mean average precision
Params	Parameters
cls_loss	Classification loss
NMS	Non-maximum suppression
IOU	Intersection over union
AHE_Dataset	Architectural heritage elements dataset

References

Wang, Q. Quanzhou: The World Emporium at the East End of the Maritime Silk Routes. In Architecture, Monuments and Urbanism Along the Silk Roads; Shebahang, M., Ed.; UNESCO Publishing: Paris, France, 2024; pp. 31–53. [Google Scholar]
Quanzhou Municipal People’s Government. Luoyang Bridge: “The First Bridge in the Sea” Born in the Trade Boom of the Song Dynasty. Available online: https://www.quanzhou.gov.cn/gastronomy/ch/qzgk/syzc/202502/t20250211_3138540.htm (accessed on 20 February 2025).
Ren, Z. Study on the protection and renewal of the cultural landscape of Qiaonan Village in Quanzhou under the perspective of human-land relations. Beauty Times 2021, 10, 46–48. [Google Scholar] [CrossRef]
Qiu, H.; Zhang, J.; Zhuo, L.; Xiao, Q.; Chen, Z.; Tian, H. Research on intelligent monitoring technology for roof damage of traditional Chinese residential buildings based on improved YOLOv8: Taking ancient villages in southern Fujian as an example. Herit. Sci. 2024, 12, 231. [Google Scholar] [CrossRef]
Li, Y.; Jia, L.; Wu, W.; Yan, J.; Liu, Y. Urbanisation for rural sustainability—Rethinking China’s urbanisation strategy. J. Clean. Prod. 2018, 178, 580–586. [Google Scholar] [CrossRef]
Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep learning for computer vision: A brief review. Comput. Intell. Neurosci. 2018, 2018, 7068349. [Google Scholar] [CrossRef]
Ji, S.Y.; Jun, H.J. Deep learning model for form recognition and structural member classification of East Asian traditional buildings. Sustainability 2020, 12, 5292. [Google Scholar] [CrossRef]
Han, Q.; Yin, C.; Deng, Y.; Liu, P. Towards classification of architectural styles of Chinese traditional settlements using deep learning: A dataset, a new framework, and its interpretability. Remote Sens. 2022, 14, 5250. [Google Scholar] [CrossRef]
Siountri, K.; Anagnostopoulos, C.N. The classification of cultural heritage buildings in Athens using deep learning techniques. Heritage 2023, 6, 3673–3705. [Google Scholar] [CrossRef]
Wei, G.; Wan, F.; Zhou, W.; Xu, C.; Ye, Z.; Liu, W.; Lei, G.; Xu, L. BFD-YOLO: A YOLOv7-based detection method for building elevations defects. Electronics 2023, 12, 3612. [Google Scholar] [CrossRef]
Deng, J.; Xuan, X.; Wang, W.; Li, Z.; Yao, H.; Wang, Z. A Review of Research on Object Detection Based on Deep Learning. In Proceedings of the 2020 International Seminar on Artificial Intelligence, Networking and Information Technology, Shanghai, China, 18–20 September 2020. [Google Scholar] [CrossRef]
Jiao, L.; Zhang, F.; Liu, F.; Yang, S.Y.; Li, L.L.; Feng, Z.X.; Qu, R. A survey of deep learning-based object detection. IEEE Access 2019, 7, 128837–128868. [Google Scholar] [CrossRef]
Ultralytics. Ultralytics YOLO Documentation. Available online: https://docs.ultralytics.com/zh/models/yolo11/ (accessed on 20 February 2025).
SkyCloud Developer Community. Computer Vision Frontier Exploration: In-depth Analysis of Target Detection and Recognition Algorithms. Available online: https://www.ctyun.cn/developer/article/637181884231749 (accessed on 20 February 2025).
Khanam, R.; Hussain, M. Yolov11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. Available online: https://arxiv.org/abs/2410.17725 (accessed on 20 February 2025).
Yu, L. Semantic Representation: From Color to Deep Embeddings. Ph.D. Thesis, Universitat Autònoma de Barcelona, Bellaterra, Spain, 2019. [Google Scholar]
Obeso, A.M.; Benois-Pineau, J.; Acosta, Á.R.; Vázquez, M.S.G. Architectural style classification of Mexican historical buildings using deep convolutional neural networks and sparse features. J. Electron. Imaging 2017, 26, 011016. [Google Scholar] [CrossRef]
Wu, S.; Di, B.; Ustin, S.L.; Stamatopoulos, C.A.; Li, J.; Zuo, Q.; Wu, X.; Ai, N. Classification and detection of dominant factors in geospatial patterns of traditional settlements in China. J. Geogr. Sci. 2022, 32, 873–891. [Google Scholar] [CrossRef]
Li, M.H.; Yu, Y.; Wei, H.; Chan, T.O. Classification of the qilou (arcade building) using a robust image processing framework based on the Faster R-CNN with ResNet50. J. Asian Archit. Build. Eng. 2024, 23, 595–612. [Google Scholar] [CrossRef]
Nordmark, N.; Ayenew, M. Window Detection in Elevations Imagery: A Deep Learning Approach Using Mask R-CNN. arXiv 2021, arXiv:2107.10006. Available online: https://arxiv.org/abs/2107.10006 (accessed on 20 February 2025).
Llamas, J.; Lerones, P.M.; Medina, R.; Zalama, E.; Gómez-García-Bermejo, J. Classification of architectural heritage images using deep learning techniques. Appl. Sci. 2017, 7, 992. [Google Scholar] [CrossRef]
Liu, J. Research on Building Style Recognition Based on Deep Learning. Master’s Thesis, Xi’an University of Architecture and Technology, Xi’an, China, 2023. Available online: https://link.cnki.net/doi/10.27393/d.cnki.gxazu.2023.000478 (accessed on 20 February 2025).
Gao, C.; Zhang, Q.; Tan, Z.; Zhao, G.; Gao, S.; Kim, E.; Shen, T. Applying optimized YOLOv8 for heritage conservation: Enhanced object detection in Jiangnan traditional private gardens. Herit. Sci. 2024, 12, 31. [Google Scholar] [CrossRef]
De Amorim, A.L.; Fangi, G.; Malinverni, E.S. Documenting Architectural Heritage in Bahia, Brazil, Using Spherical Photogrammetry. In Proceedings of the 24th International CIPA Symposium, Strasbourg, France, 2–6 September 2013. [Google Scholar] [CrossRef]
Tsilimantou, E.; Delegou, E.T.; Nikitakos, I.A.; Ioannidis, C.; Moropoulou, A. GIS and BIM as integrated digital environments for modeling and monitoring of historic buildings. Appl. Sci. 2020, 10, 1078. [Google Scholar] [CrossRef]
Bolstad, P. GIS Fundamentals: A First Text on Geographic Information Systems, 6th ed.; XanEdu: Ann Arbor, MI, USA, 2019. [Google Scholar]
Parlavecchia, M.; Pascuzzi, S.; Anifantis, A.S.; Santoro, F.; Ruggiero, G. Use of GIS to evaluate minor rural buildings distribution compared to the communication routes in a part of the Apulian territory (Southern Italy). Sustainability 2019, 11, 4700. [Google Scholar] [CrossRef]
Duckham, M.; Sun, Q.C.; Worboys, M.F. GIS: A Computing Perspective, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2023. [Google Scholar] [CrossRef]
Petrescu, F. The Use of GIS Technology in Cultural Heritage. In Proceedings of the 21st International CIPA Symposium: Anticipating the Future of the Cultural Past, Athens, Greece, 1–6 October 2007; CIPA: Athens, Greece, 2007. [Google Scholar]
Cerutti, E.; Noardo, F.; Spanò, A. Architectural Heritage Semantic Data Managing and Sharing in GIS. In Proceedings of the 1st International Conference on Geographical Information Systems Theory, Applications and Management, Barcelona, Spain, 28–30 April 2015. [Google Scholar] [CrossRef]
Taboroff, J. Cultural Heritage and Natural Disasters: Incentives for Risk Management and Mitigation. In Managing Disaster Risk in Emerging Economies; Kreimer, A., Arnold, M., Eds.; World Bank: Washington, DC, USA, 2000; Volume 2, pp. 71–79. [Google Scholar]
Fangi, G. Multiscale Multiresolution Spherical Photogrammetry with Long Focal Lenses for Architectural Surveys. In Proceedings of the ISPRS Mid-Term Symposium, Newcastle, UK, 8–11 June 2010; ISPRS: Newcastle, UK, 2010. [Google Scholar]
Almeida, A.; Gonçalves, L.M.S.; Falcão, A.P.; Ildefonso, S. 3D-GIS Heritage City Model: Case Study of the Historical City of Leiria. In Proceedings of the 19th AGILE International Conference on Geographic Information Science, Helsinki, Finland, 14–17 June 2016. [Google Scholar]
Braik, A.M.; Koliou, M. Automated building damage assessment and large-scale mapping by integrating satellite imagery, GIS, and deep learning. Comput.-Aided Civ. Infrastruct. Eng. 2024, 39, 2389–2404. [Google Scholar] [CrossRef]
Shen, Y.; Zhu, S.; Yang, T.; Chen, C.; Pan, D.; Chen, J.; Xiao, L.; Du, Q. BDANet: Multiscale convolutional neural network with cross-directional attention for building damage assessment from satellite images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14. [Google Scholar] [CrossRef]
Zhuo, L.; Zhang, J.; Hong, X. Cultural heritage characteristics and damage analysis based on multidimensional data fusion and HBIM—Taking the former residence of HSBC bank in Xiamen, China as an example. Herit. Sci. 2024, 12, 128. [Google Scholar] [CrossRef]
Murphy, M.; McGovern, E.; Pavia, S. Historic Building Information Modelling (HBIM). Struct. Surv. 2009, 27, 311–327. [Google Scholar] [CrossRef]
Dabrowski, P.S.; Zienkiewicz, M.H.; Tysiąc, P.; Burdziakowski, P.; Szulwic, J.; Sužiedelytė-Visockienė, J.; Paršeliūnas, E.; Obuchovski, R.; Bražiūnas, R.; Ossowski, R. HBIM Symmetry Parametrization Using TLS and UAV LiDAR Measurements. Measurement 2025, 253, 117750. [Google Scholar] [CrossRef]
Tang, P.; Huber, D.; Akinci, B.; Lipman, R.; Lytle, A. Automatic Reconstruction of As-Built Building Information Models from Laser-Scanned Point Clouds: A Review of Related Techniques. Autom. Constr. 2010, 19, 829–843. [Google Scholar] [CrossRef]
Fan, J.; Chen, Y.; Zheng, L. Artificial intelligence for routine heritage monitoring and sustainable planning of the conservation of historic districts: A case study on Fujian earthen houses (tulou). Buildings 2024, 14, 1915. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 26–28 June 2016. [Google Scholar] [CrossRef]
Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]
Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]

Figure 1. Research methodology.

Figure 2. Structure of the YOLOv11 model.

Figure 3. Backbone of the YOLOv11 feature pyramid network.

Figure 4. Performance parameters of model training.

Figure 5. Precision–recall curve.

Figure 6. YOLOv11L scanning results for historic building facades.

Figure 7. Visualization workflow of historical building information.

Figure 8. GIS-based distribution map of historical buildings in Qiaonan Village.

Table 1. Chronological labeling.

Labels	Introduction	Representative Buildings
QD (1616–1912)	The oldest building type in the study area, preserving the style of local historical buildings; mainly traditional Minnan Gu-Cuo dwellings, ancestral temples, and monasteries.
RC (1912–1949)	Combines the historical architecture of Southern Fujian with Western architectural elements, forming a distinctive East-meets-West style; an important material carrier of modern overseas Chinese culture.
PRC (post-1949)	Predominantly local self-built houses, with elevations made up mainly of modern building materials such as red bricks and ceramic tiles; some have later additions, so that two styles appear in one building.

Table 2. YOLOv11 series model explanation.

Model	Explanation
YOLOv11N	Lightweight version optimized for limited computing environments and real-time requirements.
YOLOv11S	Slightly enhanced capacity with low computational load.
YOLOv11M	Balanced trade-off between accuracy and efficiency.
YOLOv11L	Higher parameter count and deeper architecture for improved accuracy.
YOLOv11X	Deepest variant with maximum detection precision, ideal for offline analysis or high-performance servers.

Table 3. Comparison of target detection model performance.

Model	mAP50	mAP50-95	Parameters	GFLOPs	Accuracy (%)
Fast_RCNN	0.6295	0.3297	41,304,286	/	63
CNN	/	/	51,480,000	0.57	89
YOLOv8	0.529	0.261	3,006,233	8.1	52.9
YOLOv11L	0.985	0.792	23,281,625	86.6	98.5
YOLOv11N	0.978	0.807	2,582,737	6.3	97.8
YOLOv11M	0.988	0.755	20,032,345	67.7	98.8
YOLOv11S	0.941	0.807	9,413,961	21.3	94.1
YOLOv11X	0.929	0.756	56,830,489	194.4	92.9

Table 4. Classification statistics for buildings of unknown age.

Period (class index)	Number of buildings	Average confidence
QD (0)	12	0.92
RC (1)	3	0.87
PRC (2)	163	0.90
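
The figures in Table 4 can be summarized with a little arithmetic. The short sketch below uses only the three rows of the table to compute the total number of classified buildings, the share of each period, and the overall average confidence weighted by building count. The weighting scheme is an assumption made here for illustration; the article does not report an overall average.

```python
# Rows of Table 4: (period label, number of buildings, average confidence)
table4 = [("QD", 12, 0.92), ("RC", 3, 0.87), ("PRC", 163, 0.90)]

total = sum(n for _, n, _ in table4)

# Average confidence weighted by the number of buildings in each period
weighted_mean = sum(n * c for _, n, c in table4) / total

# Share of each period among all classified buildings
shares = {label: n / total for label, n, _ in table4}

print(total)                    # 178
print(round(weighted_mean, 3))  # 0.901
for label, share in shares.items():
    print(label, f"{share:.1%}")  # QD 6.7%, RC 1.7%, PRC 91.6%
```

The RC class rests on only three buildings, so its average confidence is less stable than those of the other two periods.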

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
