Machine Learning and Deep Learning for Cultural Heritage Conservation: A Bibliometric and Task-Oriented Review

Li, Xinchen; Chiabrando, Filiberto; Sammartano, Giulia

doi:10.3390/rs18040628

Open Access Review


Xinchen Li *, Filiberto Chiabrando and Giulia Sammartano

Department of Architecture and Design (DAD), Politecnico di Torino, Viale Pier Andrea Mattioli, 39, 10125 Turin, Italy

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(4), 628; https://doi.org/10.3390/rs18040628

Submission received: 21 December 2025 / Revised: 5 February 2026 / Accepted: 12 February 2026 / Published: 17 February 2026

(This article belongs to the Special Issue Applications of Photogrammetry and Lidar Techniques in Cultural Heritage Documentation)


Highlights

What are the main findings?

This review reveals that machine learning and deep learning applications in the cultural heritage domain follow distinct task-oriented distributions, which can be grouped into three core themes: recognition; reconstruction and virtual restoration; and monitoring and prediction.
Bibliometric and content analyses show that the applicability of artificial intelligence models in cultural heritage depends strongly on the specific task objectives, the characteristics of the heritage objects, and the modalities and structures of the data.

What are the implications of the main findings?

The proposed “data + technology + task” framework provides a systematic reference for selecting AI model paradigms within the cultural heritage domain, facilitating effective alignment between methodological approaches and heritage conservation objectives.
Future AI research in cultural heritage should prioritize the development of standardized heritage benchmark datasets, the integration of explainable AI strategies, and collaborative design methodologies between conservation specialists and AI systems. This will underpin more reliable and sustainable decision-making in heritage conservation.

Abstract

With the rapid advancement of Artificial Intelligence (AI) technologies, Machine Learning (ML) and Deep Learning (DL) have become pivotal methods for the digital documentation, restoration, preservation, and preventive conservation of Cultural Heritage (CH). This paper constructs an integrated “data + technology + task” framework tailored to CH scenarios, combining bibliometric analysis with a systematic content review of the relevant literature published between 2011 and 2025. First, publication trends, publication sources, global collaboration networks, and topic modeling reveal the overall landscape and evolutionary path of research on the digitization and intelligent transformation of CH. Next, starting from ML and DL systems, the paper summarizes classic workflows and outlines their applications in CH conservation. Drawing on the topic modeling results, existing research is then categorized into three themes based on task attributes: Recognition, Reconstruction and Virtual Restoration, and Monitoring and Prediction. Representative literature, typical tasks, and technological trends within each theme are systematically outlined. Distinct from existing reviews, this study introduces a unified “data + technology + task” framework that explicitly links AI model paradigms to heritage-specific constraints. Looking ahead, by constructing high-quality heritage datasets, enhancing model interpretability, and exploring cross-model fusion approaches, AI technologies promise to play a more reliable and sustainable role in CH conservation, risk management, and digital dissemination.

Keywords:

cultural heritage conservation; machine learning; deep learning; bibliometric; topic modeling; heritage data

1. Introduction

In recent years, the rapid advancement of Artificial Intelligence (AI) technology has propelled the digitalization and intelligent transformation of Cultural Heritage (CH) research. Among AI techniques, Machine Learning (ML) and Deep Learning (DL) have demonstrated formidable capabilities across multiple domains, including image recognition, 3D modeling, damage diagnosis, material analysis, and monitoring and prediction, gradually reshaping the research paradigm for CH conservation.

As a representative AI technology, ML was among the first to enter the CH field. Statistical methods such as linear and logistic regression were applied to classify and identify heritage objects. For instance, integrating multidimensional attributes such as fragment color, geometric compatibility, contour overlap, and fracture surface characteristics makes it possible to predict the probability of a correct fragment match and to sort fragments with high precision, aiding the reconstruction of archaeological artifacts [1]. With the proliferation of multi-source data acquisition methods, including high-resolution remote sensing imagery, Terrestrial Laser Scanning (TLS), Uncrewed Aerial Vehicle (UAV) photogrammetry, near-infrared and Hyperspectral Imaging (HSI), and environmental monitoring sensors, CH research has entered the era of big data. These data provide an unprecedented training foundation for ML models. Applications of ML in cultural heritage have since evolved toward complex DL models that support analysis and preventive conservation across diverse heritage tasks. These primarily include heritage object recognition, architectural element segmentation, semi-automated Heritage Building Information Modeling (HBIM) reconstruction, heritage damage detection, virtual restoration, and disaster monitoring and risk prediction at heritage sites.

Since 2020, applications of ML and DL within the cultural heritage sector have diversified. Building on an increasingly rich body of research, scholars have begun to summarize progress in cultural heritage research from various perspectives. For specific heritage tasks, ref. [2] investigated various ML techniques applied to the health assessment of cultural heritage structures. Combining laboratory or field-collected test data with ML can yield more robust predictive models. These models can serve diverse predictive tasks, such as forecasting masonry compressive strength, simulating potential damage scenarios in heritage structures, conducting seismic vulnerability assessments, determining material mechanical properties, and identifying surface damage on monuments caused by weathering, material loss, erosion, leakage, algal growth, and lichen deposition. Ref. [3] reviewed the applicability of qualitative Non-Destructive Testing (NDT) techniques, such as Infrared Thermography (IRT), photogrammetry, and laser scanning, alongside quantitative NDT methods, such as Heat Flux Meter (HFM) measurements, Quantitative Infrared Thermography (QIRT), and airtightness measurements, in architectural heritage diagnostics. Methods that apply image processing and DL techniques to the assessment of CH damage are summarized in [4]. Focusing on point cloud data, a critical data type in CH, ref. [5] comprehensively reviewed 3D point cloud segmentation (3DPCS) techniques, including region growing, model fitting, and unsupervised clustering, for identifying and extracting fundamental geometric shapes, particularly in immovable cultural heritage. It also outlined application strategies for 3D semantic point cloud segmentation (3DPCSS) algorithms, including Random Forest (RF), PointNet, PointNet++, and Dynamic Graph Convolutional Neural Network (DGCNN), in damaged-area surveys, architectural structure extraction, and multi-scale spatial 3D information analysis.
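Model fitting, one of the 3DPCS families mentioned above, can be illustrated with RANSAC plane detection. The sketch below is a minimal NumPy implementation, assuming an (N, 3) point array and a hypothetical inlier threshold of 0.01 in the same units as the points. It is not the method of any cited study; production pipelines would typically rely on a dedicated point cloud library.

```python
import numpy as np

def ransac_plane(points, n_iter=200, threshold=0.01, seed=0):
    """Fit a plane n . p + d = 0 to an (N, 3) array with RANSAC.

    Returns the unit normal n, the offset d, and a boolean inlier mask.
    threshold is the maximum point-to-plane distance (same units as points).
    """
    rng = np.random.default_rng(seed)
    best_mask, best_model = None, None
    for _ in range(n_iter):
        # 1. Sample the minimal set: three points define a plane
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        length = np.linalg.norm(normal)
        if length < 1e-12:  # collinear sample, no unique plane
            continue
        normal = normal / length
        d = -normal @ sample[0]
        # 2. Score the hypothesis by counting points within the threshold
        mask = np.abs(points @ normal + d) < threshold
        # 3. Keep the hypothesis with the largest consensus set
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (normal, d)
    return best_model[0], best_model[1], best_mask

# Synthetic example: a noisy planar wall (z close to 0) plus off-plane clutter
rng = np.random.default_rng(1)
wall = np.column_stack([rng.uniform(0, 5, 500), rng.uniform(0, 3, 500),
                        rng.normal(0, 0.002, 500)])
clutter = rng.uniform([0, 0, 0.2], [5, 3, 1.0], size=(100, 3))
cloud = np.vstack([wall, clutter])

normal, d, inliers = ransac_plane(cloud)
print(f"plane normal {np.round(normal, 3)}, {inliers.sum()} inliers")
```

In a segmentation pipeline, the inliers of a detected plane would be removed and the search repeated to extract further primitives; a common refinement, not shown, re-fits the plane to all inliers by least squares.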
In specific disciplines such as archaeology, remote sensing combined with AI-related technologies is rapidly being adopted for the study, evaluation, and dynamic monitoring of multi-source data, including geospatial and physical signals. This approach is particularly suited to the rapid identification and classification of archaeological sites, their environmental features, and objects [6,7]. Regarding specific heritage objects, refs. [8,9] reviewed tools and techniques for the digital preservation of architectural heritage from the perspectives of LiDAR technology and disasters, respectively. These studies often focus on specific tasks, such as building Structural Health Monitoring (SHM), classification and segmentation of heritage point clouds, and cultural heritage damage assessment; specific method categories, such as LiDAR, Computer Vision (CV), and NDT; or particular disciplines, such as architecture, archaeology, and remote sensing. Concurrently, broader macro-level reviews have emerged, such as [10], which describes ML applications in CH across supervised, semi-supervised, and unsupervised learning and presents practical examples involving linear and logistic regression, Decision Trees (DT) and RF, Support Vector Machine (SVM), deep neural networks, and unsupervised K-Means clustering.

However, existing reviews typically adopt a single analytical perspective, focusing on specific heritage tasks, particular data types, distinct disciplinary fields, or specific heritage objects, or emphasizing certain algorithmic systems. Whilst these perspectives yield valuable insights, they often fail to illuminate the interplay between different dimensions of cultural heritage conservation research. To address this, this study proposes an integrated analytical framework centered on ‘data + technology + task’, comprising:

Research landscape from a bibliometric perspective: encompassing temporal evolution of research volume, global collaboration networks, interdisciplinary trends, and thematic structures;
Technical systems of ML and DL in cultural heritage: covering representative machine learning models, deep learning architectures, and their applicable data modalities;
Specific application scenarios of AI methods in cultural heritage tasks: including identification and detection, 3D modeling and virtual restoration, monitoring and prediction, material and chemical analysis, etc.

Distinct from existing reviews on CH, ML, and DL, this framework aims to understand the interdependencies between data sources, technological systems, and application objectives within cultural heritage conservation research. Simultaneously, it supports forward-looking analysis to provide researchers with guidance for more effectively understanding, selecting, and applying ML and DL methods in cultural heritage conservation, thereby uncovering potential opportunities for interdisciplinary method transfer and technological innovation.

2. Materials and Methods

This study reviews existing research on cultural heritage and artificial intelligence technologies in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) reporting guidelines [11,12]. By integrating bibliometric analysis, it provides an impartial, comprehensive, and practical synthesis of the current literature on this topic. The aim is to review the application trends of ML and DL technologies in cultural heritage conservation, demonstrate the interactions among multi-source data, AI technologies, and cultural heritage, and explore the potential of these technologies for cultural heritage research.

2.1. Literature Collection and Selection

The methodology comprises two components: literature collection and selection, and literature analysis and evaluation. The collection and selection process followed three stages: Identification, Screening, and Inclusion.

During the Identification phase, Scopus served as the sole data source for this study. As a comprehensive bibliographic database, Scopus provides extensive interdisciplinary coverage across engineering, computer science, architecture, and cultural heritage studies, aligning well with the scope of this research. Furthermore, Scopus provides well-structured, standardized bibliographic metadata, including abstracts, author information, affiliations, keywords, and citation records, making it particularly suitable for large-scale bibliometric and thematic analysis. Finally, given the substantial overlap between Scopus and Web of Science, this study restricted its data source to a single database to avoid potential biases and inconsistencies arising from cross-database integration and duplicate removal. Based on keywords related to cultural heritage conservation, machine learning, deep learning, and their synonyms, the search query was set as TITLE-ABS-KEY ((“cultural heritage” OR “digital cultural heritage” OR “heritage” OR “digital heritage” OR “heritage conservation” OR “heritage preservation”) AND (“machine learning” OR “deep learning”)). Searching titles, abstracts, and keywords yielded 2819 records (as of 7 December 2025). Preliminary automated cleaning, performed with Python scripts in PyCharm Professional Edition v2024.3, removed 236 records that were duplicates or lacked author information. Records not classified as Data papers, Articles, Conference papers, or Reviews were then excluded, leaving 2446 documents for the screening stage.
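The cleaning steps described above (removing records without authors, removing duplicates, and keeping only eligible document types) can be sketched with pandas. The column names ("Title", "Authors", "DOI", "Document Type") are assumed to follow a Scopus CSV export, the sample records and DOIs are hypothetical, and the DOI-then-normalized-title duplicate rule is an illustrative choice rather than the exact procedure used in this study.

```python
import pandas as pd

# Eligible Scopus document types (as listed in the text)
KEEP_TYPES = {"Article", "Conference paper", "Review", "Data paper"}

def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    """Drop author-less records, duplicates, and ineligible document types."""
    # Keep only records with non-empty author information
    has_authors = df["Authors"].notna() & (df["Authors"].astype(str).str.strip() != "")
    # Normalised title key: lower-case, alphanumeric characters only
    df = df[has_authors].assign(
        title_key=lambda x: x["Title"].str.lower().str.replace(r"[^a-z0-9]", "", regex=True)
    )
    # Duplicates: same DOI first, then the same normalised title across all records
    with_doi = df[df["DOI"].notna()].drop_duplicates("DOI")
    df = pd.concat([with_doi, df[df["DOI"].isna()]]).drop_duplicates("title_key")
    # Document-type filter
    df = df[df["Document Type"].isin(KEEP_TYPES)]
    return df.drop(columns="title_key")

# Hypothetical export rows
records = pd.DataFrame({
    "Title": ["Deep learning for heritage", "Deep Learning for Heritage!",
              "Point clouds in HBIM", "Editorial note", "Anonymous record"],
    "Authors": ["Li X.", "Li X.", "Rossi A.", "Bianchi B.", None],
    "DOI": ["10.0000/a", None, "10.0000/b", "10.0000/c", "10.0000/d"],
    "Document Type": ["Article", "Article", "Conference paper", "Editorial", "Article"],
})
cleaned = clean_records(records)
print(cleaned["Title"].tolist())  # ['Deep learning for heritage', 'Point clouds in HBIM']
```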

During the screening phase, manual review was conducted based on content relevance using the following criteria:

Focus on the field of cultural heritage preservation, emphasizing heritage science research.
Study subjects include physical cultural heritage itself and its digital carriers, as well as intangible cultural heritage embodied in physical carriers (e.g., characters, scripts, inscriptions).
Research must apply at least one machine learning or deep learning technique to address practical problems in heritage conservation.

Based on these criteria, screening of titles, abstracts, and keywords retained 1362 papers; 37 of these were removed because their full texts were unavailable, leaving 1325 papers. A secondary screening of the full texts against the same criteria then identified 867 papers for subsequent analysis and evaluation.

2.2. Literature Analysis and Assessment

During the literature analysis and evaluation, Python-based scripts and VOSviewer v1.6.18 [13] were used to conduct bibliometric analyses of the 867 included studies. Overall research trends were presented through data mining and visual representations. The specific methodology is outlined below.

2.2.1. Social Network Analysis

Social network analysis is a research methodology that originated in sociology, mathematics, and graph theory. Its focus lies not on individuals themselves, but on the relationships between individuals and the overall network structure formed by these relationships [14,15,16,17]. The behavior and outcomes of nodes within a network are influenced by the social network structure in which they are embedded. This study employs social network analysis to explore cooperative phenomena and relationships among nations globally. At the aggregate level, the network density metric is used to capture the extent of global cooperation. At the individual level, three classic centrality metrics [15], namely, degree centrality, betweenness centrality [16], and closeness centrality [17], are selected to measure the importance of each node (country) within the network structure. These metrics enable the identification of “core countries” that have a quantitative advantage in cooperation within the international cooperation network and “bridge countries” that connect different groups. The specific meanings and calculation formulas for the aforementioned indicators are presented in Table 1.
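The network metrics above can be computed with the NetworkX library. The following sketch builds an undirected country collaboration graph from hypothetical per-paper country lists (edge weight = number of co-authored papers) and computes density, degree, betweenness, and closeness centrality with NetworkX's default normalizations, which are assumed here and may not match the formulas in Table 1 exactly. Betweenness and closeness are computed on unweighted shortest paths; the weighted adjacency matrix is also extracted as a rough analogue of the matrix that serves as input to the analysis.

```python
from itertools import combinations

import networkx as nx

def build_collaboration_graph(paper_countries):
    """Undirected country network; edge weight = number of co-authored papers."""
    G = nx.Graph()
    for countries in paper_countries:
        unique = sorted(set(countries))
        G.add_nodes_from(unique)  # single-country papers still add a node
        for a, b in combinations(unique, 2):
            w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
            G.add_edge(a, b, weight=w + 1)
    return G

# Hypothetical affiliation-country lists, one per paper
papers = [
    ["Italy", "China"], ["Italy", "Spain"], ["Italy", "United States"],
    ["China", "Japan"], ["United States", "United Kingdom"], ["Italy"],
]
G = build_collaboration_graph(papers)

density = nx.density(G)                     # 2E / (N(N - 1)) for an undirected graph
degree = nx.degree_centrality(G)            # k_i / (N - 1)
betweenness = nx.betweenness_centrality(G)  # normalised share of shortest paths through i
closeness = nx.closeness_centrality(G)      # (N - 1) / sum of distances from i
A = nx.to_numpy_array(G, nodelist=sorted(G))  # weighted adjacency matrix

print(f"density = {density:.3f}")
for c in sorted(G, key=betweenness.get, reverse=True):
    print(f"{c:15s} deg={degree[c]:.2f} btw={betweenness[c]:.2f} clo={closeness[c]:.2f}")
```

In this toy network Italy plays the "core" and "bridge" role at once: it has the most partners and lies on every shortest path between the China/Japan, Spain, and United States/United Kingdom groups.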

2.2.2. PyBibX: Topic Modeling and Network Analysis

This study employs the topics_creation() function from the PyBibX package for topic modeling [18]. This method falls under natural language processing (NLP) and text mining, fundamentally involving probabilistic modeling based on literature abstracts or keyword texts to identify sets of frequently co-occurring keywords. Each set represents an implicit thematic structure, enabling the calculation of the probability of each document belonging to various topics. Additionally, the network_adj_map() function was used to construct the adjacency matrix of the collaboration network, which serves as the basis for social network analysis.

2.2.3. VOSviewer: Keyword Co-Occurrence Analysis

Beyond collaboration network analysis, this study also employed VOSviewer v1.6.18 [13] to conduct keyword co-occurrence analysis, generating a visual scientific knowledge map from keyword co-occurrence networks. Analyzing the resulting clusters and their associated keywords helps identify research hotspots and frontier directions within the field. Within the keyword co-occurrence network, nodes represent keywords, while an edge denotes that two keywords co-occur in the same document. Node size is proportional to the keyword’s total frequency, edge thickness corresponds to co-occurrence strength (the number of times the two keywords appear together), and node color indicates the cluster to which a node belongs. Nodes of the same color belong to the same research subfield.
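The network definitions above (node size from keyword frequency, edge weight from co-occurrence counts) can be sketched in plain Python. The keyword lists are hypothetical, and the simple lower-casing merely stands in for the thesaurus-based keyword merging that tools like VOSviewer support; clustering and layout are not shown.

```python
from collections import Counter
from itertools import combinations

def keyword_cooccurrence(doc_keywords):
    """Return (node_freq, edge_weight) for a keyword co-occurrence network.

    node_freq[k]        -> number of documents containing keyword k (node size)
    edge_weight[(a, b)] -> number of documents containing both a and b (edge thickness)
    """
    node_freq, edge_weight = Counter(), Counter()
    for kws in doc_keywords:
        unique = sorted({k.strip().lower() for k in kws})  # normalise and de-duplicate
        node_freq.update(unique)
        edge_weight.update(combinations(unique, 2))  # each unordered pair once per document
    return node_freq, edge_weight

# Hypothetical author keywords, one list per document
docs = [
    ["Deep learning", "Point cloud", "HBIM"],
    ["deep learning", "Crack detection"],
    ["Point cloud", "Deep learning"],
]
node_freq, edge_weight = keyword_cooccurrence(docs)
print(node_freq.most_common(2))
print(edge_weight[("deep learning", "point cloud")])
```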

Building on the bibliometric analysis in Section 3.1, and in line with the research framework and objectives (Figure 1), this study systematically analyzes the selected literature along two primary dimensions: technical approaches (Section 3.2) and application practices (Section 3.3).

3. Results

3.1. Bibliometric Analysis

In the bibliometric analysis, the study systematically reviews the core themes, research hotspots, and evolutionary trends of the relevant literature published between 2011 (the first year in which studies explicitly applied machine learning techniques to cultural heritage conservation) and 2025. By visualizing the correlation patterns and clustering of six key variables (journals, countries, time periods, titles, abstracts, and keywords), it delineates annual publication trends, publication sources, global collaboration networks, topic modeling, and keyword co-occurrence. Combined with text-mining methods for re-clustering and reinterpreting the thematic structure, this overview examines research trends and developments in machine learning and deep learning for cultural heritage conservation from multiple perspectives.

3.1.1. Publishing Trends and Publication Source

Annual Publication Volume Trends

Overall, since 2011, the annual publication volume on cultural heritage conservation and machine learning has shown significant phased growth (Figure 2). Between 2011 and 2017, annual publications remained in the single digits with limited growth, indicating that the field was still in an early exploratory phase. During this period, artificial intelligence had not yet been widely adopted in cultural heritage research, and related work focused primarily on fundamental applications such as image recognition and classification [1,19,20,21,22,23]. After 2020, by contrast, publication numbers rose rapidly and steadily. Between 2020 and 2025, the number of relevant papers grew almost exponentially, with 803 papers published during this period, approximately 92.6% of the entire sample (803 of 867). This rapid expansion is closely tied to the widespread adoption of deep learning methods in cultural heritage applications, particularly successful practices such as heritage point cloud classification and segmentation [24,25,26,27,28,29], heritage site monitoring and prediction [30,31,32], and heritage object recognition and detection [33,34,35]. By this point, research at the intersection of AI and cultural heritage had moved from an early exploratory phase to a stage of interdisciplinary integration and methodological maturity, and it continues to expand and deepen.

At the national level, the top 15 countries collectively accounted for approximately 92.58% of global publications (using the full-count method, in which each publication is counted once for each country involved, so that multi-country collaborative publications are credited to all relevant countries). Figure 3 illustrates the annual publication trends for the top 15 countries in this field between 2011 and 2025. China leads with 226 publications (23.94%), with a marked upward trajectory in output since 2021. Italy ranks second with 202 publications (21.40%), showing sustained, steady growth. India (91 publications), France (57), the United Kingdom (52), and the United States (51) follow, demonstrating stable engagement in this research domain. Spain (4.56%), Japan (2.75%), Greece (2.65%), and Turkey (2.12%) constitute the second tier, whereas countries such as Indonesia, Germany, South Korea, the Netherlands, and Canada exhibit comparatively lower research activity.
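Full counting can be sketched as follows: each paper adds one credit to every distinct country on it, so country totals can exceed the number of papers. The reported shares appear consistent with this: for example, 226/944 ≈ 23.94% and 202/944 ≈ 21.40%, suggesting a denominator of roughly 944 country credits rather than the 867 papers. That denominator is an inference from the published figures, not a value stated in the text, and the paper lists below are hypothetical.

```python
from collections import Counter

def full_count(paper_countries):
    """Full counting: each paper credits every distinct country once."""
    counts = Counter()
    for countries in paper_countries:
        counts.update(set(countries))  # set() avoids double-crediting one country
    return counts

# Hypothetical affiliation countries, one list per paper
papers = [["China"], ["Italy", "China"], ["Italy", "Italy", "France"], ["India"]]
counts = full_count(papers)
total_credits = sum(counts.values())  # 6 credits from 4 papers
shares = {c: n / total_credits for c, n in counts.items()}
print(counts.most_common(2), f"Italy share: {shares['Italy']:.1%}")
```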

Publication Sources and Disciplinary Distribution

From the perspective of publication sources, the top 20 sources by publication volume span multiple relevant disciplinary fields, including surveying and photogrammetry, remote sensing, cultural heritage, computer science, and architecture and engineering (Figure 4). These sources belong to several internationally influential academic organizations and publishing systems, including the International Society for Photogrammetry and Remote Sensing (ISPRS), the Association for Computing Machinery (ACM), and the Institute of Electrical and Electronics Engineers (IEEE), reflecting the cross-disciplinary development of AI and cultural heritage research across different academic systems. Based on frequency and temporal distribution patterns, these publication sources can be broadly categorized into three types: conference proceedings series, high-frequency journals with significant academic influence, and highly specialized journals that have become increasingly active in recent years. Early research was largely exploratory and technical, with applications of machine learning and deep learning methods in cultural heritage appearing predominantly in conference papers and proceedings. This work centered on computer vision, machine learning models, and remote sensing and spatial data processing, as evidenced by publications in the ISPRS Archives and Annals, Lecture Notes in Computer Science (LNCS), Communications in Computer and Information Science (CCIS), and the ACM International Conference Proceedings Series (ICPS). After 2020, “AI + Cultural Heritage” emerged as a prominent research focus, with nearly all primary publication sources showing sustained or significant growth in output, indicating markedly heightened research activity.
Among these, the Journal of Cultural Heritage, Heritage Science (npj Heritage Science), and Remote Sensing have demonstrated exceptional prominence in both publication volume and temporal continuity, establishing themselves as the most active and highly influential journals in the field. Their research encompasses diverse domains, including cultural heritage conservation, remote-sensing analysis, the arts and humanities, archaeology, and the earth sciences. Furthermore, journals such as Applied Sciences and Heritage have published numerous case studies in recent years, providing rich experimental scenarios and research support for the applied practice and interdisciplinary approaches of digital cultural heritage.

3.1.2. Analysis of Global Collaborative Networks

At the aggregate level, given the relatively limited number of publications in the initial years, the analysis of global collaboration trends focuses on 2020–2025, when the structure of collaboration networks became relatively stable and statistically meaningful to analyze. Based on the Network Density metric and the number of participating countries in each year of this period, Figure 5 visualizes the evolution of global international collaboration from 2020 to 2025. The number of participating countries increased overall, peaking in 2024, the broadest global scientific participation to date. Between 2020 and 2021, network density rose together with the number of participating countries, indicating relatively close international cooperative ties. From 2022 onward, however, overall network density declined despite continued growth in the number of participating countries. This reflects a global collaboration network that, while expanding in scale, is becoming sparser, with cooperative relationships that are more diverse but more widely dispersed.
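Why density can fall while participation grows follows from its definition: an undirected network of N countries has N(N − 1)/2 possible ties, a number that grows quadratically, so density drops unless actual ties keep pace. The yearly snapshot values below are hypothetical, not the data behind Figure 5.

```python
def network_density(n_nodes: int, n_edges: int) -> float:
    """Density of an undirected simple graph: 2E / (N(N - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

# Hypothetical snapshots: more countries and more ties, yet lower density
print(round(network_density(40, 150), 3))  # 0.192
print(round(network_density(60, 250), 3))  # 0.141
```

Here the number of ties grows by about two thirds, but the number of possible ties more than doubles (780 to 1770), so density decreases.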

At the individual level, to further quantify and validate the structural positions of countries within global cooperation networks, three indicators (Degree Centrality, Betweenness Centrality, and Closeness Centrality) were calculated for the top 15 countries by number of partners using social network analysis methods (Table 2). These metrics characterize a country’s functional role within international cooperation networks from distinct perspectives, namely connectivity breadth, bridging capacity, and overall accessibility, and help identify countries with greater dynamism or structural advantages in global collaboration. The findings reveal that Italy occupies a leading position in both partner count (34 countries) and Degree Centrality (0.425), indicating high connectivity and an extensive cooperative foundation within the global cooperation network. China ranks second in the number of partners (33 countries), indicating active cross-continental engagement. The United States and the United Kingdom, each with more than 20 partners, also occupy core positions within the network. Regarding Betweenness Centrality, China (0.285), Italy (0.237), Japan (0.131), the United States (0.126), and Spain (0.117) exhibit relatively high values, indicating that these countries frequently lie on the shortest paths between other countries and thus serve as bridges and relays in cross-regional or cross-group collaboration. Regarding Closeness Centrality, Italy (0.619), China (0.586), and the United States (0.545) exhibit shorter average distances to other network members; this high overall accessibility enables them to engage efficiently with most nodes in the international cooperation network.

The national cooperation network map visually represents collaborative relationships between countries, with color intensity indicating the breadth of each country’s international engagement: darker hues signify a greater number of cooperative ties. As illustrated in Figure 6, the global cooperation network exhibits distinct regional clustering, with Europe emerging as the densest core zone and concentrating extensive transnational linkages. Cross-regional ties are concentrated primarily between Europe and North America and between Europe and Asia, forming three major regional cooperation spheres centered on Europe, North America, and Asia that together constitute the core structure of the global cooperation network. Within this network, countries such as Italy, China, the United States, the United Kingdom, France, and Spain stand out for the scale and scope of their cooperation and serve as crucial hubs.

3.1.3. Topic Modeling and Keywords Clustering

Through Natural Language Processing (NLP) and text mining using PyBibX [18], a total of 12 topics, ranging from Topic −1 to Topic 10, were identified. Each topic corresponds to a set of high-frequency keywords and their associated literature. Topic −1 denotes a noise topic. Because all documents in the sample had already passed the literature screening, the papers assigned to this noise topic remained relevant, so this study manually reclassified them into the three themes described below. Based on the semantic characteristics of the representative keywords and research content of each topic, the topics were further consolidated into three major research themes, which are systematically discussed in the subsequent sections (Table 3).

Theme 0: Recognition encompasses tasks such as identification, detection, classification, and segmentation. Multiple topics within this theme (Topics 0, 4, 5, 6, and 8) exhibit high keyword overlap and generally center on the automated recognition of heritage objects, components, or deterioration. In CH research, detection and classification are seldom isolated tasks; rather, they are sequential processes serving the same heritage objective. For instance, locating heritage objects or damage (detection) typically requires integration with category determination (classification) and, in specific scenarios, further refinement to the pixel or region level (segmentation) (Figure 7a). Consequently, this study adopts a holistic view of heritage tasks and research objectives, unifying these methods under the overarching theme of ‘Recognition’.
Theme 1: Reconstruction and Virtual Restoration, corresponding primarily to Topics 1, 2, and 10, addresses issues such as three-dimensional reconstruction, geometric and textural restoration, virtual restoration, and digital reproduction (Figure 7b).
Theme 2: Monitoring and Prediction, corresponding to Topics 3, 7, and 9, emphasizes applications such as structural health monitoring, risk assessment, environmental impact analysis, and disaster prediction (Figure 7c).

Building on this, statistical analysis of annual publication trends across the three research themes (Figure 8) revealed their temporal evolution. VOSviewer v1.6.18 was also used to perform co-occurrence network analysis and visualization for each theme (Figure 7), exploring the internal research hotspots and emerging development directions within each domain.
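The topic-to-theme consolidation can be expressed as a simple lookup table. The sketch below encodes the topic groupings stated above; the function name assign_theme, the sample papers, and the rule that Topic −1 documents must carry a manually assigned theme are illustrative choices that mirror, but do not reproduce, the manual reclassification described in the text.

```python
from collections import Counter

# Topic groupings stated in the text (Topic -1 = noise, handled manually)
TOPIC_TO_THEME = {
    **{t: "Recognition" for t in (0, 4, 5, 6, 8)},
    **{t: "Reconstruction and Virtual Restoration" for t in (1, 2, 10)},
    **{t: "Monitoring and Prediction" for t in (3, 7, 9)},
}
NOISE_TOPIC = -1

def assign_theme(topic, manual_theme=None):
    """Map a topic ID to its theme; noise-topic papers need a manual label."""
    if topic == NOISE_TOPIC:
        if manual_theme not in set(TOPIC_TO_THEME.values()):
            raise ValueError("Topic -1 papers require a manually assigned theme")
        return manual_theme
    return TOPIC_TO_THEME[topic]

# Hypothetical (topic, manual_theme) pairs for a few papers
papers = [(0, None), (2, None), (-1, "Monitoring and Prediction"), (8, None)]
theme_counts = Counter(assign_theme(t, m) for t, m in papers)
print(theme_counts)
```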

3.2. ML and DL Technology in Cultural Heritage Conservation

3.2.1. Machine Learning and Deep Learning

ML is a data-driven methodology that enables computers to learn underlying patterns from data through specific algorithms and models, thereby performing fundamental tasks such as classification, regression, clustering, dimensionality reduction, and prediction. ML methods are widely used in CH research; their primary advantages include relatively strong interpretability, good adaptability to limited sample sizes, relatively low computational costs, and effective processing of multi-source, heterogeneous data. Within this general task framework, different ML algorithms and models, owing to their methodological characteristics, suit distinct application themes and research requirements in CH studies.

From a learning-paradigm perspective, machine learning methods typically encompass supervised, unsupervised, semi-supervised, self-supervised, reinforcement, and active learning, of which supervised and unsupervised learning are the most prevalent in cultural heritage research. The ML field has also developed a range of strategies to enhance model performance and stability, including ensemble learning [30,31,36] and transfer learning [22,37,38]. These strategies can be applied flexibly across learning paradigms, improving model robustness and predictive performance by combining models or leveraging prior knowledge. Standard ML algorithms and models include linear models [39], such as linear regression and logistic regression; tree models, such as Decision Trees (DT) [1,24,26,40] and Random Forests (RF) [24,26,40,41,42]; Support Vector Machines (SVM) [26,42,43,44]; k-Nearest Neighbour (KNN) [24,45]; unsupervised models, such as k-means [46], Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Principal Component Analysis (PCA) [39,42]; and probabilistic graphical models. Furthermore, optimization techniques such as Genetic Algorithms (GA) [47,48] and Particle Swarm Optimization (PSO) [49] are frequently employed alongside ML models as auxiliary methods for parameter search, feature selection, or model structure optimization.
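A typical workflow for comparing several of the classical models listed above can be sketched with scikit-learn. The data here are synthetic (make_classification) stand-ins for a small tabular heritage dataset, such as material or spectral features, and the hyperparameters are illustrative defaults rather than values from any cited study. Scaling and PCA are placed inside each pipeline so that they are fitted only on the training folds of the cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 120 samples, 20 features, 3 classes (small-sample regime)
X, y = make_classification(n_samples=120, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),  # ensemble of trees
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "PCA+SVM": make_pipeline(StandardScaler(), PCA(n_components=5), SVC()),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {name: cross_val_score(model, X, y, cv=cv).mean() for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: mean accuracy {score:.2f}")
```

With so few samples, cross-validated scores can vary noticeably between splits, which is one reason the interpretability and small-sample behavior of classical models matter in CH settings.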

DL is a major branch of machine learning. Building on Artificial Neural Networks (ANNs), it uses deeper and more complex Deep Neural Network (DNN) architectures to learn multi-level features automatically. Its core advantage is the ability to extract multi-scale, high-level semantic features directly from raw data in an end-to-end manner, which yields strong feature representations and higher predictive accuracy when large-scale data are available. Because cultural heritage research involves highly diverse data types, including images, spectra, point clouds, speech, and textual documents, distinct model families have emerged from different DL architectures. These architectures, each defined by its own network structure and computational mechanism, include Convolutional Neural Networks (CNN) [37,50,51,52,53], Recurrent Neural Networks (RNN), Graph Neural Networks (GNN), Generative Adversarial Networks (GAN) [54], and Transformers [55] (Table A1).

From a technical-hierarchy perspective, deep learning models can be understood at three levels: Architecture, Backbone, and Framework. The architecture is the highest-level network category and defines the model's fundamental computational paradigm and overall structure. The backbone (or feature extractor) is the core module that extracts high-level features from input data; because it is highly versatile, it serves as a foundational component across diverse task frameworks. In cultural heritage applications, whether image classification, point cloud semantic segmentation, or multimodal recognition, the backbone's primary function is to encode raw data into feature representations suitable for subsequent processing. A framework is a complete task system that builds on a backbone and adds task-specific heads. Examples include Faster R-CNN [35,56], the YOLO series [57,58], RetinaNet, and Mask R-CNN, which are frequently employed for heritage object detection. Furthermore, as the technology has advanced, a single architecture has often given rise to multiple model families and variants, such as the VGG family (VGG-11, VGG-16, VGG-19), the ResNet family [56] (ResNet-18, ResNet-50, ResNet-101), and the GAN family (DCGAN, StyleGAN, CycleGAN, Pix2Pix). These variants are typically adapted to different task requirements and application scenarios by adjusting network depth or convolutional structure, or by introducing attention mechanisms.
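The backbone/framework distinction can be sketched in a few lines, assuming a deliberately trivial, hypothetical backbone (summary statistics of a 1D signal) in place of a real network such as ResNet. The point is structural only: one backbone is reused unchanged, while different task-specific heads turn its features into different outputs. All function names and thresholds here are illustrative.

```python
from typing import Callable, List, Sequence

Features = List[float]

def toy_backbone(signal: Sequence[float]) -> Features:
    """Hypothetical backbone: encodes raw input into a feature vector
    (mean, max, min). A real backbone would be a network such as ResNet."""
    return [sum(signal) / len(signal), max(signal), min(signal)]

def damage_head(features: Features) -> str:
    """Hypothetical classification head with an illustrative 0.5 threshold."""
    value_range = features[1] - features[2]
    return "damaged" if value_range > 0.5 else "intact"

def loss_ratio_head(features: Features) -> float:
    """Hypothetical regression head: mean value expressed as a percentage."""
    return round(features[0] * 100, 1)

class Framework:
    """Framework = shared backbone + task-specific head."""

    def __init__(self, backbone: Callable[[Sequence[float]], Features],
                 head: Callable[[Features], object]):
        self.backbone = backbone
        self.head = head

    def __call__(self, raw_input: Sequence[float]):
        return self.head(self.backbone(raw_input))

classifier = Framework(toy_backbone, damage_head)
regressor = Framework(toy_backbone, loss_ratio_head)
print(classifier([0.1, 0.9, 0.2]))  # damaged
print(regressor([0.2, 0.2, 0.2]))   # 20.0
```

Detection frameworks such as Faster R-CNN follow the same pattern at much larger scale: a shared convolutional backbone feeds region-proposal and box-regression heads.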

Having established the fundamental paradigms, model hierarchies, and technical frameworks of ML and DL in cultural heritage research, it is necessary to compare systematically the strengths and limitations of different AI paradigms in CH contexts in terms of methodological applicability and practical constraints. Because cultural heritage data typically involve small sample sizes, heterogeneous types, and stringent requirements for authenticity and interpretability, the performance of different models varies considerably in real-world applications. Table 4 therefore summarizes, for the standard AI methods currently used in heritage research, their primary strengths and the typical limitations they face in cultural heritage applications, providing a reference framework for subsequent method selection and application analysis.

3.2.2. Classic Procedures of ML and DL Applications in CH

The application of ML and DL methods in CH typically follows a classical sequence of data acquisition, data preprocessing, model selection, model training and optimization, and model evaluation, covering the entire process from data collection to result assessment.

1. Multimodal Data and Data Preprocessing for Digital Cultural Heritage

The digitization of CH relies heavily on multi-sensor systems, whose devices capture complementary information on geometric structure, surface texture, material composition, and environmental dynamics. Consequently, cultural heritage data are markedly multimodal, spanning diverse data structures from images and three-dimensional geometry to geophysical and environmental monitoring data, and they provide a rich input foundation for subsequent ML and DL applications (Table 5).

Cultural heritage data are often raw, unstructured, and highly heterogeneous, making preprocessing a critical step in ML and DL workflows. In classical ML, preprocessing centers on feature extraction: manually designed computations transform raw data such as images, point clouds, and spectra into feature vectors that ML models can use. For instance, in classifying point clouds of heritage architectural elements, studies [24,25,26,27,59,60] extracted geometric features, largely derived from the covariance matrix of each point's local neighbourhood (linearity, planarity, curvature, normals, roughness, anisotropy, verticality), together with the Z-coordinate and RGB radiometric features. These features were combined into unified multidimensional vectors and fed to classifiers such as RF, SVM, or KNN, rather than training directly on raw point coordinates. Refs. [40,41] further optimized the feature sets through feature importance ranking, selecting key features to improve classification performance and model efficiency. The key advantage of DL, by contrast, is automatic feature learning, which removes the reliance on manual feature engineering. Its preprocessing therefore focuses on fitting the data to the model architecture rather than on designing features, and mainly involves formatting and normalization, such as standardizing image sizes, aligning point clouds, normalizing coordinates, and applying data augmentation.
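The covariance-based features listed above can be computed per point from the eigenvalues λ1 ≥ λ2 ≥ λ3 of the 3×3 covariance matrix of its local neighbourhood. The sketch below uses one common formulation (e.g. linearity (λ1 − λ2)/λ1, planarity (λ2 − λ3)/λ1, and verticality 1 − |n_z| for the normal n), but exact definitions vary between studies, and the function names are illustrative. It assumes the neighbour search (for example, k nearest neighbours) has already been done. For contrast, it also includes the kind of normalization a DL pipeline applies instead: centring the cloud and scaling it into the unit sphere.

```python
import numpy as np

def covariance_features(neighbourhood: np.ndarray) -> dict:
    """Eigenvalue-based geometric features for one point.

    neighbourhood: (k, 3) array with the XYZ coordinates of the point's
    k nearest neighbours (the neighbour search itself is not shown).
    """
    cov = np.cov(neighbourhood.T)                  # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    l3, l2, l1 = np.clip(eigvals, 1e-12, None)     # so l1 >= l2 >= l3 > 0
    normal = eigvecs[:, 0]                         # direction of least variance
    return {
        "linearity": (l1 - l2) / l1,
        "planarity": (l2 - l3) / l1,
        "scattering": l3 / l1,
        "anisotropy": (l1 - l3) / l1,
        "change_of_curvature": l3 / (l1 + l2 + l3),
        "verticality": 1.0 - abs(normal[2]),       # 0 = horizontal surface, 1 = vertical
    }

def normalize_for_dl(points: np.ndarray) -> np.ndarray:
    """Typical DL-style preprocessing: centre the cloud and scale it into the unit sphere."""
    centred = points - points.mean(axis=0)
    return centred / np.linalg.norm(centred, axis=1).max()

# Hypothetical 5 x 5 grid of points sampled on a horizontal floor (z = 0).
g = np.linspace(0.0, 1.0, 5)
u, v = np.meshgrid(g, g)
floor = np.column_stack([u.ravel(), v.ravel(), np.zeros(25)])
print({name: round(float(val), 3) for name, val in covariance_features(floor).items()})
```

In an ML workflow, one such dictionary per point, together with height and RGB values, would form the feature vector passed to RF, SVM, or KNN.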

Large-scale, standardized public datasets remain limited in the field, and researchers commonly draw on two data sources. The first is existing public datasets, such as the large-scale heritage point cloud segmentation benchmark [28], which comprises 17 scenes and 10 semantic labels; the MSD-Det brickwork damage detection dataset [61]; the Dunhuang digital art and mural image dataset [62]; a multimodal dataset of images, annotations, and style models for Chinese ancient architecture [63]; and DAFNE, a dataset of mural fragments for digital reconstruction [64]. The second, and more widely adopted, source is self-built datasets: researchers construct small- to medium-scale customized datasets for specific heritage sites, artifact types, or damage categories using photogrammetry, laser scanning, and other methods, and use them for identification, classification, segmentation, prediction, or monitoring tasks.

2. Model Selection, Model Training, and Model Evaluation

Model selection in CH applications typically depends on task type, data modality, data scale, and the desired output format, and specific heritage tasks often require selecting or designing tailored ML or DL models. Model training commonly relies on transfer learning, either through Feature Extraction (freezing most pre-trained layers and retraining only the classification head) or through Fine-tuning (unfreezing some or all convolutional layers and updating their parameters with cultural heritage data); alternatively, Full Training builds custom networks or models from scratch. Some studies also combine a pre-trained CNN with a traditional ML classifier. For instance, Navarro et al. [42] used ResNet convolutional features as input and performed the final classification with SVM or RF, an approach suited to cultural heritage objects with limited data but pronounced differences between categories.
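The Feature Extraction strategy can be illustrated with a small NumPy sketch. This is an assumption-laden stand-in rather than a real pre-trained CNN: a fixed random projection followed by tanh plays the role of the frozen backbone, and a logistic-regression head (itself a traditional ML classifier) is the only part that is trained. The data, weights, and learning settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pre-trained" backbone: a fixed random projection + tanh stands in
# for frozen convolutional layers. Its weights are never updated below.
W_BACKBONE = rng.normal(size=(2, 16))

def frozen_backbone(X: np.ndarray) -> np.ndarray:
    return np.tanh(X @ W_BACKBONE / 2.0)

def train_head(F: np.ndarray, y: np.ndarray, lr: float = 0.5, epochs: int = 500):
    """Feature-extraction transfer learning: fit only a logistic-regression head."""
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))   # predicted class-1 probabilities
        grad = p - y                             # gradient of cross-entropy w.r.t. logits
        w -= lr * F.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    return (frozen_backbone(X) @ w + b > 0).astype(int)

# Hypothetical two-class toy data (e.g. two artefact categories in a 2D input space).
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)), rng.normal(2.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
W_before = W_BACKBONE.copy()
w, b = train_head(frozen_backbone(X), y)
accuracy = (predict(X, w, b) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Fine-tuning would differ only in also updating some or all of the backbone weights, usually with a smaller learning rate; swapping the logistic head for an SVM or RF gives the "pre-trained CNN + traditional ML classifier" combination described above.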

In CH research, the choice of evaluation metrics depends strongly on the heritage task, so selecting metrics judiciously is crucial for analyzing model performance. For identification and classification tasks, common metrics include Accuracy [25,38,65,66,67], Precision [24,38,40], Recall [24,37,38,40], F1-score [24,25,38,40,68], the Confusion Matrix [41], and Overall Accuracy (OA) [24,25,40,57,66]. These are widely used in artifact type recognition, architectural component classification, and heritage damage detection to measure a model's ability to discriminate between categories. For tasks concerned with image or geometric reconstruction quality, such as virtual heritage restoration, image enhancement, and 3D reconstruction, metrics such as PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index), and LPIPS (Learned Perceptual Image Patch Similarity) assess outcomes in terms of pixel-level fidelity, structural similarity, and perceptual quality, respectively. For tasks centered on time series or continuous variables, such as structural health monitoring and environmental change prediction, evaluation typically relies on error metrics such as the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE), often combined with strategies such as K-fold cross-validation to improve the stability and reliability of the assessment (Table 6).
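For reference, the simpler metrics named above can be written out directly. The sketch below is a plain-Python illustration (not taken from the reviewed studies) of precision, recall, F1, PSNR, RMSE, MAE, and a basic K-fold split; SSIM and LPIPS are omitted because they require windowed statistics or a pre-trained network. The K-fold helper uses contiguous, unshuffled folds; for strongly autocorrelated monitoring series, time-ordered validation schemes may be preferable.

```python
import math

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class (e.g. 'damaged')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def psnr(reference, restored, max_value=255.0):
    """PSNR in dB between two equally sized, flattened images."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```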

3.3. Applications of ML and DL in Cultural Heritage Conservation

Turning to task-specific analyses, this section examines typical heritage tasks, representative methods, and relevant literature to provide an overview of ML and DL applications in CH. Recognition tasks extract semantic information from multimodal data and include object detection, recognition, classification, and semantic segmentation; they constitute the most fundamental and widely applied category of cultural heritage AI applications. Reconstruction and Virtual Restoration concerns reconstructing, restoring, or completing the form and appearance of heritage assets from observational data, or generating new representations of them; it encompasses geometric reconstruction, virtual restoration, texture generation, and deep learning-based image and 3D content generation. Monitoring and Prediction places greater emphasis on the temporal dimension, using regression models, temporal networks, or graph networks to forecast heritage risks, environmental responses, and structural health status.

Having outlined how AI methods are distributed across the three themes, their application content can now be summarized at the level of specific heritage tasks, which Table 7 lists for each theme. Within the Recognition theme, research focuses primarily on the automatic identification, detection, and classification of diverse cultural heritage objects. This includes recognizing and detecting ancient sites and archaeological landscapes; detecting damage in historical buildings and structures; semantically classifying and segmenting point clouds of buildings and sites; detecting, identifying, and classifying fragments of artefacts and archaeological relics; recognizing textual, inscribed, and historical document heritage; and classifying heritage by typology, style, and pattern. Within the Reconstruction and Virtual Restoration theme, tasks fall into three strands: the first centers on historical buildings and architectural structures, emphasizing Building Information Modelling (BIM) and 3D reconstruction together with the generation and restoration of facades and decorative elements; the second addresses the virtual reconstruction and restoration of archaeological artefacts and objects from fragment geometry and surface information; and the third centers on pictorial heritage such as murals, paintings, and ancient texts, focusing on the restoration of damaged areas, texture and color reconstruction, and the style transfer and generation of ancient paintings.
Within the Monitoring and Prediction theme, research addresses the state evolution and risk assessment of heritage objects in three categories: structural health monitoring and prediction of deformation and performance evolution for architectural heritage and structures; risk prediction for archaeological sites and other immovable heritage under natural-disaster or multi-hazard scenarios; and comprehensive risk assessment and vulnerability analysis of cultural heritage sites at regional or site scale. Together, these three themes, spanning recognition, reconstruction, restoration, monitoring, and prediction, form a layered application pathway for AI in cultural heritage research, progressing from cognitive understanding to virtual representation and, ultimately, to risk-informed decision-making.

3.3.1. Recognition

Within the heritage Recognition theme, several common tasks emerge: heritage damage detection, automated identification and surveying of archaeological remains, semantic segmentation and classification of 3D point clouds, and ancient script recognition.

Research on heritage damage detection typically employs object detection models (such as Faster R-CNN and the YOLO series) to automatically identify and localize damage on heritage surfaces [35,69]. Damage types are mostly categorized by their formation mechanisms, covering physical damage such as cracks, spalling, surface staining or water marks, and color aberrations, as well as more complex scenarios such as biological and chemical deterioration caused by microbial erosion and plant penetration.
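Detectors such as Faster R-CNN and YOLO produce many overlapping candidate boxes, which are typically filtered using Intersection over Union (IoU) and greedy non-maximum suppression (NMS). A minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) in pixels, an illustrative IoU threshold of 0.5, and hypothetical detections and scores:

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Hypothetical crack detections: two overlapping candidates and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The same IoU function is also the usual basis for deciding whether a detection counts as a true positive when precision and recall are computed.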

Research on the automated identification and survey of archaeological sites mainly addresses large spatial extents. It uses remote sensing imagery and LiDAR data, together with derived Digital Terrain Models (DTMs), in combination with ML or DL methods to detect subtle surface undulations or buried remains automatically. Such work typically emphasizes the extraction of terrain features such as residual relief and multi-scale deviation (MS-DEV), and commonly employs random forests, U-Net and other CNNs, and transfer learning. Application cases include the identification of Neolithic burial mounds [70], Viking-era ring forts [71], and historical mining pits [37].
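Residual relief and multi-scale deviation can be approximated with moving-window statistics on a DTM grid. The sketch below is a simplified illustration under stated assumptions: residual relief is taken as elevation minus the local mean, and the multi-scale deviation keeps, for each cell, the standardized deviation with the largest magnitude across a few window radii. This is only in the spirit of MS-DEV; the published method differs in its scale handling and efficiency tricks. The DTM and window sizes are hypothetical.

```python
import numpy as np

def local_mean(dtm: np.ndarray, radius: int) -> np.ndarray:
    """Mean elevation in a (2r+1) x (2r+1) moving window (edges reflected)."""
    padded = np.pad(dtm.astype(float), radius, mode="reflect")
    size = 2 * radius + 1
    rows, cols = dtm.shape
    total = np.zeros((rows, cols))
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + rows, dx:dx + cols]
    return total / size ** 2

def residual_relief(dtm: np.ndarray, radius: int = 5) -> np.ndarray:
    """Elevation minus local trend: small mounds become positive, pits negative."""
    return dtm - local_mean(dtm, radius)

def multiscale_deviation(dtm: np.ndarray, radii=(3, 5, 9)) -> np.ndarray:
    """Standardized deviation from the local mean; keeps the largest-magnitude
    value over several window sizes (a simplified take on MS-DEV)."""
    dtm = dtm.astype(float)
    best = np.zeros_like(dtm)
    for r in radii:
        mean = local_mean(dtm, r)
        std = np.sqrt(np.maximum(local_mean(dtm ** 2, r) - mean ** 2, 0.0))
        dev = (dtm - mean) / (std + 1e-9)
        best = np.where(np.abs(dev) > np.abs(best), dev, best)
    return best

# Hypothetical flat DTM (31 x 31 cells) with a 1 m mound at the centre.
dtm = np.zeros((31, 31))
dtm[15, 15] = 1.0
print(round(float(residual_relief(dtm)[15, 15]), 3))
```

Rasters such as these are then typically stacked as input bands for an RF classifier or a U-Net.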

Ancient script recognition tasks focus on the automatic detection and recognition of textual objects such as oracle bone inscriptions, pictographic scripts, and historical handwritten documents. For instance, ref. [72] proposed an oracle bone inscription detection method based on the SSD (Single Shot MultiBox Detector) model that segments and recognizes characters in rubbing images, effectively addressing the small scale and uneven distribution of oracle bone characters as well as complex background noise.

Semantic segmentation and classification of 3D point clouds follow two approaches: those that operate directly on the point cloud and those that operate indirectly. The first, exemplified by PointNet, PointNet++, DGCNN and its variants, and PCNN, processes irregular 3D point data directly, without converting it into voxel or image structures. The second converts 3D point clouds into one- or two-dimensional feature representations (feature vectors or sequences), which are then fed into ML models such as RF or kNN for classification [59].
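The reason direct methods such as PointNet can consume unordered points is that they apply the same per-point network to every point and then aggregate with a symmetric function (max pooling). The NumPy sketch below illustrates only that core idea, assuming hypothetical, untrained weights for a small shared MLP; it omits PointNet's input transforms, training, and segmentation head.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical, untrained weights of a two-layer shared MLP (3 -> 16 -> 32);
# a real PointNet learns these and adds further layers and a task head.
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 32)), np.zeros(32)

def shared_mlp(points: np.ndarray) -> np.ndarray:
    """Apply the same MLP independently to every point: (N, 3) -> (N, 32)."""
    hidden = np.maximum(points @ W1 + b1, 0.0)   # ReLU
    return np.maximum(hidden @ W2 + b2, 0.0)

def global_feature(points: np.ndarray) -> np.ndarray:
    """Max-pool over the point axis: a symmetric function, so point order is irrelevant."""
    return shared_mlp(points).max(axis=0)

cloud = rng.normal(size=(100, 3))                 # hypothetical 100-point scan patch
shuffled = cloud[rng.permutation(len(cloud))]
print(np.allclose(global_feature(cloud), global_feature(shuffled)))  # True
```

The indirect route described above replaces this learned per-point encoder with hand-crafted neighbourhood features and a conventional classifier.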

Furthermore, in two-dimensional image classification and recognition, researchers widely use CNN architectures such as AlexNet, ResNet, Inception-v3, and Xception. These have been applied to cultural heritage images, including architectural facades [73], historical coins [33], ancient Egyptian hieroglyphs [38], and pottery [74], for style classification, element recognition, damage assessment, and matching evaluation (Table A2).

3.3.2. Reconstruction and Virtual Restoration

Within this theme, semi-automated HBIM reconstruction, Neural Radiance Fields (NeRF), and the virtual restoration and stitching of cultural heritage form three interrelated strands of research that together support high-precision digital representation and virtual reconstruction of cultural heritage.

Semantic segmentation and classification of 3D point clouds are critical prerequisites for semi-automated Scan-to-BIM reconstruction within Historical Building Information Modelling (HBIM). Relevant research typically employs traditional machine learning approaches or deep learning models to automatically segment structural elements such as walls, columns, and arches in historical building point clouds [24,25,27,59,75,76]. The core objective is to extract reliable semantic information from unstructured 3D data and convert it into parametric architectural components suitable for HBIM modelling, which reduces manual modelling costs and improves modelling consistency.

Regarding three-dimensional reconstruction and visualization, refs. [77,78,79] explored the potential of neural radiance fields (NeRF) and their variants (such as Language Embedded Radiance Fields, LERF) for cultural heritage digitization, comparing them with traditional photogrammetric pipelines (SfM–MVS) in terms of reconstruction quality, data requirements, and computational efficiency. These methods show advantages in scenarios involving non-cooperative surfaces, limited numbers of images, or rapid reconstruction, offering new technical pathways for the three-dimensional representation of cultural heritage.
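At the heart of NeRF is a differentiable volume-rendering rule: along each camera ray, the pixel colour is approximated as C ≈ Σ_i T_i (1 − exp(−σ_i δ_i)) c_i, where σ_i is the density and c_i the colour at sample i, δ_i is the spacing between samples, and T_i = exp(−Σ_{j<i} σ_j δ_j) is the transmittance. The sketch below evaluates this sum for a single ray with hypothetical sample values. In an actual NeRF, σ_i and c_i are predicted by a trained network, which is not shown here.

```python
import math

def render_ray(densities, colors, deltas):
    """Discrete NeRF-style volume rendering along one camera ray.

    densities: sigma_i >= 0 at each sample (front to back)
    colors:    (r, g, b) emitted at each sample
    deltas:    distance between consecutive samples
    Returns the composited pixel colour and the per-sample weights.
    """
    transmittance = 1.0                             # T_i: fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)      # opacity of this ray segment
        weight = transmittance * alpha              # T_i * (1 - exp(-sigma_i * delta_i))
        weights.append(weight)
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha                # equals exp(-sum_{j<=i} sigma_j * delta_j)
    return pixel, weights

# Hypothetical ray: empty space, then an opaque green surface, then a blue one behind it.
pixel, weights = render_ray([0.0, 1e6, 5.0],
                            [(1, 0, 0), (0, 1, 0), (0, 0, 1)],
                            [0.1, 0.1, 0.1])
print(pixel)  # [0.0, 1.0, 0.0]
```

Because every step is differentiable, the network predicting σ and c can be trained simply by comparing rendered pixels with the input photographs.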

Research on the virtual restoration and reconstruction of cultural heritage focuses mainly on the three-dimensional restoration of damaged artefacts, damage repair and image completion for murals [80] and ancient paintings [51,81,82], virtual assembly and reconstruction of fragmented artefacts [64], and style transfer and generation for ancient paintings [50]. For 3D restoration, ref. [79] proposed an integrated framework that combines Stable Diffusion for semantic image restoration with NeRF for 3D reconstruction and demonstrated its feasibility through ceramic artefact case studies. For murals and paintings, much research addresses image stitching and the automated filling of defects; examples include DunHuangStitch [83], a stitching method for low-texture images of Dunhuang murals, and image restoration strategies based on CNNs, GANs, and diffusion models [84]. Other studies address color degradation in ancient painted murals, proposing color recovery methods based on reversible residual networks [81].
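The "reversible" property of such networks can be illustrated with a generic additive-coupling block in the style of RevNet. This is an assumption-based sketch, not the architecture of [81]: the input is split into two halves, each half is updated with a residual function of the other, and the block can be inverted exactly even though the residual functions F and G themselves are not invertible. F and G here are hypothetical fixed mappings rather than trained sub-networks.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical residual functions F and G (tiny fixed "networks"); in a trained
# model these would be learned convolutional sub-networks.
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

def F(x):
    return np.tanh(x @ A)

def G(x):
    return np.tanh(x @ B)

def forward(x1, x2):
    """Additive coupling: each half is updated with a residual of the other half."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    """Exact inverse, computed in reverse order; F and G need not be invertible."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
r1, r2 = inverse(*forward(x1, x2))
print(np.allclose(r1, x1) and np.allclose(r2, x2))  # True
```

Exact invertibility means no information is discarded between layers, which is one reason such blocks are attractive when subtle color information must be preserved.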

Furthermore, the application of generative artificial intelligence to artistic expression and style transfer is attracting increasing attention. Relevant studies use generative models to learn and reproduce ancient artistic styles; for example, a CNN-based style transfer method proposed for Chinese painting [50] aims to preserve the distinctive brushstrokes and ink characteristics of ink wash painting. Ref. [85] proposed a semantic-layout-guided diffusion model for high-fidelity synthesis of traditional masterpieces such as Thousand Miles of Rivers and Mountains, opening new possibilities for digital heritage display and creative assistance (Table A3).
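In classic CNN-based style transfer (in the manner of Gatys et al.), "style" is commonly represented by Gram matrices of feature maps, that is, correlations between channels that discard spatial layout. The sketch below shows only that representation and a per-layer style loss, assuming random arrays as hypothetical activations and one particular normalization (conventions vary). It is not the specific method of [50].

```python
import numpy as np

def gram_matrix(feature_map: np.ndarray) -> np.ndarray:
    """Channel-by-channel correlations of a (C, H, W) CNN activation map."""
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def style_loss(generated_maps, style_maps) -> float:
    """Sum over layers of the mean squared difference between Gram matrices."""
    return float(sum(np.mean((gram_matrix(g) - gram_matrix(s)) ** 2)
                     for g, s in zip(generated_maps, style_maps)))

rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 5, 5))   # hypothetical activations from one layer
# Shuffling spatial positions leaves the Gram matrix unchanged: style ignores layout.
shuffled = fmap.reshape(8, 25)[:, rng.permutation(25)].reshape(8, 5, 5)
print(np.allclose(gram_matrix(fmap), gram_matrix(shuffled)))  # True
```

Minimizing this loss, together with a content loss, pushes a generated image towards the texture statistics of the reference painting (brushstroke and ink patterns) while keeping the layout of the content image.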

3.3.3. Monitoring and Prediction

This theme focuses on Structural Health Monitoring (SHM), structural performance assessment, and model updating for heritage buildings, as well as risk modeling and prediction for cultural heritage in the context of natural disasters. Relevant research typically employs time-series monitoring data, structural response data, or multi-source environmental factors as inputs, utilizing machine learning and deep learning methods to enhance data quality, structural condition recognition capabilities, and the reliability of disaster risk assessments.

Regarding structural health monitoring, ref. [86] used deep neural networks to repair anomalies in SHM data from ancient city walls; the optimized model handled outliers, drift, and missing data, improving the stability and credibility of long-term monitoring records. Ref. [87] further explored a CNN–LSTM hybrid model for predicting electromagnetic impedance signals in order to monitor bond strength variations in reinforced concrete components, highlighting the method's potential for condition assessment and early warning in architectural heritage.
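To illustrate the kind of anomalies involved, the sketch below implements a simple classical baseline rather than the deep learning approach of [86]: a Hampel-type filter flags outliers against a local median, and flagged or missing readings are then filled by linear interpolation. It does not address drift. The readings, window size, and threshold are hypothetical.

```python
import statistics

def hampel_flags(series, half_window=3, n_sigmas=3.0):
    """Flag readings far from the local median (Hampel-type outlier test).

    None marks a missing reading; it is skipped and never flagged.
    1.4826 * MAD approximates the standard deviation for Gaussian noise.
    """
    flags = [False] * len(series)
    for i, x in enumerate(series):
        if x is None:
            continue
        window = [v for v in series[max(0, i - half_window): i + half_window + 1]
                  if v is not None]
        med = statistics.median(window)
        mad = statistics.median([abs(v - med) for v in window])
        if mad > 0 and abs(x - med) > n_sigmas * 1.4826 * mad:
            flags[i] = True
    return flags

def repair(series, flags):
    """Drop flagged readings, then fill every gap by linear interpolation
    between the nearest valid neighbours (edge gaps copy the nearest value)."""
    cleaned = [None if flag else x for x, flag in zip(series, flags)]
    valid = [i for i, x in enumerate(cleaned) if x is not None]
    repaired = list(cleaned)
    for i, x in enumerate(cleaned):
        if x is not None or not valid:
            continue
        left = max((k for k in valid if k < i), default=None)
        right = min((k for k in valid if k > i), default=None)
        if left is None:
            repaired[i] = cleaned[right]
        elif right is None:
            repaired[i] = cleaned[left]
        else:
            t = (i - left) / (right - left)
            repaired[i] = cleaned[left] + t * (cleaned[right] - cleaned[left])
    return repaired

# Hypothetical displacement readings (mm) with one spike and one missing value.
readings = [10.0, 10.2, 9.9, 10.1, 50.0, 10.0, 9.8, 10.1, None, 10.3, 10.2]
flags = hampel_flags(readings)
print(repair(readings, flags))
```

A learned model can replace both steps when anomalies are more complex, for example slow sensor drift or long gaps, where fixed windows and straight-line interpolation break down.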

Regarding structural assessment and model updating, refs. [47,48] combined optimization algorithms with machine learning models to update and calibrate the parameters of finite element models of historic buildings, so that the models more accurately reflect actual structural behavior. This supports the planning of restoration and strengthening interventions and shows how ML methods can reduce physical modelling uncertainty and compensate for the limitations of field testing.
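The calibration loop behind such model updating can be sketched with particle swarm optimization (PSO), one of the optimizers mentioned earlier. As a deliberately small stand-in for a finite element model, the sketch assumes a hypothetical two-degree-of-freedom shear model with equal lumped masses and two unknown storey stiffnesses, and it calibrates those stiffnesses so that the model's natural frequencies match "measured" ones. The measured values are synthetic and generated within the example. This is a basic global-best PSO, not the specific procedure of [47,48].

```python
import math
import random

def natural_frequencies(k1, k2, m=1.0):
    """Frequencies (Hz) of a hypothetical 2-DOF shear model: two equal lumped
    masses m and storey stiffnesses k1, k2, i.e. eigenvalues of K / m with
    K = [[k1 + k2, -k2], [-k2, k2]]."""
    trace, det = (k1 + 2 * k2) / m, (k1 * k2) / m ** 2
    disc = math.sqrt(trace ** 2 - 4 * det)
    eigenvalues = ((trace - disc) / 2, (trace + disc) / 2)
    return [math.sqrt(lam) / (2 * math.pi) for lam in eigenvalues]

def pso(objective, bounds, n_particles=30, iterations=200,
        inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Basic global-best particle swarm optimization within box bounds."""
    rnd = random.Random(seed)
    dim = len(bounds)
    pos = [[rnd.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rnd.random(), rnd.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_global * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical "measured" frequencies, generated here from k1 = 800, k2 = 500.
measured = natural_frequencies(800.0, 500.0)

def mismatch(k):
    """Sum of squared relative frequency errors between model and measurement."""
    return sum(((f - fm) / fm) ** 2 for f, fm in zip(natural_frequencies(*k), measured))

best_k, best_val = pso(mismatch, bounds=[(100.0, 2000.0), (100.0, 2000.0)])
print([round(k, 1) for k in best_k], best_val)
```

This toy problem also reveals a practical caveat: (k1, k2) = (1000, 400) reproduces exactly the same two frequencies as (800, 500), so frequencies alone do not identify the parameters uniquely. Real updating studies therefore add mode shapes, additional measurements, or prior bounds.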

In the context of disaster risk and multi-hazard threat modelling, research has addressed fire risk prediction, landslide susceptibility mapping, and integrated disaster assessment. Ref. [36] systematically analyzed key input features and pre-processing methods for different disaster types, including entropy-weighted TOPSIS, the variance inflation factor (VIF), and estimates of climate and land-use change. Ref. [31] integrated multiple ML models (ANN, GBM, MaxEnt) to improve the stability and predictive reliability of landslide susceptibility mapping for World Heritage sites. Ref. [30] went further, combining machine learning with future climate scenarios and land-use change models to produce multi-hazard susceptibility maps and identify cultural heritage areas facing long-term risks.
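Entropy-weighted TOPSIS, mentioned above, ranks alternatives (here, heritage sites) by their closeness to an ideal profile across several criteria, with criterion weights derived from how much each criterion varies. A minimal NumPy sketch, assuming a small hypothetical matrix of strictly positive hazard indicators in which higher values mean higher risk (so every criterion is treated as benefit-type when ranking by risk):

```python
import numpy as np

def entropy_weights(X: np.ndarray) -> np.ndarray:
    """Entropy weight method. X: (m alternatives, n criteria), all entries > 0.
    Criteria whose values vary more across alternatives receive larger weights."""
    P = X / X.sum(axis=0)                                   # column-wise proportions
    entropy = -(P * np.log(P)).sum(axis=0) / np.log(X.shape[0])
    diversity = 1.0 - entropy
    return diversity / diversity.sum()

def topsis(X: np.ndarray, weights: np.ndarray, benefit: np.ndarray) -> np.ndarray:
    """Closeness coefficient in [0, 1] (1 = coincides with the ideal alternative).
    benefit[j] is True when larger values of criterion j should rank higher."""
    V = X / np.sqrt((X ** 2).sum(axis=0)) * weights         # vector-normalize, then weight
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti_ideal = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_plus = np.sqrt(((V - ideal) ** 2).sum(axis=1))
    d_minus = np.sqrt(((V - anti_ideal) ** 2).sum(axis=1))
    return d_minus / (d_plus + d_minus)

# Hypothetical values of three hazard indicators (columns) for three heritage sites (rows).
X = np.array([[2.0, 3.0, 1.0],
              [5.0, 6.0, 4.0],
              [1.0, 2.0, 2.0]])
w = entropy_weights(X)
closeness = topsis(X, w, benefit=np.array([True, True, True]))
print(np.round(closeness, 3))
```

Because the entropy method needs logarithms of proportions, zero-valued indicators would require shifting or rescaling beforehand; cost-type criteria (where smaller is riskier) are handled by setting the corresponding benefit flag to False.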

Additionally, some studies have introduced remote sensing and non-destructive testing techniques into heritage deterioration monitoring and risk assessment. Ref. [39] employed multispectral imagery and machine learning methods to map deterioration in historical defensive structures, while ref. [43] combined an SVM-based ML model with terahertz (THz) non-destructive testing to predict internal cavity deterioration in stone artefacts (e.g., at the Yungang Grottoes) (Table A4).

4. Discussion

This section explores key issues in the integrated development of CH and AI from several interconnected perspectives. First, drawing on the bibliometric findings, it examines how cross-disciplinary research between CH and AI has evolved. Second, focusing on CH data systems, it addresses structural challenges such as multi-source data integration, standardization, long-term data governance, and the absence of benchmark datasets. Building on this, it examines the potential and limitations of synthetic data for alleviating data scarcity and improving model generalization. Finally, adopting a “data-technology-task” coupling perspective, it explores potential pathways for collaborative decision-making between human experts and intelligent systems. Through these discussions, the section aims to build a forward-looking analytical framework that deepens understanding of research directions and practical models in cultural heritage AI.

4.1. Exploring Characteristics of Interdisciplinary Research on CH and AI Through Bibliometric Results

Between 2011 and 2025, the number of publications on interdisciplinary research between CH conservation and AI grew steadily at both global and national levels. This growth did not occur in isolation but resulted from the combined effect of multiple factors. On the one hand, early applications of machine learning to tasks such as pattern recognition and classification established preliminary methodological frameworks for heritage data analysis. Subsequent breakthroughs in deep learning, particularly in image understanding, three-dimensional reconstruction, semantic segmentation, and automated feature extraction, further strengthened the capacity to process complex heritage data. On the other hand, the maturation of open-source software ecosystems, widespread access to GPU and cloud computing infrastructure, sustained funding and digitalization policies, heightened institutional attention to heritage conservation risks and monitoring, lower costs of accessing digital resources, the establishment of open-access publishing, the efficient dissemination of knowledge via the internet, and the gradual formation of interdisciplinary research communities have collectively driven deep integration across architecture, archaeology, remote sensing science, computer science, and cultural heritage conservation. These factors have played a substantial role in disseminating knowledge within cultural heritage conservation, considerably enriching both research content and practical experimentation in the field.

Furthermore, this study presents certain national-level findings based on absolute values of network indicators such as Degree Centrality, Betweenness Centrality, and Closeness Centrality. These results do not directly measure the research capabilities of individual countries; rather, they reflect, to some extent, the combined influence of multiple factors, including population size, research investment, the level of research infrastructure, and the quantity of cultural heritage resources. National policy orientations on the digitalization and intelligent management of CH also strongly influence research activity in this area. For instance, China's 14th Five-Year Plan for Cultural Heritage Protection and Technological Innovation [88] and the Opinions on Promoting the Implementation of the National Cultural Digitization Strategy [89] explicitly advocate advancing the digitalization of cultural heritage and its deep integration with intelligent technologies such as artificial intelligence. By continuously strengthening heritage data infrastructure and establishing intelligent conservation systems, these policies provide robust support for research and practice in this domain. At the European level, research funding frameworks and strategic plans centered on cultural heritage digitization, such as the European Collaborative Cloud for Cultural Heritage (ECCCH) initiative under the EU's Horizon Europe programme [90] and the European Cultural Heritage Common Data Space Strategy (2025–2030) [91], have effectively fostered the establishment and development of transnational collaborative networks in this domain.
Nations such as Italy, possessing exceptionally dense heritage resources, are better positioned to occupy pivotal nodes within international cooperative networks for interdisciplinary research at the intersection of CH preservation and AI, driven by both national policy support and EU funding mechanisms.

4.2. Interoperability and Standardisation Challenges for Multimodal CH Data

In response to the expanding range of data modalities in CH research, this review systematically catalogues sensors, acquisition tools, and diverse data types, including RGB imagery, hyperspectral and thermal imaging data, LiDAR point clouds, time-series monitoring data, spectral records, textual archives, and knowledge graphs. The rapid development of three-dimensional digitization and multimodal acquisition technologies has greatly enhanced the recording and analysis capabilities available for CH, but it has also created the twin challenges of controlling data complexity and managing data in a standardized way. Across the entire process, from investigating data sources and constructing datasets to carrying out specific application tasks, the interoperability and long-term sustainable management of cultural heritage data remain core issues requiring urgent attention. In practice, heritage data frequently suffer from fragmented formats, inconsistent metadata standards, and significant variations in domain-specific annotation conventions. Moreover, much research relies on proprietary datasets that are difficult to reuse, compare, or extend, which constrains reproducibility and cumulative development. For heritage practitioners without technical expertise, manually managing complex metadata systems in distributed environments is both time-consuming and error-prone. The fragmentation of tools across commercial and open-source workflow systems further complicates data integration, making scalability and consistency harder to achieve.

Concurrently, the absence of standardized evaluation benchmarks warrants attention. Unlike general computer vision, which benefits from extensive public datasets and mature evaluation frameworks, cultural heritage objects and their environments exhibit high diversity and context dependency. This limits the practical application of transfer learning and cross-scenario generalization. The current lack of publicly available, high-quality datasets and unified evaluation benchmarks directly impacts the reproducibility of model results and the reliability of cross-study comparisons. A robust evaluation framework for CH must satisfy several core conditions: transparent data collection processes, clear and unified annotation standards, reproducible pre-processing pipelines, assessment metrics aligned with actual heritage conservation objectives, and open-source licensing agreements supporting reuse. The gradual construction and refinement of shared datasets will foster the development of a data ecosystem with industry-wide reference value, enhance the comparability of experimental outcomes, and promote interdisciplinary collaboration among heritage scientists, data engineers, and machine learning researchers.

The academic and CH communities have addressed these challenges through multiple avenues, including establishing standardized metadata frameworks, developing storage and access mechanisms for heterogeneous data, advancing metadata sharing and long-term accessibility, adopting FAIR data principles, implementing ontology-based semantic alignment methods, and strengthening data governance alongside transparent licensing practices [92]. Large-scale digital heritage platforms, exemplified by the EU-funded Europeana Foundation [93], have sought to integrate digital resources from museums, libraries, and archives through unified interfaces, thereby establishing infrastructure for cross-institutional data linkage. Such practices demonstrate that data standardization is not merely a technical issue, but rather a long-term process requiring heritage scholars and technical domains to collaboratively develop semantic frameworks and data specifications. From a broader perspective, enhancing heritage data interoperability and refining governance systems are essential prerequisites for knowledge transmission and cultural preservation. Only through clear data structures, open sharing mechanisms, and linkable semantic frameworks can CH repositories effectively support public access, academic research, and professional conservation practices, and provide scientific decision-making support for heritage protection policy formulation and long-term governance.

4.3. The Rise and Limitations of Synthetic Datasets

Beyond the interoperability and standardization challenges of multimodal data, recent CH research has increasingly used synthetic datasets to fill gaps in authentic data, a practice that also raises potential ethical concerns for cultural heritage [94]. For instance, artificially generated heritage imagery can be confused with authentic photographs, potentially misleading the public and researchers about the actual condition of heritage sites and, in turn, undermining the reliability of heritage documentation and conservation decisions. The use of synthetic data stems mainly from two needs. First, historical data have been irreversibly lost: many cultural heritage items have suffered damage, fading, weathering, or even complete destruction over time, so authentic imagery and structural information survive only incompletely. Second, authentic datasets are limited in scale: constrained by conservation principles, environmental conditions, and legal and ethical regulations, many heritage sites lack enough high-quality samples to meet the sample-size requirements of data-driven models. With the rise of generative AI, researchers have begun using techniques such as GANs and diffusion models to synthesize missing images or structural information for heritage artefacts, or to build augmented datasets when authentic samples are insufficient, which can improve model generalization and even support few-shot or zero-shot learning [95].
Therefore, future research also needs to consider building a more systematic, verifiable, and CH characteristic synthetic data generation and evaluation framework, which not only ensures the quality and authenticity of synthetic data but also clarifies its application scope and ethical boundaries, thereby promoting the high-quality construction and compliant utilization of the AI data system for cultural heritage.
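The framework called for above is not specified in detail here. As one minimal sketch, assuming a Python data pipeline, some of its safeguards can be made concrete: every sample carries a provenance flag, synthetic samples are capped at a fixed share of the training set, and evaluation uses authentic samples only. The names `Sample`, `build_training_set`, and `evaluation_set`, as well as the 50% default cap, are hypothetical choices for illustration, not an established standard.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    sample_id: str    # hypothetical identifier (e.g., an image file name)
    label: str        # e.g., "crack", "intact"
    synthetic: bool   # provenance flag: True if produced by a generative model


def build_training_set(authentic, synthetic, max_synthetic_ratio=0.5, seed=0):
    """Combine authentic and synthetic samples, capping the synthetic share.

    Every sample keeps its provenance flag, so generated data can always be
    traced and is never silently mixed with authentic documentation.
    """
    if not 0 <= max_synthetic_ratio < 1:
        raise ValueError("max_synthetic_ratio must be in [0, 1)")
    # Cap the synthetic count s so that s / (len(authentic) + s) stays
    # at or below max_synthetic_ratio.
    limit = int(max_synthetic_ratio * len(authentic) / (1 - max_synthetic_ratio))
    rng = random.Random(seed)
    chosen = rng.sample(list(synthetic), min(limit, len(synthetic)))
    return list(authentic) + chosen


def evaluation_set(samples):
    """Keep only authentic samples, so reported scores reflect real heritage data."""
    return [s for s in samples if not s.synthetic]
```

With 10 authentic and 30 synthetic samples and the default cap, the training set holds 20 samples (10 synthetic), while the evaluation set keeps only the 10 authentic ones. Keeping evaluation on authentic data is a deliberate choice: it prevents generated images from inflating scores that are later used to justify conservation decisions.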

4.4. Human–Machine Collaborative Decision-Making Pathways for “Data + Technology + Task”

The high heterogeneity of cultural heritage objects in terms of material properties, spatial structure, historical context, and conservation needs means that no single workflow applies to all scenarios. The task-oriented distribution of methods observed in this review essentially reflects the compatibility among the characteristics of heritage objects, the nature of tasks, data structures, and model capabilities. Although specific technical methods are context-dependent, they can still be systematically organized and interpreted at a higher level through the comprehensive framework of “data + technology + task”. Compared with existing reviews that focus on a single dimension (such as algorithm categories, data types, or application scenarios), this framework emphasizes the interdependence of its three elements: the data foundation determines the feasibility of technical pathways, technical capabilities shape how tasks are realized, and task objectives, in turn, constrain data collection strategies and model evaluation systems. This structural perspective not only illuminates the intrinsic connections between diverse themes within cultural heritage AI research but also provides an analytical foundation for cross-domain method transfer and technological innovation.
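One way to make this interdependence explicit is to encode the framework as a lookup from a (data, task) configuration to candidate model families and evaluation metrics. The sketch below is illustrative only. The keys, model lists, and metric lists are assumptions assembled from pairings mentioned in this review (for example, RGB images with CNN or Transformer segmentation, point clouds with point cloud networks or GNNs, restoration with GANs, diffusion models, or NeRF). They are not a prescriptive selection rule, and `Configuration`, `FRAMEWORK`, and `suggest` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Configuration(NamedTuple):
    data_modality: str   # e.g., "rgb_image", "point_cloud", "time_series"
    task_theme: str      # "recognition", "reconstruction", or "monitoring"


# Illustrative, non-exhaustive mapping: each entry links a data modality and a
# task theme to candidate model families and the metrics used to judge them.
FRAMEWORK = {
    Configuration("rgb_image", "recognition"): {
        "models": ["CNN segmentation", "Transformer segmentation"],
        "metrics": ["IoU", "Precision", "Recall"],
    },
    Configuration("point_cloud", "recognition"): {
        "models": ["PointNet++", "DGCNN", "GNN"],
        "metrics": ["IoU", "F1-score", "semantic consistency (expert check)"],
    },
    Configuration("rgb_image", "reconstruction"): {
        "models": ["GAN (e.g., Pix2Pix)", "Diffusion model", "NeRF"],
        "metrics": ["PSNR", "SSIM"],
    },
    Configuration("time_series", "monitoring"): {
        "models": ["LSTM", "GRU"],
        "metrics": ["MAE", "RMSE"],
    },
}


def suggest(data_modality: str, task_theme: str) -> Optional[dict]:
    """Return candidate models and metrics, or None if the pairing is not covered."""
    return FRAMEWORK.get(Configuration(data_modality, task_theme))
```

Returning `None` for uncovered pairings is intentional: an unmapped combination signals that the data, technique, and task have not yet been aligned, rather than silently falling back to a default model.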

Typical application cases illustrate how this framework can guide specific research designs. For instance, in the automatic identification of surface deterioration on heritage objects (such as cracks, biodeterioration, and weathering), high-resolution RGB images are the core data modality. The task falls within the scope of recognition, and the ultimate conservation goal is the assessment of heritage condition and early risk warning. This configuration is naturally compatible with convolutional neural networks or Transformer-based visual segmentation models. At the same time, from the perspective of conservation practice, model evaluation should go beyond technical indicators such as IoU, Precision, and Recall to address practical requirements: consistent criteria for labelling deterioration, generalization across heritage objects, robustness to lighting and environmental changes, and the interpretability of model results. Similarly, in classifying heritage components from LiDAR or photogrammetric point clouds, three-dimensional geometric data and topological relationships become the core factors. The task has the dual attributes of recognition and reconstruction, making graph neural networks or point cloud deep learning architectures more suitable. Beyond geometric accuracy indicators, its evaluation system must also consider semantic consistency, the suitability of modelling results for compiling heritage archives, and their use in subsequent conservation work. These cases demonstrate that the selection of technical methods in cultural heritage should always be embedded in the specific context of conservation practice, rather than being discussed in isolation from the data foundation and task goals.
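The technical indicators named above can be computed directly from binary masks. The following sketch assumes a pixel-wise deterioration mask in which `True` marks a damaged pixel. The function name and the choice to return NaN for undefined scores are illustrative conventions, not taken from any specific study.

```python
import numpy as np


def segmentation_scores(pred, truth):
    """IoU, precision and recall for binary deterioration masks (True = damaged pixel).

    Returns NaN for a score whose denominator is zero (e.g., no damage predicted).
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # damaged and detected
    fp = np.count_nonzero(pred & ~truth)   # intact but flagged as damaged
    fn = np.count_nonzero(~pred & truth)   # damaged but missed

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


truth = [[1, 1, 0], [0, 1, 0]]
pred = [[1, 0, 0], [0, 1, 1]]
print(segmentation_scores(pred, truth))
```

For this toy mask there are 2 true positives, 1 false positive, and 1 false negative, so IoU is 0.5 and precision and recall are both 2/3. None of these numbers captures labelling consistency or robustness across sites, which is why practical criteria are needed alongside them.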

Although this paper treats Recognition, Reconstruction and Virtual Restoration, and Monitoring and Prediction as distinct research themes, the underlying machine learning architectures supporting all three exhibit methodological continuity. Core methods such as feature extraction, representation learning, and attention mechanisms can be shared and reused across tasks. Convolutional neural network and Transformer architectures initially developed for image recognition are now being adapted for 3D reconstruction, monitoring, and predictive tasks. Similarly, models trained for segmentation and classification can be transferred to scenarios such as heritage damage assessment and disaster risk prediction. The thematic classification in this article should therefore be understood as an overlapping, interrelated ecosystem rather than a set of clearly demarcated, independent sections. In addition, conventional technical evaluation metrics have inherent limitations in CH applications. Problems such as class imbalance, the absence of an objective ground truth, the subjectivity of judging restoration quality, and bias arising from validation confined to specific sites may all lead to misjudgment of model performance. This suggests that the academic community must construct a dedicated evaluation framework for CH that closely integrates algorithmic accuracy, semantic interpretability, and the practical needs of heritage conservation. More importantly, machine learning and deep learning models should be positioned as decision support tools rather than automated conservation solutions that can independently replace expert judgment. CH objects possess uniqueness, dependence on historical context, and multiple value dimensions, whereas the output of AI models is inevitably limited by biases in the training data and the assumptions built into the model.
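The effect of class imbalance can be shown with a toy example. Assuming a hypothetical survey in which only 5% of elements are damaged, a model that never predicts damage still reaches high accuracy, while balanced accuracy (the mean of per-class recall) exposes the failure. The data are invented for illustration, and the helper names are hypothetical.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Share of elements whose predicted class matches the reference class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a rare class weighs as much as a common one."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Hypothetical survey: 95 intact elements (0) and 5 damaged elements (1).
y_true = np.array([0] * 95 + [1] * 5)
# A degenerate model that labels every element as intact.
y_pred = np.zeros_like(y_true)

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The model misses every damaged element, yet plain accuracy reports 95%. Balanced accuracy drops to 0.5, the value expected from a classifier with no skill on the rare class.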
Their outputs therefore need to be interpreted in coordination with expert knowledge, historical research findings, and heritage management goals. Future research should not pursue an “absolute artificial intelligence” for CH conservation, but should instead focus on building collaborative decision-making systems in which human experts and computer systems work closely together, establishing a sustainable balance between technological empowerment and the transmission of cultural meaning.
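Positioning models as decision support can be reflected in how their outputs are routed. The sketch below is a hypothetical triage step, not a workflow drawn from the reviewed studies. Every prediction goes to a conservator, and the confidence threshold (0.9 here, an arbitrary assumption) only decides whether an output is presented as a proposal to confirm or queued for priority inspection. `Prediction` and `triage` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    element_id: str     # hypothetical identifier of a surveyed element
    label: str          # class proposed by the model, e.g. "crack"
    confidence: float   # model score in [0, 1]


def triage(predictions, review_threshold=0.9):
    """Route model outputs to conservators; nothing is accepted automatically.

    High-confidence outputs become proposals that an expert confirms or rejects;
    low-confidence outputs are queued for priority expert inspection.
    """
    proposals, priority_review = [], []
    for pred in predictions:
        if pred.confidence >= review_threshold:
            proposals.append(pred)
        else:
            priority_review.append(pred)
    # Least certain cases first, so expert time goes where the model is weakest.
    priority_review.sort(key=lambda item: item.confidence)
    return proposals, priority_review
```

The design choice here is that confidence changes only the order and framing of expert work, never whether an expert sees the result, which keeps final judgment with the conservator.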

5. Conclusions and Prospects

This review systematically examines and analyses research advances in machine learning and deep learning within the cultural heritage sector. Centered on three core themes (heritage recognition, reconstruction and virtual restoration, and monitoring and prediction), it summarizes representative methodologies, technical pathways, and overarching developmental trends. Comprehensive analysis indicates that the applicability of diverse models and algorithms in cultural heritage contexts is highly contingent upon specific task objectives, heritage object characteristics, and data modalities and structures, and a relatively clear pattern of methodological adaptation has gradually emerged across research domains. Despite ongoing challenges at the data, methodological, and application levels, artificial intelligence technologies have demonstrated significant potential in heritage conservation, risk assessment, digital reconstruction, and knowledge dissemination, and are increasingly becoming a vital technical underpinning for cultural heritage research and practice.

From a future development perspective, current AI methodologies in the cultural heritage domain require further refinement in the following areas:

Shared cultural heritage benchmarks: The development of open, standardized, and well-documented benchmark datasets tailored to cultural heritage tasks, enabling fair comparison of methods and improving reproducibility.
Explainable AI (XAI) protocols: The integration of explainability frameworks to enhance transparency, interpretability, and trust in AI-assisted heritage analysis and conservation decision-making.
AI–conservator co-design approaches: The promotion of collaborative workflows in which AI systems are designed and validated in close cooperation with conservators, restorers, and domain experts, ensuring methodological relevance and ethical compliance.

At the same time, this study explicitly acknowledges the following limitations. First, geographical bias: the reviewed literature is unevenly distributed across regions, with a pronounced concentration of studies from Europe and East Asia, which may limit the generalizability of findings to underrepresented cultural and geographic contexts. Second, prevalence of experimental studies: most contributions remain at a proof-of-concept or experimental level, with limited evidence of large-scale deployment or routine adoption in real-world conservation practice. Finally, lack of long-term validation: few studies evaluate the long-term reliability, stability, and conservation impact of ML/DL-based solutions, particularly under changing environmental conditions or across extended monitoring periods.

Author Contributions

Conceptualization, X.L., F.C. and G.S.; methodology, X.L.; software, X.L.; validation, X.L., F.C. and G.S.; formal analysis, X.L.; investigation, X.L.; data curation, X.L.; writing—original draft preparation, X.L.; writing—review and editing, X.L., F.C. and G.S.; visualization, X.L.; supervision, F.C. and G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors gratefully acknowledge the support of the China Scholarship Council (CSC).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. DL Architectures and Technical Features in Cultural Heritage.

Architecture/Models	Backbone/Family/Variants/Frameworks	CH Applications
Deep Neural Network (DNN)	MLP, Custom architectures	Classification, Regression, Feature Embedding, Attribute Prediction
Convolutional Neural Network (CNN)	AlexNet, Inception (GoogLeNet), ResNet family, VGG family, DenseNet, MobileNet, EfficientNet; R-CNN variants, RetinaNet, YOLO family, Single Shot Multibox Detector (SSD), Mask R-CNN	Image recognition and classification, object detection, semantic and instance segmentation, point cloud classification and segmentation, feature extraction
Recurrent Neural Network (RNN)	RNN, Long Short-Term Memory (LSTM), GRU	Sequence prediction, time series analysis and monitoring, sensor signal analysis, speech recognition and speech data analysis, text processing, OCR
Graph Neural Network (GNN)	GCN, GAT, GraphSAGE	Network relationship analysis, multimodal association, structural modeling, network topology inference, relationship modeling, knowledge graph
Generative Adversarial Network (GAN)	DCGAN, StyleGAN, CycleGAN, Pix2Pix	Image enhancement, image generation, style transfer, virtual restoration, and synthetic training data
Transformer	BERT, ViT, DPT, Swin	Long Sequence and Multimodal Data Processing, Depth Estimation, Visual Reconstruction, Language Data Processing, Multimodal Fusion, Text Generation, Sentiment Recognition
Autoencoder (AE)	AE, VAE, U-Net, FCRN-Depth	Image reconstruction, noise reduction, semantic segmentation, monocular depth estimation (MDE), 3D reconstruction
Diffusion Models	DDPM, Stable Diffusion	Generative Enhancement, Missing Data Restoration, Virtual Reconstruction
Hybrid Models	CNN + Transformer (Swin Transformer), GNN + Transformer (Graph Transformer)	Classification, detection, segmentation, recognition, etc.

Table A2. Selected Core Literature Under the Recognition Theme.

CH Sub-Task	ML/DL Method	Case	Data	Evaluation	Reference
Archaeological remains detection (Neolithic burial mound)	RF	The megalithic funeral structures in the region of Carnac, the Bay of Quiberon, and the Gulf of Morbihan (France)	Signature (Maximum Terrain Deviation at Three Scales)	confusion matrix, Cohen’s kappa coefficient, precision, recall; probability map	[70]
Pottery fragments detection	RF	Archaeological Project at Abdera and Xanthi (APAX), Greece	Signature (RGB and gradient)	Detection rate	[74]
Early Fire Detection in Cultural Heritage Buildings	Fire-Det, Fire-Det Nano	The Forbidden City, Prince Gong’s Mansion (China)	Video frame	Recall, Precision, mAP_0.5, Confusion Matrix	[58]
Architectural heritage classification	AlexNet, Inception V3, ResNet, and Inception-ResNet-v2	Architectural Heritage Elements Dataset (AHE_Dataset)	Images	Accuracy, F1 score, Recall, Precision, Confusion Matrix	[22]
archaeological/architectural scenarios classification	Fast RF, K-means, RF (implemented in Weka and ImageJ/Fiji)	Basilica and Temple of Neptune (Paestum), Porticoes (Bologna), Mausoleum (Trento), Italy	2D texture images, and 3D point clouds	Precision, Recall, F1-score, Confusion Matrix	[40]
Monumental architectural style façade classification	EXplainable Neural-Symbolic Learning, X-NeSyL (EXPLANet and SHAP-Backprop)	MonuMAI dataset	Images	Accuracy, mAP	[96]
architectural structural elements semantic segmentation	PointNet, PointNet++, PCNN, DGCNN and improved-DGCNN	ArCH (architectural cultural heritage) dataset	point clouds and feature vectors	Confusion matrix, Precision, Recall, F1-Score, IoU	[25]
architectural structural elements semantic segmentation	kNN, NB, DT, RF; DGCNN, improved-DGCNN, DGCNN-Mod, DGCNN-3Dfeat, DGCNN-Mod+3Dfeat	ArCH (architectural cultural heritage) dataset	point clouds and feature vectors	Overall Accuracy (OA), weighted Precision, Recall and F1-Score, IoU	[24]
Ceramics automatic classification	ResNet-18, SVM, RF	the binary image database of Iberian wheel-made pottery vessels’ profiles	binary images	Accuracy, Precision, Recall, F1-Score	[42]
Monumental Heritage Architectural Styles Classification Key Elements	MonuNet and MonuMAI-KED (Faster R-CNN, ResNet-101)	MonuMAI dataset	Images	Precision, Recall, F1-Score, mAP, mAR	[97]
Character Segmentation in Historical Document Images	Customized CNN	Historical Document Images (Tripitaka Koreana in Han)	Images	IoU, Precision, Recall, F1-Score	[98]
Classification of Ancient Egyptian Hieroglyphs	ResNet-50, Inception-v3, Xception and	Merging of two distinct pictographic data sets (derived from pyramid walls, texts, carvings, and murals)	Grayscale images and RGB images	Accuracy, Precision, Recall, F1-Score	[38]
cavernous weathering extent	RF	Hegra (UNESCO World Heritage Site, Kingdom of Saudi Arabia)	TLS and UAV-DP point clouds	Accuracy, Recall, Precision	[99]
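Several studies in Table A2 report Cohen's kappa alongside the confusion matrix. As a reminder of how it is derived, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the row and column totals. The sketch below assumes rows hold reference labels and columns hold predictions; the matrix values are invented, and `cohens_kappa` is a hypothetical helper name.

```python
import numpy as np


def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: reference, columns: predicted)."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n                                   # raw agreement
    p_expected = np.sum(cm.sum(axis=1) * cm.sum(axis=0)) / n ** 2   # chance agreement
    return float((p_observed - p_expected) / (1 - p_expected))


# Hypothetical two-class map: burial mound vs. background.
print(cohens_kappa([[40, 10], [5, 45]]))
```

Here the observed agreement is 0.85 and the chance agreement is 0.5, giving κ = 0.7. Kappa discounts agreement that a random labeller with the same class proportions would achieve, which is useful when one class dominates the study area.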

Table A3. Selected Core Literature Under the Reconstruction and Virtual Restoration Theme.

CH Sub-Task	ML/DL Method	Heritage Object	Data	Evaluation	Reference
Architectural structural elements semantic segmentation to HBIM	RF	The Pisa Charterhouse (Italy)	Feature Vectors, 3D point clouds	Precision, recall, overall accuracy, and F-measure, confusion matrix	[26]
Archaeological artifacts fragment matching	M5P regression trees (implemented in Weka)	Fresco Fragments	Feature Vectors (extracted from color images, point clouds, 2D contours)	Precision, Recall	[1]
Building façade 3D reconstruction	FCRN-Depth for Single Image Depth Estimation (SIDE), Pix2Pix GAN for Facade Structural Element	Building Façade	RGB images	MSE, Absolute distance deviation	[73]
Reliefs 3D reconstruction	Soft-edge-enhanced Depth Estimation Network	Reliefs	2D Monocular photographs, 2D Edge Images	RMSE, Threshold Accuracy	[100]
3D digitization and immersive visualization of historic buildings	NeRF, Mask R-CNN	Interior Scenes of Historic Buildings	360° images frame from video, camera poses	PSNR, SSIM, Precision, Error	[77]
Degraded artefacts restoration and 3D rendering reconstruction	Stable Diffusion, NeRF	Ceramic artifacts	Images and camera poses, text	MSE, PSNR, SSIM, UIQI, VIF	[79]
3D rendering reconstruction	NeRF	Terpsichore statue, Eagle-shaped lectern, Caprona Tower	Images and camera poses	Cloud-to-cloud deviation analysis	[78]
Restoration of painted decorations on heritage building structures	U-Net-MobileNet, Pix2pix, GauGAN	The Forbidden City, China	Mobile phone photographs	Accuracy, Intersection over Union (IoU)	[82]
Restoration of the Giant Mural	Hybrid CNN-ViT, GLGAN	Yongle Palace, China	Experimental Mural Data	MAE, MSE, PSNR, SSIM	[80]
Restoration of the missing mural area	GAN, FCN	Wutaishan, Shanxi Province, China	Images	MSE, PSNR, SSIM	[84]
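MSE and PSNR appear repeatedly as restoration metrics in Table A3. PSNR is defined from MSE as PSNR = 10·log10(MAX² / MSE), where MAX is the largest possible pixel value. The following is a minimal sketch that assumes 8-bit images (MAX = 255) and uses invented pixel values.

```python
import numpy as np


def mse(restored, reference):
    """Mean squared error between two images of the same shape."""
    a = np.asarray(restored, dtype=float)
    b = np.asarray(reference, dtype=float)
    return float(np.mean((a - b) ** 2))


def psnr(restored, reference, max_value=255.0):
    """Peak signal-to-noise ratio in dB (higher means closer to the reference).

    max_value is the largest possible pixel value: 255 for 8-bit images.
    """
    err = mse(restored, reference)
    if err == 0:
        return float("inf")  # identical images
    return float(10 * np.log10(max_value ** 2 / err))


reference = np.full((4, 4), 100, dtype=np.uint8)
restored = np.full((4, 4), 110, dtype=np.uint8)
print(round(psnr(restored, reference), 2))
```

A uniform error of 10 grey levels gives an MSE of 100 and a PSNR of about 28.13 dB. The images are cast to float before subtraction to avoid unsigned-integer wrap-around. SSIM is more involved, and scikit-image's `skimage.metrics.structural_similarity` is a common choice for it in practice.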

Table A4. Selected Core Literature Under the Monitoring and Prediction Theme.

CH Sub-Task	ML/DL Method	Case	Data	Evaluation	Category	Reference
Assessment of landslide susceptibility	GBM, MaxEnt, Ensemble modelling	Cinque Terre National Park (World Heritage site) of northern Italy	260 landslide points, 13 predisposing factors (PFs)	k-fold cross-validation, ROC (Receiver Operating Characteristics)/AUC (Area Under Curve), True Skill Statistic (TSS), Coefficient of Variation (CV)	landslide	[31]
Future rainfall, future land use analysis, and hazard susceptibility assessment	Boosted Regression Tree (BRT), Bayesian Additive Regression Tree (BART), Bayesian Generalized Linear Model (BGLM)	27 CH sites and monuments in the Sikkim state of northeastern India	22 multi-hazard causative factors; 269 and 67 seismically induced debris fall and rock fall data points, respectively, from multi-source data	receiver operating characteristics-area under curve (ROC-AUC), sensitivity (TPR), specificity (TNR), positive predictive value (PPV) and negative predictive value (NPV)	debris fall, rock fall, and their multi-hazard (MH)	[30]
Tide level prediction	M5P Regression Tree, Random Forest (RF), Multilayer Perceptron (MLP)	Venice, Italy	Tide gauge data (Punta della Salute) and meteorological data (CNR platform)	R², Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Relative Absolute Error (RAE)	Tide level	[32]
stone relics hollowing deterioration prediction	SVM-based hollowing deterioration prediction model (SVM-HDPM)	Yungang Grottoes, China	40 stone fragments 2 mm thick from the Yungang Grottoes sandstone sample	leave-one-out cross-validation (LOOCV), mean square error (MSE), Accuracy	hollowing deterioration	[43]
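Table A4 lists sensitivity, specificity, PPV, NPV, and the True Skill Statistic (TSS). All of these follow from the four cells of a binary confusion matrix, and TSS is defined as sensitivity + specificity − 1. The counts below are hypothetical, not taken from the cited studies, and `binary_skill_scores` is an illustrative name.

```python
def binary_skill_scores(tp, fp, tn, fn):
    """Threshold-based scores for a binary hazard map (1 = hazard present)."""
    sensitivity = tp / (tp + fn)   # true positive rate (TPR)
    specificity = tn / (tn + fp)   # true negative rate (TNR)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),     # positive predictive value
        "npv": tn / (tn + fn),     # negative predictive value
        # True Skill Statistic: ranges from -1 to 1; 0 means no better than random.
        "tss": sensitivity + specificity - 1,
    }


# Hypothetical validation counts for a susceptibility map.
scores = binary_skill_scores(tp=40, fp=5, tn=45, fn=10)
print({k: round(v, 3) for k, v in scores.items()})
```

With these counts, sensitivity is 0.8, specificity 0.9, and TSS 0.7. Because TSS combines the two class-wise rates, it is generally regarded as insensitive to how common hazard points are in the validation set, which is one reason it is reported for susceptibility mapping.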

References

Funkhouser, T.; Shin, H.; Toler-Franklin, C.; Castañeda, A.G.; Brown, B.; Dobkin, D.; Rusinkiewicz, S.; Weyrich, T. Learning How to Match Fresco Fragments. J. Comput. Cult. Herit. JOCCH 2011, 4, 1–13.
Mishra, M. Machine Learning Techniques for Structural Health Monitoring of Heritage Buildings: A State-of-the-Art Review and Case Studies. J. Cult. Herit. 2021, 47, 227–245.
Tejedor, B.; Lucchi, E.; Bienvenido-Huertas, D.; Nardi, I. Non-Destructive Techniques (NDT) for the Diagnosis of Heritage Buildings: Traditional Procedures and Futures Perspectives. Energy Build. 2022, 263, 112029.
Mishra, M.; Lourenço, P.B. Artificial Intelligence-Assisted Visual Inspection for Cultural Heritage: State-of-the-Art Review. J. Cult. Herit. 2024, 66, 536–550.
Yang, S.; Hou, M.; Li, S. Three-Dimensional Point Cloud Semantic Segmentation for Cultural Heritage: A Comprehensive Review. Remote Sens. 2023, 15, 548.
Argyrou, A.; Agapiou, A. A Review of Artificial Intelligence and Remote Sensing for Archaeological Research. Remote Sens. 2022, 14, 6000.
Bickler, S.H. Machine Learning Arrives in Archaeology. Adv. Archaeol. Pract. 2021, 9, 186–191.
Li, Y.; Zhao, L.; Chen, Y.; Zhang, N.; Fan, H.; Zhang, Z. 3D LiDAR and Multi-Technology Collaboration for Preservation of Built Heritage in China: A Review. Int. J. Appl. Earth Obs. Geoinf. 2023, 116, 103156.
Li, Y.; Du, Y.; Yang, M.; Liang, J.; Bai, H.; Li, R.; Law, A. A Review of the Tools and Techniques Used in the Digital Preservation of Architectural Heritage within Disaster Cycles. Herit. Sci. 2023, 11, 199.
Fiorucci, M.; Khoroshiltseva, M.; Pontil, M.; Traviglia, A.; Del Bue, A.; James, S. Machine Learning for Cultural Heritage: A Survey. Pattern Recognit. Lett. 2020, 133, 102–108.
Page, M.J.; Moher, D.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. PRISMA 2020 Explanation and Elaboration: Updated Guidance and Exemplars for Reporting Systematic Reviews. BMJ 2021, 372, n160.
Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021, 372, n71.
van Eck, N.J.; Waltman, L. Text Mining and Visualization Using VOSviewer. arXiv 2011, arXiv:1109.2058.
Wasserman, S.; Faust, K. Social Network Analysis: Methods and Applications; Cambridge University Press: Cambridge, UK, 1994.
Freeman, L.C. Centrality in Social Networks Conceptual Clarification. Soc. Netw. 1978, 1, 215–239.
Brandes, U. A Faster Algorithm for Betweenness Centrality. J. Math. Sociol. 2001, 25, 163–177.
Sabidussi, G. The Centrality Index of a Graph. Psychometrika 1966, 31, 581–603.
Pereira, V.; Basilio, M.P.; Santos, C.H.T. PyBibX—A Python Library for Bibliometric and Scientometric Analysis Powered with Artificial Intelligence Tools. Data Technol. Appl. 2025, 59, 302–337.
Mallinis, G.; Mitsopoulos, I.; Beltran, E.; Goldammer, J.G. Assessing Wildfire Risk in Cultural Heritage Properties Using High Spatial and Temporal Resolution Satellite Imagery and Spatially Explicit Fire Simulations: The Case of Holy Mount Athos, Greece. Forests 2016, 7, 46.
Llamas, J.; Lerones, P.M.; Zalama, E.; Gómez-García-Bermejo, J. Applying Deep Learning Techniques to Cultural Heritage Images within the INCEPTION Project. In Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection; Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2016; Volume 10059, pp. 25–32.
Montoya Obeso, A.; Benois-Pineau, J.; Ramírez Acosta, A.Á.; García Vázquez, M.S. Architectural Style Classification of Mexican Historical Buildings Using Deep Convolutional Neural Networks and Sparse Features. J. Electron. Imaging 2016, 26, 11016.
Llamas, J.; Lerones, P.M.; Medina, R.; Zalama, E.; Gómez-García-Bermejo, J. Classification of Architectural Heritage Images Using Deep Learning Techniques. Appl. Sci. 2017, 7, 992.
Bassier, M.; Vergauwen, M.; Van Genechten, B. Automated classification of heritage buildings for as-built BIM using machine learning techniques. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 4, 25–30.
Matrone, F.; Grilli, E.; Martini, M.; Paolanti, M.; Pierdicca, R.; Remondino, F. Comparing Machine and Deep Learning Methods for Large 3D Heritage Semantic Segmentation. ISPRS Int. J. Geo Inf. 2020, 9, 535.
Pierdicca, R.; Paolanti, M.; Matrone, F.; Martini, M.; Morbidoni, C.; Malinverni, E.S.; Frontoni, E.; Lingua, A.M. Point Cloud Semantic Segmentation Using a Deep Learning Framework for Cultural Heritage. Remote Sens. 2020, 12, 1005.
Croce, V.; Caroti, G.; De Luca, L.; Jacquot, K.; Piemonte, A.; Véron, P. From the Semantic Point Cloud to Heritage-Building Information Modeling: A Semiautomatic Approach Exploiting Machine Learning. Remote Sens. 2021, 13, 461.
Teruggi, S.; Grilli, E.; Russo, M.; Fassi, F.; Remondino, F. A Hierarchical Machine Learning Approach for Multi-Level and Multi-Resolution 3D Point Cloud Classification. Remote Sens. 2020, 12, 2598.
Matrone, F.; Lingua, A.; Pierdicca, R.; Malinverni, E.S.; Paolanti, M.; Grilli, E.; Remondino, F.; Murtiyoso, A.; Landes, T. A benchmark for large-scale heritage point cloud semantic segmentation. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, XLIII-B2-2020, 1419–1426.
Hatir, M.E.; Barstuğan, M.; İnce, İ. Deep Learning-Based Weathering Type Recognition in Historical Stone Monuments. J. Cult. Herit. 2020, 45, 193–203.
Saha, A.; Pal, S.C.; Santosh, M.; Janizadeh, S.; Chowdhuri, I.; Norouzi, A.; Roy, P.; Chakrabortty, R. Modelling Multi-Hazard Threats to Cultural Heritage Sites and Environmental Sustainability: The Present and Future Scenarios. J. Clean. Prod. 2021, 320, 128713.
Di Napoli, M.; Carotenuto, F.; Cevasco, A.; Confuorto, P.; Di Martire, D.; Firpo, M.; Pepe, G.; Raso, E.; Calcaterra, D. Machine Learning Ensemble Modelling as a Tool to Improve Landslide Susceptibility Mapping Reliability. Landslides 2020, 17, 1897–1914.
Granata, F.; Di Nunno, F. Artificial Intelligence Models for Prediction of the Tide Level in Venice. Stoch. Environ. Res. Risk Assess. 2021, 35, 2537–2548.
Mehta, S.; Kukreja, V.; Bordoloi, D. Heritage Coin Identification Using Convolutional Neural Networks: A Multi-Classification Approach for Numismatic Research. In Proceedings of the 2023 2nd International Conference on Augmented Intelligence and Sustainable Systems, ICAISS 2023, Trichy, India, 23–25 August 2023; pp. 1–6.
Cardellicchio, A.; Ruggieri, S.; Nettis, A.; Renò, V.; Uva, G. Physical Interpretation of Machine Learning-Based Recognition of Defects for the Risk Management of Existing Bridge Heritage. Eng. Fail. Anal. 2023, 149, 107237.
Trier, Ø.D.; Reksten, J.H.; Løseth, K. Automated Mapping of Cultural Heritage in Norway from Airborne Lidar Data Using Faster R-CNN. Int. J. Appl. Earth Obs. Geoinf. 2021, 95, 102241.
Lei, Y.; Shen, Z.; Tian, F.; Yang, X.; Wang, F.; Pan, R.; Wang, H.; Jiao, S.; Kou, W. Fire Risk Level Prediction of Timber Heritage Buildings Based on Entropy and XGBoost. J. Cult. Herit. 2023, 63, 11–22.
Gallwey, J.; Eyre, M.; Tonkins, M.; Coggan, J. Bringing Lunar LiDAR Back Down to Earth: Mapping Our Industrial Heritage through Deep Transfer Learning. Remote Sens. 2019, 11, 1994.
Barucci, A.; Cucci, C.; Franci, M.; Loschiavo, M.; Argenti, F. A Deep Learning Approach to Ancient Egyptian Hieroglyphs Classification. IEEE Access 2021, 9, 123438–123447.
Adamopoulos, E. Learning-Based Classification of Multispectral Images for Deterioration Mapping of Historic Structures. J. Build. Pathol. Rehabil. 2021, 6, 41.
Grilli, E.; Remondino, F. Classification of 3D Digital Heritage. Remote Sens. 2019, 11, 847.
Grilli, E.; Remondino, F. Machine Learning Generalisation across Different 3D Architectural Heritage. ISPRS Int. J. Geo Inf. 2020, 9, 379.
Navarro, P.; Cintas, C.; Lucena, M.; Fuertes, J.M.; Delrieux, C.; Molinos, M. Learning Feature Representation of Iberian Ceramics with Automatic Classification Models. J. Cult. Herit. 2021, 48, 65–73.
Meng, T.; Huang, R.; Lu, Y.; Liu, H.; Ren, J.; Zhao, G.; Hu, W. Highly Sensitive Terahertz Non-destructive Testing Technology for Stone Relics Deterioration Prediction Using SVM-Based Machine Learning Models. Herit. Sci. 2021, 9, 24.
Capobianco, G.; Pronti, L.; Gorga, E.; Romani, M.; Cestelli-Guidi, M.; Serranti, S.; Bonifazi, G. Methodological Approach for the Automatic Discrimination of Pictorial Materials Using Fused Hyperspectral Imaging Data from the Visible to Mid-Infrared Range Coupled with Machine Learning Methods. Spectrochim. Acta A Mol. Biomol. Spectrosc. 2024, 304, 123412.
Coppola, F.; Frigau, L.; Markelj, J.; Malešič, J.; Conversano, C.; Strlič, M. Near-Infrared Spectroscopy and Machine Learning for Accurate Dating of Historical Books. J. Am. Chem. Soc. 2023, 145, 12305–12314.
Molada-Tebar, A.; Marqués-Mateu, Á.; Lerma, J.L.; Westland, S. Dominant Color Extraction with K-Means for Camera Characterization in Cultural Heritage Documentation. Remote Sens. 2020, 12, 520.
Standoli, G.; Salachoris, G.P.; Masciotta, M.G.; Clementi, F. Modal-Based FE Model Updating via Genetic Algorithms: Exploiting Artificial Intelligence to Build Realistic Numerical Models of Historical Structures. Constr. Build. Mater. 2021, 303, 124393.
Salachoris, G.P.; Standoli, G.; Betti, M.; Milani, G.; Clementi, F. Evolutionary Numerical Model for Cultural Heritage Structures via Genetic Algorithms: A Case Study in Central Italy. Bull. Earthq. Eng. 2024, 22, 3591–3625.
Gara, F.; Nicoletti, V.; Arezzo, D.; Cipriani, L.; Leoni, G. Model Updating of Cultural Heritage Buildings Through Swarm Intelligence Algorithms. Int. J. Archit. Herit. 2025, 19, 259–275.
Sheng, J.; Song, C.; Wang, J.; Han, Y. Convolutional Neural Network Style Transfer towards Chinese Paintings. IEEE Access 2019, 7, 163719–163728.
Peng, X.; Zhao, H.; Wang, X.; Zhang, Y.; Li, Z.; Zhang, Q.; Wang, J.; Peng, J.; Liang, H. C3N: Content-Constrained Convolutional Network for Mural Image Completion. Neural Comput. Appl. 2022, 35, 1959–1970.
Wang, N.; Zhao, X.; Wang, L.; Zou, Z. Novel System for Rapid Investigation and Damage Detection in Cultural Heritage Conservation Based on Deep Learning. J. Infrastruct. Syst. 2019, 25, 04019020.
Guyot, A.; Lennon, M.; Lorho, T.; Hubert-Moy, L. Combined Detection and Segmentation of Archeological Structures from LiDAR Data Using a Deep Learning Approach. J. Comput. Appl. Archaeol. 2021, 4, 1–19. [Google Scholar] [CrossRef]
Kumar, P.; Gupta, V. Restoration of Damaged Artworks Based on a Generative Adversarial Network. Multimed. Tools Appl. 2023, 82, 40967–40985. [Google Scholar] [CrossRef]
Kumar, P.; Gupta, V.; Grover, M. Dual Attention and Channel Transformer Based Generative Adversarial Network for Restoration of the Damaged Artwork. Eng. Appl. Artif. Intell. 2024, 128, 107457. [Google Scholar] [CrossRef]
Pathak, R.; Saini, A.; Wadhwa, A.; Sharma, H.; Sangwan, D. An Object Detection Approach for Detecting Damages in Heritage Sites Using 3-D Point Clouds and 2-D Visual Data. J. Cult. Herit. 2021, 48, 74–82. [Google Scholar] [CrossRef]
Karimi, N.; Mishra, M.; Lourenço, P.B. Deep Learning-Based Automated Tile Defect Detection System for Portuguese Cultural Heritage Buildings. J. Cult. Herit. 2024, 68, 86–98. [Google Scholar] [CrossRef]
Gao, S.; Huang, G.; Chen, X.; Jiang, H.; Zhou, L.; Gao, X. Two-Stage Deep Learning-Based Video Image Recognition of Early Fires in Heritage Buildings. Eng. Appl. Artif. Intell. 2024, 129, 107598. [Google Scholar] [CrossRef]
Grilli, E.; Özdemir, E.; Remondino, F. Application of machine and deep learning strategies for the classification of heritage point clouds. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-4/W18, 447–454. [Google Scholar] [CrossRef]
Grilli, E.; Farella, E.M.; Torresani, A.; Remondino, F. Geometric features analysis for the classification of cultural heritage point clouds. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. ISPRS Arch. 2019, 42, 541–548. [Google Scholar] [CrossRef]
Long, L.; Gan, Z.; Liu, Z.; Zhao, B.; Li, Q. MSD-Det: Masonry Structures Damage Detection Dataset for Preventive Conservation of Heritage. J. Cult. Herit. 2025, 73, 358–370.
Xu, Z.; Yang, Y.; Fang, Q.; Chen, W.; Xu, T.; Liu, J.; Wang, Z. A Comprehensive Dataset for Digital Restoration of Dunhuang Murals. Sci. Data 2024, 11, 955.
Li, B.; Feng, J.; Yan, Y.; Kou, G.; Li, H.; Du, Y.; Wang, X.; Li, T.; Peng, Y.; Guo, K.; et al. Building a Chinese Ancient Architecture Multimodal Dataset Combining Image, Annotation and Style-Model. Sci. Data 2024, 11, 1137.
Dondi, P.; Lombardi, L.; Setti, A. DAFNE: A Dataset of Fresco Fragments for Digital Anastlylosis. Pattern Recognit. Lett. 2020, 138, 631–637.
Samhouri, M.; Al-Arabiat, L.; Al-Atrash, F. Prediction and Measurement of Damage to Architectural Heritages Facades Using Convolutional Neural Networks. Neural Comput. Appl. 2022, 34, 18125–18141.
Fu, X.; Angkawisittpan, N. Detecting Surface Defects of Heritage Buildings Based on Deep Learning. J. Intell. Syst. 2024, 33, 20230048.
Yan, L.; Chen, Y.; Zheng, L.; Zhang, Y. Application of Computer Vision Technology in Surface Damage Detection and Analysis of Shedthin Tiles in China: A Case Study of the Classical Gardens of Suzhou. Herit. Sci. 2024, 12, 72.
Idjaton, K.; Janvier, R.; Balawi, M.; Desquesnes, X.; Brunetaud, X.; Treuillet, S. Detection of Limestone Spalling in 3D Survey Images Using Deep Learning. Autom. Constr. 2023, 152, 104919.
Mishra, M.; Barman, T.; Ramana, G.V. Artificial Intelligence-Based Visual Inspection System for Structural Health Monitoring of Cultural Heritage. J. Civ. Struct. Health Monit. 2022, 14, 103–120.
Guyot, A.; Hubert-Moy, L.; Lorho, T. Detecting Neolithic Burial Mounds from LiDAR-Derived Elevation Data Using a Multi-Scale Approach and Machine Learning Techniques. Remote Sens. 2018, 10, 225.
Stott, D.; Kristiansen, S.M.; Sindbæk, S.M. Searching for Viking Age Fortresses with Automatic Landscape Classification and Feature Detection. Remote Sens. 2019, 11, 1881.
Meng, L.; Lyu, B.; Zhang, Z.; Aravinda, C.V.; Kamitoku, N.; Yamazaki, K. Oracle Bone Inscription Detector Based on SSD. In New Trends in Image Analysis and Processing—ICIAP 2019; Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2019; Volume 11808, pp. 126–136.
Bacharidis, K.; Sarri, F.; Ragia, L. 3D Building Façade Reconstruction Using Deep Learning. ISPRS Int. J. Geo Inf. 2020, 9, 322.
Orengo, H.A.; Garcia-Molsosa, A. A Brave New World for Archaeological Survey: Automated Machine Learning-Based Potsherd Detection Using High-Resolution Drone Imagery. J. Archaeol. Sci. 2019, 112, 105013.
Morbidoni, C.; Pierdicca, R.; Paolanti, M.; Quattrini, R.; Mammoli, R. Learning from Synthetic Point Cloud Data for Historical Buildings Semantic Segmentation. J. Comput. Cult. Herit. JOCCH 2020, 13, 1–16.
Kulkarni, U.; Meena, S.M.; Gurlahosur, S.V.; Mudengudi, U. Classification of Cultural Heritage Sites Using Transfer Learning. In Proceedings of the 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019, Singapore, 11–13 September 2019; pp. 391–397.
Baradaran Rahimi, F.; Demers, C.M.H.; Karimi Dastjerdi, M.R.; Lalonde, J.F. Agile Digitization for Historic Architecture Using 360° Capture, Deep Learning, and Virtual Reality. Autom. Constr. 2025, 171, 105986.
Croce, V.; Billi, D.; Caroti, G.; Piemonte, A.; De Luca, L.; Véron, P. Comparative Assessment of Neural Radiance Fields and Photogrammetry in Digital Heritage: Impact of Varying Image Conditions on 3D Reconstruction. Remote Sens. 2024, 16, 301.
Stoean, R.; Bacanin, N.; Stoean, C.; Ionescu, L. Bridging the Past and Present: AI-Driven 3D Restoration of Degraded Artefacts for Museum Digital Display. J. Cult. Herit. 2024, 69, 18–26.
Yang, J.; Intan Raihana Ruhaiyem, N.; Zhou, C. A 3M-Hybrid Model for the Restoration of Unique Giant Murals: A Case Study on the Murals of Yongle Palace. IEEE Access 2025, 13, 38809–38824.
Xu, Z.; Geng, C. Color Restoration of Mural Images Based on a Reversible Neural Network: Leveraging Reversible Residual Networks for Structure and Texture Preservation. Herit. Sci. 2024, 12, 351.
Zou, Z.; Zhao, P.; Zhao, X. Virtual Restoration of the Colored Paintings on Weathered Beams in the Forbidden City Using Multiple Deep Learning Algorithms. Adv. Eng. Inform. 2021, 50, 101421.
Mei, Y.; Yang, L.; Wang, M.; Yu, T.; Wu, K. DunHuangStitch: Unsupervised Deep Image Stitching of Dunhuang Murals. IEEE Trans. Vis. Comput. Graph. 2025, 31, 4226–4240.
Cao, J.; Zhang, Z.; Zhao, A.; Cui, H.; Zhang, Q. Ancient Mural Restoration Based on a Modified Generative Adversarial Network. Herit. Sci. 2020, 8, 7.
Yang, R.; Ota, K.; Dong, M.; Wu, X. Semantic Layout-Guided Diffusion Model for High-Fidelity Image Synthesis in ‘The Thousand Li of Rivers and Mountains’. Expert Syst. Appl. 2025, 263, 125645.
Deng, Y.; Ju, H.; Li, Y.; Hu, Y.; Li, A. Abnormal Data Recovery of Structural Health Monitoring for Ancient City Wall Using Deep Learning Neural Network. Int. J. Archit. Herit. 2024, 18, 389–407.
Parida, L.; Moharana, S.; Ferreira, V.M.; Giri, S.K.; Ascensão, G. A Novel CNN-LSTM Hybrid Model for Prediction of Electro-Mechanical Impedance Signal Based Bond Strength Monitoring. Sensors 2022, 22, 9920.
General Office of the State Council. The 14th Five-Year Plan for Cultural Heritage Protection and Scientific and Technological Innovation. 2021. Available online: https://www.mct.gov.cn/whzx/whyw/202111/t20211109_928893.htm (accessed on 4 February 2026).
General Office of the Central Committee of the Communist Party of China; General Office of the State Council. Opinions on Promoting the Implementation of the National Cultural Digitization Strategy. 2022. Available online: https://www.mee.gov.cn/zcwj/zyygwj/202205/t20220522_982842.shtml (accessed on 4 February 2026).
The Cultural Heritage Cloud—Research and Innovation—European Commission. Available online: https://research-and-innovation.ec.europa.eu/research-area/social-sciences-and-humanities/cultural-heritage-and-cultural-and-creative-industries-ccis/cultural-heritage-cloud_en (accessed on 4 February 2026).
The Common European Data Space for Cultural Heritage—Strategy 2025–2030 | Shaping Europe’s Digital Future. Available online: https://digital-strategy.ec.europa.eu/en/library/common-european-data-space-cultural-heritage-strategy-2025-2030 (accessed on 4 February 2026).
Orzechowski, M.; Opioła, Ł.; Martínez, I.L.; Ioannides, M.; Panayiotou, P.N.; Dutka, Ł.; Słota, R.G.; Kitowski, J. Integrated Data, Metadata, and Paradata Management System for 3D Digital Cultural Heritage Objects: Workflow Automation, Federated Authentication, and Publication. Future Gener. Comput. Syst. 2026, 174, 107964.
Discover Europe’s Digital Cultural Heritage | Europeana. Available online: https://www.europeana.eu/en (accessed on 4 February 2026).
Tiribelli, S.; Pansoni, S.; Frontoni, E.; Giovanola, B. Ethics of Artificial Intelligence for Cultural Heritage: Opportunities and Challenges. IEEE Trans. Technol. Soc. 2024, 5, 293.
Battini, C.; Ferretti, U.; De Angelis, G.; Pierdicca, R.; Paolanti, M.; Quattrini, R. Automatic Generation of Synthetic Heritage Point Clouds: Analysis and Segmentation Based on Shape Grammar for Historical Vaults. J. Cult. Herit. 2024, 66, 37–47.
Díaz-Rodríguez, N.; Lamas, A.; Sanchez, J.; Franchi, G.; Donadello, I.; Tabik, S.; Filliat, D.; Cruz, P.; Montes, R.; Herrera, F. EXplainable Neural-Symbolic Learning (X-NeSyL) Methodology to Fuse Deep Learning Representations with Expert Knowledge Graphs: The MonuMAI Cultural Heritage Use Case. Inf. Fusion 2022, 79, 58–83.
Lamas, A.; Tabik, S.; Cruz, P.; Montes, R.; Martínez-Sevilla, Á.; Cruz, T.; Herrera, F. MonuMAI: Dataset, Deep Learning Pipeline and Citizen Science Based App for Monumental Heritage Taxonomy and Classification. Neurocomputing 2021, 420, 266–280.
Xie, Z.; Huang, Y.; Jin, L.; Liu, Y.; Zhu, Y.; Gao, L.; Zhang, X. Weakly Supervised Precise Segmentation for Historical Document Images. Neurocomputing 2019, 350, 271–281.
Beni, T.; Nava, L.; Gigli, G.; Frodella, W.; Catani, F.; Casagli, N.; Gallego, J.I.; Margottini, C.; Spizzichino, D. Classification of Rock Slope Cavernous Weathering on UAV Photogrammetric Point Clouds: The Example of Hegra (UNESCO World Heritage Site, Kingdom of Saudi Arabia). Eng. Geol. 2023, 325, 107286.
Pan, J.; Li, L.; Yamaguchi, H.; Hasegawa, K.; Thufail, F.I.; Brahmantara; Tanaka, S. 3D Reconstruction of Borobudur Reliefs from 2D Monocular Photographs Based on Soft-Edge Enhanced Deep Learning. ISPRS J. Photogramm. Remote Sens. 2022, 183, 439–450.

Figure 1. PRISMA Flow Diagram of Literature Retrieval and Review.

Figure 2. Publication Volume Trends from 2011 to 2025.

Figure 3. Comparison of Annual Publications Among the Top 15 Countries.

Figure 4. Analysis of Annual Publication Sources.

Figure 5. Global Collaborative Network Density Trends (2020–2025).

Figure 6. National Cooperation Network Map.

Figure 7. Keyword Clustering Analysis by Theme. (a) Recognition Theme Clustering Results; (b) Reconstruction and Virtual Restoration Theme Clustering Results; (c) Monitoring and Prediction Theme Clustering Results.

Figure 8. Annual Publication Trends Analysis by Theme.

Table 1. Metrics Used in Social Network Analysis [15,16,17].

Level	Metrics	Formula	Parameter Description	Note
Overall	Network Density	$\mathrm{Density} = \frac{2L}{N(N-1)}$	$L$: the number of actual connections (edges) in the network; $N$: the total number of nodes in the network.	Measures how closely the nodes within the network are connected to one another.
Individual	Degree Centrality	$C_{\mathrm{degree}}(v) = \frac{\mathrm{degree}(v)}{N-1}$	$\mathrm{degree}(v)$: the degree of node $v$; $N$: the number of nodes in the network.	Measures the number of direct neighbors of a node, reflecting its level of activity within the network.
	Betweenness Centrality	$C_{\mathrm{betweenness}}(v) = \frac{2}{(N-1)(N-2)} \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$	$\sigma_{st}$: the total number of shortest paths between nodes $s$ and $t$; $\sigma_{st}(v)$: the number of those shortest paths that pass through node $v$.	Measures how often a node acts as a “bridge” on the shortest paths between other nodes, reflecting its connecting role within the network.
	Closeness Centrality	$C_{\mathrm{closeness}}(v) = \frac{N-1}{\sum_{u \neq v} d(v,u)}$	$d(v,u)$: the shortest-path length from node $v$ to node $u$.	The reciprocal of the average shortest-path distance from a node to all other nodes; indicates how “close” the node is to the rest of the network and how quickly it can reach every other node.
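As a minimal sketch, the four Table 1 metrics can be computed with the Python standard library alone. It assumes an undirected, unweighted and connected network and runs on a hypothetical five-node toy graph, not on the review's country network. Betweenness is obtained by brute force over node pairs, which is adequate for a few dozen nodes (such as a country collaboration graph) but not for large networks, where established graph libraries offer faster routines.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build undirected, unweighted adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs(adj, source):
    """Return shortest-path distances and shortest-path counts (sigma) from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def density(adj):
    """Density = 2L / (N(N - 1))."""
    n = len(adj)
    links = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * links / (n * (n - 1))


def degree_centrality(adj, v):
    """C_degree(v) = degree(v) / (N - 1)."""
    return len(adj[v]) / (len(adj) - 1)


def betweenness_centrality(adj, v):
    """2 / ((N-1)(N-2)) * sum over unordered pairs {s, t} of sigma_st(v) / sigma_st."""
    n = len(adj)
    paths = {s: bfs(adj, s) for s in adj}
    dist_v, sigma_v = paths[v]
    total = 0.0
    for s, t in combinations([u for u in adj if u != v], 2):
        dist_s, sigma_s = paths[s]
        # v lies on a shortest s-t path iff d(s, v) + d(v, t) == d(s, t)
        if t in dist_s and v in dist_s and dist_s[v] + dist_v[t] == dist_s[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return 2 * total / ((n - 1) * (n - 2))


def closeness_centrality(adj, v):
    """C_closeness(v) = (N - 1) / sum of shortest-path distances from v."""
    dist, _ = bfs(adj, v)
    if len(dist) != len(adj):
        raise ValueError("this closeness definition requires a connected graph")
    return (len(adj) - 1) / sum(dist.values())


# Hypothetical toy network: hub "A" with leaves B, C and a short chain A-D-E.
adj = build_adjacency([("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")])
print(f"density = {density(adj):.3f}")
for node in sorted(adj):
    print(node,
          round(degree_centrality(adj, node), 3),
          round(betweenness_centrality(adj, node), 3),
          round(closeness_centrality(adj, node), 3))
```

On this toy graph the hub A gets degree centrality 0.75, betweenness 10/12 (about 0.833) and closeness 0.8, while leaf B has zero betweenness because no shortest path between other nodes passes through it.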

Table 2. Social Network Analysis Metrics.

Country	Partners Count	Degree Centrality	Betweenness Centrality	Closeness Centrality
Italy	34	0.425	0.237	0.619
China	33	0.413	0.285	0.586
USA	26	0.325	0.126	0.545
UK	21	0.263	0.068	0.527
France	18	0.225	0.077	0.534
Spain	17	0.213	0.117	0.506
Japan	15	0.188	0.131	0.510
Germany	14	0.175	0.068	0.481
India	13	0.163	0.056	0.503
Canada	13	0.163	0.028	0.494
Greece	10	0.125	0.017	0.476
Saudi Arabia	10	0.125	0.070	0.443
Qatar	9	0.113	0.016	0.479
Netherlands	9	0.113	0.009	0.481
Poland	9	0.113	0.009	0.433
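Degree centrality in Table 2 should equal the partner count divided by N - 1, following Table 1. The network size N is not stated in this section; Italy's row (34 partners, 0.425) implies N - 1 = 80, which is an inference rather than a reported figure. The sketch below is a consistency check under that assumption: it recomputes every row with half-up rounding to three decimals, using exact decimal arithmetic so that values such as 33/80 = 0.4125 round predictably.

```python
from decimal import Decimal, ROUND_HALF_UP

# Partner counts and reported degree centrality, copied from Table 2.
TABLE_2 = {
    "Italy": (34, "0.425"), "China": (33, "0.413"), "USA": (26, "0.325"),
    "UK": (21, "0.263"), "France": (18, "0.225"), "Spain": (17, "0.213"),
    "Japan": (15, "0.188"), "Germany": (14, "0.175"), "India": (13, "0.163"),
    "Canada": (13, "0.163"), "Greece": (10, "0.125"), "Saudi Arabia": (10, "0.125"),
    "Qatar": (9, "0.113"), "Netherlands": (9, "0.113"), "Poland": (9, "0.113"),
}

# Assumption inferred from Italy's row: 34 / (N - 1) = 0.425, so N - 1 = 80.
N_NODES = 81


def degree_centrality_mismatches(table, n_nodes):
    """Return (country, recomputed, reported) for rows where partners / (N - 1) disagrees."""
    mismatches = []
    for country, (partners, reported) in table.items():
        value = (Decimal(partners) / Decimal(n_nodes - 1)).quantize(
            Decimal("0.001"), rounding=ROUND_HALF_UP)
        if value != Decimal(reported):
            mismatches.append((country, str(value), reported))
    return mismatches


print(degree_centrality_mismatches(TABLE_2, N_NODES))
```

With N = 81 every row matches, which suggests a collaboration network of 81 countries. Betweenness and closeness cannot be checked this way, since they depend on the full network structure rather than on partner counts alone.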

Table 3. Thematic Classification Based on Topic Modelling.

Theme	Topic Number	Example Keywords	Number of Documents
Recognition (Identification, Detection, Classification, Segmentation)	0, 4, 5, 6, 8	classification, image, recognition, characters, script, manuscripts, handwritten, text, inscriptions, historical, point, semantic, segmentation, damage, detection, building, tiles, wall, gray, types, deterioration, stone, pores, weathering, hollowing, castles, yolo, cracks, fractured, accuracy, proposed, archaeological, sites, potential, remote, lidar, sensing, fragments, visual, art, dataset, content, related, pieces	517
Reconstruction and Virtual Restoration	1, 2, 10	BIM, information, building, techniques, point, reconstruction, reliefs, hidden, visible, parts, texture, depth, restoration, mural, image, color, paintings, network, patterns, segmentation, clouds, semantic, cloud, buildings, architectural, murals, model, health, masonry, structural, detection, historical	196
Monitoring and Prediction	3, 7, 9	archaeological, sites, lidar, remote, sensing, underwater, structural, monitoring, masonry, health, SHM, based, parameters, risk, fire, temperature, climate, timber, regions, seismic, minarets, assessment, accuracy, dataset, classification, performance, architectural, historical, xrf, spectral, pigment, hyperspectral, analysis, materials	154
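Table 3 groups eleven topics (0 to 10) into three themes. The sketch below encodes that grouping, checks that no topic is assigned to two themes, and tallies documents per theme. It assumes that each document carries a single dominant-topic label (a hard assignment), which this section does not state, and it uses hypothetical labels, so the counts are illustrative rather than the 517/196/154 reported in the table.

```python
from collections import Counter

# Topic-to-theme grouping taken from Table 3.
THEME_TOPICS = {
    "Recognition": {0, 4, 5, 6, 8},
    "Reconstruction and Virtual Restoration": {1, 2, 10},
    "Monitoring and Prediction": {3, 7, 9},
}


def topic_to_theme(theme_topics):
    """Invert the grouping; reject topics that appear under more than one theme."""
    mapping = {}
    for theme, topics in theme_topics.items():
        for topic in topics:
            if topic in mapping:
                raise ValueError(f"topic {topic} appears in two themes")
            mapping[topic] = theme
    return mapping


def documents_per_theme(doc_topics, theme_topics):
    """Count documents per theme from one dominant-topic label per document.

    Labels outside the grouping (e.g. an 'unassigned' label) are skipped.
    """
    mapping = topic_to_theme(theme_topics)
    counts = Counter({theme: 0 for theme in theme_topics})
    for topic in doc_topics:
        if topic in mapping:
            counts[mapping[topic]] += 1
    return dict(counts)


# Hypothetical dominant-topic labels for eight documents (-1 = unassigned).
labels = [0, 4, 1, 3, 10, 9, 5, -1]
print(documents_per_theme(labels, THEME_TOPICS))
```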

Table 4. Advantages and Limitations of Major AI Paradigms in Cultural Heritage Applications.

Paradigm	Main Advantages	Limitations in the Cultural Heritage Context
CNN	High robustness in visual feature extraction; effective for textures, patterns, and surface damage	Requires large annotated datasets; limited generalization across different sites, periods, and materials; low interpretability for conservation decision-making
RF/SVM	Good interpretability; effective on small to medium-sized datasets; stable training behavior	Strong dependence on manual feature engineering; limited performance on high-complexity data such as raw images or unstructured point clouds
GAN	High capability for visual restoration and content generation	Risk of introducing non-authentic or hallucinated features; difficult historical and artistic validation; ethical concerns related to authenticity and falsification
Transformer	Effective modeling of long-range dependencies and multimodal data	High computational cost; strong dependence on large-scale pretraining; limited availability of sufficiently large CH datasets
Diffusion Models	High-quality reconstruction and completion of missing or degraded areas	Limited controllability of the generative process; lack of CH-specific evaluation metrics; risk of over-restoration
GNN	Explicit modeling of structural, spatial, and relational information	Non-trivial graph construction; lack of standardized workflows for CH applications
Hybrid Models (CNN + Transformer)	Improved balance between local feature extraction and global contextual reasoning	Increased pipeline complexity; reduced interpretability; higher risk of overfitting on small or site-specific datasets
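Table 4 notes that RF/SVM pipelines depend on manual feature engineering. For point clouds, a common hand-crafted descriptor is the set of covariance (eigenvalue) features of a local neighborhood. The sketch below computes linearity, planarity and sphericity for a single neighborhood in pure Python, using the closed-form (trigonometric) eigenvalues of a symmetric 3 × 3 matrix. Neighborhood selection (k nearest neighbors or a search radius), the full feature set and the downstream RF or SVM classifier are assumptions left out of this sketch.

```python
import math


def covariance_3d(points):
    """Population covariance matrix (3 x 3) of a list of (x, y, z) points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(3)] for i in range(3)]


def symmetric_eigenvalues_3x3(a):
    """Closed-form eigenvalues of a symmetric 3 x 3 matrix, sorted descending."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0:  # matrix is already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2 * p1
    p = math.sqrt(p2 / 6)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    # Clamp rounding error before acos; acos is ill-conditioned near r = +/-1
    # (repeated eigenvalues), so expect small rounding errors there.
    r = max(-1.0, min(1.0, det_b / 2))
    phi = math.acos(r) / 3
    l1 = q + 2 * p * math.cos(phi)
    l3 = q + 2 * p * math.cos(phi + 2 * math.pi / 3)
    l2 = 3 * q - l1 - l3  # trace identity
    return [l1, l2, l3]


def covariance_features(points):
    """Linearity, planarity and sphericity from eigenvalues l1 >= l2 >= l3."""
    l1, l2, l3 = symmetric_eigenvalues_3x3(covariance_3d(points))
    if l1 <= 0:
        raise ValueError("degenerate neighborhood (all points identical)")
    return {
        "linearity": (l1 - l2) / l1,
        "planarity": (l2 - l3) / l1,
        "sphericity": l3 / l1,
    }


# Hypothetical neighborhoods: points along an edge and points on a flat wall patch.
edge_like = [(0, 0, 0), (1, 1, 0), (2, 2, 0), (3, 3, 0)]
wall_like = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
print(covariance_features(edge_like))
print(covariance_features(wall_like))
```

Per-point vectors of this kind, usually computed at several neighborhood sizes, are the sort of engineered input an RF or SVM relies on, whereas deep point-cloud networks learn comparable descriptors from the raw coordinates.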

Table 5. Multimodal Data in Cultural Heritage.

Data Type		Data Source
Visual	RGB Image	RGB Cameras, Optical Satellites, Aerial Imagery, UAVs equipped with RGB cameras
	Panoramic Image	Panoramic Cameras
	Video	RGB Cameras, Panoramic Cameras, UAV-based Cameras, Surveillance Cameras
	Multispectral and Hyperspectral Image	Multispectral Cameras (MSI), Hyperspectral Sensors (HSI), UAVs equipped with Multispectral Cameras
	Thermal Image	Thermal Imaging Cameras, UAVs equipped with Thermal Imaging Cameras
Geometric	Point Cloud	Terrestrial Laser Scanner (TLS), Mobile LiDAR (MLS), UAV LiDAR, RGB-D Cameras, Structured-light Scanners
	Mesh Model	Photogrammetry Pipelines, 3D Laser Scanners, Structured-light Scanners
	Parametric Model	Derived from point clouds and mesh models through parametric and rule-based modeling
Spectroscopic	Spectroscopic Data	Raman Spectrometers, X-ray Fluorescence (XRF) Spectrometers, Fourier-Transform Infrared (FTIR) Spectrometers
Acoustic	Audio Recordings	Microphones, Audio Recorders, Field Recorders
Sensor Time-series	Environmental Monitoring Time Series	Temperature and Humidity Sensors, Vibration Sensors, Strain Gauges, Displacement Sensors, Gas Sensors, Light Sensors
	Geophysical Time Series	Ground-Penetrating Radar (GPR), Electrical Resistivity Tomography (ERT), Magnetometry Sensors, Ultrasound Devices, Seismometers
Textual	Text Data	Text Extraction from Inscriptions and Rubbings, Historical Documents, Ancient Books and Archives
Structured and Semantic	Tabular and Relational Database	Management Systems, Databases, GIS Attribute Tables
	Knowledge Graphs	Knowledge Extraction, Semantic Analysis, Semantic Reasoning Systems
	Social and Relational Networks

Table 6. Model Evaluation Metrics under the Cultural Heritage Theme.

CH Theme	Model Evaluation
Recognition (Identification, Detection, Classification, Segmentation)	Overall Accuracy (OA), Mean Accuracy (MA), Precision, Recall, F1-score, Intersection over Union (IoU), Area Under ROC Curve (AUC), Mean Average Precision (mAP), Mean Average Recall (mAR), Confusion Matrix
Reconstruction and Virtual Restoration	Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), Mean Square Error (MSE), Universal Image Quality Index (UQI), Visual Information Fidelity (VIF)
Monitoring and Prediction	Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Square Error (MSE), Mean Absolute Percentage Error (MAPE), Coefficient of Determination (R²), Sensitivity, Frequency Error (FE), Modal Assurance Criterion (MAC)
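Most recognition metrics in Table 6 derive from the same confusion-matrix counts. The sketch below computes OA, precision, recall, F1-score and IoU for one positive class, for instance damaged versus intact pixels in a segmentation mask, on hypothetical labels. Multi-class summaries such as mean IoU average these per-class values, while mAP additionally integrates precision over recall across detection confidence thresholds; both are omitted here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def recognition_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "OA": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "F1": safe_div(2 * precision * recall, precision + recall),
        "IoU": safe_div(tp, tp + fp + fn),  # Jaccard index of the positive class
    }


# Hypothetical pixel labels: 1 = damaged, 0 = intact.
truth = [1, 1, 0, 0, 1, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(recognition_metrics(truth, pred))
```

Here three damaged pixels are found, one is missed and one intact pixel is flagged, so precision, recall, F1 and OA are all 0.75 while IoU is 3/5 = 0.6; IoU penalizes both error types in a single ratio, which is why it is usually lower than F1.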

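The pixel-wise restoration metrics and the regression-style monitoring metrics in Table 6 are simple aggregates of prediction errors, and the Modal Assurance Criterion compares two mode-shape vectors. The sketch below implements MSE, PSNR, RMSE, MAE, MAPE, R² and MAC on flat lists of numbers. It assumes 8-bit images (peak value 255) for PSNR and real-valued mode shapes for MAC, and it leaves out SSIM, UQI and VIF, which rely on windowed or multi-scale image statistics, and LPIPS, which needs a pretrained network. All input values are hypothetical.

```python
import math


def _check(a, b):
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(y_true, y_pred):
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(reference, restored)
    return math.inf if err == 0 else 10 * math.log10(max_value ** 2 / err)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error (%); undefined if any true value is 0."""
    _check(y_true, y_pred)
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mac(phi_a, phi_b):
    """Modal Assurance Criterion for real mode shapes: 1 = same shape, 0 = orthogonal."""
    _check(phi_a, phi_b)
    dot = sum(a * b for a, b in zip(phi_a, phi_b))
    return dot ** 2 / (sum(a * a for a in phi_a) * sum(b * b for b in phi_b))


# Hypothetical grey levels of a restored patch versus its reference.
print(psnr([52, 55, 61, 59], [50, 55, 60, 62]))
# Hypothetical observed versus predicted displacement series.
observed, predicted = [2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.0, 8.0]
print(rmse(observed, predicted), mae(observed, predicted),
      mape(observed, predicted), r2(observed, predicted))
# A mode shape and a scaled copy of it give MAC = 1.
print(mac([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

MAC is insensitive to the scaling of the mode shapes, which is why it suits comparisons between measured and numerically predicted modes, while MAPE becomes unstable when true values approach zero.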
Table 7. Specific Heritage Tasks and ML/DL Methods for the Different Themes.

CH Theme	ML Method	DL Method	Specific Heritage Task
Recognition (Identification, Detection, Classification, Segmentation)	SVM, RF, DT, kNN, K-Means, DBSCAN, Naive Bayes (NB)	CNNs, Faster R-CNN, YOLO family, RetinaNet, ResNet, VGG, Inception, Inception-ResNet, Xception, AlexNet, EfficientNet, MobileNet, U-Net, Mask R-CNN, ViT, Swin Transformer, Autoencoder, Bi-LSTM, DenseNet	Archaeological site and heritage landscape identification, detection, and pattern recognition (e.g., underwater sites, burial mounds, traditional settlements, topography, fortress ruins, heritage monuments); Structural damage detection, deterioration identification, and semantic segmentation (e.g., heritage buildings, historic masonry, wooden structures, caves, grottoes, palaces); Artifact detection and classification (e.g., pottery shards, heritage coins, reliefs, painted rocks, rock carvings, Buddha statues, sculptures); Historical script and document recognition (e.g., ancient scripts, Oracle Bone script, pictographic writing, historical manuscripts, historical maps); Style and typological classification (e.g., architectural styles, monument styles, ancient painting types); Material identification and spectral analysis (e.g., painting pigments, chemical elements).
Reconstruction and Virtual Restoration	Regression Trees, RF, PCA	U-Net, CNNs, GANs, Mask R-CNN, DGCNN, PointNet, PointNet++, PCNN, Pix2pix, CycleGAN, Diffusion Models, ResNet-based autoencoders, NeRF, Stable Diffusion (SD)	Semi-automatic HBIM reconstruction; Structural completion and fragment reconstruction (e.g., temples, artefacts, ceramics, murals, relics); Digital restoration of damaged heritage surfaces and objects (e.g., murals, paintings, historical documents); Generative style transfer and artistic reconstruction (e.g., architectural decoration, traditional painting).
Monitoring and Prediction	Boosted Regression Trees (BRT), Bayesian Additive Regression Trees (BART), Generalized Bayesian Linear Models (GBLM), RF, M5P Regression Trees, XGBoost, Generalized Boosting Model (GBM), Gradient Boosting Decision Tree (GBDT)	LSTM, RNNs, CNNs, U-Net, GCN, GNNs, Seq2Seq, CNN-LSTM Hybrid Model, Gated Recurrent Unit (GRU)	Structural health monitoring and condition assessment (e.g., crack evolution, structural damage evaluation); Deterioration modelling and prediction (e.g., surface degradation, stone weathering, displacement prediction, hollow deterioration, deformation prediction); Environmental hazard monitoring and risk assessment (e.g., flood, erosion, landslide, tidal risk, seismic, fire risk).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, X.; Chiabrando, F.; Sammartano, G. Machine Learning and Deep Learning for Cultural Heritage Conservation: A Bibliometric and Task-Oriented Review. Remote Sens. 2026, 18, 628. https://doi.org/10.3390/rs18040628

AMA Style

Li X, Chiabrando F, Sammartano G. Machine Learning and Deep Learning for Cultural Heritage Conservation: A Bibliometric and Task-Oriented Review. Remote Sensing. 2026; 18(4):628. https://doi.org/10.3390/rs18040628

Chicago/Turabian Style

Li, Xinchen, Filiberto Chiabrando, and Giulia Sammartano. 2026. "Machine Learning and Deep Learning for Cultural Heritage Conservation: A Bibliometric and Task-Oriented Review" Remote Sensing 18, no. 4: 628. https://doi.org/10.3390/rs18040628

APA Style

Li, X., Chiabrando, F., & Sammartano, G. (2026). Machine Learning and Deep Learning for Cultural Heritage Conservation: A Bibliometric and Task-Oriented Review. Remote Sensing, 18(4), 628. https://doi.org/10.3390/rs18040628

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
