Article

Smart Data-Enabled Conservation and Knowledge Generation for Architectural Heritage System

Department of Landscape Architecture, School of Architecture, South China University of Technology, Guangzhou 510641, China
*
Author to whom correspondence should be addressed.
Buildings 2025, 15(12), 2122; https://doi.org/10.3390/buildings15122122
Submission received: 25 April 2025 / Revised: 6 June 2025 / Accepted: 16 June 2025 / Published: 18 June 2025
(This article belongs to the Special Issue Advanced Research on Cultural Heritage)

Abstract

In architectural heritage conservation, fragmented data practices and heterogeneous formats hinder knowledge extraction, limiting the translation of raw data into actionable conservation insights. This study proposes a knowledge-centric framework integrating smart data methodologies to bridge this gap. The framework synergizes Heritage Building Information Modeling (HBIM), semantic knowledge graphs, and knowledge bases, prioritizing three interconnected dimensions: geometric digitization through 3D laser scanning and parametric HBIM reconstruction, semantic enrichment of historical texts via NLP and rule-based entity extraction, and knowledge graph-driven discovery of spatiotemporal patterns using Neo4j and ontology mapping. Validated through dual case studies—the Historical Educational Sites in South China (humanistic narratives) and the Dong ethnic drum towers (structural logic)—the framework demonstrates its capacity to automate knowledge generation, converting 20.5 GB of multi-source data into 2652 RDF triples that interconnect 1701 nodes across HBIM models and archival records. By enabling real-time visualization of semantic relationships (e.g., educator networks, mortise-and-tenon typologies) through graph queries, the system enhances interdisciplinary collaboration. Furthermore, the proposed smart data framework facilitated the generation of domain-specific knowledge through systematic data valorization, yielding actionable insights for architectural conservation practice. This research redefines conservation as a knowledge-to-action paradigm, where smart data methodologies unify tangible and intangible heritage values, fostering data-driven stewardship across cultural, historical, and technical domains.

1. Introduction

Architectural heritage serves as a vital manifestation of both material and spiritual civilization, embodying multidimensional values encompassing emotional, cultural, artistic, and societal dimensions. The emergence of the big data era has introduced innovative perspectives for heritage conservation, facilitating the integration of historical architecture into intelligent digital transformation frameworks. Through data-driven approaches, architectural heritage preservation demonstrates growing demands for digital advancement, while simultaneously unlocking unprecedented potential for value activation inherent in heritage assets.
Knowledge bases (KBs) have emerged as pivotal tools in this context, enabling structured organization of heritage data through semantic ontologies [1,2]. For instance, Khouri et al. [3] mapped Algerian traditional houses to socio-cultural narratives via ontology-driven KBs, yet such systems often prioritize static taxonomies over dynamic pattern discovery [4]. Digital preservation efforts (e.g., HBIM, semantic ontologies) have advanced geometric documentation. Current digitization efforts in architectural heritage preservation follow two research directions: ontology-based systems for HBIM [2,3,5,6,7,8,9,10,11] and the organization of relevant humanities knowledge [12,13,14,15,16,17]. In terms of data collection, however, smart data, as an enhanced paradigm of big data processing, systematically converts unstructured and semi-structured data into machine-interpretable structured knowledge through four critical phases: data ingestion, cleansing, integration, and exploitation [4,10,18,19]. This transformation facilitates knowledge discovery in databases (KDD) by generating domain-specific actionable insights [1]. Although lacking a universally standardized definition, smart data methodologies have achieved theoretical and practical maturity across diverse fields, including urban governance [20], GLAM institutions [21], cultural heritage conservation [22], and biomedical research [4,23]. The field’s core research focus lies in the efficient integration of multimodal heterogeneous data, employing advanced techniques such as classification algorithms, clustering analysis, association rule mining [24], and temporal pattern recognition [25] to optimize data mining precision.
This study proposes a knowledge-centric framework, the architectural heritage smart data system, which integrates smart data methodologies and combines Heritage Building Information Modeling (HBIM), semantic knowledge graphs, and knowledge bases to support heritage value extraction, systematic conservation, and digital organization. It constitutes a comprehensive digital preservation system encompassing the acquisition, storage, visualization, and reuse of heritage resources. The process involves systematic collection of multi-source heterogeneous data spanning the heritage ontology and associated humanistic documentation, extraction of smart data derivatives, establishment of a knowledge repository, and experimental execution of architectural domain-specific knowledge mining (Figure 1).
The framework addresses three critical gaps in cultural heritage conservation: incomplete datasets with low interoperability, inefficient domain knowledge extraction, and underutilized built heritage data. By integrating HBIM, semantic knowledge graphs, and NLP-driven text mining, it establishes systematic data valorization.

2. Methods

The construction of the smart data framework for architectural heritage comprises three sequential phases: (1) data acquisition and processing, (2) data storage and visualization, and (3) data reuse. Initial multimodal heterogeneous data are categorized into heritage ontological data (physical attributes, structural parameters, material properties) and humanistic documentation, with the latter encompassing documentary records and historical archives. The operational workflow is detailed as follows (Figure 2).

2.1. Data Acquisition and Processing

2.1.1. Architectural Remains

  • HBIM Database
Architectural heritage ontology integrates geometric representations and domain-specific knowledge frameworks. The acquisition of heritage models and intelligent information management necessitates synergistic application of Heritage Building Information Modeling (HBIM) and terrestrial laser scanning (TLS). HBIM has emerged as a transformative digital methodology in heritage conservation, while TLS—with its automated workflows, sub-millimeter precision (typically <1 mm error), and multi-perspective capture capabilities—has redefined heritage documentation standards [13]. Recent methodological breakthroughs demonstrate accelerated progress: Ursini et al. [14] implemented scan-to-HBIM workflows in Revit Architecture for structural finite element modeling, achieving 94.7% geometric accuracy in dynamic simulations. Guangzhou University’s patented meta-modeling approach [15] automated HBIM reconstruction through architectural construction logic, reducing modeling time by 38% compared to conventional methods. Guo [26] pioneered machine vision-assisted parameter extraction for traditional Dong ethnic mortise-and-tenon structures, attaining 92.4% recognition accuracy in component identification. Zou et al. [27] advanced high-precision reconstruction techniques for architectural heritage 3D modeling, with each texture pixel corresponding to a spatial size of 0.45 mm. Concurrently, archival practices have transitioned from static textual/graphic records to dynamic 3D models with embedded semantic linkages, driving BIM adoption across the preservation sector [26].
Based on the above research, the following methods were used to construct the HBIM database: Trimble point cloud processing tools were utilized to segment the tower’s 3D point cloud model, generating planar slices and diagonal sectional profiles. A pre-trained object detection model processed these profiles to extract critical parameters and identify meta-types through machine vision algorithms. These parameters were input into Dynamo to computationally derive spatial topological reference points for the timber frame components (e.g., purlin brackets and corbelled dougong joints).
  • Knowledge Ontology Construction
This study utilizes the Protégé (5.5.0) ontology editing tool to construct the knowledge ontology of Dong drum towers. Protégé, a Java-based ontology editor, enables the creation of conceptual classes, relationships, properties, and instances for domain-specific ontology development. The resulting ontology can be visualized within Protégé, and the final intangible cultural heritage (ICH) ontology data are mapped and stored in RDF format.

2.1.2. Humanistic Documentation

  • Document data
A critical challenge in transforming archival literature, an integral component of humanistic documentation, into structured data lies in extracting actionable insights from vast, unstructured textual corpora. This study implements a semi-automated extraction framework that combines machine recognition with human validation. First, raw texts undergo tokenization to segment linguistic units. Next, rule-based extraction is applied using a curated, annotated keyword database specific to architectural heritage, enabling automated identification of key sentences or paragraphs. Semantic matching and data mining algorithms then filter the candidate results into a targeted dataset. Finally, manual curation transforms these outputs into RDF-compliant entity–relationship–entity triples, ensuring semantic interoperability (Figure 3).
  • Historical Archives
Historical archives, comprising unstructured data such as historical letters, newspapers, and architectural technical drawings, present challenges due to their heterogeneous formats (textual and graphical) and large-scale complexity. To address this, we developed a semi-automated workflow integrating domain-specific structured templates [28] with manual annotation, extracting and standardizing information into RDF triples to prepare structured data for knowledge graph generation (Figure 4).
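The rule-based filtering step described for document data can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the keyword list, the regular-expression sentence splitter, the function names, and the sample text are all hypothetical, and the wartime window (1937–1945) is used as the temporal rule.

```python
import re

# Hypothetical keywords; the study's curated keyword database is not reproduced here.
KEYWORDS = ("university", "school", "campus", "laboratory")
YEAR_RANGE = (1937, 1945)  # temporal rule: keep sentences dated inside this window


def split_sentences(text):
    """Split text after ., ! or ? (a crude stand-in for a real tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def filter_candidates(text, keywords=KEYWORDS, year_range=YEAR_RANGE):
    """Return sentences that mention a keyword and a year inside year_range."""
    lo, hi = year_range
    kept = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        has_keyword = any(k in lowered for k in keywords)
        # Collect four-digit years from 1800 to 2099 mentioned in the sentence.
        years = [int(y) for y in re.findall(r"\b(1[89]\d{2}|20\d{2})\b", sentence)]
        if has_keyword and any(lo <= y <= hi for y in years):
            kept.append(sentence)
    return kept


# Invented sample text, for illustration only.
sample = (
    "The university moved to Pingshi in 1940. "
    "A new campus opened in 1952. "
    "Heavy rain fell in 1941."
)
candidates = filter_candidates(sample)
```

Only the first sample sentence satisfies both rules; the retained candidates would then go to manual validation before being written as triples.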

2.2. Data Storage and Visualization

The preceding section detailed the architectural heritage data acquisition and processing phase, encompassing four key steps: data retrieval, data cleansing, information extraction, and knowledge storage, collectively establishing a structured knowledge repository. This section focuses on the second phase: data storage and visualization.
The unified semantic framework, structured as subject–predicate–object triples under the RDF-based knowledge representation model, is stored in the Neo4j (Community-4.3.15) graph database. Neo4j, a high-performance NoSQL graph database, leverages the Cypher query language to efficiently manage complex relational data, serving as the backbone for knowledge graph construction. Visualization, a hallmark advantage of knowledge graphs, translates abstract and intricate concepts into intuitive graphical representations through machine interpretation. This process enhances human comprehension and cognitive processing of complex knowledge by simplifying its presentation. The curated RDF-formatted data, when batch-imported into Neo4j, automatically generate a knowledge graph that facilitates systematic knowledge management and dynamic visualization.
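One way to picture the batch import is to turn each subject–predicate–object triple into a parameterized Cypher statement. The sketch below only builds the statements and does not connect to a database. The node label `Entity`, the relationship type `RELATED`, the property names, and the predicate string `affiliated_with` are assumptions made for illustration, not the schema used in the study.

```python
def triple_to_cypher(subject, predicate, obj):
    """Return a (query, params) pair that upserts one triple as two nodes and an edge."""
    # Parameters carry values only, so the predicate is kept as an edge property
    # instead of being used as the relationship type (an illustrative choice).
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        "MERGE (s)-[r:RELATED {predicate: $predicate}]->(o)"
    )
    params = {"subject": subject, "predicate": predicate, "object": obj}
    return query, params


# Affiliations taken from the case study; the predicate name is hypothetical.
triples = [
    ("Xu Chongqing", "affiliated_with", "National Sun Yat-sen University"),
    ("Mei Gongbin", "affiliated_with", "National Sun Yat-sen University"),
]
statements = [triple_to_cypher(*t) for t in triples]
```

Each pair could then be passed to a Neo4j client session, for example through the `session.run(query, params)` call of the official Python driver. Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes.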

2.3. Data Reuse

Traditional approaches in architectural heritage conservation have predominantly focused on material preservation, often overlooking intangible aspects such as socio-historical contexts, significant events, and the structural logic or construction techniques of heritage buildings. The proposed smart data framework shifts this paradigm not only by integrating raw archival data but also by uncovering latent values within architectural heritage, thereby offering dynamic, enriched, and intelligent knowledge resources for holistic conservation. Data reuse follows three pathways: bibliometric trend analysis with Citespace (6.3.R1), Neo4j-based knowledge mining, and HBIM-driven domain expertise.
  • Citespace Bibliometric Trend Analysis: Citespace helps users visually identify themes, trends, research directions, collaborators, and citation relationships in the literature, and thereby map scholarly trends and disciplinary intersections within heritage conservation research. This study uses Citespace to extract as much knowledge as possible from multi-source data and so enrich the architectural heritage data system. Three methods are involved: keyword frequency analysis, which measures how often different keywords co-occur within the same publication to identify research hotspots in the field; keyword timeline clustering, which reflects the evolution of keywords within each cluster label and the interconnections among key nodes along a unified timeline, thereby revealing developmental trajectories and future trends in the research domain; and keyword cluster analysis, which classifies research topics with high semantic relevance to distill core thematic foci within the research field.
  • Neo4j-Based Knowledge Mining: As established earlier, Neo4j is a graph database that stores information in a unified data structure and displays knowledge graphs through visualization. Within the knowledge graph, entities are connected by semantic relationships: hierarchical (superordinate–subordinate) relationships between two entities can be represented as “parent–child” relationships, while attribute relationships can be expressed as “property–value” relationships. Relationships are denoted as edges and entities as nodes, collectively forming a graph structure that describes the entities and their interconnections. Such visual linkages reveal both direct and indirect relationships among entities, thereby enabling the extraction of implicit information from the heritage knowledge graph.
  • HBIM-Driven Domain Expertise: The third aspect of knowledge generation pertains to domain-specific architectural knowledge mining, primarily involving the extraction of special structural configurations, construction information from architectural heritage, and insights assisting designers in reconstruction or maintenance. All such information mining is based on HBIM systems constructed via meta-models. A meta-model is defined as an abstract framework describing concepts and their interrelationships within a specific domain, serving to standardize and characterize information within that field. It functions as a model about models to enhance the comprehension and management of underlying models. With the advent of the smart era, metadata as a structured descriptive language enables the processing of architectural heritage model data and the excavation of heritage values.
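The keyword frequency analysis named in the first pathway rests on counting how often keywords appear, and appear together, across publications. The sketch below illustrates that counting step only. It is not Citespace's implementation, and the three keyword records are invented for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists):
    """Count keyword frequencies and unordered keyword pairs sharing a publication."""
    freq = Counter()
    pairs = Counter()
    for keywords in keyword_lists:
        unique = sorted(set(keywords))  # de-duplicate and fix the order within a pair
        freq.update(unique)
        pairs.update(combinations(unique, 2))
    return freq, pairs


# Hypothetical keyword lists, one per publication.
records = [
    ["wartime education", "Sun Yat-sen University"],
    ["wartime education", "Lingnan University"],
    ["wartime education", "Sun Yat-sen University", "Guangdong"],
]
freq, pairs = cooccurrence(records)
```

In such a table, keywords become the nodes of the co-occurrence network and each pair with a non-zero count becomes a link.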

2.4. Case Study

This study employs two representative cases—the Historical Educational Sites in South China and the Dong ethnic drum towers—to demonstrate the implementation of smart data methodologies in constructing an architectural heritage data framework. The selection criteria for these cases were based on the following considerations:
  • Diversity in Heritage Typology: The cases represent two distinct types of architectural heritage—humanistic narratives (Historical Educational Sites) and structural logic (Dong ethnic drum towers)—ensuring the framework’s applicability across varied heritage contexts.
  • Geographical and Cultural Significance: Both cases are located in South China, a region rich in cultural diversity and historical significance, providing a robust testbed for the framework’s adaptability to regional heritage characteristics.
  • Scale and Complexity: The Historical Educational Sites feature fragmented, small-scale architectural remnants complemented by extensive archival records, while the Dong ethnic drum towers exhibit complex timber structural systems. This contrast validates the framework’s scalability and ability to handle heterogeneous data.
  • Data Availability: The cases offer comprehensive multi-source datasets, including 3D scans, historical texts, and technical drawings, essential for testing the framework’s data integration capabilities.
  • Conservation Urgency: Both cases represent heritage assets at risk due to physical degradation or intangible knowledge loss, aligning with the framework’s goal of enabling proactive conservation.
The “Historical Educational Sites in South China” are situated in Pingshi Town, Lechang, Shaoguan City, Guangdong Province. During the Second Sino-Japanese War, the Nationalist Government of China enacted a policy to relocate higher education institutions inland. Prestigious academic institutions such as Sun Yat-sen University, Lingnan University, and Pui Ching Middle School migrated to the Lechang area in northern Guangdong, sustaining educational activities under wartime conditions. Today, these sites comprise limited physical remnants of school structures alongside extensive archival materials, forming a cultural heritage landscape that demands systematic digital mechanisms for data collection, preservation, and scientifically guided adaptive reuse. However, the fragmented spatial distribution and small scale of surviving architectural remains render them inadequate for exemplifying the construction of a Heritage Building Information Modeling (HBIM) database.
Consequently, this study selects the Jitang Drum Tower, a provincial-level protected cultural heritage site (designated in 1982) located in Jitang Village, Zhaoxing Township, Liping County, Guizhou Province, as the primary validation case for HBIM database development. Dong Ethnic Drum Towers represent a prolific architectural typology within China’s cultural heritage, distinguished by their construction techniques, material selection, and structural logic that diverge from traditional imperial architecture. Their unique timber structural systems embody sophisticated carpentry knowledge rooted in indigenous practices. This case study effectively demonstrates the methodological framework proposed in this research, particularly in addressing multi-source heterogeneous data integration and knowledge extraction specific to architectural heritage.

3. Results

3.1. Data Acquisition and Processing

3.1.1. Architectural Remains

  • HBIM Database
The Jitang Drum Tower, a representative Dong ethnic structure, was selected for HBIM implementation. The workflow culminated in an intelligent mortise-and-tenon timber framework model of the tower, achieving full HBIM integration of its structural system and completing the architectural heritage HBIM database (Figure 5).
  • Knowledge Ontology Construction
This study utilized the Protégé ontology editing tool to construct the knowledge ontology of Dong drum towers. Academic analyses have identified recurring patterns in Dong drum towers’ structural systems: despite their diverse external forms, these towers share consistent planar geometries and variations of a fundamental frame unit. The timber framework and its key components exhibit annular topological relationships, classified into three meta-models: uniform frame typology, biaxial symmetric typology, and adaptive modular typology [15]. Representative examples of these meta-models include the Xiage Drum Tower (uniform frame type), Baxi Drum Tower (biaxial symmetric type), and Jitang Drum Tower (adaptive modular type). Each drum tower comprises three primary components: a timber structural framework, a honeycomb dougong at the rooftop finial, and a planar column grid.
Following ontology construction, semantic mapping and knowledge representation were performed, with OntoGraf generating the corresponding semantic relationship diagrams.

3.1.2. Humanistic Documentation

  • Document data
Our research team developed the Modern Guangdong Educational Institutions Corpus, which systematically documents historical records of universities, secondary schools, teacher training colleges, and sports academies established in Guangdong during the modern era. This corpus encompasses 25 institutions and 96 individuals associated with the South China Historical Educational Sites, focusing on the wartime period (1937–1945) [29]. From CNKI, 362 Chinese publications (including journal articles and theses) were collected using keyword and thematic searches. Automated text extraction and preprocessing were implemented with Python 3 NLP tools (NLTK/Jieba), followed by rule-based filtering on temporal (1937–1945) and keyword-matching criteria. Manual validation of 1208 candidate sentences resulted in 318 curated RDF triples, establishing a structured foundation for subsequent knowledge graph development.
  • Historical Archives
Archival materials were sourced from authoritative institutions including the Guangdong Provincial Archives and Guangzhou Municipal Archives, yielding a curated dataset of 519 items: 252 historical letters and manuscripts, 166 photographs, 79 newspaper issues, and 22 architectural technical drawings.
Taking the example of an architectural technical drawing titled “the Chemistry Department Laboratory, School of Science, National Sun Yat-sen University,” the information extraction process is illustrated in Figure 6. Unstructured paper-based materials were first digitized through high-resolution scanning. Subsequently, domain-specific templates were applied to transform the digitized content into semi-structured data. Finally, structured data were extracted and formatted into RDF triples, enabling semantic interoperability and knowledge graph integration.
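The last step of this workflow, from a filled-in template to RDF-style triples, can be sketched as below. The template fields (`title`, `type`, `institution`, `date`) and the function name are hypothetical, since the study's templates are not reproduced in the text. The values come from the drawing title quoted above.

```python
def record_to_triples(record, subject_field="title"):
    """Turn one template record (a dict) into (subject, predicate, object) triples."""
    subject = record[subject_field]
    return [
        (subject, field, value)
        for field, value in record.items()
        # Skip the subject itself and any field the annotator left empty.
        if field != subject_field and value not in (None, "")
    ]


# Hypothetical template for one architectural technical drawing.
drawing = {
    "title": "Chemistry Department Laboratory",
    "type": "architectural technical drawing",
    "institution": "National Sun Yat-sen University",
    "date": "",  # not stated in the source text, so no triple is emitted
}
triples = record_to_triples(drawing)
```

Leaving unknown fields empty rather than guessing them keeps the resulting graph limited to what the archive item actually records.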

3.2. Knowledge Management and Visualization

As demonstrated in the preceding sections, the structured data of the South China Historical Educational Sites were stored in RDF format and imported into the Neo4j graph database, generating an interactive knowledge graph for visual exploration. The resulting interface is shown in Figure 7. Concurrently, the RDF-formatted data of the Dong ethnic Jitang Drum Tower, based on its architectural ontological semantic relationships, were mapped and stored in the Neo4j graph database, enabling a knowledge graph-based visualization of the tower’s structural and cultural attributes. This methodology accomplished the systematic storage and visualization of both architectural ontology and associated humanistic documentation for the case studies.

4. Discussion

As mentioned earlier, this research establishes a smart data framework comprising three components through case studies of the South China Historical Educational Sites and the Dong ethnic Jitang Drum Tower: an HBIM repository for digital models, a structured knowledge base, and a knowledge graph mapping entity relationships. In this section, we discuss the reuse of knowledge based on the above research results, including visualization analysis based on Citespace, knowledge mining via Neo4j, and HBIM-driven knowledge generation for both case studies.

4.1. Visualization Analysis Based on Citespace

This study utilized the CNKI database as the primary source for bibliometric analysis. Using the corpus of South China Historical Educational Sites, a comprehensive search was conducted across three dimensions: subject, title, and keywords. After excluding irrelevant entries, a total of 362 Chinese-language publications were identified for analysis. By configuring relevant parameters in Citespace, a knowledge graph was generated to quantitatively analyze the research landscape, hotspots, and developmental trends in the field of South China Historical Educational Sites.
  • Word Frequency Analysis
Keywords serve as condensed representations of a publication’s core themes, reflecting its primary research focus [30]. As illustrated in the upper-left quadrant of Figure 8a, the keyword co-occurrence network for South China Historical Educational Sites comprises 571 nodes (representing distinct keywords) and 1110 links (denoting contextual relationships), with a network density of 0.0068. This low-density value indicates sparse interconnections among studies in this domain, suggesting fragmented research trajectories. Frequency analysis identifies predominant research hotspots, including wartime universities and key figures associated with educational activities during the Second Sino-Japanese War (1937–1945), as highlighted in Figure 8a.
  • Word Timeline Clustering
Timeline clustering analysis visualizes the evolution of keywords within each cluster and their interconnections. The timeline (1980–2022) demonstrates that research in this field emerged in the 1980s, with two dominant trends projected for future studies: (1) wartime education history during the Second Sino-Japanese War, and (2) the preservation and study of university remnants within the South China Historical Educational Sites (Figure 8b).
  • Thematic Cluster Analysis
Thematic cluster analysis consolidates semantically related research topics to distill core thematic focus [31]. The knowledge graph for South China Historical Educational Sites (Figure 8c) meets the validity thresholds commonly applied to such bibliometric networks (Q > 0.3 for significant cluster structure and S > 0.5 for reasonable clustering), with a modularity value (Q) of 0.412 and an average silhouette coefficient (S) of 0.682, indicating acceptable clustering quality. Analysis reveals 13 distinct clusters, including Wartime Period (#0), Liang-Guang Incident (#1), Second Sino-Japanese War (#2), Republic of China Period (#3), Nationalist Government (#4), Lingnan University (#5), Anti-Japanese Militias (#6), Anti-Japanese Resistance (#7), Sun Yat-sen University (#8), 20th Century (#9), Institute of History and Philology (#10), Guangdong (#11), and Cultural Exchange (#12).
Clusters #0, #2, #3, #7, and #9 collectively indicate a strong research focus on the wartime period (1937–1945), aligning with historical records of educational activities during the Second Sino-Japanese War. Clusters #1 and #6 reflect studies on pivotal historical events, such as regional conflicts and grassroots resistance movements. Notably, clusters #5 (Lingnan University) and #8 (Sun Yat-sen University) demonstrate their scholarly prominence, with extensive documentation of institutional narratives and associated figures, underscoring their central role in shaping the South China Historical Educational Sites discourse.
  • Discussion and Implications
Current research on South China Historical Educational Sites predominantly explores historical narratives and biographical accounts, with limited engagement in material heritage studies, particularly the documentation and conservation of surviving architectural remnants. This gap persists despite increasing national policy support for heritage preservation since 2015. To align with governmental priorities and maximize cultural value, future studies should prioritize (1) material documentation, that is, systematic recording of extant campus structures using HBIM and 3D laser scanning, and (2) adaptive reuse frameworks, that is, evidence-based strategies for repurposing heritage sites while preserving authenticity.
The application of the visualization tool (Citespace) has enabled multi-dimensional knowledge extraction from 362 publications, uncovering latent research patterns and forecasting disciplinary trajectories. This methodology exemplifies how smart data frameworks can transform fragmented archival data into actionable insights, bridging historical scholarship and contemporary conservation practices.
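The network density reported for the keyword co-occurrence network in Figure 8a (571 nodes, 1110 links, density 0.0068) can be checked with a short calculation. The sketch assumes the standard density definition for an undirected network without self-loops, 2E / (N(N − 1)). The text does not state which formula Citespace applies, but this definition reproduces the reported value.

```python
def network_density(nodes, links):
    """Density of an undirected simple graph: actual links over possible links."""
    if nodes < 2:
        return 0.0  # no pair of nodes exists, so density is taken as zero
    possible_links = nodes * (nodes - 1) / 2
    return links / possible_links


# Node and link counts as reported for the keyword co-occurrence network.
density = network_density(571, 1110)
rounded = round(density, 4)
```

With 571 keywords there are 162,735 possible pairs, of which 1110 are linked, which is why the network reads as sparse.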

4.2. Knowledge Mining via Neo4j

Within knowledge graphs, semantic relationships between entities are categorized as direct or indirect [32]. Using the South China Historical Educational Sites knowledge graph as a case study, this section demonstrates semantic relationship queries through Cypher, the query language of the Neo4j graph database.
  • Direct Relationship
A Cypher query was executed to identify entities directly associated with the historical figure Xu Chongqing. As illustrated in Figure 9a, the results reveal direct connections to individuals such as Shi Zhaotang, Wang Yanan, and Lin Liru. By clicking on relationship edges, contextual ties (e.g., colleague, student, affiliation) are visualized. Double-clicking any node dynamically displays all linked entities, enabling exploratory analysis of the semantic network.
  • Indirect Relationship
A parallel query mapped entities associated with National Sun Yat-sen University. The results (Figure 9b) highlight direct relationships with figures such as Huang Linshu, Luo Jialun, and Mei Gongbin. Because the university has an affiliation relationship with both Mei Gongbin and Xu Chongqing, an indirect colleague relationship between these two individuals can be inferred.
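The inference of an indirect relationship from shared affiliation can be illustrated outside the database with a few lines of Python. This is a sketch of the reasoning pattern only, not the query used in the study. The function name and data layout are hypothetical, and the Cypher pattern in the comment uses assumed labels and an assumed relationship type.

```python
from collections import defaultdict
from itertools import combinations

# A comparable graph pattern in Cypher, with hypothetical labels and types:
#   MATCH (a:Person)-[:AFFILIATED_WITH]->(u)<-[:AFFILIATED_WITH]-(b:Person)
#   WHERE a.name < b.name
#   RETURN a.name, b.name, u.name


def indirect_colleagues(affiliations):
    """Pair people who share an institution; returns {(a, b): set of institutions}."""
    members = defaultdict(set)
    for person, institution in affiliations:
        members[institution].add(person)
    inferred = defaultdict(set)
    for institution, people in members.items():
        # Sorting fixes the order inside each pair so (a, b) is never repeated as (b, a).
        for a, b in combinations(sorted(people), 2):
            inferred[(a, b)].add(institution)
    return dict(inferred)


# The two affiliation edges described for Figure 9b.
affiliations = [
    ("Mei Gongbin", "National Sun Yat-sen University"),
    ("Xu Chongqing", "National Sun Yat-sen University"),
]
links = indirect_colleagues(affiliations)
```

A shared affiliation is only a candidate for a colleague relationship; whether the two people overlapped in time would still need checking against the archival record.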

4.3. HBIM-Driven Knowledge Generation

The generation of architecture-specific knowledge relies on the HBIM repository, enabling systematic mining of architectural insights. Beyond constructing the ontological framework of architectural heritage within the smart data system, this process extracts critical knowledge from planar configurations, structural characteristics, construction logic, and component identification. The adoption of meta-modeling methodologies facilitates rapid knowledge extraction through preconfigured HBIM libraries.

4.3.1. Spatial Topology-Based Knowledge Mining

As established earlier, three meta-model typologies were identified for Dong ethnic mortise-and-tenon drum towers through analysis of their construction logic. For the Jitang Drum Tower, classification via a trained detection model confirmed its categorization as an adaptive modular typology (Figure 10). Leveraging this meta-model’s attributes, researchers can efficiently infer planar layouts, structural typologies, and constructional principles.

4.3.2. Spatial Localization and Identification Systems

The Jitang Drum Tower, as an adaptive modular typology, features a timber framework derived from repetitive frame units. Projection points of eave columns (yanzhu) on the plan are generated by rotating reference points around the central column (zhongzhu) at defined angles. Similarly, bracket columns (guazhu) are positioned by equidistant partitioning along inner and outer concentric circles, followed by rotational duplication, ultimately establishing planar projection relationships among side central columns, eave columns, and bracket columns (Figure 11a).
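The rotational placement of column projection points can be sketched with a plain 2D rotation about the central column. The example assumes equal angular steps and uses made-up values (eight columns, radius 4, arbitrary units). The actual angles and dimensions of the Jitang Drum Tower are not given in the text, and the function names are hypothetical.

```python
import math


def rotate_about(point, center, angle_deg):
    """Rotate a 2D point counter-clockwise about center by angle_deg degrees."""
    theta = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (
        center[0] + dx * math.cos(theta) - dy * math.sin(theta),
        center[1] + dx * math.sin(theta) + dy * math.cos(theta),
    )


def column_ring(reference, center, count):
    """Place `count` columns by rotating `reference` in equal angular steps."""
    step = 360.0 / count  # assumption: columns are evenly spaced around the ring
    return [rotate_about(reference, center, i * step) for i in range(count)]


# Hypothetical plan: 8 eave columns on a circle of radius 4 around the central column.
eave_columns = column_ring(reference=(4.0, 0.0), center=(0.0, 0.0), count=8)
```

The same function could place bracket columns on the inner and outer concentric circles by changing the reference point's distance from the center.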
Scholars have proposed a tripartite identification system for Dong timber structures [33]: spatial localization of components within 3D frameworks, composite identifiers combining terminology and spatial coordinates, and Moshiwen—traditional carpenter scripts for component labeling.
For drum towers, spatial orientation is defined by the entrance-facing side as the front, with subsequent rear, left, and right orientations. Eave and central columns at the base are named accordingly (e.g., left-front eave corner column, left-central-front eave column). Bracket columns incorporate tier numbering into their identifiers (e.g., left-rear bracket column tier 2) to reflect vertical stacking (Figure 11b).

4.3.3. Design Strategy Guidance

Traditional Dong drum towers were built at full scale from artisans’ empirical knowledge, with little or no reliance on technical drawings. While this practice preserves cultural authenticity, the resulting scarcity of documentation complicates conservation, reconstruction, and repair. The meta-model framework addresses this challenge through parameter-driven adjustment of key structural and aesthetic variables (e.g., roof curvature, column spacing). Given terrain constraints and design preferences as inputs, the system automates model generation and provides architects with data-driven design alternatives. This methodology not only aids decision-making but also digitally preserves intangible construction techniques for future generations.
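A parameter-driven workflow of this kind might be sketched as follows. The parameter set (`sides`, `base_radius`, `tiers`, `tier_height`, `taper`), the footprint constraint, and all default values are hypothetical stand-ins chosen for illustration; the actual meta-model attributes are not specified here.

```python
import math
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TowerParams:
    """Hypothetical design parameters for a polygonal multi-tier tower."""
    sides: int = 8             # number of eave columns per tier
    base_radius: float = 4.0   # plan radius of the lowest tier, metres
    tiers: int = 5             # number of stacked eave tiers
    tier_height: float = 1.2   # vertical spacing between tiers, metres
    taper: float = 0.85        # radius of each tier relative to the tier below

def fit_to_site(params: TowerParams, max_footprint_diameter: float) -> TowerParams:
    """Shrink the base radius if the site cannot accommodate the requested footprint."""
    radius = min(params.base_radius, max_footprint_diameter / 2.0)
    return replace(params, base_radius=radius)

def generate_tiers(params: TowerParams):
    """Return, for each tier, the (x, y, z) corner points of its eave polygon."""
    result = []
    for t in range(params.tiers):
        r = params.base_radius * params.taper ** t
        z = t * params.tier_height
        result.append([
            (r * math.cos(2 * math.pi * k / params.sides),
             r * math.sin(2 * math.pi * k / params.sides),
             z)
            for k in range(params.sides)
        ])
    return result

# A site limited to a 6 m wide footprint (assumed value) forces a smaller base radius.
fitted = fit_to_site(TowerParams(), max_footprint_diameter=6.0)
tiers = generate_tiers(fitted)
```

Varying a single field, such as `taper` or `sides`, and regenerating the tiers yields a family of alternatives that can be compared before any detailed modeling is done.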

4.4. Comparative Analysis with Prior Studies

Methodological Advancements: Existing studies on architectural heritage digitization predominantly focus on single-modality data integration, such as HBIM for geometric documentation [13,34] or NLP-driven text mining for historical narratives [12]. In contrast, this framework unifies geometric, semantic, and contextual data streams through a knowledge-centric pipeline. Compared to Ursini et al. [14], whose scan-to-HBIM workflow achieved 94.7% geometric accuracy but lacked semantic linkages, our approach integrates NLP-extracted entity relationships (e.g., educator networks) into HBIM models, enabling bidirectional queries between spatial components and archival records—a capability absent in conventional systems.
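The idea of bidirectional querying can be shown with a small in-memory sketch. This is not the Neo4j implementation used in the study; it is a plain Python index over subject-predicate-object triples, and every identifier and predicate name below is a hypothetical placeholder.

```python
from collections import defaultdict

class TripleIndex:
    """Index triples by subject and by object so links can be followed in both directions."""

    def __init__(self):
        self._by_subject = defaultdict(list)
        self._by_object = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].append((predicate, obj))
        self._by_object[obj].append((subject, predicate))

    def outgoing(self, subject):
        """All (predicate, object) pairs that start at `subject`."""
        return list(self._by_subject.get(subject, []))

    def incoming(self, obj):
        """All (subject, predicate) pairs that point to `obj`."""
        return list(self._by_object.get(obj, []))

index = TripleIndex()
# Placeholder identifiers, not records from the case studies.
index.add("hbim:Building_A", "documentedIn", "archive:Record_001")
index.add("person:Educator_X", "taughtAt", "hbim:Building_A")

# From a spatial component to its archival records.
records_for_building = index.outgoing("hbim:Building_A")
# From an archival record back to the spatial components it documents.
components_for_record = index.incoming("archive:Record_001")
# From a building to the people linked to it.
people_for_building = index.incoming("hbim:Building_A")
```

In a graph database the same two directions correspond to traversing a relationship forwards or backwards from a matched node.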
Knowledge Graph Innovations: While semantic ontologies for cultural heritage have been explored (e.g., Khouri et al. [3] for Algerian traditional houses), prior works often prioritize static taxonomies over dynamic pattern discovery. The Neo4j-driven knowledge graph here not only maps entity relationships but also identifies latent spatiotemporal patterns (e.g., construction typology evolution in drum towers), surpassing the descriptive analytics of traditional ontology tools. This aligns with Bibri et al.’s [20] vision of smart data for sustainable heritage but extends it through domain-specific rule engines.
Theoretical Implications: The proposed framework redefines heritage conservation as a knowledge-to-action paradigm, contrasting with the data-to-information focus of earlier KDD models. By embedding domain expertise (e.g., mortise-and-tenon typologies) into smart data pipelines, it addresses the “knowledge extraction bottleneck” noted by Fan and Zeng [4].

5. Conclusions

This study proposes a smart data-driven framework for architectural heritage conservation that synthesizes multi-source heterogeneous data into a cohesive, machine-interpretable ecosystem. It aims to bridge fragmented data practices in the field and to enable systematic knowledge generation from heterogeneous datasets. Through dual case studies of the South China Historical Educational Sites and the Dong ethnic Jitang Drum Tower, the framework demonstrates its capacity to integrate geometric documentation, historical narratives, and construction logic within a unified digital workflow. The aggregation of 362 scholarly publications, 519 archival records, and 20.5 GB of structured data (including HBIM models) facilitated the creation of 2652 RDF triples and 1701 interconnected nodes, forming a knowledge graph that bridges tangible heritage and intangible craftsmanship.
The research advances the field through three key innovations. First, the integration of HBIM repositories, knowledge bases, and graph-based visualization addresses longstanding challenges in fragmented heritage data management, offering a scalable model for digital conservation. Second, the dual analytical approach, which combines bibliometric trend analysis in CiteSpace with Neo4j-driven relationship mining, reveals latent research patterns while translating traditional construction wisdom into parametric design guidelines through HBIM. Third, the digital preservation of Dong Moshiwen scripts and mortise-and-tenon systems exemplifies how smart data methodologies can safeguard endangered vernacular knowledge against technological erosion.
Despite these contributions, limitations merit acknowledgment. The dual-case validation, while enriching comparative insights, introduced intermittent discontinuities during cross-dataset integration. Additionally, the framework’s current reliance on conventional base architectures restricts real-time interoperability with emerging platforms such as open-access GIS interfaces—a gap future studies could address through cloud-based solutions.
This work redefines architectural heritage data systems as dynamic ecosystems balancing ontological precision (HBIM-driven geometries) and humanistic depth (contextual narratives), establishing a replicable workflow from data acquisition (LiDAR/archival digitization) to knowledge generation (AI-enhanced pattern mining). The empirically validated case studies demonstrate smart data’s potential to harmonize conservation ethics with technological innovation.
The proposed framework is also applicable across a broad range of heritage conservation domains. It can be adapted to sites that combine physical structures with humanistic narratives, such as ancient temples contextualized by ritual texts or colonial-era buildings enriched with archival blueprints. For urban heritage, the methodology enables holistic management of historical districts by correlating architectural models with socio-economic archives, while in intangible cultural heritage (ICH), it supports the digitization of traditional crafts (e.g., Chinese lacquerware techniques) through linkages between artisan interviews, material bases, and craft processes. Industrial heritage applications include managing factory complexes via integrated 3D models, machinery manuals, and oral histories. Furthermore, the framework’s scalability supports centralized knowledge repositories for large-scale systems like the Silk Road network, facilitating cross-site comparative analysis and proactive risk monitoring, both of which are critical for safeguarding national heritage assets.
Future studies should focus on fostering interdisciplinary collaborations to expand analytical dimensions, particularly in machine learning-aided damage prediction and blockchain-based provenance tracking, positioning the methodology as a global benchmark for data-driven heritage stewardship.

Author Contributions

Conceptualization, Z.R. and G.W.; methodology, Z.R.; software, Z.R.; validation, Z.R. and G.W.; formal analysis, Z.R. and G.W.; investigation, Z.R.; resources, Z.R.; data curation, G.W.; writing—original draft preparation, Z.R.; writing—review and editing, G.W.; visualization, Z.R.; supervision, G.W.; project administration, Z.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Fayyad, U. From Data Mining to Knowledge Discovery in Databases. AI Mag. 1996, 17, 37–54.
  2. Liu, F.; Qiang, W.; Lu, Z.; Fan, Y.; Wang, H. Research on Heritage Characteristics Based on Railway Architectural Heritage Database in Jinzhou Section of the Peking–Mukden Railway. Front. Archit. Res. 2024, 13, 1127–1144.
  3. Khouri, S.; Oufaida, H.; Amrani, R.; Kacher, S.; Ouahab, S.; Cherrad, M. Knowledge Base Construction for the Semantic Management of Environment-Enriched Built Heritage: The Case of Algerian Traditional Houses Architecture. J. Cult. Herit. 2023, 63, 217–229.
  4. Fan, W.; Zeng, L. Exploring Smart Data Generation Paths for the Activation and Utilization of Cultural Heritage in the AI Era. J. China Soc. Libr. Sci. 2024, 50, 4–29.
  5. Zhang, X.; Zhi, Y.; Xu, J.; Han, L. Digital Protection and Utilization of Architectural Heritage Using Knowledge Visualization. Buildings 2022, 12, 1604.
  6. Palomar, I.J.; García Valldecabres, J.L.; Tzortzopoulos, P.; Pellicer, E. An Online Platform to Unify and Synchronise Heritage Architecture Information. Autom. Constr. 2020, 110, 103008.
  7. Acierno, M.; Cursi, S.; Simeone, D.; Fiorani, D. Architectural Heritage Knowledge Modelling: An Ontology-Based Framework for Conservation Process. J. Cult. Herit. 2017, 24, 124–133.
  8. Chen, K.; Lu, W.; Xue, F.; Tang, P.; Li, L.H. Automatic Building Information Model Reconstruction in High-Density Urban Areas: Augmenting Multi-Source Data with Architectural Knowledge. Autom. Constr. 2018, 93, 22–34.
  9. Wang, H.; He, X.; Yan, Z.; Lei, S.; Luo, S.; Lei, J.; Zhou, Q. Research on Pathology Information Management of Educational Architectural Heritage Based on Digital Technology: The Case of James Jackson Gymnasium. Buildings 2024, 14, 1048.
  10. Iafrate, F. What Is Smart Data. In From Big Data to Smart Data; Wiley: Hoboken, NJ, USA, 2015; pp. 13–20. ISBN 978-1-119-11926-5.
  11. Angileri, G.; Cernaro, A.; Ferrero, M.; Fiandaca, O. Learning by Modelling in H-BIM Environment to Develop a Framework for Restoration. Application to the Stone Cladding of Casa Delle Armi of Luigi W. Moretti in Rome. J. Cult. Herit. 2025, 73, 28–42.
  12. Néroulidis, A.; Pouyet, T.; Tournon, S.; Rousset, M.; Callieri, M.; Manuel, A.; Abergel, V.; Malavergne, O.; Cao, I.; Roussel, R.; et al. A Digital Platform for the Centralization and Long-Term Preservation of Multidisciplinary Scientific Data Belonging to the Notre Dame de Paris Scientific Action. J. Cult. Herit. 2024, 65, 210–220.
  13. Costantino, D.; Pepe, M.; Restuccia, A.G. Scan-to-HBIM for Conservation and Preservation of Cultural Heritage Building: The Case Study of San Nicola in Montedoro Church (Italy). Appl. Geomat. 2023, 15, 607–621.
  14. Ursini, A.; Grazzini, A.; Matrone, F.; Zerbinatti, M. From Scan-to-BIM to a Structural Finite Elements Model of Built Heritage for Dynamic Simulation. Autom. Constr. 2022, 142, 104518.
  15. Guangzhou University. A Meta-Model and Construction Logic-Based Automated Building Information Modeling Method; Technical Report; Guangzhou University Press: Guangzhou, China, 2020.
  16. Cursi, S.; Martinelli, L.; Paraciani, N.; Calcerano, F.; Gigliarelli, E. Linking External Knowledge to Heritage BIM. Autom. Constr. 2022, 141, 104444.
  17. Shamshiri, A.; Ryu, K.R.; Park, J.Y. Text Mining and Natural Language Processing in Construction. Autom. Constr. 2024, 158, 105200.
  18. Chen, X.; Xie, H.; Tao, X.; Wang, F.L.; Leng, M.; Lei, B. Artificial Intelligence and Multimodal Data Fusion for Smart Healthcare: Topic Modeling and Bibliometrics. Artif. Intell. Rev. 2024, 57, 91.
  19. Wu, X.; Zhu, X.; Wu, G.-Q.; Ding, W. Data Mining with Big Data. IEEE Trans. Knowl. Data Eng. 2014, 26, 97–107.
  20. Bibri, S.E.; Krogstie, J. Smart Sustainable Cities of the Future: An Extensive Interdisciplinary Literature Review. Sustain. Cities Soc. 2017, 31, 183–212.
  21. Zeng, L.; Wang, X.G.; Fan, W. Smart Data in GLAM Fields and Its Role in Digital Humanities Research. J. Libr. Sci. China 2018, 44, 17–34.
  22. Xu, F.; Jin, X.P. A Review of Cultural Heritage Digital Preservation Based on Linked Data. J. Natl. Libr. China 2020, 29, 90–99.
  23. Zhou, L.Q. Research on Multi-Source Heterogeneous Knowledge Fusion for Smart Health. Ph.D. Thesis, Wuhan University, Wuhan, China, 2022.
  24. Barati, M.; Bai, Q.; Liu, Q. Mining Semantic Association Rules from RDF Data. Knowl.-Based Syst. 2017, 133, 183–196.
  25. Csalódi, R.; Bagyura, Z.; Vathy-Fogarassy, Á.; Abonyi, J. Time-Dependent Frequent Sequence Mining-Based Survival Analysis. Knowl.-Based Syst. 2024, 296, 111885.
  26. Guo, S.H. Research on Machine Vision-Assisted Intelligent Modeling Methods for Dong Ethnic Mortise-and-Tenon Architecture. Master’s Thesis, Guangzhou University, Guangzhou, China, 2024.
  27. Zou, J.; Deng, Y. Intelligent Assessment System of Material Deterioration in Masonry Tower Based on Improved Image Segmentation Model. Herit. Sci. 2024, 12, 252.
  28. Digital Management System for Historical and Educational Heritage Sites in South China. Available online: https://open.okaygis.com/hnyx/login (accessed on 25 May 2025).
  29. Guangdong’s Comprehensive Exploration and Revitalization of Historical Educational Sites during the Anti-Japanese War Period. Available online: http://www.nanyueguyidao.cn/ViewMessage.aspx?MessageId=9921 (accessed on 22 May 2025).
  30. Zhang, R.M. Semi-Automatic Construction of Huizhou Architectural Knowledge Graph. J. Anhui Jianzhu Univ. 2021, 29, 13–19.
  31. Yang, C. Research on Ontology-Based Knowledge Graph Construction of Huizhou Architecture. Master’s Thesis, Anhui Jianzhu University, Hefei, China, 2021.
  32. Santos, V.; Cuconato, B. NoSQL Graph Databases: An Overview. arXiv 2024, arXiv:2412.18143.
  33. Cai, L.; Zhang, Q.; Deng, Y. Terminology and Identification System for Timber Structural Components of Dong Ethnic Drum Towers (Part I). Technol. Anc. Archit. Landsc. 2020, 4, 31–33.
  34. Intrigila, C.; Giannetti, I.; Eramo, E.; Gabrielli, R.; Caruso, G. HBIM for Conservation and Valorization of Structural Heritage: The Stylite Tower at Umm Ar-Rasas, Jordan. J. Cult. Herit. 2024, 70, 397–407.
Figure 1. Components of the smart data framework for architectural heritage conservation [source: the authors].
Figure 2. Workflow of the smart data system construction for architectural heritage [source: the authors].
Figure 3. Workflow for data extraction from document data [source: the authors].
Figure 4. Format templates for historical archives [source: the authors].
Figure 5. Workflow of HBIM reconstruction for the Jitang Drum Tower. The figure retains original Chinese-language content (e.g., historical terms, structural nomenclature) [source: the authors].
Figure 6. Workflow for data extraction from historical archives. The figure retains original data in Chinese (e.g., historical drawings, historical letters) [source: the authors].
Figure 7. Knowledge graph visualization generated by Neo4j [source: the authors].
Figure 8. Bibliometric analysis of the South China Historical Educational Sites using CiteSpace. (a) Keyword frequency co-occurrence network; (b) timeline clustering of research trends; (c) thematic cluster mapping. The figure retains primary data in Chinese (e.g., historical drawings, historical letters) [source: the authors].
Figure 9. Semantic relationship exploration in Neo4j for the South China Historical Educational Sites. (a) Direct associations of Xu Chongqing; (b) direct and inferred linkages of National Sun Yat-sen University [source: the authors].
Figure 10. Knowledge mining of the Jitang Drum Tower based on spatial topological relationships [source: the authors].
Figure 11. Structural analysis of the Jitang Drum Tower. (a) Planar geometric relationships of the timber framework; (b) spatial localization and nomenclature system for column positioning [source: the authors].
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Rao, Z.; Wang, G. Smart Data-Enabled Conservation and Knowledge Generation for Architectural Heritage System. Buildings 2025, 15, 2122. https://doi.org/10.3390/buildings15122122
