1. Introduction
In a 2003 product lifecycle management course at the University of Michigan, Professor Michael Grieves first introduced a framework he initially termed the “Mirrored Space Model” [1], which was later renamed the Digital Twin in 2011. In 2012, the National Aeronautics and Space Administration (NASA) released a technology roadmap describing the Digital Twin as a virtual mirror of a physical product that integrates multidisciplinary, multiscale simulation processes to reflect its entire lifecycle, utilizing physical models, sensor data, and historical data [2]. Digital Twins have attracted increasing research interest from the academic community in recent years owing to advances in new-generation information technologies [3], with widespread practical applications in the industrial sector [4].
In the development of Digital Twin technology, constructing a Digital Twin model is a crucial prerequisite for its realization. Research on the construction of high-fidelity Digital Twin models has attracted significant interest from experts, scholars, and corporate institutions worldwide [5]. The initial conceptual model of the Digital Twin was a three-dimensional model comprising the physical entity, the virtual entity, and the connections between them [6]. Subsequently, Fei et al. extended this three-dimensional model into a five-dimensional Digital Twin model that adds twin data and services [7]. Theodor et al. [8] proposed a four-dimensional Digital Twin architecture comprising data acquisition and transmission, virtual twins, predictive twins, and decision twins; these four dimensions are categorized by function, helping operators understand the capabilities of the virtual entity model. Zheng et al. [9] proposed a Digital Twin product lifecycle management application framework based on the Physical Space, Information Processing Layer, and Virtual Space. These models provide standardized reference frameworks for the construction of Digital Twins.
Digital Twins have been extensively applied in various fields, including aerospace [10,11], manufacturing workshops [12,13], smart cities [14,15], healthcare [16,17], and smart water management [18,19]. In smart water management in particular, the continuous development of Digital Twin modeling and simulation has yielded remarkable progress in both facility-level and plant-wide optimization of water purification systems. For example, in settling technology, Plósz et al. extended and validated the Vesilind function for hindered settling, deriving a new exponential function that addresses compression settling velocity [20]. In pump technology, Nguyen et al. developed an adjustable tongue vane that controls the internal flow direction in the volute to improve the energy performance of a single-channel pump [21]. In whole-plant wastewater treatment modeling and optimization, Ekama integrated steady-state models of the activated sludge process and sludge digestion with stoichiometric conversions of biological processes, developing a whole-plant wastewater treatment model based on mass balances of carbon, hydrogen, oxygen, nitrogen, chemical oxygen demand, and charge to aid in the layout design of treatment plants [22]. In flow monitoring and hydraulic optimization, Matias et al. combined in situ experiments with computational fluid dynamics (CFD): Large-Scale Particle Image Velocimetry (LS-PIV) was used to obtain flow characteristic parameters such as cross-sectional velocity distribution and discharge rate, and a numerical model based on the Reynolds-averaged Navier–Stokes equations with the k-ε turbulence model was established, ultimately calibrating the rectangular weir discharge equation and providing technical support for ensuring reservoir water quality and public health [23].
The Digital Twin implementation process can be summarized into the following core stages: requirement analysis and goal definition, model construction and hierarchical mapping, and data integration and real-time connection.
Despite these advances, a unified Digital Twin methodology for cross-domain applications has not been realized. Significant challenges persist, including disparities in application requirements, the integration complexity of multidomain heterogeneous models, and a lack of interoperability standards. Although recent studies have proposed various domain-specific solutions [24,25,26,27] and modular approaches to improve reusability [28,29,30,31], a systematic and universal methodology for constructing complex system-level Digital Twins has yet to be established.
To bridge this gap, this study proposes a novel Digital Twin modeling method based on a hierarchical decoupling architecture and topological connection mechanism. The main contributions of this study are summarized below.
System complexity is reduced through hierarchical functional decoupling, establishing an architectural foundation for independent component development and reuse.
A method for constructing component-level Digital Twins based on standardized information sets is proposed.
A multi-dimensional topological connection mechanism is designed based on graph theory.
To address the diverse functionalities and variable scenarios of system-level Digital Twins, a progressive construction method (“hierarchical decoupling-topological connection”) is proposed (Figure 1). This method first decomposes the overall system top–down into multiple relatively independent subsystems based on core functions, key characteristics, and application objectives. Each subsystem is then further refined into several basic components following the same principles of functional, characteristic, and scenario division. To achieve system integration, the physical and logical connection relationships between components must be precisely described: components are abstracted as nodes and their interactions as edges, an adjacency matrix is generated, and a topological connection model is constructed based on graph theory. Using this as a structural blueprint, components are assembled bottom–up, layer by layer, into subsystem Digital Twins, which are ultimately assembled into a complete system-level Digital Twin, thereby supporting the gradual restoration of system functions and the emergence of holistic behaviors.
The novelty of this study lies in three key aspects. First, unlike existing methods that rely on untargeted system decomposition, its scenario-driven hierarchical decoupling balances modular independence with integration feasibility. Second, it improves compatibility in component modeling by introducing standardized information sets for component-level Digital Twins. Third, through a graph theory-based multidimensional topological mechanism, it quantifies both physical and logical interactions, surpassing traditional approaches that consider only physical connections. Collectively, these aspects establish a “decomposition–modeling–integration” loop, offering a new approach to the construction of complex system-level Digital Twins.
The remainder of this paper is structured as follows: Section 2 reviews the challenges, core characteristics, and current methodological gaps of system-level Digital Twins identified in the literature. Section 3 introduces the construction method for standardized information sets and the partitioning and connection methods for system-level Digital Twins. Section 4 validates the feasibility of the proposed method through a typical modeling case study of a water purification system. The discussion is presented in Section 5, the main conclusions in Section 6, and future research directions in Section 7.
3. System-Level Digital Twin Assembly Methodology
The hierarchical decomposition of system Digital Twins can be performed according to multi-dimensional criteria, including temporal, spatial, and operational states. Entities at different levels typically exhibit distinct functions. During the construction of system-level Digital Twins, a spatial scale should be adopted as the fundamental partitioning criterion to establish the hierarchical architecture, with entities categorized into corresponding contextual units based on functionality.
The hierarchical decoupling architecture for system-level Digital Twins can be implemented as follows: First, the system is partitioned into component, subsystem, and system layers based on functional, dimensional, and state differences (where state differences refer to distinctions in operational or lifecycle states, such as data update frequency or component lifespan). At the component layer, a standardized information set (SIS) is established to store basic attribute information, physical parameters, structural materials, and operational parameters, whereas standardized service interfaces enable a “data-as-generated” dynamic construction mechanism in which SIS data changes directly trigger twin reconstruction. In the subsystem and system layers, graph theory-based adjacency matrices are constructed according to the spatial topology and functional dependencies of the physical system to represent the connection relationships between components or subsystems. Component-level Digital Twins can be aggregated bottom–up based on the topological connection rules defined by the adjacency matrix, first integrated into subsystem-level Digital Twins, and ultimately coupled into a complete system-level Digital Twin. The hierarchical and coupling relationships of this system integration structure are shown in Figure 2.
3.1. System Partitioning Methodology
The overall architecture of system-level Digital Twins adopts a hierarchical decoupling approach, the core of which is the structural decomposition of complex physical systems. Specifically, the fundamental basis for hierarchical partitioning comes from analyzing the intrinsic correlations of the system. Two key factors are prioritized in this analysis: (1) the degree of functional goal aggregation among system units and (2) the tightness of coupling in their physical connections.
System Level: At the highest level, the system level represents the macro-level functional implementation of the entire physical object. System-level Digital Twins focus on the comprehensive performance metrics of the entire system and the degree of fulfillment of the overall operational objectives. When delineating the specific scope of the Digital Twin system, one must consider not only the physical boundaries in material form but also jointly define them through the key input/output interfaces where the system interacts with its external environment or related systems. Simultaneously, the top-level performance metric requirements corresponding to the core tasks of the system-level Digital Twin must be comprehensively considered.
Subsystem Level: The subsystem level occupies the intermediate layer of the entire hierarchical system, bearing the collaborative logic of specific functional modules with the primary objective of describing and implementing localized coordination processes with well-defined functional orientations within the system. Digital Twins at this level are aggregated from component-level entities based on physical topological relationships. When delineating subsystem boundaries, the core criterion is the relative independence of information flow, specifically manifested by significantly higher data exchange and coordination requirements among functional units within a subsystem compared with the interaction needs between subsystems.
Component Level: As the fundamental unit layer, the component level resides at the base of the entire hierarchical architecture, and its modeling objects are the smallest functional units that constitute the physical system. When determining the specific partitioning scale for this level, it is necessary to fully balance the relationship between modeling precision requirements and practical engineering technical conditions. Specifically, each physical unit designated as an independent component must be capable of deploying and configuring the sensing systems required to perceive, monitor, and control its own critical operational state parameters without relying on external support.
3.2. Construction of Component-Level Digital Twin Models Based on Information Sets
This section elaborates a construction method for component-level Digital Twins based on standardized information sets. The method employs “standardized information sets” as both the unified data source and the driving core for physical components within the Digital Twin space. It encapsulates four categories of data, namely basic attribute information, physical parameter information, structural material information, and business parameter information, within a unified structure and provides plug-and-play instantiation capability for twins through a set of universal interfaces. Digital Twins can be automatically generated or updated on demand solely by invoking the internal resources of the information set. Any update to data within the information set drives synchronous changes in the Digital Twin’s state through these interfaces, thereby maintaining dynamic consistency with the physical entity.
The descriptive method for information set data is crucial for its successful transition from a theoretical framework to technical implementation, requiring simultaneous satisfaction of both low-level data parsing and code development requirements, as well as the upper-level application needs of component-level Digital Twins. Therefore, the descriptive format of information sets should balance human interpretability with machine executability.
Based on the above requirements, standardized information sets must clearly define the following four categories of key information to comprehensively describe component entities.
Basic attribute information used for identifying the component’s identity and static characteristics, including a globally unique identifier for unambiguous retrieval across systems and platforms; data category and version number to support backward compatibility during information set evolution; geometric topology description containing 3D entities, assembly constraints, and interface coordinate systems; and a topological connection endpoint list recording physical or logical interfaces with other components and their associated device IDs.
Physical parameter information characterizing the dynamic features and performance boundaries during component operation, including real-time measurements, cumulative statistics, extreme operating condition ranges, environmental excitation conditions, failure thresholds, and health status indicators, providing inputs for multi-physics simulation, condition monitoring, and lifespan prediction.
Structural material information describing the component’s material composition and macroscopic physical properties, including material type, standard systems, typical parameters, process history, alternative material indices, and simulation-oriented simplified characterization methods, providing a data foundation for strength, thermal analysis, and reliability assessment.
Business parameter information encapsulating universal rules and strategy templates for component operation, control, management, and decision-making, including start-stop logic, anomaly criteria, alarm classification, maintenance strategies, safety constraints, permission models, and coordination interfaces with external systems, providing orchestratable semantic support for upper-level business systems.
To achieve systematic integration and executability of the four aforementioned information categories, this study uses a rigorous set-theoretic mathematical modeling language to describe the information set, ensuring that its content can be precisely expressed and consistently processed. The mathematical model of the information set is given below:

SIS = {A, P, M, B}

where SIS represents the standardized information set, A denotes basic attribute information, P represents physical parameter information, M indicates structural material information, and B denotes business parameter information.
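As a minimal sketch, the four-category structure of the standardized information set (categories A, P, M, and B) can be represented as a simple container; the class name, field layout, and sample values below are illustrative, not the paper's normative schema:

```python
from dataclasses import dataclass, field

# Minimal sketch of a standardized information set with its four categories.
# Field names and sample values are illustrative assumptions.

@dataclass
class StandardizedInformationSet:
    A: dict = field(default_factory=dict)  # basic attribute information
    P: dict = field(default_factory=dict)  # physical parameter information
    M: dict = field(default_factory=dict)  # structural material information
    B: dict = field(default_factory=dict)  # business parameter information

sis = StandardizedInformationSet(
    A={"ID": "WT_Plant_01::DS_01", "version": "v1"},
    P={"S": {"rated_flow_m3h": 120.0}},
)
print(sorted(vars(sis).keys()))  # the four category slots: ['A', 'B', 'M', 'P']
```

Keeping each category a plain mapping mirrors the set-theoretic definition while leaving the internal schema of each category free to evolve with the information set's version number.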
Within the standardized information set, the basic attribute model provides detailed and precise descriptions of component physical characteristics, accurately capturing and digitally representing fundamental attributes such as unique identifiers, version numbers, production dates, and geometric and topological descriptions. The mathematical description of the basic attribute model is as follows:

A = {ID, G, TD, TC}

where ID represents the identification information of the physical entity in the standardized information set; G denotes the set of physical entity dimension attributes, including geometric dimensions, scale ranges, and geometric contours; and TD(i, j) is a matrix describing topological dependencies between components, including upstream and downstream related components. Specifically, TD ∈ {0, 1}N×N, where N denotes the number of components; TD(i, j) = 1 indicates that component i is an upstream dependency of component j, and TD(i, j) = 0 otherwise. TC is the set of topological constraint relationships between physical entities, including maximum connection numbers and compatibility rules.
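As an illustration, the binary dependency matrix TD can be assembled from declared upstream/downstream pairs; the component names and edge list below are hypothetical:

```python
# Build the binary topological-dependency matrix TD in {0,1}^(N x N),
# where TD[i][j] = 1 means component i is an upstream dependency of
# component j. Component IDs and edges are illustrative.

components = ["RWP", "DW", "P1"]        # rainwater pool -> distribution chamber -> pipeline
index = {c: k for k, c in enumerate(components)}
edges = [("RWP", "DW"), ("DW", "P1")]   # (upstream, downstream) pairs

N = len(components)
TD = [[0] * N for _ in range(N)]
for up, down in edges:
    TD[index[up]][index[down]] = 1

print(TD)  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```

Because TD is binary and directed, a row sum gives a component's number of downstream dependents, which can be checked against the connection limits recorded in TC.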
2. Physical Parameter Modeling
Within the standardized information set, the physical parameter model characterizes three aspects of physical components: real-time status, historical evolution, and responses to external events. Physical parameter modeling must meet two requirements: (1) it must cover the physical laws and data-driven rules that components follow during operation; and (2) it must include continuous or discrete descriptions of how state variables evolve over time. The mathematical model for the physical parameters is therefore as follows:

P = {S, Dk(t), E}

where S represents the static parameters of the physical entity, including the rated power, rated voltage, and rated flow rate of the equipment; Dk(t) denotes real-time operational state data during the physical entity’s working process, including real-time pressure, temperature, and flow rate, and satisfies the dynamic update equation dk(t) = dk(t − ∆t) + ∆dk(t), where ∆t is the sampling period and ∆dk(t) is the increment from t − ∆t to t; and E indicates environmental data of the physical entity, including ambient temperature, humidity, and environmental interference.
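The dynamic update rule dk(t) = dk(t − ∆t) + ∆dk(t) amounts to accumulating per-period increments onto the previous state; the sample values below are illustrative:

```python
# Incremental state update: d_k(t) = d_k(t - dt) + delta_d_k(t).
# Each sample adds the measured increment for one sampling period dt.
# The starting value and increments below are illustrative.

def update_state(history, increment):
    """Append the next state given the latest per-period increment."""
    history.append(history[-1] + increment)
    return history

d = [100.0]                      # d_k(t0), e.g. a flow-rate reading in m^3/h
for delta in [2.5, -1.0, 0.5]:   # delta_d_k at t1, t2, t3
    update_state(d, delta)

print(d)  # [100.0, 102.5, 101.5, 102.0]
```

Retaining the whole history list rather than only the latest value also covers the model's second aspect, the historical evolution of the state variable.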
3. Structural Material Modeling
The structural material model consists of material properties, manufacturing process attributes, environmental interaction parameters, and failure characteristics. Material properties provide the constitutive relationships of components, thereby supporting high-fidelity simulations and lifespan assessments. Manufacturing process attributes characterize the relationships between processing techniques, microstructure, and performance. Environmental interaction parameters describe the evolution patterns of coupled effects such as corrosion, oxidation, and irradiation in working environments. Material failure characteristics include long-term load-bearing failure features and instantaneous ultimate failure characteristics. The mathematical model of the structural material model is as follows:

M = {MAT, MFG, ENV(t), LOS}

where MAT represents the material properties, including material type, density, and thermal conductivity; MFG denotes the material manufacturing processes, in which fk represents a process parameter (such as welding technique, surface roughness, or dimensional tolerance), fstd is the standard process parameter, and τk is the allowable deviation, satisfying the inequality |fk − fstd| ≤ τk; ENV(t) represents the time-varying environmental durability parameters, including hygrothermal coupling coefficients, corrosion rates, and photodegradation rates; and LOS denotes the material failure characteristics, including fatigue life, creep life, and fracture toughness.
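A process parameter can be checked against the tolerance constraint |fk − fstd| ≤ τk as follows; the parameter names and values are illustrative, not taken from the case study:

```python
# Check manufacturing process parameters against |f_k - f_std| <= tau_k.
# The parameter table below is an illustrative assumption.

def within_tolerance(f_k, f_std, tau_k):
    """True if the measured process parameter stays inside the allowed band."""
    return abs(f_k - f_std) <= tau_k

params = {
    # name: (measured f_k, standard f_std, allowable deviation tau_k)
    "surface_roughness_um": (1.65, 1.60, 0.10),
    "dimensional_tol_mm":   (0.42, 0.30, 0.10),
}

results = {name: within_tolerance(*vals) for name, vals in params.items()}
print(results)  # {'surface_roughness_um': True, 'dimensional_tol_mm': False}
```

A component whose MFG check fails would then be flagged through the business parameter model's anomaly criteria rather than silently instantiated as a twin.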
4. Business Parameter Modeling
Within the standardized information set, the business parameter model encapsulates the behavioral rules and decision logic of components in operation, control, and management scenarios, abstracting industry standards and engineering experience into computable and orchestratable semantic units. This enables precise mapping and dynamic response of Digital Twins to business processes, thereby supporting adaptive state adjustment under complex working conditions. The mathematical model of the business parameters is as follows:

B = {CTL, SAF, MAINT, POL, AUD}

where CTL represents physical entity control strategies, including start-stop sequences, PID parameters, and priority rules. SAF denotes safety constraints, including anomaly thresholds, alarm levels, and fault tolerance strategies; here, dmin,k, dwarn,k, and dmax,k denote the three-level thresholds (lower limit, warning, and upper limit, respectively), and L represents the alarm level (normal, level 2 alarm, level 1 alarm): a level 2 alarm (L2) is triggered when dk(t) ∈ [dwarn,k, dmax,k], and a level 1 alarm (L1) is triggered when dk(t) > dmax,k or dk(t) < dmin,k. MAINT represents maintenance policies, such as periodic inspection cycles, condition-based maintenance thresholds, and emergency repair thresholds; here, mai1, mai2, and mai3 represent regular maintenance, condition-based maintenance, and emergency maintenance, respectively, where Treg denotes the regular maintenance cycle and dmaint,k the condition-based maintenance threshold. POL denotes policy configurations, including operator instruction sets and permission role tables. AUD represents auditing and tracing, encompassing operation logs, responsible person identifiers, and event timestamps.
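The SAF alarm rules can be sketched as a threshold classifier. Here the out-of-range case is labeled the highest-severity alarm (written "L1", matching the stated level set of normal, level 2, and level 1), and the threshold values are illustrative:

```python
# Three-level alarm classification from the SAF thresholds:
# normal if d stays below the warning band; "L2" if d is in
# [d_warn, d_max]; "L1" (highest severity) if d > d_max or d < d_min.
# Threshold values below are illustrative assumptions.

def alarm_level(d, d_min, d_warn, d_max):
    if d > d_max or d < d_min:
        return "L1"       # outside the safe operating range
    if d_warn <= d <= d_max:
        return "L2"       # inside the warning band
    return "normal"

# Example: pipeline flow velocity (m/s) with d_min=0.3, d_warn=2.0, d_max=2.5
print([alarm_level(v, 0.3, 2.0, 2.5) for v in [1.2, 2.2, 3.0, 0.1]])
# ['normal', 'L2', 'L1', 'L1']
```

Evaluating the out-of-range case first guarantees that a reading beyond dmax,k is never misclassified as a mere warning, even though it also satisfies dk(t) ≥ dwarn,k.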
The overall framework of the standardized information set is shown in Figure 3.
3.3. JSON-LD-Based Standardized Description and Construction Method for Information Sets
To achieve scalability and semantic consistency of information sets across platforms, we used JSON-LD as the structured data carrier. JSON-LD is a semantic data representation format based on JSON syntax that makes data semantics explicit through its context mechanism, thereby facilitating system interoperability and integration [52]. The format offers excellent readability and developer-friendliness, supports flexible schema extension, and strictly adheres to linked data principles. It enables seamless compatibility with other systems that use JSON-LD, making it suitable for the standardized representation of information sets.
The process for constructing information sets based on JSON-LD is shown in Figure 4. First, the structured, semi-structured, and unstructured data generated during the design, manufacturing, and operation phases are uniformly collected and preprocessed to form a heterogeneous raw dataset. Branch processing is then performed based on whether reusable domain ontologies exist: if available, the existing domain ontologies are reused; otherwise, new domain vocabularies must be constructed. Next, a JSON-LD skeleton is generated, and the data are mapped to the four core information categories (basic attributes, physical parameters, structural materials, and business parameters) using entity alignment and attribute-value population techniques to complete semantic encapsulation. Finally, all concepts and relationships are integrated to form a structured information set that comprehensively describes the characteristics of physical entities.
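As a minimal sketch of the skeleton-generation and population steps, the helper below builds a JSON-LD document with the four category slots; the @context URL, helper name, and all field values are placeholders:

```python
import json

# Build a JSON-LD skeleton for a component information set and populate
# its four core categories. The @context URL, function name, and all
# values are illustrative placeholders.

def build_information_set(component_type, component_id, **categories):
    skeleton = {
        "@context": "https://example.org/dt-context/water-treatment/v1",
        "@type": component_type,
        "@id": component_id,
        "basicAttributes": {},
        "physicalParameters": {},
        "structuralMaterial": {},
        "businessParameters": {},
    }
    skeleton.update(categories)  # attribute-value population step
    return skeleton

info_set = build_information_set(
    "Pipe", "WT_Plant_01::Pipe_01",
    basicAttributes={"ID": {"name": "pipeline1"}, "TD": {"upstreamComponent": "DS_01"}},
)
print(json.dumps(info_set, indent=2)[:80])  # serializes as ordinary JSON
```

Because JSON-LD is syntactically plain JSON, the resulting document can be stored, versioned, and parsed with standard JSON tooling while the @context keeps its semantics explicit.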
By integrating component characteristics and behavioral logic, the standardized information set establishes a single data source-driven mechanism for generating and updating Digital Twins. This ensures dynamic consistency between physical entities and digital models, enhances the construction efficiency and maintainability of component twins, provides a modular foundation for system-level Digital Twin integration, and supports machine-parsable model computation.
3.4. Graph Theory-Based System-Level Digital Twin Integration Method
Subsystem- and system-level Digital Twins comprise multiple components and their connection relationships. To achieve systematic integration of component-level Digital Twins, graph theory methods can be employed to represent and process system-level Digital Twins.
In graph theory, a graph consists of two types of elements, nodes and edges:

G = (V, E)

where V and E represent the sets of nodes and edges, respectively. In the context of system-level Digital Twins, individual components in the system are represented as nodes in the graph, whereas physical connections, mechanical couplings, and logical relationships between components are represented as edges.
The integration of Digital Twins based on topological connection graphs achieves hierarchical construction through adjacency matrices with the following specific process.
The connection relationships between components within a specific subsystem are extracted from the information set based on component attributes and topological information to construct a corresponding homogeneous adjacency matrix for that subsystem. This matrix encapsulates the internal topological structure of the subsystem and its external interfaces, forming an independently representable subsystem-level Digital Twin. At the system level, the homogeneous adjacency matrices of multiple subsystems are superimposed based on their interface relationships to integrate a system-level heterogeneous adjacency matrix. This matrix completely describes the overall system topology and serves as a connection blueprint for the system-level Digital Twin, enabling a unified topological reconstruction from the components to the system through matrix parsing.
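The superposition step can be sketched as follows: a homogeneous edge list is kept per relation type, and the system-level heterogeneous matrix records, for each ordered component pair, the set of relation types connecting them. The component IDs, relation types, and edges below are illustrative:

```python
# Superimpose homogeneous adjacency matrices (one per relation type)
# into a single heterogeneous system-level matrix. Each cell holds the
# set of relation types linking component i to component j.
# Component IDs, relation types, and edges are illustrative.

def empty_matrix(n):
    return [[set() for _ in range(n)] for _ in range(n)]

components = ["RWP", "DW", "P1", "Dosing"]
idx = {c: k for k, c in enumerate(components)}

hydraulic = [("RWP", "DW"), ("DW", "P1")]  # physical water-flow links
control   = [("Dosing", "DW")]             # logical dosing-control link

system = empty_matrix(len(components))
for rel_type, edges in [("hydraulic", hydraulic), ("control", control)]:
    for up, down in edges:
        system[idx[up]][idx[down]].add(rel_type)  # superposition step

print(system[idx["Dosing"]][idx["DW"]])  # {'control'}
```

Parsing this matrix back out recovers both the per-subsystem homogeneous views (filter by one relation type) and the full system topology, which is what makes it usable as the connection blueprint for bottom-up assembly.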
Figure 5 illustrates the topological connection process of the component-level digital twin.
4. Construction of a Simple Digital Twin Application Scenario for Water Purification Systems
The technical support system and core resource allocation for this research, which form the foundation for subsequent experimental case modeling and verification, are as follows. At the development hardware level, a terminal equipped with an Intel Core i7-12700H CPU, 32 GB DDR5 RAM, and an NVIDIA GeForce RTX 3060 graphics card was used for front-end coding, 3D scene debugging, and functional module integration. Front-end development was carried out in Visual Studio Code 1.85.1 for HTML structure construction, CSS styling, and JavaScript interaction logic development. The 3D visualization functionality was implemented using the Three.js 0.132.2 WebGL library in combination with the GLTFLoader plugin. The 3D model of the water supply system was created in Blender 3.5.1, consisting of 12 types of components—including RWP (rainwater pool), DW (distribution chamber), and P1–P12 (pipelines)—and exported in lightweight GLB format after polygon simplification in MeshLab 2022.02 to optimize web client loading performance.
Component parameters and operational standards strictly follow industry specifications such as the Standard for Design of Outdoor Water Supply Engineering (GB 50013-2018, Ministry of Housing and Urban-Rural Development of the People’s Republic of China, Beijing, China, 2018), providing a compliance basis for pipeline flow velocity thresholds, distribution chamber liquid level design, and component maintenance cycles. Additionally, a set of JSON-LD information sets was constructed to test the data retrieval functionality of platform components. Regarding process integrity, component selection covered the entire chain of the water supply system (water intake–pretreatment–filtration–disinfection–clean water transport), ensuring that the Digital Twin fully represents the operational workflow of the physical system.
To validate the proposed method, we selected a typical surface water treatment plant as an experimental case study. A top–down decomposition strategy was adopted to divide the system into four subsystems, namely “System Interaction with External Environment”, “Main Water Flow Treatment”, “Wastewater Recirculation”, and “Chemical Dosing”, based on the core functions, key characteristics, and target application scenarios of the surface water treatment plant.
The System Interaction with External Environment subsystem includes raw water intake and treated water output, including components such as the external environment, rainwater collection basin, and clear water reservoir. The Main Water Flow Treatment subsystem represents the core process of rainwater purification, including components such as distribution chambers, sedimentation tanks, siphon filters, and corresponding pipelines. The Wastewater Recirculation subsystem handles the recirculation and retreatment of backwash water and sludge, including components such as pipelines, valves, and corresponding control units. The Chemical Dosing subsystem is responsible for adding chemical agents to the main water flow to improve water quality, including components such as dosing equipment and corresponding chemical feed pipelines.
In summary, the division of the water purification system into four functional subsystems clarifies the role of each component in different processes, providing a clear structural basis for the construction of component information sets and system modeling.
4.1. Construction of Component-Level Digital Twins and Information Set Description
Each Digital Twin component is defined as an independent entity, with the core structure of the information set as follows: @context defines the semantic context to ensure unambiguous interpretation across components, @id serves as the globally unique identifier, @type describes the component type, basicAttributes describes basic attribute information, physicalParameters describes physical parameter information, businessParameters describes business parameter information, and structuralMaterial describes structural material information. This section provides detailed descriptions using the distribution chamber and pipeline1 as examples.
Listings 1 and 2 show the simplified information set models for the distribution chamber and pipeline1 (for the complete JSON-LD models, see Appendix A). In basicAttributes, the ID field defines metadata such as equipment name, model, and version; the G field describes the component’s geometric form, including diameter, height, and volume; and the TD field specifies the component’s position in the topology, identifying its upstream and downstream connections. In physicalParameters, the S field specifies the design operating conditions, and the D field represents the component’s real-time operational status, including dynamic sensor data such as inlet/outlet flow rates, water level, and turbidity. structuralMaterial defines information such as the component’s material, density, and corrosion resistance grade. businessParameters define the operational parameter logic, enabling digital management throughout the entire lifecycle.
Listing 1. Model of the distribution chamber information set.

{
  "@context": "https://example.org/dt-context/water-treatment/v1",
  "@type": "DistributionChamber",
  "@id": "WT_Plant_01::DS_01",
  "basicAttributes": { "ID": { … }, "G": { … }, "TD": { … }, "TC": { … } },
  "physicalParameters": { "S": { … }, "D": { … }, "E": { … } },
  "structuralMaterial": { "MAT": { … }, "MFG": { … }, "ENV": { … }, "LOS": { … } },
  "businessParameters": { "CTL": { … }, "SAF": { … }, "MAINT": { … }, "AUD": { … } }
}
Listing 2. Model of the pipeline1 information set.
{
  "@context": "https://example.org/dt-context/water-treatment/v1",
  "@type": "Pipe",
  "@id": "WT_Plant_01::Pipe_01",
  "basicAttributes": { … },
  "physicalParameters": { … },
  "structuralMaterial": { … },
  "businessParameters": { … }
}
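A minimal sketch of how such an information set can be parsed in Python is given below. The top-level keys (@context, @id, @type, basicAttributes, and the TD topology field) follow Listings 1 and 2; the concrete field values and sub-field names (name, model, upstreamComponent, downstreamComponent) are illustrative assumptions, not values taken from the case study.

```python
import json

# Abridged pipeline1 information set, following the structure of Listing 2.
# All concrete values below are illustrative placeholders.
pipe_doc = """
{
  "@context": "https://example.org/dt-context/water-treatment/v1",
  "@type": "Pipe",
  "@id": "WT_Plant_01::Pipe_01",
  "basicAttributes": {
    "ID": {"name": "pipeline1", "model": "DN100", "version": "1.0"},
    "TD": {"upstreamComponent": "WT_Plant_01::DS_01",
           "downstreamComponent": "WT_Plant_01::SP_01"}
  },
  "physicalParameters": {},
  "structuralMaterial": {},
  "businessParameters": {}
}
"""

component = json.loads(pipe_doc)

# @id is the globally unique identifier; the TD field exposes the
# upstream/downstream links used later for topology construction.
cid = component["@id"]
td = component["basicAttributes"]["TD"]
print(cid, td["upstreamComponent"], td["downstreamComponent"])
```

Because the document is plain JSON-LD, any platform that can parse JSON can extract the identifier and topology fields without component-specific code, which is what makes the components machine-readable and reusable.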
Through the construction of a standardized information set, each component in the case study possesses a machine-readable and semantically clear digital description, providing a nodal attribute foundation for subsequent topological connections. Each component is encapsulated as a self-contained independent unit that can be identified, parsed, and reused across different subsystems, and even cross-industry Digital Twins, significantly enhancing the reusability of the model.
Figure 6 and
Figure 7 respectively present the encapsulated standardized information sets for the distribution chamber and pipeline1.
4.2. Topology-Based System Integration and Twin Assembly
After completing the standardized construction of component-level Digital Twins, they must be integrated layer by layer into subsystem and system-level Digital Twins to support practical applications. Based on the described method, the first step is to construct a system adjacency matrix according to the component information sets, using this matrix as a structured blueprint for system integration. Components are subsequently assembled bottom–up layer by layer based on the topological rules defined by the adjacency matrix, ultimately completing the encapsulation and integration of the system-level Digital Twin.
The first step in system integration is to read each component's unique identifier from basicAttributes.ID in its information set, parse the upstreamComponent and downstreamComponent fields in basicAttributes.TD, and set weights at the corresponding positions of the adjacency matrix according to the relationship types. Finally, the multiple homogeneous adjacency matrices, each describing one type of relationship, are superimposed to form a complete description of the system's topological relationships.
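The step above can be sketched as follows. The component records and the short node identifiers (RWP, DW, SP, SFP, CWP, matching the abbreviations introduced with Figure 9) are illustrative; the paper's full information sets carry these links in basicAttributes.TD.

```python
import numpy as np

# Hypothetical component records: only the identifier and the downstream
# topology links matter for matrix construction.
components = [
    {"@id": "RWP", "TD": {"downstream": ["DW"]}},
    {"@id": "DW",  "TD": {"downstream": ["SP"]}},
    {"@id": "SP",  "TD": {"downstream": ["SFP"]}},
    {"@id": "SFP", "TD": {"downstream": ["CWP"]}},
    {"@id": "CWP", "TD": {"downstream": []}},
]

ids = [c["@id"] for c in components]
index = {cid: i for i, cid in enumerate(ids)}

# One homogeneous adjacency matrix per relationship type; only the
# water-flow matrix is built here. A weight of 1 marks a directed edge.
A_water = np.zeros((len(ids), len(ids)), dtype=int)
for c in components:
    for dst in c["TD"]["downstream"]:
        A_water[index[c["@id"]], index[dst]] = 1

print(A_water)
```

The same loop, run once per relationship type, yields the set of homogeneous matrices that are later superimposed.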
Figure 8 shows the representation form within the adjacency matrix, with the example object system being the water purification system shown in
Figure 9. Here, ENV represents the external environment, RWP denotes the rainwater collection pool, DW represents the distribution chamber, SP denotes the sedimentation pool, SFP represents the siphon filter pool, CWP denotes the clear water pool, DE denotes the dosing equipment, P1 indicates pipeline1, P2 indicates pipeline2, P3 represents pipeline3, and P4 represents pipeline4.
The system contains four independent flow paths, where the green edges represent interactions between the system and the external environment, blue edges indicate the water flow path, red edges indicate wastewater recirculation, and yellow edges represent chemical dosing, which are denoted by G, B, R, and Y, respectively. Therefore, four types of edges exist in the adjacency matrix.
In the water flow circuit, rainwater from the collection tank passes through the distribution chamber, sedimentation tank, and siphon filter before finally entering the clear water tank. In the chemical dosing circuit, chemicals are delivered from the dosing device to the sedimentation tank via Pipeline 2. In the sewage return circuit, wastewater from the clear water tank and the sedimentation tank is conveyed to the dosing device via Pipeline 4 and Pipeline 2, respectively. Furthermore, due to water evaporation, the rainwater collection tank and the clear water tank also interact with the external environment.
The topological structure of the system can thus be decomposed into four mutually independent isomorphic subgraphs, each containing only a single type of edge and corresponding to one homogeneous adjacency matrix. By superimposing these four isomorphic subgraphs, a complete topological description of the water purification system is obtained.
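The superposition step can be sketched on a toy three-node system. Only two of the four edge types (B for water flow, R for recirculation) are shown; the representation of the superimposed result as a per-type edge dictionary is one possible encoding, not necessarily the one used in the paper's implementation.

```python
import numpy as np

n = 3  # toy system with nodes 0, 1, 2

# One homogeneous adjacency matrix per edge type (G, B, R, Y in the paper);
# two types suffice to illustrate the superposition.
A_B = np.zeros((n, n), dtype=int); A_B[0, 1] = 1; A_B[1, 2] = 1  # water flow
A_R = np.zeros((n, n), dtype=int); A_R[2, 0] = 1                 # recirculation

# Superposition: combine the homogeneous matrices into one labelled
# heterogeneous description while keeping the edge types distinguishable.
layers = {"B": A_B, "R": A_R}
hetero = {etype: [(int(i), int(j)) for i, j in zip(*np.nonzero(A))]
          for etype, A in layers.items()}
print(hetero)  # {'B': [(0, 1), (1, 2)], 'R': [(2, 0)]}
```

Keeping the edge types as separate labels (rather than summing the matrices numerically) preserves the semantics of each flow path after superposition.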
The integration engine uses the adjacency matrix as input and automatically identifies and assembles components by executing graph theory algorithms. The integration engine reads the adjacency matrix of the water flow links, automatically finds paths from the source node (rainwater collection pool) to the sink node (clear water pool), retrieves the corresponding Digital Twin instances from the component library based on the identified component list, automatically performs logical binding of interfaces according to the connection relationships, and finally encapsulates these components and their internal relationships into an independent, identifiable subsystem-level Digital Twin.
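The source-to-sink path search performed by the integration engine can be sketched with a breadth-first search over the water-flow adjacency matrix. The chain topology and node abbreviations are the toy version of Figure 9; the paper does not specify which graph algorithm its engine uses, so BFS is an assumption here.

```python
from collections import deque

import numpy as np

# Water-flow adjacency matrix for RWP -> DW -> SP -> SFP -> CWP.
ids = ["RWP", "DW", "SP", "SFP", "CWP"]
A = np.zeros((5, 5), dtype=int)
for i in range(4):
    A[i, i + 1] = 1

def find_path(A, src, dst):
    """Breadth-first search over the adjacency matrix; returns one
    source-to-sink node path as a list of indices, or None."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in np.nonzero(A[u])[0]:
            v = int(v)
            if v not in prev:
                prev[v] = u
                q.append(v)
    return None

path = find_path(A, ids.index("RWP"), ids.index("CWP"))
print([ids[i] for i in path])  # ['RWP', 'DW', 'SP', 'SFP', 'CWP']
```

The returned component list is what the engine would use to retrieve the matching Digital Twin instances from the component library and bind their interfaces in order.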
The final system-level integration generates the complete system-level Digital Twin by reading the heterogeneous adjacency matrix formed through the superposition of the four isomorphic subgraphs.
The integrated system is shown in
Figure 10, with a functional menu area at the top of the page supporting the switching and viewing of different components. The left side of the interface is the 3D visualization area for the digital twin, displaying, from left to right, a clear water tank, a siphon filter, a sedimentation tank, a water distribution chamber, a chemical dosing device, and a rainwater collection tank. Blue pipelines represent the water flow lines, yellow pipelines represent the chemical dosing lines, and red pipelines represent the sewage return lines. The right side of the interface is a component information display area, which can present the corresponding dataset information for each component.
4.3. Model Reusability Analysis
The reusability of this method rests on the independence and completeness of the component-level Digital Twins. Pipeline 1, constructed in the preceding example, encapsulates the component's basic attribute, physical parameter, structural material, and business parameter information. Constructing a new pipeline only requires updating the corresponding fields of Pipeline 1: apart from main parameters such as pipeline length, inner diameter, and shape, most other attributes, such as the material MAT, the manufacturing process MFG, and the interface type TC.connectionCompatibility, can be reused directly without change.
The modified JSON-LD file is then imported into the new project's Digital Twin construction platform, a new adjacency matrix is generated to define the topological connections between this pipeline and the other components of the new system, the pipeline's interfaces are bound to the new components, and the Digital Twin encapsulation is completed.
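The reuse step can be sketched as a copy-and-update of the Pipeline 1 information set. The field values (length, diameter, material) are illustrative placeholders, not data from the case study; only the update pattern reflects the method described above.

```python
import copy
import json

# Pipeline 1's information set (abridged; all values are illustrative).
pipe1 = {
    "@context": "https://example.org/dt-context/water-treatment/v1",
    "@type": "Pipe",
    "@id": "WT_Plant_01::Pipe_01",
    "basicAttributes": {"G": {"length_m": 12.0, "innerDiameter_mm": 100}},
    "structuralMaterial": {"MAT": {"material": "ductile iron"}},
}

# Reuse: deep-copy the component, then update only the identifier and
# the main geometric parameters; MAT and the other encapsulated
# attributes carry over unchanged.
pipe_new = copy.deepcopy(pipe1)
pipe_new["@id"] = "WT_Plant_02::Pipe_01"
pipe_new["basicAttributes"]["G"].update({"length_m": 8.5,
                                         "innerDiameter_mm": 150})

print(json.dumps(pipe_new, indent=2))
```

The deep copy keeps the original Pipeline 1 model untouched in the component library, so both instances can coexist in different systems.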
The system-level digital twin constructed based on the aforementioned component reuse and system integration methods incorporates eight water flow pipelines, one clear water tank, one siphon filter, and one sedimentation tank. The integration results are shown in
Figure 11 and
Figure 12.
By reusing components with complete functionality and well-encapsulated data, integration complexity and error risk are significantly reduced, redundant modeling is avoided, and substantial time and labor costs are saved, supporting engineering optimization.
The reusability of this method is not limited to the water treatment industry. The siphon filter unit is essentially a gravity-based filtration and backwashing mechanism, whose core functionality holds a universal reference value for equipment with similar filtration-separation functions in industries such as chemical, petroleum, and food processing. By establishing higher-level cross-industry ontologies to map the information set’s @context, components constructed using this method can potentially find reuse scenarios across different domains, providing a viable technical pathway towards achieving a truly universal cross-industry modeling system.
4.4. Topology Connection Method Comparison
To validate the engineering practicality and efficiency advantages of adjacency matrix-based topology connections over those generated by traversing information set data, a comparative experiment was designed. The experiment simulated variations in the scale of digital twin system components, using digital twin generation time as the evaluation metric to quantitatively assess efficiency differences between the two methods and identify their respective applicable scenarios.
The experiment was conducted in a Python 3.10.8 environment, primarily using the random and numpy packages to simulate the generation of component information sets and to construct and manipulate adjacency matrices, respectively. First, a component information set containing attributes such as component ID, geometric parameters, and upstream/downstream topological relationships was generated. Subsequently, two topology connection generation methods were implemented: one directly traverses the component information set to establish connection relationships, while the other first constructs an adjacency matrix and then establishes connections based on the matrix. To ensure accurate measurement of generation efficiency, both methods used the time library to record timestamps at the start and end of function execution, with the time difference representing the actual time required for topology connection generation.
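The skeleton of this comparative experiment can be sketched as follows. The synthetic chain topology, function names, and repeat count are illustrative stand-ins for the paper's setup; only the two contrasted strategies (direct traversal versus matrix-then-connect) and the timing scheme mirror the description above.

```python
import time

import numpy as np

def make_components(n):
    """Generate n synthetic component records in a simple chain topology."""
    return [{"id": i, "down": [i + 1] if i < n - 1 else []} for i in range(n)]

def connect_direct(components):
    """Direct traversal: read each record and emit its connections."""
    return [(c["id"], d) for c in components for d in c["down"]]

def connect_matrix(components):
    """Adjacency-matrix route: build the matrix first, then read the
    connections off it."""
    n = len(components)
    A = np.zeros((n, n), dtype=np.int8)
    for c in components:
        for d in c["down"]:
            A[c["id"], d] = 1
    rows, cols = np.nonzero(A)
    return list(zip(rows.tolist(), cols.tolist()))

def mean_time(fn, components, repeats=100):
    """Mean wall-clock time over repeated runs, as in the experiment."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        fn(components)
    return (time.perf_counter() - t0) / repeats

comps = make_components(50)
assert connect_direct(comps) == connect_matrix(comps)  # same edge set
print(mean_time(connect_direct, comps), mean_time(connect_matrix, comps))
```

time.perf_counter is used here instead of raw timestamps from the time library's lower-resolution clocks, since it is the standard choice for short-interval benchmarking; the averaging over repeats matches the paper's mitigation of memory-allocation jitter.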
In this experiment, the number of system components was gradually increased from 10 to 100 in increments of 1, simulating the progressive expansion of component scale in a water purification system and enabling comparison of the efficiency of the two topology connection generation methods. To minimize fluctuations caused by non-theoretical factors such as memory allocation delays, 100 repeated tests were conducted for each component scale, and the mean time consumption was used as the final result. The results showed that when the number of components ranged from 10 to 28, the direct information set traversal method was, on average, 26.63% faster than the adjacency matrix method. When the number of components increased to 29–47, the speeds of the two methods converged (
Figure 13). Beyond 47 components, the adjacency matrix-based method surpassed the direct traversal method in efficiency, averaging 18.61% faster. As the component scale continued to expand beyond 300 (
Figure 14), the speed advantage of the adjacency matrix-based method became increasingly pronounced, averaging 44.69% faster than the direct information set traversal method.
4.5. Assessment of Resource Utilization
To further validate the engineering efficiency of the adjacency matrix-based (AM) method versus the direct information set traversal (DIT) method for digital twin generation, we conducted a comparative experiment on resource utilization.
We selected three metrics: CPU utilization, memory consumption (excluding cache), and GPU utilization, and recorded them using the built-in Windows Task Manager.
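As a programmatic stand-in for the Task Manager readings, memory behavior can also be observed with the standard library's tracemalloc module, which modern NumPy registers its data allocations with. This is a stdlib-only sketch of the measurement idea, not the instrumentation used in the experiment.

```python
import tracemalloc

import numpy as np

def peak_memory_bytes(fn, *args):
    """Peak Python-level allocation while fn runs: a stdlib stand-in for
    the Task Manager memory readings used in the experiment."""
    tracemalloc.start()
    fn(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def build_matrix(n):
    # Dense n x n adjacency matrix, as in the AM method.
    return np.zeros((n, n), dtype=np.float64)

small = peak_memory_bytes(build_matrix, 10)
large = peak_memory_bytes(build_matrix, 300)
print(small, large)  # the dense-matrix footprint grows quadratically with n
```

This also makes the AM method's fixed cost visible: the dense matrix consumes memory proportional to n², which is why DIT is cheaper at small component counts.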
To ensure consistency, the experiment was conducted in the same hardware and software environment as
Section 4. For each scenario, 100 repeated tests were performed to eliminate random fluctuations, and the mean value was used as the final result. The resource utilization results across scenarios are summarized in
Table 2.
As shown in
Table 2, when the number of components is fewer than 28, the DIT method reduces CPU utilization by an average of 20.1% and memory consumption by an average of 15.8% compared to the AM method. This is because AM requires additional overhead for matrix initialization and storage, whereas DIT directly parses relationships without the need for intermediate data structures.
When the component count ranges from 29 to 47, the differences in CPU and memory usage between the two methods begin to diminish.
When the number of components exceeds 47, the AM method reduces CPU utilization by an average of 28.9% and memory consumption by an average of 11.3%. The primary reason is that the O(n²) complexity of DIT leads to a rapid, quadratic rise in both CPU occupancy and memory consumption as the component count grows.
Since the 3D visualization workload is minimal for both methods, their GPU utilization rates are nearly identical.
Therefore, it can be concluded that the AM method demonstrates superior resource efficiency in large-scale digital twin systems. This finding is consistent with the topology connection efficiency results presented in
Section 4.4, confirming the suitability of the AM method for practical engineering applications.
5. Discussion
This study proposed a novel Digital Twin construction approach to address the issues of low reusability and poor cross-industry generality in Digital Twin models. The proposed method first divides the system into subsystems and components based on the degree of functional aggregation and physical coupling relationships, achieves topological integration of component-level Digital Twins through constructing adjacency matrices, and ultimately generates system-level Digital Twins. The main contributions of this study are summarized below.
A three-level hierarchical architecture of system, subsystem, and component levels was constructed. Through stepwise refinement, cross-scale, highly coupled complex systems are transformed into multiple independently implementable simple Digital Twin components, while the integration and coordination of multilevel Digital Twins are achieved via unified interface specifications and information set models.
An information set model and graph theory methods were introduced to achieve topological integration and system encapsulation of component-level Digital Twins. The information set model integrates the component’s basic attribute information, structural material information, physical parameter information, and business parameter information, thereby achieving a unified representation and semantic integration of multi-source heterogeneous data. A system topological network was constructed with components as nodes and connection relationships as edges, generating subsystem and system-level Digital Twins based on adjacency matrices corresponding to isomorphic subgraphs and heterogeneous graphs, respectively, and achieving automated cross-level construction. Notably, the adjacency matrix-based method demonstrated high scalability and efficiency, generating topology connections 44.69% faster than the direct information set traversal method when the number of components exceeded 300.
Validation was performed by reading JSON-LD formatted information set data and adjacency matrices to achieve Digital Twin assembly from component to system level using a water purification system as a case study, and multiple scenario examples were constructed to verify the feasibility of the proposed method.
Integrating the contributions and case validations of this research, the study demonstrates that the proposed three-tier hierarchical architecture and adjacency matrix-based topology integration method effectively reduce the modeling complexity of intricate systems through component-based decomposition. By employing unified information sets and graph theory, the approach enables standardized digital twin construction while maintaining efficient topology generation even at large component scales, providing a viable technical pathway for the scalable application of digital twins.
However, this study has certain limitations. First, the proposed framework lacks a topology reconstruction mechanism for component failure or replacement; the adjacency matrix cannot automatically update connection relationships during faulty component replacement. This necessitates manual topology redefinition, which prolongs fault response cycles and fails to meet industrial real-time requirements. Second, the current framework does not address the need for real-time data updates or dynamic topology adjustments during digital twin operation. Although suitable for static modeling, its model iteration capability in dynamic scenarios requires further enhancement.
Despite these limitations, the study retains both theoretical and practical significance. Theoretically, the three-tier hierarchical architecture enriches the layered modeling theory of digital twins, and integrating information sets with graph theory provides a new paradigm for the semantic fusion of multi-source heterogeneous data. Practically, component-based reuse and standardized integration reduce modeling costs and timelines, deliver standardized solutions for process industries and energy systems, support large-scale deployment through efficient topology generation, and facilitate cross-industry scalable implementation of digital twin technology.
6. Conclusions
This study aims to address the long-standing limitations of Digital Twin (DT) technology, including the lack of a universal cross-industry modeling framework, low component reusability, and high development costs, which have hindered the application of system-level digital twins. To tackle these issues, a hierarchical decoupling and graph theory-based topological connection method for digital twin modeling is proposed and validated using a water purification system.
First, a system is decomposed top–down based on functional aggregation and physical coupling to establish a three-tier hierarchical architecture consisting of System, Subsystem, and Component. This architecture decomposes complex systems into independent, reusable digital twin components, reducing modeling complexity while enabling standardized integration through a unified interface specification.
Second, a component-level digital twin construction method based on a Standardized Information Set (SIS) is developed. The SIS encompasses basic attribute information, physical parameter information, structural material information, and business parameter information, and employs JSON-LD to achieve cross-platform semantic consistency, realizing a data-driven DT generation mechanism that significantly enhances component reusability.
Finally, a topological integration mechanism based on an adjacency matrix is proposed for the bottom–up assembly of system-level digital twins. Experimental results show that when the number of components exceeds 300, the topological connection generation speed is 44.69% faster than direct information set traversal; for systems with more than 47 components, the average CPU utilization and memory consumption are reduced by 28.9% and 11.3%, respectively. The water purification system case further validates that this method enables accurate and standardized system-level digital twin assembly, shortening the modeling cycle.
Theoretically, this study enriches digital twin modeling theory and provides a new paradigm for multi-source heterogeneous data fusion. Practically, it offers a cost-effective and scalable approach for digital twin and model construction in industrial systems.