This section proposes a threat source evolution analysis framework tailored for near-field networks, which overcomes the limitations of traditional control flow graphs (CFGs) in behavior modeling through semi-physical simulation techniques. The framework establishes a dynamic and interactive analytical system across three dimensions: structure, behavior, and state. At the structural modeling level, a semi-physical CFG construction method is introduced, which integrates physical and protocol emulation to achieve high-fidelity reconstruction of hardware behaviors and communication logic. At the behavior and state evolution levels, the framework incorporates complex network theory and propagation dynamics models. By integrating CFG topological evolution (based on the Barabási–Albert model), threat propagation paths (via the Susceptible–Infectious–Recovered model), and dynamic reconfiguration mechanisms, it enables quantitative analysis of threat diffusion pathways and identification of critical nodes.
3.1. Overall Framework
This study aims to develop a control flow graph (CFG) construction and threat evolution modeling system for near-field networks in smart grids, based on semi-physical simulation. The system comprises four core modules: physical simulation, protocol simulation, control flow integration, and threat evolution modeling. These modules collectively form a three-dimensional analysis framework that incorporates structural, behavioral, and state-related dimensions, as shown in Figure 1.
First, the physical simulation module captures the input–output characteristics of critical devices by modeling their operational behaviors, thereby ensuring the authenticity of device behavior within the simulation environment. The protocol simulation module focuses on the analysis and emulation of communication protocols between devices, with particular attention to complex proprietary protocols, to faithfully reproduce communication dynamics. The control flow integration module fuses the control flow information obtained from both physical and protocol simulations to construct a unified CFG, which comprehensively reflects the system’s operational logic and data flow. Finally, the threat evolution modeling module leverages the unified CFG and applies complex network theory and propagation models to simulate the diffusion paths and evolution processes of threats within the system.
The framework is designed to address the inherent complexity and dynamic nature of smart grid near-field networks. Its modular architecture ensures the independence of each component while enabling seamless integration across the system. Furthermore, the framework supports flexible scalability, allowing for reconfiguration of individual modules to accommodate simulation and analysis tasks of varying scale and complexity. Through this design, this study establishes a high-fidelity and extensible semi-physical simulation and threat evolution modeling system, providing a robust technical foundation for security analysis and defense strategy development in smart grid near-field networks.
3.2. Semi-Physical Simulation CFG
We propose a semi-physical simulation-based method for constructing control flow graphs (CFGs) of networked systems, which achieves precise modeling of threat evolution mechanisms through the close integration of software simulation and hardware testing. The core innovations of this approach are as follows. First, by integrating real physical devices with virtual components in a controllable environment, the method ensures the authenticity of critical behavioral features while avoiding potential security risks to actual systems. Second, the system allows the ratio between physical devices and virtual components to be adjusted flexibly according to research needs, enabling cost-effective experimentation without compromising performance. Third, the semi-physical simulation environment supports rapid scenario reconfiguration and repeated testing, and it allows progressive scaling in system size and complexity. Finally, the approach enables the deployment of monitoring nodes across multiple system layers, facilitating the acquisition of multidimensional experimental data and overcoming the limitations of conventional monitoring techniques in pure hardware settings. These technical advantages make the semi-physical simulation approach particularly well-suited for CFG construction in smart grid near-field networks, providing a reliable technical foundation for the development of threat evolution models.
3.2.1. Physical Emulation
In the context of physical simulation studies for complex systems, we propose three categories of physical device simulation methods, each tailored to specific device characteristics such as openness, protocol transparency, and hardware dependency. First, for open-source devices that provide full access to device code or protocol stacks and support custom modification and integration, we adopt solutions based on well-established technologies. These include source code integration, Docker-based containerized deployment, and adaptation through hardware abstraction layers (HAL). Second, for devices with complex logic that are difficult to simulate, such as those relying on proprietary hardware components like RF chips, cryptographic modules, or high-precision ADC sampling units with intricate behaviors, we employ a direct physical interfacing approach. This method ensures accurate emulation of hardware-specific functionalities that cannot be effectively replicated in a purely virtual environment. Third, for the majority of non-simulatable units or devices with relatively simple logic, which represent a significant portion of near-field networks and pose a key challenge in physical simulation, we adopt a reverse engineering-based modeling approach. This involves analyzing device input–output behavior through techniques such as fuzz testing and behavioral inference to construct accurate simulation models. The implementation of this approach is detailed below.
In the implementation of physical simulation, particularly for emulating the control logic of concentrator firmware, this study captures key function calls in real time. For example, the RS-485 packet parsing function ‘parse_485_data()’ and the HPLC command issuance function ‘send_hplc_command()’ are monitored, and their outputs are recorded. These outputs are then processed using custom Python 3.8.20 scripts to perform fuzz testing on RS-485 packets. The process consists of two main phases: static analysis and guided fuzz testing. During the static analysis phase, a third-party library dependency graph is first constructed using third-party library detection techniques. Next, a function call graph is analyzed for the binary program and for each of its direct or indirect third-party dependencies; critical functions within each library are identified, and a complete function call graph of the overall binary program is constructed. Based on this unified graph, potentially vulnerable functions are identified, and seed selection is guided by a key-point recognition mechanism. Key points are defined as boundary functions of library components or functions that lie on paths reaching vulnerable target functions. The priority of a fuzzing seed is evaluated from the number of key points on its execution path and the proximity of those key points to the target vulnerable functions: seeds that pass through more key points, and whose key points lie closer to the targets, receive higher priority. In parallel with key-point recognition, a directed utility analysis produces three indicators that guide energy allocation for fuzzing seeds: (1) function-level distance, the average distance from each function on the execution path to the target vulnerable function; (2) predecessor function count, the total number of functions on the path that can reach the target function; and (3) function priority, determined by the shortest distance to the target vulnerable function within the path, where shorter distances imply higher priority. Given a seed s and its execution path, the energy assigned to s is computed from these three indicators.
During the fuzz testing phase, this study optimizes the seed input selection strategy using the key-point information obtained in the static analysis stage. The energy allocation algorithm is further refined by normalizing the three indicators: function-level distance, number of predecessor functions, and function priority. This optimization improves the efficiency of path exploration during fuzzing, accelerates the triggering of supply chain-related vulnerabilities in the program, and increases the efficiency of proof-of-concept (PoC) test case generation, as illustrated in Figure 2.
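The seed-scoring step described above can be illustrated with a minimal Python sketch. It computes the three indicators from a call graph by breadth-first search toward the target function, min-max normalizes each indicator across seeds, and combines them into an energy value. The call-graph format, the min-max normalization, the equal weights, the base energy of 100, and the callee names `crc_check`, `memcpy_wrapper`, and `log_event` are all hypothetical; the study's own energy formula is not reproduced here.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

CallGraph = Dict[str, Set[str]]  # caller -> set of callees


def distances_to_target(cg: CallGraph, target: str) -> Dict[str, int]:
    """Hop distance from every function that can reach `target` (BFS on the reversed call graph)."""
    callers: Dict[str, Set[str]] = {}
    for caller, callees in cg.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, set()):
            if caller not in dist:
                dist[caller] = dist[node] + 1
                queue.append(caller)
    return dist


def indicators(path: List[str], dist: Dict[str, int]) -> Tuple[float, int, float]:
    """(average distance, predecessor count, priority) for one seed's execution path."""
    reaching = [dist[f] for f in set(path) if f in dist]
    if not reaching:
        # Hypothetical convention: a path that never reaches the target gets the worst scores.
        return float(len(dist)), 0, 0.0
    avg_distance = sum(reaching) / len(reaching)
    priority = 1.0 / (1 + min(reaching))  # shorter shortest-distance -> higher priority
    return avg_distance, len(reaching), priority


def min_max(values: List[float]) -> List[float]:
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # no spread: treat every seed as neutral
    return [(v - lo) / (hi - lo) for v in values]


def allocate_energy(paths: Dict[str, List[str]], cg: CallGraph, target: str,
                    base: int = 100, weights=(1 / 3, 1 / 3, 1 / 3)) -> Dict[str, int]:
    """Assumed weighted combination of the three normalized indicators."""
    dist = distances_to_target(cg, target)
    stats = [indicators(p, dist) for p in paths.values()]
    d_norm = min_max([s[0] for s in stats])
    n_norm = min_max([float(s[1]) for s in stats])
    p_norm = min_max([s[2] for s in stats])
    w_d, w_n, w_p = weights
    energy = {}
    for name, d, n, p in zip(paths, d_norm, n_norm, p_norm):
        score = w_d * (1.0 - d) + w_n * n + w_p * p  # closer, more predecessors, higher priority
        energy[name] = max(1, round(base * score))  # every seed keeps at least 1 unit
    return energy


if __name__ == "__main__":
    cg = {
        "main": {"parse_485_data", "send_hplc_command"},
        "parse_485_data": {"crc_check", "memcpy_wrapper"},
        "send_hplc_command": {"log_event"},
    }
    paths = {
        "seed_a": ["main", "parse_485_data", "memcpy_wrapper"],
        "seed_b": ["main", "send_hplc_command", "log_event"],
    }
    print(allocate_energy(paths, cg, "memcpy_wrapper"))  # {'seed_a': 100, 'seed_b': 1}
```

Here the seed that reaches the hypothetical vulnerable function through `parse_485_data()` receives almost all of the energy. Min-max normalization was chosen only because it keeps the three indicators on a common [0, 1] scale; other normalizations would serve equally well in a sketch.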
Following this, the firmware’s state space is systematically explored using disassembly and cross-reference analysis techniques in combination with the angr symbolic execution framework. Based on the outcomes of this analysis, the complete initialization process of the target device is reconstructed, and a high-fidelity device emulation model is then developed accordingly.
To address the potential combinatorial explosion of physical and control state spaces during reverse modeling, we adopt a multi-layered strategy. First, we apply constraint-guided symbolic execution, where input constraints derived from protocol specifications and known firmware registers are used to prune infeasible execution paths during path exploration. Second, we implement hierarchical abstraction by clustering CFG nodes based on protocol roles (e.g., handshake routines, authentication, data transmission) and constructing higher-level state transitions. Third, we employ region-based instrumentation by selectively monitoring address ranges and control logic blocks relevant to known threat vectors, significantly reducing the analysis scope. These techniques enable scalable modeling without compromising semantic fidelity, especially when dealing with large binaries or closed-source control logic.
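The hierarchical abstraction and region-based instrumentation steps can be sketched together in Python: basic blocks are mapped to protocol roles by address range, blocks outside every monitored range are ignored, and intra-role edges collapse so that only role-level state transitions remain. The address ranges, role names, and function names are hypothetical; in practice the ranges would come from the firmware analysis described above.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (role, start address inclusive, end address exclusive); values are hypothetical.
RoleRange = Tuple[str, int, int]


def role_of(addr: int, ranges: List[RoleRange]) -> Optional[str]:
    """Return the protocol role of a basic block, or None if it lies outside every monitored region."""
    for role, start, end in ranges:
        if start <= addr < end:
            return role
    return None


def abstract_cfg(edges: Iterable[Tuple[int, int]], ranges: List[RoleRange]) -> Set[Tuple[str, str]]:
    """Collapse basic-block edges into role-level state transitions."""
    transitions: Set[Tuple[str, str]] = set()
    for src, dst in edges:
        a, b = role_of(src, ranges), role_of(dst, ranges)
        if a is None or b is None:
            continue  # region-based instrumentation: unmonitored code is ignored
        if a != b:
            transitions.add((a, b))  # intra-role edges disappear inside the cluster
    return transitions


if __name__ == "__main__":
    ranges = [("handshake", 0x1000, 0x1100), ("auth", 0x1100, 0x1200), ("data", 0x1200, 0x1300)]
    edges = [(0x1000, 0x1010), (0x1010, 0x1100), (0x1100, 0x1200), (0x1200, 0x1300), (0x1200, 0x1000)]
    print(sorted(abstract_cfg(edges, ranges)))
    # [('auth', 'data'), ('data', 'handshake'), ('handshake', 'auth')]
```

Five basic-block edges reduce to three role-level transitions, which is the kind of reduction that keeps the state space tractable for large binaries.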
3.2.2. Protocol Emulation
At the current stage, simulation techniques for devices utilizing publicly available and standardized protocols have become relatively mature. However, when dealing with devices that rely on complex proprietary protocols, the workload associated with protocol reverse engineering increases significantly, and simulating composite protocols remains a major technical challenge. This section focuses on the study of a unified approach to the simulation of composite protocols in such devices.
We propose two process identification methods that enable precise tracking of the execution paths of protocol service programs under test by performing accurate process recognition within a QEMU-based instrumentation tool. This approach overcomes the technical bottlenecks caused by variations in daemon or subprocess mechanisms across different devices during protocol service implementation. Through a unified interface design, it achieves standardized identification of heterogeneous protocol software processes, thereby facilitating execution path monitoring. QEMU is configured to run in system emulation mode, supporting full binary execution of embedded firmware images. To enable fine-grained control flow tracing within QEMU, we modified its TCG (Tiny Code Generator) backend to support custom instruction-level callbacks. Specifically, we inserted lightweight hooks that monitor the Program Counter (PC) and Stack Pointer (SP) during runtime to identify process context switches and stack frame boundaries. For address-triggered registration, we implemented a dynamic memory access monitor that registers code regions based on predefined address ranges extracted from firmware analysis or binary disassembly. When execution enters these monitored regions, a registration event is triggered, and the corresponding function label or behavior is recorded. All logs are timestamped and stored in an event trace buffer for subsequent CFG reconstruction. This mechanism ensures accurate isolation and tracking of protocol-specific execution paths across different device emulations.
Greybox fuzz testing relies heavily on obtaining execution traces of the target protocol service, which serve as crucial feedback for evaluating test case effectiveness. To accommodate diverse firmware protocol programs, the instrumentation tool must accurately capture the execution path of the relevant services (S) within the firmware. This requires a careful balance: monitoring too broadly introduces noise from unrelated components, whereas monitoring too narrowly may omit critical information from the target service and its dependent libraries.
To address this, we propose a path recording adaptation technique tailored for proprietary protocol programs across different devices. Deeply integrated into the instrumentation tool, this technique provides a standardized user interface to enable differentiated recording of key execution paths across various programs. Specifically, the protocol interface testing adaptation mechanism must fulfill the following core function: accurately identify the target protocol service process within the firmware environment, and comprehensively record the execution path of its critical test logic in the form of a basic block address sequence.
To address the challenge of process identification, the technique incorporates two distinct identification modes. The first mode is designed for resident protocol service programs whose process identifiers (PIDs) remain constant. It employs a hybrid identification approach based on the program counter (PC) and the stack pointer (SP). Specifically, users provide the instrumentation tool with the PC and SP values of the target process at a particular execution point. By continuously monitoring these two registers at runtime, the tool recognizes the target process: when the observed register values match the user-specified values, the process is deemed to be the intended one. The second mode is tailored for subprocess-based protocol service programs, whose PID changes dynamically. In this case, we introduce an identification method that combines the PC value with code signature matching. Because register information for such subprocesses is often difficult to obtain through conventional debugging methods, users provide both the PC value at a specific location and the binary instruction value at that address. As in the first mode, the instrumentation tool monitors the PC register and reads the code at the specified location; when both the register value and the code pattern match the user-provided inputs, the process is identified as the critical one to be traced.
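The two identification modes can be sketched in Python as predicates over a snapshot of guest CPU state taken at each instrumentation callback. The `CpuSnapshot` structure, the matcher names, and the example addresses and instruction bytes are hypothetical; in the actual tool these checks would run inside the modified QEMU TCG hooks rather than in Python.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class CpuSnapshot:
    """Register/code view at one instrumentation callback (hypothetical structure)."""
    pc: int
    sp: int
    code_at_pc: bytes  # instruction bytes fetched from guest memory at pc


Matcher = Callable[[CpuSnapshot], bool]


def pc_sp_matcher(pc: int, sp: int) -> Matcher:
    """Mode 1: resident service with a constant PID -> match both PC and SP."""
    return lambda s: s.pc == pc and s.sp == sp


def pc_code_matcher(pc: int, code: bytes) -> Matcher:
    """Mode 2: subprocess with a changing PID -> match PC and the instruction bytes at PC."""
    return lambda s: s.pc == pc and s.code_at_pc[:len(code)] == code


def first_match(snapshots: Iterable[CpuSnapshot], matcher: Matcher) -> Optional[int]:
    """Index of the first snapshot identified as the target process, or None."""
    for i, snap in enumerate(snapshots):
        if matcher(snap):
            return i
    return None


if __name__ == "__main__":
    snaps = [
        CpuSnapshot(0x400100, 0x7FF0, b"\x00"),
        CpuSnapshot(0x400200, 0x7FE0, b"\xe9\x2d\x40\x00"),  # same code, different stack
        CpuSnapshot(0x400200, 0x7FD0, b"\xe9\x2d\x40\x00"),
    ]
    print(first_match(snaps, pc_sp_matcher(0x400200, 0x7FD0)))    # 2
    print(first_match(snaps, pc_code_matcher(0x400200, b"\xe9\x2d")))  # 1
```

The example shows why the two modes differ: the PC/SP matcher distinguishes two executions of the same code by their stack, while the PC/code matcher accepts any process that executes the expected instruction bytes at that address.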
For execution path recording, the proposed method implements an address-triggered recording mechanism. Users simply specify a start address and an end address for trace logging. Once the execution flow of the target process reaches the designated start address, the instrumentation tool automatically begins recording and captures the complete execution trace until the end address is reached.
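The address-triggered recording mechanism amounts to a small state machine over the stream of basic-block addresses executed by the already-identified target process. The Python sketch below assumes that stream is available as a list of integers; the function name and addresses are hypothetical.

```python
from typing import Iterable, List


def record_trace(block_addrs: Iterable[int], start: int, end: int) -> List[int]:
    """Record basic-block addresses from `start` through `end` (inclusive).

    Recording begins the first time `start` is executed and stops once `end`
    is executed; if `end` never occurs, the rest of the stream is kept.
    """
    trace: List[int] = []
    recording = False
    for addr in block_addrs:
        if not recording and addr == start:
            recording = True
        if recording:
            trace.append(addr)
            if addr == end:
                break
    return trace


if __name__ == "__main__":
    executed = [0x10, 0x20, 0x30, 0x40, 0x50]
    print([hex(a) for a in record_trace(executed, 0x20, 0x40)])  # ['0x20', '0x30', '0x40']
```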
In summary, the proposed protocol interface testing adaptation technique exposes a unified interface to users, consisting of one entry for process identification and one for execution trace recording. This enables the precise acquisition of execution traces for protocol service programs.
Table 1 presents the protocol service programs, process identification methods, as well as the start and end address codes of recorded trajectories in binary files for different protocols in edge gateways, routers, and FTU devices.
3.2.3. CFG Integration
The construction of the semi-physical simulation control flow graph (CFG) relies on cross-domain information fusion and dynamic synchronization techniques, which tightly integrate the physical simulation model with the protocol simulation traces. As shown in Figure 3, the system integrates devices such as concentrators, smart fusion terminals, electric meters, WAPI routers, LoRa gateways, and flood sensors and, in conjunction with multi-protocol interaction logic, constructs a unified control flow model.
Specifically, the system begins by decomposing the functionality and communication protocols of each device. The concentrator aggregates data using the HPLC/RS-485 protocols, the smart fusion terminal performs protocol translation between LoRa/WAPI and HPLC/RS-485, and sensors upload data via the LoRa protocol. Based on this structure, a coordinated multi-device workflow is established, encompassing data transmission paths from the concentrator to the terminal and then to WAPI/LoRa devices, as well as direct RS-485 communication links to electric meters. The system introduces dedicated protocol conversion nodes that enable the terminal to convert LoRa data into HPLC format and RS-485 data into WAPI format. For typical application scenarios, such as the sensor data reporting process (sensor → gateway → terminal → concentrator → alarm), the system embeds several key mechanisms: a real-time constraint mechanism (100 ms heartbeat detection); a multi-protocol concurrent processing mechanism (a task scheduling and polling strategy that handles electric meter data while processing control commands in real time); and a fault recovery mechanism (automatically retrying LoRa communication up to three times upon timeout and triggering an alarm upon repeated failure).
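The fault recovery and heartbeat mechanisms can be sketched in Python as follows. The `send` callback is a hypothetical transport function that returns True on acknowledgment and False on timeout. Reading "retrying up to three times" as one initial attempt plus three retries is an interpretation, as is treating a gap longer than one 100 ms period as a missed heartbeat.

```python
from typing import Callable, List

MAX_RETRIES = 3      # from the text: LoRa is retried up to three times on timeout
HEARTBEAT_MS = 100   # from the text: 100 ms heartbeat detection


def send_lora_with_retry(send: Callable[[bytes], bool], frame: bytes, alarms: List[str]) -> bool:
    """Send `frame`; on timeout retry up to MAX_RETRIES times, then raise an alarm.

    `send` is a hypothetical transport callback returning True on ACK, False on timeout.
    """
    for _attempt in range(1 + MAX_RETRIES):  # initial attempt plus retries (interpretation)
        if send(frame):
            return True
    alarms.append("LoRa link failure: retries exhausted")
    return False


def heartbeat_missed(last_seen_ms: int, now_ms: int, period_ms: int = HEARTBEAT_MS) -> bool:
    """True if no heartbeat has been observed within one heartbeat period."""
    return now_ms - last_seen_ms > period_ms


if __name__ == "__main__":
    outcomes = iter([False, False, False, True])  # three timeouts, then an ACK
    alarms: List[str] = []
    print(send_lora_with_retry(lambda f: next(outcomes), b"\x01", alarms), alarms)  # True []
```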
Through this design, the system constructs a comprehensive control flow model that spans hardware interaction, protocol translation, and real-time control. This model provides a robust technical foundation for the reliable operation of the semi-physical simulation system.
3.3. Threat Evolution Modeling Based on CFG
To gain deeper insights into the dynamic evolution of threats within the near-field network of the smart grid, this paper proposes a threat evolution modeling approach based on Control Flow Graphs (CFGs), integrating complex network modeling theory and graph structural evolution mechanisms. This method not only effectively reveals the topological paths and evolutionary patterns of threat propagation but also enables quantitative analysis and prediction of propagation trends and critical nodes. Specifically, we draw upon the Barabási–Albert (BA) scale-free network model, the Susceptible–Infectious–Recovered (SIR) epidemic model, and heterogeneous graph evolution mechanisms to jointly model the node generation, edge evolution, and state transition processes within the CFG-based graph.
3.3.1. BA Model Adaptation and Topology-Driven Mechanism
The Barabási–Albert (BA) model [30] is a classical framework characterizing the phenomenon of preferential attachment in complex systems. Its core principle posits that new nodes are more likely to connect to existing nodes with higher degrees, thereby forming a scale-free network structure where a few critical nodes dominate the majority of the traffic. We observe a similar pattern in the Control Flow Graph (CFG) structure of near-field networks in smart grids: certain key control logic modules, such as main control units, scheduling algorithms, or protocol parsers, undertake a substantial portion of data processing and control transfer tasks, exhibiting high in-degree or out-degree characteristics in the CFG.
Accordingly, we model the CFG as a heterogeneous, directed, scale-free graph and incorporate the BA mechanism to simulate the preferential propagation tendency of threats along logical control flows. During the dynamic construction of the CFG, each newly detected control flow path, triggered by events such as protocol state transitions or exceptional input branches, is treated as a new node added to the graph. The probability of connecting this new node to existing nodes is computed based on two factors: the behavioral activity and control influence of the target nodes. This process captures the preferential selection of attack paths within the CFG. The connection probability is defined as follows:
$$\Pi_i = \frac{k_i^{\alpha}}{\sum_{j} k_j^{\alpha}}$$

Here, $k_i$ denotes the current control flow degree of node i, and $\alpha$ is a power-law parameter that regulates the strength of preferential attachment. By tuning $\alpha$, we can control whether the attacker exhibits a stronger preference for targeting high-influence control nodes or adopts a more dispersed strategy that attacks multiple paths. To determine $\alpha$, we generated a series of synthetic networks and selected the value whose degree distribution best matched that observed in our near-field network (NFN) emulation topology, using least-squares fitting on the log–log plot of node degrees.
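The preferential attachment rule and the fitting procedure can be sketched in Python. The sketch grows synthetic networks with nonlinear preferential attachment (probability proportional to degree raised to the power alpha), fits a least-squares slope to each log–log degree histogram, and keeps the candidate alpha whose slope is closest to an observed slope. Several simplifications are assumptions: total (undirected) degree stands in for the directed CFG degree, the seed graph is a small complete graph, the number of links per new node `m` is arbitrary, and slope matching is used as one concrete reading of "best matched the degree distribution".

```python
import math
import random
from collections import Counter
from typing import Dict, List


def attachment_probs(degree: Dict[int, int], alpha: float) -> Dict[int, float]:
    """Nonlinear preferential attachment: P(i) = k_i**alpha / sum_j k_j**alpha."""
    weights = {node: k ** alpha for node, k in degree.items()}
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


def grow_graph(n_nodes: int, m: int, alpha: float, seed: int = 0) -> Dict[int, int]:
    """Grow a network one node at a time; each newcomer links to m distinct existing nodes.

    Degrees are total (undirected) degrees, a simplification of the directed CFG.
    """
    rng = random.Random(seed)
    degree = {i: m for i in range(m + 1)}  # seed graph: complete graph on m + 1 nodes
    for new in range(m + 1, n_nodes):
        probs = attachment_probs(degree, alpha)
        nodes, weights = list(probs.keys()), list(probs.values())
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(nodes, weights=weights)[0])
        for t in targets:
            degree[t] += 1
        degree[new] = m
    return degree


def loglog_slope(degree: Dict[int, int]) -> float:
    """Least-squares slope of log(count) versus log(degree) over the degree histogram."""
    hist = Counter(degree.values())
    xs = [math.log(k) for k in hist]
    ys = [math.log(hist[k]) for k in hist]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def select_alpha(observed_slope: float, candidates: List[float], n_nodes: int = 300, m: int = 2) -> float:
    """Return the candidate alpha whose synthetic network's log-log slope is closest to the observed one."""
    return min(candidates, key=lambda a: abs(loglog_slope(grow_graph(n_nodes, m, a)) - observed_slope))
```

For example, `attachment_probs({0: 1, 1: 3}, 1.0)` gives node 1 a 0.75 probability of receiving the next link, while `alpha = 0.0` makes the choice uniform; values above 1 concentrate links, and hence the modeled attack preference, on a few hub nodes.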
3.3.2. SIR Model Mapping and Threat State Annotation Mechanism
To characterize the state evolution of threat propagation within the CFG, we draw inspiration from the Susceptible–Infectious–Recovered (SIR) model [31] commonly used in epidemiological studies. In this model, system nodes are categorized into three states: Susceptible (S), Infectious (I), and Recovered (R). The connections between nodes, along with the associated transmission probabilities, determine the likelihood of an infection spreading from one node to another. For devices that use direct communication methods and are therefore difficult to model explicitly, this study adopts an abstraction approach that represents them as nodes; their transmission likelihood and direction are inferred from historical propagation pattern data. In the context of CFG-based threat evolution modeling, each control logic node is assigned one of the following states:
- S (Susceptible): a normal node not yet affected by threats;
- I (Infectious): a node currently influenced by anomalous logic paths;
- R (Removed or Isolated): a node that has been shielded, isolated, or mitigated by security mechanisms.
The threat propagation process is thus modeled as the spread of infection from I nodes to adjacent S nodes along the edges of the control flow graph. The transmission probability $P_{ij}$ from node i to node j is formulated as a function of three factors: $A_{ij}$, which denotes the existence of a control flow path from node i to node j; $C_{ij}$, which represents the coupling degree between physical characteristics (such as voltage/current fluctuations) and network behaviors (such as variations in packet traffic) along the path; and $\tau(t)$, a temporally correlated propagation amplification factor that accounts for effects such as increased system vulnerability during low-load nighttime periods. Through iterative evolution, we construct a time-aware sequence of state propagation graphs that captures the diffusion path, velocity, and intensity of the threat as it spreads from the initial infection node to the various layers of the system. This sequence serves as the foundation for threat situational forecasting and resource allocation optimization. For the transmission rate parameter of this function, we conducted parameter sweeps over the range [0.01, 0.5] and measured the resulting infection spread rates in the simulated environment; the value that best aligned with the propagation patterns observed in the fuzzing-triggered vulnerability scenarios was selected.
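A discrete-time SIR process over a directed CFG, together with the parameter sweep, can be sketched in Python. Because the exact transmission function is not reproduced in this section, the sketch assumes a multiplicative form: a base rate `beta`, scaled by a time-dependent amplification factor and by a per-edge coupling factor (defaulting to 1), with listing an edge meaning that a control flow path exists. The recovery/isolation probability `gamma` and all function names are likewise assumptions for illustration.

```python
import random
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def sir_step(state: Dict[str, str], edges: Sequence[Edge], coupling: Dict[Edge, float],
             rate: float, gamma: float, rng: random.Random) -> Dict[str, str]:
    """One synchronous S -> I -> R update over directed CFG edges."""
    nxt = dict(state)
    for i, j in edges:  # an edge in the list means a control flow path i -> j exists
        if state[i] == "I" and state[j] == "S":
            # Assumed multiplicative form: rate (beta * amplification) times the coupling factor.
            p = min(1.0, rate * coupling.get((i, j), 1.0))
            if rng.random() < p:
                nxt[j] = "I"
    for node, s in state.items():
        if s == "I" and rng.random() < gamma:
            nxt[node] = "R"  # shielded, isolated, or mitigated
    return nxt


def simulate(nodes: Iterable[str], edges: Sequence[Edge], coupling: Dict[Edge, float],
             initial: Set[str], beta: float, gamma: float,
             amplification: Callable[[int], float], steps: int, seed: int = 0) -> List[Dict[str, str]]:
    """Return the time-indexed sequence of node states (the state propagation sequence)."""
    rng = random.Random(seed)
    state = {n: ("I" if n in initial else "S") for n in nodes}
    history = [state]
    for t in range(steps):
        state = sir_step(state, edges, coupling, beta * amplification(t), gamma, rng)
        history.append(state)
    return history


def sweep_beta(observed_fraction: float, candidates: Sequence[float], runs: int = 20, **sim_kwargs) -> float:
    """Pick the beta whose mean final reached fraction (I or R) is closest to an observed value."""
    def mean_reach(beta: float) -> float:
        total = 0.0
        for r in range(runs):
            final = simulate(beta=beta, seed=r, **sim_kwargs)[-1]
            total += sum(s != "S" for s in final.values()) / len(final)
        return total / runs
    return min(candidates, key=lambda b: abs(mean_reach(b) - observed_fraction))
```

In a sweep matching the text, `candidates` would be a grid over [0.01, 0.5], and a nighttime effect could be modeled with a hypothetical amplification such as `lambda t: 1.5 if t % 24 < 6 else 1.0`. The synchronous update (all transitions computed from the previous state) keeps each step independent of edge ordering.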
In this formulation, $C_{ij}$ represents the cyber-physical coupling coefficient between physical-layer deviations and their impact on logical propagation behavior. To quantify it, we define:

$$C_{ij} = \frac{\Delta V_{ij}}{V_0} \cdot \frac{\Delta R_{ij}}{R_0}$$

where $\Delta V_{ij}$ denotes the observed voltage fluctuation (e.g., a deviation around the nominal 220 V) correlated with the data path between nodes i and j, $\Delta R_{ij}$ represents the increase in packet retransmission rate or response delay under such fluctuation, $V_0$ is the nominal system voltage (e.g., 220 V), and $R_0$ is the normal retransmission rate without physical disturbance. This product captures how physical degradation affects data integrity or timing, allowing $C_{ij}$ to serve as a normalized multiplier in the threat propagation function. These values are measured empirically during our semi-physical simulation using controlled perturbation injection.
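As a worked example of the coupling coefficient, the voltage deviation normalized by the nominal voltage is multiplied by the retransmission increase normalized by the baseline retransmission rate. The measurement values below (an 11 V deviation on a 220 V bus, and retransmissions rising by 2 percentage points from a 4% baseline) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def coupling_coefficient(delta_v: float, delta_r: float, v_nominal: float, r_normal: float) -> float:
    """(delta_v / v_nominal) * (delta_r / r_normal): both deviations normalized, then multiplied."""
    if v_nominal <= 0 or r_normal <= 0:
        raise ValueError("v_nominal and r_normal must be positive")
    return (delta_v / v_nominal) * (delta_r / r_normal)


if __name__ == "__main__":
    # Hypothetical measurement: 11 V / 220 V = 0.05 and 0.02 / 0.04 = 0.5, so the product is 0.025.
    c = coupling_coefficient(delta_v=11.0, delta_r=0.02, v_nominal=220.0, r_normal=0.04)
    print(round(c, 4))  # 0.025
```

A coefficient of zero on either factor means that the physical disturbance has no measurable effect on communication along that path, so it contributes nothing to the propagation multiplier.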
3.3.3. State-Driven CFG Evolution Mechanism and Local Reconfiguration Model
In addition to topological and state-based propagation modeling, threat evolution in smart grids is also reflected in structural changes within the Control Flow Graph (CFG) itself. To address the asymmetric reconfiguration of the CFG during an attack, such as jump chain insertion or conditional branch hijacking, we introduce a state-driven CFG structural evolution mechanism. This mechanism defines the following three types of structural evolution operations based on the CFG’s foundational structure:
- Node Insertion: the attacker injects new logic flows (e.g., backdoor routines) into existing CFG paths;
- Edge Substitution: control flow redirection alters the jump target, rendering original edges obsolete while activating new paths;
- Path Splitting: a branch condition is broadened, resulting in multiple new execution paths from a single node.
By recognizing these evolution operations across the sequence of CFG snapshots, using graph isomorphism detection and graph edit distance computation, we identify the structural changes induced by attacks. These changes are then mapped to micro-level events in the broader context of threat evolution.
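The three evolution operations can be recognized between two CFG snapshots with a simple Python sketch, under the assumption that basic blocks keep stable labels (for example, addresses) across snapshots. Under that assumption, counting node and edge insertions and deletions gives a unit-cost edit distance that is an upper bound on the true graph edit distance; a general tool such as NetworkX (`networkx.is_isomorphic`, `networkx.graph_edit_distance`) would be needed when labels are not stable. The classification rules below are one plausible reading of the operation definitions, not the study's exact procedure.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def nodes_of(edges: Set[Edge]) -> Set[str]:
    return {n for edge in edges for n in edge}


def out_degree(edges: Set[Edge], node: str) -> int:
    return sum(1 for src, _ in edges if src == node)


def classify_evolution(old: Set[Edge], new: Set[Edge]) -> Dict[str, list]:
    """Label the structural differences between two CFG snapshots (nodes keyed by label)."""
    old_nodes, new_nodes = nodes_of(old), nodes_of(new)
    added, removed = new - old, old - new
    ops: Dict[str, list] = {
        # Node insertion: blocks that did not exist before (e.g., an injected backdoor routine).
        "node_insertion": sorted(new_nodes - old_nodes),
        "edge_substitution": [],
        "path_splitting": [],
    }
    for src, dst in sorted(removed):
        # Edge substitution: a removed jump whose source now jumps to a different pre-existing block.
        targets = sorted(d for s, d in added if s == src and d in old_nodes)
        if targets:
            ops["edge_substitution"].append(((src, dst), targets))
    for node in sorted(old_nodes & new_nodes):
        # Path splitting: a surviving block that now branches to more successors than before.
        if out_degree(new, node) > out_degree(old, node):
            ops["path_splitting"].append(node)
    return ops


def label_preserving_edit_distance(old: Set[Edge], new: Set[Edge]) -> int:
    """Unit-cost edit count when nodes are matched by label (an upper bound on true GED)."""
    return len(nodes_of(old) ^ nodes_of(new)) + len(old ^ new)


if __name__ == "__main__":
    old = {("a", "b"), ("b", "c"), ("c", "d")}
    new = {("a", "x"), ("x", "b"), ("b", "d"), ("c", "d"), ("c", "a")}
    print(classify_evolution(old, new))
    print(label_preserving_edit_distance(old, new))  # 7
```

In the example, block `x` is inserted between `a` and `b`, the jump `b -> c` is redirected to `d`, and `c` gains a second successor, so each of the three operation types appears once.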
3.3.4. Threat Evolution Entropy and Critical Path Identification
To quantitatively characterize the uncertainty inherent in threat evolution within the system, we further propose a metric termed threat evolution entropy. Inspired by the concept of information entropy, this metric is designed to measure the diversity of propagation path selection and the potential for risk diffusion within the CFG:
$$H(t) = -\sum_{i} p_i(t) \log p_i(t) \qquad (4)$$

Here, $p_i(t)$ denotes the probability that the threat propagates through the i-th path of the CFG at time t. Equation (4) adopts the classic Shannon entropy formulation, but it is applied here in a time-varying context to quantify the uncertainty of multi-path threat propagation and is therefore termed threat evolution entropy in this paper. The probabilities $p_i(t)$ are derived from the propagation likelihoods in Equation (3), normalized such that $\sum_i p_i(t) = 1$ at each time step; in practice, we track the distribution of active propagation edges and compute $p_i(t)$ from their relative transition weights. A higher evolution entropy indicates that the system is in a phase of high propagation uncertainty, with a more widely dispersed threat pattern and a more complex security posture, whereas lower entropy indicates concentrated, predictable propagation. Building on the evolution entropy, we further integrate path frequency statistics and risk assessment models to identify critical propagation paths and high-risk nodes, thereby providing decision support for security response actions.
While the threat evolution entropy provides a quantitative view of propagation uncertainty, its application in practical scenarios requires defined thresholds for decision-making. Based on empirical observation in our simulation environment, entropy values below 1.0 typically indicate deterministic propagation patterns and can be categorized as low-risk. Values in the range of 1.0 to 2.5 suggest moderate unpredictability in the threat path, indicating medium-risk areas requiring active monitoring. Entropy values exceeding 2.5 reflect high uncertainty in propagation, often corresponding to topologies with redundant or highly dynamic node interconnections, and are classified as high-risk, suggesting a need for preemptive intervention (e.g., protocol isolation or dynamic defense updates). These thresholds may vary with network size and configuration and can be further refined through domain-specific calibration in real-world deployments.
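The entropy computation and the threshold-based risk levels can be sketched in Python. Path weights are normalized into probabilities before the Shannon sum is taken. The logarithm base is not stated in the text, so base 2 is an assumption here (the thresholds would need rescaling for natural logarithms), and assigning exactly 2.5 to the medium band follows the wording "in the range of 1.0 to 2.5".

```python
import math
from typing import List


def evolution_entropy(weights: List[float], base: float = 2.0) -> float:
    """H(t) = -sum_i p_i(t) log p_i(t), with p_i(t) = w_i / sum_j w_j (log base is an assumption)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one path must have positive weight")
    h = 0.0
    for w in weights:
        p = w / total
        if p > 0:  # zero-probability paths contribute nothing (0 * log 0 := 0)
            h -= p * math.log(p) / math.log(base)
    return h


def risk_level(h: float, low: float = 1.0, high: float = 2.5) -> str:
    """Map an entropy value to the low / medium / high bands described in the text."""
    if h < low:
        return "low"
    if h <= high:
        return "medium"
    return "high"


if __name__ == "__main__":
    for w in ([5.0], [1.0] * 4, [1.0] * 8):
        h = evolution_entropy(w)
        print(len(w), round(h, 3), risk_level(h))
    # 1 0.0 low
    # 4 2.0 medium
    # 8 3.0 high
```

With equally weighted paths, entropy grows as the logarithm of the number of active paths, so under base 2 the 2.5 threshold corresponds to roughly six equally likely propagation paths.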
In summary, the CFG-based threat evolution modeling approach proposed in this study leverages the Barabási–Albert (BA) model to capture preferential attachment in control flow path formation, and it employs the SIR model to describe the transitions of propagation states. Additionally, structural changes in the control flow graph induced by attacks are precisely modeled through a state-driven mechanism and graph evolution operations. Combined with quantitative metrics such as threat evolution entropy, this approach establishes a comprehensive threat modeling framework that integrates structure, behavior, and state in a three-dimensional, dynamically coupled manner. This methodology not only enhances the granularity and responsiveness of threat modeling in near-field smart grid networks but also lays a solid foundation for the optimized deployment of subsequent defense strategies.