Article

Modeling and Authentication Analysis of Self-Cleansing Intrusion-Tolerant System Based on GSPN

1
Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
2
Qingdao Moses Process Control Technology Co., Ltd., Qingdao 266035, China
*
Author to whom correspondence should be addressed.
Modelling 2026, 7(1), 24; https://doi.org/10.3390/modelling7010024
Submission received: 8 December 2025 / Revised: 15 January 2026 / Accepted: 16 January 2026 / Published: 19 January 2026

Abstract

Self-cleansing intrusion-tolerant systems mitigate attacker intrusions and control through periodic recovery, thereby enhancing both availability and security. However, vulnerabilities in the control link render these systems susceptible to request forgery attacks. Furthermore, existing research on the modeling and performance analysis of such systems remains insufficient. To address these issues, this paper introduces an authentication mechanism to fortify control link security and employs Generalized Stochastic Petri Nets (GSPN) for system evaluation. We construct Petri net models for three distinct scenarios: a traditional system, a system compromised by forged controller requests, and a system fortified with an authentication mechanism. Isomorphic Continuous-Time Markov Chains are then derived to facilitate theoretical analysis. Quantitative evaluations are performed by deriving steady-state probabilities and running simulations on the PIPE platform. To further assess practicality, we conduct a scalability analysis under varying system scales and parameter settings, and implement a prototype in a virtualized testbed to experimentally validate the analytical findings. The evaluation results indicate that the authentication mechanism ensures the reliable execution of cleansing strategies, thereby improving system availability, enhancing security, and mitigating data leakage risks.

1. Introduction

The proliferation of internet technology has significantly expanded the functionality and data volume of network systems. However, inherent limitations in design and implementation make vulnerabilities a prevalent and unavoidable characteristic of these systems. The existence of such vulnerabilities and backdoors is a primary driver of cybersecurity incidents [1]. Consequently, as the scale of these systems grows, so does the number of potential vulnerabilities, exacerbating threats to system security. While established technologies like firewalls, intrusion detection systems, and vulnerability scanners have enhanced system security to some extent, traditional defense paradigms remain challenged by the “easy to attack, hard to defend” asymmetry against continuously evolving and highly unpredictable attack vectors.
Attackers can infiltrate target systems through multiple attack vectors, such as exploiting zero-day vulnerabilities, abusing flaws in software configuration and access control policies, or using social engineering to trick users into installing or running malicious software [2]. Characteristic-matching defense systems, which rely on analyzing historical attack samples, are inherently limited in their ability to detect unknown threats and adapt to attack variants. Therefore, the research goal of network security technology should be to find a new defensive approach that does not rely on attack characteristics and behavioral information, so as to alleviate the growing problem of cyberspace security [3].
To address the “easy to attack, hard to defend” asymmetry in cyberspace, the academic community has intensified research into proactive defense technologies [4]. This effort has spurred the development of mature technologies such as cyberspace mimic defense [5], deception defense [6], moving target defense [7], and intrusion tolerance [8]. Unlike preventive defense strategies, intrusion tolerance focuses on maintaining continuous service delivery even when the system is compromised [8]. Self-Cleansing Intrusion Tolerance (SCIT) is a prominent example of this paradigm [9]. SCIT utilizes virtualization to dynamically switch between online and offline server instances, allowing the system to rapidly revert to a known-secure state post-intrusion. Rather than relying on traditional detection and prevention, SCIT ensures system availability and resilience through periodic “self-cleansing” operations. This approach enables swift service restoration and prevents attackers from establishing persistent control or causing lasting damage.
The foundational theory of Self-Cleansing Intrusion Tolerance, proposed by Bangalore et al. [9], pioneered a shift in defensive focus from intrusion prevention to loss control, thereby balancing system security with availability. Subsequent work [10,11] integrated SCIT’s benefits with the elastic redundancy of cloud platforms to develop a cost-effective, scalable cloud service architecture, further exploring serverless computing’s potential for cost reduction. SCIT has also been incorporated into the Moving Target Defense (MTD) domain. For instance, Qi et al. [12] combined server switching with software diversification to increase the randomness and unpredictability of the operating environment, subsequently enhancing the system’s overall defensive posture. Within Software-Defined Networking (SDN), Sanoussi et al. [13] proposed ITC (Intrusion-Tolerant Controller), a scheme for multi-controller architectures that combines recovery models with MTD strategies to strengthen the security of the SDN control plane.
Regarding data security, Sood et al. [14] applied an SCIT-based MTD approach to healthcare information systems, enhancing security and mitigating data leakage risks. However, their study lacked an in-depth analysis of SCIT’s theoretical efficacy in this context. Nagarajan et al. [15] proposed a hybrid architecture integrating SCIT with intrusion detection systems (IDS). They validated the approach’s effectiveness in reducing the likelihood of data breaches using decision tree analysis and Monte Carlo simulations.
In the domain of system performance evaluation, a closely related study by El et al. [16] proposed a modeling technique for SCIT systems based on semi-Markov processes. Their approach integrated the SCIT mechanism with preventive maintenance to construct a semi-Markov model, enabling numerical analysis of metrics like system availability and mean time to security failure (MTTSF). Distinct from prior research, our work addresses a critical yet overlooked vulnerability: the unauthenticated SCIT control link’s susceptibility to forged request attacks. These attacks can directly disrupt cleansing operations, potentially causing a complete failure of the intrusion tolerance functionality. To mitigate this threat, we introduce an authentication mechanism designed to secure the infrastructure against such spoofing attacks. We employ Generalized Stochastic Petri Nets (GSPN) for system modeling and evaluation. Compared to the semi-Markov process method, GSPN offers superior capabilities for describing complex attack-defense scenarios; it not only supports quantitative analysis but also intuitively illustrates the adversarial interactions between the attacker and the SCIT strategy, including conditional triggers and concurrent behaviors. Accordingly, we construct GSPN models for various scenarios to comparatively assess the destructiveness of forged controller request attacks and the defensive efficacy of the authentication mechanism. Furthermore, we conduct numerical assessments of key metrics, including system availability, security, and data leakage risk.
The main contributions of this paper are summarized as follows:
  • We identify a critical yet underexplored vulnerability in SCIT systems: the susceptibility of unauthenticated control links to forged controller requests. To address this, we introduce an identity authentication mechanism that effectively prevents spoofing attacks and ensures the reliable execution of cleansing operations.
  • We develop a concise and expressive GSPN-based modeling framework that captures traditional SCIT behavior, forged request attacks, and the authentication-enhanced mechanism within a unified formalism, enabling intuitive representation of adversarial interactions alongside rigorous quantitative analysis.
  • We perform a comprehensive evaluation combining steady-state analysis and PIPE-based simulation, and further extend the study with scalability analysis under varying system sizes and parameter settings. In addition, we implement a prototype and conduct testbed experiments to validate the analytical results. The results demonstrate that the authentication-enhanced design improves availability and security while significantly reducing data leakage risks compared with traditional SCIT systems.

2. Defense Scenarios

The self-cleansing intrusion-tolerant system mitigates network attacks through the periodic rotation of servers between online and offline states. This core mechanism shortens the server’s online exposure window, thereby compressing the effective timeframe available to an attacker. This strategy is key to ensuring service availability.
To systematically evaluate our model and quantify the impact of different defense strategies on system performance, we designed three progressive assessment scenarios.
As depicted in Figure 1, Scenario I models the traditional self-cleansing intrusion-tolerant system, which serves as the evaluation baseline. In this baseline model, the controller schedules server rotations according to a predetermined strategy, periodically cleansing potential intrusions and restoring the system to a secure state. While this baseline model can effectively interrupt attacks and restore services, it lacks security hardening for the control channel. This omission creates a theoretical vulnerability to forged controller request attacks.
Scenario II, depicted in Figure 2, builds upon the baseline by introducing threats to the control link. In this scenario, attackers can perform conventional intrusions and, more critically, monitor the control channel. By analyzing communication protocols, they can then spoof legitimate controller instructions to manipulate server states. Such attacks cause servers to execute unexpected state transitions, such as unplanned shutdowns, thereby severely undermining system availability and the effectiveness of the security policy.
Authentication technology is a cornerstone of information security, ensuring that system resources are accessible only to legitimate entities [17]. To mitigate the threats identified in Scenario II, Scenario III enhances the SCIT model with an authentication mechanism, as depicted in Figure 3. In this enhanced model, before executing any control instruction, a server must first use the authentication mechanism to verify the identity of the command’s originator. Only validated controllers are permitted to establish a secure channel for subsequent operations; all unauthenticated or spoofed requests are rejected outright. Moreover, this model extends the authentication layer to the client side. Users are now required to authenticate their identity before accessing services, which restricts unauthorized access and further strengthens the system’s overall security posture.
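The "verify before execute" rule of Scenario III can be illustrated with a minimal sketch. The model later refers to certificate-based authentication; the sketch below deliberately substitutes a simpler shared-key HMAC check as a stand-in, and every name in it (`sign_command`, `ServerNode`, the command strings, the counter used against replay) is a hypothetical choice for illustration rather than part of the described system.

```python
import hashlib
import hmac


def sign_command(key: bytes, command: str, counter: int) -> str:
    """Controller side: tag a command with HMAC-SHA256 over 'counter:command'."""
    message = f"{counter}:{command}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class ServerNode:
    """Hypothetical server that executes a control command only after verification."""

    def __init__(self, key: bytes):
        self._key = key
        self._last_counter = -1   # highest counter accepted so far
        self.state = "online"

    def handle(self, command: str, counter: int, tag: str) -> bool:
        expected = sign_command(self._key, command, counter)
        # Constant-time comparison; a forged tag is rejected outright.
        if not hmac.compare_digest(expected, tag):
            return False
        # An eavesdropped command replayed later carries a stale counter.
        if counter <= self._last_counter:
            return False
        self._last_counter = counter
        if command == "clean":
            self.state = "ready"
        elif command == "deploy":
            self.state = "online"
        return True


shared_key = b"controller-server-secret"   # assumed to be provisioned out of band
server = ServerNode(shared_key)

legit_tag = sign_command(shared_key, "clean", 1)
accepted = server.handle("clean", 1, legit_tag)
replayed = server.handle("clean", 1, legit_tag)
forged = server.handle("clean", 2, sign_command(b"attacker-guess", "clean", 2))
```

In this sketch an eavesdropper who only observes traffic cannot produce a valid tag for a new counter value, which is the property that Scenario III relies on; a certificate-based scheme would replace the shared key with signature verification against a trusted certificate.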

3. GSPN-Based Modeling and Analysis

Petri nets are a formal modeling tool that integrates graphical representation with rigorous mathematical semantics. They excel at capturing the concurrent, synchronous, and resource-contentious behaviors inherent in complex systems, making them widely applicable for performance evaluation, availability analysis, and system modeling. Structurally, a Petri net is a bipartite directed graph consisting of Places and Transitions. Places represent system states or resource availability, while Transitions denote the events or actions that cause state changes. The system’s global state is defined by the distribution of Tokens across its places—a configuration known as a Marking. The dynamic evolution of the system unfolds as transitions “fire,” consuming tokens from input places and producing them in output places. This token flow simulates the system’s operational dynamics. To enhance modeling power, extensions such as Inhibitor Arcs can be introduced. An inhibitor arc prevents a transition from firing if its corresponding place contains tokens, thereby enabling the modeling of negative conditions and complex control logic.
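The firing rule and the effect of an inhibitor arc can be made concrete with a short sketch. The `Transition` class, the helper names, and the toy net below are illustrative assumptions; the net is only loosely modeled on the kind of cleansing and recovery behavior discussed in this paper and is not one of its GSPN models.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    name: str
    inputs: dict                                   # place -> tokens consumed
    outputs: dict                                  # place -> tokens produced
    inhibitors: set = field(default_factory=set)   # places that must be empty


def enabled(marking: dict, t: Transition) -> bool:
    """Enabled if every input place holds enough tokens and every
    inhibitor place is empty."""
    has_inputs = all(marking.get(p, 0) >= n for p, n in t.inputs.items())
    not_inhibited = all(marking.get(p, 0) == 0 for p in t.inhibitors)
    return has_inputs and not_inhibited


def fire(marking: dict, t: Transition) -> dict:
    """Return the marking reached by firing t; the input marking is not modified."""
    if not enabled(marking, t):
        raise ValueError(f"{t.name} is not enabled")
    new = dict(marking)
    for p, n in t.inputs.items():
        new[p] = new.get(p, 0) - n
    for p, n in t.outputs.items():
        new[p] = new.get(p, 0) + n
    return new


# Toy fragment: a server that is online while a vulnerability is exposed.
t_clean = Transition("T_clean", {"online": 1}, {"ready": 1})
t_recover = Transition("T_recover", {"exposure": 1}, {"normal": 1},
                       inhibitors={"online"})

m0 = {"online": 1, "ready": 0, "exposure": 1, "normal": 0}
m1 = fire(m0, t_clean)     # the server goes offline
m2 = fire(m1, t_recover)   # "online" is now empty, so recovery may fire
```

Here `t_recover` is blocked in `m0` because its inhibitor place `online` still holds a token, and it becomes enabled only after `t_clean` has emptied that place.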
Building upon the basic Petri net model, Generalized Stochastic Petri Nets incorporate concepts of time and stochasticity to enhance modeling precision regarding the dynamic behaviors of real-world systems. In GSPN, transitions are categorized into two types: Timed Transitions and Immediate Transitions. Timed Transitions represent stochastic events with non-zero durations; their firing delays typically follow an exponential distribution, characterized by corresponding rate parameters. Conversely, Immediate Transitions describe instantaneous actions such as logical judgments, strategy selections, or scheduling decisions that consume no time. These transitions fire instantaneously once enabled. Semantically, Immediate Transitions possess higher priority than Timed Transitions, ensuring that the system prioritizes the completion of all immediate logical evolutions within any given state.
Leveraging the aforementioned mechanisms, the dynamic behavior of GSPN can be systematically characterized through the evolution of markings. Depending on whether enabled immediate transitions exist under the current marking, system states are partitioned into Vanishing States and Tangible States. Vanishing States represent only instantaneous logical evolution processes and consume zero time, whereas Tangible States correspond to states where the system actually resides, with residence times governed by the exponential distributions of timed transitions. By eliminating and reducing Vanishing States, the reachable state space of the GSPN can be rigorously mapped to a Continuous-Time Markov Chain (CTMC).
Following the mapping from GSPN to CTMC, the challenge of quantitative system analysis is effectively transformed into the numerical solution of the CTMC. By solving the corresponding steady-state balance equations or transient probability equations, key performance and reliability metrics can be derived, including steady-state probability distribution, availability, mean response time, throughput, and the probability of failure or intrusion occurrences. Consequently, GSPN establishes a comprehensive analysis framework that progresses from structural modeling and dynamic behavior description to stochastic process modeling and numerical evaluation, thereby providing a solid theoretical foundation and engineering feasibility for the quantitative assessment of complex systems [18]. Figure 4 illustrates the fundamental components of a GSPN.
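The numerical step described above, solving the balance equations together with the normalization condition, can be sketched in a few lines. The function name `steady_state` and the two-state example are assumptions made for illustration; the solver uses plain Gauss-Jordan elimination and is not the routine used by any particular tool.

```python
def steady_state(Q):
    """Solve pi * Q = 0 with sum(pi) = 1 for an irreducible CTMC.

    Q is a square generator matrix given as a list of rows, where each
    row sums to zero.
    """
    n = len(Q)
    # Transpose Q so that each row is one balance equation, then replace
    # the last (redundant) equation by the normalization condition.
    A = [[float(Q[j][i]) for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system; is the chain irreducible?")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                if factor != 0.0:
                    for c in range(col, n):
                        A[r][c] -= factor * A[col][c]
                    b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]


# Illustrative two-state chain: "up" fails at rate 1, "down" is repaired at rate 3.
Q_example = [[-1.0, 1.0],
             [3.0, -3.0]]
pi = steady_state(Q_example)
```

For the two-state example the balance equation gives a 3:1 ratio between the "up" and "down" probabilities, so the chain spends three quarters of its time in the "up" state.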

3.1. Scenario I: Traditional Self-Cleansing Intrusion Tolerance Model

As shown in Figure 5, this model represents the core operational cycle of the self-cleansing intrusion-tolerant system. For clarity, the model simplifies the self-cleansing process into two primary states, “online service” ($P_{online}$) and “offline ready” ($P_{ready}$), and integrates the attacker’s intrusion path and the system’s recovery mechanisms. The model’s initial marking places one token in $P_{online}$ and another in $P_{normal}$. This signifies that the server is initially online and the system is secure. Upon reaching a preset runtime threshold, the controller issues a cleansing command, causing transition $T_{clean}$ to fire. This consumes the token from $P_{online}$ and generates a token in $P_{ready}$, transitioning the server from online service to the “offline ready” state after cleansing. Subsequently, to bring the server back online, transition $T_{deploy}$ is activated. This transition consumes the token in $P_{ready}$ and deposits a new token in $P_{online}$, thus restoring the server to active service and completing one full self-cleansing cycle.
While the server is in its “online service” state (indicated by a token in $P_{online}$), an attacker can scan and reconnoiter it. This behavior corresponds to the activation of transition $T_{scan}$, which consumes the token from the secure state place $P_{normal}$ and deposits a new one in $P_{exposure}$. As a result, the system transitions to a “risk exposure” state, signifying that a vulnerability has been identified but not yet exploited. If a cleansing operation occurs during this “risk exposure” phase (i.e., $T_{clean}$ fires), a recovery mechanism is initiated. The pre-emptive cleansing enables transition $T_{recover1}$, which consumes the token from $P_{exposure}$ and regenerates one in $P_{normal}$, thereby pre-emptively restoring the system to a secure state before an exploit can occur. Alternatively, if the attacker exploits the vulnerability before a cleansing cycle begins, transition $T_{exploit}$ fires. This action consumes the token from $P_{exposure}$ and places one in $P_{attacked}$, indicating a successful system compromise. The scheduled cleansing cycle proceeds even if the server is compromised. Following the firing of $T_{clean}$, the inhibitor arc condition for $T_{recover2}$ is met, enabling it to fire. This recovery transition then consumes the token from $P_{attacked}$ and generates a new token in $P_{normal}$, effectively restoring the compromised server to a secure state.
Following a successful compromise (indicated by a token in place $P_{attacked}$), the attacker can proceed to initiate data theft. This action is modeled by the firing of transition $T_{theft}$, which consumes the token from $P_{attacked}$ and deposits one in $P_{leak}$. This signifies that a data leakage event is in progress. If a self-cleansing operation is initiated at this juncture, the data theft process is interrupted. This recovery is implemented in the model by the immediate transition $T_{recover3}$. It fires to consume the token from $P_{leak}$ and regenerate one in $P_{normal}$, thereby restoring the system to a secure state and terminating the data leak.
The set of all reachable markings for the GSPN model, each defined by a unique distribution of tokens across the places, is enumerated in Table 1:
Reachability analysis of the GSPN model yields the corresponding continuous-time Markov chain illustrated in Figure 6. In this CTMC, each directed edge is labeled with $\lambda_i$, representing the average firing rate of the corresponding transition $T_i$ in the GSPN model.
Based on the CTMC depicted in Figure 6, the transition rate matrix $Q$ is formulated as follows:
$$
Q = \begin{bmatrix}
-(\lambda_{clean}+\lambda_{scan}) & \lambda_{clean} & 0 & 0 & \lambda_{scan} \\
\lambda_{deploy} & -\lambda_{deploy} & 0 & 0 & 0 \\
0 & \lambda_{clean} & -(\lambda_{clean}+\lambda_{theft}) & \lambda_{theft} & 0 \\
0 & \lambda_{clean} & 0 & -\lambda_{clean} & 0 \\
0 & \lambda_{clean} & \lambda_{exploit} & 0 & -(\lambda_{clean}+\lambda_{exploit})
\end{bmatrix} \tag{1}
$$
The steady-state probability vector of the CTMC is denoted by $\pi = \{p_0, p_1, p_2, p_3, p_4\}$, where each component $p_i$ represents the steady-state probability of the system residing in marking $M_i$. This vector $\pi$ is determined by solving the system of linear equations defined by the global balance equation ($\pi Q = 0$) and the normalization condition ($\sum_{i=0}^{4} p_i = 1$). The resulting steady-state probabilities for each marking are presented as follows:
$$
\begin{aligned}
p_0 &= 1 \big/ \big\{ 1 + (\lambda_{clean}+\lambda_{scan})\lambda_{deploy}^{-1} + \lambda_{scan}(\lambda_{clean}+\lambda_{exploit})^{-1} \\
&\qquad + \lambda_{exploit}\lambda_{scan}(\lambda_{clean}+\lambda_{theft})^{-1}(\lambda_{clean}+\lambda_{exploit})^{-1}(1+\lambda_{theft}\lambda_{clean}^{-1}) \big\} \\
p_1 &= (\lambda_{clean}+\lambda_{scan})\lambda_{deploy}^{-1}\, p_0 \\
p_2 &= \lambda_{exploit}\lambda_{scan}(\lambda_{clean}+\lambda_{theft})^{-1}(\lambda_{clean}+\lambda_{exploit})^{-1}\, p_0 \\
p_3 &= \lambda_{theft}\lambda_{exploit}\lambda_{scan}\lambda_{clean}^{-1}(\lambda_{clean}+\lambda_{theft})^{-1}(\lambda_{clean}+\lambda_{exploit})^{-1}\, p_0 \\
p_4 &= \lambda_{scan}(\lambda_{clean}+\lambda_{exploit})^{-1}\, p_0
\end{aligned} \tag{2}
$$
We define two key performance metrics based on the GSPN model. First, steady-state availability is defined as the summed steady-state probability of all normal operational markings. Second, steady-state security is the summed steady-state probability of all non-compromised markings. Leveraging these definitions and the previously computed steady-state probabilities, the formal expressions for these two metrics are presented as follows:
$$
\begin{aligned}
p_{availability} &= p\big(M(P_{online}) > 0 \land (M(P_{normal}) > 0 \lor M(P_{exposure}) > 0)\big) = p_0 + p_4 \\
p_{security} &= 1 - p\big(M(P_{attacked}) > 0 \lor M(P_{leak}) > 0\big) = 1 - p_2 - p_3
\end{aligned} \tag{3}
$$
where the notation $M(P_i)$ represents the number of tokens in place $P_i$ under marking $M$.
Upon a successful server compromise, the associated risk of data leakage is quantified by the following equation:
$$
p_{leak} = p\big(M(P_{leak}) > 0\big) = p_3 \tag{4}
$$
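As a sanity check on the Scenario I closed-form solution, the sketch below evaluates the steady-state probabilities and the three metrics, then confirms that they satisfy $\pi Q = 0$ and sum to one. The rate values are hypothetical placeholders chosen for illustration (they are not the Table 4 settings), and the function names are assumptions.

```python
def scenario1_probabilities(clean, scan, exploit, theft, deploy):
    """Closed-form steady-state probabilities [p0..p4] for Scenario I."""
    r1 = (clean + scan) / deploy            # p1 / p0
    r4 = scan / (clean + exploit)           # p4 / p0
    r2 = exploit * r4 / (clean + theft)     # p2 / p0
    r3 = theft * r2 / clean                 # p3 / p0
    p0 = 1.0 / (1.0 + r1 + r2 + r3 + r4)
    return [p0, r1 * p0, r2 * p0, r3 * p0, r4 * p0]


def scenario1_generator(clean, scan, exploit, theft, deploy):
    """Transition rate matrix Q of the Scenario I CTMC (rows M0..M4)."""
    return [
        [-(clean + scan), clean, 0.0, 0.0, scan],
        [deploy, -deploy, 0.0, 0.0, 0.0],
        [0.0, clean, -(clean + theft), theft, 0.0],
        [0.0, clean, 0.0, -clean, 0.0],
        [0.0, clean, exploit, 0.0, -(clean + exploit)],
    ]


def balance_residual(p, Q):
    """Largest absolute entry of the row vector p * Q (zero at steady state)."""
    n = len(p)
    return max(abs(sum(p[i] * Q[i][j] for i in range(n))) for j in range(n))


# Hypothetical rates, for illustration only.
rates = dict(clean=5.0, scan=2.0, exploit=1.0, theft=1.0, deploy=10.0)

p = scenario1_probabilities(**rates)
Q = scenario1_generator(**rates)
residual = balance_residual(p, Q)

availability = p[0] + p[4]
security = 1.0 - p[2] - p[3]
leak = p[3]
print(f"availability={availability:.4f} security={security:.4f} leak={leak:.4f}")
```

A residual at floating-point noise level indicates that the closed-form expressions and the rate matrix describe the same chain.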

3.2. Scenario II: Forged Controller Request Attack

In the forged controller request attack scenario, an attacker first eavesdrops on legitimate controller commands and subsequently forges malicious requests that mimic them. Conventional self-cleansing intrusion tolerance mechanisms are ineffective against this type of sophisticated forgery attack. Within the GSPN model (Figure 7), this attack is represented by the firing of transition $T_{forge}$. This event consumes one token from place $P_{online}$ and one from $P_{exposure}$, depositing a new token into place $P_{crashed}$. The presence of a token in $P_{crashed}$ signifies that the server has transitioned to a crashed state. Due to the nature of this forgery attack, self-cleansing mechanisms fail, necessitating manual intervention by an administrator. Such interventions may include system reconfiguration, firewall rule updates to block the attack source, or modification of the control command format. This manual repair process is modeled by the firing of transition $T_{repair}$. The firing consumes the token from $P_{crashed}$ and regenerates one token each in places $P_{online}$ and $P_{normal}$, effectively restoring the system to its initial, secure state.
The set of all reachable markings for the GSPN model is enumerated in Table 2.
Applying reachability analysis to the GSPN model yields its isomorphic continuous-time Markov chain, as depicted in Figure 8.
From the CTMC model depicted in Figure 8, we derive the transition rate matrix Q, as defined in Equation (5).
$$
Q = \begin{bmatrix}
-\lambda_a & \lambda_{clean} & \lambda_{scan} & 0 & 0 & 0 \\
\lambda_{deploy} & -\lambda_{deploy} & 0 & 0 & 0 & 0 \\
0 & \lambda_{clean} & -\lambda_b & \lambda_{forge} & 0 & \lambda_{exploit} \\
\lambda_{repair} & 0 & 0 & -\lambda_{repair} & 0 & 0 \\
0 & \lambda_{clean} & 0 & 0 & -\lambda_{clean} & 0 \\
0 & \lambda_{clean} & 0 & 0 & \lambda_{theft} & -\lambda_c
\end{bmatrix} \tag{5}
$$
where $\lambda_a = \lambda_{clean} + \lambda_{scan}$, $\lambda_b = \lambda_{clean} + \lambda_{forge} + \lambda_{exploit}$, and $\lambda_c = \lambda_{clean} + \lambda_{theft}$.
The steady-state probability vector of the CTMC is denoted as $\pi = \{p_0, p_1, \ldots, p_5\}$. These probabilities are determined by solving the system of linear equations derived from the global balance equations ($\pi Q = 0$) and the normalization condition ($\sum_{i=0}^{5} p_i = 1$), yielding the solution presented in Equation (6).
$$
\begin{aligned}
p_0 &= (\lambda_{clean}+\lambda_{forge}+\lambda_{exploit})\lambda_{scan}^{-1}\, p_2 \\
p_1 &= \big[(\lambda_{clean}+\lambda_{forge}+\lambda_{exploit})(\lambda_{clean}+\lambda_{scan})\lambda_{scan}^{-1} - \lambda_{forge}\big]\lambda_{deploy}^{-1}\, p_2 \\
p_2 &= 1 \big/ \big\{ (\lambda_{clean}+\lambda_{forge}+\lambda_{exploit})\lambda_{scan}^{-1}\big[1 + (\lambda_{clean}+\lambda_{scan})\lambda_{deploy}^{-1}\big] \\
&\qquad + \lambda_{forge}(\lambda_{repair}^{-1} - \lambda_{deploy}^{-1}) + 1 + \lambda_{exploit}(\lambda_{clean}+\lambda_{theft})^{-1}(\lambda_{theft}\lambda_{clean}^{-1} + 1) \big\} \\
p_3 &= \lambda_{forge}\lambda_{repair}^{-1}\, p_2 \\
p_4 &= \lambda_{theft}\lambda_{exploit}\lambda_{clean}^{-1}(\lambda_{clean}+\lambda_{theft})^{-1}\, p_2 \\
p_5 &= \lambda_{exploit}(\lambda_{clean}+\lambda_{theft})^{-1}\, p_2
\end{aligned} \tag{6}
$$
The system’s steady-state availability and security are defined as:
$$
\begin{aligned}
p_{availability} &= p\big(M(P_{online}) > 0 \land (M(P_{normal}) > 0 \lor M(P_{exposure}) > 0)\big) = p_0 + p_2 \\
p_{security} &= 1 - p\big(M(P_{attacked}) > 0 \lor M(P_{crashed}) > 0 \lor M(P_{leak}) > 0\big) = 1 - p_3 - p_4 - p_5
\end{aligned} \tag{7}
$$
The risk of data leakage is then quantified using Equation (8):
$$
p_{leak} = p\big(M(P_{leak}) > 0\big) = p_4 \tag{8}
$$
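The Scenario II solution can be checked numerically in the same way: evaluate the closed-form probabilities relative to $p_2$, then confirm that they satisfy the balance equations for the six-state rate matrix. The rates below are hypothetical placeholders rather than the Table 4 values, and the function names are assumptions.

```python
def scenario2_probabilities(clean, scan, forge, exploit, theft, deploy, repair):
    """Closed-form steady-state probabilities [p0..p5] for Scenario II."""
    lam_a = clean + scan
    lam_b = clean + forge + exploit
    lam_c = clean + theft
    r0 = lam_b / scan                              # p0 / p2
    r1 = (lam_b * lam_a / scan - forge) / deploy   # p1 / p2
    r3 = forge / repair                            # p3 / p2
    r5 = exploit / lam_c                           # p5 / p2
    r4 = theft * r5 / clean                        # p4 / p2
    p2 = 1.0 / (r0 + r1 + 1.0 + r3 + r4 + r5)
    return [r0 * p2, r1 * p2, p2, r3 * p2, r4 * p2, r5 * p2]


def scenario2_generator(clean, scan, forge, exploit, theft, deploy, repair):
    """Transition rate matrix Q of the Scenario II CTMC (rows M0..M5)."""
    lam_a = clean + scan
    lam_b = clean + forge + exploit
    lam_c = clean + theft
    return [
        [-lam_a, clean, scan, 0.0, 0.0, 0.0],
        [deploy, -deploy, 0.0, 0.0, 0.0, 0.0],
        [0.0, clean, -lam_b, forge, 0.0, exploit],
        [repair, 0.0, 0.0, -repair, 0.0, 0.0],
        [0.0, clean, 0.0, 0.0, -clean, 0.0],
        [0.0, clean, 0.0, 0.0, theft, -lam_c],
    ]


def balance_residual(p, Q):
    """Largest absolute entry of the row vector p * Q (zero at steady state)."""
    n = len(p)
    return max(abs(sum(p[i] * Q[i][j] for i in range(n))) for j in range(n))


# Hypothetical rates, for illustration only.
rates = dict(clean=5.0, scan=2.0, forge=1.5, exploit=1.0,
             theft=1.0, deploy=10.0, repair=0.5)

p = scenario2_probabilities(**rates)
residual = balance_residual(p, scenario2_generator(**rates))

availability = p[0] + p[2]
security = 1.0 - p[3] - p[4] - p[5]
leak = p[4]
print(f"availability={availability:.4f} security={security:.4f} leak={leak:.4f}")
```

Because manual repair is typically much slower than automated cleansing, a small `repair` rate in this sketch makes the crashed state $M_3$ absorb a large share of the probability mass, which is the mechanism behind the availability loss in this scenario.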

3.3. Scenario III: SCIT Model Based on Authentication Improvement

In our proposed model, a potentially forged controller request initiates a certificate-based authentication process. As depicted in the GSPN model (Figure 9), this event corresponds to the firing of transition $T_{forge}$, which moves a token from the $P_{exposure}$ place to the $P_{verify}$ place, signaling the system’s entry into the “under verification” state.
When the system is in the verification state (i.e., a token is in place $P_{verify}$), its evolution can follow one of three mutually exclusive paths:
  • A legitimate cleansing command preempts the authentication process. This event is modeled by the firing of priority transition $T_{clean}$, which moves a token from $P_{online}$ to $P_{ready}$, transitioning the server to the “offline ready” state. Concurrently, the loss of the token in $P_{online}$ triggers the immediate transition $T_{recover3}$, which returns the token from $P_{verify}$ to $P_{normal}$, thereby resolving the verification state and securing the server.
  • Authentication Failure. If the verification completes without interruption and the attacker’s identity forgery fails, transition $T_{fail}$ fires. This moves the token from $P_{verify}$ to $P_{denied}$, signifying a rejected attack attempt. Subsequently, the immediate transition $T_{recover4}$ is triggered, which moves the token from $P_{denied}$ back to $P_{normal}$ to reset the server state.
  • Successful Attack. A successful attack, where the attacker either forges an identity or bypasses authentication via an unknown vulnerability, is modeled by the firing of transition $T_{success}$. This transition consumes one token from both $P_{verify}$ and $P_{online}$, and deposits a new token into $P_{crashed}$, forcing the server into an unexpected downtime state.
To model an exploit-based attack, an attacker must first pass an authentication challenge. This initial attempt triggers the transition $T_{exploit}$, which moves a token from $P_{exposure}$ to $P_{auth}$, placing the system into an “authentication-in-progress” state. From this state, three exclusive outcomes are possible:
  • The initiation of a server cleansing during the authentication process enables the immediate transition $T_{recover2}$. Its firing returns the token from $P_{auth}$ to $P_{normal}$, thereby aborting the authentication attempt.
  • If the attacker passes authentication, transition $T_{pass}$ fires. This moves the token from $P_{auth}$ to $P_{attacked}$, signifying that the server has been compromised.
  • Conversely, if the authentication fails, transition $T_{reject}$ fires. This event moves the token from $P_{auth}$ to $P_{denied}$, indicating that the attack attempt has been successfully thwarted.
Data theft is modeled by transition $T_{theft}$, which moves a token from $P_{attacked}$ to $P_{leak}$ to signify a data breach. This state persists until a cleansing operation blocks the attacker’s access and triggers the recovery transition $T_{recover6}$. The firing of $T_{recover6}$ then moves the token from $P_{leak}$ back to $P_{normal}$, completing the defense and recovery loop.
In summary, the model’s reachable marking set, defined by the token distribution across its places, is presented in Table 3.
From the reachable markings listed in Table 3 and their corresponding transition relationships, we construct the continuous-time Markov chain shown in Figure 10.
The continuous-time Markov chain from Figure 10 yields the transition rate matrix shown in Equation (9).
$$
Q = \begin{bmatrix}
-\lambda_a & 0 & \lambda_{scan} & 0 & 0 & \lambda_{clean} & 0 & 0 & 0 \\
\lambda_{repair} & -\lambda_{repair} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -\lambda_b & \lambda_{forge} & \lambda_{exploit} & \lambda_{clean} & 0 & 0 & 0 \\
0 & \lambda_{success} & 0 & -\lambda_c & 0 & \lambda_{clean} & \lambda_{fail} & 0 & 0 \\
0 & 0 & 0 & 0 & -\lambda_d & \lambda_{clean} & \lambda_{reject} & \lambda_{pass} & 0 \\
\lambda_{deploy} & 0 & 0 & 0 & 0 & -\lambda_{deploy} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \lambda_{clean} & -\lambda_{clean} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \lambda_{clean} & 0 & -\lambda_e & \lambda_{theft} \\
0 & 0 & 0 & 0 & 0 & \lambda_{clean} & 0 & 0 & -\lambda_{clean}
\end{bmatrix} \tag{9}
$$
where $\lambda_a = \lambda_{scan} + \lambda_{clean}$, $\lambda_b = \lambda_{forge} + \lambda_{exploit} + \lambda_{clean}$, $\lambda_c = \lambda_{success} + \lambda_{clean} + \lambda_{fail}$, $\lambda_d = \lambda_{clean} + \lambda_{reject} + \lambda_{pass}$, and $\lambda_e = \lambda_{clean} + \lambda_{theft}$.
Let $\pi = \{p_0, p_1, \ldots, p_8\}$ be the steady-state probability vector of the CTMC. The individual probabilities $p_i$ are obtained by solving the system of linear equations $\pi Q = 0$ and $\sum_{i=0}^{8} p_i = 1$. The resulting values are presented in Equation (10).
$$
\begin{aligned}
p_0 &= (\lambda_{forge}+\lambda_{exploit}+\lambda_{clean})\lambda_{scan}^{-1}\, p_2 \\
p_1 &= \lambda_{success}\lambda_{forge}\lambda_{repair}^{-1}(\lambda_{success}+\lambda_{clean}+\lambda_{fail})^{-1}\, p_2 \\
p_2 &= 1 \big/ \big\{ \big[(\lambda_{forge}+\lambda_{exploit}+\lambda_{clean})\lambda_{scan}^{-1} + 1 + \lambda_{forge}(\lambda_{clean}+\lambda_{fail})(\lambda_{success}+\lambda_{clean}+\lambda_{fail})^{-1}\lambda_{clean}^{-1} \\
&\qquad + \lambda_{exploit}\lambda_{clean}^{-1}\big](1 + \lambda_{clean}\lambda_{deploy}^{-1}) + \lambda_{success}\lambda_{forge}\lambda_{repair}^{-1}(\lambda_{success}+\lambda_{clean}+\lambda_{fail})^{-1} \big\} \\
p_3 &= \lambda_{forge}(\lambda_{success}+\lambda_{clean}+\lambda_{fail})^{-1}\, p_2 \\
p_4 &= \lambda_{exploit}(\lambda_{clean}+\lambda_{reject}+\lambda_{pass})^{-1}\, p_2 \\
p_5 &= \big[(\lambda_{scan}+\lambda_{clean})(\lambda_{forge}+\lambda_{exploit}+\lambda_{clean})\lambda_{deploy}^{-1}\lambda_{scan}^{-1} - \lambda_{success}\lambda_{forge}\lambda_{deploy}^{-1}(\lambda_{success}+\lambda_{clean}+\lambda_{fail})^{-1}\big]\, p_2 \\
p_6 &= \big[\lambda_{fail}\lambda_{forge}\lambda_{clean}^{-1}(\lambda_{success}+\lambda_{clean}+\lambda_{fail})^{-1} + \lambda_{reject}\lambda_{exploit}\lambda_{clean}^{-1}(\lambda_{clean}+\lambda_{reject}+\lambda_{pass})^{-1}\big]\, p_2 \\
p_7 &= \lambda_{pass}\lambda_{exploit}(\lambda_{clean}+\lambda_{theft})^{-1}(\lambda_{clean}+\lambda_{reject}+\lambda_{pass})^{-1}\, p_2 \\
p_8 &= \lambda_{theft}\lambda_{pass}\lambda_{exploit}\lambda_{clean}^{-1}(\lambda_{clean}+\lambda_{theft})^{-1}(\lambda_{clean}+\lambda_{reject}+\lambda_{pass})^{-1}\, p_2
\end{aligned} \tag{10}
$$
Using the previously defined metrics, the system’s steady-state availability and security are quantitatively evaluated with Equation (11).
$$
\begin{aligned}
p_{availability} &= p\big(M(P_{online}) > 0 \land (M(P_{attacked}) + M(P_{leak})) = 0\big) = p_0 + p_2 + p_3 + p_4 + p_6 \\
p_{security} &= 1 - p\big((M(P_{attacked}) + M(P_{leak})) > 0 \lor M(P_{crashed}) > 0\big) = 1 - p_1 - p_7 - p_8
\end{aligned} \tag{11}
$$
The risk of data leakage is given by:
$$
p_{leak} = p\big(M(P_{leak}) > 0\big) = p_8 \tag{12}
$$
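A numerical check of the Scenario III solution follows the same pattern as for the earlier scenarios: the nine probabilities are computed relative to $p_2$ and then tested against the balance equations of the nine-state rate matrix. The rates are hypothetical placeholders rather than the Table 4 values; `pass_` is written with a trailing underscore only because `pass` is a Python keyword.

```python
def scenario3_probabilities(clean, scan, forge, exploit, success, fail,
                            reject, pass_, theft, deploy, repair):
    """Closed-form steady-state probabilities [p0..p8] for Scenario III."""
    lam_b = forge + exploit + clean
    lam_c = success + clean + fail
    lam_d = clean + reject + pass_
    lam_e = clean + theft
    r0 = lam_b / scan                                          # p0 / p2
    r3 = forge / lam_c                                         # p3 / p2
    r4 = exploit / lam_d                                       # p4 / p2
    r1 = success * r3 / repair                                 # p1 / p2
    r5 = ((scan + clean) * lam_b / scan - success * r3) / deploy
    r6 = (fail * r3 + reject * r4) / clean                     # p6 / p2
    r7 = pass_ * r4 / lam_e                                    # p7 / p2
    r8 = theft * r7 / clean                                    # p8 / p2
    ratios = [r0, r1, 1.0, r3, r4, r5, r6, r7, r8]
    p2 = 1.0 / sum(ratios)
    return [r * p2 for r in ratios]


def scenario3_generator(clean, scan, forge, exploit, success, fail,
                        reject, pass_, theft, deploy, repair):
    """Transition rate matrix Q of the Scenario III CTMC (rows M0..M8)."""
    a = scan + clean
    b = forge + exploit + clean
    c = success + clean + fail
    d = clean + reject + pass_
    e = clean + theft
    z = 0.0
    return [
        [-a, z, scan, z, z, clean, z, z, z],
        [repair, -repair, z, z, z, z, z, z, z],
        [z, z, -b, forge, exploit, clean, z, z, z],
        [z, success, z, -c, z, clean, fail, z, z],
        [z, z, z, z, -d, clean, reject, pass_, z],
        [deploy, z, z, z, z, -deploy, z, z, z],
        [z, z, z, z, z, clean, -clean, z, z],
        [z, z, z, z, z, clean, z, -e, theft],
        [z, z, z, z, z, clean, z, z, -clean],
    ]


def balance_residual(p, Q):
    """Largest absolute entry of the row vector p * Q (zero at steady state)."""
    n = len(p)
    return max(abs(sum(p[i] * Q[i][j] for i in range(n))) for j in range(n))


# Hypothetical rates, for illustration only.
rates = dict(clean=5.0, scan=2.0, forge=1.5, exploit=1.0, success=0.1,
             fail=4.0, reject=4.0, pass_=0.1, theft=1.0, deploy=10.0, repair=0.5)

p = scenario3_probabilities(**rates)
residual = balance_residual(p, scenario3_generator(**rates))

availability = p[0] + p[2] + p[3] + p[4] + p[6]
security = 1.0 - p[1] - p[7] - p[8]
leak = p[8]
print(f"availability={availability:.4f} security={security:.4f} leak={leak:.4f}")
```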

3.4. Methodology for Scalability Analysis

As illustrated in Figure 5, Figure 7 and Figure 9, the preceding analyses were limited to single-server scenarios. To evaluate the scalability and overall performance of the proposed mechanism in multi-server deployments, we conducted an extended analysis. We assume that the cleansing processes of individual instances are mutually independent, as are the attack sequences targeting different instances. Given that any single instance is characterized by the previously constructed Petri net model, and that all instances share identical structures and parameters, we constructed a system-level Petri net comprising n replicas by duplicating the single-instance model. This approach enables a unified evaluation of the multi-instance system. To quantify system-level metrics, the following criteria were adopted: the system is considered available if at least one server remains available; it is deemed insecure if any server enters a compromised state; and a system-wide data leakage risk is identified if a data breach occurs in any server.
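Under the stated independence assumption, the three system-level criteria reduce to simple product formulas over the single-instance metrics. The sketch below is one way to express that reduction and is not necessarily how the replicated Petri net was evaluated; the function name and the sample inputs are hypothetical.

```python
def system_metrics(availability: float, security: float, leak: float, n: int) -> dict:
    """System-level metrics for n independent, identical instances.

    availability: probability that a single instance is available
    security:     probability that a single instance is not compromised
    leak:         probability that a single instance is leaking data
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return {
        # Available if at least one instance is available.
        "availability": 1.0 - (1.0 - availability) ** n,
        # Secure only if no instance is compromised.
        "security": security ** n,
        # Leakage risk exists if any instance is leaking.
        "leak": 1.0 - (1.0 - leak) ** n,
    }


# Hypothetical single-instance values, for illustration only.
single = dict(availability=0.9, security=0.95, leak=0.01)
for n in (1, 2, 4, 8):
    m = system_metrics(n=n, **single)
    print(n, round(m["availability"], 4), round(m["security"], 4), round(m["leak"], 4))
```

Under these criteria, adding replicas raises system availability while lowering system security and raising system-wide leakage risk, because a single compromised or leaking instance is enough to mark the whole system as insecure.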

4. Results

4.1. Experimental Environment and Configuration

To quantitatively evaluate the proposed scheme, we employed the PIPE [19,20] modeling tool to construct a generalized stochastic Petri net model for each of the three scenarios. The key model parameters, including the average firing rates ( λ i ) of timed transitions ( T i ), were configured as detailed in Table 4.
A parameter sensitivity analysis was conducted to evaluate the influence of the cleansing firing rate ($\lambda_{clean}$) on system performance. In this analysis, $\lambda_{clean}$ was varied from 1 to 15 for each attack scenario, and the resulting system performance was simulated. The findings are illustrated in Figure 11, Figure 12 and Figure 13.
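A sweep of this kind can also be reproduced outside PIPE by solving the CTMC balance equations numerically. The following Python sketch is a minimal illustration, not the PIPE workflow: the transition structure of Scenario III is inferred from its steady-state expressions rather than read from the CTMC figure, the rates default to the Table 4 values, and all function names are hypothetical.

```python
def scenario3_rates(clean, deploy=50.0, scan=1.0, exploit=2.0, theft=2.0,
                    forge=1.0, repair=0.5, passed=1.0, reject=1.0,
                    success=1.0, fail=1.0):
    """Transition rates {(src, dst): rate} between markings M0..M8.

    The structure is an assumption inferred from the balance equations.
    """
    rates = {(0, 2): scan, (2, 3): forge, (2, 4): exploit,
             (3, 1): success, (3, 6): fail,
             (4, 7): passed, (4, 6): reject,
             (7, 8): theft, (1, 0): repair, (5, 0): deploy}
    for online_state in (0, 2, 3, 4, 6, 7, 8):
        rates[(online_state, 5)] = clean   # cleansing takes the server offline
    return rates


def solve_steady_state(rates, n):
    """Solve pi * Q = 0 with sum(pi) = 1 by Gaussian elimination."""
    a = [[0.0] * n for _ in range(n)]
    for (i, j), r in rates.items():
        a[j][i] += r    # inflow into state j from state i
        a[i][i] -= r    # outflow from state i
    a[n - 1] = [1.0] * n             # replace one balance equation by normalisation
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            if f:
                for c in range(col, n):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):   # back substitution
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def availability_sweep(clean_values):
    """Steady-state availability p0 + p2 + p3 + p4 + p6 for each cleansing rate."""
    result = {}
    for clean in clean_values:
        p = solve_steady_state(scenario3_rates(clean), 9)
        result[clean] = p[0] + p[2] + p[3] + p[4] + p[6]
    return result


if __name__ == "__main__":
    for clean, avail in availability_sweep([float(k) for k in range(1, 16)]).items():
        print(f"lambda_clean={clean:>4}: availability={avail:.4f}")
```

Solving the linear system directly keeps the sketch independent of any closed-form derivation, so it can serve as a cross-check on both the analytical expressions and the simulation output.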
The proposed Self-Cleansing Intrusion Tolerance (SCIT) system was implemented in Golang (version 1.24.0) and deployed within a controlled network environment for simulation. The experimental platform consisted of a physical server equipped with an Intel Xeon E5-2603 v4 CPU (1.70 GHz) and 16 GB of RAM. A VMware virtualization environment was deployed on this server to host the three virtual machines (VMs) constituting the experimental scenario. These VMs were designated as the Service Node, Controller Node, and Client/Attacker Node, tasked with service provisioning, cleansing control, and system evaluation (encompassing liveness detection and attack simulation), respectively. Detailed VM configurations are provided in Table 5.
Regarding parameter settings, the unit time was fixed at 30 s, while the cleansing frequency was incremented from 1 to 15. At each frequency setting, the system operated continuously for 30 min to quantify service availability, security, and data leakage risk.

4.2. Numerical Results and Analysis

Figure 11 illustrates the impact of the cleansing firing rate, $\lambda_{clean}$, on system availability across different scenarios. A key observation is the significant drop in availability when controller request forgery attacks are present, which underscores the critical role of the authentication mechanism in preventing such intrusions.
The relationship between $\lambda_{clean}$ and availability is non-monotonic and exhibits a clear trade-off. Initially, at low cleansing rates, attackers have a prolonged attack window, leading to persistent low availability. As $\lambda_{clean}$ increases, the server exposure time shortens, enabling faster threat removal and service recovery, which causes a corresponding rise in availability. However, beyond a certain threshold, increasing $\lambda_{clean}$ becomes detrimental. This occurs for two reasons. First, the security benefits plateau as the attacker’s impact is already maximally mitigated. Second, the defense mechanism itself introduces overhead, as frequent cleansing operations increase server downtime for maintenance. Eventually, this overhead-induced availability loss outweighs the security gains, causing the overall system availability to decline.
Figure 12 illustrates how the cleansing firing rate, $\lambda_{clean}$, affects system security. The results highlight two key findings. First, controller request forgery attacks significantly degrade system security. The authentication mechanism effectively counters this threat by intercepting malicious traffic, thereby elevating system security even beyond that of Scenario I. Second, system security is positively correlated with $\lambda_{clean}$. This is because a higher cleansing rate shortens the attacker’s window of opportunity, thwarting complex, multi-stage attacks that require time to deploy malicious payloads or establish backdoors. Consequently, a higher $\lambda_{clean}$ reduces the probability of a successful intrusion, leading to enhanced overall system security.
Figure 13 shows a strong negative correlation between the cleansing firing rate ($\lambda_{clean}$) and the risk of data leakage. This is because higher cleansing frequencies compress the attacker’s window of opportunity, hindering the completion of the full attack chain from initial penetration to final data exfiltration. Consequently, the probability of successful data leakage declines sharply with increasing $\lambda_{clean}$. Interestingly, an anomalous result appears at lower cleansing rates: the data leakage risk in Scenario II is lower than in Scenario I. This effect occurs because unintended server outages, triggered by forged commands, inadvertently shorten the server’s effective exposure time, thus limiting opportunities for data theft. This finding highlights a complex interplay between availability and security threats. Scenario III achieves the lowest risk by design, using an authentication mechanism to proactively deny unauthorized access, thus offering the most robust defense against data leakage.
To evaluate scalability, multi-instance scenarios were simulated by replicating the single-instance Petri net model $n$ times (where $n = 1, 2, 3$), with the cleansing firing rate $\lambda_{clean}$ uniformly set to 1. As illustrated in Figure 14, system availability exhibits a distinct upward trend as $n$ increases. This improvement is primarily attributed to the redundancy inherent in multi-instance deployments, which enhances fault tolerance: even if specific instances malfunction or become unavailable, the remaining instances can maintain service continuity, thereby boosting overall availability. Conversely, as the number of instances grows, system security declines, and the risk of data leakage escalates. This trend arises because expanding the system scale significantly amplifies the attack surface; the increased number of intrusion paths and targets raises both the probability of successful intrusion and the risk of data leakage. A comparison across the three scenarios reveals that, for the availability and security metrics, the Authentication-Enhanced SCIT achieves the best performance, followed by the traditional SCIT, while Scenario II (susceptible to forgery attacks) exhibits the poorest performance. Regarding data leakage risk specifically, the Authentication-Enhanced SCIT maintains its superiority. Notably, however, Scenario II outperforms the traditional SCIT in this specific metric, a phenomenon consistent with the data leakage risk analysis presented earlier.

4.3. Experimental Evaluation of the Prototype System

In the experimental setup, services were deployed as Docker containers acting as the protected targets. To simulate destructive intrusions, system panics were manually induced within the containers to provoke service anomalies. A ping interface was exposed as a health-check endpoint for continuous service status monitoring. Probe responses categorized system behavior into three distinct states: (1) Normal Operation, indicated by a successful response; (2) Service Failure, signified by an error response resulting from a successful attack; and (3) Maintenance/Unavailable, characterized by connection errors specific to the system’s cleansing and rotation phases (i.e., node reset or switching).
VM3 operated concurrently as a legitimate client and an attacking node, initiating continuous attacks and liveness probes while logging response data. Based on the acquired logs over a 30-min operational window, two key metrics were derived:
  • Availability: Defined as the ratio of successful probe responses to the total volume of probes issued.
  • Security: Calculated as the proportion of non-compromised system states (the sum of successful responses and connection errors) relative to the total probe count. This metric quantifies system resilience by excluding instances of service failure resulting from successful intrusions.
To assess the risk of data exfiltration, a dedicated interface was deployed on the service node to emulate a buffer overflow vulnerability. When an attack triggers the buffer overflow, the system is considered to have suffered a confidentiality violation, as such vulnerabilities are widely recognized to potentially lead to unauthorized data disclosure. In this experimental setting, a data leakage event is operationally defined as the successful triggering of a buffer overflow vulnerability, serving as an indicator of compromised confidentiality rather than a direct measurement of actual data exfiltration. Accordingly, the interface returns a Boolean value, where False indicates no overflow and True indicates a successful breach.
Based on the statistical analysis of the interface responses, Data Leakage Probability is quantified as the ratio of positive leakage indicators (returning True) to the total volume of attack requests issued within a 30-min experimental window. This metric reflects the attack success rate in triggering confidentiality-compromising conditions and serves as a quantitative indicator of data exfiltration risk.
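The three log-derived metrics defined above reduce to simple ratios. The sketch below shows one way to compute them; the state labels, function names and sample data are hypothetical and do not reflect the prototype's actual log format.

```python
from collections import Counter

# Hypothetical labels for the three probe outcomes described in the text.
OK = "ok"                   # Normal Operation: successful response
FAILURE = "failure"         # Service Failure: error response after an attack
CONN_ERROR = "conn_error"   # Maintenance/Unavailable: connection error


def probe_metrics(probe_states):
    """Return (availability, security) from a sequence of probe outcomes."""
    total = len(probe_states)
    if total == 0:
        raise ValueError("no probes recorded")
    counts = Counter(probe_states)
    availability = counts[OK] / total
    # Non-compromised states: successful responses plus connection errors.
    security = (counts[OK] + counts[CONN_ERROR]) / total
    return availability, security


def leak_probability(attack_results):
    """Share of attack requests whose response was True (overflow triggered)."""
    if not attack_results:
        raise ValueError("no attack requests recorded")
    return sum(1 for triggered in attack_results if triggered) / len(attack_results)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    probes = [OK] * 7 + [FAILURE] * 2 + [CONN_ERROR]
    attacks = [True, False, False, False]
    print(probe_metrics(probes), leak_probability(attacks))
```

Counting connection errors as non-compromised mirrors the Security definition: a node that is offline for cleansing is unavailable but has not been breached.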
The experimental results are presented in Figure 15. Overall, the experimental data aligns closely with the analytical results derived from the Petri net model, exhibiting comparable trends and numerical ranges. This close correspondence confirms that the established model accurately captures the system’s behavioral characteristics across various parameter configurations. Minor discrepancies between the datasets are primarily attributable to non-ideal factors inherent in the virtualized environment, including resource contention, scheduling overhead, and network latency fluctuations. However, these deviations do not compromise the validity of the core conclusions. Ultimately, these empirical results corroborate the effectiveness of the Petri net analysis, providing a solid foundation for subsequent parameter selection and mechanism optimization.
The integration of an authentication mechanism inevitably incurs additional communication overhead. To quantify this, Round-Trip Time (RTT) was measured at the application layer, defined as the interval between request transmission and response reception, and the average latency was computed over 1000 independent trials. Results indicate that, with network conditions and system configurations held constant, the deployment of authentication imposes an average end-to-end latency penalty of 11.72 ms.
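An application-layer measurement of this kind can be sketched as follows. This is a hypothetical harness, not the instrumentation used in the prototype: `plain_request` and `auth_request` stand for caller-supplied functions that send one request and block until the response arrives, and the injectable `clock` parameter exists only to make the helper testable.

```python
import time


def mean_rtt_ms(request, trials=1000, clock=time.perf_counter):
    """Average round-trip time in milliseconds over the given number of trials.

    `request` is a hypothetical callable that sends one request and returns
    once the response has been received.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    total = 0.0
    for _ in range(trials):
        start = clock()
        request()
        total += clock() - start
    return 1000.0 * total / trials


def auth_overhead_ms(plain_request, auth_request, trials=1000,
                     clock=time.perf_counter):
    """Difference in mean RTT between authenticated and plain requests."""
    return (mean_rtt_ms(auth_request, trials, clock)
            - mean_rtt_ms(plain_request, trials, clock))


if __name__ == "__main__":
    # Stand-in workloads; a real measurement would issue network requests.
    overhead = auth_overhead_ms(lambda: time.sleep(0.001),
                                lambda: time.sleep(0.002), trials=20)
    print(f"estimated overhead: {overhead:.2f} ms")
```

Measuring both variants with the same harness and trial count keeps the comparison fair, since any fixed measurement cost cancels in the difference.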

5. Conclusions

This paper addresses the vulnerability of traditional self-cleansing intrusion-tolerant systems to controller command forgery attacks by designing and evaluating an authentication-enhanced intrusion tolerance framework. Our GSPN-based analysis demonstrates that the proposed architecture yields superior system availability, bolstered security, and reduced data leakage risk compared to conventional approaches. By integrating an authentication layer with proactive defense mechanisms, our work offers a robust foundation for building highly resilient network services. In addition, scalability analysis under different system scales and parameter settings, together with prototype-based experiments, further validates the effectiveness and practical feasibility of the proposed design. Future research will extend this model by exploring adaptive cleansing strategies. Such strategies would enable the system to dynamically adjust its defense intensity based on real-time threat intelligence, paving the way for more intelligent and efficient intrusion tolerance in dynamic adversarial environments.

Author Contributions

Methodology and formal analysis, W.F.; validation, S.L.; data curation, C.C.; conceptualization and project administration, L.S.; review and editing, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China International Cooperation and Exchange Project (Grant No. 62111530052), the Shandong Province Small and Medium-sized Enterprises Innovation Capacity Enhancement Project (Grant No. 2025TSGCCZZB0235), the Open Research Projects of Shandong Provincial Key Laboratory of Industrial Network and Information System Security (Grant No. SDKLINISS-2025), and the Key Research and Development Program of Shandong (Grant No. 2025CXPT082).

Data Availability Statement

All data included in this study are available from the corresponding author upon request.

Conflicts of Interest

Author Juan Wang was employed by the company Qingdao Moses Process Control Technology Co. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SCIT: Self-Cleansing Intrusion Tolerance
GSPN: Generalized Stochastic Petri Nets
CTMC: Continuous-Time Markov Chain

References

  1. Li, G.; Wang, W.; Gai, K.; Tang, Y.; Yang, B.; Si, X. A framework for mimic defense system in cyberspace. J. Sign Process. Syst. 2021, 93, 169–185.
  2. Han, X.; Kheir, N.; Balzarotti, D. Deception techniques in computer security: A research perspective. ACM Comput. Surv. 2018, 51, 80.
  3. Li, Q.; Meng, S.; Sang, X.; Zhang, H.; Wang, S.; Bashir, A.K.; Yu, K.; Tariq, U. Dynamic scheduling algorithm in cyber mimic defense architecture of volunteer computing. ACM Trans. Internet Technol. 2021, 21, 75.
  4. Hu, H.; Sui, J.; Zhang, S.; Tong, Y. Proactive Defense Technology in Cyber Security: Strategies, Methods and Challenges. Comput. Sci. 2024, 51, 829–841.
  5. Li, B.; Chen, S.; Xu, G.; Jia, Y.; Wang, C.; Xue, F.; Wang, X.; Wang, W.; Li, Z.; Li, J. A review of cyberspace mimic defense research. J. Cyber Secur. 2025, 10, 74–97.
  6. Javadpour, A.; Ja’fari, F.; Taleb, T.; Shojafar, M.; Benzaïd, C. A comprehensive survey on cyber deception techniques to improve honeypot performance. Comput. Secur. 2024, 140, 103792.
  7. Sun, R.; Zhu, Y.; Fei, J.; Chen, X. A survey on moving target defense: Intelligently affordable, optimized and self-adaptive. Appl. Sci. 2023, 13, 5367.
  8. Jiang, Y.; Huang, J.; Jin, W. The research progress of network intrusion tolerance. In Proceedings of the International Conference on Cyberspace Technology (CCT 2014); IET: Stevenage, UK, 2014; pp. 1–4.
  9. Bangalore, A.K.; Sood, A.K. Securing web servers using self cleansing intrusion tolerance (SCIT). In Proceedings of the 2009 Second International Conference on Dependability; IEEE: Piscataway, NJ, USA, 2009; pp. 60–65.
  10. Wagner, B.; Sood, A. Economics of resilient cloud services. In Proceedings of the 2016 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C); IEEE: Piscataway, NJ, USA, 2016; pp. 368–374.
  11. Nguyen, Q.L.; Sood, A. Scalability of cloud based scit-mtd. In Proceedings of the 2017 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C); IEEE: Piscataway, NJ, USA, 2017; pp. 581–582.
  12. Qi, X.; Shen, S.; Wang, Q. A Moving Target Defense Technology Based on SCIT. In Proceedings of the 2020 International Conference on Computer Engineering and Application (ICCEA); IEEE: Piscataway, NJ, USA, 2020; pp. 454–457.
  13. Sanoussi, N.; Chetioui, K.; Orhanou, G.; El Hajji, S. ITC: Intrusion tolerant controller for multicontroller SDN architecture. Comput. Secur. 2023, 132, 103351.
  14. Sood, A.; Moidu, K. Protection of healthcare information: Adding cyber resilience and recovery. In Proceedings of the 2018 International Conference on Computational Science and Computational Intelligence (CSCI); IEEE: Piscataway, NJ, USA, 2018; pp. 132–134.
  15. Nagarajan, A.; Sood, A. SCIT and IDS architectures for reduced data ex-filtration. In Proceedings of the 2010 International Conference on Dependable Systems and Networks Workshops (DSN-W); IEEE: Piscataway, NJ, USA, 2010; pp. 164–169.
  16. El Mir, I.; Kim, D.S.; Haqiq, A. Cloud Computing Security Modeling and Analysis based on a Self-Cleansing Intrusion Tolerance Technique. J. Inf. Assur. Secur. 2016, 11, 273–282.
  17. Lan, S.; Li, F.; Shi, L. CFL-based authentication and communication scheme for industrial control system. J. Comput. Appl. 2023, 43, 1183–1190.
  18. Bezerra, T.; Callou, G.; França, C.; Tavares, E.A.G. A Stochastic Petri Net-Based Approach for Evaluating the Performability of Internet of Medical Things. In Proceedings of the 13th Latin-American Symposium on Dependable and Secure Computing; Association for Computing Machinery: New York, NY, USA, 2024; pp. 210–219.
  19. Dingle, N.J.; Knottenbelt, W.J.; Suto, T. PIPE2: A tool for the performance evaluation of generalised stochastic Petri Nets. SIGMETRICS Perform. Eval. Rev. 2009, 36, 34–39.
  20. Bonet, P.; Lladó, C.M.; Puigjaner, R.; Knottenbelt, W.J. PIPE v2.5: A Petri net tool for performance modelling. In Proceedings of the 23rd Latin American Conference on Informatics (CLEI 2007); CLEI: San Jose, Costa Rica, 2007; pp. 1–12.
Figure 1. Scenario I: Traditional SCIT Model.
Figure 2. Scenario II: Instruction Forgery Attack.
Figure 3. Scenario III: Improved Model.
Figure 4. Basic elements of GSPN.
Figure 5. GSPN model of the Scenario I.
Figure 6. CTMC model of the Scenario I.
Figure 7. GSPN model of the Scenario II.
Figure 8. CTMC model of the Scenario II.
Figure 9. GSPN model of the Scenario III.
Figure 10. CTMC model of the Scenario III.
Figure 11. Steady-state availability analysis.
Figure 12. Steady-state security analysis.
Figure 13. Data leakage risk analysis.
Figure 14. Scalability analysis. (a) Steady-state availability analysis (b) Steady-state security analysis (c) Data leakage risk analysis.
Figure 15. Experimental Results on the Prototype. (a) Steady-state availability analysis (b) Steady-state security analysis (c) Data leakage risk analysis.
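Table 1. Reachable markings’ set of Scenario I.
| State | $P_{attacked}$ | $P_{exposure}$ | $P_{leak}$ | $P_{normal}$ | $P_{online}$ | $P_{ready}$ |
|---|---|---|---|---|---|---|
| $M_0$ | 0 | 0 | 0 | 1 | 1 | 0 |
| $M_1$ | 0 | 0 | 0 | 1 | 0 | 1 |
| $M_2$ | 1 | 0 | 0 | 0 | 1 | 0 |
| $M_3$ | 0 | 0 | 1 | 0 | 1 | 0 |
| $M_4$ | 0 | 1 | 0 | 0 | 1 | 0 |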
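Table 2. Reachable markings’ set of Scenario II.
| State | $P_{attacked}$ | $P_{crashed}$ | $P_{exposure}$ | $P_{leak}$ | $P_{normal}$ | $P_{online}$ | $P_{ready}$ |
|---|---|---|---|---|---|---|---|
| $M_0$ | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| $M_1$ | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| $M_2$ | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| $M_3$ | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| $M_4$ | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| $M_5$ | 1 | 0 | 0 | 0 | 0 | 1 | 0 |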
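Table 3. Reachable markings’ set of Scenario III.
| State | $P_{attacked}$ | $P_{auth}$ | $P_{crashed}$ | $P_{denied}$ | $P_{leak}$ | $P_{exposure}$ | $P_{normal}$ | $P_{online}$ | $P_{ready}$ | $P_{verify}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $M_0$ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| $M_1$ | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| $M_2$ | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| $M_3$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| $M_4$ | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| $M_5$ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| $M_6$ | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| $M_7$ | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| $M_8$ | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |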
Table 4. The average transition triggering rate of Scenario I/II/III.
| Rate | Scenario I | Scenario II | Scenario III |
|---|---|---|---|
| $\lambda_{deploy}$ | 50 | 50 | 50 |
| $\lambda_{scan}$ | 1 | 1 | 1 |
| $\lambda_{exploit}$ | 2 | 2 | 2 |
| $\lambda_{theft}$ | 2 | 2 | 2 |
| $\lambda_{forge}$ | - | 1 | 1 |
| $\lambda_{repair}$ | - | 0.5 | 0.5 |
| $\lambda_{pass}$ | - | - | 1 |
| $\lambda_{reject}$ | - | - | 1 |
| $\lambda_{success}$ | - | - | 1 |
| $\lambda_{fail}$ | - | - | 1 |
Table 5. Virtual Machine Configuration.
| Node | Role and Function | System | Resource Allocation |
|---|---|---|---|
| VM1 | Provides external network services; performs cleansing | Debian 11 | 2 vCPU/2 GB RAM |
| VM2 | Issues cleansing policies and commands; controls cleansing operations | Debian 11 | 1 vCPU/1 GB RAM |
| VM3 | Performs liveness probing; launches attacks and records results | Debian 11 | 2 vCPU/2 GB RAM |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Fu, W.; Luo, S.; Cao, C.; Shi, L.; Wang, J. Modeling and Authentication Analysis of Self-Cleansing Intrusion-Tolerant System Based on GSPN. Modelling 2026, 7, 24. https://doi.org/10.3390/modelling7010024

