Enhanced Blockchain-Based Data Poisoning Defense Mechanism

Kim, Song-Kyoo

doi:10.3390/app15074069


Faculty of Applied Sciences, Macao Polytechnic University, Macao, China

Appl. Sci. 2025, 15(7), 4069; https://doi.org/10.3390/app15074069

Submission received: 15 February 2025 / Revised: 28 March 2025 / Accepted: 3 April 2025 / Published: 7 April 2025

(This article belongs to the Special Issue Approaches to Cyber Attacks and Malware Detection)


Abstract

This paper presents a new secure execution environment that adapts blockchain technology to defend artificial intelligence (AI) models against data poisoning (DP) attacks. The Blockchain Governance Game (BGG) is a theoretical framework for analyzing a network to determine the moment at which preliminary cybersecurity actions should be taken before DP attacks. This method, originally developed for securing conventional decentralized networks, is adapted in this paper into a DP defense for AI models. The core components of the DP defense network, including the Predictor and the BGG engine, are fully implemented. This research presents the first blockchain-based DP defense mechanism, establishing a framework for DP defense based on the BGG. The simulations in the paper demonstrate realistic DP attack situations targeting AI models. The new controller is designed to provide sufficient cybersecurity performance even with minimal data collection and limited computing power. This research should also be helpful for those considering using blockchain to implement a DP defense mechanism.

Keywords:

machine learning; Blockchain Governance Game; data poisoning attack; convolutional neural network; mixed game; cybersecurity; MATLAB

1. Introduction

The stability of modern artificial intelligence (AI) models is essential for various applications and devices [1,2,3,4]. Machine learning (ML) techniques are increasingly being combined with other technologies to tackle cybersecurity challenges. As new vulnerabilities arise, attackers specifically target ML models, and these threats can compromise model integrity [5]. Notably, data poisoning (DP) attacks are designed to undermine the integrity of a model during both the training and testing stages. Hence, there is a pressing need for advanced defense strategies that bolster the resilience of these models against DP threats while ensuring the overall reliability of AI-based systems [6]. AI-adapted decision systems have been developed extensively, capitalizing on the availability of data to enhance their efficiency [7]. Network traffic datasets often serve as comprehensive records that encompass various attributes, features, and relevant information. The enormous volume of network traffic carrying such attributes within centralized systems makes it challenging to maintain the effectiveness of AI-driven cybersecurity models [8,9,10,11]. Security threats to machine learning frameworks are generally categorized by the phase in which they occur: training or testing (inference). DP attacks can manifest during both phases, presenting significant risks to the integrity of these models (see Figure 1). DP encompasses not only the inclusion of corrupted data samples but also data that lead to unauthorized alterations of the hyper-parameters of AI models [6]. A prevalent form of DP attack is the injection of malicious samples into the targeted training set, which corrupts either the feature values or the labels of the training examples during the training phase and thereby degrades model reliability by modifying its hyper-parameters [6].

DP attacks can take place not only during the training phase but also during the testing phase, where the emphasis is often on altering model parameters rather than corrupting the test samples themselves [12,13]. Many studies have found that the predominant defense strategies implemented during testing prioritize protecting the integrity of the testing nodes. Rather than directly filtering or securing the data, these strategies aim to safeguard the framework within which testing occurs [6,14]. Along similar lines, an advanced method for range ambiguity suppression in spaceborne SAR systems that utilizes blind source separation has been proposed, enhancing robust signal processing techniques for DP detection [15]. Trusted Execution Environments (TEEs) have also been introduced as secure execution environments that provide substantial protection for both code and data. TEEs ensure that the information they contain remains confidential and maintains its integrity against potential threats [14]. While TEEs have been studied extensively within specific AI systems, the methodologies for building such secure environments are applicable beyond AI scenarios [6]. In addition, various non-AI-specific defense mechanisms can help protect trained machine learning models during the testing phase [16,17]. Blockchain-based solutions, in particular, have emerged as effective methods for ensuring the integrity of interconnected nodes throughout distributed networks, which makes them adaptable for use within AI systems to counteract DP attacks.
These mechanisms are designed not only to enhance the security of the network as a whole but also to manage the risks associated with data exposure and manipulation. A significant advantage of these techniques is their adaptability: they can be integrated into AI systems as defensive strategies against DP attacks, which makes blockchain technology a valuable asset for safeguarding AI models from vulnerabilities that could compromise their performance and reliability. Consequently, blockchain-based approaches can help establish a robust framework for defending AI systems against DP, contributing to more secure and trustworthy AI deployment [6].

Technologies adapted from blockchain have been used in myriad applications well beyond the realm of cryptocurrencies. A blockchain serves as a cryptographically secured ledger that is distributed among all participating nodes, providing an infrastructure for secure transactions between entities that may be unknown to one another [18,19,20,21]. Beyond securing these transactions, blockchain technology plays a crucial role in maintaining the integrity of the nodes in the network. It offers a decentralized and tamper-resistant approach to security that sets it apart from traditional approaches. Because data are distributed across the multiple nodes of a peer-to-peer network, blockchain reduces the vulnerabilities linked to the single points of failure typical of centralized data storage, which makes it more resilient against threats such as data alteration and unauthorized access. Cryptographic features uphold the integrity and authenticity of transactions, thereby enhancing trust among participants. Furthermore, the immutability of the ledger provides transparent audit trails, enhancing accountability in sectors such as finance, healthcare, and supply chain management. This adaptability extends to integration with AI systems, where blockchain can add layers of security against threats such as data poisoning.
These characteristics position blockchain as a strong alternative to conventional security measures for addressing contemporary cybersecurity challenges. Given these benefits, blockchain has been increasingly integrated into various cybersecurity applications. Among the most pressing concerns addressed by these emerging techniques is the 51 percent attack, in which an entity gains control over the majority of the network and compromises its security [22]. Efforts to design effective countermeasures against this kind of attack reflect the growing role of blockchain in protecting against the vulnerabilities inherent in centralized systems [17]. In a 51 percent attack, a malicious entity that controls the majority of the computational power of a network can generate fraudulent blocks containing false transaction information, effectively undermining the integrity of the blockchain. Such vulnerabilities are especially concerning because attackers can exploit this power to affect downstream applications, including AI systems. On the other hand, blockchain-based networks have a built-in ability to withstand various conventional cyber threats that frequently affect centralized systems.
By distributing data across numerous nodes and utilizing cryptographic technologies, blockchain networks are better equipped to guard against attacks that could have severe repercussions in centralized environments, thereby strengthening their overall security posture [16,17].

The main contributions of this paper concern the integration of blockchain technology into a data poisoning (DP) defense system designed for networks of distributed, collectively operating AI models. First, the Blockchain Governance Game (BGG) is extended with AI methodologies, which enhances its effectiveness in cybersecurity applications. Second, the framework is designed to prevent 51 percent attacks, safeguarding the network against majority-control threats. Third, the BGG controller operates at the intersection of blockchain and AI, utilizing machine learning (ML) and deep learning (DL) algorithms. The BGG controller comprises two key components: the Predictor, which uses a convolutional neural network (CNN) to anticipate when an adversary may acquire more than half of the nodes; and the BGG decision engine, which is designed based on BGG theory. The controller is designed to deliver high cybersecurity performance even with limited data collection and minimal computational power. Unlike typical approaches that mainly concentrate on filtering damaged data or securing systems, the proposed method applies the BGG framework proactively, anticipating potential DP attack scenarios with the CNN-based Predictor.
This proactive strategy enables prompt defensive actions, thereby safeguarding the integrity of AI systems throughout their operational stages. Moreover, blockchain technology reinforces data integrity and security through decentralized validation and an immutable ledger, mitigating risks that centralized systems might overlook. Combining ML algorithms with the BGG enables adaptive responses and efficient resource use, which distinguishes this DP defense mechanism from traditional methods.

This paper is organized as follows. This section has explained the basics of DP attacks and their defense mechanisms. Section 2 introduces the theoretical background of the BGG and the structure of the BGG-based DP defense network. Section 3 introduces the BGG controller, the core component of the BGG-based DP defense network. The BGG controller in the control center is the decision system of the machine-learning-enabled Blockchain Governance Game, and its sub-components, the Predictor and the BGG engine, are also explained in that section. Section 4 provides the simulation results of the BGG controller, and Section 5 concludes the paper.

2. Novel Blockchain Adapted Data Poisoning Defense Mechanism

The Blockchain Governance Game (BGG), a stochastic game model, was recently developed for analyzing various cybersecurity measures, including the moment of action (i.e., entering safety mode) before attacks [17], and a variant of the BGG model has also been developed [16]. The theoretical framework of the BGG has been adapted to various areas, including connected cars [23]. These models are designed to protect decentralized networks against blockchain-based attacks while keeping the network decentralized. The BGG can also be considered a defense mechanism against the manipulation of AI models in the inference phase [6]. This section provides the basics of the BGG and the adapted cloud network structure of the distributed AI models in the testing phase.

2.1. Theoretical Background

In the original BGG [17], a defender manages only a small portion of genuine nodes as backup. As illustrated in Figure 2, a network that could fail under a 51 percent attack is defended by adding reserved nodes. In the illustration, adding two reserved nodes brings the share of corrupted nodes below 51 percent of the total: under the BGG, the share changes from 4/7 (left) to 4/9 (right). The BGG model and its variant have been fully formulated and analytically proven in previous research [17,24,25]. Essentially, the BGG predicts the moment one step before the 51 percent attack, which corresponds to the moment of action.
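The arithmetic behind Figure 2 can be checked with a few lines of Python. This is a minimal sketch: the helper names `corrupted_share` and `is_majority` are hypothetical, and a strict-majority test (more than half) stands in for the 51 percent threshold.

```python
from fractions import Fraction


def corrupted_share(corrupted: int, total: int, reserved: int = 0) -> Fraction:
    """Share of corrupted nodes after `reserved` honest backup nodes join."""
    if total <= 0 or reserved < 0 or not 0 <= corrupted <= total:
        raise ValueError("invalid node counts")
    return Fraction(corrupted, total + reserved)


def is_majority(share: Fraction) -> bool:
    """True if the share is a strict majority (more than half) of all nodes."""
    return share > Fraction(1, 2)


before = corrupted_share(4, 7)             # left panel: 4 of 7 nodes corrupted
after = corrupted_share(4, 7, reserved=2)  # right panel: 2 reserved nodes added
print(before, is_majority(before))  # 4/7 True
print(after, is_majority(after))    # 4/9 False
```

Exact fractions are used so that the comparison with one half is not affected by floating-point rounding.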

This two-player adversarial stochastic game describes the interaction within a blockchain network between a defender (player H) and an attacker (player A). The objective of the BGG is to thwart the 51 percent attack while keeping the network decentralized. Both players compete to construct blocks, whether legitimate or fraudulent. Consider the following two random measures, which count the blocks generated by the attacker and by the defender:

A := \sum_{k \geq 0} X_{k} ε_{s_{k}}, \quad s_{0} (= 0) < s_{1} < s_{2} < \dots \ \text{a.s.},

(1)

H := \sum_{j \geq 0} Y_{j} ε_{t_{j}}, \quad t_{0} (= 0) < t_{1} < t_{2} < \dots \ \text{a.s.},

(2)

which are marked Poisson processes with intensities λ_{A} and λ_{H}, respectively. The point process T = \{τ_{0}, τ_{1}, \dots\}, which observes the game at random times [17], is defined as follows:

T := \sum_{i \geq 0} ε_{τ_{i}}, \quad τ_{0} (> 0) < τ_{1} < τ_{2} < \dots,

(3)

which is assumed to be a delayed renewal process and is equivalent to the duration of Proof of Work (PoW) completion in a blockchain network [17,24]. If we assign

(A_{k}, H_{k}) := A \otimes H([0, τ_{k}]), \quad k = 0, 1, \dots,

(4)

as an observation process on A \otimes H embedded over T, then the process of the respective increments is defined as follows:

(X_{k}, Y_{k}) := A \otimes H([τ_{k - 1}, τ_{k}]), \quad X_{0} = A_{0}, \ Y_{0} = H_{0}, \quad k = 1, 2, \dots .

(5)

From (1)–(5), the observation process, which is equivalent to the Proof of Work, can be formalized as

A_{k} \otimes H_{k} := \sum_{k \geq 0} (X_{k}, Y_{k}) ε_{τ_{k}},

(6)

where

A_{k} = \sum_{i = 0}^{k} X_{i}, \quad H_{k} = \sum_{i = 0}^{k} Y_{i},

(7)

with position-dependent marking, where X_{k} and Y_{k} depend on the inter-observation times

Δ_{k} := τ_{k} - τ_{k - 1}, \quad τ_{-1} = 0, \quad δ = E[Δ_{k}],

(8)

and

γ(g, z) = E[g^{X_{k}} \cdot z^{Y_{k}}], \quad g > 0, \ z > 0, \quad k = 0, 1, \dots .

(9)
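To make (4) to (9) concrete, the following minimal Python sketch samples one observation path and estimates γ(g, z) empirically. It rests on assumptions that are not from the paper: the gaps Δ_k are exponential with mean δ, and, given Δ_k, the increments X_k and Y_k are independent Poisson counts with means λ_A Δ_k and λ_H Δ_k (a simple form of position-dependent marking). Under these assumptions γ(g, z) = 1 / (1 - δ s) with s = λ_A (g - 1) + λ_H (z - 1), which the sketch uses as a cross-check. All function names are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def observe_path(lam_a, lam_h, delta, epochs, seed=0):
    """Return a list of (tau_k, X_k, Y_k, A_k, H_k) for k = 0 .. epochs - 1."""
    rng = random.Random(seed)
    tau, a_k, h_k, path = 0.0, 0, 0, []
    for _ in range(epochs):
        gap = rng.expovariate(1.0 / delta)  # Delta_k with mean delta, Eq. (8)
        tau += gap                          # tau_k = tau_{k-1} + Delta_k
        x_k = poisson(rng, lam_a * gap)     # attacker blocks in this interval
        y_k = poisson(rng, lam_h * gap)     # defender blocks in this interval
        a_k += x_k                          # A_k = X_0 + ... + X_k, Eq. (7)
        h_k += y_k                          # H_k = Y_0 + ... + Y_k, Eq. (7)
        path.append((tau, x_k, y_k, a_k, h_k))
    return path


def gamma_hat(path, g, z):
    """Monte Carlo estimate of gamma(g, z) = E[g^X_k * z^Y_k], Eq. (9)."""
    return sum(g ** x * z ** y for _, x, y, _, _ in path) / len(path)


def gamma_exact(lam_a, lam_h, delta, g, z):
    """Closed form under the stated assumptions (valid while delta * s < 1)."""
    s = lam_a * (g - 1) + lam_h * (z - 1)
    return 1.0 / (1.0 - delta * s)


path = observe_path(lam_a=0.8, lam_h=1.0, delta=1.0, epochs=50_000, seed=1)
print(round(gamma_hat(path, 0.5, 0.5), 3), round(gamma_exact(0.8, 1.0, 1.0, 0.5, 0.5), 3))
```

For these parameters the closed form equals 1/1.9 ≈ 0.526, and the empirical estimate should agree with it to within Monte Carlo error.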

The game ends at the i-th observation epoch τ_{i} when either player H first reaches half of the total number of nodes M or player A first controls at least (M/2) + B nodes. The exit indices of the players are defined as follows:

ν := \inf\{k : A_{k} \geq \frac{M}{2} + B\},

(10)

μ := \inf\{j : H_{j} \geq \frac{M}{2}\},

(11)

where B is the number of reserved honest nodes [17]. The joint functional of the blockchain network model is as follows:

Φ_{⌈\frac{M}{2}⌉} = Φ_{⌈\frac{M}{2}⌉} (ξ, g_{0}, g_{1}, z_{0}, z_{1}) = E [ξ^{ν} \cdot g_{0}^{A_{ν - 1}} \cdot g_{1}^{A_{ν}} \cdot z_{0}^{H_{ν - 1}} \cdot z_{1}^{H_{ν}} 1_{\{ν < μ\}}],

(12)

where ⌈M/2⌉ is the 51 percent (majority) threshold of the nodes (or ledgers) in a blockchain network. This functional represents the status of false and honest nodes at the exit time τ_{ν}. The explicit formula for Φ_{M/2}, together with the special operator D_{(x, y)} and its inverse D_{(u, v)}^{(m, n)}, was introduced in previous research; these operators make it analytically possible to solve the BGG [17,24]:

D_{(x, y)}[f(x, y)](u, v) := (1 - u)(1 - v) \sum_{x \geq 0} \sum_{y \geq 0} f(x, y) u^{x} v^{y},

(13)

so that

f(x, y) = D_{(u, v)}^{(x, y)}[D_{(x, y)}\{f(x, y)\}],

(14)

where \{f(x, y)\} is a sequence and the inverse operator is

D_{(u, v)}^{(m, n)}(\cdot) = \frac{1}{m!\, n!} \lim_{(u, v) \to 0} \frac{\partial^{m + n}}{\partial u^{m} \partial v^{n}} \left[\frac{1}{(1 - u)(1 - v)} (\cdot)\right],

(15)

with m, n \geq 0. From [17], the functional Φ_{M/2} in (12) admits the following expression:

Φ_{\frac{M}{2}} = D_{(u, v)}^{(\frac{M}{2}, \frac{M}{2})}\left[Γ_{0}^{1} - Γ_{0} + \frac{ξ \cdot γ_{0}}{1 - ξ γ}(Γ^{1} - Γ)\right].

(16)
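The operator pair (13) and (15) can be sanity-checked numerically. The sketch below assumes that f has finite support, so the series in (13) is a polynomial. Under that assumption, multiplying by (1 - u)(1 - v) is a two-dimensional finite-difference step, and the inverse in (15) reduces to a two-dimensional prefix sum: dividing by (1 - u)(1 - v) accumulates coefficients, and the scaled derivative at the origin extracts the u^m v^n coefficient. The names `d_operator` and `d_inverse` are hypothetical.

```python
from fractions import Fraction


def d_operator(f):
    """Coefficients of D_{(x,y)}[f](u, v) from Eq. (13) for finitely supported f.

    `f` maps (x, y) -> value; the result maps (i, j) -> coefficient of u^i v^j.
    """
    coeffs = {}
    for (x, y), val in f.items():
        # u^x v^y times (1 - u)(1 - v) = u^x v^y (1 - u - v + u v)
        for dx, dy, sign in ((0, 0, 1), (1, 0, -1), (0, 1, -1), (1, 1, 1)):
            key = (x + dx, y + dy)
            coeffs[key] = coeffs.get(key, 0) + sign * val
    return {key: c for key, c in coeffs.items() if c != 0}


def d_inverse(coeffs, m, n):
    """D^{(m,n)}_{(u,v)} from Eq. (15) applied to a polynomial given by `coeffs`.

    Dividing by (1 - u)(1 - v) turns coefficients into prefix sums, and
    (1 / (m! n!)) * d^{m+n} / du^m dv^n at the origin picks the (m, n) term.
    """
    return sum(c for (i, j), c in coeffs.items() if i <= m and j <= n)


# Identity (14): applying the inverse recovers every term of the sequence.
f = {(0, 0): Fraction(1, 2), (1, 0): Fraction(1, 4), (2, 3): Fraction(1, 4)}
coeffs = d_operator(f)
print(all(d_inverse(coeffs, m, n) == f.get((m, n), 0)
          for m in range(5) for n in range(5)))  # True
```

Exact `Fraction` arithmetic keeps the round trip free of rounding error; the check covers only finitely supported sequences, not the general power-series case used in (16).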

From (16), the PGF (probability-generating function) of the exit index ν is as follows:

E[ξ^{ν}] = Φ_{⌈\frac{M}{2}⌉}(ξ, 1, 1, 1, 1).

(17)

The moment of taking an action, τ_{ν - 1}, can be found analytically from (17):

E [τ_{ν - 1}] = E [τ_{0}] + E [Δ_{1}] (E [ν] - 1),

(18)

where

E [ν] = \frac{\partial}{\partial ξ} Φ_{⌈\frac{M}{2}⌉} (ξ, 1, 1, 1, 1) |_{ξ = 1} .

(19)
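Where the closed form (16) is inconvenient, the quantities in (17) to (19) can be approximated by Monte Carlo simulation. The sketch below is a simplification, not the paper's method. Per-epoch increments are independent Poisson counts with means λ_A δ and λ_H δ. E[ν] is estimated over the games in which the attacker exits first (ν < μ), and ties count as a defender exit. E[τ_{ν-1}] then follows from (18) with E[Δ_1] = δ. Each game uses its own seeded random stream, so runs with different B see identical block counts and their results are directly comparable. The function names and parameter values are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def exit_indices(rng, lam_a, lam_h, delta, m_nodes, reserved, max_epochs=10_000):
    """Simulate one game; return (nu, mu) per Eqs. (10)-(11), None if not reached."""
    a_k = h_k = 0
    nu = mu = None
    for k in range(max_epochs):
        a_k += poisson(rng, lam_a * delta)  # X_k (independent marking, an assumption)
        h_k += poisson(rng, lam_h * delta)  # Y_k
        if a_k >= m_nodes / 2 + reserved:
            nu = k
        if h_k >= m_nodes / 2:
            mu = k
        if nu is not None or mu is not None:  # the game ends at the first exit
            break
    return nu, mu


def estimate(lam_a, lam_h, delta, m_nodes, reserved, e_tau0, trials=20_000, seed=0):
    """Return (P(nu < mu), E[nu | nu < mu], E[tau_{nu-1}] via Eq. (18))."""
    wins = nu_total = 0
    for t in range(trials):
        rng = random.Random(seed * 1_000_000 + t)  # one random stream per game
        nu, mu = exit_indices(rng, lam_a, lam_h, delta, m_nodes, reserved)
        if nu is not None and (mu is None or nu < mu):  # attacker exits first
            wins += 1
            nu_total += nu
    if wins == 0:
        return 0.0, float("nan"), float("nan")
    e_nu = nu_total / wins
    return wins / trials, e_nu, e_tau0 + delta * (e_nu - 1)  # Eq. (18)


p_attack, e_nu, e_action = estimate(lam_a=1.2, lam_h=1.0, delta=1.0,
                                    m_nodes=20, reserved=3, e_tau0=1.0)
print(f"P(nu < mu) = {p_attack:.3f}, E[nu] = {e_nu:.2f}, E[tau_(nu-1)] = {e_action:.2f}")
```

Because each game reuses the same random stream, increasing B can only turn attacker exits into defender exits. The estimated attack probability is therefore non-increasing in B, which makes the sketch convenient for exploring how many reserved nodes a given setting needs.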

2.2. BGG-Based DP Defense Structure

The distributed AI models and a BGG controller are connected as a single blockchain network (see Figure 3). The AI models in a cloud network can broadcast their updated hyper-parameters, and the BGG controller shares information about executing the safety mode. The defense control center contains the BGG controller and the backup ledgers for the safety mode, and each AI model holds its own ledger. The safety mode (i.e., releasing the reserved ledgers in the defense control center) is executed one step prior to an attack, and these moments are predicted by the BGG controller. All these components are treated as nodes in a conventional blockchain network.

The trained AI models as components in a distributed network are capable of communicating with other components and constructing ledgers. These AI models could produce the updated hyper-parameters and the boundaries of each AI model. The defense control center contains the BGG controller and the reserved ledgers for the safety operation. The BGG controller also broadcasts the values of the BGG decision parameters to other components. Sharing these values from AI models and a BGG controller is a transaction in a conventional blockchain network.
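The transactions described above, namely hyper-parameter updates from the AI models and decision broadcasts from the BGG controller, can be pictured as entries in a hash-chained ledger. The following Python sketch is hypothetical. The `Ledger` class, the payload fields, and the use of SHA-256 over sorted JSON are illustrative choices, and the sketch omits consensus and PoW entirely. It only shows why altering a recorded transaction is detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder hash for the first block's predecessor


def _digest(body: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the block body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class Ledger:
    """Hypothetical append-only ledger held by each node (AI model or controller)."""
    blocks: list = field(default_factory=list)

    def append(self, sender: str, payload: dict) -> str:
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        body = {"sender": sender, "payload": payload, "prev": prev}
        digest = _digest(body)
        self.blocks.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash and check each link to the previous block."""
        prev = GENESIS
        for block in self.blocks:
            body = {k: block[k] for k in ("sender", "payload", "prev")}
            if block["prev"] != prev or block["hash"] != _digest(body):
                return False
            prev = block["hash"]
        return True


ledger = Ledger()
ledger.append("ai-model-1", {"type": "hyperparameter_update", "learning_rate": 0.01})
ledger.append("bgg-controller", {"type": "bgg_decision", "mode": "Safety"})
print(ledger.verify())  # True
ledger.blocks[0]["payload"]["learning_rate"] = 0.5  # simulated tampering
print(ledger.verify())  # False
```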

3. Novel Design of BGG Controller for DP Defense

This section presents the system design approach used to develop the core component of the blockchain-based DP defense system. As discussed in Section 2, the BGG controller in the defense control center consists of the Predictor and the BGG decision engine. System design is the part of systems engineering that covers designing the architecture, product design, modules, interfaces, and data of a system to satisfy specified requirements [26]. A system diagram describes the inputs, outputs, and components of a system. The diagram of the BGG controller was developed during the system design procedure and is shown in Figure 4. The Predictor and the BGG decision engine, the main components of the BGG controller, are explained in the following subsections.

3.1. The Predictor

The Predictor in the BGG controller provides the best moment for decision-making, τ_{ν - 1}, based on its inputs, which are mainly the numbers of corrupted and genuine nodes (A_{k}, H_{k}, and A_{0}) from (4)–(7). The BGG engine provides the status of the network at time τ_{ν}.

The hyper-parameters M (the total number of nodes in the network) and δ from (8) are pre-configured. The other parameters, λ_{A} and λ_{H} (the average numbers of corrupted and genuine nodes in the input data), are obtained statistically from the input data during the training phase. Note that the performance of the combined system, rather than that of the Predictor alone, is measured (see Section 4). A convolutional neural network (CNN) was selected for the Predictor because CNNs and their variants have been highly successful in classification tasks across research domains, including cybersecurity [27]. CNNs are well suited to identifying complex patterns and structures in data, which makes them suitable for analyzing the behavior of network nodes during potential DP attacks. Compared with predictive models that may struggle with high dimensionality or non-linear relationships, CNNs can handle large datasets while preserving accuracy. The 96% accuracy achieved in the simulations (see Section 4) indicates that the CNN can anticipate attack patterns reliably enough to enable timely defensive actions, which strengthens the decision-making needed to protect AI models against DP attacks. Integrating a CNN into the BGG framework is therefore a reasonable design choice. A CNN with 10 hidden layers is used to calculate the moment of releasing the reserved ledgers (i.e., τ_{ν - 1}) in the defense control center and to predict whether an attacker will take over the blockchain network by governing more than half of the total nodes.

The Predictor outputs τ_{ν - 1}, the moment of executing the safety mode. This output indicates the moment of a preliminary action, which depends on the behaviors of both the attacker and the defender; these behaviors are the inputs of the machine learning component. The ML technique generates a multi-variable regression function that serves as the machine learning engine for predicting the optimal moment of action τ_{ν - 1}, and a 3D illustration of such a regression is shown in Figure 5 for demonstration purposes. The figure shows how the moment of execution τ_{ν - 1} depends on the numbers of honest and attack nodes. The x-axis shows the number of honest nodes and the y-axis the number of attack nodes, both ranging from 0 to 50. The z-axis measures τ_{ν - 1}, which reaches up to 15 s. The color gradient from blue to green reflects how τ_{ν - 1} changes under varying numbers of honest and attack nodes. This visualization conveys how the balance between honest and attack nodes affects the response time of the system. Note that a different dataset may yield a differently shaped graph because it generates a different regression function. During the training phase of the Predictor, the mean number of reserved nodes can be calculated statistically as follows:

E[B] = \frac{1}{n^{*}} \sum_{j \geq 1} \left(A_{ν}^{j} - \frac{M}{2}\right)^{+},

(20)

and

n^{*} = \left|\{j : A_{ν}^{j} \geq \frac{M}{2}\}\right|,

(21)

where n^{*} counts the samples in which the input data A_{ν} reach M/2, and the average gap between A_{ν} and M/2 over the training dataset gives the mean number of reserved nodes, E[B], in (20). The optimal number of reserved nodes B^{*} is derived from this mean. The default setup in the demonstration code is B^{*} = c \cdot E[B], where c is a positive integer multiplier (c \in Z^{+}) that determines the number of reserved nodes. A larger c keeps more backup nodes, making the system more expensive but more secure; a smaller c makes the system lighter but less secure. In this simulation, c is fixed at the default value c = 3. All hyper-parameters of the Predictor, including M, δ, λ_{A}, and λ_{H}, are either pre-configured or obtained statistically during its training phase. Although the Predictor is itself an ML model, it is distinct from the AI models it protects against DP attacks.
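Equations (20) and (21) and the rule B* = c · E[B] translate directly into code. In this minimal Python sketch, `a_nu_samples` is a hypothetical list of observed A_ν values from the training data. Rounding B* up to a whole number of nodes is an assumption, since the text does not say how fractional values are handled.

```python
import math


def reserved_nodes(a_nu_samples, m_nodes, c=3):
    """Return (E[B] from Eq. (20), n* from Eq. (21), B* = ceil(c * E[B]))."""
    half = m_nodes / 2
    # (A_nu - M/2)^+ over the samples that reach M/2; the others contribute 0.
    gaps = [a - half for a in a_nu_samples if a >= half]
    n_star = len(gaps)
    if n_star == 0:
        return 0.0, 0, 0  # no sample reached M/2, so no reserve is suggested
    e_b = sum(gaps) / n_star
    b_star = math.ceil(c * e_b)  # rounding up to whole nodes is our assumption
    return e_b, n_star, b_star


print(reserved_nodes([12, 9, 15, 10, 11], m_nodes=20))  # (2.0, 4, 6)
```

Here four of the five samples reach M/2 = 10, with gaps 2, 5, 0, and 1. Their mean is E[B] = 2.0, so B* = 3 × 2.0 = 6 under the default c = 3.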

3.2. BGG Decision Engine

The BGG decision engine, simply called the BGG engine, emulates the BGG model to provide the best strategic choice at the moment of action. According to the BGG [17], player H (the defender) can choose one of the following strategies:

Normal mode: normal operation, indicating that no attack has occurred;
Safety mode: the network runs in safety mode by releasing backup nodes.

On the other hand, player A (the attacker) may either succeed or fail in taking over the honest nodes. Therefore, the outcome of the attack is one of the following:

Not Burst: the system is defended against the attack;
Burst: the system is breached (burst) by the attack.

The best response of an attacker, q(s_{H}), is defined in terms of the burst of the blockchain network and depends on the strategic decision s_{H} of player H (the defender):

q(s_{H}) = \begin{cases} E[1_{\{A_{ν} \geq \frac{M}{2}\}}], & s_{H} = \text{Normal}, \\ E[1_{\{A_{ν} \geq \frac{M}{2} + B\}}], & s_{H} = \text{Safety}. \end{cases}

(22)
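The best response (22) can be estimated empirically as a burst frequency over sampled values of A_ν. This minimal Python sketch assumes such samples are available. The function name and the sample values are hypothetical.

```python
def best_response(a_nu_samples, m_nodes, reserved):
    """Empirical q(s_H) from Eq. (22): the fraction of samples that burst."""
    n = len(a_nu_samples)
    half = m_nodes / 2
    q_normal = sum(a >= half for a in a_nu_samples) / n              # s_H = Normal
    q_safety = sum(a >= half + reserved for a in a_nu_samples) / n   # s_H = Safety
    return {"Normal": q_normal, "Safety": q_safety}


print(best_response([12, 9, 15, 10, 11], m_nodes=20, reserved=2))
# {'Normal': 0.8, 'Safety': 0.4}
```

Releasing B reserved nodes raises the burst threshold from M/2 to M/2 + B, so q(Safety) can never exceed q(Normal).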

After the Predictor predicts the moment of taking an action (i.e., executing Safety mode), the BGG engine selects one of three strategic choices to win the game. Although the theoretical BGG has only two strategic choices (the Normal and Safety modes), the BGG engine adds a third, Burst mode. Burst mode indicates that the defense system cannot defend against the attack because the attack exceeds its defense capacity (i.e., there are not enough reserved nodes). The best response of an attacker in Burst mode is the same as in Safety mode. Note that Burst mode is a strategic choice of the defender and is not the same as the Burst status of an attacker. The algorithm for choosing a proper strategy in the BGG engine is given in Algorithm 1.

Algorithm 1: Algorithm for selecting the best strategic choice.
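Algorithm 1 itself is not reproduced in this text. The following is a hypothetical selection rule consistent with the description above and with (22), not the paper's Algorithm 1. The engine stays in Normal mode when no majority is predicted, enters Safety mode when the B reserved nodes suffice, and reports Burst mode otherwise. The input `predicted_attack_nodes` stands for the Predictor's estimate of A_ν and is an assumption.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "Normal"
    SAFETY = "Safety"
    BURST = "Burst"


def select_strategy(predicted_attack_nodes: float, m_nodes: int, reserved: int) -> Mode:
    """Hypothetical three-way choice of the BGG engine (not Algorithm 1 itself)."""
    half = m_nodes / 2
    if predicted_attack_nodes < half:
        return Mode.NORMAL  # no majority threat predicted at tau_nu
    if predicted_attack_nodes < half + reserved:
        return Mode.SAFETY  # release the B reserved ledgers at tau_{nu-1}
    return Mode.BURST       # the attack exceeds the reserved capacity


for a in (8, 12, 16):
    print(a, select_strategy(a, m_nodes=20, reserved=6).value)
```

With M = 20 and B = 6, the three calls return Normal, Safety, and Burst, respectively.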

4. BGG Controller Simulation Results

The training and testing datasets for the BGG controller were generated by systematic random sampling because actual datasets of block governance are not currently available. The generated samples are mainly the numbers of attacked nodes at each decision moment τ_{ν - 1}; the decision concerns choosing the best strategy for operating the DP defense mechanism. In total, 10,000 samples were created, each a simulated scenario in which nodes are attacked at a specific moment τ_{ν - 1}. The samples are divided into separate sets, with 70% allocated for training, 15% for validation, and 15% for testing. The Predictor is trained to predict whether an attacker will govern more than half of the total number of nodes at the next observation moment τ_{ν}. The progress measures of the CNN (i.e., the Predictor) are shown in Table 1. This structure supports evaluation across a range of attack patterns and dynamics between genuine and corrupted nodes, and the simulated conditions are intended to test how well the model generalizes toward real-world scenarios. Careful dataset generation and partitioning of this kind are important for the predictive capability of the controller against data poisoning attacks.
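The 70/15/15 partition of the 10,000 samples (7000 training, 1500 validation, and 1500 testing samples) can be reproduced with a seeded shuffle. This is a minimal Python sketch with a hypothetical function name; the paper's MATLAB code may partition the data differently.

```python
import random


def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle sample indices and split them into train/validation/test lists."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(n_samples * fractions[0])
    n_val = round(n_samples * fractions[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train, val, test = split_indices(10_000)
print(len(train), len(val), len(test))  # 7000 1500 1500
```

The test set takes whatever remains after training and validation, so no sample is dropped if the fractions do not divide the total evenly.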

Note that the measurements of the BGG controller may vary between trials because the training data for the machine learning are generated differently each time. For this trial, the best cross-entropy performance was 0.6248 at epoch 6, as shown in Figure 6a. As described above, the training dataset is divided into three sets: 70% for training, 15% for testing, and 15% for validation. Training stopped after 12 epochs based on the validation set.

An error histogram shows the overall gap between the target values and the predicted values after training; here, the Predictor (a feed-forward CNN) was trained on 10,000 samples. The error histogram for the simulated training data is shown in Figure 6b; with 20 bins, the errors are concentrated in the bin around -0.1203. The simulation also provides the regression plot of the Predictor training, shown in Figure 7a. The regression function between the target values and the output (predicted) values is as follows:

Y = 0.54 \cdot T + 3.1,

(23)

where T is the target variable and Y is the output variable. In this simulation, the mean squared error (MSE) of the training is 0.735. The regression function (23) describes the linear relationship between the targets and the outputs obtained from the machine learning training.
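The reported training diagnostics (the regression line (23), the MSE, and a 20-bin error histogram) are standard computations on target/output pairs. The minimal Python sketch below assumes that errors are defined as target minus output and fits the output against the target by ordinary least squares. It is illustrative only and does not reproduce the paper's numbers; the function names are hypothetical.

```python
def linear_fit(targets, outputs):
    """Least-squares fit outputs ~ slope * targets + intercept, cf. Eq. (23)."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_y = sum(outputs) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(targets, outputs))
    var = sum((t - mean_t) ** 2 for t in targets)
    slope = cov / var
    return slope, mean_y - slope * mean_t


def mse(targets, outputs):
    """Mean squared error between targets and outputs."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / len(targets)


def error_histogram(targets, outputs, bins=20):
    """Counts of errors (target - output) in `bins` equal-width bins."""
    errors = [t - y for t, y in zip(targets, outputs)]
    lo, hi = min(errors), max(errors)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all errors coincide
    counts = [0] * bins
    for e in errors:
        counts[min(int((e - lo) / width), bins - 1)] += 1  # max error -> last bin
    return lo, width, counts


targets = [0, 1, 2, 3, 4]
outputs = [0.54 * t + 3.1 for t in targets]  # synthetic outputs lying on Eq. (23)
slope, intercept = linear_fit(targets, outputs)
print(round(slope, 2), round(intercept, 2))  # 0.54 3.1
```

An intercept far from zero and a slope well below one, as in (23), indicate that the outputs are systematically compressed toward a constant relative to the targets.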

In the testing phase, an additional dataset of 10,000 attack samples was provided to evaluate the BGG controller, which operates in conjunction with the Predictor and the BGG engine. The testing outcomes of the BGG controller may differ between trials because the testing data are generated differently each time. The confusion matrix in Figure 7b illustrates the performance of the BGG controller, which achieves a prediction accuracy of 96%. Although some attempted attacks (82 out of 10,000) are correctly predicted, the BGG controller cannot defend against them because the system capacity is exceeded owing to a shortage of reserved nodes. Consequently, the BGG controller achieves a performance score of 94.2% (= 97.1% − 2.9%). Note that repeated training may yield different simulation results owing to variations in the initial conditions and in the random selection of samples for the training, validation, and testing subsets.
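The arithmetic behind a performance score of this kind can be sketched in Python. The text does not state which confusion-matrix cells produce the 97.1% and 2.9% terms. This sketch therefore assumes that the score is the classification accuracy minus the share of samples counted against the controller, such as predicted attacks that could not be handled because no reserved nodes were left. The class, the function, and the counts in the usage example are hypothetical and do not reproduce Figure 7b.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary attack/no-attack classifier."""
    tp: int  # attacks predicted as attacks
    tn: int  # normal samples predicted as normal
    fp: int  # normal samples predicted as attacks
    fn: int  # attacks predicted as normal

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn

    def accuracy(self) -> float:
        """Fraction of all samples classified correctly."""
        return (self.tp + self.tn) / self.total


def performance_score(cm: ConfusionMatrix, undefended: int) -> float:
    """Accuracy minus the share of samples the controller failed to defend.

    `undefended` is an assumed penalty count, e.g. correctly predicted attacks
    that could not be handled because the reserved nodes were exhausted.
    """
    return cm.accuracy() - undefended / cm.total


if __name__ == "__main__":
    cm = ConfusionMatrix(tp=40, tn=50, fp=6, fn=4)  # hypothetical counts, 100 samples
    print(f"{cm.accuracy():.0%} {performance_score(cm, undefended=5):.0%}")  # 90% 85%
```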

5. Conclusions

This paper presents a secured execution environment that adapts blockchain technology to defend artificial intelligence (AI) systems against data poisoning (DP) attacks during the inference phase. This research introduces one of the first blockchain-based DP defense mechanisms, establishing an innovative framework grounded in the Blockchain Governance Game (BGG). The proposed model relies on stochastic predictions that assume specific distributions of node behaviors. These assumptions may not hold under real-world adversarial conditions, where unexpected attack patterns can reduce its effectiveness. Nevertheless, this work advances the integration of blockchain technology into DP defense systems for AI, addressing vulnerabilities and improving resilience against DP attacks. A convolutional neural network is integrated into the Predictor, and the new decision system strengthens DP defense strategies by combining mathematical modeling with AI techniques. Simulations of the BGG controller illustrate realistic DP attack scenarios and show how the BGG engine secures AI models within a distributed blockchain network. Future research could adapt other machine learning algorithms to predict attack timing during both the training and inference phases. It could also develop training methods that use real network traffic data to refine defenses against DP attacks, ultimately enhancing the integrity and reliability of AI systems in dynamic environments.

Funding

This work was supported in part by Macao Polytechnic University (MPU) under grant RP/FCA-05/2024.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The corresponding MATLAB codes are publicly available on GitHub (https://github.com/amangkim/mlebgg, accessed on 2 April 2025) for users to perform demonstrations so they can fully understand the algorithms in this paper.

Acknowledgments

This paper was revised using AI/ML-assisted tools. Special thanks are given to the reviewers who provided valuable advice for improving this paper.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Barreno, M.; Nelson, B.; Sears, R.; Joseph, A.D.; Tygar, J.D. Can machine learning be secure? In Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, Taipei, Taiwan, 21–24 March 2006; pp. 16–25.
2. Xu, Z.; Saleh, J.H. Machine learning for reliability engineering and safety applications: Review of current status and future opportunities. Reliab. Eng. Syst. Saf. 2021, 211, 107530.
3. Olufowobi, H.; Engel, R.; Baracaldo, N.; Bathen, L.A.D.; Tata, S.; Ludwig, H. Data Provenance Model for Internet of Things (IoT) Systems. In Service-Oriented Computing—ICSOC 2016 Workshops, Proceedings of the ASOCA, ISyCC, BSCI, and Satellite Events, Banff, AB, Canada, 10–13 October 2016; Springer: Cham, Switzerland, 2016; pp. 85–91.
4. Al Hammadi, A.Y.; Lee, D.; Yeun, C.Y.; Damiani, E.; Kim, S.K.; Yoo, P.D.; Choi, H.J. Novel EEG Sensor-Based Risk Framework for the Detection of Insider Threats in Safety Critical Industrial Infrastructure. IEEE Access 2020, 8, 206222–206234.
5. Biggio, B.; Corona, I.; Maiorca, D.; Nelson, B.; Šrndić, N.; Laskov, P.; Giacinto, G.; Roli, F. Evasion Attacks against Machine Learning at Test Time. In Machine Learning and Knowledge Discovery in Databases, Proceedings of the European Conference, ECML PKDD 2013, Prague, Czech Republic, 23–27 September 2013; Blockeel, H., Kersting, K., Nijssen, S., Železný, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 387–402.
6. Ramírez, M.A.; Kim, S.K.; Hamadi, H.A.; Damiani, E.; Byon, Y.J.; Kim, T.Y.; Cho, C.S.; Yeun, C.Y. Poisoning Attacks and Defenses on Artificial Intelligence: A Survey. arXiv 2022, arXiv:2202.10276.
7. Rizk, A.; Elragal, A. Data science: Developing theoretical contributions in information systems via text analytics. J. Big Data 2020, 1, 1–26.
8. Sarker, I.H.; Abushark, Y.B.; Khan, A.I. ContextPCA: Predicting Context-Aware Smartphone Apps Usage Based On Machine Learning Techniques. Symmetry 2020, 12, 499.
9. Wall, M.E.; Rechtsteiner, A.; Rocha, L.M. Singular Value Decomposition and Principal Component Analysis. arXiv 2002, arXiv:physics/0208101.
10. Kim, S.K.; Yeun, C.Y.; Damiani, E.; Lo, N.W. A machine learning framework for biometric authentication using electrocardiogram. IEEE Access 2019, 7, 94858–94868.
11. Kim, S.K.; Yeun, C.Y.; Yoo, P.D. An enhanced machine learning-based biometric authentication system using RR-interval framed electrocardiograms. IEEE Access 2019, 7, 168669–168674.
12. Demontis, A.; Melis, M.; Biggio, B.; Maiorca, D.; Arp, D.; Rieck, K.; Corona, I.; Giacinto, G.; Roli, F. Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection. IEEE Trans. Dependable Secur. Comput. 2019, 16, 711–724.
13. Zhu, Z.A.; Lu, Y.Z.; Chiang, C.K. Generating Adversarial Examples By Makeup Attacks on Face Recognition. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2516–2520.
14. Liu, X.; Xie, L.; Wang, Y.; Zou, J.; Xiong, J.; Ying, Z.; Vasilakos, A.V. Privacy and Security Issues in Deep Learning: A Survey. IEEE Access 2021, 9, 4566–4593.
15. Chang, S.; Deng, Y.; Zhang, Y.; Zhao, Q.; Wang, R.; Zhang, K. An Advanced Scheme for Range Ambiguity Suppression of Spaceborne SAR Based on Blind Source Separation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5230112.
16. Kim, S.K. Various Blockchain Governance Games: A Review. Mathematics 2023, 11, 2273.
17. Kim, S.K. Blockchain Governance Game. Comput. Ind. Eng. 2019, 136, 373–380.
18. Kan, L.; Wei, Y.; Hafiz Muhammad, A.; Siyuan, W.; Gao, L.C.; Kai, H. A Multiple Blockchains Architecture on Inter-Blockchain Communication. In Proceedings of the 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C), Lisbon, Portugal, 16–20 July 2018; pp. 139–145.
19. Miller, D. Blockchain and the Internet of Things in the Industrial Sector. IT Prof. 2018, 20, 15–18.
20. Fiaidhi, J.; Mohammed, S.; Mohammed, S. EDI with Blockchain as an Enabler for Extreme Automation. IT Prof. 2018, 20, 66–72.
21. Samaniego, M.; Jamsrandorj, U.; Deters, R. Blockchain as a Service for IoT. In Proceedings of the 2016 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), Chengdu, China, 15–18 December 2016; pp. 433–436.
22. Aste, T.; Tasca, P.; Di Matteo, T. Blockchain Technologies: The Foreseeable Impact on Society and Industry. Computer 2017, 50, 18–28.
23. Kim, S.K. Enhanced IoV Security Network by Using Blockchain Governance Game. Mathematics 2021, 9, 109.
24. Kim, S.K. Strategic Alliance for Blockchain Governance Game. Probab. Eng. Informational Sci. 2020, 36, 184–200.
25. Kim, S.K. Multi-Layered Blockchain Governance Game. Axioms 2022, 11, 27.
26. Whitten, J.; Bentley, L. Systems Analysis and Design Methods; McGraw-Hill: New York, NY, USA, 2005.
27. Kim, S.K.; Feng, X.; Hamadi, H.A.; Damiani, E.; Yeun, C.Y.; Nandyala, S. Advanced Machine Learning Based Malware Detection Systems. IEEE Access 2024, 12, 115296–115305.

Figure 1. Data poisoning attacks [6].

Figure 2. Blockchain Governance Game [17].

Figure 3. BGG Adapted DP Defense Structure.

Figure 4. BGG Controller.

Figure 5. The moment of taking an action by the Predictor.

Figure 6. (a) Validation of CNN performance determined by cross-entropy; (b) the error histogram of the training dataset.

Figure 7. (a) CNN training regression result of the Predictor; (b) the confusion matrix of the BGG controller.

Table 1. Progress of CNN training for the Predictor.

Item	Value
Layers	10
Epochs	12
Time	<0.02 s
Performance	0.662
Gradient	0.0285
Validation checks	6

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Kim, S.-K. Enhanced Blockchain-Based Data Poisoning Defense Mechanism. Appl. Sci. 2025, 15, 4069. https://doi.org/10.3390/app15074069


