3.1. System Model
This research's system architecture, shown in Figure 1, includes a blockchain maintained collectively by all clients. During a training round, all clients are randomly assigned to one of three roles: miners, working clients, and validating clients. The working clients engage in federated learning for data training, incorporating distillation defense mechanisms during the process to safeguard local data. To ensure model security and quality, the system employs validating clients to assess the performance of the models uploaded by working clients. Because validators do not directly access local data from other clients, this study uses a proxy comparison method for verification. Finally, the miners selected for the round record the process and final outcome of each training round on the collectively maintained blockchain.
To ensure fairness and effectiveness, this study employs role rotation and a role-based incentive mechanism. Together they ensure that each participant contributes its data to federated learning training and receives corresponding incentives that sustain system operation. The role-based incentive mechanism also serves as the criterion for selecting miners, a crucial role within the system.
In traditional federated learning, the global model $G$ is the weighted average of all local models $M_i$. The aggregation process can be represented as follows:

$$G = \sum_{i=1}^{n} w_i M_i \qquad (1)$$

In this equation, $n$ represents the number of clients participating in federated learning, and $w_i$ signifies the weight of the local model $M_i$. In this study, during distillation, the prediction outcomes of each local model and the global model are fused for training the local model. The loss function for each local model $M_i$ is represented as Equation (2):
$$\mathcal{L}_i = (1-\alpha)\,\mathcal{L}_{CE}(M_i, D_i) + \alpha\,\mathcal{L}_{KD}(M_i, G) \qquad (2)$$

where $\mathcal{L}_i$ represents the loss function on the $i$-th device, $\mathcal{L}_{CE}$ is the cross-entropy loss function used to measure the performance of the local model $M_i$ on local data $D_i$, $\mathcal{L}_{KD}$ is the knowledge distillation loss function employed to assess the disparity between the local model $M_i$ and the global model $G$, and $\alpha$ is a hyperparameter used to weigh the local model's prediction against the global model's prediction, with a larger $\alpha$ emphasizing the influence of the global model more strongly. This study considers the local weight $w_i$ during the aggregation process and introduces the temperature parameter $T$ from knowledge distillation to control the degree of softening of the output distribution. A larger $T$ results in smoother model predictions, while a smaller one emphasizes the model's own predictions more. In the experimentation phase, this temperature parameter is adjusted by the local clients to control the strength of knowledge distillation.
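The aggregation and the distillation-based local loss described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the convex combination `(1 - alpha) * CE + alpha * KD`, and the use of a KL divergence between temperature-softened distributions as the distillation term are assumptions made for illustration.

```python
import numpy as np

def aggregate(local_models, weights):
    """Weighted average of local parameter vectors (weights are normalised here)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return np.tensordot(weights, np.stack(local_models), axes=1)

def softmax(logits, T=1.0):
    """Softmax with temperature T; a larger T gives a smoother distribution."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def local_loss(local_logits, global_logits, labels, alpha=0.5, T=2.0):
    """Assumed form: (1 - alpha) * cross-entropy + alpha * distillation term."""
    eps = 1e-12
    p_local = softmax(local_logits)            # T = 1 for the cross-entropy term
    ce = -np.mean(np.log(p_local[np.arange(len(labels)), labels] + eps))
    q_global = softmax(global_logits, T)       # softened global predictions
    q_local = softmax(local_logits, T)         # softened local predictions
    kd = np.mean(np.sum(q_global * (np.log(q_global + eps) - np.log(q_local + eps)), axis=-1))
    return (1 - alpha) * ce + alpha * kd
```

With `alpha = 0` the loss reduces to plain cross-entropy on local data, and with `alpha = 1` it depends only on the disagreement between the local and global predictions.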
In the process of distillation defense, each local model $M_i$ retains distilled knowledge. This study incorporates this knowledge into the global model $G$ using the following approach, as shown in Equation (3):
In this context, the coefficient in Equation (3) serves as a hyperparameter governing the influence of distilled knowledge on the weights of the global model. A higher value makes the global model more affected by distilled knowledge, while a lower value preserves the model's own weights to a greater extent.
  3.3. Workflow
Assume a set of clients $C$ participates in federated learning over $R$ rounds of learning. In each round, every client is randomly designated as exactly one of a miner $m$, a validator $v$, or a working client $d$. Clients perform their respective functions based on their assigned roles. Each client possesses a unique ID for identification, which is also the client's public key used for verifying generated transactions or transaction blocks. In each round $j$, the global model $G_j$ is constructed. The primary steps and Algorithm 1 are outlined as follows:
Step 1: Each working client $d$ selected for training uses the global model $G_{j-1}$ to generate a local model $M_j^d$. In traditional federated learning, $M_j^d$ is directly used to compute the global model $G_j$; in the proposed model of this study, $M_j^d$ must first undergo validation by the validator.
Step 2: Based on the learning outcomes, the working clients calculate the expected reward $r_j^d$ and encapsulate it along with the local model $M_j^d$ into transactions $tx_j^d$ signed with the private key of each working client, which are then sent to the validator $v$.
Step 3: The validator $v$ verifies the transactions $tx_j^d$ from the working clients and obtains the validation reward.
Step 4: Based on its local data, the validator $v$ casts a vote on the local model $M_j^d$, indicating affirmation or negation towards the model.
        
Step 5: The validator updates the model locally for voting purposes.
Step 6: The validator's transaction $tx_j^v$ is broadcast to the miners, who obtain all validator voting results, as shown in Equation (5).
        
| Algorithm 1: Distillation Defense Combined With Blockchain In Decentralized Federated Learning | 
| ![Electronics 13 00679 i001]() | 
If the local model $M_j^d$ is voted as positive within a legitimate block, the working client involved in the training receives its reward. This process ensures model quality and legitimacy for federated learning tasks through reward and validation mechanisms.
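The random role assignment and the acceptance rule used in the workflow can be sketched as follows. This is an illustrative sketch only: the function names, the role counts, and the string labels `"Positive"`/`"Negative"` are assumptions, and the acceptance rule (positive votes at least equal to negative votes) is taken from the working-client reward condition stated later in the paper.

```python
import random

def assign_roles(client_ids, n_miners, n_validators, seed=None):
    """Randomly split clients into miners, validators and working clients."""
    client_ids = list(client_ids)
    if n_miners + n_validators >= len(client_ids):
        raise ValueError("at least one working client is required")
    rng = random.Random(seed)
    rng.shuffle(client_ids)
    return {
        "miners": client_ids[:n_miners],
        "validators": client_ids[n_miners:n_miners + n_validators],
        "workers": client_ids[n_miners + n_validators:],
    }

def model_accepted(votes):
    """A local model counts as positive if positive votes >= negative votes."""
    positive = sum(1 for v in votes if v == "Positive")
    negative = len(votes) - positive
    return positive >= negative
```

Calling `assign_roles` again in each round with a fresh seed gives the role rotation described in the system model, since every client may take a different role from round to round.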
In this validation mechanism, it is essential to note that the validator can only access the submitted local model $M_j^d$; it cannot directly access the model used for training, the training data, or the testing data of the working client. Consequently, the validator cannot directly obtain either of two quantities: its assessment of the accuracy and performance of the local model $M_j^d$, or its evaluation of the accuracy and performance of the previous round's global model. To address this issue, the validator $v$ employs a proxy comparison method. Specifically, the validator compares the accuracy $A_j^v(M_j^d)$ of the local model $M_j^d$ provided by the working client $d$ on the validator's own testing dataset with the accuracy $A_j^v(G_{j-1})$ of the previous round's global model $G_{j-1}$ on the same dataset. This comparison serves as a proxy for the accuracy comparison between the working client's model $M_j^d$ and the previous round's global model $G_{j-1}$.
        
$$vote_j^v(M_j^d) = \begin{cases} \text{Negative}, & A_j^v(G_{j-1}) - A_j^v(M_j^d) > \theta_v \\ \text{Positive}, & \text{otherwise} \end{cases} \qquad (6)$$

In Equation (6), $A_j^v(\cdot)$ is the testing accuracy measured on the validator's own dataset, $\theta_v$ represents a threshold used to determine the tolerance for testing accuracy decline, and $vote_j^v(M_j^d)$ denotes the validator's vote, signifying its stance on the local model. If the decline in testing accuracy exceeds the threshold, the vote is against (Negative); otherwise, it is in favor (Positive). This proxy comparison method lets the validator assess the quality of the working client's model without directly accessing the training data on the working client's end, thus enhancing privacy and security assurance concerning data in federated learning.
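A minimal sketch of the proxy comparison vote is given below. The function names and the representation of a model as a prediction callable are hypothetical choices for illustration; only the thresholded comparison itself follows the description above.

```python
def accuracy(predict, samples, labels):
    """Fraction of samples for which the model's prediction matches the label."""
    correct = sum(1 for x, y in zip(samples, labels) if predict(x) == y)
    return correct / len(labels)

def proxy_vote(local_predict, prev_global_predict, samples, labels, threshold):
    """Vote on a local model using only the validator's own test data.

    The vote is Negative when the accuracy drop relative to the previous
    round's global model exceeds the threshold, and Positive otherwise.
    """
    acc_local = accuracy(local_predict, samples, labels)
    acc_prev_global = accuracy(prev_global_predict, samples, labels)
    drop = acc_prev_global - acc_local
    return "Negative" if drop > threshold else "Positive"
```

Because both accuracies are computed on the validator's dataset, no training or testing data ever leaves the working client.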
  3.5. Incentive Mechanism Based on Role Rewards
In order to encourage the active participation of non-malicious clients in federated learning training, this study introduces an incentive mechanism based on role rewards. The mechanism aims to promote full cooperation among participants in blockchain-based decentralized federated learning. Rewarding working clients, particularly through distillation defense rewards, motivates them to engage actively in model training and supports the security and resilience of the model. Validators receive rewards for verifying the transaction signatures of working clients and voting on their models, which emphasizes the importance of validation and helps resist manipulation. For miner clients, rewards are tied to the number of valid transactions they validate, which reduces the threat of Sybil attacks. Through unit rewards and differentiated incentives, the system ensures fairness and balance, allowing each participant to receive rewards appropriate to its contributions and tasks. This establishes an attractive, secure, and fair blockchain-based decentralized federated learning ecosystem.
  3.5.1. Rewards for Working Clients
The rewards for the working clients engaged in model training consist of two parts: the reward obtained by submitting the local model $M_j^d$ and the distillation reward acquired by training with their own local model. Firstly, during the voting process, if the positive votes received by $M_j^d$ (denoted as $V^{+}$) are greater than or equal to the negative votes (denoted as $V^{-}$), then the participating working client $d$ qualifies for a reward, as indicated by Equation (7):
The reward is proportional to the number of data samples $|D_d|$ used in the local training process. Secondly, to incentivize the working client $d$ to use distillation defense, which enhances local security, improves training accuracy, and reduces sensitivity to perturbations, this research introduces a distillation reward term $r_{dis}^d$. This reward is associated with the effectiveness of $M_j^d$ after distillation, ensuring that the working client selects a temperature during the distillation process that reduces sensitivity to perturbations without compromising training accuracy. The effectiveness of this distillation defense is measured by the difference between the testing accuracy $A_j^d(M_j^d)$ of the local model $M_j^d$ on the validation dataset and the testing accuracy $A_j^d(G_{j-1})$ of the previous round's global model $G_{j-1}$ on the same validation dataset. The better the distillation effect, the more distillation reward the working client $d$ receives, as depicted by Equation (8):
$$r_{dis}^d = \left(A_j^d(M_j^d) - A_j^d(G_{j-1})\right)\cdot N_{val}^d \cdot u_{dis} \qquad (8)$$

where $A_j^d(M_j^d)$ is the testing accuracy of the working client $d$ using the local model $M_j^d$ after employing distillation defense on the validation dataset, $A_j^d(G_{j-1})$ is the testing accuracy of the previous round's global model $G_{j-1}$ on the same validation dataset, $N_{val}^d$ represents the number of samples in the validation dataset of the working client $d$, and $u_{dis}$ denotes the unit reward for distillation defense. Each working client $d$ participating in training conducts distillation locally, using its local model to generate soft targets. The specific process is as follows: the working client uses its local data to train the original model $F_d$, as depicted in Equation (9):
$$\theta_d = \arg\min_{\theta} L(\theta; D_d) \qquad (9)$$

Here, $\theta_d$ represents the original model parameters on the client $d$, $D_d$ is the client's local data, and $L$ denotes the loss function. The original model generates a probability vector as soft targets $P$ used for training a smaller model $F_d^{s}$ on the corresponding client. This smaller model shares the same network structure as the original model but has fewer parameters, allowing it to retain the knowledge of the original model while reducing computational resource requirements and sensitivity to disturbances. The process of generating soft targets $P$ is shown in Equation (10):
$$P = \mathrm{Softmax}\!\left(\frac{z_d}{T}\right) \qquad (10)$$

Here, $z_d$ denotes the output logits of the original model, the Softmax function is used for normalization, providing the probability distribution, and $T$ is the temperature parameter utilized to regulate the smoothness of the soft target distribution. This study utilizes the output of the original model as soft targets, enabling the smaller model to learn the relationships between different categories in finer detail, which helps defend against adversarial attacks and consequently enhances model performance. The process of training the smaller model $F_d^{s}$ is illustrated in Equation (11):
$$\theta_d^{s} = \arg\min_{\theta} L(\theta; D_d, P) \qquad (11)$$

Here, $L$ represents the loss function, and $P$ denotes the soft targets derived from the original model. In essence, the training of $F_d^{s}$ involves optimizing the loss function $L$ to determine the parameters $\theta_d^{s}$ that result in the minimum loss on the client $d$'s local dataset $D_d$ under the soft targets $P$.
The specifics of the temperature parameter adjustment process are exhibited in the experiment section. After the completion of training for each participating working client, they continue to engage in the decentralized federated learning iteration process, aiming to enhance the model’s robustness and reduce the risk of adversarial sample attacks through multiple iterations.
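The two reward terms for working clients described in this subsection can be sketched as follows. The function names, the `unit_reward` parameters, and the choice to leave a negative distillation reward unclipped are assumptions for illustration; the paper only states that the submission reward is proportional to the number of training samples and that the distillation reward grows with the accuracy gain over the previous global model.

```python
def submission_reward(pos_votes, neg_votes, n_train_samples, unit_reward=1.0):
    """Reward for a submitted local model, granted only if it is not voted down."""
    if pos_votes >= neg_votes:
        return unit_reward * n_train_samples   # proportional to local sample count
    return 0.0

def distillation_reward(acc_local, acc_prev_global, n_val_samples, unit_reward=1.0):
    """Reward for distillation defense, scaled by the accuracy gain.

    acc_local and acc_prev_global are accuracies on the same validation set.
    The result is negative when the distilled model is worse than the previous
    global model (assumption: no clipping is applied).
    """
    return (acc_local - acc_prev_global) * n_val_samples * unit_reward

def working_client_reward(pos_votes, neg_votes, n_train_samples,
                          acc_local, acc_prev_global, n_val_samples,
                          unit_reward=1.0, unit_distill_reward=1.0):
    """Total reward of a working client as the sum of both terms."""
    return (submission_reward(pos_votes, neg_votes, n_train_samples, unit_reward)
            + distillation_reward(acc_local, acc_prev_global, n_val_samples,
                                  unit_distill_reward))
```

Under this sketch, a client that picks a temperature which lowers validation accuracy earns a smaller distillation reward, which is the incentive the subsection describes.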
  3.5.2. Rewards for Validators
The validator's reward consists of two parts: the validation reward $r_{ver}^v$ and the validation voting reward $r_{vote}^v$, as shown in Equation (12):

$$r^v = r_{ver}^v + r_{vote}^v \qquad (12)$$

The validation reward $r_{ver}^v$ is calculated based on the validator $v$'s successful verification of the signed transactions $tx_j^d$. If the validator successfully verifies $tx_j^d$, it receives the validation reward. $N_{ver}^v$ represents the number of working-client transactions $tx_j^d$ successfully verified by the validator $v$, and $u$ denotes the unit reward.
The validation voting reward $r_{vote}^v$ is calculated based on the validator $v$'s participation in voting on the local models $M_j^d$ submitted by the working clients $d$. If the validator participates in the voting process based on the proxy comparison logic, it receives the validation voting reward. $N_{vote}^v$ represents the number of times the validator $v$ has engaged in voting on the models submitted by the working clients $d$.
The validator’s reward mechanism primarily focuses on the validator client’s duties in voting and verifying to uphold the security and legitimacy of the federated learning system. The rewards for validator clients are contingent on their level of activity and participation, ensuring their effective execution of validation tasks.
  3.5.3. Rewards for Miners
The reward for the miner client is represented in Equation (13):

$$r_j^m = u \cdot N_j^m \qquad (13)$$

Here, $r_j^m$ denotes the reward for the miner client $m$ in round $j$, and $u$ represents the unit reward. The miner client's reward depends on the number of transactions $N_j^m$ it validates. This incentivizes miner clients to actively validate transactions related to the model updates.
In this research system, miner clients primarily serve the role of blockchain validation and propagation. They ensure the legitimacy of transactions by validating transaction signatures and packaging legitimate transactions into blocks. Their role is an integral part of maintaining the security and legitimacy of the system implemented in this research.
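The validator and miner rewards are simple counts multiplied by a unit reward, as the following sketch shows. The function names are hypothetical, and using the same `unit_reward` for verification and voting is an assumption; the paper does not state whether the two validator terms share one unit reward.

```python
def validator_reward(n_verified_tx, n_votes, unit_reward=1.0):
    """Validation reward plus voting reward for one validator in a round."""
    verification_part = unit_reward * n_verified_tx   # signed transactions verified
    voting_part = unit_reward * n_votes               # votes cast via proxy comparison
    return verification_part + voting_part

def miner_reward(n_validated_tx, unit_reward=1.0):
    """Miner reward, proportional to the number of transactions validated."""
    return unit_reward * n_validated_tx
```

For example, a validator that verifies four worker transactions and votes on all four receives `validator_reward(4, 4, 0.5)`, and a miner that validates ten transactions receives `miner_reward(10, 0.5)`.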
  3.6. Distillation Defense Principle
This study employs distillation defense as a method to counter adversarial sample attacks. This section elaborates on the algorithm and theoretical justification of distillation defense. The algorithm is presented in Algorithm 2.
The distillation defense method in this research adjusts the temperature of the neural network during training to enhance the model's robustness. During the training phase, the temperature is set relatively high, which smooths the model's outputs, thereby reducing sensitivity to minor input variations and mitigating the impact of adversarial sample attacks. In the subsequent aggregation phase, the parameters are adjusted appropriately; the specific adjustment process is described in the experimental section. Because the information from the smoothed outputs learned during training is embedded in the model parameters, the model retains a certain level of robustness even when the temperature is adjusted afterward.
Next, this research justifies this process theoretically. First, define the Jacobian matrix of the partial derivatives of the output vector $f(x)$ of the model $f$ with respect to the input vector $x$, as shown in Equation (14):

$$J_f(x) = \frac{\partial f(x)}{\partial x} = \left[\frac{\partial f_i(x)}{\partial x_k}\right]_{i,k} \qquad (14)$$
| Algorithm 2: Temperature-Based Distillation Defense | 
| ![Electronics 13 00679 i002]() | 
Under high-temperature training, the model's output distribution becomes smoother, reflected by the model's probability distribution $P$. Under the influence of a high temperature $T$, the output probabilities of the model become more uniform, reducing differences between output categories. This smoothness also affects the model's sensitivity to small changes in input, i.e., the change in output with respect to the input. During high-temperature training, due to the smoother output distribution, the absolute values of the elements of the Jacobian matrix decrease. As the elements of the Jacobian matrix describe the sensitivity of the model output with respect to the input, the smoothing of the output distribution results in a weakened response to small changes in the input.
During adversarial sample attacks, slight perturbations of the input data are suppressed by the model's weakened sensitivity, because the model has learned a smooth response to small input variations during training. This stability of the output under minor input changes makes the model more robust against adversarial samples built from jittery perturbations.
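The effect of temperature on sensitivity can be checked numerically for the final Softmax layer. For $P = \mathrm{Softmax}(z/T)$, the Jacobian with respect to the logits $z$ is $(\mathrm{diag}(P) - PP^{\top})/T$, so its entries shrink as $T$ grows. The sketch below covers only this last layer, not the full network Jacobian with respect to the input $x$ (which by the chain rule also contains the Jacobian of the logits), and the logit values are arbitrary illustrative numbers.

```python
import numpy as np

def softmax_T(z, T):
    """Softmax of the logits z at temperature T."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def softmax_jacobian(z, T):
    """Analytic Jacobian dP/dz of P = softmax(z / T)."""
    p = softmax_T(z, T)
    return (np.diag(p) - np.outer(p, p)) / T

def max_sensitivity(z, T):
    """Largest absolute entry of the Jacobian, a simple sensitivity measure."""
    return np.abs(softmax_jacobian(z, T)).max()

if __name__ == "__main__":
    logits = np.array([2.0, 1.0, 0.1])   # hypothetical logits
    for T in (1.0, 5.0, 20.0):
        print(f"T={T:5.1f}  max |J| = {max_sensitivity(logits, T):.4f}")
```

Running the script prints a decreasing sequence of maximum Jacobian entries as the temperature increases, matching the qualitative argument above.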
In the framework of differential privacy, this study introduces the definition of differential privacy for the model output $f(x)$ with respect to the input. For two adjacent inputs $x$ and $x'$, we expect the difference between the outputs $f(x)$ and $f(x')$ to be bounded.
The formal definition of differential privacy can be expressed as
        
In Equation (
15), 
 represents the sensitivity of the model output to small changes in the input, 
 is the perturbation range of the input, and 
 is the privacy parameter of differential privacy. A larger value of 
 corresponds to a reduced sensitivity of the model to small changes in the input, resulting in more bounded differences in the output, thereby enhancing privacy protection.
Through the above formalization, this study theoretically proves that under high-temperature training, the sensitivity of the model’s output to small changes in the input is protected by differential privacy. This protection makes the model more robust against adversarial sample attacks, as the sensitivity reduction of the model to small perturbations in input data suppresses the risk of privacy leakage.