Federated Transfer Learning Strategy: A Novel Cross-Device Fault Diagnosis Method Based on Repaired Data

Federated learning has attracted much attention in fault diagnosis because it can effectively protect data privacy. However, efficient fault diagnosis performance relies on uninterrupted training of model parameters with massive amounts of perfect data. To solve the problems of model training difficulty and negative parameter transfer caused by data corruption, a novel cross-device fault diagnosis method based on repaired data is proposed. Specifically, the local model training link in each source client performs random forest regression fitting on fault samples with missing fragments, and the repaired data are then used for network training. To prevent the inpainted fragments from introducing erroneous fault-sample characteristics, a joint domain discrepancy loss is introduced to correct parameter bias during local model training. Considering the randomness of the overall performance change brought about by each local model update, an adaptive update is proposed for each round of global model download and local model update. Finally, experimental verification was carried out in various industrial scenarios established with three bearing datasets, and the effectiveness of the proposed method in terms of fault diagnosis performance and data privacy protection was verified by comparison with several currently popular federated transfer learning methods.


Introduction
With the rapid development of digital intelligent manufacturing, data-driven deep learning methods have made significant progress [1,2]. Various deep learning networks, including computer vision, natural language processing, and autonomous driving, continue to emerge in an endless stream [3,4]. These advancements not only enhance the reliability of equipment utilized in intelligent manufacturing but also improve work safety while reducing maintenance costs [5]. Although deep learning methods alleviate the requirement for operator expertise, the high performance of the network often relies on feature knowledge obtained from a large amount of high-quality training and testing data [6,7].
In practical scenarios, the majority of users in the industrial field possess private condition monitoring data, and there exist analogous mechanical equipment configurations among them [8]. Therefore, amalgamating the condition monitoring knowledge of multiple users to construct a global model for intelligent fault diagnosis can effectively address the issue of insufficient individual user data. However, the device data collected during actual production often contains a significant amount of company-protected device privacy information that is not shared with other users. Thus, centralized data management and centralized fault diagnosis model training for each client are no longer viable [9,10]. In recent years, the federated learning strategy has been proposed to address the issue of collaborative diagnosis among multiple users, effectively mitigating the non-circulation of diagnostic knowledge caused by data privacy concerns [11]. The concept of federated learning was initially introduced by McMahan et al. [12], in which a central server manages the model communication between clients and averages the client models. Li et al. [13] proposed the MOON network, which leverages model representation similarity to rectify the local training losses of each client, thereby presenting a simplified federated learning framework that effectively addresses the challenge of image heterogeneity adaptation across multiple users. Considering the inherent heterogeneity of local data distributions, Marfoq et al. [14] proposed a federated multitask learning strategy that captures complex relationships among personalized networks through penalty terms.
Despite the preliminary progress made in research on protecting data privacy through federated learning, further advancements are necessary to fully address this issue [15]. Existing federated learning methods often assume that users conform to the same data distribution, meaning that each user collects information on similar mechanical equipment under comparable working conditions. However, in practical scenarios, due to diverse project requirements and distinct operating conditions of industrial equipment, there exist significant discrepancies in data distribution among customers, which pose challenges for the generalization of conventional fault diagnosis methods [16]. Transfer learning breaks the basic assumption that training data and test data must satisfy independent and identical distribution, as it enables the transfer of labeled information from a source domain to diagnose unknown target domain samples [17]. Chen et al. [18] proposed a dual adversarial-guided unsupervised multi-domain adaptation network, which constructed the edge confrontation module (EA-Module) to extract the common features of samples in multiple sets of source domains and validated the method on the transfer task of a rotating machinery dataset. Li et al. [19] proposed a novel joint attention feature transfer network to address the issue of data imbalance in real-world industrial scenarios. Experimental results on the gearbox dataset demonstrate its superior adaptability to sample scarcity.
The existing federated transfer learning model often assumes that each user has stored relatively complete and perfect data sample information when solving multi-user fault diagnosis tasks, which is not common in actual industrial scenarios. Simultaneously, the federated learning strategy does not fully utilize the diagnostic knowledge learned from each source client for other source clients after the global model communication link or completely abandons the accumulated sample diagnostic knowledge of each source client in the local client training link in the current research.
Referring to the aforementioned issues in current research, this study proposes a federated transfer learning strategy based on data restoration (FTLS-DR). When faced with data damage in the client, this strategy performs random forest regression completion on the damaged data as a preliminary step before utilizing it for source client network training. To mitigate any negative transfer effects of broken data on the local model, an offset optimization of the source client network is performed using a joint function composed of the maximum mean discrepancy (MMD) and the Wasserstein distance (WD). Subsequently, the central server dynamically evaluates the global model based on its performance in task verification for each source client and constructs a new round of source client networks through adaptive weighting in the global model download link. The main innovations of this paper are as follows:

1.
A novel federated learning strategy is proposed to solve the problem that a source client lacks complete samples for network training, a scenario rarely considered in current federated transfer learning research.

2.
The joint function proposed for optimizing source-client networks in federated transfer learning strategies employs Wasserstein distance and multi-kernel MMD to measure domain distances and effectively alleviates the model-negative transfer phenomenon caused by distribution discrepancies through periodic training.

3.
To address the challenge of diagnosing targets across different devices and under varying working conditions, an adaptive global model update method is employed by the central server. This approach ensures excellent fault diagnosis performance while safeguarding source client data privacy.

The subsequent sections of this article are structured as follows: Section 2 introduces the related work studied in this paper, while Section 3 presents the network structure and detailed training process of the proposed federated transfer learning strategy. In Section 4, multiple sets of experiments are conducted to discuss the proposed scheme. Finally, Section 5 concludes the entire paper.

Federated Learning
Federated learning was initially proposed as a solution to address the challenge of safeguarding client data privacy in the realm of cross-device fault diagnosis [20]. The framework is designed to facilitate the coordination of network model training among independent parties while ensuring the protection of their respective data privacy [21]. As a distributed machine learning framework, federated learning is divided into three categories: horizontal federated learning (HFL), vertical federated learning (VFL), and federated transfer learning (FTL). Additionally, it mainly includes three sets of training steps: First, the central server initializes the network structure and distributes it along with initial parameter settings to each client. Subsequently, each client utilizes the received network model to perform model training based on local data and uploads the final training result to the central server. Finally, the central server summarizes the client network models of all parties to build a global model with more complete diagnostic knowledge to improve network performance as a whole.
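The three training steps above can be sketched as a minimal simulation. The one-parameter linear model, the gradient-descent local step, and the plain parameter averaging below are illustrative stand-ins, not the networks or aggregation rule used later in this paper:

```python
import numpy as np

def local_train(global_params, local_data, lr=0.1, epochs=20):
    """Illustrative local step: fit y = w*x by gradient descent on MSE."""
    w = global_params.copy()
    x, y = local_data
    for _ in range(epochs):
        grad = 2 * np.mean((w * x - y) * x)
        w = w - lr * grad
    return w

def federated_round(global_params, clients):
    """One communication round: distribute, train locally, average."""
    updates = [local_train(global_params, data) for data in clients]
    return np.mean(updates, axis=0)  # plain FedAvg-style mean

# Three clients whose private data follow slightly different slopes.
rng = np.random.default_rng(0)
clients = []
for slope in (1.8, 2.0, 2.2):
    x = rng.uniform(-1, 1, 100)
    clients.append((x, slope * x))

w = np.array([0.0])
for _ in range(5):
    w = federated_round(w, clients)
print(w)  # converges near the average slope across clients
```

Only model parameters cross the client boundary; the raw `(x, y)` data never leaves `local_train`, which is the privacy property the framework relies on.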
The training process of the federated learning strategy is distributed to each client, and finally, the aggregation of diagnostic knowledge is realized on the central server, which not only ensures the privacy of all source client users but also promotes knowledge sharing among clients [22,23]. For example, Lee et al. [24] introduced reinforcement learning knowledge into the federated learning strategy and proposed a client selection scheme based on a reward mechanism, which improves the learning efficiency of the network while using fewer agents. Considering that the optimal design of federated learning algorithms in edge computing systems needs to be solved urgently, Li et al. [25] proposed a generalized federated learning strategy that uses the tricks of general inner approximation and complementary geometric programming to iteratively explore the full potential of federated learning. Although significant progress has been made in the aforementioned federated learning methods, there are still numerous challenges that require resolution [26]. This paper further investigates the application of federated learning schemes in few-shot fault diagnosis scenarios.

Transfer Learning
The data-driven deep learning model demonstrates efficient performance in diagnosing faults based on a comprehensive analysis of monitoring data. However, establishing an ideal data set for training deep learning models is challenging in real-world industrial scenarios due to various factors [27]. The main reasons can be summarized in three points: (1) Faults rarely occur under normal operation of mechanical equipment, which makes the collected sample data mostly healthy and free of faulty sample information. (2) The cost of obtaining fault sample information in simulated industrial scenarios within a laboratory setting is relatively high. (3) The fault samples simulated in the laboratory are devoid of environmental information present in real-life scenarios, thereby lacking authenticity.
Transfer learning, a technique that applies diagnostic knowledge from known datasets to related but distinct fault diagnosis tasks, underpins most current domain adaptation methods [28]. For instance, Liu et al. [29] proposed a transfer learning network based on confrontational discriminative domain adaptation to address the fault problem of gas turbines. The approach involves transferring the model trained in the source domain to target domain data, followed by adversarial training that adaptively optimizes model parameters using information from both domains. He et al. [30] proposed a multi-signal fusion confrontation network that integrates vibration and sound signals to diagnose common faults in axial piston pumps. The addition of a multi-signal fusion module enables the re-weighting of each signal, enhancing the accuracy and reliability of fault diagnosis. This study employs the data augmentation method in the transfer learning strategy to enhance the generalization performance of the diagnostic model.

Random Forest
As a fusion strategy of decision tree and bagging methods, the random forest (RF) algorithm constructs a set of low-bias, weakly correlated trees (T_a, a = 1, ..., R_tree) and combines the predictions of these decision tree models [31,32]. The RF algorithm is often used to solve multi-classification problems and regression problems. When tackling multi-classification problems, the prediction outcomes of all decision trees are aggregated through voting, and the category with the highest number of votes is deemed the ultimate diagnostic result. When solving a regression problem, the final prediction is the mean of all decision tree outputs.
For regression, the mathematical expression of the predicted value given by the random forest is as follows [33,34]:

h̄(x) = (1/R_tree) Σ_{a=1}^{R_tree} h(x, θ_a)   (1)

where h(x, θ_a) stands for the predicted output of the a-th decision tree, and R_tree represents the number of decision trees in the random forest. This study introduces the random forest algorithm into the federated learning strategy to solve the problem of data corruption in client communication.
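As a toy illustration of this ensemble-mean rule, the sketch below bags depth-1 regression "stumps" over bootstrap samples and averages their outputs; the stump learner and tree count are simplifications for illustration, not the forest configuration used later in the paper:

```python
import numpy as np

def fit_stump(x, y):
    """Fit a depth-1 regression tree (stump): one threshold, two leaf means."""
    best = None
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda q: np.where(q <= t, lo, hi)

def random_forest_predict(x, y, q, r_tree=25, seed=0):
    """Bagged ensemble: prediction is the mean of all tree outputs,
    i.e. h_bar(q) = (1/R_tree) * sum_a h(q, theta_a)."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(r_tree):
        idx = rng.integers(0, len(x), len(x))  # bootstrap resample
        preds.append(fit_stump(x[idx], y[idx])(q))
    return np.mean(preds, axis=0)

x = np.linspace(0, 1, 50)
y = (x > 0.5).astype(float)  # a step function for the forest to recover
print(random_forest_predict(x, y, np.array([0.2, 0.8])))
```

Averaging over bootstrapped trees is what reduces the variance of the individual low-bias learners.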

Network Architecture and Training Initialization
The federated transfer learning strategy proposed in this paper consists of multiple local clients (i.e., multiple source clients and a single target client) and a single central server. To simulate fault diagnosis requirements in realistic scenarios, each source client is assigned a unique diagnostic task that must be solved with local data, while the target client solely possesses target tasks without any training data. The local models within each source client share an identical network configuration with the global model residing on the central server: a 3-layer feature extractor followed by a 2-layer classifier.
Considering that client data privacy needs to be protected, source clients are only allowed to share local model parameters with the central server. In the initialization phase, each source client independently performs model parameter training and diagnostic knowledge learning locally until the preset maximum number of training iterations is reached. Upon completion of training, the model parameters from each source client are uploaded to the central server for evaluation. The central server then performs weighted aggregation based on the evaluation results of each model to form the global model parameters. The federated transfer learning strategy proceeds with initialization until the global model completes its first parameter update.
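A shape-level sketch of the shared architecture follows, with random weights used only to illustrate the data flow. The layer widths 1000, 800, and 1200 match the feature extractor described in the training section; the 512-point input spectrum and the 100-unit hidden classifier layer are illustrative assumptions not specified in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """He-style random init for one fully connected layer (illustrative)."""
    return rng.normal(0, np.sqrt(2 / n_in), (n_in, n_out)), np.zeros(n_out)

# Feature extractor: three fully connected layers (1000 -> 800 -> 1200 units).
extractor = [dense(512, 1000), dense(1000, 800), dense(800, 1200)]
# Classifier head: two layers mapping features to 3 health states (NC/IRF/ORF).
classifier = [dense(1200, 100), dense(100, 3)]

def forward(x, layers, final_softmax=False):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1 or not final_softmax:
            x = np.maximum(x, 0)                      # ReLU
    if final_softmax:
        e = np.exp(x - x.max(axis=1, keepdims=True))  # numerically stable softmax
        x = e / e.sum(axis=1, keepdims=True)
    return x

spectrum = rng.normal(size=(4, 512))  # a batch of 4 FFT samples (length assumed)
feats = forward(spectrum, extractor)
probs = forward(feats, classifier, final_softmax=True)
print(probs.shape)
```

Because every client holds byte-identical layer shapes, parameter upload, averaging, and download reduce to elementwise operations over these weight arrays.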

Source-Client Periodic Training
Considering that the training data contains a large number of diagnostic fault samples, the damaged local training data will be repaired first and then used for the parameter training of the model. The specific local training process and network architecture are shown in Figure 1.
The random forest algorithm gradually learns from the complete part of the training data and performs regression-fitting predictions on the damaged part. In this study, the number of decision trees in the random forest is set to 100, and the number of leaves to 5. A sliding prediction rhythm of one predicted point for every 15 known points is adopted until the fitting and repair of the damaged data are complete. It is worth noting that "broken data" refers to sample data that has lost part of its fragment information. The repaired time-domain training samples are input to the feature extractor after undergoing a fast Fourier transform; the numbers of neurons in the extractor layers are set to 1000, 800, and 1200. Additionally, the domain discrepancy loss L_W is introduced to address the significant distribution discrepancy between the training samples and the fault samples in the target domain. The Wasserstein distance (WD) is chosen to evaluate the discrepancy between the datasets [35]. The specific mathematical expression is as follows:

L_W = W(P, Q) = inf_{γ ∈ Π(P,Q)} E_{(x,y)∼γ}[‖x − y‖]   (2)

where Π(P, Q) is the set of all joint distributions of the two distributions P and Q, and γ(x, y) indicates the "mass" that needs to be transported from x to y in order to transform distribution P into distribution Q.
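For one-dimensional empirical samples of equal size, the Wasserstein distance above has a simple closed form: the optimal transport plan pairs sorted samples, so W1 is the mean absolute gap between them. A minimal sketch (scipy's `wasserstein_distance` computes the same quantity for the general 1-D case):

```python
import numpy as np

def wasserstein_1d(p_samples, q_samples):
    """W1 between two equal-size 1-D empirical distributions:
    the optimal transport plan simply pairs sorted samples."""
    p = np.sort(p_samples)
    q = np.sort(q_samples)
    return np.mean(np.abs(p - q))

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, 5000)  # "source domain" features
tgt = rng.normal(0.5, 1.0, 5000)  # mean-shifted "target domain" features
print(wasserstein_1d(src, tgt))   # close to the 0.5 mean shift
print(wasserstein_1d(src, src))   # exactly 0 for identical samples
```

Unlike kernel-based measures, this distance stays informative even when the two supports barely overlap, which is why it is useful as a training signal.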

The maximum mean discrepancy (MMD) is introduced as the feedback loss L_f for local models to optimize the model structure, aiming to mitigate the impact of erroneous diagnostic information extracted from repaired data on network diagnostic performance. Simultaneously, the cross-entropy loss is selected as the sample classification loss of the Softmax classifier, as shown in Formulas (3) and (4):

L_f = ‖ (1/n_s) Σ_{i=1}^{n_s} k(o_i^s) − (1/n_t) Σ_{j=1}^{n_t} k(o_j^t) ‖_H^2   (3)

L_c = − Σ_i Σ_c y_{i,c} log I[o_i^s]_c   (4)

where k is a mapping that projects the original variable into a high-dimensional space, o^s and o^t represent the features extracted from the source-domain and target-domain samples, and I[·] represents the probability score assigned to each sample fault type by the Softmax classifier.
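A sketch of the empirical (biased) MMD estimator with a Gaussian kernel, expanding the Hilbert-space norm above via the kernel trick; the bandwidth sigma and sample sizes are illustrative choices, not values from the paper:

```python
import numpy as np

def mmd_rbf(xs, xt, sigma=2.0):
    """Squared MMD with a Gaussian (RBF) kernel k: the norm
    || mean_i k(., x_i^s) - mean_j k(., x_j^t) ||_H^2
    expands into three pairwise kernel means."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :])**2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    return k(xs, xs).mean() + k(xt, xt).mean() - 2 * k(xs, xt).mean()

rng = np.random.default_rng(0)
same = rng.normal(0, 1, (200, 5))
shifted = rng.normal(1, 1, (200, 5))
print(mmd_rbf(same, rng.normal(0, 1, (200, 5))))  # near 0: matching distributions
print(mmd_rbf(same, shifted))                     # clearly positive: domain shift
```

Driving this quantity down during training pulls the source and target feature distributions together in the kernel's feature space.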
During the initial phase of source client training, the domain discrepancy loss and the feedback loss together form the joint domain discrepancy loss. The local model simultaneously optimizes the joint domain discrepancy loss and the sample classification loss, minimizing the domain discrepancy between the source and target domains in the source client task while rectifying the model parameter bias caused by the inpainted data. The specific mathematical expression is as follows:

min_θ  L_c + δ_1 L_W + δ_2 L_f   (5)

where δ_1 and δ_2 are empirical coefficients during model training.
The local training in the second stage removes the optimization of the network parameters by the classification loss. The joint domain discrepancy loss is further optimized to alleviate the negative transfer, caused by random forest regression fitting, of erroneous sample features into network training, as shown in Equation (6):

min_θ  δ_3 (L_W + L_f)   (6)

where δ_3 is the empirical coefficient during model training.
The local model is iteratively updated through the continuous joint training of the three sets of objective functions until it reaches the initial preset value. The source client ultimately acquires a set of feature extractors that can effectively capture the relevant information from the fitting data, as well as a set of classifiers capable of distinguishing incomplete feature samples, thereby enabling periodic training for the source client.

Federated Learning Dynamic Interaction
The dynamic interaction process of the federated transfer learning strategy proposed in this paper mainly includes three links: the global model update link, the source client task verification link, and the local model adaptive update link.
The local model parameters from each source client are initially transmitted to the central server, as shown in Figure 2. The central server then assesses the diagnostic knowledge contribution of each source client to the global model and aggregates them by weighting to form a new global model:

θ_G^j = Σ_{i=1}^{K} λ_{i,j} θ_i^j,   with Σ_{i=1}^{K} λ_{i,j} = 1

where λ_{i,j} represents the evaluation coefficient of the i-th client in the j-th round of federated communication, determined by A_{i,j}^T and A_{i,j}^S, which represent the final diagnosis accuracy and training accuracy of the i-th client in the j-th round of source client training.
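The aggregation step might be sketched as follows. The specific weighting rule (λ_i proportional to the sum of each client's verification accuracy A_T and training accuracy A_S) is an assumption for illustration; the text specifies only that λ depends on these two accuracies:

```python
import numpy as np

def aggregate(client_params, acc_t, acc_s):
    """Accuracy-weighted aggregation. Assumed form: lambda_i is proportional
    to A_T + A_S, normalized so the coefficients sum to 1."""
    scores = np.asarray(acc_t) + np.asarray(acc_s)
    lam = scores / scores.sum()                    # evaluation coefficients
    return sum(l * p for l, p in zip(lam, client_params)), lam

params = [np.full(4, 1.0), np.full(4, 2.0), np.full(4, 3.0)]
global_params, lam = aggregate(params,
                               acc_t=[0.9, 0.7, 0.8],
                               acc_s=[0.95, 0.85, 0.9])
print(lam)            # clients with higher accuracy contribute more
print(global_params)
```

Because the coefficients sum to 1, the global parameters stay within the convex hull of the client parameters, so no single client update can dominate the round.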
Following this, the updated global model is downloaded to each source client for model validation. Specifically, the central server performs reverse verification on all source client tasks one by one and obtains the corresponding sample diagnostic loss to optimize the parameters of the local model, which can effectively improve the ability of the local model to extract the cross-domain universal characteristics of fault samples.
where [z_1, z_2, ..., z_K] represents the diagnostic results of the global model on the K groups of source client tasks, x_i^t is the task verification sample of the i-th source client, M(·) represents the diagnostic function of the global model, and L_{c-cen} is the diagnostic loss of the global model for the source client task.

In the local model adaptive update link, the parameter information of the global model and the sample diagnostic loss of the source client task are used for a new round of local model parameter updates. Considering the specificity of each source client task, the local model parameters are not completely replaced by the global model. To enhance the generalization performance of local models for cross-domain fault samples while preserving the sample diagnostic knowledge of local tasks, the local model parameters of each source client are adaptively updated, as shown in Formulas (12) and (13):

θ_i^{j+1} = H[θ_G^j, θ_i^j; A_{i,j}^{cenT}]

where A_{i,j}^{cenT} represents the verification diagnosis accuracy of the j-th round of the global model on the i-th client task, and H[·] represents the adaptive update function of the local model parameters.
The three steps of the federated dynamic interaction alternate cyclically: the global model gradually masters the fault diagnosis knowledge of all source clients, and the local model of each source client is optimized. Finally, the optimized global model is delivered to the target client for final verification on the target task.
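One possible form of the adaptive update H[·] can be sketched as a linear interpolation controlled by the global model's verification accuracy on the client's own task; this interpolation rule is a hypothetical illustration, not the paper's exact function:

```python
import numpy as np

def adaptive_update(local_params, global_params, acc_cen):
    """Hypothetical H[.]: the better the global model verifies on this
    client's task (acc_cen in [0, 1]), the further the local model moves
    toward it; a poorly verifying global model is mostly ignored."""
    beta = np.clip(acc_cen, 0.0, 1.0)
    return beta * global_params + (1.0 - beta) * local_params

local = np.array([1.0, 1.0])
glob = np.array([3.0, 3.0])
print(adaptive_update(local, glob, acc_cen=0.9))  # mostly adopts the global model
print(adaptive_update(local, glob, acc_cen=0.1))  # mostly keeps local knowledge
```

This captures the stated design intent: local parameters are never fully replaced, so task-specific diagnostic knowledge survives each communication round.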

Dataset Description
In this section, three sets of bearing datasets (a public dataset and two laboratory simulation datasets) are utilized to validate the efficacy of the proposed method, encompassing three health status categories: normal condition (NC), inner ring fault (IRF), and outer ring fault (ORF). The dataset information is shown in Table 1.

CWRU
The CWRU Bearing Dataset from Case Western Reserve University comprises sample data obtained by an electromechanical signal analyzer at four distinct rotational speeds. The damage diameters of the outer and inner ring faults are 0.1778 mm, 0.3556 mm, and 0.5334 mm. In the experiment, the vibration acceleration signal collected by the sensor located at the 6 o'clock position of the motor drive end is selected for research and discussion. Two sampling frequencies are used: 12 kHz and 48 kHz.

MDS
The Motor Drive Simulation (MDS) Experiment Dataset is collected by an LMS vibration data acquisition instrument at a sampling frequency of 12.8 kHz using a triaxial acceleration sensor (model PCB353B33). The damage on the outer and inner rings of the bearing consists of artificial EDM cracks, each with a width and depth of 0.5 mm. Sample information was collected on the health status of rolling bearings at three speeds: 1000 rpm, 1300 rpm, and 1500 rpm. The fault samples collected in the time domain are subjected to a fast Fourier transform (FFT) to obtain frequency-domain signal samples for training.
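The FFT preprocessing step described above can be sketched as follows. The synthetic signal, the 2048-point segment length, and the 200 Hz fault tone are illustrative assumptions; only the 12.8 kHz sampling rate comes from the dataset description:

```python
import numpy as np

fs = 12_800                   # MDS sampling frequency (Hz)
t = np.arange(2048) / fs      # one training segment (length assumed)

# Synthetic stand-in for a bearing vibration signal: a 200 Hz tone in noise.
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 200 * t) + 0.3 * rng.normal(size=t.size)

# One-sided amplitude spectrum used as the network's training input.
spectrum = np.abs(np.fft.rfft(signal)) / t.size
freqs = np.fft.rfftfreq(t.size, d=1 / fs)
print(freqs[np.argmax(spectrum[1:]) + 1])  # spectral peak near the tone frequency
```

Training on the spectrum rather than the raw waveform makes the characteristic fault frequencies directly visible to the first network layer.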

GPTFS
The Gear Power Transmission Fault Simulation (GPTFS) Experimental Dataset uses a specially processed cylindrical roller bearing (NU205EM) for experiments, with crack faults artificially introduced in the outer and inner rings of the bearing by EDM (crack sizes of 0.2 mm, 0.4 mm, and 0.6 mm). During data collection, a PCB315A acceleration sensor was mounted onto the bearing base with a signal collection frequency of 12.8 kHz; the test bench is shown in Figure 3. For the bearing experiment, data samples were collected from a control motor operating at constant speeds of 1000 rpm, 1500 rpm, and 2000 rpm, under motor loads of 0 N and 20 N as per experimental requirements.



Different Comparison Schemes
In order to demonstrate the superiority of the proposed federated transfer learning scheme in addressing the few-shot learning problem, multiple sets of comparative experiments with identical experimental configurations were conducted to validate its effectiveness.
Baseline: The baseline method [36], which does not incorporate any federated transfer learning knowledge, is commonly employed as a reference group in experiments to assess the reliability and efficacy of proposed schemes. Each source client model performs direct diagnosis on the target task after local task training, and the final diagnosis result for the target task is obtained by aggregating and averaging the results from all source clients.
FedAvg: The Federated Averaging (FedAvg) method [37] aims to centrally average the locally trained models and aggregate them into a global model, which is then distributed to each client device through training. This approach achieves the objective of training a shared model with scattered data by employing two stages: local model training and global model aggregation, which ensure diagnostic knowledge sharing while preserving data privacy.

FTLS-DPP:
The Federated Transfer Learning Scheme based on Data Privacy Protection (FTLS-DPP) method is a collaborative strategy designed to address the issue of industrial data islands, with its training process executed independently on each local client. Specifically, the local model employs differential training to enhance the diagnostic accuracy and generalization of the network, while the global model assesses the task contribution of each local model for weighted aggregation. These two training cycles alternate to accomplish the target client task.

Cross-Machine Federated Transfer Learning Tasks and Parameters Setting
The training process of each source client model is conducted independently in the experiment, thereby ensuring the privacy of individual client data. The complete training of the federated transfer policy does not involve any information about the target client tasks; the detailed experimental task settings are presented in Table 2. Specifically, K sets of source clients and one target client were established using bearing samples collected by the three test platforms under various working conditions during the experimental verification stage. Each client holds a unique fault diagnosis task whose fault categories are consistent with the final target task, while the tasks differ notably from one another.

Four groups of samples with missing information and client tasks were established in this study to simulate diagnostic tasks under various working conditions. In the first scenario, each source client sample set contains ideal sample data, and there are discernible discrepancies among the diagnostic tasks of the source clients. In the second scenario, the training data of each client diagnosis task not only contains defects but also exhibits a 12.5% rate of sample damage; furthermore, the federation strategy focuses on more intricate cross-device and cross-type fault sample diagnoses in this scenario. In the third and fourth scenarios, both cross-device and cross-model diagnostic tasks are present in the target client, while load information is also integrated into the data of each source client. By setting up four groups of federated diagnostic tasks, the proposed diagnostic strategy is comprehensively evaluated. To clarify the operation of the proposed federated transfer learning strategy, the relevant parameter information is established based on the requirements of the target task and presented in Table 3.

Diagnosis Result and Discussion
The random forest algorithm is used to perform regression fitting on the damaged training data in the source client, and the fitted data is directly applied to the training process of the network. Figure 4 compares the fitting curves of the training samples for each health type of the three source clients in Case 2. The predicted data for Client 1, constructed using the CWRU dataset, fits the real data relatively well, indicating obvious periodic fault characteristics in this dataset. As more uncertain environmental interference is mixed into the data, the peaks of the fitting curves predicted for Client 2 and Client 3 begin to deviate from the real data, but the trend of the fitting curves remains consistent with the real data.

Figure 4. Comparison of predicted and real data in Case 2: the blue curve represents the predicted data; the red curve represents the real data.

Figure 5 shows the comparison between the fitting curve of the outer ring fault sample of the GPTFS dataset in Case 3 and the real data.
Although the trend of the predicted data is basically consistent with the real data, there are still some discrepancies in the magnitude of kurtosis. Given the complexity of the samples in the dataset, the existing prediction bias is allowed during the training of the local model. In the experimental section, a group of damaged fault samples in each case is selected to describe the results of random forest regression fitting. The detailed data restoration indicators are shown in Table 4.
the predicted data. The red curve represents the real data. Figure 5 shows the comparison between the fitting curve of the outer ring fault sample of the GPTFS data set predicted in the selected case 3 and the real data. Although the trend of the predicted data is basically consistent with the real data, there are still some discrepancies in the magnitude of kurtosis. Given the complexity of the samples in the dataset, the existing prediction bias is allowed during the training of the local model. In the experimental section, a group of damaged fault samples in each case is selected to describe the results of random forest regression fitting. The detailed data restoration indicators are shown in Table 4.    In the experimental phase, each diagnostic method was tested five times in each scenario to ensure experiment reliability. The diagnostic accuracy rate and corresponding standard deviation of these comparative experiments are presented in Table 5 and Figure 6. The FTLS-DR method proposed in this study outperforms other comparison methods in terms of diagnostic accuracy and fluctuation range across all four cases. Specifically, in the case of complete data training in case 1, the diagnostic accuracy of the proposed method in each client task and target client task is higher than 98%, with a standard deviation of 1.04%. Comparing the FedAvg method with the FTLS-DPP method, the diagnostic accuracy is only 84.78% and 92.83%, and the standard deviation is greater than 7.21%. The proposed method still demonstrates superior model generalization performance and diagnostic accuracy, even in the presence of corrupted training data. Specifically, the diagnostic accuracy for the unknown target client diagnosis task remains above 78.06% when 25% of the training data is damaged. 
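The paper does not give the exact regression formulation used for data repair; a minimal sketch, assuming a sliding-window autoregressive setup in which each sample is predicted from the preceding intact samples with scikit-learn's `RandomForestRegressor` (the window length and tree count here are illustrative, not the paper's settings), might look like:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def repair_signal(signal, missing_mask, window=16, n_estimators=100, seed=0):
    """Fill missing fragments of a 1-D vibration signal by regressing each
    sample on the preceding `window` samples (sliding-window random forest)."""
    x = signal.copy()
    # Build training pairs only from regions whose full window is intact.
    intact = np.where(~missing_mask)[0]
    X_train, y_train = [], []
    for i in intact:
        if i >= window and not missing_mask[i - window:i].any():
            X_train.append(x[i - window:i])
            y_train.append(x[i])
    rf = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    rf.fit(np.asarray(X_train), np.asarray(y_train))
    # Predict missing samples left to right, reusing earlier predictions
    # so a contiguous damaged fragment is filled step by step.
    for i in np.where(missing_mask)[0]:
        x[i] = rf.predict(x[i - window:i].reshape(1, -1))[0]
    return x
```

The repaired signal can then be fed to local model training in place of the damaged sample; because later predictions reuse earlier ones, long gaps accumulate error, which is consistent with the peak drift observed for the noisier clients.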
It can be inferred that the proposed FTLS-DR method has a stronger universal feature-extraction capability for fault samples, making it better suited to diagnostic tasks in complex scenarios.
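The accuracy and standard-deviation figures reported in Table 5 come from five repeated trials per method. A short sketch of how such per-trial results are aggregated (the per-trial values below are hypothetical placeholders, not the paper's raw data):

```python
import numpy as np

# Hypothetical per-trial accuracies (%) over five repeated runs;
# the actual figures appear in Table 5 of the paper.
trials = {
    "FedAvg":   [84.1, 83.5, 85.6, 84.9, 85.8],
    "FTLS-DPP": [92.0, 93.4, 92.6, 93.1, 93.0],
    "FTLS-DR":  [98.2, 98.9, 98.4, 99.0, 98.6],
}

def summarize(accs):
    """Return (mean, sample standard deviation) of a list of accuracies."""
    a = np.asarray(accs, dtype=float)
    return a.mean(), a.std(ddof=1)

for name, accs in trials.items():
    mean, std = summarize(accs)
    print(f"{name}: {mean:.2f}% +/- {std:.2f}%")
```

Reporting the sample standard deviation (`ddof=1`) alongside the mean is what makes the fluctuation-range comparison between methods meaningful.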
To demonstrate the distribution of the features extracted from the data samples and to validate the feature-extraction advantages of the proposed FTLS-DR method, the high-dimensional features of the target client samples extracted in the final verification step are visualized through dimension reduction [38], as shown in Figure 7. In cases 1 to 3, each group of clients covers three distinct bearing health states, while case 4 sets seven bearing health states to probe the misclassification of ambiguous fault samples. The proposed federated transfer strategy still yields satisfactory diagnostic results for fault samples under unknown working conditions. The extracted features show that the samples of each health state are accurately characterized and well separated; the only exceptions are a small number of misclassified outer-race fault samples in case 2 and a slight overlap of individual fault-sample cluster boundaries in case 4. This further demonstrates that the FTLS-DR method still performs satisfactorily on complex transfer tasks across devices and bearing models.
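The specific dimension-reduction technique is the one cited at [38]; assuming it is t-SNE (a common choice for Figure 7-style cluster plots), a minimal sketch of projecting penultimate-layer features to 2-D with scikit-learn would be:

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_features(features, seed=0):
    """Project high-dimensional network features to 2-D with t-SNE so that
    per-health-state clusters can be plotted and compared visually."""
    return TSNE(n_components=2, perplexity=30, init="pca",
                random_state=seed).fit_transform(features)

# Example: 300 synthetic 64-D feature vectors from 3 well-separated
# health states (purely illustrative stand-ins for real network features).
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 64))
                   for c in (0, 3, 6)])
emb = embed_features(feats)
print(emb.shape)
```

Misclassified samples then appear as points sitting inside, or straddling the boundary of, a cluster of a different health state, which is how the case 2 and case 4 exceptions would show up in Figure 7.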

Conclusions
Aiming at the problem of data privacy protection in real industrial scenarios, this paper proposes a new cross-device fault diagnosis method based on repaired data. Unlike traditional fault diagnosis methods, the proposed federated transfer learning strategy keeps the target client samples out of network training and parameter updates from the initial training stage through the final target-task verification. Multiple groups of diagnostic tasks are established on three bearing data sets to simulate engineering requirements in real-world scenarios. The results show that the proposed strategy effectively solves the difficulty of diagnosing fault samples caused by incomplete local training data. The proposed FTLS-DR method not only effectively guarantees the privacy of client data but also achieves the best diagnostic results among all compared methods. In addition, the key indicators and fitting accuracy of the restored data were measured from multiple perspectives, and this comprehensive evaluation shows that the method holds good promise for practical engineering diagnosis.


Data Availability Statement:
The data used to support the findings of this study are available from the corresponding author upon request.