Multi-Representation Domain Adaptation Network with Duplex Adversarial Learning for Hot-Rolling Mill Fault Diagnosis

The multi-process manufacturing of steel rolling products requires the cooperation of complicated and variable rolling conditions. Such conditions pose challenges to the fault diagnosis of the key equipment of the rolling mill. The development of transfer learning has alleviated the problem of fault diagnosis under variable working conditions to a certain extent. However, existing diagnosis methods based on transfer learning only consider the distribution alignment from a single representation, which may only transfer part of the state knowledge and generate fuzzy decision boundaries. Therefore, this paper proposes a multi-representation domain adaptation network with duplex adversarial learning for hot rolling mill fault diagnosis. First, a multi-representation network structure is designed to extract rolling mill equipment status information from multiple perspectives. Then, the domain adversarial strategy is adopted to match the source and target domains of each pair of representations for learning domain-invariant features from multiple representation networks. In addition, the maximum classifier discrepancy adversarial algorithm is adopted to generate target features that are close to the source support, thereby forming a robust decision boundary. Finally, the average value of the predicted probabilities of the two classifiers is used as the final diagnostic result. Extensive experiments are conducted on an experimental platform of a four-high hot rolling mill to collect the fault state data of the reduction gearbox and roll bearing. The experimental results reveal that the method can effectively realize the fault diagnosis of rolling mill equipment under variable working conditions and can achieve average diagnostic rates of up to 99.15% and 99.40% on the data sets of the rolling mill gearbox and bearing, which are respectively 2.19% and 1.93% higher than the rates achieved by the most competitive method.


Introduction
The rolling mill is indispensable in the production of steel products, and its safe and reliable operation is an effective premise to ensure product quality [1,2]. As modern industrial equipment tends to be large-scale and complex, rolling mill equipment is also developing in the direction of diversified production processes and continuous rolling processes. This complex and variable rolling condition poses great challenges to the condition monitoring and fault diagnosis of rolling mill equipment [3,4]. Under the continuous effect of long-term high load, key components, including a hot rolling mill gearbox, roll bearings, and so on, are prone to failure and damage. If such faults are not detected in a timely manner, they will severely affect the product quality, resulting in considerable economic losses [5].
With the development of artificial intelligence and sensing technology, fault diagnosis has shifted from traditional methods based on expert experience and signal analysis to data-driven fault diagnosis [6]. The support vector machine, random forest, artificial neural network, and other algorithms have made great breakthroughs in solving the traditional problem of relying on complex physical modeling and artificial analysis [7]. However, these fault diagnosis algorithms based on traditional machine learning must be constructed by professionals in feature engineering. The features of these structures are usually only suitable for specific diagnostic tasks and are not universal. In addition, because of the shallow model architecture, traditional machine learning algorithms cannot fully map the nonlinear relationship between state data and fault space.
As a branch of machine learning, deep learning can overcome the limited nonlinear mapping ability of shallow machine learning algorithms and adaptively learn fault-sensitive features from multiple hidden layers. In recent years, deep learning has been widely reported in the field of fault diagnosis [8,9]. Shao et al. [10] proposed a multi-signal fault diagnosis algorithm based on the convolutional neural network (CNN), which uses vibration and current signals to monitor the state of the motor. To mine the deep-seated state information of mechanical signals, Han et al. [11] used the time- and frequency-domain information together as the model input and proposed an intelligent fault diagnosis method of a dual-stream CNN based on multilevel information fusion. Jia et al. [12] constructed a local connection network through a normalized sparse autoencoder for intelligent fault diagnosis of gearboxes and bearings. Shi et al. [13] studied the health status monitoring of rolling mills based on multi-source sensor fusion under imbalanced and small samples. Yang et al. [14] proposed a residual wide-kernel deep convolutional auto-encoder for intelligent rotating machinery fault diagnosis. Yu et al. [15] developed an approach based on multi-sensor information fusion and improved deep belief networks (DBNs) for the health state diagnosis of rolling mills. The existing literature reveals that methods based on traditional deep learning can achieve superior performance when sufficient labeled state data can be collected from the target mechanical equipment [16]. However, the actual industrial production process is complex and accompanied by a large amount of environmental noise. The complex and variable working conditions of the hot rolling mill cause a model trained on data from one working condition to suffer significant performance degradation when applied to mechanical diagnosis under other working conditions [17].
The change in data distribution caused by the change in mechanical equipment working conditions is called domain shift [18], as shown in the left panel of Figure 1. Transfer learning is a practical approach to learning knowledge from one or more tasks and applying it to other related tasks; it can effectively compensate for the differences across domains [19]. In particular, domain adaptation, a branch of transfer learning, extracts domain-invariant features through distribution discrepancy measurement or domain adversarial training and is commonly used for mechanical condition monitoring and fault diagnosis under variable working conditions [20], as shown in the middle panel of Figure 1. Li et al. [21] used the multi-kernel maximum mean discrepancy (MMD) to minimize the domain distribution distance in multiple layers of the deep network, which effectively improved the generalization performance of the model. By integrating CORrelation ALignment (CORAL) into a convolutional autoencoder, Qian et al. [22] realized the state recognition of a planetary gearbox under variable working conditions. Li et al. [23] applied adversarial training to align the marginal distributions and explored the unlabeled distribution matching of auxiliary states in parallel data; bearings at different installation positions were effectively diagnosed. Han et al. [24] proposed a joint distribution adaptation deep transfer network for industrial fault diagnosis, which improved the distribution matching accuracy. Tang et al. [25] added sample label information to the domain adversarial process and applied conditional distribution domain adaptation to learn domain-invariant features; thus, the accuracy of bearing fault diagnosis was improved. Guo et al. [26] proposed a deep transfer learning network with simultaneous MMD measurement and domain adversarial training to maximize the domain recognition error and minimize the probability distribution difference.
Although various domain adaptation and improved transfer learning methods have alleviated the domain shift problem caused by varying working conditions to a certain extent, the existing domain adaptation methods transfer diagnostic knowledge from a single representation only; that is, only part of the mechanical state information is considered, and important information related to the machine health may be lost. Thus, the diagnostic performance is unsatisfactory. Literature [28] shows that extracting specific features of observed objects from multiple perspectives can significantly improve the accuracy of cross-domain image classification. To fully transfer health state knowledge from source tasks to target diagnostic tasks, multi-representation information distribution matching should be considered. In addition, owing to the different characteristics of each domain, achieving complete matching of the feature distributions of different domains is difficult, which easily leads to unclear decision boundaries and reduces the accuracy of target diagnosis tasks. To deal with the above two problems, a multi-representation domain adaptation network is proposed in this paper for the diagnosis of key equipment of the hot rolling mill under variable conditions. The multi-representation network structure is designed to extract multi-representation information, and the domain adversarial strategy is applied to match the source and target domains of each pair of representations simultaneously. This process enables the transfer of sufficient mechanical state knowledge. In addition, maximum classifier discrepancy adversarial training is introduced to generate target features close to the source support, thereby forming a robust decision boundary, as shown in the right panel of Figure 1. The contributions of this study are as follows:
(1) A multi-representation network structure is designed to fully extract the status information of rolling mill equipment from multiple perspectives.
(2) Domain adversarial training and maximum classifier discrepancy adversarial training are simultaneously applied to transfer diagnostic knowledge from multiple feature representations and to delineate the task-specific classification boundary.
(3) Extensive experiments are performed to collect the fault state data of the reduction gearbox and roll bearing from a four-high (4-H) hot rolling mill experimental platform. Thus, the effectiveness of the proposed method for rolling mill equipment fault diagnosis under variable working conditions is verified.
The remainder of this paper is organized as follows.

Problem Setup
In this study, the general definition of the domain-adaptive fault diagnosis method is followed. Specifically, it is assumed that a labeled source domain dataset $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ can be collected under a certain working condition, where $n_s$ is the number of source domain samples, and $y_i^s \in \{1, 2, 3, \cdots, k\}$ represents the corresponding health status label. The unlabeled data that can be obtained under the working conditions that need to be diagnosed are defined as the target domain $D_t = \{x_i^t\}_{i=1}^{n_t}$, where $n_t$ is the number of samples in the target domain, and $D_t$ and $D_s$ share the same label space. Because of the varying working conditions, such as rotational speed or load, the distribution of the source domain is inconsistent with that of the target domain, that is, $P(X_s) \neq P(X_t)$. The purpose of fault diagnosis under variable working conditions is to build a cross-domain diagnosis model $y = f(x)$ that learns domain-invariant and discriminative features by eliminating the distribution differences between the two domains and that minimizes the risk of the target diagnosis task $\mathbb{E}_{(x,y)}[f(x) \neq y]$ under source supervision.

Domain Adversarial Training
Domain adversarial training is a typical domain adaptation method. Ganin et al. [29] first introduced adversarial training into transfer learning, aiming to minimize the marginal distribution distance between two domains. Specifically, the basic architecture of an adversarial network includes a feature extractor F and a domain classifier D; for a pattern recognition problem, a classifier C is usually also included. Their parameters are represented by $\theta_F$, $\theta_D$, and $\theta_C$, respectively. In the training process, feature extractor F and domain classifier D are two players in a minimax game: domain classifier D attempts to identify whether the representation learned by feature extractor F originates from the source domain or the target domain, and feature extractor F generates cross-domain-invariant features as far as possible to fool domain classifier D. In this adversarial training process, the distribution difference between the source domain and the target domain is gradually reduced. At the same time, under the supervision of the source domain, classifier C is trained to distinguish the categories of different samples. By adding a gradient reversal layer (GRL) between feature extractor F and domain classifier D, these optimization steps can be carried out simultaneously.
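The behavior of the GRL can be illustrated with a minimal sketch. It is written in plain Python rather than a deep learning framework, and the class name `GradientReversal` and the scaling factor `lam` are illustrative assumptions, not names taken from the paper. The layer acts as the identity in the forward pass and flips the sign of the gradient in the backward pass, so that a single gradient step trains D to separate the domains while pushing F to confuse it.

```python
class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        # lam is a hypothetical trade-off weight for the reversed gradient.
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged on their way to the domain classifier.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # The gradient sent back to the feature extractor is sign-flipped and scaled.
        return [-self.lam * g for g in grad_from_domain_classifier]


grl = GradientReversal(lam=0.5)
passed = grl.forward([0.2, -1.0, 3.0])
reversed_grad = grl.backward([1.0, -2.0, 4.0])
print(passed, reversed_grad)
```

In an automatic differentiation framework the same idea would normally be expressed as a custom operation with a user-defined backward rule; the sketch only shows the arithmetic.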

Multi-Representation Domain Adaptation Network
The overall architecture of the proposed multi-representation network is shown in Figure 2. It mainly consists of four parts: the shared feature extractor, the multi-representation feature extractor, the domain classifiers, and the state classifiers. Specifically, the shared feature extractor includes three convolution layers. The first convolution layer uses a convolution kernel with a size of 32 to filter out interference noise, while the other two convolution layers use a convolution kernel with a size of 3 to extract common low-level features. The multi-representation feature extractor module contains different network branches, each of which has a different network structure and convolution scale. It aims to extract feature representations from different perspectives. In this study, four different network structures are used as the multi-representation feature extractor, and the specific structure of each network branch $G_i$ is shown in Figure 2. A domain classifier is added after each representation network branch to judge whether the features learned by that branch come from the source or the target domain. The features obtained by the representation branches are concatenated into a feature vector as the input of the state classifiers, and two classifiers are trained separately to distinguish different rolling mill running states. Meanwhile, the discrepancy between the two classifiers is used to detect the target samples close to the decision boundary. This process allows the feature extractor to learn a more robust feature representation during adversarial training.
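The data flow from the branch outputs to the final diagnosis can be sketched as follows. This is a minimal illustration under stated assumptions: the branch feature values and classifier logits are made up, and the helper names `concat`, `softmax`, and `average_prediction` are hypothetical. It shows only the concatenation of branch features and the averaging of the two classifiers' predicted probabilities described in the abstract, not the convolutional layers themselves.

```python
import math


def concat(branch_features):
    # Join the feature vectors of all representation branches into one vector.
    return [value for features in branch_features for value in features]


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_prediction(probs_c1, probs_c2):
    # Final diagnosis: mean of the two classifiers' probabilities, then argmax.
    avg = [(a + b) / 2.0 for a, b in zip(probs_c1, probs_c2)]
    return avg, max(range(len(avg)), key=avg.__getitem__)


# Hypothetical outputs of four representation branches for one sample.
branch_features = [[0.1, 0.4], [0.3], [0.7, 0.2], [0.5]]
fused = concat(branch_features)

# Hypothetical logits produced by the two state classifiers from `fused`.
probs_c1 = softmax([2.0, 0.5, 0.1])
probs_c2 = softmax([1.5, 1.0, 0.2])
avg_probs, predicted_state = average_prediction(probs_c1, probs_c2)
print(len(fused), predicted_state)
```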

Model Optimization
After the model is built, a specific optimization objective should be designed to update the model parameters and achieve the expected diagnostic performance. Specifically, the loss objective function of the proposed model can be divided into three parts: the supervised source domain classification loss, the domain discrimination loss in the multi-representation branch structure, and the discrepancy loss of the two classifiers. Under the supervised training of source domain samples, the two classifiers can independently learn to divide the decision boundary of fault classification. The loss function of the two classifiers can be expressed as follows: where cat represents the vector concatenation operation.
In each multi-representation network branch, a domain classifier is used to perform adversarial training to match the marginal distributions of that representation; thus, domain-invariant features can be learned in this process. The domain adversarial loss of the multi-representation structure can be formulated as follows:
In addition to domain adversarial training, the second adversarial strategy of the proposed model is maximum classifier discrepancy adversarial training, which aims to use the prediction discrepancy between the two classifiers to establish the relationship between the target samples and the task-specific decision boundary. The two classifiers aim to detect the target samples far away from the source support, and the feature extractor is used to generate target representations close to the source support. In this adversarial training process, more discriminative domain-invariant features can be learned. The maximum classifier discrepancy loss function of this model is given as follows:
By adding a GRL, the two adversarial training processes and source supervision training can be carried out synchronously, and the parameters of each module of the model can be updated synchronously. The total loss function of the proposed model is as follows:
where the weight parameter $\lambda$ changes gradually according to the formula $\lambda = \frac{2}{1+\exp(-\gamma \cdot p)} - 1$, with $\gamma$ set to 10. In this study, $p$ changes linearly from 0 to 1 over the training process.
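The weight schedule and the classifier discrepancy can be sketched numerically. The schedule follows the formula for λ given in the text with γ = 10. The discrepancy function is an assumption: it uses the mean absolute difference between the two classifiers' probability outputs, which is the form commonly used in maximum classifier discrepancy methods, and may differ from the exact loss in the paper. The function names are hypothetical.

```python
import math


def adaptation_weight(p, gamma=10.0):
    # lambda = 2 / (1 + exp(-gamma * p)) - 1, with p the training progress in [0, 1].
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0


def classifier_discrepancy(probs_c1, probs_c2):
    # Assumed form: mean absolute difference between the two probability vectors.
    return sum(abs(a - b) for a, b in zip(probs_c1, probs_c2)) / len(probs_c1)


# lambda at 0%, 25%, 50%, 75%, and 100% of training.
weights = [adaptation_weight(step / 4) for step in range(5)]

# Hypothetical probability outputs of the two classifiers for one target sample.
disc = classifier_discrepancy([0.7, 0.2, 0.1], [0.4, 0.4, 0.2])
print(weights, disc)
```

The schedule starts at exactly 0 and approaches 1, so the adversarial terms are suppressed early in training, when the features are still noisy, and gain weight as training proceeds.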
According to the total loss function, the parameter optimization problem of each module of the proposed model can be expressed by the following formula:
where $\hat{\theta}_{D_j}$ and $\hat{\theta}_{C_j}$ are the optimized values of $\theta_{D_j}$ and $\theta_{C_j}$, respectively. Through the stochastic gradient descent algorithm, the parameter update process for each network module is as follows:
where $\eta$ is the learning rate, which is adjusted with the training progress according to the formula $\eta = \frac{\eta_0}{(1+\alpha \cdot p)^{\beta}}$, with $\eta_0 = 0.01$, $\alpha = 10$, and $\beta = 0.75$. This learning rate decay method helps the model rapidly converge to the optimal value [31].
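The learning rate decay can be checked with a short sketch that uses the constants stated in the text. The function names and the toy parameter and gradient values are illustrative assumptions; the update shown is the generic gradient descent step, not the module-specific update equations of the paper.

```python
def learning_rate(p, eta0=0.01, alpha=10.0, beta=0.75):
    # eta = eta0 / (1 + alpha * p) ** beta, with p the training progress in [0, 1].
    return eta0 / (1.0 + alpha * p) ** beta


def sgd_step(theta, grad, eta):
    # Generic gradient descent update: theta <- theta - eta * grad.
    return [t - eta * g for t, g in zip(theta, grad)]


# Learning rate at the start, middle, and end of training.
rates = [learning_rate(p) for p in (0.0, 0.5, 1.0)]

# One update on toy parameters with a toy gradient at the initial learning rate.
updated = sgd_step([1.0, -1.0], [0.5, 0.5], rates[0])
print(rates, updated)
```

With these constants the learning rate falls from 0.01 at the start of training to roughly 0.0017 at the end.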
The overall training process of the proposed method is shown in Figure 3. The proposed method follows a simple end-to-end approach based on the standard unsupervised transfer learning training process. Only labeled source domain samples and unlabeled target domain samples are input into the network, and the unlabeled target samples participate in the training. The total loss value in Equation (12) is obtained through forward calculation, and then the parameters in Equations (14)-(17) are optimized through the stochastic gradient descent (SGD) algorithm.

Experimental Study
In this section, by collecting the operating data of the bearing and reducer under different working conditions on a 4-H hot rolling mill test bench, an extensive experimental scheme was designed to verify the performance of the proposed method. The diagnostic results of several typical diagnostic models and the proposed method under the same experimental conditions are compared and analyzed. The overall structure of the 4-H hot rolling mill experimental platform is shown in Figure 4. It mainly includes a control console, a variable frequency adjustable speed drive motor, a reduction gearbox, a direction-changing gearbox, and a 4-H rolling mill. The control console is mainly composed of a variable-frequency motor controller, a loading motor controller, a pressure sensor display screen, and an emergency stop switch. The variable-frequency speed regulating motor is the driving source of the whole rolling mill system. The motor, reduction gearbox, and direction-changing gearbox are connected through couplings, and the direction-changing gearbox and 4-H rolling mill are connected through cross universal joints. The 4-H mill is composed of a mill stand, two backup rolls, and two working rolls. A loading device is installed at the top of the mill housing, which can exert pressure on the roll by electric or manual methods. Through the motor control button of the control console, the speed of the drive motor and the roll load can be adjusted to simulate different rolling conditions.

Gearbox Dataset Description
The gearbox data were collected on the reduction gearbox of the 4-H hot rolling mill experimental platform. As shown in Figure 5, the reduction gearbox includes two cylindrical spur gears: a large gear with 55 teeth and a small gear with 25 teeth. In the data acquisition experiment, the operating states of six health modes were simulated, including different single-point faults of the large and small gears and composite faults of the two gears. The detailed health states are listed in Table 1. An acceleration sensor was placed on the reduction gearbox housing to collect vibration signals. Gears with different failure modes were installed in turn to simulate different gearbox operation states. The driving motor speed was controlled at 880 rpm, and three different load pressures were applied in turn to simulate different working conditions. The vibration signals collected under each load were used as a data source. The sampling frequency was set to 5120 Hz.
Table 1. Health states of the gearbox dataset (recoverable entries only): large gear tooth breakage and small gear wear; label 4, large gear pitting and small gear wear; label 5, small gear wear.

Bearing Dataset Description
The bearing data were collected by monitoring the outer bearing of the working roll on the rolling mill, and the acceleration sensor was placed in the horizontal direction of the bearing seat. Four different bearing states were simulated: normal, inner ring fault (IRF), outer ring fault (ORF), and rolling element fault (REF). These faults were introduced in different parts of the rolling bearing through electrical discharge machining (EDM), as shown in Figure 6. During data collection, the load pressure was constant, and the motor speed was set to 600, 840, and 1200 rpm to simulate different working conditions. The vibration signal collected at each speed was used as a data source. The sampling frequency of the acquisition card was set to 10,240 Hz.
Entropy 2023, 25, x FOR PEER REVIEW

Experimental Setup
For the data of each failure mode under different working conditions, we used a sliding window with a size of 1024 to intercept samples. In the gearbox dataset, 300 samples were obtained for each fault class, giving 1800 samples under each working condition. In the bearing dataset, 200 samples were obtained for each fault class, giving a total of 800 samples under each working condition. In each diagnostic task, 50% of the samples were randomly selected as the training set and the remaining 50% as the test set. Because the data under each working condition were used as a data source, in the experiment, a data source was randomly selected as the source domain in cross-domain fault diagnosis, and the remaining data sources were selected in turn as the target domain to be diagnosed. In this study, a total of 12 diagnostic tasks were set, and the detailed information is shown in Table 2. In the process of model training, the mini-batch size was set to 32, and the model was trained for a total of 20 epochs. In addition, several typical diagnostic methods, namely the CNN, DAN, D-CORAL, DANN, JAN, and MADA, were introduced for comparison with the proposed model. For a fair comparison, all comparison methods used the same network parameters as the proposed model. To avoid the effect of random factors, each trial was repeated 10 times, and the average diagnostic results were adopted.
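The sample counts and the number of diagnostic tasks stated above can be reproduced with a short sketch. It assumes that each ordered pair of distinct working conditions within a dataset forms one transfer task, which is consistent with the total of 12 tasks; the condition names for the gearbox loads and all function names are hypothetical placeholders.

```python
from itertools import permutations


def enumerate_tasks(conditions):
    # Every ordered (source, target) pair of distinct working conditions.
    return list(permutations(conditions, 2))


def split_counts(samples_per_class, n_classes, train_fraction=0.5):
    # Total samples per working condition and the train/test split sizes.
    total = samples_per_class * n_classes
    n_train = int(total * train_fraction)
    return total, n_train, total - n_train


def window_duration(window_size, sampling_hz):
    # Time span in seconds covered by one sliding-window sample.
    return window_size / sampling_hz


gearbox_tasks = enumerate_tasks(["load_1", "load_2", "load_3"])  # placeholder names
bearing_tasks = enumerate_tasks([600, 840, 1200])  # motor speeds from the text

gearbox_counts = split_counts(samples_per_class=300, n_classes=6)
bearing_counts = split_counts(samples_per_class=200, n_classes=4)

gearbox_window_s = window_duration(1024, 5120)
bearing_window_s = window_duration(1024, 10240)
print(len(gearbox_tasks) + len(bearing_tasks), gearbox_counts, bearing_counts)
```

Under these assumptions each 1024-point sample spans 0.2 s of gearbox signal and 0.1 s of bearing signal.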

Diagnosis Result Discussion
In this section, the diagnostic results of the proposed method and the comparison methods on the different diagnostic tasks are presented and discussed. The diagnostic results of the different methods on the gearbox dataset are shown in Figure 7, and the specific diagnostic accuracies and standard deviations are listed in Table 3. Clearly, the diagnostic performance of the proposed method on the six diagnostic tasks of the gearbox dataset was better than that of the comparison methods. Because no domain adaptation algorithm is applied, the CNN achieved the lowest average diagnostic accuracy, 78.76%, on the six diagnostic tasks. Because a distribution discrepancy measurement algorithm is introduced in the DAN and D-CORAL, their diagnostic performance improved slightly compared with the CNN, with average diagnostic accuracies of 81.15% and 81.85%, respectively. However, on diagnostic task $A_2$, the DAN and D-CORAL showed a negative transfer phenomenon, and their diagnostic accuracy was lower than that of the CNN. The DANN still achieved 82.88% diagnostic accuracy, indicating that the DANN with a domain adversarial strategy can better mitigate the effect of negative transfer than the DAN and D-CORAL. The JAN and MADA consider conditional distribution domain adaptation; therefore, their diagnostic performance is significantly improved compared with that of the global distribution domain adaptation methods. Their average diagnostic accuracies on the six diagnostic tasks reached 93.50% and 96.96%, respectively; however, because they use a single feature representation for domain adaptation, the JAN and MADA may lose some important diagnostic information, which degrades their diagnostic performance. The proposed method transfers information from the perspective of multi-feature representation and considers the decision boundary division of the target task, so the joint distribution difference between the source domain and the target domain can be well compensated. Therefore, the proposed method obtained the highest average diagnostic accuracy on the gearbox diagnostic tasks, 2.19% higher than that of the most competitive method, MADA, and showed the best model stability.
The diagnostic results of the different methods on the diagnostic tasks of the rolling mill bearing dataset are shown in Figure 8 and Table 4. Similar to the aforementioned case, the proposed method achieved the best diagnostic performance on the bearing dataset, with an average diagnostic accuracy of 99.40% on the six diagnostic tasks. The DAN, D-CORAL, and DANN obtained similar diagnostic accuracies, which were 4.92%, 5.26%, and 5.27% higher than that of the CNN, respectively. Compared with the DAN, D-CORAL, and DANN, the JAN and MADA showed a larger improvement in diagnostic performance, with their average diagnostic accuracies reaching 95.57% and 97.47%, respectively. The shift from global distribution matching to conditional distribution matching is the key to this improvement. The proposed method further considers multi-representation diagnostic information transfer and target decision boundary division, which further improves the diagnostic accuracy and reliability of cross-domain diagnostic tasks.

Visualization Results
To compare the diagnostic performance of the proposed method with that of several typical methods more clearly, this section presents several visualization results on diagnostic tasks A3 and B3.
First, the t-distributed stochastic neighbor embedding (t-SNE) algorithm [36] was applied to provide an intuitive view of how diagnostic knowledge is transferred. The high-level representation learned by the feature extractor is plotted directly after dimensionality reduction. Points in green denote source instances, and points in blue denote target instances. Figure 9 shows the feature distributions of the proposed method and the CNN, DAN, and DANN on the gearbox dataset for diagnostic task A3. Category-level distribution differences of varying degrees exist in the high-level features of the CNN, DAN, and DANN. Specifically, the feature distributions of the category 2 samples in the source and target domains are not well matched. This is because large gear tooth breakage and small gear wear exhibit similar fault patterns, which easily causes feature confusion. In the proposed method, multi-representation feature learning and the duplex adversarial strategy extract features from multiple perspectives and sharpen the division of the target decision boundary, as shown in Figure 9d. The proposed method can therefore compensate for the lack of diagnostic knowledge, match the feature distribution of each fault state, and accurately transfer diagnostic knowledge. Similarly, on diagnostic task B3, the proposed method still achieved the best transfer effect. As shown in Figure 10, the CNN, DAN, and DANN exhibited serious feature distribution aliasing, which greatly reduces diagnostic accuracy.
To complement the visualization of the learned features, the confusion matrices of the proposed method and the three comparison methods are also presented. As shown in Figure 11, owing to the category-level distribution deviation of the transferred features, almost all CNN samples with the large gear tooth breakage fault are incorrectly divided between the large gear tooth breakage and small gear wear fault modes. Although the misclassification of the DAN and DANN in the large gear tooth breakage category was alleviated, their diagnosis rates were still very low, at only 35.33% and 36%, respectively. The proposed method not only achieves a 100% diagnosis rate for the large gear tooth breakage fault mode but also obtains satisfactory recognition accuracy for the other fault categories. In diagnostic task B3, as shown in Figure 12, which corresponds to the feature distribution shown in Figure 10, the CNN, DAN, and DANN misdiagnosed the inner ring fault as normal to varying degrees, which would leave the machine running with the fault; hence, these methods are unreliable for fault diagnosis. The proposed method correctly identifies the inner ring fault mode and does not assign any fault samples to the normal state, which supports the reliability of its diagnostic performance.
Finally, the model sensitivity and stability of the different methods are analyzed, as shown in Figures 13 and 14. These figures show the receiver operating characteristic (ROC) curves of diagnostic tasks A3 and B3 for the proposed method, CNN, DAN, and DANN. The area under the curve (AUC) of the proposed method for each fault category was close to 1, whereas the CNN obtained the smallest AUC, followed by the DAN and DANN. This confirms that introducing a domain adaptation algorithm can improve diagnostic performance to a certain extent in fault diagnosis under variable conditions; however, considering only marginal distribution matching is not enough. The multi-representation feature extraction mechanism and duplex adversarial strategy of the proposed method realize more comprehensive learning and more accurate transfer of diagnostic knowledge, and thus the proposed method has high sensitivity and stability.
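The two evaluation views used above, the per-class diagnosis rate read from a confusion matrix and the per-category AUC, can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the evaluation code used in the experiments: the function names are hypothetical, the labels and scores in the usage example are toy values rather than experimental data, and the AUC is computed one-vs-rest through the rank (Mann-Whitney) formulation instead of by tracing the ROC curve.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns index the predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_rate(cm):
    """Fraction of each true class that is diagnosed correctly (row recall)."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(cm)]


def one_vs_rest_auc(scores, y_true, positive):
    """AUC for one fault category against all others.

    Equals the probability that a randomly chosen positive sample receives
    a higher score than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels for three classes; not experimental data.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 2, 2, 2]
    cm = confusion_matrix(y_true, y_pred, 3)
    print(cm)                  # [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
    print(per_class_rate(cm))  # [1.0, 0.5, 1.0]
    # Toy predicted probabilities for class 1.
    class1_scores = [0.1, 0.2, 0.8, 0.4, 0.5, 0.1]
    print(one_vs_rest_auc(class1_scores, y_true, positive=1))  # 0.875
```

A per-class rate far below 1 for a single row, as reported for the large gear tooth breakage category of the comparison methods, points to category-level confusion that an overall accuracy figure can hide.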

Conclusions
This study developed a multi-representation domain adaptation network with duplex adversarial learning for rolling mill fault diagnosis under varying working conditions. The proposed method can extract comprehensive features and perform accurate knowledge transfer to realize high-performance fault diagnosis of key components of the hot rolling mill. Specifically, a multi-representation network structure was designed to extract rolling mill equipment status information from multiple perspectives. Then, the domain adversarial strategy was adopted to match the source and target domains of each pair of representations and thereby learn domain-invariant features from multiple representations. In addition, the maximum classifier discrepancy algorithm was adopted to generate target features that are close to the source support, thus forming a robust decision boundary. Extensive experiments were carried out on the reduction gearbox and roll bearing fault state data sets collected from a four-high rolling mill experimental platform. The average diagnostic rates of the proposed method on the gearbox and bearing tasks reached 99.15% and 99.40%, which were 2.19% and 1.93% higher than the rates of the most competitive method, respectively. Furthermore, t-SNE feature visualization, confusion matrices, and ROC curves were used to illustrate the results of the proposed method. The experimental results showed that the proposed multi-representation domain adaptation method with duplex adversarial learning can effectively transfer diagnostic knowledge from multiple perspectives and form clear decision boundaries between fault categories. The proposed method is superior to the other domain adaptation methods in model stability and fault identification accuracy and can realize effective fault diagnosis of rolling mill equipment under variable working conditions.
Although the proposed method achieved good diagnostic accuracy in the experiments, only four different representation subnets were used for multiview feature extraction. Learning feature representations from more perspectives would require more network branches at different scales. However, this would inevitably increase the computational complexity of the network and require more training samples to avoid overfitting. Therefore, further research is needed to learn representation features from more perspectives and to design lightweight networks.