A Multi-Scale Attention-Fused Domain Adaptation Network for Robust Robotic Grinding Condition Monitoring

Xiong, Penghang; Chen, Hao; Zhang, Shenan; He, Shuai; Qian, Lu

doi:10.3390/machines14030307


by Penghang Xiong ¹, Hao Chen ², Shenan Zhang ², Shuai He ² and Lu Qian ²,*

¹ HexaCercle Science & Technology Co., Ltd., Wuhan 430074, China
² School of Transportation and Logistics Engineering, Wuhan University of Technology, Wuhan 430063, China
* Author to whom correspondence should be addressed.

Machines 2026, 14(3), 307; https://doi.org/10.3390/machines14030307

Submission received: 30 January 2026 / Revised: 1 March 2026 / Accepted: 3 March 2026 / Published: 8 March 2026

(This article belongs to the Section Advanced Manufacturing)


Abstract

Robotic grinding systems are pivotal in precision manufacturing for their flexibility and cost-effectiveness. However, high-precision online monitoring of the robotic grinding state remains challenging due to complex machining mechanisms and unstable environments. Existing deep learning methods rely on the assumption of identically distributed data and therefore generalize poorly under domain shifts, such as feature shifts caused by varying processing parameters (cross-condition) or differences between machines (cross-device). To address this issue, this paper proposes a Multi-Scale Attention-Fused Domain Adaptation Network (MADAN) for robust state monitoring in robotic grinding. MADAN builds an integrated transfer framework from three components: an enhanced signal augmentation module, a multi-scale attention-aware fusion module, and an adversarial domain adaptation strategy. The signal augmentation module improves the robustness of the monitoring model in processing scenarios involving noise and disturbances. The multi-scale attention-aware fusion module, based on a Cross-Scale Self-Attention Fusion Network, adaptively aligns and integrates deep features of vibration signals across different receptive fields. The adversarial domain adaptation strategy reduces the distribution discrepancies between domains. Two sets of experiments, covering cross-condition and cross-device state monitoring, are carried out on a robotic grinding platform. MADAN attains an average accuracy of 96.83% in cross-condition scenarios and 95.68% in cross-device scenarios, outperforming state-of-the-art transfer learning methods in both detection precision and generalization stability.

Keywords: robotic grinding; condition monitoring; transfer learning; multi-scale attention; domain adaptation

1. Introduction

As the industrial landscape evolves towards Industry 5.0, the role of robotization in technological processes has expanded beyond mere efficiency gains. Current trends emphasize the importance of human-centricity, sustainability, and resilience in production. In this new paradigm, collaborative robots are not only designed to perform tasks but also to enhance human potential and ensure safer, more flexible manufacturing environments. Furthermore, the sustainability aspect, particularly energy efficiency, has become a critical criterion in evaluating robotic solutions for precision assembly and machining processes [1,2]. As a core technology for high-precision surface machining, grinding has a wide range of applications across high-end manufacturing fields, from aerospace to rail transportation [3,4]. For instance, the manufacturing of critical components such as aero-engine turbine blades, precision molds, artificial hip joints, and high-precision bearings all rely on grinding processes to achieve the final surface quality and geometric accuracy required for their optimal performance and longevity. Compared to traditional machining methods, robotic grinding systems offer optimized solutions for precision machining of complex surfaces and large structural components, owing to their superior motion flexibility and cost advantages [5]. However, in order to fully realize the high-performance potential of robotic grinding, the establishment of a comprehensive quality monitoring system to ensure process stability has become a critical issue that needs to be addressed.

Direct monitoring of the grinding surface is an effective method for quality monitoring of the grinding process. By capturing surface images and analyzing surface features such as particle size and surface roughness, this approach ensures the high geometric precision of the workpiece being processed. Early predictions of workpiece surface roughness primarily relied on process kinematics, grinding machine characteristics, and other factors [6,7]. Some studies have utilized digital cameras to capture spot images produced by laser illumination on the workpiece, employing polynomial networks for intelligent prediction of surface roughness [8,9]. However, owing to the complexity of grinding mechanisms, direct monitoring of surface images remains challenging, even for the aforementioned methods, which are built on solid theoretical foundations [10]. Beyond direct surface observation, advanced condition monitoring and fault diagnosis techniques, such as accelerometric signal processing [11] and optimized neural networks [12], have been increasingly developed and have proven essential for ensuring the operational stability of complex mechanical and robotic systems.

With the development of sensor technologies, the condition of grinding equipment can be assessed and analyzed through large volumes of historical and real-time sensor data, as well as information containing feature data and rule-based knowledge [13,14,15]. Therefore, by installing sensors, the health status of the grinding equipment can be analyzed, enabling indirect monitoring of the grinding process. To enhance safe operation, efficient and accurate fault classification and diagnosis methods have been introduced in practical industry settings, such as Support Vector Machines (SVM) [16], k-Nearest Neighbors (KNN) [17], and Random Forests (RF) [18]. Subsequently, deep learning research has advanced machine diagnostics and health monitoring to a new stage. In robot fault diagnosis research, ref. [19] studied gearbox fault diagnosis in robots using vibration signals, and ref. [20] proposed a knowledge transfer network for industrial robot rotational vector (RV) gearbox fault diagnosis. In tool fault diagnosis, ref. [21] constructed a translation-invariant vibration signal wavelet framework for intelligent tool wear state recognition, while ref. [22] developed a tool condition monitoring model based on Long Short-Term Memory (LSTM) networks. In grinding process fault diagnosis, in terms of quality detection, ref. [23] mapped vibration signals to surface quality for state recognition; ref. [24] combined time-series-to-image encoding with pre-trained Convolutional Neural Networks (CNN) for classification; ref. [25] applied LSTM models to capture temporal dependencies in vibration data; and ref. [26] enhanced performance through multi-sensor fusion and multi-scale dilated convolution networks. Despite significant results under specific experimental conditions, these approaches still face substantial challenges in real industrial scenarios due to fluctuations in operating conditions and equipment variations. 
Most existing studies are based on the “same distribution assumption,” meaning the training and testing data must come from the same operational conditions and equipment environments. However, in real-world production, changes in cutting parameters or even individual differences between identical equipment models can cause shifts in signal features. Such cross-domain variations in operating conditions and equipment lead to a dramatic decline in the generalization ability of pre-trained models, making them difficult to directly apply in complex, real production lines [27,28,29], which remains a critical bottleneck for the industrial deployment of intelligent monitoring frameworks.

To address the poor generalization performance of the aforementioned models, transfer learning has become a research hotspot in recent years [30,31]. Shallow methods in this domain include Transfer Component Analysis (TCA) and Joint Distribution Adaptation (JDA) [32,33]. With the development of deep learning, researchers have begun to explore embedding statistical criteria during feature extraction to capture domain-invariant features. In ref. [34], by introducing adaptation layers into deep networks and minimizing the maximum mean discrepancy (MMD), preliminary alignment of the global distribution in the feature space was achieved. Ref. [35] took a different approach, alleviating the feature shift problem by aligning second-order statistics (covariance matrices) of source and target domain features at a lower computational cost [36]. Building upon this, further advancements utilized multi-kernel MMD to perform multi-scale distribution adaptation across multiple high-level networks, significantly improving the model’s ability to represent complex nonlinear features. Inspired by Generative Adversarial Networks (GANs), the concept of adversarial games was introduced into the domain adaptation field [37]. A classic domain-adversarial structure was constructed, where the domain discriminator and feature extractor engage in an adversarial game, compelling the network to learn deep features that are both highly discriminative and domain-invariant. For complex components such as planetary gearboxes, ref. [38] proposed a data-model fusion-driven improved adversarial network, enhancing the model’s robustness under extreme operating conditions by optimizing the adversarial mechanism. Additionally, in the context of multi-source monitoring data utilization, researchers developed a multi-source information fusion framework [39], aiming to deeply mine the common fault characteristics of various rotating machinery under varying working conditions. 
Subsequently, the Reinforced Integrated Deep Transfer Learning Network [40] introduced reinforcement learning and ensemble learning concepts into the transfer process. By dynamically adjusting the contribution weights of each source domain, this approach enabled adaptive integration and optimized configuration of multi-source domain features for bearing fault diagnosis. Although the aforementioned research has established a comprehensive technical path, from statistical alignment to adversarial games and from single-source transfer to multi-source fusion, several limitations still remain: First, the effectiveness of these methods in the specific domain of robotic grinding quality detection remains to be verified. Second, existing research tends to focus on macro-level distribution alignment and lacks a thorough exploration of internal features within individual signals, with a general absence of deep feature analysis from a multi-scale perspective.

To address the aforementioned challenges, this paper proposes a Multi-Scale Attention-Fused Domain Adaptation Network (MADAN). The model aims to achieve high-precision robotic grinding state detection by deeply mining multi-dimensional features of individual signals and combining advanced transfer learning strategies. The contributions of this paper are as follows:

(1) We develop a robust transfer monitoring framework for robotic grinding by integrating an enhanced signal augmentation module with adversarial domain adaptation, which alleviates the dependency on labeled target data and substantially improves model generalization under extreme noise and fluctuating operating conditions.
(2) A multi-scale attention-aware feature fusion strategy is proposed to adaptively integrate feature flows across multiple receptive fields. This enables dynamic weight allocation based on signal content, significantly enhancing feature representation for complex grinding signals.
(3) The proposed MADAN is validated extensively across diverse operating conditions and distinct robotic platforms. Experimental results demonstrate that MADAN consistently outperforms state-of-the-art methods, achieving superior detection accuracy and stability in cross-condition and cross-device scenarios.

The remainder of this paper is organized as follows: Section 2 briefly reviews the relevant theoretical foundations; Section 3 details the proposed methodological framework; the subsequent sections evaluate and compare the model’s performance through cross-condition and cross-device state detection experiments; finally, the paper closes with a summary of its conclusions.

2. Fundamentals of Adversarial Transfer Learning

Adversarial transfer learning reduces the feature distribution discrepancy between the source and target domains by introducing an adversarial training mechanism, thereby enhancing the generalization performance of the model on the target domain. Its core idea is to construct a minimax game between a feature extractor $G_{f}$ and a domain discriminator $G_{d}$: the feature extractor strives to learn domain-invariant features that remain consistent across both domains, while the domain discriminator attempts to accurately distinguish whether the features originate from the source or target domain.

This paper adopts an adversarial training framework based on the Gradient Reversal Layer (GRL). During forward propagation, the GRL acts as an identity transformation, whereas during backpropagation, it multiplies the gradient received from the domain discriminator by a negative coefficient $-\lambda$ before passing it back to the feature extractor. This mechanism can be formalized as the following optimization objective:

$$\min_{\theta_{f}, \theta_{y}} \max_{\theta_{d}} L = L_{cls} - \lambda \cdot L_{domain} \tag{1}$$

where $L_{cls}$ denotes the classification loss on the source domain, $L_{domain}$ represents the domain discrimination loss, and $\lambda$ is a trade-off coefficient. Through this adversarial process, the feature extractor is driven to learn deep representations that are both discriminative and domain-invariant, thereby achieving stable and accurate state monitoring on the target domain.
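The forward/backward behaviour of the GRL described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `GradientReversal` and the value `lam=0.5` are hypothetical, and the backward pass is written by hand rather than through an autograd framework.

```python
import numpy as np


class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam: float):
        self.lam = lam  # trade-off coefficient (hypothetical value chosen below)

    def forward(self, features: np.ndarray) -> np.ndarray:
        # Features pass through unchanged.
        return features

    def backward(self, grad_from_discriminator: np.ndarray) -> np.ndarray:
        # The gradient is reversed and scaled before reaching the feature extractor.
        return -self.lam * grad_from_discriminator


grl = GradientReversal(lam=0.5)
feats = np.array([[1.0, -2.0, 3.0]])
grad = np.array([[0.2, 0.4, -0.6]])
out = grl.forward(feats)
rev = grl.backward(grad)
```

Because the sign flip happens only in the backward pass, the discriminator is still trained to separate the domains while the feature extractor receives the opposite update direction.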

The primary advantage of this adversarial transfer learning approach lies in its ability to align feature distributions without requiring target domain labels, which reduces the model’s reliance on labeled target data. In real-world robotic grinding environments, acquiring high-quality, comprehensively annotated data is expensive, time-consuming, and labor-intensive. Operating regimes and robotic platforms also vary between setups, so the available data are fragmented into isolated silos, and a model trained on one setup often fails to generalize to another because of the resulting domain shift. By minimizing the domain discrepancy through an unsupervised adversarial minimax game, the proposed framework bridges these variations: it extracts generalized, domain-invariant degradation features from the unlabeled target signals, guided by the knowledge learned from the labeled source domain. This makes it particularly suitable for cross-domain diagnosis tasks in real industrial scenarios where labeled data is scarce, and it improves the deployability and scalability of the monitoring framework across diverse production lines.

3. Methodology

To overcome the degradation of monitoring accuracy caused by domain shifts in robotic grinding processes, this section details the architecture of the proposed Multi-Scale Attention-Fused Domain Adaptation Network (MADAN). As illustrated in Figure 1, the MADAN framework is designed to extract domain-invariant and discriminative features from raw vibration signals. The architecture comprises three primary functional modules: (1) the Pre-processing and Augmentation module, which employs an enhanced signal transformation strategy to improve the data quality and expand the feature space under noisy industrial conditions; (2) the Multi-scale Attention-aware Feature Extractor, which utilizes parallel convolutional pathways with different receptive fields and a Cross-Scale Self-Attention Fusion Network mechanism to adaptively capture and fuse temporal-frequency characteristics; and (3) the Adversarial Domain Adaptation Module, which aligns the feature distributions of the source and target domains through a minimax game between the feature generator and the domain discriminator. By integrating these components, MADAN mitigates the negative impact of cross-condition and cross-device variations, supporting precise and robust grinding quality detection.

3.1. Data Augmentation

In deep learning-based fault diagnosis models, data augmentation is a crucial technique for enhancing model robustness, mitigating overfitting, and alleviating inter-domain distribution discrepancies. Particularly in industrial vibration signal analysis scenarios, signals under different operating conditions, equipment, or loads often exhibit significant differences. Direct training can easily cause the model to overfit to source-domain-specific features, hindering generalization to the target domain. To address this, this study proposes a feature-level data augmentation method for one-dimensional vibration signals. By applying various controllable perturbations to the input signals during training, it forces the feature extraction network to learn intrinsic features that are insensitive to domain changes and more discriminative.

Let the original input signal be $X \in \mathbb{R}^{B \times C \times L}$, where $B$ is the batch size, $C$ is the number of channels (in this study $C = 1$), and $L$ is the signal length. During the training phase, the data augmentation module applies the following transformations to $X$ with certain probabilities:

(1) Gaussian Noise Injection: To enhance model robustness against subtle signal fluctuations and measurement noise, zero-mean Gaussian noise is added to the input signal:

$$X_{noise} = X + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^{2} \cdot \mathrm{std}(X)) \tag{2}$$

where $\sigma$ is a configurable noise standard deviation, and $\mathrm{std}(X)$ is the standard deviation of the signal along the time dimension, enabling adaptive noise intensity.
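As a minimal sketch of Eq. (2), the following NumPy function adds signal-adaptive Gaussian noise to a batch of shape (B, C, L). The function name, the default `sigma=0.05`, and the literal reading of the variance as $\sigma^{2} \cdot \mathrm{std}(X)$ are assumptions; the paper does not state the values it uses.

```python
import numpy as np


def add_gaussian_noise(x, sigma=0.05, rng=None):
    """x: array of shape (B, C, L). Returns x plus zero-mean Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    signal_std = x.std(axis=-1, keepdims=True)   # std along time, shape (B, C, 1)
    # Eq. (2) read literally: variance = sigma^2 * std(X), so std = sigma * sqrt(std(X)).
    noise_std = sigma * np.sqrt(signal_std)
    return x + rng.standard_normal(x.shape) * noise_std


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 1, 1024))
x_noise = add_gaussian_noise(x, sigma=0.05, rng=rng)
```

Tying the noise level to the per-sample signal statistics keeps the perturbation proportionate for both weak and strong vibration segments.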

(2) Random Scaling: Simulates amplitude variations in signals due to sensor gain or operating condition changes:

$$X_{scale} = \alpha X, \quad \alpha \sim U(a_{\min}, a_{\max}) \tag{3}$$

where $\alpha$ is a uniformly sampled scaling factor. By adjusting the overall energy distribution of the signal, it improves the model’s adaptability to amplitude variations.
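Eq. (3) can be sketched in a few lines of NumPy. The bounds `a_min=0.8` and `a_max=1.2` and the choice of one scaling factor per sample are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def random_scale(x, a_min=0.8, a_max=1.2, rng=None):
    """x: (B, C, L). Draws one scaling factor per sample and returns (alpha * x, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = rng.uniform(a_min, a_max, size=(x.shape[0], 1, 1))  # broadcast over C and L
    return alpha * x, alpha


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 1, 32))
x_scale, alpha = random_scale(x, rng=rng)
```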

(3) Frequency-Domain Perturbation: This operator aims to alter the spectral distribution of the signal without disrupting its fundamental phase structure. First, the signal is transformed to the frequency domain using the Discrete Fourier Transform (DFT). A random perturbation factor between 0.9 and 1.1 is applied to the amplitude spectrum $|X(f)|$, followed by an inverse transform (IDFT). The final result is fused via a residual connection:

$$x_{freq} = (1 - \alpha) x + \alpha \cdot \mathrm{IDFT}\left( |X(f)|_{perturbed} \cdot e^{j \angle X(f)} \right) \tag{4}$$

where the fusion coefficient $\alpha$ is set to 0.3.
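A possible NumPy reading of Eq. (4) is sketched below. It assumes that an independent factor is drawn for every frequency bin (the paper does not say whether the factor is per-bin or global) and uses the real-input FFT; the function name is hypothetical.

```python
import numpy as np


def frequency_perturb(x, low=0.9, high=1.1, alpha=0.3, rng=None):
    """x: real array (B, C, L). Perturbs the amplitude spectrum and keeps the phase."""
    rng = np.random.default_rng() if rng is None else rng
    spec = np.fft.rfft(x, axis=-1)
    mag, phase = np.abs(spec), np.angle(spec)
    factors = rng.uniform(low, high, size=mag.shape)   # assumption: one factor per bin
    perturbed = np.fft.irfft(mag * factors * np.exp(1j * phase),
                             n=x.shape[-1], axis=-1)
    # Residual fusion of the original and the perturbed signal, as in Eq. (4).
    return (1.0 - alpha) * x + alpha * perturbed


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 1, 1024))
x_freq = frequency_perturb(x, rng=rng)
```

With factors limited to ±10% and a fusion coefficient of 0.3, the augmented signal stays within about 3% of the original in Euclidean norm, so the perturbation remains mild.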

(4) Random Shift: Applies cyclic shift to the signal, simulating phase changes or start-point uncertainty along the time axis:

$$X_{shift} = \mathrm{circshift}(X, s), \quad s \sim \{-S, -S + 1, \dots, S\} \tag{5}$$

where $S$ is the maximum shift range, typically set to a certain proportion of the signal length. This operation helps the model focus on signal morphology rather than absolute phase.
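The cyclic shift of Eq. (5) maps directly onto `np.roll`. The sketch below assumes a single shift shared by the whole batch and a uniform draw over the integer range; both choices are illustrative.

```python
import numpy as np


def random_shift(x, max_shift, rng=None):
    """x: (B, C, L). Cyclically shifts along time by s drawn from {-S, ..., S}."""
    rng = np.random.default_rng() if rng is None else rng
    s = int(rng.integers(-max_shift, max_shift + 1))  # upper bound is exclusive
    return np.roll(x, s, axis=-1), s


x = np.arange(20.0).reshape(1, 1, 20)
x_shift, s = random_shift(x, max_shift=5, rng=np.random.default_rng(0))
```

Since the shift is circular, no samples are lost: rolling back by `-s` recovers the original signal exactly.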

(5) Random Masking: Randomly sets a continuous segment of the signal to zero, simulating data loss or local interference:

$$X_{mask}[t : t + \Delta t] = 0, \quad t \sim U(0, L - \Delta t) \tag{6}$$

This strategy forces the network not to overly rely on local signal segments, enhancing the globality of feature extraction.
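The masking operator of Eq. (6) can be sketched as follows. The mask length is passed in as a hypothetical parameter `mask_len`, and one start index is shared across the batch for simplicity.

```python
import numpy as np


def random_mask(x, mask_len, rng=None):
    """x: (B, C, L). Zeroes a contiguous segment of length mask_len; returns (masked, start)."""
    rng = np.random.default_rng() if rng is None else rng
    length = x.shape[-1]
    t = int(rng.integers(0, length - mask_len + 1))  # start index in [0, L - mask_len]
    out = x.copy()                                   # leave the input untouched
    out[..., t:t + mask_len] = 0.0
    return out, t


x = np.ones((2, 1, 100))
x_mask, t = random_mask(x, mask_len=10, rng=np.random.default_rng(0))
```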

(6) Temporal Jitter: Adds slight high-frequency jitter to the signal, simulating minor perturbations during sensor acquisition:

$$X_{jitter} = X + \eta, \quad \eta \sim \mathcal{N}(0, \beta^{2} \cdot \mathrm{std}(X)) \tag{7}$$

where $\beta$ is the jitter coefficient, much smaller than the noise standard deviation $\sigma$.

The above augmentation strategies are applied in random combinations during training. All augmentation parameters undergo normalization to ensure the basic statistical properties of the signal are not disrupted. During the testing phase, this module is disabled, and the original signal is passed directly. Experiments show that this data augmentation module effectively enhances the generalization capability of the feature extraction network for cross-domain signals, providing more robust input representations for subsequent multi-scale feature fusion and domain alignment.
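One way to realize the random combination and the train/test switch described above is sketched below. The helper `augment`, the three stand-in operators, and the probability 0.5 are hypothetical; the paper does not report the per-operator probabilities.

```python
import numpy as np


def augment(x, ops, training=True, rng=None):
    """Apply each (op, prob) pair independently with probability prob."""
    if not training:
        return x                      # test phase: the original signal is passed through
    rng = np.random.default_rng() if rng is None else rng
    for op, prob in ops:
        if rng.random() < prob:
            x = op(x, rng)
    return x


# Simplified stand-ins for three of the operators (parameters are illustrative).
ops = [
    (lambda x, rng: x + 0.01 * rng.standard_normal(x.shape), 0.5),          # noise-like
    (lambda x, rng: rng.uniform(0.8, 1.2) * x, 0.5),                        # scaling-like
    (lambda x, rng: np.roll(x, int(rng.integers(-50, 51)), axis=-1), 0.5),  # shift-like
]

x = np.ones((2, 1, 64))
x_aug = augment(x, ops, training=True, rng=np.random.default_rng(0))
```

Drawing each operator independently yields a different subset of perturbations per batch, which is one plausible interpretation of "random combinations".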

3.2. Multi-Scale Feature Extraction Network

In the condition monitoring of robot grinding systems, vibration signals contain rich time-domain and frequency-domain features. Different fault patterns are often reflected in signal components at different temporal scales. To fully exploit the deep representation information in the signal, this paper designs a multi-scale parallel feature extraction network aimed at simultaneously capturing local detail features and global trend features within the signal, constructing feature representations with strong discriminability and robustness.

Let the input signal be $X \in \mathbb{R}^{B \times 1 \times L}$, where $B$ is the batch size and $L$ is the signal length (in this study $L = 1024$). The network employs three parallel convolutional branches corresponding to large, medium, and small receptive fields, respectively. Its structure is shown in Figure 2.

The large receptive field branch employs convolutional kernels of size $k_{1} = 32$ and stride $s_{1} = 4$, designed to capture low-frequency components and long-term trends in the signal. These features typically correlate with the overall energy distribution and slowly varying characteristics induced by certain fault modes. The medium receptive field branch utilizes kernels of size $k_{2} = 16$ and stride $s_{2} = 2$, targeting mid-frequency features such as periodic modulation components resulting from repetitive impacts or specific grinding dynamics. The small receptive field branch employs smaller kernels ($k_{3} = 8$) and stride $s_{3} = 2$ to focus on high-frequency details and transient impulses, which are critical for identifying sudden anomalies or fine-grained surface quality variations. Each branch is constructed as a lightweight sequential stack comprising a 1D convolutional layer, batch normalization, a ReLU activation function, and a Dropout layer for regularization. The final layer of each branch employs Global Average Pooling (GAP) to transform the variable-length feature maps into fixed-dimensional feature vectors $f_{i} \in \mathbb{R}^{B \times C_{i}}$, where $C_{i}$ is the output channel number for the $i$-th branch.
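To make the effect of the three kernel/stride settings concrete, the sketch below runs a single unpadded 1D convolution, ReLU, and GAP per branch in NumPy. It omits batch normalization and Dropout, uses random weights, and assumes no padding and a hypothetical channel count of 16; none of these details are specified in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_branch(x, weight, stride):
    """x: (B, 1, L); weight: (C_out, k). Conv1d (no padding) -> ReLU -> GAP."""
    k = weight.shape[1]
    windows = sliding_window_view(x[:, 0, :], k, axis=-1)[:, ::stride, :]  # (B, L_out, k)
    feat = np.maximum(windows @ weight.T, 0.0)                             # (B, L_out, C_out)
    return feat.mean(axis=1), feat.shape[1]                                # GAP over time


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 1, 1024))
configs = {"large": (32, 4), "medium": (16, 2), "small": (8, 2)}  # (kernel, stride) from the text
channels = 16                                                       # hypothetical C_i
outputs = {}
for name, (k, s) in configs.items():
    w = rng.standard_normal((channels, k)) * 0.1
    outputs[name] = conv_branch(x, w, s)
```

Under the no-padding assumption, the feature maps have 249, 505, and 509 time steps for the large, medium, and small branches; GAP then reduces each to a fixed vector of length $C_i$ regardless of these differing lengths.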

To reduce feature redundancy and computational cost for subsequent fusion, the output of each branch is projected into a lower-dimensional space via a dedicated linear layer:

$$\tilde{f}_{i} = W_{i} f_{i} + b_{i} \tag{8}$$

where $\tilde{f}_{i} \in \mathbb{R}^{B \times d_{i}}$, $W_{i}$ is the projection matrix, and $d_{i} = C_{i} / 2$. The resulting feature set $\{\tilde{f}_{1}, \tilde{f}_{2}, \tilde{f}_{3}\}$ represents abstract, scale-specific representations of the input signal.

From a signal processing perspective, this multi-scale parallel extraction can be viewed as an efficient, learnable form of multi-resolution analysis. The convolutional kernels with different sizes act as filters with varying passbands. This design enables the network to jointly model: (1) macro-features (large scale) reflecting global signal energy and trends; (2) meso-features (medium scale) corresponding to fault-induced periodic modulations; (3) micro-features (small scale) capturing transient impacts and high-frequency resonances. By extracting such complementary features in parallel, the network builds a more comprehensive and domain-invariant representation, forming a solid foundation for effective cross-domain alignment in subsequent modules.

3.3. Cross-Scale Self-Attention Fusion Network

The feature vectors obtained from the multi-scale feature extraction network represent information from the signal at different temporal scales. To effectively integrate these heterogeneous features and explore their intrinsic relationships, this paper proposes a Cross-Scale Self-Attention Fusion Network based on the Multi-Head Attention mechanism. This module aims to adaptively weigh the importance of different scale features and enhance the discriminative power of the representation through feature interaction, thereby obtaining a unified feature representation rich in multi-scale information. Its structure is shown in Figure 3.

Given the feature set $\{\tilde{f}_{1}, \tilde{f}_{2}, \tilde{f}_{3}\}$ output by the multi-scale feature extraction network, with dimensions $d_{1}, d_{2}, d_{3}$ respectively, each scale feature is first mapped to a unified semantic space via learnable linear projection layers to enable effective attention interaction:

$$z_{i} = W_{p}^{(i)} \tilde{f}_{i} + b_{p}^{(i)} \tag{9}$$

where $z_{i} \in \mathbb{R}^{B \times D}$ and $D$ is the unified feature dimension. The projected features $z_{i}$ not only share a consistent dimension but are also transformed into a comparable feature subspace.

Subsequently, the projected features from the three scales are stacked into a sequence form $Z = [z_{1}, z_{2}, z_{3}] \in \mathbb{R}^{B \times 3 \times D}$, serving as the input to the multi-head self-attention module. For each attention head $h$, its Query, Key, and Value matrices are computed:

$$Q_{h} = Z W_{h}^{Q}, \quad K_{h} = Z W_{h}^{K}, \quad V_{h} = Z W_{h}^{V} \tag{10}$$

where $W_{h}^{Q}, W_{h}^{K}, W_{h}^{V}$ are learnable parameters.

Attention weights are calculated via scaled dot-product attention:

$$\mathrm{Attention}_{h}(Q_{h}, K_{h}, V_{h}) = \mathrm{Softmax}\left(\frac{Q_{h} K_{h}^{\top}}{\sqrt{D_{h}}}\right) V_{h} \tag{11}$$

where $D_{h}$ is the dimension of each attention head. This process allows the features at each scale to be adaptively reconstructed based on features from other scales, highlighting important information and suppressing redundancy. The outputs from all attention heads are concatenated and linearly projected to obtain the fused feature $F_{fused}$.

Finally, the fused feature undergoes a nonlinear transformation and dimensional mapping through a fully connected layer, yielding the final output feature $F_{out} \in \mathbb{R}^{B \times D'}$. The core advantage of this cross-scale self-attention fusion mechanism lies in its adaptive weight allocation capability, enabling the network to autonomously learn the importance of different scale features for the final diagnostic task without manually setting fixed weights. Furthermore, it facilitates global feature interaction across scales, allowing macro trend information and local detail information to complement each other. This enhances the discriminability and domain invariance of the final feature representation, providing high-quality features for subsequent domain alignment and classification tasks.
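The projection, stacking, and attention steps of Eqs. (9) to (11) can be traced with a small NumPy sketch. Random matrices stand in for the learned parameters, biases are omitted, and the values `D=32` and `heads=4` are hypothetical; the final fully connected mapping to $F_{out}$ is not included.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def cross_scale_attention(feats, D=32, heads=4, rng=None):
    """feats: list of three arrays (B, d_i). Returns fused tokens (B, 3, D) and attention maps."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Eq. (9): project each scale to the shared dimension D (bias omitted).
    z = [f @ (rng.standard_normal((f.shape[1], D)) * 0.1) for f in feats]
    Z = np.stack(z, axis=1)                                    # (B, 3, D)
    B, S, _ = Z.shape
    Dh = D // heads                                            # per-head dimension
    Wq, Wk, Wv, Wo = (rng.standard_normal((D, D)) * 0.1 for _ in range(4))

    def split(M):                                              # (B, S, D) -> (B, heads, S, Dh)
        return M.reshape(B, S, heads, Dh).transpose(0, 2, 1, 3)

    Q, K, V = split(Z @ Wq), split(Z @ Wk), split(Z @ Wv)      # Eq. (10)
    attn = softmax(Q @ K.transpose(0, 1, 3, 2) / np.sqrt(Dh))  # Eq. (11), (B, heads, 3, 3)
    ctx = (attn @ V).transpose(0, 2, 1, 3).reshape(B, S, D)    # concatenate the heads
    return ctx @ Wo, attn                                      # output projection


rng = np.random.default_rng(1)
feats = [rng.standard_normal((2, d)) for d in (8, 8, 8)]
fused, attn = cross_scale_attention(feats)
```

Each 3 × 3 attention map is row-normalized, so every scale token is rebuilt as a convex combination of the value vectors of all three scales.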

3.4. Network Optimization of MADAN

To effectively train the proposed MADAN framework, this work constructs an improved multi-task joint optimization objective. Rather than relying on a single adaptation scheme, we propose a dual-mechanism alignment strategy at the architecture level, synergizing both statistical constraints and adversarial games. The total loss function consists of three parts: a label-smoothed supervised classification loss, a domain distribution alignment loss, and a domain adversarial loss. At the loss function level, these components are jointly optimized via a dynamic weight scheduling strategy, expressed mathematically as:

$$L_{total} = L_{cls} + \lambda_{align} \cdot L_{align} + \lambda_{adv} \cdot L_{adv} \tag{12}$$

where $\lambda_{align}$ and $\lambda_{adv}$ are the weight coefficients for the domain alignment loss and adversarial loss, respectively, used to balance the contributions of different optimization objectives.
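Eq. (12) together with a dynamic weight can be sketched as below. The linear ramp follows the schedule the paper describes for the alignment weight in Section 3.4.2; the function names, the ramp length of 50 epochs, and the numeric loss values are hypothetical.

```python
def linear_ramp(epoch, ramp_epochs, max_weight):
    """Weight that grows linearly with the epoch and saturates at max_weight."""
    return max_weight * min(1.0, epoch / ramp_epochs)


def total_loss(l_cls, l_align, l_adv, lam_align, lam_adv):
    """Eq. (12): weighted sum of the three loss terms."""
    return l_cls + lam_align * l_align + lam_adv * l_adv


lam_align = linear_ramp(epoch=25, ramp_epochs=50, max_weight=1.0)   # halfway up the ramp
loss = total_loss(l_cls=1.0, l_align=0.5, l_adv=0.2, lam_align=lam_align, lam_adv=1.0)
```

Early in training the alignment term contributes little, so the classifier is fitted first; the domain terms gain influence as the ramp progresses.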

3.4.1. Supervised Classification Loss

The classification loss aims to ensure the model learns accurate fault pattern discrimination from labeled source domain data. A label-smoothed cross-entropy loss function is employed to enhance model generalization and mitigate overfitting to source domain data:

$$L_{cls} = \left[ CE(\hat{y}_{s}, y_{s}) \right] \tag{13}$$

where $\hat{y}_{s}$ is the predicted output from the classifier for the source domain samples, and $y_{s}$ is the corresponding ground truth label. Label smoothing mixes the one-hot encoding of the true label with a uniform distribution to alleviate the model’s overconfidence in training samples.
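A minimal NumPy sketch of label-smoothed cross-entropy is given below. It uses the common formulation in which the target is $(1 - \varepsilon)$ times the one-hot vector plus $\varepsilon / K$ on every class; the smoothing factor `eps=0.1` is an assumed value, not one reported in the paper.

```python
import numpy as np


def label_smoothed_ce(logits, labels, eps=0.1):
    """logits: float array (N, K); labels: int array (N,). Mean smoothed cross-entropy."""
    n_classes = logits.shape[1]
    shifted = logits - logits.max(axis=1, keepdims=True)          # stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    target = np.full_like(log_probs, eps / n_classes)             # uniform part
    target[np.arange(len(labels)), labels] += 1.0 - eps           # one-hot part
    return float(-(target * log_probs).sum(axis=1).mean())


logits = np.array([[10.0, 0.0, 0.0]])
labels = np.array([0])
plain = label_smoothed_ce(logits, labels, eps=0.0)
smoothed = label_smoothed_ce(logits, labels, eps=0.1)
```

For a very confident correct prediction the smoothed loss is larger than the plain cross-entropy, which is the penalty on overconfidence that label smoothing is meant to introduce.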

3.4.2. Domain Distribution Alignment Loss

To reduce the feature distribution discrepancy between the source and target domains, a domain alignment loss based on Maximum Mean Discrepancy (MMD) is introduced. This loss measures the distance between two distributions in a Reproducing Kernel Hilbert Space (RKHS):

$$L_{align} = \left\| \frac{1}{n_{s}} \sum_{i = 1}^{n_{s}} \phi(f_{s}^{(i)}) - \frac{1}{n_{t}} \sum_{j = 1}^{n_{t}} \phi(f_{t}^{(j)}) \right\|_{\mathcal{H}}^{2} \tag{14}$$

where $f_{s}$ and $f_{t}$ are the features extracted by the feature network for source and target domain samples, respectively, $\phi(\cdot)$ is the kernel function mapping, and $\mathcal{H}$ is the RKHS. This study employs Multi-Kernel MMD (MK-MMD) to enhance the robustness of the metric. Furthermore, the alignment weight $\lambda_{align}$ adopts a dynamic scheduling strategy, linearly increasing with the training epoch to a preset maximum, allowing the model to focus more on the classification task initially and gradually strengthen domain alignment later.
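Eq. (14) can be evaluated with the kernel trick without forming $\phi$ explicitly. The sketch below computes the biased squared-MMD estimate with an equally weighted sum of Gaussian kernels; the bandwidth set `(0.5, 1.0, 2.0, 4.0)` and the equal kernel weights are assumptions and not the paper's configuration.

```python
import numpy as np


def mk_mmd(fs, ft, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """fs: (n_s, d), ft: (n_t, d). Biased squared MMD with an average of Gaussian kernels."""
    x = np.concatenate([fs, ft], axis=0)
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)          # pairwise squared distances
    k = sum(np.exp(-sq / (2.0 * b ** 2)) for b in bandwidths) / len(bandwidths)
    n = len(fs)
    # ||mean phi(fs) - mean phi(ft)||^2 expanded into three kernel averages.
    return float(k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean())


rng = np.random.default_rng(0)
src = rng.standard_normal((64, 8))
tgt_same = rng.standard_normal((64, 8))        # same distribution as src
tgt_shifted = tgt_same + 3.0                   # shifted domain
mmd_same = mk_mmd(src, tgt_same)
mmd_shifted = mk_mmd(src, tgt_shifted)
```

The estimate is close to zero when both feature sets come from the same distribution and grows when the target features are shifted, which is the quantity the alignment loss drives down.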

3.4.3. Domain Adversarial Loss

To learn domain-invariant features, a domain adversarial training mechanism based on the Gradient Reversal Layer is introduced. This module contains a domain classifier $D$ aimed at distinguishing whether features come from the source or target domain, while the feature extractor $G$ is trained to fool this classifier, forming an adversarial game:

$$L_{adv} = \frac{1}{2} \left[ CE(D(G(x_{s})), 0) + CE(D(G(x_{t})), 1) \right] \tag{15}$$

The Gradient Reversal Layer passes features identically during forward propagation. During backpropagation, it multiplies the gradient computed by the domain classifier by a negative coefficient $-\lambda_{adv}$ before passing it back to the feature extractor, thereby pushing the feature distribution to evolve in a direction that confuses the domain classifier.
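The value of Eq. (15) can be computed from domain-classifier logits as in the NumPy sketch below. It assumes a two-logit output (index 0 for source, index 1 for target) and covers only the loss value; the gradient reversal itself is not modelled here.

```python
import numpy as np


def domain_adversarial_loss(logits_src, logits_tgt):
    """Eq. (15) with a two-way domain classifier: source label 0, target label 1."""
    def ce(logits, label):
        shifted = logits - logits.max(axis=1, keepdims=True)      # stable log-softmax
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return -log_probs[:, label].mean()
    return float(0.5 * (ce(logits_src, 0) + ce(logits_tgt, 1)))


undecided = np.zeros((4, 2))                       # classifier cannot tell the domains apart
confused_loss = domain_adversarial_loss(undecided, undecided)
```

When the discriminator is maximally confused the loss equals $\ln 2 \approx 0.693$, which is the value the feature extractor pushes towards under the reversed gradient.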

4. Experiments and Discussion

To evaluate the performance of the proposed Multi-Scale Attention-Fused Domain Adaptation Network (MADAN) for cross-domain fault diagnosis in robotic grinding, this study designed a systematic experimental workflow. The dataset was split into a 70% training set and 30% test set with balanced fault categories. An auxiliary supervision mechanism was added to improve feature learning diversity and the model’s generalization ability in the target domain. A multi-dimensional signal augmentation strategy was adopted to simulate typical industrial disturbances (e.g., load fluctuations, sensor noise, spindle speed drift, and tool wear), including Gaussian noise injection (sensor noise), signal amplitude adjustment (grinding pressure fluctuations), signal shifting (spindle speed drift), and signal time-axis warping (grinding contact changes).
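A class-balanced 70/30 split of the kind described above could be produced as follows. This is only a sketch: the helper name, the seed, and the example of 1500 samples evenly divided over three classes are illustrative assumptions, and the paper does not describe its exact splitting procedure.

```python
import numpy as np


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices so that every class keeps the same train/test proportion."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))   # shuffle within the class
        cut = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)


labels = np.repeat([0, 1, 2], 500)        # hypothetical: 1500 samples, three balanced classes
train_idx, test_idx = stratified_split(labels)
```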

The model was trained with a total loss function combining classification loss, Multi-kernel Maximum Mean Discrepancy (MKMMD) domain alignment loss, and adversarial domain loss. The AdamW optimizer, combined with a cosine annealing learning rate scheduler and early stopping mechanism, was used to balance convergence and prevent overfitting. The loss weights for the main classifier and auxiliary branch were set to 0.7 and 0.3, respectively. Domain alignment was achieved via MKMMD (kernel bandwidth = 2.0, temperature parameter = 1.0), and the domain adaptation loss weight was dynamically increased during training to align source and target domain feature distributions.

Since MADAN’s performance is affected by hyperparameters, Optuna was used for automatic tuning. The Tree-structured Parzen Estimator (TPE) was selected for hyperparameter sampling with pruning to improve efficiency. Two types of hyperparameters were optimized: network structure (convolutional kernel sizes, attention heads, channels, residual blocks) and optimization parameters (penalty coefficients for MKMMD and adversarial losses). Hyperparameter optimization results are shown in Table 1.
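To make the search space of Table 1 concrete, the sketch below encodes it as a dictionary and draws configurations from it. Note that plain random search is used here as a simple stand-in: it is not the TPE sampler with pruning that the study ran through Optuna, and the key names and the `objective` callback are hypothetical.

```python
import random

# Search space transcribed from Table 1: lists are categorical choices,
# tuples are continuous (low, high) ranges.
SEARCH_SPACE = {
    "large_kernel": [16, 32, 64],
    "medium_kernel": [8, 16, 32],
    "small_kernel": [4, 8, 16],
    "attention_heads": [2, 4, 8],
    "base_channels": [16, 32, 64],
    "residual_blocks": [1, 2, 3],
    "lambda_mmd": (0.1, 5.0),
    "lambda_adv": (0.01, 0.5),
}


def sample_config(rng):
    """Draw one configuration uniformly from the search space."""
    config = {}
    for name, space in SEARCH_SPACE.items():
        if isinstance(space, list):
            config[name] = rng.choice(space)
        else:
            low, high = space
            config[name] = rng.uniform(low, high)
    return config


def random_search(objective, n_trials, seed=0):
    """Return the best configuration and score found (higher score is better)."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    # Toy objective for demonstration only: prefers lambda_mmd close to 1.0.
    best, score = random_search(lambda c: -abs(c["lambda_mmd"] - 1.0), n_trials=50)
    print(best, score)
```

In practice the objective would train the model with the sampled configuration and return its validation accuracy on the target task.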

4.1. Cross-Condition State Monitoring

4.1.1. Dataset Description

To evaluate the practical performance of the proposed MADAN model, an experimental data acquisition system was developed based on a UR5 collaborative robot. As shown in Figure 4, the core of the grinding unit is a UR5 robot integrated onto a linear guide rail. A flat-wheel grinding tool was mounted on the robot’s end-effector, and an accelerometer was attached to the fixture to capture vibration signals along the grinder’s rotational axis. This study used 2024-T3 aluminum alloy and Carbon Fiber Reinforced Polymer (CFRP) as workpieces (detailed specifications in Table 2). Based on surface texture, grinding quality was divided into three states: under-grinding, normal grinding, and over-grinding (see Figure 5, where each of the three images covers an actual area of 2 × 1.5 mm). Both under-grinding and over-grinding are abnormal states, caused by contact force anomalies (insufficient or excessive contact) that result from grinding path deviations. The UR5 grinding dataset was divided into four operational conditions by grinding speed, feed rate, and material type (detailed in Table 3), with 1500 samples per condition acquired at a sampling frequency of 10,000 Hz. Four cross-domain state monitoring tasks (U1→U2, U2→U1, U3→U4, U4→U3) were designed to assess MADAN’s transfer learning performance.

The loss convergence trajectories for each task are shown in Figure 6. The total loss of all tasks decreased sharply within the first 20 epochs, indicating rapid convergence. Although the U3→U4 and U4→U3 tasks showed minor oscillations in mid-to-late training, all curves stabilized within [0.40, 0.45] after about 80 epochs. These results indicate that MADAN trains stably across different grinding conditions.

4.1.2. Experimental Results

In this experiment, five domain adaptation methods are used as baselines:

Deep CORAL [35]: Aligns the second-order statistical differences in features between domains but focuses only on global distribution matching at a single scale, missing local details of grinding signals.
DAN [36]: Combines MK-MMD loss and classification loss to learn domain-invariant features, but lacks a mechanism for adaptive fusion of multi-scale features.
DANN [37]: Uses a Gradient Reversal Layer (GRL) for adversarial training to achieve domain invariance but does not include a multi-scale feature extraction module for grinding signals.
ITLDMFD [38]: Integrates information theory and multi-scale feature decomposition but fuses features with fixed weights, and so cannot adjust dynamically to grinding conditions.
DADW [41]: A hybrid domain adaptation method that dynamically preserves critical parameters via Fisher information and aligns domains adaptively for stable cross-domain transfer learning.

Based on the quantitative results in Table 4 and the visual analysis in Figure 7, the proposed MADAN exhibits clear performance advantages in cross-condition state monitoring for robotic grinding. In all four cross-condition transfer tasks (U1→U2, U2→U1, U3→U4, U4→U3), MADAN achieves the highest recognition accuracy, with an average of 96.83%. This is 0.89 percentage points higher than DADW (95.94%, the second-ranked method) and 1.81 percentage points higher than ITLDMFD (95.02%). In the simple transfer tasks with small condition differences (U1→U2, U2→U1), MADAN reaches 100% accuracy in the U1→U2 task and 99.67% in the U2→U1 task, with standard deviations of only ±0.00 and ±0.24, respectively. This stability indicates that the model achieves near-complete alignment between the source and target domains under small domain shifts. In the more challenging transfer tasks with large condition differences (U3→U4, U4→U3), MADAN attains an accuracy of 92.31% in the U3→U4 task, which is 2.20 and 2.67 percentage points higher than ITLDMFD and DANN, respectively, and 95.34% in the U4→U3 task, which is 2.51 percentage points higher than ITLDMFD, demonstrating strong adaptability to complex grinding conditions. These results confirm that the strategy of “multi-scale feature extraction + cross-scale attention fusion + MK-MMD and GRL dual-mechanism domain alignment” adopted by MADAN can effectively learn feature representations with both domain invariance and high discriminability, thereby achieving superior performance in cross-condition state monitoring for robotic grinding.
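The averages and margins quoted from Table 4 can be recomputed directly from the per-task accuracies. The short script below is only a checking aid; the dictionary simply transcribes the accuracy columns of Table 4, and the variable names are illustrative.

```python
# Per-task accuracies (%) from Table 4, in the order U1→U2, U2→U1, U3→U4, U4→U3.
TABLE_4 = {
    "Deep CORAL": [96.20, 96.77, 89.47, 93.03],
    "DAN": [96.83, 96.57, 88.11, 93.50],
    "DANN": [97.84, 98.40, 89.64, 91.67],
    "ITLDMFD": [98.62, 98.53, 90.11, 92.83],
    "DADW": [99.61, 99.04, 91.09, 94.00],
    "MADAN": [100.00, 99.67, 92.31, 95.34],
}

# Mean accuracy per method and the ranking from best to worst.
means = {method: sum(acc) / len(acc) for method, acc in TABLE_4.items()}
ranking = sorted(means, key=means.get, reverse=True)

# Margin of MADAN over every baseline, in percentage points.
margins = {m: means["MADAN"] - means[m] for m in means if m != "MADAN"}

if __name__ == "__main__":
    for method in ranking:
        print(f"{method:10s} mean = {means[method]:.2f}")
    for method, gap in margins.items():
        print(f"MADAN - {method}: {gap:.2f} pp")
```

The small difference between a margin computed from unrounded means and one computed from the rounded values in the table (for example, for DADW) is a rounding effect only.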

The confusion matrices and feature visualizations further verify the feature learning capability and classification reliability of MADAN. The confusion matrix analysis (Figure 8) shows that, in the simple transfer tasks, MADAN achieves 100% classification accuracy in the U1→U2 task and a misclassification rate of only 0.02 in the U2→U1 task. In the complex transfer tasks, most categories maintain 100% or near-100% accuracy, with only a small number of misclassifications in the highly overlapping C1 category, which is consistent with the quantitative results. Table 5 compares the precision, recall, and F1-score of the different models on the U4→U3 task for the three classes (0, 1, 2). MADAN achieves the most balanced performance across the metrics: its F1-score for class 1 (91.98%) is the highest among all models, and its F1-score for class 2 (97.54%) is second only to that of DAN (98.99%), while DADW obtains the highest F1-score for class 0 (97.09% versus 95.24% for MADAN). Feature visualization of the U3→U4 task (Figure 9) illustrates the advantages of MADAN: the comparative methods (Deep CORAL, DAN, DANN, ITLDMFD, DADW) all exhibit, to varying degrees, intra-class feature dispersion, insufficient cross-domain alignment, or ambiguous class boundaries. In contrast, the feature distribution of MADAN shows “intra-class aggregation and inter-class separation”: same-class feature points from the source and target domains are tightly clustered, with clear boundaries between different classes. This supports the value of MADAN’s integrated strategy in learning features with both domain invariance and class discriminability, and further verifies its effectiveness and robustness in cross-condition state monitoring for robotic grinding.
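For reference, the F1-scores in Table 5 follow from the precision and recall columns through the usual harmonic mean, F1 = 2PR / (P + R). The sketch below recomputes the MADAN row; the variable names are illustrative, and the macro average is an extra summary that the paper does not report.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall) in % for MADAN on the U4→U3 task, classes 0, 1, 2 (Table 5).
MADAN_U4_U3 = {0: (90.91, 100.0), 1: (98.85, 86.00), 2: (96.12, 99.00)}

f1_per_class = {c: f1_score(p, r) for c, (p, r) in MADAN_U4_U3.items()}
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)

if __name__ == "__main__":
    for c, value in f1_per_class.items():
        print(f"class {c}: F1 = {value:.2f}")
    print(f"macro F1 = {macro_f1:.2f}")
```

Class 1 has the lowest recall (86.00%), which is why its F1-score stays below that of the other two classes despite a precision of 98.85%.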

4.2. Cross-Device State Monitoring

4.2.1. Dataset Description

To verify the industrial feasibility of the proposed Multi-Scale Attention-Fused Domain Adaptation Network (MADAN), experimental data were collected and validated on an industrial robot platform. As shown in Figure 10, an ABB-IRB-4600 robot was equipped with an accelerometer to collect vibration signals during the grinding process. The main parameter settings are presented in Table 6. The dataset (U5) consists of 1500 samples acquired at a sampling frequency of 10,000 Hz, each being a one-dimensional vibration sequence of length 2000 (0.2 s of signal). To evaluate the transfer learning performance of MADAN, four cross-device condition monitoring tasks were designed via condition combinations: U1→U5, U2→U5, U3→U5, and U4→U5. Figure 11 displays the total loss curves of the model under these four transfer tasks. The results indicate the following:

Rapid convergence: In the early training stage (within the first 10 epochs), the total loss of all four tasks drops sharply, suggesting that the model efficiently captures common features between the source and target domains.
Training stability: As training progresses, the loss curves gradually stabilize. Although minor local fluctuations occur due to the adversarial domain adaptation mechanism and the dynamic weight adjustments, the overall trend shows good convergence.
Consistency across tasks: The convergence trajectories of the four tasks are highly consistent, with the final total loss stabilizing at approximately 0.4.

These observations suggest that the MADAN model is robust and generalizes well when processing vibration signals of industrial robot grinding under different operating conditions.

4.2.2. Experimental Results

Consistent with Section 4.1.2, five mainstream domain adaptation methods (Deep CORAL, DAN, DANN, ITLDMFD, DADW) are selected as baselines. Based on the quantitative results in Table 7 and the visual analysis in Figure 12, the proposed MADAN demonstrates excellent performance and robustness in the cross-device transfer task (UR5 collaborative robot → ABB industrial robot) for robotic grinding state monitoring. Across the four cross-device transfer tasks (U1→U5, U2→U5, U3→U5, U4→U5), MADAN achieves the highest average accuracy of 95.68%, which is 0.58 percentage points higher than ITLDMFD (95.10%, the second-best method) and 6.01 percentage points higher than the traditional method DAN (89.67%), verifying its effectiveness in large domain shift scenarios such as cross-device transfer. In the U4→U5 task, which has the largest differences in equipment and operating conditions, MADAN reaches an accuracy of 96.38%, which is 0.82 percentage points higher than ITLDMFD (95.56%) and 6.04 percentage points higher than DANN (90.34%). According to the standard deviations in Table 7, MADAN has the smallest accuracy fluctuation in the U2→U5 (±0.24) and U4→U5 (±1.36) tasks, and its fluctuation in the U1→U5 and U3→U5 tasks is comparable to that of the baselines, indicating stable performance in cross-device transfer. The bar chart in Figure 12 confirms this: MADAN maintains the highest accuracy in all tasks, with a prominent advantage in difficult scenarios such as U4→U5. These results confirm that MADAN’s strategy of “multi-scale feature extraction + cross-scale attention fusion + MK-MMD and GRL dual-mechanism domain alignment” can effectively address the large domain shift problem in cross-device scenarios and learn feature representations with both domain invariance and high discriminability, thus achieving excellent performance in cross-device state monitoring.

The confusion matrices and feature visualizations further verify the feature learning capability and classification reliability of MADAN in cross-device transfer. The confusion matrix analysis (Figure 13) shows that, in the four cross-device tasks, MADAN maintains an accuracy of over 95% for most categories. Specifically, the C0 category achieves 100% accuracy in the U4→U5 task, with only a small number of misclassifications in the highly overlapping C1 category, which is consistent with the quantitative results. Table 8 compares the precision, recall, and F1-score of the different models on the U4→U5 task for the three classes (0, 1, 2). Classification performance improves steadily from the earlier baselines (Deep CORAL, DAN, DANN) to the more recent methods. MADAN achieves the most balanced F1-scores across the three classes (96.60%, 94.58%, and 95.88%, respectively), with the highest values for classes 1 and 2, while ITLDMFD (98.04%) and DADW (97.59%) score higher on class 0. This demonstrates its robustness under complex cross-domain tasks. Feature visualization of the U1→U5 task (Figure 14) illustrates the advantages of MADAN: the comparative methods (Deep CORAL, DAN, DANN, ITLDMFD, DADW) all show cross-domain intra-class feature dispersion and ambiguous class boundaries. In contrast, the feature distribution of MADAN shows “cross-domain aggregation and inter-class separation”: same-class feature points from the source and target domains are tightly clustered, with clear boundaries between different classes. This confirms the effectiveness of MADAN’s strategy in addressing the large domain shift problem in cross-device scenarios and verifies its reliability and robustness in cross-device state monitoring.

4.3. Ablation Study

To verify the effectiveness of each module in the proposed network, the following experiments were designed:

A: Remove the Cross-Scale Self-Attention Fusion Network module and replace it with concatenation and addition.
B: Replace the multi-scale channels with a single channel (retaining the large-scale channel) for direct transfer.
C: Replace the multi-scale channels with a single channel (retaining the medium-scale channel) for direct transfer.
D: Replace the multi-scale channels with a single channel (retaining the small-scale channel) for direct transfer.
E: Model without the data augmentation module.
F: Complete model.

Based on the violin plots in Figure 15 and the accuracies in Table 9, this ablation study evaluates the efficacy and indispensability of the core components of the proposed MADAN, specifically the multi-scale channels, the Cross-Scale Self-Attention Fusion Network, and the data augmentation module. To this end, six comparative variants were designed, where Variant F denotes the full model and Variants A through E represent ablation versions in which core modules are systematically removed or substituted. From the perspective of accuracy distribution, the full model (Variant F) demonstrates superior performance, with accuracy predominantly clustered between 95% and 100%. Its minimal variance and high mean accuracy underscore its robustness. In contrast, Variant A (where the Cross-Scale Self-Attention Fusion Network mechanism is replaced by simple concatenation and summation) exhibits a broader accuracy spread ranging from 85% to 100%, with inferior mean performance and stability compared to Variant F. This indicates that the Cross-Scale Self-Attention Fusion Network provides better adaptive feature aggregation, whereas static operations such as concatenation or addition cannot dynamically capture the weighted relationships between features at different scales. Furthermore, the performances of Variant B (large-scale single channel only), Variant C (medium-scale single channel only), and Variant D (small-scale single channel only) are lower than that of the full model. Notably, Variant C shows the most substantial fluctuations (70–100%) and the lowest mean accuracy, highlighting the critical complementarity of multi-scale channels. A single scale, whether coarse or fine, captures only partial information and fails to represent the multi-granular characteristics of robotic grinding signals. In contrast, multi-scale fusion extracts both global trends and local details. When integrated with the adaptive aggregation of the Cross-Scale Self-Attention Fusion Network mechanism, it yields a significant performance boost. Finally, the comparison between Variant F and Variant E (without the data augmentation module) underscores the effectiveness of the proposed data augmentation module in enhancing the model’s generalization and reliability.
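The per-variant comparison can be summarized numerically from Table 9. The sketch below computes the mean accuracy of each variant over the eight tasks and the spread across tasks; note that this across-task spread is an illustrative summary and is not the same quantity as the run-to-run distribution drawn in the violin plots of Figure 15.

```python
from statistics import mean, pstdev

# Accuracies (%) from Table 9, task order: 1→2, 2→1, 3→4, 4→3, 1→5, 2→5, 3→5, 4→5.
TABLE_9 = {
    "A": [100, 98.67, 91.37, 94.32, 94.65, 92.55, 89.23, 94.64],
    "B": [100, 99.60, 91.92, 93.32, 92.59, 93.24, 94.38, 94.95],
    "C": [94.65, 96.45, 90.57, 94.41, 82.15, 72.86, 74.58, 78.85],
    "D": [100, 99.01, 90.90, 94.32, 93.60, 92.33, 92.43, 89.21],
    "E": [97.33, 96.26, 92.26, 95.32, 94.98, 92.29, 91.25, 94.54],
    "F": [100, 99.67, 92.31, 95.34, 96.22, 95.32, 94.78, 96.38],
}

# Mean and population standard deviation across the eight tasks for each variant.
summary = {variant: (mean(acc), pstdev(acc)) for variant, acc in TABLE_9.items()}

# Average accuracy lost by each ablated variant relative to the full model F.
drop_vs_full = {v: summary["F"][0] - summary[v][0] for v in TABLE_9 if v != "F"}

if __name__ == "__main__":
    for variant, (avg, spread) in summary.items():
        print(f"Variant {variant}: mean = {avg:.2f}, spread across tasks = {spread:.2f}")
    for variant, drop in drop_vs_full.items():
        print(f"F - {variant}: {drop:.2f} pp")
```

On these numbers the full model has the highest mean over the eight tasks and the medium-scale-only Variant C the lowest, with most of Variant C's loss concentrated in the four cross-device tasks.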

5. Conclusions

To deal with the degradation of monitoring accuracy caused by domain shifts in robotic grinding processes, this paper proposes a Multi-Scale Attention-Fused Domain Adaptation Network (MADAN) for robotic grinding state monitoring. Through theoretical analysis and experimental validation, the following conclusions are drawn:

The introduced multi-scale fusion mechanism with Cross-Scale Self-Attention Fusion Network adaptively captures critical features from vibration signals across frequency bands and time scales, significantly improving the characterization of complex grinding physics.
By integrating enhanced signal augmentation with adversarial domain adaptation, MADAN effectively mitigates domain distribution discrepancies. This enables the extraction of domain-invariant features that are robust to operational and hardware variations, enhancing model transferability without target domain labels.
MADAN achieved an average accuracy of 96.83% in cross-condition scenarios and 95.68% in the more challenging cross-device tasks, outperforming existing State-of-the-Art methods and demonstrating strong potential for industrial deployment.

Although the proposed MADAN has proven effective in robotic grinding monitoring, several research issues remain to be addressed in future work. First, for more effective application in engineering practice, the model could be optimized via compression and acceleration techniques (such as pruning or quantization) to achieve low-latency inference on embedded edge computing nodes for real-time edge-side deployment. Moreover, multi-source sensor data (such as acoustic emission, force sensing, and industrial machine vision) could be combined through deep multimodal fusion to further improve the model’s monitoring sensitivity in complex machining tasks.

Author Contributions

Conceptualization, P.X., H.C. and L.Q.; Methodology, P.X., H.C., S.Z. and L.Q.; Validation, S.H.; Investigation, P.X., S.Z. and S.H.; Resources, P.X. and L.Q.; Data curation, P.X. and L.Q.; Writing—original draft, P.X., H.C. and S.Z.; Writing—review & editing, H.C. and L.Q.; Visualization, H.C. and S.H.; Supervision, L.Q.; Project administration, P.X.; Funding acquisition, P.X. and L.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant No. 52305033, and in part by “the Fundamental Research Funds for the Central Universities” (WUT: 104972024KFYjc0017&213118004).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Penghang Xiong was employed by the company HexaCercle Science & Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Diez-Olivan, A.; Del Ser, J.; Galar, D.; Sierra, B. Data fusion and machine learning for industrial prognosis: Trends and perspectives towards Industry 4.0. Inf. Fusion 2019, 50, 92–111.
2. Peta, K.; Wiśniewski, M.; Kotarski, M.; Ciszak, O. Comparison of Single-Arm and Dual-Arm Collaborative Robots in Precision Assembly. Appl. Sci. 2025, 15, 2976.
3. Zhu, D.; Feng, X.; Xu, X.; Yang, Z.; Li, W.; Yan, S.; Ding, H. Robotic grinding of complex components: A step towards efficient and intelligent machining – Challenges, solutions, and applications. Robot. Comput. Manuf. 2020, 65, 101908.
4. Zhang, Z.; Zhang, Y.; Zhao, X.; Tao, B.; Ding, H. Deadlock-Aware Control for Multirobot Coordination With Multiple Safety Constraints. IEEE Trans. Robot. 2025, 41, 5209–5228.
5. Leali, F.; Vergnano, A.; Pini, F.; Pellicciari, M.; Berselli, G. A workcell calibration method for enhancing accuracy in robot machining of aerospace parts. Int. J. Adv. Manuf. Technol. 2016, 85, 47–55.
6. Baek, D.K.; Ko, T.J.; Kim, H.S. Optimization of feedrate in a face milling operation using a surface roughness model. Int. J. Mach. Tools Manuf. 2001, 41, 451–462.
7. Ehmann, K.; Hong, M. A Generalized Model of the Surface Generation Process in Metal Cutting. CIRP Ann. 1994, 43, 483–486.
8. Lee, B.; Tarng, Y. Surface roughness inspection by computer vision in turning operations. Int. J. Mach. Tools Manuf. 2001, 41, 1251–1263.
9. Sodhi, M.S.; Tiliouine, K. Surface roughness monitoring using computer vision. Int. J. Mach. Tools Manuf. 1996, 36, 817–828.
10. Lu, C. Study on prediction of surface quality in machining process. J. Mech. Work. Technol. 2008, 205, 439–450.
11. Niola, V.; Cosenza, C.; Fornaro, E.; Malfi, P.; Melluso, F.; Nicolella, A.; Savino, S.; Spirto, M. Torque/Speed Equilibrium Point Monitoring of an Aircraft Hybrid Electric Propulsion System Through Accelerometric Signal Processing. Appl. Sci. 2025, 15, 2135.
12. Huang, J.; Huangfu, C.; Zhang, Q.; Li, S.; Yan, Y.; Cai, J. Actuator Fault Diagnosis of 3-PR (P) S Parallel Robot Based on DBO-BP Neural Network. J. Dyn. Monit. Diagn. 2025, 4, 91–100.
13. Li, X.; Yu, S.; Lei, Y.; Li, N.; Yang, B. Intelligent Machinery Fault Diagnosis with Event-Based Camera. IEEE Trans. Ind. Inform. 2024, 20, 380–389.
14. Nti, I.K.; Adekoya, A.F.; Weyori, B.A.; Nyarko-Boateng, O. Applications of artificial intelligence in engineering and manufacturing: A systematic review. J. Intell. Manuf. 2021, 33, 1581–1601.
15. Tercan, H.; Meisen, T. Machine learning and deep learning based predictive quality in manufacturing: A systematic review. J. Intell. Manuf. 2022, 33, 1879–1905.
16. Nieto, P.G.; García-Gonzalo, E.; Lasheras, F.S.; Juez, F.d.C. Hybrid PSO–SVM-based method for forecasting of the remaining useful life for aircraft engines and evaluation of its reliability. Reliab. Eng. Syst. Saf. 2015, 138, 219–231.
17. Song, B.; Tan, S.; Shi, H.; Zhao, B. Fault detection and diagnosis via standardized k nearest neighbor for multimode process. J. Taiwan Inst. Chem. Eng. 2020, 106, 1–8.
18. Cerrada, M.; Zurita, G.; Cabrera, D.; Sánchez, R.-V.; Artés, M.; Li, C. Fault diagnosis in spur gears based on genetic algorithm and random forest. Mech. Syst. Signal Process. 2016, 70–71, 87–103.
19. Kim, Y.; Park, J.; Na, K.; Yuan, H.; Youn, B.D.; Kang, C.-S. Phase-based time domain averaging (PTDA) for fault detection of a gearbox in an industrial robot using vibration signals. Mech. Syst. Signal Process. 2020, 138, 106544.
20. Yin, T.; Lu, N.; Guo, G.; Lei, Y.; Wang, S.; Guan, X. Knowledge and data dual-driven transfer network for industrial robot fault diagnosis. Mech. Syst. Signal Process. 2022, 182, 109597.
21. Cao, X.-C.; Chen, B.-Q.; Yao, B.; He, W.-P. Combining translation-invariant wavelet frames and convolutional neural network for intelligent tool wear state identification. Comput. Ind. 2019, 106, 71–84.
22. Cai, W.; Zhang, W.; Hu, X.; Liu, Y. A hybrid information model based on long short-term memory network for tool condition monitoring. J. Intell. Manuf. 2020, 31, 1497–1510.
23. Lu, H.; Zhao, X.; Tao, B.; Yin, Z. Online Process Monitoring Based on Vibration-Surface Quality Map for Robotic Grinding. IEEE/ASME Trans. Mechatron. 2020, 25, 2882–2892.
24. Chung, K.-J.; Dai, C.-H.; Chiang, T.-C.; Xie, J.-J.; Lin, M.-T. Application of Recurrence Plots and VGG Deep Learning Model to the Study of Condition Monitoring of Robotic Grinding. Int. J. Precis. Eng. Manuf. 2023, 24, 1675–1683.
25. Zhao, X.; Lu, H.; Yu, W.; Tao, B.; Ding, H. Robotic Grinding Process Monitoring by Vibration Signal Based on LSTM Method. IEEE Trans. Instrum. Meas. 2022, 71, 1–10.
26. He, W.; Mao, J.; Wang, Y.; Li, Z.; Xie, H.; Shao, H.; Zhao, X. Unified Diagnostic and Matching Framework of Fault and Quality for Robotic Grinding System. IEEE Trans. Instrum. Meas. 2024, 73, 1–12.
27. Gupta, J.; Pathak, S.; Kumar, G. Deep Learning (CNN) and Transfer Learning: A Review. J. Phys. Conf. Ser. 2022, 2273, 012029.
28. Wang, J.; Chen, Y.; Hao, S.; Feng, W.; Shen, Z. Balanced distribution adaptation for transfer learning. In Proceedings of the IEEE International Conference on Data Mining (ICDM), New Orleans, LA, USA, 18–21 November 2017; pp. 1129–1134.
29. Weiss, K.; Khoshgoftaar, T.M.; Wang, D.D. A survey of transfer learning. J. Big Data 2016, 3, 1345–1459.
30. Li, C.; Zhang, S.; Qin, Y.; Estupinan, E. A systematic review of deep transfer learning for machinery fault diagnosis. Neurocomputing 2020, 407, 121–135.
31. Li, X.; Zhang, W.; Ding, Q.; Li, X. Diagnosing Rotating Machines With Weakly Supervised Data Using Deep Transfer Learning. IEEE Trans. Ind. Inform. 2020, 16, 1688–1697.
32. Geng, J.; Deng, X.; Ma, X.; Jiang, W. Transfer Learning for SAR Image Classification Via Deep Joint Distribution Adaptation Networks. IEEE Trans. Geosci. Remote Sens. 2020, 58, 5377–5392.
33. Matasci, G.; Volpi, M.; Kanevski, M.; Bruzzone, L.; Tuia, D. Semisupervised Transfer Component Analysis for Domain Adaptation in Remote Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3550–3564.
34. Tzeng, E.; Hoffman, J.; Zhang, N.; Saenko, K.; Darrell, T. Deep domain confusion: Maximizing for domain invariance. arXiv 2014.
35. Sun, B.; Saenko, K. Deep CORAL: Correlation Alignment for Deep Domain Adaptation, Computer Vision – ECCV 2016 Workshops. arXiv 2016, arXiv:1607.01719.
36. Long, M.; Cao, Y.; Wang, J.; Jordan, M. Learning Transferable Features with Deep Adaptation Networks. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Volume 37, pp. 97–105.
37. Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-Adversarial Training of Neural Networks. In Domain Adaptation in Computer Vision Applications; Csurka, G., Ed.; Springer International Publishing: Cham, Switzerland, 2017; pp. 189–209.
38. Chen, L.; Feng, G.; Zhen, D.; Jing, H.; Liang, X.; Gu, F. Cross-Condition Fault Diagnosis of Planetary Gearboxes Driven by Data-Model Fusion Based on Improved Domain-Adversarial Transfer Learning. Shock. Vib. 2025, 2025, 9982177.
39. Gao, T.; Yang, J.; Tang, Q. A multi-source domain information fusion network for rotating machinery fault diagnosis under variable operating conditions. Inf. Fusion 2024, 106, 102278.
40. Li, X.; Jiang, H.; Xie, M.; Wang, T.; Wang, R.; Wu, Z. A reinforcement ensemble deep transfer learning network for rolling bearing fault diagnosis with Multi-source domains. Adv. Eng. Informatics 2022, 51, 101480.
41. Shi, J.; Zhao, X.; Tao, B.; Tang, Z.; Ding, T.; Lu, H.; Qiu, T.; Chen, D. Incremental transfer learning for robot drilling state monitoring under multiple working conditions. J. Intell. Manuf. 2025, 36, 3965–3982.

Figure 1. The overall framework of the proposed MADAN.

Figure 2. Multi-Scale Feature Extraction Network.

Figure 3. Cross-Scale Self-Attention Fusion Network.

Figure 4. UR5 robotic grinding experimental platform.

Figure 5. Three different states of robot grinding images (with actual dimensions 2 × 1.5 mm).

Figure 6. Loss curves across different operational conditions.

Figure 7. Comparison results of condition monitoring tasks.

Figure 8. Visualization of confusion matrix results, (a) transfer task from U1 to U2; (b) transfer task from U2 to U1; (c) transfer task from U3 to U4; (d) transfer task from U4 to U3.

Figure 9. Feature visualization for the U3 to U4 transfer task.

Figure 10. ABB-IRB-4600 robot grinding platform.

Figure 11. Loss curves across different operational conditions.

Figure 12. Comparison results of condition monitoring tasks.

Figure 13. Visualization of confusion matrix results, (a) transfer task from U1 to U5; (b) transfer task from U2 to U5; (c) transfer task from U3 to U5; (d) transfer task from U4 to U5.

Figure 14. Feature visualization for the U1 to U5 transfer task.

Figure 15. Results of the ablation study.

Table 1. Hyperparameter optimization results of MADAN.

Hyperparameter	Search Range	Result
Large-scale convolutional kernel size	{16, 32, 64}	32
Medium-scale convolutional kernel size	{8, 16, 32}	16
Small-scale convolutional kernel size	{4, 8, 16}	8
Number of attention heads	{2, 4, 8}	4
Base number of channels	{16, 32, 64}	32
Number of residual blocks per scale	{1, 2, 3}	2
Domain alignment loss weight λ	[0.1, 5.0]	1.0
Adversarial loss weight	[0.01, 0.5]	0.1

Table 2. Parameters of experimental equipment.

Equipment	Parameters
Accelerometer Sensor (IEPE: 1A314E)	(1) Sensitivity: 101 mV/g
	(2) Range: ±500 g
	(3) Frequency Response: 0.5–5000 Hz (X direction); 0.5–7000 Hz (Y/Z direction)
	(4) Resonant Frequency: >28 kHz
	(5) Temperature Range: −40 to +120 °C
Data Acquisition Instrument (DHDAS: DH5922D)	(1) Number of Channels: 8 CH + 2 CH
	(2) Maximum Analysis Bandwidth: DC-50 kHz
	(3) ADC Resolution: 24 bit/channel
Flat Sandpaper	(1) Diameter: 20 mm
	(2) Grit: 120
	(3) Material: Silicon dioxide
2024-T3 Aluminum (Al)	(1) Thickness: 4 mm
	(2) Dimensions (Length × Width): 400 × 100 mm
Carbon Fiber Resin Composite (CF)	(1) Thickness: 4 mm
	(2) Dimensions (Length × Width): 400 × 100 mm

Table 3. Parameters of processing materials.

Dataset	Grinding Speed	Feed Speed	Grinding Material
U1	350 r/min	0.5 mm/s	CF
U2	350 r/min	0.25 mm/s	CF
U3	350 r/min	0.5 mm/s	2024-T3 Al
U4	350 r/min	0.25 mm/s	2024-T3 Al

Table 4. Experimental comparison results for MADAN.

Method	U1→U2	U2→U1	U3→U4	U4→U3	Mean
Deep Coral	96.20 ± 0.60	96.77 ± 0.50	89.47 ± 0.88	93.03 ± 0.99	93.87
DAN	96.83 ± 0.72	96.57 ± 0.53	88.11 ± 0.73	93.50 ± 1.01	93.75
DANN	97.84 ± 0.43	98.40 ± 0.23	89.64 ± 0.44	91.67 ± 0.53	94.39
ITLDMFD	98.62 ± 0.26	98.53 ± 0.39	90.11 ± 2.98	92.83 ± 0.51	95.02
DADW	99.61 ± 0.33	99.04 ± 0.29	91.09 ± 2.03	94.00 ± 1.01	95.94
MADAN	100.00 ± 0.00	99.67 ± 0.24	92.31 ± 1.56	95.34 ± 1.36	96.83

Table 5. Analysis of precision, recall, and F1-score for U4→U3.

Model	Precision (0/1/2)			Recall (0/1/2)			F1-Score (0/1/2)
Model	0	1	2	0	1	2	0	1	2
Deep Coral	89.19	91.84	90.00	99.00	82.57	90.00	93.84	86.96	90.00
DAN	83.33	100.0	100.0	100.0	82.00	98.00	90.91	90.11	98.99
DANN	91.59	97.53	87.50	98.00	79.00	98.00	94.69	87.29	92.45
ITLDMFD	90.09	91.49	96.84	100.0	86.00	92.00	94.79	88.66	94.36
DADW	94.34	97.67	90.74	100.0	84.00	98.00	97.09	90.32	94.23
MADAN	90.91	98.85	96.12	100.0	86.00	99.00	95.24	91.98	97.54

Table 6. Main technical parameters of ABB-IRB-4600 industrial robot.

Technical Indicators	Payload (kg)	Arm Load (kg)	Effective Working Radius (mm)	Repeat Positioning Accuracy (mm)
Design Parameters	40	20	2550	±0.06

Table 7. Experimental comparison results for MADAN.

Method	U1→U5	U2→U5	U3→U5	U4→U5	Mean
Deep Coral	90.23 ± 1.29	89.54 ± 1.87	88.74 ± 1.63	82.07 ± 2.17	87.65
DAN	91.21 ± 1.69	90.73 ± 2.01	89.41 ± 1.87	87.34 ± 1.72	89.67
DANN	90.78 ± 0.79	91.22 ± 1.24	88.56 ± 1.57	90.34 ± 2.13	90.23
ITLDMFD	96.01 ± 1.32	95.12 ± 1.64	93.69 ± 0.89	95.56 ± 1.41	95.10
DADW	95.89 ± 1.43	95.19 ± 0.64	92.13 ± 1.64	95.12 ± 1.51	94.58
MADAN	96.22 ± 1.42	95.32 ± 0.24	94.78 ± 1.37	96.38 ± 1.36	95.68

Table 8. Analysis of precision, recall, and F1-score for U4→U5.

Model	Precision (0/1/2)			Recall (0/1/2)			F1-Score (0/1/2)
Model	0	1	2	0	1	2	0	1	2
Deep Coral	79.84	79.80	82.93	99.00	79.00	68.00	88.40	79.40	74.73
DAN	78.13	92.41	93.68	100.0	73.00	89.00	87.50	81.56	91.30
DANN	81.97	96.88	94.12	100.0	91.00	80.00	90.01	93.86	86.49
ITLDMFD	96.15	92.08	90.20	100.0	93.00	92.00	98.04	92.54	91.09
DADW	95.28	88.78	96.84	100.0	94.00	91.00	97.59	91.33	93.84
MADAN	93.46	93.20	98.94	100.0	96.00	93.00	96.60	94.58	95.88

Table 9. Accuracy results of the ablation study.

Task	A	B	C	D	E	F
1→2	100	100	94.65	100	97.33	100
2→1	98.67	99.60	96.45	99.01	96.26	99.67
3→4	91.37	91.92	90.57	90.90	92.26	92.31
4→3	94.32	93.32	94.41	94.32	95.32	95.34
1→5	94.65	92.59	82.15	93.60	94.98	96.22
2→5	92.55	93.24	72.86	92.33	92.29	95.32
3→5	89.23	94.38	74.58	92.43	91.25	94.78
4→5	94.64	94.95	78.85	89.21	94.54	96.38
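
To compare the ablation variants at a glance, the per-task accuracies in Table 9 can be averaged per column. The sketch below is a rough illustration, not part of the paper's analysis. It assumes that tasks targeting domain 5 are the cross-device tasks (their column F values coincide with the MADAN row of Table 7) and that the remaining tasks are cross-condition; the names `table9` and `column_means` are hypothetical.

```python
# Accuracy (%) per transfer task for ablation variants A to F, copied from Table 9.
variants = ["A", "B", "C", "D", "E", "F"]
table9 = {
    "1→2": [100, 100, 94.65, 100, 97.33, 100],
    "2→1": [98.67, 99.60, 96.45, 99.01, 96.26, 99.67],
    "3→4": [91.37, 91.92, 90.57, 90.90, 92.26, 92.31],
    "4→3": [94.32, 93.32, 94.41, 94.32, 95.32, 95.34],
    "1→5": [94.65, 92.59, 82.15, 93.60, 94.98, 96.22],
    "2→5": [92.55, 93.24, 72.86, 92.33, 92.29, 95.32],
    "3→5": [89.23, 94.38, 74.58, 92.43, 91.25, 94.78],
    "4→5": [94.64, 94.95, 78.85, 89.21, 94.54, 96.38],
}


def column_means(tasks):
    """Mean accuracy of each variant over the given task labels."""
    return {
        v: sum(table9[t][i] for t in tasks) / len(tasks)
        for i, v in enumerate(variants)
    }


# Assumption: tasks with target domain 5 are cross-device, the rest cross-condition.
cross_device = [t for t in table9 if t.endswith("5")]
cross_condition = [t for t in table9 if not t.endswith("5")]

overall = column_means(list(table9))
by_group = {
    "cross-condition": column_means(cross_condition),
    "cross-device": column_means(cross_device),
}

best = max(overall, key=overall.get)
print("highest overall mean:", best, round(overall[best], 2))
for group, group_means in by_group.items():
    print(group, {v: round(m, 2) for v, m in group_means.items()})
```

Grouping the tasks this way makes it easier to see whether a variant loses accuracy mainly on the tasks that target domain 5 or uniformly across all tasks.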
