Article

Tool Wear State Identification Method with Variable Cutting Parameters Based on Multi-Source Unsupervised Domain Adaptation

1 School of Mechatronic Engineering, Harbin Institute of Technology, Harbin 150001, China
2 Inspur Genersoft Co., Ltd., Jinan 250101, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(6), 1742; https://doi.org/10.3390/s25061742
Submission received: 9 January 2025 / Revised: 27 February 2025 / Accepted: 6 March 2025 / Published: 11 March 2025
(This article belongs to the Special Issue Artificial Intelligence and Sensing Technology in Smart Manufacturing)

Abstract

Accurately identifying tool wear states under variable cutting parameters can improve machining quality and efficiency. However, existing unsupervised domain adaptation methods for wear state recognition mostly transfer knowledge from a single source domain and therefore cannot fully exploit the sensor data distribution information of multiple cutting parameters, limiting recognition performance. This paper proposes a wear state recognition method for variable cutting parameters based on multi-source unsupervised domain adaptation. First, non-stationary Transformer encoders extract non-stationary common features; then, domain-specific feature distribution alignment based on the sliced Wasserstein distance and classifier output alignment reduce domain shift and simplify the synchronous alignment of multiple domain distributions. Finally, milling experiments with variable cutting parameters are conducted to validate the recognition performance of the proposed method.

1. Introduction

Tool wear is one of the critical factors affecting machining quality and machining efficiency [1]. Severely worn tools cause the surface quality of the workpiece to deteriorate and reduce its dimensional accuracy. If machining continues until the tool is damaged, the workpiece will be scrapped, and the machine tool itself may even be damaged. According to statistics, tool wear or breakage accounts for about 20% of downtime and economic losses [2]. However, tool replacement relies on manual experience to avoid excessive wear, so only 50–80% of the practical tool life is used [3]. Therefore, accurately predicting the tool wear state is essential for improving machining efficiency. Existing tool wear state prediction models can be divided into physics-based and data-driven models [4]. Physics-based models are mathematical models built on knowledge of physical mechanisms, laws, and measurement data, and their performance depends mainly on the quality and accuracy of the knowledge in the relevant field. Their inability to be updated with online monitoring data limits their effectiveness and flexibility. Data-driven models use historical monitoring data for modeling and attempt to update the model and make decisions based on new online monitoring data. Thanks to the widespread application of intelligent sensing and the rapid development of machine learning technology, data-driven tool wear prediction has become a research hotspot in recent years.
With the rapid development of deep learning and computing power, deep learning has attracted extensive attention in academia and industry, as it offers automatic feature extraction and strong representation learning for data through deep network structures [5,6]. Common deep networks for wear state identification include autoencoders (AE) [7], convolutional neural networks (CNN) [8,9], recurrent neural networks (RNN) [10], and Transformers. Yu et al. [11] proposed the Pareto-optimal Adaptive Loss Residual Shrinkage Network (PALRSN), which improves the recognition accuracy of small-sample categories through an adaptive loss function. Li et al. [12] proposed a tool wear prediction method based on an Informer encoder and a stacked bidirectional gated recurrent unit. In general, these studies have achieved good performance by utilizing deep networks for tool wear monitoring. However, in practical machining scenarios, collecting large amounts of labeled data for training is highly labor-intensive and costly. Additionally, changes in workpieces, cutting tools, and cutting parameters alter the cutting conditions within the machining scenario [12]. This affects the data distributions used for model training and machining monitoring, resulting in a significant decline in the recognition performance of the aforementioned models.
To address the problem of wear state identification under variable cutting conditions, many scholars have introduced the transfer learning paradigm into monitoring model development to reduce the amount of data required. Transfer learning-based wear state identification methods generally aim to learn transferable common knowledge from historical cutting conditions with rich data (the source domain) and apply it to target cutting conditions with sparsely labeled or unlabeled data (the target domain). Based on the labeling of the source and target domains, Pan et al. [13] categorized transfer learning tasks into inductive transfer learning, transductive transfer learning, and unsupervised transfer learning. According to the literature, there are two main transfer task scenarios for tool wear state identification under variable working conditions: inductive transfer learning, where the source domain consists of labeled data, and domain adaptation (DA).
In the inductive transfer learning scenario, researchers often apply the transferred knowledge to wear state identification under target cutting conditions through feature transfer and parameter/model transfer methods. Feature transfer-based approaches seek common features of the two domains by analyzing the feature correlations between the source and target domains to achieve the transfer of tool wear knowledge. For example, Li et al. [14] used a genetic algorithm to generate candidate feature subsets in the source and target domains, transferred feature information from the source domain to the target domain through a relational model, obtained the optimal feature subset based on the maximum mean discrepancy (MMD) metric, and finally realized tool wear state identification in the target domain with a particle swarm-optimized support vector machine. Parameter/model transfer-based wear state identification methods assume that several network parameters or hyperparameters are shared between the source domain task and the target domain task, so the model can be transferred to the identification task under a new working condition through pre-training and fine-tuning [15]. For instance, Zhang et al. [16] constructed a wear state identification model based on model transfer: they realized tool wear state identification under variable feed rate by freezing the shallow feature extractor of a pre-trained improved deep residual network and fine-tuning the high-level feature extractor and classification network, with wavelet-transformed cutting force signals as input. Bahador et al. [17] likewise realized tool wear state recognition across different machining equipment through pre-training and fine-tuning, freezing the CNN feature extractor and fine-tuning the fully connected classification network with data from the target domain.
DA-based, and especially unsupervised domain adaptation (UDA)-based [18,19], wear state identification methods relax the labeled data requirement for the target domain: they use only the data under known cutting conditions and the unlabeled data under the new cutting conditions for transfer knowledge learning, which reduces the monitoring task cost to a certain extent. The UDA methods commonly used for wear state identification under variable cutting conditions fall into two categories: discrepancy-based methods [20,21] and adversarial-based methods [22,23]. Adversarial-based methods learn domain-invariant features by designing a domain discriminator or adversarial objectives during training to encourage domain confusion. For instance, Li et al. [24] proposed a dynamic domain adversarial self-adaptive method for tool wear state recognition under different milling conditions. Discrepancy-based methods measure the discrepancies between the source and target domains within a model-specific network layer, e.g., using statistical metrics, and thereby align the source and target domains. For example, Liu et al. [25] proposed an interpretable domain adaptation Transformer for transferable fault diagnosis, which uses a multi-layer domain adaptation Transformer framework to capture key global information for learning domain modulation information while minimizing feature distribution discrepancies.
The above wear state identification methods under variable cutting conditions for unlabeled target domains are mostly used to establish a monitoring task transfer between a known cutting condition and a new condition; however, in practice, there are monitoring data corresponding to more than one cutting condition. For the wear state monitoring in this case, a natural way to deal with it is to integrate the historical data from multiple conditions into one source domain. For example, Kim et al. [26] proposed a multi-domain mixture density network, which maps multi-sensor data from multiple cutting conditions to a common feature space and combines it with an adversarial learning method to guide the model to learn the public domain invariant representation. Zhu et al. [27] also proposed an unsupervised dual-regression domain adversarial adaptation network, which integrates data from multiple machining conditions into a single source domain and utilizes the weight discrepancy restriction and prediction consistency loss to align the distributions between domains, and then realizes the prediction of tool wear value. However, the distribution of monitoring data is different among different cutting conditions, and directly integrating multi-condition data into a single-source domain for single-source unsupervised domain adaptation (SUDA) [28] is prone to ignoring the feature distributions among different conditions during the process of learning domain invariant representation, resulting in a negative transfer effect.
Based on the above research status, we propose a tool wear state identification method with variable cutting parameters based on multi-source unsupervised domain adaptation (MUDA). Overall, this method treats the monitoring data under multiple cutting parameters as separate source domains and jointly uses unlabeled data under the target cutting parameters as input to build a cross-domain wear state identification model. The model can identify the differences in feature distributions between the multiple known cutting parameters and the target cutting parameters, automatically extract high-level domain-specific invariant representations, and thereby achieve effective identification of the wear state under the target cutting parameters, with an average accuracy exceeding 90%. The main contributions of this study are as follows.
(1)
A novel multi-source domain adaptive tool wear state identification method based on the Multiple Feature Spaces Adaptation Network (MFSAN) architecture is proposed. By constructing a multi-feature-space adaptation network, the method achieves tool wear state identification under varying cutting parameters.
(2)
A public feature extractor based on a Non-Stationary Transformer Encoder (NSTE) is proposed. This extractor utilizes a sequence stationarization module and NSTE to explore non-stationary input features in multi-channel signals, thereby extracting advanced public features related to wear.
(3)
The proposed model incorporates a domain-specific feature distribution alignment module based on sliced Wasserstein distance (SWD) and a domain-specific classifier output alignment module. SWD allows for the measurement of differences in the hidden feature space with low computational cost. These two alignment modules mitigate domain shift and simplify the synchronization of alignment across multiple domain distributions.

2. Proposed Method

2.1. Problem Description

This paper investigates tool wear state monitoring under variable cutting parameters based on MUDA, aiming to construct an effective cross-domain wear state identification model with data under multiple existing cutting parameters. The proposed model can identify the feature distribution discrepancies between multiple existing cutting parameters and the target cutting parameters and extract high-level domain-invariant representations, directly realizing the accurate identification of wear states under the target cutting parameters.
Several basic assumptions and formulations are made to depict the problem. First, multi-channel sensor history data exist for multiple existing cutting parameters, and variation in cutting parameters such as cutting speed, spindle speed, and depth of cut leads to large differences in the distribution of sensor data collected under different cutting parameters. The sensor data under a single cutting parameter and its wear state form a domain $D$, formally described as $D = \{(x_l, y_l)\}_{l=1}^{L}$, where $x_l$ is the input generated by the multi-channel sensor data in domain $D$, $y_l$ is the corresponding wear state label taking one of $C$ values, and $L$ is the number of samples in domain $D$. Sensor data and wear states under multiple existing cutting parameters form multiple domains, which constitute the source domains $D^S$ in unsupervised domain adaptation. There are $M$ source domains $D_i^S = \{(x_{i,l}, y_{i,l})\}_{l=1}^{L_i}$, $1 \le i \le M$, in $D^S$, with enough labeled samples in each source domain to build an effective cross-domain classifier. The sensor data collected under the target cutting parameters form the target domain $D^T = \{u_k\}_{k=1}^{K}$, which contains only a small number $K$ of unlabeled samples $u_k$; an effective wear state recognition model cannot be constructed from these unlabeled samples alone. In addition, the wear state label spaces of the source domains and the target domain are identical, but the marginal distributions of the domains differ noticeably.
As depicted in Figure 1, in contrast to SUDA, MUDA can effectively learn from the many labeled samples under existing cutting parameters together with a limited number of unlabeled target domain samples. MUDA utilizes a cross-domain high-level feature extractor $F$ and classifier $T$ to decrease the domain shift resulting from differences in marginal distribution across cutting parameters. This reduces the reliance on labeled target domain samples during training and enables the efficient transfer of wear knowledge from multiple source domains to the target domain, creating a precise classification boundary for the tool wear state in the target domain.

2.2. The Method for Tool Wear State Recognition Based on MFSAN

In order to solve the problem existing in unsupervised domain adaptation, Zhu et al. proposed the MFSAN, which can align domain-specific distributions and domain-specific classifiers in two stages [29]. MFSAN can serve as a generalized multi-source unsupervised domain adaptive architecture. On the one hand, it maps the target domain and each source domain to different feature spaces separately. It performs domain-specific distribution alignment to learn multiple domain-invariant representations, which reduces the difficulty of acquiring domain-invariant representations while entirely using multi-source domain samples for feature learning. On the other hand, MFSAN considers the relationship between the target domain samples and the domain-specific classification boundary and uses the domain-specific decision boundary to align the unlabeled target domain samples through the classifier output, improving the classification ability on the target domain.
Figure 2 illustrates the overall network architecture of MFSAN. To reduce the complexity of the network structure, the feature extractor $F$ consists of a parameter-sharing common feature extractor $F_1$ and multiple domain-specific feature extractors $F_d = \{F_d^i\}_{i=1}^{M}$. The classifier is a multi-output network with predictors $T_d = \{T_d^i\}_{i=1}^{M}$ corresponding to each domain-specific feature extractor. All source and target domain data first enter the common feature extractor $F_1$, which extracts the representations common to all domains. To map each pair of source and target domains into a domain-specific feature space, MFSAN designs a domain-specific extractor $F_d^i$ for each source domain. Network parameters are not shared among the domain-specific feature extractors, so domain-specific invariant representations between the multiple source domains and the target domain can be obtained. During training, the difference between each individual source domain and the target domain can be minimized in various ways, such as statistical difference measure losses or adversarial losses. Each domain-specific predictor $T_d^i$ receives the high-level features of the corresponding source domain and outputs the predicted labels through a Softmax classifier.
There are two alignment stages in the MFSAN architecture: the alignment of domain-specific distributions and the alignment of domain-specific classifiers. In the domain-specific distribution alignment stage, the high-level features of each pair of source and target domains are obtained by the respective domain-specific feature networks. The MMD measures the feature distribution discrepancy of each source-target pair, which is then used as one of the loss functions; minimizing it motivates each domain-specific extractor to learn a domain-invariant representation of its source-target pair. Since the target domain samples pass through different predictors, the outputs of the classifiers on the same target sample diverge; in particular, samples close to the classification boundary are more likely to be misclassified. Therefore, in the domain-specific classifier alignment stage, MFSAN uses the absolute values of the pairwise differences between the probability outputs of the target domain samples on all predictors as the difference loss. By minimizing this loss, all predictors produce similar predictions for target samples, and the final target domain label is obtained by averaging all outputs.
Based on this two-stage alignment, this section proposes a variable-parameter tool wear state identification method based on the MFSAN. Figure 3 illustrates the overall architecture of the proposed method, consisting of one common feature extractor, one domain-specific distribution alignment module, and one domain-specific classifier alignment module. The common feature extractor is mainly composed of NSTEs; to avoid the computational cost of repeatedly training multiple networks, multi-channel, multi-domain feature sequences from the multiple source domains and the target domain are mapped into the same common deep feature space through parameter sharing within the feature extractor. Furthermore, the generated common features of each source-target pair are sent to the domain-specific distribution alignment module to mine domain-invariant representations between the known cutting parameters and the target cutting parameters, while the domain-specific feature distribution discrepancies between cutting parameters are measured with SWD. In addition, within the domain-specific classifier alignment module, while each domain-specific wear state classifier performs supervised training on its respective source domain samples, the module aligns the wear state probability outputs of the unlabeled target domain samples across the classifiers to obtain more reliable wear state predictions.
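To make this multi-branch structure concrete, the following PyTorch sketch outlines an MFSAN-style forward pass. It is illustrative only: the class name, layer sizes, and the `common_extractor` argument are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MFSANSketch(nn.Module):
    """Minimal sketch of the shared-extractor / per-source-branch layout."""
    def __init__(self, common_extractor: nn.Module, feat_dim: int,
                 num_sources: int, num_classes: int = 3):
        super().__init__()
        self.common = common_extractor                      # shared F1
        self.specific = nn.ModuleList([                     # per-source F_d^i
            nn.Sequential(nn.Linear(feat_dim, 128), nn.PReLU())
            for _ in range(num_sources)])
        self.classifiers = nn.ModuleList([                  # per-source T_d^i
            nn.Linear(128, num_classes) for _ in range(num_sources)])

    def forward(self, x_sources, x_target):
        # x_sources: list of M source batches; x_target: one unlabeled batch
        src_logits, src_feats, tgt_feats, tgt_logits = [], [], [], []
        common_t = self.common(x_target)                    # shared features, computed once
        for i, x_s in enumerate(x_sources):
            f_s = self.specific[i](self.common(x_s))        # source feature in space i
            f_t = self.specific[i](common_t)                # target feature in the same space
            src_feats.append(f_s)
            tgt_feats.append(f_t)
            src_logits.append(self.classifiers[i](f_s))
            tgt_logits.append(self.classifiers[i](f_t))
        return src_logits, src_feats, tgt_feats, tgt_logits
```

The four returned lists feed the three losses introduced in Sections 2.4–2.6.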

2.3. Common Feature Extractor

To accurately depict the tool wear state in milling, this section extracts typical time domain, frequency domain, and time-frequency domain features from several sensor channels and then generates sequences of time-series features as model input. After normalization in the preprocessing stage, the feature sequences share the same scale, which avoids problems such as gradient anomalies during model training. However, most of the feature sequences after this preprocessing may still exhibit non-stationary characteristics. To enhance the common feature extraction ability on these non-stationary feature sequences, this subsection applies the non-stationary Transformer encoder as the backbone network and develops the variable cutting parameter common feature extractor, as shown in Figure 4.
As shown in Figure 4, compared with the classic Transformer encoder [30], the series stationarization operation is conducted outside the NSTE [31], allowing the common feature extractor to obtain a smooth wear feature input sequence. This input sequence thus follows a stable distribution and generalizes more easily. The series stationarization operation contains instance normalization and de-normalization layers. Instance normalization performs translation and scaling operations on each input sample along the temporal dimension. For a single sample $x_{i,l} = [x_{i,l,1}, \ldots, x_{i,l,l_n}] \in \mathbb{R}^{l_n \times d_f}$ in a source domain $D_i^S = \{(x_{i,l}, y_{i,l})\}_{l=1}^{L}$, $l_n$ denotes the sequence length within a single sample and $d_f$ the feature dimension. The instance normalization operation is as follows:

$$\mu_{x_{i,l}} = \frac{1}{l_n} \sum_{t=1}^{l_n} x_{i,l,t}, \qquad \sigma_{x_{i,l}}^2 = \frac{1}{l_n} \sum_{t=1}^{l_n} \left( x_{i,l,t} - \mu_{x_{i,l}} \right)^2, \qquad x'_{i,l,t} = \frac{1}{\sigma_{x_{i,l}}} \odot \left( x_{i,l,t} - \mu_{x_{i,l}} \right)$$

where $\mu_{x_{i,l}}$ and $\sigma_{x_{i,l}}$ denote the normalized mean and standard deviation of sample $x_{i,l}$, respectively, both of dimension $\mathbb{R}^{d_f \times 1}$, and $x'_{i,l,t}$ denotes a single normalized timestep. $1/\sigma_{x_{i,l}}$ and $\odot$ denote element-wise division and multiplication, respectively.
After instance normalization, the distribution of sample $x_{i,l}$ is more stable. After the NSTE mapping $f_{\mathrm{NSTE}}(x'_{i,l})$, the encoder output enters the de-normalization layer for a reverse-scale transformation that restores the lost distribution information, and then enters the domain-specific distribution alignment module to obtain domain-invariant representations. The de-normalization layer operates as follows:

$$z'_{i,l} = f_{\mathrm{NSTE}}(x'_{i,l}), \qquad z_{i,l} = \sigma_{x_{i,l}} \odot z'_{i,l} + \mu_{x_{i,l}}$$

where $z'_{i,l}$ is the NSTE output and $z_{i,l}$ is the de-normalized output.
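As an illustration, the series stationarization wrapper in the two equations above can be sketched in a few lines of PyTorch. The `nste` argument stands in for any encoder module, and the epsilon term is an assumed numerical safeguard not stated in the paper.

```python
import torch

def stationarized_forward(nste, x, eps: float = 1e-5):
    """Instance-normalize, encode, then de-normalize; x: (batch, l_n, d_f)."""
    mu = x.mean(dim=1, keepdim=True)                         # temporal mean per sample
    sigma = x.std(dim=1, keepdim=True, unbiased=False) + eps # temporal std per sample
    x_norm = (x - mu) / sigma                                # instance normalization
    z = nste(x_norm)                                         # NSTE mapping f_NSTE
    return sigma * z + mu                                    # de-normalization
```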
The series stationarization operation produces a more stable distribution of the encoder input. However, the scaled dot-product self-attention mechanism inside the Transformer encoder is prone to over-stationarization when computing global temporal correlations on the stabilized inputs. For example, certain statistical feature sequences, such as the mean feature sequence, exhibit monotonicity along the time dimension similar to the tool wear trend. After instance normalization during model training, these statistical feature sequences are segmented and normalized into sequence segments with the same mean and variance, which follow much more similar distributions than the sequences before stationarization. When these segments enter the attention module for global temporal correlation computation, the scaled dot-product self-attention mechanism may fail to recognize the monotonicity associated with wear trends, weakening the extraction of high-level features that contribute to wear state identification. To this end, the scaled dot-product self-attention mechanism is replaced by a de-stationary self-attention mechanism inside the non-stationary encoder to approximate the attention map of the non-stationarized series, thereby mining the non-stationary temporal dependencies related to tool wear.
Based on the assumption of linearity and the translation invariance of the Softmax function, and to simplify the expression, the Softmax computation of the scaled dot-product attention over the non-stationarized feature sequence input $x_{i,l}$ can be rewritten as follows [31]:

$$\mathrm{Softmax}\left( \frac{Q K^\top}{\sqrt{d_k}} \right) = \mathrm{Softmax}\left( \frac{\sigma_{x_{i,l}}^2 Q' K'^\top + \mathbf{1} \mu_Q^\top K^\top}{\sqrt{d_k}} \right)$$

where $Q$ and $K$ are the query and key matrices of the non-stationarized input $x_{i,l}$, and $d_k$, equal to $d_f$, is the feature dimension of the key matrix. $Q'$ and $K'$ are the query and key matrices of the stationarized input $x'_{i,l}$. Moreover, $\sigma_{x_{i,l}}$ is a scalar approximation of the instance-normalization standard deviation, $\mu_Q \in \mathbb{R}^{d_k \times 1}$ is the mean of the query matrix $Q$ along the temporal direction, and $\mathbf{1} \in \mathbb{R}^{l_n \times 1}$ is an all-ones vector.
Furthermore, the de-stationary attention mechanism defines a positive scaling scalar $\tau = \sigma_{x_{i,l}}^2 \in \mathbb{R}^{+}$ and a shifting vector $\Delta = K \mu_Q \in \mathbb{R}^{l_n \times 1}$ as de-stationary factors. To learn the de-stationary factors effectively during training, multilayer perceptrons (MLPs) are applied as mappers that take as input the non-stationarized feature sequence $x_{i,l}$ together with its instance-normalization statistics $\mu_{x_{i,l}}$ and $\sigma_{x_{i,l}}$. The de-stationary attention mechanism can be expressed as follows:

$$\log \tau = \mathrm{MLP}\left( \sigma_{x_{i,l}}, x_{i,l} \right), \qquad \Delta = \mathrm{MLP}\left( \mu_{x_{i,l}}, x_{i,l} \right),$$
$$\mathrm{Attn}\left( Q', K', V', \tau, \Delta \right) = \mathrm{Softmax}\left( \frac{\tau Q' K'^\top + \mathbf{1} \Delta^\top}{\sqrt{d_k}} \right) V'$$

where $V'$ is the value matrix of the stationarized feature sequence input $x'_{i,l}$.
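The de-stationary attention computation itself reduces to a small modification of scaled dot-product attention. The sketch below assumes the projections of the stationarized sequence and the MLP-learned factors are provided; it is an illustrative rendering of the equation above, not the authors' code.

```python
import math
import torch
import torch.nn.functional as F

def destationary_attention(q, k, v, tau, delta):
    """q, k, v: (batch, l_n, d_k); tau: (batch, 1, 1); delta: (batch, l_n, 1)."""
    scores = tau * (q @ k.transpose(-2, -1))       # tau * Q' K'^T
    scores = scores + delta.transpose(-2, -1)      # + 1 Delta^T, broadcast over query rows
    attn = F.softmax(scores / math.sqrt(q.size(-1)), dim=-1)
    return attn @ v                                 # weighted sum of values V'
```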
In the common feature extractor, the series stationarization operation and the NSTE improve the predictability of non-stationary input feature sequences and fully exploit the non-stationary temporal dependencies related to tool wear. Finally, the flattening layer outputs high-level wear-related representations of each cutting parameter in the common feature space.

2.4. Domain-Specific Distribution Alignment Module

The domain-specific distribution alignment module sends each pair of common features from the source and target domains to domain-specific fully connected networks $F_d^i$ to extract and align domain-specific features, alleviating the challenge of directly aligning multiple cutting parameter distributions. Parameters are not shared among the $F_d^i$, and each $F_d^i$ also maps the common features of the target domain, yielding multiple domain-specific features. To mine the domain-invariant representations between the known cutting parameters and the target cutting parameters in each specific feature space, and to shrink the discrepancy of their distributions there, this section explicitly measures the features based on the sliced Wasserstein distance.
Wasserstein distance (WD) can mine the geometric relationships within the latent feature space and offers a meaningful metric when measuring discrepancies between feature distributions with little or no overlap. Furthermore, WD can avoid the vanishing gradient problem and reduce mode collapse during training [32]. Therefore, WD is widely used in loss function design [33] and domain adaptation research [27,34]. WD is defined as follows. Let $\|\cdot\|$ denote the L2 norm. For any $p \ge 1$, let $P_p(\mathbb{R}^d)$ denote the set of Borel probability measures with finite $p$-th moments on the metric space $(\mathbb{R}^d, \|\cdot\|)$ of a given dimension $d$. For any probability measures $\mu, \upsilon$ defined on $Z_1, Z_2 \subseteq \mathbb{R}^d$ with probability density functions $I_\mu$ and $I_\upsilon$, respectively, the $p$-th order WD between $\mu$ and $\upsilon$ is:

$$WD_p(\mu, \upsilon) := \left( \inf_{\pi \in \Pi(\mu, \upsilon)} \int_{Z_1 \times Z_2} \left\| z_1 - z_2 \right\|^p \, d\pi(z_1, z_2) \right)^{\frac{1}{p}}$$

where $\Pi(\mu, \upsilon)$ denotes the set of transport plans $\pi$ whose marginal distributions are $\mu$ and $\upsilon$, respectively.
Directly using WD in deep learning scenarios incurs high computational and storage complexity [35,36]. To reduce this complexity, Bonneel et al. [37,38] proposed the SWD, a metric derived from two ideas: the closed-form optimal transport expression for two one-dimensional distributions, and the transformation of a distribution into a set of projected one-dimensional distributions via the Radon transform. Let $S^{d-1}$ denote the unit sphere in the L2 norm for any dimension $d \in [2, +\infty)$. SWD uniformly samples projection directions on the unit sphere in the data ambient space and takes the expectation of the resulting one-dimensional optimal transport distances [33]. To facilitate calculation, the Monte Carlo method is usually used, drawing $N$ uniformly sampled projection directions $\{\theta_j\}_{j=1}^{N}$ from $S^{d-1}$ for approximation:

$$SWD_p^p(\mu, \upsilon) \approx \frac{1}{N} \sum_{j=1}^{N} WD_p^p\left( \mathcal{R}I_\mu(\cdot, \theta_j), \, \mathcal{R}I_\upsilon(\cdot, \theta_j) \right)$$

where $\mathcal{R}I_\mu(\cdot, \theta_j)$ and $\mathcal{R}I_\upsilon(\cdot, \theta_j)$ denote the Radon transforms of the densities along direction $\theta_j$.
By the above formula, SWD achieves lower computational cost and better scalability when calculating the discrepancy between two distributions, especially in high-dimensional statistical inference scenarios such as measuring distribution discrepancies in a latent feature space. Thus, this section uses the 2nd-order SWD to measure the domain-specific feature distribution discrepancy of each pair of source and target domains and takes the average as the domain-specific distribution alignment loss $\mathcal{L}_{\mathrm{swd}}$:

$$\mathcal{L}_{\mathrm{swd}} = \frac{1}{M} \sum_{i=1}^{M} SWD_2^2\left( F_d^i(F_1(x_i)), \; F_d^i(F_1(u)) \right)$$
By minimizing $\mathcal{L}_{\mathrm{swd}}$ during training, the feature distributions of each pair of known cutting parameters and target cutting parameters are aligned, and each domain-specific fully connected network obtains the corresponding domain-invariant representation.
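For illustration, the Monte Carlo estimate of the 2nd-order SWD between two feature batches can be sketched as follows; equal batch sizes and the number of projections are simplifying assumptions.

```python
import torch

def sliced_wasserstein_sq(feat_s, feat_t, num_proj: int = 128):
    """Monte Carlo 2nd-order SWD between two (n, d) feature batches."""
    d = feat_s.size(1)
    theta = torch.randn(num_proj, d, device=feat_s.device)
    theta = theta / theta.norm(dim=1, keepdim=True)      # uniform directions on S^{d-1}
    proj_s = feat_s @ theta.t()                          # 1-D projections, (n, num_proj)
    proj_t = feat_t @ theta.t()
    # Closed-form 1-D optimal transport: match sorted projections
    diff = torch.sort(proj_s, dim=0).values - torch.sort(proj_t, dim=0).values
    return diff.pow(2).mean()                            # SWD_2^2 estimate

# L_swd then averages this quantity over the M source/target feature pairs.
```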

2.5. Domain-Specific Classifier Alignment Module

In the domain-specific classifier alignment module, the wear state recognition network is a multi-output network without parameter sharing, composed of domain-specific wear state classifiers $T = \{T_d^i\}_{i=1}^{M}$. Each domain-specific wear state classifier $T_d^i$ is a Softmax classifier network. On the one hand, $T_d^i$ receives the domain-specific invariant representation of the corresponding source domain, identifies its wear state, and uses the cross-entropy loss as the classification loss to optimize the network parameters. The overall classification loss is calculated as follows:

$$\mathcal{L}_{\mathrm{task}} = \sum_{i=1}^{M} \mathbb{E}_{x \sim D_i^S} \, L_{ce}\left( T_d^i\left( F_d^i(F_1(x_i)) \right), y_i \right)$$

where $L_{ce}(\cdot, \cdot)$ denotes the cross-entropy loss function.
On the other hand, $T_d^i$ simultaneously receives the domain-specific invariant representations of the target domain samples and predicts the corresponding tool wear state. The outputs of the domain-specific wear state classifiers on the same target domain samples are generally inconsistent; especially when the target domain samples are close to the wear classification boundary, the outputs of the classifiers may differ significantly. To this end, this section calculates the pairwise differences between the probability outputs of the target domain samples on all classifiers and uses their absolute values as the output difference alignment loss $\mathcal{L}_{\mathrm{calign}}$:

$$\mathcal{L}_{\mathrm{calign}} = \frac{2}{M(M-1)} \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \mathbb{E}_{u \sim D^T} \left| T_d^i\left( F_d^i(F_1(u_k)) \right) - T_d^j\left( F_d^j(F_1(u_k)) \right) \right|$$
By minimizing $\mathcal{L}_{\mathrm{calign}}$, each domain-specific wear state classifier outputs similar wear state results for the same target domain samples. Finally, the proposed method averages the outputs of all classifiers to identify the wear state of the target domain samples.
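A sketch of this pairwise output alignment loss, assuming the list of target-domain logits produced by the hypothetical `MFSANSketch` above, might look as follows.

```python
import torch

def classifier_alignment_loss(tgt_logits):
    """Mean absolute discrepancy between every pair of classifier outputs."""
    probs = [torch.softmax(logits, dim=1) for logits in tgt_logits]
    m, loss = len(probs), 0.0
    for i in range(m - 1):
        for j in range(i + 1, m):
            loss = loss + (probs[i] - probs[j]).abs().mean()
    return 2.0 * loss / (m * (m - 1))      # normalize by the number of pairs
```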

2.6. Training Procedure for the Proposed Method

Combining the preceding content, the overall loss function of the proposed method based on the MFSAN is as follows:

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{task}} + \beta_1 \mathcal{L}_{\mathrm{swd}} + \beta_2 \mathcal{L}_{\mathrm{calign}}$$

where $\beta_1$ and $\beta_2$ are hyperparameter weights.
The hyperparameter weights $\beta_1$ and $\beta_2$ determine the relative importance of $\mathcal{L}_{\mathrm{swd}}$ and $\mathcal{L}_{\mathrm{calign}}$ during training. To balance the losses, the values of $\beta_1$ and $\beta_2$ are varied during training approximately as follows:

$$\beta_1 = \beta_2 = \frac{1.8}{1 + \exp(-10 \times \varsigma)} - 1$$

where $\varsigma$ is an adaptive parameter that increases linearly from 0 to 1 during training.
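Under this reconstruction of the schedule, the weight computation is a one-liner; the `step` and `total_steps` arguments are assumed training-progress counters so that $\varsigma$ rises linearly from 0 to 1.

```python
import math

def loss_weight(step: int, total_steps: int) -> float:
    """Adaptive weight for L_swd and L_calign over the course of training."""
    varsigma = step / total_steps                      # linear ramp from 0 to 1
    return 1.8 / (1.0 + math.exp(-10.0 * varsigma)) - 1.0
```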
Table 1 and Table 2 show the network parameters of the proposed method and the hyperparameters of the training stage, respectively. Figure 5 illustrates the overall training procedure of the proposed method. First, multi-channel force and vibration sensor data are collected online during machining, and data preprocessing converts the high-frequency raw data into multi-channel, multi-domain statistical feature sequences. The proposed method then selects sequence data covering the complete tool wear life cycle under M known cutting parameters as the source domain inputs and randomly selects several unlabeled sequences under the target cutting parameters as the target domain input. Next, the training network is constructed with the network parameters and hyperparameters above. During training, each input passes through the common feature extractor, the domain-specific distribution alignment module, and the domain-specific classifier alignment module in sequence, which compute the domain-specific distribution alignment loss $\mathcal{L}_{\mathrm{swd}}$, the classification loss $\mathcal{L}_{\mathrm{task}}$, and the output difference alignment loss $\mathcal{L}_{\mathrm{calign}}$, respectively. Minimizing these loss functions allows the network to learn domain-specific invariant representations for each pair of known and target cutting parameters, while each wear state classifier improves the wear state identification accuracy on the target parameter sequence data. After training, the model can accurately identify tool wear states for sequence data under the target parameters that did not participate in training.
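Putting the pieces together, one training step could combine the three losses as sketched below, reusing the hypothetical `MFSANSketch`, `sliced_wasserstein_sq`, `classifier_alignment_loss`, and `loss_weight` helpers defined earlier; the optimizer handling and batching are assumptions.

```python
import torch.nn.functional as F

def training_step(model, optimizer, x_sources, y_sources, x_target,
                  step, total_steps):
    """One optimization step of the overall loss L_total."""
    src_logits, src_feats, tgt_feats, tgt_logits = model(x_sources, x_target)
    # L_task: supervised cross-entropy summed over the M source domains
    l_task = sum(F.cross_entropy(logits, y)
                 for logits, y in zip(src_logits, y_sources))
    # L_swd: average SWD over the M source/target feature pairs
    l_swd = sum(sliced_wasserstein_sq(fs, ft)
                for fs, ft in zip(src_feats, tgt_feats)) / len(src_feats)
    # L_calign: pairwise classifier output alignment on target samples
    l_calign = classifier_alignment_loss(tgt_logits)
    beta = loss_weight(step, total_steps)              # beta_1 = beta_2
    loss = l_task + beta * l_swd + beta * l_calign
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```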

3. Experimental Research

3.1. Experiment Design

The experimental platform for variable cutting parameters is shown in Figure 6. The cutting process is square-shoulder climb milling along the X-axis on a DYNA TC500 three-axis milling machine using nine four-flute cemented carbide resharpened end mills with TiAlN coatings. The variable cutting parameters of the milling process mainly involve three factors: cutting speed $v_c$, radial depth of cut $a_e$, and axial depth of cut $a_p$. Through a three-factor, three-level orthogonal experiment, we obtained nine groups of cutting parameters; the cutting parameters corresponding to each tool are shown in Table 3. The workpiece is I-shaped 40Cr13 steel with a hardness of 290 HB. A dynamometer and an accelerometer were installed between the worktable and the workpiece to measure the cutting force and vibration signals at a sampling frequency of 10,000 Hz. After the signal amplifier amplified the electrical signals generated by the sensors, the data were transmitted to the local CNC system through the data acquisition module and the EtherCAT bus, and then to the edge side through a Kafka message queue for processing and storage.
During the experiment, each milling stroke included four tool paths. After a certain number of milling strokes, an industrial microscope was used to measure the flank wear of the four cutting edges of the milling cutter. The maximum flank wear value was taken as the wear value of the milling cutter to determine the tool wear state: slight wear (VB ≤ 85 μm), normal wear (85 μm < VB ≤ 165 μm), and severe wear (VB > 165 μm). Figure 7 shows images of the flank surface corresponding to the different tool wear states. According to the ISO 8688-1 standard [39], the tool is deemed blunt when the flank wear bandwidth reaches 300 μm.
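For reference, the wear-state labeling rule above can be expressed as a small helper (VB in micrometers); this is simply a restatement of the thresholds, not code from the study.

```python
def wear_state(vb_um: float) -> str:
    """Map a flank wear value VB (micrometers) to the three-class wear label."""
    if vb_um <= 85.0:
        return "slight"
    elif vb_um <= 165.0:
        return "normal"
    return "severe"
```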
Figure 8 depicts the time domain and amplitude spectrum of the Y-direction milling force signal at various wear states. Using the cutting frequency of 238 Hz as an example, the amplitude of the force signal grows as tool wear increases.

3.2. Multi-Source Domain Unsupervised Adaptive Tasks

The raw data obtained from the experiments require preliminary processing, including anomaly removal and data segmentation. After this preliminary processing, multi-dimensional statistical features are extracted, combined, and normalized. First, the non-overlapping sliding window method segments the multi-channel original time series data. Then, 11 common features are extracted from each segmented data block of each sensor channel; Table 4 shows the names and mathematical expressions of the employed time, frequency, and time-frequency domain features. Statistical feature extraction on each segment collects feature information related to the tool wear state and limits the impact of noise introduced during data collection. Next, the extracted features are stitched together to construct multi-dimensional data samples. Since the extracted statistical features have different value scales, to which model training is very sensitive, this section applies Z-score normalization to each feature along the sequence direction. Augmented Dickey-Fuller (ADF) [40] and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) [41] stationarity tests performed on each standardized feature sequence showed that many feature sequences exhibit non-stationary characteristics. The number of samples in each group of the resulting experimental dataset is shown in Table 5. Finally, 27 sets of multi-source unsupervised domain adaptation tasks, shown in Table A1, are designed for model performance evaluation. For Tasks 1–9, the cutting speed of the target domain does not appear in the source domains; for Tasks 10–18, the radial depth of cut of the target domain does not appear in the source domains; for Tasks 19–27, the axial depth of cut of the target domain does not appear in the source domains. For example, in Task 1, the source domains are the data from cutting parameters N1–N6 ($v_c$ = 135 or 140 m/min), and the target domain is the data from cutting parameter N7 ($v_c$ = 150 m/min). During model training, some unlabeled data in the target domain are randomly selected to participate in training, and the remaining data are used to test the performance of the model. The results of the above tasks are shown in Table A1.
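As an illustration of this preprocessing pipeline, the sketch below performs non-overlapping windowing, extracts three stand-in statistical features (the study uses 11), and applies Z-score normalization along the sequence direction; the function name and window size are hypothetical.

```python
import numpy as np

def preprocess(signal: np.ndarray, win: int) -> np.ndarray:
    """Window a 1-D sensor channel, extract per-window features, Z-score them."""
    n_win = len(signal) // win
    windows = signal[: n_win * win].reshape(n_win, win)    # non-overlapping segmentation
    feats = np.stack([
        windows.mean(axis=1),                              # time-domain mean
        windows.std(axis=1),                               # time-domain std
        np.abs(np.fft.rfft(windows, axis=1)).mean(axis=1)  # mean spectral amplitude
    ], axis=1)                                             # (n_win, 3) feature sequence
    # Z-score normalization of each feature along the sequence direction
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

# Stationarity of each normalized feature sequence can then be checked,
# e.g., with the adfuller and kpss tests from statsmodels.
```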

3.3. Design of Ablation Experiment

The tool wear state identification method with variable cutting parameters proposed in this paper is based on the multi-source unsupervised domain adaptive training strategy. The NSTE builds non-stationary temporal correlations related to tool wear in the common feature extractor. Meanwhile, SWD is applied in the domain-specific distribution alignment module to measure the feature distribution difference between each pair of known cutting parameters and target cutting parameters in a specific feature space. In order to analyze and evaluate the effectiveness of each of the above key components in identifying tool wear state under varying cutting parameters, this section conducts ablation experiments on the common feature extractor network, the metric function in the domain-specific distribution alignment module, and the overall training strategy.
Firstly, to analyze the effectiveness of the series stationarization operation and the NSTE in the common feature extractor, five comparison networks (M1–M5) were designed. Compared with the proposed method, M1 retained the series stationarization operation but replaced the NSTEs with two classic Transformer encoders for common feature extraction. M2–M5 all removed the series stationarization operation: M2 used two classic Transformer encoders in place of the NSTE, M3 adopted a Squeeze-and-Excitation module [42], and M4 and M5 adopted 4-layer BiLSTM and 4-layer BiGRU networks, respectively.
Secondly, two comparison methods (M6 and M7) were designed to analyze the impact of the feature distribution discrepancy metric function in the domain-specific distribution alignment module on the wear state identification accuracy. M6 applies MMD as the metric function, a commonly used metric function in transfer learning tasks [43]. Specifically, it uses the same kernel function to map the domain-specific features under the known cutting parameters and target cutting parameters obtained in each domain-specific space to the regenerated Hilbert space to measure the discrepancy in feature distribution. M7 adopts the Correlation Alignment (CORAL) metric, which measures the covariance second-order statistical feature difference of domain-specific features under known cutting parameters and target cutting parameters to obtain the discrepancy in feature distribution [44].
Next, in order to explore the effectiveness of the multi-source unsupervised domain adaptive training strategy based on MFSAN for identifying tool wear states under varying cutting parameters, comparative methods (M8–M11) are designed to conduct ablation experiments. Among them, M8 and M9 integrate data from multiple known cutting parameters into a single training set, treat the target cutting parameter data in each group of tasks as a test set, and then adopt the supervised training strategy. M8 uses a method that combines Transformer and LSTM [45]. Furthermore, the training strategies in M10 and M11 are single-source unsupervised domain adaptive strategies. Compared with the domain division strategy of the proposed method, M10 and M11 regard the data under multiple known cutting parameters as a single source domain and select the same target cutting parameter-unlabeled data as the target domain in the proposed method. Specifically, M10 applies the Deep Adaptation Network (DAN) [46] as an overall training strategy, and M11 uses the Deep Subdomain Adaptation Network (DSAN) [47]. The feature extraction network in each comparison method is the same as the common feature extractor network of the proposed method, and the classifiers are fully connected networks with PReLU as the activation function (dimension: 792-128-64-32-3).
Finally, methods M1–M11 were employed to conduct the 27 sets of tool wear state identification tasks outlined in Table A1. Comparing the accuracy of M1–M11 with that of the proposed method allows the proposed method and its component modules to be analyzed and evaluated.

4. Analysis and Discussion

4.1. Results Comparison and Analysis

The accuracy of the proposed method on the 27 groups of tasks is shown in Table A1. Tasks 10–18, which partition the data set by radial depth of cut, have the highest overall accuracy of 93.93%. Tasks 19–27, which partition by axial depth of cut, have the lowest overall accuracy of 92.11%. The overall accuracy of Tasks 1–9, which partition by cutting speed, is similar to that of Tasks 10–18 at 93.63%. The wear state recognition accuracy of the proposed method ranges from 86.01% to 98.45% across the tasks, with an average accuracy of 93.22%. The accuracy falls below 90% on only four tasks (Tasks 3, 12, 21, and 24) and exceeds 90% on all others. Overall, the proposed method can identify tool wear states under variable cutting parameters with high accuracy.
This section further uses confusion matrices and prediction results to analyze the performance of the proposed method. First, taking Tasks 7–9 as examples, the recognition confusion matrices and prediction results for the target domain cutting data are depicted in Figure 9. The average recognition accuracy of the proposed method on these three tasks is 93.55%. Figure 9 illustrates that the primary error source of the proposed method is the misjudgment of wear state data close to the classification boundaries: in Tasks 7–9, actual slight wear and actual normal wear data close to the boundaries are prematurely misjudged as normal and severe wear, respectively. For Task 8, 12.29% of the actual slight wear data and 14.37% of the actual normal wear data are misjudged in this way.
In addition, the three tasks with the lowest recognition accuracy among the 27, namely Task 3 (88.43%), Task 12 (88.26%), and Task 24 (86.01%), were analyzed further. Figure 10 depicts the corresponding recognition confusion matrices and target domain prediction results. The target domains of these three tasks all correspond to the N9 cutting parameters. In these tasks, the proposed method achieved 98.15% and 100% recognition accuracy on actual slight wear and actual severe wear samples, respectively, but only 78.88% on actual normal wear samples. From the prediction results in Figure 10, the main misjudgment on the N9 data is that actual normal wear samples close to the classification boundary are consistently misjudged as severe wear in advance: in Tasks 3, 12, and 24, approximately 18.32%, 19.52%, and 23.72% of the actual late normal wear samples were prematurely misjudged, respectively. This phenomenon may arise because the workpiece in Figure 6 has a thin-walled structure after extensive cutting, and continuing to machine it under the N9 cutting parameters leads to chatter. The chatter interference was coupled with the tool wear-related information, the feature extractor captured the resulting features, and the model made identification errors. In addition, the increase in signal amplitude caused by the larger N9 cutting parameters may have exacerbated this problem, and the tool life under N9 is also the shortest among all groups of cutting parameters. This viewpoint is supported by Table A1: among the 27 sets of tasks, the proposed method exhibits lower recognition accuracy in Tasks 10, 18, 19, and 21, whose target domains are N3 and N7. These two datasets correspond to relatively large machining parameters: N3 has the largest $a_e$ and $a_p$, while N7 has the largest $v_c$ and $a_p$. It is worth noting that the proposed method correctly identifies 100% of the actual mid- to late-stage severe wear data in all tasks.
Among the 27 tasks, Task 17 (98.45%), with the highest wear state recognition accuracy, and Task 24 (86.01%), with the lowest, were studied further. The t-distributed stochastic neighbor embedding (t-SNE) method was used to visualize the features within the domain-specific wear state classifiers of the proposed method on the two tasks, as shown in Figure 11 and Figure 12, respectively. In Figure 11 and Figure 12, red, blue, and yellow represent the slight, normal, and severe wear states, respectively. Samples filled with gray are the features of the training data of each known cutting parameter in the corresponding domain-specific classifier, while samples filled in red, blue, and yellow are the features of the target cutting parameter test data in the corresponding domain-specific classifiers.
In Figure 11, each domain-specific network can separate the data under different wear states in each pair of source and target domains. Meanwhile, the data in each pair of source domain and target domain under the same wear state achieve a better degree of mutual aggregation. In Figure 12, some of the actual normal wear state data of the target cutting parameters are aggregated with the actual severe wear data in each known parameter, consistent with the data prediction results in Figure 10c. Overall, the proposed method enables each domain-specific network to learn the domain-invariant representation between each pair of known cutting parameters and target cutting parameters through the two-stage alignment, achieving cross-domain inter-class separation of the tool wear state under variable cutting parameters.

4.2. Ablation Studies

The accuracy of the proposed method and methods M1–M5 in tool wear state identification across the 27 sets of variable cutting parameters is presented in Figure 13 and Table A2. First, the proposed method achieves the highest accuracy on the largest number of the 27 tasks and attains the highest average accuracy of 93.22%; compared with methods M1–M5, its average accuracy is higher by 1.84%, 1.41%, 2.36%, 2.70%, and 2.50%, respectively. Second, the proposed method achieves higher recognition accuracy and the best accuracy stability among the compared methods. Specifically, the highest accuracies of the proposed method and M1–M5 on individual tasks were all around 98% to 99%, whereas the lowest accuracies were 86.01%, 80.17%, 83.39%, 81.05%, 79.96%, and 78.21%, respectively, giving differences between the highest and lowest accuracies of 12.44%, 19.11%, 15.15%, 18.23%, 18.59%, and 20.51%. Third, comparing the proposed method with M1–M3 shows that using only the series stationarization operation on a classic Transformer encoder architecture reduces performance, possibly because the scaled dot-product attention mechanism degrades in identifying non-stationary temporal features related to wear trends after over-stationarization. The series stationarization operation and the de-stationary attention mechanism act as complementary modules that jointly participate in extracting common wear features to enhance accuracy. In addition, comparing the proposed method (PM) with M4 and M5 shows that the Transformer encoder-based common feature extractor performs better, indicating that the wear-related global temporal correlations it constructs are more conducive to accurate wear state identification.
The accuracy of the proposed method and methods M6–7 in tool wear state identification across 27 sets of variable cutting parameters is presented in Figure 14 and Table A2. First, as shown in Figure 14, the proposed method achieved the highest accuracy in 17 wear state identification tasks, compared with M6 (7 tasks) and M7 (5 tasks). The average accuracy rates of these three methods on each group of tasks were 93.22%, 88.00%, and 91.23%, respectively. Second, compared with the other two methods, the SWD feature distribution discrepancy measure can achieve higher recognition accuracy. Meanwhile, the stability of the wear state recognition accuracy was also the best. Specifically, the highest accuracy rates of the proposed method, M6, and M7, for each group of tasks were all over 98%. The lowest accuracy rates on each group of tasks were 86.01%, 69.77%, and 82.54%, respectively. The differences between the highest and lowest recognition accuracy rates were 12.44%, 29.80%, and 16.19%, respectively. The above results show the effectiveness of utilizing SWD as a metric function to align domain-specific distributions and learn domain-specific domain-invariant representations. Thus, SWD can help improve identification accuracy when used in tool wear state identification tasks under variable cutting parameters.
The accuracy of the proposed method and methods M8–11 in tool wear state identification across 27 sets of variable cutting parameters is presented in Figure 15 and Table A2. As depicted in Figure 15, the accuracy of M8 and M9 was notably worse than that of the other three methods, with an average accuracy of only 51.87% (M8) and 52.67% (M9) across 27 tasks. This result indicates that relying solely on a supervised training strategy for identifying tool wear states under varying cutting parameters is insufficient to fulfill the few-shot scenario requirements. Due to the alteration in the known and target cutting parameters, there is a notable discrepancy in the data feature distributions. Consequently, the supervised training strategy can lead to the network model overfitting.
The average accuracies of M10 and M11 on the 27 tasks were 78.78% and 77.70%, respectively, higher than M9 (52.67%); however, compared with the proposed method, their average accuracies dropped by 14.44% and 15.52%, respectively. Moreover, compared with M8–M11, the proposed method achieved the best wear state identification accuracy on all 27 groups of tasks. It can be inferred that when multiple known cutting parameter data sets are merged into a single source domain, it is difficult to learn a common domain-invariant representation of the known and target cutting parameter data in a single common feature space. Under a single-source unsupervised domain adaptive training strategy, subtle feature differences between the individual cutting parameters may therefore go unrecognized, impairing wear state identification performance under variable cutting parameters.

5. Conclusions and Future Works

A novel wear state identification method with variable cutting parameters based on the multi-source unsupervised adaptive training strategy is proposed. The performance and effectiveness of the proposed method were evaluated and analyzed. The main conclusions are as follows:
(1)
A multi-source unsupervised domain adaptive training strategy based on MFSAN boosts tool wear state identification accuracy under variable cutting parameter scenarios. The strategy fully utilizes multiple known cutting parameter data sets and effectively achieves mutual separation of wear states under varied cutting parameters by aligning domain-specific feature distribution and domain-specific classifier output in two stages.
(2)
The common feature extractor based on the NSTE and the domain-specific feature distribution measure with SWD assist in improving the wear state classification performance.
(3)
The effectiveness of the proposed method was evaluated through tool wear state identification tasks with variable cutting parameters. Across the 27 sets of tasks, the proposed method achieves an average accuracy of 93.22%, an improvement of 14.44% and 15.52% over the DAN- and DSAN-based strategies, respectively. The use of NSTE and SWD improves the recognition accuracy of the proposed method by 1.41% and 1.99%, respectively.
Although this paper has studied tool wear identification under variable cutting parameter conditions, there is still room for improvement. Further studies could address the generalization of wear monitoring tasks under complex variable working conditions, such as variable processing paths and cross-equipment machining, or interpretable models for simultaneous tool wear and breakage detection.

Author Contributions

Conceptualization, Z.C. and W.L.; methodology, Z.C. and H.J.; software, Z.C., W.L. and J.S.; validation, Z.C., W.L. and J.S.; formal analysis, Z.C. and H.F.; writing—original draft preparation, Z.C.; writing—review and editing, Z.C., W.L., J.S., H.J. and H.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

Author Wangyang Li was employed by Inspur Digital Enterprise Technology Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A

Table A1. Task division and the wear state recognition accuracy of the proposed method in each task.
No.     Source Domain             Target Domain   Accuracy (%)   Average Accuracy (%)   Overall Accuracy (%)
Task1   N1, N2, N3, N4, N5, N6    N7              93.41          92.97                  93.63
Task2   N1, N2, N3, N4, N5, N6    N8              97.09
Task3   N1, N2, N3, N4, N5, N6    N9              88.43
Task4   N1, N2, N3, N7, N8, N9    N4              93.93          94.36
Task5   N1, N2, N3, N7, N8, N9    N5              95.04
Task6   N1, N2, N3, N7, N8, N9    N6              94.12
Task7   N4, N5, N6, N7, N8, N9    N1              96.07          93.55
Task8   N4, N5, N6, N7, N8, N9    N2              90.34
Task9   N4, N5, N6, N7, N8, N9    N3              94.24
Task10  N1, N4, N7, N2, N5, N8    N3              90.79          91.49                  93.93
Task11  N1, N4, N7, N2, N5, N8    N6              95.43
Task12  N1, N4, N7, N2, N5, N8    N9              88.26
Task13  N1, N4, N7, N3, N6, N9    N2              92.19          94.84
Task14  N1, N4, N7, N3, N6, N9    N5              95.44
Task15  N1, N4, N7, N3, N6, N9    N8              96.91
Task16  N2, N5, N8, N3, N6, N9    N1              97.02          95.44
Task17  N2, N5, N8, N3, N6, N9    N4              98.45
Task18  N2, N5, N8, N3, N6, N9    N7              90.84
Task19  N1, N6, N8, N2, N4, N9    N3              91.12          91.72                  92.11
Task20  N1, N6, N8, N2, N4, N9    N5              95.04
Task21  N1, N6, N8, N2, N4, N9    N7              89.01
Task22  N1, N6, N8, N3, N5, N7    N2              91.08          90.70
Task23  N1, N6, N8, N3, N5, N7    N4              95.00
Task24  N1, N6, N8, N3, N5, N7    N9              86.01
Task25  N2, N4, N9, N3, N5, N7    N1              93.81          93.92
Task26  N2, N4, N9, N3, N5, N7    N6              93.25
Task27  N2, N4, N9, N3, N5, N7    N8              94.72
Table A2. Results of the ablation experiment.
No.       Accuracy (%)
          PM      M1      M2      M3      M4      M5      M6      M7      M8      M9      M10     M11
Task1     93.41   85.47   83.39   86.45   80.95   85.47   74.97   84.01   51.84   37.36   68.50   61.05
Task2     97.09   98.00   98.54   99.27   95.99   98.18   97.27   97.45   75.57   44.81   85.06   85.06
Task3     88.43   86.70   86.18   84.97   83.94   85.49   84.28   83.77   61.98   26.98   76.34   73.92
Task4     93.93   95.60   93.10   93.21   93.93   92.62   94.88   93.93   58.13   39.38   85.36   88.93
Task5     95.04   95.24   95.64   96.63   94.05   93.85   95.83   95.83   48.21   58.33   76.19   77.58
Task6     94.12   94.77   94.55   82.79   91.29   84.53   98.91   93.46   79.44   82.52   81.92   80.39
Task7     96.07   95.83   93.10   92.14   92.74   94.88   79.17   95.24   43.75   57.02   76.07   74.05
Task8     90.34   86.25   90.34   86.37   87.61   87.98   69.77   86.25   73.30   56.69   76.95   72.86
Task9     94.24   87.50   91.61   91.94   94.08   94.90   81.91   92.11   66.75   46.36   72.20   67.43
Task10    90.79   91.61   93.26   91.45   93.75   95.23   91.61   92.43   64.75   54.13   82.24   76.65
Task11    95.43   90.85   91.94   92.16   85.19   83.22   99.56   94.99   79.93   84.64   83.88   86.49
Task12    88.26   88.77   89.81   88.43   89.81   87.39   80.14   87.91   57.68   28.28   73.58   73.58
Task13    92.19   87.11   86.62   88.23   89.47   88.85   76.33   87.61   43.85   87.08   74.60   70.51
Task14    95.44   94.84   96.23   92.06   90.48   93.65   94.44   94.64   37.20   49.85   76.59   78.37
Task15    96.90   97.09   97.45   98.36   98.54   98.73   98.00   98.73   39.63   60.38   76.50   82.33
Task16    97.02   94.52   93.81   93.57   92.98   92.62   88.81   95.12   84.38   56.21   87.02   84.05
Task17    98.45   99.29   95.36   95.00   94.64   96.31   96.31   95.12   17.86   47.23   90.60   87.50
Task18    90.84   89.87   89.99   94.63   90.48   90.72   77.05   82.54   51.84   33.61   77.17   70.09
Task19    91.12   89.64   93.26   92.43   92.60   92.27   85.86   87.17   85.63   59.56   81.09   79.44
Task20    95.04   93.45   95.64   92.26   92.86   94.25   95.04   94.64   19.64   70.83   69.64   74.60
Task21    89.01   88.52   85.23   88.65   89.38   86.08   75.70   91.21   11.03   12.27   67.16   68.50
Task22    91.08   86.00   86.37   84.14   88.10   87.49   84.51   83.02   82.67   73.79   78.32   78.69
Task23    95.00   96.91   96.91   96.79   93.45   97.14   96.07   94.29   17.86   47.86   85.12   88.21
Task24    86.01   86.01   86.53   87.05   85.84   87.22   83.77   83.59   26.95   27.11   77.72   67.53
Task25    93.81   91.79   89.88   88.45   86.55   86.91   87.74   92.74   58.64   52.90   81.55   77.50
Task26    93.25   80.17   90.20   81.05   79.96   78.21   92.38   91.07   45.07   65.36   85.19   89.33
Task27    94.72   95.63   94.17   94.72   95.45   95.26   95.81   94.35   17.05   61.61   80.51   83.24
Average   93.22   91.39   91.82   90.86   90.52   90.72   88.00   91.23   51.87   52.67   78.78   77.70

References

  1. Brito, L.C.; da Silva, M.B.; Viana Duarte, M.A. Identification of cutting tool wear condition in turning using self-organizing map trained with imbalanced data. J. Intell. Manuf. 2021, 32, 127–140. [Google Scholar] [CrossRef]
  2. Kong, D.; Chen, Y.; Li, N. Gaussian process regression for tool wear prediction. Mech. Syst. Signal Process. 2018, 104, 556–574. [Google Scholar] [CrossRef]
  3. Zhou, Y.; Xue, W. A Multisensor Fusion Method for Tool Condition Monitoring in Milling. Sensors 2018, 18, 3866. [Google Scholar] [CrossRef]
  4. Zeng, Y.; Liu, R.; Liu, X. A novel approach to tool condition monitoring based on multi-sensor data fusion imaging and an attention mechanism. Meas. Sci. Technol. 2021, 32, 055601. [Google Scholar] [CrossRef]
  5. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  6. Ren, L.; Jia, Z.; Laili, Y.; Huang, D. Deep Learning for Time-Series Prediction in IIoT: Progress, Challenges, and Prospects. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 15072–15091. [Google Scholar] [CrossRef]
  7. Zhang, X.; Han, C.; Luo, M.; Zhang, D. Tool Wear Monitoring for Complex Part Milling Based on Deep Learning. Appl. Sci. 2020, 10, 6916. [Google Scholar] [CrossRef]
  8. Loizou, J.; Tian, W.; Robertson, J.; Camelio, J. Automated wear characterization for broaching tools based on machine vision systems. J. Manuf. Syst. 2015, 37, 558–563. [Google Scholar] [CrossRef]
  9. Li, Z.; Liu, X.; Incecik, A.; Gupta, M.K.; Krolczyk, G.M.; Gardoni, P. A novel ensemble deep learning model for cutting tool wear monitoring using audio sensors. J. Manuf. Process. 2022, 79, 233–249. [Google Scholar] [CrossRef]
  10. Wang, J.; Yan, J.; Li, C.; Gao, R.X.; Zhao, R. Deep heterogeneous GRU model for predictive analytics in smart manufacturing: Application to tool wear prediction. Comput. Ind. 2019, 111, 1–14. [Google Scholar] [CrossRef]
  11. Yu, Y.; Guo, L.; Gao, H.; Liu, Y.; Feng, T. Pareto-Optimal Adaptive Loss Residual Shrinkage Network for Imbalanced Fault Diagnostics of Machines. IEEE Trans. Ind. Inform. 2022, 18, 2233–2243. [Google Scholar] [CrossRef]
  12. Li, W.; Fu, H.; Han, Z.; Zhang, X.; Jin, H. Intelligent tool wear prediction based on Informer encoder and stacked bidirectional gated recurrent unit. Robot. Comput.-Integr. Manuf. 2022, 77, 102368. [Google Scholar] [CrossRef]
  13. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
  14. Li, J.; Lu, J.; Chen, C.; Ma, J.; Liao, X. Tool wear state prediction based on feature-based transfer learning. Int. J. Adv. Manuf. Technol. 2021, 113, 3283–3301. [Google Scholar] [CrossRef]
  15. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A Survey on Deep Transfer Learning. In Artificial Neural Networks and Machine Learning—ICANN 2018, Part III, Proceedings of the 27th International Conference on Artificial Neural Networks, Rhodes, Greece, 4–7 October 2018; Lecture Notes in Computer Science; Kurkova, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I., Eds.; Springer International Publishing: Cham, Switzerland, 2018; Volume 11141, pp. 270–279. [Google Scholar] [CrossRef]
  16. Zhang, N.; Zhao, J.; Ma, L.; Kong, H.; Li, H. Tool Wear Monitoring Based on Transfer Learning and Improved Deep Residual Network. IEEE Access 2022, 10, 119546–119557. [Google Scholar] [CrossRef]
  17. Bahador, A.; Du, C.; Ng, H.P.; Dzulqarnain, N.A.; Ho, C.L. Cost-effective classification of tool wear with transfer learning based on tool vibration for hard turning processes. Measurement 2022, 201, 111701. [Google Scholar] [CrossRef]
  18. Shi, Y.; Ying, X.; Yang, J. Deep Unsupervised Domain Adaptation with Time Series Sensor Data: A Survey. Sensors 2022, 22, 5507. [Google Scholar] [CrossRef]
  19. Zhang, S.; Su, L.; Gu, J.; Li, K.; Zhou, L.; Pecht, M. Rotating machinery fault detection and diagnosis based on deep domain adaptation: A survey. Chin. J. Aeronaut. 2023, 36, 45–74. [Google Scholar] [CrossRef]
  20. Huang, Z.; Shao, J.; Zhu, J.; Zhang, W.; Li, X. Tool wear condition monitoring across machining processes based on feature transfer by deep adversarial domain confusion network. J. Intell. Manuf. 2024, 35, 1079–1105. [Google Scholar] [CrossRef]
  21. He, J.; Sun, Y.; Yin, C.; He, Y.; Wang, Y. Cross-domain adaptation network based on attention mechanism for tool wear prediction. J. Intell. Manuf. 2023, 34, 3365–3387. [Google Scholar] [CrossRef]
  22. Sun, W.; Zhou, J.; Sun, B.; Zhou, Y.; Jiang, Y. Markov Transition Field Enhanced Deep Domain Adaptation Network for Milling Tool Condition Monitoring. Micromachines 2022, 13, 873. [Google Scholar] [CrossRef]
  23. Li, S.; Huang, S.; Li, H.; Liu, W.; Wu, W.; Liu, J. Multi-condition tool wear prediction for milling CFRP base on a novel hybrid monitoring method. Meas. Sci. Technol. 2024, 35, 035017. [Google Scholar] [CrossRef]
  24. Li, K.; Chen, M.; Lin, Y.; Li, Z.; Jia, X.; Li, B. A novel adversarial domain adaptation transfer learning method for tool wear state prediction. Knowl.-Based Syst. 2022, 254, 109537. [Google Scholar] [CrossRef]
  25. Liu, D.; Cui, L.; Wang, G.; Cheng, W. Interpretable domain adaptation transformer: A transfer learning method for fault diagnosis of rotating machinery. Struct. Health Monit. 2024. [Google Scholar] [CrossRef]
  26. Kim, G.; Yang, S.M.; Kim, S.; Kim, D.Y.; Choi, J.G.; Park, H.W.; Lim, S. A multi-domain mixture density network for tool wear prediction under multiple machining conditions. Int. J. Prod. Res. 2023, 5, 1–20. [Google Scholar] [CrossRef]
  27. Zhu, Y.; Zi, Y.; Xu, J.; Li, J. An unsupervised dual-regression domain adversarial adaption network for tool wear prediction in multi-working conditions. Measurement 2022, 200, 111644. [Google Scholar] [CrossRef]
  28. Wilson, G.; Cook, D.J. A Survey of Unsupervised Deep Domain Adaptation. ACM Trans. Intell. Syst. Technol. 2020, 11, 1–46. [Google Scholar] [CrossRef]
  29. Zhu, Y.; Zhuang, F.; Wang, D. Aligning Domain-Specific Distribution and Classifier for Cross-Domain Classification from Multiple Sources. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5989–5996. [Google Scholar] [CrossRef]
  30. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems 30 (NIPS 2017); Guyon, I., Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; The MIT Press: Cambridge, MA, USA, 2017; Volume 30. [Google Scholar]
  31. Liu, Y.; Wu, H.; Wang, J.; Long, M. Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting. In Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2022. [Google Scholar]
  32. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein Generative Adversarial Networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70. [Google Scholar]
  33. Frogner, C.; Zhang, C.; Mobahi, H.; Araya-Polo, M.; Poggio, T. Learning with a Wasserstein Loss. In Advances in Neural Information Processing Systems 28 (NIPS 2015); Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R., Eds.; The MIT Press: Cambridge, MA, USA, 2015; Volume 28. [Google Scholar]
  34. Chen, P.; Zhao, R.; He, T.; Wei, K.; Yang, Q. Unsupervised domain adaptation of bearing fault diagnosis based on Join Sliced Wasserstein Distance. ISA Trans. 2022, 129, 504–519. [Google Scholar] [CrossRef]
  35. Nguyen, K.; Nguyen, D.; Ho, N.L. Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
  36. Damodaran, B.B.; Kellenberger, B.; Flamary, R.; Tuia, D.; Courty, N. DeepJDOT: Deep Joint Distribution Optimal Transport for Unsupervised Domain Adaptation. In Proceedings of the Computer Vision—ECCV 2018, Lecture Notes in Computer Science, Part IV, Munich, Germany, 8–14 September 2018; Volume 11208, pp. 467–483. [Google Scholar] [CrossRef]
  37. Helgason, S. Integral Geometry and Radon Transforms; Springer: New York, NY, USA, 2010. [Google Scholar]
  38. Bonneel, N.; Rabin, J.; Peyre, G.; Pfister, H. Sliced and Radon Wasserstein Barycenters of Measures. J. Math. Imaging Vis. 2015, 51, 22–45. [Google Scholar] [CrossRef]
  39. ISO 8688-1; Tool Life Testing in Milling. Part 1: Face Milling. ISO: Geneva, Switzerland, 1989.
  40. Worden, K.; Iakovidis, I.; Cross, E.J. On Stationarity and the Interpretation of the ADF Statistic. In Dynamics of Civil Structures: Proceedings of the 36th IMAC, A Conference and Exposition on Structural Dynamics 2018, Orlando, FL, USA, 12–15 February 2018; Conference Proceedings of the Society for Experimental Mechanics Series; Pakzad, S., Ed.; Springer International Publishing: Cham, Switzerland, 2019; Volume 2, pp. 29–38. [Google Scholar] [CrossRef]
  41. Kagalwala, A. kpsstest: A command that implements the Kwiatkowski, Phillips, Schmidt, and Shin test with sample-specific critical values and reports p-values. Stata J. 2022, 22, 269–292. [Google Scholar] [CrossRef]
  42. Li, W.; Fu, H.; Zhuo, Y.; Liu, C.; Jin, H. Semi-supervised multi-source meta-domain generalization method for tool wear state prediction under varying cutting conditions. J. Manuf. Syst. 2023, 71, 323–341. [Google Scholar] [CrossRef]
  43. Long, M.; Cao, Y.; Cao, Z.; Wang, J.; Jordan, M. Transferable Representation Learning with Deep Adaptation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 3071–3085. [Google Scholar] [CrossRef] [PubMed]
  44. Sun, B.; Saenko, K. Deep CORAL: Correlation Alignment for Deep Domain Adaptation. In Proceedings of the Computer Vision—ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10, 15–16 October 2016; Hua, G., Jégou, H., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 443–450. [Google Scholar]
  45. Sun, H.; Jin, H.; Zhuo, Y.; Ding, Y.; Guo, Z.; Han, Z. Investigation on a chatter detection method based on meta learning for machining multiple types of workpieces. J. Manuf. Process. 2024, 131, 1815–1832. [Google Scholar] [CrossRef]
  46. Long, M.; Cao, Y.; Wang, J.; Jordan, M.I. Learning Transferable Features with Deep Adaptation Networks. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; Volume 37, pp. 97–105. [Google Scholar]
  47. Zhu, Y.; Zhuang, F.; Wang, J.; Ke, G.; Chen, J.; Bian, J.; Xiong, H.; He, Q. Deep Subdomain Adaptation Network for Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 1713–1722. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of tool wear state identification based on multi-source unsupervised domain adaptation.
Figure 2. Multiple feature space adaptation network based on two-stage alignment.
Figure 3. Tool wear state identification method with varying cutting parameters based on MFSAN.
Figure 4. Network structure of the common feature extractor and the de-stationary attention mechanism.
Figure 5. Training procedure of the tool wear state identification method with varying cutting parameters based on the MFSAN.
Figure 6. Variable cutting parameters milling experimental platform.
Figure 7. Images of flank wear at different wear states: (a) slight wear (54.907 μm), (b) normal wear (107.298 μm), (c) severe wear (293.181 μm).
Figure 8. Cutting force signals in different wear states: (a) time domain diagram (slight wear), (b) frequency domain diagram (slight wear), (c) time domain diagram (normal wear), (d) frequency domain diagram (normal wear), (e) time domain diagram (severe wear), (f) frequency domain diagram (severe wear).
Figure 9. Performance of the proposed method in Tasks 7–9: (a) confusion matrix in Task 7, (b) prediction results in Task 7, (c) confusion matrix in Task 8, (d) prediction results in Task 8, (e) confusion matrix in Task 9, (f) prediction results in Task 9. In the confusion matrices, a deeper color signifies a higher proportion.
Figure 10. Performance of the proposed method in Tasks 3, 12, and 24: (a) confusion matrix in Task 3, (b) prediction results in Task 3, (c) confusion matrix in Task 12, (d) prediction results in Task 12, (e) confusion matrix in Task 24, (f) prediction results in Task 24. In the confusion matrices, a deeper color signifies a higher proportion.
Figure 11. Feature visualization of each domain-specific classifier in Task 17: (a) N2 (source)–N4 (target), (b) N5 (source)–N4 (target), (c) N8 (source)–N4 (target), (d) N3 (source)–N4 (target), (e) N6 (source)–N4 (target), (f) N9 (source)–N4 (target).
Figure 12. Feature visualization of each domain-specific classifier in Task 24: (a) N1 (source)–N9 (target), (b) N6 (source)–N9 (target), (c) N8 (source)–N9 (target), (d) N3 (source)–N9 (target), (e) N5 (source)–N9 (target), (f) N7 (source)–N9 (target).
Figure 13. Comparison of prediction results between the proposed method and other methods (different feature extractors): (a) Tasks 1–9, (b) Tasks 10–18, (c) Tasks 19–27.
Figure 14. Comparison of prediction results between the proposed method and other methods (different domain-specific feature metric functions): (a) Tasks 1–9, (b) Tasks 10–18, (c) Tasks 19–27.
Figure 15. Comparison of prediction results between the proposed method and other methods (different training strategies): (a) Tasks 1–9, (b) Tasks 10–18, (c) Tasks 19–27.
Table 1. Network model parameters.
Network Module                             Parameters
NSTE                                       Number of encoders: 2; non-stationary self-attention heads: 1
Point-wise feed forward                    1D convolutional layer 1: kernel size 1, padding 0, input channels 66, output channels 264; 1D convolutional layer 2: kernel size 1, padding 0, input channels 264, output channels 64
Projector                                  Hidden layers: 1; hidden layer dimension: 64; activation function: ReLU
Domain-specific fully connected network    Hidden layer dimensions: 792-128-64-32; activation function: PReLU
Domain-specific classifier                 Hidden layer dimensions: 32-3
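For illustration only, a minimal PyTorch sketch consistent with the point-wise feed-forward entry above might look as follows. The module name is ours, and the ReLU between the two convolutions is an assumption, since Table 1 does not name an activation for this block.

```python
import torch
import torch.nn as nn

class PointwiseFeedForward(nn.Module):
    """Two 1x1 Conv1d layers with the channel sizes listed in Table 1."""

    def __init__(self, in_channels: int = 66, hidden_channels: int = 264,
                 out_channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv1d(in_channels, hidden_channels, kernel_size=1, padding=0)
        self.conv2 = nn.Conv1d(hidden_channels, out_channels, kernel_size=1, padding=0)
        self.activation = nn.ReLU()  # assumption: activation not specified in Table 1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 66, sequence_length) -> (batch, 64, sequence_length)
        return self.conv2(self.activation(self.conv1(x)))
```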
Table 2. Hyperparameters for the proposed model training.
Hyperparameter                         Value
Batch size                             32
Training epochs                        100
Learning rate (LR)                     0.0008
LR scheduler                           Cosine annealing with warmup
LR warmup steps                        15
Optimizer                              AdamW
Weight decay in the optimizer          0.00005
Momentum in the optimizer              0.9
Dropout                                0.1
Number of SWD projection directions    320
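As a hedged illustration of how the settings above could be wired together in PyTorch (not the authors' training code): the "momentum in the optimizer" entry is read here as AdamW's first-moment coefficient, "Cosine Annealing Warm Up" is interpreted as linear warmup followed by cosine decay, and the helper name build_optimizer is ours.

```python
import math
import torch

def build_optimizer(model: torch.nn.Module,
                    lr: float = 0.0008,
                    weight_decay: float = 0.00005,
                    warmup_steps: int = 15,
                    total_steps: int = 100):
    # AdamW with Table 2's learning rate and weight decay; beta1 = 0.9
    # stands in for the "momentum" entry.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr,
                                  betas=(0.9, 0.999),
                                  weight_decay=weight_decay)

    def lr_lambda(step: int) -> float:
        if step < warmup_steps:
            # Linear warmup over the first `warmup_steps` scheduler steps.
            return (step + 1) / warmup_steps
        # Cosine annealing from the peak LR toward zero afterwards.
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```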
Table 3. Milling experimental cutting parameters.
No.    v_c (m/min)    a_e (mm)    a_p (mm)    f_z (mm/r)    n (rpm)
N1     135            1.5         0.6         0.116         3580
N2     135            2.0         0.7         0.116         3580
N3     135            2.5         0.8         0.116         3580
N4     140            1.5         0.7         0.116         3710
N5     140            2.0         0.8         0.116         3710
N6     140            2.5         0.6         0.116         3710
N7     150            1.5         0.8         0.116         3980
N8     150            2.0         0.6         0.116         3980
N9     150            2.5         0.7         0.116         3980
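As a consistency check on Table 3 (our inference; the tool diameter is not restated in this back matter), the listed spindle speeds match the standard relation between cutting speed $v_c$ and spindle speed $n$ for a tool of diameter $D$:

$$
n = \frac{1000\,v_c}{\pi D}
\quad\Longrightarrow\quad
D \approx \frac{1000 \times 135}{\pi \times 3580} \approx 12\ \mathrm{mm},
$$

and the same relation yields $n \approx 3710$ rpm for $v_c = 140$ m/min and $n \approx 3980$ rpm for $v_c = 150$ m/min, consistent with rows N4–N9.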
Table 4. Statistical features in time, frequency, and time–frequency domain [42].
No.   Feature                       Formula
1     Mean                          $x_{mean} = \frac{1}{N}\sum_{i=1}^{N} x_i$
2     Root mean square              $x_{rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$
3     Max                           $x_{s} = \max(x_i)$
4     Standard deviation            $x_{sd} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - x_{mean})^2}$
5     Peak value                    $x_{p} = \max|x_i|$
6     Peak-to-peak                  $x_{pp} = \max(x_i) - \min(x_i)$
7     Spectral power                $f_{sp} = \sum_{i=1}^{N} f_i^3 P(f_i)$
8     Frequency centroid            $f_{fc} = \sum_{i=1}^{N} f_i P(f_i) \big/ \sum_{i=1}^{N} P(f_i)$
9     Root mean square frequency    $f_{rmsf} = \sqrt{\sum_{i=1}^{N} f_i^2 P(f_i) \big/ \sum_{i=1}^{N} P(f_i)}$
10    Root variance frequency       $f_{rvf} = \sqrt{\sum_{i=1}^{N} (f_i - f_{fc})^2 P(f_i) \big/ \sum_{i=1}^{N} P(f_i)}$
11    Wavelet packet energy         $e_{wpe} = \sum_{i=1}^{N} wt_{\varphi}^2(i) / N$
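As a hedged sketch of how the Table 4 features could be computed for one signal channel with NumPy: the function name is ours, and the periodogram used for $P(f_i)$ is an assumption, since the paper's exact spectral estimator is not restated here. The wavelet packet energy feature is omitted because it depends on an unspecified wavelet basis.

```python
import numpy as np

def extract_features(x: np.ndarray, fs: float) -> dict:
    """Compute a subset of the Table 4 features for one signal channel."""
    n = len(x)
    # Power spectrum P(f_i) via the periodogram (assumed estimator).
    spectrum = np.abs(np.fft.rfft(x)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)  # frequency bins f_i
    p_sum = spectrum.sum()
    f_fc = (freqs * spectrum).sum() / p_sum  # frequency centroid
    return {
        "mean": x.mean(),
        "rms": np.sqrt(np.mean(x ** 2)),
        "std": x.std(),
        "peak": np.abs(x).max(),
        "peak_to_peak": x.max() - x.min(),
        "freq_centroid": f_fc,
        "rms_freq": np.sqrt((freqs ** 2 * spectrum).sum() / p_sum),
        "root_var_freq": np.sqrt(((freqs - f_fc) ** 2 * spectrum).sum() / p_sum),
    }
```

A call such as extract_features(force_signal, fs=sampling_rate) would be applied per channel and per sample window before the features are fed to the network.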
Table 5. Sample numbers for each group in the experimental dataset.
No.    Number of Slight Wear Samples    Number of Normal Wear Samples    Number of Severe Wear Samples    Total Number of Samples
N1     80                               235                              248                              563
N2     144                              272                              252                              668
N3     96                               144                              192                              432
N4     100                              200                              304                              604
N5     72                               192                              156                              420
N6     44                               88                               200                              332
N7     60                               280                              224                              564
N8     72                               164                              216                              452
N9     60                               239                              128                              427