Identification of Milling Cutter Wear State under Variable Working Conditions Based on Optimized SDP

Abstract: Traditional data-driven tool wear state recognition methods rely on complete data under targeted working conditions. However, in actual cutting operations, working conditions vary, and data for many conditions lack labels, with data distribution characteristics differing between conditions. To address these issues, this paper proposes a method for recognizing the wear state of milling cutters under varying working conditions based on an optimized symmetrized dot pattern (SDP). This method utilizes complete data from source working conditions for representation learning, transferring a generalized milling cutter wear state recognition model to varying working condition scenarios. By leveraging computer image processing features, the vibration signals produced by milling are converted into desymmetrization dot pattern images. Clustering analysis is used to extract template images of different wear states, and differential evolution algorithms are employed to adaptively optimize parameters using the maximization of Euclidean distance as an indicator. Transfer learning with a residual network incorporating an attention mechanism is used to recognize the wear state of milling cutters under varying working conditions. The experimental results indicate that the proposed method reduces the impact of working condition changes on the mapping relationship of milling cutter wear states. In the wear state identification experiment under varying conditions, the accuracy reached 97.39%, demonstrating good recognition precision and generalization ability.


Introduction
Cutting tools, as a critical resource in advanced manufacturing, serve as the executing end in cutting processes [1]. During these processes, the contact surface between the tool and the workpiece undergoes complex changes in stress and temperature fields, leading to tool wear [2]. When tool wear reaches a certain level, it can cause an increase in the surface roughness of the workpiece, a decrease in precision, and in severe cases, breakage or chipping of the tool, posing a danger to both the machine and the operator [3,4]. Timely and accurate identification of tool wear states can mitigate the impact of tool wear on production [5]. Therefore, the recognition of tool wear states is an essential component of advanced manufacturing technology and is key to ensuring quality, improving productivity, and reducing energy and time costs [6][7][8].
In the context of big data, data-driven machine learning methods have become the main approach for identifying tool wear states. These methods use machine learning theories and algorithms to mine information from data collected during cutting processes, establishing a mapping relationship between sensor signals and tool wear states. Through continuous acquisition of sensor signals during the machining process, the current tool wear state can be identified [9][10][11][12]. Li et al. [13] proposed a data-driven monitoring method based on radar chart feature fusion, establishing a decision tree ensemble learning model that can quickly and accurately identify the current tool wear state. Huang et al. [14] used reconstructed time series layers to represent multi-sensor original signals and employed convolutional neural networks for automatic recognition of tool wear. Data-driven recognition methods do not require precise analytical models or extensive domain expertise and reasoning mechanisms, but they typically need a large amount of training data, and the training and testing data must be independently and identically distributed [15,16]. However, the actual cutting environment is harsh and variable, with data distribution differences under different conditions, making traditional data-driven tool wear recognition methods unsuitable for conditions with scarce or unlabeled data [17]. Compared to traditional data-driven approaches, transfer learning can mine common features in two different but related domains, allowing for the transfer of features contained in the data from the source condition to the target condition [18]. Baskar et al. [19] used transfer learning methods to classify computer animation and photography images available in small datasets. Sakallı et al. [20] used a transfer learning architecture to recognize thermal images obtained from asynchronous motors and transformers under running conditions, thereby diagnosing the condition of this equipment.
Transfer learning applies known information to different but related domains, reducing the need for data features and offering a new approach to solving cross-domain learning problems. However, its research results are mainly based on image processing in computer vision [21][22][23]. Typically, one-dimensional time series signals are collected in tool state monitoring [24], and converting one-dimensional data into two-dimensional images for time series classification tasks [25] can better utilize the capabilities of transfer learning in image processing [26], effectively addressing variable condition issues. The symmetrized dot pattern (SDP) transformation method uses numerical calculations to convert one-dimensional time series into two-dimensional symmetric images in polar coordinate space. Through the corresponding mapping relationships, it can reflect all the information of the signals [27]. Zhang et al. [28] used SDP for the visual reconstruction of original signals, combined with a multicovariance Gaussian process regression method for tool wear monitoring, showing that this method can accurately recognize tool wear states under the same conditions. Sun et al. [29] proposed a rolling bearing fault diagnosis method based on empirical mode decomposition and SDP, decomposing the vibration signals of rolling bearings and converting each intrinsic mode function component through SDP into a fused image, then using an improved Manhattan distance for image classification diagnosis. The SDP transformation can convert discrete data of different signals into visual images according to their time series, providing a better representation of data and thereby mining more patterns and features [30,31].
The effectiveness of the SDP transformation is mainly limited by the selection of the lag coefficient l and the aperture gain ξ, which are difficult to set through subjective experience [32]. To fully exploit the superiority of SDP in multi-channel fusion, this paper uses the maximization of the difference degree of template images of various wear states as an indicator, utilizing the global optimization capability of the differential evolution (DE) algorithm to search for parameters. On this basis, a residual network with a fusion attention mechanism is used for transfer learning on images under the source condition, building a generalized milling tool wear state recognition model to identify the wear state of milling tools under variable conditions, verified through milling tool wear experiments.

The Analysis of SDP's Characteristics
SDP is a method that combines time series signal processing with image analysis, describing the amplitude and frequency changes of time series data in the Cartesian coordinate system in a graphical manner within the polar coordinate system. The SDP transformation was initially proposed for the visual representation of speech. Due to its simplicity in computation and the intuitive and vivid images it produces, which effectively reflect the characteristics of time series signals, it has achieved favorable results in applications such as speech recognition, equipment status monitoring, and fault diagnosis.

The Basic Principle of SDP
The discrete sampling data of a time series signal, after undergoing the SDP transformation, map the normalized time series waveform onto the polar coordinate system to create a symmetrical dot plot. Suppose the signal's discrete sampling sequence of N sampling points is $F = \{F_1, F_2, \cdots, F_N\}$, where the amplitude of the j-th sampling point is $F_j$. These are substituted into Equations (1) and (2), transforming them into symmetrical polar coordinates $S(r_j, \Theta_{ij}, \Phi_{ij})$, where $r_j$ is the radial distance, and $\Theta_{ij}$ and $\Phi_{ij}$ are the polar angles. The principle schematic diagram of the SDP transformation is shown in Figure 1.
In Equations (1) and (2), H and L represent the maximum and minimum values of the time series signal, respectively; m is the number of mirrors; $\Theta'$ is the starting angle, with $\Theta' = (360^\circ/m)\,i$, where $i = 1, 2, \cdots, m$; l is the lag coefficient; and ξ is the aperture gain, with $\xi \le 360^\circ/m$.
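As an illustration of the mapping described in this subsection, the following minimal sketch uses the commonly cited form of the SDP transformation, which is an assumption here: $r_j = (F_j - L)/(H - L)$, $\Theta_{ij} = \Theta' + \frac{F_{j+l} - L}{H - L}\xi$, and $\Phi_{ij} = \Theta' - \frac{F_{j+l} - L}{H - L}\xi$. The function name `sdp_points` and its argument names are hypothetical and chosen only for this example.

```python
import math


def sdp_points(signal, lag, gain_deg, mirrors):
    """Return SDP points (r, theta_deg, phi_deg) for every mirror.

    Assumes the commonly cited SDP form (not confirmed by the text above):
      r     = (F_j - L) / (H - L)
      theta = start + (F_{j+l} - L) / (H - L) * gain
      phi   = start - (F_{j+l} - L) / (H - L) * gain
    """
    low, high = min(signal), max(signal)
    span = high - low
    if span == 0:
        raise ValueError("signal must not be constant")
    points = []
    for i in range(1, mirrors + 1):
        start = (360.0 / mirrors) * i  # starting angle of mirror i
        for j in range(len(signal) - lag):
            r = (signal[j] - low) / span  # radial distance in [0, 1]
            offset = (signal[j + lag] - low) / span * gain_deg
            theta = (start + offset) % 360.0  # counter-clockwise arm
            phi = (start - offset) % 360.0    # mirrored clockwise arm
            points.append((r, theta, phi))
    return points


# Sine signal y = sin(200*pi*t) sampled at 12,800 Hz, four full cycles
fs, f = 12800, 100
signal = [math.sin(2 * math.pi * f * n / fs) for n in range(4 * fs // f)]
pts = sdp_points(signal, lag=32, gain_deg=60.0, mirrors=1)
```

Under this assumed form, a lag of 0 makes the angular offset proportional to the radius, so all points fall on a single curve, which matches the behaviour the lag coefficient analysis describes for l = 0.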

The Impact of Different Parameters on the Effect of the SDP Transformation
During the SDP transformation process, it is necessary to preset three parameters: the number of mirrors m, the lag coefficient l, and the aperture gain ξ, with the number of mirrors m depending on the number of signals in the fused image. Different parameters influence the topological structure of the SDP image, and the principle behind selecting parameters is to ensure the scatter points from the SDP transformation cover the polar coordinate system as fully as possible without overlapping. Thus, analyzing the effects of different parameters on the SDP transformation can aid in finding the optimal parameters, thereby better representing the signal.
1. The impact of the lag coefficient l on the SDP transformation effect.
The choice of the lag coefficient l affects the difference between $F_j$ and $F_{j+l}$, which will determine the variation between the polar radius and polar angle of the scatter points on the polar coordinate system. For a time series signal with frequency f, if the sampling frequency is $f_s$, then the number of sampling points per cycle of the signal is $f_s/f$, making the lag coefficient l have a periodic effect on the SDP transformation, with a period of $f_s/f$. Taking a sine signal y = sin(200πt) with a sampling frequency of 12,800 Hz as an example, the range for the lag coefficient l in the SDP transformation is [0, 128]. With the number of mirrors m set to 1 and the aperture gain ξ to 60°, the changes in the SDP image of this sine signal with varying lag coefficient l are shown in Figures 2 and 3.
From Figure 2, it is evident that when l = 0, the scatter points transformed by SDP all fall on an arc, forming two symmetrical spiral arms. As the lag coefficient l increases, the margins of the two spiral arms gradually expand, and the area of the spiral arms starts to increase from 0, with the intersection points of the two spiral arms and the polar coordinate boundary gradually gathering towards the starting angle $\Theta'$ ($\Theta' = 0^\circ$). When l = 32, the area of the spiral arms reaches its maximum. As the lag coefficient l continues to increase, the area of the spiral arms begins to decrease, but the intersection points of the two spiral arms and the polar coordinate boundary continue to gather towards the starting angle $\Theta'$. When l = 64, the two spiral arms turn into arcs, with the area of the spiral arms at 0, at which point the intersection points of the two spiral arms and the polar coordinate boundary are at the starting angle $\Theta'$. Hence, as the lag coefficient l changes in the first half of the cycle, the area of the two symmetrical spiral arms first increases and then decreases, with the intersection points of the spiral arms and the polar coordinate boundary gradually gathering towards the starting angle $\Theta'$ until they reach it.
From Figure 3, it can be seen that the SDP image when the lag coefficient l changes in the second half of the cycle is symmetrical to the first half, with the area of the two symmetrical spiral arms also first increasing and then decreasing, but the intersection points of the spiral arms and the polar coordinate boundary gradually move away from the starting angle $\Theta'$. Therefore, when searching for optimal parameters, the maximum value of the lag coefficient l should be half of its period, i.e., $l \in [0, f_s/(2f)]$.
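The period and search range of the lag coefficient for the sine example can be checked with a short calculation. The sketch below is only illustrative; the helper name `lag_search_range` is hypothetical.

```python
import math


def lag_search_range(fs_hz, f_hz):
    """Return (min_lag, max_lag), where max_lag is half the lag period fs/f."""
    period = fs_hz / f_hz  # sampling points per signal cycle
    return 0, int(period // 2)


fs, f = 12800, 100  # y = sin(200*pi*t) has frequency f = 100 Hz
period = fs // f    # 128 samples per cycle

# A lag of one full period pairs each sample with an identical value,
# so the SDP image repeats with period fs/f.
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(3 * period)]
max_diff = max(abs(samples[j + period] - samples[j]) for j in range(period))

print(period)                   # 128
print(lag_search_range(fs, f))  # (0, 64)
```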

2. The impact of the aperture gain ξ on the SDP transformation effect.
Continuing with the sine signal y = sin(200πt) with a sampling frequency of 12,800 Hz as an example, to achieve a better SDP transformation effect, the lag coefficient when the spiral arm area is largest in Figure 2 is chosen, i.e., the lag coefficient l is 32. With the number of mirrors m set to 3, and based on $\xi \le 360^\circ/m$, the range for the aperture gain ξ in the SDP transformation is [0°, 120°]. The changes in the SDP image of this sine signal with varying aperture gain ξ are shown in Figure 4.
From Figure 4, it is observed that as the aperture gain ξ increases, the area of each spiral arm also gradually increases. When ξ is too small, the area of each spiral arm is small, which hinders image feature extraction; when ξ is too large, overlapping occurs between adjacent spiral arms of different mirrors, masking features. Hence, choosing an appropriate aperture gain ξ within the range (0°, $360^\circ/m$] can ensure the SDP image better represents the signal without overlap between adjacent spiral arms of different mirrors.
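The overlap condition for the aperture gain can be made concrete with a simple angular bound. This sketch assumes the commonly cited SDP form in which each mirror spreads its two arms over the angles $\Theta' \pm \xi$; under that assumption, the sectors of adjacent mirrors can overlap once $2\xi$ exceeds the mirror spacing $360^\circ/m$. The function names are hypothetical, and the check is a worst-case angular bound, not a pixel-level overlap test.

```python
def arms_can_overlap(gain_deg, mirrors):
    """Worst-case angular check under the assumed SDP form.

    Each mirror occupies the sector [start - gain, start + gain]; mirror
    centres are 360/m degrees apart, so sectors can intersect when
    2 * gain exceeds that spacing.
    """
    spacing = 360.0 / mirrors
    return 2.0 * gain_deg > spacing


def largest_gain_without_overlap(mirrors):
    """Largest aperture gain (degrees) that keeps adjacent sectors disjoint."""
    return 180.0 / mirrors


for gain in (30.0, 60.0, 90.0, 120.0):
    print(gain, arms_can_overlap(gain, mirrors=3))
```

For m = 3 this bound separates gains up to 60° from larger ones, which is consistent with the observation that large values of ξ inside the admissible range (0°, 120°] produce overlapping arms.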
In conclusion, the selection of different lag coefficients l and aperture gains ξ influences the topological structure of the SDP image. Keeping one influencing parameter constant and only optimizing the other overlooks the interaction between the two, resulting in a solution that is only relatively optimal, limiting the application effect of the SDP transformation. Therefore, adaptively selecting the optimal combination of parameters l and ξ for the SDP transformation is extremely important, facilitating a better representation of the signal through the SDP image.

Recognizing the Wear State of Milling Cutters under Variable Conditions Based on Optimized SDP
The vibration signals during the milling process contain a wealth of information about the operational state of the tools and are convenient to extract, making them suitable for identifying the wear state of milling cutters without affecting the process. However, complex tool paths can have different impacts on vibration signals in various directions, meaning that characterizing the tool wear state using a vibration signal from a single direction might lead to misjudgments due to path variations. Therefore, employing the SDP transformation to distribute vibration signals from the X, Y, and Z axes across a designated area and merge them into a visual image can leverage the complementarity and redundancy between the three-axis vibration signals, thereby better characterizing tool wear information.

Optimization of SDP Parameters Based on the Differential Evolution Algorithm
The discrete sampling data of vibration signals, after undergoing the SDP transformation, map the time series waveform onto the polar coordinate system to create symmetrical scatter points. The transformation combines vibration signals from the X, Y, and Z axes according to their time series in a designated area, thus constructing a fused image of the three-axis vibration signals. The effect of the SDP transformation is primarily influenced by the lag coefficient l and the aperture gain ξ, mainly reflected in the SDP image's spiral arm area, curvature, and geometric center shape features. The selection of parameters should avoid overlap between adjacent spiral arms from different mirrors in the SDP image, which would mask features. The SDP image uses symmetrical spiral arms for representation because symmetry is conducive to the human visual perception system's recognition and memory of image features. However, with the development and application of computer vision, it is no longer necessary to identify image features manually. Therefore, improving the SDP method by replacing Equation (2) with Equation (3) can yield the desymmetrization dot pattern (DDP), which can reflect the characteristics of time series signals well and improve the computational efficiency of image processing.
Taking a sine signal y = sin(200πt) with a sampling frequency of 12,800 Hz as an example, with the number of mirrors m set to 3, lag coefficient l to 32, and aperture gain ξ to 90°, the SDP and DDP images of this sine signal under the same parameters are shown in Figure 5. The figure shows that using DDP can effectively avoid the overlap between adjacent spiral arms from different mirrors caused by parameter selection, aiding in finding optimal parameters.
Using reasonable parameters in the DDP transformation can make the feature differences of signals under different wear states more apparent in model training, providing a basis for milling cutter wear state identification based on DDP images. Since different DDP images of the same wear state will have unique features, if these unique features are not weakened, image matching will tend to favor them, ultimately leading to inaccurate wear state identification. Therefore, using cluster analysis to establish template images for each wear state can strengthen common features and weaken unique features within the same wear state, helping to improve the accuracy of wear state identification. The template image of each wear state is described by the cluster center: the DDP images are converted into binary images, and if more than 60% of the binary images of the same wear state have the value 0 at a given pixel position, the corresponding pixel in the template image of that wear state is marked as 0 and regarded as an inherent feature of that template image.
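The 60% rule for building a template image can be sketched as a pixel-wise vote over the binary images of one wear state. This is a minimal illustration of the thresholding step only, not of the clustering itself; the function name `template_image` is hypothetical, and setting all remaining pixels to 1 is an assumption, since the text only specifies when a pixel is marked as 0.

```python
def template_image(binary_images, threshold=0.6):
    """Build a template from same-size binary images (nested lists of 0/1).

    A template pixel is 0 (inherent feature) when more than `threshold`
    of the images have 0 at that position; all other pixels are set to 1,
    which is an assumption made for this sketch.
    """
    count = len(binary_images)
    rows = len(binary_images[0])
    cols = len(binary_images[0][0])
    template = [[1] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            zeros = sum(1 for image in binary_images if image[r][c] == 0)
            if zeros / count > threshold:
                template[r][c] = 0
    return template


# Five toy 2x2 binary images of one wear state
images = [
    [[0, 1], [0, 1]],
    [[0, 1], [1, 1]],
    [[0, 0], [1, 1]],
    [[0, 0], [1, 1]],
    [[1, 1], [0, 1]],
]
print(template_image(images))  # [[0, 1], [1, 1]]
```

In the toy data, only the top-left pixel is 0 in more than 60% of the images (4 of 5), so it is the only pixel kept as a feature in the template.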
Optimizing parameters with the maximization of the differences between template images of each wear state as an indicator can narrow the intra-class distance and expand the inter-class distance, thus facilitating the construction of a generalized decision boundary and reducing the impact of working condition changes on the milling cutter wear state identification model. The Euclidean distance is a distance metric commonly used to evaluate the similarity between images. The Euclidean distance d can assess the difference in template images of different wear states, expressed as:
$$d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}$$
where $\{x_1, x_2, \cdots, x_n\}$ and $\{y_1, y_2, \cdots, y_n\}$ are the grayscale values of corresponding pixels in template images x and y, respectively.
Reasonably selecting the lag coefficient l and aperture gain ξ can enhance the image differences between various wear states, but subjective judgment can only pick relatively good parameters from a limited set, making it difficult to select the parameter combination [l, ξ] reasonably. Therefore, the global optimization capability of the differential evolution algorithm is leveraged to search for parameter combinations, mimicking the evolutionary process in nature to guide the individuals of the population gradually closer to the optimum. The differential evolution algorithm randomly generates an initial population within the feasible domain of the variables [l, ξ], selects parent individuals for mutation, and performs crossover operations between parent individuals and mutated individuals to generate offspring. Through selection operations between parents and offspring, the algorithm saves the better individual to the next generation of the population. After continuous iterations of the population until convergence, it obtains the global optimum individual. The process is shown in Figure 6.
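The search loop described above (random initialization, mutation, crossover, and greedy selection) can be sketched in a few lines. The following is a minimal DE/rand/1/bin variant written as an assumption; the paper does not state which DE variant or control parameters it uses. The objective `toy_objective` is a hypothetical stand-in with a known maximum; in the actual method the objective would generate the template images for a candidate [l, ξ] and return the Euclidean distance between them.

```python
import math
import random


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length grayscale vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def differential_evolution(objective, bounds, pop_size=20, scale=0.5,
                           cross_rate=0.9, generations=100, seed=0):
    """Maximize `objective` within box `bounds` using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            others = [k for k in range(pop_size) if k != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cross_rate:
                    value = pop[a][d] + scale * (pop[b][d] - pop[c][d])  # mutation
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # keep inside feasible domain
                else:
                    value = pop[i][d]  # inherit from parent
                trial.append(value)
            trial_fitness = objective(trial)
            if trial_fitness >= fitness[i]:  # greedy parent/offspring selection
                pop[i], fitness[i] = trial, trial_fitness
    best = max(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def toy_objective(params):
    """Hypothetical placeholder with its maximum at l = 20, xi = 45."""
    lag, gain = params
    return -((lag - 20.0) ** 2 + (gain - 45.0) ** 2)


# Search l in [0, 64] and xi in [0, 120] degrees (ranges of the sine example)
best, score = differential_evolution(toy_objective, [(0.0, 64.0), (0.0, 120.0)])
```

Because the lag coefficient is an integer number of samples, a practical objective would round the first coordinate before generating images; the continuous search here is a simplification.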

A Residual Network with an Integrated Attention Mechanism
In the milling process, the vibration signals from the three-axis direction merged into a visualized DDP image serve as the input for constructing a milling cutter wear state identification model. The significant representational learning capability of a convolutional neural network (CNN) in image recognition can be utilized. CNN abstracts information in layers, and as the network deepens, so does the level of abstraction. Compared to shallow convolutional neural networks, deeper networks can extract higher-dimensional feature information from images. As the network deepens, the expressions each layer needs to learn are relatively simpler, thereby optimizing model performance. However, training such models might encounter issues of gradient vanishing or exploding.
The residual network (ResNet) introduces residual connections into CNN, thereby simplifying the network learning process and effectively solving the issues of gradient disappearance, explosion, and network degradation caused by increased depth. ResNet consists of multiple residual blocks, each containing convolutional layers, batch normalization layers, and activation functions. Inputs and outputs are linearly added through skip connections before entering the activation function. The basic structure of a residual block is shown in Figure 7, where x is the input to the network structure, and F(x) is the mapping function. Traditional convolutional neural networks directly learn the mapping between input and output, whereas residual networks learn the residual mapping between input and output through skip connections, enhancing gradient propagation and breaking the symmetry of the network.
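The skip connection described above, in which the input is added to F(x) before the activation, can be illustrated with a toy example. This sketch replaces the convolution and batch normalization layers with small dense layers purely for brevity, which is an assumption for illustration and not the architecture used in the paper; all names are hypothetical.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, value) for value in vector]


def linear(weights, vector):
    """Dense layer without bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute y = ReLU(F(x) + x) with F(x) = W2 * ReLU(W1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# With F(x) = 0 the block passes a non-negative input through unchanged
print(residual_block([1.0, 2.0], zeros, zeros))        # [1.0, 2.0]
print(residual_block([1.0, 2.0], identity, identity))  # [2.0, 4.0]
```

The first call shows why residual blocks ease the training of deep networks: when the learned mapping F(x) contributes nothing, the block still behaves as an identity for non-negative inputs rather than degrading the signal.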
Deepening a neural network enhances its expressive power, but it also increases the amount of information to be processed, which can lead to information overload. The attention mechanism, which mimics the human visual and cognitive systems, allows a neural network to focus on the relevant parts of the input. With an attention mechanism, the network learns to attend selectively to important information in the input, pay less attention to other information, and even filter out irrelevant information, which alleviates information overload and improves network performance and generalization ability. The convolutional block attention module (CBAM) is an attention module that enhances CNN performance by adding channel and spatial attention, improving network perception without significantly increasing complexity. The structure of CBAM is shown in Figure 8: channel attention helps enhance feature representation across different channels, and spatial attention helps extract key information from different spatial locations, focusing the network on information critical to the current task.
In CBAM, the channel attention module (CAM) aggregates the spatial information of the feature map F through global average pooling and global maximum pooling. The two pooled descriptors are each passed through the multi-layer perceptron (MLP) and then summed element-wise, yielding the channel attention $M_c$, expressed as:

$$M_c(F) = \sigma\left(W_1\left(W_0\left(F^c_{avg}\right)\right) + W_1\left(W_0\left(F^c_{max}\right)\right)\right)$$

where σ is the Sigmoid function; $F^c_{avg}$ is the global average pooling feature; $F^c_{max}$ is the global maximum pooling feature; and $W_0$ and $W_1$ are the weights of the MLP, with the ReLU activation function applied between them.
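The channel attention computation can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in the paper: the feature map is a small nested list of shape C × H × W, the MLP has no bias terms, and the weights `w0` and `w1` are hypothetical toy values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shared_mlp(vector, w0, w1):
    """Two-layer perceptron W1 * ReLU(W0 * vector), without bias terms."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vector))) for row in w0]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w1]


def channel_attention(feature_map, w0, w1):
    """Return M_c: one weight in (0, 1) per channel of a C x H x W feature map."""
    # Global average and global maximum pooling over the spatial axes.
    avg_pool = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    max_pool = [max(map(max, ch)) for ch in feature_map]
    # The same MLP weights process both descriptors; the results are summed.
    from_avg = shared_mlp(avg_pool, w0, w1)
    from_max = shared_mlp(max_pool, w0, w1)
    return [sigmoid(a + m) for a, m in zip(from_avg, from_max)]


# Toy input: 2 channels of size 2 x 2, hidden width 1 (hypothetical values).
toy_features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
toy_w0 = [[1.0, 1.0]]
toy_w1 = [[1.0], [-1.0]]
m_c = channel_attention(toy_features, toy_w0, toy_w1)
print(m_c)
```

Each channel of the feature map would then be multiplied by its entry of `m_c` before the spatial attention step.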
The spatial attention module (SAM) in CBAM generates an effective feature descriptor by concatenating the features obtained from average pooling and maximum pooling along the channel axis of the feature map F, followed by a convolution operation to obtain the spatial attention $M_s$, expressed as:

$$M_s(F) = \sigma\left(f^{7\times 7}\left(\left[F^s_{avg};\, F^s_{max}\right]\right)\right)$$

where $f^{7\times 7}$ represents a convolution operation with a 7 × 7 kernel; σ is the Sigmoid function; and $F^s_{avg}$ and $F^s_{max}$ are the average and maximum pooling features along the channel axis, respectively.
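A plain-Python sketch of the spatial attention computation is given below. It is a sketch under stated assumptions: the feature map is a nested list of shape C × H × W, the convolution is the cross-correlation form commonly used in deep learning with zero padding, and the kernel `centre_only` is a hypothetical toy kernel, not a trained one.

```python
import math


def spatial_attention(feature_map, kernel, bias=0.0):
    """Compute M_s (H x W) from a C x H x W feature map given as nested lists.

    kernel has shape 2 x k x k: slice 0 weights the channel-average map and
    slice 1 weights the channel-max map. Zero padding keeps the H x W size.
    """
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    # Pool along the channel axis to obtain the average and maximum maps.
    avg_map = [[sum(feature_map[c][i][j] for c in range(channels)) / channels
                for j in range(width)] for i in range(height)]
    max_map = [[max(feature_map[c][i][j] for c in range(channels))
                for j in range(width)] for i in range(height)]
    k = len(kernel[0])
    pad = k // 2
    attention = []
    for i in range(height):
        row = []
        for j in range(width):
            acc = bias
            for plane, weights in zip((avg_map, max_map), kernel):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < height and 0 <= x < width:
                            acc += weights[di][dj] * plane[y][x]
            row.append(1.0 / (1.0 + math.exp(-acc)))
        attention.append(row)
    return attention


# Hypothetical 7 x 7 kernel that only reads the centre pixel of both pooled maps.
centre_only = [[[1.0 if (di, dj) == (3, 3) else 0.0 for dj in range(7)]
                for di in range(7)] for _ in range(2)]
toy_map = [[[0.0, 1.0], [2.0, 3.0]], [[0.0, -1.0], [-2.0, 5.0]]]
m_s = spatial_attention(toy_map, centre_only)
print(m_s)
```

With this toy kernel, each entry of `m_s` is simply the Sigmoid of the sum of the average and maximum values at that pixel, which makes the result easy to check by hand.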
To classify DDP images accurately and quickly, as milling cutter wear state identification requires, ResNet-50 is used as the backbone of the identification model, which preserves network depth while maintaining high processing efficiency. To further enhance the identification model, CBAM is fused with ResNet-50 to suppress ineffective features and strengthen the network's ability to discern important ones. The CBAM-ResNet-50 network is constructed by taking the output F(x) of the non-skip branch of each residual block in ResNet-50 as the input feature map of CBAM, which enhances the model's ability to extract milling cutter wear state features from both the channel and spatial perspectives.

The Process of Recognizing the Wear State of Milling Cutters under Variable Conditions Based on Optimized SDP
In actual milling operations, working conditions vary widely, and much of the data collected under these conditions lacks labels, with data distributions differing between conditions. Using transfer learning to find parameter information shared between the source and target domains reduces the model's reliance on extensive target domain data and enhances its generalization ability, making it possible to recognize milling cutter wear states under variable conditions. The specific implementation steps are as follows.

1.
Collect vibration signals of milling cutters in the initial wear, normal wear, and severe wear stages under different working conditions, and measure the corresponding milling cutter wear amount;
5.
Replace the final layer in the CBAM-ResNet-50 model, perform representation learning on DDP images of complete data under source conditions, and fine-tune to form a generalized milling cutter wear state recognition model;
6.
Classify DDP images of vibration signals under target conditions to identify the wear state of milling cutters in variable condition scenarios.
The structure diagram for recognizing the wear state of milling cutters under variable conditions based on optimized SDP is shown in Figure 9.

Milling Cutter Wear Experiment and Analysis of Recognition Results
The milling cutter wear experiment was conducted on a vertical CNC milling machine using an uncoated high-speed steel end mill with a straight shank. The geometric parameters of the milling cutter are shown in Table 1. The workpiece was made of quenched and tempered 45 steel and measured 100 mm × 100 mm × 80 mm. To capture the vibration signals during milling, an accelerometer was mounted on one side of the workpiece. A Dytran 3097A2 accelerometer, with a range of ±50 g and a sensitivity of 100 mV/g, was selected to collect vibration signals in the X, Y, and Z directions. The accompanying SIRIUSi 8XACC data acquisition card was used, with the sampling rate set to 10,000 Hz. Tool wear was measured with a digital microscope consisting of an LED light source, a 0.7× to 5× parfocal lens, and a charge-coupled device camera, with a magnification range of 20× to 180× and a working distance of 105 mm. It was used to measure the width of the central zone of the main flank face of the milling cutter. The milling tool wear experiment setup is shown in Figure 10.

Under dry cutting, experiments were conducted under six different working conditions, with a full-lifecycle wear experiment for the milling cutter under each condition, as shown in Table 2.
To gather data on the evolution of the cutter wear state, signals were collected online, and wear was measured offline. The experiment used peripheral milling along the X-axis direction of the milling machine, with each cut spanning 100 mm. After every ten cuts, the wear of the milling cutter was measured to ensure that the collected vibration signals corresponded to the cutter's wear state. Because removing the tool could change its installation state, wear was measured with the tool on the machine. The primary flank wear of the milling cutter was used to evaluate the tool wear state: for each blade, the difference between the initial and current values in the central zone of the blade's flank was taken as the current wear amount of that blade, and the average over the three blades was taken as the current wear amount VB of the milling cutter. According to the machining handbook, a VB of less than 0.05 mm indicates the initial wear stage, a VB between 0.05 and 0.3 mm indicates normal wear, and a VB of more than 0.3 mm indicates severe wear. The new milling cutter and milling cutters in the three wear states are shown in Figure 11. Taking condition 1 as an example, the milling cutter wear curve is shown in Figure 12.
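The labeling rule above can be written as a short sketch. The function names are hypothetical, and the handling of the exact boundary values (0.05 mm and 0.3 mm are both treated as normal wear) is an assumption, since the text only states "less than 0.05 mm" and "more than 0.3 mm".

```python
def cutter_wear_vb(initial_widths_mm, current_widths_mm):
    """Average flank wear VB over the blades: mean of (current - initial)."""
    if len(initial_widths_mm) != len(current_widths_mm):
        raise ValueError("one initial and one current value per blade is required")
    diffs = [cur - ini for ini, cur in zip(initial_widths_mm, current_widths_mm)]
    return sum(diffs) / len(diffs)


def wear_state(vb_mm):
    """Map VB (mm) to a wear stage using the thresholds quoted in the text."""
    if vb_mm < 0.05:
        return "initial"
    if vb_mm <= 0.3:  # assumption: both boundary values count as normal wear
        return "normal"
    return "severe"


# Illustrative measurements for a three-blade cutter (not experimental data).
vb = cutter_wear_vb([0.0, 0.0, 0.0], [0.1, 0.2, 0.3])
print(vb, wear_state(vb))
```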
In Figure 13a-c, the amplitude ranges of the vibration signals in the three axes are not uniform. Merging the three-axis signals into one visual image avoids the misjudgment caused by path variation when the tool wear state is characterized by a vibration signal from a single direction. Figure 13d,e show the results of the SDP and DDP transformations, respectively. Compared with the SDP transformation, the DDP transformation uses non-symmetrical spiral arms to represent the time-domain signals, which increases the range of spiral arm opening angles and reduces the overlap between adjacent arms. Because the spiral arms of the DDP image are primarily influenced by the parameters l and ξ, selecting a reasonable parameter combination [l, ξ] for the DDP transformation makes the feature differences between DDP images of different wear states more pronounced, which helps improve the recognition accuracy of milling cutter wear states. Cluster analysis is used to extract a template image for each wear state, and the Euclidean distance d between the template images of different wear states serves as the indicator for parameter optimization. Continuing with working condition 1 as an example, DDP transformations were performed on the processed vibration signals under different parameters, and the template image of each wear state was described by its cluster center. The variation of the template image difference d is shown in Figure 14.
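The indicator d can be sketched as follows. Two simplifications are assumptions made for illustration, not details taken from the paper: the template of a wear state is approximated by the pixel-wise mean of its images (a stand-in for the cluster center), and the differences between the three wear states are combined by summing the pairwise Euclidean distances. Images are flattened lists of pixel values.

```python
import math
from itertools import combinations


def template_image(images):
    """Pixel-wise mean of equally sized flattened images (stand-in for a cluster center)."""
    count = len(images)
    return [sum(pixels) / count for pixels in zip(*images)]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def template_difference(images_by_state):
    """Sum of pairwise Euclidean distances between the wear-state templates."""
    templates = {state: template_image(imgs) for state, imgs in images_by_state.items()}
    return sum(euclidean(templates[a], templates[b])
               for a, b in combinations(sorted(templates), 2))


# Toy two-pixel "images" for the three wear states (illustrative values only).
toy_images = {
    "initial": [[0.0, 0.0], [0.0, 0.0]],
    "normal": [[3.0, 4.0], [3.0, 4.0]],
    "severe": [[6.0, 8.0]],
}
d = template_difference(toy_images)
print(d)
```

A larger d means the templates of the wear states are farther apart, which is the property the parameter search tries to maximize.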
Figure 14 shows that the template image difference d is a multi-peaked function of the parameter combination [l, ξ], and that the parameters l and ξ are coupled in the DDP transformation. Therefore, optimizing a single parameter while neglecting the interaction between the two yields only a relatively optimal solution. Using the global optimization capability of the differential evolution (DE) algorithm, the parameter combination [l, ξ] is iteratively optimized with maximization of the template image difference d as the indicator. According to the milling process parameters of condition 1, the range of the lag coefficient l is [0, 13], and the range of the aperture gain ξ is [0, 120]. The variation of the objective function value d of DE-DDP with the number of iterations g is shown in Figure 15. DE-DDP converges from the 14th iteration, with the optimal solution being l = 13, ξ = 84.
During parameter optimization, using cluster analysis to extract template images of the different wear states strengthens the features common to a wear state and weakens the features unique to individual samples, which supports the quantitative indicator used in iterative optimization. For the dataset of condition 1, the template images of the different wear states under the optimal parameter combination are shown in Figure 16. The figure shows that the template images of different wear states have distinct features and that the differences between corresponding spiral arms vary in size. Therefore, using the DDP transformation to merge the three-axis vibration signals into one image avoids the limitation that the features of a single-direction vibration signal cannot fully reflect the differences in tool wear state.
For the milling cutter wear experiments under conditions 1 to 5, the optimal parameter combinations for the DDP transformation were obtained with the method described above, and DDP images of the vibration signals from all three axes were drawn. The image resolution was set to 224 × 224, and the images were converted to grayscale to serve as the complete data for the milling cutter wear state recognition model. The data for each wear state were divided into a training set and a validation set in a 7:3 ratio. The dataset from condition 6 was used as test data to assess the generalization ability of the proposed method in recognizing milling cutter wear states under varying conditions. Since the vibration signals under condition 6 were unlabeled, appropriate DDP transformation parameters could not be obtained for them. Therefore, the optimal parameter combinations from conditions 1 to 5 were used for the DDP transformation of the vibration signals under condition 6, so that each set of three-axis vibration signals produced five DDP images. These images were recognized individually, and the wear state with the highest probability was taken as the recognition result.
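The differential evolution search over [l, ξ] described earlier in this section can be sketched with a minimal DE/rand/1/bin loop. This is a sketch under stated assumptions, not the authors' implementation: the population size, scale factor `f`, crossover rate `cr`, and number of generations are hypothetical, and `toy_difference` is a made-up multi-peaked surface standing in for the real objective, which would render DDP images for a given [l, ξ] and return the template image difference d. Only the search ranges [0, 13] and [0, 120] come from the text.

```python
import math
import random


def de_maximize(objective, bounds, pop_size=20, generations=50, f=0.5, cr=0.9, seed=0):
    """Maximize objective over box bounds with a DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(p) for p in pop]
    history = []
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    gene = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    gene = min(max(gene, lo), hi)  # clip to the search range
                else:
                    gene = pop[i][j]
                trial.append(gene)
            trial_fit = objective(trial)
            if trial_fit >= fit[i]:  # greedy selection keeps the better vector
                pop[i], fit[i] = trial, trial_fit
        history.append(max(fit))
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best], history


def toy_difference(params):
    """Hypothetical multi-peaked stand-in for the template difference d."""
    lag = round(params[0])  # the lag coefficient is treated as an integer
    return math.sin(0.8 * lag) + math.cos(0.1 * params[1])


best_params, best_d, history = de_maximize(toy_difference, [(0.0, 13.0), (0.0, 120.0)], seed=1)
print(best_params, best_d)
```

Because selection is greedy, the best objective value recorded in `history` never decreases from one generation to the next, which matches the shape of a convergence curve such as the one in Figure 15.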
Because complete datasets are limited in actual milling operations, models trained on small sample datasets cannot effectively extract features that reflect tool wear states. Transfer learning can introduce model weights that were well trained on large datasets into new models, thereby improving the training efficiency of the new models and reducing the demand for large sample sizes. Thus, the parameters of ResNet-50 pre-trained on the large ImageNet dataset were transferred to the recognition model with ResNet-50 as the backbone network, which speeds up convergence and shortens training time while also improving the model's robustness and generalization ability. To further enhance the feature extraction capability of the recognition model, an attention mechanism was introduced to construct the CBAM-ResNet-50 network, improving the model's perception in both the channel and spatial dimensions.
Since the pre-trained model was trained on the ImageNet dataset with 1000 categories, while the milling cutter wear states were divided into three classes as data labels, an adapter layer with an output dimension of three was designed to replace the final layer of the pre-trained model. Additionally, because the data distributions of the two datasets differ, the model needed fine-tuning to better adapt to milling cutter wear state recognition. During representation learning on the DDP images of complete data under source conditions, the model parameters were adjusted along the gradient descent direction during backpropagation to reduce the loss value continuously, optimizing the model. The adaptive moment estimation (Adam) algorithm was selected as the optimizer. It combines the advantages of the momentum method and an adaptive learning rate to accelerate the convergence of gradient descent and make the optimization of the network parameters more effective. In gradient descent, the learning rate controls the step size of the parameter updates. Training typically starts with a larger learning rate to accelerate convergence and reduces it later to guide the network toward the optimum solution more effectively.
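The way Adam combines momentum with an adaptive step size can be seen in a scalar sketch of its update rule. The decay rates β1 = 0.9 and β2 = 0.999 and ε = 1e-8 are the commonly used defaults and are assumptions here, as the text does not state them. The quadratic test function and the learning rate of 0.1 are purely illustrative.

```python
import math


def adam_minimize(grad, theta, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a scalar function with Adam, given its gradient function."""
    m = 0.0  # first moment: exponential average of gradients (momentum)
    v = 0.0  # second moment: exponential average of squared gradients
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Illustrative problem: minimize (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0, lr=0.1, steps=500)
print(theta_star)
```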
To explore the impact of the learning rate decay strategy and the learning rate size on the recognition model, three initial learning rates of 0.001, 0.0005, and 0.0001 were set under the same training options, and two training strategies, a fixed learning rate and a decaying learning rate, were employed. The training results are shown in Table 3: the model achieved the highest accuracy of 97.39% with an initial learning rate of 0.001 and the learning rate decay strategy. Under the same initial learning rate, models trained with the decay strategy generally had slightly higher accuracies than those trained with a fixed learning rate. The accuracy curves of the model under the two training strategies with an initial learning rate of 0.001 are shown in Figure 17, and the accuracy curves of the model trained with the decay strategy under the three initial learning rates are shown in Figure 18.
Figure 17 shows that, with an initial learning rate of 0.001, training with a fixed learning rate resulted in slower convergence and relatively lower accuracy. Updating the learning rate with a decay method, which reduces the learning rate after a fixed number of training epochs, avoids parameter oscillation near the minimum in the later stages of training. In Figure 18, where all models use the decay strategy, the model with an initial learning rate of 0.0001 converged slowly and failed to reach the optimum solution, and the model with an initial learning rate of 0.0005 converged fastest but with lower accuracy. Thus, an initial learning rate of 0.001 with the decay strategy was more suitable for training the recognition model. Considering the experimental conditions and training results, the hyperparameters of the recognition model were chosen as follows: Adam optimizer, initial learning rate of 0.001, learning rate decay strategy, 100 epochs, and a batch size of 32.
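A decay that reduces the learning rate after a fixed number of epochs is often implemented as a step schedule, sketched below. The drop factor of 0.1 and the interval of 30 epochs are hypothetical values chosen for illustration; the text gives only the initial learning rate of 0.001 and the total of 100 epochs.

```python
def step_decay_lr(epoch, initial_lr=0.001, drop_factor=0.1, drop_every=30):
    """Learning rate for a zero-based epoch index under a step decay schedule."""
    return initial_lr * drop_factor ** (epoch // drop_every)


# Learning rates at a few epochs of a 100-epoch run.
schedule = {epoch: step_decay_lr(epoch) for epoch in (0, 29, 30, 60, 99)}
print(schedule)
```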
To verify the performance of the proposed model in milling cutter wear state recognition, a comparative experiment was conducted with the VGG-16, ShuffleNet, and ResNet-50 neural networks, all trained under the same conditions using models pre-trained on the ImageNet dataset. After 100 epochs, the models' performances were evaluated using the accuracy and loss values of the validation set. The accuracy and loss changes of the validation set during the training process for each model are shown in Figures 19 and 20.
As Figures 19 and 20 show, the proposed model converges first, with a relatively smooth curve and less fluctuation than the other models, showing more stability and the smallest loss value.
To further test the generalization ability of the proposed method in recognizing milling cutter wear states under varying conditions, the trained model was saved, and the test set based on data from condition 6 was classified. The classification results of the test set were visualized using a confusion matrix, as shown in Figure 21, with the horizontal axis representing the predicted milling cutter wear states and the vertical axis representing the true wear states. The elements on the main diagonal of the matrix indicate the number of samples correctly predicted, while the other elements represent the number of samples incorrectly predicted.
Precision, recall, and F1 score, which are commonly used to evaluate the performance of multi-classification models, were obtained from the confusion matrix. Precision is the proportion of correctly predicted samples among all samples predicted as a certain type, recall is the proportion of correctly predicted samples among the true samples of that type, and the F1 score is the harmonic mean of precision and recall, providing a comprehensive evaluation of both. The performance indicators of the recognition model's generalization ability test under varying conditions, based on the confusion matrix in Figure 21, are shown in Table 4.
Table 4 shows that the recognition model performed well on the different types of samples in the test set, with relatively few misidentifications, demonstrating strong generalization ability. The results indicate that the proposed method can effectively extract feature information from milling vibration signals and reduce the impact of condition changes on the mapping relationship of milling cutter wear states. It has good recognition accuracy and generalization ability in recognizing milling cutter wear states under varying conditions, making it applicable to the recognition of tool wear states in actual milling operations.
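The precision, recall, and F1 definitions above can be computed from a confusion matrix as in the following sketch. Rows are true wear states and columns are predicted wear states, matching the axis convention described for Figure 21. The counts in `toy_cm` are made up for illustration and are not the values from Figure 21 or Table 4.

```python
def per_class_metrics(cm, labels):
    """Return {label: (precision, recall, f1)} from cm[true][predicted] counts."""
    metrics = {}
    for k, name in enumerate(labels):
        true_positive = cm[k][k]
        predicted_as_k = sum(row[k] for row in cm)  # column sum
        actually_k = sum(cm[k])                     # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        metrics[name] = (precision, recall, f1)
    return metrics


def accuracy(cm):
    """Share of samples on the main diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


# Illustrative counts only (rows: true state, columns: predicted state).
toy_cm = [[8, 2, 0],
          [1, 9, 0],
          [0, 0, 10]]
toy_metrics = per_class_metrics(toy_cm, ["initial", "normal", "severe"])
print(toy_metrics, accuracy(toy_cm))
```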

Conclusions
To address the varying working conditions and the lack of corresponding data labels in actual cutting operations, this paper proposes a method that uses optimized SDP to recognize milling cutter wear states under variable conditions. The method uses data from source conditions to build a generalized milling cutter wear state recognition model, solving the problem that traditional data-driven models trained for one condition cannot recognize wear states under other working conditions. Experimental and comparative analyses lead to the following conclusions.

1. The vibration signals in milling operations can be transformed from one-dimensional time series to two-dimensional images using the SDP transformation, merging vibration signals from the X, Y, and Z axes based on their time series in a designated area to form a unified feature representation. Using a desymmetrization dot pattern effectively avoids the overlapping of adjacent spiral arms from different mirrors caused by parameter selection, aiding in finding optimal parameters.
2. Cluster analysis is used to extract template images for each wear state, and, with Euclidean distance maximization as the indicator, the differential evolution algorithm adaptively optimizes the SDP parameters l and ξ. This effectively solves the problem of SDP transformation effects being limited by preset parameter selection, narrows the intra-class distance, and expands the inter-class difference, thus facilitating the construction of a generalized decision boundary and reducing the impact of working condition changes on the milling cutter wear state mapping relationship.
3. The CBAM-ResNet-50 network is used for feature learning on images under source conditions to construct a generalized milling cutter wear state recognition model. The results show that the recognition accuracy for test samples under variable conditions can reach over 95%. This method can effectively use labeled data from source conditions to classify unlabeled data from target conditions, achieving the recognition of milling cutter wear states in variable condition scenarios with good recognition accuracy and generalization ability.
The experimental results indicate that this method can serve as an effective solution for milling cutter wear monitoring under variable conditions, aligning more closely with real-world cutting operation scenarios. However, the method currently distinguishes only three wear states, which is insufficient; further research and improvement are needed to identify the wear states of milling cutters more precisely.

Figure 1. The schematic diagram of the principle of the SDP transformation.
Appl. Sci. 2024, 14, x FOR PEER REVIEW 4 of 21

Taking a sine signal x = sin(200πt) with a sampling frequency of 12,800 Hz as an example, the range for the lag coefficient l in the SDP transformation is [0, 128]. With the number of mirrors m set to 1 and the aperture gain ξ set to 60°, the changes in the SDP image of this sine signal with varying lag coefficient l are shown in Figures 2 and 3.
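
The sine-signal example can be reproduced numerically with the minimal sketch below. It assumes the commonly cited SDP form (normalized amplitude as radius, lagged normalized amplitude scaled by ξ as the angular offset on both sides of each mirror axis), which may differ in detail from the exact variant used in this paper. The function name `sdp_points` and the signal length of 0.1 s are illustrative choices.

```python
import math


def sdp_points(x, lag, gain_deg, mirrors=1):
    """Map a 1-D signal to SDP polar points (radius, angle in degrees).

    Assumed form (the commonly cited SDP definition):
        r(i)     = (x[i] - min) / (max - min)
        Theta(i) = theta_k + r(i + lag) * gain
        Phi(i)   = theta_k - r(i + lag) * gain
    where theta_k = 360 * k / mirrors is the rotation angle of mirror k.
    """
    lo, hi = min(x), max(x)
    span = hi - lo
    if span == 0:
        raise ValueError("signal must not be constant")
    norm = [(v - lo) / span for v in x]
    points = []
    for k in range(mirrors):
        theta_k = 360.0 * k / mirrors
        for i in range(len(x) - lag):
            r = norm[i]
            offset = norm[i + lag] * gain_deg
            points.append((r, theta_k + offset))  # counter-clockwise arm
            points.append((r, theta_k - offset))  # mirrored clockwise arm
    return points


fs = 12_800  # sampling frequency in Hz, as in the text
signal = [math.sin(200 * math.pi * n / fs) for n in range(fs // 10)]
for lag in (0, 32, 64, 128):
    pts = sdp_points(signal, lag=lag, gain_deg=60.0, mirrors=1)
    print(lag, len(pts))
```

The 100 Hz sine sampled at 12,800 Hz has 128 samples per period, which is why the lag range [0, 128] covers exactly one cycle: at l = 128 the lagged sample coincides with the original one, and the image returns to the l = 0 pattern.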

Figure 2. The SDP image when the lag coefficient l changes during the first half of the cycle.

Figure 3. The SDP image when the lag coefficient l changes during the second half of the cycle.
is chosen, i.e., the lag coefficient l is 32. With the number of mirrors m set to 3, and based on ξ ≤ 360°/m, the range for the aperture gain ξ in the SDP transformation is [0°, 120°]. The changes in the SDP image of this sine signal with varying aperture gain ξ are shown in Figure 4.

Figure 4. The SDP image when the aperture gain ξ changes.
$$\Theta(i) = \Theta' + \frac{x_{i+l} - x_{\min}}{x_{\max} - x_{\min}}\,\xi$$

Taking a sine signal x = sin(200πt) with a sampling frequency of 12,800 Hz as an example, with the number of mirrors m set to 3, the lag coefficient l set to 32, and the aperture gain ξ set to 90°, the SDP and DDP images of this sine signal under the same parameters are shown in Figure 5. The figure shows that using DDP can effectively avoid the overlapping of adjacent spiral arms from different mirrors caused by parameter selection, aiding in finding optimal parameters.

Figure 5. SDP and DDP images under the same parameters.

Figure 6. Flowchart of the differential evolution algorithm.

Figure 7. The basic structure of a residual block.

2. Use cluster analysis to extract template images for each wear state, with the maximization of differences between template images of each wear state as an indicator, and employ DE to search for the optimal parameter combination [l, ξ] for DDP transformation of milling vibration signals;
3. Use the optimal parameter combination [l, ξ] to perform DDP transformation of milling vibration signals, merging vibration signals from three-axis directions into visualized DDP images;
4. Utilize the ResNet-50 model pre-trained on the ImageNet dataset, introduce CBAM, and construct the CBAM-ResNet-50 model;
5. Replace the final layer in the CBAM-ResNet-50 model, perform representational learning on DDP images of complete data under source conditions, and fine-tune to form a generalized milling cutter wear state recognition model;
6. Classify DDP images of vibration signals under target conditions to identify the wear state of milling cutters in variable condition scenarios.
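
The indicator used in the parameter search (the difference between template images of the wear states) can be sketched as follows. This is only an illustration under stated assumptions: each template is taken as an already extracted, flattened image, and the pairwise Euclidean distances are aggregated by a plain sum, which the text does not specify. The names `euclidean` and `template_difference` are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two flattened images of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def template_difference(templates):
    """Sum of pairwise Euclidean distances between template images.

    Assumption: a plain sum over all pairs is used as the aggregate; a larger
    value means the wear-state templates are easier to tell apart.
    """
    return sum(euclidean(a, b) for a, b in combinations(templates, 2))


# Three toy 2x2 "templates" (flattened), one per wear state; made-up values.
templates = [
    [0.0, 0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 6.0, 8.0],
]
print(template_difference(templates))
```

In a full pipeline, this value would be recomputed for every candidate [l, ξ], since changing the parameters changes the DDP images and therefore the templates extracted from them.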

Figure 9. The structure diagram for recognizing the wear state of milling cutters under variable conditions based on optimized SDP.

Figure 11. Different wear states of milling cutter.

Figure 12. Milling cutter wear curve.

In this experiment, vibration signals are collected from the X, Y, and Z axes of the workpiece. After data processing, the SDP transform is used to merge the multi-channel signals into a visual image, which intuitively represents the characteristics of the three-axis vibration signals. This approach leverages the complementarity and redundancy among the signals to better characterize the information related to tool wear. The SDP transformation effect is limited by the preset parameters: the number of mirrors m, the lag coefficient l, and the aperture gain ξ. The number of mirrors m depends on the number of merged signals, thus m = 3, making the starting angles Θ′ of the vibration signals in the X, Y, and Z directions 120°, 240°, and 360°, respectively. Different selections of l and ξ affect the topological structure of the SDP image. Optimizing the SDP transformation by adopting the DDP transformation better avoids overlap between the spiral arms of different signals, which would otherwise conceal features. Taking the vibration signals collected from the X, Y, and Z axes during the 101st tool pass of working condition 1 as an example, after data processing, both SDP and DDP transformations are performed under the same parameters, as shown in Figure 13.
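
The merging of three-axis signals into one image can be illustrated with the sketch below. It rests on an assumption about the DDP form that should be checked against the paper's definition: each axis contributes a single arm that starts at its own angle (120°, 240°, and 360° for m = 3) and extends by the lagged normalized amplitude times ξ, with no mirrored arm. The function name `ddp_points` and the synthetic signals are hypothetical stand-ins for measured vibration data.

```python
import math


def ddp_points(axes, lag, gain_deg):
    """Polar points (radius, angle in degrees) with one arm per axis signal.

    Assumed de-symmetrized form: angle = start_k + r(i + lag) * gain, where
    start_k = 360 * k / m for k = 1..m and r is the min-max normalized signal.
    """
    m = len(axes)
    points = []
    for k, x in enumerate(axes, start=1):
        start_deg = 360.0 * k / m  # 120, 240, 360 for three axes
        lo, hi = min(x), max(x)
        span = hi - lo
        if span == 0:
            raise ValueError("signal must not be constant")
        for i in range(len(x) - lag):
            r = (x[i] - lo) / span
            angle = start_deg + (x[i + lag] - lo) / span * gain_deg
            points.append((r, angle % 360.0))
    return points


# Synthetic stand-ins for the X, Y, and Z vibration signals.
axes = [[math.sin(0.3 * n + phase) for n in range(50)] for phase in (0.0, 1.0, 2.0)]
pts = ddp_points(axes, lag=13, gain_deg=84.0)
print(len(pts))
```

Under this assumption each axis occupies a sector of width ξ starting at its own angle, so with ξ ≤ 120° the three sectors cannot overlap, whereas symmetric arms of ±ξ around axes 120° apart would overlap once ξ exceeds 60°.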

Figure 13. Comparison between SDP transformation and DDP transformation.

Figure 14. Variation in template image difference d under different parameters.

ξ is 120]. The variation in the objective function value d of DE-DDP with the number of iterations g is shown in Figure 15. From the figure, DE-DDP converges from the 14th iteration, with the optimal solution being l = 13, ξ = 84.
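
The kind of search that yields such an optimum can be illustrated with a minimal differential evolution sketch. Everything here is an assumption made for illustration: the DE/rand/1/bin variant, the population size, the scale factor and crossover rate, the search bounds, and in particular `toy_objective`, which is a hypothetical stand-in constructed to peak at l = 13, ξ = 84 and is not the template image difference d computed from DDP images.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, generations=100,
                           f=0.5, cr=0.9, seed=0):
    """Maximize `objective` over the box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated component
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip back into the search range
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_fit = objective(trial)
            if trial_fit >= fitness[i]:  # greedy selection (maximization)
                pop[i], fitness[i] = trial, trial_fit
    best = max(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def toy_objective(params):
    """Hypothetical stand-in for d; peaks at lag 13, gain 84 by construction."""
    lag, gain = params
    return -((lag - 13.0) ** 2) - ((gain - 84.0) ** 2)


best, value = differential_evolution(toy_objective, bounds=[(1, 64), (1, 120)])
print(best, value)
```

Since the lag coefficient is a sample offset, a real search would round the first component to an integer before generating images; the continuous version above only shows the mutation, crossover, and selection loop.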

Figure 15. Variation in DE-DDP objective function value d with iteration number g.

Figure 16. Template images of different wear states.

For milling cutter wear experiments under conditions 1 to 5, after obtaining the optimal parameter combinations for DDP transformation according to the aforementioned method, DDP images of vibration signals from all three axes were drawn. The image resolution was set to 224 × 224, and the images were then converted into grayscale images to serve as comprehensive data for the milling cutter wear state recognition model. The data under each wear state were divided into a training set and a validation set in a 7:3 ratio. The dataset from
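
The per-state 7:3 division can be sketched as below. This is a minimal illustration, assuming a random shuffle within each wear state; the function name `split_per_class`, the wear-state labels, and the sample counts are hypothetical and do not reflect the actual dataset sizes.

```python
import random
from collections import defaultdict


def split_per_class(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/validation sets separately per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_fraction)  # 7:3 within each wear state
        train.extend(idxs[:cut])
        val.extend(idxs[cut:])
    return train, val


# Hypothetical labels for three wear states; counts are made up.
labels = ["initial"] * 10 + ["normal"] * 20 + ["severe"] * 10
train_idx, val_idx = split_per_class(labels)
print(len(train_idx), len(val_idx))
```

Splitting within each wear state, rather than over the pooled data, keeps the class proportions identical in the training and validation sets.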


Figure 17. Accuracy change curves of the validation set under different training strategies.

Figure 18. Accuracy change curves of the validation set under different initial learning rates.

Figure 19. Accuracy change curves of the validation set under different models.

Figure 20. Loss change curves of the validation set under different models.

Figure 19 shows that all models pre-trained on the ImageNet dataset and transferred to milling cutter wear state recognition achieved high accuracies in a short time, indicating that transfer learning can effectively solve the problems of long training time and low accuracy in classification recognition algorithms. Compared to the VGG-16 and ShuffleNet neural networks, the ResNet-50 network structure incorporates residual blocks, allowing deeper networks to extract high-dimensional feature information while avoiding network degradation. The CBAM-ResNet-50 network, which adds an attention mechanism to ResNet-50, enables the neural network to automatically learn and selectively focus on important information in the input, thus converging faster and predicting more accurately, ultimately achieving an accuracy of 97.39%. The loss change curves in Figure 20 represent the change in the gap between model predictions and true results as the number of iterations increases. The loss of the CBAM-ResNet-50 network converges first, with a smoother curve and less fluctuation than the other models, showing greater stability and the smallest loss value.

Figure 21. The confusion matrix of the test results.

Table 1. Geometric parameters of the milling cutter.

Columns: […] of the Cutter, Shank Diameter, Cutting Length, Overall Length, Number of Teeth, Helix Angle.

Table 3. Model training results with different learning rates.

Table 4. Indicators of the model's generalization ability test.