Cutting Tool Wear Condition Monitoring in Milling Using Deep Learning and Data Fusion

Bienvenu, Cikala Bagalwa; Bovic, Kilundu Y’Ebondo; Dany, Katamba Mpoyi; Casavola, Caterina; Pappalettera, Giovanni

doi:10.3390/app16126063


by Cikala Bagalwa Bienvenu ^1,2, Kilundu Y’Ebondo Bovic ^3,4, Katamba Mpoyi Dany ^5,*, Caterina Casavola ⁵ and Giovanni Pappalettera ⁵

¹ Department of Mechanical Engineering, Mapon University, Kindu 611101, Democratic Republic of the Congo

² Polytechnic, Notre Dame du KASAYI University, Kananga 5011031, Democratic Republic of the Congo

³ ISIB, Haute Ecole Bruxelles-Brabant, Rue des Goujons 28, 1070 Bruxelles, Belgium

⁴ Section Mécanique, Institut Supérieur des Techniques Appliquées, Kinshasa 1021101, Democratic Republic of the Congo

⁵ Dipartimento di Meccanica, Matematica e Management, Politecnico di Bari, Via E. Orabona 4, 70125 Bari, Italy

^* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(12), 6063; https://doi.org/10.3390/app16126063 (registering DOI)

Submission received: 13 May 2026 / Revised: 11 June 2026 / Accepted: 12 June 2026 / Published: 15 June 2026

(This article belongs to the Special Issue Structural Health Monitoring Using Ultrasonic and Vibrational Methods)


Abstract

Tool wear directly affects surface quality, dimensional accuracy, and manufacturing cost in milling operations, making reliable wear state classification essential for process control. This paper presents an offline deep learning framework for multiclass tool wear classification using the UC Berkeley milling dataset (NASA-Ames). Statistical features are extracted from vibration, acoustic emission, and spindle motor current signals, and dimensionality is reduced from 78 to 9 informative variables using LASSO regression. A four-layer Long Short-Term Memory (LSTM) network then models the temporal evolution of tool degradation across three wear states: healthy, degraded, and failed. Two model variants are compared: Model A uses sensor-derived features only, while Model B additionally incorporates feed rate and depth of cut as inputs. To prevent data leakage, partitioning is performed at the machining-case level rather than at the individual window level. Model A achieves 92% classification accuracy; Model B reaches 95%, demonstrating that cutting conditions provide contextual information that resolves ambiguity between wear states produced under different machining regimes. These results confirm that combining multisensor feature fusion, LASSO-based selection, and sequential deep learning constitutes an effective framework for tool wear classification in milling.

Keywords: deep learning; tool wear monitoring; milling; LSTM; data fusion; LASSO

1. Introduction

Tool wear is a critical phenomenon in machining processes such as milling, drilling, and turning, as it directly affects surface integrity, dimensional accuracy, productivity, and manufacturing cost. Factors such as spindle speed, feed rate, tool geometry, and workpiece material significantly influence the evolution of tool wear. In particular, Cui et al. [1] demonstrated that cutting parameters strongly affect wear mechanisms during high-speed face milling of hardened steel.

Several analytical and empirical models have been proposed to predict tool wear progression [2]. However, these models often fail to account for the stochastic behaviour of machining processes and the variability encountered in real industrial environments [3], limiting their predictive performance in practical applications. To overcome these limitations, data-driven approaches have emerged as effective alternatives. As reviewed by Sick [4], such approaches provide improved accuracy by learning nonlinear relationships between sensor signals and tool condition directly from data, without requiring an explicit physical model of the wear process.

In modern automated and semi-automated manufacturing environments, reliable tool condition monitoring (TCM) systems are essential to ensure process stability and product quality. A typical TCM system includes data acquisition, signal preprocessing, feature extraction, and decision-making stages [5]. Based on the monitoring principle, TCM techniques are generally classified into direct and indirect methods (Figure 1). Direct monitoring relies on optical, laser, or ultrasonic measurements to assess tool geometry and wear [6]. Although these methods offer high accuracy, they are intrusive, time-consuming, and unsuitable for real-time industrial applications, as they often require interruption of the machining process [5]. Moreover, unexpected tool damage occurring during cutting may not always be detected using direct techniques [7].

To address these drawbacks, indirect monitoring methods have been widely adopted for real-time tool condition assessment. These methods infer tool wear by correlating it with process-related signals such as cutting forces, vibrations, acoustic emission (AE), spindle current, and power consumption. Cutting force signals are among the most commonly used indicators due to their high sensitivity to tool condition changes [7]. Under identical cutting conditions, worn tools generate higher cutting forces, although variations may also arise from tool geometry and workpiece material, requiring appropriate filtering and normalisation [4]. Machine learning techniques have been successfully applied to cutting force-based monitoring, including Support Vector Machines optimised with genetic algorithms [8] and mutual information-based feature selection combined with ν-SVM, achieving high classification accuracy [9]. Experimental studies have also shown that average cutting force increases consistently as tool wear progresses [10].

Vibration signals generated by cutting force oscillations are also widely used for tool condition monitoring. Captured by accelerometers, these signals carry information related to both machine structure and machining operations [7]. Time-domain features such as RMS, peak-to-peak value, and kurtosis generally increase with tool wear, while frequency-domain analysis enables the identification of wear-related spectral components [10]. Despite their advantages, vibration signals are sensitive to environmental conditions, sensor placement, and cutting fluid effects, which complicates feature extraction [11]. Spindle motor current signals provide a more robust alternative, as they are less sensitive to external noise and reflect increased cutting resistance caused by tool wear [12]. These signals are typically analysed in the time domain due to their limited frequency content [13].

Acoustic emission signals, generated by plastic deformation and fracture in the cutting zone, are particularly suitable for monitoring milling operations due to their transient nature [5]. Previous studies have shown that AE signals are highly sensitive to tool wear, especially in high-frequency bands [14]. Furthermore, combining AE with vibration or cutting force signals provides complementary information that improves monitoring reliability [15]. Consequently, multisensor data fusion techniques have been increasingly adopted to enhance robustness and accuracy in TCM systems [16]. However, redundant sensor information may degrade model performance, making feature and sensor selection an essential preprocessing step [17]. Advanced fusion strategies such as IMDG and IMG have been shown to significantly improve classification accuracy in neural network-based monitoring models [18].

Concerning model architectures, the CNN-Transformer neural network (CTNN) was introduced to process multisensor condition monitoring data in a parallel architecture, with Transformers capturing global temporal associations and CNNs preserving local sequence order [18]. Self-supervised frameworks have also been explored: the disentangled variational autoencoder (D-VAE) integrated with a temporal convolutional neural network (TCNN) was proposed to model and trend tool wear in latent spaces, without requiring extensive manual feature engineering [19]. More recently, unsupervised contrastive learning approaches have used operational time as a proxy for degradation, imposing invariance to fluctuating operating conditions and deriving continuous health indicators from vibration signals alone [20].

For sequential modelling of wear evolution, LSTM networks have attracted particular attention. Unlike classical machine learning models such as Support Vector Machines and Random Forests, which treat individual observations as independent, LSTM networks exploit gating mechanisms to retain relevant information over extended sequences. This makes them well suited to tool wear monitoring, where the current state of the cutting tool depends on its cumulative machining history. LSTM-based approaches have demonstrated improved robustness and reliability for real-time TCM in milling operations [21,22,23,24].

Despite the significant progress achieved in tool condition monitoring, several limitations remain in the existing literature. Many studies focus primarily on sensor-derived features while overlooking the influence of machining conditions, even though cutting parameters directly affect vibration, acoustic emission, and spindle current responses. High-dimensional multisensor feature sets frequently contain redundant information that reduces model robustness and increases computational complexity. Furthermore, although LSTM-based models and multisensor fusion approaches have been reported, the quantitative contribution of machining-condition information to multiclass tool wear classification has not been sufficiently examined using the UC Berkeley milling dataset.

To address these limitations, this study proposes a tool wear classification framework that combines statistical features extracted from vibration, acoustic emission, and spindle motor current signals with machining-condition information. Feature dimensionality is reduced using LASSO regression to eliminate redundant variables, and a deep LSTM architecture is employed to model the temporal evolution of wear-related features. Two models are investigated and compared: the first relies exclusively on sensor-derived features, while the second additionally integrates machining parameters—namely feed rate and depth of cut. A case-level data partitioning strategy is adopted to prevent data leakage and provide a realistic assessment of model generalisation performance.

2. Materials and Methods

2.1. Experimental Dataset

This study develops and evaluates a tool wear classification framework using the UC Berkeley milling dataset provided by NASA Ames Research Center (Moffett Field, CA, USA) and the University of California, Berkeley (Berkeley, CA, USA) [25]. The experiments were carried out on a Matsuura MC-510V machining centre (Matsuura Machinery Corporation, Fukui, Japan) under different cutting conditions. The dataset comprises 167 milling runs distributed across 16 experimental cases defined by combinations of workpiece material, feed rate, and depth of cut, thus constituting a full 2² factorial design for each material. Two workpiece materials, cast iron and J45 steel, were machined using a 70 mm face mill fitted with six KC710 inserts. Cutting parameters included two levels of depth of cut (0.75 mm and 1.50 mm) and two levels of feed rate (0.25 mm/rev and 0.50 mm/rev), while cutting speed was maintained constant at 200 m/min.

Flank wear (VB), defined as the distance from the cutting edge to the end of the worn region on the flank face, was used as the fault indicator and was measured using a LEICA MZ12 microscope from Leica Microsystems GmbH (Wetzlar, Germany). The 16 experimental cases and their associated cutting parameters are summarised in Table 1. Note that cases 1–4 and 9–12 share the same parameter combinations, as do cases 5–8 and 13–16; this repetition is intentional and enables assessment of run-to-run variability under identical nominal conditions.

2.2. Data Acquisition

Data acquisition was performed using five sensors. Figure 2 illustrates the sensor placement on the Matsuura MC-510V machining centre [25]. Six monitoring signals were recorded at a sampling frequency of 250 Hz, with each cut comprising 9000 sampling points. The variables recorded during each experiment are described in Table 2.

The cutting parameters—specifically feed rate and depth of cut—were systematically varied across the 16 cases, as detailed in Table 1. These conditions were selected to represent a range of realistic industrial machining scenarios. Figure 3 illustrates the signal preprocessing workflow applied to the raw acquisition data [25].

2.3. Data Processing and Labelling

Each machining cut is labelled according to the tool wear condition (healthy, degraded, and failed), following the labelling strategy proposed by Cheng et al. [26]. The adopted labelling scheme is summarized in Table 3. When the wear value was not explicitly available for a given cut, its label was inferred from the wear values of the preceding and subsequent cuts, ensuring consistent categorisation of unlabelled samples. Strict inequality is used at class boundaries to avoid ambiguous assignment: a wear value of exactly 0.2 mm is assigned to the degraded class.
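The labelling rule described above can be sketched as follows. This is a minimal illustration rather than the authors' code: only the 0.2 mm boundary is stated in the text, so the upper threshold `failed_from=0.7` is a placeholder assumption, and linear interpolation between neighbouring measurements is one plausible reading of "inferred from the preceding and subsequent cuts".

```python
from typing import Optional

HEALTHY, DEGRADED, FAILED = 0, 1, 2


def label_wear(vb_mm: float, degraded_from: float = 0.2,
               failed_from: float = 0.7) -> int:
    """Map a flank wear value VB (mm) to a wear class.

    A value of exactly `degraded_from` falls in the degraded class, as stated
    in the text. `failed_from` is a hypothetical threshold for illustration.
    """
    if vb_mm < degraded_from:
        return HEALTHY
    if vb_mm < failed_from:
        return DEGRADED
    return FAILED


def fill_missing(vb: list[Optional[float]]) -> list[float]:
    """Fill missing VB values from the nearest measured neighbours.

    Assumption: linear interpolation between the preceding and subsequent
    measured cuts; at the ends the nearest measured value is reused.
    """
    known = [i for i, v in enumerate(vb) if v is not None]
    if not known:
        raise ValueError("at least one measured VB value is required")
    filled = list(vb)
    for i, v in enumerate(vb):
        if v is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None:
            filled[i] = vb[nxt]
        elif nxt is None:
            filled[i] = vb[prev]
        else:
            weight = (i - prev) / (nxt - prev)
            filled[i] = vb[prev] + weight * (vb[nxt] - vb[prev])
    return filled
```

Within one machining case, `[label_wear(v) for v in fill_missing(vb_values)]` would then give one class label per cut.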

Figure 4 presents the monitoring signals for cut 146, which clearly capture tool engagement and disengagement during the machining process. Figure 5 shows the signals from cut 1 for comparison.

Upon inspection of the dataset, certain runs exhibited significant anomalies, including unusually high signal amplitudes inconsistent with typical machining patterns, suggesting either sensor malfunction or physical disturbances during the milling operation. Figure 6 illustrates an example of such anomalous signals (cut 106). Runs 17, 94, and 105 were identified as corrupted and excluded from the analysis, yielding a final dataset of 164 valid cuts.

2.4. Feature Extraction

To ensure that the analysis focused exclusively on stable cutting behaviour, transient regions associated with tool engagement and disengagement were removed from each cut. Specifically, start and end sample indices were defined for each cut to delimit the steady-state cutting region (Figure 7). Only samples within these indices were retained for subsequent feature extraction.

Each cut was then processed using a sliding window of 1000 elements with a step of 500, resulting in a total of 1197 sub-cuts, as depicted in Figure 8. From each sub-cut, 13 statistical features were computed per signal, yielding a total of 78 features across the six acquired signals. These features are defined in Table 4.
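The windowing step can be illustrated with a short sketch. The window length (1000) and step (500) come from the text; the five descriptors computed here are only an illustrative subset, since the full list of 13 features is given in Table 4, and the function names are hypothetical. For a trimmed signal of n ≥ 1000 samples this scheme yields floor((n − 1000) / 500) + 1 windows.

```python
import math


def sliding_windows(signal, size=1000, step=500):
    """Return overlapping windows; a trailing partial window is dropped."""
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, step)]


def window_features(window):
    """Compute a few common time-domain descriptors for one window."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n  # population variance
    rms = math.sqrt(sum(x * x for x in window) / n)
    fourth = sum((x - mean) ** 4 for x in window) / n
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "rms": rms,
        "peak_to_peak": max(window) - min(window),
        # Non-excess kurtosis; defined as 0 for a constant window.
        "kurtosis": fourth / var ** 2 if var > 0 else 0.0,
    }
```

Applying `window_features` to every window of every signal and concatenating the results per window gives one feature vector per sub-cut.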

2.5. Feature Normalisation

To ensure numerical consistency and improve model training stability, the Z-score normalization method was applied to scale each feature, as expressed in Equation (1).

z = \frac{x - μ}{σ}

(1)

where x denotes the original value, μ represents the mean of the feature across the training set, and σ denotes its standard deviation. Z-score normalization transforms each feature to have zero mean and unit variance, placing all variables on a comparable scale and preventing features with larger magnitudes from dominating the learning process.
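Because the mean and standard deviation are taken from the training set, the same statistics have to be reused for the validation and test sets. A minimal sketch of Equation (1) under that reading is given below; the helper names are hypothetical, and the population standard deviation is an assumed convention.

```python
import math


def fit_zscore(train_rows):
    """Estimate per-feature mean and standard deviation on training rows."""
    n = len(train_rows)
    columns = list(zip(*train_rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n)
            for col, m in zip(columns, means)]
    # Guard against constant features, which would give a zero divisor.
    stds = [s if s > 0 else 1.0 for s in stds]
    return means, stds


def apply_zscore(rows, means, stds):
    """Scale rows with statistics that were fitted on the training set."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]
```

Fitting on the training rows only and calling `apply_zscore` on the other subsets keeps information from the evaluation data out of the scaling step.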

2.6. Feature Selection

The original 78 features were reduced using LASSO (Least Absolute Shrinkage and Selection Operator) regression, which introduces an L1 regularisation term that penalises the coefficients of less informative features, effectively setting many of them to zero. The LASSO optimisation problem is defined in Equation (2):

\min_{β} \frac{1}{2n} {‖ y - Xβ ‖}_{2}^{2} + α {‖ β ‖}_{1}

(2)

where y, X, β, n, and α are, respectively, the response vector, the matrix of explanatory variables, the vector of regression coefficients, the number of observations, and the regularization parameter (greater than zero). After empirical evaluation, a value of α = 0.025 was selected as it provided the best trade-off between dimensionality reduction and classification performance. As a result, the feature set was reduced from 78 to 9 highly informative variables, thereby decreasing redundancy, improving numerical stability, and reducing computational complexity. The correlation structure of the selected features is discussed in Section 3.
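To make the selection mechanism concrete, the sketch below minimises the objective of Equation (2) by cyclic coordinate descent with soft-thresholding, written in plain Python so that it is self-contained. It is an illustration, not the authors' implementation: it assumes standardised features and a centred response (no intercept), and the text does not say how the response was encoded. scikit-learn's `Lasso` estimator minimises the same objective and would normally be used in practice.

```python
def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1 over b."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that excludes it.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def selected_features(beta, tol=1e-12):
    """Indices of features whose coefficient was not shrunk to zero."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]
```

Features whose correlation with the partial residual stays below α in magnitude receive a coefficient of exactly zero, which is how a larger α leads to a smaller retained set.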

2.7. Data Partitioning and Class Balancing

To prevent data leakage, the dataset was partitioned at the machining-case level rather than at the individual window level. This ensures that sub-cuts originating from the same machining operation are assigned exclusively to a single subset, thereby preventing highly correlated windows from appearing simultaneously in training and evaluation sets—a common source of overoptimistic performance estimates in sliding-window studies.

The data were split into three subsets: training, validation, and testing. This was achieved by first reserving 20% of cases for testing, then allocating 80% of the remaining cases to training and 20% to validation. Categorical target labels were converted to binary matrix format through one-hot encoding, consistent with the requirements of the Softmax output layer.
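A group-wise split of this kind can be sketched as follows. The function name, the seed, and the rounding rule are assumptions made for illustration; `case_ids` holds, for every window, the identifier of the machining operation it came from. With the stated fractions, roughly 64% of the groups end up in training, 16% in validation, and 20% in testing.

```python
import random


def split_by_case(case_ids, test_frac=0.2, val_frac=0.2, seed=0):
    """Split window indices so that each case lands in exactly one subset."""
    cases = sorted(set(case_ids))
    random.Random(seed).shuffle(cases)

    # First reserve a share of the cases for testing ...
    n_test = max(1, round(test_frac * len(cases)))
    test_cases = set(cases[:n_test])

    # ... then take the validation share from the remaining cases.
    remaining = cases[n_test:]
    n_val = max(1, round(val_frac * len(remaining)))
    val_cases = set(remaining[:n_val])

    split = {"train": [], "val": [], "test": []}
    for idx, cid in enumerate(case_ids):
        if cid in test_cases:
            split["test"].append(idx)
        elif cid in val_cases:
            split["val"].append(idx)
        else:
            split["train"].append(idx)
    return split
```

Because the assignment is made per identifier and not per window, overlapping windows from the same operation cannot appear on both sides of a split.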

Inspection of the class distribution revealed a significant imbalance, with the failed class (class 2) substantially underrepresented relative to the healthy and degraded classes. To address this, random oversampling was applied to the minority classes in the training set, replicating existing observations until all classes contained 674 samples each. Class distributions before and after balancing are presented in Figure 9 and Figure 10, respectively. All data preprocessing, feature selection, class balancing, model training, and performance evaluation were performed using Python 3.13.5 and scikit-learn version 1.7.0.
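Random oversampling as described here amounts to drawing existing minority-class samples with replacement until every class matches the largest one. The following is a minimal sketch with a hypothetical function name and seed; it should be applied to the training subset only, as in the text.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Replicate minority-class rows until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == cls]
        # Draw with replacement from the original samples of this class.
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels
```

Since the added rows are exact copies, oversampling before splitting would place duplicates in the evaluation sets, which is why the order of operations matters.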

2.8. Model Architecture

The proposed deep learning model comprises four stacked LSTM layers with 64, 32, 32, and 16 units, respectively, followed by a fully connected Softmax output layer. This architecture progressively transforms the input feature sequences into increasingly discriminative representations of tool condition, enabling effective modelling of wear progression under varying machining conditions [22,23].

LSTM networks are particularly well suited to this task because their gating mechanisms enable selective retention of relevant information over extended sequences, allowing the model to capture long-term dependencies in the cumulative wear process. The internal operations of an LSTM cell are governed by gating mechanisms, defined in Equations (3)–(5):

f_{t} = σ(W_{hf} h_{t-1} + W_{xf} x_{t})

(3)

c_{t} = f_{t} ⊙ c_{t-1} + \tanh(W_{hf} h_{t-1} + W_{xf} x_{t})

(4)

h_{t} = \tanh(c_{t})

(5)

where f_{t} denotes the forget gate, c_{t} the cell state, and h_{t} the hidden state at time step t. The symbol σ represents the sigmoid activation function, while tanh denotes the hyperbolic tangent function. The operator ⊙ indicates element-wise multiplication. Figure 11 illustrates the structure of an LSTM cell and the flow of information through its gates.
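For readers who prefer code to notation, one time step of Equations (3)–(5) can be written out literally as below. This sketch follows the equations as printed, including the shared weight matrices in (3) and (4). LSTM layers in common deep learning libraries additionally use input and output gates with their own weights and biases, so this is a simplified cell and not what a library layer computes. The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cell_step(x_t, h_prev, c_prev, W_hf, W_xf):
    """One step of the simplified cell in Equations (3)-(5)."""
    pre = [a + b for a, b in zip(matvec(W_hf, h_prev), matvec(W_xf, x_t))]
    f_t = [sigmoid(p) for p in pre]                      # Eq. (3)
    c_t = [f * c + math.tanh(p)                          # Eq. (4)
           for f, c, p in zip(f_t, c_prev, pre)]
    h_t = [math.tanh(c) for c in c_t]                    # Eq. (5)
    return h_t, c_t
```

Iterating `cell_step` over the feature vectors of successive windows carries the cell state forward, which is the mechanism by which earlier windows influence later predictions.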

The statistical features, rather than raw signals, serve as model inputs. This constitutes an important denoising and dimensionality reduction stage: machining signals are often corrupted by structural vibrations, environmental disturbances, and measurement noise, and statistical descriptors provide a compact and robust representation of the signal content within each window.

Two model variants are investigated: Model A, which takes as input only the 9 sensor-derived features; and Model B, which appends feed rate and depth of cut to the input vector, yielding an 11-dimensional input.
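The size of the two variants can be estimated with a short calculation. The sketch assumes a standard LSTM layer with four gate blocks, each holding input weights, recurrent weights, and a bias, followed by a dense Softmax layer with bias. The text does not state the deep learning library or these details, so the result is an estimate under those assumptions.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of one standard LSTM layer with bias."""
    # Four gate blocks, each with input weights, recurrent weights, a bias.
    return 4 * (units * (input_dim + units) + units)


def model_params(n_features, lstm_units=(64, 32, 32, 16), n_classes=3):
    """Parameters of the stacked LSTM plus the dense Softmax output layer."""
    total, in_dim = 0, n_features
    for units in lstm_units:
        total += lstm_params(in_dim, units)
        in_dim = units  # the next layer reads the previous hidden state
    return total + in_dim * n_classes + n_classes


params_model_a = model_params(9)   # sensor-derived features only
params_model_b = model_params(11)  # plus feed rate and depth of cut
```

Under these assumptions both variants come to roughly 43,000 trainable parameters, and the two extra inputs of Model B add 512 parameters, all in the first layer.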

2.9. Evaluation Metrics

Model performance is evaluated using standard classification metrics. Accuracy, defined as the proportion of correctly classified samples, is given by Equation (6):

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(6)

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. The loss function used during training is categorical cross-entropy, defined in Equation (7):

L = - \sum_{i=1}^{n} y_{true,i} \log(y_{pred,i})

(7)

where n denotes the number of classes, y_{true} the true label, y_{pred} the predicted probability, and i the class index. Additional metrics are precision, recall, and F1-score, defined in Equations (8)–(10), respectively.

Precision = \frac{TP}{TP + FP}

(8)

Recall = \frac{TP}{TP + FN}

(9)

F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}

(10)

(10)
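For a three-class problem, these quantities are usually read off a confusion matrix in a one-vs-rest fashion. The sketch below shows one way to do this, together with the loss of Equation (7) for a single sample; the function names and the rows-are-true-classes convention are assumptions for illustration.

```python
import math


def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    # Multiclass accuracy: correctly classified samples over all samples.
    accuracy = sum(cm[i][i] for i in range(k)) / total
    report = {}
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true c, predicted other
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, true other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Equation (7) for one sample; eps avoids taking log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))
```

With one-hot targets, only the predicted probability of the true class contributes to the loss.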

3. Results and Discussion

3.1. Feature Selection Results

Inspection of the initial 78-feature set revealed several pairs of strongly correlated features, indicating significant redundancy. Such multicollinearity can increase model complexity, reduce interpretability, and adversely affect generalisation. Following LASSO regression with α = 0.025, the feature set was reduced to 9 informative variables. The selected features exhibit substantially lower mutual correlation, as shown in Figure 12, thereby reducing redundancy and improving numerical stability during model training.

3.2. Model A: Sensor-Only Classification

For Model A, which relies exclusively on sensor-derived features, the training and validation accuracy curves follow a similar progression (Figure 13). Training accuracy reaches approximately 92% by the 14th epoch, indicating effective learning of the primary wear-related patterns. Validation accuracy follows a closely parallel trend, suggesting satisfactory generalisation. The corresponding loss curve decreases rapidly and stabilises at approximately 0.25 by the 14th epoch (Figure 14), confirming convergence.

The limited gap between training and validation curves indicates that the model does not suffer from significant overfitting. Nevertheless, additional validation strategies such as k-fold cross-validation or leave-one-case-out evaluation would provide a more rigorous assessment of robustness and are recommended for future work.

The confusion matrix on the test set is presented in Figure 15. The model achieves correct prediction rates of 88.97% for class 0 (healthy), 86.61% for class 1 (degraded), and 100% for class 2 (failed). The most frequent misclassifications occur between classes 0 and 1, which is expected given the gradual and continuous nature of the healthy-to-degraded transition. The full classification report is provided in Table 5.

3.3. Model B: Enriched Classification with Machining Parameters

Model B employs the same four-layer LSTM architecture as Model A but incorporates feed rate and depth of cut as additional inputs. The training accuracy reaches approximately 96% by the 10th epoch (Figure 16), converging more rapidly and to a higher level than Model A, suggesting that the additional contextual information facilitates more discriminative feature learning. The loss curve decreases steadily and stabilises around the 10th epoch for both training and validation subsets (Figure 17).

As with Model A, the limited gap between training and validation curves indicates that the enriched architecture does not suffer from overfitting. The confusion matrix (Figure 18) shows correct prediction rates of 97.24% for class 0, 89.76% for class 1, and 97.74% for class 2, with minimal confusion across all three classes. The classification report is provided in Table 6.

3.4. Comparative Analysis and Discussion

Model B achieves an overall accuracy of 95%, compared with 92% for Model A. The weighted averages of precision, recall, and F1-score are also higher for Model B. This 3-percentage-point improvement, achieved solely by appending two scalar cutting parameters to the input vector, indicates that machining conditions carry complementary information that enhances wear state discrimination.

This improvement can be explained from a physical perspective. Cutting parameters directly influence cutting forces, vibration amplitudes, acoustic emission activity, and spindle motor load. Consequently, similar wear states can generate substantially different sensor responses under different operating conditions—a phenomenon that creates classification ambiguity when the model receives sensor signals alone. By incorporating feed rate and depth of cut, Model B receives contextual information that allows it to distinguish between signal variations caused by tool wear and those caused by changes in the machining regime. This reduces ambiguity, particularly for the degraded class (class 1), where transitions are most gradual.

The consistent convergence behaviour observed in both models, combined with the limited generalisation gap in the accuracy and loss curves, indicates that the proposed four-layer LSTM architecture successfully captures the temporal evolution of tool degradation without significant overfitting. These findings are consistent with recent studies showing that deep recurrent architectures and multisensor fusion strategies outperform conventional machine learning approaches when modelling wear evolution and remaining useful life in milling operations [21,24,27,28].

Although the enriched LSTM model achieves 95% accuracy, recent work has shown that pretrained CNN models can exceed 99% accuracy on wear stage estimation tasks using the same dataset [29]. However, the present approach offers a significant advantage in computational efficiency: Transformer-based and pretrained CNN architectures typically require substantially larger datasets and longer training times to reach their full performance potential. The proposed LSTM framework, by contrast, achieves competitive accuracy with a compact feature set and a lightweight architecture, making it more suitable for deployment in resource-constrained industrial environments.

4. Conclusions

This study proposed and evaluated a deep learning framework for multiclass tool wear classification in milling, based on the UC Berkeley milling dataset. The main findings are as follows.

Feature dimensionality reduction using LASSO regression successfully reduced the original 78-variable feature set to 9 informative indicators, substantially decreasing redundancy and computational cost while preserving the information relevant to wear state discrimination. A four-layer LSTM network trained on these features achieved an overall classification accuracy of 92% using sensor-derived inputs alone, demonstrating the model’s ability to capture the temporal and cumulative nature of tool degradation.

The key contribution of this work is the quantitative demonstration that integrating machining parameters—feed rate and depth of cut—into the model input improves classification accuracy from 92% to 95%. This confirms that cutting conditions carry contextual information that resolves ambiguity between wear states generated under different machining regimes, and that their inclusion is a simple and effective enhancement to sensor-only monitoring frameworks.

The case-level data partitioning strategy adopted in this study prevents the data leakage that can arise from window-level splitting and provides a more realistic estimate of generalisation performance than is typically reported in the literature.

Future work will focus on three directions: (i) validating the proposed framework under real industrial production conditions with unseen tool and material combinations; (ii) investigating transfer-learning strategies to extend the approach to different machining operations and tool types; and (iii) incorporating additional process variables and online adaptation mechanisms to improve prediction robustness under non-stationary operating conditions, together with leave-one-case-out validation.

Author Contributions

Conceptualization, C.B.B., K.Y.B. and K.M.D.; methodology, C.B.B. and K.Y.B.; software, C.B.B.; validation, C.B.B. and K.Y.B.; formal analysis, C.B.B.; investigation, C.B.B. and K.Y.B.; resources, C.B.B. and K.Y.B.; data curation, K.Y.B.; writing—original draft preparation, C.B.B.; writing—review and editing, K.Y.B., C.B.B., K.M.D., C.C. and G.P.; visualization, C.B.B.; supervision, K.Y.B., G.P. and C.C.; project administration, K.M.D.; funding acquisition, K.Y.B., G.P. and C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The UC Berkeley milling dataset used in this study is publicly available through the NASA Ames Prognostic Data Repository.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Cui, X.; Zhao, J.; Dong, Y. The Effects of Cutting Parameters on Tool Life and Wear Mechanisms of CBN Tool in High-Speed Face Milling of Hardened Steel. Int. J. Adv. Manuf. Technol. 2012, 66, 955–964.
2. Poulachon, G.; Moisan, A.; Jawahir, I.S. Tool-Wear Mechanisms in Hard Turning with Polycrystalline Cubic Boron Nitride Tools. Wear 2001, 250, 576–586.
3. Karandikar, J.M.; Abbas, A.E.; Schmitz, T.L. Tool Life Prediction Using Random Walk Bayesian Updating. Mach. Sci. Technol. 2013, 17, 410–442.
4. Sick, B. On-line and indirect tool wear monitoring in turning with artificial neural networks: A review of more than a decade of research. Mech. Syst. Signal Process. 2002, 16, 487–546.
5. Nath, C. Integrated Tool Condition Monitoring Systems and Their Applications: A Comprehensive Review. Procedia Manuf. 2020, 48, 852–863.
6. Nouri, M.; Fussell, B.K.; Ziniti, B.L.; Linder, E. Real-Time Tool Wear Monitoring in Milling Using a Cutting Condition Independent Method. Int. J. Mach. Tools Manuf. 2015, 89, 1–13.
7. Mohamed, A.; Hassan, M.; M’Saoubi, R.; Attia, H. Tool Condition Monitoring for High-Performance Machining Systems—A Review. Sensors 2022, 22, 2206.
8. Liao, X.; Zhou, G.; Zhang, Z.; Lu, J.; Ma, J. Tool Wear State Recognition Based on GWO–SVM with Feature Selection of Genetic Algorithm. Int. J. Adv. Manuf. Technol. 2019, 104, 1051–1063.
9. Hu, M.; Ming, W.; An, Q.; Chen, M. Tool Wear Monitoring in Milling of Titanium Alloy Ti–6Al–4V under MQL Conditions Based on a New Tool Wear Categorization Method. Int. J. Adv. Manuf. Technol. 2019, 104, 4117–4128.
10. Mohanraj, T.; Shankar, S.; Rajasekar, R.; Sakthivel, N.R.; Pramanik, A. Tool Condition Monitoring Techniques in Milling Process—A Review. J. Mater. Res. Technol. 2020, 9, 1032–1042.
11. Yu, J.; Liang, S.; Tang, D.; Liu, H. A Weighted Hidden Markov Model Approach for Continuous-State Tool Wear Monitoring and Tool Life Prediction. Int. J. Adv. Manuf. Technol. 2016, 91, 201–211.
12. Stavropoulos, P.; Papacharalampopoulos, A.; Vasiliadis, E.; Chryssolouris, G. Tool Wear Predictability Estimation in Milling Based on Multi-Sensorial Data. Int. J. Adv. Manuf. Technol. 2016, 82, 509–521.
13. Pimenov, D.Y.; Kumar Gupta, M.; da Silva, L.R.R.; Kiran, M.; Khanna, N.; Krolczyk, G.M. Application of Measurement Systems in Tool Condition Monitoring of Milling: A Review of Measurement Science Approach. Measurement 2022, 199, 111503.
14. Haber, R.E.; Jiménez, J.E.; Peres, C.R.; Alique, J.R. An Investigation of Tool-Wear Monitoring in a High-Speed Machining Process. Sens. Actuators A Phys. 2004, 116, 539–545.
15. Barreiro, J.; Fernández-Abia, A.I.; González-Laguna, A.; Pereira, O. TCM System in Contour Milling of Very Thick-Very Large Steel Plates Based on Vibration and AE Signals. J. Mater. Process. Technol. 2017, 246, 144–157.
16. Rangwala, S.; Dornfeld, D. Sensor Integration Using Neural Networks for Intelligent Tool Condition Monitoring. J. Eng. Ind. 1990, 112, 219–228.
17. Zhou, Y.; Xue, W. Review of Tool Condition Monitoring Methods in Milling Processes. Int. J. Adv. Manuf. Technol. 2018, 96, 2509–2523.
18. Liu, H.; Liu, Z.; Jia, W.; Zhang, D.; Wang, Q.; Tan, J. Tool Wear Estimation Using a CNN-Transformer Model with Semi-Supervised Learning. Meas. Sci. Technol. 2021, 32, 125010.
19. von Hahn, T.; Mechefske, C.K. Self-Supervised Learning for Tool Wear Monitoring with a Disentangled-Variational-Autoencoder. Int. J. Hydromechatronics 2021, 4, 69–98.
20. Rombach, K.; Michau, G.; Burzle, W.; Koller, S.; Fink, O. Learning Informative Health Indicators Through Unsupervised Contrastive Learning. IEEE Trans. Reliab. 2025, 74, 2408–2420.
21. Qin, B.; Wang, Y.; Liu, K.; Qiao, S.; Niu, M.; Jiang, Y. A Tool Wear Monitoring Approach Based on Triplet Long Short-Term Memory Neural Networks. Proc. Inst. Mech. Eng. B J. Eng. Manuf. 2024, 238, 1610–1619.
22. Wang, K.; Wang, A.; Wu, L.; Xie, G. Machine Tool Wear Prediction Technology Based on Multi-Sensor Information Fusion. Sensors 2024, 24, 2652.
23. Chen, M.; Mao, J.; Fu, Y.; Liu, X.; Zhou, Y.; Sun, W. In-Situ Tool Wear Condition Monitoring during the End Milling Process Based on Dynamic Mode and Abnormal Evaluation. Sci. Rep. 2024, 14, 12888.
24. Li, Z.; Meurer, M.; Bergs, T. Deep Learning Based Tool Wear Estimation Considering Cutting Conditions. Procedia CIRP 2024, 130, 133–138.
25. Agogino, A.M.; Goebel, K.F. Mill Data Set; NASA Ames Prognostics Data Repository: Moffett Field, CA, USA, 2007. Available online: https://data.nasa.gov/dataset/milling-wear (accessed on 6 June 2026).
26. Cheng, Y.; Zhu, H.; Hu, K.; Wu, J.; Shao, X.; Wang, Y. Multisensory Data-Driven Health Degradation Monitoring of Machining Tools by Generalized Multiclass Support Vector Machine. IEEE Access 2019, 7, 47102–47113.
Kamat, P.; Kumar, S.; Kotecha, K. DeepTool: A Deep Learning Framework for Tool Wear Onset Detection and Remaining Useful Life Prediction. MethodsX 2024, 13, 102965. [Google Scholar] [CrossRef]
Shah, M.; Vakharia, V.; Chaudhari, R.; Vora, J.; Pimenov, D.Y.; Giasin, K. Tool Wear Prediction in Face Milling of Stainless Steel Using Singular Generative Adversarial Network and LSTM Deep Learning Models. Int. J. Adv. Manuf. Technol. 2022, 121, 723–736. [Google Scholar] [CrossRef]
Karabacak, Y.E. Deep Learning-Based CNC Milling Tool Wear Stage Estimation with Multi-Signal Analysis. Eksploat. I Niezawodn.–Maint. Reliab. 2023, 25, 2023. [Google Scholar] [CrossRef]

Figure 1. Description of tool monitoring methods [5].

Figure 2. Sensor placement [25].

Figure 3. Preprocessing process [25].

Figure 4. Signals from cut 146.

Figure 5. Signals from cut 1.

Figure 6. Signals from cut 106.

Figure 7. Identification of steady-state cutting region (yellow box: tool engagement, green box: tool disengagement).

Figure 8. Signal segmentation by sliding window.

Figure 9. Class distribution before balancing.

Figure 10. Class distribution after balancing.

Figure 11. Structure of an LSTM cell and information flow through gates and states.

Figure 12. Correlation matrix of features post-reduction.

Figure 13. Model A—Accuracy curves during training and validation.

Figure 14. Model A—loss curves during training and validation.

Figure 15. Confusion Matrix on Test Data.

Figure 16. Model B—Accuracy curves during training and validation.

Figure 17. Model B—Loss curves during training and validation.

Figure 18. Confusion Matrix on Enriched Test Data.

Table 1. Parameters for each case [25].

Case	Depth of Cut (mm)	Feed (mm/rev)	Material
1	1.5	0.5	Cast iron
2	0.75	0.5	Cast iron
3	0.75	0.25	Cast iron
4	1.5	0.25	Cast iron
5	1.5	0.5	Steel
6	1.5	0.25	Steel
7	0.75	0.25	Steel
8	0.75	0.5	Steel
9	1.5	0.5	Cast iron
10	1.5	0.25	Cast iron
11	0.75	0.25	Cast iron
12	0.75	0.5	Cast iron
13	0.75	0.25	Steel
14	0.75	0.5	Steel
15	1.5	0.25	Steel
16	1.5	0.5	Steel
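
As a rough illustration of how the cutting conditions in Table 1 could be attached to a feature vector for Model B, the sketch below appends depth of cut and feed to a window's selected features. The names `CASE_PARAMS` and `enrich` are hypothetical, and the absence of scaling or encoding is an assumption; the paper's actual input pipeline may differ.

```python
import numpy as np

# (depth of cut in mm, feed in mm/rev) for each case, copied from Table 1.
CASE_PARAMS = {
    1: (1.5, 0.5), 2: (0.75, 0.5), 3: (0.75, 0.25), 4: (1.5, 0.25),
    5: (1.5, 0.5), 6: (1.5, 0.25), 7: (0.75, 0.25), 8: (0.75, 0.5),
    9: (1.5, 0.5), 10: (1.5, 0.25), 11: (0.75, 0.25), 12: (0.75, 0.5),
    13: (0.75, 0.25), 14: (0.75, 0.5), 15: (1.5, 0.25), 16: (1.5, 0.5),
}

def enrich(sensor_features, case):
    """Append depth of cut and feed (hypothetical helper, no scaling applied)."""
    doc, feed = CASE_PARAMS[case]
    base = np.asarray(sensor_features, dtype=float)
    return np.concatenate([base, [doc, feed]])

# Example: nine placeholder sensor features from a window recorded in case 3.
enriched = enrich(np.zeros(9), case=3)
```

Because both parameters are constant within a case, they act as context for the sensor features rather than as wear indicators on their own.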

Table 2. Elements of the UC Berkeley milling dataset and their description [25].

Name	Description of Elements
case	Case number (1 to 16)
run	Run counter (sub-cut index) within each case
VB	Flank wear (VB) measured on the cutting tool; not recorded after every sub-cut
time	Duration of the experiment; restarts at the beginning of each case
DOC	Depth of cut, kept constant in each case
feed	Feed rate, kept constant in each case
material	Material, kept constant in each case
smcAC	Alternating current at the spindle motor
smcDC	Direct current at the spindle motor
vib_table	Vibration measured on the table
vib_spindle	Vibration measured on the spindle
AE_table	Acoustic emission measured on the table
AE_spindle	Acoustic emission measured on the spindle

Table 3. Tool wear labelling scheme.

State	Label	Flank Wear VB (mm)
Healthy	0	$0 \leq VB < 0.2$
Degraded	1	$0.2 \leq VB < 0.7$
Failed	2	$VB \geq 0.7$
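
The labelling scheme in Table 3 can be sketched as a small thresholding function. This is a minimal illustration, not the authors' code: the function name `wear_label` is hypothetical, and the handling of missing VB values (which Table 2 notes are not recorded at every sub-cut) is left out and simply rejected here.

```python
import numpy as np

# Class boundaries in mm, taken from Table 3.
WEAR_LIMITS_MM = [0.2, 0.7]

def wear_label(vb_mm):
    """Map flank wear VB (mm) to 0 = healthy, 1 = degraded, 2 = failed."""
    vb = np.asarray(vb_mm, dtype=float)
    # Assumption: missing or negative measurements must be handled upstream.
    if np.any(np.isnan(vb)) or np.any(vb < 0):
        raise ValueError("VB must be a non-negative, non-missing value in mm")
    # np.digitize returns i such that limits[i-1] <= vb < limits[i],
    # which matches the half-open intervals of Table 3.
    return np.digitize(vb, WEAR_LIMITS_MM)

labels = wear_label([0.0, 0.19, 0.2, 0.69, 0.7, 1.2])
```

The lower bound of each interval is inclusive, so a measurement of exactly 0.2 mm is labelled degraded and exactly 0.7 mm is labelled failed.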

Table 4. Statistical features extracted from each sub-cut.

Feature	Expression
Mean	$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$
Variance	$\sigma^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^{2}$
Standard Deviation	$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^{2}}$
Root Mean Square (RMS)	$X_{rms} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^{2}}$
Effective Amplitude (RMSampl)	$X_{r} = \left(\frac{1}{n} \sum_{i=1}^{n} \sqrt{|x_i|}\right)^{2}$
Average Rectified Value	$X_{arv} = \frac{1}{n} \sum_{i=1}^{n} |x_i|$
Peak-to-Peak Value	$X_{p} = x_{\max} - x_{\min}$
Form Factor	$k_{f} = \frac{X_{rms}}{X_{arv}}$
Crest Factor	$k_{a} = \frac{|x_{\max}|}{X_{rms}}$
Average Factor	$k_{av} = \frac{|x_{\max}|}{X_{arv}}$
Median	$Q(x) = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \frac{x_{n/2} + x_{n/2+1}}{2} & \text{if } n \text{ is even} \end{cases}$
Maximum Value	$M = x_{\max}$
Sum	$S = \sum_{i=1}^{n} x_i$
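
The expressions in Table 4 translate directly into NumPy. The sketch below is one possible implementation for a single one-dimensional signal window; the function name `extract_features` and the dictionary keys are hypothetical. If all 13 statistics are computed on each of the six sensor channels listed in Table 2, this gives 13 × 6 = 78 values per window, which would match the 78 variables mentioned in the abstract.

```python
import numpy as np

def extract_features(x):
    """Compute the 13 statistics of Table 4 for one 1-D signal window."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    rms = np.sqrt(np.mean(x ** 2))
    arv = abs_x.mean()
    x_max = x.max()
    # Note: an all-zero window makes the three ratio features undefined (NaN).
    return {
        "mean": x.mean(),
        "variance": x.var(),  # 1/n normalisation, as in Table 4
        "std": x.std(),
        "rms": rms,
        "effective_amplitude": np.mean(np.sqrt(abs_x)) ** 2,
        "arv": arv,
        "peak_to_peak": x_max - x.min(),
        "form_factor": rms / arv,
        "crest_factor": abs(x_max) / rms,
        "average_factor": abs(x_max) / arv,
        "median": np.median(x),
        "maximum": x_max,
        "sum": x.sum(),
    }

# Toy window (not real sensor data) used only to exercise the function.
window = np.array([3.0, -3.0, 3.0, -3.0])
features = extract_features(window)
```

Following Table 4, the crest and average factors use the absolute value of the largest sample rather than the largest absolute sample; the two differ when the most extreme excursion of the signal is negative.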

Table 5. Classification report for Model A (sensor-only).

Class	Precision	Recall	F1-Score	Support
0	0.95	0.89	0.92	145
1	0.88	0.87	0.87	127
2	0.92	1.00	0.96	133
Accuracy			0.92	405
Macro avg	0.92	0.92	0.92	405
Weighted avg	0.92	0.92	0.92	405
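
The summary rows of Table 5 can be cross-checked from the per-class rows. The sketch below recomputes F1 as the harmonic mean of precision and recall, the macro average as an unweighted mean over classes, and the accuracy as the support-weighted mean of recall (an identity for single-label classification). The helper names are hypothetical, and because the inputs are already rounded to two decimals, agreement is only expected to within about 0.01.

```python
# (precision, recall, f1, support) per class, copied from Table 5.
TABLE_5 = {
    0: (0.95, 0.89, 0.92, 145),
    1: (0.88, 0.87, 0.87, 127),
    2: (0.92, 1.00, 0.96, 133),
}

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)

def weighted_average(values, supports):
    """Mean over classes weighted by the number of test samples per class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)

precisions = [row[0] for row in TABLE_5.values()]
recalls = [row[1] for row in TABLE_5.values()]
supports = [row[3] for row in TABLE_5.values()]

f1_check = {label: f1_from(p, r) for label, (p, r, _, _) in TABLE_5.items()}
macro_precision = macro_average(precisions)
macro_recall = macro_average(recalls)
accuracy_estimate = weighted_average(recalls, supports)
```

The same check can be applied to Table 6 by swapping in its rows.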

Table 6. Classification report for Model B (enriched with machining parameters).

Class	Precision	Recall	F1-Score	Support
0	0.97	0.97	0.97	145
1	0.94	0.90	0.92	127
2	0.96	0.98	0.96	133
Accuracy			0.95	405
Macro avg	0.95	0.95	0.95	405
Weighted avg	0.95	0.95	0.95	405
