Development of Robust Machine Learning Models for Tool-Wear Monitoring in Blanking Processes Under Data Scarcity

Hofmann, Johannes; Veitenheimer, Ciarán-Victor; Fei, Chenkai; Chen, Chengting; Wang, Haoyu; Zhao, Lianhao; Groche, Peter

doi:10.3390/app151910323

by Johannes Hofmann *, Ciarán-Victor Veitenheimer, Chenkai Fei, Chengting Chen, Haoyu Wang, Lianhao Zhao and Peter Groche

Institute for Production Engineering and Forming Machines, Technical University of Darmstadt, Otto-Berndt-Straße 2, 64287 Darmstadt, Germany

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(19), 10323; https://doi.org/10.3390/app151910323

Submission received: 20 August 2025 / Revised: 16 September 2025 / Accepted: 22 September 2025 / Published: 23 September 2025


Abstract

Tool wear is a major challenge in sheet-metal forming, as it directly affects product quality and process stability. Reliable monitoring of tool-wear conditions is therefore essential, yet it remains challenging due to limited data availability and uncertainties in manufacturing conditions. To this end, this study evaluates different strategies for developing robust machine learning models under data scarcity for fluctuating manufacturing conditions: a 1D-CNN using time-series data (baseline model), a 1D-CNN with signal fusion of force and acceleration signals, and a 2D-CNN based on Gramian Angular Field (GAF) transformation. Experiments are conducted using inline data from a blanking process with varying material thicknesses and varying availability of training data. The results show that the fusion model achieved the highest improvement (up to 93.2% with the least training data) compared to the baseline model (78.3%). While the average accuracy of the 2D-CNN was comparable to that of the baseline model, its performance was more consistent, with a reduced standard deviation of 5.4% compared to 9.2%. The findings underscore the benefits of sensor fusion and structured signal representation in enhancing classification robustness.

Keywords: blanking; condition monitoring; uncertainty; machine learning

1. Introduction

Blanking is one of the most economical processes within the manufacturing value chain. It is widely applied in large-scale production, particularly in the automotive and aerospace industries, due to its high material utilization, low energy consumption, and excellent dimensional consistency [1,2,3]. In modern high-speed production environments where output rates exceed 100 parts per minute, tool wear can result in unplanned machine downtime or accelerated accumulation of non-conforming components. This poses significant financial risks to manufacturers [2,4].

In current industrial practice, monitoring still often relies on the experience of skilled machine operators and on periodic random sampling, since operators are usually responsible for supervising several production lines at the same time. To prevent wear-related failures and ensure consistent product quality, real-time tool-condition monitoring has therefore become essential. Early studies proposed signal-based supervised methods in which features were manually extracted from the signals of force sensors [5], accelerometers [6], or acoustic emission (AE) sensors [7] to monitor equipment conditions. These rule-based or threshold-driven approaches offered good interpretability and practical applicability and were sometimes supported by dimensionality-reduction techniques, such as Principal Component Analysis (PCA), to enhance feature processing [8]. For example, Jin and Shi classified different process parameters based on force signals, which were transformed using PCA for dimensionality reduction [9]. Unterberg et al. segmented AE signals based on domain knowledge and extracted statistical features for the indirect monitoring of punch wear. The wear state was estimated using ensemble learning and linear regression [10].

However, methods based on feature engineering often struggle to adapt to variations in production, such as material fluctuations, changes in tool geometry, or environmental disturbances [11]. With the advancement of deep learning, particularly the development of Convolutional Neural Networks (CNNs), research has gradually shifted toward automatic feature extraction through the hidden layers of deep neural networks, significantly improving model generalization and adaptability. When applied to production processes, these methods enable real-time state description, assessment, and prediction under high-speed production conditions [12,13,14]. For instance, Huang et al. [15] and Wang et al. [16] achieved tool-wear detection by integrating vibration signals and image segmentation with neural networks. Molitor et al. [17], on the other hand, employed pre-trained CNNs to process workpiece images, demonstrating the potential of image-based approaches for tool-wear classification in blanking processes. Image data offers the advantage of directly monitoring the tool surface, allowing for a visual assessment of wear conditions. In contrast, time-series signals, such as force or acceleration, provide only indirect information about wear and require physical coupling and modeling for wear detection.

When time-series data is converted into a two-dimensional representation, such as a Gramian Angular Field (GAF) [18], the transformation can effectively extract temporal features and reveal hidden time-invariant patterns within the signal dynamics. Wang et al. [19] and Zhou et al. [20] successfully applied GAF-based feature extraction combined with machine learning to perform time-series-based condition monitoring in milling processes and bearing fault diagnosis, respectively. Martinez-Arellano et al. used GAF transformation to encode force signals and classified different tool conditions of a cutting process using CNNs. Discussing their findings, the authors stated that the use of several different sensor types could contribute to improving model performance [21]. Kou et al. extended this approach by using GAFs to transform acceleration and current signals, which were then processed together with infrared images using CNNs to classify tool conditions in milling processes. The authors emphasized that the fusion of heterogeneous sensor signals enables higher model performance by linking complementary information [22].

Following these results, signal-fusion techniques integrating sensor data (e.g., force, acceleration, and acoustic emission) have been adopted in smart manufacturing for data preprocessing, particularly in fault diagnosis and condition monitoring for different cutting and milling processes [23]. Different types of sensors capture distinct aspects of the manufacturing process, and their fusion can enhance the robustness of predictive models in noisy environments. However, the combination of different sensor types for condition monitoring has not yet been adopted in forming technology, and its influence on model robustness has not been demonstrated.

In this context, robustness refers not only to the model’s ability to maintain stable performance under process fluctuations and signal interference but also to its capacity to adapt to manufacturing scenarios such as variations in material thickness, production batches, or tool geometry. However, the application of such techniques remains relatively limited in high-speed manufacturing processes like blanking and is still in an active exploratory phase [24,25].

The feasibility of monitoring punching processes by employing data-driven models has thus been demonstrated in the literature by various authors. However, despite these advancements, two major challenges remain unresolved in industrial applications. First, models trained on data collected under specific signal conditions often experience significant performance degradation when applied to data from different signal sources or operating settings [11]. This sensitivity to variations in signal characteristics severely limits the transferability and robustness of such models in practical scenarios. Second, acquiring large-scale, high-quality labeled datasets in industrial environments is expensive and disruptive to production, as it requires frequent machine downtime, comprehensive coverage of diverse tool-wear conditions, and extensive manual labeling efforts [13,24,25,26]. For this reason, new methods for robust and data-efficient monitoring must be developed for forming processes. To enable this, approaches from related domains, such as cutting processes, are adapted and modified.

Based on the aforementioned challenges, this study focuses on the following two key research questions:

RQ1: Can time-series image transformation improve model performance for small datasets in monitoring blanking processes under uncertain manufacturing conditions?
RQ2: Can data fusion improve the performance of process monitoring during blanking (either by reducing the required training data or by improving model robustness)?

To this end, this study proposes a blanking tool-condition monitoring method that integrates time-series image transformation and multimodal signal fusion. This method aims to achieve high-accuracy, high-robustness process-state recognition under small-sample conditions.

2. Methodology

This study investigates whether transforming time-series sensor signals into two-dimensional representations and fusing heterogeneous data sources can improve the robustness and data efficiency of CNN-based models for tool-wear classification under manufacturing conditions. The evaluation setup is designed to systematically assess model performance with limited training data and across varying process scenarios, focusing on classification robustness under domain shifts.

2.1. Data Acquisition

The dataset was recorded on a Bruderer BSTA 500 high-speed press (Bruderer AG, Frasnacht, Switzerland) equipped with two synchronized sensors. A Kistler 9051A force washer (Kistler Instrumente AG, Winterthur, Switzerland) was mounted on the punch to measure the axial cutting force in the direct force flux, and a PCB 352C03 piezoelectric accelerometer (PCB Piezotronics, Depew, NY, USA) was attached near the die to record vertical tool vibrations. The voltage signals were digitized using a μ-Controller (NI cRIO 9047) containing an analogue measurement module (NI 9215). All signals were sampled at 25 kHz. The acquisition range for both sensors was limited to crank angles between 150° and 210° by a trigger. During preprocessing, the signals were divided into individual strokes based on the trigger signal. To make the signals suitable for processing by the modeling strategies, each signal was resampled to a fixed length of 2800 data points. The blanking tool and the sensor signals are depicted in Figure 1.
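The resampling step can be illustrated with a minimal sketch. The text only states that each signal was resampled to 2800 points; the use of linear interpolation, the function name `resample_linear`, and the toy input below are assumptions made for illustration and are not taken from the study.

```python
def resample_linear(signal, target_len=2800):
    """Resample a 1D sequence to target_len points by linear interpolation.

    Assumption: linear interpolation on an equidistant grid. The source does
    not specify which resampling method was used.
    """
    n = len(signal)
    if n < 2 or target_len < 2:
        raise ValueError("need at least two input and two output points")
    # Spacing of the output points, measured in input-index units.
    step = (n - 1) / (target_len - 1)
    out = []
    for k in range(target_len):
        pos = k * step
        lo = min(int(pos), n - 2)   # index of the left neighbour
        frac = pos - lo             # fractional distance to the right neighbour
        out.append(signal[lo] * (1.0 - frac) + signal[lo + 1] * frac)
    return out


# Toy stroke with five points (placeholder values, not measured data).
stroke = [0.0, 1.0, 4.0, 9.0, 16.0]
resampled = resample_linear(stroke, target_len=9)
fixed_length = resample_linear(stroke)  # default length of 2800 points
```

With this convention, the first and last samples of a stroke are preserved, so all strokes share the same start and end alignment regardless of their original length.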

Tool-wear progression was quantified using the cutting-edge radius $r_{i}$ of the punch with $i \in \{0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5\}$ mm. Edge-rounding is a characteristic manifestation of abrasive wear in blanking processes, affecting cut quality and tool lifetime [27]. To simulate domain shift, five different sheet thicknesses $s_{j}$ with $j \in \{1.97, 1.94, 1.91, 1.88, 1.85\}$ mm were processed under identical conditions.

The datasets $X_{i, j}$ formed the basis for a comparative evaluation of alternative modeling strategies. To test the modeling approaches for robustness against process uncertainty, such as variations in sheet thickness, the models were trained on a subset of $X_{i, 1.97}$ and evaluated on test sets of all sheet thicknesses, $X_{i, j}$. To simulate data scarcity, six training subsets of $X_{i, 1.97}$ were constructed by randomly sampling the training data while ensuring class balance. Each evaluation was repeated ten times to account for sampling variance.
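A class-balanced random subset of the kind described above could be drawn as in the following sketch. The function name `balanced_subset`, the seeding scheme, and the toy label list are hypothetical; the study does not describe its sampling code.

```python
import random
from collections import defaultdict


def balanced_subset(labels, n_per_class, seed):
    """Return sorted sample indices with exactly n_per_class draws per label."""
    rng = random.Random(seed)          # a separate generator per repetition
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(pool)} samples")
        # Sampling without replacement keeps every stroke unique in the subset.
        chosen.extend(rng.sample(pool, n_per_class))
    return sorted(chosen)


# Toy labels: three wear classes with 50 strokes each (hypothetical sizes).
labels = [r for r in (0.05, 0.10, 0.15) for _ in range(50)]

# Ten repetitions, each with its own seed, mirroring the repeated evaluation.
subsets = [balanced_subset(labels, n_per_class=20, seed=rep) for rep in range(10)]
```

Drawing each repetition with a different seed yields ten differently composed training sets of the same size, which is what allows a standard deviation of the test accuracy to be reported.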

2.2. Machine Learning Pipeline

Based on the collected data, an experimental pipeline was designed to systematically evaluate the robustness, data efficiency, and cross-domain generalization of three CNN-based modeling strategies. Figure 2 illustrates the three implemented modeling paths: (1) a baseline 1D-CNN operating on individual time-series signals, (2) a 2D-CNN using GAF transformations, and (3) a 1D-CNN with early fusion of multimodal signals. The pipeline also included hyperparameter optimization and cross-domain testing.

In the first configuration, each sensor signal was fed individually into a 1D-CNN composed of two convolutional layers. The detailed structure of the model is shown in Figure 3 on the left. Model training was performed using the Adam optimizer [28]. For the baseline 1D-CNN, a fixed model configuration was used. The hyperparameters for training were selected based on preliminary tuning with Bayesian optimization (see Table 1). This model served as the baseline for comparing the other configurations.

The second configuration employed the GAF transformation of the signals, followed by processing with a 2D-CNN. For this purpose, all force and acceleration signals were normalized to the range $[0, 1]$ using min-max scaling on a per-stroke basis. The GAF transformation encodes temporal relationships by projecting time-series values into polar coordinates and computing the cosine of angular summations [18]. This preserves both temporal structure and signal morphology in a two-dimensional format. Each normalized 1D signal was transformed using the summation method and rendered as an image using a fixed rainbow colormap. The resulting GAF images (Figure 4) were used to train a ResNet-50 model without additional preprocessing [29]. The selected hyperparameters and the search space used for Bayesian hyperparameter optimization are summarized in Table 2.
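For a scaled signal $\tilde{x}_k \in [0, 1]$, the summation variant of the GAF is commonly written as $G_{k,l} = \cos(\phi_k + \phi_l)$ with $\phi_k = \arccos(\tilde{x}_k)$. The sketch below implements this definition in plain Python for readability. The function names are illustrative, and the colormap rendering and the ResNet-50 training used in the study are not covered.

```python
import math


def min_max_scale(signal):
    """Scale one stroke to [0, 1] (per-stroke min-max scaling)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("a constant signal cannot be min-max scaled")
    return [(v - lo) / (hi - lo) for v in signal]


def gasf(signal):
    """Gramian Angular Summation Field of a 1D signal as a nested list."""
    scaled = min_max_scale(signal)
    # Clamp against rounding so that acos never sees a value outside [0, 1].
    phi = [math.acos(min(1.0, max(0.0, v))) for v in scaled]
    # Entry (k, l) is the cosine of the summed polar angles of samples k and l.
    return [[math.cos(a + b) for b in phi] for a in phi]


# Toy stroke with three samples (placeholder values, not measured data).
field = gasf([2.0, 3.0, 4.0])
```

The resulting matrix is symmetric, and its main diagonal $G_{k,k} = 2\tilde{x}_k^2 - 1$ retains the scaled signal values. Note that the matrix size grows quadratically with the signal length, so a full-length stroke would normally be handled with a vectorized array library or downsampled first; whether the study downsampled before the transformation is not stated.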

In the third configuration, both sensor signals were concatenated into a two-channel input for early fusion and processed with a deeper 1D-CNN composed of five convolutional layers. The detailed structure of the model is shown in Figure 3 on the right. Model training was performed using the Adam optimizer with the hyperparameters in Table 2 [28].
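Early fusion in this sense means that the two signals of a stroke are stacked as channels of a single input before the first convolutional layer. The following sketch only illustrates the resulting data layout; the channel order, the channels-first convention, and the function names are assumptions, and any per-channel scaling the authors may have applied is omitted.

```python
def fuse_stroke(force, acceleration):
    """Return one stroke as a channels-first pair: [force, acceleration]."""
    if len(force) != len(acceleration):
        raise ValueError("both channels must have the same number of samples")
    return [list(force), list(acceleration)]


def fuse_batch(force_strokes, accel_strokes):
    """Fuse paired strokes into a nested list of shape (batch, 2, length)."""
    if len(force_strokes) != len(accel_strokes):
        raise ValueError("need one acceleration stroke per force stroke")
    return [fuse_stroke(f, a) for f, a in zip(force_strokes, accel_strokes)]


# Two toy strokes of length 4 (placeholder numbers, not measured data).
force_strokes = [[0.0, 2.0, 5.0, 1.0], [0.1, 2.2, 4.8, 0.9]]
accel_strokes = [[0.3, -0.4, 0.6, -0.1], [0.2, -0.5, 0.7, 0.0]]

batch = fuse_batch(force_strokes, accel_strokes)
shape = (len(batch), len(batch[0]), len(batch[0][0]))
```

Because both sensors are synchronized and resampled to the same length, sample $k$ of the force channel and sample $k$ of the acceleration channel refer to the same instant of the stroke, which lets a 1D convolution kernel combine both modalities at every time step.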

Evaluation was based on mean classification accuracy and standard deviation over ten repetitions per training configuration, as described in Section 2.1. The results were analyzed based on the mean classification accuracy for each test set, as well as the accuracy of each target $r_{i}$ in the confusion matrices.
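The evaluation protocol can be sketched as follows: one confusion matrix per repetition, an accuracy derived from each matrix, and the mean and standard deviation over the repetitions. The toy labels are placeholders, and the use of the sample standard deviation (divisor $n - 1$) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: k for k, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def accuracy(cm):
    """Share of samples on the main diagonal of a confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


# Toy data: three wear classes, three repetitions (placeholder predictions).
classes = [0.05, 0.25, 0.50]
y_true = [0.05, 0.05, 0.25, 0.25, 0.50, 0.50]
runs = [
    [0.05, 0.05, 0.25, 0.25, 0.50, 0.50],  # all correct
    [0.05, 0.25, 0.25, 0.25, 0.50, 0.05],  # two errors
    [0.05, 0.05, 0.25, 0.50, 0.50, 0.50],  # one error
]

matrices = [confusion_matrix(y_true, y_pred, classes) for y_pred in runs]
accuracies = [accuracy(cm) for cm in matrices]
mean_acc = mean(accuracies)
std_acc = stdev(accuracies)  # sample standard deviation (assumption)
```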

3. Results Analysis

This section presents a comparison of the three modeling strategies with respect to the two research questions introduced in Section 1. It first presents an evaluation of the baseline model before answering the research questions.

3.1. Baseline Performance of Two Signal Types

First, it is important to examine the performance of the baseline 1D-CNN models trained on individual time-series signals (e.g., acceleration or force). These results establish a reference point for evaluating the benefits of time-series transformation using GAFs and signal-fusion modeling.

Baseline performance: Figure 5 illustrates the classification accuracy of the models trained on the data of material thickness 1.97 mm across varying training data proportions and tested on different material thicknesses $s_{j}$. The left subplot shows the results for acceleration, and the right subplot shows the results for force. Each curve color corresponds to a specific sheet thickness. As described in Section 2, each point on the line chart represents the average test accuracy over 10 models trained on individually sampled subsets of the training data. The shaded areas show the spread of the accuracy results of the 10 models as the standard deviation.

Overall, the acceleration-based models consistently outperformed their force-based counterparts. With only 40 samples per wear state for training, the acceleration models achieved over 90% classification accuracy on the test sets of all material thicknesses. In contrast, the force-based models failed to achieve comparable performance, even with 200 samples per wear state. This indicates that each acceleration sample per wear class carries a higher informational value for the machine learning model and that the acceleration-based models are more robust in differentiating tool-wear classes under variations in sheet thickness. Although both signals were recorded at the same sampling frequency, the higher-frequency components that dominate the acceleration signal appear to contain additional information for the model. This could explain the superior accuracy in data-scarce scenarios, as well as the higher overall accuracy. For trained machine operators, however, this comes at the cost of the physical interpretability that the force signal offers with its distinct punch, push, and withdraw phases in the cutting process. The same physical coupling explains the high sensitivity of the force signals to a change in sheet thickness, as the maximum cutting force is directly linked to the sheet thickness. The features extracted from force signals are therefore expected to shift between the training and test domains, which leads to poor model performance.

In addition to superior overall accuracy, the acceleration-based models demonstrated greater consistency across repeated trials, as indicated by their lower standard deviations. This stability was especially pronounced under limited data conditions.

To further examine the per-class classification behavior, confusion matrices are presented in Figure 6 for the models trained with 200 samples per wear state and tested on $X_{i, 1.85}$.

The acceleration models presented strong diagonal dominance, with fewer errors across most classes. In contrast, the models trained on force signals showed a more widespread misclassification pattern. As seen in the confusion matrix, classes such as $r_{0.10}$, $r_{0.45}$, and $r_{0.50}$ were frequently confused with both neighboring and distant classes. For example, $r_{0.10}$ was often predicted as $r_{0.15}$, $r_{0.20}$, and even as $r_{0.35}$ and $r_{0.50}$.

Due to the thickness difference between the training data $X_{i, 1.97}$ and the test data $X_{i, 1.85}$, both models tended to misclassify severe wear conditions ($r_{0.40}$–$r_{0.50}$) as mild ones ($r_{0.05}$–$r_{0.10}$). The acceleration-signal-based models misclassified a small number of samples from higher wear classes, such as $r_{0.40}$ and $r_{0.45}$, as $r_{0.05}$, which can be attributed to the local similarity in transient vibrations. In industrial applications, it is essential to avoid misclassifying high wear conditions as low wear. Such misclassifications can leave severe tool wear undetected, which may lead to tool breakage, prolonged downtime, and, consequently, significant economic losses. In addition, such failures pose a potential safety risk to machine operators. These critical misclassifications were observed in both baseline models.
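The notion of a critical misclassification can be made countable from a confusion matrix. The sketch below counts samples whose true class is severe but whose predicted class is mild. The thresholds follow the grouping used in the text (severe from 0.40 mm, mild up to 0.10 mm), but the function `critical_misses` and the toy matrix are hypothetical and do not reproduce the counts in Figure 6.

```python
def critical_misses(cm, radii, severe_min=0.40, mild_max=0.10):
    """Count severe-wear samples that were predicted as mild wear.

    cm[t][p] is the number of samples with true class t predicted as class p;
    radii[k] is the cutting-edge radius in mm belonging to class index k.
    """
    count = 0
    for t, row in enumerate(cm):
        if radii[t] < severe_min:
            continue                  # true class is not severe
        for p, n in enumerate(row):
            if radii[p] <= mild_max:  # predicted class is mild
                count += n
    return count


# Toy example with three classes (placeholder counts, not results of the study).
radii = [0.05, 0.25, 0.50]
cm = [
    [10, 0, 0],   # true 0.05 mm
    [1, 9, 0],    # true 0.25 mm
    [3, 0, 7],    # true 0.50 mm: three samples predicted as 0.05 mm
]
n_critical = critical_misses(cm, radii)
```

Reporting such a count alongside the overall accuracy separates harmless confusions between neighboring radii from errors that would leave a worn tool in production.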

3.2. Can GAF Transformation of Sensor Signals Improve Performance?

As described in Section 2, GAFs transform time-series signals into image-like representations, enabling CNNs to exploit time-invariant structural features. To assess the impact of different feature representations on tool-wear classification, we compared a 2D-CNN GAF-based model with a 1D-CNN, using identical force signals as input. The force signals were selected for this analysis, as their lower classification performance offers greater potential for a differentiated evaluation of possible improvements. The results, as shown in Figure 7, include the classification accuracies and standard deviations across all conditions.

Overall, transforming the force signals using GAFs improved classification performance when the test data were identical or close to the training domain. Using only 20 samples per wear state, the GAF transformation achieved 98.1% accuracy when the sheet thickness was not changed between the training and test sets, exceeding the 1D-CNN model’s 81.0%. Tested on $X_{i, 1.94}$, using 20 samples per wear state, the GAF transformation achieved 79.5% accuracy, outperforming the 1D-CNN model at 76.0%. This trend continued for all training data sizes, reaching 90.3% at 200 samples per wear state, compared to 86.5% for the 1D-CNN model. For the test sets $X_{i, 1.91}$, $X_{i, 1.88}$, and $X_{i, 1.85}$, transforming time-series signals using GAFs did not demonstrate an overall improvement in accuracy. However, a significant improvement in the stability of the predictions, reflected in the standard deviation, was observed. The reduced standard deviation allows for a clearer definition of the operational boundaries within which the modeling approach can be applied in industrial settings. The findings of the investigations conducted on the various preprocessing methods for all sheet thicknesses are documented in Appendix A. Table A1 presents the mean accuracies of the models, while Table A2 presents the standard deviations of these models.

The extraction of temporally inherent features through transformation enhanced the separability of individual classes. As a result, classification accuracy improved significantly for similar data and small data shifts. However, although the extracted features improved class separability, they did not necessarily increase the robustness of this separation against data shifts, as the extracted features between the training and test sets changed.

To investigate class-wise behavior, Figure 8 presents confusion matrices for both models trained on 40 samples per wear state and evaluated on 1.94 mm.

The GAF-based model showed improved diagonal concentration, indicating stronger classification ability. Compared with the 1D-force model, which predicted $r_{0.50}$ as $r_{0.05}$, the 2D-CNN also significantly reduced this misclassification behavior. While the 1D-CNN model misclassified 91 samples of $r_{0.50}$ as $r_{0.05}$, the 2D-CNN model did so in only 8 cases. In conclusion, the GAF transformation leverages spatial feature extraction to overcome limitations of the 1D-CNN model and significantly reduces the standard deviation of the accuracy between differently sampled training subsets. The confusion matrices also show a reduced tendency toward critical misclassifications between distant wear classes, although such misclassifications still occurred with both models.

3.3. Can Fusion of Raw Signals Improve Performance?

Although the 2D-CNN achieved competitive performance under consistent conditions, its robustness under significant domain shifts (variations in material thickness) remained limited. Moreover, in some samples, the model exhibited critical misclassifications in predicting distant wear classes (e.g., classifying $r_{0.05}$ as $r_{0.50}$), undermining its practical applicability for predictive maintenance. To address this, a 1D-CNN architecture that fuses both acceleration and force signals as dual-channel inputs was investigated. This approach enables the model to learn joint patterns and leverage complementary features across modalities.

The acceleration-based 1D-CNN was used as the reference model due to its strong standalone performance. Figure 9 compares the accuracy of the fusion model and the 1D-CNN across varying training data proportions. Each curve corresponds to a specific sheet thickness, and the shaded areas show the standard deviation of the test results.

Overall, the fusion model achieved consistently higher classification accuracy than the acceleration-only 1D-CNN across all material thicknesses and training data proportions. On the test set furthest from the training distribution, $X_{i, 1.85}$, the fusion model reached an accuracy of 93.2% with only 20 samples per wear state for training, whereas the acceleration model lagged behind at 78.3%, marking a substantial improvement of nearly 15 percentage points. This advantage remained evident across all training sizes. When trained with 200 samples per wear state and tested on $X_{i, 1.85}$, the acceleration model only gradually approached 96.0%, while the fusion model exceeded this level, reaching 96.8%. In addition to higher accuracy, the fusion model also showed greater stability in repeated experiments. Furthermore, the acceleration model exhibited large fluctuations at low data levels when tested on $X_{i, 1.85}$, with a standard deviation of 10.96 for 20 training samples, which dropped to 4.49 at 100 training samples. In contrast, the fusion model remained much more stable, with the standard deviation staying below 2.0 throughout and falling to 1.15 at 200 training samples per wear state. Similar trends were observed across other test sets, where the fusion model maintained low variance. This indicates that the fusion model benefited from additional information in the training data, providing more accurate predictions. This is particularly important in scenarios with limited data or distribution shifts. By combining two signals, the fusion model benefited from a more comprehensive signal representation. The force signal captured the primary mechanical load during the cutting phase, characterized by a dominant peak in the punch phase, followed by less pronounced push and withdraw phases. In contrast, the acceleration signal reflected dynamic mechanical responses such as oscillations, impacts, and tool vibrations throughout the process. This combination enabled the model to learn both global load characteristics and transient structural responses, leading to higher prediction accuracy and improved robustness.

To investigate the specific benefits of fusion on class-level prediction, Figure 10 displays the confusion matrices for both models trained with only 20 samples per wear state and evaluated on $X_{i, 1.85}$.

The 1D-fusion model improved both overall accuracy and misclassification robustness, particularly in low-data settings. Fusion reduced critical misclassification errors that could lead to incorrect tool-wear decisions in practice. While the acceleration-based model misclassified distant classes, such as 54 samples of $r_{0.05}$ misclassified as $r_{0.50}$ and 29 samples of $r_{0.40}$ misclassified as $r_{0.05}$, the 1D-fusion model tended to confuse a wear condition only with a neighboring one, such as predicting $r_{0.35}$ as $r_{0.40}$. Although the benefit diminished with larger datasets, fusion still maintained a small edge in consistency and class stability.

4. Conclusions

Monitoring blanking processes in real production environments using data-driven approaches poses a challenge, as the availability of large labeled datasets is limited and the trained models must be robust to fluctuating uncertainties inherent in manufacturing processes. To this end, this study answers two research questions.

RQ1: Can time-series image transformation improve model performance for small datasets in monitoring blanking processes under uncertain manufacturing conditions?

This study showed that transforming force signals with Gramian Angular Fields enhances robustness by encoding temporal patterns into structured representations, reducing prediction variance across material batches for force signals. Although the GAF-based models failed to match the acceleration-based models in terms of accuracy, they may be effective if acceleration sensors are unavailable. Since data-driven monitoring of production processes often relies on retrofitting existing tools, not every sensor can be integrated due to spatial limitations. As a result, only a limited set of sensor signals may be available depending on the specific application. In these scenarios, the use of GAFs can contribute to improved robustness in model performance.

RQ2: Can data fusion improve the performance of process monitoring during blanking (either by reducing the required training data or by improving model robustness)?

The fusion models consistently achieved the highest performance. The combination of different sensors allowed for the acquisition of diverse process characteristics in punching operations, which could then be provided to the model. This additional information reduced misclassifications and improved robustness to distributional data shifts. However, their implementation requires greater integration effort and sensor alignment.

Future work will focus on enabling the application of trained models in data-scarce scenarios across different use cases, such as changes in sheet material. Fluctuating process uncertainties will be further investigated to ensure the transferability of the approach to real production environments. To this end, semi-supervised transfer learning methods will be explored to further reduce the requirements for labeled datasets in other application contexts. In addition, the integration of synthetic data will be part of future investigations in order to reduce the need for real process data. To this end, the use of model-based methods such as variational autoencoders, as well as direct signal-altering methods such as jittering, scaling, or interpolation, will be investigated. Furthermore, the integration of numerical data will be investigated.

Author Contributions

Conceptualization, J.H. and C.-V.V.; methodology, J.H. and C.-V.V.; investigation, J.H., C.-V.V., C.F., C.C., H.W. and L.Z.; writing—original draft preparation, C.F., C.C., H.W. and L.Z.; writing—review and editing, J.H., C.-V.V. and P.G.; visualization, J.H., C.-V.V., C.F., C.C., H.W. and L.Z.; supervision, J.H., C.-V.V. and P.G.; project administration, P.G.; funding acquisition, P.G. All authors have read and agreed to the published version of the manuscript.

Funding

This project was supported by the Federal Ministry for Economic Affairs and Climate Action (BMWK) on the basis of a decision by the German Bundestag under grant agreement no. KK5031610FM2. In addition, we would like to thank our project partner DREISTERN GmbH & Co. KG. (Schopfheim, Germany). The PROMATE project is supported by the Distr@l funding program, which is funded by the State of Hessen in accordance with funding guideline 2a. We would also like to thank our project partner, Data Hive Cassel GmbH (Kassel, Germany).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Comparison of model accuracies (%) across selected data proportions.

Model	Samples/ Wear State	1.85 mm	1.88 mm	1.91 mm	1.94 mm	1.97 mm
1D-acceleration	20	78.3	89.6	96.2	98.6	99.5
1D-force	20	59.3	60.7	70.0	76.0	81.0
1D-fusion	20	93.2	98.3	99.6	99.7	99.8
2D-GAF-force	20	51.4	56.0	68.8	79.5	98.1
1D-acceleration	200	96.0	99.8	99.9	100	100
1D-force	200	69.1	71.0	83.2	86.5	98.5
1D-fusion	200	96.8	99.7	99.9	100	100
2D-GAF-force	200	61.3	70.6	88.5	90.3	99.8

Table A2. Comparison of model standard deviations across selected data proportions.

Model	Samples/ Wear State	1.85 mm	1.88 mm	1.91 mm	1.94 mm	1.97 mm
1D-acceleration	20	10.9	7.99	5.49	2.59	0.77
1D-force	20	9.12	8.65	8.01	6.72	13.46
1D-fusion	20	1.82	1.04	0.17	0.19	0.12
2D-GAF-force	20	8.29	6.82	6.21	4.84	0.64
1D-acceleration	200	1.86	0.18	0.01	0	0
1D-force	200	4.82	5.59	7.35	6.31	2.51
1D-fusion	200	1.15	0.41	0.01	0	0
2D-GAF-force	200	5.74	3.86	4.58	1.88	0.22

References

Cao, J.; Brinksmeier, E.; Fu, M.; Gao, R.X.; Liang, B.; Merklein, M.; Schmidt, M.; Yanagimoto, J. Manufacturing of advanced smart tooling for metal forming. CIRP Ann. 2019, 68, 605–628.
Lange, K.; Pöhlandt, K. Handbook of Metal Forming; McGraw-Hill: New York, NY, USA, 1985.
Vollmer, R.; Palm, C. Process monitoring and real time algorithmic for hot stamping lines. Procedia Manuf. 2019, 29, 256–263.
Munaro, R.; Attanasio, A.; Del Prete, A. Tool wear monitoring with artificial intelligence methods: A review. J. Manuf. Mater. Process. 2023, 7, 129.
Groche, P.; Hohmann, J.; Übelacker, D. Overview and comparison of different sensor positions and measuring methods for the process force measurement in stamping operations. Measurement 2019, 135, 122–130.
Demmel, P.; Hirsch, M.; Golle, R.; Hoffmann, H. In situ temperature measurement in the shearing zone during sheet metal blanking. Adv. Mater. Res. 2012, 445, 207–212.
Sari, D.Y.; Wu, T.L.; Lin, B.T. Preliminary study for online monitoring during the punching process. Int. J. Adv. Manuf. Technol. 2017, 88, 2275–2285.
Maćkiewicz, A.; Ratajczak, W. Principal components analysis (PCA). Comput. Geosci. 1993, 19, 303–342.
Jin, J.; Shi, J. Diagnostic Feature Extraction From Stamping Tonnage Signals Based on Design of Experiments. J. Manuf. Sci. Eng. 1999, 122, 360–369.
Unterberg, M.; Becker, M.; Niemietz, P.; Bergs, T. Data-driven indirect punch wear monitoring in sheet-metal stamping processes. J. Intell. Manuf. 2024, 35, 1721–1735.
Kubik, C.; Hofmann, J.; Molitor, D.A.; Becker, M.; Groche, P. Reliable machine learning models for manufacturing processes: Generalization beyond experimental conditions. Int. J. Adv. Manuf. Technol. 2025, 71, 5913–5927.
Ge, M.; Zhang, G.; Du, R.; Xu, Y. Feature extraction from energy distribution of stamping processes using wavelet transform. J. Vib. Control 2002, 8, 1023–1032.
Kubik, C.; Molitor, D.A.; Becker, M.; Groche, P. Knowledge discovery in engineering applications using machine learning techniques. J. Manuf. Sci. Eng. 2022, 144, 091003.
Hofmann, J.; Becker, M.; Kubik, C.; Groche, P. Machine learning based operator assistance in roll forming. Prod. Eng. 2024, 19, 283–294.
Huang, C.Y.; Dzulfikri, Z. Stamping monitoring by using an adaptive 1D convolutional neural network. Sensors 2021, 21, 262.
Wang, Q.; Wang, H.; Hou, L.; Yi, S. Overview of Tool Wear Monitoring Methods Based on Convolutional Neural Network. Appl. Sci. 2021, 11, 12041.
Molitor, D.A.; Kubik, C.; Hetfleisch, R.H.; Groche, P. Workpiece image-based tool wear classification in blanking processes using deep convolutional neural networks. Prod. Eng. 2022, 16, 481–492.
Wang, Z.; Oates, T. Encoding time series as images for visual inspection and classification using tiled convolutional neural networks. In Proceedings of the Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 1, pp. 1–7.
Wang, H.; Sun, W.; Sun, W.; Ren, Y.; Zhou, Y.; Qian, Q.; Kumar, A. A novel tool condition monitoring based on Gramian angular field and comparative learning. Int. J. Hydromechatronics 2023, 6, 93–107.
Zhou, Y.; Long, X.; Sun, M.; Chen, Z. Bearing fault diagnosis based on Gramian angular field and DenseNet. Math. Biosci. Eng. 2022, 19, 14086–14101.
Martínez-Arellano, G.; Terrazas, G.; Ratchev, S. Tool wear classification using time series imaging and deep learning. Int. J. Adv. Manuf. Technol. 2019, 104, 3647–3662.
Kou, R.; Lian, S.W.; Xie, N.; Lu, B.; Liu, X. Image-based tool condition monitoring based on convolution neural network in turning process. Int. J. Adv. Manuf. Technol. 2022, 119, 3279–3291.
Tsanousa, A.; Bektsis, E.; Kyriakopoulos, C.; González, A.G.; Leturiondo, U.; Gialampoukidis, I.; Karakostas, A.; Vrochidis, S.; Kompatsiaris, I. A review of multisensor data fusion solutions in smart manufacturing: Systems and trends. Sensors 2022, 22, 1734.
Freiesleben, T.; Grote, T. Beyond generalization: A theory of robustness in machine learning. Synthese 2023, 202, 109.
Kubik, C.; Knauer, S.M.; Groche, P. Smart sheet metal forming: Importance of data acquisition, preprocessing and transformation on the performance of a multiclass support vector machine for predicting wear states during blanking. J. Intell. Manuf. 2022, 33, 259–282.
Pandhare, V.; Singh, J.; Lee, J. Convolutional neural network based rolling-element bearing fault diagnosis for naturally occurring and progressing defects using time-frequency domain features. In Proceedings of the 2019 Prognostics and System Health Management Conference (PHM-Paris), Paris, France, 2–5 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 320–326.
Kubik, C.; Molitor, D.A.; Rojahn, M.; Groche, P. Towards a real-time tool state detection in sheet metal forming processes validated by wear classification during blanking. IOP Conf. Ser. Mater. Sci. Eng. 2022, 1238, 012067.
Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.

Figure 1. Sensor setup and data sources in the blanking process. Two synchronized sensors capture force and acceleration signals, which are labeled based on tool-wear conditions.

Figure 2. The workflow includes hyperparameter optimization, training, and robustness testing for the 1D-CNN, the 2D-CNN with GAF transformation, and the 1D-CNN with early data fusion.

Figure 3. (Left) 1D-CNN architecture for single-channel input (force or acceleration). (Right) 1D-CNN architecture with fusion of force and acceleration signals.

Figure 4. Example Gramian Angular Field (GAF) images for force $r_{0.05}$ (left) and $r_{0.50}$ (right).

Figure 5. Accuracy for the 1D-acceleration (left) and 1D-force (right) models.

Figure 6. Confusion matrices for the 1D-CNN models trained on acceleration (left) and force (right) on 1.85 mm with 200 samples per wear state.

Figure 7. Accuracy of the 2D-CNN (left) and 1D-CNN (right) models trained on force signals.

Figure 8. Confusion matrices for the 2D-force (left) and 1D-force (right) models on 1.94 mm with 10% training data.

Figure 9. Accuracy of the 1D-fusion (left) and 1D-CNN (right) models trained on acceleration.

Figure 10. Confusion matrices for the 1D-fusion (left) and 1D-CNN (right) models trained on acceleration on 1.85 mm with 20 samples per wear state.
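
The Gramian Angular Field images referred to in the Figure 4 caption can be illustrated with a minimal sketch. It assumes the summation variant (GASF) described by Wang and Oates, with min-max rescaling to [-1, 1]; whether the study used the summation or the difference variant, and which image size, is not stated in this section. The function names `rescale` and `gasf` are hypothetical.

```python
import math


def rescale(series):
    """Min-max rescale a sequence of numbers to the interval [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled")
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]


def gasf(series):
    """Gramian Angular Summation Field: G[i][j] = cos(phi_i + phi_j).

    Assumption: summation variant; phi_i = arccos of the rescaled sample.
    """
    scaled = rescale(series)
    # Clamp before arccos to guard against floating-point overshoot.
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


# Toy signal standing in for a (much longer) force curve.
image = gasf([0.0, 1.0, 2.0])
for row in image:
    print(["{:+.2f}".format(v) for v in row])
```

The resulting matrix is symmetric, and its main diagonal encodes the rescaled signal itself via $\cos(2\phi_i) = 2\tilde{x}_i^2 - 1$, which is why a 2D-CNN can operate on it as an image.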

Table 1. Training hyperparameters for baseline 1D-CNN models.

Parameter	1D-Acceleration	1D-Force	Search Space
Learning Rate	0.000971	0.000347	$(10^{-4}, 10^{-3})$
Weight Decay	$3.66 \times 10^{-4}$	$1.26 \times 10^{-4}$	$(10^{-4}, 10^{-3})$
Batch Size	16	32	$\{8, 16, 32\}$
Number of Convolutional Layers	3	3	$\{1, 2, 3, 4\}$
Starting Number of Channels	32	8	$\{8, 16, 32\}$
Optimizer	Adam	Adam	–
Early Stopping (Patience)	5	5	–
Training Epochs	40	40	–
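
The early-stopping rule listed in Table 1 (patience of 5 within at most 40 epochs) can be sketched as follows. This is an illustration under assumptions: the monitored quantity is taken to be a validation loss, which this section does not specify, and the function name `early_stopping_epoch` is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the epoch index at which training would stop, or None.

    Training stops once the monitored loss has failed to improve on its
    best value for `patience` consecutive epochs.
    Assumption: the monitored metric is a validation loss (lower is better).
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
        if stale >= patience:
            return epoch
    return None


# Made-up loss curve: improves twice, then stagnates.
losses = [1.00, 0.90, 0.95, 0.96, 0.97, 0.98, 0.99]
print(early_stopping_epoch(losses, patience=5))
```

With small training sets, a short patience like this limits overfitting at the cost of occasionally stopping before a late improvement.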

Table 2. Training hyperparameters for 2D-CNN and fusion models.

Parameter	2D-CNN	Fusion Model	Search Space
Learning Rate	0.000329	0.002568	$(10^{-5}, 10^{-2})$
Weight Decay	$1.39 \times 10^{-4}$	$3.23 \times 10^{-5}$	$(10^{-6}, 10^{-3})$
Batch Size	64	32	$\{8, 16, 32, 64\}$
Step Size	5 epochs	5 epochs	$\{1, 2, 3, 4, 5\}$
Learning Rate Decay ($\gamma$)	0.32007	0.87294	$(0.1, 0.9)$
Optimizer	Adam	Adam	–
Early Stopping (Patience)	4 (if epochs > 15)	5	–
Training Epochs	40	40	–
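
The "Step Size" and "Learning Rate Decay ($\gamma$)" rows in Table 2 suggest a step-wise schedule in which the learning rate is multiplied by $\gamma$ every fixed number of epochs, i.e. $\eta_e = \eta_0 \, \gamma^{\lfloor e / s \rfloor}$. The sketch below evaluates that formula with the tabulated values; that the authors used exactly this schedule is an assumption, and `step_decay_lr` is a hypothetical helper.

```python
def step_decay_lr(initial_lr, gamma, step_size, epoch):
    """Learning rate at a given epoch under step decay (assumed schedule)."""
    return initial_lr * gamma ** (epoch // step_size)


# Values copied from Table 2.
configs = {
    "2D-CNN": {"initial_lr": 0.000329, "gamma": 0.32007, "step_size": 5},
    "Fusion Model": {"initial_lr": 0.002568, "gamma": 0.87294, "step_size": 5},
}

for name, cfg in configs.items():
    # Print the rate at the start of each 5-epoch block over 40 epochs.
    schedule = [step_decay_lr(epoch=e, **cfg) for e in range(0, 40, 5)]
    print(name, ["{:.2e}".format(lr) for lr in schedule])
```

Under this reading, the small $\gamma$ of the 2D-CNN shrinks its learning rate by roughly a factor of three every five epochs, whereas the fusion model decays far more gently.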

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

