1. Introduction
The forming range of sheet metal is evaluated using the forming limit curve (FLC), whose limits are defined by the major and minor strain pairs at the onset of localized necking. In Europe, the procedure used for the generation of the FLC is summarized in DIN EN ISO 12004-2 [
1]. The FLC is generated by means of Nakajima [
2] and Marciniak [
3] setups. The sheet metal is mounted into a clamping unit and then deformed to fracture by using a flat shaped or a hemispherical punch. The evaluation method proposed in the ISO standard is based on the Bragard study of 1972 [
4], which evaluates the strain distribution of the material prior to failure and is referred to as the “cross-section method”. To determine the limiting strains, sections perpendicular to the crack are defined and the strain distribution along each section is approximated with a second-order polynomial. However, such an approach only allows evaluation of the last stage of the strain distribution without considering strain progression. Presently, the forming behavior is recorded using digital image correlation (DIC) techniques, and these measurements are used to determine the FLC [
5]. For this purpose, a stereo camera system measures strains on the material surface during Nakajima or Marciniak tests and thus enables the evaluation of strain progression. Despite technological progress, the standard is still based on the “cross-section method”. An added complexity in determining the forming limits is related to the intrinsic properties of the materials. For example, modern lightweight materials such as high-strength steels and aluminum alloys tend to develop multiple local strain maxima or to crack spontaneously without a necking phase. Consequently, second-order functions are unsuitable to approximate the limit strains of such materials. To overcome the disadvantages of the “cross-section method”, several techniques have been developed which take into account the forming history during the Nakajima tests. These time-dependent methods include the “line-fit” approach proposed by Volk et al. [
6] as well as the correlation coefficient method [
7]. Both approaches are based on observing the reduction in thickness within a predefined instability zone and determining the initiation of necking as a response to a sudden decrease in the thickness of the specimen. One disadvantage of these time-dependent methods is their dependence on a predefined instability zone and thus a limited area of evaluation. In 2015, an approach based on machine learning was considered to provide new insights into forming development [
8]. Conventional pattern recognition [
9] involves the automatic processing and evaluation of data, whereby a physical signal, e.g., images or speech, is first converted into suitable, more compact characteristic features that are commonly defined by experts of the respective field. To enable an automated separation into subspaces/classes, the data is first assigned to different classes by experts. Using a classification algorithm and a representative subset of the data, decision boundaries are learned based on the feature representation/label pairs, such that the hypothesis of the learned class boundaries can be verified with the remaining disjoint data. In [
8], such a pattern recognition technique was used to predict the crack class for a DX54D steel with a prediction accuracy of 90% before the actual crack formation was observed.
Metallographic investigations have shown that both conventional deep drawing steels [
10] and dual phase steels [
11] have superficial patterns associated with local necking, which enables the application of automated pattern recognition methods using the strain distributions for the determination of forming limits [
12]. In [
13] (henceforth referred to as Part 1) this idea was extended by the assessment of several evaluation areas and by comparing the obtained classification results with expert annotations for several failure classes (diffuse, local necking, and crack). This study demonstrated that a consistent determination of local necking is possible with the use of expert knowledge. Additionally, it was shown that the accuracy and efficacy of the results are affected by subjective expert annotations as well as external factors such as sampling frequency, which have a particularly negative effect on the classification results of the diffuse necking class.
To address these problems and to remove the dependence on expert annotations, an unsupervised classification approach was developed (henceforth referred to as Part 2) in [
14]. Several rectangular areas close to the strain maximum were evaluated and Histograms of Oriented Gradients (HoG) were used as characteristic features [
15], to classify the local necking using One-Class Support Vector Machines (SVM) [
16]. It was shown that the results are within the range of the “line-fit” method, while incorporating image information and reducing the need to define a specific evaluation area. Additionally, advantages of this approach include independence from expert annotations, transferability to new materials, and the introduction of a probabilistic FLC.
On the other hand, the approach remains conditionally location dependent, i.e., the choice of the evaluation areas still depends on the strain maxima, which can be a disadvantage. Furthermore, the criteria used to determine the local necking are constrained by expert a priori knowledge, namely, the evaluation of HoG features, which are not necessarily applicable to other materials. Additionally, the assumed partitioning of the forming process into a homogeneous and an inhomogeneous forming phase can influence the outcome of the probabilistic FLC if a split criterion other than approximately 50% of the sequence length is used.
In recent years, convolutional neural networks (CNN) have found widespread use in the field of pattern recognition and can process large amounts of data in reasonable time due to the computing capacity that is now available, e.g., from powerful graphics cards. The main advantage of these approaches is the automated, data-driven learning of representative features that are adapted to the problem, such that the design of handcrafted features is no longer necessary for the evaluation and assessment of classification problems. Different CNN approaches have outperformed conventional pattern recognition methods in various challenges (e.g., ILSVRC [17]) and are being successfully applied in the medical field, e.g., in computer-aided diagnosis [18], non-rigid registration [19], reconstruction [20,21], and landmark detection [22], as well as in mechanical engineering, for example in event detection [23] or defect detection for photovoltaic module cells [24].
In contrast to Part 1 and 2, this study uses a CNN for the automatic extraction of characteristic features, for the task of FLC estimation. Here, the extreme cases of the forming process (beginning of the homogeneous and end of the inhomogeneous forming phase) are used to train a Siamese CNN [
25] to optimally separate the respective forming phases from each other. This allows the characteristic features to be adapted to the respective material properties, to enable a more precise description of the failure states and, by extension, the estimation of a more accurate FLC. As a further advantage, time independence is introduced by clustering the feature representations of images using Student’s t Mixture Models [26], so that video sequences of incomplete forming processes may also be investigated, e.g., sequences stopped prior to the occurrence of fracture. This not only allows a determination of forming limits independent of strain paths, which use the mean value of a heuristically defined region, but also the generation of the probabilistic FLC as introduced in Part 2, and theoretically a real-time monitoring of forming processes. To enable comparisons with previous studies, the data set of Part 1 is used, and the temporal derivative according to Part 2 is evaluated, which emphasizes the strain development over a sequence of successive images, as suggested by Vacher et al. [27]. In addition, the generalizability and transferability of the methodology to other materials that remain unseen during learning will be investigated. The data set consists of a deep drawing steel DX54D of two different thicknesses, a dual phase steel DP800, an aluminum alloy AA6014 (in previous studies referred to as AC170) and additionally another aluminum alloy AA5182 with Portevin-Le Chatelier (PLC) effects [
28]. The results of the CNN approach are compared with the results of the method proposed in Part 2 and with the results of the time-dependent FLC of the “line-fit” method. Additionally, the transferability of the approach to unseen data is demonstrated by assessing forming tests which were stopped before fracture occurrence and for which metallographic examination results are available.
  2. Experimental Procedure and Materials
The FLC is usually determined experimentally with a Nakajima test setup that consists of a clamping unit with an inner die diameter of 110 mm and a hemispherical punch with a diameter of 100 mm, coupled with a stereo camera measurement system (cf. Figure 1). To calculate the strain distributions, the forming process is recorded by means of an optical measuring system (ARAMIS, GOM GmbH) and further processed using block matching (DIC) approaches. These methods require the specimens to be prepared with a white primer and a speckle pattern of black graphite to enable image correlation. A lubrication system according to DIN EN ISO 12004-2 is used to minimize the friction between the punch and the specimen so that rupture initiates at the top of the specimen. Varying the sample geometry, from an uncut sample up to conical blanks with parallel connections, induces different loading conditions and strain paths. The remaining conjunction width determines the name of the geometry, e.g., 50 mm corresponds to S050. For each sample geometry, three forming experiments are carried out and recorded. The test parameters, e.g., punching speed (1 to 2 mm/s) and sampling rate (15, 20, 40 Hz), are varied across the investigated materials (cf. Table 1) in order to evaluate different boundary conditions, to provide varying combinations of strain paths, and to test the generalization of the method to multiple loading conditions such as uniaxial, plane-strain, or biaxial.
In this study, the three known materials from Part 1 and 2 are reused, while a challenging aluminum alloy is also evaluated: (1) DX54D, a deep drawing steel with ductile necking behavior and observable localization on the surface; (2) DP800, a dual phase steel, a high-strength material with a matrix of ferrite and martensite precipitations and hence multiple observable local maxima during Nakajima tests; (3) AA6014, a lightweight aluminum alloy of the 6xxx series with multiple maxima during Nakajima tests under plane strain conditions; and (4) AA5182, an aluminum alloy of the 5xxx series with several local maxima in the form of shear bands (PLC effect). The principal material properties are summarized in Table 1. For a comprehensive description of the forming behavior of the evaluated materials, please refer to Part 1.
  3. Method
The proposed classification approach mainly follows the typical processing pipeline used in pattern recognition solutions, even though it uses a CNN. The conventional pattern recognition pipeline consists of four steps: data acquisition, preprocessing, feature extraction, and classification. The main difference, introduced with the advent of deep learning techniques, is the data-driven feature extraction and its combination with the classification step. Instead of predefining the extracted features and employing these to train and test a classifier as in conventional pattern recognition systems, the feature extraction and classification steps are combined and formulated as an optimization problem. Solving this optimization problem enables estimation of a problem-specific optimal solution, e.g., by minimizing a suitable problem-specific loss function. Comparable to the conventional pattern recognition approaches, the data set is subdivided into multiple disjoint data sets. During training of the network, the training data set is used to minimize the loss function by adjusting the weights of the layers of the CNN using a gradient descent optimizer. After the network has seen all the training data, the separation hypothesis is evaluated using the validation data set to assess whether the hypothesis generalizes to unseen data or over-fits to the training data. This procedure is repeated until convergence or until the error in the validation data set increases, and subsequently, the performance of the network is assessed using the remaining unseen test data set. Similar to conventional pattern recognition approaches, deep learning-based techniques are either supervised, requiring expert annotations, or unsupervised, solving the optimization problem without the use of data (class) labels. The framework proposed in this study employs both types of learning techniques.
In the supervised part, the extreme cases of the homogeneous and inhomogeneous forming phases are used as labels for the data set. The number of images used per material for training is given in Table 2, where # images denotes the total number of images per sequence for the individual material and geometry, # homog. the number of images of the homogeneous forming phase and # neck. the number of images of the inhomogeneous forming phase. Only these extreme cases, e.g., the first 20 images from the clearly homogeneous phase and the last 3 images from the inhomogeneous phase, close to fracture, are used to train two identical CNNs that share the same weights and hence assess the similarity of the images. Such a setup was designed to account for the difficulty in assigning labels in a reliable manner to images in a sequence. Consequently, we adopt a Siamese CNN to extract features from images which are guaranteed to belong to either the homogeneous or inhomogeneous class. This setup enables pairwise comparisons between the two types of images using a similarity metric in a low-dimensional feature space. Based on the similarity metric, the two instances are judged to belong either to the same class or to different classes. This supervised classification setup is used to separate the two classes in an optimal manner, by simultaneously learning compact representations of both classes.
In the unsupervised part, the unused data between the beginning of the homogeneous and the end of the inhomogeneous forming phase, as well as the extreme cases, are assessed using the trained network to create low-dimensional representations. Using an unsupervised clustering approach, the low-dimensional instances of each forming sequence are then assigned to one of three clusters: homogeneous, transition, or necking. For clarification, the data sets are disjoint: e.g., the extreme cases of all but one material are used for training and validation, i.e., to learn an optimal separation between the two classes and a low-dimensional representation of the data. The actual evaluation and clustering is performed using the held-out material.
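The disjoint split described above can be illustrated with a minimal Python sketch. It assumes that each material is held out in turn and that a forming sequence is an ordered list of frames; the function names `leave_one_material_out` and `extreme_frames` are hypothetical, and the default counts (first 20, last 3 frames) only mirror the example given in the text, not the per-material values of Table 2.

```python
# Sketch of a leave-one-material-out split (hypothetical helper names).
MATERIALS = ["DX54D", "DP800", "AA6014", "AA5182"]  # materials named in the text


def leave_one_material_out(materials):
    """Yield (training_materials, held_out_material) pairs."""
    for held_out in materials:
        training = [m for m in materials if m != held_out]
        yield training, held_out


def extreme_frames(sequence, n_homog=20, n_neck=3):
    """Return (first n_homog frames, last n_neck frames) of one sequence.

    Assumption: the defaults follow the example in the text; the actual
    counts differ per material and geometry.
    """
    return sequence[:n_homog], sequence[-n_neck:]


if __name__ == "__main__":
    for training, held_out in leave_one_material_out(MATERIALS):
        print(f"train/validate on {training}, evaluate on {held_out}")
```

Only the frames returned by `extreme_frames` would enter the supervised training, whereas the complete sequences of the held-out material are reserved for the unsupervised clustering.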
  3.1. Preprocessing
The study uses the data set of Part 1 and 2, which was acquired using the ARAMIS optical measuring system (v6.3.0-7) and consists of three-channel video sequences comprising major strain, minor strain, and thinning. The difference between two successive images was used as input to remove the correlation with the punch displacement and to emphasize the incremental changes. Since necking and, in general, material instabilities are localized changes in the strain distribution, the incremental changes allow a better evaluation of this phenomenon. The sequences were adjusted by removing the frames that contain the fracture information, identified at the end of the forming process by an increasing number of defect pixels due to specimen cancellation.
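The differencing of successive images can be sketched in a few lines of NumPy. The array layout (time, height, width, channel) and the function name `successive_differences` are assumptions made for illustration; they are not taken from the study.

```python
import numpy as np


def successive_differences(sequence):
    """Difference between consecutive frames along the time axis.

    sequence: array of shape (T, H, W, 3) holding major strain, minor
    strain and thinning per frame (assumed layout).
    Returns an array of shape (T - 1, H, W, 3).
    """
    sequence = np.asarray(sequence, dtype=float)
    return np.diff(sequence, axis=0)


if __name__ == "__main__":
    # A strain field that grows uniformly by 1 per frame yields constant
    # increments, so only localized changes would stand out.
    frames = np.arange(5, dtype=float).reshape(5, 1, 1, 1) * np.ones((5, 4, 4, 3))
    print(successive_differences(frames).shape)
```

A uniformly growing strain field produces identical increments everywhere, which is why the difference images emphasize localized changes rather than the overall punch displacement.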
Single defect pixels or missing values that occurred occasionally were interpolated linearly using the temporal information, e.g., the strain progression. Other defect pixels, such as static defect pixels that contain no information during the whole forming process were replaced by the average value of a 
 neighborhood. To reduce the influence of outliers that usually occur at border regions of the specimen and are off by a large magnitude when compared to the rest of the image, the data was normalized using the 0.5% and 99.5% percentile of the image intensities and standardized afterwards. Furthermore, to be able to train the network without special restrictions due to the specimen geometries and to facilitate the experiments’ procedure and support interchangeability of materials, the data was center cropped by a rectangle with a side length of 72 px. This size was determined based on the S245 geometry of the DX54D material, and the largest possible inner rectangle. Since the aim of the study is to identify the localized necking based on the material changes, the dark homogeneous border regions must be avoided, to prevent the network from learning features based on these curved border regions, rather than the intensities that are included within each patch. The changing image size and the amount of the curved region included within the evaluation area would bias the network, as visualized in 
Figure 2.
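The outlier handling and cropping steps can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the percentile-based normalization is interpreted as clipping to the 0.5% and 99.5% percentiles, the statistics are computed per image over all channels, and the function names are hypothetical. The study may differ in each of these details.

```python
import numpy as np


def clip_and_standardize(image, low=0.5, high=99.5):
    """Clip to the [low, high] percentiles, then standardize.

    Assumption: normalization is read as percentile clipping, followed by
    scaling to zero mean and unit standard deviation.
    """
    image = np.asarray(image, dtype=float)
    lo, hi = np.percentile(image, [low, high])
    clipped = np.clip(image, lo, hi)
    std = clipped.std()
    if std == 0:  # constant image: return zeros instead of dividing by zero
        return np.zeros_like(clipped)
    return (clipped - clipped.mean()) / std


def center_crop(image, size=72):
    """Cut a size x size window from the center of an (H, W, ...) image."""
    height, width = image.shape[:2]
    if height < size or width < size:
        raise ValueError("image is smaller than the crop window")
    top, left = (height - size) // 2, (width - size) // 2
    return image[top:top + size, left:left + size]
```

Clipping before standardization keeps a few extreme border values from dominating the mean and standard deviation, and the fixed 72 px window gives every geometry the same input size.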
  3.2. Network Architecture
The Siamese network comprises two identical sub-networks, each consisting of the first 3 convolutional blocks of a VGG16 network [29] that was pretrained on a large-scale image database [30], rather than being trained from scratch with a limited amount of data. Additionally, to adapt to the new data, two fully connected layers (512 and 256 neurons) with one dropout layer (dropout rate 0.5) between them are added, followed by one L2 normalization layer. The two CNNs form a Siamese CNN architecture [25], as visualized in Figure 3 (depicted by feature learning), which is used for learning low-dimensional representations of the data and to assess the images in a pairwise manner. Two inputs ($X_1$, $X_2$) are evaluated simultaneously using an identical network structure ($G_W$) with shared weights ($W$). This setup allows a pairwise comparison between the two L2-normalized, low-dimensional outputs of the network ($G_W(X_1)$, $G_W(X_2)$), as different input images are assessed identically, based on a suitable distance measure.
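The weight sharing and the pairwise comparison can be illustrated with a deliberately small NumPy sketch. It is not the architecture of the study: a single linear map stands in for the truncated VGG16 blocks and the fully connected layers, and the names `embed` and `pair_distance` are hypothetical. The sketch only shows that both inputs pass through the same weights and that the outputs are L2-normalized before the distance is taken.

```python
import numpy as np


def embed(x, weights):
    """Shared embedding: flatten, apply a linear map, L2-normalize.

    Assumption: one linear layer replaces the convolutional and dense
    layers purely for illustration.
    """
    features = weights @ np.asarray(x, dtype=float).ravel()
    norm = np.linalg.norm(features)
    return features / norm if norm > 0 else features


def pair_distance(x1, x2, weights):
    """Euclidean distance between two embeddings computed with the same weights."""
    return float(np.linalg.norm(embed(x1, weights) - embed(x2, weights)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(4, 12))  # toy weights for 2 x 2 x 3 inputs
    a, b = rng.normal(size=(2, 2, 3)), rng.normal(size=(2, 2, 3))
    print(pair_distance(a, a, weights), pair_distance(a, b, weights))
```

Because the embeddings lie on the unit sphere, the distance between any two of them is bounded by 2, which gives a fixed scale for choosing a distance threshold.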
  3.3. Supervised Siamese Optimization
While other loss functions usually sum over individual samples, this loss function evaluates the input samples $X_1, X_2$ in a pairwise manner, while learning the parameterized distance function $D_W$ using the Euclidean distance between the low-dimensional outputs of $G_W$:

$$D_W(X_1, X_2) = \lVert G_W(X_1) - G_W(X_2) \rVert_2 .$$

The general loss function is described as

$$L(W) = \sum_{i=1}^{P} L\big(W, (Y, X_1, X_2)^i\big), \qquad L\big(W, (Y, X_1, X_2)^i\big) = (1 - Y)\, L_S\big(D_W^i\big) + Y\, L_D\big(D_W^i\big),$$

where $(Y, X_1, X_2)^i$ denotes a labeled input sample pair. The label $Y$ refers to the extreme cases of the input sequence, where $Y = 0$ if the samples are from the same class and $Y = 1$ if the samples are from different forming phases. $L_S$ denotes the loss for samples of the similar phase while $L_D$ denotes the loss for dissimilar samples. These losses are designed such that minimizing $L$ with respect to $W$ leads to low values of $D_W$ if the samples are similar and high values otherwise.
The final loss function, the contrastive loss [31], is denoted as

$$L(W, Y, X_1, X_2) = (1 - Y)\, \tfrac{1}{2}\, D_W^2 + Y\, \tfrac{1}{2}\, \big\{\max(0,\, m - D_W)\big\}^2 ,$$

where $m$ is a margin parameter that defines a radius around $G_W(X)$, the threshold distance for dissimilar pairs. Due to the incomplete nature of the data, as only the extreme cases are used for training, this parameter cannot be optimized and is naively set to 1.0.
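For a single pair, the contrastive loss can be written as a short Python function. This is a minimal sketch of the standard form of this loss; the label convention (0 for a pair from the same forming phase, 1 for a pair from different phases) and the function name `contrastive_loss` are assumptions, while the margin of 1.0 follows the text.

```python
def contrastive_loss(distance, label, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    distance: Euclidean distance between the two embeddings (>= 0).
    label: 0 for a similar pair, 1 for a dissimilar pair (assumed convention).
    margin: distance beyond which dissimilar pairs no longer contribute.
    """
    similar_term = 0.5 * distance ** 2
    dissimilar_term = 0.5 * max(0.0, margin - distance) ** 2
    return (1 - label) * similar_term + label * dissimilar_term


if __name__ == "__main__":
    print(contrastive_loss(0.5, 0))  # similar pair, penalized for being apart
    print(contrastive_loss(0.5, 1))  # dissimilar pair inside the margin
    print(contrastive_loss(1.5, 1))  # dissimilar pair outside the margin
```

Similar pairs are pulled together at any distance, whereas dissimilar pairs are pushed apart only until they reach the margin, after which they no longer influence the weights.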
  3.4. Unsupervised Clustering
So far, the network is trained and optimized to create low-dimensional manifolds that support the optimal separation of the extreme cases of forming sequences. Even though the network never saw a complete forming sequence, this training leads to the extraction of discriminative features from complete forming sequences. Hence, the network may be used to assess and cluster the individual frames of video sequences using the low-dimensional feature representations. Principal component analysis (PCA) is applied to further reduce the dimensionality of the manifolds to two dimensions, facilitating the unsupervised clustering. The first two components were chosen, across all different geometries and materials, as they cover more than 95% of the variance of the data. During training of the network, geometries and loading conditions of different materials were combined to increase the amount and variance of the data and to enable the network to learn generalized feature representations that describe necking. In contrast, the individual geometries and materials are not combined during the assessment of the held-out data. This reduces the available information and number of data points that may be used to cluster the data. An increase in the density of the distributions is achieved by artificially augmenting the data, e.g., by cropping randomly, flipping, translating (by 5 px) and rotating (by 15 deg.) each image. The effect of such transformations is well described and visualized in [
31]. For evaluation of each geometry of the held-out material, the three complete sequences of this geometry, depicted by 
 are used by the network to create the manifolds, while after PCA dimensionality reduction, the components are clustered and described by distributions (cf. 
Figure 3 depicted by clustering), using Student’s t Mixture Models (SMM) [
26]. These models are more robust to outliers compared to Gaussian Mixture models as shown in medical image segmentation [
32] or registration [
33]. This procedure enables the forming sequences to be described as probabilities, where the probabilities represent the membership likelihood of the center-cropped, unaltered images at each time point with respect to the clusters of the mixture model. The clusters in turn represent distinct phases of the forming process, thereby enabling the frames of a sequence to be classified into these phases. The complete “unsupervised” pipeline is visualized in
Figure 4. 
Figure 4a shows the unclustered PCA reduced features of the center cropped and augmented data of three trials of AA6014-S070. 
Figure 4b depicts the same data after being clustered with SMM, whereas 
Figure 4c visualizes the individual probability progression (with respect to mixture component membership) of each of the three sequences.
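The PCA step that precedes the clustering can be sketched with NumPy alone. The function name `pca_reduce` and the synthetic features in the usage example are assumptions for illustration. The subsequent clustering with Student’s t Mixture Models is not shown, since NumPy provides no such model.

```python
import numpy as np


def pca_reduce(features, n_components=2):
    """Project row-wise feature vectors onto their leading principal components.

    Returns (projected, explained_variance_ratio), where the ratio refers
    to the retained components only.
    """
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes; squared singular values are
    # proportional to the variance along each axis.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2
    ratio = variances[:n_components] / variances.sum()
    return centered @ vt[:n_components].T, ratio


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 16-dimensional features that vary mainly along two directions.
    latent = rng.normal(size=(200, 2)) * [5.0, 2.0]
    features = latent @ rng.normal(size=(2, 16)) + 0.01 * rng.normal(size=(200, 16))
    projected, ratio = pca_reduce(features)
    print(projected.shape, ratio.sum())
```

Summing the returned ratio gives the share of variance retained by the two components, which is the quantity behind the 95% criterion mentioned in the text.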
  6. Discussion
Nakajima-based forming processes induce patterns on the surface of sheet metal materials and within the strain distributions when using a measurement device. In Part 1, a supervised classification algorithm with expert knowledge/annotations was introduced, leading to good agreement between experts and classification results. Part 2 introduced an unsupervised classification algorithm, without expert annotations, based on conventional pattern recognition that evaluates edge information within small rectangular patches near maximum strain values using a one-class SVM with HoG features. In general, it led to consistent results throughout the experiments conducted, and introduced a probabilistic FLC. The main idea of this procedure was to introduce well-established pattern recognition methods and specific features that evaluate edge responses to support the hypothesis that localized necking correlates with a sudden increase/decrease of principal strains within strain distributions. As long as the hypothesis is correct, the method seems suitable as a baseline approach that enables the estimation of multiple degrees of necking certainty using the FLC quantiles.
However, it has four disadvantages: (1) location dependency, due to evaluation only in the vicinity of the maximum strain area; (2) time dependency, as it requires specimens to be formed until fracture and hence renders a comparison with prematurely stopped forming processes impossible; (3) predefined features limited to edge information; (4) transfer of knowledge to other materials or forming processes is not possible (i.e., it does not permit generalization to new materials/processes).
The proposed method overcomes these limitations in a two-step approach. First, a Siamese CNN is trained in a supervised manner, where only the extreme cases of the homogeneous and inhomogeneous forming phases (beginning/end of the forming process) are used to train the network. These images are separated optimally by minimizing the contrastive loss function, while augmentation is used to increase the amount of data throughout several experiments. In the second step, complete forming sequences, not only the extreme cases, are assessed by the network and transformed into low-dimensional representative manifolds. Using PCA and SMM, these manifolds are clustered in an unsupervised manner into three distributions corresponding to the three phases: homogeneous, transition, and localized necking. This procedure overcomes the mentioned limitations: (1) location dependency is reduced as a consequence of using the maximum possible square evaluation area of all materials; (2) the overall framework is now independent of time, by replacing the one-class SVM with SMM, allowing assessment of individual frames of incomplete forming sequences coupled with strain paths; (3) optimal features are learned by the Siamese CNN and are not limited to edge information; (4) transfer of knowledge from one material to another is possible, as shown with the lomocv experiment, enabling on-line supervision of unknown materials during forming processes; and (5) generalization of the method to materials with complex forming behavior such as AA5182 is possible with limited data, as shown within the overfit and losocv experiments. However, accommodating measurement artifacts and defect pixels remains challenging, as their presence or absence may bias training of the network or negatively affect clustering of the data.