Machine Learning-Based Detection Technique for NDT in Industrial Manufacturing

Abstract: Fluorescent penetrant inspection (FPI) is a well-established non-destructive test method used in manufacturing for detecting cracks and other flaws in the product under test. This is a critical phase in the mechanical and aerospace industrial sectors. The purpose of this work is to present the implementation of an automated inspection system, developing a vision-based expert system to automate the inspection phase of the FPI process in an aerospace manufacturing line. The aim of this process is to identify the defectiveness status of some mechanical parts by means of images. This paper presents, tests and compares different machine learning architectures for performing automated defect detection on a given dataset. For each test sample, several images at different angles were captured to properly populate the input dataset; in this way, the defectiveness status can be found by combining the information contained in all the pictures. In particular, the system was designed to increase the reliability of the evaluations performed on the airplane part, by implementing proper artificial intelligence (AI) techniques to reduce the current human operators' effort. The results show that, for applications in which the available dataset is quite small, a well-designed feature extraction process before the machine learning classifier is a very important step for achieving high classification accuracy.


Introduction
Non-destructive testing (NDT), also known as non-destructive inspection (NDI) or non-destructive examination (NDE), is among the most commonly used techniques for material structure investigations, especially in critical fields like aerospace, where most of the components must be analysed [1].
Several technologies exist for defect identification in NDT applications. Ultrasonic testing is adopted for the detection of defects inside sound-conducting materials; the resulting signal must be properly analysed to identify the nature and size of the defect [2]. Another well-investigated technique for NDT is the use of infrared cameras: the material is subjected to a mechanical wave that is converted into heat at the defect locations [3]. This concept has also been extended to applications in which the testing is performed by a UAV [4]. Furthermore, eddy currents have been used in NDT for crack identification [5].
Another NDT technique that is gaining scientific and technological relevance in multiple areas is continuous wave terahertz imaging, which has been shown to provide very good results in the identification of several defects, such as voids and water infiltrations [6].
Other approaches to NDT are acoustic emissions, digital image correlation, and infrared thermography. These techniques have been successfully applied to the inspection of composite materials [7]. Fluorescent penetrant inspection (FPI) is a well-established NDT method used in manufacturing for detecting porosity, cracks, fractures, laps, seams and other flaws of the manufactured sample under test, caused by fatigue, impact, quenching, machining, grinding, forging, bursts, shrinkage or overload [8].
Although several technologies exist for assisting NDI/NDT personnel in detecting defects, simple visual inspection may still account for up to 80% of all quality inspections in the automotive and aerospace fields. Visual quality control is always associated with a certain degree of variability in both throughput and accuracy; an automated vision system can significantly improve both. Indeed, visual inspectors often must rely solely on their vision to assess the condition of the components produced (axes, turbines, fuel injectors, etc.); when performing NDI and NDT, they are typically supported by highly sophisticated imaging and scanning devices in order to detect the defects [1].
Image processing algorithms and computational intelligence (CI) can be implemented for introducing an automated approach for defect classification [9].
Several studies have been conducted on the use of image processing to detect cracks and defects in materials. A comparison of image analysis techniques for detecting cracks in concrete structures can be found in [10,11], but only very few studies in the literature have focused on automatic crack detection in metals using image analysis. Some of these investigated the possible FPI automation of drive and crank shafts for the automotive industry [12] by using fixed CCD cameras and UV light illumination, but did not reveal their results. In [13], forged metal parts were analysed with a camera, but the proposed approach shows relatively low robustness and some false responses. An FPI system was proposed in [14], but it unfortunately lacks accuracy, still showing errors in classifying image features.
Computational intelligence techniques have firmly established themselves as effective numerical tools in the recent past. They have been extensively employed in many systems and application fields, such as signal processing, automatic control, and industrial and consumer electronics. Image processing is also an extremely attractive area for the development of new computational intelligence-based techniques and their applications, such as image preprocessing and compression algorithms, image analysis, pattern recognition, classification and clustering, and image inferencing [15].
Among the most relevant industrial applications, NDT can benefit from recently emerged computational intelligence (CI) techniques (i.e., independent component analysis (ICA), wavelet representation, support vector machines, fuzzy systems, neural networks).
Several practical examples for the solution of relevant inspection problems are presented in [16], where the advantages of using the CI approach have been assessed on applications of NDT/NDI in both civil engineering and industrial problems.
Additionally, in order to manage the uncertainty intrinsically connected with NDT, fuzzy logic can be used, as shown in [17], where reasonably simple computational models are shown to provide fault detection as good as that of more complicated known models. Moreover, an evolutionary neural network (ENN) can be used to quantify the defects in NDT, as shown in [18]. In this case, information extracted from the pixels of visual camera imaging provides characteristic parameters for network training and validation, resulting in an error of less than 2%.
Very good performances were obtained using random forests for image classification [19,20], and deep learning algorithms [21].
In terms of real-time capabilities, recent literature [22] has assessed the ability of neural networks to manage input data sampled at a time interval of approximately one-tenth of the time constant of the considered system.
The proposed system, specifically tailored to the stringent requirements of a high-volume aerospace test case, consists of a robot for part handling, a machine vision solution for detecting the defects, and intelligent software allowing real-time image processing and highly reliable decision making.
The aim of this process was to identify the defectiveness status of some mechanical parts by means of images. Different possible system architectures will be presented and tested on a basic dataset of real images, in order to properly validate and compare the proposed procedures. For each test sample, several images at different angles are captured to populate the input dataset. In this way, the defectiveness status can be found by combining the information contained in all the pictures.
The proposed technique combines deterministic feature extraction with machine learning classifiers in order to exploit the advantages of the latter while reducing the required training set size. In fact, for this application, the number of damaged pieces is very small; thus, the number of occurrences of each defect is a crucial aspect in the training of ML classifiers.
With respect to statistical or purely deterministic approaches [23], the proposed approach is able to exploit the capability of neural networks to provide a non-linear classifier. The main disadvantage of this approach is the reduction in the learning capability of the ML model, because the input information is limited by the feature extraction process. However, the results obtained for the considered problem show that this method achieves better performance than a pure ML approach [24].
Other works have exploited feature extraction in NDT; however, in many cases, this is the outcome of a properly trained ML model. For example, in [25], deep learning is used for extracting key features from eddy current testing; this kind of approach cannot be used in the application proposed in this paper, also due to the lack of a numerical model of the defectiveness.
The paper is structured as follows: Section 2 describes the architecture of the proposed automated system; Section 3 presents the considered ML classifiers and an additional data analysis technique; while Section 4 describes the dataset considered in this case study; finally, numerical results and conclusions are reported in Sections 5 and 6, respectively.

Automated FPI System Architecture
The context in which the proposed machine learning technique is introduced is the automated fluorescent penetrant inspection (AutoFPI) project, a research program funded by the MANUNET initiative, in the framework of the European Union's Horizon 2020 research and innovation initiative: its aim is the development of an automated system for analysing failures in industrial parts and components.
As previously described, FPIs are NDT techniques, with a sequence of operations that can be generally described by the flow chart shown in Figure 1. Their main drawback is that the last stage (the evaluation part) requires an expert operator to work in a dark room and process the parts manually. In particular, the system was designed to increase the reliability of the evaluations performed on the part, by implementing an ad hoc designed human-machine interface and proper artificial intelligence (AI) techniques to learn from expert operators how to evaluate part quality and identify features under the UV light.
The employment of a robotic part manipulation system and the high-speed decision-making algorithm will likely increase the FPI efficiency, thus reducing the inspection time and allowing more parts to be processed every day.
The inspection process will gain repeatability and the availability of data will allow improved flaw localization and sizing and the issuing of detailed reports and statistics on production.
Finally, the operators can optionally check and validate the overall process quality by working with a PC, also in a remote working environment. All these improvements will ultimately minimize the cost of the process, but will also have social impacts, improving operators' working conditions, avoiding manual part processing and monotonous tasks.
Within this framework, it is possible to consider some machine learning processes that automatically evaluate the part status. Their design should take into account that the final defectiveness rate is a function of all the pictures taken from different angles.
The main constraint in the ML design phase is the low number of available samples. In particular, the defectiveness rate is very low, and it is almost impossible to train a convolutional deep learning network that can directly assess the entire part. Thus, multistep architectures were investigated for reducing the number of samples required for the training phase; some deterministic steps were introduced in the system and the final ML impact was limited.
Two systems were designed: they differ in the amount of image preprocessing performed before the neural network.
The part analysis starts with a preliminary deterministic analysis of the images, in which the spots are identified and extracted into small boxes. The accuracy of this step highly depends on the noise level in the original images: generally, if the cleaning phase has been properly performed, the noise is relatively low. This step introduces a new user-defined parameter, i.e., the box size, which can be used for tuning the trade-off between the amount of data contained in each box and the computational time required for training the system.
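The spot identification and box extraction step can be sketched as follows. This is only an illustrative Python/NumPy version under stated assumptions (the paper's implementation is in Matlab and its details are not given): spots are found by flood-fill connected-component labelling on the thresholded binary image, and a fixed-size box (the user-defined box size parameter) is cut out around each spot's centroid. The function name `extract_spot_boxes` is hypothetical.

```python
import numpy as np

def extract_spot_boxes(bw_image, box_size=60):
    """Find bright spots in a binary image via 4-connected flood fill
    and return a fixed-size box centred on each spot's centroid."""
    img = np.asarray(bw_image, dtype=bool)
    visited = np.zeros_like(img, dtype=bool)
    half = box_size // 2
    padded = np.pad(img, half)  # so boxes near the border stay in-bounds
    boxes = []
    for r, c in zip(*np.nonzero(img)):
        if visited[r, c]:
            continue
        # flood fill to collect all pixels of this spot
        stack, pixels = [(r, c)], []
        visited[r, c] = True
        while stack:
            y, x = stack.pop()
            pixels.append((y, x))
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]
                        and img[ny, nx] and not visited[ny, nx]):
                    visited[ny, nx] = True
                    stack.append((ny, nx))
        cy = int(round(sum(p[0] for p in pixels) / len(pixels)))
        cx = int(round(sum(p[1] for p in pixels) / len(pixels)))
        # in padded coordinates the centroid is at (cy + half, cx + half),
        # so this slice is centred on the spot in the original image
        boxes.append(padded[cy:cy + box_size, cx:cx + box_size].copy())
    return boxes
```

The box size is the only tuning parameter, matching the trade-off discussed above: larger boxes carry more context per spot but increase the classifier's input size.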
Another important advantage of this approach is related to the fact that image multiplication techniques can be exploited for enlarging the available dataset.
The investigated procedure is based on this approach. The main architecture based on spot identification is the one depicted in Figure 3, where each spot identified in the original picture is directly processed by means of a neural network. The input size of this network is the number of pixels in the box and its output is the spot classification. In this architecture, it is possible to exploit both deep neural networks and multi-layer perceptrons. The required dataset is smaller with respect to the previous cases, and it is simpler to acquire: in fact, for each image it is possible to have more than one indication.
In this case, there are two deterministic steps that should be carefully designed: the first one is the spot identification and the second one is the heuristic logic for finding the final sample status.
In an alternative design architecture (Figure 4), a further deterministic step was introduced: in fact, for each spot, some features are deterministically extracted and used as input of the neural network. Within this framework the aim of this paper was to analyse the performance of the two neural network-based classifiers.

Machine Learning Techniques for Spot Identification
Machine learning (ML), especially neural networks, can be effectively implemented for the defectiveness status identification. The target of the ML system is the classification of each spot identified in the original image. The input is a black-and-white image of 60 × 60 pixels.
The output of the system is the spot classification among the three possible classes identified in the considered aerospace manufacturing application: "healthy" (when no defect is found), "linear defect", and "crack", the latter two being distinguished by the defect shape.
In this Section, two different classifiers are described: the first one is the direct spot classifier and the second one is the indirect spot classifier. The latter approach requires an additional deterministic feature extraction step.

Direct Spot Classifier
The direct spot classifier, shown in Figure 5, directly receives as input the image containing the spot. Thus, its input layer has 60 × 60 neurons. The developed neural network has only one hidden layer, in order to reduce the risk of over-training, while the output layer is composed of three neurons, one for each target status of the part. The network is trained with the scaled conjugate gradient algorithm using the cross-entropy error index [26].
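A minimal sketch of such a one-hidden-layer classifier with softmax outputs and cross-entropy loss follows. Two deliberate simplifications with respect to the paper should be noted: plain batch gradient descent is used instead of the scaled conjugate gradient algorithm, and the layer sizes are left as parameters. The class name and hyperparameters are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

class OneHiddenLayerClassifier:
    """Sketch of the direct spot classifier: one tanh hidden layer,
    softmax output, cross-entropy loss. The paper trains with scaled
    conjugate gradient; plain gradient descent is used here for brevity."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5):
        self.W1 = rng.normal(0, 1 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)
        z = self.h @ self.W2 + self.b2
        z -= z.max(axis=1, keepdims=True)        # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)  # class probabilities

    def step(self, X, Y):                        # Y: one-hot targets
        P = self.forward(X)
        n = len(X)
        dz2 = (P - Y) / n                        # softmax + cross-entropy grad
        dW2, db2 = self.h.T @ dz2, dz2.sum(0)
        dh = dz2 @ self.W2.T * (1 - self.h ** 2)  # tanh derivative
        dW1, db1 = X.T @ dh, dh.sum(0)
        for p, g in ((self.W1, dW1), (self.b1, db1),
                     (self.W2, dW2), (self.b2, db2)):
            p -= self.lr * g
        return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))  # loss
```

For the direct classifier described above, the instantiation would be `OneHiddenLayerClassifier(3600, 150, 3)`, with each 60 × 60 box flattened into a 3600-element input vector.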

Indirect Spot Classifier
The second proposed classifier is the indirect method: the image is preliminarily analysed, and some deterministic features are extracted.
The features were selected after interviewing the expert users: from these interviews, it was found that the most important aspects for the classification are the area of the spot and its aspect ratio.
Thus, the selected features are the following: the area of the spot; the major and minor axes of the ellipse circumscribed to the spot; its eccentricity; and, finally, the orientation of the spot. Figure 6 shows the architecture of this indirect system. To better understand the actual contribution of the feature extraction process to the classification, compared to that of the neural network, the t-distributed stochastic neighbour embedding method was applied, as described in the next section.
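The five features listed above can be computed from the second-order image moments of the spot, as in the following illustrative Python/NumPy sketch (the paper's actual extraction code is not given; `spot_features` is a hypothetical name, and the axis-length convention used here is the common one based on the eigenvalues of the pixel covariance):

```python
import numpy as np

def spot_features(box):
    """Extract the five features used by the indirect classifier from a
    binary spot box: area, major/minor axis of the best-fit ellipse,
    eccentricity and orientation, all derived from image moments."""
    ys, xs = np.nonzero(box)
    area = len(xs)
    cy, cx = ys.mean(), xs.mean()
    # central second-order moments
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    common = np.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2)
    lam1 = (mu20 + mu02 + common) / 2   # eigenvalues of the covariance
    lam2 = (mu20 + mu02 - common) / 2
    major = 4 * np.sqrt(lam1)           # full axis lengths of the ellipse
    minor = 4 * np.sqrt(max(lam2, 0.0))
    ecc = np.sqrt(1 - lam2 / lam1) if lam1 > 0 else 0.0
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)  # orientation, radians
    return np.array([area, major, minor, ecc, theta])
```

An elongated spot (the "linear" or "crack" case) yields eccentricity close to 1 and a large major/minor ratio, while a compact round spot yields eccentricity close to 0, which matches the experts' emphasis on area and aspect ratio.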

t-Distributed Stochastic Neighbour Embedding
The t-distributed stochastic neighbour embedding (t-DSNE) is a machine learning algorithm for high-dimensional data visualization [27].
It is a non-linear dimensionality reduction technique well suited for embedding high-dimensional data for visualization in a low-dimensional space of two or three dimensions. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are represented by nearby points and dissimilar objects are modelled by distant points with high probability.
This method was implemented in this paper for analysing and validating the sample distribution at three different levels of the classification: at the first step, when spot images were analysed; then, after the deterministic features were extracted; and finally, at the end of the classification process, as reported in the following section.
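As an illustration, a 2-D embedding of this kind can be obtained with scikit-learn's t-SNE implementation (a sketch only; the paper uses a Matlab implementation whose settings are not reported, and `embed_2d` is a hypothetical helper):

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_2d(samples, perplexity=30.0, seed=0):
    """Project high-dimensional samples to 2-D with t-SNE for visual
    cluster inspection. Each sample is flattened, so the same helper
    works for raw spot images, feature vectors, or network outputs."""
    X = np.asarray(samples, dtype=float).reshape(len(samples), -1)
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="pca", random_state=seed)
    return tsne.fit_transform(X)
```

Note that the perplexity must be smaller than the number of samples, and that t-SNE distances are only meaningful locally: the plot is used here for qualitative cluster comparison, not for measuring separation quantitatively.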

Case Study and Dataset
The considered dataset is composed of 60 × 60 pixel boxes, each one containing a single spot. These boxes are in black and white because the original colour images were preprocessed with a thresholding algorithm to reduce the image noise.
The dataset is thus composed of single spots that were classified by the expert operators into three possible classes: cracks, linear indications, and healthy indications. Figure 7 shows some spots of the available dataset; the colour of the box is representative of the type of defect: in green, the healthy indications; in orange, the linear ones; and in red, the cracks. Since the classification was performed manually, some samples may be misclassified.
The available spot images were selected in order to have the same number of indications for the three classes. From this dataset, the training, validation and testing sets were extracted; here also, the size of each class is the same in all the subsets. This created the dataset division shown in Figure 8.
For all the performed analyses, the training set was composed of 65% of the samples, the validation set of 15%, and the test set of 20%, for a total of 1502 images, as reported in detail in Table 1. This dataset was used to train the two classification networks of the architectures described in Figures 3 and 4. In particular, the first one is a direct spot classifier, because the network is directly fed with the pixel values, while the second one is an indirect spot classifier, since the image is deterministically preprocessed.
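The stratified 65/15/20 split, with equal class proportions in every subset, can be sketched as follows (illustrative Python/NumPy code; the paper does not specify how the split was implemented, and `stratified_split` is a hypothetical name):

```python
import numpy as np

def stratified_split(labels, fractions=(0.65, 0.15, 0.20), seed=0):
    """Split sample indices into training/validation/test sets keeping
    the same class proportions in every subset (65/15/20 by default)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        idx = np.nonzero(labels == cls)[0]
        rng.shuffle(idx)
        n_tr = int(round(fractions[0] * len(idx)))
        n_va = int(round(fractions[1] * len(idx)))
        train.extend(idx[:n_tr])
        val.extend(idx[n_tr:n_tr + n_va])
        test.extend(idx[n_tr + n_va:])
    return np.array(train), np.array(val), np.array(test)
```

Splitting per class (rather than over the whole pool) guarantees that the balanced class sizes of the original dataset are preserved in the training, validation and test sets.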

Numerical Results and Validation
Numerical analysis was performed on the described dataset, considering the two previously presented approaches, i.e., the direct and indirect spot classifier, respectively. Additionally, the results of the classification were compared by means of the t-DSNE approach described in Section 3.3, in order to graphically compare and validate the effectiveness of the proposed techniques.
All the machine learning techniques were implemented in Matlab R2019b starting from the Neural Network Toolbox and properly modifying the network features to be suited for the specific problem.

Direct Spot Classifier
The neural network was composed of an input layer with 3600 neurons, one for each pixel of the 60 × 60 image, one hidden layer with 150 neurons, sized after a preliminary sensitivity analysis, and a final output layer with three neurons, which return the probability that a spot belongs to each classification type.
As mentioned in Section 3.1, the error function used in training the neural network was the cross entropy [26] and the training algorithm was the scaled conjugate gradient. The termination criterion selected was the validation check. Twenty-five independent trials were performed and Figure 9 shows the training and testing convergence curves.
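The validation-check termination criterion can be sketched as a generic early-stopping loop: training stops once the validation loss has failed to improve for a fixed number of consecutive epochs, and the epoch with the best validation performance is reported. This is an illustrative sketch only; the patience value of six failed checks mirrors the usual Matlab toolbox default, but the paper does not state its setting, and `train_with_validation_check` and its callback interface are hypothetical.

```python
import numpy as np

def train_with_validation_check(step_fn, val_loss_fn, max_epochs=1000,
                                patience=6):
    """Early-stopping loop: run one training epoch per iteration via
    step_fn(), evaluate val_loss_fn(), and stop once the validation
    loss has not improved for `patience` consecutive epochs."""
    best_loss, best_epoch, fails = np.inf, 0, 0
    for epoch in range(1, max_epochs + 1):
        step_fn()                      # one epoch of weight updates
        loss = val_loss_fn()           # validation cross-entropy
        if loss < best_loss - 1e-12:
            best_loss, best_epoch, fails = loss, epoch, 0
        else:
            fails += 1
            if fails >= patience:
                break                  # validation check fired
    return best_epoch, best_loss
```

With this criterion, the reported "best validation performance at epoch 77" corresponds to `best_epoch`, while training actually continued for a few more epochs before the check fired.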
The best network according to the testing results was analysed. It reached the best validation performance at epoch 77, as shown in Figure 10, where the training (blue line), validation (green line) and testing (red line) curves are shown. Here, it is possible to see that the training performance is much better than the validation and testing ones, showing that the network has a relatively reduced generalization capability.
The performance of this network can be analysed more deeply by means of the confusion matrices, in which the percentages of correct and wrong classifications are reported for each case. In Figure 11, the confusion matrices on the entire dataset and on the testing set are both reported. The first aspect that can be noticed is the different performance between the entire dataset and the test set: on the first one, the percentage of correct classification is 73%, while on the test set, it is only 61.7%.
Secondly, it is possible to see that most of the errors are concentrated in the misclassification of defects (linear indications classified as cracks, 134 cases over the entire dataset, and cracks classified as linear, 86 cases). There is also a quite high number of defects that have been classified as healthy indications: this is a particularly relevant error, because it can lead to safety issues in the specific application domain.
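The confusion-matrix bookkeeping used in this analysis can be reproduced in a few lines (a minimal sketch; `confusion_matrix` here is a hypothetical helper, not the scikit-learn function of the same name):

```python
import numpy as np

def confusion_matrix(targets, predictions, n_classes=3):
    """Per-class confusion counts: rows are true classes, columns the
    predicted ones; overall accuracy is the normalised trace."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(targets, predictions):
        cm[t, p] += 1
    accuracy = np.trace(cm) / cm.sum()
    return cm, accuracy
```

Off-diagonal entries in the "healthy" column for true-defect rows are the safety-critical errors discussed above, so they deserve separate monitoring beyond the overall accuracy figure.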

Indirect Spot Classifier
The neural network is now composed of an input layer with five neurons, one for each feature, one hidden layer with 10 neurons (sized with a sensitivity analysis) and a final output layer with three neurons, with the same meaning seen before. As in the previous case, the error function used is the cross entropy and the training algorithm is the scaled conjugate gradient. Figure 12 shows the training and testing convergence curves for 25 independent trials. Comparing these results with those of the previously presented direct classifier, it is possible to see that the cross-entropy error is significantly lower, especially in the testing performance.
In this case, once more, the best network according to the testing results was analysed. Its detailed convergence curves are shown in Figure 13. Here, the distance between the training performance and the testing one is much smaller than before, meaning that a higher generalization capability of the system was reached.
The performance of this network can be analysed more deeply by means of the confusion matrices depicted in Figure 14.
The first observation is related to the very high performance rate of this network: it achieves 97.3% correct classification on the test set.
In this case also, most of the misclassifications were related to cracks classified as healthy (1% of the test set) and crack indications classified as linear or vice versa (altogether another 1% of the test set).
To better understand the misclassification impact, Figure 15 shows some examples. For each spot, the output class, its confidence level and the target class are indicated.
Analysing these cases, it is possible to see that the second case can be considered borderline between a real crack and a healthy indication. The performance of this classifier is very good, and its training time is reduced by the fact that the input consists of only five numbers per sample.

Results Validation and Comparison
As mentioned in Section 3.2, in order to evaluate the actual contribution of the feature extraction process to the classification, compared to that of the neural network, the t-distributed stochastic neighbour embedding method was applied, as described in Section 3.3.
Figure 16 shows the 2-D representation of the spot image samples; the classification classes are represented by colour: red for healthy indications, green for cracks, and blue for linear ones. In this figure, it is possible to observe that there are almost no clusters, meaning that the initial distribution of samples cannot be linearly classified. The high concentration of healthy indications in the centre of the picture corresponds to the samples in which the number of white pixels is very low.
Figure 17 shows the t-DSNE visualization after the extraction of the features. In this case, there are some clusters, but they are often composed of samples belonging to different classification classes. The only exception is a very isolated cluster of healthy indications at the top of the figure: it corresponds to those samples in which the number of white pixels, i.e., the area of the spot, is very low. It is possible to see that the deterministic feature extraction helps the classification, but it is still not accurate enough.
Finally, Figure 18 shows the t-DSNE of the final classification after the neural network run: the samples are now well clustered, thus validating the effectiveness of the proposed procedure with respect to the simple feature extraction process, and its increased accuracy in defect detection.

Conclusions
In this paper, different ML-based architectures were tested to perform NDT applied to an industrial quality test. For each considered part, several images taken at different angles were analysed. The proposed methods were able to detect and classify the defectiveness status of the relevant parts, combining the information contained in the source pictures with the expert operator experience employed to train the system, thus improving the overall NDT procedure effectiveness.
The resulting automated FPI process may find wide application in several manufacturing sectors such as aerospace, automotive, energy, and nuclear, where production requires the extensive use of FPI techniques, thus leading to significant improvements on the current state of the art practices.
In addition to an increased quality of inspection and probability of detection, the results of the automated process may be less dependent on subjective operator analysis, also leading to a reduced number of "false calls" and a higher customer confidence.
The reduction in inspection time, obtained by the robotic part manipulation and the high-speed decision-making algorithm, will highly increase the FPI efficiency, allowing to process more parts every day and impacting the overall production lead time, also improving human operators' working conditions.
The combination of deterministic feature extraction and a machine learning classifier provides the best performance, considering the non-linearity of the problem and the limited number of available samples resulting from the high reliability of the production process.
Funding: "increase and empowerment of collaboration between companies and research facilities", Manunet Call 2017.
Institutional Review Board Statement: Not applicable.