Metal Additive Manufacturing Parts Inspection Using Convolutional Neural Network

Metal additive manufacturing (AM) is gaining increasing attention from academia and industry due to its unique advantages over traditional manufacturing processes. Part quality inspection plays a crucial role in the AM industry and can be adopted for product improvement. However, the traditional inspection process relies on manual recognition, which can suffer from low efficiency and potential bias. This study presents a convolutional neural network (CNN) approach to robust AM quality inspection covering four categories: good quality, crack, gas porosity, and lack of fusion. To obtain an appropriate model, experiments were performed on a series of architectures. Moreover, data augmentation was adopted to deal with data scarcity, and L2 regularization (weight decay) and dropout were applied to avoid overfitting. The impact of each strategy was evaluated. The final CNN model achieved an accuracy of 92.1% and took 8.01 milliseconds to recognize one image. The CNN model presented here can help enable automatic defect recognition in the AM industry.


Introduction
Metal additive manufacturing (AM) processes have introduced capabilities unparalleled by traditional manufacturing, such as custom-designed shapes, complex features, and low material consumption [1]. Laser metal deposition (LMD) is a form of AM that accomplishes the layer-by-layer fabrication of near net-shaped components by introducing a powder stream into a high-energy laser beam. During the LMD process, a melt pool is formed by rastering the laser beam across the sample surface, and powders are injected into the melt pool for each layer deposition. LMD has been explored for various applications, e.g., metallic component repair, surface modification, and layering gradient metal alloys on a dissimilar metal base [2][3][4][5]. The control parameters involved in LMD include the laser power, laser scan speed, powder feed rate, shielding gas flow rate, and the quality of the powder feedstock. These parameters jointly affect the quality of the parts being formed. Some studies have focused on process parameter selection and optimization of the performance of LMD parts, but the incidence of defects is still high compared to traditional manufacturing [1,[6][7][8].
The common defects present in the LMD process are crack, gas porosity, and lack of fusion (LoF), which negatively affect the properties of LMD-fabricated parts [9]. The direct joining of two dissimilar alloys is usually compromised by cracks resulting from residual stress, the formation of brittle intermetallic compounds, or differences in thermal expansion coefficient [2,5,10]. Reichardt tried to fabricate gradient components transitioning from AISI 304L stainless steel to Ti-6Al-4V, but the build was halted due to cracks [3]. A similar phenomenon was observed by Huang [4] and Cui [5], in which the deposited material cracked prior to analysis. Gas porosity is caused by the entrapment of gas from the powder feed system or the release of gas present in the powder.
In previous classification work, calculated image-based features were fed to an SVM [29] classifier, and the results showed that CNN methods offered the best classification performance [30]. Similarly, Wang [20] and Chowdhury [31] successfully applied traditional computer vision and CNN algorithms to micrography recognition tasks, and CNN again delivered the highest classification accuracies. Besides, CNN was adopted to link experimental microstructure with ionic conductivity for yttria-stabilized zirconia samples [32]. CNN models have also been applied to surface defect detection in bearing rollers, aluminum parts, and steel plates [33][34][35][36][37]. It was found that CNN-based methods had better and more robust performance compared to SVM classifiers.
The objective of this work is to explore a good CNN-based architecture with its parameters for the robust inspection of LMD fabricated parts. We will first discuss the model training and then provide performance evaluation and failure analysis.

Additive Manufacturing Parts Inspection
In this section, the sample preparation, data preprocessing, data augmentation, and convolutional neural network architecture are described.

Sample Preparation
The sample preparation was performed by researchers at Missouri University of Science and Technology (Missouri S&T). The specimens were fabricated by the LMD process and included AISI 304 stainless steel, AISI 316 stainless steel, Ti-6Al-4V, AlCoCrFeNi alloys, and Inconel 718 alloys. A 1 kW continuous-wave yttrium aluminum garnet (YAG) fiber laser (IPG Photonics, Oxford, MA, USA) with a 2 mm beam diameter was used in the experiments. Table 1 lists the process parameters employed to fabricate the parts. The energy density, defined as E = Laser Power/(Scan Speed × Layer Thickness) (J/mm²) [38,39], is considered a key factor affecting the quality. For the quality inspection, the samples were transversely cross-sectioned and prepared with the standard metallographic procedure. The images were captured by a Hirox (Hackensack, NJ, USA) digital microscope with a magnification of 100 and a resolution of 1600 × 1200 pixels, which provided enough information about the defects.
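As an illustration, the energy density definition above reduces to a one-line function. The numbers below are hypothetical values chosen for a round result, not the Table 1 process settings:

```python
def energy_density(laser_power_w, scan_speed_mm_s, layer_thickness_mm):
    """Energy density E = P / (v * t) in J/mm^2, per the definition in the text."""
    return laser_power_w / (scan_speed_mm_s * layer_thickness_mm)

# Hypothetical example: 600 W laser, 5 mm/s scan speed, 0.5 mm layer
# thickness -> 600 / (5 * 0.5) = 240 J/mm^2.
print(energy_density(600, 5, 0.5))  # 240.0
```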

Preprocessing
The optical images obtained were split into blocks of 224 × 224 pixels. After splitting, each block was screened, and 4140 image blocks were retained for the experiment. Some image blocks were not selected because they consisted of unusable regions, such as the mounting epoxy material. The four types of part quality, namely good quality, crack, lack of fusion, and gas porosity, are shown in Figure 1, and the number of each type is given in Table 2. The samples were shuffled and randomly split into a training set (3519 samples) and a test set (621 samples). The training set was then further divided into training samples (2898 samples) and validation samples (621 samples).
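The shuffling and splitting described above can be sketched as follows. This is a seeded NumPy re-implementation for illustration only; the paper does not give its exact shuffling code:

```python
import numpy as np

def split_dataset(n_samples=4140, test_size=621, val_size=621, seed=0):
    """Shuffle sample indices and split them into train/validation/test
    subsets, mirroring the 2898/621/621 split described in the text."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    test = idx[:test_size]
    val = idx[test_size:test_size + val_size]
    train = idx[test_size + val_size:]
    return train, val, test

train, val, test = split_dataset()
print(len(train), len(val), len(test))  # 2898 621 621
```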

Data Augmentation
To achieve good CNN performance, a large labeled dataset is needed. As our dataset comprises only several thousand samples, expansion of the dataset was necessary. Therefore, data augmentation operations [22] were applied to the original images. Each image was passed through a series of transformations: random rotation from −180° to 180°, horizontal flipping, random cropping, and the addition of Gaussian noise and blur [16], as shown in Figure 2.
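The augmentation pipeline can be sketched in NumPy. This is an illustrative approximation, not the paper's implementation: arbitrary-angle rotation and blur would require an image library (e.g., scipy.ndimage), so rotation is limited here to 90° steps and blur is omitted; the crop size and noise level are also assumed:

```python
import numpy as np

def augment(img, rng):
    """Apply a random subset of the augmentations described in the text:
    horizontal flip, rotation (90-degree steps only, as a simplification),
    random crop, and additive Gaussian noise."""
    if rng.random() < 0.5:                        # horizontal flip
        img = np.fliplr(img)
    img = np.rot90(img, k=rng.integers(0, 4))     # rotation in 90-degree steps
    y = rng.integers(0, img.shape[0] - 200 + 1)   # random 200x200 crop
    x = rng.integers(0, img.shape[1] - 200 + 1)
    img = img[y:y + 200, x:x + 200]
    img = img + rng.normal(0, 5, img.shape)       # additive Gaussian noise
    return np.clip(img, 0, 255)                   # keep valid pixel range

rng = np.random.default_rng(42)
block = rng.uniform(0, 255, (224, 224))           # stand-in for one image block
print(augment(block, rng).shape)  # (200, 200)
```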

Convolutional Neural Network (CNN) Architecture
Several CNN models were explored using our dataset to obtain the optimal hyper-parameters. Figure 3 presents the final schematic framework after several experiments. The overall model is composed of feature extraction and classification modules. To keep this work self-contained, the fundamentals of our CNN model are briefly described below. First, there were M input images, X_m, which were scaled to 224 × 224 pixels in grayscale after the data augmentation process and then fed into the first convolutional layer, with a kernel size of 5 × 5, for feature extraction. To model the non-linearity of the mapping between input and output, the Rectified Linear Unit (ReLU(x) = max(0, x)) [21] was applied after each convolutional layer. Moreover, each convolutional layer was followed by a 2 × 2 max pooling layer, which substitutes the activations in a sub-region of the feature map with the maximum value in that region, thereby downsampling the feature map.
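The ReLU and 2 × 2 max pooling operations described above can be written out in NumPy (an illustrative re-implementation, not the paper's TensorFlow code):

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied element-wise after each convolution."""
    return np.maximum(0, x)

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: each output value is the maximum of a
    2x2 sub-region, halving the spatial size of the feature map."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    fmap = fmap[:h, :w]  # drop any odd trailing row/column
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1., -2., 3., 0.],
                 [4., 0., -1., 2.],
                 [-3., 1., 0., 5.],
                 [2., 2., 1., -4.]])
print(max_pool_2x2(relu(fmap)))
# [[4. 3.]
#  [2. 5.]]
```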

After the feature extraction, the classification module took the 28 × 28 × 128 feature maps and flattened them into a 512-dimensional feature vector. Fully connected (FC) layers were then applied to reduce this vector to dimensions of 64 and C, where C is the number of categories in our dataset. The C-dimensional vector ([V_1, V_2, . . . , V_C]) was converted to predicted class probabilities using the Softmax function [21] in Equation (1) and transformed to the output.
where P(y_m = c | X_m) is the predicted probability of a sample X_m being class c.
During the training of our CNN model, the difference between the true class and the corresponding predicted class was calculated by the cross-entropy loss function, as in Equation (2). Through the optimization of the parameters ω in the network, our target was to minimize the loss function over the training dataset X.
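The Softmax of Equation (1) and the per-sample cross-entropy loss of Equation (2) can be sketched in NumPy as follows; the logit values are hypothetical:

```python
import numpy as np

def softmax(v):
    """Softmax per Equation (1): P(y_m = c | X_m) = exp(V_c) / sum_j exp(V_j)."""
    e = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(probs, true_class):
    """Cross-entropy loss per Equation (2) for one sample: the negative log
    of the predicted probability assigned to the true class."""
    return -np.log(probs[true_class])

logits = np.array([2.0, 1.0, 0.1, -1.0])  # hypothetical scores for C = 4 classes
p = softmax(logits)
print(round(float(p.sum()), 6))  # 1.0  (probabilities sum to one)
print(float(cross_entropy(p, 0)) > 0)  # True  (loss is always non-negative)
```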

Hyper-Parameter Tuning
Hyper-parameters are a crucial part of the CNN model and have a significant impact on performance; they include the number of convolutional layers, the kernel size, and the L2 regularization and dropout [21] parameters. To determine the optimal hyper-parameters, several architectures were built and trained. The study of hyper-parameters was based on the training and validation datasets and is described in Sections 3.1-3.3.

Training Details
The experiments in this work were conducted with one six-core Advanced Micro Devices (AMD, Santa Clara, CA, USA) Ryzen 5 2600 processor and one Nvidia (Santa Clara, CA, USA) GeForce 1070 GPU. The code was developed in Python 3.6.8 using TensorFlow (version 1.13.1) and Keras (version 2.2.4). Some parameters were common in all experiments and are described here. A batch size of 32 and a learning rate of 1 × 10 −4 were used. Each convolutional layer was followed by a max pooling layer with a filter size of 2 × 2 and a stride of 1. Batch normalization was used for centering and normalization of the images [40] and applied before the fully connected layers. An Adam optimizer [41] was used in the training process. Each network was identified with a unique Model #.

Evaluation Metrics
The commonly used evaluation metrics for multiclass performance were implemented as in Equations (3)-(5): Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F score = 2 × Precision × Recall/(Precision + Recall). In Equations (3)-(5), a True Positive (TP) is a sample X_m from a certain class y_m that is correctly classified as y_m; a False Positive (FP) is a sample X_m that does not belong to class y_m but is incorrectly classified as y_m; a False Negative (FN) is a sample X_m belonging to class y_m that is incorrectly classified as "not y_m". The F score in Equation (5) indicates the overall performance of precision and recall, being their harmonic mean, in the interval [0, 1].
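Equations (3)-(5) reduce to a few lines of Python; the TP/FP/FN counts below are hypothetical values chosen for illustration:

```python
def precision_recall_f(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), and the F score as
    their harmonic mean, per Equations (3)-(5)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical counts for one class: 90 true positives, 10 false positives,
# 10 false negatives -> P = R = F = 0.9 (up to floating-point rounding).
print(precision_recall_f(90, 10, 10))
```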

Evaluation of the CNN Architecture
Because of the complexity and parametric variation of CNN models, it was not feasible to evaluate all possible models and parameter settings, e.g., every kernel size and number of convolutional layers. Therefore, six representative CNN models with an increasing number of convolutional layers were compared to find the optimal network. The experimental results of the six CNN frameworks are tabulated in Table 3. Increasing the depth and the number of kernels of the network raised the validation accuracy from 74.6% to 83.8% (from Model 1 to Model 6). The accuracy and loss plots for Model 6 in Figure 4 suggested overfitting, in which the model fit the training dataset better than the validation dataset. Considering the validation accuracy, the following data augmentation experiments were conducted with the networks used in Models 3-6.

Impact of Data Augmentation
Data augmentation operations were carried out on the original images, and the classification performance is provided in Table 4. The validation accuracy improved by up to 5.6% compared to Table 3, at the cost of an average of 5 min 13 s of additional training time. Therefore, with the availability of larger training datasets, the CNN models could achieve higher accuracies. The following regularization experiments were performed on the architectures of Models 5 and 6 combined with data augmentation.

Table 4. Experimental results of Models 3-6 using data augmentation operations. The convolutional layers' parameters are denoted as "C kernel size/number of kernels". The fully connected layers are denoted as "FC number of hidden units". Epoch = 30 and learning rate = 1 × 10−4.

Regularization
Regularization can be used to mitigate overfitting during the training process. The implementation of L2 regularization and dropout was explored in our study. L2 regularization applies a penalty on large network parameters and forces them to be relatively small; dropout is a method that randomly drops units in the network. Several combinations of L2 regularization and dropout in the convolutional and fully connected layers were tested. The training times and validation accuracies of eight models are presented in Table 5. The use of L2 in the convolutional layers with a coefficient of 1 × 10−5 and a dropout rate of 0.25 on all layers was demonstrated to be the most conducive for both architectures. Regarding the convolutional kernel size, a size of 5 gave a 4.3% higher validation accuracy than a size of 3. Therefore, Model 11 was chosen, and the corresponding accuracy and loss plots are shown in Figure 5. The overfitting issue was alleviated compared to Model 6 in Figure 4, but further fine tuning of the regularization parameters and longer training were needed to improve the network performance.
Table 5. Experimental results of different L2 regularization and dropout parameters. The convolutional layers' parameters are denoted as "C kernel size/number of kernels". The fully connected layers are denoted as "FC number of hidden units". Epoch = 30 and learning rate = 1 × 10−4.
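The two regularization techniques can be made concrete with a minimal NumPy sketch (not the paper's TensorFlow implementation) of an L2 penalty term and inverted dropout, using the chosen settings (λ = 1 × 10−5, rate = 0.25):

```python
import numpy as np

def l2_penalty(weights, lam=1e-5):
    """L2 regularization term lam * sum(w^2), added to the loss so that
    large weights are penalized (lam = 1e-5 as in the chosen models)."""
    return lam * float(np.sum(weights ** 2))

def dropout(activations, rate=0.25, rng=None):
    """Inverted dropout: zero each unit with probability `rate` and scale
    the survivors by 1/(1 - rate) so the expected activation is unchanged."""
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

w = np.ones((3, 3))
print(l2_penalty(w))  # ~9e-05: 1e-5 * 9 ones
a = dropout(np.ones(10000), rate=0.25)
print(abs(a.mean() - 1.0) < 0.05)  # True: expectation is preserved
```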

Model 11 was then trained to fine-tune the dropout parameters over 300 epochs, and the results are listed in Table 6. Model 16 achieved a validation accuracy of 94.3% and outperformed the other models. The accuracy and loss plots of the four tested models are shown in Figure 6. The accuracy curves rose until around 100-150 epochs in Figure 6a, and the training and validation accuracy climbed until about 50-100 epochs and then plateaued, as seen in Figure 6c.
Table 6. Results of fine tuning dropout parameters. The convolutional layers' parameters are denoted as "C kernel size/number of kernels". The fully connected layers are denoted as "FC number of hidden units". Epoch = 300 and learning rate = 1 × 10−4.

Performance Evaluation
The performance of our final model was evaluated on the test dataset. The precision, recall, and F score of each class are reported in Table 7. Note that the overall F score reached above 0.9, indicating good classification performance. It took about 8.01 milliseconds to handle one image in our test process, so the model could be adopted for real-time inspection applications. The classification accuracy for three different alloys in the test dataset is listed in Table 8. The average accuracy is 91.9% (AlCoCrFeNi alloy), 91.5% (Ti-6Al-4V), and 92.7% (AISI 304 stainless steel), respectively, and the difference is ~1%. This indicates that the model can classify the defects robustly for different alloys. Table 9 reports the comparison of the test accuracy of our approach and other methods whose code is publicly available. The experiments were performed on our metal AM parts quality dataset. The accuracy obtained by histogram of oriented gradients (HOG) + SVM [42] was 79.6%, while an accuracy of 89.3% was achieved using Liu's CNN model [36]. Our approach had an accuracy of 92.1%. This could be attributed to our model being efficient in learning the internal features of metal AM defects, which benefits our classification task.
Table 9. The performance of classification accuracy with different methods.

Feature Visualization
To better understand what the CNN model has learned, the learned filters and the extracted feature maps are visualized in Figure 7. The 32 filters from the first convolutional layer are shown in Figure 7a. Four samples representing crack, lack of fusion, gas porosity, and good quality are presented in Figure 7b-e. The 32 5 × 5 filters in Figure 7a are difficult to interpret directly, but reviewing the feature maps in Figure 7b-e shows that some low-level features are extracted. For example, the filters were able to identify the edges of the crack, as in Figure 7b, and they emphasized the irregular shape of the lack of fusion, which differs from the round shape of gas porosity. The second and third convolutional layers are not discussed here, as they contain high-dimensional information and are less visually interpretable.
Attention maps can be obtained for a given input image with back-propagation through a CNN model. The value of each pixel on the attention map reveals the extent to which the same pixel on the input image contributes to the final output of the network [18,19]. Therefore, through the attention maps, we can intuitively analyze which parts of the AM-built metal images attract the attention of the network. Figure 8 shows the AM-built metallic part profiles in (a-d) and the corresponding attention maps in (e-h).
The defects are highlighted by red circles and rectangles in Figure 8a-d. The bright regions in Figure 8e-h indicate where the CNN model focuses, while the dark regions indicate where the network is less interested. These maps demonstrate that the network pays attention to the defects and verify the effectiveness of our model.
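The idea behind such pixel-contribution maps can be illustrated without a trained CNN. The sketch below uses finite differences as a stand-in for back-propagation, on a toy score function whose output depends only on the central pixels; a real attention map would differentiate the network's class score with respect to the input image instead:

```python
import numpy as np

def saliency_map(score_fn, image, eps=1e-3):
    """Finite-difference approximation of |d score / d pixel| for every
    pixel: bump each pixel by eps and measure the change in the score.
    (A real attention map would use back-propagation through the CNN.)"""
    sal = np.zeros_like(image)
    for idx in np.ndindex(image.shape):
        bumped = image.copy()
        bumped[idx] += eps
        sal[idx] = abs(score_fn(bumped) - score_fn(image)) / eps
    return sal

# Toy "network": the score depends only on the central 2x2 region, so the
# saliency map should light up there and stay dark elsewhere.
score = lambda img: img[1:3, 1:3].sum()
sal = saliency_map(score, np.zeros((4, 4)))
print(sal[1:3, 1:3].min() > 0 and sal[0, 0] == 0)  # True
```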

Failure Case Study
The failure cases that were not correctly classified in the test dataset are discussed in this subsection. Some incorrectly classified images are shown in Figure 9. The image in Figure 9a was misclassified as "good quality" because of dirt on the sample surface. The misclassifications in Figure 9b-d could be attributed to the high similarity between gas porosity and lack of fusion, which makes them difficult to distinguish. To address these failure cases and improve performance, future work will explore: (1) enlarging the dataset with a greater variety of AM-manufactured materials, e.g., ceramics, glass, polymers, and composites; and (2) further refinement of the CNN architecture to enhance performance.

Conclusions
In this paper, we presented the application of a convolutional neural network (CNN) to the robust quality inspection of metal additive manufacturing (AM) parts. The Missouri S&T dataset, comprising optical microscope images of real-world metal AM parts, was used to train and test the CNN model. This work contributed a CNN model with excellent performance in recognizing the good quality, crack, gas porosity, and lack of fusion categories. To generate an appropriate model, extensive experiments were conducted on hyper-parameters, including the kernel size and the number of layers, data augmentation operations, and regularization. Our final model achieved an accuracy of 92.1% with a recognition time of 8.01 milliseconds per image. The results indicate the promising application of the CNN method to quality inspection in the AM industry. It would be interesting to explore more CNN architectures and include a greater variety of materials in the future.

Author Contributions: W.C. designed and performed the experiments and wrote the manuscript; Y.Z., X.Z., and L.L. assisted with the analysis; F.L. supervised the research. All authors have read and agreed to the published version of the manuscript.

Funding:
The authors are grateful for the financial support from NSF (National Science Foundation) grants CMMI-1625736 and EEC-1937128, and from the Intelligent Systems Center (ISC) at Missouri University of Science and Technology.