Microstructure Instance Segmentation from Aluminum Alloy Metallographic Image Using Different Loss Functions

Automatic segmentation of metallographic images is very important for the implementation of an automatic metallographic analysis system. In this paper, a novel instance segmentation framework for metallographic images was implemented, which can assign each pixel to a physical instance of a microstructure. In this framework, we used Mask R-CNN as the basic network to learn and recognize the latent features of an aluminum alloy microstructure. Meanwhile, we implemented five different loss functions within this framework and compared their influence on metallographic image segmentation performance. We carried out several experiments to verify the effectiveness of the proposed framework. In these experiments, we compared and analyzed six different evaluation metrics and provide constructive suggestions for the performance evaluation of metallographic image segmentation methods. A large number of experimental results show that the proposed method can achieve instance segmentation of aluminum alloy metallographic images and that the segmentation results are satisfactory.


Introduction
Aluminum alloy products have been widely used in machinery manufacturing, transportation, electrical, shipbuilding, automobile, aviation, aerospace, chemical industry, construction and other fields [1][2][3][4]. Their properties highly depend on the distribution, shape and size of microstructures. Metallographic image processing is an important tool for analyzing the microstructure of metal materials [5][6][7][8]. Recently, a large number of automatic metallographic image processing methods have been proposed, which greatly improve the efficiency of metallographic analysis [9][10][11]. According to their use, these methods can be classified into four categories: grain boundary extraction, quantitative calculation, microstructural classification and segmentation.
Metallographic image segmentation is an important part of the automatic metallographic analysis system, and it is also a challenging task. It aims to segment and recognize the different microstructures in a given metallographic image. At present, more and more scholars are paying attention to how to accomplish this task efficiently, and many effective methods have been introduced. Among the contributions of this work, the proposed segmentation methods and their comparative analysis results are given, and we compared and analyzed six typical segmentation performance evaluation metrics, obtaining conclusions that are helpful for the segmentation performance evaluation of aluminum alloy metallographic images. This paper is organized as follows: Section 1 introduces prior works and our contributions. Section 2 describes the proposed method, including parameter learning and instance segmentation. Section 3 presents the experimental results: quantitative performance comparison, qualitative performance comparison and convergence analysis. The paper is concluded in Section 4.

Overview
Microstructure instance segmentation is an important part of the automated analysis system for aluminum alloy metallographic images. The basic framework of our proposed method is shown in Figure 1. It consists of two parts: parameter learning and instance segmentation. In the next section, we introduce the proposed method in detail.

Parameter Learning
The instance segmentation method maps a given image from the image space to the instance space. In this process, the mapping function is very important. Because of the complexity of this process, the mapping function is highly complex, and we cannot obtain its exact form directly. To solve this problem, we used Mask R-CNN to model this process. Mask R-CNN contains a large number of unknown parameters. The main task of parameter learning is to estimate all unknown parameters of Mask R-CNN from the given sample data. The specific flow chart is shown in Figure 1.
The input of the parameter learning strategy is a given metallographic image instance segmentation training dataset. In order to improve the quality of the training dataset, we preprocess it using standard image processing methods, including image resizing, image flipping, Gaussian smoothing, image denoising and contrast enhancement. The network parameters are then learned from this enhanced dataset.
Mask R-CNN is one of the most popular instance segmentation networks at present, with a satisfactory performance, and many methods have been developed on the basis of it. The typical Mask R-CNN consists of five main modules: Feature Pyramid Networks (FPN), Region Proposal Network (RPN), Region of Interest (ROI) Align, Regions with CNN Features (R-CNN) and Fully Convolutional Networks (FCN). The detailed description is as follows: (1) The FPN uses feature maps from the bottom to the top, makes full use of the extracted features at each scale, and can better mine multi-scale information. The input of this part is a given metallographic image and the output is a set of multi-resolution feature maps of the metallographic image. In this module, we used a typical ResNet101 deep residual network to extract deep features; it consists of 33 residual blocks (each containing 3 convolutional layers), 1 convolutional layer and 1 fully connected layer.
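The per-scale top-down fusion step of the FPN can be illustrated with a minimal NumPy sketch. This is illustrative only: we assume nearest-neighbor 2× upsampling, and the lateral map is assumed to have already passed through its 1 × 1 projection convolution.

```python
import numpy as np

def fpn_merge(top_down, lateral):
    """Merge one FPN level: upsample the coarser top-down map 2x
    (nearest neighbor) and add the projected lateral map element-wise."""
    up = top_down.repeat(2, axis=-2).repeat(2, axis=-1)
    return up + lateral

# Toy feature maps: coarser level (256, 2, 2) merged into finer (256, 4, 4)
top = np.ones((256, 2, 2))
lat = np.zeros((256, 4, 4))
merged = fpn_merge(top, lat)  # shape (256, 4, 4)
```

In the real network this merge is repeated from the coarsest level down, followed by a 3 × 3 convolution at each level to smooth aliasing.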
(2) The RPN aims to obtain candidate regions. The input of this network is the multi-resolution feature maps, and the output is a series of category probabilities and coordinates of the proposal regions. It consists of one 3 × 3 convolutional layer, one 1 × 1 convolutional layer with 2k output channels and one 1 × 1 convolutional layer with 4k output channels, where k is the number of anchors per location. In this paper, k = 3.
(3) ROI Align aims to obtain a fixed-size feature map. The input of this module is a proposal region, and the output is the fixed-size feature map. Compared with the typical ROI Pooling method, it eliminates the quantization error and improves the segmentation of small objects.
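The key difference from ROI Pooling is that ROI Align samples the feature map at fractional coordinates by bilinear interpolation instead of rounding them to integers. A minimal single-channel NumPy sketch of this sampling step (the full operator averages several such samples per output bin):

```python
import numpy as np

def bilinear_sample(fmap, y, x):
    """Sample a 2-D feature map at fractional (y, x) by bilinear
    interpolation, as ROI Align does, instead of rounding coordinates
    to the nearest cell like ROI Pooling."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, fmap.shape[0] - 1)
    x1 = min(x0 + 1, fmap.shape[1] - 1)
    wy, wx = y - y0, x - x0  # fractional weights
    return ((1 - wy) * (1 - wx) * fmap[y0, x0] +
            (1 - wy) * wx * fmap[y0, x1] +
            wy * (1 - wx) * fmap[y1, x0] +
            wy * wx * fmap[y1, x1])

fmap = np.array([[0.0, 1.0],
                 [2.0, 3.0]])
center = bilinear_sample(fmap, 0.5, 0.5)  # 1.5, the average of the 4 cells
```

Because the sampled value varies smoothly with the proposal coordinates, small objects are not shifted by up to half a stride as they can be under quantized pooling.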
(4) The tasks of the R-CNN head are classification and regression. The input of the network is the fixed-size feature map, and the output is the predicted category probability and coordinates. This network consists of two convolutional layers and two fully connected layers.
(5) The FCN aims to obtain the pixel-level segmentation results of metallographic images. Its input is also the fixed-size feature map, and the output is the pixel-level segmentation mask. This network consists of four 3 × 3 convolutional layers, one deconvolution layer and one 1 × 1 convolutional layer.
The overall loss function for training the Mask R-CNN can be denoted as follows:

L(θ) = L_RPN(θ) + λ_rcnn L_RCNN(θ) + λ_mask L_MASK(θ), (1)

where θ is the network parameter, which is very important for generating accurate instance segmentation results, and λ_rcnn and λ_mask are the loss-balancing parameters. The RPN loss function L_RPN is defined by

L_RPN(θ) = L_cls(c, c*; θ) + λ_loc1 L_loc(r, r*; θ), (2)

where c and c* are the predicted and ground-truth labels, respectively; r and r* are the predicted and ground-truth regression targets, respectively; λ_loc1 is the loss-balancing parameter; L_cls(θ) is the softmax loss and L_loc(θ) is the smooth L1 loss. The R-CNN loss function L_RCNN is defined by

L_RCNN(θ) = L_cls(c, c*; θ) + λ_loc2 L_loc(t, t*; θ), (3)

where t and t* are the predicted and ground-truth regression targets, respectively, and λ_loc2 is the loss-balancing parameter. The Mask loss function L_MASK is defined by

L_MASK(θ) = L_BCE(m, m*; θ), (4)

where m and m* are the predicted and ground-truth mask labels, respectively, and L_BCE is the standard binary cross-entropy loss. Using maximum likelihood estimation, we can compute the network model parameters θ by solving the following optimization problem:

θ* = argmin_θ L(θ). (5)

The stochastic gradient descent (SGD) algorithm is used to solve this optimization problem.
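The smooth L1 term L_loc used in the RPN and R-CNN regression losses can be sketched in NumPy. This is a minimal sketch; the unit threshold between the quadratic and linear regimes follows the common Fast R-CNN convention and is an assumption here.

```python
import numpy as np

def smooth_l1(t, t_star):
    """Smooth L1 (Huber-like) regression loss summed over coordinates:
    0.5 * d^2 where |d| < 1, and |d| - 0.5 otherwise, with d = t - t_star.
    Quadratic near zero for stable gradients, linear for large errors."""
    d = np.abs(np.asarray(t, float) - np.asarray(t_star, float))
    return float(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum())
```

The quadratic region keeps gradients small near the target, while the linear region prevents outlier boxes from dominating the loss.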

Instance Segmentation
Microstructure instance segmentation aims to associate each pixel with a physical instance of a microstructure. As shown in Figure 1, we input the given aluminum alloy metallographic images and use Mask R-CNN to achieve the microstructure instance segmentation. Suppose that we are given a training dataset D = {I_n, y_n; n = 1 : N}, where I_n is the n-th metallographic image and y_n is the n-th ground-truth label. For ease of description, we let f_θ represent the Mask R-CNN, where the parameter θ is obtained by solving Equation (5). The instance segmentation image y is then obtained by

y = f_θ(I), (6)

where I represents the given aluminum alloy metallographic image. To summarize, our proposed instance segmentation method is presented in pseudo-code form in Algorithm 1.
Algorithm 1: Microstructure instance segmentation.
Step 1: Set the hyperparameters and initialize the network:
• R-CNN and Mask: Pool size, Mask pool size, Train ROI per image, ROI positive ratio, Mask shape, Detection max instances, Detection min confidence, Detection threshold.
• Learning rate and momentum: Learning rate, Learning momentum, Weight decay.
• Initialize parameter θ obtained from pretraining on the MSCOCO dataset.
Step 2: Optimize θ by using D:
• While not converged do
• Compute network parameter θ* by solving optimization problem (5) using the SGD with momentum algorithm.

Loss Functions
As opposed to common natural scene images, in metallographic images such as the ones processed in this work, the microstructures occupy only a very small region relative to the background. This often causes the predictions to be strongly biased towards the background; as a result, microstructures are often missed or only partially segmented.
Many improved loss functions have been proposed to address the problem of data imbalance, so it is necessary to analyze their impact on metallographic image segmentation performance. However, there is still no research work in this area. To fill this gap, in this paper, we applied five different loss functions to metallographic image segmentation and compared their segmentation results.
The overall loss function for training the Mask R-CNN consists of three parts: the RPN loss, the R-CNN loss and the Mask loss, as defined by Equations (1)-(4). The RPN loss is used for the classification and regression of foreground and background. The R-CNN loss is used for the classification and regression of specific categories. The Mask loss is used for pixel-level segmentation. Obviously, only the Mask loss directly affects the accuracy of instance segmentation. Therefore, in this paper, we mainly focused on five different Mask loss functions: L_BCE, L_DICE, L_IOU, L_Tversky and L_SS. Let m_i and m*_i be the predicted and ground-truth binary labels of pixel i, respectively, and let N be the number of pixels. The detailed description is as follows:

(1) The loss function L_BCE is the standard binary cross-entropy loss [35], which is defined by

L_BCE = -(1/N) Σ_i [ m*_i log(m_i) + (1 - m*_i) log(1 - m_i) ].

(2) The Dice loss was proposed in reference [36] and is able to handle the case in which the foreground occupies only a very small region of the image. It is defined by

L_DICE = 1 - (2 Σ_i m_i m*_i + ξ) / (Σ_i m_i + Σ_i m*_i + ξ),

where ξ is the smoothing parameter.

(3) The IOU loss was proposed in [37] and is able to handle the case in which the two classes (foreground and background) are highly imbalanced. The function L_IOU is defined by

L_IOU = 1 - (Σ_i m_i m*_i + ε) / (Σ_i m_i + Σ_i m*_i - Σ_i m_i m*_i + ε),

where ε is the smoothing parameter.

(4) The Tversky loss was proposed in reference [38] to address the issue of data imbalance and achieve a better trade-off between precision and recall. The Tversky loss function is formulated as

L_Tversky = 1 - Σ_i m_i m*_i / (Σ_i m_i m*_i + α Σ_i m_i (1 - m*_i) + β Σ_i (1 - m_i) m*_i),

where α and β control the magnitude of the penalties for false positives and false negatives, respectively.

(5) In [39], Brosch et al. used a combination of sensitivity and specificity, which together can measure classification performance even for vastly unbalanced problems. This loss function L_SS is defined by

L_SS = γ Σ_i (m_i - m*_i)² m*_i / Σ_i m*_i + (1 - γ) Σ_i (m_i - m*_i)² (1 - m*_i) / Σ_i (1 - m*_i),

where γ is the sensitivity ratio used to assign different weights to the two terms.
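The five mask losses described above can be sketched directly in NumPy. This is a minimal sketch: the smoothing constants, the mean reduction in the cross-entropy, and the default values of α, β and γ are our assumptions, not values fixed by this paper.

```python
import numpy as np

def bce_loss(m, m_star, eps=1e-7):
    """Standard binary cross-entropy, averaged over pixels."""
    m = np.clip(m, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(m_star * np.log(m) + (1 - m_star) * np.log(1 - m)))

def dice_loss(m, m_star, xi=1.0):
    """Dice loss with smoothing parameter xi."""
    inter = np.sum(m * m_star)
    return float(1 - (2 * inter + xi) / (np.sum(m) + np.sum(m_star) + xi))

def iou_loss(m, m_star, eps=1.0):
    """IOU (Jaccard) loss with smoothing parameter eps."""
    inter = np.sum(m * m_star)
    union = np.sum(m) + np.sum(m_star) - inter
    return float(1 - (inter + eps) / (union + eps))

def tversky_loss(m, m_star, alpha=0.3, beta=0.7):
    """Tversky loss; alpha penalizes false positives, beta false negatives.
    Default alpha/beta values are illustrative."""
    inter = np.sum(m * m_star)
    fp = np.sum(m * (1 - m_star))
    fn = np.sum((1 - m) * m_star)
    return float(1 - inter / (inter + alpha * fp + beta * fn))

def ss_loss(m, m_star, gamma=0.05):
    """Sensitivity-specificity loss; gamma weights the sensitivity term.
    Default gamma is illustrative."""
    sq = (m - m_star) ** 2
    sens = np.sum(sq * m_star) / max(np.sum(m_star), 1e-7)
    spec = np.sum(sq * (1 - m_star)) / max(np.sum(1 - m_star), 1e-7)
    return float(gamma * sens + (1 - gamma) * spec)
```

All five reach their minimum at a perfect prediction; they differ in how they weight the scarce foreground pixels against the dominant background.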

Experimental Results
In order to verify the effectiveness of our proposed method, a large number of experiments were performed for the analysis of instance segmentation performance.

Experimental Setup
Datasets: In order to verify the proposed method, we built an experimental dataset containing 100 five-series aluminum alloy metallographic images. These images were taken by a metallographic microscope and include three different phases: Mg2Si, aluminum and the Fe-containing phase. Typical aluminum alloy metallographic images are shown in Figure 2. In this figure, the two images above are two typical aluminum alloy metallographic images, and the two images below are their ground truths. In the labelled images, we set aluminum as the background and labelled it in black, while the microstructures were set as the objects or foreground. It should be noted that in the semantic segmentation task, different instances of the same object are labelled with the same color. In contrast, in instance segmentation, different instances of the same object are labelled with different colors; for example, we labelled each microstructure instance with a different color, as shown in Figure 2.

Implementation details: In the experiments, we used cross validation to ensure the accuracy of the evaluation results. We divided dataset D into five mutually exclusive subsets of the same size, D = D1 ∪ D2 ∪ D3 ∪ D4 ∪ D5, with Di ∩ Dj = ∅ for i ≠ j. We picked four subsets as the training set and the remaining subset as the test set, and thus obtained five experimental results, as shown in the second to sixth columns of Tables 1-6. In the process of training Mask R-CNN, we used a fixed learning rate γ1 of 0.01 and a momentum ψ1 of 0.9 in the SGD with momentum algorithm, which not only improves training stability and accelerates the training process to a certain extent, but also provides a certain ability to escape local optima. In the FPN, the backbone is ResNet101, the pyramid size is 256 and the strides are 4, 8, 16, 32 and 64.
In the RPN, the anchor stride is 1, the non-max suppression threshold is 0.7, the number of training anchors per image is 256, the anchor ratios are 0.5, 1 and 2, and the anchor scales are 32, 64, 128, 256 and 512. In the R-CNN and FCN, the pool size is 7, the mask pool size is 14, the number of training ROIs per image is 200, the ROI positive ratio is 0.33, the mask shape is 28 × 28, the maximum number of detected instances is 100, the detection min confidence is 0.7 and the detection threshold is 0.3. The training process reaches convergence at epoch number κ1 = 100, with each epoch containing 100 steps. By using the MSCOCO pre-training model to initialize the network parameters, we made the model converge faster and obtained better results. Moreover, L2 regularization was used, with regularization coefficient λ = 0.0001.
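The anchor configuration above can be made concrete with a small sketch that enumerates the anchor shapes implied by the listed ratios and scales. This is a sketch only; with an FPN, each scale is typically assigned to one pyramid level rather than enumerated at every location.

```python
import numpy as np

def generate_anchors(scales, ratios):
    """Generate zero-centered anchor boxes (x1, y1, x2, y2), one per
    (scale, ratio) pair; ratio = height / width, area = scale**2."""
    boxes = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)  # width shrinks as ratio (h/w) grows
            h = s * np.sqrt(r)
            boxes.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(boxes)

# the paper's configuration: 5 scales x 3 ratios = 15 anchor shapes
anchors = generate_anchors([32, 64, 128, 256, 512], [0.5, 1, 2])
```

At training time these shapes are translated to every anchor-stride position of the feature map, and the 256 sampled anchors per image are matched against the ground truth.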

Evaluation Metrics
The aim of this section is to analyze the instance segmentation performance of the proposed method. Let D = {I_n, y_n} represent the test dataset and f_θ represent our proposed method. We used six different evaluation metrics to evaluate this method: Acc, Precision, Sn (Sensitivity, also named Recall), Sp (Specificity), IOU and F1. These metrics have been widely used in the literature and are defined as follows:

(1) The Acc is calculated by

Acc = (TP + TN) / (TP + TN + FP + FN),

where TN is the number of true negatives, TP is the number of true positives, FN is the number of false negatives and FP is the number of false positives.

(2) The Precision is defined by

Precision = TP / (TP + FP).

(3) The Sn is defined by

Sn = TP / (TP + FN).

(4) The Sp is defined by

Sp = TN / (TN + FP).

(5) The IOU is defined by

IOU = TP / (TP + FP + FN).

(6) The F1 is defined by

F1 = 2 × Precision × Sn / (Precision + Sn).
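The six metrics can be computed directly from the confusion counts of a binary mask; a minimal NumPy sketch (the function name and dictionary layout are our own):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Compute Acc, Precision, Sn, Sp, IOU and F1 from binary
    prediction and ground-truth masks (arrays of 0/1)."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sn = tp / (tp + fn)                      # sensitivity / recall
    sp = tn / (tn + fp)                      # specificity
    iou = tp / (tp + fp + fn)
    f1 = 2 * precision * sn / (precision + sn)
    return dict(Acc=acc, Precision=precision, Sn=sn, Sp=sp, IOU=iou, F1=f1)
```

Note that Acc and Sp both count the background's TN term, which is why they saturate near 1 on images dominated by background.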

Performance Comparison
(1) Quantitative performance comparison

In this experiment, we implemented five different mask loss functions on the basis of the proposed framework. For ease of description, we let f_i (i = 1 : 5) denote the i-th method. The first method f1 uses loss function L_BCE; similarly, f2 uses L_DICE, f3 uses L_IOU, f4 uses L_Tversky, and f5 uses L_SS. In addition, we show the quantitative comparison results of six different evaluation metrics: Acc, Precision, Sn, Sp, IOU and F1.
In Table 1, the comparison results of Acc are shown. The first column denotes the five different instance segmentation methods used in this experiment. The second to sixth columns show the results of the five different cross-validation experiments. The last column shows the median of the five experimental results. The first row denotes the experiment number. From these results, we can see that the Acc of all five methods exceeds 99%. Therefore, we can conclude that the five instance segmentation methods implemented in our proposed framework can obtain a good segmentation performance. However, we can clearly observe that the distribution of this metric is very dense, so it is difficult to use it to evaluate the performance of a metallographic image segmentation method. The main reason for this problem is that the proportion of target relative to background in a metallographic image is too small.

Similarly, the comparison results of Sp are shown in Table 2. From these results, we can see that the Sp of the five methods also exceeds 99%. As with the Acc metric, this metric is dominated by the background, so it cannot effectively evaluate the performance of a metallographic image segmentation method.

The comparison results of Precision are shown in Table 3. From these results, we can see that the Precision of the five methods exceeds 60%, so the five proposed instance segmentation methods can obtain satisfactory segmentation results. This metric avoids the interference caused by the large background, so it is more suitable for evaluating the performance of a metallographic image segmentation method. The comparison results of Sn, IOU and F1 are shown in Tables 4-6, respectively.
We can see that the Sn of the five methods reaches more than 62%, the IOU more than 53%, and the F1 more than 61%. These three metrics can also effectively evaluate the performance of the metallographic image segmentation method. These experimental results verify the effectiveness of the proposed framework.

To sum up, we can draw the following conclusions: (1) The Acc and Sp metrics cannot effectively evaluate the performance of a metallographic image segmentation method; instead, the four metrics Precision, Sn, IOU and F1 can. (2) The segmentation framework proposed in this paper can complete the task of instance segmentation, and the five methods based on this framework achieve a satisfactory segmentation performance.
In addition, we show the box plot of the experiment results obtained by the five methods for six evaluation metrics. As shown in Figures 3 and 4, the box plot is able to reflect the characteristics of the experimental data distribution. In the figures, from top to bottom, we can clearly observe the statistical characteristics of the experimental data, including upper extreme, upper quartile, median, lower quartile and lower extreme.
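The five statistics read off the box plots can be computed with NumPy. This is a sketch: here the whiskers are taken as the raw minimum and maximum of the data, whereas some box-plot conventions place the whiskers at 1.5 × IQR fences.

```python
import numpy as np

def box_plot_stats(values):
    """Five-number summary used for a box plot: lower extreme,
    lower quartile, median, upper quartile, upper extreme."""
    v = np.asarray(values, float)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    return v.min(), q1, med, q3, v.max()
```

Applied to the five cross-validation results of each method, this yields exactly the quantities discussed for Figures 3 and 4.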
In Figure 3a, we show the box plot of Acc metric. The X-axis represents accuracy (Acc), and the Y-axis represents five different methods. From this figure, we can observe the following: (1) All median values are located between 0.999436 and 0.999551 and the length of the interval is less than 0.000115. (2) The experimental results are mainly distributed between 0.999270 and 0.999599 and the length of the interval is less than 0.000163. Therefore, we can conclude that these five methods can obtain a good segmentation performance. However, the performance of the segmentation algorithm cannot be evaluated effectively because the distribution of the metric values is too centralized.
In Figure 3b, we show the box plot of the specificity metric. The X-axis represents specificity (Sp) and the Y-axis represents the five different methods. From the figure, we can observe the following: (1) All median values are concentrated between 0.999601 and 0.999732, and the length of the interval is less than 0.000131. (2) The experimental results are mainly distributed between 0.999494 and 0.999882, and the length of the interval is less than 0.000388. As with Acc, the distribution of the metric values is too centralized, so it is not suitable for evaluating the performance of an aluminum alloy metallographic image segmentation method.

In Figure 3c, we show the box plot of the precision metric. The X-axis represents precision, and the Y-axis represents the five different methods. From the figure, we can observe the following: (1) All median values are concentrated between 0.633345 and 0.676730, and the length of the interval is less than 0.043385. (2) The experimental results are mainly distributed between 0.590931 and 0.764655, and the length of the interval is less than 0.173724. Therefore, we can conclude that these five methods achieve a good segmentation performance and that this metric can be used to evaluate the performance of an aluminum alloy metallographic image segmentation method.

Figure 4a shows the box plot of the sensitivity metric. We can observe the following: (1) All median values are concentrated between 0.658808 and 0.690875, and the length of the interval is less than 0.032067. (2) The experimental results are mainly distributed between 0.624957 and 0.756356, and the length of the interval is less than 0.131399. Therefore, we can conclude that the proposed segmentation framework for aluminum alloy metallographic images has a satisfactory robustness.

In Figure 4b, we show the box plot of the IOU metric. From the figure, we can observe the following: (1) All median values are concentrated between 0.572862 and 0.590477, and the length of the interval is less than 0.017615.
(2) The experimental results are mainly distributed between 0.537472 and 0.653169, and the length of the interval is less than 0.115697. We find that these five methods obtain a good segmentation performance.

In Figure 4c, we show the box plot of the F1 metric. From this figure, we can observe the following: (1) All median values are concentrated between 0.648007 and 0.663737, and the length of the interval is less than 0.01573.

(2) The experimental results are mainly distributed between 0.611533 and 0.737506, and the length of the interval is less than 0.125973. We find that these five methods achieve a good segmentation performance; similarly, we can conclude that the proposed segmentation framework for aluminum alloy metallographic images has satisfactory robustness.
In summary, we can draw the following conclusions: (1) The proposed segmentation framework for aluminum alloy metallographic images can effectively achieve the task of instance segmentation, and the five methods based on this framework obtain satisfactory segmentation results. (2) The distributions of the Acc and specificity metric values are too centralized, so these metrics are not suitable for evaluating the performance of an aluminum alloy metallographic image segmentation method; instead, the four metrics Precision, Sn, IOU and F1 are suitable.
(2) Qualitative performance comparison

To further evaluate the proposed instance segmentation framework for metallographic images, Figure 5 shows the qualitative comparison among the five different methods implemented in this framework: the binary cross-entropy loss (f1), Dice loss (f2), IOU loss (f3), Tversky loss (f4) and SS loss (f5). Three original metallographic images are shown in Figure 5a and their corresponding ground-truth results in Figure 5b. In the ground truth, aluminum is represented in black, and the microstructure instances are represented in different colors. The instance segmentation results of the five methods are shown in Figure 5c-g. From these results, we can observe that all five methods can accomplish the task of instance segmentation of the microstructures in a metallographic image. For convenient observation, Figure 6 shows locally enlarged maps of the instance segmentation results obtained by the five methods f1-f5. The original locally enlarged map is shown in Figure 6a and the corresponding ground truth in Figure 6b.
In these locally enlarged maps, we can see two instances of the Fe-containing phase. As opposed to semantic segmentation, the instance segmentation method proposed in this paper can not only segment different microstructures, but also separate different instances of the same microstructure. As shown in Figure 6, we segmented different instances of the Fe-containing phase and marked them with different colors. From these experimental results, we can conclude that the instance segmentation methods proposed in this paper achieve a satisfactory segmentation performance. The proposed instance segmentation framework can automatically learn representations for different metal alloys from the given samples, so we do not need to design features manually for different segmentation objects. Therefore, this instance segmentation framework can also be used for other types of metal alloys. In addition, for the proposed framework, the discriminability of the object instances has an important impact on the segmentation performance: when the phases of a given metal alloy are more discriminable, this framework can obtain better instance segmentation results.

Convergence Analysis
The aim of this experiment was to analyze the convergence of the proposed framework. The loss curves over time for the five different methods are shown in Figure 7. From Figure 7, we can observe that all five methods converge and that method f5 has a faster convergence speed than the others. This verifies the convergence of the proposed instance segmentation framework.

Conclusions
A new metallographic image segmentation framework is proposed, which can achieve the task of automatic instance segmentation for aluminum alloy metallographic images. This segmentation framework provides a powerful tool for the quantitative analysis of metallographic images. Based on this framework, we implemented five different instance segmentation methods by using different loss functions. A large number of experimental results verified the effectiveness of the proposed method. In order to evaluate the performance of metallographic image segmentation methods effectively, we compared and analyzed six typical segmentation performance evaluation metrics, including Accuracy, Precision, Sensitivity, Specificity, IOU and F1 measure. From the experimental results, we found that Precision, Sensitivity, IOU and F1 measure can effectively evaluate the performance of metallographic image segmentation methods. In the future, we plan to investigate the use of weakly supervised learning methods to deal with the insufficiency of high-quality hand-labeled data samples.