Article

A Deep Learning Approach to Semantic Segmentation of Steel Microstructures

by Jorge Muñoz-Rodenas 1, Francisco García-Sevilla 1,2,*, Valentín Miguel-Eguía 1,2,*, Juana Coello-Sobrino 1,2 and Alberto Martínez-Martínez 2

1 High Technical School of Industrial Engineering of Albacete, Castilla-La Mancha University, 02006 Albacete, Spain
2 Materials Science and Engineering Laboratory, Regional Development Institute, Castilla-La Mancha University, 02006 Albacete, Spain
* Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(6), 2297; https://doi.org/10.3390/app14062297
Submission received: 20 February 2024 / Revised: 5 March 2024 / Accepted: 6 March 2024 / Published: 8 March 2024
(This article belongs to the Special Issue Novel Applications of Machine Learning and Bayesian Optimization)

Featured Application

A segmentation tool for microconstituent recognition in steel optical micrographs.

Abstract

The utilization of convolutional neural networks (CNNs) for semantic segmentation has proven to be successful in various applications, such as autonomous vehicle environment analysis, medical imaging, and satellite imagery. In this study, we investigate the application of different segmentation networks, including Deeplabv3+, U-Net, and SegNet, each recognized for their effectiveness in semantic segmentation tasks. Additionally, in the case of Deeplabv3+, we leverage the use of pre-trained ResNet50, ResNet18 and MobileNetv2 as feature extractors for a comprehensive analysis of steel microstructures. Our specific focus is on distinguishing pearlite and ferrite phases in micrographs of low-carbon steel specimens subjected to annealing heat treatment. The micrographs obtained using an optical microscope are manually segmented. Preprocessing techniques are then applied to create a dataset for building a supervised learning model. In the results section, we discuss in detail the performance of the obtained models and the metrics used. The models achieve a remarkable 95% to 98% accuracy in correctly labeling pixels for each phase. This underscores the effectiveness of our approach in differentiating pearlite and ferrite phases within steel microstructures.

1. Introduction

The recognition of the microconstituents of a steel micrograph is a complicated task, within the reach only of highly qualified personnel with broad experience in the field of materials science. Manual identification of steel phases can be tedious and error-prone; therefore, machine learning (ML) models have emerged as valuable complements to the traditional visual inspection methods employed by metallurgists. In recent years, many studies have addressed the challenge of developing artificial intelligence techniques that enable computers to handle complex tasks such as microstructure identification [1,2,3], and the inference of properties from these identification techniques using ML has also been investigated [4,5,6,7], yielding promising advancements. Nevertheless, given the complexity involved in microstructure identification, particularly within steel micrographs, the adoption of advanced techniques becomes necessary. Effective image segmentation in the context of steel microstructures requires a powerful tool such as a deep neural network.
In previous work [8], it was determined that, for the categorization of steel microstructures, convolutional neural networks exhibit a notable superiority over classical machine learning algorithms. The present study continues the exploration of deep learning techniques within the domain of optical micrographs of carbon steels, with a focus on segmentation algorithms. These networks allow each pixel to be labeled according to the microconstituent phase to which it belongs by means of supervised learning methods.
Recent advances in steel microstructure identification using segmentation techniques are discussed below to provide background and context for the present article.
Luengo et al. [9] present a comprehensive overview of AI techniques for metallographic image segmentation, utilizing two distinct datasets: The Ultra-High Carbon Steel Micrograph Database (UHCSM) and the Metallography Dataset from Additive Manufacturing of Steels (MetalDAM). The paper contributes significantly by introducing the novel dataset, MetalDAM, available at https://dasci.es/transferencia/open-data/metal-dam/, accessed on 23 June 2023, providing an updated taxonomy of segmentation methods and exploring various deep learning-based ensemble strategies. Ensemble models exhibit superior performance in segmentation, achieving an Intersection over Union (IoU) metric of 76.71 for the UHCS dataset and 67.77 for the MetalDAM dataset. However, the performance achieved in both datasets is low. The authors conclude that microstructure segmentation faces limitations due to the insufficient availability of large datasets, the absence of pre-trained models tailored to this domain, and the notable challenges related to generalization errors in machine learning methods.
Bulgarevich et al. [10] address the challenge of segmenting optical images of microstructures using a supervised machine learning approach. They employ the Random Forest (RF) algorithm along with image processing and segmentation protocols, including Euclidean distance conversion and structure tensor extraction, for accurate image analysis. This research recognizes the RF algorithm as a highly versatile method for segmenting various microstructures, such as ferrite, pearlite, bainite, martensite, and martensite–austenite, within steel microstructures. The results demonstrate that the segmentation quality achieved is practical and allows meaningful statistics on the volume fraction of each phase to be obtained.
Bachmann et al. [11] present an exhaustive approach for detecting prior austenite grains (PAGs) in Nital-etched micrographs of bainitic and martensitic steels. The study utilizes a correlative microscopy technique, combining a light optical microscope (LOM), a scanning electron microscope (SEM), and electron backscatter diffraction (EBSD). The detection of PAGs is accomplished through semantic segmentation using advanced deep learning (DL) methods, specifically U-NET in conjunction with DenseNet, applied to LOM images.
To ensure effective model evaluation, the authors emphasize the critical importance of accurately measuring grain sizes in the metallurgical structure of the material. Their experiments reveal an IoU of around 70%, indicating potential discrepancies between metric values and visual perception of model quality. Recognizing the limitations of traditional metrics like IoU and pixel accuracy, particularly in the context of grain size measurement within segmentation tasks, they propose a novel approach. To address this, they introduce a method for quantifying grain size distribution from segmentation maps, calculating the mean, median, and standard deviation. By binning detected grains into intervals of a specific width (500 µm2) and calculating probability density, they accurately assess segmentation quality compared with values of the ground truth and identify potential errors in grain size determination. The results show a mean error of 6.1% in average grain size, underscoring the high quality of the DL model.
Han et al. [12] introduced a segmentation method (CES) based on the extraction of center–environment features tailored for small material image samples. The proposed method is applied to several datasets that include carbon steels, titanium alloy, wood, and cross-sectional morphology of Pt-Al and WC-Co coating image data. Expert annotators are engaged in the process, drawing region-specific curves based on their domain knowledge. Additionally, the method takes advantage of several machine learning algorithms to achieve highly accurate segmentation. Notably, the results of the study indicate that the Gradient Boosting Decision Tree (GBDT) outperforms other methods in this context.
Additionally, a comparison is made with segmentation methods based on deep learning networks such as SegNet, PSPNet, and UNet++, which are found to be 10% higher in IoU and mean IoU metrics compared to the methodology used by the authors. This difference is attributed to the significantly fewer pixels annotated to create the masks using CES compared to deep learning methods. While the proposed method is commendable for its innovative approach and reduced annotation cost, it falls short in achieving comparable segmentation accuracy to deep learning algorithms. The observed 10% disparity in results highlights the limitations of this method, suggesting that a balance between annotation efficiency and segmentation performance has yet to be fully realized.
Kim et al. [13] demonstrated the segmentation of a low-carbon steel microstructure without the need for labeled images, employing a deep learning approach: specifically, a convolutional neural network combined with the Simple Linear Iterative Clustering (SLIC) superpixel algorithm. By leveraging a diverse range of microstructure optical images containing ferrite, pearlite, bainite, and martensite, the model effectively distinguished and delineated regions corresponding to each constituent phase.
Breumier et al. [14] trained a U-Net model to perform the segmentation of bainite, ferrite and martensite on EBSD maps using the kernel average misorientation and the pattern quality index as input. The model can differentiate the three constituents with a 92% mean accuracy in the test results.
Chaurasia et al. [15] proposed a versatile approach for classifying multiphase steels. It involves generating 3D polycrystalline microstructure templates using the Johnson–Mehl–Avrami–Kolmogorov (JMAK) kinetic model, creating realistic single-phase microstructures through nucleation and growth concepts. Cropped images of pearlite and ferrite are strategically placed on these templates to synthesize accurately labeled ferrite–pearlite microstructures. Subsequently, a deep learning architecture, UNET, is trained using synthetic microstructures and tested on real microstructures. The results, compared with manually annotated microstructures, demonstrate a prominent level of agreement, reaching an accuracy of about 98%.
Liu et al. [16] conducted a study that focuses on recognizing the microconstituents of ferrite and pearlite and making predictions of their mechanical properties. For this purpose, they elaborate a residual U-shaped network based on ResNet32 to identify grain boundaries and their size, obtaining better segmentation results than the conventional neural network FCN-8s, reaching over 93% in frequency weighted intersection over union (FWIoU).
Azimi et al. [17] utilized fully convolutional networks (FCNs) along with a max-voting scheme for the classification of martensite, bainite, pearlite, and ferrite phases in low-carbon steels, achieving a classification accuracy of 93.94%.
Recently, works similar to the research in this paper have been published. Ostormujof et al. [18] accomplished the successful classification of ferrite–martensite dual-phase steel microstructures through the implementation of the U-Net model and achieved pixel-wise accuracies of around 98%, while Xie et al. [19] provided a comparison of different segmentation architectures for steel micrographs, such as DeepLabv3+, Enet, Unet, and PSPnet. They propose a new semantic network based on the improvement of a fully convolutional network (FCN) with the atrous spatial pyramid pooling (ASPP) technique for feature extraction, surpassing the previous ones according to the Intersection over Union (IoU) metric, achieving a performance of up to 80.43%. In our specific study, we employed LOM images as opposed to the SEM images used in the referenced article. This choice might introduce differences in the characteristics and features of the micrographs, potentially impacting the performance of segmentation algorithms. It is worth noting that the selection of imaging modalities can influence the choice of segmentation techniques and their effectiveness in each context. Ma et al. [20] conducted training on two datasets comprising images of steel alloys, one consisting of carbide and the other predominantly of ferrite microconstituents. They employed PSPNet and DeepLabv3+ with ResNet18 segmentation networks. The authors proposed enhancing the receptive field of the convolutional neural network (CNN) to improve the contextual perception of images without altering the network architecture. This was achieved by scaling the original image size to 0.5 times during image loading. Additionally, the authors established an automated quantitative analysis of the microstructures using OpenCV software after segmentation, extracting morphological information from classified pixels to obtain the average carbide radius and the number of carbides. The results, evaluated on original large-size images, yielded a mean Intersection over Union (mIoU) score of approximately 80%.
Additionally, Bihani et al. [21] present, in this case in the context of mudrock SEM images, a method for filtering and segmentation using deep learning to identify pore and grain features named MudrockNet, which is based on DeepLab-v3+. The predictions for the test data obtain a mean IoU of 0.6663 for silt grains, 0.7797 for clay grains, and 0.6751 for pores.
Automated phase identification in steel microstructures is a rapidly evolving field. While previous studies have addressed segmentation challenges with varying degrees of success, several issues remain unresolved, including the application of segmentation to low-magnification optical images and the scarcity of dedicated steel microstructure image databases. To address these shortcomings, this research delves into the exploration of optimal architectures for this problem, specifically targeting the development of a robust segmentation model capable of automatically identifying pearlite and ferrite phases in annealed steel microstructures, which have a major influence on the properties and behavior of annealed steels.
It can be concluded that numerous studies have explored the segmentation of steel microstructures, generating segmentation models created from ad hoc networks with varying degrees of success. Nevertheless, most experiments are conducted using data obtained from scanning electron microscopy (SEM) images, rendering them unsuitable for samples produced with optical technology. This work aims to delve deeper into obtaining segmentation models for the identification of pearlite and ferrite in images coming from optical microscopy. The Deeplabv3+ and U-Net architectures will be employed for the segmentation of LOM steel microstructure images. Leveraging convolutional neural networks, these architectures have demonstrated effectiveness in image segmentation across various domains.
The methodology employed in this study integrates ImageJ with trainable Weka segmentation, Random Forest classifier training, and data augmentation to prepare a diverse dataset for the subsequent creation and training of U-Net, SegNet and DeepLabV3+ segmentation models for steel micrograph analysis. In the following sections, we will delve into the methodology and analyze the results and discussions.

2. Materials and Methods

2.1. Steel Specimens and LOM Images

The experimental procedures involved the utilization of three steel samples that underwent annealing treatment to produce ferrite and pearlite microstructures, with their respective chemical compositions detailed in Table 1. Metallographic samples were prepared by grinding and polishing according to the typical procedure used for optical microscopy and were etched with Nital-1 (alcoholic nitric acid at 1%) for 30 s, allowing the grain boundaries and microstructures to be distinguished.
For the development of segmentation models, a dataset comprising 34 steel micrographs, each with a resolution of 2080 × 1542 pixels, was compiled. The selection of these images aimed to provide a comprehensive representation of the diverse microstructural features inherent in various steel samples.
As seen in Figure 1a,b, once the steel undergoes an annealing heat treatment, a crystalline structure is obtained, revealing two distinctive zones. One zone is characterized by ferrite, appearing as a whitish matrix, while the other zone appears darker with a lamellar constituent, indicating the presence of pearlite. The normalizing heat treatment results in a similar microstructure, albeit with finer constituents, as shown in Figure 1c. Figure 2 provides a detailed depiction of these constituents. As observed in Figure 2b, the pearlite consists of alternating fine bands of ferrite and cementite, maintaining a dark aspect, as mentioned earlier.

2.2. Image Preprocessing

In the preprocessing stage of the segmentation deep learning experiment carried out in this work, a comprehensive approach was implemented to enhance the quality and diversity of the dataset. This involved the initial creation of masks using specialized software, followed by a thorough data augmentation process. ImageJ, with its trainable Weka segmentation plugin [22,23], was used to create masks outlining specific regions of interest within the steel microstructure images. Manual annotations made by the authors guided the algorithm in learning the features necessary for accurate segmentation. The pearlite areas were manually annotated on two of the original images for each sample. Subsequently, the trainable Weka segmentation option was applied to the remaining images to automate mask generation, since manual mask generation is a time-consuming process and prone to errors. Thus, by using the ImageJ segmentation assistant, the quality of the masks was improved, and the processing time was reduced. Nevertheless, the authors reviewed each generated mask and corrected any images containing errors.
The trainable classifier employed for mask creation was based on the Random Forest algorithm. Configured with 200 decision trees, this algorithm demonstrated robustness in handling the complexity of steel micrographs. The training process involved feeding the algorithm the manually annotated masks, allowing it to learn and generalize patterns within the dataset. Following the initial mask creation and classifier training, a data augmentation step was introduced to enhance the dataset’s diversity. This involved applying transformations such as rotation, scaling, and flipping to the 34 original steel micrographs. The augmented dataset served to increase the model’s ability to generalize across a broader range of microstructural variations.
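The Weka classifier itself is operated through the ImageJ interface; purely as an illustrative MATLAB analogue of the same idea (not the workflow actually used here), a pixel classifier with 200 bagged trees trained on simple per-pixel features could be sketched as follows, with file names being hypothetical:

% Illustrative analogue of the Random Forest pixel classifier (200 trees);
% the actual masks were produced with the trainable Weka plugin in ImageJ.
I    = im2double(im2gray(imread("annotated_micrograph.png")));  % hypothetical file
mask = imread("manual_pearlite_mask.png") > 0;                  % manual annotation
G    = imgaussfilt(I, 2);              % smoothed-intensity feature
Gmag = imgradient(I);                  % gradient-magnitude feature
X    = [I(:), G(:), Gmag(:)];          % one feature row per pixel
y    = categorical(mask(:));           % pearlite vs. background labels
rf   = TreeBagger(200, X, y, "Method", "classification");
% predict(rf, Xnew) on the features of an unlabeled image, then reshape the
% predicted labels back to the image size to obtain an automatic mask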
Each original image captured by the optical microscope has a resolution of 2080 × 1542 pixels. For the execution of the experiments, images of 224 × 224 pixels were chosen. This choice is based on practical and efficiency considerations: smaller images demand fewer computational resources for both training and inference, so 224 × 224 images enable the model to execute more rapidly. Furthermore, the pretrained models used for transfer learning in the experiments, such as ResNet50, ResNet18, and MobileNetV2, are typically trained on massive datasets with specific input sizes. Employing the same image size during both training and inference eases the transfer of knowledge from pretrained models, as their initial layers are tailored to that size. It is important to note that, although 224 × 224 pixels is a commonly used size, it is not a strict constraint. The image size can be adjusted to conduct experiments with a different set of images, but other model parameters might need to be adjusted and, in some cases, the model might need to be retrained to accommodate the new input size.
For data augmentation, each original image and mask were cropped into 54 images of size 224 × 224 pixels. Subsequently, rotations of 90°, 180°, and 270° were applied to the cropped images, resulting in 216 images for each original image. This process yielded a final dataset of 7344 images. These images were distributed randomly, with 70% allocated for training data, 20% for validation data, and 10% for the test data.
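A minimal sketch of this cropping and rotation step in MATLAB is shown below; the folder names are illustrative, and a 9 × 6 grid of 224 × 224 patches is assumed to obtain the 54 crops per 2080 × 1542 image:

% Crop each micrograph (and, analogously, its mask) into 54 patches of
% 224 x 224 pixels and store the 0/90/180/270-degree rotations of each patch.
srcDir = "micrographs"; dstDir = "patches";        % hypothetical folders
files  = dir(fullfile(srcDir, "*.png"));
for k = 1:numel(files)
    I = imread(fullfile(srcDir, files(k).name));
    [~, base] = fileparts(files(k).name);
    idx = 0;
    for r = 0:5                                    % 6 rows of patches
        for c = 0:8                                % 9 columns of patches
            patch = imcrop(I, [c*224 + 1, r*224 + 1, 223, 223]);
            for ang = [0 90 180 270]
                idx = idx + 1;
                imwrite(imrotate(patch, ang), ...
                    fullfile(dstDir, sprintf("%s_%03d.png", base, idx)));
            end
        end
    end
end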
Taking into consideration the information provided above, an example of the cropped and rotated images can be seen in Figure 3. The masking process used to isolate the ferrite areas contained in the images can also be observed.
After the preprocessing stage was completed, the model creation phase was initiated. This involved training various segmentation models to identify important features in the preprocessed dataset. Using the enriched dataset, different model configurations and methods were tested to determine which approach worked best for accurately delineating the steel microstructures. The training process details and metrics are described in the following section.

2.3. Segmentation Model Training

In executing the experiments, various segmentation networks were employed to establish a comparative analysis and identify the most suitable one for the context of microstructures in steels subjected to an annealing heat treatment. The segmentation networks utilized include U-Net [24], SegNet [25], and DeepLabV3+ [26]. Diverse pre-trained backbones, such as ResNet18, ResNet50, and MobileNetV2, were employed for the latter.
The same procedure was applied to all networks. Initially, each model is trained using the images selected for training and validation. Once the model is generated, it is applied to the test images, and various metrics [27] are obtained to facilitate result analysis. In Appendix A, comprehensive details regarding each layer within the architectures of the segmentation networks used are presented in tabular form. The networks employed in the experiments are described next.
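Before describing the individual networks, the common data pipeline can be sketched in MATLAB as follows; the folder names and pixel label values are assumptions:

% Shared data pipeline for all segmentation networks (sketch).
classNames = ["ferrite" "pearlite"];
labelIDs   = [0 1];                               % assumed pixel values in the masks
imdsTrain  = imageDatastore("train/images");
pxdsTrain  = pixelLabelDatastore("train/labels", classNames, labelIDs);
dsTrain    = combine(imdsTrain, pxdsTrain);       % pairs each patch with its mask
% the validation (dsVal) and test datastores are built in the same way; after
% training, each model is applied to the test images and the metrics of
% Section 2.4 are computed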

2.3.1. U-Net

U-Net is commonly used in the context of semantic image segmentation, and its effectiveness in capturing both global context and fine details makes it particularly well-suited for tasks such as medical image segmentation and satellite image analysis, and it is also employed for the segmentation of materials microstructures [28,29]. U-Net is characterized by a U-shaped architecture with an encoder–decoder structure and skip connections. The encoder, on the left side of the U, consists of down-sampling layers that capture hierarchical features from the input image. The decoder, on the right side, involves up-sampling layers and skip connections that preserve high-resolution details and aid in precise localization. Skip connections connect corresponding encoder and decoder stages, facilitating the retention of spatial information. The bottleneck at the base of the U combines abstract features from the encoder with detailed spatial information from the decoder.
In the experiments conducted with U-Net, the bias terms of all convolutional layers are initialized to zero. Additionally, the convolution layer weights in the encoder and decoder subnetworks are initialized using the ‘He’ weight initialization method [30]. The encoder–decoder has a depth of 3, resulting in a U-Net comprising 46 layers with 48 connections. The most relevant hyperparameters configured for training include the Adam optimizer, a learning rate of 0.001, L2 regularization, and a maximum number of epochs set to 2. Experiments were conducted with a larger number of epochs, yet substantial improvements were not achieved; instead, computational time increased. The loss layer uses cross-entropy loss to quantify the disparity between the predicted values and the corresponding ground truth, as expressed in Equation (1).
$$\mathrm{loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K} w_{i}\, t_{ni} \ln y_{ni} \tag{1}$$
Here, $N$ represents the number of samples, $K$ is the number of classes, $w_i$ denotes the weight for class $i$, $t_{ni}$ indicates whether the $n$th sample belongs to the $i$th class, and $y_{ni}$ represents the output of the network for sample $n$ and class $i$.
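A minimal sketch of how a U-Net with this configuration can be assembled in MATLAB is given below; the call shown is a generic Computer Vision Toolbox constructor, not the authors' exact script, and the resulting layers correspond to Table A1:

% U-Net with encoder depth 3 for 224 x 224 RGB patches and two classes.
imageSize  = [224 224 3];
numClasses = 2;                                   % ferrite and pearlite
lgraph = unetLayers(imageSize, numClasses, "EncoderDepth", 3);
% the final pixel classification layer applies the cross-entropy loss of Equation (1)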

2.3.2. SegNet

SegNet [31] is a convolutional neural network architecture tailored for semantic image segmentation. Its distinctive features include a conventional encoder–decoder structure, where the encoder captures hierarchical features, and the decoder reconstructs the segmented output through up-sampling layers. Notably, SegNet utilizes max-pooling indices from the encoder during decoding to recover spatial information lost during down-sampling, contributing to accurate segmentation. The network leverages feature maps from the encoder for precise localization. Employing a class-specific softmax activation in the final layer enables pixel-wise classification. Although SegNet lacks skip connections between the encoder and decoder, its design, particularly the incorporation of pooling indices, makes it well-suited for tasks demanding detailed pixel-wise segmentation.
In this study, the segmentation experiments have utilized the SegNet architecture in conjunction with VGG16 [32,33]. In this context, VGG16 plays a role as a feature extractor, capturing high-level semantic information from the input images. It complements the segmentation capabilities of SegNet, contributing to an enhanced overall performance of the segmentation model.
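Under the same assumptions, the SegNet model with a VGG16 encoder can be instantiated as in the sketch below (the VGG-16 weights are supplied by the corresponding support package); its layers are listed in Table A2:

% SegNet whose encoder is initialized with VGG16 weights (sketch).
lgraphSegNet = segnetLayers([224 224 3], 2, "vgg16");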

2.3.3. DeepLabV3+

The segmentation models were built by integrating the DeepLabV3+ architecture with various pre-trained backbones, including ResNet50, ResNet18 [34], and MobileNetV2 [35]. This diverse combination harnesses the strengths of DeepLabV3+ for pixel-wise segmentation and different backbone architectures for feature extraction. The models were trained using an augmented dataset, integrating insights obtained from the Random Forest classifier.
In Figure 4, a schematic representation of the DeepLabV3+ architecture is shown. The model employs a pretrained backbone (ResNet50, ResNet18 and MobileNetv2) for feature extraction. The Atrous Spatial Pyramid Pooling (ASPP) module is employed to capture multi-scale contextual information. The subsequent decoder, featuring skip connections, refines and up-samples the features to produce a high-resolution semantic segmentation map. This architecture provides detailed pixel-wise predictions for accurate object recognition in images.
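A sketch of how the three DeepLabv3+ variants compared in this work can be created with the generic toolbox constructor is shown below; the corresponding layer listings are given in Tables A3 to A5:

% DeepLabv3+ decoder paired with three pretrained feature-extraction backbones.
lgraphR50 = deeplabv3plusLayers([224 224 3], 2, "resnet50");
lgraphR18 = deeplabv3plusLayers([224 224 3], 2, "resnet18");
lgraphMN2 = deeplabv3plusLayers([224 224 3], 2, "mobilenetv2");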

2.4. Training Parameters, Metrics and Other Details

The training process involved optimizing various parameters, including learning rates, batch sizes, and epochs. A validation set was used to monitor the model’s performance and prevent overfitting.
When conducting the experiments, identical training parameters were chosen to ensure a more faithful comparison of results. The Adam optimizer with a learning rate of 0.001 and a maximum number of epochs set to 3 were selected. Additionally, the ‘Validation Patience’ parameter was set to 4 to avoid unnecessary computation. All the aforementioned information is summarized in Table 2, which also compiles essential data regarding the computational time of each network.
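A sketch of this shared training configuration, using the datastores defined above (dsTrain and dsVal are assumed names), is given below; the MiniBatchSize is lowered to 16 for SegNet, as discussed in Section 3:

% Common training options applied to every network in the comparison (sketch).
opts = trainingOptions("adam", ...
    "InitialLearnRate",    1e-3, ...
    "MaxEpochs",           3, ...
    "MiniBatchSize",       32, ...          % 16 for SegNet
    "ValidationData",      dsVal, ...
    "ValidationPatience",  4, ...
    "Shuffle",             "every-epoch", ...
    "ExecutionEnvironment","gpu");
net = trainNetwork(dsTrain, lgraph, opts);  % lgraph = any of the architectures above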
To evaluate the performance of the segmentation models, various metrics were employed. Accuracy measures the proportion of correctly classified pixels to the total number of pixels in each class, as defined by the ground truth, and its score is calculated using Equation (2), where TP represents true positives, and FN represents false negatives. Mean Accuracy, computed as the average Accuracy of all classes across all images, provides an aggregate assessment of model performance. Global Accuracy, on the other hand, considers the ratio of correctly classified pixels, irrespective of class, to the total number of pixels.
$$\mathrm{Accuracy\ score} = \frac{TP}{TP + FN} \tag{2}$$
Additionally, the Boundary F1 (BF) score, known as the BF Score, evaluates the alignment between predicted boundaries and true boundaries. Calculated using Equation (3), precision assesses the accuracy of the predicted boundaries, while recall gauges the model’s ability to capture true boundaries. A higher BF score indicates better agreement between predicted and true boundaries. The Mean BF Score offers an aggregate measure of boundary prediction performance across all classes and images.
$$\mathrm{BF\ score} = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{3}$$
Furthermore, the Intersection over Union (IoU) score assesses the ratio of correctly classified pixels to the total number of ground truth and predicted pixels in each class. The IoU score is computed using Equation (4), where TP represents true positives, FP represents false positives, and FN represents false negatives. The Mean IoU provides an average IoU score across all classes and images, offering insights into the overall segmentation accuracy of the model.
$$\mathrm{IoU\ score} = \frac{TP}{TP + FP + FN} \tag{4}$$
The trained segmentation models were evaluated on a separate test set of steel micrograph images not seen during training. The metrics used for the evaluation of the models have been previously specified.
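These metrics can be obtained directly from the confusion matrix of the test predictions; a sketch of the corresponding evaluation step in MATLAB is shown below, where imdsTest and pxdsTest are the assumed test datastores:

% Segment the unseen test images and compute the metrics of Equations (2)-(4).
pxdsPred = semanticseg(imdsTest, net, "WriteLocation", tempdir);
metrics  = evaluateSemanticSegmentation(pxdsPred, pxdsTest);
metrics.DataSetMetrics   % GlobalAccuracy, MeanAccuracy, MeanIoU, WeightedIoU, MeanBFScore
metrics.ClassMetrics     % per-class Accuracy, IoU and MeanBFScore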
All experiments were conducted on a robust computing system equipped with an Intel(R) Core(TM) i7-5930K CPU @ 3.50 GHz, 64 GB of DIMM RAM, and an NVIDIA® GEFORCE RTX 3080 (10 GB) GPU. MATLAB® (R2023b, The MathWorks, Inc., Natick, MA, USA) was utilized for coding and generating the segmentation models, and ImageJ was employed for mask creation, guaranteeing a stable and reproducible computational environment. All code developed for this research is available upon request.

3. Results

Five different models were trained using 5141 training images and 1469 validation images. The training results are presented in Table 3, and the training progress can be observed in Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 for each of the models. The progression of both accuracy and the loss function is depicted.
Each model was then applied to 734 test images that it had not seen previously. The results of the test experiments are included in Table 4, which displays the usual metrics for segmentation problems. Additionally, the confusion matrices are shown in Figure 10. It can be inferred that the DeepLabv3+ model with MobileNetv2 achieves the best performance, although it only slightly surpasses the other networks, which also solve the segmentation problem accurately.
To visually explore the results, two test images were utilized, and each was processed by every trained model. These images are depicted in Figure 11. The segmentation performed by each model can be observed for comparison with the original sample, as well as with the mask or ground truth generated during data preprocessing before training. The objective is to distinguish between the two microconstituents: ferrite, the matrix element represented by the lighter zone in the micrograph, and pearlite, composed of alternating layers of cementite and ferrite. It is crucial to emphasize that the ferrite constituting the pearlite should not be segmented together with the ferrite forming the matrix of the microstructure.
In the training phase, it can be observed that the SegNet model requires more iterations and, consequently, more computational time to achieve maximum accuracy, as depicted in Figure 6, needing more than twice as many iterations as the other networks. However, its final training accuracy does not differ significantly from the rest, trailing by only a couple of percentage points compared to DeepLabv3+, which yields the best results. This increased number of iterations is due to the reduction in MiniBatchSize to 16 samples for SegNet, compared to the MiniBatchSize of 32 samples used for the other networks. Notably, when employing a MiniBatchSize of 32 samples, the performance of SegNet decreases to approximately 91% to 93%, emphasizing the need to reduce the MiniBatchSize to 16 for optimal performance. Despite the longer training time associated with the reduced MiniBatchSize, SegNet’s final accuracy remains competitive, showcasing its ability to achieve high performance even with a smaller batch size. As shown in Figure 7, Figure 8 and Figure 9, achieving maximum accuracy during training requires only a few iterations for the DeepLabv3+ segmentation networks. The encoder that leads to the shortest training time is ResNet18, which has the fewest layers among the three. However, MobileNetV2 exhibits slightly superior results to the other networks, achieving excellent scores in all metrics, as indicated in Table 4.
During the training process of the segmentation model, anomalies or irregularities that might occur in individual images are likely to diminish or be addressed as the model learns from a diverse set of images. The learning process, driven by probabilities, helps the model to generalize and effectively segment objects or regions of interest in images, even in cases where there might be variations or anomalies in the data. In this case, the model might not learn extensively about these imperfections due to their limited occurrence in the training data.

4. Discussion

Different random test samples were selected for segmentation using the obtained models. The accuracy and loss values in Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 are obtained during training. The overall values, as shown in Table 4, are calculated based on test images that the model had not previously seen. These test values closely resemble those observed during training, indicating that no “overfitting” has occurred in any of the models.
Algorithms with lower loss rates and higher accuracy during training may demonstrate superior generalization performance on unseen data, resulting in higher final accuracy.
In Figure 11 (segmented samples), a comparison of 224 × 224 images of annealed steel is presented, highlighting the region considered as pearlite in green hues and the matrix or ferrite, which appears light in the original image and violet in the segmented image. The grayscale image corresponds to the mask generated during data preprocessing. Although the results are very similar, subtle differences can be perceived. It is important to note that some masks were created manually, while the rest underwent preprocessing using a Random Forest algorithm with the WEKA software (ImageJ2-Fiji, GPLv3+, University of Waikato, New Zealand). This process may have introduced errors in pixel annotation in some masks, causing the model to learn from imperfect images. As shown for microstructure A in Figure 12a, there is an error in the bottom-right part of the mask (slightly pointed area), where the ferrite zone connecting with the one at the top right has not been completely captured. This flaw is highlighted in red in Figure 12b. This error has also been transferred to the training models, which consequently failed to detect the ferrite in that zone. However, a slight improvement in the segmented area compared to the mask is noticeable. Similarly, in image B, impurities can be observed in the ferrite area (two dots on the left side), which were also transferred to the training dataset. In this case, models like DeepLabv3+ with ResNet50 and ResNet18 have effectively eliminated these impurities during the segmentation process.
As shown in Figure 13, another test sample was selected, and errors in the identification of ferrite and pearlite were marked on the corresponding mask image. The images segmented by the models demonstrate improvement over the mask created for training. We can observe that, in the original image, it is difficult to appreciate the lamellar structure of pearlite. Although ferrite, as the matrix element of the microstructure, should be easily detected due to its more uniform and clear texture, the models encounter issues in some areas, such as the band in Figure 13b, which is indicated by the red rectangular area. Considering pearlite as alternating layers of ferrite and cementite, the thickness of this bright band between two darker zones causes the models to interpret that area as pearlite. The DeepLabv3+/MobileNetv2 model, shown in Figure 13f, and, to some extent, U-Net manage to enhance segmentation in that specific area.

5. Conclusions

This study investigates the identification of microconstituents, specifically ferrite and pearlite, in optical metallographic images of steels using deep learning networks specialized in image segmentation problems. The work encompasses challenging tasks, particularly in obtaining and preparing the images. While other studies often concentrate on detecting various microconstituents using electron microscopy, where differences are typically more pronounced, our focus is on optical images. As the core of this study is grounded in optical images, a preliminary investigation has been undertaken on microconstituents derived from annealing heat treatments.
Segmenting distinct and clearly identifiable textures, such as pearlite and ferrite, could be approached using classical algorithms with the application of conventional computer vision filters or classical machine learning techniques. However, in other studies conducted by the authors, it has been confirmed that the application of deep learning techniques to steel metallographic images improves the metrics compared to classical machine learning algorithms. The advantage of approaching the study through deep learning is the creation of models that can be integrated into more general models in the future through transfer learning or model ensembles, thereby forming a superior structure.

Author Contributions

Conceptualization, F.G.-S. and V.M.-E.; methodology, J.M.-R., F.G.-S., J.C.-S., A.M.-M. and V.M.-E.; software, J.M.-R. and F.G.-S.; validation, J.C.-S., A.M.-M. and J.M.-R.; formal analysis, J.M.-R., F.G.-S. and V.M.-E.; investigation, J.M.-R., A.M.-M. and J.C.-S.; resources, A.M.-M., J.C.-S. and V.M.-E.; data curation, J.M.-R. and F.G.-S.; writing original draft preparation, J.M.-R.; writing review and editing, J.M.-R., F.G.-S. and V.M.-E.; visualization, F.G.-S. and V.M.-E.; supervision, F.G.-S., J.C.-S., A.M.-M. and V.M.-E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. U-Net Layer information (Activation format: S—Spatial; C–Channel; B—Batch).
No. | Name | Type | Activations (SSCB) | Learnables
1 | ImageInputLayer | Image Input | 224×224×3×1 | 0
2 | Encoder-Stage-1-Conv-1 | 2-D Convolution | 224×224×64×1 | 1792
3 | Encoder-Stage-1-ReLU-1 | ReLU | 224×224×64×1 | 0
4 | Encoder-Stage-1-Conv-2 | 2-D Convolution | 224×224×64×1 | 36,928
5 | Encoder-Stage-1-ReLU-2 | ReLU | 224×224×64×1 | 0
6 | Encoder-Stage-1-MaxPool | 2-D Max Pooling | 112×112×64×1 | 0
7 | Encoder-Stage-2-Conv-1 | 2-D Convolution | 112×112×128×1 | 73,856
8 | Encoder-Stage-2-ReLU-1 | ReLU | 112×112×128×1 | 0
9 | Encoder-Stage-2-Conv-2 | 2-D Convolution | 112×112×128×1 | 147,584
10 | Encoder-Stage-2-ReLU-2 | ReLU | 112×112×128×1 | 0
11 | Encoder-Stage-2-MaxPool | 2-D Max Pooling | 56×56×128×1 | 0
12 | Encoder-Stage-3-Conv-1 | 2-D Convolution | 56×56×256×1 | 295,168
13 | Encoder-Stage-3-ReLU-1 | ReLU | 56×56×256×1 | 0
14 | Encoder-Stage-3-Conv-2 | 2-D Convolution | 56×56×256×1 | 590,080
15 | Encoder-Stage-3-ReLU-2 | ReLU | 56×56×256×1 | 0
16 | Encoder-Stage-3-DropOut | Dropout | 56×56×256×1 | 0
17 | Encoder-Stage-3-MaxPool | 2-D Max Pooling | 28×28×256×1 | 0
18 | Bridge-Conv-1 | 2-D Convolution | 28×28×512×1 | 1,180,160
19 | Bridge-ReLU-1 | ReLU | 28×28×512×1 | 0
20 | Bridge-Conv-2 | 2-D Convolution | 28×28×512×1 | 2,359,808
21 | Bridge-ReLU-2 | ReLU | 28×28×512×1 | 0
22 | Bridge-DropOut | Dropout | 28×28×512×1 | 0
23 | Decoder-Stage-1-UpConv | 2-D Transposed Convolution | 56×56×256×1 | 524,544
24 | Decoder-Stage-1-UpReLU | ReLU | 56×56×256×1 | 0
25 | Decoder-Stage-1-DepthConcatenation | Depth concatenation | 56×56×512×1 | 0
26 | Decoder-Stage-1-Conv-1 | 2-D Convolution | 56×56×256×1 | 1,179,904
27 | Decoder-Stage-1-ReLU-1 | ReLU | 56×56×256×1 | 0
28 | Decoder-Stage-1-Conv-2 | 2-D Convolution | 56×56×256×1 | 590,080
29 | Decoder-Stage-1-ReLU-2 | ReLU | 56×56×256×1 | 0
30 | Decoder-Stage-2-UpConv | 2-D Transposed Convolution | 112×112×128×1 | 131,200
31 | Decoder-Stage-2-UpReLU | ReLU | 112×112×128×1 | 0
32 | Decoder-Stage-2-DepthConcatenation | Depth concatenation | 112×112×256×1 | 0
33 | Decoder-Stage-2-Conv-1 | 2-D Convolution | 112×112×128×1 | 295,040
34 | Decoder-Stage-2-ReLU-1 | ReLU | 112×112×128×1 | 0
35 | Decoder-Stage-2-Conv-2 | 2-D Convolution | 112×112×128×1 | 147,584
36 | Decoder-Stage-2-ReLU-2 | ReLU | 112×112×128×1 | 0
37 | Decoder-Stage-3-UpConv | 2-D Transposed Convolution | 224×224×64×1 | 32,832
38 | Decoder-Stage-3-UpReLU | ReLU | 224×224×64×1 | 0
39 | Decoder-Stage-3-DepthConcatenation | Depth concatenation | 224×224×128×1 | 0
40 | Decoder-Stage-3-Conv-1 | 2-D Convolution | 224×224×64×1 | 73,792
41 | Decoder-Stage-3-ReLU-1 | ReLU | 224×224×64×1 | 0
42 | Decoder-Stage-3-Conv-2 | 2-D Convolution | 224×224×64×1 | 36,928
43 | Decoder-Stage-3-ReLU-2 | ReLU | 224×224×64×1 | 0
44 | Final-ConvolutionLayer | 2-D Convolution | 224×224×2×1 | 130
45 | Softmax-Layer | Softmax | 224×224×2×1 | 0
46 | Segmentation-Layer | Pixel Classification Layer | 224×224×2×1 | 0
Table A2. SegNet Layer information (Activation format: S—Spatial; C–Channel; B—Batch).
No. | Name | Type | Activations (SSCB) | Learnables
1 | conv1_1 | 2-D Convolution | 224×224×64×1 | 1792
2 | bn_conv1_1 | Batch Normalization | 224×224×64×1 | 128
3 | relu1_1 | ReLU | 224×224×64×1 | 0
4 | conv1_2 | 2-D Convolution | 224×224×64×1 | 36,928
5 | bn_conv1_2 | Batch Normalization | 224×224×64×1 | 128
6 | relu1_2 | ReLU | 224×224×64×1 | 0
7 | pool1 | 2-D Max Pooling | | 0
8 | conv2_1 | 2-D Convolution | 112×112×128×1 | 73,856
9 | bn_conv2_1 | Batch Normalization | 112×112×128×1 | 256
10 | relu2_1 | ReLU | 112×112×128×1 | 0
11 | conv2_2 | 2-D Convolution | 112×112×128×1 | 147,584
12 | bn_conv2_2 | Batch Normalization | 112×112×128×1 | 256
13 | relu2_2 | ReLU | 112×112×128×1 | 0
14 | pool2 | 2-D Max Pooling | | 0
15 | conv3_1 | 2-D Convolution | 56×56×256×1 | 295,168
16 | bn_conv3_1 | Batch Normalization | 56×56×256×1 | 512
17 | relu3_1 | ReLU | 56×56×256×1 | 0
18 | conv3_2 | 2-D Convolution | 56×56×256×1 | 590,080
19 | bn_conv3_2 | Batch Normalization | 56×56×256×1 | 512
20 | relu3_2 | ReLU | 56×56×256×1 | 0
21 | conv3_3 | 2-D Convolution | 56×56×256×1 | 590,080
22 | bn_conv3_3 | Batch Normalization | 56×56×256×1 | 512
23 | relu3_3 | ReLU | 56×56×256×1 | 0
24 | pool3 | 2-D Max Pooling | | 0
25 | conv4_1 | 2-D Convolution | 28×28×512×1 | 1,180,160
26 | bn_conv4_1 | Batch Normalization | 28×28×512×1 | 1024
27 | relu4_1 | ReLU | 28×28×512×1 | 0
28 | conv4_2 | 2-D Convolution | 28×28×512×1 | 2,359,808
29 | bn_conv4_2 | Batch Normalization | 28×28×512×1 | 1024
30 | relu4_2 | ReLU | 28×28×512×1 | 0
31 | conv4_3 | 2-D Convolution | 28×28×512×1 | 2,359,808
32 | bn_conv4_3 | Batch Normalization | 28×28×512×1 | 1024
33 | relu4_3 | ReLU | 28×28×512×1 | 0
34 | pool4 | 2-D Max Pooling | | 0
35 | conv5_1 | 2-D Convolution | 14×14×512×1 | 2,359,808
36 | bn_conv5_1 | Batch Normalization | 14×14×512×1 | 1024
37 | relu5_1 | ReLU | 14×14×512×1 | 0
38 | conv5_2 | 2-D Convolution | 14×14×512×1 | 2,359,808
39 | bn_conv5_2 | Batch Normalization | 14×14×512×1 | 1024
40 | relu5_2 | ReLU | 14×14×512×1 | 0
41 | conv5_3 | 2-D Convolution | 14×14×512×1 | 2,359,808
42 | bn_conv5_3 | Batch Normalization | 14×14×512×1 | 1024
43 | relu5_3 | ReLU | 14×14×512×1 | 0
44 | pool5 | 2-D Max Pooling | | 0
45 | decoder5_unpool | 2-D Max Unpooling | 14×14×512×1 | 0
46 | decoder5_conv3 | 2-D Convolution | 14×14×512×1 | 2,359,808
47 | decoder5_bn_3 | Batch Normalization | 14×14×512×1 | 1024
48 | decoder5_relu_3 | ReLU | 14×14×512×1 | 0
49 | decoder5_conv2 | 2-D Convolution | 14×14×512×1 | 2,359,808
50 | decoder5_bn_2 | Batch Normalization | 14×14×512×1 | 1024
51 | decoder5_relu_2 | ReLU | 14×14×512×1 | 0
52 | decoder5_conv1 | 2-D Convolution | 14×14×512×1 | 2,359,808
53 | decoder5_bn_1 | Batch Normalization | 14×14×512×1 | 1024
54 | decoder5_relu_1 | ReLU | 14×14×512×1 | 0
55 | decoder4_unpool | 2-D Max Unpooling | 28×28×512×1 | 0
56 | decoder4_conv3 | 2-D Convolution | 28×28×512×1 | 2,359,808
57 | decoder4_bn_3 | Batch Normalization | 28×28×512×1 | 1024
58 | decoder4_relu_3 | ReLU | 28×28×512×1 | 0
59 | decoder4_conv2 | 2-D Convolution | 28×28×512×1 | 2,359,808
60 | decoder4_bn_2 | Batch Normalization | 28×28×512×1 | 1024
61 | decoder4_relu_2 | ReLU | 28×28×512×1 | 0
62 | decoder4_conv1 | 2-D Convolution | 28×28×256×1 | 1,179,904
63 | decoder4_bn_1 | Batch Normalization | 28×28×256×1 | 512
64 | decoder4_relu_1 | ReLU | 28×28×256×1 | 0
65 | decoder3_unpool | 2-D Max Unpooling | 56×56×256×1 | 0
66 | decoder3_conv3 | 2-D Convolution | 56×56×256×1 | 590,080
67 | decoder3_bn_3 | Batch Normalization | 56×56×256×1 | 512
68 | decoder3_relu_3 | ReLU | 56×56×256×1 | 0
69 | decoder3_conv2 | 2-D Convolution | 56×56×256×1 | 590,080
70 | decoder3_bn_2 | Batch Normalization | 56×56×256×1 | 512
71 | decoder3_relu_2 | ReLU | 56×56×256×1 | 0
72 | decoder3_conv1 | 2-D Convolution | 56×56×128×1 | 295,040
73 | decoder3_bn_1 | Batch Normalization | 56×56×128×1 | 256
74 | decoder3_relu_1 | ReLU | 56×56×128×1 | 0
75 | decoder2_unpool | 2-D Max Unpooling | 112×112×128×1 | 0
76 | decoder2_conv2 | 2-D Convolution | 112×112×128×1 | 147,584
77 | decoder2_bn_2 | Batch Normalization | 112×112×128×1 | 256
78 | decoder2_relu_2 | ReLU | 112×112×128×1 | 0
79 | decoder2_conv1 | 2-D Convolution | 112×112×64×1 | 73,792
80 | decoder2_bn_1 | Batch Normalization | 112×112×64×1 | 128
81 | decoder2_relu_1 | ReLU | 112×112×64×1 | 0
82 | decoder1_unpool | 2-D Max Unpooling | 224×224×64×1 | 0
83 | decoder1_conv2 | 2-D Convolution | 224×224×64×1 | 36,928
84 | decoder1_bn_2 | Batch Normalization | 224×224×64×1 | 128
85 | decoder1_relu_2 | ReLU | 224×224×64×1 | 0
86 | decoder1_conv1 | 2-D Convolution | 224×224×2×1 | 1154
87 | decoder1_bn_1 | Batch Normalization | 224×224×2×1 | 4
88 | decoder1_relu_1 | ReLU | 224×224×2×1 | 0
89 | softmax | Softmax | 224×224×2×1 | 0
90 | pixelLabels | Pixel Classification Layer | 224×224×2×1 | 0
Table A3. DeepLabv3+/ResNet50 Layer information (Activation format: S—Spatial; C–Channel; B—Batch).
No. | Name | Type | Activations (SSCB) | Learnables
1 | input_1 | Image Input | 224×224×3×1 | 0
2 | conv1 | 2-D Convolution | 112×112×64×1 | 9472
3 | bn_conv1 | Batch Normalization | 112×112×64×1 | 128
4 | activation_1_relu | ReLU | 112×112×64×1 | 0
5 | max_pooling2d_1 | 2-D Max Pooling | 56×56×64×1 | 0
6 | res2a_branch2a | 2-D Convolution | 56×56×64×1 | 4160
7 | bn2a_branch2a | Batch Normalization | 56×56×64×1 | 128
8 | activation_2_relu | ReLU | 56×56×64×1 | 0
9 | res2a_branch2b | 2-D Convolution | 56×56×64×1 | 36,928
10 | bn2a_branch2b | Batch Normalization | 56×56×64×1 | 128
11 | activation_3_relu | ReLU | 56×56×64×1 | 0
12 | res2a_branch2c | 2-D Convolution | 56×56×256×1 | 16,640
13 | res2a_branch1 | 2-D Convolution | 56×56×256×1 | 16,640
14 | bn2a_branch2c | Batch Normalization | 56×56×256×1 | 512
15 | bn2a_branch1 | Batch Normalization | 56×56×256×1 | 512
16 | add_1 | Addition | 56×56×256×1 | 0
17 | activation_4_relu | ReLU | 56×56×256×1 | 0
18 | res2b_branch2a | 2-D Convolution | 56×56×64×1 | 16,448
19 | bn2b_branch2a | Batch Normalization | 56×56×64×1 | 128
20 | activation_5_relu | ReLU | 56×56×64×1 | 0
21 | res2b_branch2b | 2-D Convolution | 56×56×64×1 | 36,928
22 | bn2b_branch2b | Batch Normalization | 56×56×64×1 | 128
23 | activation_6_relu | ReLU | 56×56×64×1 | 0
24 | res2b_branch2c | 2-D Convolution | 56×56×256×1 | 16,640
25 | bn2b_branch2c | Batch Normalization | 56×56×256×1 | 512
26 | add_2 | Addition | 56×56×256×1 | 0
27 | activation_7_relu | ReLU | 56×56×256×1 | 0
28 | res2c_branch2a | 2-D Convolution | 56×56×64×1 | 16,448
29 | bn2c_branch2a | Batch Normalization | 56×56×64×1 | 128
30 | activation_8_relu | ReLU | 56×56×64×1 | 0
31 | res2c_branch2b | 2-D Convolution | 56×56×64×1 | 36,928
32 | bn2c_branch2b | Batch Normalization | 56×56×64×1 | 128
33 | activation_9_relu | ReLU | 56×56×64×1 | 0
34 | res2c_branch2c | 2-D Convolution | 56×56×256×1 | 16,640
35 | bn2c_branch2c | Batch Normalization | 56×56×256×1 | 512
36 | add_3 | Addition | 56×56×256×1 | 0
37 | activation_10_relu | ReLU | 56×56×256×1 | 0
38 | res3a_branch2a | 2-D Convolution | 28×28×128×1 | 32,896
39 | bn3a_branch2a | Batch Normalization | 28×28×128×1 | 256
40 | activation_11_relu | ReLU | 28×28×128×1 | 0
41 | res3a_branch2b | 2-D Convolution | 28×28×128×1 | 147,584
42 | bn3a_branch2b | Batch Normalization | 28×28×128×1 | 256
43 | activation_12_relu | ReLU | 28×28×128×1 | 0
44 | res3a_branch2c | 2-D Convolution | 28×28×512×1 | 66,048
45 | res3a_branch1 | 2-D Convolution | 28×28×512×1 | 131,584
46 | bn3a_branch2c | Batch Normalization | 28×28×512×1 | 1024
47 | bn3a_branch1 | Batch Normalization | 28×28×512×1 | 1024
48 | add_4 | Addition | 28×28×512×1 | 0
49 | activation_13_relu | ReLU | 28×28×512×1 | 0
50 | res3b_branch2a | 2-D Convolution | 28×28×128×1 | 65,664
51 | bn3b_branch2a | Batch Normalization | 28×28×128×1 | 256
52 | activation_14_relu | ReLU | 28×28×128×1 | 0
53 | res3b_branch2b | 2-D Convolution | 28×28×128×1 | 147,584
54 | bn3b_branch2b | Batch Normalization | 28×28×128×1 | 256
55 | activation_15_relu | ReLU | 28×28×128×1 | 0
56 | res3b_branch2c | 2-D Convolution | 28×28×512×1 | 66,048
57 | bn3b_branch2c | Batch Normalization | 28×28×512×1 | 1024
58 | add_5 | Addition | 28×28×512×1 | 0
59 | activation_16_relu | ReLU | 28×28×512×1 | 0
60 | res3c_branch2a | 2-D Convolution | 28×28×128×1 | 65,664
61 | bn3c_branch2a | Batch Normalization | 28×28×128×1 | 256
62 | activation_17_relu | ReLU | 28×28×128×1 | 0
63 | res3c_branch2b | 2-D Convolution | 28×28×128×1 | 147,584
64 | bn3c_branch2b | Batch Normalization | 28×28×128×1 | 256
65 | activation_18_relu | ReLU | 28×28×128×1 | 0
66 | res3c_branch2c | 2-D Convolution | 28×28×512×1 | 66,048
67 | bn3c_branch2c | Batch Normalization | 28×28×512×1 | 1024
68 | add_6 | Addition | 28×28×512×1 | 0
69 | activation_19_relu | ReLU | 28×28×512×1 | 0
70 | res3d_branch2a | 2-D Convolution | 28×28×128×1 | 65,664
71 | bn3d_branch2a | Batch Normalization | 28×28×128×1 | 256
72 | activation_20_relu | ReLU | 28×28×128×1 | 0
73 | res3d_branch2b | 2-D Convolution | 28×28×128×1 | 147,584
74 | bn3d_branch2b | Batch Normalization | 28×28×128×1 | 256
75 | activation_21_relu | ReLU | 28×28×128×1 | 0
76 | res3d_branch2c | 2-D Convolution | 28×28×512×1 | 66,048
77 | bn3d_branch2c | Batch Normalization | 28×28×512×1 | 1024
78 | add_7 | Addition | 28×28×512×1 | 0
79 | activation_22_relu | ReLU | 28×28×512×1 | 0
80 | res4a_branch2a | 2-D Convolution | 14×14×256×1 | 131,328
81 | bn4a_branch2a | Batch Normalization | 14×14×256×1 | 512
82 | activation_23_relu | ReLU | 14×14×256×1 | 0
83 | res4a_branch2b | 2-D Convolution | 14×14×256×1 | 590,080
84 | bn4a_branch2b | Batch Normalization | 14×14×256×1 | 512
85 | activation_24_relu | ReLU | 14×14×256×1 | 0
86 | res4a_branch2c | 2-D Convolution | 14×14×1024×1 | 263,168
87 | res4a_branch1 | 2-D Convolution | 14×14×1024×1 | 525,312
88 | bn4a_branch2c | Batch Normalization | 14×14×1024×1 | 2048
89 | bn4a_branch1 | Batch Normalization | 14×14×1024×1 | 2048
90 | add_8 | Addition | 14×14×1024×1 | 0
91 | activation_25_relu | ReLU | 14×14×1024×1 | 0
92 | res4b_branch2a | 2-D Convolution | 14×14×256×1 | 262,400
93 | bn4b_branch2a | Batch Normalization | 14×14×256×1 | 512
94 | activation_26_relu | ReLU | 14×14×256×1 | 0
95 | res4b_branch2b | 2-D Convolution | 14×14×256×1 | 590,080
96 | bn4b_branch2b | Batch Normalization | 14×14×256×1 | 512
97 | activation_27_relu | ReLU | 14×14×256×1 | 0
98 | res4b_branch2c | 2-D Convolution | 14×14×1024×1 | 263,168
99 | bn4b_branch2c | Batch Normalization | 14×14×1024×1 | 2048
100 | add_9 | Addition | 14×14×1024×1 | 0
101 | activation_28_relu | ReLU | 14×14×1024×1 | 0
102 | res4c_branch2a | 2-D Convolution | 14×14×256×1 | 262,400
103 | bn4c_branch2a | Batch Normalization | 14×14×256×1 | 512
104 | activation_29_relu | ReLU | 14×14×256×1 | 0
105 | res4c_branch2b | 2-D Convolution | 14×14×256×1 | 590,080
106 | bn4c_branch2b | Batch Normalization | 14×14×256×1 | 512
107 | activation_30_relu | ReLU | 14×14×256×1 | 0
108 | res4c_branch2c | 2-D Convolution | 14×14×1024×1 | 263,168
109 | bn4c_branch2c | Batch Normalization | 14×14×1024×1 | 2048
110 | add_10 | Addition | 14×14×1024×1 | 0
111 | activation_31_relu | ReLU | 14×14×1024×1 | 0
112 | res4d_branch2a | 2-D Convolution | 14×14×256×1 | 262,400
113 | bn4d_branch2a | Batch Normalization | 14×14×256×1 | 512
114 | activation_32_relu | ReLU | 14×14×256×1 | 0
115 | res4d_branch2b | 2-D Convolution | 14×14×256×1 | 590,080
116 | bn4d_branch2b | Batch Normalization | 14×14×256×1 | 512
117 | activation_33_relu | ReLU | 14×14×256×1 | 0
118 | res4d_branch2c | 2-D Convolution | 14×14×1024×1 | 263,168
119 | bn4d_branch2c | Batch Normalization | 14×14×1024×1 | 2048
120 | add_11 | Addition | 14×14×1024×1 | 0
121 | activation_34_relu | ReLU | 14×14×1024×1 | 0
122 | res4e_branch2a | 2-D Convolution | 14×14×256×1 | 262,400
123 | bn4e_branch2a | Batch Normalization | 14×14×256×1 | 512
124 | activation_35_relu | ReLU | 14×14×256×1 | 0
125 | res4e_branch2b | 2-D Convolution | 14×14×256×1 | 590,080
126 | bn4e_branch2b | Batch Normalization | 14×14×256×1 | 512
127 | activation_36_relu | ReLU | 14×14×256×1 | 0
128 | res4e_branch2c | 2-D Convolution | 14×14×1024×1 | 263,168
129 | bn4e_branch2c | Batch Normalization | 14×14×1024×1 | 2048
130 | add_12 | Addition | 14×14×1024×1 | 0
131 | activation_37_relu | ReLU | 14×14×1024×1 | 0
132 | res4f_branch2a | 2-D Convolution | 14×14×256×1 | 262,400
133 | bn4f_branch2a | Batch Normalization | 14×14×256×1 | 512
134 | activation_38_relu | ReLU | 14×14×256×1 | 0
135 | res4f_branch2b | 2-D Convolution | 14×14×256×1 | 590,080
136 | bn4f_branch2b | Batch Normalization | 14×14×256×1 | 512
137 | activation_39_relu | ReLU | 14×14×256×1 | 0
138 | res4f_branch2c | 2-D Convolution | 14×14×1024×1 | 263,168
139 | bn4f_branch2c | Batch Normalization | 14×14×1024×1 | 2048
140 | add_13 | Addition | 14×14×1024×1 | 0
141 | activation_40_relu | ReLU | 14×14×1024×1 | 0
142 | res5a_branch2a | 2-D Convolution | 14×14×512×1 | 524,800
143 | bn5a_branch2a | Batch Normalization | 14×14×512×1 | 1024
144 | activation_41_relu | ReLU | 14×14×512×1 | 0
145 | res5a_branch2b | 2-D Convolution | 14×14×512×1 | 2,359,808
146 | bn5a_branch2b | Batch Normalization | 14×14×512×1 | 1024
147 | activation_42_relu | ReLU | 14×14×512×1 | 0
148 | res5a_branch2c | 2-D Convolution | 14×14×2048×1 | 1,050,624
149 | res5a_branch1 | 2-D Convolution | 14×14×2048×1 | 2,099,200
150 | bn5a_branch2c | Batch Normalization | 14×14×2048×1 | 4096
151 | bn5a_branch1 | Batch Normalization | 14×14×2048×1 | 4096
152 | add_14 | Addition | 14×14×2048×1 | 0
153 | activation_43_relu | ReLU | 14×14×2048×1 | 0
154 | res5b_branch2a | 2-D Convolution | 14×14×512×1 | 1,049,088
155 | bn5b_branch2a | Batch Normalization | 14×14×512×1 | 1024
156 | activation_44_relu | ReLU | 14×14×512×1 | 0
157 | res5b_branch2b | 2-D Convolution | 14×14×512×1 | 2,359,808
158 | bn5b_branch2b | Batch Normalization | 14×14×512×1 | 1024
159 | activation_45_relu | ReLU | 14×14×512×1 | 0
160 | res5b_branch2c | 2-D Convolution | 14×14×2048×1 | 1,050,624
161 | bn5b_branch2c | Batch Normalization | 14×14×2048×1 | 4096
162 | add_15 | Addition | 14×14×2048×1 | 0
163 | activation_46_relu | ReLU | 14×14×2048×1 | 0
164 | res5c_branch2a | 2-D Convolution | 14×14×512×1 | 1,049,088
165 | bn5c_branch2a | Batch Normalization | 14×14×512×1 | 1024
166 | activation_47_relu | ReLU | 14×14×512×1 | 0
167 | res5c_branch2b | 2-D Convolution | 14×14×512×1 | 2,359,808
168 | bn5c_branch2b | Batch Normalization | 14×14×512×1 | 1024
169 | activation_48_relu | ReLU | 14×14×512×1 | 0
170 | res5c_branch2c | 2-D Convolution | 14×14×2048×1 | 1,050,624
171 | bn5c_branch2c | Batch Normalization | 14×14×2048×1 | 4096
172 | add_16 | Addition | 14×14×2048×1 | 0
173 | activation_49_relu | ReLU | 14×14×2048×1 | 0
174 | aspp_Conv_1 | 2-D Convolution | 14×14×256×1 | 524,544
175 | aspp_BatchNorm_1 | Batch Normalization | 14×14×256×1 | 512
176 | aspp_Relu_1 | ReLU | 14×14×256×1 | 0
177 | aspp_Conv_2 | 2-D Convolution | 14×14×256×1 | 4,718,848
178 | aspp_BatchNorm_2 | Batch Normalization | 14×14×256×1 | 512
179 | aspp_Relu_2 | ReLU | 14×14×256×1 | 0
180 | aspp_Conv_3 | 2-D Convolution | 14×14×256×1 | 4,718,848
181 | aspp_BatchNorm_3 | Batch Normalization | 14×14×256×1 | 512
182 | aspp_Relu_3 | ReLU | 14×14×256×1 | 0
183 | aspp_Conv_4 | 2-D Convolution | 14×14×256×1 | 4,718,848
184 | aspp_BatchNorm_4 | Batch Normalization | 14×14×256×1 | 512
185 | aspp_Relu_4 | ReLU | 14×14×256×1 | 0
186 | catAspp | Depth concatenation | 14×14×1024×1 | 0
187 | dec_c1 | 2-D Convolution | 14×14×256×1 | 262,400
188 | dec_bn1 | Batch Normalization | 14×14×256×1 | 512
189 | dec_relu1 | ReLU | 14×14×256×1 | 0
190 | dec_upsample1 | 2-D Transposed Convolution | 56×56×256×1 | 4,194,560
191 | dec_c2 | 2-D Convolution | 56×56×48×1 | 12,336
192 | dec_bn2 | Batch Normalization | 56×56×48×1 | 96
193 | dec_relu2 | ReLU | 56×56×48×1 | 0
194 | dec_crop1 | Crop 2D | 56×56×256×1 | 0
195 | dec_cat1 | Depth concatenation | 56×56×304×1 | 0
196 | dec_c3 | 2-D Convolution | 56×56×256×1 | 700,672
197 | dec_bn3 | Batch Normalization | 56×56×256×1 | 512
198 | dec_relu3 | ReLU | 56×56×256×1 | 0
199 | dec_c4 | 2-D Convolution | 56×56×256×1 | 590,080
200 | dec_bn4 | Batch Normalization | 56×56×256×1 | 512
201 | dec_relu4 | ReLU | 56×56×256×1 | 0
202 | scorer | 2-D Convolution | 56×56×2×1 | 514
203 | dec_upsample2 | 2-D Transposed Convolution | 224×224×2×1 | 258
204 | dec_crop2 | Crop 2D | 224×224×2×1 | 0
205 | softmax-out | Softmax | 224×224×2×1 | 0
206 | labels | Pixel Classification Layer | 224×224×2×1 | 0
Table A4. DeepLabv3+/ResNet18 Layer information (Activation format: S—Spatial; C–Channel; B—Batch).
No. | Name | Type | Activations (SSCB) | Learnables
1 | data | Image Input | 224×224×3×1 | 0
2 | conv1 | 2-D Convolution | 112×112×64×1 | 9472
3 | bn_conv1 | Batch Normalization | 112×112×64×1 | 128
4 | conv1_relu | ReLU | 112×112×64×1 | 0
5 | pool1 | 2-D Max Pooling | 56×56×64×1 | 0
6 | res2a_branch2a | 2-D Convolution | 56×56×64×1 | 36,928
7 | bn2a_branch2a | Batch Normalization | 56×56×64×1 | 128
8 | res2a_branch2a_relu | ReLU | 56×56×64×1 | 0
9 | res2a_branch2b | 2-D Convolution | 56×56×64×1 | 36,928
10 | bn2a_branch2b | Batch Normalization | 56×56×64×1 | 128
11 | res2a | Addition | 56×56×64×1 | 0
12 | res2a_relu | ReLU | 56×56×64×1 | 0
13 | res2b_branch2a | 2-D Convolution | 56×56×64×1 | 36,928
14 | bn2b_branch2a | Batch Normalization | 56×56×64×1 | 128
15 | res2b_branch2a_relu | ReLU | 56×56×64×1 | 0
16 | res2b_branch2b | 2-D Convolution | 56×56×64×1 | 36,928
17 | bn2b_branch2b | Batch Normalization | 56×56×64×1 | 128
18 | res2b | Addition | 56×56×64×1 | 0
19 | res2b_relu | ReLU | 56×56×64×1 | 0
20 | res3a_branch2a | 2-D Convolution | 28×28×128×1 | 73,856
21 | bn3a_branch2a | Batch Normalization | 28×28×128×1 | 256
22 | res3a_branch2a_relu | ReLU | 28×28×128×1 | 0
23 | res3a_branch2b | 2-D Convolution | 28×28×128×1 | 147,584
24 | bn3a_branch2b | Batch Normalization | 28×28×128×1 | 256
25 | res3a_branch1 | 2-D Convolution | 28×28×128×1 | 8320
26 | bn3a_branch1 | Batch Normalization | 28×28×128×1 | 256
27 | res3a | Addition | 28×28×128×1 | 0
28 | res3a_relu | ReLU | 28×28×128×1 | 0
29 | res3b_branch2a | 2-D Convolution | 28×28×128×1 | 147,584
30 | bn3b_branch2a | Batch Normalization | 28×28×128×1 | 256
31 | res3b_branch2a_relu | ReLU | 28×28×128×1 | 0
32 | res3b_branch2b | 2-D Convolution | 28×28×128×1 | 147,584
33 | bn3b_branch2b | Batch Normalization | 28×28×128×1 | 256
34 | res3b | Addition | 28×28×128×1 | 0
35 | res3b_relu | ReLU | 28×28×128×1 | 0
36 | res4a_branch2a | 2-D Convolution | 14×14×256×1 | 295,168
37 | bn4a_branch2a | Batch Normalization | 14×14×256×1 | 512
38 | res4a_branch2a_relu | ReLU | 14×14×256×1 | 0
39 | res4a_branch2b | 2-D Convolution | 14×14×256×1 | 590,080
40 | bn4a_branch2b | Batch Normalization | 14×14×256×1 | 512
41 | res4a_branch1 | 2-D Convolution | 14×14×256×1 | 33,024
42 | bn4a_branch1 | Batch Normalization | 14×14×256×1 | 512
43 | res4a | Addition | 14×14×256×1 | 0
44 | res4a_relu | ReLU | 14×14×256×1 | 0
45 | res4b_branch2a | 2-D Convolution | 14×14×256×1 | 590,080
46 | bn4b_branch2a | Batch Normalization | 14×14×256×1 | 512
47 | res4b_branch2a_relu | ReLU | 14×14×256×1 | 0
48 | res4b_branch2b | 2-D Convolution | 14×14×256×1 | 590,080
49 | bn4b_branch2b | Batch Normalization | 14×14×256×1 | 512
50 | res4b | Addition | 14×14×256×1 | 0
51 | res4b_relu | ReLU | 14×14×256×1 | 0
52 | res5a_branch2a | 2-D Convolution | 14×14×512×1 | 1,180,160
53 | bn5a_branch2a | Batch Normalization | 14×14×512×1 | 1024
54 | res5a_branch2a_relu | ReLU | 14×14×512×1 | 0
55 | res5a_branch2b | 2-D Convolution | 14×14×512×1 | 2,359,808
56 | bn5a_branch2b | Batch Normalization | 14×14×512×1 | 1024
57 | res5a_branch1 | 2-D Convolution | 14×14×512×1 | 131,584
58 | bn5a_branch1 | Batch Normalization | 14×14×512×1 | 1024
59 | res5a | Addition | 14×14×512×1 | 0
60 | res5a_relu | ReLU | 14×14×512×1 | 0
61 | res5b_branch2a | 2-D Convolution | 14×14×512×1 | 2,359,808
62 | bn5b_branch2a | Batch Normalization | 14×14×512×1 | 1024
63 | res5b_branch2a_relu | ReLU | 14×14×512×1 | 0
64 | res5b_branch2b | 2-D Convolution | 14×14×512×1 | 2,359,808
65 | bn5b_branch2b | Batch Normalization | 14×14×512×1 | 1024
66 | res5b | Addition | 14×14×512×1 | 0
67 | res5b_relu | ReLU | 14×14×512×1 | 0
68 | aspp_Conv_1 | 2-D Convolution | 14×14×256×1 | 131,328
69 | aspp_BatchNorm_1 | Batch Normalization | 14×14×256×1 | 512
70 | aspp_Relu_1 | ReLU | 14×14×256×1 | 0
71 | aspp_Conv_2 | 2-D Convolution | 14×14×256×1 | 1,179,904
72 | aspp_BatchNorm_2 | Batch Normalization | 14×14×256×1 | 512
73 | aspp_Relu_2 | ReLU | 14×14×256×1 | 0
74 | aspp_Conv_3 | 2-D Convolution | 14×14×256×1 | 1,179,904
75 | aspp_BatchNorm_3 | Batch Normalization | 14×14×256×1 | 512
76 | aspp_Relu_3 | ReLU | 14×14×256×1 | 0
77 | aspp_Conv_4 | 2-D Convolution | 14×14×256×1 | 1,179,904
78 | aspp_BatchNorm_4 | Batch Normalization | 14×14×256×1 | 512
79 | aspp_Relu_4 | ReLU | 14×14×256×1 | 0
80 | catAspp | Depth concatenation | 14×14×###×1 | 0
81 | dec_c1 | 2-D Convolution | 14×14×256×1 | 262,400
82 | dec_bn1 | Batch Normalization | 14×14×256×1 | 512
83 | dec_relu1 | ReLU | 14×14×256×1 | 0
84 | dec_upsample1 | 2-D Transposed Convolution | 56×56×256×1 | 4,194,560
85 | dec_c2 | 2-D Convolution | 56×56×48×1 | 3120
86 | dec_bn2 | Batch Normalization | 56×56×48×1 | 96
87 | dec_relu2 | ReLU | 56×56×48×1 | 0
88 | dec_crop1 | Crop 2D | 56×56×256×1 | 0
89 | dec_cat1 | Depth concatenation | 56×56×304×1 | 0
90 | dec_c3 | 2-D Convolution | 56×56×256×1 | 700,672
91 | dec_bn3 | Batch Normalization | 56×56×256×1 | 512
92 | dec_relu3 | ReLU | 56×56×256×1 | 0
93 | dec_c4 | 2-D Convolution | 56×56×256×1 | 590,080
94 | dec_bn4 | Batch Normalization | 56×56×256×1 | 512
95 | dec_relu4 | ReLU | 56×56×256×1 | 0
96 | scorer | 2-D Convolution | 56×56×2×1 | 514
97 | dec_upsample2 | 2-D Transposed Convolution | 224×224×2×1 | 258
98 | dec_crop2 | Crop 2D | 224×224×2×1 | 0
99 | softmax-out | Softmax | 224×224×2×1 | 0
100 | labels | Pixel Classification Layer | 224×224×2×1 | 0
Table A5. DeepLabv3+/MobileNetv2 layer information (activation format: S = Spatial, C = Channel, B = Batch).
No. | Name | Type | Activations (S×S×C×B) | Learnables
1 | input_1 | Image Input | 224×224×3×1 | 0
2 | Conv1 | 2-D Convolution | 112×112×32×1 | 896
3 | bn_Conv1 | Batch Normalization | 112×112×32×1 | 64
4 | Conv1_relu | Clipped ReLU | 112×112×32×1 | 0
5 | expanded_conv_depthwise | 2-D Grouped Convolution | 112×112×32×1 | 320
6 | expanded_conv_depthwise_BN | Batch Normalization | 112×112×32×1 | 64
7 | expanded_conv_depthwise_relu | Clipped ReLU | 112×112×32×1 | 0
8 | expanded_conv_project | 2-D Convolution | 112×112×16×1 | 528
9 | expanded_conv_project_BN | Batch Normalization | 112×112×16×1 | 32
10 | block_1_expand | 2-D Convolution | 112×112×96×1 | 1632
11 | block_1_expand_BN | Batch Normalization | 112×112×96×1 | 192
12 | block_1_expand_relu | Clipped ReLU | 112×112×96×1 | 0
13 | block_1_depthwise | 2-D Grouped Convolution | 56×56×96×1 | 960
14 | block_1_depthwise_BN | Batch Normalization | 56×56×96×1 | 192
15 | block_1_depthwise_relu | Clipped ReLU | 56×56×96×1 | 0
16 | block_1_project | 2-D Convolution | 56×56×24×1 | 2328
17 | block_1_project_BN | Batch Normalization | 56×56×24×1 | 48
18 | block_2_expand | 2-D Convolution | 56×56×144×1 | 3600
19 | block_2_expand_BN | Batch Normalization | 56×56×144×1 | 288
20 | block_2_expand_relu | Clipped ReLU | 56×56×144×1 | 0
21 | block_2_depthwise | 2-D Grouped Convolution | 56×56×144×1 | 1440
22 | block_2_depthwise_BN | Batch Normalization | 56×56×144×1 | 288
23 | block_2_depthwise_relu | Clipped ReLU | 56×56×144×1 | 0
24 | block_2_project | 2-D Convolution | 56×56×24×1 | 3480
25 | block_2_project_BN | Batch Normalization | 56×56×24×1 | 48
26 | block_2_add | Addition | 56×56×24×1 | 0
27 | block_3_expand | 2-D Convolution | 56×56×144×1 | 3600
28 | block_3_expand_BN | Batch Normalization | 56×56×144×1 | 288
29 | block_3_expand_relu | Clipped ReLU | 56×56×144×1 | 0
30 | block_3_depthwise | 2-D Grouped Convolution | 28×28×144×1 | 1440
31 | block_3_depthwise_BN | Batch Normalization | 28×28×144×1 | 288
32 | block_3_depthwise_relu | Clipped ReLU | 28×28×144×1 | 0
33 | block_3_project | 2-D Convolution | 28×28×32×1 | 4640
34 | block_3_project_BN | Batch Normalization | 28×28×32×1 | 64
35 | block_4_expand | 2-D Convolution | 28×28×192×1 | 6336
36 | block_4_expand_BN | Batch Normalization | 28×28×192×1 | 384
37 | block_4_expand_relu | Clipped ReLU | 28×28×192×1 | 0
38 | block_4_depthwise | 2-D Grouped Convolution | 28×28×192×1 | 1920
39 | block_4_depthwise_BN | Batch Normalization | 28×28×192×1 | 384
40 | block_4_depthwise_relu | Clipped ReLU | 28×28×192×1 | 0
41 | block_4_project | 2-D Convolution | 28×28×32×1 | 6176
42 | block_4_project_BN | Batch Normalization | 28×28×32×1 | 64
43 | block_4_add | Addition | 28×28×32×1 | 0
44 | block_5_expand | 2-D Convolution | 28×28×192×1 | 6336
45 | block_5_expand_BN | Batch Normalization | 28×28×192×1 | 384
46 | block_5_expand_relu | Clipped ReLU | 28×28×192×1 | 0
47 | block_5_depthwise | 2-D Grouped Convolution | 28×28×192×1 | 1920
48 | block_5_depthwise_BN | Batch Normalization | 28×28×192×1 | 384
49 | block_5_depthwise_relu | Clipped ReLU | 28×28×192×1 | 0
50 | block_5_project | 2-D Convolution | 28×28×32×1 | 6176
51 | block_5_project_BN | Batch Normalization | 28×28×32×1 | 64
52 | block_5_add | Addition | 28×28×32×1 | 0
53 | block_6_expand | 2-D Convolution | 28×28×192×1 | 6336
54 | block_6_expand_BN | Batch Normalization | 28×28×192×1 | 384
55 | block_6_expand_relu | Clipped ReLU | 28×28×192×1 | 0
56 | block_6_depthwise | 2-D Grouped Convolution | 14×14×192×1 | 1920
57 | block_6_depthwise_BN | Batch Normalization | 14×14×192×1 | 384
58 | block_6_depthwise_relu | Clipped ReLU | 14×14×192×1 | 0
59 | block_6_project | 2-D Convolution | 14×14×64×1 | 12,352
60 | block_6_project_BN | Batch Normalization | 14×14×64×1 | 128
61 | block_7_expand | 2-D Convolution | 14×14×384×1 | 24,960
62 | block_7_expand_BN | Batch Normalization | 14×14×384×1 | 768
63 | block_7_expand_relu | Clipped ReLU | 14×14×384×1 | 0
64 | block_7_depthwise | 2-D Grouped Convolution | 14×14×384×1 | 3840
65 | block_7_depthwise_BN | Batch Normalization | 14×14×384×1 | 768
66 | block_7_depthwise_relu | Clipped ReLU | 14×14×384×1 | 0
67 | block_7_project | 2-D Convolution | 14×14×64×1 | 24,640
68 | block_7_project_BN | Batch Normalization | 14×14×64×1 | 128
69 | block_7_add | Addition | 14×14×64×1 | 0
70 | block_8_expand | 2-D Convolution | 14×14×384×1 | 24,960
71 | block_8_expand_BN | Batch Normalization | 14×14×384×1 | 768
72 | block_8_expand_relu | Clipped ReLU | 14×14×384×1 | 0
73 | block_8_depthwise | 2-D Grouped Convolution | 14×14×384×1 | 3840
74 | block_8_depthwise_BN | Batch Normalization | 14×14×384×1 | 768
75 | block_8_depthwise_relu | Clipped ReLU | 14×14×384×1 | 0
76 | block_8_project | 2-D Convolution | 14×14×64×1 | 24,640
77 | block_8_project_BN | Batch Normalization | 14×14×64×1 | 128
78 | block_8_add | Addition | 14×14×64×1 | 0
79 | block_9_expand | 2-D Convolution | 14×14×384×1 | 24,960
80 | block_9_expand_BN | Batch Normalization | 14×14×384×1 | 768
81 | block_9_expand_relu | Clipped ReLU | 14×14×384×1 | 0
82 | block_9_depthwise | 2-D Grouped Convolution | 14×14×384×1 | 3840
83 | block_9_depthwise_BN | Batch Normalization | 14×14×384×1 | 768
84 | block_9_depthwise_relu | Clipped ReLU | 14×14×384×1 | 0
85 | block_9_project | 2-D Convolution | 14×14×64×1 | 24,640
86 | block_9_project_BN | Batch Normalization | 14×14×64×1 | 128
87 | block_9_add | Addition | 14×14×64×1 | 0
88 | block_10_expand | 2-D Convolution | 14×14×384×1 | 24,960
89 | block_10_expand_BN | Batch Normalization | 14×14×384×1 | 768
90 | block_10_expand_relu | Clipped ReLU | 14×14×384×1 | 0
91 | block_10_depthwise | 2-D Grouped Convolution | 14×14×384×1 | 3840
92 | block_10_depthwise_BN | Batch Normalization | 14×14×384×1 | 768
93 | block_10_depthwise_relu | Clipped ReLU | 14×14×384×1 | 0
94 | block_10_project | 2-D Convolution | 14×14×96×1 | 36,960
95 | block_10_project_BN | Batch Normalization | 14×14×96×1 | 192
96 | block_11_expand | 2-D Convolution | 14×14×576×1 | 55,872
97 | block_11_expand_BN | Batch Normalization | 14×14×576×1 | 1152
98 | block_11_expand_relu | Clipped ReLU | 14×14×576×1 | 0
99 | block_11_depthwise | 2-D Grouped Convolution | 14×14×576×1 | 5760
100 | block_11_depthwise_BN | Batch Normalization | 14×14×576×1 | 1152
101 | block_11_depthwise_relu | Clipped ReLU | 14×14×576×1 | 0
102 | block_11_project | 2-D Convolution | 14×14×96×1 | 55,392
103 | block_11_project_BN | Batch Normalization | 14×14×96×1 | 192
104 | block_11_add | Addition | 14×14×96×1 | 0
105 | block_12_expand | 2-D Convolution | 14×14×576×1 | 55,872
106 | block_12_expand_BN | Batch Normalization | 14×14×576×1 | 1152
107 | block_12_expand_relu | Clipped ReLU | 14×14×576×1 | 0
108 | block_12_depthwise | 2-D Grouped Convolution | 14×14×576×1 | 5760
109 | block_12_depthwise_BN | Batch Normalization | 14×14×576×1 | 1152
110 | block_12_depthwise_relu | Clipped ReLU | 14×14×576×1 | 0
111 | block_12_project | 2-D Convolution | 14×14×96×1 | 55,392
112 | block_12_project_BN | Batch Normalization | 14×14×96×1 | 192
113 | block_12_add | Addition | 14×14×96×1 | 0
114 | block_13_expand | 2-D Convolution | 14×14×576×1 | 55,872
115 | block_13_expand_BN | Batch Normalization | 14×14×576×1 | 1152
116 | block_13_expand_relu | Clipped ReLU | 14×14×576×1 | 0
117 | block_13_depthwise | 2-D Grouped Convolution | 14×14×576×1 | 5760
118 | block_13_depthwise_BN | Batch Normalization | 14×14×576×1 | 1152
119 | block_13_depthwise_relu | Clipped ReLU | 14×14×576×1 | 0
120 | block_13_project | 2-D Convolution | 14×14×160×1 | 92,320
121 | block_13_project_BN | Batch Normalization | 14×14×160×1 | 320
122 | block_14_expand | 2-D Convolution | 14×14×960×1 | 154,560
123 | block_14_expand_BN | Batch Normalization | 14×14×960×1 | 1920
124 | block_14_expand_relu | Clipped ReLU | 14×14×960×1 | 0
125 | block_14_depthwise | 2-D Grouped Convolution | 14×14×960×1 | 9600
126 | block_14_depthwise_BN | Batch Normalization | 14×14×960×1 | 1920
127 | block_14_depthwise_relu | Clipped ReLU | 14×14×960×1 | 0
128 | block_14_project | 2-D Convolution | 14×14×160×1 | 153,760
129 | block_14_project_BN | Batch Normalization | 14×14×160×1 | 320
130 | block_14_add | Addition | 14×14×160×1 | 0
131 | block_15_expand | 2-D Convolution | 14×14×960×1 | 154,560
132 | block_15_expand_BN | Batch Normalization | 14×14×960×1 | 1920
133 | block_15_expand_relu | Clipped ReLU | 14×14×960×1 | 0
134 | block_15_depthwise | 2-D Grouped Convolution | 14×14×960×1 | 9600
135 | block_15_depthwise_BN | Batch Normalization | 14×14×960×1 | 1920
136 | block_15_depthwise_relu | Clipped ReLU | 14×14×960×1 | 0
137 | block_15_project | 2-D Convolution | 14×14×160×1 | 153,760
138 | block_15_project_BN | Batch Normalization | 14×14×160×1 | 320
139 | block_15_add | Addition | 14×14×160×1 | 0
140 | block_16_expand | 2-D Convolution | 14×14×960×1 | 154,560
141 | block_16_expand_BN | Batch Normalization | 14×14×960×1 | 1920
142 | block_16_expand_relu | Clipped ReLU | 14×14×960×1 | 0
143 | block_16_depthwise | 2-D Grouped Convolution | 14×14×960×1 | 9600
144 | block_16_depthwise_BN | Batch Normalization | 14×14×960×1 | 1920
145 | block_16_depthwise_relu | Clipped ReLU | 14×14×960×1 | 0
146 | block_16_project | 2-D Convolution | 14×14×320×1 | 307,520
147 | block_16_project_BN | Batch Normalization | 14×14×320×1 | 640
148 | aspp_Conv_1_depthwise | 2-D Grouped Convolution | 14×14×320×1 | 640
149 | aspp_Conv_1_pointwise | 2-D Convolution | 14×14×256×1 | 82,176
150 | aspp_BatchNorm_1 | Batch Normalization | 14×14×256×1 | 512
151 | aspp_Relu_1 | ReLU | 14×14×256×1 | 0
152 | aspp_Conv_2_depthwise | 2-D Grouped Convolution | 14×14×320×1 | 3200
153 | aspp_Conv_2_pointwise | 2-D Convolution | 14×14×256×1 | 82,176
154 | aspp_BatchNorm_2 | Batch Normalization | 14×14×256×1 | 512
155 | aspp_Relu_2 | ReLU | 14×14×256×1 | 0
156 | aspp_Conv_3_depthwise | 2-D Grouped Convolution | 14×14×320×1 | 3200
157 | aspp_Conv_3_pointwise | 2-D Convolution | 14×14×256×1 | 82,176
158 | aspp_BatchNorm_3 | Batch Normalization | 14×14×256×1 | 512
159 | aspp_Relu_3 | ReLU | 14×14×256×1 | 0
160 | aspp_Conv_4_depthwise | 2-D Grouped Convolution | 14×14×320×1 | 3200
161 | aspp_Conv_4_pointwise | 2-D Convolution | 14×14×256×1 | 82,176
162 | aspp_BatchNorm_4 | Batch Normalization | 14×14×256×1 | 512
163 | aspp_Relu_4 | ReLU | 14×14×256×1 | 0
164 | catAspp | Depth concatenation | 14×14×1024×1 | 0
165 | dec_c1 | 2-D Convolution | 14×14×256×1 | 262,400
166 | dec_bn1 | Batch Normalization | 14×14×256×1 | 512
167 | dec_relu1 | ReLU | 14×14×256×1 | 0
168 | dec_upsample1 | 2-D Transposed Convolution | 56×56×256×1 | 4,194,560
169 | dec_c2 | 2-D Convolution | 56×56×48×1 | 6960
170 | dec_bn2 | Batch Normalization | 56×56×48×1 | 96
171 | dec_relu2 | ReLU | 56×56×48×1 | 0
172 | dec_crop1 | Crop 2D | 56×56×256×1 | 0
173 | dec_cat1 | Depth concatenation | 56×56×304×1 | 0
174 | dec_c3_depthwise | 2-D Grouped Convolution | 56×56×304×1 | 3040
175 | dec_c3_pointwise | 2-D Convolution | 56×56×256×1 | 78,080
176 | dec_bn3 | Batch Normalization | 56×56×256×1 | 512
177 | dec_relu3 | ReLU | 56×56×256×1 | 0
178 | dec_c4_depthwise | 2-D Grouped Convolution | 56×56×256×1 | 2560
179 | dec_c4_pointwise | 2-D Convolution | 56×56×256×1 | 65,792
180 | dec_bn4 | Batch Normalization | 56×56×256×1 | 512
181 | dec_relu4 | ReLU | 56×56×256×1 | 0
182 | scorer | 2-D Convolution | 56×56×2×1 | 514
183 | dec_upsample2 | 2-D Transposed Convolution | 224×224×2×1 | 258
184 | dec_crop2 | Crop 2D | 224×224×2×1 | 0
185 | softmax-out | Softmax | 224×224×2×1 | 0
186 | labels | Pixel Classification Layer | 224×224×2×1 | 0
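
The layer inventories in Tables A3–A5 follow the layer naming and the SSCB activation format produced by MATLAB's Deep Learning Toolbox, which suggests the networks were assembled with the built-in DeepLabv3+ constructor. Assuming that environment (an inference from the tables, not a statement by the authors), a listing of this kind can be regenerated directly from the layer graphs; the sketch below is illustrative only.

```matlab
% Minimal sketch: regenerate a layer inventory such as Tables A3-A5.
% Assumes MATLAB Deep Learning Toolbox + Computer Vision Toolbox; not the authors' script.
imageSize  = [224 224 3];          % input patch size used throughout the paper
numClasses = 2;                    % ferrite and pearlite
backbones  = ["resnet50" "resnet18" "mobilenetv2"];

for b = backbones
    lgraph = deeplabv3plusLayers(imageSize, numClasses, b);  % backbone + ASPP + decoder
    fprintf("DeepLabv3+/%s: %d layers\n", b, numel(lgraph.Layers));
    for k = 1:numel(lgraph.Layers)
        L = lgraph.Layers(k);
        fprintf("%3d  %-32s  %s\n", k, L.Name, class(L));    % index, name, layer type
    end
    % analyzeNetwork(lgraph)  % interactive view that also reports activations and learnables
end
```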

References

  1. Larmuseau, M.; Sluydts, M.; Theuwissen, K.; Duprez, L.; Dhaene, T.; Cottenier, S. Race against the Machine: Can Deep Learning Recognize Microstructures as Well as the Trained Human Eye? Scr. Mater. 2021, 193, 33–37. [Google Scholar] [CrossRef]
  2. DeCost, B.L.; Francis, T.; Holm, E.A. Exploring the Microstructure Manifold: Image Texture Representations Applied to Ultrahigh Carbon Steel Microstructures. Acta Mater. 2017, 133, 30–40. [Google Scholar] [CrossRef]
  3. Gupta, S.; Banerjee, A.; Sarkar, J.; Kundu, M.; Sinha, S.K.; Bandyopadhyay, N.R.; Ganguly, S. Modelling the Steel Microstructure Knowledge for In-Silico Recognition of Phases Using Machine Learning. Mater. Chem. Phys. 2020, 252, 123286. [Google Scholar] [CrossRef]
  4. Wang, J.; Fa, Y.; Tian, Y.; Yu, X. A Machine-Learning Approach to Predict Creep Properties of Cr–Mo Steel with Time-Temperature Parameters. J. Mater. Res. Technol. 2021, 13, 635–650. [Google Scholar] [CrossRef]
  5. Yucel, B.; Yucel, S.; Ray, A.; Duprez, L.; Kalidindi, S.R. Mining the Correlations between Optical Micrographs and Mechanical Properties of Cold-Rolled HSLA Steels Using Machine Learning Approaches. Integr. Mater. Manuf. Innov. 2020, 9, 240–256. [Google Scholar] [CrossRef]
  6. Wang, Z.-L.; Adachi, Y. Property Prediction and Properties-to-Microstructure Inverse Analysis of Steels by a Machine-Learning Approach. Mater. Sci. Eng. A Struct. Mater. 2019, 744, 661–670. [Google Scholar] [CrossRef]
  7. Larmuseau, M.; Theuwissen, K.; Lejaeghere, K.; Duprez, L.; Dhaene, T.; Cottenier, S. Towards Accurate Processing-Structure-Property Links Using Deep Learning. Scr. Mater. 2022, 211, 114478. [Google Scholar] [CrossRef]
  8. Muñoz-Rodenas, J.; García-Sevilla, F.; Coello-Sobrino, J.; Martínez-Martínez, A.; Miguel-Eguía, V. Effectiveness of Machine-Learning and Deep-Learning Strategies for the Classification of Heat Treatments Applied to Low-Carbon Steels Based on Microstructural Analysis. Appl. Sci. 2023, 13, 3479. [Google Scholar] [CrossRef]
  9. Luengo, J.; Moreno, R.; Sevillano, I.; Charte, D.; Peláez, A.; Fernández, M.; Herrera, F. A tutorial on the segmentation of metallographic images: Taxonomy, new MetalDAM dataset, deep learning-based ensemble model, experimental analysis and challenges. Inf. Fusion 2022, 78, 232–253. [Google Scholar] [CrossRef]
  10. Bulgarevich, D.; Tsukamoto, S.; Kasuya, T.; Demura, M.; Watanabe, M. Pattern recognition with machine learning on optical microscopy images of typical metallurgical microstructures. Sci. Rep. 2018, 8, 2078. [Google Scholar] [CrossRef]
  11. Bachmann, B.; Müller, M.; Britz, D.; Durmaz, A.; Ackermann, M.; Shchyglo, O.; Staudt, T.; Mücklich, F. Efficient reconstruction of prior austenite grains in steel from etched light optical micrographs using deep learning and annotations from correlative microscopy. Front. Mater. 2022, 9, 1033505. [Google Scholar] [CrossRef]
  12. Han, Y.; Li, R.; Yang, S.; Chen, Q.; Wang, B.; Liu, Y. Center-environment feature models for materials image segmentation based on machine learning. Sci. Rep. 2022, 12, 12960. [Google Scholar] [CrossRef]
  13. Kim, H.; Inoue, J.; Kasuya, T. Unsupervised microstructure segmentation by mimicking metallurgists’ approach to pattern recognition. Sci. Rep. 2020, 10, 17835. [Google Scholar] [CrossRef]
  14. Breumier, S.; Martinez, T.; Frincu, B.; Gey, N.; Couturier, A.; Loukachenko, N.; Aba-perea, P.E.; Germain, L. Leveraging EBSD data by deep learning for bainite, ferrite and martensite segmentation. Mater. Charact. 2022, 186, 111805. [Google Scholar] [CrossRef]
  15. Chaurasia, N.; Jha, S.K.; Sangal, S. A Novel Training Methodology for Phase Segmentation of Steel Microstructures Using a Deep Learning Algorithm. Materialia 2023, 30, 101803. [Google Scholar] [CrossRef]
  16. Liu, J.; Cao, G.; Wang, H.; Cui, C.; Liu, Z. Development of Intelligent Methodologies Perceiving Microstructure and Mechanical Properties of Hot Rolled Steels. Measurement 2023, 221, 113526. [Google Scholar] [CrossRef]
  17. Azimi, S.M.; Britz, D.; Engstler, M.; Fritz, M.; Mücklich, F. Advanced Steel Microstructural Classification by Deep Learning Methods. Sci. Rep. 2018, 8, 2128. [Google Scholar] [CrossRef]
  18. Martinez Ostormujof, T.; Purushottam Raj Purohit, R.R.P.; Breumier, S.; Gey, N.; Salib, M.; Germain, L. Deep Learning for Automated Phase Segmentation in EBSD Maps. A Case Study in Dual Phase Steel Microstructures. Mater. Charact. 2022, 184, 111638. [Google Scholar] [CrossRef]
  19. Xie, L.; Li, W.; Fan, L.; Zhou, M. Automatic Identification of the Multiphase Microstructures of Steels Based on ASPP-FCN. Steel Res. Int. 2023, 94, 202200204. [Google Scholar] [CrossRef]
  20. Ma, X.; Yu, Y. Training Tricks for Steel Microstructure Segmentation with Deep Learning. Processes 2023, 11, 3298. [Google Scholar] [CrossRef]
  21. Bihani, A.; Daigle, H.; Santos, J.E.; Landry, C.; Prodanović, M.; Milliken, K. MudrockNet: Semantic Segmentation of Mudrock SEM Images through Deep Learning. Comput. Geosci. 2022, 158, 104952. [Google Scholar] [CrossRef]
  22. Arganda-Carreras, I.; Kaynig, V.; Rueden, C.; Eliceiri, K.W.; Schindelin, J.; Cardona, A.; Seung, H.S. Trainable Weka Segmentation: A machine learning tool for microscopy pixel classification. Bioinformatics 2017, 33, 2424–2426. [Google Scholar] [CrossRef] [PubMed]
  23. Somasundaram, E.; Kaufman, R.; Brady, S. Advancements in Automated Tissue Segmentation Pipeline for Contrast-Enhanced CT Scans of Adult and Pediatric Patients. In Proceedings of the SPIE Medical Imaging, Orlando, FL, USA, 13–16 February 2017; Armato, S.G., Petrick, N.A., Eds.; SPIE: Bellingham, WA, USA, 2017. [Google Scholar]
  24. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2015, Munich, Germany, 5–9 October 2015; Volume 9351, pp. 234–241. [Google Scholar]
  25. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
  26. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Springer International Publishing: Cham, Switzerland, 2018; pp. 833–851, ISBN 9783030012335. [Google Scholar]
  27. Csurka, G.; Larlus, D.; Perronnin, F. What is a good evaluation measure for semantic segmentation? In Proceedings of the British Machine Vision Conference, Bristol, UK, 9–13 September 2013; pp. 32.1–32.11. [Google Scholar]
  28. Swain, B.R.; Cho, D.; Park, J.; Roh, J.-S.; Ko, J. Complex-Phase Steel Microstructure Segmentation Using UNet: Analysis across Different Magnifications and Steel Types. Materials 2023, 16, 7254. [Google Scholar] [CrossRef] [PubMed]
  29. Han, Y.; Li, R.; Wang, B.; Ruan, L.; Chen, Q. A Pseudo-Labeling Based Weakly Supervised Segmentation Method for Few-Shot Texture Images. Expert Syst. Appl. 2024, 238, 122110. [Google Scholar] [CrossRef]
  30. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep Into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
  31. Ajioka, F.; Wang, Z.-L.; Ogawa, T.; Adachi, Y. Development of High Accuracy Segmentation Model for Microstructure of Steel by Deep Learning. ISIJ Int. 2020, 60, 954–959. [Google Scholar] [CrossRef]
  32. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556v6. [Google Scholar]
  33. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. IJCV 2015, 115, 211–252. [Google Scholar] [CrossRef]
  34. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778. [Google Scholar]
  35. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4510–4520. [Google Scholar]
Figure 1. Samples of C45E steel in the annealed (a,b) and normalized (c) states; reagent: Nital-1.
Figure 2. Microconstituents of the C45E steel (reagent: Nital-1). The red contours correspond to ferrite (a) and pearlite (b) areas.
Figure 3. Cropped and rotated sample images and masks with a resolution of 224 × 224 pixels.
Figure 4. DeepLabv3+/ResNet50 segmentation network architecture (adapted from [26]).
Figure 5. Training progress of U-Net: (a) training and validation accuracy; (b) training and validation loss.
Figure 6. Training progress of SegNet: (a) training and validation accuracy; (b) training and validation loss.
Figure 7. Training progress of DeepLabv3+/ResNet50: (a) training and validation accuracy; (b) training and validation loss.
Figure 8. Training progress of DeepLabv3+/ResNet18: (a) training and validation accuracy; (b) training and validation loss.
Figure 9. Training progress of DeepLabv3+/MobileNetv2: (a) training and validation accuracy; (b) training and validation loss.
Figure 10. Confusion matrices: (a) U-Net; (b) SegNet; (c) DeepLabv3+/ResNet50; (d) DeepLabv3+/ResNet18; (e) DeepLabv3+/MobileNetv2.
Figure 11. Segmented samples. A and B correspond to two randomly selected samples.
Figure 12. Detail of the error in mask production during preprocessing. (a) Image from test dataset; (b) mask. The red box indicates the lack of ferrite in the mask.
Figure 13. Segmentation of a test sample: (a) test image; (b) mask of the sample; (c) U-Net; (d) DeepLabv3+/ResNet50; (e) DeepLabv3+/ResNet18; (f) DeepLabv3+/MobileNetv2.
Table 1. Chemical composition (weight %) of low-carbon steel samples.
Steel | C | Si | Mn | P | S | Cr | Mo | Ni | Cu
C45E | 0.45 | 0.25 | 0.65 | 0.025 | 0.035 | 0.40 | 0.10 | 0.40 | 0.30
Table 2. Training parameters and network information.
Network | Optimizer | Learning Rate | Max Epochs | Batch Size | Trainable Parameters | Layers
U-Net | Adam | 0.001 | 3 | 32 | 7,697,410 | 46
SegNet | | | | 16 | 29,444,166 | 91
DeepLabv3+ (Resnet50) | | | | 32 | 43,980,180 | 206
DeepLabv3+ (Resnet18) | | | | | 20,607,636 | 100
DeepLabv3+ (MobileNet) | | | | | 6,784,276 | 186
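
The architectures and sizes listed in Table 2 can be instantiated with the standard constructors of the MATLAB toolboxes assumed above; the following sketch is illustrative, with hypothetical folder names, label IDs, and datastore variables. The U-Net settings (encoder depth 3, 64 first-stage filters) are our reconstruction, chosen because they are consistent with the 46 layers and 7,697,410 learnables reported in the table, not a configuration quoted from the authors.

```matlab
% Sketch of the data pipeline and the five networks compared in Table 2 (illustrative).
imageSize  = [224 224 3];
classNames = ["ferrite" "pearlite"];
labelIDs   = [0 255];                                  % hypothetical mask encoding

imdsTrain = imageDatastore("dataset/train/images");    % 224x224 RGB patches
pxdsTrain = pixelLabelDatastore("dataset/train/masks", classNames, labelIDs);
dsTrain   = combine(imdsTrain, pxdsTrain);             % paired image/label datastore

lgraphUnet    = unetLayers(imageSize, 2, "EncoderDepth", 3, "NumFirstEncoderFilters", 64);
lgraphSegnet  = segnetLayers(imageSize, 2, "vgg16");
lgraphDlRes50 = deeplabv3plusLayers(imageSize, 2, "resnet50");
lgraphDlRes18 = deeplabv3plusLayers(imageSize, 2, "resnet18");
lgraphDlMobv2 = deeplabv3plusLayers(imageSize, 2, "mobilenetv2");
```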
Table 3. Training results (LR = 0.001) (bold numbers represent the maximum values).
Model | Training Accuracy (%) | Training Loss | Final Validation Accuracy (%) | Final Validation Loss | Output Network Iteration | Time Elapsed (hh:mm:ss)
U-Net | 95.874 | 0.128 | 96.553 | 0.095 | 320 | 00:06:50
DeepLabv3+resn50 | 97.185 | 0.068 | 97.485 | 0.060 | 480 | 00:08:14
DeepLabv3+resn18 | 97.229 | 0.068 | 97.177 | 0.071 | 480 | 00:04:49
DeepLabv3+mobn | 97.379 | 0.063 | 97.969 | 0.050 | 480 | 00:07:41
SegNet | 95.359 | 0.236 | 96.773 | 0.169 | 963 | 00:20:23
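
The quantities in Table 3 (final validation accuracy and loss, the iteration at which the output network was taken, and the elapsed time) are by-products of the training run itself, and the curves in Figures 5–9 are the corresponding per-iteration histories. A hedged sketch, continuing the hypothetical variables above and assuming a held-out validation datastore dsVal, is:

```matlab
% Sketch of one training run behind Table 3 and Figures 5-9 (illustrative, not the authors' script).
opts = trainingOptions("adam", ...
    "InitialLearnRate", 1e-3, ...          % LR = 0.001 for every model in Table 3
    "MaxEpochs", 3, ...
    "MiniBatchSize", 32, ...               % 16 was used for SegNet (Table 2)
    "ValidationData", dsVal, ...           % hypothetical validation datastore
    "Shuffle", "every-epoch", ...
    "Plots", "training-progress");         % produces curves like those in Figures 5-9

tic
[net, info] = trainNetwork(dsTrain, lgraphDlMobv2, opts);
fprintf("Time elapsed: %.0f s\n", toc);    % "Time Elapsed" column

% Per-iteration histories behind the plots; the last non-NaN validation entries correspond
% to the "Final Validation Accuracy" and "Final Validation Loss" columns.
valAcc  = info.ValidationAccuracy;
valLoss = info.ValidationLoss;
```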
Table 4. Test image metrics (bold numbers represent the maximum values).
Model | Global Accuracy | Mean Accuracy | Mean IoU | Weighted IoU | Mean BF Score
U-Net | 0.9667 | 0.9551 | 0.9202 | 0.9359 | 0.8578
DeepLabv3+ResNet50 | 0.9757 | 0.9722 | 0.9418 | 0.9529 | 0.8798
DeepLabv3+ResNet18 | 0.9725 | 0.9717 | 0.9349 | 0.9472 | 0.8471
DeepLabv3+MobNetv2 | 0.9802 | 0.9743 | 0.9521 | 0.9614 | 0.9149
SegNet | 0.9675 | 0.9596 | 0.9229 | 0.9377 | 0.8127
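
The columns of Table 4 coincide with the dataset-level metrics returned by the evaluateSemanticSegmentation function of the toolbox assumed above (global accuracy, mean accuracy, mean IoU, weighted IoU, mean BF score), and the confusion matrices of Figure 10 come from the same evaluation. A minimal sketch, with imdsTest, pxdsTest, and net as hypothetical names for the test datastores and a trained network:

```matlab
% Sketch of the test evaluation behind Table 4 and Figure 10 (illustrative).
pxdsResults = semanticseg(imdsTest, net, ...
    "MiniBatchSize", 16, "WriteLocation", tempdir);    % predicted label images

metrics = evaluateSemanticSegmentation(pxdsResults, pxdsTest);

disp(metrics.DataSetMetrics)    % GlobalAccuracy, MeanAccuracy, MeanIoU, WeightedIoU, MeanBFScore
disp(metrics.ClassMetrics)      % per-class accuracy, IoU, and BF score
disp(metrics.ConfusionMatrix)   % counts behind the confusion matrices in Figure 10
```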