Microstructural Classification of Bainitic Subclasses in Low-Carbon Multi-Phase Steels Using Machine Learning Techniques

Abstract: With its excellent property combinations and the ability to specifically adjust tailor-made microstructures, steel is still the world's most important engineering and construction material. To fulfill ever-increasing demands and tighter tolerances in today's steel industry, steel research remains indispensable. Continuous material development leads to increasingly complex microstructures, which is especially true for steel designs that include bainitic structures. This poses new challenges for the classification and quantification of these microstructures. Machine learning (ML) based microstructure classification offers considerable potential in this context. This paper is concerned with the automated, objective, and reproducible classification of the carbon-rich second phase objects in multi-phase steels by using machine learning techniques. For successful applications of ML-based classifications, a holistic approach combining computer science expertise and material science domain knowledge is necessary. Seven microstructure classes are considered: pearlite, martensite, and the bainitic subclasses degenerate pearlite, debris of cementite, incomplete transformation product, and upper and lower bainite, which can all be present simultaneously in one micrograph. Based on SEM images, textural features (Haralick parameters and local binary pattern) and morphological parameters are calculated and classified with a support vector machine. Of all second phase objects, 82.9% are classified correctly. Regarding the total area of these objects, 89.2% are classified correctly. The reported classification can be the basis for an improved, sophisticated microstructure quantification, enabling process-microstructure-property correlations to be established and thereby forming the backbone of further, microstructure-centered material development.


Introduction
Due to its excellent property combinations and the ability to specifically adjust tailor-made microstructures, steel is still the world's most important engineering and construction material and is omnipresent in every aspect of our lives. It can also be recycled over and over again without loss of properties [1]. In addition to the variation in chemical composition, steel owes its tremendous variety of property combinations to the large spectrum of process routes and heat treatments. Steel research is still indispensable, continuously leading to further developments and improvements. There are more than 3500 steel grades, and 75% of modern steel grades have been developed in the last 20 years [1]. One of many consequences is that the microstructures have constantly evolved and become finer and more complex, thus requiring advanced characterization and classification approaches. This is especially true for steel designs that include bainitic microstructures.
For a reliable and reproducible characterization of complex microstructures, machine learning (ML) based microstructure classification offers considerable potential. Prominent examples of ML classifications of steel microstructures include Gola et al. [2,3], who used a combination of morphological and textural parameters with a support vector machine (SVM) to classify the carbon-rich second phase of two-phase steels into pearlite, bainite, and martensite. Azimi et al. [4] applied deep learning (DL) to the same dataset to classify pearlite, bainite, martensite, and tempered martensite. DL was also used by DeCost et al. [5] for the classification of ultrahigh carbon steel microstructures. General overviews of the spectrum of ML applications in microstructure research can be found in [6,7].
In this context, ML also offers promising opportunities for the classification of the different subclasses of the steel microstructure bainite. Bainite is a typical constituent of modern high strength steels, notably low-carbon and low-alloy steels, which combine high strength and high toughness, making these types of steel interesting for many applications. To adjust the desired strength or toughness of these steels, it is crucial to know and understand what types of bainite are present, depending on chemical composition and processing parameters. An ML classification of bainite subclasses can be the basis for a sophisticated microstructure quantification, enabling process-microstructure-property correlations to be established. Thereby, it can form the backbone for further microstructure-centered materials development, which is needed to fulfill the increasing demands and tighter tolerances in today's steel industry.
The characterization or classification of bainite, however, is a difficult task, due to the variety and amount of the phases involved as well as the fineness and complexity of the structures. The continuous advancement of alloying concepts and processing routes has led to more and more diversity in bainitic structures, so that the simple first classification schemes, such as upper and lower bainite, are no longer sufficient. In this context, the definition of classes and the assignment of the ground truth for an ML classification must be discussed. It should be noted that, especially for complex microstructures, ML cannot be applied as a panacea without precisely grasping the complex material-specific questions; instead, special attention must be paid when assigning the ground truth for the ML model [8]. The diversity in bainitic structures can cause ambiguous interpretations and lead to a lack of consensus among human experts in labeling and classifying them. There is also no consistent nomenclature to describe bainitic microstructures [9,10], and many different classification schemes can be found in the literature. Existing schemes are usually based on the description of morphologies and arrangement of the ferritic and the carbon-rich phases. The first concept of classification schemes provides a description of the bainite type in one integral expression, e.g., [11][12][13][14][15]. The second concept describes the ferritic and the carbon-rich phase separately, e.g., [16][17][18][19].
Approaches for a more objective ground truth assignment for ML segmentation or classification include Shen et al. [20], who use electron backscatter diffraction (EBSD) to generate annotations for DL segmentation of steel microstructures. Müller et al. [8] propose the use of EBSD, reference samples, and unsupervised learning as supporting methods for assigning the ground truth, demonstrated on a bainite case study. Given the above-mentioned challenges in dealing with bainite, it is not surprising that only a few approaches to the automated classification of steel microstructures, including simultaneously present bainite subclasses, are found in the literature. Although ML approaches for microstructure classification were applied by Gola et al. [2,3] and Azimi et al. [4], all structures that were neither pearlite nor martensite were labeled as bainite; consequently, bainitic subclasses were not considered. Müller et al. [21] employed textural parameters combined with ML to classify pearlite, martensite, and four bainite subclasses in specifically produced reference samples. Textural parameters and ML were also used by Tsutsui et al. [22] for classifying samples with bainite and martensite. Non-ML based approaches for bainite classifications include Zajac et al. [15,23], who utilized misorientation angle distribution from EBSD measurements to differentiate granular, upper, and lower bainite. Ackermann et al. [24] applied correlative characterization (electron probe microanalysis, EBSD, and nanohardness) to classify low-, medium-, and high-temperature bainite morphologies. A combination of EBSD and ML is used by Tsutsui et al. [24], who utilize misorientation parameters and variant pairs from EBSD to distinguish bainite formed at high and low temperatures, as well as martensite and bainite-martensite mixtures.
The present paper follows the approach applied by Gola et al. [3], i.e., the machine learning classification of the carbon-rich second phase objects in multi-phase steels, based on scanning electron microscope (SEM) images. Here, bainite subclasses are additionally considered, resulting in seven classes for the ML classification: pearlite, martensite, and five bainite subclasses. The task is the automated, objective, and reproducible classification of the carbon-rich second phase objects in SEM micrographs, as illustrated in Figure 1. Several classes can be present simultaneously in one micrograph. This classification in turn will enable a precise calculation of phase fractions and microstructural quantification, which again is the basis for establishing processing-microstructure-property correlations and further materials development.
First, dataset generation, ground truth assignment, and ML concepts will be described. Assigning the ground truth for the ML classification proved to be challenging. The investigated industrial samples do not show many textbook-like structures, as complex alloying concepts and industrial thermomechanical processing lead to structures that are not as clear and distinct as schematics reported in the literature. To achieve a well-founded and objective ground truth, round robin tests with a group of experts as well as supporting methods, such as the use of reference samples and correlative EBSD measurements, as described in previous works [8,25], were taken into account. In this context, it must be emphasized that the assignment of ground truth or available data and ML algorithms should not be treated in isolation, but rather as part of a holistic approach to building the ML model, starting with the selection of appropriate samples and achieving reproducible sample contrasting and suitable imaging techniques [8].
Regarding ML approaches, different classification models and strategies will be tested and discussed. Misclassifications of the model will also be evaluated. Considering the above-mentioned challenges regarding the characterization and classification of bainite, i.e., ambiguous interpretations by different experts, selection of the classification scheme, definition of classes and class boundaries, and assignment of the ground truth, a perfect classification result cannot be expected. Instead, an "inherent uncertainty" of a bainite classification can be assumed. Approaches on how to handle this uncertainty and how it influences the final phase fraction result will be discussed.

Data Set Generation
This study was conducted with the same images and dataset that were used in [3] for the classification of pearlite, bainite, and martensite. Sample materials are low-carbon multi-phase steels from industrial production, consisting of objects from a carbon-rich second phase in a matrix of polygonal ferrite. The carbon-rich second phase can be pearlite, martensite, or different bainite types. In one micrograph, several classes of the carbon-rich second phase can be present simultaneously (Figure 1). By controlling the type of second phases, these steels have a broad range of properties and applications. Typical applications include pressure vessels or linepipes. Both the chemical composition and the processing steps of the steels play only a secondary role in the classification, as it should be based on the microstructure itself without possible bias from incorporating chemistry or processing. Additionally, exact chemical compositions cannot be reported, as they are part of an industrial collaboration. For sample preparation, contrasting, and image acquisition, the reader is referred to the previous work by Gola et al. [2,3].
For classification, only the carbon-rich second phase objects are of interest, not the ferritic matrix. The first step of the feature extraction process is the definition and extraction of the second phase objects. The light microscopic (LM) image is first segmented by thresholding. Short etching times with modified Beraha's reagent lead to good contrasting of the second phase areas, while ferrite grain boundaries are only slightly attacked, making it easy to segment the second phase by simple thresholding [26]. This segmented LM image is then applied as a binary mask to the SEM image, removing the ferritic matrix, which is not of interest for further analysis, and enabling the definition and extraction of individual second phase objects from the SEM image (Figure 2).
For each individual second phase object, three parameter groups are extracted, all of which are based on the substructure inside the second phase objects: (1) Haralick parameters and (2) local binary pattern, both representing the image texture, and (3) morphological characteristics for all substructure particles inside the object (Figure 2).
The texture parameters developed by Haralick et al. [27], in essence, describe how often a gray value in the image occurs in a given spatial relationship to another gray value. For this purpose, the gray-level co-occurrence matrix (GLCM) of the image is computed. From the GLCM, several parameters can be calculated that represent the image texture. Here, mean values and amplitudes for each parameter are calculated based on Webel et al. [28], resulting in 38 features for this group.
Local binary pattern (LBP) is a texture descriptor originally proposed by Ojala et al. [29]. LBP features encode the neighboring context of each pixel into a histogram of the entire image, which is used as the final feature descriptor. LBP can be calculated for different numbers of neighboring pixels (N) and distances of the neighboring pixels (R). Here, a multi-scale LBP combining different R-N settings (1-8, 2.4-8, 4.2-16, and 6.2-16), yielding 64 features, is used [21].
For the calculation of morphological parameters of the substructure, the second phase object is segmented by simple thresholding. For all substructure particles inside this second phase object, standard morphological parameters (equivalent diameter, maximum Feret diameter, aspect ratio, etc.) are computed from this binary image. For each parameter, the values of all single particles are combined into the mean value and standard deviation of their logarithmic distribution. Additionally, the total area of the substructure, as well as the substructure density (substructure area divided by object area), are calculated. This parameter group has 46 features.
Compared to a previous study [3], LBP were added to the dataset, as they showed promising potential for microstructural classification [21]. However, morphological characteristics of the second phase objects themselves were removed, as they are the least stable parameter class regarding processing conditions and sample orientation during image acquisition. All processes of object and feature extraction were performed using MATLAB (R2020a, MathWorks, Natick, MA, USA).
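The aggregation of per-particle morphology into object-level features can be sketched as follows. This is a minimal illustration in Python, not the MATLAB code used in the study. The function names, the use of the natural logarithm, and the sample standard deviation are assumptions, since the text only states that the mean and standard deviation of the logarithmic distribution are used. The particle areas are made-up values.

```python
import math
from statistics import mean, stdev

def aggregate_particles(values):
    """Mean and standard deviation of the log-transformed particle values."""
    logs = [math.log(v) for v in values if v > 0]
    if not logs:
        raise ValueError("need at least one positive value")
    # A single particle has no spread; report 0.0 instead of failing.
    spread = stdev(logs) if len(logs) > 1 else 0.0
    return mean(logs), spread

def substructure_density(particle_areas, object_area):
    """Substructure area divided by object area."""
    return sum(particle_areas) / object_area

# Hypothetical particle areas (in pixels) inside one second phase object.
areas = [12.0, 30.0, 18.0, 75.0]
log_mean, log_std = aggregate_particles(areas)
density = substructure_density(areas, object_area=540.0)
```

The same aggregation would be repeated for every morphological parameter (equivalent diameter, aspect ratio, and so on), giving two features per parameter plus the total area and density.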

Ground Truth Assignment
To label the bainitic microstructures present in the samples, the classification scheme suggested by Zajac et al. [15] was chosen, as it is the most convenient to use in common parlance and fits best with the present bainitic structures. In total, seven classes are considered: pearlite, degenerate pearlite, debris of cementite, incomplete transformation product, upper bainite, lower bainite, and martensite, as shown in Figure 3.
Pearlite (P) shows pronounced, regular, and mostly continuous lamellar structures. Compared to pearlite, degenerate pearlite (DP) exhibits incomplete or not very pronounced, continuous lamellar structures. Debris of cementite (DC) consists of cementite particles at object boundaries or inside the objects. It does not exhibit lamellar or lath structures. Incomplete transformation products (ITP) are "composed of fragmented debris of ferrite, cementite, and M/A" [15], forming when austenite decomposition ceases due to alloying elements that decrease the ferrite growth rates, such that the "residual austenite transforms to an unusual microstructure" [15]. The key for assignment of the class ITP is the presence of untransformed austenite or M/As. Upper bainite (UB) consists of lath-like ferrite with cementite at the lath boundaries, while lower bainite (LB) consists of lath-like ferrite with cementite precipitates inside the ferrite laths. Objects with characteristics of more than one class were not labeled. Other bainitic structures, e.g., degenerate upper bainite, or isolated M/A particles that are not part of an ITP object, were not present in the investigated samples.
Assigning the ground truth proved to be challenging. The investigated industrial samples do not show many textbook-like structures, as complex alloying concepts and industrial thermomechanical processing lead to structures that are not as clear and distinct as schematics reported in the literature. In general, for complex microstructures such as bainite, it can be dangerous to rely only on the visual appearance of the microstructures to the expert eye, as this can easily introduce a subjective and non-reproducible component. Therefore, supporting methods should be applied. Performing a round robin test with a group of experts is a simple and effective means of obtaining a more objective ground truth. By doing this with a group of eight experts, a consensus on how to assign the ground truth for the present microstructures could be reached. Additionally, supporting methods as described in [8,25], e.g., the use of reference samples and correlative EBSD measurements, were used. The knowledge and experience derived from these help in obtaining a ground truth that is as well-founded, objective, and reproducible as possible for the complex bainite classification task at hand. Table 1 shows the summary of the final annotated dataset with classes and numbers of objects per class.

Machine Learning Classification
First, correlated features (R² > 0.90) are removed. This reduces the number of features from 148 to 72 (Haralick: 19, LBP: 32, and morphology: 21). Additionally, the data were standardized so that all features have the same data range. In order to assess the generalization of the trained ML model and to be able to directly compare different classification models using the same data, the data were randomly split into a training set (80%) and a test set (20%). While splitting the data, the class distribution in training and test set was kept the same. Different classification strategies are tested. On the one hand, all seven classes are classified at once. On the other hand, hierarchical classifications are tested that first distinguish the easier main classes (e.g., pearlite (P) vs. bainite (B) vs. martensite (M)) before bainite subclasses are taken into account. Considering the complexity of the structures of the seven classes, it could be difficult for a machine learning algorithm to distinguish all of them at once, which is why a hierarchical classification appears promising. The different classification strategies are summarized in Table 2. For ML classification, a support vector machine (SVM) was used. An SVM classifies data by finding the best hyperplane that separates the data points of one class from the data points of another class. The implementation was done using the MATLAB classification learner app, which allows automated training of different SVMs to find the best kernel and parameter settings.
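The three preprocessing steps described above (correlation filtering, standardization, and a class-preserving split) can be sketched in plain Python. This is an illustrative sketch, not the MATLAB workflow of the study. Several choices are assumptions: which feature of a correlated pair is dropped (here the later one), z-score scaling as the form of standardization, and all function and feature names.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation coefficient (assumes neither feature is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def drop_correlated(columns, threshold=0.90):
    """Keep a feature only if its R^2 with every already kept feature is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(pearson_r(values, columns[k]) ** 2 <= threshold for k in kept):
            kept.append(name)
    return kept

def standardize(values):
    """Z-score scaling: zero mean and unit standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Random split that keeps the class proportions in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Hypothetical feature table: "contrast_scaled" is nearly a multiple of "contrast".
features = {
    "contrast": [1.0, 2.0, 3.0, 4.0, 5.0],
    "contrast_scaled": [2.1, 4.0, 6.2, 7.9, 10.0],
    "density": [0.3, 0.1, 0.4, 0.1, 0.5],
}
kept = drop_correlated(features)
scaled = {name: standardize(features[name]) for name in kept}

labels = ["P"] * 10 + ["UB"] * 5
train_idx, test_idx = stratified_split(labels)
```

In this toy table the redundant feature is removed and two features remain; the split assigns two pearlite objects and one upper bainite object to the test set, mirroring the 2:1 class ratio of the full list.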
As seen in Table 1, the final dataset is highly unbalanced. Unbalanced data are a common and long-known problem for machine learning classifications. Using unbalanced classes to build an ML model can introduce a bias towards classes with more data [30]. Frequently, when classifying images, data augmentation is used to balance the classes. However, for this task, typical data augmentation techniques for increasing the number of images or data cannot be applied, because the extracted features or the feature extraction process, respectively, are either invariant to these augmentations (e.g., rotating, flipping) or the microstructure characteristics would be falsified (e.g., cropping, scaling, distorting). Nevertheless, different strategies to counter unbalanced data exist [31], of which the following were tested: (1) introducing misclassification costs; (2) under-sampling, where for every class only the number of objects of the smallest class is used; (3) over-sampling, where data points from under-represented classes can be used multiple times (here, they are used twice and combined with an under-sampling of still over-represented classes); and (4) creating synthetic samples using the Synthetic Minority Oversampling Technique (SMOTE) [32], where the number of objects for every class is increased to the number of the biggest class. Preliminary tests showed no negative effects of using unbalanced data, i.e., no biases of the classifier. This is in agreement with [33], who suggest that SVMs are less prone to class imbalance problems than other classification algorithms. In fact, creating a balanced subset of the data by under-sampling and over-sampling yielded worse classification results than using the whole unbalanced data. Applying SMOTE improved the classification only marginally. Therefore, for simplicity, only unbalanced data will be used for testing different classification strategies.
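The idea behind strategy (4) can be illustrated with a simplified SMOTE-style sketch: a synthetic point is placed on the line segment between a minority sample and one of its nearest neighbors from the same class. This is not the implementation used in the study; the function name `smote_like`, the neighbor count `k=2`, the fixed seed, and the toy data are assumptions chosen for illustration.

```python
import random

def smote_like(samples, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a sample
    and one of its k nearest neighbours of the same class."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Nearest neighbours by squared Euclidean distance, excluding the base itself.
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )[:k]
        partner = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> partner
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, partner)))
    return synthetic

# Hypothetical two-feature minority class, grown to the size of the biggest class.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority_size = 10
new_points = smote_like(minority, n_new=majority_size - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point is a convex combination of two real samples, it stays inside the feature range already covered by the class, which is what distinguishes this approach from image-level augmentation.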
For the best model from the classification strategy variations, a feature ranking and feature selection based on the minimum redundancy maximum relevance (MRMR) algorithm [34] is performed. Additionally, a hyperparameter optimization is done using Bayesian optimization in the MATLAB classification learner app. The overall classification accuracy is not the best-suited performance metric when classes are unbalanced, because the impact of the least represented examples is reduced when compared to that of the majority class [35]. Instead, the confusion matrix and metrics derived from it, such as class precisions, class recalls, or F1 scores, are better suited [36]. Accuracy is the ratio of correctly predicted examples to the total number of examples. Recall is the ratio of true positives to the sum of true positives and false negatives, while precision is the ratio of true positives to the sum of true positives and false positives. The F1 score is defined as two times the product of precision and recall, divided by the sum of precision and recall. The F1 score can be calculated for each class. The overall F1 score is the mean value of the F1 scores of all classes. Here, accuracy and overall F1 scores are reported to assess and compare classification results.
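The metric definitions above translate directly into a short computation on a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes; the three-class matrix is a made-up example and not data from this study.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of objects of true class i predicted as class j.
    Returns accuracy, overall (mean) F1, and a (precision, recall, F1) tuple per class."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    metrics = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # row sum minus diagonal
        fp = sum(confusion[r][c] for r in range(n)) - tp  # column sum minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    accuracy = correct / total
    overall_f1 = sum(m[2] for m in metrics) / n
    return accuracy, overall_f1, metrics

# Hypothetical unbalanced 3-class confusion matrix.
toy = [[8, 2, 0],
       [1, 4, 0],
       [1, 0, 4]]
accuracy, overall_f1, metrics = per_class_metrics(toy)
```

In this toy case the accuracy is 0.8, while the class-wise F1 scores differ (0.8, 8/11, and 8/9), which shows how the per-class view exposes weaker classes that a single accuracy value hides.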

Classification Results
Tables 3-6 show the confusion matrices with the performance metrics precision, recall, accuracy, and F1 score for the four different classification strategies. To allow a direct comparison between the 7-class classification and the hierarchical classifications, the different models of a hierarchical classification are not evaluated individually, but with regard to the final seven classes, on the same test set as the 7-class classification. Classification strategies 1-3 show almost identical classification results (F1 scores of 81.0%, 81.3%, and 81.0%). Only classification strategy 4 shows a slight drop (F1 score of 79.6%). The results suggest that, despite the complexity of the investigated classes, distinguishing all seven classes at once does not impair the classification accuracy.
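How a hierarchical classifier can be evaluated on the final seven classes may be easier to see in code. The sketch below chains two stages so that every object ends up with one of the seven labels, which can then be compared with the flat 7-class prediction on the same test set. The two stage models are hypothetical stand-ins with made-up feature names and thresholds; the actual strategies are those listed in Table 2 and are not reproduced here.

```python
FINAL_CLASSES = {"P", "DP", "DC", "ITP", "UB", "LB", "M"}

def hierarchical_predict(features, main_model, bainite_model):
    """Stage 1 separates P / B / M; only objects predicted as B go to stage 2."""
    main = main_model(features)
    if main != "B":
        return main
    return bainite_model(features)

# Hypothetical stand-ins for trained classifiers (not the models of the study).
def toy_main_model(f):
    if f["lamellar_score"] > 0.8:
        return "P"
    if f["network_score"] > 0.8:
        return "M"
    return "B"

def toy_bainite_model(f):
    return "UB" if f["boundary_carbide_score"] > 0.5 else "LB"

objects = [
    {"lamellar_score": 0.9, "network_score": 0.1, "boundary_carbide_score": 0.2},
    {"lamellar_score": 0.2, "network_score": 0.3, "boundary_carbide_score": 0.7},
    {"lamellar_score": 0.1, "network_score": 0.9, "boundary_carbide_score": 0.4},
]
predictions = [hierarchical_predict(o, toy_main_model, toy_bainite_model) for o in objects]
```

One consequence of this design is that a stage 1 error cannot be corrected in stage 2, so the final seven-class metrics already include errors propagated from the first stage.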

Best Model
For further analysis, the 7-class-classification model is chosen.Despite using many features, the classification model seems to generalize well, as accuracy on the unseen test set is in the same range as the accuracy from the 5-fold cross validation during classifier training.Still, a feature ranking and feature selection is done based on minimum redundancy maximum relevance (MRMR) algorithm.By reducing the number of features from 72 to 40 a slight increase in accuracy and F1 score (81.0 to 81.7%) is achieved.By hyperparameter optimization in the MATLAB classification learner app, the classification could again be slightly improved to 82.9% accuracy with a 82.4% F1 score.The following SVM parameter settings were finally used: quadratic kernel, one vs.one multiclass method and a box constraint level of 1.6298.The resulting confusion matrix is shown in Table 7.The comparatively small differences between different sampling and classification strategies as well as the modest improvements by feature selection and hyperparameter optimization suggest that for classifying the present complex microstructures, the aspects of conventional ML techniques have an overall smaller effect on the classification result than a thoroughly material science-based feature engineering.
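Two of the reported settings, the quadratic kernel and the one-vs-one multiclass method, can be made concrete with a small sketch. The kernel below is the common order-2 polynomial form without kernel scaling, which is an assumption about the exact form used by the MATLAB app. The pairwise decision function is a hypothetical stand-in for the trained binary SVMs.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["P", "DP", "DC", "ITP", "UB", "LB", "M"]

def quadratic_kernel(x, y):
    """Polynomial kernel of order 2 in the common form (1 + x.y)^2."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** 2

def one_vs_one_vote(pairwise_winner):
    """Query one binary decision per class pair; the class with the most wins is predicted."""
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(CLASSES, 2))
    return votes.most_common(1)[0][0], votes

# Seven classes lead to 7 * 6 / 2 = 21 binary learners.
pairs = list(combinations(CLASSES, 2))

# Hypothetical pairwise outcomes for one object: "UB" wins every duel it takes
# part in; otherwise the first class of the pair wins.
prediction, votes = one_vs_one_vote(lambda a, b: "UB" if "UB" in (a, b) else a)
kernel_value = quadratic_kernel((1.0, 2.0), (3.0, 4.0))
```

With seven classes, each binary learner only has to separate two classes at a time, which is one reason a flat seven-class SVM can remain competitive with a hand-built hierarchy.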
Table 8 shows the top 15 features after the MRMR feature ranking.Features from all three parameter groups are represented, justifying that parameters from all three groups are carried.The 40 features from the best model consist of 13 Haralick features, 9 LBP features, and 18 morphological features.Regarding feature types, these are 22 image texture features and 18 morphological features, a 55/45 split, suggesting that both feature types are important, with a certain higher statistical importance of the textural features.Considering the amount of analyzed second phase objects and their variety of structures, it is difficult to correlate the microstructures with the extracted features and discuss their importance for the classification accuracy.Precisely because it is virtually impossible for the human mind to recognize the patterns and relationships in all this data, machine learning algorithms are needed to analyze it and build the classification models.Additionally, image texture parameters can be hard to elucidate.Nonetheless, it is helpful to try to interpret the most important features regarding the microstructure classes, for a better material science-based understanding and evaluation.Still, it should be kept in mind that following remarks are only assumptions and not verifiable conclusions.
Ultimately, the main differences between the seven microstructure classes lie in size, shape, and arrangement of the cementite particles inside the second phase objects.The Haralick image texture parameter contrast is a measure of the local variations in an image [27].Low contrast values mean fewer local variations in the image.This means that if a second phase objects does not have much substructure, i.e., carbide particles, but more ferritic areas, this dark background that only has few local variations can lower the overall contrast value.This can be the case for the classes of debris of cementite or lower bainite.
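The link between sparse substructure and low Haralick contrast can be illustrated with a minimal gray-level co-occurrence computation. This sketch uses a single horizontal pixel offset, four gray levels, and a symmetric, normalized matrix; these settings and the two toy images are assumptions for illustration and do not reproduce the feature extraction of the study.

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Normalized, symmetric gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def haralick_contrast(p):
    """Contrast: sum over all gray level pairs of (i - j)^2 * p(i, j)."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))

# Toy images: a uniform dark area versus alternating dark and bright columns.
uniform = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
striped = [[0, 3, 0, 3], [0, 3, 0, 3], [0, 3, 0, 3]]

contrast_uniform = haralick_contrast(glcm(uniform))
contrast_striped = haralick_contrast(glcm(striped))
```

The uniform image has zero contrast, while the striped image, in which every horizontal neighbor pair differs by three gray levels, reaches a contrast of 9. An object dominated by dark ferritic background behaves more like the first case.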
Morphological parameters of the cementite particles, i.e., typical size and shape characteristics [37,38], are captured in the form of mean values and standard deviations of the logarithmic distribution of all single cementite particles in the second phase object.Standard deviation is particularly interesting because it captures how homogeneous cementite particles are, with regard to a specific morphological feature.Standard deviations of aspect ratio, major axis length, or roundness should be lower for upper bainite (mostly longer cementite precipitates on lath boundaries) than lower bainite (small cementite precipitates inside the laths but also some bigger precipitates on lath or object boundaries).Martensite should also be lower because there are fewer individual particles but more connected components that form a network structure.Mean sphericity and mean axial ratio should be able to capture the average shape of particles, i.e., differences between lamellar structures, such as pearlite or small precipitates, as in debris of cementite or lower bainite.The total substructure area, i.e., the sum of areas of all cementite particles can be sensitive for ITP and small debris of cementite objects, because these smaller particles have less subarea compared to the usually bigger particles like pearlite, upper and lower bainite, or martensite.
Local binary patterns are good at capturing small and fine details of images [39], e.g., edges, corners, spots, etc. The result is in the form of a histogram, in which individual bins can be analyzed and used for comparing and classifying microstructures. By using uniform LBP, the length of the histogram can be reduced and the performance of classifiers using these LBP features can be improved [29,40]. Bin 0 represents bright spots, while bins 1 to 7 correspond to different edges or corners of varying positive and negative curvature [29]. Thus, it is plausible that LBP can capture the differences in size, shape, and arrangement of the cementite particles. All in all, the use and choice of important features seem appropriate.
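The bin interpretation can be illustrated with a minimal rotation-invariant uniform LBP in NumPy. This is a sketch under assumptions (eight neighbors at radius 1, a neighbor counts as 1 when it is at least as bright as the center, non-uniform patterns collected in a tenth bin) and is not the implementation used in this work.

```python
import numpy as np

def lbp_riu2_histogram(image):
    """Normalized histogram of rotation-invariant uniform LBP codes (8 neighbors, radius 1)."""
    img = np.asarray(image, dtype=float)
    center = img[1:-1, 1:-1]
    # Eight neighbors in circular order around each interior pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    bits = np.stack([
        img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx] >= center
        for dy, dx in offsets
    ]).astype(int)
    # Number of 0/1 transitions when walking once around the circle of neighbors.
    transitions = np.abs(bits - np.roll(bits, 1, axis=0)).sum(axis=0)
    ones = bits.sum(axis=0)
    # Uniform patterns (at most two transitions) are binned by their number of ones (0..8);
    # all non-uniform patterns share bin 9.
    codes = np.where(transitions <= 2, ones, 9)
    hist = np.bincount(codes.ravel(), minlength=10).astype(float)
    return hist / hist.sum()

# Toy image: one bright pixel on a dark background (illustrative only).
spot = np.zeros((5, 5))
spot[2, 2] = 1.0
hist_spot = lbp_riu2_histogram(spot)
print(hist_spot)
```

In this toy image the bright pixel falls into bin 0, because none of its neighbors is as bright as the center, while the flat background pixels fall into bin 8.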

Misclassifications
Looking at the F1 scores for each class in the confusion matrix in Table 7, high values are achieved for classifying pearlite, martensite, and upper and lower bainite. Between pearlite and degenerate pearlite there are some misclassifications, which is understandable, as it is not easy to define what is still a regular, mostly continuous lamella (pearlite) and what is already a "degenerate" and incomplete lamella (Figure 4a). This is amplified by the varying appearance of lamellae, depending on their orientation with respect to the sample surface. If regular lamellae are cut at an incline, they can appear somewhat irregular or similar to M/A (red circles in Figure 4b). Misclassifications also occur between martensite and incomplete transformation product. This also seems plausible, since the M/A parts of an ITP (red circle in Figure 4c) can look like martensite. Between upper and lower bainite only two mix-ups are observed. Instead, some mix-ups with debris of cementite are found. Based on their morphology and distribution inside the object, cementite particles in DC can appear similar to upper or lower bainite (Figure 4d). However, because these DC objects do not show any ferritic lath structures, they are not UB or LB. Nevertheless, this can explain the mix-ups.
Overall, most misclassifications occur in the three classes of DP, DC, and ITP. This seems understandable, as those classes, compared to the other classes, have similar appearances and weaker class boundaries. For instance, the transition from DP to DC is smooth and it is hard to define when a cementite particle is still like an irregular lamella or already like debris (Figure 4e,f). ITP can also contain cementite particles that look similar to DC or DP. If the retained austenite or M/A part of the ITP is not that pronounced, it makes sense that it can be misclassified as DC or DP (Figure 4g,h). Based on the described relationships, it seems plausible that these are not only difficult for the expert to assess, but also difficult for an ML algorithm to learn. If the three similar and hard to distinguish classes DP, DC, and ITP are combined into one group of "granular bainitic structures", the classification accuracy increases to 93.2%. This naturally raises the question about the necessity of bainite subclasses that are similar and not easy to distinguish. In general, there is a controversy about bainite classification schemes and subclass definitions, as described in the introduction. However, before a conclusion about the necessity can be drawn, the subclasses must first be captured, analyzed, and used in correlations with mechanical properties to finally determine their actual influence on properties of industrial steel grades. The suggested classification pipeline provides this opportunity.
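The effect of combining DP, DC, and ITP into one group can be reproduced from any confusion matrix by summing the corresponding rows and columns. The sketch below shows this together with per-class F1 scores; the four-class matrix is made up for illustration and does not contain the values of Table 7.

```python
import numpy as np

def accuracy(cm):
    """Overall accuracy: correctly classified objects (diagonal) over all objects."""
    cm = np.asarray(cm, dtype=float)
    return float(np.trace(cm) / cm.sum())

def f1_per_class(cm):
    """Per-class F1 score from a confusion matrix with true classes in rows."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)
    recall = tp / cm.sum(axis=1)
    return 2 * precision * recall / (precision + recall)

def merge_classes(cm, groups):
    """Sum rows and columns of cm so that each group of indices becomes one class."""
    cm = np.asarray(cm, dtype=float)
    merged = np.zeros((len(groups), len(groups)))
    for a, rows in enumerate(groups):
        for b, cols in enumerate(groups):
            merged[a, b] = cm[np.ix_(rows, cols)].sum()
    return merged

# Made-up matrix (class order: P, DP, DC, ITP); not the values of Table 7.
cm = np.array([[50,  3,  0,  0],
               [ 4, 30,  6,  2],
               [ 0,  5, 35,  4],
               [ 0,  3,  5, 28]])

# P stays separate; DP, DC, and ITP form the "granular bainitic structures" group.
merged = merge_classes(cm, groups=[[0], [1, 2, 3]])
print("F1 per class:", np.round(f1_per_class(cm), 3))
print("accuracy before/after merging:", accuracy(cm), accuracy(merged))
```

Mix-ups among the merged classes move onto the diagonal, so the accuracy can only stay equal or increase; whether the coarser classes are still useful must be decided by the property correlations discussed above.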

Phase Fraction Determination
The classification is the basis for computing phase fractions. Thus, it is important to estimate how well the classification accuracy translates to the determined phase fractions. It is important to note that the reported classification results refer to the number of classified objects ("number fraction accuracy"). However, those objects differ in their size. Analyzing misclassifications with regard to the object area shows that most misclassified objects are small. This makes sense, as most misclassifications are in the classes DC, DP, and ITP, which tend to be smaller objects. If the classification result is related to the object area ("area fraction accuracy") instead of the object numbers, the accuracy increases from 82.9% to 89.2%. This suggests that the error in phase fraction determination is reasonably small, and a precise calculation of phase fractions is possible.
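The difference between the two accuracy definitions can be sketched in a few lines. The labels and object areas below are hypothetical and only serve to show that misclassified small objects weigh less in the area-based metric.

```python
import numpy as np

def number_fraction_accuracy(y_true, y_pred):
    """Share of objects whose predicted class matches the ground truth."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def area_fraction_accuracy(y_true, y_pred, areas):
    """Share of the total object area that belongs to correctly classified objects."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    areas = np.asarray(areas, dtype=float)
    return float(areas[y_true == y_pred].sum() / areas.sum())

# Hypothetical objects: the two misclassified ones (DC and ITP) are small.
y_true = ["P", "UB", "LB", "DC", "DP", "ITP"]
y_pred = ["P", "UB", "LB", "DP", "DP", "DC"]
areas = [40.0, 25.0, 20.0, 3.0, 8.0, 4.0]   # object areas in arbitrary units

acc_number = number_fraction_accuracy(y_true, y_pred)
acc_area = area_fraction_accuracy(y_true, y_pred, areas)
print(acc_number, acc_area)
```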

Inherent Uncertainty of Bainite Classifications
Considering the complexity of bainitic structures as well as the challenges during their assessment, 100% classification accuracy seems unrealistic, and an "inherent uncertainty" of any bainite classification should be expected. Reasons for this uncertainty are, in particular, ambiguous interpretations from different experts, choice of classification scheme, and definition of classes and class boundaries. Various bainite classification schemes are proposed in the literature, as explained in the introduction. The scheme that will be chosen for the ML classification workflow should be application-oriented and ready to use immediately in process-microstructure-property correlations. In this work, the scheme suggested by Zajac et al. [15] is used not only because it fits very well with the present bainitic structures, but also because each class is expressed as one integral expression (e.g., debris of cementite, upper or lower bainite) that can easily be plugged into correlations. Other schemes, such as the one suggested by Gerdemann et al. [18], express classes in a code of letters that correlate to the present microstructure constituents. This is more of a pure description of microstructure constituents and hard to use in common parlance or in correlations.
Basically, each classification scheme has strictly defined classes that must be represented during the ground truth assignment and in the ML model. However, class boundaries are rarely explicitly defined. Assigning images to existing, strict classes on the one hand, but without clear class boundaries on the other hand, will result in some uncertainty. Although the ground truth assignment was as objective and well founded as possible, by transferring knowledge from EBSD measurements and reference samples, there is still some remaining uncertainty, as well as some bias, stemming from the choice of classification scheme and classes. Approaches to make class definition and ground truth even more objective will be discussed in the outlook section.
During application of the suggested ML classification, it is important to deal with this uncertainty and to be able to judge the classification quality by deriving confidence metrics. A simple approach is the use of a probabilistic classifier. By interpreting class probabilities (values from 0 to 1) as a confidence of the predicted class, a better judgement of the classification is possible, which is especially important during a serial use in industrial processes. A standard SVM classifier is not probabilistic by itself, but it can be interpreted as a probabilistic classifier by fitting an appropriate score-to-posterior-probability transformation function [41]. This transformation function computes the posterior probability that an observation is classified into the positive class. This is done using the MATLAB function "fitSVMPosterior", based on the approach suggested by [41]. Pragmatically speaking, the probabilistic approach allows us to define a threshold for a "minimum classification confidence" (e.g., a value of 0.75 as the class probability for the predicted class), which can be used to filter out objects about which the classifier is "insecure". These objects (e.g., classifications with a class probability lower than 0.75) could then be tagged for assessment by a human expert. Furthermore, it is possible to judge the quality of the whole classification result by using these classification confidences (e.g., the mean of class probabilities of all predictions).
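The thresholding step can be sketched as follows. The sketch assumes that a matrix of class probabilities (one row per object, one column per class) is already available; how the binary posterior probabilities are combined into such a matrix is not specified here, and the probabilities and class abbreviations in the example are hypothetical. Python is used for illustration in place of the MATLAB workflow.

```python
import numpy as np

# Hypothetical class order; abbreviations follow the figure captions.
CLASSES = ["P", "DP", "DC", "ITP", "UB", "LB", "M"]

def review_predictions(probabilities, threshold=0.75):
    """Return predicted class index, its probability, and a flag for expert review."""
    probabilities = np.asarray(probabilities, dtype=float)
    predicted = probabilities.argmax(axis=1)
    confidence = probabilities.max(axis=1)
    needs_review = confidence < threshold   # "insecure" predictions
    return predicted, confidence, needs_review

# Hypothetical class probabilities for three objects (each row sums to 1).
probs = np.array([
    [0.92, 0.05, 0.01, 0.01, 0.00, 0.00, 0.01],
    [0.02, 0.40, 0.38, 0.15, 0.02, 0.02, 0.01],
    [0.00, 0.01, 0.04, 0.05, 0.10, 0.78, 0.02],
])

predicted, confidence, needs_review = review_predictions(probs, threshold=0.75)
for idx, conf, flag in zip(predicted, confidence, needs_review):
    print(CLASSES[idx], round(float(conf), 2), "expert review" if flag else "accepted")
print("mean confidence:", round(float(confidence.mean()), 2))
```

The second object, whose probability mass is split between DP and DC, is tagged for expert assessment, while the mean confidence gives a single number for the quality of the whole classification result.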
The probabilistic approach also allows us to deal with one limitation of the present object-based classification approach. The object-based approach is "metallographically motivated", i.e., it follows the conventional approach of separating foreground from background and then analyzing the individual objects, as is also done in standard particle analysis. Furthermore, it provides the advantage that after classification, extracted features, such as morphological characteristics, e.g., carbide size and shape characteristics for each object, can be used directly in microstructure-property correlations. However, one limitation of this approach is that large second phase objects can be present in the micrograph that contain several grains, and therefore, structures from different bainite classes. These objects would be classified as just one class. Assuming that these larger objects, which contain structures from more than one class, manifest in low confidence predictions, a threshold for minimum classification confidence and a minimum object size can be defined to filter out these objects. In a post-processing step, these objects could then be automatically tiled into sub-images that are again classified in order to capture all structures of the different present classes (Figure 5). To achieve a tiling that is accurate to the shape of the second phase object and to avoid tiles that only contain the black background, a superpixel tiling, based on the MATLAB function superpixels [42,43], is performed. Figure 5 shows a large second phase object that would be classified as lower bainite, but with a low "confidence" (class probability of only 0.58), which is plausible because the object also exhibits structures from upper bainite and ITP. By automatically tiling the image into sub-images, a more accurate and sophisticated classification is achieved: structures from all three present classes are captured and correctly predicted, and instead of assigning the whole object as lower bainite, the object can be quantified as consisting of 68% lower bainite, 18% upper bainite, and 14% ITP.
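The post-processing logic described above can be sketched in two small functions: one selects large, low-confidence objects for tiling, the other aggregates per-tile predictions into area fractions for one object. The superpixel tiling itself is not reproduced; the minimum area value, the tile labels, and the tile areas are hypothetical.

```python
import numpy as np

def select_for_tiling(confidence, area, min_confidence=0.75, min_area=100.0):
    """Flag objects that are both large and predicted with low confidence.

    min_area is a hypothetical value; no specific size threshold is given in the text.
    """
    confidence = np.asarray(confidence, dtype=float)
    area = np.asarray(area, dtype=float)
    return (confidence < min_confidence) & (area >= min_area)

def tile_area_fractions(tile_labels, tile_areas):
    """Area fraction of each class within one object, from per-tile predictions."""
    tile_labels = np.asarray(tile_labels)
    tile_areas = np.asarray(tile_areas, dtype=float)
    total = tile_areas.sum()
    return {str(label): float(tile_areas[tile_labels == label].sum() / total)
            for label in np.unique(tile_labels)}

# Three hypothetical objects: only the second is large and uncertain.
to_tile = select_for_tiling(confidence=[0.95, 0.58, 0.60], area=[300.0, 450.0, 20.0])

# Hypothetical per-tile predictions and tile areas for the selected object.
fractions = tile_area_fractions(
    tile_labels=["LB", "LB", "UB", "ITP", "LB"],
    tile_areas=[30.0, 25.0, 20.0, 10.0, 15.0],
)
print(to_tile.tolist(), fractions)
```

Weighting by tile area rather than tile count matters because superpixel tiles follow the object shape and therefore differ in size.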

Outlook
Bainitic microstructures are a controversial topic. There is no consensus among human experts, neither in the microstructure formation mechanisms nor in labeling and classifying bainitic structures [9,10]. Future work will include correlative characterization combining EBSD, SEM, and LM, as described in Müller et al. [25]. Examples of using EBSD for ML-based microstructure classification can be found in [22,44]. In this correlative approach, EBSD is an ideally complementary information source to LM and SEM, as it is based on measuring crystallographic orientations and does not have the subjective component of how the microstructure visually appears to the human expert eye in the microscope. Regarding bainite classification, the misorientation angle distribution can be a powerful tool to distinguish different bainite types. However, the limited resolution of EBSD, considering step sizes that allow representative areas to be measured, usually does not allow the investigation of fine structures, such as cementite precipitates, in different bainite types. Additionally, for the investigated steels, it is challenging to define the second phase objects using only EBSD. Therefore, LM and SEM are needed [25].
This correlative approach is part of an ongoing study that allows us to systematically assess the accuracies of bainite classification when using LM, SEM, or EBSD features or a combination of them. On the one hand, the limits and capabilities of each characterization technique for bainite classification can be studied. Thereby, it could be concluded which technique, i.e., LM, SEM, or EBSD, is sufficient or necessary for specific classification tasks, e.g., distinguishing only main classes, such as pearlite vs. bainite vs. martensite, or also distinguishing bainite subclasses. Such an understanding is important for transferring the classification workflows to industrial applications. On the other hand, EBSD could also be used to automatically generate annotations for the microstructure classes, as suggested in [25] and done in [20]. This could allow, with a set of correlative micrographs and EBSD-based annotations, the training of a bainite classification model that uses only SEM or even only LM images during application. Alternatively, this EBSD data could be combined with unsupervised learning. Clusters representing bainitic subclasses could be derived, eliminating the remaining bias during ground truth assignment stemming from the choice of classification scheme and classes. By comparing these unbiased, artificial intelligence-determined clusters with human-defined classes and labels, more objectivity could be introduced to the controversy of bainite classification [8]. For a more detailed study of different bainite types, TEM analysis could also be included in the correlative approach. However, the time required and the limited areas that can be measured restrict practical use.

Conclusions
This work proposes an automated, objective, and reproducible machine learning classification of the carbon-rich second phase objects in multi-phase steels, including bainite subclasses, based on SEM micrographs. The following classes are considered in this complex classification task: pearlite, degenerate pearlite, debris of cementite, incomplete transformation product, and upper and lower bainite, as well as martensite, which can all be present simultaneously in one micrograph. Classification accuracies of 82.9% (number fraction) and 89.2% (area fraction) are accomplished. This classification can be the basis for an improved, sophisticated microstructure quantification that facilitates establishing process-microstructure-property correlations. Thereby, it can form the backbone for further, microstructure-centered materials development, which is needed to fulfill increasing demands and tighter tolerances in today's steel industry. Also, the objectivity, reproducibility, automation, and potential to analyze large amounts of data make the ML-based approach very interesting for industrial applications. Although the accuracy is not in the range of other reported, simpler microstructure classifications, it is a notable result considering the complexity of the microstructures at hand. Because of the various challenges when dealing with bainite, an inherent uncertainty in bainite classifications should be expected. One way to deal with this uncertainty and to judge the classification quality during "serial use" in industrial applications is the use of a probabilistic classifier, which allows the extraction of confidence metrics of the classification.

Figure 1. Illustration of classification task: (a) SEM micrograph. (b) Extraction of carbon-rich second phase objects: several classes can be present simultaneously. (c) Microstructure classification based on extracted features. Objects are colored according to the classification result. (d) Determination of phase fractions according to the classification result (DC: debris of cementite, ITP: incomplete transformation product, UB: upper bainite, LB: lower bainite).

Figure 2. Different steps of object and feature extraction for microstructure classification. SEM (a) and segmented LM micrograph (b) are combined to remove the ferritic matrix and to define and extract individual objects (c). The SEM micrograph of an individual object is used to calculate textural features (d). The segmented SEM micrograph is used to compute morphological parameters (e).

Figure 3. Seven microstructure classes considered for classification.

Figure 4. Examples of some misclassifications of the ML model. (a) DP classified as P since cementite lamellae are in the transition range from regular to degenerate shape. (b) DP classified as ITP as one lamella is cut inclined and appears like M/A. (c) ITP classified as M as the M/A part of ITP looks similar to martensite. (d) DC classified as LB since the cementite precipitates are arranged similar to those in lower bainite. (e) DP classified as DC since cementite lamellae are in the transition range from lamella to debris shape. (f) DC classified as DP since cementite particles are in the transition range from lamella to debris shape. (g) ITP classified as DC as there are also cementite particles in the ITP object, and the M/A fraction is rather small. (h) ITP classified as DP as there are also degenerate cementite lamellae in the ITP object.

Figure 5. Large second phase object that was flagged as an insecure prediction after probabilistic classification because it contains structures from several classes. By automatically tiling the image into sub-images, a more accurate and sophisticated classification is achieved.

Table 1. Summary of class distribution in final annotated dataset.

Table 2. Overview of classification strategies.

Table 3. Confusion matrix of classification strategy 1 (seven classes at once).

Table 7. Confusion matrix of the best classification model (reduction to 40 features and hyperparameter optimization).