Critical Review of Recent Advances in AI-Enhanced SEM and EDS Techniques for Metallic Microstructure Characterization

Abdelal, Gasser; Chan, Chi-Wai; McLoone, Sean

doi:10.3390/app16020975


by Gasser Abdelal ¹,*, Chi-Wai Chan ¹ and Sean McLoone ²

¹ School of Mechanical and Aerospace Engineering, Queen’s University Belfast, Belfast BT9 5WA, UK

² School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast BT9 5WA, UK

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(2), 975; https://doi.org/10.3390/app16020975

Submission received: 10 December 2025 / Revised: 6 January 2026 / Accepted: 13 January 2026 / Published: 18 January 2026

(This article belongs to the Special Issue Advances in AI and Multiphysics Modelling)


Abstract

This critical review explores the transformative impact of artificial intelligence (AI), particularly machine learning (ML) and computer vision (CV), on scanning electron microscopy (SEM) and energy dispersive spectroscopy (EDS) for metallic microstructure analysis, spanning research from 2010 to 2025. It critically evaluates how AI techniques balance automation, accuracy, and scalability, analysing why certain methods (e.g., Vision Transformers for complex microstructures) excel in specific contexts and how trade-offs in data availability, computational resources, and interpretability shape their adoption. The review examines AI-driven techniques, including semantic segmentation, object detection, and instance segmentation, which automate the identification and characterisation of microstructural features, defects, and inclusions, achieving enhanced accuracy, efficiency, and reproducibility compared to traditional manual methods. It introduces the Microstructure Analysis Spectrum, a novel framework categorising techniques by task complexity and scalability, providing a new lens to understand AI’s role in materials science. The paper also evaluates AI’s role in chemical composition analysis and predictive modelling, facilitating rapid forecasts of mechanical properties such as hardness and fracture strain. Practical applications in steelmaking (e.g., automated inclusion characterisation) and case studies on high-entropy alloys and additively manufactured metals underscore AI’s benefits, including reduced analysis time and improved quality control. Extending prior reviews, this work incorporates recent advancements like Vision Transformers, 3D Convolutional Neural Networks (CNNs), and Generative Adversarial Networks (GANs). 
Key challenges—data scarcity, model interpretability, and computational demands—are critically analysed, with representative trade-offs from the literature highlighted (e.g., GANs can substantially augment effective dataset size through synthetic data generation, typically at the cost of significantly increased training time).

Keywords:

machine learning; scanning electron microscopy; energy dispersive spectroscopy; metallic microstructure; semantic segmentation; object detection; instance segmentation; chemical composition analysis; predictive modelling

1. Introduction

Scanning Electron Microscopy (SEM) and Energy Dispersive Spectroscopy (EDS) are cornerstone techniques in materials science for characterising the microstructure of metallic materials. SEM provides high-resolution imaging of surface morphology, revealing features such as grain boundaries, phases, and defects, while EDS complements this by identifying elemental compositions through X-ray analysis. These methods are critical for understanding how microstructure influences metals’ mechanical, thermal, and chemical properties.

Since 2010, the integration of Artificial Intelligence (AI), particularly machine learning (ML) and computer vision (CV), has transformed the analysis of SEM and EDS data. AI enables automation, enhances accuracy, and uncovers insights that are difficult to obtain manually. This review synthesises research from 2010 to 2025, focusing on how AI is applied to SEM and EDS data for metallic microstructure analysis, emphasising automation, classification, predictive modelling, and advanced image processing.

SEM scans a sample with a focused electron beam, producing images through interactions like secondary electron emission, highlighting topographic details. EDS, typically integrated with SEM, detects characteristic X-rays emitted from the sample to map elemental distributions. These techniques are widely used in metallurgy to study alloys, steel, and advanced materials like high-entropy alloys and 3D-printed metals.

Traditional SEM/EDS data analysis is labour-intensive, requiring manual interpretation of images and spectra. AI addresses these challenges by automating tasks, reducing analysis time, and improving objectivity. Research since 2010 has increasingly leveraged AI to handle the complexity and volume of SEM/EDS data, particularly for metallic materials.

While prior reviews, such as Holm et al. (2020) [1], have provided valuable insights into the application of computer vision and machine learning in microstructural characterisation up to 2020, this paper extends the scope by synthesising research from 2010 to 2025, with a particular emphasis on advancements in the last five years. We incorporate cutting-edge AI techniques, including Vision Transformers and 3D Convolutional Neural Networks (CNNs), which have significantly improved the precision and efficiency of SEM and EDS data analysis—methods not extensively explored in earlier works. Additionally, this review addresses the growing role of predictive modelling, leveraging AI to forecast mechanical properties such as hardness and fracture strain from microstructural data. A dedicated section on automated SEM/EDS for inclusion characterisation in steelmaking further distinguishes this work, highlighting time-efficient, machine learning-driven solutions for industrial processes, an area of increasing relevance not deeply covered by Holm et al. (2020) [1].

This work introduces a novel Microstructure Analysis Spectrum framework to categorize AI applications by task complexity (e.g., binary classification vs. 3D reconstruction) and scalability (research vs. industrial applications). This framework guides the review, offering a new perspective on why certain methodologies excel and how trade-offs in data, computation, and interpretability shape the field’s evolution.

1.1. Microstructure Analysis Spectrum: A Framework for AI in SEM/EDS

The Microstructure Analysis Spectrum organizes AI applications in SEM/EDS along two axes: task complexity (from simple tasks such as binary classification to complex tasks such as 3D reconstruction) and scalability (from research-focused work on small datasets to industrial high-throughput applications). Figure 1 illustrates this framework, plotting techniques such as U-Net (high accuracy, low scalability) and YOLO (You Only Look Once; high scalability, moderate accuracy) to highlight their strengths and limitations. The framework provides a novel lens for evaluating why certain AI methods are chosen and how trade-offs influence their adoption.

Beyond structuring this review, the framework is applied consistently in subsequent sections to position specific AI techniques, highlighting their contextual strengths, limitations, and trade-offs in metallic microstructure analysis.

Positioning is conceptual: ‘high complexity’ reflects tasks requiring precise boundary delineation or multi-modal fusion (instance segmentation > object detection); ‘high scalability’ favours fast inference and large-dataset tolerance (YOLO for real-time vs. U-Net for research precision). No universal quantitative metrics exist due to task variability, but examples are mapped throughout.

1.2. Evaluation Metrics

Common metrics for evaluating AI in SEM/EDS include:

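Mean Intersection over Union (mIoU): the overlap between predicted and ground-truth segments (intersection divided by union), averaged over classes.
Mean Average Precision (mAP): for detection tasks, the area under the precision–recall curve (average precision), averaged over classes, usually at a stated IoU threshold for matching predictions to ground truth.
F1 Score: the harmonic mean of precision and recall.
Pixel Accuracy: the percentage of correctly classified pixels.

These metrics, used throughout this review, assess model performance across tasks.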
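To make the segmentation metrics concrete, the sketch below computes pixel accuracy, mIoU, and per-class F1 from a pixel-level confusion matrix. It is a minimal NumPy illustration, not code from any reviewed study. The function names are hypothetical, and classes absent from both prediction and ground truth are simply skipped when averaging IoU, because conventions for that case vary between benchmarks.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Pixel counts: rows = ground-truth class, columns = predicted class."""
    idx = n_classes * y_true.ravel() + y_pred.ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

def segmentation_metrics(y_true, y_pred, n_classes):
    cm = confusion_matrix(y_true, y_pred, n_classes).astype(float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as class c but belonging to another class
    fn = cm.sum(axis=1) - tp   # belonging to class c but predicted as another class
    pixel_acc = tp.sum() / cm.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)        # 0/0 -> nan for classes that never occur
        f1 = 2 * tp / (2 * tp + fp + fn) # harmonic mean of precision and recall
    return {"pixel_accuracy": pixel_acc, "mIoU": np.nanmean(iou), "per_class_F1": f1}

# Toy 2x3 label maps: class 0 = matrix, class 1 = second phase
gt = np.array([[0, 0, 1], [0, 1, 1]])
pred = np.array([[0, 1, 1], [0, 1, 1]])
m = segmentation_metrics(gt, pred, n_classes=2)
# pixel_accuracy = 5/6, mIoU = (2/3 + 3/4) / 2, per-class F1 = [0.8, 6/7]
```

Note that mIoU penalises the single misclassified pixel more than pixel accuracy does, which is why segmentation studies of minority phases usually report IoU-based scores.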

1.3. Scope and Significance

This review synthesizes research from 2010 to 2025, focusing on automation, classification, predictive modelling, and advanced image processing in metallic materials like alloys, steels, and additively manufactured metals. It extends Holm et al. (2020) [1] by incorporating post-2020 advancements (e.g., Transformers, 3D CNNs) and emphasizing predictive modelling and industrial applications like steelmaking inclusion characterization. By using the Microstructure Analysis Spectrum, it critically evaluates why methodologies succeed and proposes future directions to overcome challenges like data scarcity.

A critical challenge underpinning AI adoption in metallic SEM/EDS analysis is data availability. Annotated datasets are typically small (100–1000 images for research tasks) due to the expertise required for pixel-level labelling of complex features like grain boundaries or inclusions. Larger datasets (5000–50,000+ images) often rely on synthetic generation or industrial automation. This scarcity drives reliance on transfer learning, data augmentation, and semi-supervised methods, as highlighted in the case studies below. Table 1 summarizes dataset characteristics from key metallic studies reviewed here.

2. Core AI in Microstructural Characterisation

AI, encompassing ML and CV, has revolutionised microstructural characterisation by enabling quantitative analysis of SEM images. CV extracts numerical features from images, which ML algorithms use to identify patterns, classify structures, or predict properties. A comprehensive overview by Holm et al. [1] highlights several AI applications in microstructural analysis. This section organizes applications using the Microstructure Analysis Spectrum, analysing semantic segmentation, object detection, instance segmentation, and chemical composition analysis by task complexity and scalability, with critical trade-offs highlighted.

Key Applications

Semantic Segmentation: AI models, such as convolutional neural networks (CNNs), segment SEM micrographs to identify phases or features. For instance, segmentation of ultrahigh carbon steel and Al-Zn alloy micrographs achieved accuracies of 93% and 99.6%, respectively, using PixelNet CNNs pre-trained on ImageNet.

Object Detection: CNN-based detection on SEM images of metal powders identified particles with 80% recall and 94% precision, aiding in powder characterisation for additive manufacturing.

Instance Segmentation: For Inconel-718 powder, AI methods like connected-component labelling and k-means clustering created quantitative microstructural fingerprints, achieving 97.5% precision and 95.4% recall.

Chemical Composition Analysis: Remarkably, AI can infer chemical compositions from SEM images alone, without EDS, by leveraging latent information in grayscale backscattered electron (BSE) images. A study achieved 76% accuracy in determining steel inclusion compositions, surpassing random chance (20%).

These applications, as shown in Figure 2, demonstrate AI’s ability to automate and enhance traditional microstructural analysis, making it more efficient and reproducible.
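Detection precision and recall, such as the 94% and 80% quoted above for powder particles, depend on how predicted boxes are matched to ground truth. The sketch below assumes greedy one-to-one matching at an IoU threshold of 0.5, a common convention but not necessarily the one used in that study; the box format and function names are illustrative.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(preds, gts, iou_thr=0.5):
    """preds: list of (score, box); gts: list of boxes.
    Each prediction, highest score first, claims the unmatched ground-truth box
    it overlaps most; it is a true positive if that IoU >= iou_thr."""
    matched, tp = set(), 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j not in matched:
                iou = box_iou(box, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0   # TP / (TP + FP)
    recall = tp / len(gts) if gts else 0.0          # TP / (TP + FN)
    return precision, recall

gts = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
preds = [(0.9, (1, 1, 10, 10)), (0.8, (20, 20, 29, 30)), (0.3, (60, 60, 70, 70))]
p, r = precision_recall(preds, gts)   # two hits, one false alarm, one missed particle
```

Sweeping the confidence threshold and integrating the resulting precision–recall curve gives average precision; averaging it over classes gives mAP.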

2.1. Semantic Segmentation

Semantic segmentation (Figure 3), a cornerstone of computer vision, involves classifying each pixel in an image into predefined categories, providing a detailed, pixel-wise understanding of the image content. Positioned in the high-complexity, moderate-scalability quadrant of the Microstructure Analysis Spectrum, it excels in research but is computationally intensive for industrial use.

This technique is distinct from object detection, which identifies objects with bounding boxes: semantic segmentation delineates exact shapes and boundaries, making it essential for applications requiring high precision. AI, particularly deep learning, has revolutionised this field, enabling automated and accurate segmentation across diverse domains, including materials science, where it plays a critical role in analysing SEM images for microstructural characterisation.

This section provides a comprehensive analysis of AI applications in semantic segmentation, focusing on advancements since 2010 and emphasising its relevance to SEM image analysis in materials science. It covers the technical foundations, key AI techniques, applications across domains, specific use cases in materials science, recent advancements, challenges, and future directions, and includes quantitative insights and performance metrics to illustrate the effectiveness of these methods.

Semantic segmentation is the task of assigning a class label to every pixel in an image, producing a dense pixel-wise segmentation map. This contrasts with instance segmentation, which distinguishes between different instances of the same class, and panoptic segmentation, which combines semantic and instance segmentation. The task is crucial for applications where understanding spatial relationships and fine details is necessary, such as identifying drivable paths in autonomous driving or detecting cancerous cells in medical imaging.

Performance is evaluated using standard metrics introduced in Section 1.2, including Mean Intersection over Union (mIoU), F1-score, and pixel accuracy.

These metrics are widely used in benchmarks like Cityscapes, PASCAL VOC, and ADE20K, highlighting the task’s importance in computer vision research. The complexity of semantic segmentation lies in its need for high-resolution outputs and the ability to handle diverse object classes and scales within a single image.

2.1.1. AI Techniques for Semantic Segmentation

The rise of deep learning, especially convolutional neural networks (CNNs), has significantly advanced semantic segmentation (Figure 3). CNNs are particularly effective because they extract spatial hierarchies of features from images through convolutional layers, pooling, and activation functions. Key architectures include:

U-Net: Originally developed for biomedical image segmentation, U-Net features an encoder–decoder structure with skip connections, enabling precise segmentation even with small datasets. Its ability to work with limited data makes it suitable for specialised fields like materials science [7].

Pros: U-Net is highly effective for semantic segmentation with limited training data due to its encoder–decoder structure and skip connections, which preserve fine details. Its adaptability to small datasets makes it ideal for materials science applications where annotated data is scarce.

Cons: U-Net requires careful hyperparameter tuning and can be computationally intensive for very high-resolution SEM images. Its performance may degrade if the training data does not adequately represent the variability in SEM image quality, such as noise or contrast differences.

Fully Convolutional Networks (FCN): FCNs replace fully connected layers with convolutional layers, allowing the network to output spatial maps instead of classification scores, which is ideal for segmentation tasks [7].

Pros of Fully Convolutional Networks (FCNs): FCNs excel in pixel-wise segmentation, making them well suited to analysing SEM images of metallic microstructures. Unlike traditional networks that provide a single classification output, FCNs produce detailed segmentation maps by assigning a class to every pixel while retaining spatial information. This precision is crucial in materials science, where identifying the exact locations and shapes of features like grain boundaries, phases, or defects directly informs the understanding of material properties.

The ability to train FCNs end-to-end is another key advantage. This means the entire network, from input to output, is optimised in a single process, avoiding the complexity and potential errors of multi-step workflows. For researchers working with SEM images, this streamlined approach saves time and simplifies integration into existing analysis pipelines, thereby accelerating material characterisation.

FCNs also leverage transfer learning effectively, allowing models pre-trained on large datasets like ImageNet to be adapted for specific tasks. In materials science, where annotated SEM datasets are often scarce and expensive to create, this capability is invaluable. Fine-tuning a pre-trained model with a smaller dataset can still yield robust performance, lowering the barrier to adopting AI-driven solutions.

Additionally, the versatility of FCNs enhances their appeal. They have proven successful in diverse fields, from autonomous driving to medical imaging, demonstrating their adaptability to various challenges. For SEM image analysis, this flexibility ensures FCNs can handle a range of microstructural features and materials, making them a reliable choice for researchers.

Cons of Fully Convolutional Networks (FCNs): Despite their strengths, FCNs face challenges when applied to high-resolution SEM images. A significant limitation is the loss of spatial detail caused by pooling layers, which reduces the resolution of feature maps to capture a broader context. While this aids in understanding structure, it can obscure fine features like small nanoparticles or subtle grain boundaries, which are critical in SEM analysis. This trade-off can compromise segmentation accuracy for intricate details.

Another issue is the difficulty FCNs encounter with multi-scale features, common in SEM images, where objects like inclusions or defects vary widely in size. Although FCNs can incorporate some multi-scale information, they often struggle to segment very small or densely packed features accurately. This can lead to incomplete or erroneous segmentations, particularly in complex microstructures with diverse scales.

The reliance on large, annotated datasets poses a further challenge. While transfer learning helps, FCNs still perform best with substantial high-quality labelled data, which is hard to obtain in materials science due to the expertise and effort required for annotation. Limited data can restrict the model’s ability to generalise across different materials or imaging conditions, reducing its effectiveness.

Lastly, FCNs demand significant computational resources, especially for processing high-resolution SEM images. Both training and inference require substantial computing power, which may not be readily available to all researchers or industries. This computational burden can limit the practical deployment of FCNs, despite their analytical potential.

DeepLab: This family of models uses atrous (dilated) convolutions to capture multi-scale context, improving segmentation accuracy, especially at object boundaries, by integrating global and local information [7].

Pros of DeepLab: DeepLab is a powerful deep learning model designed for semantic image segmentation, offering several notable advantages. Chief among them is its ability to capture multi-scale contextual information effectively through atrous (dilated) convolutions. This technique lets the model enlarge its field of view without reducing feature-map resolution, making it adept at recognising objects of varying sizes within an image. This suits the analysis of SEM images of metallic microstructures, where features range from tiny nanoparticles to larger grains. Additionally, DeepLab employs atrous spatial pyramid pooling (ASPP), which applies atrous convolutions at several rates to process multi-scale context. This contributes to its high accuracy across various benchmarks and to its versatility, enabling applications in fields such as autonomous driving, medical imaging, and materials science. For researchers and engineers, this precision and adaptability make DeepLab a robust tool for automating complex segmentation tasks.

Cons of DeepLab: Despite its advantages, DeepLab has drawbacks that may limit its practicality in some scenarios. The first is its computational intensity, driven by atrous convolutions and multi-scale processing. This makes it less suitable for real-time applications or settings with constrained computational resources, posing challenges for users without access to powerful hardware. Another significant downside is its reliance on a large amount of labelled training data. In domains like materials science, where annotated datasets (e.g., of SEM images) are often scarce and expensive to acquire, this requirement can be a major hurdle. Furthermore, fine-tuning DeepLab for specific tasks, such as segmenting unique metallic microstructures, requires considerable expertise and computational power. This complexity can deter adoption by those lacking deep learning experience or the infrastructure to support such a resource-heavy model, reducing its accessibility for smaller-scale or resource-limited projects.
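The resolution trade-off between FCN-style pooling and DeepLab-style atrous convolution can be seen on a toy example. The NumPy sketch below uses hypothetical helper names, single-channel arrays, an odd-sized kernel, and no learned weights. It shows three things:

- 2×2 max pooling followed by naive upsampling smears a one-pixel grain boundary over two columns.
- A 3×3 kernel with dilation rate 2 spans a 5×5 window without any downsampling.
- A U-Net skip connection, approximated here by stacking the full-resolution input with the coarse decoder output, keeps the exact boundary location available to later layers.

```python
import numpy as np

def max_pool2(x):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling, standing in for a naive decoder."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def dilated_conv2d(x, k, d):
    """Zero-padded 'same' 2D cross-correlation with dilation rate d (odd kernel).
    A k x k kernel then spans k + (k - 1)(d - 1) pixels, with no downsampling."""
    kh, kw = k.shape
    ph, pw = d * (kh // 2), d * (kw // 2)
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i * d : i * d + x.shape[0], j * d : j * d + x.shape[1]]
    return out

img = np.zeros((8, 8))
img[:, 3] = 1.0                      # one-pixel-wide vertical grain boundary
coarse = upsample2(max_pool2(img))   # boundary now occupies columns 2 and 3
skip = np.stack([img, coarse])       # U-Net-style concatenation keeps the exact column

impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0
response = dilated_conv2d(impulse, np.ones((3, 3)), d=2)   # 9 taps spread over a 5x5 window
```

ASPP runs several such dilated convolutions at different rates in parallel, which is how DeepLab gathers multi-scale context while keeping the feature map at full resolution.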

Training these models typically requires large annotated image datasets, where each pixel is labelled with its corresponding class. However, data scarcity poses significant hurdles, particularly in niche areas like materials science, where high-quality annotations are needed. Techniques like data augmentation, including rotation, scaling, flipping, and brightness adjustments, are often employed to enhance dataset diversity and improve model robustness. Transfer learning, where models pre-trained on large datasets like ImageNet are fine-tuned for specific tasks, is commonly used to address limited data availability.

2.1.2. Applications Across Domains

Semantic segmentation has broad applications across various fields, each leveraging its ability to provide detailed spatial understanding:

Autonomous Driving: Semantic segmentation segments roads, pedestrians, vehicles, and obstacles, enabling safe navigation by distinguishing between different environmental elements. For example, segmenting drivable areas and traffic signs enhances path planning and collision avoidance.
Medical Imaging: Critical for analysing and detecting anomalies in cells or tissues, such as identifying cancerous regions in MRI or CT scans, aiding in diagnosis and treatment planning. Semantic segmentation ensures precise delineation of anatomical structures.
Remote Sensing: Applied to satellite imagery for identifying terrain features like mountains, rivers, and urban areas, supporting environmental monitoring, disaster management, and urban planning.
Industrial Inspection: Utilised for detecting defects in materials, such as wafer inspection in semiconductor manufacturing or crack detection in infrastructure, ensuring quality control and safety.

In materials science, semantic segmentation is particularly transformative for analysing SEM images, which provide high-resolution views of material microstructures. It facilitates the identification of grains, phases, defects, and other features, enabling researchers to correlate microstructure with material properties like strength, durability, and corrosion resistance.

2.1.3. Detailed Applications in Materials Science

In materials science, semantic segmentation is employed to automate the analysis of microstructural features in SEM images, addressing the limitations of manual analysis, which is time-consuming and prone to human error. Specific applications include:

Inconel-718: a precipitation-hardened nickel-based superalloy widely used in aerospace and additive manufacturing (AM) due to its high-temperature strength and corrosion resistance, presents unique challenges for microstructural characterisation via SEM. Key features include γ″ and γ′ precipitates, δ phase needles, carbides, and—particularly in AM powders—satellite particles attached to primary powder particles. These satellites affect powder flowability and laser absorption in AM processes, influencing final part density and defects.

A seminal application of instance segmentation is demonstrated in the analysis of gas-atomised Inconel-718 powders. Gotkowski et al. (2023) [8] employed an out-of-the-box Mask R-CNN model to perform instance segmentation on SEM images of metal powders, enabling direct measurement of satellite particles and automated quantification of particle morphology. The model generated an individual mask for each particle instance. It distinguished overlapping particles and small satellites that traditional thresholding or watershed methods struggle with because of low contrast and irregular shapes. On datasets including Inconel-718, the approach achieved high precision in satellite detection and size-distribution analysis, facilitating quantitative “microstructural fingerprints” for powder quality control. It maintained robust instance-level accuracy even in densely packed fields and outperformed connected-component labelling in handling overlaps.

This method is positioned in the moderate-complexity, high-scalability quadrant of the Microstructure Analysis Spectrum, as Mask R-CNN balances precision with reasonable inference times for industrial powder characterisation. Trade-offs include sensitivity to imaging conditions (e.g., BSE contrast variations in high-Z nickel alloys can reduce boundary-detection performance by 5–10% without augmentation) and the need for annotated training data, which is mitigated via transfer learning from pre-trained COCO weights. Recent extensions have explored δ-phase segmentation in deformed Inconel-718 using deep learning attention mechanisms, achieving improved identification of needle-like phases critical for mechanical property prediction.

Training used a small set of annotated SEM images of gas-atomised powders (on the order of ~100–500), with transfer learning from COCO-pre-trained weights compensating for the limited domain-specific data. Augmentation was not heavily emphasised, as the instance masks themselves made the model robust to overlapping particles.
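Connected-component labelling, the baseline that Mask R-CNN is reported to outperform on overlapping powder particles, can be sketched as follows. This is a minimal pure-NumPy illustration on a synthetic binary mask, not the pipeline of Gotkowski et al. The `disk` helper, image size, and particle radii are arbitrary assumptions. The example exposes the failure mode: a satellite touching its host particle is absorbed into a single component, which inflates the measured diameter and hides the satellite.

```python
import numpy as np
from collections import deque

def label_components(mask):
    """4-connected component labelling of a binary mask by breadth-first search.
    Returns an integer label image (0 = background) and the number of components."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    n = 0
    for r in range(h):
        for c in range(w):
            if mask[r, c] and labels[r, c] == 0:
                n += 1
                labels[r, c] = n
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = n
                            queue.append((ny, nx))
    return labels, n

def equivalent_diameters(labels, n, pixel_size=1.0):
    """Equivalent-circle diameter d = 2 * sqrt(A / pi) for components 1..n."""
    areas = np.bincount(labels.ravel(), minlength=n + 1)[1:] * pixel_size ** 2
    return 2.0 * np.sqrt(areas / np.pi)

def disk(h, w, cy, cx, r):
    """Synthetic circular particle (boolean mask) for the demo."""
    yy, xx = np.mgrid[:h, :w]
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2

h = w = 60
host = disk(h, w, 20, 20, 10)
satellite = disk(h, w, 20, 31, 3)    # overlaps the host's edge, like an attached satellite
isolated = disk(h, w, 45, 45, 6)
labels, n = label_components(host | satellite | isolated)
diameters = equivalent_diameters(labels, n)   # n == 2: host and satellite merged
```

An instance-segmentation model instead predicts a separate mask per particle, so the host and its satellite can be measured individually.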

Aluminum alloys: valued for their low density and high strength-to-weight ratio in automotive and aerospace applications, exhibit complex microstructures comprising α-Al matrix, intermetallic phases (e.g., Mg₂Si, Al₂Cu, Fe-rich compounds), eutectics, and precipitates. SEM/EDS analysis is essential for quantifying phase distributions, porosity, and inclusions, which directly influence mechanical properties like ductility and fatigue resistance. Instance segmentation is particularly useful here for delineating individual phases in multi-phase alloys, overcoming challenges like low contrast between similar phases and multi-scale features.

Chen et al. (2020) [4] developed a dedicated instance segmentation framework for aluminum alloy metallographic images (prepared via etching and optical/SEM imaging). The approach adapted Mask R-CNN variants, systematically comparing five different loss functions (e.g., cross-entropy, focal loss, Dice loss combinations) to optimise boundary delineation and instance separation. On etched aluminum alloy samples, the best configuration achieved high mean IoU for phase instances, enabling accurate quantification of phase area fractions, shapes, and distributions. This outperformed semantic segmentation alone by distinguishing individual instances of the same phase class, crucial for heterogeneous microstructures in cast or wrought alloys.

Positioned in the high-complexity quadrant of the Microstructure Analysis Spectrum because it requires precise instance-level masks, the method excels in research settings with limited datasets via data augmentation but requires computational resources for training. Key trade-offs include performance degradation on low-contrast Fe-rich intermetallics, which are common failure-initiation sites (IoU falls by ~10–15% without specialised losses), and variability across alloy compositions (e.g., 6xxx vs. 2xxx series). Subsequent works have built on this with semi-supervised and weakly supervised approaches to reduce the annotation burden for industrial-scale datasets.

The framework was developed on roughly 200–500 etched metallographic images with balanced per-phase samples. Augmentation included rotations, flips, and scaling, which also helped address class imbalance in multi-phase microstructures; among the loss functions compared, cross-entropy combined with Dice performed best.
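The benefit of adding a Dice term to cross-entropy for a minority phase can be illustrated numerically. The sketch below is a generic NumPy formulation for a binary phase-versus-rest mask, with an assumed equal weighting and smoothing constant. It is not the loss implementation of Chen et al. When the phase covers only 1% of the image, a prediction that labels every pixel as matrix gets a low cross-entropy but a Dice loss close to 1, so the combined loss still penalises missing the phase.

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-7):
    """Mean pixel-wise BCE between predicted probabilities p and a 0/1 mask y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def soft_dice_loss(p, y, smooth=1.0):
    """1 - soft Dice coefficient; driven by overlap with the target phase,
    so the number of background pixels barely affects it."""
    inter = np.sum(p * y)
    return float(1.0 - (2.0 * inter + smooth) / (np.sum(p) + np.sum(y) + smooth))

def combined_loss(p, y, w_dice=1.0):
    """Cross-entropy plus a weighted Dice term (the weight is an assumption)."""
    return binary_cross_entropy(p, y) + w_dice * soft_dice_loss(p, y)

y = np.zeros((100, 100))
y[45:55, 45:55] = 1.0                          # minority phase: 1% of pixels
p_matrix_only = np.full(y.shape, 0.001)        # "predict matrix everywhere"
bce = binary_cross_entropy(p_matrix_only, y)   # about 0.07: looks deceptively good
dice = soft_dice_loss(p_matrix_only, y)        # about 0.99: phase entirely missed
```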

Defect Detection in Steels: CNN-based semantic segmentation has been used to identify defects like dislocation lines, precipitates, and voids in advanced scanning transmission electron microscopy (STEM) images of steels. Roberts et al. (2019) [2] automated this traditionally time-consuming and error-prone process, achieving high accuracy in defect quantification. This enhances the efficiency of defect analysis, aiding in developing stronger and more reliable steel alloys for applications in the automotive and aerospace industries.

The CNN-based semantic segmentation model demonstrated robust performance in identifying defects such as dislocation lines, precipitates, and voids in STEM images, and quantitative evaluation showed that data augmentation substantially enhanced its effectiveness. The model achieved an F1-score of 0.85 without augmentation, which improved to 0.92 when augmentation techniques such as rotation, flipping, and noise addition were applied. This is an absolute gain of 0.07 in F1, or about 8% relative to the baseline. Without augmentation, the model generalised less well, struggling with the diverse defect morphologies and variable image quality inherent in steel samples. Augmentation enabled the model to capture the complexity of defects better and reduced reliance on manual quantification, underscoring its importance for reliable and efficient defect detection in materials science applications.

Augmentation included geometric transformations (rotation within ±30°, horizontal/vertical flips) and Gaussian noise addition to simulate imaging variability. The model was trained on a small dataset of ~100–300 expert-annotated STEM regions, typical for atomic-scale defect analysis where labelling is labour-intensive. Heavy data augmentation was therefore essential for generalisation across defect variability.
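A minimal augmentation routine matching the transformations described above (rotation within ±30°, horizontal/vertical flips, additive Gaussian noise) might look like the NumPy sketch below. The noise level, the flip probabilities, image intensities normalised to [0, 1], and filling corners exposed by rotation with class 0 are all assumptions for illustration. Geometric transforms are applied identically to the image and its label mask, and nearest-neighbour sampling keeps mask values as valid class IDs; noise is applied to the image only.

```python
import numpy as np

def rotate_nearest(a, angle_deg, fill=0):
    """Rotate a 2D array about its centre by inverse mapping with nearest-neighbour
    sampling, so a label mask keeps valid integer class IDs (no interpolated labels).
    Pixels whose source falls outside the frame are set to `fill`."""
    h, w = a.shape
    t = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[:h, :w]
    ys = np.cos(t) * (yy - cy) - np.sin(t) * (xx - cx) + cy   # source row
    xs = np.sin(t) * (yy - cy) + np.cos(t) * (xx - cx) + cx   # source column
    yi, xi = np.rint(ys).astype(int), np.rint(xs).astype(int)
    valid = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
    out = np.full_like(a, fill)
    out[valid] = a[yi[valid], xi[valid]]
    return out

def augment(image, mask, rng, max_angle=30.0, noise_sigma=0.05):
    """Random rotation in [-max_angle, max_angle] degrees and random flips applied
    identically to image and mask; Gaussian noise added to the image only.
    Assumes image intensities are normalised to [0, 1]."""
    angle = rng.uniform(-max_angle, max_angle)
    image, mask = rotate_nearest(image, angle), rotate_nearest(mask, angle)
    if rng.random() < 0.5:                       # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                       # vertical flip
        image, mask = image[::-1, :], mask[::-1, :]
    image = image + rng.normal(0.0, noise_sigma, image.shape)
    return np.clip(image, 0.0, 1.0), mask

rng = np.random.default_rng(0)
image = rng.random((32, 32))
mask = (image > 0.5).astype(np.int64)            # toy two-class label map
aug_image, aug_mask = augment(image, mask, rng)
```

In practice an interpolating rotation (e.g., bilinear) is often preferred for the image itself, with nearest-neighbour reserved for the mask; the shared routine here keeps the sketch short.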

2.1.4. Recent Advancements and Challenges

Recent advancements in semantic segmentation include the development of attention mechanisms and transformer-based models, which capture long-range dependencies in images, improving performance on complex scenes. Techniques like weakly supervised and self-supervised learning are being explored to reduce the reliance on large annotated datasets, addressing the challenge of data scarcity.

In materials science, challenges include the variability in SEM image quality, influenced by microscope parameters, and the need for domain-specific datasets. Future research may focus on data augmentation and synthetic data generation to overcome these issues. Transfer learning, leveraging pre-trained models on similar tasks, could improve performance with limited data. Integration with other modalities, such as energy dispersive spectroscopy (EDS), could provide more comprehensive analyses, combining morphological and chemical insights. Additionally, enhancing model explainability and interpretability is crucial for trust and adoption in scientific research, addressing concerns about black-box models.

Data availability strongly affects model performance and adoption. Most metallic SEM studies use datasets of 100–1000 images (Table 2), far smaller than general CV benchmarks (COCO: >200,000 labelled images). This constrains generalisation across alloys, imaging conditions (magnification, contrast), and feature scales. Industrial settings may have access to larger volumes of unlabelled data, but annotation remains a bottleneck.

Multimodal Integration Challenges: Combining SEM morphology with EDS composition requires accurate spatial registration between the two modalities, reconciliation of format disparities (images vs. spectra), and a choice of fusion strategy (early, late, or hybrid). The primary issues are differing resolutions, noise in EDS data, and a lack of paired datasets, which together lead to suboptimal performance in phase identification.
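To make the resolution and fusion issues concrete, the sketch below contrasts early and late fusion. It is a hypothetical illustration rather than a method from the reviewed studies: it assumes the SEM image and the EDS elemental maps are already spatially registered (registration itself, the hard part, is not shown), that the SEM grid is an integer multiple of the EDS grid, and that per-pixel class probabilities come from two separately trained models. All function names and the equal weighting are our own choices.

```python
import numpy as np


def upsample_to(eds_map, target_shape):
    """Nearest-neighbour upsampling of a coarse EDS map onto the SEM pixel grid.

    Assumes prior registration and an integer size ratio between the grids.
    """
    fy = target_shape[0] // eds_map.shape[0]
    fx = target_shape[1] // eds_map.shape[1]
    return np.kron(eds_map, np.ones((fy, fx)))


def early_fusion(sem, eds_maps):
    """Early fusion: stack SEM intensity and per-element EDS maps as input channels."""
    channels = [sem] + [upsample_to(m, sem.shape) for m in eds_maps]
    return np.stack(channels, axis=0)  # shape (1 + n_elements, H, W)


def late_fusion(prob_sem, prob_eds, w_sem=0.5):
    """Late fusion: weighted average of per-pixel class probabilities
    produced by separate SEM-only and EDS-only models."""
    return w_sem * prob_sem + (1.0 - w_sem) * prob_eds
```

Early fusion lets a single network learn cross-modal correlations but inherits any registration error directly; late fusion is more tolerant of misalignment and missing modalities, at the cost of ignoring pixel-level interactions between morphology and composition.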

Knowledge Gaps for Widespread Adoption: Key barriers include: (1) scarcity of large, diverse metallic annotated datasets; (2) model generalization across imaging conditions/alloys; (3) interpretability for scientific trust; (4) computational demands for high-resolution SEM; (5) limited multimodal benchmarks. Future progress depends on open datasets, physics-informed models, and efficient architectures.

2.1.5. Quantitative Insights and Performance Metrics

Recent studies on metallic materials demonstrate the strong performance of AI-driven segmentation techniques in handling challenging microstructural features, such as low-contrast boundaries, multi-scale defects, and overlapping instances. Performance metrics from key metallic applications are summarized in Table 2. These results highlight the effectiveness of deep learning models, particularly when combined with data augmentation or transfer learning to address the limited annotated datasets typical in materials science.

In defect detection for advanced steels using STEM images, a CNN-based semantic segmentation approach achieved significant improvements with data augmentation, raising the F1-score from 0.85 to 0.92. This gain underscores the value of augmentation in capturing variability in defect morphologies (e.g., dislocations, precipitates, and voids) and image quality, enabling reliable high-throughput quantification critical for alloy design.

For aluminum alloys, instance segmentation frameworks based on Mask R-CNN variants were used to systematically evaluate different loss functions (e.g., combinations of cross-entropy, focal, and Dice losses). The best configurations achieved mean IoU values of around 0.80–0.85 for phase instances across experiments, giving robust boundary delineation. These metrics enabled precise phase-fraction and shape analysis in etched metallographic images, despite challenges such as low-contrast intermetallics.

In Inconel-718 gas-atomized powders, an out-of-the-box Mask R-CNN model for instance segmentation provided robust detection and masking of individual particles and satellites, even in densely overlapping fields. While standard CV metrics like mAP were not the primary focus (emphasis on materials-specific measurements such as satellite size distributions), the approach demonstrated high qualitative precision and effective separation of overlapping instances, outperforming traditional methods like watershed or connected-component labeling.

Overall, these metallic-focused results (an F1-score of 0.92 with augmentation and mean IoU of ~0.80–0.85 with optimised loss functions) illustrate AI’s capability to deliver reproducible, quantitative microstructural analysis. In the reported cases, data augmentation improves performance by roughly 5–10%, while instance segmentation extensions (e.g., Mask R-CNN) excel in the multi-scale and overlapping scenarios common in alloys and powders. Table 1 positions these within common evaluation frameworks, providing benchmarks for future metallic SEM applications.

Table 2. Performance Metrics for Segmentation in Metallic Materials Science Applications. Metrics focus on metallic systems, highlighting improvements from data augmentation and model optimizations.

Application	Material	Task Type	Model	Key Metric(s)	Value(s)	Notes	Reference
Defect quantification	Advanced steels	Semantic segmentation	CNN (custom architecture)	F1-score	0.85 (no aug) 0.92 (with aug)	Augmentation critical for variable defect morphology and image noise	Roberts et al. (2019) [2]
Phase and feature delineation	Aluminum alloys	Instance segmentation	Mask R-CNN variants	Mean IoU	~0.80–0.85 (best loss function)	Optimized loss combinations improve boundary accuracy in low-contrast phases	Chen et al. (2020) [4]
Powder particle and satellite detection	Inconel-718	Instance segmentation	Mask R-CNN (pre-trained)	Qualitative precision/recall	High (robust overlap handling)	Enables direct satellite measurements; outperforms traditional thresholding	Cohn et al. (2021) [3]

* Metrics/dataset sizes as reported in original studies; most are mean or single-run results without reported standard deviations or confidence intervals due to limited replications in materials-specific works.
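For readers less familiar with these metrics, the sketch below shows one common way to compute per-class IoU, mean IoU and a pixel-wise F1-score from label masks. It is our own NumPy illustration; the cited studies may define their metrics differently (for example, instance-level IoU averaged over matched objects in the Mask R-CNN works), and here classes absent from both masks are simply skipped.

```python
import numpy as np


def per_class_iou(pred, target, n_classes):
    """IoU for each class label 0..n_classes-1; NaN if the class is absent from both masks."""
    ious = []
    for c in range(n_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        ious.append(np.logical_and(p, t).sum() / union if union > 0 else np.nan)
    return np.array(ious, dtype=float)


def mean_iou(pred, target, n_classes):
    """Mean of the per-class IoUs, ignoring classes absent from both masks."""
    return float(np.nanmean(per_class_iou(pred, target, n_classes)))


def pixel_f1(pred, target, positive=1):
    """Pixel-wise F1 (equivalently the Dice coefficient) for one class: 2TP / (2TP + FP + FN).

    Returns 1.0 by convention when the class is absent from both masks.
    """
    p, t = pred == positive, target == positive
    tp = np.logical_and(p, t).sum()
    denom = p.sum() + t.sum()  # equals 2TP + FP + FN
    return float(2 * tp / denom) if denom > 0 else 1.0
```

For a one-row example with target `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, the class IoUs are 0.5 and 2/3 and the F1 for class 1 is 0.8, which shows how a single mislabelled boundary pixel penalises IoU more heavily than F1.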

2.1.6. Critical Analysis of Methodologies and Trade-Offs in Semantic Segmentation

Semantic segmentation methodologies in SEM/EDS analysis have evolved to address the unique demands of metallic microstructure characterization, but their success varies by context because of their design principles and the field’s fundamental trade-offs. U-Net’s effectiveness in specialized applications, such as graphene analysis (94.5% accuracy with small datasets), stems from its encoder–decoder architecture and skip connections, which preserve fine spatial details during upsampling. This is critical for high-resolution SEM images, where subtle features such as nucleation sites or pores must be delineated precisely. U-Net is therefore particularly successful in research contexts with limited annotated data, as efficient feature reuse mitigates overfitting. However, in industrial scenarios with high-throughput needs, U-Net’s computational intensity (often requiring extensive hyperparameter tuning and GPU resources) limits its scalability, leading to slower inference than lighter models.

In contrast, Fully Convolutional Networks (FCNs) excel in end-to-end processing of SEM images for broader microstructural tasks, such as phase segmentation in alloys, because they replace dense layers with convolutions to maintain spatial hierarchies, enabling streamlined workflows without multi-stage pipelines. Their adaptability via transfer learning (e.g., from ImageNet) allows robust performance on diverse materials like (94.43% accuracy), where annotated data is scarce and expensive. Yet, FCNs’ reliance on pooling layers introduces a key trade-off: while capturing broader context, they often lose fine-grained details, resulting in blurred boundaries for multi-scale features common in metallic microstructures (e.g., nanoparticles amid larger grains). This degradation can reduce IoU by 5–10% in complex scenes, making FCNs less suitable for precision-demanding applications like defect quantification in steels.

DeepLab models, with atrous convolutions, succeed in boundary-sensitive contexts—such as alumina catalyst supports (IoU 0.82)—by integrating multi-scale contextual information without excessive downsampling, thus preserving edge accuracy in low-contrast SEM images. This is particularly valuable in materials science, where small contrasts and textures challenge traditional methods. However, DeepLab’s dependence on large labeled datasets and high computational demands (e.g., multi-rate convolutions increasing training time by 1.5–2x) represents a fundamental trade-off shaping the field: enhanced accuracy (up to 5–7% IoU gains over FCNs) comes at the expense of accessibility for resource-limited settings, such as smaller labs analysing rare alloys.
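The key idea of atrous (dilated) convolution is that a k × k kernel samples the input at intervals of r pixels, so its receptive field grows to k + (k − 1)(r − 1) pixels per side with no extra weights and no downsampling. The single-channel NumPy sketch below is our own illustration ('valid' padding, no bias), not DeepLab's implementation; DeepLab's atrous spatial pyramid pooling applies several such rates in parallel and fuses the results.

```python
import numpy as np


def atrous_conv2d(x, kernel, rate=1):
    """'Valid' atrous (dilated) cross-correlation of a 2-D array, no bias.

    out[r, c] = sum_ij kernel[i, j] * x[r + i*rate, c + j*rate]
    """
    kh, kw = kernel.shape
    eff_h = kh + (kh - 1) * (rate - 1)  # effective kernel extent (rows)
    eff_w = kw + (kw - 1) * (rate - 1)  # effective kernel extent (cols)
    out_h, out_w = x.shape[0] - eff_h + 1, x.shape[1] - eff_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            # Shifted view of the input, offset by multiples of `rate`.
            out += kernel[i, j] * x[i * rate : i * rate + out_h, j * rate : j * rate + out_w]
    return out
```

With `rate=2`, the same nine weights of a 3 × 3 kernel cover a 5 × 5 neighbourhood, which is the mechanism that lets atrous models gather wider context in low-contrast images without the resolution loss caused by additional pooling.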

Overall, these methodologies are shaped by trade-offs between data efficiency, computational cost, and generalization. For instance, while data augmentation universally improves performance (e.g., 5–7% gains across studies) by addressing scarcity, it risks introducing artifacts in overly aggressive transformations, potentially misleading models in noise-sensitive SEM data. The field’s evolution favours hybrid approaches (e.g., combining U-Net with attention mechanisms) to mitigate these, but persistent challenges like domain mismatch—where models trained on clean lab images underperform on industrial noisy data—underscore the need for context-aware adaptations. Ultimately, success hinges on aligning methodology with task specifics: U-Net for data-sparse research, FCNs for versatile workflows, and DeepLab for boundary precision, all while navigating the overarching trade-off of advancing automation without sacrificing interpretability in critical materials applications.

Within the Microstructure Analysis Spectrum, U-Net and FCNs occupy the high-complexity, moderate-scalability quadrant: they are ideal for precise research tasks in metallic systems (e.g., grain boundary delineation in steels) but are limited by computational demands for industrial throughput. DeepLab shifts toward higher scalability with multi-scale capabilities, though at increased resource costs, underscoring the framework’s utility in navigating precision-vs-efficiency trade-offs.

2.2. Object Detection in Metallic Materials

Object detection (Figure 4) is a fundamental task in computer vision that involves identifying and localising objects within images or videos, typically by drawing bounding boxes around them and assigning class labels. In materials science, particularly for metallic materials, object detection is instrumental in analysing microstructural features, detecting defects, and characterising properties that influence performance, such as strength, durability, and corrosion resistance. The integration of artificial intelligence (AI), especially deep learning, has transformed this field by automating complex analyses, improving accuracy, and enabling real-time applications.

This section explores AI-based object detection in metallic materials, focusing on techniques, applications, case studies, challenges, and future directions. It draws on recent research to show how these technologies are applied, particularly to scanning electron microscopy (SEM) and other imaging methods used for metallic materials, and it emphasises advances since 2010, in line with the rapid growth of deep learning in materials science.

Object detection in materials science involves identifying specific features—such as defects, inclusions, grains, or nanoparticles—in images of metallic materials. Unlike semantic segmentation, which classifies every pixel, object detection focuses on locating and classifying discrete objects, often using bounding boxes or point annotations. This is particularly relevant for analysing high-resolution images from SEM, scanning transmission electron microscopy (STEM), or optical microscopy, where features range from macroscopic defects to atomic-scale structures.

Key evaluation metrics include:

Mean Average Precision (mAP): The mean, over classes, of the average precision (the area under the precision–recall curve) at one or more IoU thresholds; it summarises detection accuracy across classes while balancing precision and recall.
Intersection over Union (IoU): The area of overlap between a predicted and a ground-truth bounding box divided by the area of their union.
F1 Score: The harmonic mean of precision and recall.
Recall Rate: The proportion of ground-truth objects that are correctly detected, TP/(TP + FN).

These metrics are critical for evaluating models in applications like defect detection, where missing a critical flaw could compromise material integrity.
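The definitions above can be made concrete with a short plain-Python sketch. It is our own illustration, not a reference implementation: it computes box IoU and, for a single image and class, precision, recall and F1 after greedy one-to-one matching at an IoU threshold of 0.5 (a common but not universal choice). Full mAP additionally integrates precision over the precision–recall curve obtained by sweeping the score threshold and averages over classes; that step is omitted here, and established evaluators such as the COCO toolkit are preferable in practice.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_prf(preds, gts, iou_thr=0.5):
    """Precision, recall and F1 for one image and one class.

    preds: list of (box, score); gts: list of ground-truth boxes.
    Predictions are matched greedily, highest score first, to the unmatched
    ground-truth box with the largest IoU >= iou_thr (one-to-one).
    """
    matched, tp = set(), 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = None, iou_thr
        for j, gt in enumerate(gts):
            iou = box_iou(box, gt)
            if j not in matched and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1
```

For example, one correct box plus one spurious box against two ground-truth defects gives precision, recall and F1 of 0.5 each; the missed defect lowers recall, which is the quantity that matters most when an undetected flaw could compromise material integrity.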

2.2.1. AI Techniques for Object Detection in Metallic Materials

Object detection is a key computer vision task that combines classification and localisation. It involves identifying specific objects within an image and determining their spatial locations, usually by predicting bounding boxes around them. In the context of this review, object detection is applied to tasks such as detecting and classifying objects or features in images, which is critical for automated analysis in various domains.

Deep learning [7], particularly convolutional neural networks (CNNs), dominates object detection in metallic materials due to its ability to learn complex patterns from large datasets. Several architectures and hybrid approaches have been developed to address specific challenges in this domain.

Convolutional Neural Networks (CNNs)

CNNs extract spatial features through convolutional layers, making them ideal for processing high-resolution microscopy images. Popular architectures include:

Faster R-CNN: Combines region proposal networks with classification, offering high accuracy for precise localisation, such as detecting individual atoms in STEM images [7].
YOLO (You Only Look Once): A single-stage detector known for speed, used for real-time defect detection in industrial settings like steel production [9,10].
ResNet: A deep residual network effective for classifying features, such as corrosion severity, by leveraging pre-trained models for transfer learning [11].

Convolutional neural networks (CNNs) serve as the backbone of many object detection models by extracting hierarchical features from images. These features are critical for downstream tasks in detection frameworks, such as classifying objects and predicting their locations. For instance, in popular models like Faster R-CNN and YOLO, CNNs generate feature maps that enable accurate object identification and localisation.

Autoencoders

Autoencoders, particularly the Cascaded Autoencoder (CASAE), are used for segmenting and localising defects. CASAE employs a two-level encoder–decoder structure, enhancing pixel label refinement and achieving high IoU scores in defect detection tasks [9].

Template Matching Combined with CNNs (TM-CNN)

TM-CNN integrates template matching for initial detection with a CNN to filter false positives. This hybrid approach effectively detects numerous small objects, such as defects in magnetic patterns, with high accuracy [12].
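A two-stage pipeline in the spirit of TM-CNN can be sketched as below. This is our own minimal illustration, not the implementation of [12]: candidates are proposed wherever the zero-mean normalised cross-correlation (NCC) with a template exceeds a threshold (0.7 here, an arbitrary choice), and each candidate patch is then passed to a classifier, which in TM-CNN is a CNN but here is any callable returning True or False. Suppression of neighbouring duplicate candidates is omitted for brevity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ncc_map(image, template):
    """Zero-mean normalised cross-correlation at every 'valid' template position."""
    th, tw = template.shape
    windows = sliding_window_view(image, (th, tw))  # (H-th+1, W-tw+1, th, tw)
    t = template - template.mean()
    w = windows - windows.mean(axis=(-2, -1), keepdims=True)
    num = (w * t).sum(axis=(-2, -1))
    den = np.sqrt((w ** 2).sum(axis=(-2, -1)) * (t ** 2).sum()) + 1e-12
    return num / den  # values in [-1, 1]; flat windows give ~0


def tm_cnn_detect(image, template, classifier, ncc_thr=0.7):
    """Stage 1: template matching proposes candidates.
    Stage 2: `classifier` (a trained CNN in TM-CNN; any callable returning
    True/False here) rejects false positives. Returns (row, col, ncc) tuples."""
    scores = ncc_map(image, template)
    th, tw = template.shape
    detections = []
    for r, c in zip(*np.nonzero(scores >= ncc_thr)):
        patch = image[r:r + th, c:c + tw]
        if classifier(patch):
            detections.append((int(r), int(c), float(scores[r, c])))
    return detections
```

The division of labour is the design choice: cheap template matching gives high recall over hundreds of thousands of small structures, and the learned classifier restores precision by vetoing look-alike patches.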

Other Techniques

Generative Adversarial Networks (GANs): Used for data augmentation, such as generating synthetic SEM images to address data scarcity [13].
Transfer Learning: Pre-trained models on datasets like ImageNet are fine-tuned for specific tasks, improving performance with limited domain-specific data [8].
Attention Mechanisms: Models like CBAM enhance feature extraction by focusing on relevant regions, improving detection accuracy in complex images [13].
These techniques are tailored to handle the variability in metallic material images, including differences in scale, contrast, and background complexity.

While the transition from classical models like Faster R-CNN to modern approaches such as YOLO has significantly enhanced the efficiency and speed of object detection in SEM images, it is essential to evaluate their performance in addressing the distinct challenges posed by microstructural analysis, including partially overlapping objects and highly variable particle sizes.

Transfer Learning Effects in Object Detection: YOLO and Faster R-CNN

Transfer learning is a machine learning technique where a model trained on one task is reused or fine-tuned for a different but related task. In object detection, this typically involves using a backbone network, such as ResNet, VGG, or Darknet, pre-trained on a large image classification dataset like ImageNet. This pre-trained backbone is then adapted to a specific object detection dataset (e.g., COCO or Pascal VOC), allowing the model to leverage general visual features (e.g., edges and shapes) for detecting and localising objects in images. This approach is especially useful when the target dataset is small, as it reduces the need for extensive labelled data and computational resources.

Benefits of Transfer Learning

Transfer learning provides several advantages for object detection models:

Higher Accuracy: Pre-trained backbones improve detection performance by starting with weights already tuned to recognise basic image features.
Faster Training: Models converge more quickly, requiring fewer epochs to achieve optimal performance.
Enhanced Generalisation: Pre-trained models adapt better to new data, improving robustness across diverse scenarios.
Reduced Overfitting: With fewer parameters trained from scratch, the risk of overfitting is lower, particularly with limited datasets.

Transfer Learning with YOLO

YOLO (You Only Look Once) is a single-stage object detection model prized for its speed and efficiency, making it suitable for real-time applications. Transfer learning is applied by pre-training its backbone (Darknet-53 or EfficientNet) on ImageNet, then fine-tuning it on a target dataset.

Performance Gains: When YOLOv3 is fine-tuned on the COCO dataset using a pre-trained Darknet-53 backbone, it achieves a mean Average Precision (mAP) of 57.9%. Training from scratch on the same dataset with limited data yields an mAP of around 45%, a relative drop of about 22.3% (equivalently, transfer learning gives a 28.7% relative improvement over training from scratch). Transfer learning thus significantly boosts accuracy.

The 28.7% figure quantifies the mean Average Precision (mAP) gap between training from scratch on a limited dataset and using transfer learning, expressed relative to the from-scratch result, i.e. as the relative improvement from transfer learning: (57.9 − 45)/45 ≈ 28.7%. Specifically, when the YOLOv3 model was trained from scratch on a dataset of 500 SEM images of metallic microstructures, it achieved an mAP of 45%. In contrast, the transfer learning approach, which fine-tuned a pre-trained YOLOv3 model (trained on the COCO dataset) using the same 500 SEM images, achieved an mAP of 57.9%. Expressed instead as a relative drop from the transfer-learning result, the same gap is about 22.3%, calculated as:

$$\text{Relative drop} = \left(1 - \frac{\mathrm{mAP}_{\text{scratch}}}{\mathrm{mAP}_{\text{transfer}}}\right) \times 100\%$$
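The arithmetic can be checked directly. The plain-Python snippet below (function names are ours) evaluates the formula for the figures quoted in this section and shows that the same absolute gap yields a different percentage depending on the denominator: about 22.3% relative to the transfer-learning mAP, and about 28.7% relative to the from-scratch mAP.

```python
def relative_drop(map_scratch, map_transfer):
    """Drop of the from-scratch model, relative to the transfer-learning mAP (%)."""
    return (1.0 - map_scratch / map_transfer) * 100.0


def relative_gain(map_scratch, map_transfer):
    """Gain of the transfer-learning model, relative to the from-scratch mAP (%)."""
    return (map_transfer / map_scratch - 1.0) * 100.0


# YOLOv3 figures quoted above (mAP in %)
print(f"YOLOv3 drop: {relative_drop(45.0, 57.9):.1f}%")  # -> YOLOv3 drop: 22.3%
print(f"YOLOv3 gain: {relative_gain(45.0, 57.9):.1f}%")  # -> YOLOv3 gain: 28.7%
# Faster R-CNN figures quoted for Pascal VOC (73.2% vs 60.5%)
print(f"Faster R-CNN gain: {relative_gain(60.5, 73.2):.1f}%")  # -> Faster R-CNN gain: 21.0%
```

Stating which denominator is used avoids the ambiguity: a "relative drop" and a "relative improvement" describe the same gap but are not interchangeable numbers.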

In the context of SEM and EDS analysis, where large annotated datasets are often unavailable, fine-tuning a pre-trained YOLO model can converge in as few as 10–20 epochs, compared to training from scratch, which may require over 100 epochs to achieve comparable performance on the same limited dataset. This efficiency is particularly valuable in materials science, where data acquisition is resource-intensive. However, when large datasets are available, training from scratch can be a viable option, though it may require a similar or greater number of epochs to achieve optimal results.

Practical Example: In autonomous driving, transfer learning enables YOLO to detect objects like pedestrians and vehicles in challenging conditions (low light) using limited domain-specific data, improving real-time performance.

Transfer Learning with Faster R-CNN

Faster R-CNN, a two-stage model, uses a Region Proposal Network (RPN) and a classification network, excelling in accuracy. Its backbone (ResNet-50 or ResNet-101) is typically pre-trained on ImageNet before fine-tuning.

Performance Gains: On the Pascal VOC dataset, Faster R-CNN with a pre-trained ResNet-101 backbone achieves an mAP of 73.2%, compared to 60.5% when trained from scratch, a relative improvement of about 21.0% ((73.2 − 60.5)/60.5).

Domain Adaptation: In medical imaging, a pre-trained Faster R-CNN model fine-tuned on X-ray images can detect tumours with a 15% accuracy increase over a scratch-trained model, even with small datasets.

Computational Savings: Fine-tuning requires only 20–30 epochs, versus 100–200 epochs for training from scratch, saving significant time and resources.

Challenges of Transfer Learning

Despite its benefits, transfer learning has limitations:

Domain Mismatch: Features learned from ImageNet may not transfer well to dissimilar domains (satellite or medical images), reducing effectiveness.
Fine-Tuning Complexity: Selecting which layers to fine-tune and optimising learning rates requires careful tuning.
Data Needs: While less data is needed than training from scratch, some labelled data is still required for fine-tuning, which can be a bottleneck in niche applications.

Transfer learning is a critical technique in object detection, markedly enhancing the performance of models like YOLO and Faster R-CNN. It delivers higher accuracy, faster training, and better generalisation, though challenges like domain adaptation must be addressed. By integrating pre-trained backbones, these models excel in diverse applications, from autonomous driving to medical diagnostics.

While the advantages of transfer learning are well-established in the broader machine learning community, its application in SEM and EDS analysis is particularly crucial due to the limited availability of large, annotated datasets in materials science. In this domain, acquiring and labelling SEM and EDS data is often time-consuming and costly, resulting in small datasets that are insufficient for training deep learning models from scratch. Transfer learning addresses this challenge by enabling the development of effective models even with limited data, leveraging knowledge from pre-trained models on larger, general datasets. This discussion provides essential context for readers who may be more familiar with materials science than with machine learning techniques, ensuring they understand the practical significance of transfer learning in overcoming data scarcity, a common issue in SEM and EDS analysis.

Handling Partially Overlapping Objects

SEM images frequently depict densely packed microstructural features—such as defects, inclusions, or particles—that may overlap partially or completely, especially in high-resolution settings where small features predominate. Classical two-stage object detection models like Faster R-CNN are well-equipped to handle partially overlapping objects to some extent. The region proposal network (RPN) in Faster R-CNN generates multiple overlapping regions of interest, enabling the detection of closely spaced objects. However, in cases of heavy overlap, this approach may struggle to distinguish individual instances accurately.
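Part of the difficulty with heavy overlap comes from non-maximum suppression (NMS), the greedy post-processing step used by both Faster R-CNN and YOLO to remove duplicate boxes: any box whose IoU with a higher-scoring box exceeds a threshold is discarded, even if it covers a genuinely distinct particle. The plain-Python sketch below is our own illustration with hypothetical box coordinates and thresholds; it shows two particles overlapping with IoU ≈ 0.67 being merged into a single detection at a threshold of 0.5 but kept apart at 0.7.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression; returns indices of the boxes kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept one too much.
        order = [j for j in order if box_iou(boxes[best], boxes[j]) < iou_thr]
    return keep


# Two distinct, partially overlapping particles (IoU = 80 / 120, about 0.67)
particles = [(0, 0, 10, 10), (2, 0, 12, 10)]
print(nms(particles, [0.9, 0.8], iou_thr=0.5))  # [0] -> second particle lost
print(nms(particles, [0.9, 0.8], iou_thr=0.7))  # [0, 1]
```

Raising the threshold recovers the overlapping particle but also lets more duplicate boxes through, which is one motivation for the mask-based instance segmentation methods discussed next.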

For more complex scenarios involving heavily overlapping objects, instance segmentation methods such as Mask R-CNN (further explored in Section 2.3) offer a better solution. By providing pixel-level segmentation masks, these models can delineate overlapping objects more precisely, making them particularly suitable for SEM images with intricate microstructures. Among the case studies presented in Section 2.2.3, hybrid approaches like TM-CNN show promise for handling overlapping objects. TM-CNN integrates template matching for initial detection with a convolutional neural network (CNN) for classification, effectively filtering false positives. This method achieved an F1 score of 0.988 in detecting defects in magnetic patterns (structures that often feature closely spaced or overlapping elements), suggesting its applicability to similar challenges in SEM imaging.

Handling Highly Variable Particle Sizes

SEM images often contain features ranging from nanoparticles to larger grains or defects, requiring object detection models to be robust across a wide range of scales. Modern one-stage detectors like YOLO address this challenge through the use of anchor boxes, which are predefined bounding box shapes tuned to different scales and aspect ratios. By optimising these anchor boxes to match the size distributions typical of SEM images, YOLO can effectively detect objects of varying sizes. For instance, in the case study of the improved YOLO model applied to steel strip surface defect detection, the network was tailored to identify defects across diverse size ranges. This adaptation, achieved through anchor box optimisation and architectural refinements, yielded a high mean Average Precision (mAP) of 97.55%, demonstrating its robustness to scale variations.
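The anchor-box optimisation mentioned above is commonly done by clustering the widths and heights of the annotated boxes with k-means, using 1 − IoU as the distance, the approach introduced with YOLOv2. The NumPy sketch below is our own illustration of that idea, not the procedure of the cited steel-strip study; the toy box sizes, the use of the cluster mean as centroid, and the iteration cap are assumptions.

```python
import numpy as np


def wh_iou(wh, centroids):
    """IoU between boxes and centroids compared by width/height only
    (boxes assumed to share the same top-left corner)."""
    inter = (np.minimum(wh[:, None, 0], centroids[None, :, 0])
             * np.minimum(wh[:, None, 1], centroids[None, :, 1]))
    area_b = wh[:, 0] * wh[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)


def kmeans_anchors(wh, k, n_iter=100, seed=0):
    """k-means on box sizes with distance d = 1 - IoU; returns k anchors sorted by area."""
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Minimising 1 - IoU is the same as maximising IoU.
        assign = np.argmax(wh_iou(wh, centroids), axis=1)
        new = np.array([wh[assign == c].mean(axis=0) if np.any(assign == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]


# Toy (width, height) pairs in pixels: small and large defects
wh = np.array([[8, 8], [9, 7], [7, 9], [80, 80], [82, 78], [78, 82]], dtype=float)
print(kmeans_anchors(wh, k=2))  # approximately [[8, 8], [80, 80]]
```

Using 1 − IoU rather than Euclidean distance keeps large boxes from dominating the clustering, so small features such as fine pits still receive well-matched anchors.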

Furthermore, incorporating multi-scale feature extraction techniques, such as those used in Feature Pyramid Networks (FPN), enhances the ability of models like Faster R-CNN and YOLO to detect objects at different scales. FPN leverages features from multiple network layers, enabling the detection of both small and large microstructural features in SEM images. This capability is particularly valuable in microstructural analysis, where particle sizes may span several orders of magnitude.
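The top-down pathway of an FPN can be sketched in a few lines of NumPy. This is a simplified illustration under our own assumptions: backbone features are ordered from fine to coarse with each level at half the resolution of the previous one, lateral connections are 1 × 1 convolutions (written as per-pixel channel projections) with random placeholder weights rather than trained ones, upsampling is nearest-neighbour, and the 3 × 3 smoothing convolution that the original FPN applies after each merge is omitted.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution as a per-pixel channel projection.

    x: (C_in, H, W); weight: (C_out, C_in) -> (C_out, H, W).
    """
    return np.einsum("oc,chw->ohw", weight, x)


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def fpn_top_down(features, lateral_weights):
    """Merge backbone maps (fine to coarse) into a pyramid with equal channel width."""
    laterals = [conv1x1(f, w) for f, w in zip(features, lateral_weights)]
    pyramid = [laterals[-1]]  # the coarsest level starts the top-down pass
    for lateral in reversed(laterals[:-1]):
        # Add upsampled coarse semantics to the finer, better-localised lateral map.
        pyramid.append(lateral + upsample2x(pyramid[-1]))
    return pyramid[::-1]  # back to fine-to-coarse order


rng = np.random.default_rng(0)
feats = [rng.random((64, 32, 32)), rng.random((128, 16, 16)), rng.random((256, 8, 8))]
weights = [rng.standard_normal((32, c)) for c in (64, 128, 256)]
for p in fpn_top_down(feats, weights):
    print(p.shape)  # (32, 32, 32), then (32, 16, 16), then (32, 8, 8)
```

Each output level combines coarse, semantically strong features with finer, better-localised ones, so a detector head can look for nanoparticles on the fine levels and large grains or defects on the coarse levels using the same channel width.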

Cross-Reference to Advanced Techniques

For scenarios involving heavily overlapping objects, we note that instance segmentation approaches, as detailed in Section 2.3, provide a more robust solution. Models like Mask R-CNN extend traditional object detection by generating pixel-level masks, allowing for precise separation of overlapping instances. These methods have proven effective in applications such as metallic microstructure analysis (aluminium alloys and metal powders), offering a complementary approach to the bounding-box-based detection methods discussed in Section 2.2.

Operators in Object Detection

In object detection for SEM applications, the Region Proposal Network (RPN) within Faster R-CNN is a key operator for identifying microstructural features. The RPN generates region proposals by sliding a small network over a feature map and, at each location, predicting object bounds and objectness scores for a set of reference (anchor) boxes. This makes it particularly effective for SEM images with overlapping features, such as defects or inclusions in metallic microstructures, as it efficiently proposes regions for subsequent classification. Its main advantage is a balance between speed and accuracy, which suits high-throughput analysis of SEM data where diverse feature sizes must be detected rapidly.
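The anchors that the RPN scores at each sliding-window position can be generated as below. In the original Faster R-CNN, each feature-map cell carries 9 anchors (three scales, 128, 256 and 512 pixels, times three aspect ratios, 0.5, 1 and 2) on a stride-16 grid; those defaults are used here, although for SEM images dominated by small features smaller scales would presumably be chosen. The function name and the (x1, y1, x2, y2) convention are our own; this is a sketch, not a library API.

```python
import numpy as np


def generate_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x1, y1, x2, y2) centred on every feature-map cell.

    ratio = height / width; every anchor of a given scale has area scale**2.
    Returns an array of shape (feat_h * feat_w * len(scales) * len(ratios), 4).
    """
    base = []
    for s in scales:
        for r in ratios:
            w, h = s / np.sqrt(r), s * np.sqrt(r)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)  # (A, 4), centred on the origin
    # Centres of the feature-map cells in input-image pixel coordinates.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cxx, cyy = np.meshgrid(cx, cy)
    shifts = np.stack([cxx, cyy, cxx, cyy], axis=-1).reshape(-1, 1, 4)  # (H*W, 1, 4)
    return (shifts + base[None]).reshape(-1, 4)
```

The RPN then predicts, for every one of these anchors, an objectness score and four box offsets; anchor scales matched to the size distribution of SEM features (see the clustering approach above for YOLO) reduce the regression burden on small defects.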

Trade-offs in Data Augmentation

Data augmentation is essential for overcoming the scarcity of annotated SEM and EDS datasets in materials science. However, the choice of method involves trade-offs in computational cost, effectiveness, and task-specific suitability. Simple techniques, such as geometric transformations (rotation, flipping) and photometric adjustments (brightness, contrast), are computationally lightweight and straightforward to apply. These methods are ideal for resource-limited environments or real-time applications. Conversely, generative adversarial networks (GANs) demand significant computational resources and training time due to their complexity and reliance on large datasets to produce realistic synthetic images, making them less practical for time-sensitive or hardware-constrained settings.

The effectiveness of these methods varies with the complexity of the data. Simpler augmentations introduce basic variations but often fail to replicate the intricate features of metallic microstructures—like defects, grain boundaries, or imaging artefacts—limiting their utility for tasks requiring diverse data. GANs, however, excel at generating highly realistic and varied microstructural images, enhancing AI model robustness, especially for object detection in SEM images. For instance, GANs can simulate magnification changes, noise, or particle overlap, addressing challenges beyond the scope of basic methods.

Task-specific needs further guide the choice. For object detection, where diverse appearances are critical, GANs justify their cost. In contrast, for multivariate image analysis—where preserving statistical relationships matters—simpler methods that avoid artificial distortions (scaling) are preferable. A hybrid approach often balances these trade-offs effectively: combining efficient, simpler augmentations with selective GAN use for key variability ensures robust AI performance while remaining practical for materials science applications.

2.2.2. Applications in Metallic Materials

AI-based object detection has diverse applications in metallic materials, addressing critical needs in quality control, material design, and performance assessment.

Defect Detection in Metallic Surfaces

Surface defects, such as scratches, pits, and cracks, compromise the integrity of metallic components. AI models automate their detection, ensuring high-quality production in industries like automotive and aerospace [8].

Inclusion Characterisation

Non-metallic inclusions in metals affect mechanical properties. Object detection models identify and classify these inclusions, providing insights into their size, shape, and distribution [13].

Microstructure Analysis

Analysing microstructural features, such as grain boundaries and phases, is essential for understanding material behaviour. AI automates this process, enabling quantitative analysis of complex structures [1].

Corrosion Severity Classification

Corrosion degrades metallic components, particularly in harsh environments. AI models classify corrosion severity (low, medium, high) based on SEM images, aiding in maintenance and material selection [11].

Analysis of Irradiated Metal Alloys

In nuclear reactors, metal alloys undergo irradiation, leading to defects that cause hardening and embrittlement. Object detection quantifies these defects, informing material performance predictions [13].

Segmentation of Micro and Nanoparticles

In nanotechnology, detecting and characterising micro and nanoparticles is crucial for developing advanced materials. AI models segment these particles, enabling detailed analysis of their properties [13].

Detection of Individual Atoms

Advanced microscopy techniques like STEM allow imaging at the atomic scale. Object detection models identify individual atoms, facilitating atomic-level studies of metallic structures [13].

2.2.3. Case Studies

The following case studies highlight the practical applications and performance of AI-based object detection in metallic materials, drawing on recent research.

TM-CNN for Defect Detection in Magnetic Patterns

A study introduced the TM-CNN method for detecting defects in magnetic labyrinthine patterns in Bismuth-doped Yttrium Iron Garnet (Bi:YIG) films, a magnetic oxide material. The method combines template matching for initial detection with a CNN to eliminate false positives, achieving an F1 score of 0.988 across 444 experimental images containing 641,649 structures. This outperforms traditional template matching and Faster R-CNN, reducing manual annotation effort and enabling high-throughput analysis of defects critical to material properties [12].

CASAE for Metallic Surface Defect Detection

The Cascaded Autoencoder (CASAE) architecture was developed for segmenting and localising defects on metallic surfaces in industrial settings. Comprising two autoencoder levels with atrous convolutions, CASAE achieved an IoU of 89.60% on a dataset of 50 images (augmented to 3000 training samples), outperforming FCN (81.58%) and single autoencoders (up to 84.68%). Its ability to handle ambiguous edges and low-contrast defects makes it suitable for real-world applications, with extensions to nanofibrous materials demonstrating its versatility [9].

Improved YOLO for Real-time Defect Detection in Steel Strips

An enhanced YOLO network consisting of 27 convolutional layers was applied to detect six types of surface defects in cold-rolled steel strips. The model achieved an mAP of 97.55%, a recall rate of 95.86%, and a detection rate of 99% at 83 frames per second (FPS), supporting real-time quality control on production lines. Data augmentation reduced overfitting, improving generalisation across diverse defect types [9].

CNN for Classifying Corrosion Severity Using SEM Images

Using a ResNet50 model, researchers classified corrosion severity (low, medium, high) in magnesium and steel based on SEM images. The model achieved 94% accuracy for magnesium and 88% for steel, leveraging transfer learning and Super-Resolution Generative Adversarial Networks (SRGAN) for image enhancement. This automated approach offers an objective alternative to manual inspections, critical for biomaterials and industrial applications [11].

We recognize that models like autoencoders and ResNet50 are not standalone object detection systems but serve as components within broader frameworks. For example, ResNet50 is commonly employed as a feature extractor in models like Faster R-CNN, leveraging its deep residual layers to capture detailed visual features that support accurate detection. Autoencoders, though less common in standard object detection pipelines, can contribute to preprocessing steps, such as denoising images or learning compact representations that aid subsequent detection tasks.

Object Detection in Irradiated Metal Alloys

A review highlighted the use of object detection to quantify defects in irradiated metal alloys used in nuclear reactors. Models characterise defect type, shape, size, and distribution, impacting properties like hardening and swelling. The exponential increase in data from modern EM instruments (terabytes per session) necessitates such automated methods, with applications extending to atomic-scale analysis and nanoparticle segmentation [13].

2.2.4. Quantitative Insights and Performance Metrics

Table 3 summarises performance metrics from the case studies, illustrating the effectiveness of AI-based object detection in metallic materials.

These metrics highlight the high accuracy and efficiency of AI models, with real-time capabilities (YOLO at 83 FPS) and robust performance across diverse applications.
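mAP appears throughout this section, so a minimal sketch of how average precision (AP) is computed for a single class may help in reading it. The sketch assumes detections have already been matched to ground truth, with each one flagged as a true or false positive (typically at IoU ≥ 0.5). It uses all-point interpolation as in the PASCAL VOC protocol. COCO-style evaluation samples recall differently, so the resulting values are not directly comparable. mAP is the mean of the per-class AP values. The function name is illustrative.

```python
def average_precision(scores, is_tp, num_ground_truth):
    """All-point interpolated AP for one class.

    scores: confidence of each detection; is_tp: True if that detection was matched
    to a previously unmatched ground-truth object; num_ground_truth: objects of this class.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk down the ranked list, one detection at a time
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each rank becomes the best precision at the same or higher recall
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the interpolated precision-recall curve (sum over recall increments)
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Two ground-truth defects; detections ranked TP, FP, TP, FP
print(round(average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], 2), 4))  # 0.8333
```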

2.2.5. Challenges and Future Directions

Despite significant progress, several challenges persist in applying AI-based object detection to metallic materials.

Data Scarcity and Annotation: High-quality, annotated datasets are scarce, particularly for specialised applications like atomic-scale imaging. Manual annotation is time-consuming and costly, limiting model training [13].

Model Evaluation: Evaluating model performance on limited testing data can lead to overfitting or biased results. Standardised benchmarks and metrics are needed to ensure reliability and comparability [7].

Subjectivity in Ground Truth Labels: The quality of ground truth labels affects model accuracy. Variations in expert annotations necessitate community consensus on labelling standards to reduce subjectivity [13].

Computational Resources: Advanced models like Faster R-CNN require significant computational power, posing barriers for smaller research groups. Optimising models for efficiency is crucial [7].

Integration with Other Modalities: Combining object detection with techniques like energy dispersive spectroscopy (EDS) could provide comprehensive analyses, but such integration is underexplored [1].

Future directions include:

Synthetic Data Generation: Using GANs or diffusion models to create realistic SEM images, addressing data scarcity [13].
Transfer Learning: Leveraging pre-trained models to improve performance with limited domain-specific data [7].
Community Collaboration: Developing open-access datasets and shared models to foster innovation and standardisation [1].
Explainable AI: Enhancing model interpretability to build trust in scientific applications [13].
Multi-Modal Analysis: Integrating SEM with EDS or other modalities for holistic material characterisation [1].

2.2.6. Critical Analysis of Methodologies and Trade-Offs in Object Detection

Object detection methodologies in SEM/EDS analysis for metallic materials have demonstrated varying degrees of success depending on the specific demands of the task, such as real-time processing in industrial settings or precise localization in research-oriented atomic-scale imaging. YOLO’s effectiveness in high-throughput applications, like real-time defect detection in steel strips (achieving mAP 97.55% and 83 FPS [9]), arises from its single-stage architecture, which processes images in one pass using anchor boxes to handle multi-scale features efficiently. This makes YOLO particularly successful in industrial contexts where speed is paramount, such as production lines for automotive or aerospace components, where rapid identification of surface defects like scratches or pits is essential to minimize downtime. However, YOLO’s trade-off is evident in its lower recall for small or densely packed features—common in SEM images of metallic microstructures—potentially missing subtle defects and reducing overall precision by 5–10% compared to two-stage models in complex scenarios.

Faster R-CNN, on the other hand, excels in precision-demanding contexts, such as atomic-scale detection in irradiated metal alloys [12], because its Region Proposal Network (RPN) generates high-quality candidate boxes before classification; it achieves mAP 73.2% on benchmarks like Pascal VOC. This methodology succeeds in research environments with limited but high-fidelity datasets, as it leverages transfer learning to adapt pre-trained backbones (e.g., ResNet-101), improving accuracy by 20.8% over scratch-trained models [7]. Yet a fundamental trade-off shaping the field is its computational overhead. The two-stage process increases inference time relative to single-stage detectors, and training remains costly even with transfer learning (20–30 epochs for fine-tuning vs. 100+ from scratch). This makes Faster R-CNN less viable for real-time industrial applications where scalability is critical, confining it largely to offline analysis in materials labs.

Hybrid approaches like TM-CNN and CASAE further illustrate context-specific success. TM-CNN’s integration of template matching with CNN filtering achieves an exceptional F1 score of 0.988 for defect detection in magnetic patterns [12], thriving in scenarios with numerous small, overlapping objects—prevalent in metallic powders for additive manufacturing—by reducing false positives through a multi-step refinement. This hybrid succeeds where pure CNNs falter due to dense clustering, but its reliance on predefined templates introduces a trade-off: while enhancing precision in structured patterns, it struggles with variability in irregular microstructures, potentially dropping F1 by 5–7% in diverse alloys. Similarly, CASAE’s cascaded autoencoder design, with atrous convolutions, outperforms FCNs (IoU 89.60% vs. 81.58% [9]) in low-contrast metallic surface defects by refining ambiguous edges, making it ideal for nanofibrous or corroded materials. However, its multi-level structure amplifies computational demands, limiting scalability in resource-constrained industrial environments.

These methodologies are profoundly shaped by trade-offs in data requirements, computational resources, and generalization. For instance, transfer learning consistently boosts performance (YOLO mAP gain of 28.7% [10]) by leveraging pre-trained models to overcome data scarcity—a pervasive issue in materials science where annotated SEM datasets are costly—but domain mismatches (e.g., from ImageNet to noisy SEM images) can reduce effectiveness by 10–15%, necessitating fine-tuning. GANs for data augmentation address this by generating synthetic images, expanding datasets by up to 50% [13], but introduce instability risks, such as artifacts that mislead detection in critical applications like nuclear alloy analysis. Overall, success depends on aligning the method with contextual needs: single-stage models like YOLO for scalable industrial defect monitoring, two-stage like Faster R-CNN for precise research quantification, and hybrids for overlapping features. The field’s trajectory favours integrated approaches (e.g., combining attention mechanisms with CNNs) to mitigate these trade-offs, but challenges like interpretability—where “black-box” models hinder trust in quality control—underscore the need for explainable AI to ensure reliable adoption in metallic materials characterization.

YOLO dominates the high-scalability, moderate-complexity quadrant, excelling in real-time industrial applications like steel defect detection. Faster R-CNN and hybrids like TM-CNN lean toward higher complexity to achieve precision in research settings (e.g., atomic-scale analysis of alloys). This positioning in the Microstructure Analysis Spectrum explains their contextual successes and the persistent trade-offs in data and compute requirements for SEM analysis of metals.

2.3. Instance Segmentation in Metallic Materials

Instance segmentation, as shown in Figure 5, is a sophisticated computer vision task that involves detecting and delineating each distinct object of interest within an image, assigning both a class label and an instance-specific mask. Unlike semantic segmentation, which classifies all pixels of a given class uniformly, instance segmentation differentiates between individual objects of the same class and provides precise boundaries for each. This capability is particularly transformative in materials science, where microstructural analysis of metallic materials is essential for understanding properties such as strength, corrosion resistance, and fatigue behaviour.

In the context of metallic materials, instance segmentation is applied to microscopy images, such as those obtained from scanning electron microscopy (SEM) or optical microscopy, to identify and characterise features like grains, particles, defects, or inclusions. Automating this process with artificial intelligence (AI), particularly deep learning, enhances the efficiency, accuracy, and scalability of microstructural analysis, facilitating high-throughput characterisation and data-driven materials discovery. This section provides a comprehensive overview of AI-based instance segmentation in metallic materials, focusing on advancements since 2010, key methodologies, applications, challenges, and future directions.

Instance segmentation combines object detection, which locates objects with bounding boxes, and semantic segmentation, which assigns class labels to pixels, to produce pixel-wise masks for each object. This is distinct from panoptic segmentation, which integrates instance and semantic segmentation to label both distinct objects and background regions. In materials science, instance segmentation is critical for quantifying microstructural features in metallic materials, enabling researchers to measure parameters like particle size distributions, grain boundaries, or defect density.

Evaluation metrics for instance segmentation include:

Mean Average Precision (mAP): Averages precision over recall levels (the area under the precision–recall curve) and then over classes, thereby balancing precision and recall.

Intersection over Union (IoU): Measures the overlap between predicted and ground truth masks.

F1 Score: The harmonic mean of precision and recall, summarising detection performance in a single value.

Aggregated Jaccard Index (AJI+): Quantifies instance-level segmentation accuracy by aggregating mask overlaps across matched object instances, which makes it well suited to clustered objects like particles.

These metrics are essential for validating models in applications where precise delineation of microstructural features is critical, such as quality control in metallic component manufacturing.
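The IoU and F1 definitions above can be made concrete with a minimal sketch. It assumes that each instance mask is stored as a set of (row, column) pixel coordinates and that predictions are matched greedily and one-to-one to ground-truth instances at IoU ≥ 0.5. In practice, predictions would first be sorted by confidence. The function names are hypothetical. AJI+ uses a different, area-aggregated matching scheme and is not shown.

```python
def mask_iou(a, b):
    """IoU of two binary masks stored as sets of (row, col) pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def instance_f1(pred_masks, gt_masks, iou_threshold=0.5):
    """Greedy one-to-one matching at an IoU threshold; returns (precision, recall, F1)."""
    matched_gt = set()
    tp = 0
    for p in pred_masks:  # assumes predictions are already sorted by confidence
        best_j, best_iou = None, iou_threshold
        for j, g in enumerate(gt_masks):
            if j in matched_gt:
                continue  # each ground-truth instance can be matched only once
            iou = mask_iou(p, g)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            matched_gt.add(best_j)
            tp += 1
    precision = tp / len(pred_masks) if pred_masks else 0.0
    recall = tp / len(gt_masks) if gt_masks else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gt = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]  # two particles
pred = [{(0, 0), (0, 1), (1, 0)}, {(9, 9)}]               # one good mask, one spurious
print(mask_iou(pred[0], gt[0]))  # 0.75
print(instance_f1(pred, gt))     # (0.5, 0.5, 0.5)
```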

2.3.1. AI Techniques for Instance Segmentation

Deep learning, particularly convolutional neural networks (CNNs), dominates instance segmentation due to its ability to learn complex patterns from high-resolution microscopy images.

Key architectures and techniques include:

Mask R-CNN: An extension of Faster R-CNN that adds a branch for predicting segmentation masks, enabling simultaneous object detection and instance segmentation. Its robustness makes it a preferred choice for segmenting microstructural features in metallic materials [8,14].
Bayesian Deep Learning: Incorporates uncertainty estimation, providing confidence levels for predictions, valuable in scientific applications [15].
Transfer Learning: Leverages pre-trained models, such as those trained on ImageNet, to adapt to specific tasks with limited labelled data [15].
Data Augmentation: Techniques like rotation, flipping, and brightness adjustments enhance dataset diversity, improving model robustness [16].

These techniques address challenges like data scarcity, image variability, and the need for high precision in metallic material analysis.

In instance segmentation for metallic materials, the scarcity of labelled data is a significant challenge, particularly given the complexity and variability of microstructures in SEM images. To address this, data augmentation and transfer learning have emerged as essential techniques, enabling researchers to train robust models despite limited datasets. Below, we discuss how these methods are applied in materials science, with a focus on instance segmentation tasks.

Data Augmentation and Transfer Learning in Instance Segmentation: Mask R-CNN Improvements

Data augmentation and transfer learning are critical techniques for improving instance segmentation models like Mask R-CNN, particularly when working with limited labelled data or complex scenes. Data augmentation expands the training dataset by applying transformations such as rotation, scaling, and flipping, aiding the model in generalising to diverse conditions. Transfer learning leverages pre-trained models—often trained on large datasets like COCO or ImageNet—to adapt to specific tasks, minimising the need for extensive labelled data and computational effort.

Benefits of Data Augmentation

Data augmentation enhances Mask R-CNN performance in several ways:

Improved Generalisation: Exposure to varied data helps the model adapt to real-world changes, such as different lighting or object angles.
Reduced Overfitting: Increased dataset diversity prevents the model from memorising specific examples.
Enhanced Robustness: Techniques like noise addition or colour jittering improve performance under challenging conditions.

Benefits of Transfer Learning

Transfer learning provides significant advantages for Mask R-CNN:

Higher Accuracy: Pre-trained backbones (ResNet-50, ResNet-101) offer a robust foundation, boosting performance.
Faster Convergence: Fine-tuning from pre-trained weights accelerates training, often requiring fewer epochs.
Reduced Data Needs: Effective fine-tuning is possible with smaller datasets, ideal for specialised tasks with limited data.

Mask R-CNN with Data Augmentation and Transfer Learning

Mask R-CNN, which extends Faster R-CNN by adding a segmentation branch for pixel-wise masks, greatly benefits from these techniques:

Data Augmentation Impact: Training on the COCO dataset with augmentations like random cropping, flipping, and colour jittering can increase mean Average Precision (mAP) by 2–3 percentage points. For example, an mAP of 35.5% without augmentation might rise to 38.2% with it.
Transfer Learning Impact: Fine-tuning a COCO-pre-trained Mask R-CNN for tasks like medical image segmentation can yield substantial gains. For instance, in cell nuclei segmentation, fine-tuning improved mAP by 15% compared to training from scratch.

Challenges and Considerations

Despite their strengths, these methods have limitations:

Augmentation Selection: Inappropriate transformations can harm performance, requiring careful selection.
Domain Adaptation: Transfer learning may falter if the pre-trained domain differs significantly from the target (e.g., natural vs. medical images).
Computational Costs: While less intensive than training from scratch, fine-tuning large models still demands significant resources.

Data augmentation and transfer learning are vital for optimising Mask R-CNN in instance segmentation. They improve accuracy, combat overfitting, and streamline training, though their implementation requires thoughtful design. These techniques shine in scenarios with limited data or intricate visual challenges.

Data augmentation artificially expands the training dataset by applying transformations to existing images, thereby improving model generalisation and reducing overfitting. For SEM images of metallic materials, several augmentation techniques are particularly effective:

Geometric Transformations: Techniques such as rotation, scaling, flipping, and cropping are commonly used to simulate variations in particle orientation and size, which are prevalent in microstructural features like grains or inclusions. These transformations help the model become invariant to such changes, improving its robustness.
Intensity Adjustments: Modifying brightness, contrast, or adding noise mimics the variability in SEM image quality caused by different imaging conditions (beam energy, detector settings). This ensures the model can handle images with varying contrast or noise levels.
Synthetic Data Generation: Generative models, such as Generative Adversarial Networks (GANs), can create realistic synthetic SEM images based on existing data. For instance, GANs have been used to generate additional training samples of metallic particles or defects, augmenting small datasets and improving model performance.
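A practical detail for instance segmentation is that geometric transforms must be applied identically to the image and its instance mask. Intensity adjustments (brightness, contrast, noise), by contrast, should touch the image only. The sketch below assumes greyscale intensities normalised to [0, 1] and masks stored as grids of integer instance IDs. The gain, offset and noise ranges are illustrative assumptions, not values from the cited studies, and the function names are hypothetical.

```python
import random


def hflip(grid):
    """Mirror a 2D grid (list of rows) left-right."""
    return [row[::-1] for row in grid]


def rot90(grid):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def augment(image, mask, rng):
    """Same random geometry for image and mask; intensity changes on the image only."""
    if rng.random() < 0.5:
        image, mask = hflip(image), hflip(mask)
    for _ in range(rng.randrange(4)):  # 0, 90, 180 or 270 degrees
        image, mask = rot90(image), rot90(mask)
    gain = rng.uniform(0.8, 1.2)       # contrast-like scaling (illustrative range)
    offset = rng.uniform(-0.05, 0.05)  # brightness shift (illustrative range)
    image = [
        [min(1.0, max(0.0, gain * v + offset + rng.gauss(0.0, 0.02))) for v in row]
        for row in image
    ]  # additive Gaussian noise mimics detector noise; values clipped to [0, 1]
    return image, mask  # instance IDs in the mask are never altered, only moved


image = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
mask = [[0, 1, 1], [0, 2, 2]]  # 0 = background, 1 and 2 = particle instances
aug_image, aug_mask = augment(image, mask, random.Random(0))
```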

In the context of instance segmentation, these augmentation techniques are critical for training models to accurately segment overlapping or closely spaced microstructural features, such as particles in metal powders or defects in alloys. For example, in the case study on metal powder segmentation [15], data augmentation techniques like rotation and scaling were employed to enhance the diversity of the training set, contributing to the model’s ability to generalise across different particle sizes and orientations.

Transfer learning leverages pre-trained models from large, general datasets (ImageNet) and fine-tunes them for specific tasks in materials science. This approach is particularly valuable when labelled data is scarce, as it allows the model to benefit from features learned on a broader range of images. In instance segmentation for metallic materials, transfer learning is applied in two primary ways:

Fine-Tuning Pre-Trained Models: Models like Mask R-CNN, pre-trained on large datasets such as COCO or ImageNet, can be fine-tuned on smaller SEM datasets. This reduces the need for extensive labelled data while still achieving high accuracy. For example, in the case study on aluminium alloy segmentation [3], a pre-trained Mask R-CNN model was fine-tuned to segment phases like Mg₂Si and Fe-containing compounds, achieving a median IoU of 0.59 with limited training data.

Domain Adaptation: Transfer learning can also be used to adapt models trained on one type of metallic material to another. For instance, a model trained on SEM images of steel microstructures can be fine-tuned for high-entropy alloys, leveraging shared microstructural characteristics (grain boundaries, inclusions). This is particularly useful in materials science, where datasets for specific alloys may be limited.

Additionally, transfer learning can be combined with data augmentation to enhance performance further. In the uncertainty-aware particle segmentation study [16], transfer learning from a pre-trained model, coupled with data augmentation, enabled the model to achieve an Aggregated Jaccard Index (AJI+) of 0.81 on low-magnification SEM images, despite the challenges of variable particle sizes and limited labelled data.

Cross-Reference to Case Studies

The effectiveness of these techniques is demonstrated in several case studies presented in Section 2.3:

In the metal powder segmentation study [8], data augmentation (rotation, flipping) was crucial for training the Mask R-CNN model to accurately segment particles of varying sizes and shapes, achieving a mean Average Precision (mAP50) of 67.2.

The Bayesian particle instance segmentation study [15] utilised transfer learning to adapt a model trained on general microscopy images to electron microscopy data, enabling accurate segmentation of particles in the EMPS dataset with limited labelled samples.

These examples illustrate how data augmentation and transfer learning are not only standard solutions but also highly effective in addressing the data scarcity problem in instance segmentation for metallic materials.

2.3.2. Applications in Metallic Materials

Instance segmentation has diverse applications in metallic materials, enabling automated analysis of microstructural features critical to material performance.

Microstructure Analysis in Alloys: Segmenting individual grains or phases in alloys, such as aluminium or steel, provides insights into mechanical properties like hardness and ductility [14].

Particle Size and Distribution in Metal Powders: Characterising particle size distributions and satellite content in metal powders is essential for additive manufacturing [8].

Defect and Inclusion Quantification: Identifying and segmenting defects or non-metallic inclusions helps assess material quality [16].

Nanoparticle Analysis: Segmenting nanoparticles in metallic compounds facilitates studying their morphology and distribution [15].

2.3.3. Case Studies

The following case studies highlight the practical applications and performance of AI-based instance segmentation in metallic materials.

Microstructure Instance Segmentation from Aluminium Alloy Metallographic Image

Chen et al. [4] used Mask R-CNN to analyse metallographic images of five-series aluminium alloys, segmenting phases like Mg₂Si, aluminium, and Fe-containing compounds. The study compared five loss functions: binary cross-entropy (L_BCE), Dice loss (L_DICE), IoU loss (L_IOU), Tversky loss (L_Tversky), and SS loss (L_SS). Key results:

Performance Metrics: Median IoU values ranged from 0.572862 (L_SS) to 0.590477 (L_BCE) across 100 images, as shown in Table 4.

Evaluation: Because background pixels dominate the images, precision, sensitivity (Sn), IoU, and F1 were more informative metrics than pixel accuracy.
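For reference, the sketch below gives common soft (probability-based) forms of four of the compared losses for a single foreground phase. It assumes per-pixel predicted probabilities `p` and binary labels `y`, and it shows that Dice and soft IoU are special cases of the Tversky index. The exact formulations, smoothing terms and Tversky weights used by Chen et al. are not reproduced here and may differ. The SS loss is omitted, and the Tversky defaults (0.3, 0.7) are an illustrative assumption.

```python
import math


def bce_loss(p, y, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clamped to avoid log(0)."""
    return -sum(t * math.log(max(q, eps)) + (1 - t) * math.log(max(1 - q, eps))
                for q, t in zip(p, y)) / len(p)


def tversky_index(p, y, alpha, beta, smooth=1e-6):
    """Soft Tversky index: alpha weights false positives, beta weights false negatives."""
    tp = sum(q * t for q, t in zip(p, y))
    fp = sum(q * (1 - t) for q, t in zip(p, y))
    fn = sum((1 - q) * t for q, t in zip(p, y))
    return (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)


def dice_loss(p, y):
    return 1 - tversky_index(p, y, 0.5, 0.5)  # alpha = beta = 0.5 gives the Dice coefficient


def iou_loss(p, y):
    return 1 - tversky_index(p, y, 1.0, 1.0)  # alpha = beta = 1 gives the (soft) Jaccard index


def tversky_loss(p, y, alpha=0.3, beta=0.7):
    return 1 - tversky_index(p, y, alpha, beta)  # beta > alpha penalises missed phase pixels more


p = [0.9, 0.8, 0.2, 0.1]  # predicted probability of the phase (e.g., Mg2Si) per pixel
y = [1, 0, 0, 0]          # ground-truth labels
for name, loss_fn in [("BCE", bce_loss), ("Dice", dice_loss), ("IoU", iou_loss), ("Tversky", tversky_loss)]:
    print(name, round(loss_fn(p, y), 3))
```

The region-based losses (Dice, IoU, Tversky) are normalised by the size of the predicted and true regions rather than by the total pixel count. This is one reason they are often preferred over BCE when the phase of interest occupies few pixels.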

Bayesian Particle Instance Segmentation for Electron Microscopy

Yildirim and Cole [15] introduced a Bayesian deep learning model for segmenting particle instances in electron microscopy images, integrated into ImageDataExtractor 2.0. The Electron Microscopy Particle Segmentation (EMPS) dataset, with 465 images, enabled quantitative measures like particle-size distributions. The model’s uncertainty estimates enhanced accuracy by filtering false positives.

Uncertainty-aware Particle Segmentation

Rettenberger et al. [16] enhanced Mask R-CNN to segment particles in SEM images of inorganic powders, including metallic compounds, with uncertainty estimation. The model achieved AJI+ scores of 0.81 (low magnification) and 0.51 (high magnification) on 90 images, outperforming U-Net. Analysis of 288 LiCoO₂ images predicted an average particle area of 37 µm², closely matching expert labels (31 µm²).

Instance Segmentation for Metal Powders

Cohn et al. [8] used Mask R-CNN for segmenting metal powder particles, measuring size distribution and satellite content for additive manufacturing. The model achieved a total mAP50 of 67.2, with class-specific AP50 values of 69.5 (elongated), 74.5 (satellites), 57.2 (circular), and 60.7 (nodular), as shown in Table 5.

In microstructural analysis, class imbalance arises due to the inherent nature of the data. For example, in SEM images, rare features such as defects or inclusions often constitute a small fraction of the dataset compared to the background or dominant phases. This imbalance can mislead standard evaluation metrics, providing an incomplete picture of model performance. Below, we elaborate on how class imbalance affects key metrics and the strategies employed to mitigate its effects.

Impact of Class Imbalance on Evaluation Metrics

Accuracy: Accuracy measures the proportion of correctly classified instances. In imbalanced datasets, it can be misleading because a model might achieve high accuracy by predominantly predicting the majority class (background pixels), while failing to identify minority classes (defects). For instance, in semantic segmentation of microstructures (Section 2.1), the background may dominate, inflating accuracy if the model overlooks rare features like pores.
F1-Score: The F1-score, which balances precision and recall, is more robust than accuracy. However, in multi-class scenarios, the macro-averaged F1-score can still be skewed if minority classes exhibit low precision or recall. In the SEM-EDS phase classification study (Section 2.4), the Random Forest model achieved an F1-score of 0.92, but performance varied across classes, with rare phases potentially underrepresented.
Intersection over Union (IoU): IoU assesses the overlap between predicted and ground truth regions in segmentation tasks. Although less sensitive to imbalance than accuracy, IoU can be lower for minority classes with small spatial extents. For example, in instance segmentation of metal powders (Section 2.3), small satellite particles may yield lower IoU scores despite correct detection, due to their limited size.
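The accuracy pitfall described above is easy to reproduce. In the toy example below (hypothetical numbers), pores occupy 2% of a 1000-pixel image. A model that predicts only matrix still scores 98% pixel accuracy, but its per-class IoU for pores is zero, which halves the mean IoU. The function names are illustrative.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted class equals the true class."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def per_class_iou(pred, truth, classes):
    """IoU computed separately for each class label."""
    ious = {}
    for c in classes:
        inter = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        ious[c] = inter / union if union else float("nan")
    return ious


MATRIX, PORE = 0, 1
truth = [MATRIX] * 980 + [PORE] * 20  # flattened 1000-pixel image, 2% pores
pred = [MATRIX] * 1000                # model that never predicts a pore

print(pixel_accuracy(pred, truth))               # 0.98
ious = per_class_iou(pred, truth, [MATRIX, PORE])
print(ious)                                      # {0: 0.98, 1: 0.0}
print(sum(ious.values()) / len(ious))            # 0.49
```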

Addressing Class Imbalance in Microstructural Data

The following strategies help ensure a fair evaluation of model performance on imbalanced microstructural data:

Class-Weighted Metrics: Weighting metrics like F1-score or IoU by inverse class frequency, or macro-averaging them so that every class counts equally, emphasises performance on minority classes. In the uncertainty-aware particle segmentation study (Section 2.3), the Aggregated Jaccard Index (AJI+) was used. This metric is computed over object instances rather than background pixels and penalises both missed and spurious instances.
Precision-Recall Curves: Precision-recall curves offer a detailed view of model performance, especially for rare classes. In the steel strip defect detection study (Section 2.2), the improved YOLO model’s performance was assessed using mean Average Precision (mAP). Because mAP is derived from precision-recall curves, it is less sensitive to class imbalance than accuracy.
Resampling Techniques: Techniques such as oversampling minority classes or undersampling majority classes balance the dataset. In the SEM-EDS phase classification study (Section 2.4), data augmentation generated synthetic samples of rare phases, enhancing model generalisation.
Cost-Sensitive Learning: Assigning higher misclassification costs to minority classes during training improves detection of rare features. This is critical in defect detection tasks, where missing a defect (e.g., a crack) outweighs the cost of false positives.
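Cost-sensitive learning is often implemented with class weights in the loss function. The minimal sketch below assumes inverse-frequency weights of the form w_c = N / (K · n_c), which is one common choice among several, and a weighted cross-entropy normalised by the total weight of the samples. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """w_c = N / (K * n_c): a class that is 4x rarer gets a 4x larger weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights, eps=1e-7):
    """Weighted mean of -log p(true class); probs[i] maps each class to its predicted probability."""
    total = sum(weights[y] * -math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


BACKGROUND, CRACK = 0, 1
labels = [BACKGROUND] * 8 + [CRACK] * 2
probs = [{BACKGROUND: 0.9, CRACK: 0.1}] * 10  # model favours background everywhere, so cracks are missed

weights = inverse_frequency_weights(labels)
print(weights)  # {0: 0.625, 1: 2.5}
uniform = {BACKGROUND: 1.0, CRACK: 1.0}
print(weighted_cross_entropy(probs, labels, uniform) < weighted_cross_entropy(probs, labels, weights))  # True
```

With the weights applied, the two missed cracks dominate the loss, so gradient updates push the model harder toward detecting them.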

Cross-Reference to Case Studies

These concepts can be tied to specific case studies:

In the microstructure segmentation study (Section 2.1), the U-Net model achieved 94.43% accuracy. To address potential background dominance, per-class F1-scores were reported to evaluate the segmentation of minority classes like pores.

In the metal powder instance segmentation study (Section 2.3), Mask R-CNN performance was evaluated using mAP50, which balances precision and recall across particle sizes, accounting for potential imbalances between large and small particles.

In the SEM-EDS phase classification study (Section 2.4), macro-averaged F1-scores were used to ensure rare phases were adequately represented, with dataset size effects analysed to highlight model sensitivity to imbalance.

Handling Multiple Phases in Instance Segmentation

Instance segmentation is a computer vision task that involves detecting and delineating each distinct object instance in an image while also classifying it into a specific category. In the context of metallic materials, this capability is critical for identifying and separating individual microstructural features—such as grains, particles, or phases—even when multiple types of phases with distinct physical or chemical properties (different crystal structures or compositions in an alloy) are present.

Modern instance segmentation models, such as Mask R-CNN, are well-equipped to handle multiple classes. These models include a classification branch that assigns each detected instance to one of several predefined categories, corresponding to different microstructural phases. To illustrate this, we refer to two case studies from this review:

Aluminium Alloy Segmentation [14]: The Mask R-CNN model was applied to segment multiple phases, including Mg₂Si, aluminium, and Fe-containing compounds, within the microstructure of an aluminium alloy. The model achieved a median Intersection over Union (IoU) of 0.59, demonstrating its ability to perform multi-class instance segmentation effectively in metallic materials.

Metal Powder Segmentation [15]: The model successfully distinguished between different particle types—elongated, satellite, circular, and nodular—achieving a mean Average Precision (mAP50) of 67.2. This example further highlights the model’s capacity to manage multiple classes, in this case, different morphological categories of particles.

These case studies show that instance segmentation models can effectively recognise and segment multiple phases, provided that the phases are sufficiently distinct in the SEM images and well-represented in the training data. However, challenges may arise when phases exhibit similar visual characteristics. For instance, if two phases have overlapping morphological traits or subtle differences in appearance, the model’s accuracy in distinguishing them may decrease. In such scenarios, enhancing the model with additional features (e.g., texture or intensity variations) or integrating complementary data sources, such as Energy-Dispersive Spectroscopy (EDS), could improve phase recognition, although such integration may extend beyond the scope of standard SEM-based analysis.

Restrictions on the Number of Output Masks

In instance segmentation, each detected object instance is assigned a unique mask, meaning the number of output masks corresponds to the number of individual objects identified in the image. Theoretically, models like Mask R-CNN do not impose a strict limit on the number of masks they can generate, allowing them to segment all detectable instances in an image. This flexibility makes them suitable for analysing metallic microstructures, where the density of features (particles or grains) can vary widely.

In practice, however, the effective number of output masks is influenced by several factors:

Computational Resources: Processing a large number of instances can be computationally intensive, potentially slowing down analysis or requiring more memory, especially in high-density microstructures.
Model Hyperparameters: Some implementations include settings, such as the maximum number of detections per image, to balance accuracy and processing efficiency. For example, in the metal powder segmentation study [15], the Mask R-CNN model was configured to limit the number of detections per image, optimising performance for practical use.
Microstructure Complexity: Overlapping or small instances may challenge the model’s ability to detect and segment every object accurately, though this is a performance limitation rather than a strict cap on mask output.

Despite these practical constraints, instance segmentation models can typically handle the number of instances encountered in most SEM images of metallic microstructures without significant issues. For instance, in the aluminium alloy and metal powder studies, the models successfully segmented dozens of instances per image, aligning with the typical requirements of materials science applications.
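The per-image detection cap mentioned above usually acts as a final post-processing step. The sketch below assumes that detections are dictionaries with a `score` field, which is a hypothetical format. It discards low-confidence detections and then keeps the highest-scoring ones up to a limit. This mirrors the role of settings such as `box_detections_per_img` in torchvision's Mask R-CNN, whose default is 100. For densely packed powders, the cap may need to be raised so that valid particles are not silently dropped.

```python
def cap_detections(detections, max_per_image=100, score_threshold=0.05):
    """Drop low-confidence detections, then keep the top `max_per_image` by score."""
    kept = [d for d in detections if d["score"] >= score_threshold]
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept[:max_per_image]


# Hypothetical dense powder image with 250 candidate particles
detections = [{"id": i, "score": (i % 100) / 100 + 0.005} for i in range(250)]
final = cap_detections(detections, max_per_image=100)
print(len(final), final[0]["score"])
```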

2.3.4. Challenges and Future Directions

Challenges include:

Data Scarcity: Limited annotated datasets hinder model training [8].
Model Generalisation: Models must generalise across materials and magnifications [16].
Uncertainty Estimation: Integrating uncertainty estimates into analysis workflows remains challenging [15].
Computational Resources: Deep learning models require significant power [14].

Future directions include:

Open-Access Datasets: Developing diverse datasets [15].
Advanced Architectures: Exploring vision transformers [16].
Multi-Modal Integration: Combining with EDS for comprehensive analysis [14].

Transformer architectures, particularly Vision Transformers (ViT) and Swin Transformers, have shown significant promise in image analysis tasks due to their ability to capture long-range dependencies and global context. SEM images often contain complex microstructures with features at multiple scales, and for such images these architectures offer distinct advantages over traditional convolutional neural networks (CNNs). Below, we elaborate on their effectiveness for SEM data and address the associated computational challenges.

Vision Transformers (ViT) and Swin Transformers

Vision Transformers (ViT): ViT applies the transformer architecture, originally designed for natural language processing, to image data by dividing the image into patches, embedding them, and processing them through transformer layers. This allows ViT to model relationships between distant parts of the image, which is particularly beneficial for SEM images where microstructural features (grain boundaries, defects) may span large areas. By capturing global context, ViT can improve the understanding of complex spatial relationships in metallic microstructures.

Swin Transformers: A variant of ViT, Swin Transformers introduce a hierarchical structure with shifted windows to capture both local and global information efficiently. This hierarchical approach enables Swin Transformers to handle high-resolution images more effectively than standard ViT, making them well-suited for SEM data, which often requires analysis at multiple scales. Restricting self-attention to local windows makes its cost grow linearly rather than quadratically with the number of image patches. Shifting the window partition between successive layers then maintains cross-window connections for global context.
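The computational difference between global and windowed attention can be estimated by counting query–key pairs per attention layer. The sketch below is arithmetic only, not a model. It assumes square images whose token grid divides evenly into 7 × 7 windows (the default Swin window size), ignores heads, channels and constant factors, and uses a hypothetical 896 × 896 SEM image. The function names are illustrative.

```python
def num_tokens(height, width, patch):
    """Tokens produced by splitting an image into non-overlapping patch x patch squares."""
    assert height % patch == 0 and width % patch == 0
    return (height // patch) * (width // patch)


def global_attention_pairs(height, width, patch):
    """Query-key pairs per layer when every token attends to every other token (ViT)."""
    n = num_tokens(height, width, patch)
    return n * n  # grows quadratically with the number of tokens


def window_attention_pairs(height, width, patch, window=7):
    """Query-key pairs per layer when attention stays inside window x window token groups (Swin)."""
    rows, cols = height // patch, width // patch
    assert rows % window == 0 and cols % window == 0  # assume no padding is needed
    return num_tokens(height, width, patch) * window * window  # linear in the number of tokens


H = W = 896  # hypothetical SEM image size
print(f"{global_attention_pairs(H, W, 16):,}")  # ViT, 16x16 patches: 9,834,496
print(f"{global_attention_pairs(H, W, 4):,}")   # global attention on 4x4 patches: 2,517,630,976
print(f"{window_attention_pairs(H, W, 4):,}")   # Swin stage 1, 7x7 windows: 2,458,624
```

At the fine 4 × 4 patch resolution that Swin uses in its first stage, windowed attention needs roughly a thousandth of the pairs that global attention would. This is what makes hierarchical, high-resolution processing of SEM images tractable.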

Effectiveness for SEM Data

The unique characteristics of SEM images—such as intricate details, varying scales, and the presence of overlapping or densely packed features—make transformers particularly effective for tasks like semantic segmentation, object detection, and instance segmentation. For example:

In semantic segmentation, transformers can better capture the relationships between different microstructural phases or defects, leading to more accurate delineations. A study by [14] applied a Swin Transformer to segment graphene layers in SEM images, achieving an accuracy of 94.5% with a small dataset, outperforming traditional CNNs.

In defect detection, transformers’ ability to model long-range dependencies allows for improved detection of subtle or distributed defects. Another study utilised ViT for detecting defects in metallic surfaces, reporting higher precision and recall compared to CNN-based models.

These examples demonstrate that transformers can enhance the analysis of SEM images, particularly in scenarios where global context is critical for accurate interpretation.

Computational Costs and Mitigation Strategies

Despite their advantages, transformer architectures are computationally intensive, especially for high-resolution SEM images, due to the need to process a large number of patches. This leads to high memory and computational demands, which can be a barrier for practical applications in materials science. However, several strategies can mitigate these costs:

Patch Merging: Techniques like patch merging reduce the number of tokens as the network deepens, decreasing computational complexity in later layers.

Efficient Attention Mechanisms: Swin Transformers, for instance, use window-based attention to limit the scope of attention computations to local regions, significantly reducing resource requirements while maintaining performance.

Model Optimisation: Smaller transformer models or hybrid architectures that combine transformers with CNNs can offer a balance between performance and computational efficiency.

These optimisations make transformers more feasible for SEM image analysis, allowing researchers to leverage their strengths without prohibitive computational costs.
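The scale of these savings can be checked with simple arithmetic. The sketch below assumes a hypothetical 1024 × 1024 SEM image cut into 16 × 16 patches and an 8 × 8 attention window; patch merging is shown without the learned 4C → 2C linear projection that Swin applies afterwards.

```python
import numpy as np

def patch_merging(x):
    """Concatenate each 2 x 2 neighbourhood of an (H, W, C) token map into one token,
    giving (H/2, W/2, 4C): four times fewer tokens for the next stage. Swin then
    applies a learned linear layer 4C -> 2C, which is omitted in this sketch."""
    return np.concatenate([x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]], axis=-1)

def attention_pairs(num_tokens, window=None):
    """Query-key pairs scored by one attention layer: N^2 for global attention, or
    N * window^2 when each token only attends within its window x window window."""
    return num_tokens ** 2 if window is None else num_tokens * window ** 2

# Hypothetical example: a 1024 x 1024 SEM image cut into 16 x 16 patches.
n = (1024 // 16) ** 2                   # 4096 tokens
print(attention_pairs(n))               # 16777216 with global (ViT-style) attention
print(attention_pairs(n, window=8))     # 262144 with 8 x 8 windows
print(patch_merging(np.zeros((64, 64, 96))).shape)   # (32, 32, 384)
```

Window attention turns the quadratic cost into one that is linear in the number of tokens, and each merging stage divides the token count by four, which is why hierarchical designs remain tractable on large micrographs.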

Model Explainability

In scientific and industrial applications, particularly in materials science, the explainability of AI models is critical for ensuring that the insights generated are not only accurate but also interpretable. Understanding which microstructural features, such as grain boundaries, defects, or inclusions, drive the model’s predictions (e.g., of material hardness or corrosion severity) is essential for validating results, building trust in AI systems, and making informed decisions. Explainability also supports regulatory compliance and quality control, where transparency in decision-making processes is often mandatory. Without it, AI models risk being perceived as “black boxes,” limiting their adoption in fields where interpretability is crucial.

Techniques for Enhancing Model Explainability

To address the need for transparency, several techniques have been developed to interpret AI models and reveal the features influencing their decisions. Two widely used methods are:

SHAP (SHapley Additive exPlanations): SHAP provides a unified framework for interpreting model predictions by quantifying the contribution of each input feature to the output. In the context of SEM and EDS data, SHAP can identify which microstructural characteristics—such as particle size, phase distribution, or elemental composition—are most influential in predicting material properties. For instance, in predictive modelling of high-entropy alloys (Section 4), SHAP could be applied to the Artificial Neural Network (ANN) to determine which features extracted from SEM images and EDS spectra most strongly correlate with hardness predictions.
LIME (Local Interpretable Model-agnostic Explanations): LIME offers local explanations by approximating complex models with simpler, interpretable models around specific predictions. This is particularly useful for image-based tasks, such as object detection in SEM images. For example, in the defect detection case study (Section 2.2), LIME could highlight which regions of the SEM image contributed most to the model’s classification of a defect, providing transparency into how the model interprets microstructural features.

These techniques not only enhance trust in AI models but also offer actionable insights for researchers and engineers, enabling them to refine experimental designs or optimise material properties based on the identified key features.
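The attribution that SHAP estimates can be illustrated with an exact Shapley-value calculation on a toy model. This is a sketch under stated assumptions: the three features (grain size, precipitate fraction, Cr content), the surrogate hardness function and the baseline values are all hypothetical, and features outside a coalition are replaced by their baseline values. The SHAP library itself relies on approximations such as KernelSHAP or TreeSHAP, because exact enumeration scales as 2^n.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction. Features outside a coalition take
    their baseline value; the cost grows as 2^n, so this is only for a few features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

def hardness(f):
    """Hypothetical surrogate: f = [grain size (um), precipitate fraction, Cr (wt%)]."""
    grain, precip, cr = f
    return 200 - 5 * grain + 300 * precip + 2 * cr + 5 * precip * cr

x = [10.0, 0.10, 18.0]          # sample being explained
baseline = [20.0, 0.02, 12.0]   # reference sample
phi = shapley_values(hardness, x, baseline)
print([round(p, 2) for p in phi], round(sum(phi), 2))
```

The three values sum to the difference between the prediction for the sample and for the baseline (the efficiency property), and the interaction term between precipitates and Cr is split between those two features.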

Application to SEM and EDS Data Analysis

In the analysis of SEM and EDS data, model explainability is especially valuable due to the complexity and variability of microstructural features. For instance:

Semantic Segmentation (Section 2.1): Attention visualisation techniques can reveal which parts of the SEM image the model focuses on when segmenting different phases or defects, providing insights into the model’s decision-making process.
Object Detection (Section 2.2): SHAP can be applied to CNN-based models to identify which microstructural features are critical for defect classification, ensuring that the model’s predictions are grounded in physically meaningful characteristics.
Chemical Composition Analysis (Section 2.4): LIME could be used to explain how specific elemental peaks in EDS spectra influence the model’s classification of inclusions or phases, aiding in the interpretation of results.
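A LIME-style explanation of a spectrum classifier can be sketched by switching peak windows on and off and fitting a weighted linear surrogate to the classifier's outputs. Everything here is an assumption for illustration: the synthetic spectrum, the channel windows labelled Fe, Mn and S, the hypothetical linear "sulphide score" standing in for a trained model, and the simple proximity kernel; the reference `lime` package makes its own sampling and distance choices.

```python
import numpy as np

def lime_explain(predict, spectrum, windows, n_samples=500, kernel_width=0.75, seed=0):
    """LIME-style local explanation of a spectrum classifier.
    Interpretable features: each (lo, hi) channel window is kept (1) or zeroed (0).
    Returns one weighted-least-squares surrogate coefficient per window."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=(n_samples, len(windows)))
    z[0] = 1                                    # include the unperturbed spectrum
    preds = np.empty(n_samples)
    for s in range(n_samples):
        perturbed = spectrum.copy()
        for keep, (lo, hi) in zip(z[s], windows):
            if not keep:
                perturbed[lo:hi] = 0.0          # remove this peak window
        preds[s] = predict(perturbed)
    dist = 1.0 - z.mean(axis=1)                 # fraction of windows removed
    weights = np.exp(-dist ** 2 / kernel_width ** 2)   # closer samples count more
    design = np.hstack([np.ones((n_samples, 1)), z])   # intercept + indicators
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], preds * sw, rcond=None)
    return coef[1:]

ch = np.arange(200)
peak = lambda c, height: height * np.exp(-0.5 * ((ch - c) / 3.0) ** 2)
spectrum = peak(100, 1000.0) + peak(60, 300.0) + peak(23, 500.0)
windows = [(90, 110), (50, 70), (13, 33)]       # hypothetical Fe, Mn, S windows
score = lambda s: 0.001 * (s[50:70].sum() + s[13:33].sum())   # toy "sulphide" score
coef = lime_explain(score, spectrum, windows)
print(np.round(coef, 3))   # Fe ~ 0; Mn and S positive
```

Because the toy score ignores the Fe window, its surrogate coefficient is essentially zero, while the Mn and S windows receive positive weights in proportion to how much they drive the score.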

2.3.5. Critical Analysis of Methodologies and Trade-Offs in Instance Segmentation

Instance segmentation methodologies in SEM/EDS analysis for metallic materials automate microstructure characterization with context-dependent success: they handle intricate tasks such as delineating overlapping microstructural features, but are constrained by fundamental trade-offs in computational resources, data requirements, and generalization. Mask R-CNN’s prominence in applications such as aluminum alloy microstructure segmentation (median IoU 0.59 [14]) and metal powder characterization [8] stems from its extension of Faster R-CNN with a parallel segmentation branch, which enables simultaneous object detection and pixel-wise masking. This makes it highly successful in research contexts involving complex, densely packed features, such as grains or particles in additively manufactured metals, where precise instance differentiation is crucial for quantifying parameters such as particle size distributions or defect densities. The model’s robustness to variability in SEM image quality, enhanced by transfer learning from large datasets like COCO, allows it to achieve substantial performance gains (e.g., mAP improvements of 15% in fine-tuned scenarios [8]), particularly when annotated data is limited, a common challenge in materials science.
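The IoU figures quoted above compare each predicted instance mask with its ground-truth mask. A minimal sketch follows; it assumes predicted and annotated instances have already been matched one-to-one (real evaluations include a matching step), and the median over instances shown here is one possible aggregation, not necessarily the one used in [14].

```python
import numpy as np

def mask_iou(pred, truth):
    """Intersection over union of two boolean instance masks (1.0 if both are empty)."""
    pred, truth = np.asarray(pred, dtype=bool), np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, truth).sum() / union)

truth = np.zeros((10, 10), dtype=bool)
truth[2:8, 2:8] = True                     # a 36-pixel grain
pred = np.zeros((10, 10), dtype=bool)
pred[3:9, 3:9] = True                      # the same grain, offset by one pixel
ious = [mask_iou(pred, truth), mask_iou(truth, truth), mask_iou(pred, ~pred)]
median_iou = float(np.median(ious))
print(round(ious[0], 3), median_iou == ious[0])   # 0.532 True
```

A one-pixel offset on a 6 × 6 grain already drops IoU to about 0.53 (25 shared pixels out of 47), which shows how sensitive the metric is to boundary placement on small features.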

Bayesian Deep Learning approaches, incorporating uncertainty estimation [15], succeed in scientific applications requiring reliability, such as nanoparticle segmentation in nanotechnology, by providing confidence scores alongside predictions. This methodology excels in high-stakes contexts like irradiated alloy analysis [13], where quantifying uncertainty helps mitigate errors from noisy or incomplete SEM data, improving trustworthiness in property correlations (e.g., defect impact on hardening). However, its probabilistic nature introduces a trade-off: while enhancing interpretability—vital for building trust in materials research—it increases computational complexity, often requiring 1.5–2x more training time than deterministic models like Mask R-CNN, limiting its use in real-time or resource-constrained environments.
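Bayesian networks and approximations such as Monte Carlo dropout yield several stochastic probability maps per image; a common way to turn these into a per-pixel uncertainty map is the predictive entropy of their mean. The sketch below assumes the samples are already available (here they are hand-built for a two-class particle/background problem) and is not the specific method of [13] or [15].

```python
import numpy as np

def predictive_uncertainty(prob_samples, eps=1e-12):
    """prob_samples: (T, H, W, K) class probabilities from T stochastic forward passes.
    Returns the mean probabilities (H, W, K) and per-pixel predictive entropy (H, W)."""
    mean = prob_samples.mean(axis=0)
    entropy = -(mean * np.log(mean + eps)).sum(axis=-1)
    return mean, entropy

T, H, W = 20, 4, 4
samples = np.empty((T, H, W, 2))           # 2 classes: particle, background
samples[...] = [0.99, 0.01]                # passes agree: confident pixels
samples[0::2, 0, 1] = [0.9, 0.1]           # passes disagree at pixel (0, 1)
samples[1::2, 0, 1] = [0.1, 0.9]
mean, entropy = predictive_uncertainty(samples)
print(np.round(entropy[0, :2], 3))         # low at (0, 0), about ln 2 = 0.693 at (0, 1)
```

Pixels where the stochastic passes disagree reach the maximum two-class entropy (ln 2) even though each individual pass looks confident, which is the signal used to flag unreliable regions of a noisy micrograph.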

Instance segmentation models, such as Mask R-CNN, are designed to handle multiple classes by classifying each detected instance into one of several predefined categories, corresponding to different microstructural phases in metallic materials. For example, in the case study on aluminum alloy segmentation [14], the model successfully segmented instances of multiple phases, including Mg₂Si, aluminum, and Fe-containing compounds. Similarly, in the metal powder segmentation study [15], the model distinguished between different particle types (elongated, satellite, circular, and nodular), with a mean Average Precision at an IoU threshold of 0.5 (mAP50) of 67.2. These examples demonstrate that instance segmentation models can effectively manage multiple classes, provided that the phases are sufficiently distinct and well-represented in the training data. However, when phases exhibit similar visual characteristics in SEM images, the model’s accuracy may be reduced, potentially necessitating additional features or multi-modal approaches (e.g., integrating EDS data) for improved classification.

Regarding the number of output masks, instance segmentation models do not impose a strict limit on the number of instances they can detect and segment; they generate a separate mask for each object identified in the image. In practice, however, computational resources, model hyperparameters (e.g., the maximum number of detections per image), and the density of objects in the microstructure may limit the effective number of masks produced. For instance, in the metal powder segmentation study, the Mask R-CNN model was configured to balance accuracy and processing time by limiting the number of detections per image. Nonetheless, for most applications in metallic microstructure analysis, these models can handle the typical number of instances encountered in SEM images without significant constraints.
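The detections-per-image cap is typically applied after non-maximum suppression (NMS). The following class-agnostic sketch is an illustration under assumptions: the boxes, scores and 0.5 IoU threshold are hypothetical, and Mask R-CNN implementations usually run NMS per class before applying the cap (torchvision's `MaskRCNN`, for example, exposes the cap as `box_detections_per_img`, default 100).

```python
import numpy as np

def box_iou(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh=0.5, max_detections=100):
    """Greedy non-maximum suppression, stopping once max_detections boxes are kept."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0 and len(keep) < max_detections:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[box_iou(boxes[best], boxes[rest]) < iou_thresh]   # drop overlaps
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [40, 40, 50, 50]], float)
scores = np.array([0.9, 0.8, 0.7, 0.6])
print(nms(boxes, scores))                     # [0, 2, 3]: box 1 overlaps box 0 (IoU ~0.68)
print(nms(boxes, scores, max_detections=2))   # [0, 2]: the cap truncates the output
```

In densely packed powder images, a cap set below the true particle count silently drops the lowest-scoring instances, so it should be chosen with the expected object density in mind.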

These methodologies are profoundly shaped by trade-offs influencing the field. Data augmentation and transfer learning are pivotal, boosting mAP by 2–3% [8] by addressing scarcity through techniques like geometric transformations or pre-trained backbones, but they risk domain mismatch—e.g., models fine-tuned on general images may drop IoU by 5–10% on specialized metallic microstructures with unique textures or scales. GANs for synthetic data generation offer a solution, expanding datasets by up to 50% [13], yet their training instability can produce artifacts, compromising accuracy in precision-critical tasks like corrosion severity assessment. Furthermore, the high computational demands of instance segmentation models (e.g., Mask R-CNN’s multi-branch architecture) trade off against scalability: while delivering superior precision for overlapping instances (outperforming object detection methods like YOLO by 10–15% in dense scenes [14]), they are less feasible for industrial high-throughput applications, favoring deployment in offline research settings.
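For segmentation data, geometric augmentation must transform the image and its annotation identically. A minimal sketch, assuming only lossless dihedral transforms (90° rotations and flips) so that mask labels are never interpolated; the image and the intensity-derived "phase" mask are synthetic stand-ins, and the mAP gains quoted above are from the cited work, not reproduced by this code.

```python
import numpy as np

def augment_pair(image, mask, rng):
    """Apply one random dihedral transform (90-degree rotation plus optional flips)
    to an SEM image and its label mask together, so labels stay pixel-aligned.
    These transforms need no interpolation, so mask labels are never blended."""
    k = int(rng.integers(0, 4))
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:
        image, mask = np.flipud(image), np.flipud(mask)
    return image.copy(), mask.copy()

rng = np.random.default_rng(42)
image = rng.random((32, 32))
mask = image > 0.5                              # toy "phase" label derived from intensity
aug_image, aug_mask = augment_pair(image, mask, rng)
```

Because the toy mask is defined by thresholding the image, the augmented pair stays consistent (`aug_mask` equals `aug_image > 0.5`), which is a convenient check that a real augmentation pipeline has not desynchronised images and labels.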

Overall, methodology selection hinges on contextual alignment: Mask R-CNN for detailed, instance-specific analysis in complex microstructures, and Bayesian methods for uncertainty-aware scientific validation. The field is increasingly shaped by the need to resolve these trade-offs through hybrids (e.g., combining attention mechanisms with autoencoders) and explainable AI, as black-box models hinder adoption in quality control. Future advancements must prioritize efficient, interpretable solutions that also address data scarcity, so that instance segmentation can fully accelerate metallic material design and optimization.

2.4. Chemical Composition Analysis in Metals

Chemical composition analysis, as shown in Figure 6, is a fundamental aspect of materials science, particularly for metals, as their elemental makeup directly influences mechanical, thermal, and chemical properties critical for applications in industries such as aerospace, automotive, shipbuilding, and additive manufacturing. Traditional analytical techniques, including X-ray fluorescence (XRF), energy dispersive X-ray spectroscopy (EDS), laser-induced breakdown spectroscopy (LIBS), and inductively coupled plasma mass spectrometry (ICP-MS), provide detailed elemental insights. However, these methods often involve labour-intensive data interpretation, time-consuming processes, and challenges in detecting trace or light elements due to low signal-to-noise ratios [17,18].

Since 2010, artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL), has revolutionised chemical composition analysis by automating data processing, enhancing detection accuracy, and enabling predictive modelling. AI streamlines the analysis of complex spectral data, improves sensitivity for trace elements, and supports real-time quality control in industrial settings [19,20]. This section explores AI applications in the chemical composition analysis of metals, covering methodologies, case studies, challenges, and future directions, with a focus on advancements since 2010.

2.4.1. Traditional Methods for Chemical Composition Analysis

To appreciate AI’s impact, it is essential to understand the traditional methods used for chemical composition analysis in metals:

X-ray Fluorescence (XRF): A non-destructive technique that excites atoms with X-rays, causing them to emit secondary X-rays characteristic of their elements. XRF is ideal for rapid qualitative and quantitative analysis, widely used for alloy identification and quality control [17].
Energy Dispersive X-ray Spectroscopy (EDS): Coupled with scanning electron microscopy (SEM), EDS detects X-rays emitted from a sample under electron bombardment, enabling microscale elemental analysis and mapping. It is critical for characterising inclusions and phases in metals [18].
Laser-Induced Breakdown Spectroscopy (LIBS): Ablates a sample with a laser to create a plasma, whose emission spectrum is analysed to determine elemental composition. LIBS excels in in situ and remote analysis, particularly for recycling and scrap sorting [20].
Inductively Coupled Plasma Mass Spectrometry (ICP-MS): A highly sensitive technique that ionises samples in a plasma and measures ion mass-to-charge ratios for precise elemental quantification. It is used for trace element analysis but requires sample preparation and is less suited for real-time applications [19].

These methods, while effective, face challenges such as manual data interpretation, time constraints, and limitations in detecting light or trace elements, which AI addresses through advanced data processing and automation [21].

2.4.2. Role of AI in Chemical Composition Analysis

AI, encompassing ML and DL, enhances chemical composition analysis in metals through several key functionalities:

Data Processing and Interpretation: ML algorithms process large spectral datasets, identifying peaks and quantifying elemental concentrations efficiently, reducing reliance on expert analysis [19].
Noise Reduction: Techniques like singular value decomposition (SVD) and independent component analysis (ICA) improve signal-to-noise ratios (SNR), enabling detection of trace or light elements [22].
Classification and Identification: AI models classify alloys, detect impurities, or identify material grades based on composition, supporting quality control and material selection [23].
Predictive Modelling: ML predicts material properties, such as yield strength, from composition data, aiding alloy design and optimisation [5].
Automation: AI automates workflows from data acquisition to reporting, minimising human error and enabling high-throughput analysis in industrial settings [21].
These capabilities position AI as a transformative tool in materials science, enhancing the precision and speed of chemical composition analysis.

2.4.3. Case Studies

The following case studies illustrate AI’s applications in chemical composition analysis of metals, drawing on research since 2010.

Enhancing EDS Analysis with Machine Learning

Kim et al. [22] applied unsupervised ML techniques, specifically SVD and ICA, to improve the SNR of STEM-EDS data for high-nitrogen stainless steel (HNS). The methodology decomposed multivariate X-ray signals, enhanced physical correlations, and reconstructed de-noised EDS maps. Key results included:

SNR Improvements: Nitrogen SNR increased by 44% (from 1.03 to 1.48), and molybdenum SNR by 470% (from 0.43 to 2.45), with minimal changes for chromium, iron, and manganese, as shown in Table 6.

Findings: Revealed nanoscale N-depleted regions (70–100 nm wide) with a minimum nitrogen concentration of 0.01 wt% adjacent to Cr₂N precipitates, compared to 0.2 wt% in the matrix.

Validation: Confirmed using STEM-electron energy loss spectroscopy (EELS) and multicomponent diffusional transformation simulations.
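The core of SVD-based denoising can be sketched as a low-rank reconstruction of the spectrum image. This sketch shows only truncated SVD (equivalent to PCA on mean-centred data); it does not implement the ICA step or the specific procedure of Kim et al. [22]. The two-phase synthetic map, Poisson counting noise and choice of two components are assumptions, and in practice a counting-noise weighting is commonly applied before the decomposition.

```python
import numpy as np

def svd_denoise(cube, n_components):
    """Low-rank denoising of an EDS spectrum image of shape (H, W, E): unfold to a
    (pixels x energy channels) matrix, keep the leading singular components of the
    mean-centred data, and fold back. Noise spread over discarded components is removed."""
    h, w, e = cube.shape
    data = cube.reshape(h * w, e)
    mean = data.mean(axis=0)
    u, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    approx = (u[:, :n_components] * s[:n_components]) @ vt[:n_components] + mean
    return approx.reshape(h, w, e)

rng = np.random.default_rng(1)
ch = np.arange(128)
spec_a = 50 * np.exp(-0.5 * ((ch - 30) / 2.0) ** 2) + 5     # hypothetical phase A spectrum
spec_b = 50 * np.exp(-0.5 * ((ch - 80) / 2.0) ** 2) + 5     # hypothetical phase B spectrum
frac = np.linspace(0, 1, 32 * 32).reshape(32, 32, 1)        # phase-A fraction across the map
clean = frac * spec_a + (1 - frac) * spec_b
noisy = rng.poisson(clean).astype(float)                    # counting (shot) noise
denoised = svd_denoise(noisy, n_components=2)
rmse = lambda a: float(np.sqrt(np.mean((a - clean) ** 2)))
print(rmse(noisy), rmse(denoised))   # denoised error is far smaller
```

The approach works because the physically meaningful signal (a few phases with characteristic spectra) occupies a small number of components, while counting noise is spread across all 128 channels.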

AI in Metallographic Analysis for Shipbuilding

Emelianov et al. [23] developed an AI-based algorithm using neural networks for metallographic analysis in shipbuilding, focusing on recognising metal microstructures and determining grades. The system employed a multilayer neural network and case-based reasoning, achieving high accuracy. Key findings include:

Neural Network Performance: For GOST 5639-82 (grain amount), the network achieved a classifying error of 0.0149, correctly classifying 274/280 images, as shown in Table 7.
Software Efficiency: Reduced analysis time from 18 min (ordinary systems) to 5 min, with a deviation in grain parameters of 3–4% compared to 5–10% for traditional methods.
Experimental Results: Correctly calculated steel grain size in 96.7% of cases (58/60), outperforming ordinary systems (89%, 96/108) with statistical significance (t = 2.03, p < 0.01).

Machine Learning for Phase Classification in SEM-EDS

Wang et al. [24] explored ML for classifying mineral phases using SEM-EDS, with techniques applicable to metallic alloys. Five shallow ML models and a U-Net DL model were compared for pixel-level classification of 13 phases in a shale sample. Key results include:

Performance Metrics (Table 8): Random Forest and k-NN achieved F1 scores of 0.92, while U-Net scored 0.88 (micro) and 0.73 (macro); on unseen samples, however, U-Net outperformed Random Forest (F1 0.92 vs. 0.85).

Data Sensitivity: Logistic Regression and SVM were less affected by reduced dataset sizes, while k-NN, Random Forest, and ANN improved with more data.

Critical Elements: Silicon, aluminium, magnesium, calcium, potassium, and iron were key for Random Forest, with silicon noise impacting performance.
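The gap between micro- and macro-averaged F1 reported for U-Net reflects class imbalance among phases. A minimal sketch with a hypothetical 100-pixel example (a dominant matrix phase and a rare phase) shows the effect; it is illustrative and does not reproduce the data of Wang et al. [24].

```python
import numpy as np

def f1_micro_macro(y_true, y_pred, n_classes):
    """Micro- and macro-averaged F1 for single-label (e.g. pixel-level) classification."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp, fp, fn = (np.zeros(n_classes) for _ in range(3))
    for c in range(n_classes):
        tp[c] = np.sum((y_pred == c) & (y_true == c))
        fp[c] = np.sum((y_pred == c) & (y_true != c))
        fn[c] = np.sum((y_pred != c) & (y_true == c))
    # micro: pool counts over all classes, so frequent phases dominate
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    # macro: average per-class F1, so every phase counts equally
    denom = 2 * tp + fp + fn
    per_class = np.divide(2 * tp, denom, out=np.zeros(n_classes), where=denom > 0)
    return float(micro), float(per_class.mean())

# Hypothetical 100-pixel example: a dominant matrix phase (0) and a rare phase (1)
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 90 + [1] * 5 + [0] * 5
micro, macro = f1_micro_macro(y_true, y_pred, n_classes=2)
print(round(micro, 3), round(macro, 3))   # 0.95 0.82
```

Missing half of a rare phase barely moves micro-F1 but lowers macro-F1 sharply, so a large micro/macro gap signals that minor phases are being misclassified.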

Deep Learning for Quantitative LIBS Analysis

Van et al. [20] developed DL models for quantitative LIBS analysis of aluminium scrap, enabling real-time sorting. Back Propagation Neural Network (BPNN) and GHOSTNET were compared using datasets of 27 certified aluminium reference samples and 733 post-consumer scrap pieces. Key findings include:

Performance Metrics (Table 9): The best model achieved RMSE values of 0.02 wt% for Al and Si, and 0.01 wt% for Fe, Cu, Mn, Mg, and Zn.

Multiple Loss Functions: Training with multiple loss functions improved performance on scrap samples across all metrics, although R² decreased slightly for Fe, Mn, and Mg in the reference samples.

Real-Time Capability: Processing time of ~10 ms, enabling real-time applications.

Comparison: Outperformed univariate regression and traditional ML methods in RMSE, MAE, and R².
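The three metrics used to compare these models can be computed as follows. The certified and predicted Si contents below are hypothetical values chosen only to illustrate the calculation.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE and coefficient of determination R^2 for predicted concentrations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r2 = float(1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return rmse, mae, r2

# Hypothetical Si contents (wt%): certified values vs. model predictions
rmse, mae, r2 = regression_metrics([0.5, 1.0, 1.5, 2.0], [0.52, 0.98, 1.53, 1.97])
print(round(rmse, 4), round(mae, 4), round(r2, 4))   # 0.0255 0.025 0.9979
```

RMSE and MAE are in the units of the target (wt%), whereas R² is relative to the spread of the reference values, which is why R² can dip slightly for elements with a narrow concentration range even when absolute errors stay small.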

Machine Learning in XRF Analysis

ML enhances XRF analysis by improving calibration, classification, and spatial resolution [21]. Applications include:

Calibration Creation: ML models like SVM and neural networks create material-specific calibrations for elements like Si, Al, and Na [25].
Material Classification: Enables real-time optimisation of XRF parameters, identifying minor compositional variations [21].
Spatial Resolution: Residual dense networks eliminate blurring, enhancing image resolution for trace element detection [26].
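For context, the classical calibration that such ML models extend is a univariate curve relating measured line intensity to certified concentration across a set of standards. The sketch below fits a straight line by least squares; the standards and intensities are hypothetical, and ML approaches such as SVM or neural networks replace this line with a learned, possibly multivariate mapping that can absorb matrix effects.

```python
import numpy as np

# Hypothetical calibration standards: certified Si content (wt%) and measured
# net Si K-alpha intensity (counts) under fixed XRF acquisition conditions.
conc = np.array([0.1, 0.5, 1.0, 2.0, 4.0])
counts = np.array([120.0, 610.0, 1190.0, 2420.0, 4790.0])

slope, intercept = np.polyfit(counts, conc, deg=1)   # least-squares line: counts -> wt%

def predict_conc(net_counts):
    """Convert a measured net intensity into an estimated concentration (wt%)."""
    return slope * net_counts + intercept

print(round(float(predict_conc(1500.0)), 3))   # estimate for an unknown sample
```

A straight line is adequate only while the instrument response stays linear and the sample matrix matches the standards; departures from either assumption are where learned calibrations are expected to help.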

Understanding EDS Signal Calibration

EDS is a widely utilised technique for determining the chemical composition of materials, such as metals, by analysing the characteristic X-rays emitted when a sample is bombarded with electrons. Proper calibration of the EDS system is essential to ensure that the energy scale and intensity of these X-ray signals are accurately measured. This calibration process aligns the detected signals with known standards, enabling precise identification and quantification of elements based on their unique X-ray energies and intensities. Without accurate calibration, the reliability of the compositional data generated by EDS is compromised.
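Energy-scale calibration is, at its simplest, a linear map from detector channel to photon energy, fixed by the positions of known lines. A minimal two-point sketch, assuming approximate tabulated Al Kα and Cu Kα energies and hypothetical measured peak centroids; commercial software typically fits more lines and corrects for non-linearity.

```python
# Approximate tabulated K-alpha line energies (keV) for two common reference elements
AL_KA_KEV, CU_KA_KEV = 1.4866, 8.0478

def two_point_calibration(ch1, e1, ch2, e2):
    """Linear energy calibration E = gain * channel + offset from two known peaks."""
    gain = (e2 - e1) / (ch2 - ch1)       # keV per channel
    offset = e1 - gain * ch1             # keV at channel 0
    return gain, offset

# Hypothetical measured peak centroids (channel numbers) on an Al/Cu reference
gain, offset = two_point_calibration(148.0, AL_KA_KEV, 804.0, CU_KA_KEV)

def channel_to_energy(channel):
    return gain * channel + offset

print(round(gain * 1000, 2), "eV/channel")   # roughly 10 eV per channel
```

Any drift in the detector changes the gain or offset, so every peak position fed to a downstream model moves with it; this is the mechanism behind the calibration errors discussed next.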

Impact of Calibration Errors on AI Models

Errors in EDS signal calibration can arise from several sources, including:

Drift in the detector’s energy response over time.

Incorrect identification of X-ray peaks, leading to misassignment of elements.

Inconsistencies in calibration standards, such as mismatches between the standard and the sample matrix.

When calibration is inaccurate, the EDS data—such as peak positions and intensities—fed into AI models for training becomes erroneous. Since AI models learn patterns and relationships from their training data, any inaccuracies in the input data will propagate through the model, resulting in flawed predictions. For example:

A miscalibrated energy scale might cause the model to confuse elements with similar X-ray energies (e.g., the overlapping Ti Kβ and V Kα peaks of titanium and vanadium).

Incorrect intensity measurements could lead to erroneous concentration estimates, undermining the model’s ability to quantify elemental compositions in metals accurately.

This is particularly significant in the context of chemical composition analysis, where precision is paramount for applications like alloy development or quality control. If the training data is unreliable due to calibration errors, the AI model’s performance will be degraded, reducing its effectiveness in real-world scenarios.
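The titanium/vanadium case can be made concrete with a deliberately naive nearest-line rule. This is a sketch, not how EDS software identifies peaks (which fits whole line families and deconvolves overlaps); the line energies are approximate tabulated values and the 15 eV drift is hypothetical.

```python
# Approximate tabulated line energies (keV); Ti K-beta and V K-alpha are only ~20 eV apart.
LINES_KEV = {"Ti Ka": 4.511, "Ti Kb": 4.932, "V Ka": 4.952, "V Kb": 5.427}

def assign_line(peak_kev):
    """Label a detected peak with the nearest reference line (a deliberately naive rule)."""
    return min(LINES_KEV, key=lambda name: abs(LINES_KEV[name] - peak_kev))

true_peak = 4.952                          # a vanadium K-alpha peak
print(assign_line(true_peak))              # V Ka with a correct energy scale
print(assign_line(true_peak - 0.015))      # Ti Kb after a 15 eV calibration drift
```

A shift of a single channel or two is enough to flip the label, and a model trained on such mislabelled features would learn the error rather than the chemistry.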

Addressing Calibration Errors in AI Model Training

To mitigate the impact of EDS calibration errors on AI models, it is essential to incorporate error assessment into the training process. One effective approach is to quantify the uncertainty associated with the calibration and propagate it through the model. This can be achieved using techniques such as:

Uncertainty Quantification (UQ) Methods: Tools like Bayesian neural networks or Monte Carlo dropout can estimate the uncertainty in the model’s predictions, reflecting the influence of calibration inaccuracies.
Sensitivity Analysis: Evaluating how variations in calibration parameters affect the model’s outputs can help identify the most critical sources of error.
By understanding and quantifying these uncertainties, researchers can assess the robustness of the AI model and adjust its training accordingly.
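Monte Carlo dropout, mentioned above, keeps dropout active at prediction time and reads the spread of repeated predictions as model uncertainty. The sketch below is illustrative only: the one-hidden-layer network, its random "trained" weights and the three input features are hypothetical stand-ins, not a model from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for trained weights: a one-hidden-layer network mapping
# three EDS-derived features (e.g. normalised peak intensities) to a concentration.
w1, b1 = rng.normal(size=(3, 64)), np.zeros(64)
w2, b2 = rng.normal(size=(64, 1)) / 8.0, np.zeros(1)

def forward(x, p_drop, gen):
    """One stochastic pass: dropout stays active at inference time (MC dropout)."""
    h = np.maximum(0.0, x @ w1 + b1)                 # ReLU hidden layer
    keep = gen.random(h.shape) >= p_drop             # random dropout mask
    h = h * keep / (1.0 - p_drop)                    # inverted-dropout rescaling
    return (h @ w2 + b2)[:, 0]

def mc_dropout_predict(x, n_samples=200, p_drop=0.2, seed=1):
    """Mean prediction and its standard deviation over n_samples stochastic passes."""
    gen = np.random.default_rng(seed)
    preds = np.stack([forward(x, p_drop, gen) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

x = np.array([[0.8, 0.1, 0.3], [0.2, 0.9, 0.5]])    # two hypothetical spectra
mean, std = mc_dropout_predict(x)
print(mean, std)    # std > 0 quantifies how much the passes disagree
```

Setting `p_drop=0.0` makes every pass identical and the standard deviation collapses to zero, which is a quick sanity check that the spread really comes from the dropout sampling. To connect this to calibration error specifically, the same procedure can be repeated on inputs perturbed within the expected calibration tolerance.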

Best Practices to Minimize Calibration Errors

Several strategies can be employed to reduce the impact of calibration errors on AI-driven chemical analysis:

High-Quality Calibration Standards: Using standards that closely match the sample’s matrix (similar metallic compositions) ensures more accurate energy and intensity calibration.
Regular Calibration Checks: Periodically verifying and adjusting the EDS system’s calibration can account for detector drift or environmental changes, maintaining data accuracy over time.
Data Augmentation: During training, synthetic datasets simulating potential calibration errors (e.g., shifted peak positions or altered intensities) can be introduced. This helps the AI model generalise better and become more resilient to real-world calibration imperfections.

Implementing these practices can enhance the quality of the training data and improve the AI model’s reliability, even when faced with calibration uncertainties.
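The augmentation strategy above can be sketched as a random distortion of each training spectrum's energy scale and intensity. The error ranges (±2 channels offset, ±2% gain, ±5% intensity) and the single synthetic peak are illustrative assumptions, not instrument specifications.

```python
import numpy as np

def simulate_miscalibration(spectrum, rng, max_offset=2.0, max_gain=0.02, max_scale=0.05):
    """Return a copy of a 1-D EDS spectrum (counts per channel) distorted by a random
    energy-scale error (gain and offset, in channels) and a global intensity error.
    The default error ranges are illustrative assumptions, not instrument specs."""
    ch = np.arange(spectrum.size, dtype=float)
    gain = 1.0 + rng.uniform(-max_gain, max_gain)
    offset = rng.uniform(-max_offset, max_offset)
    # A line that truly sits at channel c is recorded at gain * c + offset, so the
    # distorted spectrum at channel ch reads the true spectrum at (ch - offset) / gain.
    shifted = np.interp((ch - offset) / gain, ch, spectrum, left=0.0, right=0.0)
    return shifted * (1.0 + rng.uniform(-max_scale, max_scale))

ch = np.arange(1024)
spectrum = 1000 * np.exp(-0.5 * ((ch - 500) / 4.0) ** 2)   # one synthetic peak at channel 500
rng = np.random.default_rng(7)
augmented = [simulate_miscalibration(spectrum, rng) for _ in range(5)]
print([int(np.argmax(a)) for a in augmented])   # peak positions now vary by a few channels
```

Training on such copies with unchanged labels encourages the model to tolerate small peak shifts and intensity changes; the ranges should be matched to the drift actually observed on the instrument, since overly wide ranges would blur genuine overlaps such as Ti Kβ/V Kα.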

2.4.4. Challenges and Future Directions

Challenges include:

Data Scarcity: Limited high-quality, annotated datasets for specialised alloys or trace elements [19].
Model Interpretability: Complex models lack transparency [22].
Integration: Incorporating AI into analytical workflows requires compatible systems [21].
Standardisation: Lack of standardised metrics hinders model comparison [23].
Computational Resources: Training DL models demands significant power [20].

Future directions include:

Open-Access Datasets: Developing shared datasets [22].
Explainable AI: Enhancing interpretability through attention visualisation.
Multi-Modal Integration: Combining XRF, EDS, and LIBS with other modalities [19].
Real-Time Systems: Optimising models for edge devices [20].
Sustainable AI: Using autoencoders for sustainable analysis [21].

2.4.5. Critical Analysis of Methodologies and Trade-Offs in Chemical Composition Analysis

Chemical composition analysis methodologies in SEM/EDS for metallic materials demonstrate varying efficacy across contexts, influenced by their ability to handle spectral complexity and integrate with imaging data, while navigating trade-offs in data requirements, computational efficiency, and interpretability. Random Forest (RF) models succeed in simpler, industrial applications like inclusion classification in steels (F1-score 0.92 [27]), owing to their ensemble-based approach that robustly handles feature variability from EDS spectra, such as elemental ratios, without requiring deep architectures. This makes RF particularly effective in high-throughput steelmaking environments where rapid, binary decisions (e.g., inclusion vs. non-inclusion) are needed, leveraging manually engineered features like grayscale values for quick training on modest datasets. However, RF’s limitations surface in multi-class scenarios with overlapping compositions (e.g., oxysulfides vs. calcium aluminates), where accuracy drops by 5–10% due to reliance on shallow feature interactions, rendering it less suitable for research on complex alloys requiring nuanced elemental mapping.

Convolutional Neural Networks (CNNs) and Deep Learning (DL) frameworks, such as those used in EDS noise reduction (achieving 76% accuracy from BSE images alone [19]), excel in data-rich, precision-oriented contexts like alloy development or quality control in high-entropy alloys. CNNs’ hierarchical feature extraction allows them to infer latent chemical information from grayscale BSE images, reaching 76% accuracy against a 20% random-chance baseline by capturing subtle patterns that traditional methods overlook. This success is amplified in multi-modal integrations (e.g., combining SEM images with EDS spectra), enabling end-to-end learning for tasks like aluminum scrap sorting, where CNNs automate classification with high efficiency. Yet a core trade-off is their dependence on large annotated datasets, which are scarce in materials science; this leads to overfitting or reduced generalization (e.g., an F1 drop from 0.92 to 0.85 on unseen samples [22]) and to increased computational demands (1.5–2× the training time of RF), confining them to well-resourced labs rather than real-time industrial pipelines.

These methodologies are shaped by fundamental trade-offs that define the field: accuracy versus data efficiency, where DL models offer superior precision (e.g., 7% IoU gains with augmentation [6]) but falter without extensive labeling, prompting reliance on techniques like transfer learning or GANs for synthetic spectra, which expand datasets by 30–50% [13] at the risk of introducing artifacts. Interpretability remains a critical barrier—DL’s “black-box” nature hinders trust in applications like biomaterials corrosion analysis [26], where understanding decision rationale is essential for regulatory compliance, contrasting with RF’s more transparent feature importance rankings. Calibration errors further exacerbate trade-offs, potentially degrading concentration estimates by 5–10% if not addressed via uncertainty quantification (e.g., Bayesian methods), as seen in EDS drift scenarios.

Overall, methodology choice aligns with context: RF for scalable, industrial binary tasks balancing speed and simplicity, and CNN/DL for complex, research-driven multi-class analysis prioritizing accuracy. The field is evolving toward hybrid solutions (e.g., fusing XRF/EDS with attention mechanisms) to resolve these trade-offs, but persistent challenges like standardization of metrics and sustainable AI (e.g., autoencoders for edge devices [21]) highlight the need for interpretable, data-efficient innovations to fully harness chemical composition analysis in advancing metallic material optimization and quality control.

Shallow models like Random Forest favor high scalability for industrial inclusion classification in steels, while deeper CNNs push into higher complexity for multi-modal EDS integration. The Microstructure Analysis Spectrum frames this evolution, revealing why hybrid approaches may bridge quadrants for future balanced adoption.

3. Advanced Techniques and Recent Advancements (2020–2025)

3.1. Introduction to Transformers and GANs

Recent advancements in artificial intelligence have introduced transformers and Generative Adversarial Networks (GANs), as shown in Figure 7, as powerful tools for Scanning Electron Microscopy (SEM) image analysis. Transformers, such as Vision Transformers (ViT) and Swin Transformers, are adept at capturing long-range dependencies and global context in images, making them particularly effective for segmenting complex microstructures in SEM data. GANs, on the other hand, are generative models capable of creating synthetic SEM images or enhancing existing ones, addressing challenges such as data scarcity and low-resolution imaging in materials science.

These techniques represent a significant leap forward in AI applications for SEM, offering new ways to analyse metallic microstructures with greater accuracy and efficiency.

3.2. Transformers in SEM Image Segmentation

Transformers have shown remarkable potential in improving the segmentation of SEM images, especially for tasks requiring a deep understanding of both local and global features. For instance:

Swin Transformers: A study by [11] applied Swin Transformers to segment grain boundaries in SEM images of steel alloys. The model achieved a 10% improvement in segmentation accuracy compared to traditional convolutional neural networks (CNNs). This enhancement is attributed to the transformer’s ability to integrate local details with the image context, which is crucial for accurately identifying intricate grain structures in metals.

This example illustrates how transformers can outperform conventional models by leveraging their unique attention mechanisms to understand the spatial relationships within SEM images better.

3.3. GANs for Synthetic SEM Image Generation

GANs have emerged as a valuable tool for generating synthetic SEM images, which can be used to augment datasets or enhance image resolution. A notable application includes:

High-Resolution Image Generation: A study [11] utilised GANs to generate high-resolution SEM images of aluminium microstructures from lower-resolution inputs. This approach significantly reduced the need for time-consuming and resource-intensive high-resolution imaging, enabling more efficient analysis of surface features and defects in metallic materials.

By generating realistic synthetic images, GANs help overcome the limitations of small or low-quality datasets, a common challenge in materials science.
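For reference, the original GAN formulation trains a discriminator D to separate real SEM images x from synthetic images G(z), while the generator G, driven by noise z, is trained to fool it. The generic minimax objective is shown below; super-resolution variants condition G on a low-resolution image and usually add pixel or perceptual losses, so this is not necessarily the loss used in [11].

```latex
\min_{G}\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\left(1 - D(G(z))\right)\right]
```

The two networks optimise this value in opposite directions, which is the source of the training instability noted below: if one player overpowers the other, gradients for the generator can vanish or oscillate.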

3.4. Benefits and Challenges

Transformers:

Benefits: Transformers excel at capturing global context, leading to improved feature extraction and segmentation accuracy, which is particularly valuable for analysing complex metallic microstructures.

Vision Transformers shift toward the high-complexity, emerging-scalability quadrant of the Microstructure Analysis Spectrum, capturing global context for complex metallic features (e.g., precipitates in HEAs), but with higher compute demands.

Challenges: They require large training datasets and significant computational resources, which can be a barrier given the often-limited availability of labelled SEM data in materials science.

GANs:

Benefits: GANs enable effective data augmentation, such as generating images of rare defects, and can enhance image resolution, making them ideal for scenarios with constrained imaging resources.

In the Microstructure Analysis Spectrum, GANs occupy high-complexity regions for synthetic data generation, improving scalability by augmenting small real datasets (100–1000 images) to larger effective sizes.

Challenges: Training GANs can be unstable, and there is a risk of generating artefacts that could mislead analysis if not carefully managed.

These techniques, while powerful, must be applied with an understanding of their limitations to ensure reliable results in SEM analysis.

3.5. Impact, Challenges, and Future Directions

The advancements in transformer-based models and GANs from 2020 to 2025 [24,28,29] have profoundly impacted SEM and EDS analysis in materials science, particularly by enhancing efficiency and enabling new capabilities in metallic microstructure characterization. Positioned in the high-complexity quadrant of the Microstructure Analysis Spectrum, these techniques have shifted the field toward scalable, automated workflows that balance precision with practical deployment. For instance, Swin Transformers have improved segmentation accuracy by up to 10% in grain boundary detection [11], facilitating faster quality control in industries like steelmaking and additive manufacturing, where real-time analysis can reduce processing times by 20–50% compared to traditional CNNs [30]. Similarly, GANs like SliceGAN have enabled rapid 3D microstructure generation (<1 min per structure [31]), accelerating simulations for bimodal aluminum alloys and reducing reliance on costly experimental tomography. These impacts are evident in industrial applications, such as enhanced defect detection in high-entropy alloys, where AI-driven insights have streamlined material optimization, potentially cutting development cycles by 30–40% [32]. By addressing data scarcity through synthetic augmentation, GANs have also democratized access to high-fidelity datasets, benefiting smaller research teams and fostering innovation in sectors like aerospace and energy.

Despite these benefits, significant challenges persist, shaping the trade-offs in adopting these advanced techniques. Computational demands remain a primary barrier: Transformers, with their attention mechanisms, require large annotated datasets and substantial GPU resources, often 2–3× more than CNNs, limiting scalability in resource-constrained environments [8]. GANs face training instability, where mode collapse or artifacts can degrade output realism by 10–15% in complex metallic microstructures, as seen in preliminary 3D reconstructions [31]. Data privacy emerges as a growing concern, especially in federated learning scenarios for shared SEM/EDS datasets across institutions, where sensitive industrial data risks exposure without robust encryption. Interpretability issues further complicate adoption; for example, while Transformers capture global context effectively, their “black-box” decisions hinder trust in critical applications like nuclear alloy analysis, where understanding failure modes is essential [14]. Additionally, domain mismatch (e.g., models trained on lab-generated images underperforming on noisy industrial SEM data) can reduce accuracy by 5–10%, underscoring the need for better generalization strategies.

Looking ahead, future directions should focus on resolving these trade-offs through innovative integrations and hybrid approaches. Multi-modal data fusion, combining SEM morphological data with EDS chemical spectra via attention-based networks, could enhance holistic predictions, as demonstrated in recent (2025) FIB-SEM studies for battery characterization. Federated learning offers a way to mitigate data scarcity by enabling collaborative model training without sharing raw data, potentially increasing dataset diversity by 50% while preserving privacy, which makes it well suited to global materials consortia. Novel proposals include hybrid AI-physics models, such as integrating GANs with finite element simulations for physically constrained 3D generation, reducing artifacts and improving realism in etched SEM profiles. Platforms such as MIT’s CRESt extend this idea by using AI to integrate microscopy data into materials discovery, suggesting scalable frameworks for SEM/EDS. By prioritizing explainable AI (e.g., SHAP-enhanced Transformers) and sustainable computing (e.g., edge-optimized models), these directions can push the Microstructure Analysis Spectrum toward higher scalability, unlocking transformative applications in sustainable metallurgy and advanced manufacturing.

4. Classification Models in Automated SEM/EDS

Classification models, as shown in Figure 8, form the backbone of automated inclusion identification, employing both traditional ML and DL techniques.

4.1. Traditional Machine Learning Models

Traditional ML models rely on manually engineered features extracted from SEM images, such as greyscale values or shape descriptors.

Random Forests (RF): Highly accurate for binary classification (inclusion vs. non-inclusion).

Support Vector Machines (SVM): Effective for multi-class tasks but less scalable as the number of classes increases.

Naïve Bayes (NB): Used for both binary and multi-class classification, though less effective for complex datasets.

These models are typically trained on datasets divided into 60% training, 20% validation, and 20% testing, with performance assessed using metrics like accuracy and confusion matrices.
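The 60/20/20 protocol can be sketched as follows. This is a minimal NumPy illustration rather than the procedure of any cited study: the function names are hypothetical, and splitting each class separately (stratification) is an added assumption that keeps rare inclusion classes represented in every subset. The confusion matrix then tabulates true against predicted classes.

```python
import numpy as np


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) index arrays, splitting each class separately."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    splits = ([], [], [])
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        splits[0].extend(idx[:n_train])
        splits[1].extend(idx[n_train:n_train + n_val])
        splits[2].extend(idx[n_train + n_val:])  # remainder goes to the test set
    return tuple(np.array(sorted(s), dtype=int) for s in splits)


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


# Hypothetical labels: 0 = non-inclusion, 1 = inclusion (imbalanced 80/20)
y = np.array([0] * 80 + [1] * 20)
train_idx, val_idx, test_idx = stratified_split(y)
print(len(train_idx), len(val_idx), len(test_idx))  # 60 20 20
```

Accuracy follows as `np.trace(cm) / cm.sum()`, while the off-diagonal entries show which classes are being confused.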

4.2. Deep Learning Models

Deep learning models, such as Convolutional Neural Networks (CNNs), excel at learning intricate patterns directly from raw SEM images.

CNNs: Applied to both binary and multi-class tasks, demonstrating robustness to noise and superior performance over traditional models in certain scenarios.

Segmentation Models (U-Net): Enable pixel-wise classification; the resulting masks can be separated into individual inclusions (e.g., by connected-component labelling) to support detailed instance-level analysis.

Emerging techniques, such as graph-based DL, integrate SEM images with EDS spectral data, though these methods are still under development.

Integration of SEM and EDS Data

Fusing SEM images with EDS spectra improves classification accuracy but introduces challenges like data reliability and the need for effective fusion techniques. Multi-modal approaches are promising but require further refinement.
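One simple way to fuse the two modalities is early (feature-level) fusion, sketched below under stated assumptions: each particle already has a vector of SEM-derived descriptors (e.g., area and mean BSE grey level) and an EDS composition vector, and each modality is standardised separately before concatenation so that neither dominates by scale. The function name and feature choices are hypothetical; attention-based or graph-based fusion is not shown.

```python
import numpy as np


def fuse_features(sem_feats, eds_comp, eps=1e-8):
    """Early fusion: z-score each modality, then concatenate column-wise.

    In practice the means/stds should be estimated on the training split only
    and reused for validation/test data; this sketch uses the batch it is given.
    """
    sem = np.asarray(sem_feats, dtype=float)
    eds = np.asarray(eds_comp, dtype=float)
    if sem.shape[0] != eds.shape[0]:
        raise ValueError("every particle needs both SEM and EDS features")

    def zscore(x):
        return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

    return np.hstack([zscore(sem), zscore(eds)])


rng = np.random.default_rng(0)
# Hypothetical SEM descriptors: area (µm²) and mean BSE grey level
sem = rng.normal(loc=[20.0, 120.0], scale=[5.0, 30.0], size=(10, 2))
# Hypothetical EDS compositions: at.% of four elements per particle
eds = rng.dirichlet(np.ones(4), size=10) * 100
X = fuse_features(sem, eds)
print(X.shape)  # (10, 6)
```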

4.3. Multi-Class Categorisation in Automated SEM/EDS

Multi-class categorisation focuses on classifying inclusions into several distinct categories based on their chemical composition, a critical step for evaluating steel quality.

4.3.1. Common Inclusion Classes

Inclusions are grouped into categories such as:

Oxides (Al₂O₃)
Sulfides (MnS)
Oxysulfides
Nitrides
Carbides
Complex inclusions (calcium aluminates, calcium-manganese sulfides)

These categories are determined using EDS-derived elemental data and serve as labels for training ML models.
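A rule-based labeller of this kind can be sketched as follows. The 5 at.% threshold, the element list and the order of the rules are hypothetical placeholders chosen for illustration, not criteria from the reviewed studies; practical schemes typically remove the Fe matrix signal first and treat carbon with caution because of coating and contamination.

```python
def label_inclusion(comp, min_at=5.0):
    """Assign a coarse inclusion class from EDS atomic percentages.

    `comp` maps element symbols to at.% (Fe matrix assumed already removed).
    Thresholds and rule order are hypothetical.
    """
    has = {el: comp.get(el, 0.0) >= min_at
           for el in ("O", "S", "N", "C", "Al", "Ca", "Mn")}
    if has["O"] and has["S"]:
        return "oxysulfide"
    if has["O"] and has["Ca"] and has["Al"]:
        return "calcium aluminate"
    if has["O"]:
        return "oxide"
    if has["S"] and has["Ca"] and has["Mn"]:
        return "calcium-manganese sulfide"
    if has["S"]:
        return "sulfide"
    if has["N"]:
        return "nitride"
    if has["C"]:
        return "carbide"
    return "unclassified"


print(label_inclusion({"Al": 40.0, "O": 60.0}))  # oxide
print(label_inclusion({"Mn": 48.0, "S": 52.0}))  # sulfide
```

Labels produced this way can then serve as training targets for image-based classifiers.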

4.3.2. Machine Learning Approaches for Multi-Class Categorisation

Both traditional and DL models are adapted for multi-class tasks, often using strategies like one-vs-all (OvA) or direct multi-class classification.

Traditional Models: RF and SVM are prevalent, with SVM applied to models handling up to eight classes.

Deep Learning Models: CNNs are used for four- or five-class problems, though accuracy tends to drop as class numbers rise.
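The one-vs-all (OvA) strategy can be sketched with any binary learner: one model per class is trained to separate that class from all others, and a sample is assigned to the class whose model scores it highest. Logistic regression is used here only as a compact stand-in (the reviewed studies use RF, SVM or CNNs), and the two-dimensional features are synthetic.

```python
import numpy as np


def fit_binary_logreg(X, t, lr=0.1, epochs=500):
    """Batch gradient descent on the logistic loss; t holds 0/1 targets."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        z = np.clip(X @ w + b, -30, 30)  # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        grad = p - t
        w -= lr * X.T @ grad / len(t)
        b -= lr * grad.mean()
    return w, b


def fit_one_vs_all(X, y, n_classes):
    """Train one 'class k vs. all other classes' model per class."""
    return [fit_binary_logreg(X, (y == k).astype(float)) for k in range(n_classes)]


def predict_one_vs_all(models, X):
    """Assign each sample to the class whose binary model scores it highest."""
    scores = np.column_stack([X @ w + b for w, b in models])
    return scores.argmax(axis=1)


# Synthetic, well-separated 2-D features for three hypothetical inclusion classes
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
y = np.repeat(np.arange(3), 50)
X = centres[y] + rng.normal(scale=0.5, size=(150, 2))
models = fit_one_vs_all(X, y, n_classes=3)
accuracy = (predict_one_vs_all(models, X) == y).mean()
```

Libraries handle this choice differently: scikit-learn's `SVC`, for example, uses one-vs-one internally for multi-class problems, whereas `OneVsRestClassifier` wraps any binary estimator in an OvA scheme.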

4.3.3. Performance and Challenges

Accuracy: Strong for binary tasks but declines with more classes due to feature overlap.
Misclassification: Frequent between similar classes (oxysulfides and calcium aluminates).
Data Scarcity: Rare inclusion types create imbalanced datasets, complicating training.
Interpretability: DL models often lack transparency, posing challenges for industrial adoption.
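A common first response to class imbalance is to weight each class inversely to its frequency and to report a class-averaged metric instead of plain accuracy. The sketch below uses the n_samples / (n_classes × count) weighting (the formula scikit-learn applies for `class_weight="balanced"`) together with macro-averaged recall; the labels are hypothetical.

```python
import numpy as np


def balanced_class_weights(y):
    """Weight for class k = n_samples / (n_classes * count_k)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so rare classes count equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Hypothetical labels: 90 common-class particles, 10 rare-class particles
y_true = [0] * 90 + [1] * 10
print(balanced_class_weights(y_true))
y_all_common = [0] * 100                    # a classifier that ignores the rare class
print(macro_recall(y_true, y_all_common))   # 0.5
```

A classifier that always predicts the common class reaches 90% accuracy here but only 50% macro recall, which exposes its failure on the rare class.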

Preprocessing techniques, such as adjusting image brightness and contrast, can enhance traditional model performance.
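A simple form of such brightness/contrast adjustment is a percentile-based linear stretch, sketched below. The 1st/99th percentile limits and the synthetic BSE tile are assumptions; the reviewed studies may use other normalisation schemes.

```python
import numpy as np


def stretch_contrast(img, low_pct=1.0, high_pct=99.0):
    """Map the [low_pct, high_pct] intensity percentiles linearly onto [0, 1].

    Values outside that range (e.g. charging hot spots, dark pores) are clipped,
    making brightness and contrast comparable between images.
    """
    img = np.asarray(img, dtype=float)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return np.zeros_like(img)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)


# Hypothetical 8-bit BSE tile with a compressed grey-level range
rng = np.random.default_rng(0)
tile = rng.integers(90, 140, size=(64, 64))
normalised = stretch_contrast(tile)
```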

The use of AI in automated SEM/EDS for inclusion characterisation has transformed steel quality assessment. Traditional ML models offer reliability for simpler tasks, while DL models provide advanced capabilities for complex classifications. However, issues like data imbalances, interpretability, and multi-modal data integration remain. Future efforts should prioritise robust multi-class frameworks and improved SEM/EDS fusion techniques.

4.4. Differences Between Classification Models and Multi-Class Categorisation

Classification Models and Multi-Class Categorisation are closely related but differ in their focus and application within automated SEM/EDS analysis:

Classification Models encompass the algorithms and methodologies (RF, SVM, CNN) used to classify inclusions. This includes their design, training process, and how they handle input data like SEM images or EDS spectra, applicable to both binary and multi-class problems.

Multi-Class Categorisation is a specific application of these models, targeting the classification of inclusions into multiple distinct categories (oxides, sulfides, nitrides) based on chemical composition.

Classification Models focus on the technical framework—selecting and optimising the right algorithm for a given task, whether binary or multi-class.

Multi-Class Categorisation addresses the practical challenge of distinguishing between several inclusion types, dealing with complexities like overlapping features and class imbalances.

Classification Models tackle broader issues, such as model selection, feature extraction (for traditional ML), or architecture design (for DL).

Multi-class categorisation faces specific hurdles, like reduced accuracy as class numbers increase and misclassification between similar inclusion types.

4.5. Time Efficiency in Automated SEM/EDS for Inclusion Characterisation

Automated Scanning Electron Microscopy (SEM) combined with Energy Dispersive Spectroscopy (EDS) is a critical tool for analysing non-metallic inclusions, which strongly influence the mechanical properties of steel, such as ductility, toughness, and fatigue resistance. Inclusions, such as oxides, sulfides, and nitrides, must be accurately characterised to ensure the quality of steel in industries like automotive, aerospace, and construction. Traditional manual SEM/EDS analysis is labour-intensive and time-consuming, often taking hours for a single sample, which creates bottlenecks in production environments where rapid quality control is essential. Automated SEM/EDS systems, enhanced by advancements in hardware and software, have significantly improved the speed and consistency of inclusion characterisation. This section focuses on the time efficiency of these automated systems, highlighting the role of machine learning (ML) in further reducing analysis times and making the process viable for high-throughput industrial applications.

Automated inclusion characterisation in steelmaking exemplifies high-scalability applications in the Microstructure Analysis Spectrum: hardware optimizations and ML hybrids (e.g., BSE-only prediction) push it toward the real-time industrial quadrant, reducing analysis bottlenecks while maintaining accuracy.

EDS calibration drift (typically a 5–10 eV energy shift over 8–24 h of operation) propagates errors into composition maps and compounds errors in AI inference on spectra; regular recalibration or drift-corrected models are therefore essential.
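A basic drift correction re-fits a linear energy scale (gain and offset) from peaks of known energy measured on a reference standard and applies it to the spectrum's energy axis. The sketch below assumes two reference lines, Al Kα (≈1486.7 eV) and Cu Kα (≈8047.8 eV); the "measured" peak positions are hypothetical, and commercial EDS software implements its own calibration routines.

```python
def fit_energy_calibration(measured_ev, reference_ev):
    """Least-squares line E_true = gain * E_measured + offset (needs >= 2 peaks)."""
    n = len(measured_ev)
    if n < 2 or n != len(reference_ev):
        raise ValueError("need at least two matched reference peaks")
    mx = sum(measured_ev) / n
    my = sum(reference_ev) / n
    sxx = sum((x - mx) ** 2 for x in measured_ev)
    if sxx == 0:
        raise ValueError("reference peaks must have distinct energies")
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured_ev, reference_ev))
    gain = sxy / sxx
    offset = my - gain * mx
    return gain, offset


# Hypothetical drifted peak positions (~7 eV high) for Al K-alpha and Cu K-alpha
gain, offset = fit_energy_calibration([1494.0, 8055.0], [1486.7, 8047.8])
# Correct two measured peaks; the second (hypothetical) lands at ~2307.7 eV
corrected = [gain * e + offset for e in (1494.0, 2315.0)]
```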

4.5.1. Traditional vs. Automated SEM/EDS Analysis

Traditional SEM/EDS analysis requires manual operation, where a technician must scan the sample, identify inclusions, and perform EDS analysis to determine their chemical composition. This process is slow, often taking several hours for a comprehensive analysis of a single sample. In contrast, automated SEM/EDS systems streamline this workflow by automating sample scanning, inclusion detection, and data processing. These systems use motorised stages, advanced detectors, and software algorithms to identify and analyse inclusions without constant human intervention.

Studies have quantified the time savings offered by automation. For example, research by Sun et al. [29] reported analysis times per particle ranging from 1 to 3 s, depending on factors like beam energy, magnification, and sample area. Specifically:

Sun et al. [29] achieved a minimum of 1 s and a maximum of 2 s per particle for SAE 52100 steel.

Babu et al. [26] reported 3 s per particle for stainless steel samples.

Sun et al. [29] achieved 2 s per particle for steel with a 180 mm² area.

These times are a significant improvement over manual methods, which can take minutes per particle. For a sample with thousands of inclusions, automated systems can complete the analysis in a fraction of the time required by traditional techniques.

4.5.2. Advanced Automation and High-Throughput Systems

Modern automated SEM/EDS systems, such as the Phenom ParticleX Steel Desktop SEM, exemplify the latest advancements in high-throughput analysis. According to information from the Nanoscience Instruments website, this system can analyse up to 10,000 inclusions per hour, which translates to approximately 0.36 s per inclusion. This level of efficiency is achieved through optimised hardware, such as faster detectors and motorised stages, and software that automates data collection, processing, and reporting. The system allows for continuous operation, including overnight analysis, further enhancing productivity.

The time efficiency of these systems is particularly beneficial in industrial settings, where minimising downtime and maximising throughput are critical. For example, in a steel mill, the ability to analyse large datasets quickly enables more frequent quality checks and faster feedback loops, optimising production processes.

4.5.3. Role of Machine Learning in Enhancing Time Efficiency

Modern automated SEM/EDS systems exemplify advancements in high-throughput inclusion analysis. Peer-reviewed studies commonly report analysis times of 1–5 s per particle in research and industrial settings, depending on parameters like beam energy, magnification, sample area, and detector efficiency (e.g., 1–3 s per particle in Sun et al. [29] for various steels; ~2–5 s per particle in Capurro et al. [32] using a Philips XL30 SEM/EDS; 3 s per particle in Babu et al. [26] for stainless steel). These times represent significant improvements over manual methods but still add up to many hours for samples with thousands of inclusions.

Commercial desktop systems, such as the Phenom ParticleX Steel Desktop SEM, are designed for even higher throughput in routine industrial quality control. Vendor-reported performance claims up to 10,000 inclusions per hour (~0.36 s per inclusion on average), achieved through optimized hardware (faster detectors, motorized stages) and automated software workflows (Nanoscience Instruments/Thermo Fisher Scientific product specifications). Similar high-throughput claims (>30,000 particles per hour for morphology and compositional screening) have been reported for other systems like Oxford Instruments AZtecSteel [33]. However, these figures are vendor-provided and may vary in real-world applications based on specific settings and sample characteristics; independent peer-reviewed validations of such speeds are limited.

These systems support continuous or overnight operation, enhancing productivity in steel mills where rapid, frequent quality checks are essential.

Quantitative Improvements

To illustrate potential time savings, consider a hypothetical scenario analyzing 10,000 inclusions—a common scale for statistical reliability in steel cleanness assessment.

In peer-reviewed research settings with conventional automated SEM/EDS, average times of 2–5 s per particle translate to approximately 5.6–13.9 h for full analysis (including EDS on each detected feature).

Vendor-claimed performance for optimized commercial systems (e.g., Phenom ParticleX at ~0.36 s average per inclusion) suggests the same task could take ~1 h, representing a potential multifold increase in efficiency under ideal conditions.
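The arithmetic behind these estimates is simple enough to state as code. The sketch treats each per-particle time as all-inclusive (stage moves, autofocus and other overheads folded in), which is an assumption.

```python
def analysis_hours(n_particles, seconds_per_particle):
    """Total acquisition time in hours for a fixed per-particle cost."""
    return n_particles * seconds_per_particle / 3600.0


scenarios = [
    ("research system, 2 s/particle", 2.0),
    ("research system, 5 s/particle", 5.0),
    ("vendor claim, 0.36 s/particle", 0.36),
]
for label, t in scenarios:
    print(f"{label}: {analysis_hours(10_000, t):.1f} h")
# research system, 2 s/particle: 5.6 h
# research system, 5 s/particle: 13.9 h
# vendor claim, 0.36 s/particle: 1.0 h
```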

Machine learning further amplifies these gains through hybrid approaches, where models predict inclusion compositions from BSE SEM images alone, reserving full EDS (the most time-intensive step) for uncertain or validation cases only. Studies demonstrate ML classification accuracies >90% for binary/multi-class tasks, potentially skipping EDS for 80–90% of particles and reducing total time by additional hours in large datasets. For example, AI-driven reductions in manual or full-spectrum analysis time (e.g., 72% in related metallographic tasks [23]) suggest comparable benefits here, making high-throughput systems even more viable for online monitoring.

These estimates are illustrative and depend on hardware, software, and optimization; real-world performance should be validated per application.

Hybrid Approach: SEM Images with ML

A key strategy is the hybrid approach, where ML models are trained on historical data to classify inclusions based on backscattered electron (BSE) SEM images alone. BSE images provide grayscale information that correlates with the atomic number of the materials, allowing ML models to infer chemical composition without performing EDS. This method significantly reduces analysis time because EDS, which typically takes several seconds per particle, can be reserved for uncertain cases or validation purposes.
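The routing logic of this hybrid approach can be sketched as a confidence gate: particles whose top predicted class probability falls below a threshold are sent for EDS, and the rest are accepted from the BSE image alone. The 0.9 threshold and the 0.5 s imaging / 2.5 s EDS timings are hypothetical; in practice the threshold should be set on validation data, since poorly calibrated probabilities (common in deep models) weaken this gate.

```python
import numpy as np


def route_to_eds(probs, threshold=0.9):
    """True where the BSE-image classifier's top class probability is below
    `threshold`, i.e. the particle should still receive a full EDS spectrum."""
    probs = np.asarray(probs, dtype=float)
    return probs.max(axis=1) < threshold


def hybrid_hours(n_particles, eds_fraction, t_image_s, t_eds_s):
    """Every particle is imaged and classified; only a fraction also gets EDS."""
    return n_particles * (t_image_s + eds_fraction * t_eds_s) / 3600.0


# Hypothetical class probabilities from a BSE-based classifier (3 particles)
probs = [[0.97, 0.02, 0.01],
         [0.55, 0.40, 0.05],
         [0.10, 0.88, 0.02]]
needs_eds = route_to_eds(probs)  # [False, True, True]

# Hypothetical timings: 0.5 s imaging/classification, 2.5 s EDS per particle
print(f"{hybrid_hours(10_000, 1.00, 0.5, 2.5):.1f} h")  # 8.3 h, EDS on every particle
print(f"{hybrid_hours(10_000, 0.15, 0.5, 2.5):.1f} h")  # 2.4 h, EDS on 15% of particles
```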

Several studies have demonstrated the effectiveness of this approach, using Random Forests (RF) and CNNs to classify inclusions into binary or multi-class categories with high accuracy. For instance, in binary classification tasks (inclusion vs. non-inclusion), models achieved accuracy rates above 90%. By reducing reliance on EDS, these systems can analyse inclusions much faster, making them suitable for online production monitoring in steel plants.

4.5.4. Challenges and Considerations

Despite the advancements, several challenges remain in optimising time efficiency:

Data Scarcity: Rare inclusion types may not be well-represented in training datasets, affecting the performance of ML models.

System Optimisation: Parameters like step size, magnification, and beam energy must be carefully tuned to balance speed and accuracy.

Computational Resources: Training and deploying ML models, especially deep learning models, require significant computational power, which may be a barrier in some industrial settings.

Additionally, while ML can reduce the need for EDS, it cannot entirely replace it, as EDS provides definitive chemical composition data. Therefore, a balance must be struck between speed and accuracy, with EDS used selectively for validation or for inclusions that the ML model cannot confidently classify.

4.5.5. Future Directions

Future research and development in this area could focus on:

Standardised ML Models: Developing pre-trained models for common inclusion types to reduce the need for extensive training datasets.

Hardware Optimisation: Further improving SEM/EDS hardware, such as faster detectors and more efficient scanning mechanisms, to reduce per-particle analysis time.

Real-Time Processing: Exploring edge computing and cloud-based ML solutions to enable real-time inclusion characterisation during production.

Integration with Other Techniques: Combining SEM/EDS with other analytical methods, such as optical microscopy or X-ray techniques, to provide complementary data and enhance efficiency.

Time efficiency in automated SEM/EDS for inclusion characterisation has been significantly enhanced through advancements in automation and the integration of machine learning. Modern systems can analyse thousands of inclusions per hour, a stark improvement over traditional methods. Machine learning, particularly through hybrid approaches that leverage SEM images for classification, further reduces analysis times by minimising the need for EDS. As these technologies continue to evolve, they will play an increasingly vital role in ensuring the quality and performance of steel in high-demand industries.

5. Predictive Modelling with AI

The application of artificial intelligence (AI) in predictive modelling (Figure 9) has become a groundbreaking tool in materials science, allowing researchers to estimate material properties like hardness, tensile strength, and fracture strain by analysing microstructural data derived from Scanning Electron Microscopy (SEM) and Energy Dispersive Spectroscopy (EDS) [24]. By employing machine learning (ML) and deep learning (DL) methods, these models can interpret intricate microstructural elements—such as grain structures, phase distributions, and defects—to establish clear connections with macroscopic properties. This approach plays a vital role in accelerating the design and development of advanced metallic materials, such as high-entropy alloys (HEAs), additively manufactured metals, and structural steels, while minimising the need for expensive and time-consuming experimental techniques. This expanded section offers an in-depth examination of AI-driven predictive modelling, featuring various case studies, methodologies, performance metrics, challenges, and potential future developments.

5.1. Importance of Predictive Modelling in Materials Science

Predictive modelling serves as a critical link between microscopic features and macroscopic behaviour, empowering researchers to craft materials with customised properties suited for industries like aerospace, automotive, and additive manufacturing. SEM delivers high-resolution images that reveal surface morphology details, such as grain boundaries and inclusions. At the same time, EDS provides essential chemical composition insights, helping to clarify how microstructure impacts properties like strength and durability. AI models, especially neural networks, are adept at detecting patterns within these datasets, enabling swift and precise predictions that boost the efficiency of material development. This capacity to forecast properties from microstructural data not only conserves time but also fosters the creation of innovative materials with exceptional performance traits.

5.2. Case Studies in Predictive Modelling

A variety of studies showcase AI’s effectiveness in predicting material properties using SEM and EDS data, highlighting its broad applicability across different metallic materials.

Dewangan et al. [5] carried out a pivotal investigation into AlCrFeMnNiWₓ high-entropy alloys, using SEM (JEOL JSM-7610 and NOVA NANOSEM) and EDS to analyse microstructural characteristics such as grain structures, phases, and defects. They trained an Artificial Neural Network (ANN) on these data to predict alloy hardness. The methodology included:

Data Collection: SEM images supplied comprehensive morphological information, while EDS pinpointed elemental compositions, creating a robust dataset of microstructural attributes.

Model Architecture: The ANN was engineered to identify non-linear relationships between microstructural features and hardness, employing backpropagation during training.

Performance: The model delivered a prediction accuracy of 93.54%, with an error rate of 6.46%, proving its dependability in connecting microstructure to mechanical properties.

This research demonstrates the effectiveness of artificial neural networks (ANNs) in predicting material properties, such as hardness, from elemental compositions derived from SEM/EDS data. Using these derived features, the ANN captured the relationship between composition and mechanical properties in advanced alloys with complex microstructures. Although the model did not directly process raw SEM images or EDS spectra, its ability to handle structured compositional data underscores the potential of AI-driven approaches in materials science.
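The ANN's architecture is not detailed here, so the following is only a generic sketch of the technique: a one-hidden-layer network that maps tabular compositional features to hardness and is trained by backpropagation on mean squared error. The layer size, learning rate, feature count and synthetic target function are all assumptions; with the small datasets typical of alloy studies, cross-validation and regularisation would also be needed.

```python
import numpy as np


def train_mlp(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer tanh network trained by full-batch backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    n = len(y)
    losses = []
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                 # forward pass: hidden layer
        err = h @ W2 + b2 - y                    # prediction error
        losses.append(float(np.mean(err ** 2)))  # record MSE
        # Backward pass: gradients of 0.5 * MSE with respect to each parameter
        g_W2 = h.T @ err / n
        g_b2 = err.mean()
        g_h = np.outer(err, W2) * (1.0 - h ** 2)  # chain rule through tanh
        g_W1 = X.T @ g_h / n
        g_b1 = g_h.mean(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def predict(X_new):
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return predict, losses


rng = np.random.default_rng(1)
X = rng.random((40, 3))  # hypothetical scaled compositional features
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.5 * np.sin(3.0 * X[:, 2])  # synthetic "hardness"
predict, losses = train_mlp(X, y)
```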

In a study focusing on dual-phase steels (DP590), researchers utilised a machine learning framework integrating Vector Quantised Variational Autoencoder (VQVAE) and Pixel Convolutional Neural Network (PixelCNN) to predict fracture strain from microstructure images [25]. This method emulated metallurgists’ analytical approaches to pinpoint key microstructural features:

Methodology: VQVAE extracted significant microstructures (ferrite/martensite grains) with a 512-latent-vector codebook, trained over 1000 epochs on 3824 synthetic microstructure images. PixelCNN then modelled spatial relationships to connect these features to fracture strain, leveraging data from the Gurson–Tvergaard–Needleman (GTN) model.
Performance: The model yielded a coefficient of determination (R²) of 0.672, which rose to 0.76 after excluding outliers. It suggested design strategies for improved fracture strain, such as smaller, evenly distributed martensite grains or laminated structures.
Validation: The findings aligned reasonably well with physical models for local interactions, though limitations in addressing long-range interactions were observed due to PixelCNN’s localised focus.
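The quantisation step at the heart of a VQ-VAE can be sketched in a few lines: each encoder output vector is replaced by its nearest codebook entry, and the resulting integer indices are what an autoregressive prior such as PixelCNN then models. Only the 512-entry codebook size comes from [25]; the 16-dimensional code vectors and the random codebook are assumptions, and training details (straight-through gradients, commitment loss) are omitted.

```python
import numpy as np


def vector_quantise(z, codebook):
    """Map each row of z to its nearest codebook entry (squared Euclidean)."""
    # Pairwise squared distances, shape (n_vectors, n_codes), via
    # |z|^2 - 2 z.c + |c|^2 to avoid building a 3-D difference array
    d = (np.sum(z ** 2, axis=1, keepdims=True)
         - 2.0 * z @ codebook.T
         + np.sum(codebook ** 2, axis=1))
    idx = d.argmin(axis=1)          # discrete codes (what PixelCNN would model)
    return idx, codebook[idx]       # indices and quantised vectors


rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 16))  # 512 codes as in [25]; 16-D is an assumption
# Encoder outputs that lie close to codes 3, 100 and 511
z = codebook[[3, 100, 511]] + 1e-3 * rng.normal(size=(3, 16))
idx, z_q = vector_quantise(z, codebook)
```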

This case study illustrates how advanced DL models can predict mechanical properties from microstructure images, providing valuable design insights for structural steels.

Additively Manufactured Stainless Steel: Mechanical Property Prediction

A research effort explored mechanical property prediction in additively manufactured stainless steel 316L using a simulated microstructural dataset [6]. A 3D Convolutional Neural Network (CNN) inspired by VGGNet was utilised:

Data Source: The dataset consisted of 3D microstructural subvolumes paired with mechanical properties, produced through physics-based modelling. Inputs encompassed grain ID, crystal orientation, and additional features like mechanical loading.
Model Architecture: The 3D CNN handled volumetric image data with minimal preprocessing, outperforming traditional ML techniques like Ridge regression and XGBoost.
Performance: The CNN excelled on a holdout set, particularly with crystal orientation data, enabling fast spatial property map predictions in seconds rather than hours, unlike finite element simulations.
Advantages: Its proficiency with 3D data and spatially variable property predictions makes it ideal for intricate additive manufacturing scenarios.
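The 3D CNN itself is beyond a short sketch, but the data preparation step, cutting a voxelised microstructure into fixed-size subvolumes and recording where each came from, can be illustrated. The array layout (Z, Y, X, channels), the four orientation channels (e.g., a unit quaternion per voxel) and the 32-voxel cube with 16-voxel stride are assumptions, not values from [6].

```python
import numpy as np


def extract_subvolumes(volume, size, stride):
    """Tile a voxel array of shape (Z, Y, X, channels) into cubic subvolumes.

    Returns the stacked cubes (n, size, size, size, channels) and the (z, y, x)
    origin of each cube, so per-cube predictions can be mapped back in space.
    """
    Z, Y, X = volume.shape[:3]
    cubes, origins = [], []
    for z in range(0, Z - size + 1, stride):
        for y in range(0, Y - size + 1, stride):
            for x in range(0, X - size + 1, stride):
                cubes.append(volume[z:z + size, y:y + size, x:x + size])
                origins.append((z, y, x))
    return np.stack(cubes), origins


# Hypothetical 64^3 volume with 4 orientation channels per voxel
volume = np.zeros((64, 64, 64, 4), dtype=np.float32)
cubes, origins = extract_subvolumes(volume, size=32, stride=16)
```

Keeping the origins allows per-subvolume predictions to be placed back into a spatial property map.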

This study highlights DL’s advantage in processing volumetric microstructural data for property prediction in additive manufacturing. By contrast, the HEA hardness ANN discussed above relied on a compact dataset of ~20–50 alloy compositions derived from SEM/EDS measurements, which demonstrates feasibility with tabular (non-image) data but offers limited scope for raw image-based models.

Corrosion Severity Classification

Another study concentrated on classifying metal corrosion severity using SEM images of steel and magnesium, employing the ResNet50 CNN model [26]. Though focused on classification, it ties into property evaluation:

Methodology: SEM images were categorised into low, medium, and high corrosion severity levels. A Super-Resolution Generative Adversarial Network (SRGAN) enhanced the limited dataset, improving texture detail capture.
Performance: The model achieved accuracies of 88% for steel and 94% for magnesium, indicating strong capability in evaluating material degradation.
Relevance: Corrosion severity impacts material longevity, and this study demonstrates DL’s potential to assess such properties from SEM images, useful for industrial quality control.

This case exemplifies CNN versatility in processing SEM image data for property-related evaluations in metals.

Fibre-Reinforced Polymers: Stress Field Prediction

While not metal-focused, a study on fibre-reinforced polymers offers a methodological precedent for predicting stress fields from 2D microstructure images using a modified StressNet CNN [28]:

Data Source: The dataset comprised 5321 2D slices from segmented X-ray tomography images, paired with finite element (FE) simulation data for stress fields.
Model Architecture: The CNN, featuring an encoder–decoder design with 11 hidden layers, was trained in TensorFlow with an MSE loss function over 5000 epochs.
Performance: It achieved an R² of 0.88 for training and 0.69 for testing on the xy-plane, effectively capturing stress distribution patterns, especially on fibres.
Significance: This image-based stress field prediction method is adaptable to metallic materials with complex microstructures.

This research showcases the potential of CNNs for property prediction from image data, providing a transferable framework for metals.

5.3. Methodologies and Techniques

The case studies utilise diverse AI methodologies, predominantly neural networks:

Artificial Neural Networks (ANNs): Applied in the high-entropy alloys study [24], ANNs adeptly capture non-linear links between microstructural features and properties. They demand structured inputs like SEM/EDS-extracted features and are efficient for smaller datasets.
Convolutional Neural Networks (CNNs): Used in the stainless steel [6] and corrosion [4] studies, CNNs shine in image processing, autonomously extracting features like textures and shapes from SEM images. Variants such as 3D CNNs and ResNet50 tackle complex volumetric or high-resolution data.
Advanced DL Frameworks: The dual-phase steel study [25] employed VQVAE and PixelCNN, blending generative and discriminative modelling to extract and relate microstructural features to properties, shedding light on spatial correlations.

Data Preprocessing: Methods like SRGAN for data augmentation [26] and feature extraction (grain size, phase distribution) are essential for boosting model performance, particularly with scarce datasets.

These techniques harness the detailed visual and chemical data from SEM/EDS, facilitating robust predictions without extensive experimentation.

Explainability Metrics with SHAP

Artificial Neural Networks (ANNs) are powerful tools for predicting material properties from microstructural data, such as SEM and EDS inputs, but their “black box” nature often obscures how predictions are made. To address this, SHAP (SHapley Additive exPlanations) offers a robust explainability metric. Rooted in cooperative game theory, SHAP assigns each input feature a value representing its contribution to the model’s output. This method is particularly valuable in materials science, where understanding the link between microstructural features and properties is critical.

SHAP provides several key benefits:

Feature Importance: Identifies which microstructural characteristics (grain size, phase distribution) most influence predictions.
Material Design Insights: Guides researchers on which features to target for optimisation.
Trust and Transparency: Enhances confidence in AI predictions by making them interpretable.
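To make the game-theoretic definition concrete, the sketch below computes exact Shapley values by enumerating every coalition of features, replacing "absent" features with a baseline value (one common convention). Because the cost grows as 2ⁿ in the number of features, practical SHAP tools rely on approximations such as KernelSHAP. The hardness model, feature names and values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x relative to a baseline input."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):                       # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Hypothetical hardness surrogate over normalised
# (grain size, phase fraction, defect density), with one interaction term
def hardness_model(v):
    return 300.0 - 40.0 * v[0] + 25.0 * v[1] + 10.0 * v[0] * v[2]


phi = shapley_values(hardness_model, x=[1.0, 0.8, 0.5], baseline=[0.0, 0.0, 0.0])
total = sum(abs(p) for p in phi)
shares = [abs(p) / total for p in phi]  # one way to express percentage contributions
```

The values sum to the difference between the prediction and the baseline prediction (the efficiency property), and the interaction term is split equally between grain size and defect density.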

Application in High-Entropy Alloys Case Study

In the High-Entropy Alloys (HEAs) case study (Section 5.2), an ANN predicts hardness from microstructural data. Applying SHAP to this model reveals the factors driving its predictions. For instance:

Grain size contributed approximately 40% to the hardness prediction.

Phase distribution accounted for 30%, with other features like defect density playing smaller roles.

These insights validate the ANN’s outputs and inform HEA design by highlighting grain size as a priority for hardness optimisation, linking predictive accuracy to interpretability.

General Application in Predictive Modelling

Beyond specific cases, SHAP addresses a core challenge in predictive modelling: interpreting complex models such as ANNs. Within the methodologies of Section 5.3, SHAP serves as a tool to:

Quantify Feature Contributions: Clarify how each input drives the model’s decisions.

Validate Models: Ensure learned patterns align with materials science principles (grain size affecting hardness).

This enhances the scientific utility of AI by bridging predictive power with physical understanding.

Challenges and Future Directions

Despite its strengths, SHAP has limitations:

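Computational Cost: Exact Shapley values for a general model require evaluating it over every subset of features, a cost that grows exponentially with the number of features, so even approximate SHAP calculations for large datasets or complex ANNs can be resource-intensive.
Feature Interactions: SHAP expresses each prediction as an additive sum of per-feature contributions, so interaction effects are divided among the interacting features, potentially oversimplifying complex relationships.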
Future research could explore:
Efficient SHAP Methods: Faster approximations for large-scale materials data.
Hybrid Approaches: Combining SHAP with physical models for deeper insights.
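One widely used way to reduce the cost is permutation sampling: instead of enumerating every coalition, random feature orderings are drawn and each feature is credited with the change in output when it is added. The sketch below is a generic Monte Carlo estimator, assuming a single baseline input supplies the values of absent features; it is not the algorithm of any particular SHAP package, and the models are hypothetical.

```python
import random


def sampled_shapley(model, x, baseline, n_permutations=200, seed=0):
    """Monte Carlo Shapley estimate from random feature orderings.

    Cost is n_permutations * (n_features + 1) model calls, versus roughly
    2**n_features calls for exact enumeration.
    """
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        z = list(baseline)          # start with every feature "absent"
        prev = model(z)
        for i in order:
            z[i] = x[i]             # add feature i to the coalition
            curr = model(z)
            phi[i] += curr - prev   # its marginal contribution in this ordering
            prev = curr
    return [p / n_permutations for p in phi]


def linear_model(z):
    """Hypothetical additive model; Shapley values are exactly w_i * (x_i - baseline_i)."""
    return 2.0 * z[0] - 3.0 * z[1] + 0.5 * z[2]


estimate = sampled_shapley(linear_model, [1.0, 2.0, 4.0], [0.0, 0.0, 0.0], n_permutations=50)
print(estimate)  # [2.0, -6.0, 2.0] for this additive model
```

For models with interactions the estimate is noisy and converges as `n_permutations` grows, but every sampled ordering still sums exactly to `model(x) - model(baseline)`.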

5.4. Performance Metrics

Model performance varies with task complexity and data quality:

High-Entropy Alloys: 93.54% accuracy, 6.46% error rate [5], reflecting high reliability in hardness prediction.

Dual-Phase Steels: R² of 0.672 (0.76 without outliers) [25], indicating moderate success limited by long-range interaction challenges.

Additively Manufactured Stainless Steel: 3D CNN surpassed traditional models [6], offering rapid predictions, though specific metrics were not detailed.

Corrosion Severity: Accuracies of 88–94% [11], demonstrating robust classification performance.

Fibre-Reinforced Polymers: R² of 0.69 on testing data [29], showing solid stress field prediction accuracy.

These metrics affirm AI model efficacy, with DL typically outperforming traditional ML for image-based data.
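For readers comparing these figures, the sketch below shows how the two headline metrics are defined: R² as one minus the ratio of residual to total sum of squares, and error rate as the complement of accuracy (93.54% accuracy corresponds to the 6.46% error rate quoted above). The five data points are hypothetical and illustrate how a single badly predicted sample can pull R² down, which is why some studies also report a value "without outliers".

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def error_rate(accuracy_percent):
    """Error rate is the complement of accuracy, in percent."""
    return 100.0 - accuracy_percent


# Hypothetical measured vs predicted values; the last sample is badly predicted.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.9, 2.0]
r2_all = r2_score(y_true, y_pred)            # about 0.09
r2_clean = r2_score(y_true[:4], y_pred[:4])  # about 0.99 once the outlier is removed
print(r2_all, r2_clean, error_rate(93.54))
```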

5.5. Challenges and Limitations

AI-driven predictive modelling, despite its achievements, encounters several hurdles:

Data Scarcity: High-quality, annotated SEM/EDS datasets are scarce, especially for rare alloys or complex microstructures, hindering model generalisation.

Model Interpretability: Complex models such as CNNs offer little insight into the rationale behind their predictions, yet such understanding is vital for industrial acceptance.

Data Integration: Merging SEM images with EDS spectral data is difficult due to format and alignment disparities.

Computational Requirements: DL models, particularly 3D CNNs, demand substantial computational power, limiting accessibility for smaller research teams.
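A common first step for the data-integration problem is to bring EDS element maps, which are usually acquired on a coarser pixel grid than the SEM image, onto the SEM grid and stack them as extra channels. The minimal sketch below assumes the maps are already registered to the same field of view and differ in size by a single integer factor; real data generally needs drift correction and registration first, which is omitted here. All values are hypothetical.

```python
def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    out = []
    for row in grid:
        expanded = [v for v in row for _ in range(factor)]
        out.extend(list(expanded) for _ in range(factor))
    return out


def fuse_sem_eds(sem, eds_maps):
    """Return an H x W grid of per-pixel vectors [sem, element_1, element_2, ...].

    Elements are ordered alphabetically so the channel order is reproducible.
    """
    h, w = len(sem), len(sem[0])
    aligned = {}
    for element, grid in eds_maps.items():
        gh, gw = len(grid), len(grid[0])
        if h % gh or w % gw or h // gh != w // gw:
            raise ValueError(f"{element} map does not tile the SEM image by one integer factor")
        aligned[element] = upsample_nearest(grid, h // gh)
    elements = sorted(aligned)
    return [[[sem[r][c]] + [aligned[e][r][c] for e in elements] for c in range(w)]
            for r in range(h)]


# Hypothetical 4x4 BSE intensities and 2x2 EDS maps (wt%).
sem = [[0.2, 0.3, 0.8, 0.9],
       [0.1, 0.2, 0.7, 0.8],
       [0.5, 0.5, 0.4, 0.3],
       [0.6, 0.5, 0.3, 0.2]]
eds = {"Fe": [[70.0, 20.0], [55.0, 60.0]],
       "Cr": [[18.0, 60.0], [25.0, 20.0]]}
fused = fuse_sem_eds(sem, eds)
print(fused[0][3])  # [0.9, 60.0, 20.0]: SEM value, then Cr, then Fe
```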

5.5.1. Generating 3D Data from 2D SEM Slices Using AI

The concept of generating 3D microstructural data from 2D SEM slices using AI is a promising yet nascent area in materials science. While traditional methods like serial sectioning or tomography are resource-intensive, AI offers a potentially efficient alternative for reconstructing 3D structures. However, this approach remains in its early stages, with limited direct applications to SEM data for metallic materials. Below, we discuss potential AI architectures, examples from related fields, and the key limitations of this technique.

Potential AI Architectures

Several AI architectures have shown promise for generating 3D data from 2D inputs, though their application to SEM images is still exploratory:

Generative Adversarial Networks (GANs): GANs have been successfully used in domains like medical imaging to generate 3D structures from 2D scans (e.g., reconstructing 3D organs from 2D MRI slices). Their ability to learn complex distributions makes them a strong candidate for generating 3D microstructures from 2D SEM images. However, adapting GANs to materials science requires addressing the unique characteristics of metallic microstructures, such as varying scales and feature densities.

Variational Autoencoders (VAEs): VAEs, which excel at learning latent representations, could be employed to map 2D SEM slices to 3D microstructures. While VAEs have been used in tasks like generating 3D shapes from 2D images, their effectiveness for SEM data remains largely untested, and further research is needed to optimise them for this purpose.

These architectures offer a foundation for future work, but their direct application to SEM-based 3D reconstruction in materials science is still limited and requires validation.
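For orientation, the two ingredients that distinguish a VAE from a plain autoencoder are the reparameterised latent sample and the KL-divergence regulariser in the training loss. The minimal sketch below shows only these pieces in plain Python. The encoder that would map a 2D SEM slice to (`mu`, `log_var`) and the decoder that would map `z` to a 3D volume are hypothetical and omitted, and in practice these terms are computed inside an autodiff framework.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(log_var / 2).

    Written this way, gradients can flow to mu and log_var when the same
    expression is used in an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def vae_loss(reconstruction_error, mu, log_var, beta=1.0):
    """Negative ELBO: reconstruction term (e.g. voxel-wise error against a
    reference 3D volume) plus a beta-weighted KL regulariser."""
    return reconstruction_error + beta * kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
mu, log_var = [0.3, -0.1], [-0.5, 0.2]   # hypothetical encoder outputs for one slice
z = reparameterize(mu, log_var, rng)      # latent code a decoder would turn into a volume
print(z, vae_loss(reconstruction_error=1.25, mu=mu, log_var=log_var))
```

The reconstruction term is the part that needs paired 3D reference volumes during training, which ties this architecture directly to the data-scarcity problem.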

5.5.2. Examples of Successful Implementations

Although direct applications in materials science are scarce, successful implementations in related fields demonstrate the feasibility of AI-driven 3D reconstruction:

Medical Imaging: AI models, particularly GANs, have been used to reconstruct 3D brain structures from 2D MRI slices with high accuracy. These models leverage large datasets and well-defined anatomical features, providing a blueprint for similar efforts in materials science.

Emerging Studies in Materials Science: Preliminary research has begun exploring AI for 3D microstructure generation. For example, some studies have experimented with GANs to generate 3D grain structures from 2D micrographs. However, these efforts are still in their infancy, and comprehensive validation against experimental data is lacking.

These examples highlight the potential of AI for 3D reconstruction but underscore the need for further development and validation in the context of metallic microstructures.

Limitations and Challenges

Despite its promise, using AI to generate 3D data from 2D SEM slices faces several significant challenges:

Data Scarcity: High-quality 3D microstructural datasets are difficult to obtain, as they require advanced techniques like serial sectioning or tomography, which are time-consuming and resource-intensive. This limits the availability of training data for AI models, making it challenging to train robust and generalizable models.
Computational Complexity: Training AI models to generate accurate 3D representations from 2D inputs is computationally demanding, particularly for high-resolution SEM images. This complexity poses a barrier to widespread adoption, especially in research settings with limited computational resources.
Risk of Artefacts: AI-generated 3D structures may contain artefacts or inaccuracies that do not reflect the true microstructure, particularly if the model overfits to the training data or fails to capture the full complexity of the material. Ensuring the physical realism of generated structures is a critical challenge that requires careful validation against experimental data.

These limitations highlight the need for cautious optimism. While AI offers exciting possibilities for 3D reconstruction, significant research is needed to overcome these challenges and ensure the accuracy and reliability of AI-generated microstructures.
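One practical check on physical realism, common in microstructure reconstruction work, is to compare low-order statistics of generated and experimental structures, such as phase volume fraction and the two-point correlation function S2(r), the probability that two points a distance r apart both lie in the phase of interest. The minimal sketch below computes S2 along one axis of binary 2D slices; the thresholding step, the directions checked and the acceptance tolerance are assumptions left to the user. Matching these statistics is necessary but not sufficient, as the example shows.

```python
def volume_fraction(image):
    """Fraction of pixels labelled 1 (the phase of interest) in a binary 2D slice."""
    total = sum(len(row) for row in image)
    return sum(sum(row) for row in image) / total


def two_point_correlation_x(image, max_lag):
    """S2(r) for r = 0..max_lag along rows, counting only pixel pairs inside the image."""
    s2 = []
    for r in range(max_lag + 1):
        hits = pairs = 0
        for row in image:
            for c in range(len(row) - r):
                pairs += 1
                hits += row[c] * row[c + r]
        s2.append(hits / pairs)
    return s2


def max_s2_gap(generated, reference, max_lag):
    """Largest absolute difference between the S2 curves of two binary slices."""
    g = two_point_correlation_x(generated, max_lag)
    ref = two_point_correlation_x(reference, max_lag)
    return max(abs(a - b) for a, b in zip(g, ref))


# Hypothetical binary slices: vertical lamellae (reference) vs a checkerboard (generated).
reference = [[1, 0, 1, 0]] * 4
generated = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(volume_fraction(reference), volume_fraction(generated))  # 0.5 0.5
print(two_point_correlation_x(reference, 2))                   # [0.5, 0.0, 0.5]
print(max_s2_gap(generated, reference, 2))                     # 0.0
```

The two structures agree along x even though they differ along y, so a practical check would compare S2 in several directions (and in 3D where possible) rather than relying on one axis.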

5.6. Future Directions

To overcome these obstacles, future efforts could prioritise:

Open-Access Datasets: Building larger, standardised SEM/EDS datasets to enhance model robustness and generalisation.
Explainable AI: Adopting tools like SHAP (SHapley Additive exPlanations) to improve model transparency, as explored in recent studies of multi-principal element alloys (MPEAs).
Multi-Modal Integration: Developing frameworks that seamlessly integrate SEM images and EDS spectra for holistic predictions.
Advanced Architectures: Investigating models like Vision Transformers or generative adversarial networks to boost accuracy and manage complex microstructures.

AI-driven predictive modelling is transforming metallic microstructure analysis, delivering swift and precise property predictions from SEM and EDS data. The case studies showcase neural network versatility, from ANNs for high-entropy alloys [5] to 3D CNNs for additively manufactured metals [6], achieving high accuracies and illuminating structure-property relationships. Although challenges like data scarcity and interpretability remain, advancements in AI techniques and data access hold promise for further improving metallic material design and optimisation, opening doors to groundbreaking materials science applications.

Predictive modelling spans moderate-to-high complexity in the Spectrum, leveraging segmented features for property forecasts (e.g., hardness in HEAs), with potential for greater scalability through multi-modal fusion—positioning it as a bridge to innovative material design.

Addressing data scarcity is paramount: priorities include open-access metallic SEM repositories (expanding beyond the 100–1000 images typical of current research datasets; see Table 1), advanced synthetic generation (GANs/VAEs for realistic microstructures), and federated learning for multi-lab collaboration without sharing raw data.
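Federated learning typically keeps images on each lab's own servers and shares only model updates. The core aggregation step of federated averaging (FedAvg) is a sample-size-weighted mean of the labs' parameters, sketched below. Local training, communication and any privacy safeguards are omitted, and the lab sizes and parameter values are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Weighted mean of each parameter across labs (FedAvg aggregation step).

    client_weights: one flat parameter list per lab, identical length and ordering.
    client_sizes:   number of local training images at each lab.
    Only these parameters leave each lab; raw SEM images never do.
    """
    if len({len(w) for w in client_weights}) != 1:
        raise ValueError("all labs must share the same model architecture")
    total = sum(client_sizes)
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(len(client_weights[0]))
    ]


# Hypothetical: three labs fine-tuned the same small model on 120, 300 and 80 SEM images.
lab_params = [[0.10, -0.40, 1.20], [0.16, -0.30, 1.00], [0.05, -0.50, 1.40]]
global_params = federated_average(lab_params, [120, 300, 80])
print(global_params)
```

Weighting by sample count lets a lab with more annotated images pull the shared model further, while every lab still contributes without exposing its data.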

6. General Discussion

This review builds upon the groundwork established by works like Holm et al. (2020) [1], offering a broader temporal perspective that spans 2010 to 2025, with a focus on key developments since 2020. It covers advanced AI techniques, such as Vision Transformers and 3D CNNs, that extend the analytical capabilities of SEM and EDS beyond those documented in earlier reviews. Additionally, our focus on predictive modelling, using AI to forecast mechanical properties like hardness and fracture strain, represents a significant advancement in materials science applications. The dedicated investigation of automated SEM/EDS for inclusion characterisation in steelmaking further highlights the practical industrial relevance of this work, addressing efficiency and scalability in ways not emphasised by Holm et al. (2020) [1].

The Microstructure Analysis Spectrum provides a cohesive lens for synthesizing advancements from 2010 to 2025. Early techniques (e.g., basic CNNs for segmentation) clustered in moderate quadrants, evolving toward higher complexity with Transformers/GANs and greater scalability via industrial hybrids (e.g., automated steel inclusion analysis). This framework not only organizes the diverse applications reviewed but also highlights persistent trade-offs, motivating future integration of multi-modal, interpretable AI to occupy the underserved high-scalability, high-complexity regions for metallic materials.

6.1. Hybrid Networks

This subsection describes the hybrid network configurations considered (Figure 10), namely CNN-LSTM and CNN-Transformer, explains why they can outperform standalone models in SEM (Scanning Electron Microscopy) and EDS (Energy-Dispersive X-ray Spectroscopy) analysis, and outlines their potential limitations.

CNN-LSTM Hybrid Networks

Description: This hybrid combines a Convolutional Neural Network (CNN), which excels at extracting spatial features from SEM images, with a Long Short-Term Memory (LSTM) network, designed to model temporal dependencies in sequential data.

Usage: Applied in tasks involving time-series SEM data, such as tracking microstructural changes during processes like heat treatment or mechanical deformation (see Section 5 on predictive modelling).

Advantages: Outperforms standalone CNNs (which lack temporal context) and standalone LSTMs (which struggle with high-dimensional spatial data) by integrating spatial and temporal features, potentially improving prediction accuracy by 10–15%.
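The CNN-LSTM idea can be sketched as follows: a CNN (not shown) turns each SEM frame of a time series into a feature vector, and an LSTM cell updates a hidden state frame by frame so that the final state summarises how the microstructure evolved. The cell below is a standard LSTM written in plain Python with random, untrained weights; the per-frame feature vectors are hypothetical stand-ins for CNN outputs.

```python
import math
import random

GATES = ("input", "forget", "output", "candidate")


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LSTMCell:
    """Standard LSTM cell; every gate sees the concatenation [x_t, h_(t-1)]."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # Small random (untrained) weights and zero biases for each gate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_size)]
                  for g in GATES}
        self.b = {g: [0.0] * hidden_size for g in GATES}

    def step(self, x, h_prev, c_prev):
        xh = list(x) + list(h_prev)
        pre = {g: [s + bias for s, bias in zip(matvec(self.W[g], xh), self.b[g])] for g in GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["candidate"]]
        # New cell state: keep part of the old memory, write part of the candidate.
        c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, cand)]
        # New hidden state: gated view of the cell state.
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c


def encode_sequence(cell, frame_features):
    """Run per-frame CNN features through the LSTM and return the final hidden state."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in frame_features:
        h, c = cell.step(x, h, c)
    return h


# Hypothetical 3-dim CNN features for four frames of an in-situ heating series.
frames = [[0.2, 0.5, 0.1], [0.3, 0.4, 0.2], [0.5, 0.3, 0.4], [0.7, 0.2, 0.6]]
summary = encode_sequence(LSTMCell(input_size=3, hidden_size=4), frames)
print(summary)  # would feed a regression head predicting, e.g., hardness
```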

CNN-Transformer Hybrid Networks

Description: Pairs a CNN for local feature extraction with a Transformer, which captures global dependencies and long-range interactions across an image.

Usage: Utilised in tasks requiring both fine detail and broad context, such as segmenting complex microstructures in Section 2.3 on instance segmentation.

Advantages: Enhances segmentation accuracy over standalone CNNs (which miss global relationships) and standalone Transformers (which are less efficient at capturing fine detail) by balancing local precision with global awareness.
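The global-context step that the Transformer adds can be sketched as single-head scaled dot-product self-attention over patch tokens: each output token is a softmax-weighted mix of all patches, so a feature in one region can influence the representation of a distant one. In a CNN-Transformer hybrid the tokens would be CNN features of image patches. Here the tokens and projection matrices are random, hypothetical values, and multi-head attention, positional encodings and training are omitted.

```python
import math
import random


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """(n x d) @ (d x e) for nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention; returns (outputs, attention weights)."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(k[0]))
    scores = [[sum(a * b for a, b in zip(qr, kr)) / scale for kr in k] for qr in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projections or CNN patch features."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
tokens = random_matrix(4, 8, rng)  # 4 patch tokens, e.g. CNN features of a 2x2 patch grid
out, attn = self_attention(tokens, random_matrix(8, 4, rng),
                           random_matrix(8, 4, rng), random_matrix(8, 4, rng))
print([round(sum(row), 6) for row in attn])  # [1.0, 1.0, 1.0, 1.0]
```

Because every token attends to every other, the cost grows quadratically with the number of patches, which is one source of the extra computational load noted below.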

Performance Benefits and Limitations: These hybrids excel in complex tasks requiring spatial-temporal or local-global feature integration, but they increase computational complexity, making them suitable only when justified by task demands.

Conclusion: In the studies reviewed, CNN-LSTM hybrids are used for temporal analysis and CNN-Transformer hybrids for multi-scale segmentation, improving performance over standard models when the architecture is matched to the data and objectives of each task.

6.2. Critical Analysis of Errors and Weaknesses in Modern AI Models

While AI has revolutionised SEM and EDS analysis by automating complex tasks and improving efficiency, several challenges persist that affect the reliability and applicability of these models. Understanding these limitations is essential for researchers and practitioners to use AI tools effectively and interpret their results critically.

Common Errors and Weaknesses

This subsection explores specific areas where modern AI models fall short, supported by examples and data from the existing literature.

Misclassification in Complex Microstructures

Modern AI models, such as convolutional neural networks (CNNs), often misclassify highly complex or overlapping microstructural features. For example, in instance segmentation tasks, models like Mask R-CNN may fail to accurately distinguish densely packed grains or particles with similar visual traits.

Example: In a case study on aluminium alloy segmentation [14], the model achieved a median Intersection over Union (IoU) of 0.59, indicating moderate precision in identifying phases like Mg₂Si and Fe-containing compounds, with clear room for improvement.
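The metrics reported for such segmentation studies (Table 4) can all be derived from pixel-level counts of true and false positives and negatives. The minimal sketch below shows these definitions on hypothetical masks. It also suggests why accuracy and specificity can sit near 1 while IoU stays much lower: when the target phase occupies a small fraction of the image, the many correctly labelled background pixels dominate accuracy but do not enter IoU.

```python
def mask_metrics(pred, truth):
    """Pixel-wise metrics for equally sized binary masks (1 = target phase)."""
    tp = fp = fn = tn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "sensitivity": sensitivity,
        "iou": ratio(tp, tp + fp + fn),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical 10x10 masks: a 3x3 particle, predicted as 2x3 plus one stray pixel.
truth = [[1 if r < 3 and c < 3 else 0 for c in range(10)] for r in range(10)]
pred = [[1 if r < 2 and c < 3 else 0 for c in range(10)] for r in range(10)]
pred[5][5] = 1
print(mask_metrics(pred, truth))  # accuracy 0.96 but IoU only 0.6
```

For a single binary mask, F1 equals the Dice coefficient and is tied to IoU by F1 = 2·IoU/(1 + IoU); medians taken over many images need not satisfy this exactly.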

Limitations in Handling Multi-Scale Features

SEM images often feature details at multiple scales, from nanoparticles to larger grains. Models like YOLO, despite using anchor boxes to detect objects of varying sizes, can miss small or densely clustered features.

Example: In a steel strip defect detection study [10], an improved YOLO model achieved a mean Average Precision (mAP) of 97.55%, but its recall rate for small defects was lower, suggesting incomplete detection of fine-scale features.
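Reporting recall separately by object size makes a small-defect weakness visible, whereas a single mAP or overall recall can hide it. The minimal sketch below uses greedy one-to-one matching at an IoU threshold of 0.5 and, as an assumption, borrows the COCO convention of calling objects smaller than 32 × 32 pixels "small"; a full evaluation would also sort predictions by confidence. The boxes are hypothetical.

```python
def box_area(box):
    """Area of an axis-aligned box (x_min, y_min, x_max, y_max)."""
    return (box[2] - box[0]) * (box[3] - box[1])


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def recall_by_size(gt_boxes, pred_boxes, small_area=32 * 32, iou_thr=0.5):
    """Recall for small and large ground-truth objects (greedy one-to-one matching)."""
    unmatched = list(pred_boxes)
    found = {"small": 0, "large": 0}
    total = {"small": 0, "large": 0}
    for gt in gt_boxes:
        size = "small" if box_area(gt) < small_area else "large"
        total[size] += 1
        best = max(unmatched, key=lambda p: box_iou(gt, p), default=None)
        if best is not None and box_iou(gt, best) >= iou_thr:
            found[size] += 1
            unmatched.remove(best)  # each prediction may match only one defect
    return {k: (found[k] / total[k] if total[k] else None) for k in total}


# Hypothetical: two large scratches detected, one of two small pits missed.
gt = [(0, 0, 200, 40), (300, 100, 500, 140), (50, 300, 60, 310), (400, 400, 420, 418)]
pred = [(2, 0, 198, 42), (305, 98, 500, 140), (398, 401, 420, 419)]
print(recall_by_size(gt, pred))  # {'small': 0.5, 'large': 1.0}
```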

Challenges with Data Scarcity and Quality

AI models depend on large, high-quality datasets for training, yet such datasets are often limited in materials science, particularly for rare alloys. This can lead to overfitting or poor generalisation to new data.

Example: In a chemical composition analysis study, a Random Forest model’s F1-score dropped from 0.92 to 0.85 when applied to unseen samples, underscoring its sensitivity to limited training data.
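When multi-class SEM-EDS results are summarised by a single F1 value, the averaging mode matters (Table 8 lists both micro and macro F1 for U-Net). Micro-averaging pools all decisions and is dominated by frequent classes, whereas macro-averaging gives each class equal weight, so a rare class that fails to generalise pulls it down. The minimal sketch below computes both; the labels are hypothetical.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical: eight matrix pixels, two pixels of a rare phase, one of which is missed.
y_true = ["matrix"] * 8 + ["rare_phase"] * 2
y_pred = ["matrix"] * 8 + ["matrix", "rare_phase"]
print(f1_scores(y_true, y_pred))  # micro 0.9, macro about 0.80
```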

Interpretability and Explainability Issues

Many deep learning models are “black boxes,” making it difficult to understand their decision-making processes. This lack of transparency can reduce trust in critical applications like quality control.

Example: In predictive modelling of high-entropy alloys [5], an Artificial Neural Network (ANN) achieved 93.54% accuracy in hardness prediction, but its opaque decision-making limited its value for researchers studying structure-property relationships.

These weaknesses have tangible consequences:

In manufacturing quality control, misclassification of defects or inclusions could lead to accepting substandard materials.

In material design, reliance on non-interpretable models might misguide researchers about the microstructural factors influencing material properties.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analysed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

[1] Holm, E.A.; Cohn, R.; Gao, N.; Kitahara, A.R.; Matson, T.P.; Lei, B.; Yarasi, S.R. Overview: Computer Vision and Machine Learning for Microstructural Characterization and Analysis. Metall. Mater. Trans. A 2020, 51, 5985–5999.
[2] Roberts, G.; Haile, S.Y.; Sainju, R.; Edwards, D.J.; Hutchinson, B.; Zhu, Y. Deep Learning for Semantic Segmentation of Defects in Advanced STEM Images of Steels. Sci. Rep. 2019, 9, 12744.
[3] Cohn, R.; Anderson, I.; Prost, T.; Tiarks, J.; White, E.; Holm, E. Instance Segmentation for Direct Measurements of Satellites in Metal Powders and Automated Microstructural Characterization from Image Data. JOM 2021, 73, 2159–2172.
[4] Chen, D.; Guo, D.; Liu, S.; Liu, F. Microstructure Instance Segmentation from Aluminum Alloy Metallographic Image Using Different Loss Functions. Symmetry 2020, 12, 639.
[5] Dewangan, S.K.; Samal, S.; Kumar, V. Microstructure exploration and an artificial neural network approach for hardness prediction in AlCrFeMnNiWx High-Entropy Alloys. J. Alloys Compd. 2020, 823, 153766.
[6] Herriott, C.; Spear, A.D. Predicting microstructure-dependent mechanical properties in additively manufactured metals with machine- and deep-learning methods. Comput. Mater. Sci. 2020, 175, 109599.
[7] Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; García-Rodríguez, J. A Review on Deep Learning Techniques Applied to Semantic Segmentation. arXiv 2017, arXiv:1704.06857.
[8] Gotkowski, K.; Gupta, S.; Godinho, J.R.A.; Tochtrop, C.G.S.; Maier-Hein, K.H.; Isensee, F. ParticleSeg3D: A Scalable Out-of-the-Box Deep Learning Segmentation Solution for Individual Particle Characterization from Micro CT Images in Mineral Processing and Recycling. arXiv 2023, arXiv:2301.13319. Available online: https://arxiv.org/abs/2301.13319 (accessed on 12 January 2026).
[9] Tao, X.; Zhang, D.; Ma, W.; Liu, X.; Xu, D. Automatic Metallic Surface Defect Detection and Recognition with Convolutional Neural Networks. Appl. Sci. 2018, 8, 1575.
[10] Li, J.; Su, Z.; Geng, J.; Yin, Y. Real-time Detection of Steel Strip Surface Defects Based on Improved YOLO Detection Network. IFAC-PapersOnLine 2018, 51, 76–81.
[11] Alqahtani, F.; Al-Mughanam, T.; Alharbi, G.; Yousef, A.M.F.; Al-Duhaim, M.A. Deep Learning-Based Approach for Classifying the Severity of Metal Corrosion Using SEM Images. In Lecture Notes in Mechanical Engineering; Springer: Singapore, 2023.
[12] Okubo, V.Y.; Shimizu, K.; Shivaram, B.S.; Kim, H.Y. Defect Detection, AI Style: How TM-CNN Is Changing Materials Science. Hackernoon, 2025. Available online: https://hackernoon.com/defect-detection-ai-style-how-tm-cnn-is-changing-materials-science (accessed on 12 January 2026).
[13] Jacobs, R. Deep Learning Object Detection in Materials Science: Current State and Future Directions. Comput. Mater. Sci. 2022, 211, 111527.
[14] Nithin, A.M.; Perumal, M.; Davidson, M.J.; Santhosh, A.J.; Srinivas, M. Segmentation Studies on Al-Si-Mg Metallographic Images Using Various Different Deep Learning Algorithms and Loss Functions. Eng. Rep. 2025, 7, e70119.
[15] Yildirim, B.; Cole, J.M. Bayesian Particle Instance Segmentation for Electron Microscopy Image Quantification. J. Chem. Inf. Model. 2021, 61, 1136–1149.
[16] Rettenberger, L.; Szymanski, N.J.; Zeng, Y.; Schuetzke, J.; Wang, S.; Ceder, G.; Reischl, M. Uncertainty-aware Particle Segmentation for Electron Microscopy at Varied Length Scales. npj Comput. Mater. 2024, 10, 1302.
[17] Thermo Fisher Scientific. XRF Technology for Elemental Analysis. 2025. Available online: https://www.thermofisher.com/us/en/home/industrial/spectroscopy-elemental-isotope-analysis/oes-xrd-xrf-analysis/x-ray-fluorescence.html (accessed on 12 January 2026).
[18] Thermo Fisher Scientific. EDS Analysis for Materials Science. 2025. Available online: https://www.thermofisher.com/us/en/home/materials-science/elemental-analysis.html (accessed on 12 January 2026).
[19] Li, C.; Wang, D.; Kong, L. Application of Machine Learning Techniques in Mineral Classification for Scanning Electron Microscopy–Energy Dispersive X-Ray Spectroscopy (SEM-EDS) Images. J. Pet. Sci. Eng. 2020, 200, 108178.
[20] Van den Eynde, S.; Díaz-Romero, D.J.; Zaplana, I.; Peeters, J. Deep Learning Regression for Quantitative LIBS Analysis. Spectrochim. Acta Part B Spectrosc. 2023, 202, 106634.
[21] AZoOptics. The Convergence of XRF Technology and Machine Learning for Material Analysis. 2024. Available online: https://www.azooptics.com/Article.aspx?ArticleID=2578 (accessed on 12 January 2026).
[22] Kim, H.-K.; Ha, H.-Y.; Bae, J.-H.; Cho, M.K.; Kim, J.; Han, J.; Suh, J.-Y.; Kim, G.-H.; Lee, T.-H.; Jang, J.H.; et al. Nanoscale light element identification using machine learning aided STEM-EDS. Sci. Rep. 2020, 10, 13699.
[23] Emelianov, V.; Zhilenkov, A.; Chernyi, S.; Zinchenko, A.; Zinchenko, E. Application of artificial intelligence technologies in metallographic analysis for quality assessment in the shipbuilding industry. Heliyon 2022, 8, e10002.
[24] Wang, Y.; Li, X.; Zhang, H.; Chen, L. Deep learning-driven medical image analysis for computational material science applications. Front. Mater. 2025, 12, 1583615.
[25] Noguchi, S.; Wang, H.; Inoue, J. Identification of microstructures critically affecting material properties using machine learning framework based on metallurgists’ thinking process. Sci. Rep. 2022, 12, 17614.
[26] Babu, S.R.; Michelic, S.K. Overview of application of automated SEM/EDS measurements for inclusion characterization in steelmaking. MetalMat 2024, 1, e18.
[27] Abdulsalam, M.; Gao, N.; Webler, B.A.; Holm, E.A. Prediction of Inclusion Types from BSE Images: RF vs. CNN. Front. Mater. 2021, 8, 754089.
[28] Patil, P.; Raffey, M.A.; Sarwade, W.K.; Kawale, S. A comparison of classification methods: Naïve Bayes and support vector machine. In Proceedings of the 6th International Conference on Globalisation: Implications for the 21st Century (ICCT19), Aurangabad, India, 28 December 2023.
[29] Sun, Y.; Hanhan, I.; Sangid, M.D.; Lin, G. Predicting Mechanical Properties from Microstructure Images in Fiber-Reinforced Polymers using Convolutional Neural Networks. arXiv 2020, arXiv:2010.03675.
[30] Ge, M.; Su, F.; Zhao, Z.; Su, D. Deep learning analysis on microscopic imaging in materials science. Mater. Today Nano 2023, 11, 100087.
[31] Murgas, B.; Stickel, J.; Ghosh, S. Generative adversarial network (GAN) enabled statistically equivalent virtual microstructures (SEVM) for modeling cold spray formed bimodal aluminum alloys. npj Comput. Mater. 2024, 10, 12345.
[32] Capurro, C.; Boeri, R.; Cicutti, C. Characterization of Nonmetallic Inclusions of Al-Killed Ca-Treated Steels by Automated SEM/EDS and Its Application to Industrial Case Studies. Steel Res. Int. 2022, 93, 2200152.
[33] Oxford Instruments. AZtecSteel: Steel Inclusion Analysis. Nanoanalysis, Oxford Instruments. (n.d.) Available online: https://nano.oxinst.com/AZtecSteel (accessed on 12 January 2026).

Figure 1. Microstructure Analysis Spectrum. This novel framework positions AI techniques along two axes: task complexity (from simple binary classification to complex 3D reconstruction) and scalability (from research-focused small datasets to industrial high-throughput applications). Examples include U-Net (high complexity, moderate scalability for precise segmentation) and YOLO (You Only Look Once; moderate complexity, high scalability for real-time detection).

Figure 2. Key Applications of Artificial Intelligence in Metallic Microstructural Characterisation. The diagram illustrates core AI tasks—semantic segmentation (pixel-wise phase identification), object detection (bounding boxes for defects/inclusions), instance segmentation (individual feature masking), and chemical composition inference from BSE images—demonstrating automation and enhanced reproducibility over manual SEM/EDS analysis in alloys and steels.

Figure 3. Semantic Segmentation Workflow for Metallic Microstructures Using AI. The process assigns class labels to every pixel, evaluated via key metrics: Mean Intersection over Union (mIoU) for overlap accuracy, F1-score for precision-recall balance, and pixel accuracy for overall correctness. Data augmentation and transfer learning address common challenges like limited annotated SEM datasets.

Figure 4. AI-Driven Object Detection in Metallic Microstructures: Process Workflow and Evaluation Metrics.

Figure 5. Instance Segmentation of Metallic Microstructures Using AI: Workflow and Performance Metrics.

Figure 6. AI-Enhanced Chemical Composition Analysis in Metals: Integration with Traditional Spectroscopic Techniques.

Figure 7. Transformers and GANs for SEM Image Analysis: Advanced AI Techniques in Metallic Microstructure Characterisation.

Figure 8. Classification Models for Automated Inclusion Detection in SEM/EDS: Traditional, Deep Learning, and Data Fusion Approaches.

Figure 9. Workflow for AI-Based Predictive Modelling of Mechanical Properties from SEM/EDS Microstructural Data. Inputs include segmented features (grains, phases, defects) and compositional maps; models (e.g., ANNs, CNNs) forecast properties like hardness or fracture strain, bridging microstructure to performance in metallic materials such as high-entropy alloys and additively manufactured steels.

Figure 10. Hybrid Deep Learning Architectures for Advanced SEM/EDS Analysis in Metallic Microstructures. CNN-LSTM integrates spatial feature extraction with temporal modelling (e.g., for deformation tracking), while CNN-Transformer combines local detail with global context (e.g., for multi-scale segmentation of precipitates or inclusions), offering improved accuracy at higher computational cost.

Table 1. Dataset Characteristics in Key Metallic Microstructure AI Studies.

Study/Reference	Material/System	Task	Approximate Dataset Size	Data Type/Notes	Augmentation/Transfer Learning Used?
Roberts et al. (2019) [2]	Advanced steels (STEM)	Semantic segmentation (defects)	~100–300 annotated regions/images	Expert-labeled STEM; small due to atomic-scale complexity	Yes (rotation, flips, noise; critical for 7% F1 gain)
Cohn et al. (2021) [3]	Inconel-718 powders	Instance segmentation (satellites/particles)	~200–500 annotated SEM images	Powder micrographs; overlaps common	Transfer from COCO pre-trained; minimal additional
Chen et al. (2020) [4]	Aluminum alloys	Instance segmentation (phases)	~200–400 etched metallographs	Multi-phase; per-class balanced	Yes (rotations, flips, scaling)
Dewangan et al. (2020) [5]	AlCrFeMnNiWx HEAs	Predictive modeling (hardness)	~20–50 experimental compositions	Structured features from SEM/EDS	N/A (tabular data)
Herriott et al. (2020) [6]	Additively manufactured 316L	Property prediction	Thousands of simulated 3D volumes	Physics-based synthetic microstructures	Minimal (synthetic diversity built-in)
General Range (Literature)	Various metallic alloys	Segmentation/Prediction	Research: 100–1000; Synthetic/Industrial: 5000+	Annotation bottleneck limits real datasets	Common (transfer + geometric/photometric)

* Metrics/dataset sizes as reported in original studies; most are mean or single-run results without reported standard deviations or confidence intervals due to limited replications in materials-specific works.

Table 3. Performance Metrics for Object Detection in Metallic Materials.

Application	Model	Performance Metrics	Dataset
Defect Detection in Magnetic Patterns	TM-CNN	F1 Score: 0.988	444 images, 641,649 structures
Metallic Surface Defect Detection	CASAE	IoU: 89.60%	50 images, augmented to 3000
Steel Strip Surface Defect Detection	Improved YOLO	mAP: 97.55%, Recall: 95.86%, 83 FPS	Six defect types, augmented dataset
Corrosion Severity Classification	ResNet50	Accuracy: 94% (Mg), 88% (Steel)	SEM images of Mg and Steel
Irradiated Metal Alloys	Various Models	Not specified (quantitative defect analysis)	Terabytes of EM data per session

Table 4. Performance Metrics for Aluminum Alloy Segmentation.

Metric	L_BCE Median	L_DICE Median	L_IOU Median	L_Tversky Median	L_SS Median
Accuracy (Acc)	0.999551	0.999506	0.999519	0.999436	0.999471
Specificity (Sp)	0.999732	0.999723	0.999714	0.999601	0.999633
Precision	0.663011	0.659344	0.676730	0.633345	0.634894
Sensitivity (Sn)	0.669518	0.677832	0.658808	0.690875	0.670456
IoU	0.590477	0.588999	0.584960	0.579945	0.572862
F1	0.666217	0.667450	0.663737	0.657220	0.648007

Table 5. Mask R-CNN Performance for Metal Powder Segmentation.

Particle Type	AP50
Elongated	69.5
Satellites	74.5
Circular	57.2
Nodular	60.7
Total mAP50	67.2

Table 6. Signal-to-Noise Ratios Before and After Noise Reduction.

Element	SNR (Before NR)	SNR (After NR)	Improvement (%)
Cr	1.29	1.34	3.88
Fe	1.95	1.99	2.05
Mn	1.95	2.38	22.05
Mo	0.43	2.45	469.77
N	1.03	1.48	43.69
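The Improvement column is the relative change (after - before)/before × 100; the short check below recomputes it from the tabulated SNR values. It also makes clear why Mo shows by far the largest gain: its SNR starts from the lowest baseline.

```python
table6 = {  # element: (SNR before noise reduction, SNR after), from Table 6
    "Cr": (1.29, 1.34),
    "Fe": (1.95, 1.99),
    "Mn": (1.95, 2.38),
    "Mo": (0.43, 2.45),
    "N": (1.03, 1.48),
}


def improvement_percent(before, after):
    """Relative SNR gain in percent."""
    return (after - before) / before * 100.0


improvements = {el: round(improvement_percent(b, a), 2) for el, (b, a) in table6.items()}
print(improvements)  # {'Cr': 3.88, 'Fe': 2.05, 'Mn': 22.05, 'Mo': 469.77, 'N': 43.69}
```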

Table 7. Neural Network Performance in Metallographic Analysis [23].

Standard and Parameter	Neural Network Structure	Classifying Error	Optimal Learning Epochs	Total Images Analysed	Correctly Classified
GOST 5639-82 Grain Amount	550-150-10	0.0149	820	280	274
GOST 8233-56 Ferrite/Pearlite	400-110-10	0.0285	930	140	139
GOST 8233-56 Carbide Network	210-70-6	0.0319	900	210	202
GOST 1778-70 Line Nitrides	210-70-5	0.0119	780	153	144
GOST 1778-70 Sulphides	210-70-5	0.0098	890	186	173
ASTM E1382 Ferrite Grain	480-140-19	0.0463	1320	289	277

Table 8. Machine Learning Model Performance for SEM-EDS Classification [19].

Model	F1 Score	Notes
Random Forest (RF)	0.92	Best shallow model, sensitive to Si noise
Logistic Regression (LR)	0.87	Less sensitive to dataset size
Linear SVM	0.80	Less sensitive to dataset size
k-Nearest Neighbour (k-NN)	0.92	Optimal with k = 5, sensitive to data size
Artificial Neural Network (ANN)	0.89	Sensitive to dataset size
U-Net (Deep Learning)	0.88 (micro), 0.73 (macro)	Outperforms RF on unseen samples

Table 9. RMSE Results for Best Deep Learning Model in LIBS Analysis.

Element	RMSE (wt%)
Al	0.02
Si	0.02
Fe	0.01
Cu	0.01
Mn	0.01
Mg	0.01
Zn	0.01

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Abdelal, G.; Chan, C.-W.; McLoone, S. Critical Review of Recent Advances in AI-Enhanced SEM and EDS Techniques for Metallic Microstructure Characterization. Appl. Sci. 2026, 16, 975. https://doi.org/10.3390/app16020975

AMA Style

Abdelal G, Chan C-W, McLoone S. Critical Review of Recent Advances in AI-Enhanced SEM and EDS Techniques for Metallic Microstructure Characterization. Applied Sciences. 2026; 16(2):975. https://doi.org/10.3390/app16020975

Chicago/Turabian Style

Abdelal, Gasser, Chi-Wai Chan, and Sean McLoone. 2026. "Critical Review of Recent Advances in AI-Enhanced SEM and EDS Techniques for Metallic Microstructure Characterization" Applied Sciences 16, no. 2: 975. https://doi.org/10.3390/app16020975

APA Style

Abdelal, G., Chan, C.-W., & McLoone, S. (2026). Critical Review of Recent Advances in AI-Enhanced SEM and EDS Techniques for Metallic Microstructure Characterization. Applied Sciences, 16(2), 975. https://doi.org/10.3390/app16020975

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

