Semantic Segmentation: AI models, such as convolutional neural networks (CNNs), segment SEM micrographs to identify phases or features. For instance, segmentation of ultrahigh carbon steel and Al-Zn alloy micrographs achieved accuracies of 93% and 99.6%, respectively, using PixelNet CNNs pre-trained on ImageNet.
Object Detection: CNN-based detection on SEM images of metal powders identified particles with 80% recall and 94% precision, aiding in powder characterisation for additive manufacturing.
Instance Segmentation: For Inconel-718 powder, AI methods like connected-component labelling and k-means clustering created quantitative microstructural fingerprints, achieving 97.5% precision and 95.4% recall.
Chemical Composition Analysis: Remarkably, AI can infer chemical compositions from SEM images alone, without EDS, by leveraging latent information in grayscale backscattered electron (BSE) images. A study achieved 76% accuracy in determining steel inclusion compositions, surpassing random chance (20%).
2.1. Semantic Segmentation
Semantic segmentation, as shown in Figure 3, is a cornerstone of computer vision that classifies each pixel in an image into predefined categories, providing a detailed, pixel-wise understanding of the image content. Positioned in the high-complexity, moderate-scalability quadrant of the Microstructure Analysis Spectrum, it excels in research but is computationally intensive for industrial use.
This technique is distinct from object detection, which identifies objects with bounding boxes: semantic segmentation delineates exact shapes and boundaries, making it essential for applications requiring high precision. The integration of artificial intelligence (AI), particularly deep learning, has revolutionised this field, enabling automated and accurate segmentation across diverse domains, including materials science, where it plays a critical role in analysing scanning electron microscopy (SEM) images for microstructural characterisation.
This section provides a comprehensive analysis of AI applications in semantic segmentation, focusing on advancements since 2010 and emphasising their relevance in materials science for SEM image analysis. It covers the technical foundations, key AI techniques, applications across domains, specific use cases in materials science, recent advancements, challenges, and future directions. It also includes quantitative insights and performance metrics to illustrate the effectiveness of these methods.
Semantic segmentation assigns a class label to every pixel in an image, producing a dense pixel-wise segmentation map. This is in contrast to instance segmentation, which distinguishes between different instances of the same class, and panoptic segmentation, which combines semantic and instance segmentation. The process is crucial for applications where understanding spatial relationships and fine details is necessary, such as in autonomous driving for identifying drivable paths or medical imaging for detecting cancerous cells.
Performance is evaluated using standard metrics introduced in Section 1.2, including Mean Intersection over Union (mIoU), F1-score, and pixel accuracy. These metrics are widely used in benchmarks like Cityscapes, PASCAL VOC, and ADE20K, highlighting the task’s importance in computer vision research. The complexity of semantic segmentation lies in its need for high-resolution outputs and the ability to handle diverse object classes and scales within a single image.
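As a minimal sketch of how these metrics are computed from a predicted and a ground-truth label map, the following pure-Python example derives pixel accuracy, per-class IoU, and mIoU. The function name `segmentation_metrics`, the toy label maps, and the class meanings are illustrative assumptions and are not taken from any cited study.

```python
from collections import Counter


def segmentation_metrics(pred, truth, num_classes):
    """Return (pixel_accuracy, per_class_iou, mean_iou) for two label maps.

    pred and truth are equally sized 2D lists of integer class labels.
    Classes absent from both maps are skipped when averaging IoU.
    """
    intersection = Counter()
    pred_count = Counter()
    truth_count = Counter()
    correct = total = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            pred_count[p] += 1
            truth_count[t] += 1
            total += 1
            if p == t:
                intersection[p] += 1
                correct += 1
    ious = {}
    for c in range(num_classes):
        union = pred_count[c] + truth_count[c] - intersection[c]
        if union > 0:
            ious[c] = intersection[c] / union
    mean_iou = sum(ious.values()) / len(ious)
    return correct / total, ious, mean_iou


# Toy 3x4 label maps: 0 = matrix, 1 = precipitate (hypothetical classes).
truth = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 0]]
pred = [[0, 0, 1, 1],
        [0, 1, 1, 1],   # one matrix pixel mislabelled as precipitate
        [0, 0, 0, 0]]

accuracy, ious, mean_iou = segmentation_metrics(pred, truth, num_classes=2)
print(f"pixel accuracy = {accuracy:.3f}, IoU = {ious}, mIoU = {mean_iou:.4f}")
```

In this toy case a single mislabelled pixel leaves pixel accuracy at 11/12 but lowers the IoU of the smaller class to 0.8, which is one reason mIoU is usually reported alongside accuracy when minority phases matter.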
2.1.1. AI Techniques for Semantic Segmentation
The rise of deep learning, especially convolutional neural networks (CNNs), has significantly advanced semantic segmentation (Figure 3). CNNs are particularly effective because they extract spatial hierarchies of features from images through convolutional layers, pooling, and activation functions. Key architectures include:
U-Net: Originally developed for biomedical image segmentation, U-Net features an encoder–decoder structure with skip connections, enabling precise segmentation even with small datasets. Its ability to work with limited data makes it suitable for specialised fields like materials science [7].
Pros: U-Net is highly effective for semantic segmentation with limited training data due to its encoder–decoder structure and skip connections, which preserve fine details. Its adaptability to small datasets makes it ideal for materials science applications where annotated data is scarce.
Cons: U-Net requires careful hyperparameter tuning and can be computationally intensive for very high-resolution SEM images. Its performance may degrade if the training data does not adequately represent the variability in SEM image quality, such as noise or contrast differences.
Fully Convolutional Networks (FCN): FCNs replace fully connected layers with convolutional layers, allowing the network to output spatial maps instead of classification scores, which is ideal for segmentation tasks [7].
Pros of Fully Convolutional Networks (FCNs): Fully Convolutional Networks (FCNs) excel in pixel-wise segmentation, making them highly suitable for analysing Scanning Electron Microscopy (SEM) images of metallic microstructures. Unlike traditional networks that provide a single classification output, FCNs produce detailed segmentation maps by assigning a class to every pixel while retaining spatial information. This precision is crucial in materials science, where identifying the exact locations and shapes of features like grain boundaries, phases, or defects directly influences the understanding of material properties.
The ability to train FCNs end-to-end is another key advantage. This means the entire network, from input to output, is optimised in a single process, avoiding the complexity and potential errors of multi-step workflows. For researchers working with SEM images, this streamlined approach saves time and simplifies integration into existing analysis pipelines, thereby accelerating material characterisation.
FCNs also leverage transfer learning effectively, allowing models pre-trained on large datasets like ImageNet to be adapted for specific tasks. In materials science, where annotated SEM datasets are often scarce and expensive to create, this capability is invaluable. Fine-tuning a pre-trained model with a smaller dataset can still yield robust performance, lowering the barrier to adopting AI-driven solutions.
Additionally, the versatility of FCNs enhances their appeal. They have proven successful in diverse fields, from autonomous driving to medical imaging, demonstrating their adaptability to various challenges. For SEM image analysis, this flexibility ensures FCNs can handle a range of microstructural features and materials, making them a reliable choice for researchers.
Cons of Fully Convolutional Networks (FCNs): Despite their strengths, FCNs face challenges when applied to high-resolution SEM images. A significant limitation is the loss of spatial detail caused by pooling layers, which reduces the resolution of feature maps to capture a broader context. While this aids in understanding structure, it can obscure fine features like small nanoparticles or subtle grain boundaries, which are critical in SEM analysis. This trade-off can compromise segmentation accuracy for intricate details.
Another issue is the difficulty FCNs encounter with multi-scale features, common in SEM images, where objects like inclusions or defects vary widely in size. Although FCNs can incorporate some multi-scale information, they often struggle to segment very small or densely packed features accurately. This can lead to incomplete or erroneous segmentations, particularly in complex microstructures with diverse scales.
The reliance on large, annotated datasets poses a further challenge. While transfer learning helps, FCNs still perform best with substantial high-quality labelled data, which is hard to obtain in materials science due to the expertise and effort required for annotation. Limited data can restrict the model’s ability to generalise across different materials or imaging conditions, reducing its effectiveness.
Lastly, FCNs demand significant computational resources, especially for processing high-resolution SEM images. Both training and inference require substantial computing power, which may not be readily available to all researchers or industries. This computational burden can limit the practical deployment of FCNs, despite their analytical potential.
DeepLab: This family of models uses atrous (dilated) convolutions to capture multi-scale context, improving segmentation accuracy, especially at object boundaries, by integrating global and local information [7].
Pros of DeepLab: DeepLab is a powerful deep learning model designed for semantic image segmentation, offering several notable advantages. Chief among them is its ability to capture multi-scale contextual information effectively, thanks to the use of atrous (dilated) convolutions. This technique allows the model to adjust its field of view, making it adept at recognising objects of varying sizes within an image, which is ideal for applications like analysing Scanning Electron Microscopy (SEM) images of metallic microstructures, where features range from tiny nanoparticles to larger grains. Additionally, DeepLab employs atrous spatial pyramid pooling (ASPP), which enhances its capability to process multi-scale data by applying atrous convolutions at different rates. This contributes to its high accuracy, as demonstrated across various benchmarks, and its versatility, enabling applications in fields such as autonomous driving, medical imaging, and materials science. For researchers or engineers, this precision and adaptability make DeepLab a robust tool for automating complex segmentation tasks.
Cons of DeepLab: Despite its advantages, DeepLab comes with certain drawbacks that may limit its practicality in some scenarios. The first is its computational intensity, driven by the use of atrous convolutions and multi-scale processing. This makes it less suitable for real-time applications or settings with constrained computational resources, posing challenges for users without access to powerful hardware. Another significant downside is its reliance on a large amount of labelled data for training. In domains like materials science, where annotated datasets (SEM images) are often scarce and expensive to acquire, this requirement can be a major hurdle. Furthermore, fine-tuning DeepLab for specific tasks, such as tailoring it to segment unique metallic microstructures, requires considerable expertise and computational power. This complexity can deter adoption by those lacking deep learning experience or the infrastructure to support such a resource-heavy model, reducing its accessibility for smaller-scale or resource-limited projects.
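To make the atrous (dilated) convolution idea concrete, the sketch below implements a one-dimensional dilated convolution in pure Python and shows how the dilation rate widens the field of view without adding weights. It is an illustration only, not DeepLab's implementation; the function names, the toy signal, and the rates (1, 2, 4) are assumptions chosen for readability.

```python
def dilated_conv1d(signal, kernel, rate):
    """'Valid' 1D cross-correlation with kernel taps spaced `rate` apart."""
    span = (len(kernel) - 1) * rate + 1  # input samples covered by one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * rate]
                       for i, w in enumerate(kernel)))
    return out


def effective_kernel_size(kernel_size, rate):
    """Field of view of a dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


signal = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
kernel = [1, 1, 1]  # the same three weights are reused at every rate

dense = dilated_conv1d(signal, kernel, rate=1)   # each output sees 3 samples
atrous = dilated_conv1d(signal, kernel, rate=2)  # each output sees 5 samples

# ASPP-style idea: apply the same kernel size at several rates in parallel.
# The rates below are hypothetical, not DeepLab's published settings.
fields_of_view = [effective_kernel_size(len(kernel), r) for r in (1, 2, 4)]
print(dense, atrous, fields_of_view)
```

With a 3-tap kernel, rates 1, 2, and 4 cover 3, 5, and 9 input samples using the same three weights, which is the sense in which atrous convolution enlarges context without extra parameters.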
Training these models typically requires large annotated image datasets, where each pixel is labelled with its corresponding class. However, data scarcity poses significant hurdles, particularly in niche areas like materials science, where high-quality annotations are needed. Techniques like data augmentation, including rotation, scaling, flipping, and brightness adjustments, are often employed to enhance dataset diversity and improve model robustness. Transfer learning, where models pre-trained on large datasets like ImageNet are fine-tuned for specific tasks, is commonly used to address limited data availability.
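The augmentation step described above can be sketched as follows. The key point for segmentation is that geometric transforms must be applied identically to the image and its label mask, while intensity changes apply to the image only. This is a minimal pure-Python illustration on nested lists; the helper names, the 0.8–1.2 brightness range, and the toy 2×2 image are assumptions, and a real pipeline would typically use an array or imaging library instead.

```python
import random


def hflip(grid):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in grid]


def vflip(grid):
    """Mirror a 2D list top to bottom."""
    return [row[:] for row in grid[::-1]]


def rot90(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def identity(grid):
    """Return an unchanged copy."""
    return [row[:] for row in grid]


def adjust_brightness(image, factor, max_value=255):
    """Scale pixel intensities and clip to [0, max_value]."""
    return [[min(max_value, max(0, round(px * factor))) for px in row]
            for row in image]


def augment(image, mask, rng):
    """Apply one random geometric transform to image and mask together,
    then a random brightness change to the image only."""
    transform = rng.choice([hflip, vflip, rot90, identity])
    image, mask = transform(image), transform(mask)
    image = adjust_brightness(image, rng.uniform(0.8, 1.2))
    return image, mask


# Toy greyscale image and its label mask (1 marks the bright feature).
image = [[10, 200],
         [50, 120]]
mask = [[0, 1],
        [0, 0]]

rng = random.Random(0)  # fixed seed so the example is repeatable
aug_image, aug_mask = augment(image, mask, rng)
print(aug_image, aug_mask)
```

Whatever transform is drawn, the labelled pixel in the mask still coincides with the brightest pixel in the image, which is the property that keeps augmented annotations valid.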
2.1.2. Applications Across Domains
Semantic segmentation has broad applications across various fields, each leveraging its ability to provide detailed spatial understanding:
Autonomous Driving: Semantic segmentation delineates roads, pedestrians, vehicles, and obstacles, enabling safe navigation by distinguishing between different environmental elements. For example, segmenting drivable areas and traffic signs enhances path planning and collision avoidance.
Medical Imaging: Critical for analysing and detecting anomalies in cells or tissues, such as identifying cancerous regions in MRI or CT scans, aiding in diagnosis and treatment planning. Semantic segmentation ensures precise delineation of anatomical structures.
Remote Sensing: Applied to satellite imagery for identifying terrain features like mountains, rivers, and urban areas, supporting environmental monitoring, disaster management, and urban planning.
Industrial Inspection: Utilised for detecting defects in materials, such as wafer inspection in semiconductor manufacturing or crack detection in infrastructure, ensuring quality control and safety.
In materials science, semantic segmentation is particularly transformative for analysing SEM images, which provide high-resolution views of material microstructures. It facilitates the identification of grains, phases, defects, and other features, enabling researchers to correlate microstructure with material properties like strength, durability, and corrosion resistance.
2.1.3. Detailed Applications in Materials Science
In materials science, semantic segmentation is employed to automate the analysis of microstructural features in SEM images, addressing the limitations of manual analysis, which is time-consuming and prone to human error. Specific applications include:
Inconel-718: This precipitation-hardened nickel-based superalloy, widely used in aerospace and additive manufacturing (AM) due to its high-temperature strength and corrosion resistance, presents unique challenges for microstructural characterisation via SEM. Key features include γ″ and γ′ precipitates, δ phase needles, carbides, and, particularly in AM powders, satellite particles attached to primary powder particles. These satellites affect powder flowability and laser absorption in AM processes, influencing final part density and defects.
A seminal application of instance segmentation is demonstrated in the analysis of gas-atomised Inconel-718 powders. Gotkowski et al. (2023) [8] employed an out-of-the-box Mask R-CNN model to perform instance segmentation on SEM images of metal powders, enabling direct measurement of satellite particles and automated quantification of particle morphology. The model generated individual masks for each particle instance, distinguishing overlapping particles and small satellites that traditional thresholding or watershed methods struggle with due to low contrast and irregular shapes. On datasets including Inconel-718, the approach achieved high precision in satellite detection and size distribution analysis, facilitating quantitative “microstructural fingerprints” for powder quality control. Performance metrics included robust instance-level accuracy even in densely packed fields, with the model outperforming connected-component labelling in handling overlaps.
This method is positioned in the moderate-complexity, high-scalability quadrant of the Microstructure Analysis Spectrum, as Mask R-CNN balances precision with reasonable inference times for industrial powder characterisation. Trade-offs include sensitivity to imaging conditions (e.g., BSE contrast variations in high-Z nickel alloys can reduce boundary detection by 5–10% without augmentation) and the need for annotated training data, which is mitigated via transfer learning from pre-trained COCO weights. Recent extensions have explored δ phase segmentation in deformed Inconel-718 using deep learning attention mechanisms, achieving improved identification of needle-like phases critical for mechanical property prediction.
The Mask R-CNN was initialised by transfer learning from COCO-pre-trained weights (~100–500 annotated powder images are typical for such studies); augmentation was not heavily emphasized, but the model was robust to overlaps because it predicts a separate mask for each instance.
Training used ~200–500 annotated SEM images of gas-atomized powders, with the large pre-trained model (e.g., COCO) compensating for limited domain-specific data.
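The comparison with connected-component labelling above can be illustrated with a small sketch. Connected-component labelling assigns one label to each group of touching foreground pixels, so a satellite that touches its parent particle is merged into a single object, which is the failure mode that per-instance masks avoid. The code is a generic 4-connected labelling routine in pure Python, not the pipeline used in the cited work; the binary toy images are assumptions.

```python
from collections import deque


def label_components(binary):
    """4-connected component labelling of a binary 2D list via breadth-first search.

    Returns (labels, count); background pixels keep label 0.
    """
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# A 2x2 "particle" with a one-pixel "satellite", first separated by a gap...
separated = [[1, 1, 0, 1],
             [1, 1, 0, 0]]
# ...then touching the particle, as satellites do in real powder images.
touching = [[1, 1, 1, 0],
            [1, 1, 0, 0]]

_, n_separated = label_components(separated)
_, n_touching = label_components(touching)
print(n_separated, n_touching)
```

The separated case yields two components, while the touching case yields one, even though both images contain a particle and a satellite.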
Aluminum alloys: Valued for their low density and high strength-to-weight ratio in automotive and aerospace applications, these alloys exhibit complex microstructures comprising an α-Al matrix, intermetallic phases (e.g., Mg₂Si, Al₂Cu, Fe-rich compounds), eutectics, and precipitates. SEM/EDS analysis is essential for quantifying phase distributions, porosity, and inclusions, which directly influence mechanical properties like ductility and fatigue resistance. Instance segmentation is particularly useful here for delineating individual phases in multi-phase alloys, overcoming challenges like low contrast between similar phases and multi-scale features.
Chen et al. (2020) [4] developed a dedicated instance segmentation framework for aluminum alloy metallographic images (prepared via etching and optical/SEM imaging). The approach adapted Mask R-CNN variants, systematically comparing five different loss functions (e.g., cross-entropy, focal loss, Dice loss combinations) to optimise boundary delineation and instance separation. On etched aluminum alloy samples, the best configuration achieved high mean IoU for phase instances, enabling accurate quantification of phase area fractions, shapes, and distributions. This outperformed semantic segmentation alone by distinguishing individual instances of the same phase class, crucial for heterogeneous microstructures in cast or wrought alloys.
Positioned in the high-complexity quadrant of the Microstructure Analysis Spectrum due to the need for precise instance-level masks, the method excels in research settings with limited datasets via data augmentation but requires computational resources for training. Key trade-offs include performance degradation on low-contrast Fe-rich intermetallics (common inclusion failure points, reducing IoU by ~10–15% without specialised losses) and variability across alloy compositions (e.g., 6xxx vs. 2xxx series). Subsequent works have built on this with semi-supervised and weakly supervised approaches to reduce annotation burden for industrial-scale datasets.
Training used ~200–500 etched micrographs per class. Of the loss functions compared, the cross-entropy + Dice combination performed best, and augmentation included rotations, flips, and scaling.
The framework was developed on ~200–400 etched metallographic images, with balanced per-phase samples; augmentation helped address class imbalance in multi-phase microstructures.
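To illustrate why a cross-entropy + Dice combination can help with small or low-contrast phases, the sketch below implements a binary soft Dice loss and a binary cross-entropy loss in pure Python and combines them. It is a simplified, single-class illustration under stated assumptions: the cited framework is multi-class and its exact formulation is not reproduced here, and the equal weighting (`dice_weight=0.5`) and toy probabilities are hypothetical.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss: 1 - 2*sum(p*t) / (sum(p) + sum(t)), on flat lists."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped for stability."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def combined_loss(probs, targets, dice_weight=0.5):
    """Weighted sum of cross-entropy and Dice (weight is a hypothetical choice)."""
    return ((1.0 - dice_weight) * bce_loss(probs, targets)
            + dice_weight * dice_loss(probs, targets))


# Ten pixels, only one of which belongs to the rare phase.
targets = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
all_background = [0.01] * 10           # model ignores the rare phase
good_prediction = [0.9] + [0.01] * 9   # model finds the rare phase

for name, probs in (("all background", all_background),
                    ("good prediction", good_prediction)):
    print(name,
          round(bce_loss(probs, targets), 3),
          round(dice_loss(probs, targets), 3),
          round(combined_loss(probs, targets), 3))
```

When the rare phase is missed entirely, the averaged cross-entropy stays moderate because nine of ten pixels are still correct, whereas the Dice term is close to its maximum of 1; combining the two keeps per-pixel gradients while penalising missed minority regions.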
Defect Detection in Steels: CNN-based semantic segmentation has been used to identify defects like dislocation lines, precipitates, and voids in advanced scanning transmission electron microscopy (STEM) images of steels. A study by Roberts et al. (2019) [2] automated this process, which is traditionally time-consuming and error-prone, achieving high accuracy in defect quantification. This enhances the efficiency of defect analysis, aiding in developing stronger and more reliable steel alloys for applications in the automotive and aerospace industries.
The CNN-based semantic segmentation model demonstrated robust performance in identifying defects such as dislocation lines, precipitates, and voids in STEM images. Quantitative evaluation revealed that data augmentation significantly enhanced the model’s effectiveness. Specifically, the model achieved an F1-score of 0.85 without augmentation, which improved to 0.92 when augmentation techniques (such as rotation, flipping, and noise addition) were applied. This is an absolute gain of 0.07 in F1-score, or roughly 8% in relative terms. Without augmentation, the model exhibited reduced generalisation, struggling with the diverse defect morphologies and variable image quality inherent in steel samples. The use of data augmentation proved essential in addressing these challenges, enabling the model to capture the complexity of defects better and reducing reliance on manual quantification. This improvement underscores the importance of augmentation in ensuring reliable and efficient defect detection in materials science applications.
Augmentation included geometric transformations (rotation of ±30°, horizontal/vertical flips) and Gaussian noise addition to simulate variability in image quality. The model was trained on a small dataset of ~100–300 expert-annotated STEM regions, typical for atomic-scale defect analysis where labelling is labour-intensive. Heavy data augmentation was essential for generalization across defect variability.
2.1.4. Recent Advancements and Challenges
Recent advancements in semantic segmentation include the development of attention mechanisms and transformer-based models, which capture long-range dependencies in images, improving performance on complex scenes. Techniques like weakly supervised and self-supervised learning are being explored to reduce the reliance on large annotated datasets, addressing the challenge of data scarcity.
In materials science, challenges include the variability in SEM image quality, influenced by microscope parameters, and the need for domain-specific datasets. Future research may focus on data augmentation and synthetic data generation to overcome these issues. Transfer learning, leveraging pre-trained models on similar tasks, could improve performance with limited data. Integration with other modalities, such as energy dispersive spectroscopy (EDS), could provide more comprehensive analyses, combining morphological and chemical insights. Additionally, enhancing model explainability and interpretability is crucial for trust and adoption in scientific research, addressing concerns about black-box models.
Data availability profoundly impacts model performance and adoption. Most metallic SEM studies use datasets of 100–1000 images (Table 2), far smaller than general CV benchmarks (COCO: >200,000). This constrains generalization across alloys, imaging conditions (magnification, contrast), and feature scales. Industrial applications may access larger unlabeled data, but annotation remains a bottleneck.
Multimodal Integration Challenges: Combining SEM morphology with EDS composition raises challenges of alignment (spatial registration errors), format disparities (images vs. spectra), and the choice of fusion strategy (early/late/hybrid). The primary issues are differing resolutions, noise in EDS, and a lack of paired datasets, which lead to suboptimal performance in phase identification.
Knowledge Gaps for Widespread Adoption: Key barriers include: (1) scarcity of large, diverse metallic annotated datasets; (2) model generalization across imaging conditions/alloys; (3) interpretability for scientific trust; (4) computational demands for high-resolution SEM; (5) limited multimodal benchmarks. Future progress depends on open datasets, physics-informed models, and efficient architectures.
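The resolution mismatch and early-fusion strategy mentioned under the multimodal integration challenges can be sketched as follows. The example resamples a coarse EDS elemental map onto the finer SEM pixel grid by nearest neighbour and stacks it with the SEM intensity as a per-pixel feature vector. It assumes the two modalities are already spatially registered, which is itself one of the open problems noted above; the function names, the 4×4 SEM patch, and the 2×2 iron map are hypothetical.

```python
def upsample_nearest(coarse, out_rows, out_cols):
    """Resample a coarse 2D map onto a finer grid by nearest neighbour."""
    in_rows, in_cols = len(coarse), len(coarse[0])
    return [[coarse[r * in_rows // out_rows][c * in_cols // out_cols]
             for c in range(out_cols)]
            for r in range(out_rows)]


def early_fusion(sem, eds_maps):
    """Stack SEM intensity with resampled EDS maps into per-pixel feature vectors.

    Assumes sem and every map in eds_maps cover the same field of view.
    """
    rows, cols = len(sem), len(sem[0])
    resampled = [upsample_nearest(m, rows, cols) for m in eds_maps]
    return [[[sem[r][c]] + [m[r][c] for m in resampled]
             for c in range(cols)]
            for r in range(rows)]


# Hypothetical 4x4 SEM greyscale patch and a 2x2 EDS map of Fe fraction.
sem = [[10, 20, 30, 40],
       [50, 60, 70, 80],
       [90, 100, 110, 120],
       [130, 140, 150, 160]]
fe_map = [[0.9, 0.1],
          [0.2, 0.8]]

fused = early_fusion(sem, [fe_map])
print(fused[0][0], fused[0][2], fused[3][3])
```

Each fused pixel now carries one morphological channel and one compositional channel that a downstream model could consume jointly; late fusion would instead combine the outputs of separate SEM and EDS models.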
2.1.5. Quantitative Insights and Performance Metrics
Recent studies on metallic materials demonstrate the strong performance of AI-driven segmentation techniques in handling challenging microstructural features, such as low-contrast boundaries, multi-scale defects, and overlapping instances. Performance metrics from key metallic applications are summarized in Table 2. These results highlight the effectiveness of deep learning models, particularly when combined with data augmentation or transfer learning to address the limited annotated datasets typical in materials science.
In defect detection for advanced steels using STEM images, a CNN-based semantic segmentation approach achieved significant improvements with data augmentation, raising the F1-score from 0.85 to 0.92. This gain underscores the value of augmentation in capturing variability in defect morphologies (e.g., dislocations, precipitates, and voids) and image quality, enabling reliable high-throughput quantification critical for alloy design.
For aluminum alloys, instance segmentation frameworks based on Mask R-CNN variants systematically evaluated different loss functions (e.g., combinations of cross-entropy, focal, and Dice losses). The best configurations achieved a mean IoU of around 0.80–0.85 for phase instances across experiments, with robust boundary delineation. These metrics facilitated precise phase fraction and shape analysis in etched metallographic images, despite challenges like low-contrast intermetallics.
In Inconel-718 gas-atomized powders, an out-of-the-box Mask R-CNN model for instance segmentation provided robust detection and masking of individual particles and satellites, even in densely overlapping fields. While standard CV metrics like mAP were not the primary focus (emphasis on materials-specific measurements such as satellite size distributions), the approach demonstrated high qualitative precision and effective separation of overlapping instances, outperforming traditional methods like watershed or connected-component labeling.
Overall, these metallic-focused results (F1-scores >0.90, mean IoU ~0.80–0.85 with optimizations) illustrate AI’s capability to deliver reproducible, quantitative microstructural analysis. Data augmentation consistently boosts performance by 5–10%, while instance segmentation extensions (e.g., Mask R-CNN) excel in multi-scale and overlapping scenarios common in alloys and powders.
Table 1 positions these within common evaluation frameworks, providing benchmarks for future metallic SEM applications.
Table 2.
Performance Metrics for Segmentation in Metallic Materials Science Applications. Metrics focus on metallic systems, highlighting improvements from data augmentation and model optimizations.
| Application | Material | Task Type | Model | Key Metric(s) | Value(s) | Notes | Reference |
|---|---|---|---|---|---|---|---|
| Defect quantification | Advanced steels | Semantic segmentation | CNN (custom architecture) | F1-score | 0.85 (no aug); 0.92 (with aug) | Augmentation critical for variable defect morphology and image noise | Roberts et al. (2019) [2] |
| Phase and feature delineation | Aluminum alloys | Instance segmentation | Mask R-CNN variants | Mean IoU | ~0.80–0.85 (best loss function) | Optimized loss combinations improve boundary accuracy in low-contrast phases | Chen et al. (2020) [4] |
| Powder particle and satellite detection | Inconel-718 | Instance segmentation | Mask R-CNN (pre-trained) | Qualitative precision/recall | High (robust overlap handling) | Enables direct satellite measurements; outperforms traditional thresholding | Cohn et al. (2021) [3] |
2.1.6. Critical Analysis of Methodologies and Trade-Offs in Semantic Segmentation
Semantic segmentation methodologies in SEM/EDS analysis have evolved to address the unique demands of metallic microstructure characterization, but their success varies by context due to inherent design principles and the field’s fundamental trade-offs. U-Net’s effectiveness in specialized applications, such as graphene analysis (achieving 94.5% accuracy with small datasets), stems from its encoder–decoder architecture and skip connections, which preserve fine spatial details during upsampling. This is critical for high-resolution SEM images, where subtle features like nucleation sites or pores must be delineated precisely. It makes U-Net particularly successful in research contexts with limited annotated data, as it mitigates overfitting through efficient feature reuse. However, in industrial scenarios with high-throughput needs, U-Net’s computational intensity (often requiring extensive hyperparameter tuning and GPU resources) limits its scalability, leading to slower inference times compared to lighter models.
In contrast, Fully Convolutional Networks (FCNs) excel in end-to-end processing of SEM images for broader microstructural tasks, such as phase segmentation in alloys, because they replace dense layers with convolutions to maintain spatial hierarchies, enabling streamlined workflows without multi-stage pipelines. Their adaptability via transfer learning (e.g., from ImageNet) allows robust performance (e.g., 94.43% accuracy) on diverse materials where annotated data is scarce and expensive. Yet, FCNs’ reliance on pooling layers introduces a key trade-off: while capturing broader context, they often lose fine-grained details, resulting in blurred boundaries for multi-scale features common in metallic microstructures (e.g., nanoparticles amid larger grains). This degradation can reduce IoU by 5–10% in complex scenes, making FCNs less suitable for precision-demanding applications like defect quantification in steels.
DeepLab models, with atrous convolutions, succeed in boundary-sensitive contexts—such as alumina catalyst supports (IoU 0.82)—by integrating multi-scale contextual information without excessive downsampling, thus preserving edge accuracy in low-contrast SEM images. This is particularly valuable in materials science, where small contrasts and textures challenge traditional methods. However, DeepLab’s dependence on large labeled datasets and high computational demands (e.g., multi-rate convolutions increasing training time by 1.5–2x) represents a fundamental trade-off shaping the field: enhanced accuracy (up to 5–7% IoU gains over FCNs) comes at the expense of accessibility for resource-limited settings, such as smaller labs analysing rare alloys.
Overall, these methodologies are shaped by trade-offs between data efficiency, computational cost, and generalization. For instance, while data augmentation universally improves performance (e.g., 5–7% gains across studies) by addressing scarcity, it risks introducing artifacts in overly aggressive transformations, potentially misleading models in noise-sensitive SEM data. The field’s evolution favours hybrid approaches (e.g., combining U-Net with attention mechanisms) to mitigate these, but persistent challenges like domain mismatch—where models trained on clean lab images underperform on industrial noisy data—underscore the need for context-aware adaptations. Ultimately, success hinges on aligning methodology with task specifics: U-Net for data-sparse research, FCNs for versatile workflows, and DeepLab for boundary precision, all while navigating the overarching trade-off of advancing automation without sacrificing interpretability in critical materials applications.
Within the Microstructure Analysis Spectrum, U-Net and FCNs occupy the high-complexity, moderate-scalability quadrant: they are ideal for precise research tasks in metallic systems (e.g., grain boundary delineation in steels) but limited by computational demands for industrial throughput. DeepLab shifts toward higher scalability with multi-scale capabilities, though at increased resource costs, underscoring the framework’s utility in navigating precision-vs-efficiency trade-offs.
2.2. Object Detection in Metallic Materials
Object detection, as shown in Figure 4, is a fundamental task in computer vision that involves identifying and localising objects within images or videos, typically by drawing bounding boxes around them and assigning class labels. In materials science, particularly for metallic materials, object detection is instrumental in analysing microstructural features, detecting defects, and characterising properties that influence performance, such as strength, durability, and corrosion resistance. The integration of artificial intelligence (AI), especially deep learning, has transformed this field by automating complex analyses, improving accuracy, and enabling real-time applications.
This section explores AI-based object detection in metallic materials, focusing on techniques, applications, case studies, challenges, and future directions. It draws on recent research to offer a comprehensive understanding of how these technologies are applied, particularly in the context of scanning electron microscopy (SEM) and other imaging methods used for metallic materials. It emphasises advancements since 2010, aligning with the rapid growth of deep learning in materials science.
Object detection in materials science involves identifying specific features—such as defects, inclusions, grains, or nanoparticles—in images of metallic materials. Unlike semantic segmentation, which classifies every pixel, object detection focuses on locating and classifying discrete objects, often using bounding boxes or point annotations. This is particularly relevant for analysing high-resolution images from SEM, scanning transmission electron microscopy (STEM), or optical microscopy, where features range from macroscopic defects to atomic-scale structures.
Key evaluation metrics include:
Mean Average Precision (mAP): Averages, over all classes, the area under each class’s precision-recall curve, summarising detection accuracy in a single figure.
Intersection over Union (IoU): Quantifies the overlap between predicted and ground truth bounding boxes as the ratio of their intersection area to their union area.
F1 Score: The harmonic mean of precision and recall, giving a single measure of model performance.
Recall Rate: Indicates the proportion of ground-truth objects that the model detects.
These metrics are critical for evaluating models in applications like defect detection, where missing a critical flaw could compromise material integrity.
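To make the metrics above concrete, the following minimal sketch computes IoU for two bounding boxes and then derives precision, recall, and F1 for one image. The box format (x_min, y_min, x_max, y_max), the greedy one-to-one matching rule, and the 0.5 IoU threshold are assumptions chosen for illustration; benchmark suites typically use more elaborate, confidence-ranked matching.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_scores(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching; returns (precision, recall, f1)."""
    unmatched = list(ground_truth)
    true_pos = 0
    for pred in predictions:
        # Pick the still-unmatched ground-truth box that overlaps most.
        best = max(unmatched, key=lambda gt: box_iou(pred, gt), default=None)
        if best is not None and box_iou(pred, best) >= iou_threshold:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predictions) - true_pos
    false_neg = len(ground_truth) - true_pos
    precision = true_pos / (true_pos + false_pos) if predictions else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: two predicted defects, two annotated defects.
predicted = [(0, 0, 10, 10), (50, 50, 60, 60)]
annotated = [(1, 1, 10, 10), (20, 20, 30, 30)]
precision, recall, f1 = detection_scores(predicted, annotated)
```

In this toy case only one prediction overlaps an annotation sufficiently, so one defect is found, one is missed, and one detection is spurious. A missed defect lowers recall, which is why recall is often weighted heavily in safety-critical inspection.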
2.2.1. AI Techniques for Object Detection in Metallic Materials
Object detection combines classification and localisation: a model identifies specific objects within an image and determines their spatial locations, usually by predicting bounding boxes around them. In this review, the objects of interest are microstructural features and defects in micrographs, and detecting them reliably is critical for automated analysis.
Deep learning [
7], particularly convolutional neural networks (CNNs), dominates object detection in metallic materials due to its ability to learn complex patterns from large datasets. Several architectures and hybrid approaches have been developed to address specific challenges in this domain.
CNNs extract spatial features through convolutional layers, making them ideal for processing high-resolution microscopy images. Popular architectures include:
Faster R-CNN: Combines region proposal networks with classification, offering high accuracy for precise localisation, such as detecting individual atoms in STEM images [
7].
YOLO (You Only Look Once): A single-stage detector known for speed, used for real-time defect detection in industrial settings like steel production [
9,
10].
ResNet: A deep residual network effective for classifying features, such as corrosion severity, by leveraging pre-trained models for transfer learning [
11].
Convolutional neural networks (CNNs) serve as the backbone of many object detection models by extracting hierarchical features from images. These features are critical for downstream tasks in detection frameworks, such as classifying objects and predicting their locations. For instance, in popular models like Faster R-CNN and YOLO, CNNs generate feature maps that enable accurate object identification and localisation.
Autoencoders, particularly the Cascaded Autoencoder (CASAE), are used for segmenting and localising defects. CASAE employs a two-level encoder–decoder structure, enhancing pixel label refinement and achieving high IoU scores in defect detection tasks [
9].
TM-CNN integrates template matching for initial detection with a CNN to filter false positives. This hybrid approach effectively detects numerous small objects, such as defects in magnetic patterns, with high accuracy [
12].
Generative Adversarial Networks (GANs): Used for data augmentation, such as generating synthetic SEM images to address data scarcity [
13].
Transfer Learning: Pre-trained models on datasets like ImageNet are fine-tuned for specific tasks, improving performance with limited domain-specific data [
8].
Attention Mechanisms: Models like CBAM enhance feature extraction by focusing on relevant regions, improving detection accuracy in complex images [
13].
These techniques are tailored to handle the variability in metallic material images, including differences in scale, contrast, and background complexity.
While the transition from classical models like Faster R-CNN to modern approaches such as YOLO has significantly enhanced the efficiency and speed of object detection in SEM images, it is essential to evaluate their performance in addressing the distinct challenges posed by microstructural analysis, including partially overlapping objects and highly variable particle sizes.
Transfer learning is a machine learning technique where a model trained on one task is reused or fine-tuned for a different but related task. In object detection, this typically involves using a backbone network—such as ResNet, VGG, or Darknet—pre-trained on a large image classification dataset like ImageNet. This pre-trained backbone is then adapted to a specific object detection dataset (COCO, Pascal VOC), allowing the model to leverage general visual features (edges, shapes) for detecting and localising objects in images. This approach is especially useful when the target dataset is small, as it reduces the need for extensive labelled data and computational resources.
Transfer learning provides several advantages for object detection models:
Higher Accuracy: Pre-trained backbones improve detection performance by starting with weights already tuned to recognise basic image features.
Faster Training: Models converge more quickly, requiring fewer epochs to achieve optimal performance.
Enhanced Generalisation: Pre-trained models adapt better to new data, improving robustness across diverse scenarios.
Reduced Overfitting: With fewer parameters trained from scratch, the risk of overfitting is lower, particularly with limited datasets.
YOLO (You Only Look Once) is a single-stage object detection model prized for its speed and efficiency, making it suitable for real-time applications. Transfer learning is applied by pre-training its backbone (Darknet-53 or EfficientNet) on ImageNet, then fine-tuning it on a target dataset.
Performance Gains: When YOLOv3 is fine-tuned on the COCO dataset using a pre-trained Darknet-53 backbone, it achieves a mean Average Precision (mAP) of 57.9%. Training from scratch on the same dataset with limited data yields an mAP of around 45%, so transfer learning delivers a 28.7% relative gain over the scratch-trained baseline.
The 28.7% figure expresses the difference in mean Average Precision (mAP) relative to the model trained from scratch on a limited dataset. Specifically, when the YOLOv3 model was trained from scratch on a dataset of 500 SEM images of metallic microstructures, it achieved an mAP of 45%. In contrast, the transfer learning approach, which fine-tuned a pre-trained YOLOv3 model (trained on the COCO dataset) using the same 500 SEM images, achieved an mAP of 57.9%. The relative change is calculated as:
$$\frac{57.9 - 45.0}{45.0} \times 100\% \approx 28.7\%$$
Measured against the transfer-learning result instead, the same gap corresponds to a relative drop of (57.9 − 45.0)/57.9 ≈ 22.3% for the scratch-trained model.
In the context of SEM and EDS analysis, where large annotated datasets are often unavailable, fine-tuning a pre-trained YOLO model can converge in as few as 10–20 epochs, compared to training from scratch, which may require over 100 epochs to achieve comparable performance on the same limited dataset. This efficiency is particularly valuable in materials science, where data acquisition is resource-intensive. However, when large datasets are available, training from scratch can be a viable option, though it may require a similar or greater number of epochs to achieve optimal results.
Practical Example: In autonomous driving, transfer learning enables YOLO to detect objects like pedestrians and vehicles in challenging conditions (low light) using limited domain-specific data, improving real-time performance.
Faster R-CNN, a two-stage model, uses a Region Proposal Network (RPN) and a classification network, excelling in accuracy. Its backbone (ResNet-50 or ResNet-101) is typically pre-trained on ImageNet before fine-tuning.
Performance Gains: On the Pascal VOC dataset, Faster R-CNN with a pre-trained ResNet-101 backbone achieves an mAP of 73.2%, compared to 60.5% when trained from scratch—a 20.8% relative improvement.
Domain Adaptation: In medical imaging, a pre-trained Faster R-CNN model fine-tuned on X-ray images can detect tumours with a 15% accuracy increase over a scratch-trained model, even with small datasets.
Computational Savings: Fine-tuning requires only 20–30 epochs, versus 100–200 epochs for training from scratch, saving significant time and resources.
Despite its benefits, transfer learning has limitations:
Domain Mismatch: Features learned from ImageNet may not transfer well to dissimilar domains (satellite or medical images), reducing effectiveness.
Fine-Tuning Complexity: Selecting which layers to fine-tune and optimising learning rates requires careful tuning.
Data Needs: While less data is needed than training from scratch, some labelled data is still required for fine-tuning, which can be a bottleneck in niche applications.
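The layer-selection and learning-rate decisions mentioned under Fine-Tuning Complexity can be written down as an explicit plan before any training starts. The sketch below is framework-independent and purely illustrative: the layer names, the convention that names starting with "head" denote newly added detection layers, and the learning-rate values are all hypothetical placeholders rather than recommended settings.

```python
def fine_tuning_plan(layer_names, n_frozen, backbone_lr=1e-4, head_lr=1e-3):
    """Map each layer to a learning rate; 0.0 means the layer stays frozen.

    layer_names is ordered from input to output. Names starting with "head"
    are treated as newly initialised detection layers (a naming convention
    assumed only for this sketch).
    """
    plan = {}
    for index, name in enumerate(layer_names):
        if name.startswith("head"):
            plan[name] = head_lr       # new layers: train at the full rate
        elif index < n_frozen:
            plan[name] = 0.0           # early, generic features: keep frozen
        else:
            plan[name] = backbone_lr   # later backbone stages: gentle updates
    return plan


# Hypothetical backbone with four stages plus two detection heads.
layers = ["stem", "stage1", "stage2", "stage3", "stage4", "head_cls", "head_box"]
plan = fine_tuning_plan(layers, n_frozen=3)
```

Freezing the earliest layers reflects the idea that edge and texture filters learned on ImageNet transfer most readily, while later stages and the detection heads need to adapt to SEM contrast. How many stages to freeze remains an empirical choice that depends on the size of the target dataset.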
Transfer learning is a critical technique in object detection, markedly enhancing the performance of models like YOLO and Faster R-CNN. It delivers higher accuracy, faster training, and better generalisation, though challenges like domain adaptation must be addressed. By integrating pre-trained backbones, these models excel in diverse applications, from autonomous driving to medical diagnostics.
While the advantages of transfer learning are well-established in the broader machine learning community, its application in SEM and EDS analysis is particularly crucial due to the limited availability of large, annotated datasets in materials science. In this domain, acquiring and labelling SEM and EDS data is often time-consuming and costly, resulting in small datasets that are insufficient for training deep learning models from scratch. Transfer learning addresses this challenge by enabling the development of effective models even with limited data, leveraging knowledge from pre-trained models on larger, general datasets. This discussion provides essential context for readers who may be more familiar with materials science than with machine learning techniques, ensuring they understand the practical significance of transfer learning in overcoming data scarcity, a common issue in SEM and EDS analysis.
SEM images frequently depict densely packed microstructural features—such as defects, inclusions, or particles—that may overlap partially or completely, especially in high-resolution settings where small features predominate. Classical two-stage object detection models like Faster R-CNN are well-equipped to handle partially overlapping objects to some extent. The region proposal network (RPN) in Faster R-CNN generates multiple overlapping regions of interest, enabling the detection of closely spaced objects. However, in cases of heavy overlap, this approach may struggle to distinguish individual instances accurately.
For more complex scenarios involving heavily overlapping objects, instance segmentation methods, such as Mask R-CNN (further explored in
Section 2.3), offer a superior solution. By providing pixel-level segmentation masks, these models can delineate overlapping objects with greater precision, making them particularly suitable for SEM images with intricate microstructures. In the context of the case studies presented in
Section 2.2, hybrid approaches like TM-CNN demonstrate promise for handling overlapping objects. TM-CNN integrates template matching for initial detection with a convolutional neural network (CNN) for classification, effectively filtering false positives. This method achieved an impressive F1 score of 0.988 in detecting defects in magnetic patterns—structures that often feature closely spaced or overlapping elements—suggesting its applicability to similar challenges in SEM imaging.
SEM images often contain features ranging from nanoparticles to larger grains or defects, requiring object detection models to be robust across a wide range of scales. Modern one-stage detectors like YOLO address this challenge through the use of anchor boxes, which are predefined bounding box shapes tuned to different scales and aspect ratios. By optimising these anchor boxes to match the size distributions typical of SEM images, YOLO can effectively detect objects of varying sizes. For instance, in the case study of the improved YOLO model applied to steel strip surface defect detection, the network was tailored to identify defects across diverse size ranges. This adaptation, achieved through anchor box optimisation and architectural refinements, yielded a high mean Average Precision (mAP) of 97.55%, demonstrating its robustness to scale variations.
Furthermore, incorporating multi-scale feature extraction techniques, such as those used in Feature Pyramid Networks (FPN), enhances the ability of models like Faster R-CNN and YOLO to detect objects at different scales. FPN leverages features from multiple network layers, enabling the detection of both small and large microstructural features in SEM images. This capability is particularly valuable in microstructural analysis, where particle sizes may span several orders of magnitude.
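One common way to optimise anchor boxes for a dataset is to cluster the widths and heights of the annotated boxes, using 1 − IoU as the distance so that large and small boxes are treated on equal terms. The sketch below illustrates that idea under stated assumptions; it is not the procedure used in the steel strip study, and the box sizes are invented to mimic a mix of fine particles and coarse features.

```python
import random


def wh_iou(box, anchor):
    """IoU of two boxes sharing the same centre, given as (width, height)."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    return inter / (box[0] * box[1] + anchor[0] * anchor[1] - inter)


def cluster_anchors(boxes, k, iterations=50, seed=0):
    """k-means over (width, height) pairs with 1 - IoU as the distance."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is equivalent to smallest 1 - IoU distance.
            nearest = max(range(k), key=lambda i: wh_iou(box, anchors[i]))
            groups[nearest].append(box)
        updated = []
        for i, group in enumerate(groups):
            if group:
                updated.append((sum(w for w, _ in group) / len(group),
                                sum(h for _, h in group) / len(group)))
            else:
                updated.append(anchors[i])  # keep an anchor that lost all boxes
        if updated == anchors:
            break
        anchors = updated
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical annotation sizes in pixels: three fine particles, three large features.
box_sizes = [(4, 4), (5, 5), (6, 6), (100, 80), (120, 90), (110, 100)]
anchors = cluster_anchors(box_sizes, k=2)
```

With these invented sizes the two anchors settle on one small and one large shape, which is the behaviour a detector needs when feature sizes span orders of magnitude. Real datasets would normally use more clusters and all box dimensions must be strictly positive.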
For scenarios involving heavily overlapping objects, we note that instance segmentation approaches, as detailed in
Section 2.3, provide a more robust solution. Models like Mask R-CNN extend traditional object detection by generating pixel-level masks, allowing for precise separation of overlapping instances. These methods have proven effective in applications such as metallic microstructure analysis (aluminium alloys and metal powders), offering a complementary approach to the bounding-box-based detection methods discussed in
Section 2.2.
In object detection for SEM applications, the Region Proposal Network (RPN) within Faster R-CNN is a key operator optimised for identifying microstructural features. The RPN generates region proposals by sliding a small network over a feature map, predicting object bounds and objectness scores at each location. This operator is particularly effective for SEM images with overlapping features, such as defects or inclusions in metallic microstructures, as it efficiently proposes regions for further classification. Its special feature lies in its ability to balance speed and accuracy, making it suitable for high-throughput analysis of SEM data, where rapid detection of diverse feature sizes is essential.
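The sliding-window behaviour of the RPN can be illustrated by generating the reference anchors it scores at every feature-map location. The sketch below covers only this geometric step; the learned part (objectness scores and box refinements predicted for each anchor) is omitted. The stride, scales, and aspect ratios are assumed values for illustration, not settings taken from the cited work.

```python
import math


def generate_anchors(feature_h, feature_w, stride, scales, ratios):
    """Return anchors as (x_min, y_min, x_max, y_max) in image pixels.

    One anchor per (scale, ratio) pair is centred on every feature-map cell.
    ratio is height / width, and each anchor keeps an area of scale ** 2.
    """
    anchors = []
    for row in range(feature_h):
        for col in range(feature_w):
            # Centre of this feature-map cell, mapped back to image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


# Hypothetical 2 x 3 feature map with a stride of 16 pixels.
anchors = generate_anchors(2, 3, stride=16, scales=(32, 64), ratios=(0.5, 1.0, 2.0))
```

Each location receives six anchors here (two scales times three ratios), so even a tiny feature map yields 36 candidate regions. Neighbouring anchors overlap heavily, which is what allows closely spaced features to each receive a well-fitting proposal.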
Data augmentation is essential for overcoming the scarcity of annotated SEM and EDS datasets in materials science. However, the choice of method involves trade-offs in computational cost, effectiveness, and task-specific suitability. Simple techniques, such as geometric transformations (rotation, flipping) and photometric adjustments (brightness, contrast), are computationally lightweight and straightforward to apply. These methods are ideal for resource-limited environments or real-time applications. Conversely, generative adversarial networks (GANs) demand significant computational resources and training time due to their complexity and reliance on large datasets to produce realistic synthetic images, making them less practical for time-sensitive or hardware-constrained settings.
The effectiveness of these methods varies with the complexity of the data. Simpler augmentations introduce basic variations but often fail to replicate the intricate features of metallic microstructures—like defects, grain boundaries, or imaging artefacts—limiting their utility for tasks requiring diverse data. GANs, however, excel at generating highly realistic and varied microstructural images, enhancing AI model robustness, especially for object detection in SEM images. For instance, GANs can simulate magnification changes, noise, or particle overlap, addressing challenges beyond the scope of basic methods.
Task-specific needs further guide the choice. For object detection, where diverse appearances are critical, GANs justify their cost. In contrast, for multivariate image analysis—where preserving statistical relationships matters—simpler methods that avoid artificial distortions (scaling) are preferable. A hybrid approach often balances these trade-offs effectively: combining efficient, simpler augmentations with selective GAN use for key variability ensures robust AI performance while remaining practical for materials science applications.
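The simple augmentations discussed above are cheap largely because they are a few array operations, although for object detection the box annotations must be transformed together with the pixels. The following minimal sketch shows a horizontal flip and a brightness/contrast adjustment on a greyscale image stored as a list of rows. The box convention (x_min, y_min, x_max, y_max, with x_max exclusive) and the 8-bit intensity range are assumptions for illustration.

```python
def hflip(image, boxes):
    """Mirror an image (list of pixel rows) and its boxes left to right."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # A box spanning [x_min, x_max) maps to [width - x_max, width - x_min).
    new_boxes = [(width - x_max, y_min, width - x_min, y_max)
                 for x_min, y_min, x_max, y_max in boxes]
    return flipped, new_boxes


def adjust_contrast_brightness(image, gain=1.0, bias=0.0):
    """Apply gain * pixel + bias and clip to the 8-bit range [0, 255]."""
    return [[min(255, max(0, round(gain * p + bias))) for p in row]
            for row in image]


# Hypothetical 2 x 4 micrograph with one annotated feature in the left column.
image = [[0, 10, 20, 30],
         [40, 50, 60, 70]]
boxes = [(0, 0, 1, 2)]
flipped_image, flipped_boxes = hflip(image, boxes)
brighter = adjust_contrast_brightness(image, gain=1.5, bias=-20)
```

Forgetting to update the boxes is a frequent source of silently corrupted training labels. Photometric changes leave annotations untouched, which is part of why they are the lowest-risk augmentation for detection tasks.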
2.2.2. Applications in Metallic Materials
AI-based object detection has diverse applications in metallic materials, addressing critical needs in quality control, material design, and performance assessment.
Surface defects, such as scratches, pits, and cracks, compromise the integrity of metallic components. AI models automate their detection, ensuring high-quality production in industries like automotive and aerospace [
8].
Non-metallic inclusions in metals affect mechanical properties. Object detection models identify and classify these inclusions, providing insights into their size, shape, and distribution [
13].
Analysing microstructural features, such as grain boundaries and phases, is essential for understanding material behaviour. AI automates this process, enabling quantitative analysis of complex structures [
1].
Corrosion degrades metallic components, particularly in harsh environments. AI models classify corrosion severity (low, medium, high) based on SEM images, aiding in maintenance and material selection [
11].
In nuclear reactors, metal alloys undergo irradiation, leading to defects that cause hardening and embrittlement. Object detection quantifies these defects, informing material performance predictions [
13].
In nanotechnology, detecting and characterising micro and nanoparticles is crucial for developing advanced materials. AI models segment these particles, enabling detailed analysis of their properties [
13].
Advanced microscopy techniques like STEM allow imaging at the atomic scale. Object detection models identify individual atoms, facilitating atomic-level studies of metallic structures [
13].
2.2.3. Case Studies
The following case studies highlight the practical applications and performance of AI-based object detection in metallic materials, drawing on recent research.
A study introduced the TM-CNN method for detecting defects in magnetic labyrinthine patterns in Bismuth-doped Yttrium Iron Garnet (Bi:YIG) films, a ferrimagnetic garnet (an oxide rather than a metal). The method combines template matching for initial detection with a CNN to eliminate false positives, achieving an F1 score of 0.988 across 444 experimental images containing 641,649 structures. This outperforms traditional template matching and Faster R-CNN, reducing manual annotation efforts and enabling high-throughput analysis of defects critical to material properties [
12].
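The two-stage logic behind TM-CNN (a permissive template match that proposes candidates, followed by a classifier that rejects false positives) can be sketched in a few lines. This is not the published implementation: the pixel-agreement score, the threshold, and in particular the `corners_are_empty` rule standing in for the trained CNN are hypothetical simplifications on a toy binary image.

```python
def match_template(image, template, threshold=0.85):
    """Return (row, col, score) for windows whose pixel agreement >= threshold."""
    th, tw = len(template), len(template[0])
    hits = []
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            agree = sum(image[r + i][c + j] == template[i][j]
                        for i in range(th) for j in range(tw))
            score = agree / (th * tw)
            if score >= threshold:
                hits.append((r, c, score))
    return hits


def filter_candidates(image, hits, size, classifier):
    """Second stage: keep only candidates that the classifier accepts."""
    th, tw = size
    kept = []
    for r, c, score in hits:
        patch = [row[c:c + tw] for row in image[r:r + th]]
        if classifier(patch):
            kept.append((r, c, score))
    return kept


def corners_are_empty(patch):
    """Hypothetical stand-in for the CNN: accept only clean cross shapes."""
    return patch[0][0] + patch[0][2] + patch[2][0] + patch[2][2] == 0


template = [[0, 1, 0],
            [1, 1, 1],
            [0, 1, 0]]
image = [[0, 1, 0, 0, 0, 1, 0],
         [1, 1, 1, 0, 1, 1, 1],
         [0, 1, 0, 0, 0, 1, 1],
         [0, 0, 0, 0, 0, 0, 0]]
candidates = match_template(image, template)
confirmed = filter_candidates(image, candidates, (3, 3), corners_are_empty)
```

The first stage is deliberately lenient so that few true defects are missed, and the second stage restores precision. In the toy image the template stage proposes two locations and the stand-in classifier retains only the exact match.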
The Cascaded Autoencoder (CASAE) architecture was developed for segmenting and localising defects on metallic surfaces in industrial settings. Comprising two autoencoder levels with atrous convolutions, CASAE achieved an IoU of 89.60% on a dataset of 50 images (augmented to 3000 training samples), outperforming FCN (81.58%) and single autoencoders (up to 84.68%). Its ability to handle ambiguous edges and low-contrast defects makes it suitable for real-world applications, with extensions to nanofibrous materials demonstrating its versatility [
9].
An enhanced YOLO network, consisting of 27 convolutional layers, was applied to detect six types of surface defects in cold-rolled steel strips. The model achieved a mAP of 97.55%, a recall rate of 95.86%, and a detection rate of 99% at 83 frames per second (FPS), supporting real-time quality control in production lines. Data augmentation reduced overfitting, enhancing generalisation across diverse defect types [
9].
Using a ResNet50 model, researchers classified corrosion severity (low, medium, high) in magnesium and steel based on SEM images. The model achieved 94% accuracy for magnesium and 88% for steel, leveraging transfer learning and Super-Resolution Generative Adversarial Networks (SRGAN) for image enhancement. This automated approach offers an objective alternative to manual inspections, critical for biomaterials and industrial applications [
11].
We recognise that models like autoencoders and ResNet50 are not standalone object detection systems but serve as components within broader frameworks. For example, ResNet50 is commonly employed as a feature extractor in models like Faster R-CNN, leveraging its deep residual layers to capture detailed visual features that support accurate detection. Autoencoders, though less common in standard object detection pipelines, can contribute to preprocessing steps, such as denoising images or learning compact representations that aid subsequent detection tasks.
A review highlighted the use of object detection to quantify defects in irradiated metal alloys used in nuclear reactors. Models characterise defect type, shape, size, and distribution, impacting properties like hardening and swelling. The exponential increase in data from modern EM instruments (terabytes per session) necessitates such automated methods, with applications extending to atomic-scale analysis and nanoparticle segmentation [
13].
2.2.4. Quantitative Insights and Performance Metrics
Table 3 summarises performance metrics from the case studies, illustrating the effectiveness of AI-based object detection in metallic materials.
These metrics highlight the high accuracy and efficiency of AI models, with real-time capabilities (YOLO at 83 FPS) and robust performance across diverse applications.
2.2.5. Challenges and Future Directions
Despite significant progress, several challenges persist in applying AI-based object detection to metallic materials.
Data Scarcity and Annotation: High-quality, annotated datasets are scarce, particularly for specialised applications like atomic-scale imaging. Manual annotation is time-consuming and costly, limiting model training [
13].
Model Evaluation: Evaluating model performance on limited testing data can lead to overfitting or biased results. Standardised benchmarks and metrics are needed to ensure reliability and comparability [
7].
Subjectivity in Ground Truth Labels: The quality of ground truth labels affects model accuracy. Variations in expert annotations necessitate community consensus on labelling standards to reduce subjectivity [
13].
Computational Resources: Advanced models like Faster R-CNN require significant computational power, posing barriers for smaller research groups. Optimising models for efficiency is crucial [
7].
Integration with Other Modalities: Combining object detection with techniques like energy dispersive spectroscopy (EDS) could provide comprehensive analyses, but such integration is underexplored [
1].
Future directions that could address these challenges include:
Synthetic Data Generation: Using GANs or diffusion models to create realistic SEM images to address data scarcity [
13].
Transfer Learning: Leveraging pre-trained models to improve performance with limited domain-specific data [
7].
Community Collaboration: Developing open-access datasets and shared models to foster innovation and standardisation [
1].
Explainable AI: Enhancing model interpretability to build trust in scientific applications [
13].
Multi-Modal Analysis: Integrating SEM with EDS or other modalities for holistic material characterisation [
1].
2.2.6. Critical Analysis of Methodologies and Trade-Offs in Object Detection
Object detection methodologies in SEM/EDS analysis for metallic materials have demonstrated varying degrees of success depending on the specific demands of the task, such as real-time processing in industrial settings or precise localisation in research-oriented atomic-scale imaging. YOLO’s effectiveness in high-throughput applications, like real-time defect detection in steel strips (achieving mAP 97.55% and 83 FPS [
9]), arises from its single-stage architecture, which processes images in one pass using anchor boxes to handle multi-scale features efficiently. This makes YOLO particularly successful in industrial contexts where speed is paramount, such as production lines for automotive or aerospace components, where rapid identification of surface defects like scratches or pits is essential to minimise downtime. However, YOLO’s trade-off is evident in its lower recall for small or densely packed features, which are common in SEM images of metallic microstructures: subtle defects may be missed, reducing overall detection accuracy by 5–10% compared to two-stage models in complex scenarios.
Faster R-CNN, on the other hand, excels in precision-demanding contexts, such as atomic-scale detection in irradiated metal alloys [
12], due to its Region Proposal Network (RPN), which generates high-quality region proposals before classification, achieving mAP 73.2% on benchmarks like Pascal VOC. This methodology succeeds in research environments with limited but high-fidelity datasets, as it leverages transfer learning to adapt pre-trained backbones (ResNet-101), improving accuracy by 20.8% over scratch-trained models [
7]. Yet a fundamental trade-off shaping the field is its computational overhead: the two-stage process increases inference time, and training remains costly even with fine-tuning (20–30 epochs vs. 100+ from scratch), making it less viable for real-time industrial applications where scalability is critical and thus confining it to offline analysis in materials labs.
Hybrid approaches like TM-CNN and CASAE further illustrate context-specific success. TM-CNN’s integration of template matching with CNN filtering achieves an exceptional F1 score of 0.988 for defect detection in magnetic patterns [
12], thriving in scenarios with numerous small, overlapping objects—prevalent in metallic powders for additive manufacturing—by reducing false positives through a multi-step refinement. This hybrid succeeds where pure CNNs falter due to dense clustering, but its reliance on predefined templates introduces a trade-off: while enhancing precision in structured patterns, it struggles with variability in irregular microstructures, potentially dropping F1 by 5–7% in diverse alloys. Similarly, CASAE’s cascaded autoencoder design, with atrous convolutions, outperforms FCNs (IoU 89.60% vs. 81.58% [
9]) in low-contrast metallic surface defects by refining ambiguous edges, making it ideal for nanofibrous or corroded materials. However, its multi-level structure amplifies computational demands, limiting scalability in resource-constrained industrial environments.
These methodologies are profoundly shaped by trade-offs in data requirements, computational resources, and generalisation. For instance, transfer learning consistently boosts performance (YOLO mAP gain of 28.7% [
10]) by leveraging pre-trained models to overcome data scarcity—a pervasive issue in materials science where annotated SEM datasets are costly—but domain mismatches (e.g., from ImageNet to noisy SEM images) can reduce effectiveness by 10–15%, necessitating fine-tuning. GANs for data augmentation address this by generating synthetic images, expanding datasets by up to 50% [
13], but introduce instability risks, such as artefacts that mislead detection in critical applications like nuclear alloy analysis. Overall, success depends on aligning the method with contextual needs: single-stage models like YOLO for scalable industrial defect monitoring, two-stage models like Faster R-CNN for precise research quantification, and hybrids for overlapping features. The field’s trajectory favours integrated approaches (e.g., combining attention mechanisms with CNNs) to mitigate these trade-offs, but challenges like interpretability, where “black-box” models hinder trust in quality control, underscore the need for explainable AI to ensure reliable adoption in metallic materials characterisation.
YOLO dominates the high-scalability, moderate-complexity quadrant, excelling in real-time industrial applications like steel defect detection, while Faster R-CNN and hybrids like TM-CNN lean toward higher complexity for precision in research (atomic-scale alloys). This positioning in the Microstructure Analysis Spectrum explains contextual successes and persistent trade-offs in data/compute for metallic SEM.
2.3. Instance Segmentation in Metallic Materials
Instance segmentation, shown in Figure 5, is a sophisticated computer vision task that involves detecting and delineating each distinct object of interest within an image, assigning both class labels and instance-specific masks. Unlike semantic segmentation, which classifies all pixels of a given class uniformly, instance segmentation differentiates between individual objects of the same class, providing precise boundaries for each. This capability is particularly transformative in materials science, where the microstructural analysis of metallic materials is essential for understanding properties such as strength, corrosion resistance, and fatigue behaviour.
In the context of metallic materials, instance segmentation is applied to microscopy images, such as those obtained from scanning electron microscopy (SEM) or optical microscopy, to identify and characterise features like grains, particles, defects, or inclusions. The automation of this process through artificial intelligence (AI), particularly deep learning, enhances the efficiency, accuracy, and scalability of microstructural analysis, facilitating high-throughput characterisation and data-driven materials discovery. This report provides a comprehensive overview of AI-based instance segmentation in metallic materials, focusing on advancements since 2010, key methodologies, applications, challenges, and future directions.
Instance segmentation combines object detection, which locates objects with bounding boxes, and semantic segmentation, which assigns class labels to pixels, to produce pixel-wise masks for each object. This is distinct from panoptic segmentation, which integrates instance and semantic segmentation to label both distinct objects and background regions. In materials science, instance segmentation is critical for quantifying microstructural features in metallic materials, enabling researchers to measure parameters like particle size distributions, grain boundaries, or defect density.
Evaluation metrics for instance segmentation include:
Mean Average Precision (mAP): Assesses detection accuracy across multiple classes, balancing precision and recall.
Intersection over Union (IoU): Measures the overlap between predicted and ground truth masks.
F1 Score: Combines precision and recall to evaluate performance.
Aggregated Jaccard Index (AJI+): Quantifies segmentation accuracy, particularly for clustered objects like particles.
These metrics are essential for validating models in applications where precise delineation of microstructural features is critical, such as quality control in metallic component manufacturing.
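The difference between a semantic mask and instance masks, and the mask-level IoU used to score them, can be shown with a classical baseline. The sketch below splits a binary mask into instances by 4-connected component labelling and compares one instance against a reference mask. This is a simple non-learned technique chosen for illustration, not a substitute for Mask R-CNN: touching particles merge into a single component, which is precisely the case learned instance segmentation is meant to handle. The toy mask is invented.

```python
from collections import deque


def label_instances(mask):
    """4-connected component labelling of a binary mask (list of rows).

    Returns a list of instances, each a set of (row, col) pixels.
    """
    height, width = len(mask), len(mask[0])
    seen = set()
    instances = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] and (r, c) not in seen:
                # Breadth-first flood fill from this unvisited foreground pixel.
                pixels, queue = set(), deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    pixels.add((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                instances.append(pixels)
    return instances


def mask_iou(a, b):
    """IoU of two instance masks given as sets of pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical semantic mask containing two separate particles.
semantic_mask = [[1, 1, 0, 0],
                 [1, 0, 0, 1],
                 [0, 0, 1, 1]]
instances = label_instances(semantic_mask)
reference = {(0, 0), (0, 1)}  # hypothetical ground-truth mask for the first particle
overlap = mask_iou(instances[0], reference)
```

Once each particle has its own pixel set, per-instance quantities such as area or equivalent diameter follow directly from the set sizes, which is what enables particle size distributions to be measured automatically.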
2.3.1. AI Techniques for Instance Segmentation
Deep learning, particularly convolutional neural networks (CNNs), dominates instance segmentation due to its ability to learn complex patterns from high-resolution microscopy images.
Key architectures and techniques include:
Mask R-CNN: An extension of Faster R-CNN that adds a branch for predicting segmentation masks, enabling simultaneous object detection and instance segmentation. Its robustness makes it a preferred choice for segmenting microstructural features in metallic materials [
8,
14].
Bayesian Deep Learning: Incorporates uncertainty estimation, providing confidence levels for predictions, valuable in scientific applications [
15].
Transfer Learning: Leverages pre-trained models, such as those trained on ImageNet, to adapt to specific tasks with limited labelled data [
15].
Data Augmentation: Techniques like rotation, flipping, and brightness adjustments enhance dataset diversity, improving model robustness [
16].
These techniques address challenges like data scarcity, image variability, and the need for high precision in metallic material analysis.
In instance segmentation for metallic materials, the scarcity of labelled data is a significant challenge, particularly given the complexity and variability of microstructures in SEM images. To address this, data augmentation and transfer learning have emerged as essential techniques, enabling researchers to train robust models despite limited datasets. Below, we discuss how these methods are applied in materials science, with a focus on instance segmentation tasks.
Data augmentation and transfer learning are critical techniques for improving instance segmentation models like Mask R-CNN, particularly when working with limited labelled data or complex scenes. Data augmentation expands the training dataset by applying transformations such as rotation, scaling, and flipping, aiding the model in generalising to diverse conditions. Transfer learning leverages pre-trained models—often trained on large datasets like COCO or ImageNet—to adapt to specific tasks, minimising the need for extensive labelled data and computational effort.
Data augmentation enhances Mask R-CNN performance in several ways:
Improved Generalisation: Exposure to varied data helps the model adapt to real-world changes, such as different lighting or object angles.
Reduced Overfitting: Increased dataset diversity prevents the model from memorising specific examples.
Enhanced Robustness: Techniques like noise addition or colour jittering improve performance under challenging conditions.
Transfer learning provides significant advantages for Mask R-CNN:
Higher Accuracy: Pre-trained backbones (e.g., ResNet-50, ResNet-101) offer a robust foundation, boosting performance.
Faster Convergence: Fine-tuning from pre-trained weights accelerates training, often requiring fewer epochs.
Reduced Data Needs: Effective fine-tuning is possible with smaller datasets, ideal for specialised tasks with limited data.
Mask R-CNN, which extends Faster R-CNN by adding a segmentation branch for pixel-wise masks, greatly benefits from these techniques:
Data Augmentation Impact: Training on the COCO dataset with augmentations like random cropping, flipping, and colour jittering can increase mean Average Precision (mAP) by 2–3 percentage points. For example, an mAP of 35.5% without augmentation might rise to 38.2% with it.
Transfer Learning Impact: Fine-tuning a COCO-pre-trained Mask R-CNN for tasks like medical image segmentation can yield substantial gains. For instance, in cell nuclei segmentation, fine-tuning improved mAP by 15% compared to training from scratch.
Despite their strengths, these methods have limitations:
Augmentation Selection: Inappropriate transformations can harm performance, requiring careful selection.
Domain Adaptation: Transfer learning may falter if the pre-trained domain differs significantly from the target domain (e.g., natural vs. medical images).
Computational Costs: While less intensive than training from scratch, fine-tuning large models still demands significant resources.
Data augmentation and transfer learning are vital for optimising Mask R-CNN in instance segmentation. They improve accuracy, combat overfitting, and streamline training, though their implementation requires thoughtful design. These techniques shine in scenarios with limited data or intricate visual challenges.
Data augmentation artificially expands the training dataset by applying transformations to existing images, thereby improving model generalisation and reducing overfitting. For SEM images of metallic materials, several augmentation techniques are particularly effective:
Geometric Transformations: Techniques such as rotation, scaling, flipping, and cropping are commonly used to simulate variations in particle orientation and size, which are prevalent in microstructural features like grains or inclusions. These transformations help the model become invariant to such changes, improving its robustness.
Intensity Adjustments: Adjusting brightness or contrast, or adding noise, mimics the variability in SEM image quality caused by different imaging conditions (e.g., beam energy, detector settings). This helps the model handle images with varying contrast or noise levels.
Synthetic Data Generation: Generative models, such as Generative Adversarial Networks (GANs), can create realistic synthetic SEM images based on existing data. For instance, GANs have been used to generate additional training samples of metallic particles or defects, augmenting small datasets and improving model performance.
In the context of instance segmentation, these augmentation techniques are critical for training models to accurately segment overlapping or closely spaced microstructural features, such as particles in metal powders or defects in alloys. For example, in the case study on metal powder segmentation [
15], data augmentation techniques like rotation and scaling were employed to enhance the diversity of the training set, contributing to the model’s ability to generalise across different particle sizes and orientations.
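The geometric and intensity transformations described above can be sketched in a few lines. The following is a minimal illustration using only the Python standard library, with images represented as nested lists of 8-bit grey levels; the function names, the toy image, and the brightness range of 0.8 to 1.2 are assumptions made for this example and are not taken from the cited studies. The key point for instance segmentation is that geometric transforms must be applied identically to the image and its mask, whereas intensity changes apply to the image only.

```python
import random

def flip_horizontal(img):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def adjust_brightness(img, factor):
    """Scale grey levels and clip to the 8-bit range [0, 255]."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def augment(img, mask, rng):
    """Apply the same random geometric transform to image and mask.

    Intensity changes touch the image only, because the labels must not change.
    """
    if rng.random() < 0.5:
        img, mask = flip_horizontal(img), flip_horizontal(mask)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        img, mask = rotate_90(img), rotate_90(mask)
    img = adjust_brightness(img, rng.uniform(0.8, 1.2))  # assumed range
    return img, mask

image = [[10, 200, 30], [40, 50, 60]]  # toy 2x3 grey-level "micrograph"
mask = [[0, 1, 0], [0, 0, 0]]          # one labelled particle pixel
aug_img, aug_mask = augment(image, mask, random.Random(0))
```

In practice, an image library would perform these operations on arrays, but the pairing of image and mask transforms is the same.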
Transfer learning takes models pre-trained on large, general datasets (e.g., ImageNet) and fine-tunes them for specific tasks in materials science. This approach is particularly valuable when labelled data is scarce, as it allows the model to benefit from features learned on a broader range of images. In instance segmentation for metallic materials, transfer learning is applied in two primary ways:
Fine-Tuning Pre-Trained Models: Models like Mask R-CNN, pre-trained on large datasets such as COCO or ImageNet, can be fine-tuned on smaller SEM datasets. This reduces the need for extensive labelled data while still achieving high accuracy. For example, in the case study on aluminium alloy segmentation [
3], a pre-trained Mask R-CNN model was fine-tuned to segment phases like Mg2Si and Fe-containing compounds, achieving a median IoU of 0.59 with limited training data.
Domain Adaptation: Transfer learning can also be used to adapt models trained on one type of metallic material to another. For instance, a model trained on SEM images of steel microstructures can be fine-tuned for high-entropy alloys, leveraging shared microstructural characteristics (e.g., grain boundaries, inclusions). This is particularly useful in materials science, where datasets for specific alloys may be limited.
Additionally, transfer learning can be combined with data augmentation to enhance performance further. In the uncertainty-aware particle segmentation study [
16], transfer learning from a pre-trained model, coupled with data augmentation, enabled the model to achieve an Aggregated Jaccard Index (AJI+) of 0.81 on low-magnification SEM images, despite the challenges of variable particle sizes and limited labelled data.
The effectiveness of these techniques is demonstrated in several case studies presented in
Section 2.3:
In the metal powder segmentation study [
8], data augmentation (rotation, flipping) was crucial for training the Mask R-CNN model to accurately segment particles of varying sizes and shapes, achieving a mean Average Precision (mAP50) of 67.2.
The Bayesian particle instance segmentation study [
15] utilised transfer learning to adapt a model trained on general microscopy images to electron microscopy data, enabling accurate segmentation of particles in the EMPS dataset with limited labelled samples.
These examples illustrate how data augmentation and transfer learning are not only standard solutions but also highly effective in addressing the data scarcity problem in instance segmentation for metallic materials.
2.3.2. Applications in Metallic Materials
Instance segmentation has diverse applications in metallic materials, enabling automated analysis of microstructural features critical to material performance.
Microstructure Analysis in Alloys: Segmenting individual grains or phases in alloys, such as aluminium or steel, provides insights into mechanical properties like hardness and ductility [
14].
Particle Size and Distribution in Metal Powders: Characterising particle size distributions and satellite content in metal powders is essential for additive manufacturing [
8].
Defect and Inclusion Quantification: Identifying and segmenting defects or non-metallic inclusions helps assess material quality [
16].
Nanoparticle Analysis: Segmenting nanoparticles in metallic compounds facilitates studying their morphology and distribution [
15].
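As an illustration of how instance masks feed into particle size analysis, the sketch below converts per-instance pixel counts into physical areas, equivalent circular diameters, and a simple size histogram. The pixel counts, the pixel size of 0.1 µm, and the bin width are hypothetical values chosen for the example; they do not come from any of the cited studies.

```python
import math

def mask_area_um2(pixel_count, pixel_size_um):
    """Convert a mask's pixel count to a physical area in square micrometres."""
    return pixel_count * pixel_size_um ** 2

def equivalent_diameter(area_um2):
    """Diameter of the circle that has the same area as the particle."""
    return 2.0 * math.sqrt(area_um2 / math.pi)

def size_histogram(diameters, bin_width):
    """Count particles per diameter bin; keys are the lower bin edges."""
    hist = {}
    for d in diameters:
        lower = math.floor(d / bin_width) * bin_width
        hist[lower] = hist.get(lower, 0) + 1
    return dict(sorted(hist.items()))

# Hypothetical per-instance pixel counts produced by a segmentation model
pixel_counts = [1200, 1850, 400, 2600]
pixel_size = 0.1  # micrometres per pixel (assumed calibration)

areas = [mask_area_um2(n, pixel_size) for n in pixel_counts]
diameters = [equivalent_diameter(a) for a in areas]
histogram = size_histogram(diameters, bin_width=2.0)
print(histogram)
```

The equivalent circular diameter is only one possible size descriptor; elongated or satellite-bearing particles may call for additional shape measures.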
2.3.3. Case Studies
The following case studies highlight the practical applications and performance of AI-based instance segmentation in metallic materials.
Microstructure Instance Segmentation from Aluminium Alloy Metallographic Image
Chen et al. [
4] used Mask R-CNN to analyse metallographic images of five-series aluminium alloys, segmenting phases like Mg2Si, aluminium, and Fe-containing compounds. The study compared five loss functions: binary cross-entropy (L_BCE), Dice loss (L_DICE), IoU loss (L_IOU), Tversky loss (L_Tversky), and SS loss (L_SS). Key results:
Performance Metrics: Median IoU values ranged from 0.572862 (L_SS) to 0.590477 (L_BCE) across 100 images, as shown in
Table 4.
Evaluation: Precision, sensitivity (Sn), IoU, and F1 were more informative than pixel accuracy because background pixels dominate the images.
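To make the loss functions compared above more concrete, the following sketch implements common "soft" formulations of binary cross-entropy, Dice, IoU, and Tversky losses for flattened per-pixel probabilities. These are generic textbook forms, not necessarily the exact definitions used in the cited study; the smoothing constant and the Tversky weights are assumptions, and the SS loss is omitted.

```python
import math

EPS = 1e-7  # smoothing constant: avoids log(0) and division by zero

def bce_loss(pred, target):
    """Mean binary cross-entropy over all pixels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, EPS), 1 - EPS)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)

def tversky_loss(pred, target, alpha=0.5, beta=0.5):
    """1 - TP / (TP + alpha * FP + beta * FN), using soft counts."""
    tp = sum(p * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    return 1 - (tp + EPS) / (tp + alpha * fp + beta * fn + EPS)

def dice_loss(pred, target):
    """Tversky with alpha = beta = 0.5 reduces to the Dice coefficient."""
    return tversky_loss(pred, target, 0.5, 0.5)

def iou_loss(pred, target):
    """Tversky with alpha = beta = 1 reduces to the Jaccard (IoU) index."""
    return tversky_loss(pred, target, 1.0, 1.0)

# One true foreground pixel, predicted together with one false positive
pred = [1.0, 1.0, 0.0, 0.0]
target = [1, 0, 0, 0]
print(dice_loss(pred, target), iou_loss(pred, target))
```

Because Dice, IoU, and Tversky losses ignore true negatives, they are less dominated by the background than cross-entropy, which is consistent with the emphasis on overlap-based metrics in this case study.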
Yildirim and Cole [
15] introduced a Bayesian deep learning model for segmenting particle instances in electron microscopy images, integrated into ImageDataExtractor 2.0. The Electron Microscopy Particle Segmentation (EMPS) dataset, with 465 images, enabled quantitative measures like particle-size distributions. The model’s uncertainty estimates enhanced accuracy by filtering false positives.
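One way uncertainty estimates can be used to filter false positives is sketched below. It assumes that each candidate detection has been scored in several stochastic forward passes (for example with Monte Carlo dropout, which is an assumption here and not necessarily the mechanism of the cited model) and keeps only detections whose mean score is high and whose spread is low. The data structure and both thresholds are hypothetical.

```python
from statistics import mean, pstdev

def filter_by_uncertainty(detections, min_mean=0.5, max_std=0.15):
    """Keep detections that are both confident (high mean score) and
    stable (low standard deviation) across stochastic forward passes."""
    kept = []
    for det in detections:
        mu = mean(det["scores"])
        sigma = pstdev(det["scores"])
        if mu >= min_mean and sigma <= max_std:
            kept.append({"id": det["id"], "mean": mu, "std": sigma})
    return kept

# Hypothetical scores from four stochastic passes per candidate particle
detections = [
    {"id": "p1", "scores": [0.92, 0.95, 0.90, 0.93]},  # confident and stable
    {"id": "p2", "scores": [0.95, 0.20, 0.85, 0.30]},  # unstable across passes
    {"id": "p3", "scores": [0.30, 0.28, 0.35, 0.31]},  # stable but low score
]
kept = filter_by_uncertainty(detections)
print([d["id"] for d in kept])  # ['p1']
```

A score threshold alone would retain the unstable candidate p2, whose mean score exceeds 0.5; the spread across passes is what removes it.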
Rettenberger et al. [
16] enhanced Mask R-CNN to segment particles in SEM images of inorganic powders, including metallic compounds, with uncertainty estimation. The model achieved AJI+ scores of 0.81 (low magnification) and 0.51 (high magnification) on 90 images, outperforming U-Net. Analysis of 288 LiCoO2 images predicted an average particle area of 37 µm², closely matching expert labels (31 µm²).
Cohn et al. [
8] used Mask R-CNN for segmenting metal powder particles, measuring size distribution and satellite content for additive manufacturing. The model achieved a total mAP50 of 67.2, with class-specific AP50 values of 69.5 (elongated), 74.5 (satellites), 57.2 (circular), and 60.7 (nodular), as shown in
Table 5.
In microstructural analysis, class imbalance arises due to the inherent nature of the data. For example, in SEM images, rare features such as defects or inclusions often constitute a small fraction of the dataset compared to the background or dominant phases. This imbalance can mislead standard evaluation metrics, providing an incomplete picture of model performance. Below, we elaborate on how class imbalance affects key metrics and the strategies employed to mitigate its effects.
Accuracy: Accuracy measures the proportion of correctly classified instances. In imbalanced datasets, it can be misleading because a model might achieve high accuracy by predominantly predicting the majority class (background pixels), while failing to identify minority classes (defects). For instance, in semantic segmentation of microstructures (
Section 2.1), the background may dominate, inflating accuracy if the model overlooks rare features like pores.
F1-Score: The F1-score, which balances precision and recall, is more robust than accuracy. However, in multi-class scenarios, the macro-averaged F1-score can still be skewed if minority classes exhibit low precision or recall. In the SEM-EDS phase classification study (
Section 2.4), the Random Forest model achieved an F1-score of 0.92, but performance varied across classes, with rare phases potentially underrepresented.
Intersection over Union (IoU): IoU assesses the overlap between predicted and ground truth regions in segmentation tasks. Although less sensitive to imbalance than accuracy, IoU can be lower for minority classes with small spatial extents. For example, in instance segmentation of metal powders (
Section 2.3), small satellite particles may yield lower IoU scores despite correct detection, due to their limited size.
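The effect of class imbalance on these three metrics can be shown with a small worked example. The sketch below uses a hypothetical image of 1000 pixels in which only 20 belong to pores, and compares a model that predicts background everywhere with one that finds half of the pore pixels; all numbers are illustrative.

```python
def confusion(pred, truth, positive=1):
    """Return (TP, FP, FN, TN) for the chosen positive class."""
    tp = sum(p == positive and t == positive for p, t in zip(pred, truth))
    fp = sum(p == positive and t != positive for p, t in zip(pred, truth))
    fn = sum(p != positive and t == positive for p, t in zip(pred, truth))
    tn = len(truth) - tp - fp - fn
    return tp, fp, fn, tn

def metrics(pred, truth):
    """Accuracy, F1 and IoU for the positive (minority) class."""
    tp, fp, fn, tn = confusion(pred, truth)
    accuracy = (tp + tn) / len(truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"accuracy": accuracy, "f1": f1, "iou": iou}

# 1000 pixels: 980 background (0) and 20 pore pixels (1)
truth = [0] * 980 + [1] * 20
all_background = [0] * 1000                # ignores the pores entirely
partial = [0] * 980 + [1] * 10 + [0] * 10  # finds half of the pore pixels

print(metrics(all_background, truth))  # accuracy 0.98, F1 0.0, IoU 0.0
print(metrics(partial, truth))         # accuracy 0.99, F1 about 0.67, IoU 0.5
```

Accuracy differs by only one percentage point between the two models, whereas F1 and IoU for the pore class separate them clearly.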
To ensure a fair evaluation of model performance, these concepts are tied to specific case studies:
In the microstructure segmentation study (
Section 2.1), the U-Net model achieved 94.43% accuracy. To address potential background dominance, per-class F1-scores were reported to evaluate the segmentation of minority classes like pores.
In the metal powder instance segmentation study (
Section 2.3), Mask R-CNN performance was evaluated using mAP50, which balances precision and recall across particle sizes, accounting for potential imbalances between large and small particles.
In the SEM-EDS phase classification study (
Section 2.4), macro-averaged F1-scores were used to ensure rare phases were adequately represented, with dataset size effects analysed to highlight model sensitivity to imbalance.
Instance segmentation is a computer vision task that involves detecting and delineating each distinct object instance in an image while also classifying it into a specific category. In the context of metallic materials, this capability is critical for identifying and separating individual microstructural features (such as grains, particles, or phases), even when multiple types of phases with distinct physical or chemical properties (e.g., different crystal structures or compositions in an alloy) are present.
Modern instance segmentation models, such as Mask R-CNN, are well-equipped to handle multiple classes. These models include a classification branch that assigns each detected instance to one of several predefined categories, corresponding to different microstructural phases. To illustrate this, we refer to two case studies from the manuscript:
Aluminium Alloy Segmentation [
14]: The Mask R-CNN model was applied to segment multiple phases, including Mg2Si, aluminium, and Fe-containing compounds, within the microstructure of an aluminium alloy. The model achieved a median Intersection over Union (IoU) of 0.59, demonstrating its ability to perform multi-class instance segmentation effectively in metallic materials.
Metal Powder Segmentation [
15]: The model successfully distinguished between different particle types—elongated, satellite, circular, and nodular—achieving a mean Average Precision (mAP50) of 67.2. This example further highlights the model’s capacity to manage multiple classes, in this case, different morphological categories of particles.
These case studies show that instance segmentation models can effectively recognise and segment multiple phases, provided that the phases are sufficiently distinct in the SEM images and well-represented in the training data. However, challenges may arise when phases exhibit similar visual characteristics. For instance, if two phases have overlapping morphological traits or subtle differences in appearance, the model’s accuracy in distinguishing them may decrease. In such scenarios, enhancing the model with additional features (e.g., texture or intensity variations) or integrating complementary data sources, such as Energy-Dispersive Spectroscopy (EDS), could improve phase recognition, although this may extend beyond the scope of standard SEM-based analysis.
In instance segmentation, each detected object instance is assigned a unique mask, meaning the number of output masks corresponds to the number of individual objects identified in the image. Theoretically, models like Mask R-CNN do not impose a strict limit on the number of masks they can generate, allowing them to segment all detectable instances in an image. This flexibility makes them suitable for analysing metallic microstructures, where the density of features (e.g., particles or grains) can vary widely.
In practice, however, the effective number of output masks is influenced by several factors:
Computational Resources: Processing a large number of instances can be computationally intensive, potentially slowing down analysis or requiring more memory, especially in high-density microstructures.
Model Hyperparameters: Some implementations include settings, such as the maximum number of detections per image, to balance accuracy and processing efficiency. For example, in the metal powder segmentation study [
15], the Mask R-CNN model was configured to limit the number of detections per image, optimising performance for practical use.
Microstructure Complexity: Overlapping or small instances may challenge the model’s ability to detect and segment every object accurately, though this is a performance limitation rather than a strict cap on mask output.
Despite these practical constraints, instance segmentation models can typically handle the number of instances encountered in most SEM images of metallic microstructures without significant issues. For instance, in the aluminium alloy and metal powder studies, the models successfully segmented dozens of instances per image, aligning with the typical requirements of materials science applications.
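The detection cap mentioned above is typically a post-processing step: candidates below a score threshold are discarded and only the highest-scoring remainder is kept. The sketch below illustrates this logic in plain Python with hypothetical detections and default values. Some libraries expose the cap as a hyperparameter; for example, torchvision's Mask R-CNN accepts a `box_detections_per_img` argument. The settings used in the cited studies are not reproduced here.

```python
def cap_detections(detections, score_threshold=0.5, max_detections=100):
    """Drop low-score candidates, then keep at most `max_detections`
    of the remainder, highest score first."""
    confident = [d for d in detections if d["score"] >= score_threshold]
    confident.sort(key=lambda d: d["score"], reverse=True)
    return confident[:max_detections]

# Hypothetical candidate instances from a dense micrograph
candidates = [
    {"id": 1, "score": 0.90},
    {"id": 2, "score": 0.40},
    {"id": 3, "score": 0.70},
    {"id": 4, "score": 0.95},
]
kept = cap_detections(candidates, score_threshold=0.5, max_detections=2)
print([d["id"] for d in kept])  # [4, 1]
```

For dense powders, a cap that is too low silently discards valid particles, so it should be set above the largest instance count expected per image.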
2.3.4. Challenges and Future Directions
Challenges include:
Data Scarcity: Limited annotated datasets hinder model training [
8].
Model Generalisation: Models must generalise across materials and magnifications [
16].
Uncertainty Estimation: Integrating uncertainty into workflows remains challenging [
15].
Computational Resources: Deep learning models require significant power [
14].
Future directions include:
Open-Access Datasets: Developing diverse datasets [
15].
Advanced Architectures: Exploring vision transformers [
16].
Multi-Modal Integration: Combining with EDS for comprehensive analysis [
14].
Transformer architectures, particularly Vision Transformers (ViT) and Swin Transformers, have shown significant promise in image analysis tasks due to their ability to capture long-range dependencies and global context. In the context of SEM images, which often feature complex microstructures with features at multiple scales, these architectures offer distinct advantages over traditional convolutional neural networks (CNNs). Below, we elaborate on their effectiveness for SEM data and address the associated computational challenges.
Vision Transformers (ViT): ViT applies the transformer architecture, originally designed for natural language processing, to image data by dividing the image into patches, embedding them, and processing them through transformer layers. This allows ViT to model relationships between distant parts of the image, which is particularly beneficial for SEM images where microstructural features (e.g., grain boundaries, defects) may span large areas. By capturing global context, ViT can improve the understanding of complex spatial relationships in metallic microstructures.
Swin Transformers: A variant of ViT, Swin Transformers introduce a hierarchical structure with shifted windows to capture both local and global information efficiently. This hierarchical approach enables Swin Transformers to handle high-resolution images more effectively than standard ViT, making them well-suited for SEM data, which often requires analysis at multiple scales. The shifted window mechanism also reduces computational complexity by limiting attention to local regions while still maintaining cross-window connections for global context.
The unique characteristics of SEM images—such as intricate details, varying scales, and the presence of overlapping or densely packed features—make transformers particularly effective for tasks like semantic segmentation, object detection, and instance segmentation. For example:
In semantic segmentation, transformers can better capture the relationships between different microstructural phases or defects, leading to more accurate delineations. One study [
14] applied a Swin Transformer to segment graphene layers in SEM images, achieving an accuracy of 94.5% with a small dataset, outperforming traditional CNNs.
In defect detection, transformers’ ability to model long-range dependencies allows for improved detection of subtle or distributed defects. Another study utilised ViT for detecting defects in metallic surfaces, reporting higher precision and recall compared to CNN-based models.
These examples demonstrate that transformers can enhance the analysis of SEM images, particularly in scenarios where global context is critical for accurate interpretation.
Despite their advantages, transformer architectures are computationally intensive, especially for high-resolution SEM images, due to the need to process a large number of patches. This leads to high memory and computational demands, which can be a barrier for practical applications in materials science. However, several strategies can mitigate these costs:
Patch Merging: Techniques like patch merging reduce the number of tokens as the network deepens, decreasing computational complexity in later layers.
Efficient Attention Mechanisms: Swin Transformers, for instance, use window-based attention to limit the scope of attention computations to local regions, significantly reducing resource requirements while maintaining performance.
Model Optimisation: Smaller transformer models or hybrid architectures that combine transformers with CNNs can offer a balance between performance and computational efficiency.
These optimisations make transformers more feasible for SEM image analysis, allowing researchers to leverage their strengths without prohibitive computational costs.
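The scale of the saving from window-based attention can be estimated with a short calculation. The sketch below uses the approximate complexity expressions given in the Swin Transformer paper (softmax cost omitted): global self-attention over h × w tokens costs about 4hwC² + 2(hw)²C operations, whereas attention restricted to M × M windows costs about 4hwC² + 2M²hwC. The image size, patch size, channel width, and window size are illustrative assumptions.

```python
def num_patches(height, width, patch):
    """Number of non-overlapping patch tokens for an image."""
    return (height // patch) * (width // patch)

def global_attention_cost(h, w, c):
    """Self-attention over all h*w tokens: quadratic in the token count."""
    n = h * w
    return 4 * n * c ** 2 + 2 * n ** 2 * c

def window_attention_cost(h, w, c, m):
    """Attention within non-overlapping m x m windows: linear in token count."""
    n = h * w
    return 4 * n * c ** 2 + 2 * m ** 2 * n * c

# Assumed setting: 1024 x 1024 SEM image, 4 x 4 patches, 96 channels, 7 x 7 windows
tokens_per_side = 1024 // 4
print(num_patches(1024, 1024, 4))  # 65536 tokens
g = global_attention_cost(tokens_per_side, tokens_per_side, 96)
w_cost = window_attention_cost(tokens_per_side, tokens_per_side, 96, 7)
print(f"global / windowed cost ratio: {g / w_cost:.0f}")
```

Patch merging compounds this saving: each merge halves both spatial dimensions of the token grid, so deeper stages process a quarter as many tokens.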
In scientific and industrial applications, particularly in materials science, the explainability of AI models is critical for ensuring that the insights generated are not only accurate but also interpretable. Understanding which microstructural features, such as grain boundaries, defects, or inclusions, drive the model’s predictions (e.g., material hardness or corrosion severity) is essential for validating results, building trust in AI systems, and making informed decisions. Explainability also supports regulatory compliance and quality control, where transparency in decision-making processes is often mandatory. Without it, AI models risk being perceived as “black boxes,” limiting their adoption in fields where interpretability is crucial.
To address the need for transparency, several techniques have been developed to interpret AI models and reveal the features influencing their decisions. Two widely used methods are:
SHAP (SHapley Additive exPlanations): SHAP provides a unified framework for interpreting model predictions by quantifying the contribution of each input feature to the output. In the context of SEM and EDS data, SHAP can identify which microstructural characteristics—such as particle size, phase distribution, or elemental composition—are most influential in predicting material properties. For instance, in predictive modelling of high-entropy alloys (
Section 4), SHAP could be applied to the Artificial Neural Network (ANN) to determine which features extracted from SEM images and EDS spectra most strongly correlate with hardness predictions.
LIME (Local Interpretable Model-agnostic Explanations): LIME offers local explanations by approximating complex models with simpler, interpretable models around specific predictions. This is particularly useful for image-based tasks, such as object detection in SEM images. For example, in the defect detection case study (
Section 2.2), LIME could highlight which regions of the SEM image contributed most to the model’s classification of a defect, providing transparency into how the model interprets microstructural features.
These techniques not only enhance trust in AI models but also offer actionable insights for researchers and engineers, enabling them to refine experimental designs or optimise material properties based on the identified key features.
In the analysis of SEM and EDS data, model explainability is especially valuable due to the complexity and variability of microstructural features. For instance:
Semantic Segmentation (
Section 2.1): Attention visualisation techniques can reveal which parts of the SEM image the model focuses on when segmenting different phases or defects, providing insights into the model’s decision-making process.
Object Detection (
Section 2.2): SHAP can be applied to CNN-based models to identify which microstructural features are critical for defect classification, ensuring that the model’s predictions are grounded in physically meaningful characteristics.
Chemical Composition Analysis (
Section 2.4): LIME could be used to explain how specific elemental peaks in EDS spectra influence the model’s classification of inclusions or phases, aiding in the interpretation of results.
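The idea behind SHAP can be illustrated by computing exact Shapley values for a small model. The sketch below enumerates every feature ordering, which is feasible only for a handful of features; the SHAP library relies on approximations and model-specific algorithms instead. The toy hardness model, its coefficients, the feature names, and the all-zero baseline are hypothetical and not fitted to any data.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings in which features switch from baseline to x."""
    names = list(x)
    phi = {n: 0.0 for n in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        prev = model(current)
        for n in order:
            current[n] = x[n]
            new = model(current)
            phi[n] += new - prev
            prev = new
    return {n: v / len(orders) for n, v in phi.items()}

def toy_hardness(f):
    """Hypothetical hardness model with one interaction term."""
    return (100 + 50 * f["phase_fraction"] - 2 * f["grain_size_um"]
            + 10 * f["phase_fraction"] * f["inclusion_density"])

x = {"phase_fraction": 0.4, "grain_size_um": 5.0, "inclusion_density": 2.0}
baseline = {"phase_fraction": 0.0, "grain_size_um": 0.0, "inclusion_density": 0.0}
phi = shapley_values(toy_hardness, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for x and the prediction for the baseline, and the interaction term is shared equally between the two features involved.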
2.3.5. Critical Analysis of Methodologies and Trade-Offs in Instance Segmentation
Instance segmentation methodologies in SEM/EDS analysis for metallic materials exhibit context-dependent success, driven by their ability to handle intricate tasks like delineating overlapping microstructural features, while being constrained by fundamental trade-offs in computational resources, data requirements, and generalisation. AI-based instance segmentation revolutionises metallic materials analysis, offering automated methods to characterise microstructures. Mask R-CNN’s prominence in applications such as aluminium alloy microstructure segmentation (median IoU 0.59 [
14]) and metal powder characterisation [
8] stems from its extension of Faster R-CNN with a parallel segmentation branch, which enables simultaneous object detection and pixel-wise masking. This makes it highly successful in research contexts involving complex, densely packed features—like grains or particles in additively manufactured metals—where precise instance differentiation is crucial for quantifying parameters such as particle size distributions or defect densities. The model’s robustness to variability in SEM image quality, enhanced by transfer learning from large datasets like COCO, allows it to achieve substantial performance gains (e.g., mAP improvements of 15% in fine-tuned scenarios [
8]), particularly when annotated data is limited, a common challenge in materials science.
Bayesian Deep Learning approaches, incorporating uncertainty estimation [
15], succeed in scientific applications requiring reliability, such as nanoparticle segmentation in nanotechnology, by providing confidence scores alongside predictions. This methodology excels in high-stakes contexts like irradiated alloy analysis [
13], where quantifying uncertainty helps mitigate errors from noisy or incomplete SEM data, improving trustworthiness in property correlations (e.g., defect impact on hardening). However, its probabilistic nature introduces a trade-off: while enhancing interpretability—vital for building trust in materials research—it increases computational complexity, often requiring 1.5–2x more training time than deterministic models like Mask R-CNN, limiting its use in real-time or resource-constrained environments.
Instance segmentation models, such as Mask R-CNN, are designed to handle multiple classes by classifying each detected instance into one of several predefined categories, corresponding to different microstructural phases in metallic materials. For example, in the case study on aluminium alloy segmentation [
14], the model successfully segmented instances of multiple phases, including Mg2Si, aluminium, and Fe-containing compounds. Similarly, in the metal powder segmentation study [
15], the model distinguished between different particle types (elongated, satellite, circular, and nodular), with a mean Average Precision (mAP50) of 67.2. These examples demonstrate that instance segmentation models can effectively manage multiple classes, provided that the phases are sufficiently distinct and well-represented in the training data. However, when phases exhibit similar visual characteristics in SEM images, the model’s accuracy may be reduced, potentially necessitating additional features or multi-modal approaches (e.g., integrating EDS data) for improved classification.
Regarding the number of output masks, instance segmentation models do not impose a strict limit on the number of instances they can detect and segment. The models are capable of generating a separate mask for each object identified in the image. In practice, however, factors such as computational resources, model hyperparameters (e.g., the maximum number of detections per image), and the density of objects in the microstructure may influence the effective number of masks produced. For instance, in the metal powder segmentation study, the Mask R-CNN model was configured to balance accuracy and processing time by limiting the number of detections per image. Nonetheless, for most applications in metallic microstructure analysis, these models can handle the typical number of instances encountered in SEM images without significant constraints.
These methodologies are profoundly shaped by trade-offs influencing the field. Data augmentation and transfer learning are pivotal, boosting mAP by 2–3% [
8] by addressing scarcity through techniques like geometric transformations or pre-trained backbones, but they risk domain mismatch; for example, models fine-tuned on general images may drop IoU by 5–10% on specialised metallic microstructures with unique textures or scales. GANs for synthetic data generation offer a solution, expanding datasets by up to 50% [
13], yet their training instability can produce artefacts, compromising accuracy in precision-critical tasks like corrosion severity assessment. Furthermore, the high computational demands of instance segmentation models (e.g., Mask R-CNN’s multi-branch architecture) trade off against scalability: while delivering superior precision for overlapping instances (outperforming object detection methods like YOLO by 10–15% in dense scenes [
14]), they are less feasible for industrial high-throughput applications, favouring deployment in offline research settings.
Addressing data scarcity and interpretability will further enhance the field, driving innovation in metallic material design. Overall, methodology selection hinges on contextual alignment: Mask R-CNN for detailed, instance-specific analysis in complex microstructures, and Bayesian methods for uncertainty-aware scientific validation. The field is increasingly shaped by the need to resolve these trade-offs through hybrids (e.g., combining attention mechanisms with autoencoders) and explainable AI, as black-box models hinder adoption in quality control. Future advancements must prioritise efficient, interpretable solutions to fully leverage instance segmentation’s potential in accelerating metallic material design and optimisation.
2.4. Chemical Composition Analysis in Metals
Chemical composition analysis, as shown in
Figure 6, is a fundamental aspect of materials science, particularly for metals, as their elemental makeup directly influences mechanical, thermal, and chemical properties critical for applications in industries such as aerospace, automotive, shipbuilding, and additive manufacturing. Traditional analytical techniques, including X-ray fluorescence (XRF), energy dispersive X-ray spectroscopy (EDS), laser-induced breakdown spectroscopy (LIBS), and inductively coupled plasma mass spectrometry (ICP-MS), provide detailed elemental insights. However, these methods often involve labour-intensive data interpretation, time-consuming processes, and challenges in detecting trace or light elements due to low signal-to-noise ratios [
17,
18].
Since 2010, artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL), has revolutionised chemical composition analysis by automating data processing, enhancing detection accuracy, and enabling predictive modelling. AI streamlines the analysis of complex spectral data, improves sensitivity for trace elements, and supports real-time quality control in industrial settings [
19,
20]. This report provides an in-depth exploration of AI applications in chemical composition analysis of metals, covering methodologies, case studies, challenges, and future directions, with a focus on advancements since 2010.
2.4.1. Traditional Methods for Chemical Composition Analysis
To appreciate AI’s impact, it is essential to understand the traditional methods used for chemical composition analysis in metals:
X-ray Fluorescence (XRF): A non-destructive technique that excites atoms with X-rays, causing them to emit secondary X-rays characteristic of their elements. XRF is ideal for rapid qualitative and quantitative analysis, widely used for alloy identification and quality control [17].
Energy Dispersive X-ray Spectroscopy (EDS): Coupled with scanning electron microscopy (SEM), EDS detects X-rays emitted from a sample under electron bombardment, enabling microscale elemental analysis and mapping. It is critical for characterising inclusions and phases in metals [18].
Laser-Induced Breakdown Spectroscopy (LIBS): Ablates a sample with a laser to create a plasma, whose emission spectrum is analysed to determine elemental composition. LIBS excels in in situ and remote analysis, particularly for recycling and scrap sorting [20].
Inductively Coupled Plasma Mass Spectrometry (ICP-MS): A highly sensitive technique that ionises samples in a plasma and measures ion mass-to-charge ratios for precise elemental quantification. It is used for trace element analysis but requires sample preparation and is less suited for real-time applications [19].
These methods, while effective, face challenges such as manual data interpretation, time constraints, and limitations in detecting light or trace elements, which AI addresses through advanced data processing and automation [21].
2.4.2. Role of AI in Chemical Composition Analysis
AI, encompassing ML and DL, enhances chemical composition analysis in metals through several key functionalities:
2.4.3. Case Studies
The following case studies illustrate AI’s applications in chemical composition analysis of metals, drawing on research since 2010.
Kim et al. [22] applied unsupervised ML techniques, specifically singular value decomposition (SVD) and independent component analysis (ICA), to improve the signal-to-noise ratio (SNR) of STEM-EDS data for high-nitrogen stainless steel (HNS). The methodology decomposed the multivariate X-ray signals, enhanced physical correlations, and reconstructed de-noised EDS maps. Key results included:
SNR Improvements: Nitrogen SNR increased by 44% (from 1.03 to 1.48), and molybdenum SNR by 470% (from 0.43 to 2.45), with minimal changes for chromium, iron, and manganese, as shown in Table 6.
Findings: Revealed nanoscale N-depleted regions (70–100 nm wide) with a minimum nitrogen concentration of 0.01 wt% adjacent to Cr₂N precipitates, compared to 0.2 wt% in the matrix.
Validation: Confirmed using STEM-electron energy loss spectroscopy (EELS) and multicomponent diffusional transformation simulations.
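The SVD-based part of such a de-noising workflow can be illustrated with a minimal sketch. The Python code below is not the implementation of Kim et al. [22]. It assumes a spectrum image unfolded into a pixels × energy-channels matrix, keeps only the leading singular component (found by power iteration), and omits the ICA step entirely. The synthetic data, the noise level, and the function names rank1_denoise and frobenius_error are illustrative assumptions, and the code uses only the standard library for portability; in practice a library SVD routine would normally be used.

```python
import math
import random


def rank1_denoise(matrix, n_iter=50):
    """Rank-1 truncated-SVD approximation of a (pixels x channels) matrix.

    The leading right singular vector is found by power iteration on X^T X.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Uniform start vector; adequate here because spectra are mostly non-negative.
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(n_iter):
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]  # u = X v
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]                                       # w = X^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # scores = sigma * (left singular vector): one abundance-like value per pixel
    scores = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    return [[scores[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


def frobenius_error(a, b):
    """Root of the summed squared element-wise differences between two matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


# Synthetic data (assumed values): one Gaussian X-ray peak whose intensity
# varies smoothly from pixel to pixel, plus Gaussian noise.
rng = random.Random(0)
n_pixels, n_channels = 64, 32
peak = [math.exp(-0.5 * ((j - 16) / 2.0) ** 2) for j in range(n_channels)]
abundance = [0.2 + 0.8 * i / (n_pixels - 1) for i in range(n_pixels)]
clean = [[a * p for p in peak] for a in abundance]
noisy = [[x + rng.gauss(0.0, 0.05) for x in row] for row in clean]

denoised = rank1_denoise(noisy)
err_noisy = frobenius_error(noisy, clean)
err_denoised = frobenius_error(denoised, clean)
print(f"error before: {err_noisy:.3f}, after: {err_denoised:.3f}")
```

Real STEM-EDS data contain several overlapping components, so more than one singular component would be retained, and the number kept becomes a modelling choice in its own right.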
Emelianov et al. [23] developed an AI-based algorithm using neural networks for metallographic analysis in shipbuilding, focusing on recognising metal microstructures and determining grades. The system combined a multilayer neural network with case-based reasoning, achieving high accuracy. Key findings include:
Neural Network Performance: For GOST 5639-82 (grain amount), the network achieved a classification error of 0.0149, correctly classifying 274/280 images, as shown in Table 7.
Software Efficiency: Reduced analysis time from 18 min (ordinary systems) to 5 min, with a deviation in grain parameters of 3–4% compared to 5–10% for traditional methods.
Experimental Results: Correctly calculated steel grain size in 96.7% of cases (58/60), outperforming ordinary systems (89%, 96/108) with statistical significance (t = 2.03, p < 0.01).
Wang et al. [24] explored ML for classifying mineral phases using SEM-EDS, with techniques applicable to metallic alloys. Five shallow ML models and a U-Net DL model were compared for pixel-level classification of 13 phases in a shale sample. Key results include:
Performance Metrics, as shown in Table 8: Random Forest and k-NN achieved F1 scores of 0.92, while U-Net scored 0.88 (micro) and 0.73 (macro). On unseen samples, U-Net outperformed Random Forest (F1 0.92 vs. 0.85).
Data Sensitivity: Logistic Regression and SVM were less affected by reduced dataset sizes, while k-NN, Random Forest, and ANN improved with more data.
Critical Elements: Silicon, aluminium, magnesium, calcium, potassium, and iron were key for Random Forest, with silicon noise impacting performance.
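A gap between micro- and macro-averaged F1 scores, such as that reported for U-Net, is commonly seen on class-imbalanced data: micro-averaging pools all pixels, whereas macro-averaging weights each phase equally, so poorly classified rare phases pull the macro score down. The sketch below illustrates this with a hypothetical three-class confusion matrix. The counts are invented for illustration and are not taken from Wang et al. [24], and f1_scores is an assumed helper name.

```python
def f1_scores(confusion):
    """Return (micro F1, macro F1, per-class F1) from a square confusion matrix.

    confusion[i][j] is the number of pixels of true class i predicted as class j.
    """
    n = len(confusion)
    tp = [confusion[k][k] for k in range(n)]
    fp = [sum(confusion[i][k] for i in range(n)) - tp[k] for k in range(n)]
    fn = [sum(confusion[k]) - tp[k] for k in range(n)]
    per_class = []
    for k in range(n):
        denom = 2 * tp[k] + fp[k] + fn[k]
        per_class.append(2 * tp[k] / denom if denom else 0.0)
    macro = sum(per_class) / n                                  # every class counts equally
    micro = 2 * sum(tp) / (2 * sum(tp) + sum(fp) + sum(fn))     # every pixel counts equally
    return micro, macro, per_class


# Hypothetical counts: one dominant phase and two rare phases.
confusion = [
    [90, 5, 5],
    [4, 5, 1],
    [3, 1, 6],
]
micro, macro, per_class = f1_scores(confusion)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
print("per-class F1:", [round(f, 3) for f in per_class])
```

For single-label classification the micro-averaged F1 equals overall pixel accuracy, which is why it can stay high even when minor phases are frequently misclassified.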
Van et al. [20] developed DL models for quantitative LIBS analysis of aluminium scrap, enabling real-time sorting. Back Propagation Neural Network (BPNN) and GHOSTNET were compared using datasets of 27 certified aluminium reference samples and 733 post-consumer scrap pieces. Key findings include:
Performance Metrics, as shown in Table 9: The best model achieved RMSE values of 0.02 wt% for Al and Si, and 0.01 wt% for Fe, Cu, Mn, Mg, and Zn.
Multiple Loss Functions: Improved performance for scrap samples across all metrics, though R² slightly decreased for Fe, Mn, and Mg in reference samples.
Real-Time Capability: Processing time of ~10 ms, enabling real-time applications.
Comparison: Outperformed univariate regression and traditional ML methods in RMSE, MAE, and R².
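The three regression metrics used in this comparison (RMSE, MAE, and the coefficient of determination R²) can be computed as in the following minimal sketch. The concentration values are hypothetical and are not data from Van et al. [20]; regression_metrics is an assumed helper name.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired reference and predicted values."""
    n = len(y_true)
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # undefined if all reference values are identical
    return rmse, mae, r2


# Hypothetical silicon contents in wt% (certified vs. predicted).
si_reference = [0.50, 1.00, 7.00, 9.00]
si_predicted = [0.52, 0.98, 7.03, 8.99]
rmse, mae, r2 = regression_metrics(si_reference, si_predicted)
print(f"RMSE = {rmse:.4f} wt%, MAE = {mae:.4f} wt%, R2 = {r2:.5f}")
```

RMSE penalises large individual errors more strongly than MAE, so reporting both gives some indication of whether a model's error is dominated by a few outlying samples.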
ML enhances XRF analysis by improving calibration, classification, and spatial resolution [21]. Applications include:
Calibration Creation: ML models such as SVMs and neural networks create material-specific calibrations for elements such as Si, Al, and Na [25].
Material Classification: Enables real-time optimisation of XRF parameters, identifying minor compositional variations [21].
Spatial Resolution: Residual dense networks eliminate blurring, enhancing image resolution for trace element detection [26].
EDS is a widely utilised technique for determining the chemical composition of materials, such as metals, by analysing the characteristic X-rays emitted when a sample is bombarded with electrons. Proper calibration of the EDS system is essential to ensure that the energy scale and intensity of these X-ray signals are accurately measured. This calibration process aligns the detected signals with known standards, enabling precise identification and quantification of elements based on their unique X-ray energies and intensities. Without accurate calibration, the reliability of the compositional data generated by EDS is compromised.
Errors in EDS signal calibration can arise from several sources, including:
Drift in the detector’s energy response over time.
Incorrect identification of X-ray peaks, leading to misassignment of elements.
Inconsistencies in calibration standards, such as mismatches between the standard and the sample matrix.
When calibration is inaccurate, the EDS data—such as peak positions and intensities—fed into AI models for training becomes erroneous. Since AI models learn patterns and relationships from their training data, any inaccuracies in the input data will propagate through the model, resulting in flawed predictions. For example:
A miscalibrated energy scale might cause the model to confuse elements with similar X-ray energies (e.g., the overlapping peaks of titanium and vanadium).
Incorrect intensity measurements could lead to erroneous concentration estimates, undermining the model’s ability to quantify elemental compositions in metals accurately.
This is particularly significant in the context of chemical composition analysis, where precision is paramount for applications like alloy development or quality control. If the training data is unreliable due to calibration errors, the AI model’s performance will be degraded, reducing its effectiveness in real-world scenarios.
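The titanium/vanadium case can be made concrete with a small sketch. It assumes a deliberately naive peak-identification rule (assign each measured peak to the nearest tabulated line) and a linear energy calibration with hypothetical gain and offset parameters. The line energies are approximate tabulated values in keV, and the function names are illustrative.

```python
# Approximate characteristic line energies in keV (rounded tabulated values).
LINES_KEV = {
    "Ti K-alpha": 4.511,
    "Ti K-beta": 4.932,
    "V K-alpha": 4.952,
    "Cr K-alpha": 5.415,
    "V K-beta": 5.427,
}


def measured_energy(true_kev, gain=1.0, offset_kev=0.0):
    """Energy reported by a detector with a linear calibration error (assumed model)."""
    return gain * true_kev + offset_kev


def assign_peak(energy_kev, lines=LINES_KEV):
    """Naive identification: pick the tabulated line closest to the measured energy."""
    return min(lines, key=lambda name: abs(lines[name] - energy_kev))


true_line = LINES_KEV["V K-alpha"]
well_calibrated = assign_peak(measured_energy(true_line))
drifted = assign_peak(measured_energy(true_line, offset_kev=-0.015))
print("calibrated:", well_calibrated)
print("15 eV offset:", drifted)
```

Under these assumptions an offset of only 15 eV is enough to relabel a vanadium peak as titanium. Practical quantification software deconvolves overlapping peaks rather than matching single positions, but any labels derived from a drifted energy scale would carry the same kind of error into training data.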
To mitigate the impact of EDS calibration errors on AI models, it is essential to incorporate error assessment into the training process. One effective approach is to quantify the uncertainty associated with the calibration and propagate it through the model. This can be achieved using techniques such as:
Uncertainty Quantification (UQ) Methods: Tools like Bayesian neural networks or Monte Carlo dropout can estimate the uncertainty in the model’s predictions, reflecting the influence of calibration inaccuracies.
Sensitivity Analysis: Evaluating how variations in calibration parameters affect the model’s outputs can help identify the most critical sources of error.
By understanding and quantifying these uncertainties, researchers can assess the robustness of the AI model and adjust its training accordingly.
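A one-at-a-time sensitivity analysis of this kind can be sketched as follows. The code perturbs each input of a model by a small relative step and reports the relative change in the output per relative change in the input, using central differences. The model shown, toy_concentration, is an assumed first-approximation k-ratio estimate that ignores matrix corrections; it stands in for whatever trained predictor is being assessed, and all numerical values are hypothetical.

```python
def relative_sensitivity(model, params, rel_step=0.01):
    """Relative output change per relative input change, for each parameter.

    Uses central finite differences; parameter values must be non-zero.
    """
    base = model(**params)
    sensitivities = {}
    for name, value in params.items():
        h = rel_step * value
        upper = model(**{**params, name: value + h})
        lower = model(**{**params, name: value - h})
        sensitivities[name] = (upper - lower) / (2.0 * h) * (value / base)
    return sensitivities


def toy_concentration(i_sample, i_standard, c_standard):
    """Concentration (wt%) from the intensity ratio to a standard (no ZAF correction)."""
    return c_standard * i_sample / i_standard


# Hypothetical peak intensities (counts) and standard concentration (wt%).
params = {"i_sample": 1200.0, "i_standard": 4000.0, "c_standard": 18.0}
sens = relative_sensitivity(toy_concentration, params)
for name, value in sens.items():
    print(f"{name}: {value:+.3f}")
```

A value near +1 or -1 means that a 1% calibration error in that quantity produces roughly a 1% error in the estimated concentration; for a non-linear model such as a neural network the values would vary across the input space and would need to be evaluated at several representative points.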
Several strategies can be employed to reduce the impact of calibration errors on AI-driven chemical analysis:
High-Quality Calibration Standards: Using standards that closely match the sample’s matrix (e.g., similar metallic compositions) ensures more accurate energy and intensity calibration.
Regular Calibration Checks: Periodically verifying and adjusting the EDS system’s calibration can account for detector drift or environmental changes, maintaining data accuracy over time.
Data Augmentation: During training, synthetic datasets simulating potential calibration errors (e.g., shifted peak positions or altered intensities) can be introduced. This helps the AI model generalise better and become more resilient to real-world calibration imperfections.
Implementing these practices can enhance the quality of the training data and improve the AI model’s reliability, even when faced with calibration uncertainties.
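The data augmentation strategy listed above can be sketched in a few lines. The example assumes a spectrum stored as a list of counts per energy channel and simulates two calibration errors: a shift of the energy axis by a whole number of channels and a uniform rescaling of intensities. The function name augment_spectrum, the parameter ranges, and the synthetic spectrum are illustrative assumptions rather than values from any cited study.

```python
import random


def augment_spectrum(counts, rng, max_shift=2, intensity_jitter=0.05):
    """Return (augmented spectrum, applied channel shift, applied intensity scale).

    Counts shifted beyond either end of the spectrum are discarded.
    """
    shift = rng.randint(-max_shift, max_shift)                       # energy-axis error
    scale = 1.0 + rng.uniform(-intensity_jitter, intensity_jitter)   # intensity error
    augmented = [0.0] * len(counts)
    for channel, value in enumerate(counts):
        target = channel + shift
        if 0 <= target < len(counts):
            augmented[target] = value * scale
    return augmented, shift, scale


# Synthetic spectrum with a single peak centred on channel 10.
rng = random.Random(42)
spectrum = [0.0] * 20
spectrum[9:12] = [40.0, 100.0, 40.0]

augmented, shift, scale = augment_spectrum(spectrum, rng)
print(f"shift = {shift} channels, scale = {scale:.3f}")
print("peak channel after augmentation:", augmented.index(max(augmented)))
```

Generating several such variants per measured spectrum, while keeping the original composition label, exposes the model to the range of calibration errors it may encounter; the shift and jitter ranges should reflect the drift actually observed on the instrument.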
2.4.4. Challenges and Future Directions
Challenges include:
Data Scarcity: Limited high-quality, annotated datasets for specialised alloys or trace elements [19].
Model Interpretability: Complex models lack transparency [22].
Integration: Incorporating AI into analytical workflows requires compatible systems [21].
Standardisation: Lack of standardised metrics hinders model comparison [23].
Computational Resources: Training DL models demands significant power [20].
2.4.5. Critical Analysis of Methodologies and Trade-Offs in Chemical Composition Analysis
Chemical composition analysis methodologies in SEM/EDS for metallic materials demonstrate varying efficacy across contexts, influenced by their ability to handle spectral complexity and integrate with imaging data, while navigating trade-offs in data requirements, computational efficiency, and interpretability. Random Forest (RF) models succeed in simpler, industrial applications like inclusion classification in steels (F1-score 0.92 [27]), owing to their ensemble-based approach that robustly handles feature variability from EDS spectra, such as elemental ratios, without requiring deep architectures. This makes RF particularly effective in high-throughput steelmaking environments where rapid, binary decisions (e.g., inclusion vs. non-inclusion) are needed, leveraging manually engineered features like grayscale values for quick training on modest datasets. However, RF’s limitations surface in multi-class scenarios with overlapping compositions (e.g., oxysulfides vs. calcium aluminates), where accuracy drops by 5–10% due to reliance on shallow feature interactions, rendering it less suitable for research on complex alloys requiring nuanced elemental mapping.
Convolutional Neural Networks (CNNs) and Deep Learning (DL) frameworks, such as those used to infer inclusion composition from BSE images alone (achieving 76% accuracy [19]), excel in data-rich, precision-oriented contexts like alloy development or quality control in high-entropy alloys. CNNs’ hierarchical feature extraction allows them to infer latent chemical information from grayscale BSE images, far exceeding random chance (76% versus 20% accuracy) by capturing subtle patterns that traditional methods overlook. This success is amplified in multi-modal integrations (e.g., combining SEM images with EDS spectra), enabling end-to-end learning for tasks like aluminium scrap sorting, where CNNs automate classification with high efficiency. Yet a core trade-off is their dependence on large annotated datasets, which are scarce in materials science; this leads to overfitting or reduced generalisation (e.g., an F1 drop from 0.92 to 0.85 on unseen samples [22]) and increased computational demands (1.5–2× the training time of RF), confining them to well-resourced labs rather than real-time industrial pipelines.
These methodologies are shaped by fundamental trade-offs that define the field. The first is accuracy versus data efficiency: DL models offer superior precision (e.g., 7% IoU gains with augmentation [6]) but falter without extensive labelling, prompting reliance on techniques like transfer learning or GANs for synthetic spectra, which expand datasets by 30–50% [13] at the risk of introducing artefacts. Interpretability remains a critical barrier: DL’s “black-box” nature hinders trust in applications like biomaterials corrosion analysis [26], where understanding decision rationale is essential for regulatory compliance, in contrast to RF’s more transparent feature importance rankings. Calibration errors further exacerbate these trade-offs, potentially degrading concentration estimates by 5–10% if not addressed via uncertainty quantification (e.g., Bayesian methods), as seen in EDS drift scenarios.
Overall, methodology choice aligns with context: RF for scalable, industrial binary tasks balancing speed and simplicity, and CNN/DL for complex, research-driven multi-class analysis prioritising accuracy. The field is evolving toward hybrid solutions (e.g., fusing XRF/EDS with attention mechanisms) to resolve these trade-offs, but persistent challenges like standardisation of metrics and sustainable AI (e.g., autoencoders for edge devices [21]) highlight the need for interpretable, data-efficient innovations to fully harness chemical composition analysis in advancing metallic material optimisation and quality control.
Shallow models like Random Forest favour high scalability for industrial inclusion classification in steels, while deeper CNNs push into higher complexity for multi-modal EDS integration. The Microstructure Analysis Spectrum frames this evolution, revealing why hybrid approaches may bridge quadrants for future balanced adoption.