To provide context for our proposed framework, this section reviews the foundational concepts and existing literature relevant to our study. We first introduce the principles of Meta-Pseudo-Labeling (Section 2.1) and the architectures of U-Net and nnU-Net (Section 2.2). We then discuss the taxonomy of defects in Fused Filament Fabrication (Section 2.3). Finally, we survey recent advancements in vision-based additive manufacturing monitoring, highlighting the persistent challenge of dataset scarcity (Section 2.4).
2.1. The Fundamentals of Meta-Pseudo-Labeling
Meta-Pseudo-Labeling (MPL) formulates pseudo-label generation as a bilevel optimization problem rather than a fixed self-training heuristic. Instead of passively imitating teacher predictions, the student is trained on pseudo-labels that are explicitly optimized to improve supervised performance on labeled data. This perspective connects MPL to gradient-based meta-learning and hyperparameter optimization frameworks [6,7,8,9].
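Let $\theta_T$ and $\theta_S$ denote the parameters of the teacher and student networks, respectively. Given labeled data $\mathcal{D}_L$ and unlabeled data $\mathcal{D}_U$, MPL can be written as the bilevel problem:

$$\min_{\theta_T} \; \mathcal{L}_L\big(\theta_S^{*}(\theta_T)\big) \quad \text{s.t.} \quad \theta_S^{*}(\theta_T) = \arg\min_{\theta_S} \mathcal{L}_U(\theta_T, \theta_S),$$

where $\mathcal{L}_L$ is the supervised loss on $\mathcal{D}_L$ and $\mathcal{L}_U$ is the pseudo-label loss on $\mathcal{D}_U$. The gradient $\nabla_{\theta_T} \mathcal{L}_L\big(\theta_S^{*}(\theta_T)\big)$ is the outer (bilevel) gradient, since it differentiates through the solution of the inner optimization problem [7]. Exact computation would require backpropagation through the entire student training trajectory, which is infeasible for large-scale deep networks.

MPL therefore adopts a one-step approximation. A student update is computed as follows:

$$\theta_S' = \theta_S - \eta_S \nabla_{\theta_S} \mathcal{L}_U(\theta_T, \theta_S),$$

and the resulting supervised improvement $h = \mathcal{L}_L(\theta_S) - \mathcal{L}_L(\theta_S')$ serves as a finite-difference estimate of the bilevel update signal. Multiplying this scalar signal with the gradient of the teacher pseudo-label objective yields a scalable first-order approximation of the true bilevel gradient.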
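The one-step approximation can be illustrated with a minimal sketch, assuming a toy setting in which both teacher and student are scalar logistic models of the form sigmoid(w * x). The function names, learning rates, and data layout below are hypothetical and chosen only for illustration; they are not taken from any published MPL implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(w, xs, ys):
    """Mean binary cross-entropy of the scalar model sigmoid(w * x)."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x), eps), 1.0 - eps)  # clamp to keep log finite
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(xs)


def bce_grad(w, xs, ys):
    """Analytic gradient of bce with respect to w (ignoring the clamp)."""
    return sum((sigmoid(w * x) - y) * x for x, y in zip(xs, ys)) / len(xs)


def mpl_step(w_t, w_s, x_l, y_l, x_u, lr_s=0.5, lr_t=0.5):
    """One toy MPL iteration; returns (new teacher, new student, signal h)."""
    # Teacher proposes hard pseudo-labels on the unlabeled batch.
    y_hat = [1.0 if sigmoid(w_t * x) >= 0.5 else 0.0 for x in x_u]
    # One-step student update on the pseudo-label loss.
    w_s_new = w_s - lr_s * bce_grad(w_s, x_u, y_hat)
    # Finite-difference signal: supervised improvement on labeled data.
    h = bce(w_s, x_l, y_l) - bce(w_s_new, x_l, y_l)
    # Teacher update: the scalar h scales the gradient of the teacher's
    # own pseudo-label loss.
    w_t_new = w_t - lr_t * h * bce_grad(w_t, x_u, y_hat)
    return w_t_new, w_s_new, h
```

When the student improves on the labeled batch, h is positive and the teacher is pushed toward the pseudo-labels it just produced; when the student gets worse, h is negative and the update reverses direction.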
The choice of pseudo-label representation further shapes the nature of this gradient signal. If pseudo-labels are sampled as discrete one-hot variables, the teacher update formally involves the gradient of an expectation over sampled labels, which leads to a score-function estimator analogous to REINFORCE [10]. In contrast, soft pseudo-label probabilities render the teacher–student interaction fully differentiable, eliminating the need for stochastic gradient estimators and enabling standard backpropagation [6]. In our segmentation adaptation, we adopt hard pseudo-labels (sigmoid-thresholded at 0.5, with low-confidence regions suppressed) and leverage the finite-difference dot-product signal $h$ as the teacher reward, which retains the bilevel structure without requiring explicit stochastic gradient estimation.
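A minimal sketch of hard pseudo-label generation with low-confidence suppression follows. The 0.5 threshold comes from the text above; the suppression rule (masking pixels whose confidence max(p, 1 - p) falls below a cutoff) and the `min_confidence` value of 0.8 are assumptions made for illustration, since the exact rule is not specified here.

```python
def hard_pseudo_labels(probs, threshold=0.5, min_confidence=0.8):
    """Turn a 2D grid of sigmoid probabilities into hard labels plus a mask.

    Returns (labels, mask): labels are 0/1 per pixel; mask is 1 where the
    prediction is confident enough to contribute to the pseudo-label loss
    and 0 where it should be suppressed.
    """
    labels, mask = [], []
    for row in probs:
        labels.append([1 if p >= threshold else 0 for p in row])
        # Confidence of a binary prediction is its distance from 0.5,
        # expressed here as max(p, 1 - p).
        mask.append([1 if max(p, 1.0 - p) >= min_confidence else 0 for p in row])
    return labels, mask
```

In a training loop, the mask would typically multiply the per-pixel pseudo-label loss so that suppressed pixels contribute no gradient.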
Conceptually, MPL forms a closed feedback loop: the teacher proposes pseudo-labels, the student learns from them, and the teacher is rewarded only when the student improves on labeled data. This mechanism mitigates confirmation bias in semi-supervised learning and explains the strong empirical performance of MPL across vision tasks [6].
2.2. Fundamentals of U-Net and nnU-Net
The U-Net architecture, introduced by Ronneberger et al. [11], revolutionized biomedical image segmentation by combining a contracting encoder path with an expanding decoder path. The key innovation lies in its skip connections, which concatenate high-resolution feature maps from the encoder with the decoder’s upsampled features. This mechanism allows the network to recover fine-grained spatial details lost during pooling, enabling precise localization essential for medical and industrial inspection tasks.
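The concatenation step can be sketched on plain nested lists laid out as [channels][height][width]. This is an illustrative simplification: nearest-neighbour upsampling stands in for the learned up-convolution of the original U-Net, and the helper names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a [C][H][W] nested list."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))  # repeat each row
        out.append(rows)
    return out


def skip_concat(encoder_fmap, decoder_fmap):
    """Upsample the decoder features and concatenate along the channel axis."""
    up = upsample2x(decoder_fmap)
    same_height = len(up[0]) == len(encoder_fmap[0])
    same_width = len(up[0][0]) == len(encoder_fmap[0][0])
    if not (same_height and same_width):
        raise ValueError("encoder and upsampled decoder maps must match spatially")
    # Encoder channels first, then the upsampled decoder channels.
    return encoder_fmap + up
```

The output has as many channels as the encoder and decoder maps combined, at the encoder's spatial resolution, which is what lets later convolutions mix fine detail with coarse context.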
However, U-Net’s performance is highly sensitive to hyperparameters such as patch size, preprocessing, and topology, which are often manually tuned for specific datasets. To address this, Isensee et al. introduced nnU-Net (no-new-Net) [12,13], a framework that automates the configuration of the entire segmentation pipeline. Unlike architectural variants that introduce complex modules (e.g., attention mechanisms or dense connections), nnU-Net retains a vanilla U-Net structure while rigorously optimizing the end-to-end training pipeline configuration.
nnU-Net dynamically adapts to the “fingerprint” of a new dataset by automatically setting:
Preprocessing: resampling strategies and intensity normalization (e.g., z-scoring vs. clipping) based on modality and voxel spacing.
Network topology: configuring the number of pooling operations, kernel sizes, and feature map depths, ensuring the receptive field covers the input patch while fitting within GPU memory constraints.
Training schedule: utilizing a robust combination of Dice and Cross-Entropy loss with a large smoothing constant (smooth = 1) and batch-level loss aggregation (batch_dice=True), along with extensive on-the-fly data augmentation (rotation, scaling, elastic deformation) to prevent overfitting on small datasets.
By systematizing these design choices, nnU-Net serves as a robust, self-adapting baseline that frequently outperforms manually designed architectures, making it an ideal foundation for evaluating our semi-supervised MPL approach.
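The effect of the smoothing constant and of batch-level aggregation in the Dice term can be seen in a small sketch for binary masks. It assumes a loss of the form 1 - Dice and flat per-sample lists; it is an illustration of the idea, not nnU-Net's own code, which differs in detail.

```python
def soft_dice_loss(probs, targets, smooth=1.0, batch_dice=True):
    """Soft Dice loss for binary segmentation.

    probs, targets: lists with one flat list per sample (predicted
    probabilities and 0/1 ground-truth labels, respectively).
    """
    def dice(p, g):
        intersection = sum(pi * gi for pi, gi in zip(p, g))
        return (2.0 * intersection + smooth) / (sum(p) + sum(g) + smooth)

    if batch_dice:
        # Pool all pixels of the batch before computing a single Dice score.
        flat_p = [v for sample in probs for v in sample]
        flat_g = [v for sample in targets for v in sample]
        return 1.0 - dice(flat_p, flat_g)

    # Otherwise compute Dice per sample and average the scores.
    scores = [dice(p, g) for p, g in zip(probs, targets)]
    return 1.0 - sum(scores) / len(scores)
```

With per-sample aggregation, a sample that contains no foreground and is predicted as empty scores a perfect Dice and dilutes the loss; pooling over the batch avoids this, which matters when foreground pixels are rare.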
2.3. Fundamentals of Defects in Additive Manufacturing
As detailed in [3], Fused Filament Fabrication (FFF) is susceptible to a wide array of physical defects that compromise structural integrity and geometric fidelity. These defects can be broadly categorized into extrusion anomalies, dimensional inaccuracies, and thermal-induced deformities.
Extrusion anomalies include interruption due to clogged nozzles or material grinding, as well as inconsistent flow leading to blobs or wavy surface patterns.
Dimensional errors arise from under- or over-extrusion, often exacerbated by first-layer adhesion issues like “elephant foot.”
Thermal defects such as warping, curling, and overheating occur when cooling rates are mismatched with the material’s melting properties, particularly in high-temperature polymers like ABS. Furthermore, layer-specific failures, including layer separation (delamination), shifting, and missing layers, directly undermine the layer-wise construction principle of AM. Understanding this taxonomy of failures motivates using top-layer segmentation as the foundational monitoring primitive, as illustrated by the fault tree subset in Figure 1. Table 1 enumerates all intermediate and undeveloped fault events derived from this tree, annotated by whether each fault manifests as a top-layer-visible symptom.
The occurrence of these faults is rooted in the fundamental mechanics of FFF. FFF constructs 3D geometries by extruding semi-molten thermoplastic material through a heated nozzle along a computer-controlled trajectory. This trajectory is defined by G-code, a numerical control programming language that specifies the spatial coordinates (X, Y, Z), extrusion rates (E), and travel speeds (F) for each layer. In an ideal process, the physical deposition matches this digital instruction perfectly. However, stochastic environmental factors and hardware limitations often introduce discrepancies between the G-code “ideal” and the physical “actual”, necessitating real-time visual verification.
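To make the structure of such instructions concrete, the sketch below parses a single G-code line into its command and its lettered parameters. The helper name and the sample line in the usage note are hypothetical, and the parser deliberately ignores dialect-specific features such as checksums or line numbers.

```python
import re

# A G-code "word" is a letter followed by a number, e.g. X10.5 or F1800.
_WORD = re.compile(r"([A-Za-z])\s*(-?\d+(?:\.\d*)?|-?\.\d+)")


def parse_gcode_line(line):
    """Parse one G-code line into {'command': str, 'params': dict}.

    Returns None for blank or comment-only lines.
    """
    code = line.split(";", 1)[0].strip()  # drop trailing ';' comments
    if not code:
        return None
    words = _WORD.findall(code)
    if not words:
        return None
    letter, number = words[0]
    command = f"{letter.upper()}{int(float(number))}"
    params = {key.upper(): float(value) for key, value in words[1:]}
    return {"command": command, "params": params}
```

For a hypothetical move such as `G1 X10.5 Y20 E0.42 F1800 ; perimeter`, the function returns the command `G1` together with the X, Y, E, and F values as floats, which is the information needed to reconstruct the intended toolpath of a layer.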
Rabbaa et al. [14] provided a comprehensive foundation for understanding FFF hardware, covering physical construction, mathematical modeling, processor integration, and stepper motor programming. Complementing this, Jadhav et al. [15] demonstrated a practical deployment of real-time LLM-assisted monitoring, detailing their specific printer setup, firmware configuration, and image acquisition pipeline, which serve as a concrete blueprint for future IoT integration of learned segmentation models.
2.4. Related Work
Xiao et al. [16], in the context of the FLARE2022 challenge [17], explored the use of Meta-Pseudo-Labeling (MPL) [6] for medical image segmentation. While their results demonstrate the potential of MPL, their implementation remains closed-source and specific to volumetric data. Our work bridges this gap by providing an open-source, reproducible adaptation of MPL for additive manufacturing, specifically addressing the extreme foreground–background imbalance (1:24) inherent in industrial defect detection.
The scarcity of annotated segmentation datasets in additive manufacturing is well-documented. Liu et al. [4] conducted a systematic review across 9 major dataset repositories and the top-ranked AM journal, identifying only 10 open image datasets suitable for AM applications, of which only 1–2 provided annotations for microstructure defect segmentation in melt pools. This scarcity is further corroborated by Zhang et al. [18], whose review of 144 ML-assisted AM studies found that the majority of datasets contain fewer than 1000 instances, with only 13 studies reporting datasets with more than 10,000 samples. Both reviews emphasize the need for robust methodologies that can operate effectively with limited labeled data, a key motivation for our MPL-based approach. This data scarcity is severe enough that some comparative studies, such as that of Werkle et al. [19], explicitly eschew data-driven approaches entirely in favor of classical computer vision (e.g., Canny Edge detection), citing dataset acquisition difficulty as the primary barrier, further reinforcing the need for our synthetic data generation pipeline.
Other approaches, such as those proposed in [20], have demonstrated effective real-time control by using image classification to predict discrete process-parameter states (e.g., flow rate, Z-offset) and issuing corrective G-code commands. However, such classification-centric approaches are fundamentally constrained to the discrete error states observed during training and may miss non-systematic, local geometric defects, such as material blobs or partial delamination, that can occur even under nominally correct parameter settings. This work complements this direction by providing a trained segmentation model capable of performing dense, pixel-wise geometric comparison, a model whose learned representations could be directly applied to such control loops in future work.
The scale of publicly available benchmarks further underscores this scarcity. While large image-level classification datasets exist (e.g., over 1.2 M images in [21]) and object detection datasets for coarse bounding-box tracking have been developed (e.g., [22,23]), pixel-wise segmentation data remains critically limited. To identify subtle extrusion anomalies without dense masks, recent works have focused on classification-based approaches: Li et al. [24] apply a Weakly Supervised Data Augmentation Network (WSDAN) to classify four extrusion state patterns from only 1000 images at the nozzle; Wu [25] uses an AR overlay framework combined with classical ML classifiers to detect deviations, bypassing learned pixel-wise segmentation entirely. Even when true layer-wise segmentation is attempted, the datasets are severely constrained. For example, Mehta and Shao [26] employed U-Net within a Federated Learning framework to enable collaborative model training; however, their evaluation relied on the open-source Peregrine (v2021-03) dataset [27], which contains only 60 explicitly annotated layers. These limitations collectively highlight why robust methods that can learn from limited labeled data while exploiting unlabeled data, such as our MPL-based approach, are essential for practical FFF monitoring.
Furthermore, architectural advancements in AM monitoring have sought to directly integrate digital fabrication instructions into the inference pipeline. For instance, Tran et al. [2] proposed GG-Net, a specialized U-Net architecture that uses Spatial Transformer Networks (STNs) and cross-attention mechanisms to jointly perform G-code alignment and layer segmentation within a single end-to-end model. Note that geometric alignment with the G-code blueprint is a general requirement of any complete defect detection pipeline, and the distinction of GG-Net is that it internalizes this alignment within the learned model. In contrast, our work focuses on the upstream problem of training robust segmenters: rather than designing a custom end-to-end architecture, we leverage G-code offline to synthesize massive annotated training datasets, allowing standard, highly optimized architectures (like nnU-Net) to learn strong pixel-wise representations, with MPL further exploiting unlabeled data. G-code alignment for final defect comparison remains a natural next step and is explicitly identified as future work.
Table 2 consolidates these prior approaches by methodology category and corresponding dataset scale.
The gaps identified across this body of work (scarce pixel-level annotations, evaluations against weak baselines, and the absence of semi-supervised approaches tailored to AM segmentation) directly motivate the synthetic data pipeline and MPL framework described in the following section.