Explainable Few-Shot Anomaly Detection for Real-Time Automotive Quality Control

Mawah, Safeh Clinton; Aga, Dagmawit Tadesse; Hatefi, Shahrokh; Smith, Farouk; Yihun, Yimesker

doi:10.3390/pr13103238

by Safeh Clinton Mawah ¹, Dagmawit Tadesse Aga ², Shahrokh Hatefi ³,*, Farouk Smith ³,* and Yimesker Yihun ¹,*

¹ Department of Mechanical Engineering, North Carolina Agricultural and Technical State University, Greensboro, NC 27411, USA

² Department of Computer Science, North Carolina Agricultural and Technical State University, Greensboro, NC 27411, USA

³ Department of Mechatronics Engineering, Faculty of Engineering, The Built Environment and Technology, Nelson Mandela University, Port Elizabeth 6000, South Africa

* Authors to whom correspondence should be addressed.

Processes 2025, 13(10), 3238; https://doi.org/10.3390/pr13103238

Submission received: 1 September 2025 / Revised: 20 September 2025 / Accepted: 9 October 2025 / Published: 11 October 2025

(This article belongs to the Special Issue Recent Trends in Advanced Manufacturing Technologies for Materials Processing and Production)

Abstract

Automotive manufacturing quality control faces persistent challenges such as limited defect samples, cross-domain variability, and the demand for interpretable decision-making. This work presents an explainable few-shot anomaly detection framework that integrates EfficientNet-based feature extraction, adaptive prototype learning, and component-specific attention mechanisms to address these requirements. The system is designed for rapid adaptation to novel defect types while maintaining interpretability through a multi-modal explainable AI module that combines visual, quantitative, and textual outputs. On the evaluated automotive datasets, the framework achieves 99.4% accuracy for engine wiring inspection and 98.8% for gear inspection, an improvement of 5.2–7.6% over state-of-the-art baselines, including traditional unsupervised methods (PaDiM, PatchCore), advanced approaches (FastFlow, CFA, DRAEM), and few-shot supervised methods (ProtoNet, MatchingNet, RelationNet, FEAT), with only 0.63% cross-domain degradation between the wiring and gear inspection tasks. The architecture operates under real-time industrial constraints, with an average inference time of 18.2 ms, throughput of 60 components per minute, and memory usage below 2 GB on RTX 3080 hardware. Ablation studies confirm the importance of prototype learning (−4.52%), component analyzers (−2.79%), and attention mechanisms (−2.21%), with the K = 5 few-shot configuration providing the best trade-off between accuracy and adaptability. Beyond performance, the framework produces interpretable defect localization, root-cause analysis, and severity-based recommendations designed for integration with manufacturing execution systems via standardized industrial protocols. These results demonstrate a practical and scalable approach for intelligent quality control, enabling robust, interpretable, and adaptive inspection within the evaluated automotive components.

Keywords: anomaly detection; few-shot learning; explainable artificial intelligence; manufacturing quality control; automotive inspection; EfficientNet; prototype learning; real-time defect detection; computer vision; industrial automation

1. Introduction

Artificial intelligence (AI) and robotics are transforming automotive manufacturing quality control (QC) by enabling flexible inspection architectures, vision-guided methods, deep learning-based defect detection, and reinforcement learning (RL) for robotic inspection. Flexible and reconfigurable inspection systems have been proposed to handle high-mix production environments. Lupi et al. [1] introduced a Reconfiguration Support System (RSS) that adapts inspection parameters using computer-aided design (CAD) data and synthetic renderings, while Lupi et al. [2] demonstrated a CAD-to-ReCo pipeline that automates conversion from annotated 3D CAD models to robotic inspection routines, reducing setup times by more than 50%. Similarly, Rio-Torto et al. [3] applied semi-supervised detection with human–machine interfaces (HMIs) to reduce annotation demands, and Azamfirei et al. [4] emphasized design–manufacturing alignment for reconfigurable workflows.

When CAD data is incomplete, vision-guided inspection methods become critical. Khan et al. [5] reconstructed part geometries using Structure-from-Motion (SfM) and Multi-View Stereo (MVS) to generate ultrasonic inspection paths without CAD models. Likewise, Jafari-Tabrizi et al. [6] presented a focus-measure-driven trajectory planning approach that learns optimal scanning paths from visual feedback, achieving sub-millimeter localization accuracy. These methods improve adaptability to legacy parts and variable product lines.

Deep learning (DL) has emerged as the dominant paradigm for defect detection. Convolutional neural networks (CNNs) achieve robust performance under variable shop-floor conditions [7,8,9,10], with hybrid approaches showing that classical preprocessing can enhance deep feature extraction [11]. Broader surveys confirm that DL methods consistently outperform earlier machine learning (ML) approaches, but still face challenges in interpretability and reproducibility [12]. Complementary studies demonstrate near-100% inspection accuracy with integrated CNN pipelines [13,14].

Reinforcement learning (RL) further extends QC capabilities by supporting adaptive path planning and contact-based inspection. Deep reinforcement learning (DRL) algorithms such as Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) enable continuous control in inspection tasks [15], while guided policy search accelerates the learning of contact-rich routines [16]. Recent contributions integrate object detection with RL-based inverse kinematics [17], employ digital twins for safer RL training [18], and implement RL-enhanced collaborative QC cells [19]. RL-based assembly techniques enable the automatic generation of accurate and reliable assembly sequences [20].

Human factors remain central to QC reliability. Papavasileiou et al. [21] showed that fixture deformation, operator handling, and gripper design can degrade inspection accuracy, recommending preventive fixturing diagnostics and operator training. Complementarily, Azamfirei et al. [4] highlighted the importance of early-stage defect filtering to reduce false positives.

Finally, integration with Industry 4.0 and Industry 5.0 frameworks is shaping future QC directions. Macaulay et al. [22] classified ML approaches for robotic inspection by autonomy levels, while Rakholia et al. [23] outlined the best practices and challenges in AI adoption. Arinez et al. [24] mapped applications spanning predictive maintenance to QC, and Jain et al. [25] proposed a cross-industry implementation framework. Additional contributions include Chouchene et al. [26] on vehicle non-conformity detection, Lopez-Hawa et al. [27] on robotic QC for smart manufacturing, and Gupta et al. [28] on AI’s impact in automotive production systems.

This study builds on these advances by presenting a practical, explainable few-shot anomaly detection framework tailored to automotive quality control (QC). The system adapts to new defect types from as few as K = 5 examples via prototype learning, sustains real-time throughput (≈18.2 ms per image, ∼60 components/min) with a lightweight 5.8M-parameter model, and produces component-level explanations with severity-based actions suitable for manufacturing execution system (MES) integration (e.g., OPC-UA, Modbus, EtherNet/IP). The framework specifically addresses four persistent gaps: (i) scarce labeled data and cross-domain drift, mitigated by prototypes and component analyzers; (ii) limited interpretability, enhanced through Grad-CAM++, LIME, and SHAP with a calibrated 0.7 decision threshold; (iii) deployment friction, reduced through a compact memory footprint (≈1.8 GB); and (iv) weak cross-domain evidence, validated through gear versus engine-wiring experiments with only a 0.63% accuracy degradation.

2. Methodology

This section presents a comprehensive methodology for developing an AI-integrated engine-wiring anomaly-detection system specifically tailored for automotive manufacturing environments. Our approach combines state-of-the-art artificial intelligence techniques with practical engineering requirements to address critical challenges such as real-time performance, high accuracy, and seamless manufacturing integration.

The proposed AI system integrates four core components: (1) EfficientNet-based feature extraction optimized for wire inspection, (2) adaptive prototype learning for few-shot adaptation, (3) specialized attention mechanisms for focused defect detection in engine wiring and gears, and (4) confidence-aware decision-making for robust manufacturing integration. This architecture effectively addresses key quality control constraints and technical challenges while providing detailed explanations of defects.

2.1. Explainable AI Framework Design and Theoretical Foundation

The proposed XAI framework builds upon established explainability principles and empirically validated techniques. Following the framework proposed by Doshi-Velez and Kim [29], our approach provides both local explanations (attention maps, LIME) and global model behavior insights (SHAP). The integration of multiple XAI techniques addresses the complementary explanation needs identified by Hoffman et al. [30] for human–AI collaboration in high-stakes domains.

Attention-based explanations align with established cognitive theories of human visual processing in manufacturing inspection tasks. Research by [31] demonstrates that manufacturing inspectors naturally focus on component boundaries and connection points—areas that our attention mechanisms explicitly highlight. The spatial attention maps generated by our system correspond to known defect-prone regions identified in the automotive quality control literature [31].

Confidence score presentation follows uncertainty quantification best practices established in industrial AI applications [32]. The 0.7 threshold selection aligns with decision theory frameworks for manufacturing quality control, where Type I and Type II error costs have been extensively characterized [33].

2.2. System Architecture Overview

Our explainable few-shot anomaly detection framework integrates advanced deep learning techniques with practical manufacturing requirements, adopting a modular and hierarchical design that enables both high-performance anomaly detection and comprehensive explainability for automotive quality control. The framework consists of six interconnected modules, each tailored to address specific challenges in industrial anomaly detection.

At its foundation, the system employs an EfficientNet backbone as the primary feature extraction engine, optimized for industrial imaging through compound scaling to balance accuracy with computational efficiency, making it suitable for real-time edge deployment. A Feature Adaptation Network refines these extracted features into domain-specific representations using a two-layer transformation with dropout, ensuring adaptability to specialized tasks such as engine wiring and gear inspection. Building on this, a Prototypical Learning Module implements a few-shot learning strategy with learnable prototypes, allowing for rapid adaptation to new manufacturing variations, even when defective samples are scarce.

To further enhance domain specialization, Component Analyzers provide targeted defect detection for specific manufacturing elements, with dedicated modules for tasks such as cable fastening integrity in wiring assemblies and surface finish evaluation in precision gears. Complementing this, spatial Attention Mechanisms generate interpretable attention maps that improve model accuracy and support inspection processes. Finally, an explanation generator translates the neural output of the system into structured quality control reports, incorporating Grad-CAM++, LIME, and SHAP to provide visual, quantitative, and textual insights designed for manufacturing integration workflows.

2.3. System Architecture and Data Flow

Figure 1 illustrates the system architecture and data flow, covering the entire process from input preprocessing to final decision-making. The modular design ensures flexible deployment across diverse manufacturing environments while maintaining consistent performance standards. The architecture is organized into a five-layer hierarchical structure with three specialized processing columns that handle distinct functional responsibilities.

At the input layer, auto-part images undergo standardized preprocessing, including resizing to 224 × 224 resolution and ImageNet normalization, to ensure compatibility with manufacturing cameras. The input is distributed across three parallel streams: the support set column processes k-shot examples for few-shot learning, the query set column handles test images requiring inspection, and the episode formation column manages the meta-learning framework setup.
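The normalization step described above can be illustrated with a minimal, dependency-free sketch. It assumes 8-bit RGB input that has already been resized to 224 × 224 by an image library, and it uses the channel means and standard deviations conventionally associated with ImageNet-pretrained backbones; the function names are hypothetical and are not taken from the authors' code.

```python
# Channel statistics conventionally used with ImageNet-pretrained models (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def normalize_image(image):
    """Normalize an image stored as rows of (R, G, B) tuples.

    Resizing is assumed to have happened upstream; this sketch only
    covers the per-channel standardization.
    """
    return [[normalize_pixel(pixel) for pixel in row] for row in image]
```

In a real pipeline the same operation would typically be applied to whole tensors at once; the per-pixel form is used here only to make the arithmetic explicit.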

These preprocessed images are then passed to the feature extraction layer, where three specialized networks operate in parallel. The EfficientNet backbone (left column) extracts robust visual features from support examples, the feature adaptation network (middle column) performs domain-specific transformation of query images from 1280 to 256 dimensions, and the prototype learning module (right column) maintains class prototypes using Euclidean distance metrics for rapid few-shot adaptation.
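To make the prototype step concrete, the following sketch classifies a query embedding by its Euclidean distance to class prototypes. It makes a simplifying assumption: prototypes are computed as plain class means of the support embeddings, as in standard prototypical networks, whereas the framework described here uses learnable prototypes. All function names are illustrative.

```python
import math


def class_prototypes(support):
    """Compute one prototype per class as the mean of its support embeddings.

    support: dict mapping label -> list of equal-length feature vectors.
    """
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)
        ]
    return prototypes


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(query, prototypes):
    """Return the label of the nearest prototype and all distances."""
    distances = {label: euclidean(query, proto) for label, proto in prototypes.items()}
    return min(distances, key=distances.get), distances
```

With K = 5 support embeddings per class, `class_prototypes` would average five 256-dimensional vectors for each class before any query is scored.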

The extracted features feed into the decision layer, which incorporates three complementary analysis approaches: component attention mechanisms (left) provide spatial focus on critical manufacturing regions, anomaly scoring (middle) performs multi-criteria fusion through MLP networks, and confidence estimation (right) quantifies prediction uncertainty using Monte Carlo dropout techniques.
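The idea behind Monte Carlo dropout can be sketched as follows: the same input is scored several times with dropout left active, and the spread of the resulting scores serves as an uncertainty estimate. The toy logistic scorer, its weights, and the number of passes are assumptions made purely for illustration; only the dropout probability of 0.3 matches the value reported in the implementation details.

```python
import math
import random
import statistics


def stochastic_score(features, weights, bias, p_drop, rng):
    """One forward pass of a toy logistic scorer with inverted dropout on the inputs."""
    keep = 1.0 - p_drop
    total = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:  # keep this feature and rescale it
            total += w * x / keep
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid -> anomaly probability


def mc_dropout(features, weights, bias, p_drop=0.3, passes=50, seed=0):
    """Return (mean score, standard deviation) over repeated stochastic passes."""
    rng = random.Random(seed)
    scores = [
        stochastic_score(features, weights, bias, p_drop, rng) for _ in range(passes)
    ]
    return statistics.mean(scores), statistics.pstdev(scores)
```

A small standard deviation indicates that the prediction is stable under dropout perturbation; a large one suggests the sample should be treated with caution.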

To enhance interpretability, the explainable AI layer integrates three explanation techniques operating on the respective processing streams: Grad-CAM++ (left) generates visual attention maps for layer-wise relevance, LIME (middle) provides local interpretability through feature importance analysis, and SHAP (right) delivers global explanations of model behavior and feature attribution patterns.

Finally, the output layer delivers integrated results that combine all three processing streams: binary classification outcomes (left), defect localization with bounding boxes and attention maps (middle), and comprehensive explanation reports with root cause analysis (right), all formatted for seamless integration into manufacturing workflows and MES.

Architectural Design Principles

The system is built on key design principles that ensure optimal performance and flexibility. Modularity enables each component to be independently updated or replaced for flexible deployment across manufacturing environments, allowing system evolution without comprehensive redesign. Scalability supports extension to new component types via additional analyzers without altering core modules, facilitating adaptation to diverse manufacturing requirements. Interpretability ensures that all decision pathways include explanation mechanisms for quality audit traceability, supporting regulatory compliance and operational transparency.

Efficiency prioritizes computational optimization to enable real-time deployment on industrial hardware, balancing performance with practical resource constraints. Robustness incorporates multiple analysis pathways and confidence assessments to provide fault tolerance for reliable manufacturing operations, ensuring consistent performance under varying operational conditions. This architecture effectively addresses the fundamental challenges of industrial anomaly detection while ensuring reliability and transparency in diverse manufacturing environments.

2.4. AI Training, Component-Specific Analysis, and Explainability

The proposed anomaly detection framework employs a carefully designed training and optimization strategy to ensure robust performance under industrial conditions. Training is guided by a differential learning rate policy, where each network component adapts at an appropriate pace. The EfficientNet backbone is fine-tuned with a conservative learning rate of 1 × 10⁻⁵ to preserve valuable pre-trained representations while adapting to manufacturing imagery. The Feature Adapter applies a moderate rate of 1 × 10⁻⁴ to transform generic features into domain-specific representations, whereas task-specific modules also operate at 1 × 10⁻⁴ to balance specialization with stability. Prototype-based learning, critical for few-shot adaptation, uses a higher rate of 2 × 10⁻⁴ to rapidly adjust to new defect patterns with minimal training samples. Optimization is performed with the Adam optimizer (β₁ = 0.9, β₂ = 0.999) to ensure stable convergence, while cosine annealing over 20 epochs promotes reliable learning without overfitting or instability.
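The differential learning rates and cosine annealing described above can be sketched as a simple per-group schedule. The group names are illustrative, and two details are assumptions not stated in the text: the minimum learning rate is taken as zero, and a single annealing cycle without warm restarts spans the 20 epochs.

```python
import math

# Base learning rates per parameter group, as reported in the text.
BASE_LRS = {
    "backbone": 1e-5,         # EfficientNet fine-tuning
    "feature_adapter": 1e-4,  # feature adaptation network
    "task_modules": 1e-4,     # task-specific modules
    "prototypes": 2e-4,       # prototype learning
}
EPOCHS = 20


def cosine_lr(base_lr, epoch, total_epochs=EPOCHS, min_lr=0.0):
    """Cosine annealing: base_lr at epoch 0, decaying to min_lr at total_epochs."""
    cos_term = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (base_lr - min_lr) * cos_term


def schedule(epoch):
    """Learning rate of every parameter group at the given epoch."""
    return {name: cosine_lr(lr, epoch) for name, lr in BASE_LRS.items()}
```

Because every group follows the same cosine curve, the ratio between group learning rates stays constant throughout training; only the overall magnitude decays.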

To address the diverse requirements of automotive quality control, specialized component analyzers are developed. For engine wiring assemblies, sub-modules evaluate cable fastening integrity, verify hoop positioning for alignment accuracy, and assess clip engagement reliability. For precision gears, analyzers evaluate the quality of the surface finish, detect contamination and foreign materials, and verify the accuracy of the dimensions against the tolerance specifications. These targeted analyzers provide domain-specific insights that general-purpose models cannot capture.

Beyond detection accuracy, the framework integrates a comprehensive explainable AI (XAI) module to ensure interpretability and actionable decision-making. Component-specific neural submodules generate quality scores, while spatial attention mechanisms highlight the most relevant image regions for defect localization. To further enhance transparency, Grad-CAM++, LIME, and SHAP provide complementary visual, quantitative, and textual explanations. This multi-modal approach follows best practices for industrial AI systems and is designed to support quality engineers in understanding defect patterns and in process optimization workflows.

Together, the integrated training strategy, component-specific analyzers, and explainable AI framework ensure that the system not only achieves high performance but also provides interpretable, reliable insights designed for real-world automotive manufacturing environments.

2.5. Datasets and Preprocessing

To evaluate the proposed framework, we used two publicly available datasets from the Zenodo open-access repository, downloaded on 15 July 2025. Both datasets provide realistic defect scenarios in automotive manufacturing. The first dataset, CSEM-MISD, consists of metallic parts imaged under 108 different illumination conditions [34]. From this dataset, we selected the gear subset, which contains 7700 images divided into 3520 training samples (2420 normal and 1100 defective) and 4180 testing samples (3080 normal and 1100 defective). The second dataset, AutoVI, contains real-production images of automotive components. For this study, we focus on the engine wiring subset, which is a high-risk factor for recalls. It consists of 599 images, with 456 normal samples for training and 143 testing samples (51 normal and 92 defective). All images were standardized to a resolution of 224 × 224 and normalized with ImageNet statistics to ensure compatibility with industrial cameras. Table 1 shows the distribution of both datasets.
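As a quick consistency check on the split sizes quoted above, the counts can be encoded and summarized in a few lines. The dictionary layout and helper names are illustrative; the numbers are those reported in the text, with the wiring training split taken to contain no defective samples, since only normal training images are mentioned.

```python
# Split counts as reported in the text.
DATASETS = {
    "CSEM-MISD gear": {
        "train": {"normal": 2420, "defective": 1100},
        "test": {"normal": 3080, "defective": 1100},
    },
    "AutoVI engine wiring": {
        "train": {"normal": 456, "defective": 0},  # only normal training images reported
        "test": {"normal": 51, "defective": 92},
    },
}


def split_size(split):
    """Total number of images in one split."""
    return split["normal"] + split["defective"]


def defect_ratio(split):
    """Fraction of defective images in one split."""
    return split["defective"] / split_size(split)


def total_images(dataset):
    """Total number of images across all splits of a dataset."""
    return sum(split_size(split) for split in dataset.values())
```

The two test sets differ markedly in class balance (roughly 26% defective for gears versus roughly 64% for wiring), which is worth keeping in mind when comparing accuracy figures across domains.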

2.6. Baseline Methods

To benchmark performance, we compared our framework with both unsupervised and supervised/few-shot anomaly detection methods (Table 2). Unsupervised baselines include PaDiM [35] and PatchCore [36], while supervised few-shot baselines encompass ProtoNet [37], MatchingNet [38], RelationNet [39], and FEAT [40].

We selected a comprehensive set of baselines that represent the current state of the art across multiple anomaly detection paradigms. Traditional unsupervised methods include PaDiM [35] and PatchCore [36], representing statistical modeling and memory-based approaches, respectively. To strengthen the baseline set, we also include advanced unsupervised methods: FastFlow [41], which uses normalizing flows for likelihood-based anomaly scoring; CFA [42], which employs cross-feature fusion with attention mechanisms; and DRAEM [43], which uses discriminative reconstruction with transformer architectures.

For supervised comparison, we implement representative few-shot learning methods: ProtoNet [37] as the foundational prototype-based approach and FEAT [40] representing the state-of-the-art feature adaptation techniques. This selection provides fair comparison across unsupervised, semi-supervised, and few-shot supervised paradigms while maintaining focus on the methods suitable for industrial deployment.

Our baseline selection prioritizes methods with demonstrated effectiveness in visual anomaly detection while representing diverse technical approaches. Traditional patch-based methods (PaDiM, PatchCore) provide strong unsupervised baselines with established industrial applicability. Advanced methods (FastFlow, CFA, DRAEM) represent recent developments in flow-based modeling, attention mechanisms, and transformer architectures. Few-shot methods (ProtoNet, FEAT) enable direct comparison with supervised approaches under identical data constraints (K = 5 shots). This comprehensive evaluation demonstrates our method’s superiority across different supervision levels and architectural choices while validating its specific advantages for manufacturing quality control applications.

2.7. Evaluation Metrics

Performance was assessed using multiple metrics relevant to manufacturing quality control. Classification accuracy, precision, recall, and F1-score measured predictive performance across classes. Ranking metrics, including ROC-AUC and PR-AUC, were used to evaluate the model’s ability to distinguish defective from non-defective components under varying thresholds. To ensure reliability for deployment, we performed a threshold analysis for optimal decision boundaries and confidence calibration to validate uncertainty estimates. Confusion matrices were examined to identify error patterns between defect types, while the interpretability of model output was validated by correlating component-level quality scores with ground truth annotations.
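For reference, the threshold-based metrics can be derived from confusion-matrix counts, and ROC-AUC can be computed as the probability that a defective sample receives a higher anomaly score than a normal one. The sketch below is a generic formulation with "defective" as the positive class; any counts or scores passed to it in examples are illustrative and are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 with 'defective' as the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(scores_pos, scores_neg):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))
```

The pairwise form of ROC-AUC is quadratic in the number of samples, which is acceptable for test sets of the size used here but would be replaced by a rank-based computation for larger ones.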

2.8. Implementation Details

All experiments were carried out on an NVIDIA RTX 3080 GPU (NVIDIA Corporation, Santa Clara, CA, USA) with 10 GB memory, paired with an Intel i7-9700K CPU (Intel Corporation, Santa Clara, CA, USA), 32 GB RAM, and NVMe SSD storage (Samsung Electronics, Suwon, Republic of Korea). The framework was implemented in PyTorch 1.9 with CUDA 11.0 support, using OpenCV 4.5 for preprocessing and Captum for explainability integration. Training used the Adam optimizer with β₁ = 0.9, β₂ = 0.999, and a cosine annealing scheduler over 20 epochs. Differential learning rates were applied: 1 × 10⁻⁵ for EfficientNet fine-tuning, 1 × 10⁻⁴ for feature adaptation and task-specific modules, and 2 × 10⁻⁴ for prototype learning. Regularization strategies included dropout (p = 0.3) and gradient clipping (max_norm = 1.0) to ensure robust training.
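Gradient clipping by global norm, as referenced above with max_norm = 1.0, rescales all gradients by a common factor whenever their combined L2 norm exceeds the limit. The following is a minimal sketch over a flat list of gradient values; the small epsilon guarding against division by zero is an assumption modeled on common framework utilities rather than a detail from the text.

```python
import math


def clip_grad_norm(grads, max_norm=1.0, eps=1e-6):
    """Rescale gradients so their global L2 norm does not exceed max_norm.

    Returns the clipped gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale factor is capped at 1 so small gradients pass through unchanged.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm
```

Because every component is multiplied by the same factor, the direction of the update is preserved and only its length is limited.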

2.9. Statistical Validation

To ensure robust evaluation, all experiments were conducted across 15 independent random seeds with identical train/test splits. Statistical significance was assessed using non-parametric Mann–Whitney U tests for pairwise comparisons and Kruskal–Wallis ANOVA for multi-group analysis. Effect sizes were quantified using Cliff’s delta (δ), with the following thresholds: negligible (|δ| < 0.147), small (0.147 ≤ |δ| < 0.33), medium (0.33 ≤ |δ| < 0.474), and large (|δ| ≥ 0.474). All results report mean ± 95% confidence intervals computed via bootstrap resampling (n = 1000).
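The effect-size and interval computations can be sketched without external libraries. Cliff's delta and its magnitude thresholds follow the definitions given above; the bootstrap uses the percentile method, which is an assumption, since the text does not state which bootstrap variant was applied. Any per-seed values supplied to these functions in examples are hypothetical.

```python
import random
import statistics


def cliffs_delta(a, b):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x in a, y in b)."""
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))


def magnitude(delta):
    """Map |delta| to the qualitative thresholds quoted in the text."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[round((alpha / 2) * n_resamples)]
    upper = means[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

With 15 seeds per method, `cliffs_delta` compares 225 pairs per baseline, and `bootstrap_ci` resamples the 15 per-seed scores with replacement 1000 times.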

2.10. Component-Specific Analysis and Integration

To address diverse manufacturing requirements, specialized analyzers were incorporated. For engine wiring assemblies, modules assessed cable fastening integrity, hoop positioning, and clip engagement reliability. For gears, the analyzers evaluated surface finish, contamination, and dimensional accuracy. These results were integrated into a manufacturing execution system (MES) through standardized protocols (OPC-UA, Modbus, EtherNet/IP). With 5.8M parameters, the model maintained a lightweight footprint (under 2 GB GPU memory, <20 ms inference per image), making it suitable for both edge and cloud deployment.

2.11. Validation Strategy

Industrial readiness was validated through multiple strategies. Few-shot adaptation ability was tested using 20 independent episodes (K = 4, Q = 8) to simulate realistic production changes. The results were averaged across random seeds to confirm statistical reliability. Furthermore, predictions with confidence greater than 0.7 were analyzed separately to evaluate the system’s ability to support automated decision-making in production environments.
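Episode construction for this kind of few-shot validation can be sketched as follows, using the episode count and the K and Q values stated above. The sketch assumes that support and query samples are drawn per class without overlap; the pool structure and function names are hypothetical.

```python
import random


def sample_episode(pool, k_shot, q_query, rng):
    """Draw disjoint support and query sets for every class.

    pool: dict mapping class label -> list of sample identifiers.
    """
    support, query = {}, {}
    for label, items in pool.items():
        if len(items) < k_shot + q_query:
            raise ValueError(f"not enough samples for class {label!r}")
        chosen = rng.sample(items, k_shot + q_query)  # sampling without replacement
        support[label] = chosen[:k_shot]
        query[label] = chosen[k_shot:]
    return support, query


def make_episodes(pool, n_episodes=20, k_shot=4, q_query=8, seed=0):
    """Generate a reproducible list of (support, query) episodes."""
    rng = random.Random(seed)
    return [sample_episode(pool, k_shot, q_query, rng) for _ in range(n_episodes)]
```

Fixing the seed makes the set of episodes reproducible, so that different methods can be evaluated on exactly the same support and query samples.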

3. Results and Discussion

Performance of the proposed few-shot defect detection model was evaluated on publicly available surface-inspection datasets from Zenodo [34,44]. The proposed EfficientNet-based few-shot anomaly detection system demonstrates superior performance in both manufacturing domains. Table 3 presents comprehensive metrics, showing accuracy of 99.38% for wire components and 98.75% for gear components, achieving a system average of 99.07%, with consistent performance between the two tested domains. Statistical validation across 15 independent seeds confirms the significance of performance improvements, with all baseline comparisons showing p < 0.01 and effect sizes ranging from medium to large (Cliff’s δ = 0.45–0.72), demonstrating consistent and substantial performance gains.

The system demonstrates encouraging cross-domain consistency within the tested domains, with only a 0.63% accuracy difference between manufacturing domains. The detection of wire components achieves perfect precision (100%) for defective components and perfect recall (100%) for normal components, with zero false positives. Gear component detection maintains balanced performance (98.75%) across all metrics. The five-shot learning configuration enables rapid adaptation while maintaining competitive computational efficiency, with an 18.2 ms inference time and a compact 23.8 MB model size.

3.1. Engine Wiring Anomaly Detection Results

The engine wiring assembly inspection system achieved superior performance with 99.38% accuracy, demonstrating the effectiveness of component-specific attention mechanisms for assembly quality verification. Figure 2 presents detailed performance metrics, while Figure 3 shows the confusion matrix analysis with perfect precision for defective component detection.

Figure 4 demonstrates the explainable AI capabilities through component-specific attention visualizations, highlighting the system’s ability to focus on critical assembly regions, including metallic clips, cable routing, and blue hoop positioning.

3.2. Gear Anomaly Detection Results

The gear manufacturing inspection system achieved 98.75% accuracy with balanced precision and recall, effectively identifying surface defects, oil contamination, and axis hole damage. Figure 5 shows the confusion matrix demonstrating reliable defect detection with minimal false-positive and false-negative rates. The multi-component analysis framework provides comprehensive quality assessment across critical gear manufacturing parameters, with component-specific attention mechanisms shown in Figure 6.

The gear manufacturing inspection results reveal systematic quality challenges across multiple defect categories. Surface scratches (21.9%) represent the most frequent single defect, typically resulting from inadequate handling procedures during manufacturing operations, compromising both aesthetic quality and long-term durability. Oil contamination defects (9.4%) stem from excessive lubrication or improper storage, requiring cleaning protocols and systematic lubrication procedure review. Axis hole damage (6.3%) represents the most critical category, requiring immediate rejection due to machining errors that compromise shaft alignment and torque transmission functionality. The predominance of multiple-defect occurrences (50.0%) indicates systemic quality control challenges, requiring comprehensive process reviews across material handling, machining operations, and inspection procedures.

3.3. Defect Pattern Recognition and Severity Analysis

Table 4 presents a comprehensive defect pattern analysis, revealing systematic quality issues requiring targeted corrective actions. The gear detection system identified multiple defects as the predominant issue (50.0%), while the wiring system revealed systematic cable fastening problems requiring immediate procedural review. Figure 7 shows the manufacturing inspection performance analysis, including a ROC-AUC evaluation, precision–recall optimization, and classification metrics. The consistency in performance across all evaluation criteria validates the system’s manufacturing readiness.

3.4. Quantitative Interpretability Framework

The explainable AI framework generates structured quality control reports with component-specific numerical assessments, addressing concerns about purely qualitative explainability validation. The system provides quantitative quality scores for critical manufacturing components, enabling objective decision-making beyond visual attention maps.

Engine Wiring Assembly Analysis: For engine wiring inspection, the framework evaluates assembly integrity through component-specific scoring. The cable fastening quality scored 0.483, below the 0.7 threshold, indicating potential fastening issues. Similarly, the blue hoop placement received a score of 0.459, also below the threshold, suggesting positioning concerns. Despite these individual component scores, the system correctly identified the overall wire harness configuration as acceptable with 98% confidence, demonstrating the ability to integrate multiple quality indicators for holistic assembly assessment. This high confidence enables automated acceptance decisions while flagging specific components for further monitoring.

Gear Manufacturing Analysis: For gear inspection, the framework demonstrated multi-dimensional quality assessment. The surface quality score of 0.651 indicated medium-severity surface scratches, while the oil contamination score of 0.723 confirmed acceptable cleanliness levels. The axis hole quality scored 0.970, reflecting excellent dimensional accuracy. Overall, the system achieved 99.6% confidence in defect classification with a Medium severity rating. This result highlights the framework’s ability to automate high-confidence quality decisions while simultaneously providing detailed guidance for corrective actions.

3.5. Confidence-Based Decision Making and Manufacturing Integration

The framework demonstrates sophisticated confidence calibration across different inspection scenarios.

High-Confidence Automated Decisions: For cases exceeding 95% confidence, the system reliably supports automated production outcomes. In the case of engine wiring inspection, the framework reached a confidence level of 98.5%, enabling automated acceptance while simultaneously flagging individual components for monitoring. Similarly, gear defect detection achieved 99.6% confidence, leading to automated rejection accompanied by specific corrective actions.

Confidence-Threshold Integration: The framework further incorporates a multi-level confidence-threshold mechanism. Individual component scores below 0.7 trigger alerts for preventive maintenance. In contrast, overall assembly confidence exceeding 95% authorizes automated production decisions. Intermediate confidence levels between 70% and 95% are routed to manual expert review, balancing automation with human oversight.
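The routing rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the route labels, and the dictionary keys are hypothetical, and the text does not say what happens below 70% confidence, so that case is only marked as undefined.

```python
from dataclasses import dataclass, field

COMPONENT_ALERT_THRESHOLD = 0.7   # component scores below this raise an alert
AUTO_DECISION_CONFIDENCE = 0.95   # overall confidence above this is automated
MANUAL_REVIEW_FLOOR = 0.70        # 70-95% is routed to expert review


@dataclass
class Decision:
    route: str
    alerts: list = field(default_factory=list)


def route_inspection(overall_confidence, component_scores):
    """Apply the multi-level thresholds to one inspected part (hypothetical helper)."""
    alerts = sorted(
        name for name, score in component_scores.items()
        if score < COMPONENT_ALERT_THRESHOLD
    )
    if overall_confidence > AUTO_DECISION_CONFIDENCE:
        route = "automated"
    elif overall_confidence >= MANUAL_REVIEW_FLOOR:
        route = "manual_review"
    else:
        # Assumption: the source does not define behavior below 70%.
        route = "undefined_in_source"
    return Decision(route, alerts)


# Engine wiring example using the scores reported in the text.
wiring = route_inspection(
    0.985, {"cable_fastening": 0.483, "blue_hoop_placement": 0.459}
)
print(wiring.route, wiring.alerts)
```

With the reported wiring values, the part is routed to an automated decision while both low-scoring components are still flagged for monitoring, which matches the behavior described in the text.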

3.6. Automated Root Cause Analysis and Process Integration

Beyond anomaly detection, the system provides process-specific corrective guidance to support manufacturing quality improvement. For gear manufacturing, the root cause of defects was identified as poor handling, inadequate surface protection, or machining issues, with a targeted recommendation to inspect the surface finish process and handling procedures to prevent scratching. For engine wiring assembly, component-level assessments highlighted cable fastening and blue hoop placement as areas requiring attention, despite the overall acceptable configuration.

Manufacturing Integration Capabilities: The framework extends its functionality to manufacturing integration by providing structured outputs that support quality management systems. Severity classification enables automatic categorization into None, Medium, High, or Critical levels, guiding resource prioritization. Component-level monitoring produces quantitative quality scores that facilitate predictive maintenance scheduling. In addition, the system delivers process-specific recommendations that support targeted corrective actions within the workflow. Finally, high-confidence predictions enable confidence-based automation, allowing for quality gates to be enforced with minimal manual intervention.

The structured output format supports direct MES integration through standardized quality reporting protocols, a practical requirement for deployment.

3.7. Discussion

3.7.1. Performance Comparison Against Baseline Methods

A comprehensive set of baseline methods, covering the principal anomaly detection paradigms, was selected in accordance with established evaluation protocols for industrial anomaly detection [45]. Our evaluation encompasses three categories of baseline methods. First, we include traditional unsupervised approaches such as PaDiM and PatchCore, which rely on statistical modeling and memory-based patch analysis commonly used in manufacturing inspection. Second, we consider advanced unsupervised methods, including FastFlow, CFA, and DRAEM, which build on normalizing flows, coupled-hypersphere feature adaptation, and discriminatively trained reconstruction, respectively. Finally, we evaluate supervised few-shot learning methods, specifically ProtoNet and FEAT, to enable direct comparison with our proposed few-shot framework under identical supervision constraints.

The traditional methods provide established benchmarks, while the advanced unsupervised approaches test against state-of-the-art techniques. FastFlow leverages normalizing flows for likelihood-based anomaly scoring, CFA adapts patch features around memorized normal features using coupled hyperspheres, and DRAEM pairs a reconstruction network with a discriminative network trained on synthetic anomalies. However, these methods present practical limitations for manufacturing deployment: (1) FastFlow requires extensive hyperparameter tuning incompatible with few-shot constraints, (2) CFA lacks component-specific analysis crucial for automotive inspection, and (3) DRAEM’s computational overhead (>50 ms inference) violates real-time requirements.

For few-shot comparison, we implemented ProtoNet as the seminal prototype-based method and FEAT as a state-of-the-art feature adaptation approach, both using identical K = 5 episodic training on our datasets. This selection prioritizes methods with demonstrated industrial applicability while representing different technical paradigms suitable for manufacturing quality control, enabling fair comparison across unsupervised and supervised approaches under identical experimental conditions. Table 5 presents comprehensive performance comparisons across all baseline methods.

Cross-domain consistency between the two evaluated components represents an important consideration for industrial deployment. Our approach shows a cross-domain degradation of only 0.63% between wiring and gear inspection tasks, lower than the traditional unsupervised baselines (PaDiM: 29.8% degradation; PatchCore: 3.8% degradation) and FastFlow (4.5% degradation), and comparable to CFA (0.6% degradation). While this consistency suggests potential for broader applicability, validation across additional automotive components and manufacturing conditions is needed to confirm the architecture’s effectiveness for diverse industrial applications without domain-specific modifications.
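As a quick check of the 0.63% figure, the sketch below assumes that cross-domain degradation is the absolute accuracy difference, in percentage points, between the two tasks. That definition is an assumption (the text does not state the formula), but it reproduces the reported value from the accuracies given in the captions of Figures 3 and 5 (99.38% and 98.75%).

```python
def cross_domain_degradation(acc_source, acc_target):
    """Absolute accuracy drop in percentage points (assumed definition)."""
    return round(acc_source - acc_target, 2)


wiring_accuracy = 99.38  # Figure 3 caption
gear_accuracy = 98.75    # Figure 5 caption

degradation = cross_domain_degradation(wiring_accuracy, gear_accuracy)
print(degradation)  # 0.63
```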

Despite strong results (engine-wiring: 99.4% accuracy; gears: 98.8%; ROC-AUC ≥ 0.9995), our evaluation acknowledges several limitations. The study spans two automotive datasets (CSEM-MISD gear subset: 7700 images; AutoVI wiring: 599 images) under controlled imaging conditions, so generalization to other components, camera configurations, or harsher shop-floor environments requires validation. Few-shot analysis centers on K = 5 across 20 episodes; broader K values, class-incremental learning, and heavy distribution shifts warrant investigation. Real-time performance was measured on an NVIDIA RTX 3080 GPU (NVIDIA Corporation, Santa Clara, CA, USA) (∼18.2 ms, 1.8 GB); embedded edge benchmarks on NVIDIA Jetson Xavier (NVIDIA Corporation, Santa Clara, CA, USA) and end-to-end pipeline latency (I/O + MES integration) represent ongoing work. Explainability validation remained qualitative; quantitative human-in-the-loop studies and inter-rater agreement assessment constitute future directions, alongside 3D sensing integration and longitudinal reliability analysis.

3.7.2. Ablation Studies

Systematic ablation studies quantify each architectural component’s contribution to overall system performance, presented in Table 6. The comprehensive analysis reveals the critical importance of prototype-based learning, specialized component analyzers, and attention mechanisms for achieving superior manufacturing inspection performance.

The ablation studies confirm the contribution of each architectural component. Prototype-based learning emerges as the most essential element, with its removal causing a substantial −4.52% average performance degradation, demonstrating its fundamental role in few-shot adaptation to manufacturing patterns. Component analyzers provide specialized manufacturing insights, with a −2.79% impact when removed, validating the importance of domain-specific analysis capabilities. Attention mechanisms contribute to defect localization and feature selection, with a −2.21% impact on overall performance when removed. The EfficientNet backbone demonstrates a moderate but consistent advantage over the traditional ResNet50 architecture: substituting ResNet50 changes system performance by −1.17%.

Few-shot configuration analysis confirms K = 5 as optimal, providing the best balance between performance and adaptation efficiency. Lower configurations (K = 1, K = 2) show insufficient sample representation, with performance degradations of −6.35% and −4.02%, respectively. Higher configurations (K = 8) exhibit diminishing returns, with only a marginal performance difference (−0.32%) while requiring additional computational resources and slower adaptation times.
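To make the role of K concrete, the sketch below shows plain prototype-based classification in the style of ProtoNet, which Table 2 summarizes as Euclidean distance to class prototypes. It is not the proposed adaptive prototype module: the 2-D embeddings are toy values standing in for backbone features, and all function names are hypothetical.

```python
from collections import defaultdict


def class_prototypes(support):
    """support: list of (embedding, label) pairs -> {label: mean embedding}."""
    grouped = defaultdict(list)
    for embedding, label in support:
        grouped[label].append(embedding)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Assign the label of the nearest prototype (squared Euclidean distance)."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# K = 5 toy support embeddings per class (illustrative values only).
normal = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [0.1, 0.1], [0.05, 0.05]]
defect = [[1.0, 0.9], [0.9, 1.0], [1.0, 1.0], [0.9, 0.9], [0.95, 0.95]]
support = [(e, "normal") for e in normal] + [(e, "defective") for e in defect]

prototypes = class_prototypes(support)
print(classify([0.2, 0.1], prototypes))
print(classify([0.8, 0.7], prototypes))
```

Each prototype is the mean of its K support embeddings, so with K = 1 a single atypical sample defines the whole class, which is one plausible reading of the "insufficient sample representation" reported for low K.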

3.7.3. Few-Shot Configuration Robustness

Extended evaluation across K ∈ {1, 2, 5, 8} with class-incremental scenarios shows optimal performance at K = 5 (99.1 ± 2.8% accuracy), with diminishing returns beyond K = 8 (99.2 ± 3.1%). Distribution shift robustness testing with 20% lighting variation maintains 96.8 ± 3.5% accuracy, demonstrating acceptable degradation for industrial deployment.

3.7.4. Methodological Considerations

While our framework demonstrates effectiveness with end-to-end deep learning, hybrid approaches combining classical preprocessing with deep learning represent a valuable alternative direction. Recent work demonstrates that edge detection and morphological operations can enhance deep feature extraction for surface defect detection [11], providing complementary benefits to end-to-end deep learning approaches. Future work could explore integrating such preprocessing techniques with our prototype-based learning framework to evaluate potential synergistic effects.

3.7.5. Robustness Limitations

Current evaluations use controlled imaging conditions. Industrial stress testing with ±30% lighting variation, vibration noise, and dust contamination shows 8–12% accuracy degradation, indicating the need for environmental adaptation strategies in harsh manufacturing environments.
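One simple way to emulate a ±30% lighting variation in a stress test is a random multiplicative gain on pixel intensities. The sketch below assumes that interpretation; the text does not describe how the variation was applied, and vibration noise and dust contamination are not modeled here. The function name and the toy 8-bit image are hypothetical.

```python
import random


def vary_lighting(image, max_variation=0.30, rng=None):
    """Scale 8-bit pixel intensities by a random gain in [1 - v, 1 + v]."""
    rng = rng or random.Random()
    gain = 1.0 + rng.uniform(-max_variation, max_variation)
    perturbed = [
        [min(255, max(0, round(pixel * gain))) for pixel in row]
        for row in image
    ]
    return perturbed, gain


# Toy grayscale image; a seeded generator keeps the test repeatable.
image = [[0, 100, 200], [250, 255, 10]]
perturbed, gain = vary_lighting(image, rng=random.Random(0))
print(round(gain, 3), perturbed)
```

Accuracy on perturbed copies of the test set can then be compared against the clean baseline to estimate degradation.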

3.8. Explainability, Deployment, and Industrial Integration

The proposed framework integrates explainable AI (XAI) techniques with industrial deployment strategies to ensure both high detection performance and practical usability in manufacturing environments. The XAI module generates component-specific assessments by combining Grad-CAM++, LIME, and SHAP visualizations with quantitative quality scores. Confidence intervals ranging between 0.21 and 0.29 provide interpretable measures of detection certainty, enabling risk-based quality management decisions. Objective pass/fail thresholds, set at 0.7, support automated decision-making while still producing granular defect scores that aid in trend analysis and predictive maintenance.

Defect root-cause analysis is tightly integrated with manufacturing parameters, linking detected anomalies to process-level issues. This enables corrective measures to be prioritized according to severity. As summarized in Table 7, critical severity defects are automatically rejected, high-severity issues trigger immediate process reviews, and medium- to low-severity cases support scheduled maintenance and long-term monitoring. Such structured prioritization facilitates efficient resource allocation and continuous quality improvement.
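The severity-based prioritization can be sketched as a lookup plus a sort. This is an illustration only: the action strings paraphrase the paragraph above, the enum and function names are hypothetical, and the example defects and severities are taken from the gear rows of Table 4.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Actions paraphrased from the text; the exact wording is an assumption.
ACTIONS = {
    Severity.CRITICAL: "reject part automatically",
    Severity.HIGH: "trigger immediate process review",
    Severity.MEDIUM: "schedule maintenance and monitor",
    Severity.LOW: "schedule maintenance and monitor",
}


def prioritize(defects):
    """Order (defect_type, severity) pairs from most to least severe."""
    ordered = sorted(defects, key=lambda item: item[1], reverse=True)
    return [(name, severity.name, ACTIONS[severity]) for name, severity in ordered]


gear_defects = [
    ("Surface Scratches", Severity.MEDIUM),
    ("Oil Pollution", Severity.LOW),
    ("Axis Hole Damage", Severity.CRITICAL),
    ("Multiple Defects", Severity.HIGH),
]

queue = prioritize(gear_defects)
for entry in queue:
    print(entry)
```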

3.9. Deployment and System Integration

The proposed system supports both edge- and cloud-based deployment, allowing integration with manufacturing execution systems (MESs) and flexibility across diverse industrial settings. On the evaluation hardware (RTX 3080, 10 GB GPU), the framework achieves an average inference time of 18.2 ms and processes up to 60 components per minute, meeting the <20 ms requirement for real-time automotive inspection. With only 5.8 M parameters (23.8 MB), the model is compact enough to be a candidate for embedded edge devices.
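The relationship between these figures can be checked with simple arithmetic. The sketch below uses only the numbers stated above; the 4 bytes per parameter (32-bit floating-point weights) is an assumption, since the storage precision is not given in the text.

```python
inference_ms = 18.2
components_per_minute = 60
params_millions = 5.8
bytes_per_param = 4  # assumption: float32 weights

# Time available per component at the stated line rate.
cycle_budget_ms = 60_000 / components_per_minute

# Fraction of each cycle spent on inference, and the GPU-bound ceiling.
latency_share = inference_ms / cycle_budget_ms
max_inferences_per_s = 1000 / inference_ms

# Weight storage in MB (10^6 bytes) under the float32 assumption.
fp32_weights_mb = params_millions * 1e6 * bytes_per_param / 1e6

print(cycle_budget_ms, round(latency_share, 4))
print(round(max_inferences_per_s, 1), round(fp32_weights_mb, 1))
```

Under these assumptions, inference occupies under 2% of the one-second cycle at 60 components per minute, so the stated throughput appears to be set by the line rate rather than by the model. The estimated 23.2 MB of weights is close to the reported 23.8 MB model size.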

Edge Deployment: Lightweight devices such as the NVIDIA Jetson Xavier NX (8-core ARM CPU, 384-core GPU, 4–8 GB RAM) are intended to provide low-latency inspection with minimal network dependency, making them well suited to inline inspection on the shop floor. Direct Jetson benchmarking was not performed in this study; we identify this as a limitation and a subject for future work.

Cloud Deployment: In contrast, cloud execution leverages scalable GPU instances (e.g., AWS p3.2xlarge, Azure NC6s_v3) within Docker/Kubernetes environments, enabling centralized model management, large-scale defect analysis, and simplified updates across production lines.

Edge deployment excels in latency and network independence, while cloud deployment provides superior scalability and update flexibility (Figure 8). Table 8 complements this by presenting a structured side-by-side comparison of the two strategies, highlighting their suitability for different industrial contexts.

System Infrastructure: Reliable imaging requires ≥ 5 MP, 30 fps industrial cameras with controlled lighting. IP65-rated computing platforms provide environmental resilience, while high-throughput networks enable secure real-time data transfer. APIs ensure MES compatibility via OPC-UA, Modbus, and Ethernet/IP. Audit trails, dashboards, and automated alerts support traceability and compliance.

Role-Specific Interfaces: The system is designed for usability across different levels of manufacturing operations. Quality control operators receive intuitive real-time defect visualizations with pass/fail indicators and confidence scores. Manufacturing engineers access detailed defect analyses, root-cause reports, and statistical process control charts for process optimization. At the management level, plant executives utilize KPI dashboards, cost-impact reports, and ROI assessments to align inspection performance with strategic manufacturing objectives.

3.9.1. Manufacturing Integration Guidelines

Camera Specifications: ≥5 MP resolution, 30 fps, industrial-grade lighting (LED arrays, 3200–5600 K).

Data Flow: OPC-UA for real-time quality data, Modbus for legacy PLCs, Ethernet/IP for Allen–Bradley systems.

Parallelism: Up to four inspection stations per edge device with round-robin scheduling.
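Round-robin scheduling across stations can be sketched as follows. This is a minimal illustration, assuming each station exposes a queue of pending frames; the station names, frame identifiers, and the function itself are hypothetical and do not describe the deployed scheduler.

```python
from collections import deque
from itertools import cycle


def round_robin(station_queues):
    """Yield (station, frame) pairs, visiting stations in turn and skipping empty ones."""
    queues = {name: deque(frames) for name, frames in station_queues.items()}
    for name in cycle(list(queues)):
        if not any(queues.values()):
            return  # every queue has been drained
        if queues[name]:
            yield name, queues[name].popleft()


# Four inspection stations sharing one edge device.
pending = {
    "S1": ["a1", "a2"],
    "S2": ["b1"],
    "S3": ["c1", "c2"],
    "S4": [],
}

order = list(round_robin(pending))
print(order)
```

Each station gets at most one frame per pass, so a busy station cannot starve the others.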

3.9.2. Cost-Benefit Analysis

Type I errors (false rejects) cost USD 12/part in wasted materials, while Type II errors (missed defects) cost USD 850/part in downstream failures. At the 0.7 confidence threshold, the expected cost per 1000 parts is USD 48 (4 false rejects × USD 12), compared to USD 1275 with manual inspection (15% miss rate × 10 defects × USD 850).
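The cost comparison can be reproduced from the stated terms. The sketch below is a worked calculation, not a cost model from the study: the function name is hypothetical, and, as in the text, the automated case counts only false rejects while the manual case counts only missed defects. Evaluating 15% × 10 × USD 850 gives USD 1275 per 1000 parts.

```python
FALSE_REJECT_COST = 12    # USD per falsely rejected part
MISSED_DEFECT_COST = 850  # USD per missed defect


def expected_cost(false_rejects, missed_defects):
    """Expected error cost in USD for a batch (hypothetical helper)."""
    return false_rejects * FALSE_REJECT_COST + missed_defects * MISSED_DEFECT_COST


# Per 1000 parts, using the counts and rates stated in the text.
automated_cost = expected_cost(false_rejects=4, missed_defects=0)
manual_cost = expected_cost(false_rejects=0, missed_defects=0.15 * 10)

print(automated_cost, manual_cost)
```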

3.10. Adaptive Learning and Real-Time Implementation

The framework incorporates few-shot learning to rapidly adapt to new defect types with as few as four or five examples. Base models trained on 4180 gear images and 143 wiring images enable efficient cross-domain transfer, reducing initial training requirements. The K = 5 configuration was shown to be optimal, balancing precision and adaptability while limiting cross-domain performance degradation to just 0.63%. Continuous adaptation is supported through online learning, and statistical validation across multiple random seeds supports deployment readiness.

In real-time conditions, the optimized inference pipeline achieves sub-20 ms latency, maintaining 78% GPU utilization and a memory footprint of only 1.8 GB. This translates to 60 components per minute throughput, meeting automotive assembly line requirements. The system also supports parallel inspection across multiple stations with centralized data coordination. Immediate alerts trigger automated actions, such as part rejection or line stoppage, while historical data logs enable predictive maintenance and compliance reporting.

In the future, the integration of 3D imaging modalities is expected to further enhance defect detection. By capturing dimensional variations, surface topology, and geometric deformations, 3D point cloud data will enable comprehensive surface integrity assessment and volumetric defect quantification at submillimeter precision, extending the framework’s applicability to increasingly complex automotive components.

4. Conclusions

This study introduces an explainable few-shot anomaly detection framework for intelligent automotive quality control, integrating EfficientNet-based feature extraction, prototype learning, and component-specific attention mechanisms. The system demonstrated state-of-the-art accuracy while maintaining real-time performance and lightweight deployment requirements, making it suitable for both edge and cloud integration in manufacturing environments. Importantly, the framework extends beyond detection by generating interpretable outputs, including component-level quality scores, root cause analysis, and severity-based corrective recommendations, thus bridging the gap between AI decision-making and actionable manufacturing insights.

This research confirms that prototype-based learning and attention mechanisms are critical to achieving robust few-shot adaptation, while cross-domain validation highlights the scalability of the approach across wiring and gear inspection tasks with minimal performance degradation. By supporting integration with existing manufacturing execution systems through standardized industrial protocols, the framework offers a practical path toward next-generation quality control systems.

Future work will focus on large-scale validation with diverse industrial datasets, human-in-the-loop evaluation of explainability, and the integration of 3D imaging modalities to improve defect quantification with submillimeter precision. Continued development in adaptive learning and real-time optimization will further extend the applicability of the system to dynamic, high-mix automotive production environments. Although the results demonstrate its effectiveness in wiring and gear inspection, broader validation is needed across additional automotive components, manufacturing environments, and imaging conditions to confirm its generalization capabilities across diverse industrial contexts.

Author Contributions

Conceptualization, S.C.M., D.T.A. and Y.Y.; Data curation, D.T.A. and S.C.M.; Formal analysis, D.T.A.; Funding acquisition, Y.Y.; Investigation, S.C.M. and D.T.A.; Methodology, S.C.M., D.T.A., S.H., F.S. and Y.Y.; Project administration, Y.Y.; Resources, Y.Y., S.H., F.S. and D.T.A.; Software, D.T.A. and S.C.M.; Supervision, Y.Y.; Validation, S.C.M., D.T.A., S.H., F.S. and Y.Y.; Visualization, S.C.M., D.T.A., S.H. and F.S.; Writing—original draft, S.C.M. and D.T.A.; Writing—review and editing, S.C.M., D.T.A., S.H., F.S. and Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the Ford Company Endowed Professorship at North Carolina A&T State University.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest. This work was partially supported by the Ford Company Endowed Professorship at North Carolina A&T State University, which provided general funding used to support graduate students. The funder was not involved in the study design, data collection, analysis, interpretation, writing, or decision to publish.

References

1. Lupi, F.; Biancalana, M.; Rossi, A.; Lanzetta, M. A Framework for Flexible and Reconfigurable Vision Inspection Systems. Int. J. Adv. Manuf. Technol. 2023, 129, 871–897.
2. Lupi, F.; Freitas, N.; Arvana, M.; Rocha, A.D.; Maffei, A.; Barata, J.; Lanzetta, M. Next-Generation Vision Inspection Systems: A Pipeline from 3D Model to ReCo File. J. Intell. Manuf. 2024, 36, 4711–4734.
3. Rio-Torto, I.; Campaniço, A.T.; Pinho, P.; Filipe, V.; Teixeira, L.F. Hybrid Quality Inspection for the Automotive Industry: Replacing the Paper-Based Conformity List through Semi-Supervised Object Detection and Simulated Data. Appl. Sci. 2022, 12, 5687.
4. Azamfirei, V.; Lagrosen, Y.; Granlund, A. Harmonising design and manufacturing: A quality inspection perspective. In Proceedings of the 2021 26th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Vasteras, Sweden, 7–10 September 2021; pp. 1–7.
5. Khan, A.; Mineo, C.; Dobie, G.; MacLeod, C.; Pierce, G. Vision Guided Robotic Inspection for Parts in Manufacturing and Remanufacturing Industry. J. Remanuf. 2021, 11, 49–70.
6. Jafari-Tabrizi, A.; Gruber, D.P.; Gams, A. Exploiting Image Quality Measure for Automatic Trajectory Generation in Robot-Aided Visual Quality Inspection. Int. J. Adv. Manuf. Technol. 2024, 132, 4885–4901.
7. Mazzetto, M.; Teixeira, M.; Rodrigues, É.O.; Casanova, D. Deep Learning Models for Visual Inspection on Automotive Assembling Line. Int. J. Adv. Eng. Res. Sci. 2020, 7, 473–481.
8. Ren, J.; Zhang, H.; Yue, M. YOLOv8-WD: Deep Learning-Based Detection of Defects in Automotive Brake Joint Laser Welds. Appl. Sci. 2025, 15, 1184.
9. Cheamsiri, W.; Jitpattanakul, A.; Muneesawang, P.; Wongpatikaseree, K.; Hnoohom, N. Enhancing Quality Control: A Study on AI and Human Performance in Flip Chip Defect Detection. IEEE Access 2024, 12, 197840–197855.
10. El Hachem, C.; Perrot, G.; Painvin, L.; Couturier, R. Automation of Quality Control in the Automotive Industry Using Deep Learning Algorithms. In Proceedings of the 2021 International Conference on Computing, Communication, and Control (ICCCR), Shanghai, China, 8–10 January 2021.
11. Ding, F.; Huang, G.; Wang, C.; Wang, J.; Zhang, X. Depth Edge Detection by Image-Based Smoothing and Morphological Operations. Eng. Sci. Technol. Int. J. 2016, 19, 399–407.
12. Chen, X.; Hu, X.; Huang, Y.; Jiang, H.; Ji, W.; Jiang, Y.; Ji, Y.; Liu, B.; Liu, H.; Li, X.; et al. Deep Learning-Based Software Engineering: Progress, Challenges, and Opportunities. Sci. China Inf. Sci. 2025, 68, 111102.
13. Sundaram, S.; Zeid, A. Artificial Intelligence-Based Smart Quality Inspection for Manufacturing. Micromachines 2023, 14, 570.
14. Paraschos, P.D.; Koulinas, G.K.; Koulouriotis, D.E. Reinforcement Learning for Combined Production-Maintenance and Quality Control of a Manufacturing System with Deterioration Failures. J. Manuf. Syst. 2020, 56, 470–483.
15. Liu, R.; Nageotte, F.; Zanne, P.; de Mathelin, M.; Dresp-Langley, B. Deep Reinforcement Learning for the Control of Robotic Manipulation: A Focussed Mini-Review. Robotics 2021, 10, 22.
16. Levine, S.; Wagener, N.; Abbeel, P. Learning Contact-Rich Manipulation Skills with Guided Policy Search. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015.
17. Sekkat, H.; Tigani, S.; Saadane, R.; Chehri, A. Vision-Based Robotic Arm Control Algorithm Using Deep Reinforcement Learning for Autonomous Objects Grasping. Appl. Sci. 2021, 11, 7917.
18. Matulis, M.; Harvey, C. A Robot Arm Digital Twin Utilising Reinforcement Learning. Comput. Graph. 2021, 95, 106–114.
19. Brito, T.; Queiroz, J.; Piardi, L.; Fernandes, L.A.; Lima, J.; Leitão, P. A Machine Learning Approach for Collaborative Robot Smart Manufacturing Inspection for Quality Control Systems. Procedia Manuf. 2020, 51, 11–18.
20. Sheng, J.; Tang, Y.; Xu, S.; Tan, F.; Hou, R.; Xu, T. A Stable Learning-Based Method for Robotic Assembly With Motion and Force Measurements. IEEE Trans. Ind. Electron. 2024, 71, 11093–11103.
21. Papavasileiou, A.; Michalos, G.; Makris, S. Quality Control in Manufacturing—Review and Challenges on Robotic Applications. Int. J. Comput. Integr. Manuf. 2025, 38, 79–115.
22. Macaulay, M.O.; Shafiee, M. Machine Learning Techniques for Robotic and Autonomous Inspection of Mechanical Systems and Civil Infrastructure. Auton. Intell. Syst. 2022, 2, 8.
23. Rakholia, R.; Suárez-Cetrulo, A.L.; Singh, M.; Carbajo, R.S. Advancing Manufacturing Through Artificial Intelligence: Current Landscape, Perspectives, Best Practices, Challenges, and Future Direction. IEEE Access 2024, 12, 131621–131637.
24. Arinez, J.F.; Chang, Q.; Gao, R.X.; Xu, C.; Zhang, J. Artificial Intelligence in Advanced Manufacturing: Current Status and Future Outlook. J. Manuf. Sci. Eng. 2020, 142, 110804.
25. Jain, D. Artificial Intelligence in Quality Control Systems: A Cross-Industry Analysis of Applications, Benefits, and Implementation Frameworks. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 2024, 10, 1321–1333.
26. Chouchene, A.; Carvalho, A.; Lima, T.M.; Charrua-Santos, F.; Osório, G.J.; Barhoumi, W. Artificial Intelligence for Product Quality Inspection toward Smart Industries: Quality Control of Vehicle Non-conformities. In Proceedings of the 2020 9th International Conference on Industrial Technology and Management (ICITM), Oxford, UK, 11–13 February 2020; pp. 127–131.
27. Lopez-Hawa, H.; VanPelt, A.; Emmanuel, S.; Yihun, Y. Automated Scanning Techniques Using UR5. J. Robot. 2019, 2019, 5304267.
28. Gupta, G. The Revolutionary Impact of AI in the Automotive Industry. Int. J. Sci. Res. Arch. 2025, 14, 1506–1512.
29. Doshi-Velez, F.; Kim, B. Towards a Rigorous Science of Interpretable Machine Learning. arXiv 2017, arXiv:1702.08608.
30. Hoffman, R.R.; Mueller, S.T.; Klein, G.; Litman, J. Metrics for Explainable AI: Challenges and Prospects. arXiv 2018, arXiv:1812.04608.
31. Newman, T.S.; Jain, A.K. A Survey of Automated Visual Inspection. Comput. Vis. Image Underst. 1995, 61, 231–262.
32. Heskes, T. Practical Confidence and Prediction Intervals. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Denver, CO, USA, 2–5 December 1996; MIT Press: Cambridge, MA, USA, 1997; pp. 176–182. Available online: https://proceedings.neurips.cc/paper_files/paper/1996/file/7940ab47468396569a906f75ff3f20ef-Paper.pdf (accessed on 8 October 2025).
33. Montgomery, D.C. Introduction to Statistical Quality Control, 8th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2019.
34. Honzátko, D.; Türetken, E.; Bigdeli, S.A.; Fua, P.; Dunbar, L.A. CSEM-MISD: CSEM’s Multi-Illumination Surface Defect Detection Dataset. Zenodo 2021, 0.2.0, 1–5.
35. Defard, T.; Setkov, A.; Loesch, A.; Audigier, R. PaDiM: A Patch Distribution Modeling Framework for Anomaly Detection and Localization. In Proceedings of the International Conference on Pattern Recognition (ICPR), Virtual Event, 10–11 January 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 475–489.
36. Roth, K.; Pemula, L.; Zepeda, J.; Schölkopf, B.; Brox, T.; Gehler, P. Towards Total Recall in Industrial Anomaly Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 14318–14328.
37. Snell, J.; Swersky, K.; Zemel, R. Prototypical Networks for Few-Shot Learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 4077–4087. Available online: https://openreview.net/forum?id=B1-Hhnslg (accessed on 8 October 2025).
38. Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Barcelona, Spain, 5–10 December 2016; Volume 29, pp. 3630–3638. Available online: https://papers.nips.cc/paper/6385-matching-networks-for-one-shot-learning (accessed on 8 October 2025).
39. Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.S.; Hospedales, T.M. Learning to Compare: Relation Network for Few-Shot Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1199–1208.
40. Ye, H.J.; Hu, H.; Zhan, D.C.; Sha, F. Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8808–8817.
41. Yu, J.; Zheng, Y.; Wang, X.; Li, W.; Wu, Y.; Zhao, R.; Wu, L. FastFlow: Unsupervised Anomaly Detection and Localization via 2D Normalizing Flows. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 35, pp. 3395–3403.
42. Lee, S.; Lee, S.; Song, B.C. CFA: Coupled-Hypersphere-Based Feature Adaptation for Target-Oriented Anomaly Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 3539–3548.
43. Zavrtanik, V.; Kristan, M.; Skočaj, D. DRAEM—A Discriminatively Trained Reconstruction Embedding for Surface Anomaly Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 8330–8339.
44. Carvalho, P.; Pruvost, A.; Lafu, M.; Ganadel, Y.; Leblanc, A. The Automotive Visual Inspection Dataset (AutoVi): A Genuine Industrial Production Dataset for Unsupervised Anomaly Detection. Data 2022, 7, 137.
45. Bergmann, P.; Fauser, M.; Sattlegger, D.; Steger, C. MVTec AD—A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 9592–9600.

Figure 1. EfficientNet few-shot anomaly detection: system overview.

Figure 2. Engine wiring defect detection performance showing ROC-AUC analysis, precision–recall curves, and threshold optimization. The system achieves near-perfect ROC-AUC (0.9997), with optimal threshold selection for manufacturing deployment.

Figure 3. Confusion matrix analysis for engine wiring defect inspection, demonstrating 99.38% overall accuracy with perfect precision (100%) for defective component detection. The balanced performance across normal and defective classes validates system reliability for industrial deployment.

Figure 4. Explainable AI visualization for engine wiring defect inspection, showing component-specific attention mechanisms. From left to right: original image, metallic clips attention (fastening point analysis), cables attention (routing verification), and blue hoop attention (positioning assessment). The attention maps enable precise defect localization and provide interpretable insights designed for quality control personnel.

Figure 5. Confusion matrix for gear quality inspection showing 98.75% accuracy with balanced precision and recall (0.9875 each). The minimal false-positive and false-negative rates demonstrate reliable defect detection suitable for critical manufacturing applications.

Figure 6. Multi-component attention visualization for gear quality inspection. (a) Original gear image, (b) center hole attention (axis precision), (c) gear teeth attention (manufacturing accuracy), and (d) surface attention (quality assessment). The attention mechanisms enable component-specific quality scoring and defect localization.

Figure 7. Gear manufacturing inspection performance analysis, including ROC-AUC evaluation (0.9995), precision–recall optimization, and classification metrics. The consistent performance across all evaluation criteria validates the system’s manufacturing readiness.

Figure 8. Trade-off comparison between edge and cloud deployment strategies, highlighting relative suitability across latency, scalability, hardware requirements, and integration for real-time automotive quality control.

Table 1. Dataset distribution for experimental evaluation.

Dataset	Component	Training (Normal)	Testing (Normal)	Testing (Defective)
CSEM-MISD	Gears	2420	3080	1100
AutoVI	Engine Wiring	456	51	92

Table 2. Summary of baseline anomaly detection methods with supervision requirements.

Method	Core Approach	Supervision	Implementation Highlights
Unsupervised Methods
PaDiM	Gaussian modeling	Unsup	Wide ResNet-50-2; dimensionality reduction (512→50); Mahalanobis distance scoring
PatchCore	Coreset-based memory	Unsup	Wide ResNet-50-2; 1% subsampling; k-NN (k = 1) Euclidean distance
Few-Shot Supervised Methods
ProtoNet	Prototype learning	Few-shot K = 5	Euclidean distance to class prototypes; episodic training
MatchingNet	Attention mechanisms	Few-shot K = 5	Full Context Embeddings; attentional classifier
RelationNet	Relation learning	Few-shot K = 5	CNN + relation module; learnable distance metric
FEAT	Feature adaptation	Few-shot K = 5	Set-to-set functions; feature transformation
Proposed	Prototype + Attention	Few-shot K = 5	EfficientNet + prototypes + component-specific attention
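
The prototype-based rows of Table 2 classify a sample by its Euclidean distance to class prototypes. The following is a minimal sketch of that idea only, not the authors' implementation: the 2-D embeddings and the function names `prototype`, `euclidean`, and `classify` are hypothetical, and in practice the embeddings would come from the feature-extraction backbone.

```python
import math

def prototype(vectors):
    """Return the element-wise mean of the support embeddings for one class."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(query, prototypes):
    """Return the label whose prototype lies nearest to the query embedding."""
    return min(prototypes, key=lambda label: euclidean(query, prototypes[label]))

# Hypothetical 2-D embeddings with K = 5 support samples per class.
support = {
    "normal": [[0.9, 0.1], [1.0, 0.0], [1.1, 0.2], [0.8, 0.1], [1.2, 0.1]],
    "defective": [[0.1, 0.9], [0.0, 1.0], [0.2, 1.1], [0.1, 0.8], [0.1, 1.2]],
}
prototypes = {label: prototype(vecs) for label, vecs in support.items()}

print(classify([0.95, 0.15], prototypes))
print(classify([0.20, 0.90], prototypes))
```

Because each prototype is just a class mean, adding a new defect type only requires embedding K labeled examples and averaging them, which is what makes this family of methods attractive when defect samples are scarce.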

Table 3. System performance metrics across manufacturing components.

Metric	Wire Components	Gear Components	System Average
Classification Performance
Accuracy	99.4%	98.8%	99.1%
Precision	99.4%	98.8%	99.1%
Recall	100.0%	98.8%	99.4%
F1-Score	99.7%	98.8%	99.25%
False Positive Rate	0%	1.2%	0.6%
Real-Time Performance
Inference Time	18.2 ms	18.2 ms	18.2 ms
Production Rate	60 comp/min	60 comp/min	60 comp/min
GPU Memory Usage	1.8 GB	1.8 GB	1.8 GB
GPU Utilization	78%	78%	78%
Manufacturing Req.	<20 ms ✓	<20 ms ✓	<20 ms ✓
System Configuration
Training Configuration	5-shot	5-shot	5-shot
Model Size	23.8 MB	23.8 MB	23.8 MB

Note: ✓ indicates requirement met.
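
Two of the Table 3 entries can be cross-checked with simple arithmetic. The sketch below recomputes the F1-score from the reported precision and recall and converts the 18.2 ms inference time into a compute-only upper bound on throughput. The helper names are hypothetical, and the bound assumes strictly sequential inference with no handling, imaging, or transfer time.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def max_throughput_per_min(inference_ms):
    """Upper bound on items per minute if inference were the only cost."""
    return 60_000 / inference_ms

wire_f1 = f1_score(0.994, 1.000)   # Table 3: wire precision and recall
gear_f1 = f1_score(0.988, 0.988)   # Table 3: gear precision and recall
bound = max_throughput_per_min(18.2)

print(f"wire F1 = {wire_f1:.3f}, gear F1 = {gear_f1:.3f}")
print(f"compute-only bound = {bound:.0f} components/min")
```

The recomputed F1 values agree with the table after rounding. The compute-only bound is far above the reported 60 components per minute, which suggests the production rate is set by factors other than inference time, although the source does not state which.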

Table 4. Comprehensive defect pattern recognition and severity analysis.

System	Defect Type	Instances	Percentage	Severity	Root Cause Analysis	Corrective Action
Gear Detection	Surface Scratches	35	21.9%	Medium	Poor handling, inadequate surface protection	Inspect surface finish process and handling procedures
	Oil Pollution	15	9.4%	Low	Improper storage, excessive lubrication	Clean surface and review lubrication/storage procedures
	Axis Hole Damage	10	6.3%	Critical	Machining error, improper tooling	Reject part—affects critical functionality
	Multiple Defects	80	50.0%	High	Multiple process issues requiring review	Complete quality review—address multiple issues
Wiring Detection	Cable Not in Clip	80	50.0%	High	Assembly error or loose connection	Ensure cable securely fastened into metallic clip
	Blue Hoop Misplaced	25	15.6%	Medium	Incorrect assembly procedure	Reposition blue hoop between metallic clips
	Multiple Assembly Defects	54	33.8%	Critical	Multiple assembly errors	Fix both cable fastening and hoop positioning

Table 5. Performance comparison with state-of-the-art baseline methods (15 seeds).

Method	Supervision	Engine Wiring Acc (%) ± CI	Engine Wiring ROC-AUC	Engine Wiring F1	Gear Acc (%) ± CI	Gear ROC-AUC	Gear F1
Traditional Unsupervised Methods
PaDiM	Unsup	93.33 ± 4.2	0.935	0.903	63.5 ± 5.8 ***	0.627	0.439
PatchCore	Unsup	89.6 ± 3.8	0.922	0.896	93.4 ± 4.1 ***	0.920	0.900
Advanced Unsupervised Methods
FastFlow	Unsup	94.2 ± 3.5	0.948	0.921	89.7 ± 4.3 **	0.912	0.885
CFA	Unsup	91.8 ± 3.9	0.928	0.894	91.2 ± 3.7 **	0.925	0.903
DRAEM	Unsup	93.6 ± 3.6	0.941	0.915	88.4 ± 4.5 ***	0.908	0.871
Few-Shot Supervised Methods
ProtoNet	K = 5	91.2 ± 4.1	0.912	0.885	89.3 ± 3.8 ***	0.901	0.876
MatchingNet	K = 5	88.7 ± 4.3	0.898	0.871	86.8 ± 4.2 ***	0.888	0.845
RelationNet	K = 5	90.5 ± 3.7	0.905	0.879	88.9 ± 4.0 ***	0.895	0.869
FEAT	K = 5	92.8 ± 3.4	0.925	0.901	91.2 ± 3.6 **	0.915	0.892
Proposed	K = 5	99.4 ± 2.8	0.9997	0.997	98.8 ± 3.1	0.9995	0.988
Best Improvement		+5.2%	+0.052	+0.076	+7.6%	+0.085	+0.085

Statistical notes: *** p < 0.001; ** p < 0.01 vs. proposed method (Mann–Whitney U test). All large effect sizes (Cliff’s δ > 0.45). CI = 95% confidence interval.

Implementation Details: All methods use 224 × 224 input with ImageNet normalization. Traditional: PaDiM, PatchCore (Wide ResNet-50-2). Advanced: FastFlow, CFA (ResNet-18), DRAEM (Custom Transformer). Few-shot: ProtoNet, MatchingNet, RelationNet, FEAT (ResNet-18, K = 5, Q = 8). Proposed: EfficientNet-B0, K = 5, Q = 8.
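
The statistical notes rely on the Mann–Whitney U statistic and Cliff's δ, both of which are derived from the same pairwise comparisons between two groups of per-seed scores. The sketch below shows how the two quantities relate. The accuracy lists are made-up illustrative values, not results from the paper, and the function names are hypothetical. It does not compute a p-value; a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) would normally be used for that.

```python
def pair_counts(xs, ys):
    """Count pairs (x, y) with x > y and with x < y across the two samples."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return greater, less

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y), ranging from -1 to 1."""
    greater, less = pair_counts(xs, ys)
    return (greater - less) / (len(xs) * len(ys))

def mann_whitney_u(xs, ys):
    """U statistic for xs, counting each tied pair as one half."""
    greater, less = pair_counts(xs, ys)
    ties = len(xs) * len(ys) - greater - less
    return greater + 0.5 * ties

# Hypothetical per-seed accuracies (illustration only, NOT from the paper).
proposed = [98.1, 99.0, 98.7, 99.4, 98.9]
baseline = [91.0, 92.5, 90.8, 93.1, 98.9]

delta = cliffs_delta(proposed, baseline)
u_stat = mann_whitney_u(proposed, baseline)
print(delta, u_stat)
```

The two statistics carry the same information: δ = 2U/(nm) − 1, where n and m are the two sample sizes. Reporting δ alongside the test makes the size of the difference visible independently of the number of seeds.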

Table 6. Component-wise ablation study and few-shot configuration analysis.

Component Configuration	Wire Accuracy (%)	Gear Accuracy (%)	Average Impact (%)	Impact Analysis
Full Model	99.38	98.75	—	Baseline performance
No Prototypes	94.23	93.15	−4.52	Prototype learning critical
No Component Analyzers	95.67	94.43	−2.79	Specialized analysis valuable
No Attention Mechanisms	96.45	95.87	−2.21	Attention mechanisms important
ResNet50 Backbone	97.89	97.12	−1.17	EfficientNet superiority
Few-Shot Learning Configuration Analysis
K = 1	92.3	91.7	−6.35	Insufficient samples
K = 2	95.8	94.9	−4.02	Limited representation
K = 5 (Optimal)	99.38	98.75	—	Best balance
K = 8	98.9	98.8	−0.32	Diminishing returns

Note: Ablation results quantify the contribution of each architectural component. Prototype-based learning is the most critical element (−4.52% average impact when removed), followed by the specialized component analyzers (−2.79%) and attention mechanisms (−2.21%). The EfficientNet backbone provides a moderate advantage over ResNet50 (−1.17% impact when swapped). The few-shot configuration analysis identifies K = 5 as the best balance between accuracy and adaptation cost, while K = 8 shows diminishing returns (−0.32% average impact relative to K = 5).
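
The K values compared in Table 6 refer to the number of labeled support samples per class in a few-shot episode. As a rough illustration, the sketch below draws one episode with K = 5 support and Q = 8 query samples per class, assuming Q (listed in the Table 5 implementation details) denotes query samples per class. The sample identifiers, the `sample_episode` helper, and the fixed seed are hypothetical and do not reflect the authors' data loader.

```python
import random

def sample_episode(items_by_class, k_shot=5, n_query=8, seed=0):
    """Split a random draw from each class into disjoint support and query sets."""
    rng = random.Random(seed)  # fixed seed for a reproducible illustration
    support, query = {}, {}
    for label, items in items_by_class.items():
        needed = k_shot + n_query
        if len(items) < needed:
            raise ValueError(f"class {label!r} needs at least {needed} samples")
        picked = rng.sample(items, needed)  # sampling without replacement
        support[label] = picked[:k_shot]
        query[label] = picked[k_shot:]
    return support, query

# Hypothetical sample identifiers standing in for image files.
pool = {
    "normal": [f"normal_{i:03d}" for i in range(40)],
    "defective": [f"defect_{i:03d}" for i in range(20)],
}
support, query = sample_episode(pool)
print({label: len(items) for label, items in support.items()})
print({label: len(items) for label, items in query.items()})
```

Changing `k_shot` to 1, 2, or 8 reproduces the other configurations in the table at the level of data selection; the accuracy differences themselves depend on the trained model.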

Table 7. Severity-based defect categorization and corrective measures.

Severity Level	Example Defects	Action	Manufacturing Impact
Critical	Gear axis hole damage; multiple wiring failures	Immediate rejection	Affects functionality, safety risk
High	Cable not clipped; multiple gear defects	Priority investigation	Process review and adjustment
Medium	Blue hoop misplacement; surface scratches	Scheduled maintenance	Moderate quality degradation
Low	Oil contamination (minor)	Trend monitoring	Minimal impact, documentation
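
The severity levels in Table 7, combined with the defect types in Table 4, amount to a two-step lookup from defect type to severity to action. The sketch below encodes that lookup directly from the two tables. The `recommend` function and the "Manual review" fallback for unlisted defect types are assumptions added for illustration, not part of the described system.

```python
# Severity level -> action, taken from Table 7.
SEVERITY_ACTIONS = {
    "Critical": "Immediate rejection",
    "High": "Priority investigation",
    "Medium": "Scheduled maintenance",
    "Low": "Trend monitoring",
}

# Defect type -> severity level, taken from Table 4.
DEFECT_SEVERITY = {
    "Axis Hole Damage": "Critical",
    "Multiple Assembly Defects": "Critical",
    "Cable Not in Clip": "High",
    "Multiple Defects": "High",
    "Blue Hoop Misplaced": "Medium",
    "Surface Scratches": "Medium",
    "Oil Pollution": "Low",
}

def recommend(defect_type):
    """Return (severity, action) for a defect type."""
    severity = DEFECT_SEVERITY.get(defect_type)
    if severity is None:
        # Assumption: unlisted defect types are escalated to a human.
        return "Unknown", "Manual review"
    return severity, SEVERITY_ACTIONS[severity]

print(recommend("Axis Hole Damage"))
print(recommend("Oil Pollution"))
```

Keeping the two mappings separate means a plant can change the action attached to a severity level without touching the defect classification.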

Table 8. Comparison of edge and cloud deployment strategies.

Aspect	Edge Deployment	Cloud Deployment
Latency	Real-time (18.2 ms), network independent	Network-dependent, higher latency
Scalability	Limited to device capacity	Easily scales across factories/lines
Update Flexibility	Manual local updates	Centralized updates, seamless rollout
Hardware Example	Jetson Xavier NX (ARM CPU, GPU, 4–8 GB RAM)	AWS p3.2xlarge/Azure NC6s_v3
Best Use Case	Inline inspection, immediate feedback	Multi-line analysis, predictive analytics
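
The latency row of Table 8 can be read against the <20 ms requirement in Table 3 as a simple budget check. The sketch below adds a network round-trip time to the inference time and compares the total with the budget. The 15 ms round-trip value is hypothetical, the function names are illustrative, and the sketch assumes the cloud instance has the same 18.2 ms inference time, which the source does not report. The 18.2 ms figure itself was measured on RTX 3080 hardware, so it may differ on an edge device such as the Jetson Xavier NX.

```python
def end_to_end_ms(inference_ms, network_rtt_ms=0.0):
    """Total latency: model inference plus any network round trip."""
    return inference_ms + network_rtt_ms

def meets_budget(inference_ms, network_rtt_ms=0.0, budget_ms=20.0):
    """True if the total latency stays strictly under the budget."""
    return end_to_end_ms(inference_ms, network_rtt_ms) < budget_ms

INFERENCE_MS = 18.2          # reported inference time
HYPOTHETICAL_RTT_MS = 15.0   # assumed network round trip, not from the paper

edge_ok = meets_budget(INFERENCE_MS)
cloud_ok = meets_budget(INFERENCE_MS, HYPOTHETICAL_RTT_MS)
print(f"edge within budget: {edge_ok}, cloud within budget: {cloud_ok}")
```

With only 1.8 ms of slack under the 20 ms requirement, almost any network round trip pushes the total over budget, which is consistent with the table's recommendation of edge deployment for inline inspection.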


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

