Review

Machine Learning-Powered Vision for Robotic Inspection in Manufacturing: A Review

by David Yevgeniy Patrashko and Vladimir Gurau *
Robotics Process Development Laboratory (RPDL), Georgia Southern University, Statesboro, GA 30458, USA
* Author to whom correspondence should be addressed.
Sensors 2026, 26(3), 788; https://doi.org/10.3390/s26030788
Submission received: 15 December 2025 / Revised: 22 January 2026 / Accepted: 23 January 2026 / Published: 24 January 2026
(This article belongs to the Section Intelligent Sensors)

Abstract

Machine learning (ML)-powered vision for robotic inspection has accelerated with smart manufacturing, enabling automated defect detection and classification and real-time process optimization. This review provides insight into the current landscape and state-of-the-art practices in smart manufacturing quality control (QC). More than 50 studies spanning across automotive, aerospace, assembly, and general manufacturing sectors demonstrate that ML-powered vision is technically viable for robotic inspection in manufacturing. The accuracy of defect detection and classification frequently exceeds 95%, with some vision systems achieving 98–100% accuracy in controlled environments. The vision systems use predominantly self-designed convolutional neural network (CNN) architectures, YOLO variants, or traditional ML vision models. However, 77% of implementations remain at the prototype or pilot scale, revealing systematic deployment barriers. A discussion is provided to address the specifics of the vision systems and the challenges that these technologies continue to face. Finally, recommendations for future directions in ML-powered vision for robotic inspection in manufacturing are provided.

Graphical Abstract

1. Introduction

Smart manufacturing uses real-time data and data-driven technologies such as artificial intelligence (AI), cloud connectivity, and the Industrial Internet of Things (IIoT) to increase the efficiency and agility of traditional manufacturing systems. It uses data from sensors, machines, and the wider supply chain to improve quality, optimize production, and respond in real time to changing demands and conditions in the factory, the supply network, and customer needs. Manufacturers are under pressure to adapt rapidly, and many are turning to smart manufacturing technologies to address labor shortages, skills gaps, and geopolitical and supply chain issues.
The current interest in smart manufacturing is reflected in the 2025 State of Manufacturing Report conducted by Rockwell Automation [1]. The report analyzed 1560 questionnaires sent to decision-makers in manufacturing industries around the world. A total of 95% of the respondents reported that they have either invested in, or plan to invest in, Machine Learning (ML), GenAI, or Causal AI in manufacturing in the next five years. Among respondents, 50% plan to use AI/ML in quality control (QC), 49% in cybersecurity, 42% in process optimization, 37% in robotics, and 36% in logistics.
Traditional machine vision (MV), or rule-based MV, has for decades been an essential tool in manufacturing, facilitating QC tasks such as gaging, defect detection, part sorting, or assembly verification through the detection and localization of parts. To achieve these tasks, industry-level MV uses techniques such as edge detection, template matching, color analysis, morphological operations, or stereo imaging. Traditional MV provides techniques for camera calibration [2,3], used to correct lens distortions and to convert image pixels to real-world coordinates. In 3D vision applications, MV provides techniques for stereo calibration [4,5], used to find the intrinsic parameters of each of the two cameras and the extrinsic parameters between them. When MV is used with robotic technology, it provides techniques for Hand–Eye calibration [6,7,8,9,10] or Robot–World–Hand–Eye calibration [11,12], used to determine the position and orientation of the coordinate system associated with the camera sensor relative to the coordinate system associated with the robot tool center point, and that of the target object relative to the camera sensor. These latter two techniques enable the camera to guide the robot in its work envelope and execute tasks.
Traditional MV works well for tasks with limited variability but falls short when handling large product variations or unpredictable defects. Recent advancements in AI have led to the emergence of a new approach, deep learning (DL)-enhanced MV, which offers greater flexibility and adaptability in real-world applications. Unlike traditional MV, DL-enhanced MV achieves significantly higher accuracy in applications with large product variations, unpredictable defects, or complex environments.
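To illustrate how the calibration results described above are typically used, the following minimal sketch chains homogeneous transforms to express a point measured in the camera frame in the robot base frame for an eye-in-hand setup. All matrices, numeric values, and function names are hypothetical illustration values; in a real cell the tool pose would come from the robot controller and the camera-to-tool transform from a Hand–Eye calibration.

```python
# Sketch: express a camera-frame point in the robot base frame.
# All transforms below are made-up values for illustration only.

def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform_point(t, p):
    """Apply a 4x4 homogeneous transform to a 3D point (x, y, z)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(t[i][k] * v[k] for k in range(4)) for i in range(3))

def translation(tx, ty, tz):
    """Pure translation as a 4x4 homogeneous matrix."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def rot_z_90():
    """Rotation of +90 degrees about the z axis."""
    return [[0.0, -1.0, 0.0, 0.0],
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

# Tool pose in the base frame (assumed to be reported by the controller).
base_T_tool = translation(0.5, 0.0, 0.4)
# Camera pose in the tool frame (assumed output of Hand-Eye calibration).
tool_T_cam = mat_mul(translation(0.0, 0.0, 0.1), rot_z_90())
# Chaining the two gives the camera pose in the base frame.
base_T_cam = mat_mul(base_T_tool, tool_T_cam)

# A defect located by the vision system in camera coordinates (meters).
defect_in_cam = (0.02, 0.0, 0.3)
defect_in_base = transform_point(base_T_cam, defect_in_cam)
```

The same chaining extends to the target object: multiplying by the object pose estimated in the camera frame yields the object pose in the base frame, which is what the robot needs to plan a motion.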
The integration of DL-enhanced MV with robotics has significantly boosted inspection capabilities even further. Unlike fixed-camera inspection systems, vision-guided robotics, also known as “eye-in-hand” systems, can dynamically adapt the inspection path around objects to navigate through confined spaces or to scan along irregular features. This flexibility further improves the QC efficiency in systems with large positional, dimensional, or visual variability. In these cases, the integrated ML algorithms have a two-fold beneficial impact on the efficiency of the QC operation: they enhance the detection capabilities of the vision system, and at the same time, provide the robot with the ability to perceive and understand its environment, allowing it to adapt in real time to changes and variations in the production line. The interested reader may find additional information on the ML-enhanced vision-based control of robots in manufacturing in [13,14,15,16,17,18,19,20,21].
The objective of this literature review is to obtain information on the current landscape and the state-of-the-art practices in smart manufacturing QC and to extract details of ML-enhanced vision for robotic inspection, regarding the following:
  • The manufacturing context, such as the industry sector or application domain, production environment characteristics (high-mix/low-volume, assembly line, etc.), integration with existing systems (Industry 4.0, IoT, collaborative robots), operational constraints or requirements, or the scale of implementation (prototype, pilot, full deployment).
  • The system implementation, such as the robot integration approach, the vision system used, camera type and specifications (2D, RGB-D, stereo), additional sensors used (structured light), data fusion approaches, etc.
  • The ML approaches used, such as specific algorithms, training methodology, data preprocessing and augmentation techniques, feature extraction methods, or model architecture.
  • Performance metrics, such as accuracy for detection/classification, false positives/negatives if reported, detection rates, comparison with baseline, processing speed or inference time, etc.

2. Review Approach

A first semantic search of 138 million academic papers was performed with the Elicit search engine, using the following query: “Machine learning-powered vision for robotic inspection in manufacturing”. The query retrieved 500 of the most relevant papers, which were screened against the following criteria:
  • Does the study involve robotic systems equipped with computer vision capabilities for inspection tasks?
  • Is the application specifically within manufacturing environments such as production lines, quality control, or assembly inspection?
  • Does the study explicitly incorporate ML algorithms for vision processing such as DL, neural networks (NN), or traditional ML approaches, rather than being purely rule-based or using only traditional image processing?
  • Does the research focus on inspection, quality control, defect detection, or monitoring applications?
  • Does the study report quantitative or qualitative performance outcomes with empirical validation?
  • Does the study include robotic integration rather than focusing solely on computer vision without robotics?
  • Is this a full research paper with substantial technical content rather than a conference abstract, editorial, opinion piece, or brief communication?
A large language model (LLM) was asked to extract data significant to this literature review, such as details about the ML approaches used, vision system, inspection application, performance metrics, manufacturing context, or system implementation. The search, followed by screening, identified 40 research papers examining ML-enhanced vision systems for robotic inspection across diverse manufacturing contexts.
The data extracted from each of the 40 publications were read and verified manually for consistency and correctness by the authors of this review, and 18 more papers were rejected based on the screening criteria. Papers were typically rejected at this stage when they focused on the use of ML-enhanced vision for robot manipulation rather than for QC. Interestingly, none of the final 22 papers described the use of vision systems for inspection in welding or additive manufacturing technologies.
A second semantic search was performed with the Elicit search engine, using the following query: “Machine learning-powered vision for robotic inspection in welding”. After a similar screening process and manual verification by the authors, an additional 18 research papers on vision inspection in robot welding were selected.
Finally, an additional 11 papers describing the use of ML-enhanced vision inspection in additive manufacturing were selected using the Google search engine.
A Sankey diagram illustrating the systematic literature selection process is shown in Figure 1.

3. Results

3.1. Characteristics of the Selected Studies

The systematic review identified 51 studies examining state-of-the-art ML-enhanced robotic inspection in general manufacturing [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43], in welding processes [44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61] and in additive manufacturing [62,63,64,65,66,67,68,69,70,71,72]. Table 1 presents the key characteristics of the studies addressing ML-enhanced inspection in general manufacturing processes.
Table 1 reveals substantial diversity in approaches across manufacturing sectors. The studies covered automotive (14% of the retrieved studies), aerospace (27%), assembly (10%), general manufacturing applications (27%), and other manufacturing sectors, including food processing, logistics/warehouse, molded part packaging, etc. (22%).
Most implementations (77%) remained at prototype or pilot scale, with only 23% achieving full industrial deployment. This suggests that while the technology shows promise, barriers to widespread adoption persist.
Table 2 presents the key characteristics of the studies addressing ML-enhanced vision systems for robotic welding inspection.
Among the welding studies, the general manufacturing sector represented the largest application domain, accounting for approximately 33% of the retrieved studies, followed by the automotive industry, at 22%, and other specialized sectors, including aerospace, nuclear, pipeline construction, infrastructure, etc., accounting for 45% of the studies.
Laser welding was the most frequently studied process, in 22% of studies, followed by Tungsten Inert Gas (TIG) welding (17%), arc welding (11%), and other welding processes, including fusion welding, resistance spot welding, or multilayer multi-pass welding, accounting for a total of 17% of the studies. A total of 33% of the studies did not specify the welding technology.
Production scale ranged from high-throughput mass production environments to the specialized inspection of critical components in aerospace manufacturing. Multiple studies emphasized real-time or inline inspection capabilities, addressing the need for immediate quality feedback in automated production lines.
Table 3 presents the key characteristics of the studies addressing ML-enhanced vision systems for inspection in additive manufacturing processes.
The studies spanned multiple AM processes: powder bed fusion (PBF) with both laser and electron beam was most common, with five studies, followed by directed energy deposition (DED), with two studies, material extrusion using filament fusion or liquid polymer extrusion using a syringe, with two studies, binder jetting, with one study, and other AM processes, with two studies.
Vision systems varied significantly in terms of the sensor type, positioning, and integration approach. The dominant sensor configuration was in situ camera monitoring, with positioning strategies including coaxial mounting aligned with the laser beam and off-axis mounting above the build chamber.

3.2. Machine Learning Technologies and Architectures

Both traditional ML and DL techniques are currently being used in vision inspection for manufacturing QC.
In traditional ML, vision detection and classification processes identify defects by comparing a set of their features, called a feature vector, to sets of features that are characteristic of classes of known defects. The classification process (Figure 2a) involves image preprocessing, feature vector extraction, feeding the feature vector to a classification engine, and evaluating the results. Training is achieved using a dataset of images of known defects and, generally, the larger the dataset, the more accurate the classification process is. Image preprocessing uses classic MV convolution transforms to filter the images, eliminating insignificant features and keeping only those that can be used for classification. The feature vectors can be extracted using algorithms such as the Histogram of Oriented Gradients (HOG), Histogram of Binary Patterns (HBP), etc. Feature vectors for classification based on color may include statistical functions of various color spaces, such as histograms, skewness, entropy, etc. Feature vectors for classification based on texture can be categorized as statistical (histograms, co-occurrence matrices, local binary descriptors, etc.), structural (edge features, morphological operations, etc.), model-based (fractal, random field, etc.), or transform-based (spectral, wavelet, curvelet, etc.). Traditional ML classification engines include Random Forest (RF), Support Vector Machines (SVM), Nearest Neighbor (1NN), K-Nearest Neighbor (KNN), Decision Tree (DT), single-layer Artificial Neural Networks (ANN), etc. Traditional ML works well with smaller, structured datasets, needs fewer computational resources, and offers better interpretability when used with simpler models that are easier to understand and explain. Its disadvantages are that it requires experts to select and transform the feature vectors from raw data, it may struggle with massive, complex, or unstructured datasets, and it may not effectively capture intricate patterns in high-dimensional data.
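The traditional ML pipeline described above (feature vector extraction followed by a classification engine) can be sketched in a few lines. The example below is a toy illustration under stated assumptions, not a method from any of the reviewed studies: the feature vector is a coarse gray-level histogram, the classifier is a plain KNN, and the tiny 2×2 "images" and the labels "ok" and "scratch" are invented for the example.

```python
import math
from collections import Counter

def histogram_features(image, bins=4, max_value=255):
    """Feature vector: normalized gray-level histogram of a 2D image."""
    counts = [0] * bins
    n_pixels = 0
    for row in image:
        for px in row:
            idx = min(px * bins // (max_value + 1), bins - 1)
            counts[idx] += 1
            n_pixels += 1
    return [c / n_pixels for c in counts]

def knn_predict(train_x, train_y, query, k=3):
    """Classify a feature vector by majority vote among its k nearest neighbors."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented training set: bright uniform parts vs. parts with dark scratches.
train_images = [
    [[200, 210], [205, 220]],
    [[190, 230], [240, 250]],
    [[200, 10], [20, 210]],
    [[30, 40], [220, 15]],
]
train_labels = ["ok", "ok", "scratch", "scratch"]
train_features = [histogram_features(img) for img in train_images]

# Classify a new (invented) image that contains two dark pixels.
query_image = [[205, 25], [15, 225]]
prediction = knn_predict(train_features, train_labels,
                         histogram_features(query_image))
```

In practice the histogram would be replaced by richer descriptors such as HOG or texture features, and the choice of feature is exactly the expert-driven step that the paragraph above identifies as the main drawback of traditional ML.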
Deep learning methods for defect detection and classification primarily use Convolutional Neural Networks (CNNs) to automatically learn features from raw pixels, moving beyond traditional methods like HOG or HBP. Key techniques involve hierarchical feature extraction in convolutional layers, training with large, labeled datasets (supervised learning), and using transfer learning (fine-tuning pre-trained models like AlexNet or VGG) for efficiency. Deep learning learns the feature vectors automatically from raw data, its performance improves significantly with larger datasets, it excels with unstructured data and complex patterns, and it is more adaptable to complex, large-scale problems. Its disadvantages are that it needs a massive amount of labeled data to perform well, it requires significant computational power, it is hard to interpret how decisions are made, and it is more difficult to implement and tune.
The steps of defect detection and classification processes using CNNs are shown in Figure 2b.
The basic structure of a CNN is shown in Figure 3. The CNN architecture contains an upstream feature vector extractor, also called the “backbone” or “body” of the network, and a downstream classifier, also called the “head” of the network. The backbone consists of convolutional layers, which apply convolutional operations to input images using filters or kernels to detect features such as edges, textures, and more complex patterns. The convolutional layers also pass their outputs through a nonlinear activation function, typically the Rectified Linear Unit (ReLU). Between convolutional layers, there are pooling layers, which downsample the feature maps and reduce the number of parameters in the network. The fully connected layers are responsible for making predictions based on the features learned by the previous layers and may use ReLU or SoftMax activation functions.
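The three backbone operations named above (convolution, ReLU activation, and pooling) can be illustrated with a minimal pure-Python sketch. This is a didactic toy, assuming a single-channel image, a hand-picked 2×2 vertical-edge kernel, and 2×2 max pooling; real networks learn their kernels during training and rely on libraries such as PyTorch or TensorFlow.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as computed by CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(feature_map):
    """Rectified Linear Unit: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool2x2(feature_map):
    """2x2 max pooling with stride 2 (a trailing odd row/column is dropped)."""
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, len(feature_map[0]) - 1, 2)]
            for r in range(0, len(feature_map) - 1, 2)]

# Invented 5x5 image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked kernel that responds to dark-to-bright transitions.
edge_kernel = [[-1, 1],
               [-1, 1]]

features = max_pool2x2(relu(conv2d(image, edge_kernel)))
```

The pooled map keeps a strong response only where the edge lies, which is the kind of compact feature the downstream fully connected "head" would consume.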
CNNs used in manufacturing inspection can be categorized into four groups based on their application:
  • Classic CNNs, which assign a single class label to an entire image. Some representative networks used for this application are AlexNet, ResNet, or VGGNet.
  • CNNs for defect detection and localization, which identify and locate defects with bounding boxes and assign individual class labels to each of them. Some representative networks used for defect detection and localization are R-CNN, Faster R-CNN, or YOLO.
  • CNNs for semantic segmentation, which assign a class label to each pixel in an image. They provide a holistic understanding of the image by segmenting it into meaningful semantic regions, without differentiating between individual object instances. Representative networks used for semantic segmentation are U-Net, FCN, DeepLab, PSP Net, or SegNet.
  • CNNs for instance segmentation, which combine elements of defect detection and semantic segmentation. They identify and delineate individual defect instances within an image at a detailed pixel level and assign class labels to each identified defect. Representative networks used for instance segmentation are Mask R-CNN, Cascade Mask R-CNN, SOLO, or YOLACT.
All four categories of CNNs can be used for defect detection and classification, but CNNs for defect detection and localization and those for instance segmentation have the additional function of localizing the defects within the image.
The statistical analysis results of selected papers categorized by the type of CNN architecture used in manufacturing vision inspection are shown in Figure 4.
Self-designed CNNs represent the most frequently used network architectures in vision inspection, followed by YOLO, traditional ML, and ResNet. The self-designed CNNs were combinations of a CNN with a Long Short-Term Memory (LSTM) network [39,44], combinations of a CNN with a Gated Recurrent Unit (GRU) network [45,46], or networks built with the Keras, TensorFlow, and PyTorch libraries [24,71,72].
Traditional ML used classification engines such as RF [25,59,62], SVM [36,47,49,51,56,62], perceptron [51], Multi-Layer Perceptron (MLP) [36,62], DT [49], 1NN [61], KNN [49,56], or a combination of Bag of Words (BoW) and SVM [63].

3.3. Machine Learning Model Assessment

The ML models’ ability to correctly identify and classify defects was evaluated based on precision, recall, overall accuracy, F1 score, intersection over union, average precision, and mean average precision.
The precision for class i, $P_i$, represents the probability that a defect classified into class i does belong to class i, and is calculated as the ratio of the number of defects classified correctly into class i to the total number of defects classified into class i:
$$P_i = \frac{TP_i}{TP_i + FP_i} \tag{1}$$
The recall for class i, $R_i$, is the probability that a defect is classified to the class to which it belongs, and is calculated as the ratio of the number of defects in class i classified correctly to the total number of defects that belong to class i:
$$R_i = \frac{TP_i}{TP_i + FN_i} \tag{2}$$
The overall classifier accuracy, $OA$, is defined as the total number of defects in the dataset classified correctly, $\sum_i TP_i$, divided by the total number of samples classified, $N$:
$$OA = \frac{\sum_{i=1}^{M} TP_i}{N} \tag{3}$$
In Equations (1)–(3), $TP_i$ represents the true positives for class i, or the number of defects belonging to class i that were correctly predicted as belonging to that class; $FP_i$ represents the false positives for class i, or the number of defects belonging to other classes that were incorrectly predicted as belonging to class i; $FN_i$ represents the false negatives for class i, or the number of defects belonging to class i that were incorrectly predicted as belonging to other classes; $N$ represents the total number of defects; and $M$ represents the number of classes.
The $F1_i$ score is the harmonic mean of precision and recall, and provides a balanced assessment of a model’s performance while considering both false positives and false negatives:
$$F1_i = \frac{2 \times P_i \times R_i}{P_i + R_i} \tag{4}$$
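As a worked illustration of the precision, recall, overall accuracy, and F1 definitions above, the following sketch derives all four from a confusion matrix. The function name, the matrix layout (rows are true classes, columns are predicted classes), and the two-class counts are assumptions made for this example only.

```python
def per_class_metrics(confusion):
    """Compute per-class precision, recall, F1 and the overall accuracy.

    confusion[t][p] = number of defects of true class t predicted as class p.
    """
    m = len(confusion)                           # number of classes, M
    n = sum(sum(row) for row in confusion)       # total number of defects, N
    metrics = []
    for i in range(m):
        tp = confusion[i][i]
        fp = sum(confusion[t][i] for t in range(m)) - tp   # column minus diagonal
        fn = sum(confusion[i]) - tp                        # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    overall_accuracy = sum(confusion[i][i] for i in range(m)) / n
    return metrics, overall_accuracy

# Invented two-class example: class 0 = "crack", class 1 = "pore".
confusion = [[8, 2],
             [1, 9]]
metrics, oa = per_class_metrics(confusion)
```

For this matrix, class 0 has 8 true positives, 1 false positive, and 2 false negatives, so its precision is 8/9 and its recall is 8/10, while the overall accuracy is 17/20.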
The intersection over union (IoU) plays a fundamental role in evaluating the accuracy of defect localization and quantifies the overlap between a predicted bounding box and a ground truth bounding box:
$$IoU = \frac{area(B_{predicted} \cap B_{actual})}{area(B_{predicted} \cup B_{actual})} \times 100\% \tag{5}$$
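The IoU definition above translates directly into code for axis-aligned bounding boxes. The sketch below assumes boxes given as (x_min, y_min, x_max, y_max) tuples, a convention chosen for this example, and returns the ratio as a fraction; multiply by 100 to obtain the percentage form used in the text.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); the result is in [0, 1].
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - intersection)
    return intersection / union if union > 0 else 0.0

# Invented example: two 10x10 boxes offset by 5 pixels in each direction.
predicted = (0, 0, 10, 10)
actual = (5, 5, 15, 15)
overlap = iou(predicted, actual)   # 25 / 175
```

Detection benchmarks usually count a prediction as a true positive only when its IoU with a ground truth box exceeds a chosen threshold, which is how IoU feeds into the precision and recall values discussed earlier.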
The average precision for class i, $AP_i$, represents the area under the precision–recall curve for class i and can be approximated using numerical integration.
The mean average precision, $mAP$, is the mean of the $AP_i$ values across all classes in the dataset and is calculated as follows:
$$mAP = \frac{1}{M} \sum_{i=1}^{M} AP_i \tag{6}$$
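One simple way to approximate the area under the precision–recall curve is a rectangle rule over the ranked detections, sketched below. This is a non-interpolated variant chosen for brevity; benchmark suites often use interpolated definitions, so values may differ slightly from published numbers. The input format, a list of (confidence, is_true_positive) pairs per class, is an assumption of this example.

```python
def average_precision(detections, num_ground_truth):
    """Approximate AP for one class with a rectangle rule.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of real defects of this class in the dataset.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth
        # Area of the rectangle added by this step along the recall axis.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

def mean_average_precision(ap_per_class):
    """mAP: arithmetic mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)

# Invented detections for one class with two ground truth defects.
ap_crack = average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)
```

Here the first detection contributes 0.5 × 1.0, the false positive contributes nothing, and the last detection contributes 0.5 × 2/3, giving an AP of about 0.83.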
A more robust way to assess model performance is cross-validation, with the most commonly used version being k-fold cross-validation [73]. When different ML algorithms need to be compared, the most-used approach is nested cross-validation [74].
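The k-fold procedure mentioned above can be sketched as follows: the dataset is shuffled, split into k folds, and each fold serves once as the test set while the remaining folds form the training set. The function names and the `evaluate` callback (which would train a model on the training indices and return a score on the test indices) are hypothetical placeholders.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
        yield train_idx, test_idx

def cross_validate(n_samples, k, evaluate):
    """Average the score returned by evaluate(train_idx, test_idx) over k folds."""
    scores = [evaluate(train, test)
              for train, test in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)

# Example with 10 samples and 3 folds; every sample is tested exactly once.
splits = list(k_fold_splits(10, 3))
```

Nested cross-validation wraps a second, inner loop of this kind around model selection inside each training set, so that hyperparameters are never tuned on the data used for the final score.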
In industrial robotic inspection, metrics (1)–(6) are highly sensitive to dataset composition, controlled environments, and validation protocols. Limited defect variability, class imbalance, controlled lighting, or lack of true production-scale testing represent potential sources of assessment bias. Readers should therefore interpret the reported ML model assessments with caution.
Performance metrics were reported across multiple criteria, though not all studies provided comprehensive quantitative results. Detection and classification accuracy formed the primary metric, with processing speed, coverage ratios, and comparisons with baseline methods also frequently reported.
Multiple studies achieved exceptionally high accuracy rates, exceeding 95%. Variz et al. [30] achieved near-100% accuracy using a self-designed CNN for vision quality control of Human–Machine Interface consoles. Ardiç et al. [35] reported 99.9% accuracy using an R-CNN for engine part inspection after four months of operation. Terras et al. [33] demonstrated a detection and classification accuracy of 98%, successfully processing more than 600 items with high efficiency and low computational cost. Their results were matched by Shaloo et al. [31], who used YOLOv8 for assembly inspection.
The mid-90s accuracy range was commonly observed. Zhou et al. [38] reported 94.95% recall with 92.35% precision for mesh screen inspection. Rajesh et al. [25] achieved 95% accuracy and 94% recall using an RF classification engine in vision inspection of gears. Yazid et al. [43] demonstrated 96% detection accuracy using YOLOv5, while Hussain et al. [32] obtained a 92.7% mean average precision for pallet racking inspection using a VGG16 network.
Lower accuracy ranges were observed in more challenging applications. Mueller et al. [37] reported 86% accuracy for online rivet classification using sensor data, improving to 97% with image-based classification using a self-designed CNN. Lee et al. [41] achieved 83.33% accuracy using an ENN classifier, correctly predicting five of six datasets in hole quality assessment.
When specific algorithm comparisons were provided, performance differences emerged. For engine part inspection, Ardiç et al. [35] found that Faster R-CNN achieved 0.994 average precision versus 0.955 for SSD. Kirda et al. [28] compared three algorithms for metal edge detection: YOLOv5 achieved 0.957 mean average precision, outperforming VGG16 at 0.942 and ResNet at 0.854. The superiority of ensemble methods (Knaak et al. [45] 99.5% F1 score; Knaak et al. [46] 95.2% F1 score) over single-model approaches stems from their ability to combine complementary error patterns. The spatiotemporal CNN-GRU architecture captured dynamic welding process features that pure spatial CNNs missed. Similarly, Fernandez et al. [44] found that the spatiotemporal CNN-LSTM architecture outperformed a pure spatial CNN (0.95 vs. 0.94 recall).

Context-Specific Performance Patterns

Performance outcomes cluster distinctly by application complexity and environmental conditions. Studies achieving the highest accuracy (>98%) predominantly addressed well-defined defect categories in controlled environments. O. Ardiç et al.’s [35] 99.9% accuracy for engine parts and N. Terras et al.’s [33] 98% for food products occurred in assembly line settings with consistent object presentation and minimal environmental variability.
In contrast, applications in unstructured or dynamic environments showed systematically lower performance. In the study by R. Mueller et al. [37], aircraft riveting inspection achieved only 86% accuracy when relying on real-time sensor data, improving to 97% with post-process image analysis, suggesting that temporal constraints in collaborative human–robot scenarios compromise detection reliability.
The study by L. Variz et al. [30] illustrates performance variability within a single system: while console classification and button defect detection approached 100% accuracy, face recognition only exceeded 50%. This dramatic difference reflects the fundamental distinction between inspecting manufactured components with consistent specifications and recognizing variable human features, suggesting that performance claims require careful scoping to specific subtasks rather than system-level averages.
The apparent contradiction of traditional ML outperforming DL in specific cases (e.g., in the study by S. Zhang et al. [56], KNN achieved 98% accuracy in 33 ms versus the slower CNN performance) resolves when considering feature space dimensionality. KNN excelled when discriminative features were already well understood and extracted using Gabor transforms and texture analysis, whereas DL demonstrated advantages when feature engineering was not feasible.

3.4. Machine Learning Architecture Trade-Offs

The prevalence of YOLO variants across studies reflects not superior fundamental performance but rather a favorable balance of speed, accuracy, and simplicity of deployment in industrial applications. Direct algorithmic comparisons reveal nuanced trade-offs rather than clear winners. O. Ardiç et al. [35] found that Faster R-CNN achieved higher average precision (0.994) than SSD (0.955) for engine inspection, yet SSD’s faster inference might prove preferable in high-throughput scenarios despite lower accuracy. A.W. Kirda et al.’s [28] comparison showed YOLOv5 (0.957 mAP) outperforming VGG16 (0.942) and ResNet (0.854), but the 0.015 advantage over VGG16 may not justify switching in systems already using the latter.

4. Future Directions

Future directions in ML-powered vision for manufacturing QC are driven by the demand to achieve higher accuracy, robustness, and reliability.

4.1. Use of Synthetic Training Images

Limited training data has emerged as a primary constraint on the performance and deployment of ML models, explaining why many high-performing vision systems remain at prototype scale. Transfer learning can compensate for limited datasets but does not eliminate data requirements for production reliability.
Synthetic training images can boost the reliability of machine learning vision by providing vast, perfectly labeled, diverse data, especially for rare edge cases that are hard to obtain in real life. Automated synthetic data generation enables rapid scaling, reduces bias, and protects privacy while delivering high-quality annotation. In computer vision, synthetic data generation uses advanced techniques like generative adversarial networks (GANs) and variational autoencoders (VAEs). These models learn patterns from real datasets and then produce new, artificial examples.

4.2. Use of Federated Machine Learning

A second direction with the potential to increase the size of training datasets while improving data privacy and security in manufacturing is federated learning. Federated Machine Learning is a decentralized AI training method that builds a shared model from data on many devices without moving the raw data, keeping sensitive information private. Instead of sending data to a central server, the model travels to the data, learns locally, and sends back only updates such as model parameters or gradients, which are aggregated to improve the main model.
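The aggregation step described above can be sketched with a sample-weighted average of the parameters returned by each site. This follows the general idea of federated averaging; the source does not name a specific aggregation rule, so the weighting scheme, the function name, and the two-factory example are assumptions made for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Aggregate locally trained parameters into a shared model.

    client_weights: one flat list of model parameters per client (same length).
    client_sizes: number of local training samples per client, used as weights.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
            for j in range(n_params)]

# Invented example: two factories train the same two-parameter model locally.
factory_a = [1.0, 2.0]   # parameters after local training on 100 images
factory_b = [3.0, 4.0]   # parameters after local training on 300 images
global_model = federated_average([factory_a, factory_b], [100, 300])
```

Only the parameter lists cross the network in this scheme; the inspection images themselves stay at each site, which is the privacy property the paragraph above emphasizes.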

4.3. Ensemble Learning

Ensemble learning combines multiple individual models to create a single, more powerful model, improving prediction accuracy, robustness, and generalization by leveraging collective “wisdom” over a single model. This review has already shown that hybrid spatiotemporal models such as CNN-GRU or CNN-LSTM outperform the pure spatial CNN models.
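Two common ways of combining the outputs of several models are majority (hard) voting and probability averaging (soft voting), sketched below. This illustrates ensembling across independent classifiers and is not a reproduction of the CNN-GRU or CNN-LSTM hybrids cited in this review; the class labels and probabilities are invented.

```python
from collections import Counter

def majority_vote(labels):
    """Hard voting: return the label predicted by most models."""
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probabilities):
    """Soft voting: average class probabilities and pick the most likely class.

    probabilities: one list of class probabilities per model.
    Returns (winning_class_index, averaged_probabilities).
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [sum(p[c] for p in probabilities) / n_models
                for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: averaged[c])
    return best, averaged

# Invented outputs of three models for one weld image.
hard = majority_vote(["crack", "pore", "crack"])
best_class, avg_probs = soft_vote([[0.6, 0.4],
                                   [0.3, 0.7],
                                   [0.2, 0.8]])
```

Soft voting lets a confident model outweigh two uncertain ones, whereas hard voting treats every model's decision equally; which behaves better depends on how well calibrated the individual models are.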

4.4. Self-Supervised Learning

Training on image datasets requires manually adding labels to objects in images, a time-consuming process called annotation. An emerging direction is Self-Supervised Learning (SSL), a type of ML where models learn to generate labels from the input data itself, eliminating the need for manually labeled images. SSL models for industrial vision train on unlabeled image/video data to learn features for tasks like defect detection, object recognition, and anomaly localization, reducing reliance on costly annotations. They enable powerful Vision Foundation Models like DINOv3 to adapt better to specialized industrial environments where labeled data is scarce. The key benefits of SSL models include lower costs, faster training, and improved performance on downstream tasks.
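The idea of generating labels from the data itself can be illustrated with rotation prediction, a classic pretext task in which the model must guess by how much an unlabeled image was rotated. This task is swapped in here purely as a simple example; it is not the training scheme of DINOv3 or of any study in this review, and the function names are hypothetical.

```python
def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def rotation_pretext_samples(image):
    """Create four (rotated_image, label) pairs from one unlabeled image.

    The label is the number of 90-degree clockwise rotations applied (0 to 3),
    so no human annotation is needed.
    """
    samples = []
    current = [list(row) for row in image]
    for label in range(4):
        samples.append((current, label))
        current = rotate90(current)
    return samples

# An invented unlabeled 2x2 "image" yields four self-labeled training samples.
unlabeled = [[1, 2],
             [3, 4]]
pretext_dataset = rotation_pretext_samples(unlabeled)
```

A network trained to predict these rotation labels has to learn object shape and orientation cues, and its backbone can then be fine-tuned for defect detection with far fewer annotated images.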

4.5. Visual–Language Models for Explainability

Visual–Language Models (VLMs) enhance explainability by generating human-readable descriptions, identifying important visual features, and providing step-by-step reasoning for complex tasks, thus bridging the gap between opaque AI decisions and user understanding. VLMs can generate natural-language descriptions of visual content, moving beyond simple labels to explain why objects are recognized. They can highlight specific image regions or features such as pixels or objects that are most influential in their decision-making process. VLMs use techniques such as Chain-of-Thought prompting to outline their reasoning steps, making complex logical paths such as puzzles and medical diagnosis understandable. Their benefit is in making AI decisions understandable to non-experts by translating complex computations into simple language, and uncovering false correlations or biases by revealing the model’s thinking process, thus increasing user confidence.

4.6. Physics-Informed Machine Learning

Physics-Informed Machine Learning (PIML) integrates known physical laws, such as the conservation of energy or fluid dynamics equations, directly into machine learning models, creating more accurate, data-efficient, and physically consistent AI systems. PIML models become especially useful when data is scarce or the underlying physics is complex. They work by adding physical laws, often expressed as partial differential equations, to the ML model’s loss function, thus penalizing predictions that violate these laws. By embedding physical knowledge, models need significantly fewer training samples to generalize well, overcoming the data-scarcity issues common in science and engineering.
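The loss construction described above can be sketched for a simple case. The example assumes Newton's law of cooling, dT/dt = -k (T - T_env), as the physical law, treats the model as any callable T(t), and uses central finite differences in place of the automatic differentiation that a real framework would provide. The law, the parameter values, and the function names are all assumptions chosen for illustration.

```python
import math

def physics_informed_loss(model, data_t, data_temp, colloc_t,
                          k, t_env, lam=1.0, h=1e-4):
    """Data misfit plus a penalty on violating dT/dt = -k (T - T_env)."""
    # Ordinary supervised term: mean squared error on the measurements.
    data_loss = sum((model(t) - temp) ** 2
                    for t, temp in zip(data_t, data_temp)) / len(data_t)
    # Physics term: squared residual of the law at collocation points.
    residuals = []
    for t in colloc_t:
        dT_dt = (model(t + h) - model(t - h)) / (2 * h)   # finite difference
        residuals.append(dT_dt + k * (model(t) - t_env))
    physics_loss = sum(r ** 2 for r in residuals) / len(residuals)
    return data_loss + lam * physics_loss

# Invented setup: a part cooling from 100 to an ambient 20 (arbitrary units).
K, T_ENV, T0 = 0.5, 20.0, 100.0

def exact_model(t):
    """Candidate that satisfies the cooling law exactly."""
    return T_ENV + (T0 - T_ENV) * math.exp(-K * t)

def linear_model(t):
    """Candidate that ignores the physics."""
    return T0 - 10.0 * t

times = [0.0, 1.0, 2.0]
measurements = [exact_model(t) for t in times]
collocation = [0.5 * i for i in range(9)]

loss_exact = physics_informed_loss(exact_model, times, measurements,
                                   collocation, K, T_ENV)
loss_linear = physics_informed_loss(linear_model, times, measurements,
                                    collocation, K, T_ENV)
```

Because the physics term is evaluated at collocation points that need no measurements, it constrains the model between and beyond the few available data points, which is where the data efficiency of PIML comes from.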

5. Conclusions

This review provides insight into the current landscape and state-of-the-art practices in smart manufacturing ML-powered robotic vision inspection.
More than 50 studies spanning the automotive, aerospace, assembly, and general manufacturing sectors demonstrate that ML-powered vision is technically viable for robotic inspection in manufacturing.
The accuracy of defect detection and classification frequently exceeds 95%, with some vision systems achieving 98–100% accuracy in controlled environments.
The vision systems use predominantly self-designed convolutional neural network (CNN) architectures, YOLO variants, or traditional ML vision models.
However, 77% of implementations remain at prototype or pilot scale, revealing systematic deployment barriers.

Author Contributions

Conceptualization, V.G.; validation, V.G. and D.Y.P.; investigation, V.G. and D.Y.P.; data curation, V.G. and D.Y.P.; writing—original draft preparation, V.G.; writing—review and editing, D.Y.P.; supervision, V.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Georgia Southern University.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

During the preparation of this manuscript, the authors used Elicit: The AI Research Assistant (https://elicit.com) for the purposes of gathering and screening relevant papers, and for extracting insights from these papers. The graphical abstract was generated with help from ChatGPT Graphical abstract designer 5.2. The authors have reviewed the output for consistency and correctness, have edited it, and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
AM: Additive Manufacturing
BDN: Bayesian Decision Networks
BoW: Bag of Words
CNN: Convolutional Neural Network
DL: Deep Learning
DRL: Deep Reinforcement Learning
DT: Decision Tree
ENN: Ensemble Neural Network
F1i: F1 score for class i
GAN: Generative Adversarial Network
GRU: Gated Recurrent Unit
HBP: Histogram of Binary Patterns
HOG: Histogram of Gradients
IoU: Intersection over union
KNN: K-Nearest Neighbor
LLM: Large Language Model
LSTM: Long Short-Term Memory
mAP: Mean average precision
ML: Machine Learning
MLP: Multi-Layer Perceptron
NN: Neural Network
1NN: Nearest Neighbor
OA: Overall classifier accuracy
Pi: Precision for class i
QC: Quality Control
Ri: Recall for class i
RF: Random Forest
RT-DETR: Real-Time Detection Transformer
ST-MDL: Semi-supervised Transfer Learning based Multi-Domain learning
SSD: Single Shot Detector
SVM: Support Vector Machine
VAE: Variational Autoencoder

References

  1. Rockwell Automation. 2025 State of Manufacturing Report. 10th Annual State of Smart Manufacturing. 2025. Available online: https://www.rockwellautomation.com/en-us/capabilities/digital-transformation/state-of-smart-manufacturing.html (accessed on 12 December 2025).
  2. Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334.
  3. Strobl, K.H.; Hirzinger, G. More accurate pinhole camera calibration with imperfect planar target. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011; pp. 1068–1075.
  4. Beschi, R.; Feng, X.; Melillo, S.; Parisi, L.; Postiglione, L. Stereo camera system calibration: The need of two sets of parameters. arXiv 2021, arXiv:2101.05725.
  5. Hanning, T. Calibration of a stereo camera system. In High Precision Camera Calibration; Vieweg+Teubner: Wiesbaden, Germany, 2011; pp. 91–106.
  6. Tsai, R.Y.; Lenz, R.K. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans. Robot. Autom. 1989, 5, 345–358.
  7. Park, F.C.; Martin, B.J. Robot sensor calibration: Solving AX=XB on the Euclidean group. IEEE Trans. Robot. Autom. 1994, 10, 717–721.
  8. Horaud, R.; Dornaika, F. Hand-Eye Calibration. Int. J. Robot. Res. 1995, 14, 195–210.
  9. Andreff, N.; Horaud, R.; Espiau, B. On-line hand-eye calibration. In Proceedings of the 2nd International Conference on 3-D Digital Imaging and Modeling (3DIM’99), Ottawa, ON, Canada, 4–8 October 1999; pp. 430–436.
  10. Daniilidis, K. Hand-Eye Calibration Using Dual Quaternions. Int. J. Robot. Res. 1999, 18, 286–298.
  11. Shah, M. Solving the Robot-World/Hand-Eye Calibration Problem Using the Kronecker Product. J. Mech. Robot. 2013, 5, 031007.
  12. Li, A.; Wang, L.; Wu, D. Simultaneous robot-world and hand-eye calibration using dual-quaternions and Kronecker product. Int. J. Phys. Sci. 2010, 5, 1530–1536.
  13. Ibrahim, I.A.; Ali, A.I.; Baballe, M.A. Artificial Intelligence-Driven Vision-Based Control: Unlocking the Potential of Robotics Amidst Challenges. Glob. J. Res. Eng. Comput. Sci. 2025, 5, 74–80.
  14. Mahajan, H.B.; Uke, N.; Pise, P.; Shahade, M.; Dixit, V.G.; Bhavsar, S.; Deshpande, S.D. Automatic robot maneuvers detection using computer vision and deep learning techniques: A perspective of internet of robotics things (IoRT). Multimed. Tools Appl. 2023, 82, 23251–23276.
  15. Choksi, S.; Narasimhan, S.; Ballo, M.; Turkcan, M.; Hu, Y.; Zang, C.; Farrell, A.; King, B.; Nussbaum, J.; Reisner, A.; et al. Automatic assessment of robotic suturing utilizing computer vision in a dry-lab simulation. Artif. Intell. Surg. 2025, 5, 160–169.
  16. Osita, M.N.; Ogochukwu, C.O.; Ike, J.M. Intelligent robotic object grasping system using computer vision and deep reinforcement learning techniques. Int. J. Sci. Res. Arch. 2025, 14, 511–521.
  17. Nguyen, V.T.; Nguyen, P.T.; Su, S.F.; Tan, P.X.; Bui, T.L. Vision-Based Pick and Place Control System for Industrial Robots Using an Eye-in-Hand Camera. IEEE Access 2025, 13, 25127–25140.
  18. Voulodimos, A.; Kosmopoulos, D.; Vasileiou, G.; Sardis, E.; Doulamis, A.; Anagnostopoulos, V.; Lalos, C.; Varvarigou, T. A dataset for workflow recognition in industrial scenes. In Proceedings of the 2011 18th IEEE International Conference on Image Processing, Brussels, Belgium, 11–14 September 2011; pp. 3249–3252.
  19. Voulodimos, A.; Kosmopoulos, D.; Vasileiou, G.; Sardis, E.; Anagnostopoulos, V.; Lalos, C.; Doulamis, A.; Varvarigou, T. A Threefold Dataset for Activity and Workflow Recognition in Complex Industrial Environments. IEEE Multimed. 2012, 19, 42–52.
  20. Carletti, V.; Greco, A.; Longobardi, D.; Ritrovato, P.; Saggese, A.; Vento, M. Multi-modal Human-Robot Collaboration in Production Lines Through Speech Commands and Gestures. In Computer Analysis of Images and Patterns, CAIP 2025, Lecture Notes in Computer Science; Castrillón-Santana, M., Travieso-González, G.M., Suarez, O.D., Freire-Obregón, D., Hernández-Sosa, D., Lorenzo-Navarro, J., Santana, O.J., Eds.; Springer: Cham, Switzerland; Volume 15622.
  21. Dreger, F.; Karthaus, M.; Metzler, Y.; Tauro, F.; Carrelli, V.; Athanassiou, G.; Rinkenauer, G. Requirements for Successful Human Robot Collaboration: Design Perspectives of Developers and Users in the Scope of the EU Horizon Project FELICE. In Human Factors in Robots, Drones and Unmanned Systems; Proceedings of the AHFE (2024) International Conference; AHFE International: New York, NY, USA, 2024; Volume 138.
  22. Villalonga, A.; Cruz, Y.J.; Alfaro, D.; Haber, R.E.; Martínez-Lastra, J.L.; Castaño, F. Enhancing quality inspection in zero-defect manufacturing through robotic-machine collaboration. In Proceedings of the 2024 7th Iberian Robotics Conference (ROBOT), Madrid, Spain, 6–8 November 2024; pp. 1–6.
  23. Rosell, A.; Svenman, E.; Westphal, P.; Mukundan, A.; Bhattacharya, S.; Bharthulwar, S.; Brahmachari, K.; Jhanardhanan, S. Machine learning-based system to automate visual inspection in aerospace engine manufacturing. In Proceedings of the 2023 IEEE 28th International Conference on Emerging Technologies and Factory Automation (ETFA), Sinaia, Romania, 12–15 September 2023; pp. 1–8.
  24. Kim, D.; TabkhPaz, M.; Park, S.S.; Lee, J. Development of a vision-based automated hole assembly system with quality inspection. Manuf. Lett. 2023, 35, 64–73.
  25. Rajesh, P.J.; Balambica, V.; Achudhan, M. Automated gear inspection using Image Processing and Machine Learning Techniques. In Proceedings of the 2024 4th International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 14–15 May 2024; pp. 1643–1648.
  26. Chen, J.; Van Le, D.; Tan, R.; Ho, D. BubCam: A vision system for automated quality inspection at manufacturing lines. In Proceedings of the ACM/IEEE 14th International Conference on Cyber-Physical Systems (with CPS-IoT Week 2023), San Antonio, TX, USA, 9–12 May 2023; pp. 12–21.
  27. Karigiannis, J.; Liu, S.; Harel, S.; Bian, X.; Zhu, P.; Xue, F.; Bouchard, S.; Cantin, D. Multi-robot system for automated fluorescent penetrant indication inspection with deep neural nets. Procedia Manuf. 2021, 53, 735–740.
  28. Kirda, A.W.; Majewski, P.; Bursy, G.; Bartoszuk, M.; Yassin, H.; Królczyk, G.; Akbar, N.A.; Caesarendra, W. Integrating YOLOv5, Jetson nano microprocessor, and Mitsubishi robot manipulator for real-time machine vision application in manufacturing: A lab experimental study. Adv. Sci. Technol. Res. J. 2025, 19, 248–270.
  29. Zeng, L.; Ye, Z.; Shen, L.; Sun, J.; Wang, Z.; Chen, M.; Cheng, Y.; Zheng, H.; Dong, Q.; Qian, X. VISAR: Vision-based robotic arm system for intelligent industrial inspection. Int. J. Comput. Sci. Inf. Technol. 2025, 6, 46–53.
  30. Variz, L.; Piardi, L.; Rodrigues, P.J.; Leitão, P. Machine learning applied to an intelligent and adaptive robotic inspection station. In Proceedings of the 2019 IEEE 17th International Conference on Industrial Informatics (INDIN), Helsinki, Finland, 22–25 July 2019; pp. 290–295.
  31. Shaloo, M.; Princz, G.; Hörbe, R.; Erol, S. Flexible automation of quality inspection in parts assembly using CNN-based machine learning. Procedia Comput. Sci. 2024, 232, 2921–2932.
  32. Hussain, M.; Chen, T.; Hill, R. Moving toward smart manufacturing with an autonomous pallet racking inspection system based on MobileNetV2. J. Manuf. Mater. Process. 2022, 6, 75.
  33. Terras, N.; Pereira, F.; Ramos Silva, A.; Santos, A.A.; Lopes, A.M.; Silva, A.F.D.; Cartal, L.A.; Apostolescu, T.C.; Badea, F.; Machado, J. Integration of deep learning vision systems in collaborative robotics for real-time applications. Appl. Sci. 2025, 15, 1336.
  34. Raj, N.B.M.; Sandeep, N.; Bobby, T.C.; Karthikeyan, R.B. Development of AI model for robotic vision inspection of sheet-metal components in manufacturing. Int. J. Multidiscip. Res. Anal. 2024, 7, 5444–5449.
  35. Ardic, O.; Cetinel, G. Deep learning-based real-time engine part inspection with collaborative robot application. IEEE Access 2024, 12, 187483–187497.
  36. Bauer, P.; Schmitt, S.; Dirr, J.; Magaña, A.; Reinhart, G. Intelligent predetection of projected reference markers for robot-based inspection systems. Prod. Eng. 2022, 16, 719–734.
  37. Mueller, R.; Vette, M.; Masiak, T.; Duppe, B.; Schulz, A. Intelligent real time inspection of rivet quality supported by human-robot-collaboration. SAE Int. J. Adv. Curr. Pract. Mobil. 2019, 2, 811–817.
  38. Zhou, S.; Le, D.V.; Jiang, L.; Chen, Z.; Peng, X.; Ho, D.; Zheng, J.; Tan, R. RoboCam: Model-based robotic visual sensing for precise inspection of mesh screens. ACM Trans. Sens. Netw. 2025, 21, 1–23.
  39. Martelli, S.; Mazzei, L.; Canali, C.; Guardiani, P.; Giunta, S.; Ghiazza, A.; Mondino, I.; Cannella, F.; Murino, V.; Del Bue, A. Deep endoscope: Intelligent duct inspection for the avionic industry. IEEE Trans. Industr. Inform. 2018, 14, 1701–1711.
  40. Deshpande, S.; Roy, A.; Johnson, J.; Fitz, E.; Kumar, M.; Anand, S. Smart monitoring and automated real-time visual inspection of a sealant applications (SMART-VIStA). Manuf. Lett. 2023, 35, 1134–1145.
  41. Lee, S.K.H.; Mongan, P.G.; Farhadi, A.; Hinchy, E.P.; O’Dowd, N.P.; McCarthy, C.T. In-situ evaluation of hole quality and cutting tool condition in robotic drilling of composite materials using machine learning. J. Intell. Manuf. 2025, 1–22.
  42. Tang, W.; Jahanshahi, M.R. Autonomous robotic inspection based on active vision and Deep Reinforcement Learning. In Proceedings of the 14th International Workshop on Structural Health Monitoring, Stanford, CA, USA, 12–14 September 2023.
  43. Yazid, Y.; Guerrero-González, A.; El Oualkadi, A.; Arioua, M. Deep learning-empowered robot vision for efficient robotic grasp detection and defect elimination in industry 4.0. Eng. Proc. 2023, 58, 63.
  44. Fernandez, A.; Souto, A.; Gonzalez, C.; Mendez-Rial, R. Embedded vision system for monitoring arc welding with thermal imaging and deep learning. In Proceedings of the 2020 International Conference on Omni-layer Intelligent Systems (COINS), Barcelona, Spain, 31 August–2 September 2020.
  45. Knaak, C.; von Eßen, J.; Kröger, M.; Schulze, F.; Abels, P.; Gillner, A. A spatio-temporal ensemble deep learning architecture for real-time defect detection during laser welding on low power embedded computing boards. Sensors 2021, 21, 4205.
  46. Knaak, C.; Kröger, M.; Schulze, F.; Abels, P.; Gillner, A. Deep learning and conventional machine learning for image-based in-situ fault detection during laser welding: A comparative study. arXiv 2021, arXiv:202105.0272.
  47. Xia, C.; Pan, Z.; Fei, Z.; Zhang, S.; Li, H. Vision based defects detection for Keyhole TIG welding using deep learning with visual explanation. J. Manuf. Process. 2020, 56, 845–855.
  48. Kumar, D.D.; Fang, C.; Zheng, Y.; Gao, Y. Semi-supervised transfer learning-based automatic weld defect detection and visual inspection. Eng. Struct. 2023, 292, 116580.
  49. Buongiorno, D.; Prunella, M.; Grossi, S.; Hussain, S.M.; Rennola, A.; Longo, N.; Di Stefano, G.; Bevilacqua, V.; Brunetti, A. Inline defective laser weld identification by processing thermal image sequences with machine and deep learning techniques. Appl. Sci. 2022, 12, 6455.
  50. Li, H.; Wang, X.; Liu, Y.; Liu, G.; Zhai, Z.; Yan, X.; Wang, H.; Zhang, Y. A novel robotic-vision-based defect inspection system for bracket weldments in a cloud–edge coordination environment. Sustainability 2023, 15, 10783.
  51. Yemelyanova, M.; Smailova, S. Application of machine learning for recognizing surface welding defects in video sequences. SJAITU 2024, 16, 44–52.
  52. Schmitz, M.; Pinsker, F.; Ruhri, A.; Jiang, B.; Safronov, G. Enabling rewards for reinforcement learning in laser beam welding processes through deep learning. In Proceedings of the 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 14–17 December 2020.
  53. Cherkasov, N.; Ivanov, M.; Ulanov, A. Classification of weld defects based on computer vision system data and deep learning. In Proceedings of the 2023 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM), Sochi, Russia, 15–19 May 2023; pp. 856–860.
  54. Kartashov, O.O.; Chernov, A.V.; Alexandrov, A.A.; Polyanichenko, D.S.; Ierusalimov, V.S.; Petrov, S.A.; Butakova, M.A. Machine learning and 3D reconstruction of materials surface for nondestructive inspection. Sensors 2022, 22, 6201.
  55. Kajan, S.; Trebul’a, M.; Duchoň, F.; Kovaríková, Z.; Švolík, M.; Švec, D. Robotic vision inspection of weld quality using convolutional neural networks. In Proceedings of the 2024 25th International Carpathian Control Conference (ICCC), Krynica Zdrój, Poland, 22–24 May 2024.
  56. Zhang, S.; Deng, M.; Xie, X. Real-time recognition of weld defects based on visible spectral image and machine learning. MATEC Web Conf. 2022, 355, 03014.
  57. Truong, V.D.; Wang, Y.; Won, C.; Yoon, J. A deep learning-based machine vision system for online monitoring and quality evaluation during multi-layer multi-pass welding. Sensors 2025, 25, 4997.
  58. Dai, W.; Li, D.; Tang, D.; Jiang, Q.; Wang, D.; Wang, H.; Peng, Y. Deep learning assisted vision inspection of resistance spot welds. J. Manuf. Process. 2021, 62, 262–274.
  59. Dong, X.; Taylor, C.J.; Cootes, T.F. A random forest-based automatic inspection system for aerospace welds in X-ray images. IEEE Trans. Autom. Sci. Eng. 2021, 18, 2128–2141.
  60. Cruz, Y.J.; Rivas, M.; Quiza, R.; Beruvides, G.; Haber, R.E. Computer vision system for welding inspection of liquefied petroleum gas pressure vessels based on combined digital image processing and deep learning techniques. Sensors 2020, 20, 4505.
  61. Shi, Y.; Zhu, Y.-Y.; Wang, J.-Q. Surface defect detection method for welding robot workpiece based on machine vision technology. Manuf. Technol. 2023, 23, 691–699.
  62. Gaikwad, A.; Williams, R.J.; de Winton, H.; Bevans, B.D.; Smoqi, Z.; Rao, P.; Hooper, P.A. Multi phenomena melt pool sensor data fusion for enhanced process monitoring of laser powder bed fusion additive manufacturing. Mater. Des. 2022, 221, 110919.
  63. Rossi, A.; Moretti, M.; Senin, N. Layer inspection via digital imaging and machine learning for in-process monitoring of fused filament fabrication. J. Manuf. Process. 2021, 70, 438–451.
  64. Zhang, B.; Jaiswal, P.; Rai, R.; Guerrier, P.; Baggs, G. Convolutional neural network-based inspection of metal additive manufacturing parts. Rapid Prototyp. J. 2019, 25, 530–540.
  65. Cannizzaro, D.; Varrella, A.G.; Paradiso, S.; Sampieri, R.; Macii, E.; Patti, E.; Di Cataldo, S. Image analytics and machine learning for in-situ defects detection in Additive Manufacturing. In Proceedings of the 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 1–5 February 2021.
  66. Tsintavi, E. A deep learning approach for automated inspection of 3D printed orodispersible films. Mater. Res. Proc. 2024, 46, 15–22.
  67. Kaji, F.; Nguyen-Huu, H.; Budhwani, A.; Narayanan, J.A.; Zimny, M.; Toyserkani, E. A deep-learning-based in-situ surface anomaly detection methodology for laser directed energy deposition via powder feeding. J. Manuf. Process. 2022, 81, 624–637.
  68. Elwarfalli, H.; Papazoglou, D.; Erdahl, D.; Doll, A.; Speltz, J. In situ process monitoring for laser-powder bed fusion using convolutional neural networks and infrared tomography. In Proceedings of the 2019 IEEE National Aerospace and Electronics Conference (NAECON), Dayton, OH, USA, 15–19 July 2019.
  69. Lu, L.; Hou, J.; Yuan, S.; Yao, X.; Li, Y.; Zhu, J. Deep learning-assisted real-time defect detection and closed-loop adjustment for additive manufacturing of continuous fiber-reinforced polymer composites. Robot. Comput. Integr. Manuf. 2023, 79, 102431.
  70. Scime, L.; Siddel, D.; Baird, S.; Paquit, V. Layer-wise anomaly detection and classification for powder bed additive manufacturing processes: A machine-agnostic algorithm for real-time pixel-wise semantic segmentation. Addit. Manuf. 2020, 36, 101453.
  71. Klamert, V.; Achsel, T.; Toker, E.; Bublin, M.; Otto, A. Real-time optical detection of artificial coating defects in PBF-LB/P using a low-cost camera solution and convolutional neural networks. Appl. Sci. 2023, 13, 11273.
  72. Chen, Z.; Gao, J.; Wang, C.; Zeng, Z.; Zhu, C.; Fan, W. A machine learning approach for enhancing process screening and qualification in metal additive manufacturing. Eng. Sci. Addit. Manuf. 2025, 1, 025280018.
  73. Muller, A.C.; Guido, S. Chapter 5 Model Evaluation and Improvement. In Introduction to Machine Learning with Python. A Guide for Data Scientists, 1st ed.; O’Reilly Media Inc.: Sebastopol, CA, USA, 2017; ISBN 978-1-449-36941-5.
  74. Raschka, S.; Liu, Y.; Mirjalili, V. Chapter 6: Learning Best Practices for Model Evaluation and Hyperparameter Tuning. In Machine Learning with PyTorch and Scikit-Learn, 1st ed.; Packt Publishing: Birmingham, UK, 2022; ISBN 978-1-80181-931-2.
Figure 1. Sankey diagram illustrating the systematic literature selection process.
Figure 2. Manufacturing defects classification process (a) for traditional ML methods, and (b) for DL methods.
Figure 3. Basic structure of a CNN.
Figure 4. CNNs in vision inspection for manufacturing. The numbers indicate the frequency used. Traditional ML: refs. [25,36,47,49,51,56,59,61,62,63]. Self-designed CNN: refs. [24,27,30,34,36,37,39,44,45,46,49,53,55,56,60,62,63,64,70,71,72]. YOLO: refs. [22,28,29,31,33,43,50,54,57,58,69]. U-Net: refs. [23,48,65]. AlexNet: refs. [54,63,67]. VGGNet: refs. [23,28,32]. ResNet: refs. [23,24,28,45,46,47,50,55,63]. RetinaNet: refs. [33,58]. PSPNet: refs. [26,38]. R-CNN: ref. [38]. Faster R-CNN: refs. [33,34,58,69]. RT-DETR: ref. [33]. ENN: ref. [41]. DRL: refs. [42,52]. SSD: refs. [35,58,69]. MobileNet: refs. [32,45,46]. GoogleNet: ref. [66]. Inception: refs. [45,47,55]. ST-MDL: ref. [49]. RandLA-Net: ref. [67]. BDN: ref. [40].
Table 1. Key characteristics of the studies addressing ML-enhanced inspection in general manufacturing.

| Study | Industry Sector | ML Technique | Vision System | Robot | Inspection Application | Deployment Scale |
|---|---|---|---|---|---|---|
| A. Villalonga et al. [22] | General manufacturing | YOLOv10 | Mako G192 | UR5e | QC | Pilot line |
| A. Rosell et al., 2023 [23] | Aerospace | U-Net, VGG16, ResNet50 | Unspecified | Unspecified | Aerospace engine components | Deployed system |
| D. Kim et al., 2023 [24] | Assembly manufacturing | Self-designed CNN, ResNet50 | Dual cameras | Unspecified | Peg-in-hole assembly quality | Pilot/prototype |
| P. J. Rajesh et al., 2024 [25] | Automotive (gears) | RF | Unspecified | Unspecified | Gear teeth defects (cracks, chips, wear) | Pilot/prototype |
| J. Chen et al., 2023 [26] | Printing (ink bags) | PSPNet | Single and multi-camera systems | Unspecified | Air bubble volume in ink bags | Pilot/full deployment |
| J. N. Karigiannis et al., 2021 [27] | Aerospace | Self-designed CNN | Unspecified | Fanuc LR-MATE 200iD | Fluorescent penetrant inspection | Proof-of-concept |
| A. Kirda et al., 2025 [28] | General manufacturing | YOLOv5, VGG16, ResNet | Unspecified | Mitsubishi | Metal edge detection | Proof-of-concept |
| L. Zeng et al., 2025 [29] | Industrial inspection | YOLOv5 | Unspecified | Unspecified | Detect objects in robot workspace | Prototype/pilot |
| L. Variz et al., 2019 [30] | HMI console manufacturing | Self-designed CNN | Mako G125b | UR3e | Button condition, LCD display defects | Prototype/pilot |
| M. Shaloo et al., 2024 [31] | Parts assembly | YOLOv8 | Unspecified | Mitsubishi | Assembly correctness | Prototype/pilot |
| M. Hussain et al., 2022 [32] | Logistics/warehouse | MobileNet | Smartphone camera | | Pallet racking damage | Prototype/pilot |
| N. Terras et al., 2025 [33] | Food products | RetinaNet, RT-DETR, Faster R-CNN, YOLO | Unspecified | UR3e | Food sorting and quality | Pilot/full deployment |
| N. Raj et al., 2024 [34] | General manufacturing | Self-designed CNN | Unspecified | Unspecified | Sheet-metal defects (scratches, dimensional deviations) | Prototype/pilot |
| O. Ardiç et al., 2024 [35] | Automotive | Faster R-CNN, SSD | Unspecified | Fanuc CR-15ia | Engine part defects | Full deployment |
| P. Bauer et al., 2022 [36] | Automotive | Self-designed CNN, SVM, MLP | ZEISS COMET 3D Sensor, Canon DSLR | Fanuc M-20ia | Sheet metal reference markers | Prototype/pilot |
| R. Mueller et al., 2019 [37] | Aerospace | Self-designed CNN | Laser line sensor + RGB camera | Unspecified | Rivet quality | Prototype/pilot |
| S. Zhou et al., 2025 [38] | Molded pulp packaging | R-CNN, PSPNet | USB camera | UR3e | Clogged pores in mesh screens | Pilot/full deployment |
| S. Martelli et al., 2018 [39] | Aerospace | Self-designed CNN, CNN + LSTM | Microcamera in endoscope | ABB IRB1600 | Gearbox residuals | Prototype/pilot |
| S. Deshpande et al., 2023 [40] | Aerospace (sealant) | BDN | Unspecified | KUKA KR Agilus | Glue dot quality | Prototype |
| S. K. H. Lee et al., 2025 [41] | Aerospace | ENN | Unspecified | KUKA KR210 | Hole quality in composites | Prototype/pilot |
| W. Tang et al., 2023 [42] | General manufacturing | DRL | RGB camera | Unspecified | Cracks on metallic surfaces | Prototype/pilot |
| Y. Yazid et al., 2023 [43] | General manufacturing | YOLOv5 | RGB-D camera | UR5 | Metal part defects on conveyor | Prototype/pilot |
Table 2. Key characteristics of the studies addressing ML-enhanced vision systems for robotic welding inspection.

| Study | Primary Application | ML Technique | Robot | Welding Process | Industrial Context |
|---|---|---|---|---|---|
| A. Fernández et al., 2020 [44] | Online monitoring | Self-designed CNN, CNN + LSTM | ABB | Arc welding | Unspecified |
| C. Knaak et al., 2021 [45] | Real-time defect detection | Self-designed CNN, CNN + GRU, ResNet50, MobileNetV2, InceptionV3 | Unspecified | Laser welding | Automotive/aerospace |
| C. Knaak et al., 2021a [46] | Fault detection | Self-designed CNN, CNN + GRU, ResNet50, MobileNetV2, InceptionV3 | Unspecified | Laser welding | Manufacturing |
| C. Xia et al., 2020 [47] | State recognition | ResNet, SVM | Unspecified | Keyhole TIG | Manufacturing |
| D.D. Kumar et al., 2023 [48] | Porosity detection | ST-MDL, U-Net | Unspecified | Unspecified | Infrastructure |
| D. Buongiorno et al., 2022 [49] | Defect classification | Self-designed CNN, DT, SVM, KNN | Comau NJ220 | Laser welding | Automotive (EV batteries) |
| H. Li et al., 2023 [50] | Defect detection | YOLOv5, ResNet50 | Unspecified | Arc welding | Automotive bracket production |
| M. Yemelyanova et al., 2024 [51] | Surface defect recognition | Perceptron, SVM | Unspecified | TIG welding | Pipe production |
| M. Schmitz et al., 2020 [52] | Quality evaluation | DRL | Unspecified | Laser welding | Unspecified |
| N. Cherkasov et al., 2023 [53] | Surface defect detection | Self-designed CNN | Fanuc ARC Mate | Unspecified | Steel structures |
| O. Kartashov et al., 2022 [54] | Pipeline weld inspection | YOLOv5 | Unspecified | Fusion welding | Pipeline installation |
| S. Kajan et al., 2024 [55] | Quality inspection | Self-designed CNN, AlexNet, ResNet18, Inception-v3 | Fanuc CRX-25ia | Unspecified | Unspecified |
| S. Zhang et al., 2022 [56] | Real-time defect recognition | Self-designed CNN, SVM, KNN | Unspecified | TIG welding | Automation |
| V.-D. Truong et al., 2025 [57] | Multi-pass monitoring | YOLOv10 | Unspecified | Multi-layer multi-pass welding | Nuclear pressure vessels |
| W. Dai et al., 2021 [58] | Spot weld inspection | YOLOv3, SSD, Faster R-CNN, RetinaNet | Fanuc | Resistance spot welding | Automotive |
| X. Dong et al., 2020 [59] | Defect inspection | RF | Unspecified | Unspecified | Aerospace |
| Y. J. Cruz et al., 2020 [60] | Pre/post-weld inspection | Self-designed CNN | Unspecified | Unspecified | LPG pressure vessels |
| Y. Shi et al., 2023 [61] | Surface defect detection | 1NN | Unspecified | Unspecified | Industrial manufacturing |
Table 3. Key characteristics of the studies addressing ML-enhanced vision systems for inspection in additive manufacturing (AM) processes.

| Study | AM Process | Materials | ML Approach | Sensor Types |
|---|---|---|---|---|
| A. Gaikwad et al., 2022 [62] | Laser Powder Bed Fusion | Metals (inferred) | Self-designed CNN, SVM, MLP, RF, KNN | Two co-axial high-speed cameras, thermal imaging |
| A. Rossi et al., 2021 [63] | Fused Filament Fabrication | Unspecified | Self-designed CNN, AlexNet, ResNet50, BoW + SVM | Digital camera |
| B. Zhang et al., 2019 [64] | Metal AM | CoCrMo | Self-designed CNN | Unspecified |
| D. Cannizzaro et al., 2022 [65] | Powder Bed Fusion | Metals | U-Net | Off-axis camera |
| E. Tsintavi et al., 2024 [66] | Material Extrusion using syringe | Orodispersible films with Warfarin | GoogleNet | Camera (inferred) |
| F. Kaji et al., 2022 [67] | Laser Direct Energy Deposition via powder feeding | Metals | RandLA-Net | Laser line scanner |
| H. Elwarfalli et al., 2019 [68] | Laser Powder Bed Fusion (Selective Laser Melting) | Metals | AlexNet | IR tomography |
| L. Lu et al., 2023 [69] | Robot-based Composite Fiber-Reinforced Polymer AM | Composite Fiber-Reinforced Polymer | Faster R-CNN, SSD, YOLOv4 | Unspecified |
| L. Scime et al., 2020 [70] | Powder Bed Fusion (laser fusion, binder jetting, and electron beam fusion) | Unspecified | Self-designed CNN | Unspecified |
| V. Klamert et al., 2023 [71] | Laser Powder Bed Fusion | Polyamide PA2200 | Self-designed CNN | Low-cost RGB camera (Raspberry Pi) |
| Z. Chen et al., 2025 [72] | Laser Direct Energy Deposition | Metals | Self-designed CNN | Unspecified |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Patrashko, D.Y.; Gurau, V. Machine Learning-Powered Vision for Robotic Inspection in Manufacturing: A Review. Sensors 2026, 26, 788. https://doi.org/10.3390/s26030788
