Review

Industrial Robotic Setups: Tools and Technologies for Tracking and Analysis in Industrial Processes

by Mantas Makulavičius *, Juratė Jolanta Petronienė, Ernestas Šutinys, Vytautas Bučinskas and Andrius Dzedzickis *
Department of Mechatronics, Robotics and Digital Manufacturing, Vilnius Gediminas Technical University, LT-10105 Vilnius, Lithuania
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(18), 10249; https://doi.org/10.3390/app151810249
Submission received: 25 August 2025 / Revised: 15 September 2025 / Accepted: 18 September 2025 / Published: 20 September 2025
(This article belongs to the Special Issue Multimodal Robot Intelligence for Grasping and Manipulation)

Abstract

Since their introduction, industrial robots have been used to enhance efficiency and reduce the need for manual labor. They have become a universal tool across all economic sectors, and the software that controls them is essential for the effective operation of machines and processes. The accuracy of robotic actions is developing rapidly across all robot-assisted activities. Significant progress has recently been made in algorithms for controlling robot actions, as well as in monitoring and planning software and hardware compatibility to prevent errors in real time. The integration of the Internet of Things, machine learning, and other advanced techniques has enhanced the intelligent features of industrial robots. As industrial automation advances, there is an increasing demand for precise control in a wide variety of robotic arm applications, and current solutions must be refined to address the challenges posed by high connectivity, complex computations, and diverse scenarios. This review examines the application of vision-based models, particularly YOLO (You Only Look Once) variants, to object detection in industrial robotic environments, as well as other machine learning models used for tasks such as classification and localization. Finally, the review summarizes the results presented in the selected publications, compares the reported methods, identifies challenges for prospective object-tracking technologies, and suggests future research directions.

1. Introduction

From the initial development of industrial robots and their implementation in assembly lines, robotics has significantly improved efficiency and reduced the need for humans to perform laborious tasks [1]. Robotics, which employs software to control machines and processes, is now used in all sectors of economic activity. The integration of the Internet of Things (IoT), Machine Learning (ML), and other advanced techniques into numerous applications has resulted in enhanced intelligent features, such as decision-making, perception, adaptability, data-processing, object identification, maps, camera feeds, predictive maintenance by monitoring key components, etc. [2]. As industrial automation continues to progress, an increasing number of application scenarios for manipulators arise, accompanied by a growing demand for precise control. Current solutions must be refined to address issues related to high tracking accuracy, complex computations, and diverse applications within industrial environments, particularly with respect to manipulator trajectory tracking and control schemes [3].
Industrial robotic manipulators (RM) are not standalone mechanical units; they interact with machined workpieces, obstacles, other machines, and humans/users [1]. Motion planning for manipulation is a fundamental task for industrial RM [4]. Consequently, the coordination of these mechanical devices has become a recent technological challenge. Optimizing the trajectory of RM to enhance production accuracy is now a common practice in the industry [5]. Advanced RM, along with autonomous systems equipped with machine vision for object and obstacle detection and recognition, are utilized in various economic activities, and industrial RM perform tasks such as machining, welding, and painting.
Efficient and straightforward industrial RM for surface machining tasks are highly desirable. However, RM programming for specialized projects requires skilled operators and additional equipment. For example, to transform a manufacturing environment from manual operation to an RM machining center, a reliable process-monitoring system is required [6].
In certain manufacturing environments, replacing manual processes with RM systems can present significant challenges due to the inherent variability of tasks such as polishing, welding, and grinding, which can result in the need for real-time adjustments to tool paths from predefined trajectories. Spatial limitations or operational restrictions often constrain real-time monitoring of tool paths; however, the use of vision or other types of sensors can enhance the RM’s ability to navigate and compensate for these limitations [7].
The development of industrial production lines is being accelerated by the implementation of RM in the quality control phase, thus enhancing their efficiency. One such area is nondestructive testing—a highly multidisciplinary field that encompasses a wide range of analytical techniques supported by scientific knowledge, including technologies and material sciences [8].
Inspecting a robotic platform and detecting various types of defects typically involves three main tasks: first, path planning; second, shape, pattern, and surface reconstruction; and third, inspection of the entire structure [9]. There are popular non-destructive techniques, such as thermography, for non-contact and full-field inspection of the workspace of an industrial RM [10]. RM upgraded by thermographic instrumentation that makes it easier to inspect the geometry of large complex components using multiple images from optimal positions are highly desirable for these tasks.
Ultrasonic imaging technologies used for nondestructive testing of aeronautical components, such as pulse-echo and through-transmission, as well as other innovative techniques, including the parallel development of laser ultrasonics and mechanical manipulators and the use of industrial RM in combination with the KUKA (KUKA Aktiengesellschaft, Augsburg, Germany) RABITTM inspection system, have improved the availability, reliability, and maintainability of industrial robot programs [11]. The index-based triangulation (ITB) method produces a 3D C-scan representation of colorless surfaces on challenging geometries. C. Mineo [12] demonstrated the high-speed generation of C-scans, forming large, ultrasonic, 3D-mapped datasets that can be used for surface reconstruction. Developers in this field are concerned with precise mechanics, ensuring both the accuracy and the user-friendliness of the software.
Automation and robotics made significant advances with Industry 4.0, increasing efficiency and productivity. One of the most important challenges is integrating a robot with computer vision algorithms so that objects can be recognized properly and reliably in real time and then manipulated; this involves the integration of intelligent machines and robotic components into industrial production systems [13]. Robotization is also highly desirable in agricultural operations during harvests [14]. ML-enabled automatic identification of crops can increase efficiency, but several challenges remain; for example, crop detection and harvesting by robotic manipulation using convolutional neural networks (CNNs) has so far been demonstrated primarily in simulated environments. In recent years, biomedical engineering has adopted robot-assisted surgical procedures, where reinforcement-learning approaches based on temporal differences show high potential for complex medical solutions. In a number of industrial domains, robots perform repetitive sequences of actions within dynamic environments; consequently, to cope with variability, sensor-based adaptation has been implemented in many aspects of robotic manufacturing cycles [15]. Industrial robots have lower stiffness than machine tools, which is a disadvantage for machining, but they also offer advantages such as greater flexibility. Friction is one of the phenomena that must be modeled, analytically or with data-driven models, for robots to evolve; comparing and experimentally investigating the literature on friction identification in industrial robots can lead to both theoretical insights and practical recommendations.
Virtual reality (VR) is gaining traction in the robotics and teleoperation industries, offering new methodological perspectives on how humans interact with robots. VR allows systems to be controlled in a virtual environment through physical movements, significantly improving the comfort of robot control compared with traditional button-based teleoperation. Another advantage of VR is the possibility of eye tracking to monitor the user's workload online. F. Nenna [16] covered aspects of VR by analyzing behavioral data from simulations in which robots performed pick-and-place tasks under low and high mental demands, and confirmed that users performed accurately using action-based VR control. Such results provide a new human-centered overview of human–robot interaction in VR and demonstrate its potential.
This work aims to show the trends and search for the solutions presented by the authors of selected articles and evaluate the tools, technologies, and methodologies currently used for process tracking and analysis in industrial robotic setups. Particular focus will be given to the role of machine vision in object detection, recognition, and real-time tracking during robotic machining and manipulation tasks. This study aims to identify and categorize advanced solutions that improve the accuracy, reliability, and flexibility of industrial robots during complex operations. Furthermore, it aims to highlight the integration challenges, performance metrics, and future research directions at the intersection of vision systems and robotic process analytics in industrial environments.
The methodology of the present study begins with a comprehensive overview of the materials and methods used for article selection. The criteria, databases, and filtering techniques utilized for the selection of relevant literature are systematically outlined to ensure a focused and credible review. Section 3 explores the application of vision-based models, particularly YOLO (You Only Look Once) variants, highlighting their role in object detection within industrial robotic environments alongside other machine learning models utilized for tasks such as classification and localization. Section 4 examines the application of various systems for monitoring and evaluating the performance of machining and manipulation tasks. Finally, Section 5 and Section 6 synthesize the findings, comparing the approaches presented by the authors of the selected publications, identifying existing advantages and challenges for specific use cases, and proposing directions for future research on enhancing robotic vision and tracking systems in industrial contexts. The abundance of information and the varying capabilities of search tools do not allow us to guarantee a complete picture of the current state of the topic under discussion; additional analysis is required to best reveal the uniqueness of the selected examples and successful stages of work. This review highlights the strengths and weaknesses of the analyzed approaches to industrial process tracking and analysis tools and technologies from the perspective of industrial robotics.

2. Materials and Methods

In our research, relevant data were selected through a three-stage procedure: an initial sorting by title and keywords, followed by the screening of abstracts and full texts, and finally a full reading of the selected documents, which were then analyzed from various perspectives, including a statistical analysis of the results. This review procedure followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, as noted in [17] (Figure 1).
Keywords used for scientific literature research included the following: industrial robot, machine learning, machine vision and recognition, algorithm, YOLO, classification and regression trees, robot operating system, machining tasks, robotic manipulation, robot interaction, robot process tracking, and robot system architecture. To ensure relevance, technical depth, and applicability to the scope of the review, only articles that meet the following criteria have been included:
  • The focus of the articles should be on industrial environments, referencing robotic manufacturing, manipulation, and assembly.
  • Articles must address machine vision techniques for object detection, localization, or recognition in an industrial robotic arm setup.
  • Articles must address other ML tracking methods for tools, parts, or RM during operation.
  • The focus of the articles should be on the discussion or evaluation of particular tools, sensors, or technologies.
  • Research must address and describe the use of data acquisition systems.
  • Articles must have been published within the last 5 years, unless the work is unique.
To ensure the review is within our focus and avoid inaccurate or low-value data, unsuitable scientific articles have been excluded based on the following criteria:
  • Articles focusing on non-industrial or mobile robotic applications, unless they have the potential to be adopted in industrial robotics.
  • Studies that only involve simulation and do not address industrial process constraints or hardware validation.
  • Articles discussing general ML or vision algorithms without specifically adapting or applying them to industrial robotic systems.
  • Papers on robotic control, navigation, or path planning, unless they are directly related to process tracking or machine vision.
  • Articles lacking technical depth.
  • Papers with insufficient citations, a poorly structured methodology, or an unclear experimental setup.
The results of the statistical analysis represented two main topics:
  • Machine vision for object detection and recognition in industrial robotics;
  • Tracking and analysis in industrial robotic machining and manipulation.
Each of these topical areas is illustrated with specific findings reported by the authors, mainly highlighting the types of robots involved, application domains, software platforms, and implementation perspectives in industrial robotics. The insights presented here offer a comprehensive understanding of contemporary trends and challenges relevant to industrial robotic systems, drawing upon the most pertinent and technically robust sources.

3. Machine Vision and Recognition in an Industrial Robotics Setup

Robot manipulation tasks often require interpretation of the scene, which is perceived with contact or non-contact sensors. Contact sensors include force and tactile sensors, while non-contact sensors include image-based vision sensors that support tasks such as grasping unknown objects and solving emerging problems, such as varying lighting conditions, real-time object recognition, and adaptive planning in unpredictable settings or dynamic environments. Automated robotic grasping tasks require knowledge of the objects to be manipulated. In contrast, robotic machining tasks require precise visual information about the geometry, orientation, and positioning of the workpiece to achieve precise, high-quality material removal and dimensional accuracy (Figure 2). The images from visual sensors can help identify objects in the environment by matching them against information stored in a database, acquired from stereo systems, laser camera systems, Time-of-Flight (ToF) sensors, RGBD (red, green, blue, and depth) sensors, or other sensors.
Furthermore, advancements in Artificial Intelligence (AI) and Machine Learning (ML) have led to the development of sophisticated models, such as various iterations of the You Only Look Once (YOLO) object detection algorithm. These models have enhanced the capability of robotic systems by enabling real-time object recognition, classification, and localization, even in complex, dynamic environments.

3.1. YOLO Models for Industrial Robotic Vision

In recent years, research using computer vision in robotics for precision manipulation has increased productivity in many industries. In agriculture, productivity depends on technological and mechanical advances. However, current identification algorithms face the problem of color-based identification of fruits. For example, apple identification algorithms have problems identifying green and red apples on tree branches in various workplaces. A. Maideen [18] proposed a YOLO V5 framework to recognize apples automatically by integrating the system into a Raspberry Pi 4B computer, where images were taken with an 8-megapixel camera that uses the camera serial interface (CSI) protocol. In the practical test, the model performed well with apples of all colors, and the receiver operating characteristic (ROC) values were 0.98 and 0.9488. This system was applied in a variety of fields, with multi-robot collaboration achieving excellent results (Figure 3).
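As a rough illustration of this kind of deployment, the sketch below loads a pretrained YOLO V5 model via torch.hub and runs it on a single camera frame, roughly as one might do on a Raspberry Pi-class device; the model variant, confidence threshold, and camera index are illustrative assumptions, not details taken from [18].

```python
import cv2
import torch

# Load a small pretrained YOLO V5 model from the Ultralytics hub (illustrative choice)
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.5  # confidence threshold (assumed value)

cap = cv2.VideoCapture(0)            # CSI/USB camera exposed as device 0
ok, frame = cap.read()
if ok:
    results = model(frame[..., ::-1])        # BGR -> RGB before inference
    detections = results.pandas().xyxy[0]    # boxes, classes, confidences as a DataFrame
    print(detections[["name", "confidence", "xmin", "ymin", "xmax", "ymax"]])
cap.release()
```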
In agricultural tasks, H. Zhao [19] proposed that the framework of the YOLO algorithm be employed with traditional rectangular boundary boxes (R-Bbox) for the location of items. YOLO is a computer vision technique utilized for object recognition with excellent real-time detection skills, and the authors claim that its Visual Geometry Group (VGG) models, particularly their deep architecture, enable effective capture of intricate and complex image characteristics.
However, developers state that difficulties in improving the traditional four-axis robotic arm are caused by the cost incurred by the combination of software and hardware coupled in their design. Based on the robot kinematic theory and geometric principles, Y. Wang [20] performed dynamic simulations of the RM forward and reverse trajectory analysis model using the YOLO V7 target detection algorithm, which reduces manufacturing cost and power consumption, with a resulting recognition accuracy of 95.2%.
The work of T. Zhong [21] used 3D object detection and YOLO V5 to improve the performance of vision robots in object detection using a novel OpenVINO-based model deployment approach, which can improve the inference speed of models through model optimization and achieves a 70% reduction in CPU inference time compared to the baseline model.
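As a rough, hedged illustration of the OpenVINO deployment step described in [21], the following sketch loads an already converted model (IR format) and runs inference on the CPU; the file name and input shape are placeholders, not details from the cited work.

```python
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("yolov5s.xml")        # placeholder IR file from the model optimizer
compiled = core.compile_model(model, "CPU")   # CPU deployment, as in the cited approach
output_layer = compiled.output(0)

# Placeholder preprocessed frame (batch, channels, height, width)
frame = np.random.rand(1, 3, 640, 640).astype(np.float32)
predictions = compiled([frame])[output_layer]  # raw detection tensor
print(predictions.shape)
```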
For developing an industrial robot control system, S. Kondratyev [22] proposed focusing on computer vision for efficient object detection by analyzing various other ML algorithms in the Simulink environment, and confirmed that the Robot Operating System (ROS), through the YOLO ROS package, can expand its applicability in robotic systems, with a camera (C270 Digital Webcam) used for the object detection algorithm for manipulation in a manufacturing setup.
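The following minimal sketch, written against the standard rospy API, shows the general pattern of a ROS node that subscribes to a camera topic and passes frames to a detector, in the spirit of the YOLO ROS package mentioned in [22]; the topic name and the detect() wrapper are assumptions for illustration.

```python
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def detect(frame):
    # Placeholder detector: a real node would run a YOLO model here
    return []

def image_callback(msg):
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    boxes = detect(frame)
    rospy.loginfo("detected %d objects", len(boxes))

rospy.init_node("object_detector")
rospy.Subscriber("/camera/image_raw", Image, image_callback, queue_size=1)  # assumed topic
rospy.spin()
```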
Robotic visual support enhanced by deep learning (DL) models, combined with robotic hand grasping and a variety of applications, has a significant influence on transformative solutions in industry. In the investigation of L. Li [23], the main goal was to drive intelligence and automation in the industrial domain to reduce workplace costs. Color recognition in YOLO-based models (YOLO V4 and YOLO V7) was experimentally evaluated using an EPSON C4-A601S robotic arm with inflatable jaws for grasp execution, a RealSense Depth Camera D435i, and RGB-D data. The training phase employed five Tesla T4 GPUs (each with 16 GB) on Ubuntu 20.04, CUDA 11.3, PyTorch 1.10, and Python 3.7. The experiments mirrored this setup, but without GPUs, and were adapted to varied lighting conditions to achieve optimal grasping.
The estimation of 6 Degrees of Freedom (DOF) poses can be crucial. E. Govi [24] notes (Figure 4) that a task that is easy for humans can be challenging for machines, because the process requires intelligent solutions such as recognition and pose prediction; however, robots can compute these quantities provided that each case is supplied with a dataset and a model suited to its specific challenges. To illustrate the investigation, the authors selected four industrial objects with reflective textures, pose ambiguities, and heterogeneous shapes to be grasped by a collaborative robotic arm (Figure 5).
This investigation presented a new synthetic dataset of industrial objects and a fine-tuning method to close the sim-to-real domain gap, with a new pipeline for RGB-input 6D pose estimation. Using RGB input, a background dataset was created with UnityEngine. The original pipeline consisted of 2D detection with a Single-Shot Detector (SSD) and RetinaNet; the Common Objects in Context (COCO) dataset and Stochastic Gradient Descent (SGD) were used to train YOLO V7, and a novel version of the Augmented Autoencoder (AAE), termed LessAAE, significantly aided the learning procedure. Instead of standard template-matching techniques, this revised version of the AAE was applied, as reported in M. Sundermeyer's article on 6D object detection [25].
Consequently, many authors have proposed recursive methods to improve the expressive capabilities of backbone networks. Object detection is the most important part of intelligent collection/grasping tasks. The accurate and fast recognition of different targets in collection tasks using systems based on YOLO has made great progress. The YOLO neural network for object detection in onboard devices has been successfully applied (Table 1) to various manipulation tasks due to its real-time and high-accuracy characteristics.
In addition to the table above, deep learning-based object detection algorithms have been successfully applied in autonomous driving, security, industry, and surveillance [27]. An in-mine robot can accelerate exploration for self-rescue in mining environments by analyzing thermal images with YOLO V5 and YOLO V8 [28]. Compared with traditional object detection algorithms, YOLO improves both speed and detection accuracy. STRAW-YOLO, based on YOLO V8, improved detection performance in robots to 91.6% [26]. The Mamba-YOLO model used for shiitake picking by K. Qi [29] had some limitations and required cross-validation techniques during training. YOLO-VDS, based on YOLO V5 and deployed on a Jetson TX2 NX, improved accuracy and reached 19.2 frames per second, as Ch. Zhu [30] claimed. F. Zheng [31] proposed a model that improves the detection accuracy of YOLO on the PASCAL VOC dataset by 3% mAP. Implementation of the latest YOLO V9-V12 versions in robotic control has only just begun, and therefore no data were available among the publications selected according to the criteria declared in Section 2.
The validity and precision of these models are verified by performing a robot intelligent assembly recognition task (Figure 6). Using the real-time feedback function of a six-dimensional force sensor, flexible assembly between composite workpieces can be achieved [31]. Recursion of functions is implemented by passing the output layers of a backbone network back to that network as an input. Practical application of various YOLO models on an experimental basis with one specific robot can help to evaluate the key performance indicators and to understand the suitability of the various YOLO models for real-time robot applications, in order to select optimal autonomous navigation and interaction architectures. R. Vaghela [32] presented a work comparing YOLOv5, YOLOv8, YOLOv9, and YOLOv10 in terms of mean Average Precision (mAP), recall, and precision on the COCO, KITTI, and BDD100K datasets. The work concludes that the YOLO v9c model achieved the highest mAP50 of 82.20%, maintained a high recall of 0.97, and achieved a precision of 1.00 at a confidence score of 0.95, ultimately standing out as the most suitable model overall for different scenarios. A detailed analysis of YOLO is also provided by R. Jalayer [33]. R. Sapkota elaborated on the fusion of language and vision, compared large vision-language models (LVLMs) with YOLO, SSD, R-CNN, and detection transformers (DETRs), and reported remarkable performance [34]. However, background interference and scale differences remain challenging [35]. After evaluating the publications discussed, it can be concluded that researchers rarely publish studies using more than one version of YOLO on specific robotic tasks or equipment. In most cases, YOLO is used for object recognition and color-based distinction between objects in manipulation tasks or obstacle detection, and for precise positioning to support robust trajectory generation. To sum up, the main challenge that researchers face is the compatibility of machine vision systems with robotic setups, where additional equipment or algorithms are needed for more precise performance, since the applications of industrial robots vary from farming to mining. Therefore, for each implementation in a different sector, additional retraining of ML models is required, and for that purpose scientists need material (datasets) to train with. One solution that avoids the effort of collecting material manually is to build synthetic datasets, which are comparatively cheap in terms of time. However, to make such a dataset, it is crucial to foresee all the variables needed for synthetic dataset generation, which influence the recognition quality of the resulting YOLO models.
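Since synthetic datasets still need annotations in the format the detector expects, the following sketch shows one common, minimal way to write YOLO-format labels (one text file per image with normalized class/box lines) for synthetically rendered frames; the rendering step and the file paths are assumed to exist elsewhere.

```python
from pathlib import Path

def write_yolo_label(label_path, boxes, img_w, img_h):
    # boxes: list of (class_id, x_min, y_min, x_max, y_max) in pixel coordinates
    lines = []
    for cls, x0, y0, x1, y1 in boxes:
        xc = (x0 + x1) / 2 / img_w          # normalized box center x
        yc = (y0 + y1) / 2 / img_h          # normalized box center y
        w = (x1 - x0) / img_w               # normalized box width
        h = (y1 - y0) / img_h               # normalized box height
        lines.append(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    Path(label_path).write_text("\n".join(lines))

# e.g., a rendered 640x480 frame containing one object of class 0 (illustrative values)
Path("dataset/labels").mkdir(parents=True, exist_ok=True)
write_yolo_label("dataset/labels/frame_0001.txt", [(0, 120, 80, 260, 210)], 640, 480)
```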

3.2. Other ML Models Used for Industrial Robotic Vision

Not only YOLO architectures are used for pose estimation or accurate positioning. A robot with a vision system at its end-effector is a powerful combination for industrial applications, enabling the execution of real tasks and inspection applications [36]. Machine vision algorithms optimize the posing system, but not yet as successfully as current demands require; detection performance could still be improved to avoid occlusions or collisions in the workplace. L. Roveda [36] proposed a Franka EMIKA Panda (Franka Robotics GmbH, Munich, Germany) robot with an Intel RealSense D400 at its end-effector as a robotic platform for a working environment with offline optimization; a Bayesian Optimization-based methodology was used to show that a digital twin of the environment can be exploited for reconstruction, avoiding measurement noise in the 3D reconstruction. In another study, E. Kidiraliev [37] proposed an operating-system-based architecture for human–industrial robot interaction, for object detection and collision avoidance in the Gazebo simulator, developing a system based on the Kuka KR3 industrial robot to detect and track humans and other objects in the working area; however, as the authors state, the system still needs updates. Additionally, touch and collision sensors are important for operating industrial RM with greater reliability: the unintentional presence of a human in the workspace of an RM, or incorrect calculations, can lead to a collision. Many authors have developed algorithms to prevent collisions and classify hazards in human–machine interaction [37]. Distance calculation can be implemented using the Gilbert–Johnson–Keerthi algorithm, wherein a human is tracked by an RGB-D sensor, as proposed by S. Secil [38]. T. Tariq [39] proposed using an RGB-D camera to measure the exact position of the operator, together with an Xbox Kinect V2 sensor/camera (Microsoft, One Microsoft Way, Redmond, Washington, USA) and a Scorbot ER-V Plus (Rosh Ha'Ayin, Israel), to mimic real-life human–robot collaboration tasks.
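As a hedged sketch of the offline viewpoint-optimization idea attributed to [36] above, the following example uses Bayesian optimization (here via scikit-optimize's gp_minimize) to search for a camera pose that minimizes a stand-in reconstruction-error cost; in a real setup the cost would be evaluated in a digital twin of the scene.

```python
from skopt import gp_minimize

def reconstruction_cost(pose):
    # Stand-in cost: distance from a hypothetical ideal viewpoint. In practice this would
    # run a 3D reconstruction in the simulated scene and return its error score.
    x, y, z, yaw = pose
    return (x - 0.1) ** 2 + (y + 0.2) ** 2 + (z - 0.6) ** 2 + 0.1 * yaw ** 2

result = gp_minimize(
    reconstruction_cost,
    dimensions=[(-0.5, 0.5), (-0.5, 0.5), (0.3, 1.0), (-3.14, 3.14)],  # x, y, z, yaw ranges
    n_calls=30,
    random_state=0,
)
print("best viewpoint:", result.x, "cost:", result.fun)
```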
During grasping and manipulation tasks, the position of the camera changes significantly as the robot moves to adapt its path for correct grasping. For proper control in manipulation, the eye-to-hand camera system must become a common part of the robotic system. Position accuracy can be achieved by Long-Short-Term Memory (LSTM) neural networks and sparse regression, as proposed by D. Bilal [40] (Figure 7).
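A minimal sketch of the general idea, assuming PyTorch and illustrative layer sizes: an LSTM maps a short window of joint states to a predicted end-effector position correction. This is not the exact architecture from [40], only the pattern it relies on.

```python
import torch
import torch.nn as nn

class PositionErrorLSTM(nn.Module):
    def __init__(self, n_joints=6, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_joints, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)      # predicted XYZ correction

    def forward(self, x):                     # x: (batch, window, n_joints)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])       # regression from the last time step

model = PositionErrorLSTM()
window = torch.randn(8, 20, 6)                # 8 windows of 20 samples from a 6-joint robot
correction = model(window)                    # shape (8, 3)
print(correction.shape)
```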
Most solutions depend on human training or on robot offline programming systems, which are best suited to scenarios in which it is essential to minimize production downtime and to simulate and validate complex tasks virtually prior to deployment; such systems assume fixed tasks and require calibration between the robot and the workplace. The system proposed by Z.C. Ong [41] used a CNN for precise object detection, allowing the robot arm to interact effectively with a variety of items, where the training phase and the sorting phase were the two key phases of the approach.
The positioning and orientation accuracy of industrial robots is essential for determining their application in industry. Less-sophisticated industrial robot calibration methods take only positional error into account, while more advanced methods also consider orientation accuracy. D. Lao [42] presented an error model, the Transform Matrix–Modified Denavit–Hartenberg (MDH) and Calibrated Levenberg–Marquardt (LM) method, for the calibration of industrial robots based on 6 DOF positions and orientations; the actual structural parameters are identified from the MDH parameters, and the LM algorithm is used to avoid convergence to local optima.
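The following toy sketch shows the underlying identification pattern, assuming SciPy's Levenberg–Marquardt least-squares solver and a deliberately simplified kinematic model in place of the full MDH formulation used in [42].

```python
import numpy as np
from scipy.optimize import least_squares

def forward_kinematics(q, p):
    # Toy planar stand-in for a kinematic model: a real calibration would use the MDH model
    return np.array([p[0] * np.cos(q[0]) + p[1] * np.cos(q[0] + q[1]),
                     p[0] * np.sin(q[0]) + p[1] * np.sin(q[0] + q[1]),
                     p[2] + q[2]])

true_p = np.array([0.42, 0.39, 0.11])       # "actual" structural parameters (simulated)
nominal_p = np.array([0.40, 0.40, 0.10])    # nominal, uncalibrated parameters
joint_samples = np.random.default_rng(1).uniform(-1.0, 1.0, size=(30, 3))
measured = np.array([forward_kinematics(q, true_p) for q in joint_samples])

def residuals(p):
    predicted = np.array([forward_kinematics(q, p) for q in joint_samples])
    return (predicted - measured).ravel()

fit = least_squares(residuals, nominal_p, method="lm")   # Levenberg-Marquardt identification
print("identified parameters:", fit.x)
```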
Building on these model-based calibration techniques, researchers have also explored data-driven strategies to enhance robot motion accuracy. Another method for robotic tracking-error compensation is proposed by Sh. Tan [43], based on a temporal convolutional network used to predict tracking errors in the joints, with the terminal load decomposed using the Jacobian matrix. A pre-compensation method is used to improve joint tracking accuracy in predicted errors and shows good results in accurately tracking the tool center point and orientation in the Cartesian coordinate system.
Whilst learning-based error prediction focuses on dynamic compensation, other studies have investigated geometric correction methods to refine robot path execution. Desirable solutions are an efficient and robust path correction method for robotic autonomous systems. N. Wang [7] proposed a solution to maintain the theoretical shape of the pre-taught path, dealing with location errors and using local matching reflexes to constrain the shape errors caused by structured light vision, based on the iterative closest point algorithm [7]. Furthermore, N. Wang’s proposed method has four steps: path scanning, global matching, local matching, and data updating (Figure 8), using a 6-axis robot, controller, structured light vision sensor, host computer, and welding components (welding power source, wire feeder, welding torch, etc.).
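A minimal sketch of the ICP alignment step that such path-correction pipelines rely on, here using Open3D on synthetic placeholder point clouds; the correspondence distance and the data are illustrative, not values from [7].

```python
import numpy as np
import open3d as o3d

# Placeholder clouds standing in for the structured-light scan and the pre-taught reference
pts = np.random.rand(200, 3)
reference = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts))
scanned = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts + np.array([0.01, 0.0, 0.0])))

result = o3d.pipelines.registration.registration_icp(
    scanned, reference,
    max_correspondence_distance=0.05,   # illustrative value
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print("fitness:", result.fitness)
print("correction transform:\n", result.transformation)
```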
V. Pandiyan [6] proposed a technique for the automatic detection of weld-removal endpoints in a robotic abrasive belt grinding process with the help of a vision system using DL, where CNN encoder–decoders are applied for semantic segmentation. The prediction system is able to monitor the geometry of the weld profile during machining for process optimization.
The welding process, an indispensable part of the manufacturing industry, has been in demand for many years and continues to attract the attention of researchers. Today, welding no longer requires purely manual work or even direct operator control; it has become automated with sensors and AI, which is why industrial RM have entered the manufacturing sector, almost replacing humans. Autonomous welding and its quality rely entirely on vision sensors. B. Eren [44] reviewed the application of visual sensors in the development of cutting-edge ML and AI applications for robotic welding, including intelligent applications such as calibration, weld start-point determination, seam tracking, and weld quality assessment.
Including machine vision systems in robotic machining applications significantly enhances the flexibility, accuracy, and adaptability of automated processes. Due to the integrated combination of vision and force sensing, robotic milling has become an important method for machining complex parts. Optimization has been performed on pose planning, dynamics, and deformation control for more effective robotic milling and for theoretical guidance toward the best precision [45]. Due to the significant dynamic cutting forces during milling, vibration and machining deformation emerge depending on the characteristics of the robots; these problems are largely avoidable in low-load applications, such as simple grinding, polishing, or deburring. To achieve efficiency and precision in robotic milling, the relevant research methods are summarized in Z. Zhu's article [45]. This article states that, for accurate machining results, it is important to model robot stiffness, dynamic characteristics, workspace planning, the deformation mechanism, and error compensation strategies based on vision and force sensing, along with the robot's adaptive pose adjustment, parameter identification, machining trajectory accuracy, early identification of chatter generation in the milling environment, dynamic deformation prediction through real-time updates, and online updating of model parameters.
When visual information is taken from different viewpoints, according to M. Bilal [46], the number of images captured in cluttered scenarios is a paramount factor in ensuring the accuracy of machine vision systems. M. Bilal [46] proposed taking multiple 3D images, or taking one 3D image and multiple 2D images, and combining the results using two methods: first, the exact visible center of the object is marked for calculation, and second, a bounding box marking the object is used to calculate the center by intersecting its diagonals, making the viewpoint more accurate. Object recognition and grasping tasks in industry face complexities including color differences, position variety, and the orientation of objects. Properly selecting a DL model with appropriate capabilities is essential to achieve accurate multi-feature target recognition and perception. K. W. Lee [47] proposed a vision-based teleoperation system with a high-sampling-rate interaction and force estimation method that obtains master position and orientation information without using physical sensors. It used RGB images, 6 DOF robot inputs without force sensors, and a dynamic neural network (DNN) combining DenseNet with LSTM.
In another study, C. Srinivasamurthy [48] proposed the design of a 6 DOF automated industrial robotic arm system using TensorFlow object detection, controlled by an Arduino ATmega328P microcontroller board, with the robotic arm designed in SolidWorks. The hardware connections comprised six MG966R servomotors and a NEMA17 stepper motor connected to the Arduino UNO with the support of PCA9685 and A4988 motor drivers. The TensorFlow API was used and executed in Python; in addition, TensorFlow requires GPUs for computing. A prototype of a pick-and-place arm with the capability to be trained was developed using 3D printing techniques and deployed in real time using a Logitech C270 webcam. After testing, the authors stated that the system's performance depended strongly on the quality of the camera and the lighting conditions and required a significant amount of computational power.
With its ongoing development, DL is being introduced into a wide range of fields, such as natural language processing, computer vision, and robotics. Despite the excellent performance of DL for object detection, most industrial vision robots still rely on traditional object detection methods due to the computational limitations of robot controllers [21]. To achieve a pick-up task for a simple-geometry object with a two-fingered jaw attached to a 5 DOF robot, T. R. Deshpande [49] discussed strategies for camera mounting to optimize the influence of the light source and viewing angle. This paper presented statistics on the rate of successful finding and picking-up of the object, which ranged from 80% to 96.6% depending on the geometry of the object. In order to improve the robot's grasping accuracy, a visual tracking control system for industrial robots was designed based on a time-delay-compensation mechanism [50]. The external parameters of the camera are calibrated according to the relationship between the camera and robot-arm coordinate systems. The motion of the industrial robot is simulated, and the robot arm then generates the coordinate-system transformation in the process of motion.
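One standard way to obtain the camera/robot-arm coordinate relationship mentioned above is hand-eye calibration; the sketch below uses OpenCV's calibrateHandEye on randomly generated placeholder poses purely to show the call pattern (real pose pairs would come from robot forward kinematics and a calibration-target detector).

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)
R_gripper2base, t_gripper2base, R_target2cam, t_target2cam = [], [], [], []
for _ in range(10):
    # Placeholder poses: real values come from robot kinematics and a calibration board
    R_g, _ = cv2.Rodrigues(rng.normal(size=3))
    R_t, _ = cv2.Rodrigues(rng.normal(size=3))
    R_gripper2base.append(R_g)
    t_gripper2base.append(rng.normal(size=(3, 1)))
    R_target2cam.append(R_t)
    t_target2cam.append(rng.normal(size=(3, 1)))

R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
    R_gripper2base, t_gripper2base, R_target2cam, t_target2cam,
    method=cv2.CALIB_HAND_EYE_TSAI,
)
print(R_cam2gripper, t_cam2gripper)
```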
Studies on intelligent robotic manipulation systems are usually focused on programming efficiency, especially when visual recognition is incorporated. A task tree for automatic task planning with decision-making models for a pre-built motion dataset was proposed by Z.Y. Deng [51] and was tested on a multitasking robot arm with grippers and a vision module. Based on its vision recognition results, the robotic arm automatically selects a suitable script using a classification and regression tree algorithm and generates an action. The motion of the arm is determined by the results of the truncated analysis and the hardware limitations [51].
Continuing with the grasping and intelligent robotic manipulation tasks, the optical sensing system or the position of the camera relative to the object is a consequence of both robot movement and the mounting location of the camera. There are common strategies that use a point cloud, where the system describes a point in the cloud by checking the relationship between their normal vectors [52]. The general system of this work is represented in Figure 9.
In [52], the PA-10 robot is controlled as a slave in a client–server software architecture, connected to a server module installed on a PC and communicating through ARCNET. The robot receives machine-language commands from the PA-10 controller. The proposed recognition method uses a software development methodology framework structured on two levels of abstraction [52]. To evaluate precision in visually controlled point estimation, all data are transferred into a common fixed base frame.
In addition to the development of control architectures, research has been undertaken to enhance grasp reliability through perceptual verification [53]. Grasp verification protocols are useful for autonomous manipulation in robots, to provide feedback for planning task components, usually depending on the selection of sensors. The vision-based perceptual verification system using a DL interface with different neural accelerators, when the interface hardware is near the data source, reduces robot dependence on the central server. For systematically comparing machine vision cameras, D. Nair [53] proposed a parametric model generator that generates CNN models to check the latency and bandwidth of machine vision cameras.
In another application combining industrial robot manipulation with machine vision systems, S. D'Avella [54] elaborated on the ABB IRB 1200-7-70 (ABB, Zurich, Switzerland) robotic system for loading jewelry pieces from a conveyor belt, where two Blackfly S USB 3.0 stereo cameras with open control loops are mounted under the robot gripper. The main application runs on an ASEM PR4050 industrial computer with an Intel Xeon E2176G. The ROS-Industrial layer is used on top of the ROS framework to control the robot with the abb_libegm and abb_librws packages; in particular, the EGM Trajectory interface of abb_libegm is used for pre-defined movements. The realized method uses histogram-of-oriented-gradients (HOG) feature descriptors and ML algorithms for the detection of objects (Figure 10).
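A minimal sketch of the classical HOG-plus-classifier pattern referenced in [54], using scikit-image HOG features and a linear SVM on synthetic patches; the patch size, feature parameters, and training data are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Synthetic stand-ins: bright squares as "objects", pure noise as background patches
positive_patches = [np.pad(np.ones((32, 32)), 16) + rng.normal(0, 0.1, (64, 64)) for _ in range(20)]
negative_patches = [rng.normal(0, 0.1, (64, 64)) for _ in range(20)]

def hog_features(patch):
    # Histogram of oriented gradients over a 64x64 grayscale patch
    return hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

X = np.array([hog_features(p) for p in positive_patches + negative_patches])
y = np.array([1] * len(positive_patches) + [0] * len(negative_patches))

clf = LinearSVC().fit(X, y)                 # linear classifier over HOG descriptors
print("training accuracy:", clf.score(X, y))
```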
The vision of developers includes the need to create collaborative robots that are contextually aware, capable of meaningful communication, and adaptable to changing conditions. ML and computer vision allow for the detection and recognition of human poses and facial expressions, thereby improving communication and safety. The reviewed studies highlight key advancements in vision-based industrial robotics. For example, optimizing camera placement with end-effector-mounted systems improves object detection accuracy, human–robot interaction improves with optical sensors and skeletal tracking, and machining and manipulation tasks benefit from AI-enhanced pose estimation and error compensation. Additionally, research shows that low-cost vision systems can perform reliably, and integrating vision with tool changers and delay compensation improves system flexibility. Overall, these developments support the creation of more adaptive, precise, and collaborative robotic systems for Industry 4.0 applications. On the other hand, there are several limitations to achieving the advancements mentioned. For instance, reliable vision-based monitoring may require the integration of additional sensors such as force sensors, since cameras or laser trackers cannot detect vibrations in specific machining tasks that call for more precise tool displacement in a 3D environment. Additionally, in order to utilize cheap USB cameras, which have lower resolution than industrial cameras, it is crucial to prepare the overall setup, from the relative position between the observed object and the camera to the lighting conditions, such as lighting angle, intensity, or reflection. Another notable issue for machine vision detection, including YOLO algorithms, when combined with industrial robotics is the limitation of robot controllers with respect to communication protocols. Nevertheless, one solution to this issue, as outlined above, is to implement an intermediate slave architecture to achieve communication compatibility.

4. Industrial Robotic Process Tracking and Analysis in Machining and Manipulation

Machine vision is just one of the many ways AI is employed in industrial robotics, including machining and manipulation. Through the integration of sensory data from various sources, incorporating not only cameras but also force sensors, accelerometers, laser sensors, and others, AI systems enable dynamic decision making, collision avoidance, and trajectory optimization, and enhance machining and manipulation efficiency. Because industrial robots are widely applied in precision machining, their positioning accuracy is very important, and industrial robots are complex nonlinear systems. For example, to achieve more accurate positioning, a radial basis function network has been developed to construct a mapping between uncertain parameters and the coordinates of the end-effector [27].

4.1. Machining Tasks

Trajectory tracking is an essential aspect of ensuring machining accuracy in dynamic environments. M. Zhang [55] proposed an online solution scheme for trajectory tracking in redundant RM with physical constraints through the Zhang neural dynamics method by integration into a time-varying system consisting of time-varying nonlinear equations and time-varying linear inequality (TVLI), and claimed that the model for TVLI indeed had more advantages than the existing varying-parameter Zhang neural dynamics models.
Z. Lai [56] proposed an offline robot programming system for an automatic deburring process, wherein RobSim is built as an add-on to SolidWorks by adding graphics, a kinematics engine, and other modules. The SolidWorks graphical window can directly access the object; thus, the hidden links between objects are bridged between two software blocks, and then object motion is driven inside the SolidWorks assembly document [56]. In this paper, the automatic adaptation of the tool path is represented by detecting the workplace shape using an industrial camera with an OpenCV library and is performed by a NACHI robot.
The real-time tracking of welding with structured light vision enabled welding robots to shorten teaching time and increase accuracy (Figure 11) [57]. Currently, image processing algorithms are deficient, and the security regime is not always considered in trajectory recognition [57]. To solve these problems, X. Zhao [57] proposed an adaptive feature extraction algorithm to extract the center of the seam from laser strips and remove noise with appropriate accuracy and processing speed, using the Pauta criterion method to ensure the accuracy of welding.
In welding tasks, the repeatability of the robot position can reach micrometer levels. However, uncontrolled situations sometimes arise in which the robot does not follow the set trajectory, which can cause damage to equipment. X. Wu [58] proposed a method for real-time evaluation of the trajectory of a welding robot. Multiple datasets are obtained and phase-aligned using the proposed algorithm; after the Kendall correlation coefficient is applied to identify and remove weak-axis data, the average of the correlated multi-axis datasets is calculated as the typical trajectory, and a threshold is determined using the μ ± nσ method. The absolute difference between the real-time axis trajectory and the standard trajectory is the main information needed to determine the deviation in the running trajectory, the authors claim, and the robot can be stopped when the deviation exceeds the threshold.
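The deviation check described in [58] can be sketched directly from the text: build the typical trajectory as the mean of several phase-aligned runs, form a μ ± nσ band, and flag any real-time sample whose absolute deviation exceeds the threshold. The data and the value of n below are illustrative.

```python
import numpy as np

def trajectory_threshold(aligned_runs, n=3):
    # aligned_runs: array (runs, samples) for one joint axis, already phase-aligned
    mu = aligned_runs.mean(axis=0)          # typical trajectory
    sigma = aligned_runs.std(axis=0)
    return mu, n * sigma                    # mu and the +/- n*sigma band

def deviation_exceeded(realtime_axis, mu, band):
    deviation = np.abs(realtime_axis - mu)
    return bool(np.any(deviation > band))   # True -> stop the robot

runs = np.random.default_rng(0).normal(0.0, 0.01, size=(10, 500))
mu, band = trajectory_threshold(runs)
print(deviation_exceeded(np.random.default_rng(1).normal(0.0, 0.01, 500), mu, band))
```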
The Probabilistic Roadmap, applied for recognition of the path to a region of interest, and the inverse kinematics (IK) approach, for accurate approximation from pixel space to the real-time workspace in medical surgery, were proposed by P. N. Srinivasu [59]. To work with uniquely structured tissues, software-based procedures and algorithms require a more accurate view to choose an optimal path within the procedure domain. Statistical analysis performed by the authors indicates that the proposed method requires a high learning rate, which is the weakest part of the exploration factor. Artificial neural networks (ANN) are the other most commonly used mechanism in robotic surgery [60]. The ANN includes kinematics and assessment tools for robotic procedures, but the CNN mechanism has potential for sensorless force estimation when performing robotic surgery [59]. Although this methodology is implemented in the medical field, it can also be applied in industrial settings where it is necessary to machine or handle soft and deformable objects; examples include soft polymers, foams, textiles, foodstuffs, and delicate electronic components, where traditional rigid automation methods can result in deformation or damage.
Controlling the contact force on different surfaces is crucial in many robotic industrial applications, and hybrid motion-force control is commonly used for physical interaction scenarios. M. Iskandar [61] presented a hybrid force-impedance framework for a highly dynamic end-effector in free motion directions, with simulations on flat and curved surfaces yielding excellent results. The extended Cartesian impedance algorithm incorporates constraints and tracks force in a hybrid manner, working as a unified framework; the force subspace in the direction of contact is decoupled from the dynamics in the motion subspace. Fekik [62] illustrates the capabilities and effectiveness of sliding mode control applied to a three-dimensional model of the PUMA 560 RM for machining (deburring, trimming) and surface tracking (polishing) using a sliding mode control law. To improve wood-processing efficiency, a palletizing robot for loading glued laminated timber (GLT) was developed by R. Gao [63]. The robot was designed with the aid of MATLAB Monte Carlo methods and consisted of a six-axis crank mechanism with a sponge suction cup as the gripping actuator, enabling intelligent automatic loading, unloading, and pallet feeding operations for small GLTs. The trajectory of the wood parts during the loading/unloading process, planned by the operator using high-order quintic and sextic polynomial curve interpolation, provides data support and a parameter basis for the automatic control and software design of the loading, unloading, and palletizing robots.
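A minimal worked example of the quintic (fifth-order) polynomial interpolation mentioned above for a single joint, with the common boundary conditions of zero velocity and acceleration at both ends; the start/end positions and duration are illustrative.

```python
import numpy as np

def quintic_trajectory(q0, qf, T, steps=100):
    # Coefficients of q(t) = a0 + a1*t + ... + a5*t^5 with q(0)=q0, q(T)=qf and
    # zero velocity and acceleration at both ends.
    A = np.array([
        [1, 0, 0,    0,       0,        0],
        [0, 1, 0,    0,       0,        0],
        [0, 0, 2,    0,       0,        0],
        [1, T, T**2, T**3,    T**4,     T**5],
        [0, 1, 2*T,  3*T**2,  4*T**3,   5*T**4],
        [0, 0, 2,    6*T,     12*T**2,  20*T**3],
    ])
    b = np.array([q0, 0.0, 0.0, qf, 0.0, 0.0])
    a = np.linalg.solve(A, b)
    t = np.linspace(0.0, T, steps)
    return sum(a[i] * t**i for i in range(6))

positions = quintic_trajectory(q0=0.0, qf=1.2, T=2.0)   # one joint, 0 -> 1.2 rad over 2 s
print(positions[0], positions[-1])
```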
V. Pandiyan [6] proposed a technique for the automatic detection of weld-removal endpoints in a robotic abrasive belt-grinding process with the help of a vision system using DL, where encoder-decoder CNNs are applied for semantic segmentation. The prediction system is able to monitor the geometry of the weld profile during machining for process optimization.
The primary task of the decoder is to semantically project lower-resolution features into higher pixel space. The shortcut connections between the encoder and decoder were elaborated for better recovery of object details. The encoder performs a series of operations to learn the representation of multi-scale features. The schematic representation of the method is shown in Figure 12.
Currently, robot trajectory planning is mainly divided into Cartesian-space trajectory planning and joint-space trajectory planning, wherein polynomial, B-spline, and trapezoidal transition curve fitting are used for fitting and interpolation. To optimize time and energy when generating the trajectory, constraints were reviewed and set based on robot kinematics and dynamics [64]. The method proposed by Sh. Li [64] is based on interpolating the discrete position points of each joint with quintic B-spline curves for an underwater visual welding robot. H. Zhao presented a method for planning a grinding trajectory on curved surfaces to improve grinding efficiency on aluminum alloy surfaces (Figure 13) [60].
Existing robot grinding systems can be divided into automatic grinding systems based on workpiece CAD model data and automatic grinding systems based directly on point cloud data [65,66,67,68]. C. Cheng [69] proposed a method capable of not only finding defect areas but also planning the robot’s spraying with effective visual guidance for aluminum alloy manufacturing in aerospace devices.
In fact, the spraying process does not require consideration of contact with the workpiece, since it is performed in a non-contact manner. This eliminates tool–workpiece interaction forces, reduces wear, and allows for higher flexibility in trajectory planning and surface coverage. Furthermore, path-planning algorithms such as Rapidly Exploring Random Tree (RRT) have been shown to be a highly effective solution for optimizing spraying trajectories over complex geometries, thereby ensuring uniform coating of surfaces.
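For illustration, a minimal 2D RRT sketch of the idea: grow a tree from the start by sampling random points, extending the nearest node toward each sample, and stopping once the goal region is reached. Obstacle checking and any spraying-specific cost terms are omitted.

```python
import math
import random

def rrt(start, goal, step=0.5, goal_tol=0.5, max_iters=2000, bounds=(0.0, 10.0)):
    nodes = [start]
    for _ in range(max_iters):
        sample = (random.uniform(*bounds), random.uniform(*bounds))
        nearest = min(nodes, key=lambda n: math.dist(n, sample))
        theta = math.atan2(sample[1] - nearest[1], sample[0] - nearest[0])
        new = (nearest[0] + step * math.cos(theta), nearest[1] + step * math.sin(theta))
        nodes.append(new)                       # a full planner would also check collisions here
        if math.dist(new, goal) < goal_tol:
            break                               # goal region reached
    return nodes

tree = rrt((0.0, 0.0), (9.0, 9.0))
print(len(tree), "nodes expanded")
```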
The building information modeling (BIM) and vision-based robotic welding trajectory planning method (Figure 14) is represented by T. Li [70], where the main benefits are welding path point extraction by teaching, efficient point cloud registration, trajectory planning, real-time welding tracking using Industry Foundation Classes (IFC), Iterative Closest Point (ICP), and improved dihedral angular structure method.
Traditional industrial robot machining usually involves offline-programmed machining trajectories, in which the positioning of the workpiece is performed by robotic tools. However, all such methods have limitations, especially when working with complex workpieces or in dynamic environments. For instance, machine-vision-based workpiece positioning uses a camera and image processing algorithms: machine vision provides real-time feedback and improves positioning precision by detecting workpieces and determining their locations, and image-processing algorithms are essential for this detection and localization. Nevertheless, despite its advantages, machine-vision-based workpiece positioning faces several challenges. One of the main problems is the reliability of position estimation based on part properties, as complex geometry can affect accuracy. Accurate 3D reconstruction can determine the exact position of the workpiece, but it can be disrupted by factors such as noise in the captured images, camera-system calibration errors, and the need for large computational resources.
To address these limitations, the integration of complementary sensing approaches with vision systems has emerged as a promising area of research. Force/torque sensors have been shown to provide feedback on contact conditions and enable correction when visual data is unreliable. The utilization of laser scanners and structured-light sensors facilitates the acquisition of high-precision 3D surface maps, thereby reducing the necessity for camera calibration. Furthermore, inertial measurement units facilitate the compensation of robot or workpiece motion in dynamic conditions. The integration of these sensors has been demonstrated to enhance system robustness, improve accuracy in challenging conditions, and expand the applicability of robotic machining systems.

4.2. Manipulation Tasks

Robotic manipulation and robotic machining are two related fields with the common objective of achieving high precision in industrial tasks. However, they differ fundamentally in scope and requirements. The field of robotic machining centers on material removal or shaping processes, where accuracy, stiffness, and tool-path planning are essential for ensuring surface quality and dimensional tolerances. In contrast, robotic manipulation entails object handling, assembly, and interaction with dynamic environments, necessitating adaptability, precision grasping, and multimodal sensing for reliable performance. In this context, advanced solutions that enhance overall quality can be adapted as well.
C. Mineo [10] proposed an inspection platform consisting of a pulsed thermography setup, an RM, and an image processing algorithm, and tested it on a convex-shaped specimen with purpose-made defects. The automotive, machining, electronics, rubber and plastics, food, and aerospace industries have been improved by applying different robots with different accuracy requirements, resulting in serial configurations of robots. Industries require high accuracy, which can be improved, for instance, by a neural-network-based system optimized by a genetic particle swarm algorithm (GPSO-NN), constructed to predict positioning errors and improve robot positioning, as proposed by B. Li [71]. The proposed method was validated through a series of experiments on a KUKA KR 500-3 industrial robot in no-load and drilling scenarios.
Standard industrial robot position control mechanisms have already been implemented; however, the integration of an outer compliance controller is necessary to ensure that safety standards are met. Iterative learning control (ILC) was developed to optimize the performance of systems engaged in repeated tracking tasks; in most known programs, the motion profile is inherently specified, which restricts the application range and the achievable improvement. Many ILC designs are not suitable for a temporally varying trajectory profile and require a repeated path-following task. Y. Chen [72] extended the ILC task description to enable a trial-variable motion profile and to formulate a path-tracking task within system constraints, and demonstrated this in practice on a portal robot test platform, comparing it with other control methods to show the error reduction, reduced control effort, and handling of constraints.
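A minimal sketch of the classic P-type ILC update that such schemes build on: the command for the next trial is the previous command plus a learning gain times the previous trial's tracking error. The plant model and gain below are illustrative placeholders, not the constrained design from [72].

```python
import numpy as np

def ilc_update(u_prev, error_prev, learning_gain=0.5):
    # P-type ILC: u_{k+1}(t) = u_k(t) + L * e_k(t)
    return u_prev + learning_gain * error_prev

reference = np.sin(np.linspace(0.0, 2.0 * np.pi, 200))   # repeated tracking reference
u = np.zeros_like(reference)
for trial in range(20):
    output = 0.8 * u                                     # placeholder plant response
    error = reference - output
    u = ilc_update(u, error)
print("final max tracking error:", np.abs(reference - 0.8 * u).max())
```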
The machine vision system (Figure 15) presented by U. Bhattarai [73] for a robotic pollination technique used a 6-DOF RM, a Dell Alienware 15R4 laptop running Ubuntu 18.04 with ROS Melodic, an Intel Core i7-8750 CPU and an 8 GB NVIDIA GeForce GTX 1080 graphics processing unit, and an air compressor unit powered by a Honda EU2200i. The system was elaborated to identify the target flower cluster, and the Inverse Kinematic (IK) solution successfully executed the motion planned for handling the pollen spray.
Differential evolution (DE) is an evolutionary algorithm that mimics a natural process, generating new solutions by combining information from existing ones; its steps can be divided into population initialization, mutation (differential variant), crossover (e.g., exponential variant), and selection (elitism) [74]. DE can also be applied to RMs and can effectively address such problems by working with populations of candidate solutions, even in situations involving singular points, without requiring Jacobian matrices [74]. As the demand for increased agricultural production grows, there is an urgent need for robots that can perform complex agronomic tasks and overcome the limitations of traditional techniques [75].
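The DE steps listed above can be sketched on a small, assumed example: a planar two-link inverse kinematics problem solved without Jacobians. The link lengths, target position, and DE settings below are arbitrary illustrative values, not taken from [74].

```python
import numpy as np

# Minimal DE/rand/1/bin solving a 2-link planar inverse kinematics problem
# without Jacobians: population initialization, mutation, crossover, selection.
L1, L2 = 0.4, 0.3                        # assumed link lengths
target = np.array([0.5, 0.2])            # desired end-effector position

def forward_kinematics(q):
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def cost(q):
    return np.linalg.norm(forward_kinematics(q) - target)

rng = np.random.default_rng(1)
NP, D, F, CR = 30, 2, 0.8, 0.9           # population size, dimension, scale factor, crossover rate
pop = rng.uniform(-np.pi, np.pi, (NP, D))            # population initialization
fitness = np.array([cost(q) for q in pop])

for generation in range(200):
    for i in range(NP):
        a, b, c = pop[rng.choice([j for j in range(NP) if j != i], 3, replace=False)]
        mutant = a + F * (b - c)                     # mutation (differential variant)
        cross = rng.random(D) < CR
        cross[rng.integers(D)] = True                # ensure at least one mutant gene
        trial = np.where(cross, mutant, pop[i])      # binomial crossover
        if cost(trial) < fitness[i]:                 # greedy selection (elitism)
            pop[i], fitness[i] = trial, cost(trial)

best = pop[np.argmin(fitness)]
print("joint angles:", best, "position error:", fitness.min())
```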
S. Sidhik [76] presented an adaptive control framework (Figure 16) for changing-contact robot manipulation tasks, which require the robot to make and break contact with surfaces. The complexity of such tasks lies in the discontinuous, nonlinear dynamics during contact changes, which can damage the robot or the objects in its workspace.
P.V. Sabique [77] proposed a framework combining a recurrent neural network (RNN), LSTM layers, dimensionality reduction, and a cyclical learning rate (CLR), which achieved 9.23% and 3.8% improvements in force prediction accuracy in visuo-haptic simulation. In S. Kruzic's [78] work, several deep architectures were trained on data collected from a real 6 DOF RM using custom-made interaction objects operated by a human; the best architecture achieved Root Mean Square Errors (RMSE) on the test sets of 16%, 12%, and 6% of the maximum force in the x, y, and z directions, respectively. To enable a standard, sensorless, position-controlled industrial robot to perform such interaction tasks, L. Roveda [79] proposed an external joint torque observer based on an extended Kalman filter, evaluated on a Franka EMIKA Panda robot. Q. Meng [80] addressed a two-link rigid–flexible RM with a vibration amplitude restriction, designing an adaptive ANN-SM tracking controller based on an RBFNN for position control; the link trajectories were planned using virtual damping and online correction methods. Collectively, these studies underline the potential of deep learning, adaptive observers, and neural controllers to enhance force estimation and trajectory tracking. However, new challenges emerge when robots are required to replicate complex human-like dynamic behaviors.
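A minimal sketch of sequence-based force prediction of the kind discussed in [77,78] is given below in PyTorch: an LSTM maps short windows of robot signals to a three-axis force estimate and is trained with an MSE loss (RMSE being its square root). The input dimensions, network size, and synthetic data are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# LSTM regressor mapping short windows of joint/pose signals to a 3-axis
# force estimate; trained on synthetic data purely for illustration.
class ForceLSTM(nn.Module):
    def __init__(self, n_inputs=7, hidden=64, n_outputs=3):
        super().__init__()
        self.lstm = nn.LSTM(n_inputs, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_outputs)

    def forward(self, x):                  # x: (batch, time, n_inputs)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # predict force at the last time step

model = ForceLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                     # RMSE reported as sqrt(MSE)

x = torch.randn(256, 50, 7)                # synthetic windows of robot signals
y = torch.randn(256, 3)                    # synthetic force labels (x, y, z)

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
print("training RMSE:", loss.sqrt().item())
```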
Compared with human motion, making robots throw objects properly is a challenging task, and motion planning in Cartesian space is often difficult [81]. The flocking algorithm shows good results in mobility control [82] in 2D and 3D spaces, with collision avoidance and navigation feedback provided by a force-based motion planning algorithm. For precise tracking of the in-flight trajectory of a variable-centroid object, H. Gao [81] proposed a model-free, variable centroid trajectory tracking network that fuses visual and force information by feeding force data from the throwing process into a visual neural network. In M. Liu's research [83], it was pointed out that existing neural network solutions for controlling robots involve complicated computations and lack orientation tracking; M. Liu [83] therefore proposed a training-free dynamic neural network (DNN) that incorporates orientation tracking constraints and physical constraints.
Accurate understanding of human intentions is crucial for achieving high-level human–robot collaboration. Existing methods are mostly based on case-specific data and struggle with novel tasks and unseen categories, and data are often scarce in real-world conditions. To enhance the proactive cognitive capabilities of collaborative robots, a visual-language-temporal approach can be applied, in which intention recognition is defined as a multimodal learning problem with human-centered prompts. D. Wu [84] presented the H2R Bridge for human–robot semantic communication, which is expected to facilitate proactive HRC, and formulated a Visual-Language-Temporal approach that conceptualizes intent recognition as a multimodal learning problem with HRC-oriented prompts (Figure 17).
Industrial robots have found application in aerospace equipment manufacturing, where absolute pose accuracy is essential. W. Wang [85] presented a DL scheme for compensating absolute pose errors, in which a mapping model based on deep belief networks relates theoretical pose coordinates to actual pose errors. The scheme was verified on a 6 DOF KUKA KR500-3 industrial robot in an automatic assembly system, resulting in a more accurate working environment. The cross-contrast training investigated in that paper extracts contrast information rather than only the robot pose error features. This study [85] demonstrates the potential of deep-learning-based calibration to enhance absolute positioning; however, challenges remain when robots must adapt to changing environments or perform complex manipulation tasks.
Typical 6 DOF robotic arms suffer from time-consuming computations of inflexible obstacle-avoidance algorithms with low adaptability to the environment. With traditional computational methods, the UR3 robot must perform complex image processing and solve the kinematic model. J. Zhang [86] presented a DL-based method for manipulating a 6 DOF UR3 robot, in which a new 3D model was built in Unity-ROS and forward kinematic calculations were performed using a pre-trained model with VGG16 as the backbone network, combined with the Perspective-n-Point algorithm. Grasping was then simulated in Unity-ROS, and the ROS subsystem was applied for orientation prediction, demonstrating the feasibility of the DL-based VGG16 model. Industrial milling systems likewise require improved positional accuracy: Sh. Ma [87] proposed and validated, on another industrial robot, a method based on an incremental extreme learning machine model optimized by an improved sparrow search algorithm, in which predicted errors are used to compensate the target points in the workspace.
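The Perspective-n-Point step used in such pipelines can be sketched with OpenCV as follows; the object points, detected image points (e.g., output by a keypoint network), and camera intrinsics are placeholder values, and the call is not tied to the specific setup of [86].

```python
import numpy as np
import cv2

# Recovering an object's pose from 2D keypoints with the Perspective-n-Point
# algorithm. Object points, image points, and intrinsics are placeholders.
object_points = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                          [0.1, 0.1, 0.0], [0.0, 0.1, 0.0]], dtype=np.float64)  # planar 0.1 m square
image_points = np.array([[320, 240], [400, 238],
                         [402, 318], [322, 320]], dtype=np.float64)             # e.g., CNN keypoints
camera_matrix = np.array([[800, 0, 320],
                          [0, 800, 240],
                          [0, 0, 1]], dtype=np.float64)
dist_coeffs = np.zeros(5)                      # assume an undistorted image

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)                     # rotation vector -> 3x3 matrix
print("rotation:\n", R, "\ntranslation:", tvec.ravel())
```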
Robots are designed to replace human work in repetitive, heavy, or dangerous tasks and can broadly be categorized into data acquisition, mobile, and manipulation robots; RMs with different degrees of freedom are used in industry. A robot motion programming tool designed with MATLAB for KUKA robots is presented in T. Wrütz's [88] paper, and the interface between the KUKA controller and an external PC was investigated by J. Golz [89]. When the robot is moved over WLAN, the MATLAB toolchain can be extended to add further robots and build a master–slave system [90]. Two controllers, fuzzy logic and ANFIS, were validated at a high level to control the speed of a KUKA KR6 R900 SIXX in M. Mousa's investigation [91] and compared with a conventional PID with a self-tuning method and AI controllers. The linear PID controller is the most common choice for controlling RMs, but linear controllers are not effective when uncertainties and fluctuations occur, whereas sliding mode designs deal well with uncertainties and disturbances. M. H. Barhaghtalab [92] proposed sliding mode control (SMC) and its boundary-layer variant (SMCBL), compared against inverse dynamic control (IDC), a conventional nonlinear control method, implemented on a 6 DOF IRB-120 RM. As noted above for force prediction, RNN-LSTM performance improves when dimensionality reduction is optimized, the loss is expressed as RMSE, and the learning rate follows a CLR schedule.
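The difference between the two control families can be illustrated with a toy single-joint simulation in which a linear PID law and a basic sliding-mode law reject the same bounded disturbance. The plant, gains, and sliding surface below are illustrative assumptions and do not reproduce the SMCBL controller of [92].

```python
import numpy as np

# Single-joint toy comparison: PID vs. basic sliding-mode law on a
# unit-inertia joint with a bounded, unmodeled disturbance torque.
dt, T = 0.001, 5.0
steps = int(T / dt)

def simulate(controller):
    q, qd, integral = 0.0, 0.0, 0.0
    q_ref = 1.0                                    # step reference for the joint angle
    errors = []
    for k in range(steps):
        e, ed = q_ref - q, -qd
        integral += e * dt
        u = controller(e, ed, integral)
        disturbance = 0.5 * np.sin(5 * k * dt)     # unmodeled torque
        qdd = u + disturbance                      # unit-inertia joint dynamics
        qd += qdd * dt
        q += qd * dt
        errors.append(abs(e))
    return np.mean(errors)

pid = lambda e, ed, i: 40.0 * e + 12.0 * ed + 5.0 * i
# Sliding surface s = ed + 4*e; switching gain exceeds the disturbance bound.
smc = lambda e, ed, i: 10.0 * (ed + 4.0 * e) + 1.0 * np.sign(ed + 4.0 * e)

print("PID mean |error|:", simulate(pid))
print("SMC mean |error|:", simulate(smc))
```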
Industrial robots are expected to be autonomous: to regulate their behavior under different operating conditions and to adapt with low time cost, low resource consumption, and minimal human effort during setup procedures. Automatically tuning the control parameters of an RM is a complex task that involves modeling or identifying the robot dynamics. L. Roveda [93] proposed a Bayesian optimization algorithm to tune both the low-level and high-level controller parameters, adapting them via a data-driven procedure that also accounts for safety constraints. The method was demonstrated on a torque-controlled 7 DOF FRANKA Emika RM, where 25 robot parameters were tuned in fewer than 130 iterations and compared with the embedded position controller. The approach was split into two subproblems, a feedback linearizer and an independently designed feedforward controller, with the PID optimized in the second stage. Friction compensation depends on the model and parameter tuning, and its optimization remains challenging. In addition to parameter optimization, force estimation and interaction with the environment remain vital to achieving safe and intelligent autonomy.
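A data-driven tuning loop of this kind can be sketched with Gaussian-process Bayesian optimization, here via scikit-optimize's `gp_minimize`. The cost function `run_trajectory_and_measure_cost` is a hypothetical placeholder for executing a trajectory and measuring tracking error; the sketch omits the safety constraints and two-stage structure of [93].

```python
from skopt import gp_minimize

# Bayesian optimization of controller gains: a Gaussian-process surrogate
# proposes gain candidates that minimize a measured tracking-error cost.
def run_trajectory_and_measure_cost(gains):
    kp, kd, ki = gains
    # Placeholder: in practice this would execute the reference trajectory on
    # the robot (or a simulator) and return an error/effort-based cost.
    return (kp - 120.0) ** 2 / 1e4 + (kd - 15.0) ** 2 / 1e2 + (ki - 2.0) ** 2

search_space = [(10.0, 300.0),   # kp bounds
                (1.0, 50.0),     # kd bounds
                (0.0, 10.0)]     # ki bounds

result = gp_minimize(run_trajectory_and_measure_cost, search_space,
                     n_calls=40, random_state=0)
print("best gains:", result.x, "cost:", result.fun)
```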
Robot interaction with the environment relies on force estimation methods that allow safe interaction with humans without additional external sensing devices. An estimation method for unknown contact forces in RMs based on a disturbance Kalman filter (DKF) was proposed by J. Hu [94]; the DKF takes both the RM's dynamic model and the disturbance's dynamic model into account. The method proceeds in two steps: first, a robot dynamic model is identified, and then a force estimation observer is constructed. The model parameters are usually derived from rigid-body dynamics (RBD) theory, and accuracy was further improved by a nonparametric compensator based on a multilayer perceptron that complements the RBD model. Simulations were performed using a 6 DOF Kinova Jaco2 RM. The challenge of estimating the contact force for an RM with incomplete model information is addressed in Y. Wei's [95] work, where the RM dynamic model is divided, using Euler–Lagrange theory, into a nominal part and a residual dynamics term with no prior information, and a Gaussian Process (GP) model provides a data-based term compensating for the residual dynamics within an observer. The advantage of the proposed GP model is its computational efficiency in exploiting the information contained in the nominal model; the residual dynamics estimated by the GP are incorporated as statistical information into a novel decoupling observer for real-time force estimation [95].
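The disturbance-estimation idea can be illustrated with a toy linear Kalman filter in which an unknown external torque is appended to the state of a 1-DOF joint and inferred from noisy measurements. All numerical values are assumptions, and the sketch does not reproduce the semiparametric DKF of [94] or the GP-based observer of [95].

```python
import numpy as np

# Toy disturbance estimation: a 1-DOF unit-inertia joint is driven by a known
# command torque plus an unknown external torque, which is modeled as an extra
# (slowly varying) state and estimated from noisy position/velocity readings.
dt = 0.01
A = np.array([[1.0, dt, 0.5 * dt**2],      # state: [position, velocity, external torque]
              [0.0, 1.0, dt],
              [0.0, 0.0, 1.0]])            # disturbance modeled as nearly constant
B = np.array([[0.5 * dt**2], [dt], [0.0]])
H = np.array([[1.0, 0.0, 0.0],             # measured: position and velocity
              [0.0, 1.0, 0.0]])
Q = np.diag([1e-8, 1e-8, 1e-4])            # allow the disturbance state to drift
R = np.diag([1e-6, 1e-4])

x_hat, P = np.zeros((3, 1)), np.eye(3)
x_true = np.zeros((3, 1)); x_true[2, 0] = 0.8    # true external torque = 0.8 N*m
rng = np.random.default_rng(0)

for k in range(500):
    u = np.array([[0.2]])                  # known command torque
    x_true = A @ x_true + B @ u
    z = H @ x_true + rng.normal(0.0, [[1e-3], [1e-2]])
    x_hat = A @ x_hat + B @ u              # Kalman predict
    P = A @ P @ A.T + Q
    S = H @ P @ H.T + R                    # Kalman update
    K = P @ H.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (z - H @ x_hat)
    P = (np.eye(3) - K @ H) @ P

print("estimated external torque:", float(x_hat[2, 0]))
```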
A robotic grasping system presented by Y. Xiao [96] merges Oriented FAST and Rotated BRIEF (ORB) feature detection, a VGG19 convolutional neural network (where VGG stands for Visual Geometry Group, a prominent CNN architecture family), and a random sample consensus (RANSAC) algorithm for geometric verification, achieving 99% accuracy.
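The ORB + RANSAC stage of such a pipeline can be sketched with OpenCV as below, where synthetic images stand in for the template view of a part and the current camera frame; the CNN classification stage of [96] is omitted.

```python
import cv2
import numpy as np

# ORB keypoints are matched between a reference "part" view and the current
# frame; RANSAC rejects geometrically inconsistent matches via a homography.
rng = np.random.default_rng(0)
template = (rng.random((200, 200)) * 255).astype(np.uint8)   # textured part view
frame = np.zeros((480, 640), dtype=np.uint8)
frame[100:300, 220:420] = template                            # part placed in the scene

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(template, None)
kp2, des2 = orb.detectAndCompute(frame, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC keeps only the matches consistent with a single planar transformation.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print("estimated translation:", H[0, 2], H[1, 2])
print("inlier matches:", int(inlier_mask.sum()), "of", len(matches))
```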
B. Alt [97] proposed BANSAI (Bridging the AI Adoption Gap via Neurosymbolic AI, Figure 18), an approach that mirrors the robot programming and deployment process; the symbolic component of the program representation is a traditional, skill-based representation used both for robot execution and for interaction with a human user (Figure 19). The authors describe the gap in the adoption of AI in industrial robot programming, outline the AI challenges that arise there, and introduce BANSAI as a neurosymbolic method designed to address the specific challenges faced by robot programmers.
Continuing with assembly issues, Y. Song [98] states that absolute accuracy depends on geometric and assembly errors, as well as elastic deformation errors arising from flexible joints and the payload. In that article, a generalized cross-value method is adopted to solve the ill-conditioning problem, and a calibration method for geometric and deformation errors of the UR3 robot, based on finite and instantaneous screw theory, is proposed.
To improve process productivity, time-based planning and scheduling strategies require the ability to predict the maximum performance of an industrial robot performing non-deterministic tasks. P. Righettini [99] proposed neural network models, trained with Stochastic Gradient Descent using the PyTorch library, to approximate the task-time function of a generic multi-DOF RM; the models are executed within the Python3 interpreter by a generic Task Time Optimizer (TTO) and the GTP-TTO algorithm, and were adapted to 4 DOF RMs such as SCARA, Clavel Delta, linear Delta, palletizing, Adept Quattro, and Cartesian robots.
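A reduced sketch of this idea, assuming a synthetic task-time function and placeholder inputs (start and goal coordinates of a move), is shown below in PyTorch: a small feed-forward network is trained with SGD to regress cycle times, in the spirit of, but not identical to, the models in [99].

```python
import torch
import torch.nn as nn

# Approximating a task-time function with a small feed-forward network trained
# by stochastic gradient descent; the timing function below is synthetic.
class TaskTimeNet(nn.Module):
    def __init__(self, n_inputs=6):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_inputs, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

# Synthetic dataset: start/goal coordinates of a pick-and-place move -> cycle time.
x = torch.rand(2048, 6)
y = 0.3 + (x[:, :3] - x[:, 3:]).norm(dim=1, keepdim=True) * 1.2

model = TaskTimeNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
loss_fn = nn.MSELoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
print("final MSE:", loss.item())
```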
In conclusion, similar AI architectures, such as auto-encoders and CNNs, are applied in both robotic machining and manipulation, and the integration of visual cameras with other sensors is a key focus in both fields. Despite the notable distinctions between machining and manipulation tasks, additional sensors and continuous observation of robotic arm movement are not always required during manipulation. However, ML architectures such as RNNs, and specifically LSTM networks, are essential for trajectory prediction and optimization, since trajectory analysis relies on series of coordinate data. Consequently, the overall robotic setup can become more cost-effective without requiring additional sensors, although the implementation of ML models can increase computational costs.
Despite the fundamental differences in the examples of activities and solution methods presented in Table 2 above, a common set of insights is visible. The reviewed algorithms demonstrate a broad range of applications, extending from welding and grinding to robotic manipulation. The most effective solutions comprise the Cartesian architecture with point cloud matching and tool deflection compensation, achieving an accuracy of 0.005 mm and repeatability of 0.02 mm, as well as the PSO refinement algorithm, which achieves a measurement accuracy of 50 nm in grinding tasks. Similarly, advanced force prediction methods employing LSTM-RNN demonstrated real-time enhancements of up to 9.23% in accuracy. These findings underscore the efficacy of hybrid optimization and learning-based methodologies in attaining ultra-high precision and adaptability.
Conversely, less effective solutions include curve optimization algorithms for grinding, which only achieved an accuracy of around 10%, and the XGBoost learning algorithm, which exhibited relatively high maximum errors (10.9%) and moderate material removal rate improvements (14.4%). Similarly, the early implementations of neural network–based force estimation provided valuable insights for haptic feedback; however, their slow execution times were not suitable for real-time deployment.
Another approach for collaborative robots is to enhance operational efficiency by improving their proactive cognitive capabilities. In some cases, this is achieved using the visual–linguistic–temporal analysis approach, in which intention recognition is defined as a multimodal learning task guided by human–robot interface principles.
M. Beetz [106] states that, in simulating the outcome of a possible action and determining its feasibility, the internal world knowledge base plays two roles: (i) it represents the robot's beliefs about itself and the world, and (ii) it provides a reasoning mechanism for determining the outcome of a possible action. The internal world knowledge base thus includes two types of knowledge: current beliefs about the robot and the world, and a predicted internal simulation of future states. Simulation is therefore becoming an increasingly reliable and relevant tool, and it must continue to be improved in order to optimize the performance of complete robotic systems.
Neurosymbolic programming and AI can be suited to the particular requirements of industrial robot programming. As recent articles have argued, future work will need to focus on implementing a unified software framework and user interface for neurosymbolic robot programming and on evaluating this common approach in real-world manufacturing scenarios.
Most of the ML techniques proposed in the reviewed articles address force estimation in real-time environments. Authors applying their work to surgery emphasize the limitations of the experimental setup and of the surgical context. Adding a neural network for learning and fine-tuning, or using hybrid learning methods, can improve prediction accuracy. As the most valuable future paths, surgical robot developers point to cross-model learning, advanced dimensionality reduction techniques, attention models, deep cascade-WR architectures, and online sequential extreme learning machines. The concept of cognitive architecture integrates elements needed to exhibit perception, action, learning, adaptation, motivation, autonomy, internal simulation, attention, action selection, and memory, all of which still require improvement.

5. Discussion

Recent trends highlight the shift toward data-driven approaches, which are transforming the industrial robot sector by moving beyond purely physics-based models towards learning-based optimization frameworks [26]. The reviewed studies reflect the continuous effort to enhance the precision, adaptability, and robustness of industrial robotics through calibration, error compensation, sensing, and algorithmic innovation. Conventional methodologies, including kinematic calibration and iterative error modeling, provide a robust foundation for enhancing positioning accuracy [20]. However, these approaches are frequently constrained by local optimality and rigid assumptions. On the other hand, data-driven approaches, including temporal convolutional networks and reinforcement learning, extend these methods by enabling predictive compensation under dynamic conditions. In a similar manner, vision-based approaches such as the YOLO architecture have demonstrated their efficacy in positioning, recognition, and separation, enhancing real-time adaptability. However, these approaches are susceptible to noise sensitivity, calibration errors, and substantial computational demands, as pointed out in [32]. These limitations underscore the necessity of sensor fusion, wherein complementary sensing means such as force/torque, tactile, ultrasonic, IMUs, and laser scanning can address the limitations of vision-only systems.
The comparison between robotic machining and robotic manipulation further illustrates distinct challenges: machining demands physical rigidity and high levels of accuracy, while manipulation requires physical flexibility, precise interaction, and multimodal perception. From the standpoint of algorithmic development, there has been considerable progress in recent approaches, resulting in significant advances in robotic performance. The most effective solutions, including PSO refinement for nanoscale grinding, Cartesian point cloud matching with tool deflection compensation, and enhanced RRT for path planning, demonstrate the efficacy of optimization and geometry-based corrections in achieving exceptional accuracy and reliability [69]. Conversely, less sophisticated methods, such as curve optimization for grinding and early XGBoost or neural network implementations, demonstrate limited precision and inadequate adaptability. This highlights the significance of task-specific optimization and computational efficiency. Finally, there is a marked trend towards human-centric robotics. As demonstrated by [84], multimodal recognition (e.g., combining gaze, IMU, and visual data) is crucial for seamless human–robot interaction. The integration of probabilistic, multimodal approaches is further supported by Bayesian inference frameworks [93] and contextual recognition methods [46].
Overall, the existing literature indicates a shift towards hybrid strategies that integrate physics-based modeling and AI-driven learning. This trend underscores the increasing significance of computational efficiency, sensor integration, and task-specific optimization [32]. The incorporation of recent advances in multimodal sensing and lightweight neural architectures remains central to sustaining performance in real-time industrial contexts, since the computational power and time required for effective training grow with the complexity of the ML architecture.

6. Conclusions

The following three strategies can be identified as potential paths from the reviewed articles for future research improvements: (i) sensor fusion, combining visual, force/torque, acoustic, and tactile data to counteract the limitations of individual input sources; (ii) hybrid optimization–learning frameworks, integrating reinforcement learning with swarm-based or genetic algorithms to combine adaptability with precision; and (iii) real-time computational efficiency improvements, including code optimization, parallel GPU implementation, and lightweight neural architectures to ensure online feasibility. To advance accuracy and robustness in industrial robotics, it is necessary to combine the strengths of physics-based modeling with data-driven AI approaches.
Additionally, there is a significant shift toward human-centric manufacturing processes, which emphasize the integration of collaborative robots with advanced sensory and cognitive capabilities. ML developers prioritize the integration of human workers with advanced technologies, emphasizing collaboration and recognizing the unique strengths of both humans and machines, with a focus on human well-being. Although robotics and automation have traditionally focused on increasing productivity and reducing human intervention, the latter goal no longer applies to human–robot collaboration. Human–robot collaboration, intention recognition, and visual-language models address challenges such as task-specific dependencies, data scarcity, and training costs. Implementing a multimodal system that integrates visual, linguistic, and temporal information improves proactive cognitive capabilities and flexibility in human–robot interaction tasks. The authors of the reviewed works state their limitations and suggest possible future work to address them; one proposed direction is to explore language-driven robot policy learning, using the recognized intents as instructions. The most ambitious aspiration is a comprehensive perception–cognition–execution system, which remains a long-term goal and challenge in this field, promoting active human–robot collaboration.
In the longer term, the realization of a comprehensive perception–cognition–execution system that enables seamless human–robot collaboration is considered to be of great significance. Realizing this vision necessitates the integration of physical modeling, multimodal AI, and cognitive intent recognition into unified frameworks. The development of such systems has the potential to enhance accuracy and robustness, while also facilitating the emergence of a new generation of industrial robotics that strikes a balance between technological efficiency and human-centric design, in line with the Industry 5.0 concept.

Author Contributions

Conceptualization, V.B. and A.D.; methodology, E.Š.; validation, E.Š. and J.J.P.; formal analysis, A.D.; investigation, M.M.; data curation, E.Š.; writing—original draft preparation, M.M.; writing—review and editing, A.D., V.B. and J.J.P.; visualization, J.J.P.; supervision, A.D.; project administration, V.B.; funding acquisition, V.B. All authors have read and agreed to the published version of the manuscript.

Funding

Research was supported by the Joint Research Collaborative Seed Grant Program between National Taiwan University of Science and Technology and Vilnius Gediminas Technical University (Grant No: NTUST-VILNIUS TECH-2024-03).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IoT: Internet of Things
ML: Machine Learning
RM: Robotic Manipulator
CNN: Convolutional Neural Network
VR: Virtual Reality
ToF: Time-of-Flight
RGBD: Red, Green, Blue, and Depth
AI: Artificial Intelligence
YOLO: You Only Look Once
CSI: Camera Serial Interface
ROC: Receiver Operating Characteristic
R-Bbox: Rectangular Bounding Boxes
VGG: Visual Geometry Group
ROS: Robot Operating System
DL: Deep Learning
DOF: Degrees Of Freedom
SSD: Single-Shot Detector
SGD: Stochastic Gradient Descent
AAE: Augmented Autoencoder
LSTM: Long Short-Term Memory
DNN: Dynamic Neural Network
IK: Inverse Kinematics
ANN: Artificial Neural Network
GLT: Glued Laminated Timber
BIM: Building Information Modeling
IFC: Industry Foundation Classes
ICP: Iterative Closest Point
ILC: Iterative Learning Control
DE: Differential Evolution
RNN: Recurrent Neural Network
CLR: Cyclical Learning Rate
RMSE: Root Mean Square Error
DKF: Disturbance Kalman Filter
GP: Gaussian Process
BANSAI: Bridging the AI Adoption Gap via Neurosymbolic AI
TTO: Task Time Optimizer
PSO: Particle Swarm Optimization
RRT: Rapidly-exploring Random Tree

References

  1. Fathi, M.; Sepehri, A.; Ghobakhloo, M.; Iranmanesh, M.; Tseng, M.L. Balancing assembly lines with industrial and collaborative robots: Current trends and future research directions. Comput. Ind. Eng. 2024, 193, 110254. [Google Scholar] [CrossRef]
  2. Ayasrah, F.T.M.; Abu-Alnadi, H.J.; Al-Said, K.; Shrivastava, G.; Mohan, G.K.; Muniyandy, E.; Chandra, U. IoT Integration for Machine Learning System using Big Data Processing. Int. J. Intell. Syst. Appl. Eng. 2024, 12, 591–599. [Google Scholar]
  3. Mazhar, A.; Tanveer, A.; Izhan, M.; Khan, M.Z.T. Robust Control Approaches and Trajectory Planning Strategies for Industrial Robotic Manipulators in the Era of Industry 4.0: A Comprehensive Review. Eng. Proc. 2023, 56, 75. [Google Scholar] [CrossRef]
  4. Makulavičius, M.; Petkevičius, S.; Rožėnė, J.; Dzedzickis, A.; Bučinskas, V. Industrial Robots in Mechanical Machining: Perspectives and Limitations. Robotics 2023, 12, 160. [Google Scholar] [CrossRef]
  5. Su, C.; Li, B.; Zhang, W.; Tian, W.; Liao, W. An analysis and reliability-based optimization design method of trajectory accuracy for industrial robots considering parametric uncertainties. Reliab. Eng. Syst. Saf. 2025, 254, 110626. [Google Scholar] [CrossRef]
  6. Pandiyan, V.; Murugan, P.; Tjahjowidodo, T.; Caesarendra, W.; Manyar, O.M.; Then, D.J.H. In-process virtual verification of weld seam removal in robotic abrasive belt grinding process using deep learning. Robot. Comput. Integr. Manuf. 2019, 57, 477–487. [Google Scholar] [CrossRef]
  7. Wang, N.; Shi, X.; Zhong, K.; Zhang, X.; Chen, W. A Path Correction Method Based on Global and Local Matching for Robotic Autonomous Systems. J. Intell. Robot. Syst. Theory Appl. 2022, 104, 1–12. [Google Scholar] [CrossRef]
  8. Hull, B.; John, V. Non-Destructive Testing; Springer Inc.: New York, NY, USA, 1988. [Google Scholar]
  9. Almadhoun, R.; Taha, T.; Seneviratne, L.; Dias, J.; Cai, G. A survey on inspecting structures using robotic systems. Int. J. Adv. Robot. Syst. 2016, 13, 1–18. [Google Scholar] [CrossRef]
  10. Mineo, C.; Montinaro, N.; Fustaino, M.; Pantano, A.; Cerniglia, D. Fine Alignment of Thermographic Images for Robotic Inspection of Parts with Complex Geometries. Sensors 2022, 22, 6267. [Google Scholar] [CrossRef]
  11. Cuevas, E.; López, M.; García, M. Ultrasonic Techniques and Industrial Robots: Natural Evolution of Inspection Systems. In Proceedings of the 4th International Symposium on NDT in Aerospace, Berlin, Germany, 13–15 November 2012. [Google Scholar]
  12. Mineo, C.; Riise, J.; Pierce, S.G.; Summan, R.; Macleod, C.N.; Pierce, S.G. Index-based triangulation method for efficient generation of large three-dimensional ultrasonic C-scans. Insight-Non-Destr. Test. Cond. Monit. 2018, 60, 183–189. [Google Scholar] [CrossRef]
  13. Soori, M.; Arezoo, B.; Dastres, R. Internet of things for smart factories in industry 4.0, a review. Internet Things Cyber-Phys. Syst. 2023, 3, 192–204. [Google Scholar] [CrossRef]
  14. Droukas, L.; Doulgeri, Z.; Tsakiridis, N.L.; Triantafyllou, D.; Kleitsiotis, I.; Mariolis, I.; Giakoumis, D.; Tzovaras, D.; Kateris, D.; Bochtis, D. A Survey of Robotic Harvesting Systems and Enabling Technologies. J. Intell. Robot. Syst. 2023, 107, 1–29. [Google Scholar] [CrossRef]
  15. Rahmati, M. Dynamic role-adaptive collaborative robots for sustainable smart manufacturing: An AI-driven approach. J. Intell. Manuf. Spec. Equip. 2025; ahead-of-print. [Google Scholar] [CrossRef]
  16. Nenna, F.; Zanardi, D.; Gamberini, L. Enhanced Interactivity in VR-based Telerobotics: An Eye-tracking Investigation of Human Performance and Workload. Int. J. Hum. Comput. Stud. 2023, 177, 103079. [Google Scholar] [CrossRef]
  17. Amobonye, A.; Lalung, J.; Mheta, G.; Pillai, S. Writing a Scientific Review Article: Comprehensive Insights for Beginners. Sci. World J. 2024, 2024, 7822269. [Google Scholar] [CrossRef] [PubMed]
  18. Maideen, A.; Mohanarathinam, A. Computer Vision-Assisted Object Detection and Handling Framework for Robotic Arm Design Using YOLOV5. ADCAIJ: Adv. Distrib. Comput. Artif. Intell. J. 2023, 12, e31586. [Google Scholar] [CrossRef]
  19. Zhao, H.; Tang, Z.; Li, Z.; Dong, Y.; Si, Y.; Lu, M.; Panoutsos, G. Real-Time Object Detection and Robotic Manipulation for Agriculture Using a YOLO-Based Learning Approach. In Proceedings of the IEEE International Conference on Industrial Technology, Bristol, UK, 25–27 March 2024. [Google Scholar] [CrossRef]
  20. Wang, Y.; Zhou, Y.; Wei, L.; Li, R. Design of a Four-Axis Robot Arm System Based on Machine Vision. Appl. Sci. 2023, 13, 8836. [Google Scholar] [CrossRef]
  21. Zhong, T.; Gan, Y.; Han, Z.; Gao, H.; Li, A. A Lightweight Object Detection Network for Industrial Robot Based YOLOv5. In Proceedings of the Proceedings-2023 China Automation Congress, CAC 2023, Chongqing, China, 17–19 November 2023; pp. 4685–4690. [Google Scholar] [CrossRef]
  22. Kondratev, S.; Pikalov, V.; Belokopytov, R.; Muravyev, A.; Boikov, A. Designing an Advanced Control System for ABB IRB 140 Robotic Manipulator: Integrating Machine Learning and Computer Vision for Enhanced Object Manipulation. In Proceedings of the 2023 5th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA), Lipetsk, Russia, 8–10 November 2023. [Google Scholar] [CrossRef]
  23. Li, L.; Cherouat, A.; Snoussi, H.; Wang, T.; Lou, Y.; Wu, Y. Vision-Based Deep learning for Robot Grasping Application in Industry 4.0. In Proceedings of the Technological Systems, Sustainability and Safety (TS3), Paris, France, 6–7 February; 2024. [Google Scholar]
  24. Govi, E.; Sapienza, D.; Toscani, S.; Cotti, I.; Franchini, G.; Bertogna, M. Addressing challenges in industrial pick and place: A deep learning-based 6 Degrees-of-Freedom pose estimation solution. Comput. Ind. 2024, 161, 104130. [Google Scholar] [CrossRef]
  25. Sundermeyer, M.; Marton, Z.C.; Durner, M.; Triebel, R. Augmented Autoencoders: Implicit 3D Orientation Learning for 6D Object Detection. Int. J. Comput. Vis. 2020, 128, 714–729. [Google Scholar] [CrossRef]
  26. Ma, Z.; Dong, N.; Gu, J.; Cheng, H.; Meng, Z.; Du, X. STRAW-YOLO: A detection method for strawberry fruits targets and key points. Comput. Electron. Agric. 2025, 230, 109853. [Google Scholar] [CrossRef]
  27. Kang, S.; Hu, Z.; Liu, L.; Zhang, K.; Cao, Z. Object Detection YOLO Algorithms and Their Industrial Applications: Overview and Comparative Analysis. Electronics 2025, 14, 1104. [Google Scholar] [CrossRef]
  28. Addy, C.; Nadendla, V.S.S.; Awuah-Offei, K. YOLO-Based Miner Detection Using Thermal Images in Underground Mines. Min. Met. Explor. 2025, 42, 1369–1386. [Google Scholar] [CrossRef]
  29. Qi, K.; Yang, Z.; Fan, Y.; Song, H.; Liang, Z.; Wang, S.; Wang, F. Detection and classification of Shiitake mushroom fruiting bodies based on Mamba YOLO. Sci. Rep. 2025, 15, 1–14. [Google Scholar] [CrossRef]
  30. Zhu, C.; Li, Z.; Liu, W.; Wu, P.; Zhang, X.; Wang, S. YOLO-VDS: Accurate detection of strawberry developmental stages for embedded agricultural robots. Eng. Res. Express 2025, 7, 015274. [Google Scholar] [CrossRef]
  31. Zheng, F.; Yin, A.; Zhou, C. YOLO with feature enhancement and its application in intelligent assembly. Rob. Auton. Syst. 2025, 183, 104844. [Google Scholar] [CrossRef]
  32. Vaghela, R.; Vaishnani, D.; Sarda, J.; Thakkar, A.; Nasit, Y.; Brahma, B.; Bhoi, A.K. Optimizing object detection for autonomous robots: A comparative analysis of YOLO models. Measurement 2026, 257, 118676. [Google Scholar] [CrossRef]
  33. Jalayer, R.; Chen, Y.; Jalayer, M.; Orsenigo, C.; Tomizuka, M. Testing human-hand segmentation on in-distribution and out-of-distribution data in human–robot interactions using a deep ensemble model. Mechatronics 2025, 110, 103365. [Google Scholar] [CrossRef]
  34. Sapkota, R.; Karkee, M. Object Detection with Multimodal Large Vision-Language Models: An In-depth Review. TechRxiv 2025. [Google Scholar] [CrossRef]
  35. Wan, D.; Deng, L.; Dong, J.; Guo, M.; Yin, J.; Liu, C.; Liu, H. An algorithm for multi-directional text detection in natural scenes. Digit. Signal Process 2026, 168, 105482. [Google Scholar] [CrossRef]
  36. Roveda, L.; Maroni, M.; Mazzuchelli, L.; Praolini, L.; Shahid, A.A.; Bucca, G.; Piga, D. Robot End-Effector Mounted Camera Pose Optimization in Object Detection-Based Tasks. J. Intell. Robot. Syst. Theory Appl. 2022, 104, 1–21. [Google Scholar] [CrossRef]
  37. Lavrenov, R.; Kidiraliev, E. Using optical sensors for industrial robot-human interactions in a Gazebo environment. Proc. Int. Conf. Artif. Life Robot. 2023, 28, 174–177. [Google Scholar] [CrossRef]
  38. Secil, S.; Ozkan, M. Minimum distance calculation using skeletal tracking for safe human-robot interaction. Robot. Comput. Integr. Manuf. 2022, 73, 102253. [Google Scholar] [CrossRef]
  39. Tashtoush, T.; Garcia, L.; Landa, G.; Amor, F.; Nicolas, A.; Oliva, D.; Safar, F. Human-Robot Interaction and Collaboration (HRI-C) Utilizing Top-View RGB-D Camera System. Int. J. Adv. Comput. Sci. Appl. 2023, 12, 10–17. [Google Scholar] [CrossRef]
  40. Bilal, D.K.; Unel, M.; Tunc, L.T.; Gonul, B. Development of a vision based pose estimation system for robotic machining and improving its accuracy using LSTM neural networks and sparse regression. Robot. Comput. Integr. Manuf. 2022, 74, 102262. [Google Scholar] [CrossRef]
  41. Chern, O.Z.; Hoe, H.K.; Chua, W. Towards Industry 4.0: Color-Based Object Sorting Using a Robot Arm and Real-Time Object. Ind. Manag. Adv. 2023, 1, 125. [Google Scholar] [CrossRef]
  42. Lao, D.; Quan, Y.; Wang, F.; Liu, Y. Error Modeling and Parameter Calibration Method for Industrial Robots Based on 6-DOF Position and Orientation. Appl. Sci. 2023, 13, 10901. [Google Scholar] [CrossRef]
  43. Tan, S.; Yang, J.; Ding, H. A prediction and compensation method of robot tracking error considering pose-dependent load decomposition. Robot. Comput. Integr. Manuf. 2023, 80, 102476. [Google Scholar] [CrossRef]
  44. Eren, B.; Demir, M.H.; Mistikoglu, S. Recent developments in computer vision and artificial intelligence aided intelligent robotic welding applications. Int. J. Adv. Manuf. Technol. 2023, 126, 4763–4809. [Google Scholar] [CrossRef]
  45. Zhu, Z.; Tang, X.; Chen, C.; Peng, F.; Yan, R.; Zhou, L.; Li, Z.; Wu, J. High precision and efficiency robotic milling of complex parts: Challenges, approaches and trends. Chin. J. Aeronaut. 2022, 35, 22–46. [Google Scholar] [CrossRef]
  46. Bilal, M.T.; Tyapin, I.; Choux, M.M.H. Enhancing Object Localization Accuracy by using Multiple Camera Viewpoints for Disassembly Systems. In Proceedings of the IECON Proceedings (Industrial Electronics Conference) 2022, Brussels, Belgium, 17–20 October 2022. [Google Scholar] [CrossRef]
  47. Lee, K.W.; Ko, D.K.; Lim, S.C. Toward Vision-Based High Sampling Interaction Force Estimation with Master Position and Orientation for Teleoperation. IEEE Robot. Autom. Lett. 2021, 6, 6640–6646. [Google Scholar] [CrossRef]
  48. Srinivasamurthy, C.; Sivavenkatesh, R.; Gunasundari, R. Six-Axis Robotic Arm Integration with Computer Vision for Autonomous Object Detection using TensorFlow. In Proceedings of the 2023 2nd International Conference on Advances in Computational Intelligence and Communication, ICACIC 2023, Puducherry, India, 7–8 December 2023. [Google Scholar] [CrossRef]
  49. Deshpande, T.R.; Sapkal, S.U. Development of Vision Enabled Articulated Robotic Arm with Grasping Strategies for Simple Objects. In Proceedings of the 2021 IEEE Bombay Section Signature Conference, IBSSC 2021, Gwalior, India, 18–20 November 2021. [Google Scholar] [CrossRef]
  50. Gu, X. Design of delay compensation system for visual tracking control of industrial robots. In Proceedings of the Fourth International Conference on Mechanical, Electronics, and Electrical and Automation Control (METMS 2024), Xi’an, China, 26–28 January 2024; Volume 13163, pp. 2138–2143. [Google Scholar] [CrossRef]
  51. Deng, Z.Y.; Kang, L.W.; Chiang, H.H.; Li, H.C. Integration of Robotic Vision and Automatic Tool Changer Based on Sequential Motion Primitive for Performing Assembly Tasks. IFAC-Pap. 2023, 56, 5320–5325. [Google Scholar] [CrossRef]
  52. Mateo, C.M.; Gil, P.; Torres, F. Visual perception for the 3D recognition of geometric pieces in robotic manipulation. Int. J. Adv. Manuf. Technol. 2016, 83, 1999–2013. [Google Scholar] [CrossRef]
  53. Nair, D.; Pakdaman, A.; Plöger, P.G. Performance Evaluation of Low-Cost Machine Vision Cameras for Image-Based Grasp Verification. arXiv 2020, arXiv:2003.10167. [Google Scholar]
  54. D’Avella, S.; Avizzano, C.A.; Tripicchio, P. ROS-Industrial based robotic cell for Industry 4.0: Eye-in-hand stereo camera and visual servoing for flexible, fast, and accurate picking and hooking in the production line. Robot. Comput. Integr. Manuf. 2023, 80, 102453. [Google Scholar] [CrossRef]
  55. Zhang, M.; Tong, W.; Li, P.; Hou, Y.; Xu, X.; Zhu, L.; Wu, E.Q. Robust Neural Dynamics Method for Redundant Robot Manipulator Control With Physical Constraints. IEEE Trans. Ind. Inf. 2023, 19, 11721–11729. [Google Scholar] [CrossRef]
  56. Lai, Z.; Xiong, R.; Wu, H.; Guan, Y. Integration of Visual Information and Robot Offline Programming System for Improving Automatic Deburring Process. In Proceedings of the 2018 IEEE International Conference on Robotics and Biomimetics, ROBIO 2018, Kuala Lumpur, Malaysia, 12–15 December 2018; pp. 1132–1137. [Google Scholar] [CrossRef]
  57. Zhao, X.; Zhang, Y.; Wang, H.; Liu, Y.; Zhang, B.; Hu, S. Research on Trajectory Recognition and Control Technology of Real-Time Tracking Welding. Sensors 2022, 22, 8546. [Google Scholar] [CrossRef]
  58. Wu, X.; Tian, R.; Lei, Y.; Gao, H.; Fang, Y. Real-Time Space Trajectory Judgment for Industrial Robots in Welding Tasks. Machines 2024, 12, 360. [Google Scholar] [CrossRef]
  59. Srinivasu, P.N.; Bhoi, A.K.; Jhaveri, R.H.; Reddy, G.T.; Bilal, M. Probabilistic Deep Q Network for real-time path planning in censorious robotic procedures using force sensors. J. Real. Time Image Process 2021, 18, 1773–1785. [Google Scholar] [CrossRef]
  60. Almusawi, A.R.J.; Dülger, L.C.; Kapucu, S. Artificial Neural Network Based Kinematics: Case Study on Robotic Surgery. Mech. Mach. Sci. 2019, 73, 1839–1848. [Google Scholar] [CrossRef]
  61. Iskandar, M.; Ott, C.; Albu-Schaffer, A.; Siciliano, B.; Dietrich, A. Hybrid Force-Impedance Control for Fast End-Effector Motions. IEEE Robot. Autom. Lett. 2023, 8, 3931–3938. [Google Scholar] [CrossRef]
  62. Fekik, A.; Azar, A.T.; Hamida, M.L.; Denoun, H.; Kais, D.; Saidi, S.M.; Bousbaine, A.; Kasim, I.; Kamal, N.A.; Al Mhdawi, A.K.; et al. Sliding Mode Control of the PUMA 560 Robot. In Proceedings of the 2023 International Conference on Control, Automation and Diagnosis, ICCAD 2023, Rome, Italy, 10–12 May 2023. [Google Scholar] [CrossRef]
  63. Gao, R.; Zhang, W.; Wang, G.; Wang, X. Experimental Research on Motion Analysis Model and Trajectory Planning of GLT Palletizing Robot. Buildings 2023, 13, 966. [Google Scholar] [CrossRef]
  64. Li, S.; Zhang, X. Research on planning and optimization of trajectory for underwater vision welding robot. Array 2022, 16, 100253. [Google Scholar] [CrossRef]
  65. Wang, J.; Wen, K.; Lei, T.; Xiao, Y.; Pan, Y. Automatic Aluminum Alloy Surface Grinding Trajectory Planning of Industrial Robot Based on Weld Seam Recognition and Positioning. Actuators 2023, 12, 170. [Google Scholar] [CrossRef]
  66. Mineo, C.; Pierce, S.G.; Nicholson, P.I.; Cooper, I. Robotic path planning for non-destructive testing–A custom MATLAB toolbox approach. Robot. Comput. Integr. Manuf. 2016, 37, 1–12. [Google Scholar] [CrossRef]
  67. Ma, K.; Han, L.; Sun, X.; Liang, C.; Zhang, S.; Shi, Y.; Wang, X. A Path Planning Method of Robotic Belt Grinding for Workpieces with Complex Surfaces. IEEE/ASME Trans. Mechatron. 2020, 25, 728–738. [Google Scholar] [CrossRef]
  68. Wang, W.; Yun, C. A Path Planning Method for Robotic Belt Surface Grinding. Chin. J. Aeronaut. 2011, 24, 520–526. [Google Scholar] [CrossRef]
  69. Cheng, C.; Lv, X.; Zhang, J.; Zhang, M. Robot Arm Path Planning Based on Improved RRT Algorithm. In Proceedings of the 2021 3rd International Symposium on Robotics and Intelligent Manufacturing Technology, ISRIMT 2021, Changzhou, China, 24–26 September 2021; pp. 243–247. [Google Scholar] [CrossRef]
  70. Li, T.; Meng, S.; Lu, C.; Wu, Y.; Liu, J. A novel BIM and vision-based robotic welding trajectory planning method for complex intersection curves. Measurement 2025, 253, 117587. [Google Scholar] [CrossRef]
  71. Li, B.; Tian, W.; Zhang, C.; Hua, F.; Cui, G.; Li, Y. Positioning error compensation of an industrial robot using neural networks and experimental study. Chin. J. Aeronaut. 2022, 35, 346–360. [Google Scholar] [CrossRef]
  72. Chen, Y.; Chu, B.; Freeman, C.T. Iterative Learning Control for Robotic Path Following With Trial-Varying Motion Profiles. IEEE/ASME Trans. Mechatron. 2022, 27, 4697–4706. [Google Scholar] [CrossRef]
  73. Bhattarai, U.; Sapkota, R.; Kshetri, S.; Mo, C.; Whiting, M.D.; Zhang, Q.; Karkee, M. A vision-based robotic system for precision pollination of apples. Comput. Electron. Agric. 2025, 234, 110158. [Google Scholar] [CrossRef]
  74. Juříček, M.; Parák, R.; Kůdela, J. Evolutionary Computation Techniques for Path Planning Problems in Industrial Robotics: A State-of-the-Art Review. Computation 2023, 11, 245. [Google Scholar] [CrossRef]
  75. He, L.; Sun, Y.; Chen, L.; Feng, Q.; Li, Y.; Lin, J.; Qiao, Y.; Zhao, C. Advance on Agricultural Robot Hand–Eye Coordination for Agronomic Task: A Review. Engineering 2025, 51, 263–279. [Google Scholar] [CrossRef]
  76. Sidhik, S.; Sridharan, M.; Ruiken, D. An adaptive framework for trajectory following in changing-contact robot manipulation tasks. Rob. Auton. Syst. 2024, 181, 104785. [Google Scholar] [CrossRef]
  77. Sabique, P.V.; Pasupathy, G.; Ramachandran, S. A data driven recurrent neural network approach for reproduction of variable visuo-haptic force feedback in surgical tool insertion. Expert. Syst. Appl. 2024, 238, 122221. [Google Scholar] [CrossRef]
  78. Kruzic, S.; Music, J.; Kamnik, R.; Papic, V. Estimating robot manipulator end-effector forces using deep learning. In Proceedings of the 2020 43rd International Convention on Information, Communication and Electronic Technology, MIPRO 2020-Proceedings, Opatija, Croatia, 28 September–2 October 2020; pp. 1163–1168. [Google Scholar] [CrossRef]
  79. Roveda, L.; Riva, D.; Bucca, G.; Piga, D. External joint torques estimation for a position-controlled manipulator employing an extended kalman filter. In Proceedings of the 2021 18th International Conference on Ubiquitous Robots, UR 2021, Gangneung, Republic of Korea, 12–14 July 2021; pp. 101–107. [Google Scholar] [CrossRef]
  80. Meng, Q.; Lai, X.; Yan, Z.; Su, C.Y.; Wu, M. Motion Planning and Adaptive Neural Tracking Control of an Uncertain Two-Link Rigid-Flexible Manipulator With Vibration Amplitude Constraint. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 3814–3828. [Google Scholar] [CrossRef] [PubMed]
  81. Gao, H.; An, H.; Lin, W.; Yu, X.; Qiu, J. Trajectory Tracking of Variable Centroid Objects Based on Fusion of Vision and Force Perception. IEEE Trans. Cybern. 2023, 53, 7957–7965. [Google Scholar] [CrossRef] [PubMed]
  82. Semnani, S.H.; De Ruiter, A.H.J.; Liu, H.H.T. Force-Based Algorithm for Motion Planning of Large Agent. IEEE Trans. Cybern. 2022, 52, 654–665. [Google Scholar] [CrossRef]
  83. Liu, M.; Shang, M. Orientation Tracking Incorporated Multicriteria Control for Redundant Manipulators With Dynamic Neural Network. IEEE Trans. Ind. Electron. 2024, 71, 3801–3810. [Google Scholar] [CrossRef]
  84. Wu, D.; Zhao, Q.; Fan, J.; Qi, J.; Zheng, P.; Hu, J. H2R Bridge: Transferring vision-language models to few-shot intention meta-perception in human robot collaboration. J. Manuf. Syst. 2025, 80, 524–535. [Google Scholar] [CrossRef]
  85. Wang, W.; Tian, W.; Liao, W.; Li, B.; Hu, J. Error compensation of industrial robot based on deep belief network and error similarity. Robot. Comput. Integr. Manuf. 2022, 73, 102220. [Google Scholar] [CrossRef]
  86. Zhang, J.; Yan, W. 6-DOF UR3 Robot Manipulation Based on Deep Learning. In Proceedings of the 16th International Conference on Advanced Computer Theory and Engineering, ICACTE 2023, Hefei, China, 15–17 September 2023; pp. 237–241. [Google Scholar] [CrossRef]
  87. Ma, S.; Deng, K.; Lu, Y.; Xu, X. Robot error compensation based on incremental extreme learning machines and an improved sparrow search algorithm. Int. J. Adv. Manuf. Technol. 2023, 125, 5431–5443. [Google Scholar] [CrossRef]
  88. Wrütz, T.; Group, V.; Biesenbach, R. Robot Offline Programming Tool (RoBO-2L) for Model-Based Design with MATLAB. In Proceedings of the 2nd International Conference on Engineering Science and Innovative Technology, ESIT 2016, Phuket, Thailand, 21–23 April 2016; pp. 1–5. [Google Scholar]
  89. Golz, J.; Wruetz, T.; Eickmann, D.; Biesenbach, R. RoBO-2L, a Matlab interface for extended offline programming of KUKA industrial robots. In Proceedings of the 2016 11th France-Japan and 9th Europe-Asia Congress on Mechatronics, MECATRONICS 2016/17th International Conference on Research and Education in Mechatronics, REM 2016, Compiègne, France, 15–17 June 2016; pp. 64–67. [Google Scholar] [CrossRef]
  90. Al-Mahasneh, A.J.; Falkenhain, J.; Mousa, M.; Biesenbach, R.; Al-Mahasneh, A.; Baniyounis, M. Development of ANFIS Controller for Trajectory Tracking Control Using ROBO2L MATLAB Toolbox for KUKA Industrial Robot via RSI. In Proceedings of the 2024 21st International Multi-Conference on Systems, Signals & Devices (SSD), Erbil, Iraq, 22–25 April 2024. [Google Scholar] [CrossRef]
  91. Mousa, M.A.A.; Elgohr, A.T.; Khater, H.A. Trajectory Optimization for a 6 DOF Robotic Arm Based on Reachability Time. Ann. Emerg. Technol. Comput. 2024, 8, 22–35. [Google Scholar] [CrossRef]
  92. Barhaghtalab, M.H.; Meigoli, V.; Haghighi, M.R.G.; Nayeri, S.A.; Ebrahimi, A. Dynamic analysis, simulation, and control of a 6-DOF IRB-120 robot manipulator using sliding mode control and boundary layer method. J. Cent. South. Univ. 2018, 25, 2219–2244. [Google Scholar] [CrossRef]
  93. Roveda, L.; Forgione, M.; Piga, D. Robot control parameters auto-tuning in trajectory tracking applications. Control Eng. Pr. 2020, 101, 104488. [Google Scholar] [CrossRef]
  94. Hu, J.; Xiong, R. Contact Force Estimation for Robot Manipulator Using Semiparametric Model and Disturbance Kalman Filter. IEEE Trans. Ind. Electron. 2018, 65, 3365–3375. [Google Scholar] [CrossRef]
  95. Wei, Y.; Li, W.; Yang, Y.; Yu, X.; Guo, L. Decoupling Observer for Contact Force Estimation of Robot Manipulators Based on Enhanced Gaussian Process Model. In Proceedings of the 2022 8th IEEE International Conference on Cloud Computing and Intelligence Systems, CCIS 2022, Chengdu, China, 26–28 November 2022; pp. 1–7. [Google Scholar] [CrossRef]
  96. Xiao, Y. Integrating CNN and RANSAC for improved object recognition in industrial robotics. Syst. Soft Comput. 2025, 7, 200240. [Google Scholar] [CrossRef]
  97. Alt, B.; Dvorak, J.; Katic, D.; Jäkel, R.; Beetz, M.; Lanza, G. BANSAI: Towards Bridging the AI Adoption Gap in Industrial Robotics with Neurosymbolic Programming. Procedia CIRP 2024, 130, 532–537. [Google Scholar] [CrossRef]
  98. Song, Y.; Liu, M.; Lian, B.; Qi, Y.; Wang, Y.; Wu, J.; Li, Q. Industrial serial robot calibration considering geometric and deformation errors. Robot. Comput. Integr. Manuf. 2022, 76, 102328. [Google Scholar] [CrossRef]
  99. Righettini, P.; Strada, R.; Cortinovis, F. Neural Network Mapping of Industrial Robots’ Task Times for Real-Time Process Optimization. Robotics 2023, 12, 143. [Google Scholar] [CrossRef]
  100. Li, L.; Ren, X.; Feng, H.; Chen, H.; Chen, X. A novel material removal rate model based on single grain force for robotic belt grinding. J. Manuf. Process 2021, 68, 1–12. [Google Scholar] [CrossRef]
  101. Lv, Y.; Peng, Z.; Qu, C.; Zhu, D. An adaptive trajectory planning algorithm for robotic belt grinding of blade leading and trailing edges based on material removal profile model. Robot. Comput. Integr. Manuf. 2020, 66, 101987. [Google Scholar] [CrossRef]
  102. Zhu, D.; Feng, X.; Xu, X.; Yang, Z.; Li, W.; Yan, S.; Ding, H. Robotic grinding of complex components: A step towards efficient and intelligent machining–challenges, solutions, and applications. Robot. Comput. Integr. Manuf. 2020, 65, 101908. [Google Scholar] [CrossRef]
  103. Gao, K.; Chen, H.; Zhang, X.; Ren, X.K.; Chen, J.; Chen, X. A novel material removal prediction method based on acoustic sensing and ensemble XGBoost learning algorithm for robotic belt grinding of Inconel 718. Int. J. Adv. Manuf. Technol. 2019, 105, 217–232. [Google Scholar] [CrossRef]
  104. Abeywardena, S.; Yuan, Q.; Tzemanaki, A.; Psomopoulou, E.; Droukas, L.; Melhuish, C.; Dogramadzi, S. Estimation of Tool-Tissue Forces in Robot-Assisted Minimally Invasive Surgery Using Neural Networks. Front. Robot. AI 2019, 6, 457392. [Google Scholar] [CrossRef]
  105. Batty, T.; Ehrampoosh, A.; Shirinzadeh, B.; Zhong, Y.; Smith, J. A Transparent Teleoperated Robotic Surgical System with Predictive Haptic Feedback and Force Modelling. Sensors 2022, 22, 9770. [Google Scholar] [CrossRef]
  106. Beetz, M.; Kazhoyan, G.; Vernon, D. Robot manipulation in everyday activities with the CRAM 2.0 cognitive architecture and generalized action plans. Cogn. Syst. Res. 2025, 92, 101375. [Google Scholar] [CrossRef]
Figure 1. A PRISMA-style flow diagram illustrating the number of articles at each stage.
Figure 2. Robotic manipulator setup in the case of a requirement for a complex toolpath [4].
Figure 3. Graphic representation of the computer-vision-assisted object handling framework using YOLO V5 [18].
Figure 4. YOLO structure represented in E. Govi [24].
Figure 5. Industrial use case: main contributions in data generation, pipeline, and picking experiments as represented in [25].
Figure 6. Intelligent assembly tasks for a robot [31].
Figure 7. Coordinate system transformations between the frames of 1—Camera, 2—Laser, 3—Robot, 4—Laser Target, 5—Camera Target, 6—Cutting Tool, and the Fixed Base frame OFB [40].
Figure 8. The structure of the robot vision system.
Figure 9. Distributed architecture of a robot vision system.
Figure 10. ABB ROS-Industrial packages details. From Avella [54].
Figure 11. The flowchart of embedded Pauta criteria [56].
Figure 12. (a) An illustration of the encoder–decoder architecture; (b) General description of the proposed methodology [6].
Figure 13. The overall flow of the grinding trajectory planning algorithm: (a) Schematic diagram of large Al alloy cabin grinding; (b) The overall flow of the grinding trajectory planning algorithm [60].
Figure 14. Robotic welding system [70].
Figure 15. Setup of the robotic pollination system during the field trial in Naches and Pullman, WA. The processing laptop, machine vision, manipulation and end-effector system were placed on a utility cart with an accompanying air compressor that supplied pressurized air for air-assisted atomization of pollen suspension. The electrostatic sprayer nozzle and the Intel RealSense D435i RGB-D camera were rigidly attached to the distal end of the UR5e RM [73].
Figure 16. (a) Illustrative contact-changing task where the robot experiences discontinuities in dynamics due to different surface friction in the middle of motions “2” and “4”, and collisions at the end of “1”, “2”, and “4”; (b) Simulated environment used in the experiments; robot approaches the table from the top, slides the end-effector (green) along the table until it collides with the wall, and slides along the wall and table; (c) 2D multispring environment with the robot lagging behind the target pattern with the AIC method; the red block is the robot end-effector, green lines are springs attached to the end-effector [76].
Figure 17. Four stages of efficient transfer strategy for Vision-Language Models in industrial-like scenarios [84].
Figure 17. Four stages of efficient transfer strategy for Vision-Language Models in industrial-like scenarios [84].
Applsci 15 10249 g017
Figure 18. Model of industrial robot programming process, the roles and involvement of human actors, as well as opportunities for AI assistance [97].
Figure 18. Model of industrial robot programming process, the roles and involvement of human actors, as well as opportunities for AI assistance [97].
Applsci 15 10249 g018
Figure 19. The BANSAI workflow for AI-assisted industrial robot programming. The use of a dual symbolic-subsymbolic program representation (red/gray) enables the seamless integration of AI assistance into typical industrial robot programming processes [97].
Table 1. Summary of recent YOLO uses in industrial robotics.
| Algorithm | Application | Details | Ref. |
|---|---|---|---|
| YOLO V5 | Apple recognition | Integrated into Raspberry Pi 4B, 8 MP camera; ROC values 0.98 and 0.9488 | [18] |
| YOLO V3 | Harvesting item localization | Uses R-Bbox and VGG models for image characteristics | [19] |
| YOLO V7 | RM trajectory analysis | Reduces manufacturing cost and power consumption; recognition accuracy of 95.2% | [20] |
| YOLO V5 | 3D object detection | OpenVINO-based model deployment; 70% reduction in inference time | [21] |
| YOLO ROS | Industrial robot control | Uses Simulink environment, ROS with C270 digital webcam | [22] |
| YOLO V4, YOLO V7 | Color recognition, robotic arm grasping | Uses EPSON C4-A601S, RealSense Depth Camera D435i, 5 Tesla T4 GPUs | [23] |
| YOLO V7 | 6-DOF pose estimation | New synthetic dataset; fine-tuning method | [24] |
| STRAW-YOLO (YOLOv8-based) | Target detection | P, R, and mAP@50 of the key points of 91.6%, 91.7%, and 96.0%, respectively (3.4%, 1.0%, and 2.3% higher) | [26] |
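To make the integration pattern behind many of the entries in Table 1 concrete, the following minimal sketch shows how a pretrained YOLO detector could feed pixel-space detections into a robot-side picking routine. It assumes the open-source ultralytics Python package with a generic yolov8n.pt weight file and a hypothetical pixel_to_robot() hand-eye calibration function; it is an illustrative example, not any of the implementations reviewed above.

```python
from ultralytics import YOLO  # assumed: open-source ultralytics package

# Illustrative only: a generic pretrained model, not the fine-tuned detectors
# reported in Table 1.
model = YOLO("yolov8n.pt")

def pixel_to_robot(u, v):
    """Hypothetical hand-eye calibration: map an image pixel (u, v) to robot
    base-frame XY coordinates. A real cell would apply a calibrated
    camera-to-robot transform and depth data instead of this placeholder."""
    return 0.001 * u, 0.001 * v  # placeholder scaling, metres per pixel

def detect_pick_targets(image_path, min_confidence=0.5):
    """Run the detector and return (label, confidence, robot_xy) candidates."""
    result = model(image_path)[0]                 # single-image inference
    targets = []
    for box in result.boxes:
        conf = float(box.conf[0])
        if conf < min_confidence:
            continue
        x1, y1, x2, y2 = box.xyxy[0].tolist()     # pixel-space bounding box
        u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # box centre in pixels
        label = result.names[int(box.cls[0])]
        targets.append((label, conf, pixel_to_robot(u, v)))
    return targets

if __name__ == "__main__":
    for label, conf, xy in detect_pick_targets("workpiece.jpg"):
        print(f"{label} ({conf:.2f}) -> robot XY {xy}")
```

In setups that use depth cameras, such as the RealSense D435i in [23], the placeholder mapping would be replaced by deprojecting the box centre to a 3D point and applying a calibrated camera-to-robot transform.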
Table 2. Summary of recent original elaborated algorithms for robotic machining and manipulation.
| Algorithm | Application | Details | Ref. |
|---|---|---|---|
| ICP, Canny algorithm | Deburring operation | OLP system, OpenCV library, RobSim motion simulation | [56] |
| Original extraction and correction algorithms | Welding | Recognition rate of 97.0%; adaptive feature extraction in 0.04 s | [57] |
| Original trajectory judgment algorithm | Welding; real-time status monitoring | RPP up to ±0.04 mm; APP only ±0.5 mm | [58] |
| Deep Q Networks reinforcement learning algorithm | Robotic surgery for censorious surgeries in real time | Learning rate 0.0 and 1.0 | [59] |
| Neural network and genetic algorithms | Robot-assisted minimally invasive surgery | ANN architecture | [60] |
| Original Cartesian impedance control algorithm | Industrial | Damping ratio 0.7; translational Cartesian impedance 1500 N/m | [65] |
| Sequential quadratic programming with filter | Welding | Optimizes the time of the quintic B-spline curve trajectory | [64] |
| Particle swarm optimization (PSO) refinement algorithm | Grinding | Measurement accuracy of 50 nm; measurement range of 200 mm in diameter | [67] |
| Curve optimization algorithm | Grinding | Accuracy 10% | [68] |
| XGBoost learning algorithm | Grinding | Max errors 10.9%; material removal rate 14.4% | [100] |
| Trajectory planning | Grinding | Ra values reach 0.277 μm and 0.264 μm; profile errors at blade leading and trailing edges 0.0319 mm and 0.0342 mm; standard deviations on the convex and concave sides 0.0232 and 0.0216 | [101] |
| Cartesian architecture, point cloud matching algorithm; tool deflection compensation algorithm | Grinding | Accuracy 0.005 mm; repeatability 0.02 mm | [102] |
| Combined acoustic sensing and ensemble XGBoost learning algorithm | Grinding | Absolute percentage error 4.373% | [103] |
| RRT | Path planning | Success rate 97.85% | [69] |
| BIM-based | Welding | Repeat accuracy 0.1 mm | [70] |
| Spatial ILC | Micro-scale application | Accuracy level of 10⁻³ | [72] |
| Neural network algorithm; force estimation algorithm | Prediction testing in a force feedback system | Haptic feedback in robotic surgery; code execution time should be improved for online estimation | [104] |
| Exponentially weighted recursive least squares | Surgical | Measured slave force reduced to 0.076 N; estimates the respective parameters of the Kelvin–Voigt (KV) force model to 0.356 N and the Hunt–Crossley (HC) model to 0.560 N | [105] |
| LSTM-based RNN | Estimating the force on the surface and internal layers | RNN-LSTM + DR + CLR framework shows 9.23% and 3.8% improvements in real-time force prediction accuracy, and 7.11% and 1.68% | [77] |
| Extended Kalman Filter; control architecture for real sensorless robotic applications | Industrial manipulators | Optimal switching impact/force controller is under investigation | [79] |
| Genetic algorithm | Robot manipulator | SMC with SMCBL to eliminate chattering | [92] |
| Bayesian optimization algorithm | Robot manipulator | 25 parameters optimized | [93] |
| Bridging the AI adoption gap via neurosymbolic AI | Industrial robot | Describes the AI gap in industrial robot programming | [97] |
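Several entries in Table 2 rely on sampling-based path planning, such as the RRT planner evaluated in [69]. As a point of reference only, the sketch below implements a bare-bones 2D RRT with goal biasing and a single circular obstacle; all parameters (step size, obstacle geometry, iteration budget) are hypothetical and unrelated to the setups reviewed above.

```python
import math
import random

# Minimal 2D RRT sketch (illustrative only; not the implementation from [69]).
# Workspace: unit square with one circular obstacle; all values are hypothetical.
OBSTACLE_CENTER, OBSTACLE_RADIUS = (0.5, 0.5), 0.2
STEP, GOAL_TOLERANCE, MAX_ITER = 0.05, 0.05, 5000

def collision_free(p):
    """A point is valid if it lies outside the circular obstacle."""
    return math.dist(p, OBSTACLE_CENTER) > OBSTACLE_RADIUS

def steer(from_p, to_p, step=STEP):
    """Move from 'from_p' towards 'to_p' by at most 'step'."""
    d = math.dist(from_p, to_p)
    if d <= step:
        return to_p
    t = step / d
    return (from_p[0] + t * (to_p[0] - from_p[0]),
            from_p[1] + t * (to_p[1] - from_p[1]))

def rrt(start, goal):
    """Grow a tree from 'start' until a node reaches 'goal'; return the path."""
    parents = {start: None}
    for _ in range(MAX_ITER):
        # Goal biasing: sample the goal itself with 10% probability.
        sample = goal if random.random() < 0.1 else (random.random(), random.random())
        nearest = min(parents, key=lambda n: math.dist(n, sample))
        new = steer(nearest, sample)
        if not collision_free(new):
            continue
        parents[new] = nearest
        if math.dist(new, goal) < GOAL_TOLERANCE:
            path, node = [], new
            while node is not None:          # backtrack to reconstruct the path
                path.append(node)
                node = parents[node]
            return path[::-1]
    return None

if __name__ == "__main__":
    path = rrt((0.05, 0.05), (0.95, 0.95))
    print(f"Path with {len(path)} waypoints" if path else "No path found")
```

Goal biasing (occasionally sampling the goal directly) is a common convergence aid; industrial planners typically operate in joint space and check collisions against CAD models of the cell rather than an analytic obstacle as used here.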