Article

Computer Vision-Based Robotic System Framework for the Real-Time Identification and Grasping of Oysters

by Hao-Ran Qu, Jue Wang, Lang-Rui Lei and Wen-Hao Su *
College of Engineering, China Agricultural University, Beijing 100083, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2025, 15(7), 3971; https://doi.org/10.3390/app15073971
Submission received: 9 March 2025 / Revised: 2 April 2025 / Accepted: 2 April 2025 / Published: 3 April 2025

Abstract: This study addresses the labor-intensive and safety-critical challenges of manual oyster processing by developing an advanced robotic intelligent sorting system. Central to this system is the integration of a high-resolution vision module, dual operational controllers, and the collaborative AUBO-i3 robot, all harmonized through a Robot Operating System (ROS) framework. A specialized oyster image dataset was curated and augmented to train a robust You Only Look Once version 8 Oriented Bounding Box (YOLOv8-OBB) model, further enhanced through the incorporation of MobileNet Version 4 (MobileNetV4). This optimization reduced the number of model parameters by about 50% and lowered the computational load by 32.4% in terms of GFLOPS (Giga Floating-point Operations Per Second). To capture oyster motion dynamically on a conveyor belt, a Kalman filter (KF) combined with a Low-Pass filter was employed to predict oyster trajectories, improving noise reduction and motion stability and outperforming traditional Moving Average methods. The system achieved a 95.54% success rate in static gripping tests and 84% under dynamic conditions. These technological advancements mark a significant step toward revolutionizing seafood processing, offering substantial gains in operational efficiency, reducing potential contamination risks, and paving the way for a transition to fully automated, unmanned production systems in the seafood industry.

1. Introduction

Automation in oyster sorting and packaging is crucial in the seafood industry due to inefficiencies and hygiene risks associated with manual labor. Conventional methods are time-consuming, error-prone, and increase contamination risks, especially for raw oyster consumption [1,2]. To address these challenges, combining robotic systems with computer vision has emerged as an innovative solution to enable robots to mimic human movements and accurately recognize and handle objects [3,4,5].
Convolutional Neural Networks (CNNs) have significantly advanced object detection and robotic grasping tasks, enhancing image recognition, segmentation, and object detection capabilities in complex automation scenarios [6,7,8,9]. Recently, CNN-based models have increasingly been applied in food safety, such as Mask R-CNN for grading Southern Rock Lobsters [10] and Faster R-CNN for detecting fish bones in Atlantic salmon [11]. In agriculture, YOLOv5 combined with tracking algorithms has been used to track and detect apples [12], while R-CNN-based deep neural networks have been used to identify and classify fruits [13]. However, challenges remain in real-world applications due to the complex nature of agricultural products, occlusion, lighting variations, and background noise [14].
In the food processing industry, robotic arms are increasingly being deployed to address inefficiencies and contamination risks [15,16,17]. Some studies have incorporated angle information into bounding box regression to improve grasping accuracy and detection speed [18,19,20,21]. For dynamic objects, CNNs combined with You Only Look At Coefficients (YOLACT) and Long Short-Term Memory (LSTM) networks have achieved high grasp accuracy [22]. However, research on the robotic autonomous inspection and sorting of aquatic products such as oysters is still limited, facing challenges such as conveyor-based dynamic sorting, light variations, and motion blur. In dynamic grasping scenarios, trajectory prediction accuracy is critical. While simple methods like Moving Average can reduce noise, advanced filters such as KF combined with Low-Pass filters are necessary to balance real-time performance and stability.
This study introduces an advanced robotic system to automate oyster sorting on a conveyor belt, integrating machine learning and robot control techniques. The system uses deep learning for real-time oyster recognition and a KF for motion prediction. The key contributions are as follows: (a) development of a vision system: replacing the backbone network of the YOLOv8-OBB model yields a lightweight detector optimized for dynamic real-world environments where lighting, motion blur, and occlusion pose significant challenges; (b) utilization of a KF and a Low-Pass filter: the KF predicts the position of oysters on the conveyor belt, improving grasping accuracy under static and dynamic conditions; mean smoothing is applied to reduce measurement noise, and a Low-Pass filter suppresses high-frequency disturbances, while the mild lag introduced by the Low-Pass filter cutoff frequency helps ensure stable tracking and accurate gripping.

2. Materials and Methods

This section describes the experimental setup and implementation process of the robotic oyster sorting system. It includes the design of the robotic platform, sample preparation protocols, dataset construction and augmentation strategies, object detection model development, and the motion prediction algorithm for dynamic grasping. Each component was developed to ensure seamless integration and real-time performance under production-like conditions.

2.1. Robotic System

2.1.1. Hardware Design of the Robotic System

As shown in Figure 1, the system consists of four main parts: the vision module, the robot body, the upper controller and the lower controller. The vision module is the “eye” of the system, which consists of a camera and a holder for capturing images. The robot body consists of a robotic arm and gripper for grasping and placing objects. The upper controller acts as the “brain” of the system, receiving data from each component, processing it and making decisions. Its main tasks include target recognition, image localization, and motion planning for the robot. Finally, the lower controller acts as the “nervous system” and is directly connected to the robot actuator, sending control signals to drive the robot motion.
A planar grasping approach was adopted for the experimental manipulation of oyster handling, prioritizing convenience. This method maintains a consistent height of the gripper during each operation, eliminating the need for depth information from the images. As a result, a compact and efficient RGB camera was selected and mounted on a bracket positioned above the conveyor belt (eye-to-hand configuration), with the lens parallel to the belt.
The robotic system comprises two main parts: the arm and the gripper. The robotic arm (AUBO-i3, AUBO, Haidian, China) weighs 16 kg and has six degrees of freedom, enabling it to reach any position (x, y, z) and orientation (α, β, γ) within its working space. The gripper, a key component of the system, is responsible for grasping, transporting, and placing objects. Safe gripping requires securely contacting the object and preventing slipping and damage during handling. Therefore, when robots are used for sorting tasks, the shape, size, and material of the objects must be carefully considered, and the gripper should be chosen accordingly. For this study, a flexible gripper was selected for oyster handling. The gripper, attached to the end of the robotic arm via a connector, features four fingers, providing four points of contact to enhance gripping stability. The gripper operates as a pneumatic device, using a valve to control gas flow in a cylinder, which moves a piston to open and close the gripper. As such, the pneumatic manipulator typically has only two states: fully open or fully closed [23]. Despite this, the manipulator offers a high gripping force and fast response time, making it well-suited for the task [24].

2.1.2. System Realization Based on ROS

ROS is an open-source programming framework for developing intelligent systems such as robots and unmanned systems [25]. The ROS system employs a distributed processing framework, and its executables can be individually designed and loosely coupled at runtime. It offers a wide range of functionalities and mechanisms for robotics development, integrating numerous third-party tools and libraries that significantly enhance the development process. On the ROS platform, four fundamental concepts are typically involved: (1) nodes, (2) messages, (3) topics, and (4) services.
In ROS, each process runs in the form of a node. Communication between different nodes is facilitated through topics or services, where topics and services employ asynchronous and synchronous communication, respectively. Messages represent the specific data exchanged between nodes. As shown in Figure 2, the control framework of the robotic system consists of a computer serving as the upper controller, a robot control cabinet as the lower controller, and an RGB camera connected to the host computer via a serial port. The ROS system runs on the host computer and is used to receive and process the real-time image information acquired by the camera. The host computer is connected to the control cabinet via Ethernet and establishes communication over the Ethernet port to send control commands to the control cabinet. The control cabinet then outputs control signals to drive the movement of the robotic arm, while the robot provides real-time feedback on the angle of each joint. The robot is also connected to a solenoid valve via IO control to manage the opening and closing of the gripper. In addition, the IO output state provides feedback on the open or closed state of the gripper.
As shown in Figure 3, the ROS framework comprises six nodes (Camera, YOLOv8-OBB, Regulation, MoveIt, Robot, and Gripper) connected by two topics (Topic1, Topic2) and three services (Service1, Service2, Service3).
The image captured by the camera serves as the input to the “YOLOv8-OBB” node, which processes it to output the grasp point information. This information is then sent to the “Regulation” node as input. Since both processes involve only one-way data transmission and do not require feedback, these three nodes are connected via two topics. The publisher and subscriber of Topic1 are “Camera” and “YOLOv8-OBB”, respectively, and the publisher and subscriber of Topic2 are “YOLOv8-OBB” and “Regulation”, respectively. The message carried by Topic1 consists of the RGB images captured by the camera, while the message carried by Topic2 consists of the 2D coordinates (x, y) of the grasp point in the image and the grasping angle θ.
After receiving the grasp point information, the “Regulation” node sends it to the “MoveIt” node, which handles the kinematics and plans the trajectory before issuing control commands to the robot controller. The commands direct the robotic arm and gripper to move to the grasp point. During this process, the robotic arm provides real-time feedback on the rotation angles of each motor to the “MoveIt” node. In turn, the “MoveIt” node updates the “Regulation” node on the status of the robotic arm to verify whether the gripper has reached the desired position. Since this process requires real-time bidirectional data transmission, two services are used to connect these three nodes. Service1 connects the “Regulation” node as the server and the “MoveIt” node as the client, while Service2 connects the “MoveIt” node as the server and the “Robot” node as the client. Once the “Regulation” node confirms that the gripper is in the correct position, it sends a control command to the “Gripper” node to manage the opening and closing of the gripper. The gripper must also provide real-time feedback on its status to the “Regulation” node. To facilitate this, Service3 connects the “Regulation” and “Gripper” nodes, with “Regulation” serving as the client and “Gripper” as the server.
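To make this node/topic structure concrete, the following minimal rospy sketch shows what the “YOLOv8-OBB” node could look like. The topic names (/camera/image_raw, /grasp_point), the use of sensor_msgs/Image and geometry_msgs/Pose2D messages, and the detector callback are illustrative assumptions, not the exact implementation used in this system.

# Minimal sketch (assumed names) of the "YOLOv8-OBB" node: it subscribes to the
# camera topic (Topic1) and publishes the grasp point (x, y, theta) for the
# "Regulation" node (Topic2).
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import Pose2D
from cv_bridge import CvBridge


class ObbDetectorNode:
    def __init__(self, detector):
        self.detector = detector          # wraps the trained YOLOv8-OBB model
        self.bridge = CvBridge()          # converts ROS Image messages to OpenCV arrays
        # Topic2: grasp point (pixel coordinates and angle) consumed by "Regulation"
        self.pub = rospy.Publisher("/grasp_point", Pose2D, queue_size=10)
        # Topic1: RGB frames published by the camera node
        rospy.Subscriber("/camera/image_raw", Image, self.on_image, queue_size=1)

    def on_image(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        detection = self.detector(frame)  # hypothetical: returns (x, y, theta) or None
        if detection is None:
            return
        x, y, theta = detection
        self.pub.publish(Pose2D(x=x, y=y, theta=theta))


if __name__ == "__main__":
    rospy.init_node("yolov8_obb")
    ObbDetectorNode(detector=lambda frame: None)  # plug in the real detector here
    rospy.spin()

The “Regulation” node would subscribe to the same grasp-point topic in an analogous way and forward targets to “MoveIt” through the service interfaces described above.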

2.2. Sample

To ensure reliable model training and evaluation, biological samples were collected and processed under controlled conditions. The following subsections describe the procedures for oyster sample preparation and the subsequent creation and augmentation of the dataset used for model development.

2.2.1. Sample Preparation

A total of 25 fresh oysters were purchased from three seafood wholesale markets in Yantai, China. These oysters were randomly selected to ensure that their shape, size, freshness, and other characteristics varied. They were then stored in a sealed thermostatic container at 4 °C for use in subsequent analyses.

2.2.2. Dataset Creation and Data Augmentation

A total of 140 photographs containing oysters comprise the raw dataset. In real production lines, oysters may present various orientations; thus, a different number of oysters was selected each time and placed at random positions for image acquisition. During the shooting process, the camera was perpendicular to the bottom plate on which the oysters were placed. The resolution of the captured images was 4032 × 3024 pixels, and images were stored as .tiff files. To ensure randomness, the captured images were shuffled. Image labeling was performed manually using the roLabelImg v1.8.5 software (https://github.com/cgvict/roLabelImg, accessed on 20 June 2024). The oyster region was selected by drawing a bounding box and then rotating it to form the minimum bounding rectangle around the oyster. The quadrilateral coordinates and angle were saved in a TXT file in the DOTA format, which, after conversion, was passed into the training process together with the input images. This conversion was implemented using a custom Python 3.9 script based on the official DOTA coordinate transformation protocol.
To increase the amount of training data, a data augmentation method was employed. The roLabelImg tool, which allows users to determine the minimum bounding rectangle, also facilitates the generation of JavaScript Object Notation (JSON) annotation files for subsequent data augmentation. Therefore, the annotated images produced with roLabelImg were used to augment the training dataset. The augmentations included the random flipping and translation of the image and its object annotations, the adjustment of image brightness, and the addition of salt-and-pepper noise; these techniques aim to improve the generalization ability of the model (as shown in Figure 4). As a result, the original set of 140 images was expanded to 700 images, which was then divided into two subsets: a training subset of 564 images and a testing subset of 136 images. Finally, all outputs were converted to the TXT files required for training the detection model.
The green box is the rotation box, the central green point is the anchor point of the rotation box, and the red is the coverage area of the rotation box.
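For illustration, the snippet below sketches an augmentation routine of the kind described above (random flipping, translation, brightness adjustment, and salt-and-pepper noise) using OpenCV and NumPy. The parameter ranges are assumptions, and a complete pipeline would also apply the same geometric transforms to the oriented-box labels.

# Illustrative augmentation sketch (assumed parameter ranges); in practice the
# oriented-box annotations must be flipped/translated together with the image.
import cv2
import numpy as np


def augment(image, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape[:2]

    # Random flip (horizontal, vertical, or both)
    if rng.random() < 0.5:
        image = cv2.flip(image, flipCode=int(rng.integers(-1, 2)))

    # Random translation of up to ~10% of the image size
    tx = int(rng.integers(-w // 10, w // 10))
    ty = int(rng.integers(-h // 10, h // 10))
    m = np.float32([[1, 0, tx], [0, 1, ty]])
    image = cv2.warpAffine(image, m, (w, h))

    # Brightness adjustment by a random gain
    gain = rng.uniform(0.7, 1.3)
    image = np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)

    # Salt-and-pepper noise on roughly 1% of the pixels
    mask = rng.random((h, w))
    image[mask < 0.005] = 0
    image[mask > 0.995] = 255
    return image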

2.3. Object Detection for Oysters

The images captured by the camera required further processing to obtain the coordinates of the oysters. Traditional image processing usually employs threshold segmentation to extract objects from the background, but this method encounters many challenges, particularly when images contain a high level of noise. In such cases, object detection techniques based on CNNs are considered an effective alternative.

2.3.1. YOLOv8-OBB Model for Oyster Detection

YOLOv8-OBB, a model developed based on YOLOv8, was employed to implement the object detection component of the proposed system. Specifically designed for detecting rotated objects, YOLOv8-OBB builds upon the advancements of YOLOv8, which itself introduces several new features and improvements over previous versions [26,27,28]. Firstly, YOLOv8 replaces the C3 structure with the C2f structure, which provides a richer gradient flow, and adjusts the number of channels for different scale models. Additionally, the Head section adopts a decoupled structure, separating classification and detection, and transitions from an anchor-based model to an anchor-free model. Currently, YOLOv8 has a fast detection speed and high detection accuracy, making it highly popular in object detection applications.
However, the standard detection boxes in YOLOv8 are horizontal bounding boxes, which pose a challenge during the grasping process. In planar grasping, regardless of the orientation of the oyster, the object detection output can only provide the location of the grasping point, not determine the grasping angle. This limitation can lead to failed grasping attempts or even collisions between the gripper and the oyster, potentially causing damage, especially when there is an angular deviation between the oyster and the gripper. Fortunately, YOLOv8-OBB adds a classification loss for the bounding box rotation angle on the basis of YOLOv8, allowing the final output to be described by five parameters (x, y, longside, shortside, θ), where “x” and “y” represent the coordinates of the center point of the rotated bounding box, “longside” represents the longest edge of the rotated bounding box, “shortside” represents the shortest edge, and “θ” represents the angle (in the clockwise direction) through which the x-axis must be rotated to align with the longest edge of the rotated bounding box.
After obtaining the object detection results of YOLOv8-OBB, the coordinates of the center point of the rotating bounding box (x,y) are used as the grasping position, while the angle θ is used as the grasping angle of the gripper. Since these coordinates are in pixel form and cannot be used directly for robot manipulation, the camera was first calibrated according to the method of Zhang [29]. Following this, the conversion relationship between pixel coordinates and robot coordinates was determined through a nine-point calibration process, enabling accurate and reliable grasping.
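The sketch below illustrates this step under stated assumptions: it reads the oriented-box output of an Ultralytics YOLOv8-OBB model through the obb.xywhr attribute of the results object and maps the pixel-space grasp point into robot coordinates with an affine transform estimated from nine calibration point pairs. The weights file and calibration coordinates are placeholders, not the values used in this work.

# Sketch: oriented-box detection -> pixel grasp point -> robot coordinates.
import cv2
import numpy as np
from ultralytics import YOLO

# Nine corresponding points from the nine-point calibration:
# pixel coordinates (u, v) and robot base-frame coordinates (x, y) in mm (placeholders).
pixel_pts = np.array([[100, 100], [960, 100], [1820, 100],
                      [100, 540], [960, 540], [1820, 540],
                      [100, 980], [960, 980], [1820, 980]], dtype=np.float32)
robot_pts = np.array([[250, -300], [250, 0], [250, 300],
                      [400, -300], [400, 0], [400, 300],
                      [550, -300], [550, 0], [550, 300]], dtype=np.float32)
M, _ = cv2.estimateAffine2D(pixel_pts, robot_pts)   # 2x3 pixel-to-robot affine map

model = YOLO("yolov8s-obb.pt")   # placeholder path; the trained oyster weights would be loaded here


def detect_grasp(frame):
    """Return (x_robot, y_robot, theta) for the highest-confidence oyster, or None."""
    result = model(frame, verbose=False)[0]
    if result.obb is None or len(result.obb) == 0:
        return None
    best = int(result.obb.conf.argmax())
    cx, cy, _, _, theta = result.obb.xywhr[best].tolist()  # box center, sides, rotation (rad)
    x_r, y_r = (M @ np.array([cx, cy, 1.0])).tolist()      # pixel -> robot base frame
    return x_r, y_r, theta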

2.3.2. Optimizing the Model Based on the Lightweight Network MobileNet V4

A lightweight network, MobileNet V4 [30], was used as the backbone for feature extraction to reduce the computational cost by decreasing the redundant parameters in the model (shown in Figure 5). MobileNet V4 is a lightweight network evolved from MobileNet V1 through MobileNet V3 [31,32,33]. MobileNet V1 introduced depthwise separable convolutions as a replacement for traditional convolutions, significantly reducing the number of parameters and the computational complexity. However, the network lacked residual connections, and the depthwise convolutions often suffered from kernel degradation. To address these issues, MobileNet V2 introduced an inverted residual module similar to the ResNet architecture, which improved on MobileNet V1. MobileNet V3 further improved upon MobileNet V2 by incorporating the SE-Net attention mechanism and adopting the h-swish activation function, leading to enhanced network performance. Building on the foundations of the previous MobileNet versions, MobileNet V4 introduces the Universal Inverted Bottleneck (UIB) search block, which integrates the Inverted Bottleneck (IB), ConvNeXt, feed-forward networks (FFN), and a novel extra depthwise convolution. Additionally, MobileNet V4 introduces Mobile MQA, which provides an inference acceleration of over 39%.

2.4. Moving Oysters Prediction

Static oyster grasping strategies typically exhibit significant limitations when dealing with dynamic oyster flow on conveyor belts. Because the arrangement and motion state of oysters on the belt change constantly, a static grasping mechanism, which estimates when the target object enters the grasping range of the robot based on a known conveyor speed, is often unable to adapt and suffers from low efficiency and insufficient accuracy. This approach is also prone to significant errors whenever the belt speed or the position of the target object changes, which can lead to grasping failures. When the conveyor speed cannot be precisely determined, many sorting systems use encoders to measure it in real time; this reduces errors to some extent but requires more complex hardware and increases cost. For the robot to accurately grasp oysters moving on the conveyor belt, the effect of system noise on the predicted results must be considered. The motion of the conveyor belt can be approximated as a linear system, so combining the object detection algorithm with a filter allows the visual information to be used more effectively to guide robot grasping. The Kalman filter [34,35] is an algorithm for the optimal estimation of the state of a linear system from its state equation and observations; it serves two primary functions: optimal estimation of the current state of a signal and prediction of future states. A Low-Pass filter is a signal-processing tool that passes low-frequency components and attenuates high-frequency ones [36]; it is commonly used to reduce high-frequency noise and to prevent aliasing artifacts during analog-to-digital conversion.
In practical applications, the KF is widely used for state estimation of dynamic systems due to its recursive nature and optimal estimation capabilities. However, in some cases, there may be high-frequency noise or measurement errors in the system. In such situations, introducing a Low-Pass filter can further enhance the filtering effect.
When using the KF, the first step is to establish the system model and observation equation. Assume that A, B, U_{t}, and W_{t} are, respectively, the state transition matrix, the control matrix, the system control input, and the system noise vector. The state X_{t} of the system at time t is then defined as follows:
X_t = A X_{t-1} + B U_{t-1} + W_{t-1}
Assume that Z_{t}, H, and V_{t} are, respectively, the observation vector at time t, the observation matrix of the system, and the observation noise vector. The observation equation of the system is defined as follows:
Z_t = H X_t + V_t
Assume that W and V are both white noise processes that follow a Gaussian distribution, with covariance matrices Q and R, respectively. In this case, a first-order Low-Pass filter is introduced:
\hat{X}_t^{LPF} = \alpha \hat{X}_t + (1 - \alpha) \hat{X}_{t-1}^{LPF}
Fusing the Low-Pass filter into the KF yields the following prediction and update equations. The prediction equations are defined as follows:
\hat{X}_t^- = A \hat{X}_{t-1}^{LPF} + B U_{t-1}
P_t^- = A P_{t-1} A^T + Q
where \hat{X}_t^- is the prior state estimate at time t, derived from the state transition equation using the information available at time t−1 (the hat denotes an estimated value and the superscript minus denotes a prior value). Similarly, P_t^- is the prior estimate of the error covariance matrix. Note that the prediction uses the Low-Pass-filtered state estimate from the previous time step, \hat{X}_{t-1}^{LPF}. The update equations are defined as follows:
K_t = P_t^- H^T (H P_t^- H^T + R)^{-1}
\hat{X}_t = \hat{X}_t^- + K_t (Z_t - H \hat{X}_t^-)
P_t = (I - K_t H) P_t^-
where K_t, \hat{X}_t, P_t, and I are, respectively, the Kalman gain matrix, the optimal (posterior) state estimate, the posterior estimate of the error covariance matrix, and the identity matrix. The Low-Pass filter processing formula is as follows:
\hat{X}_t^{LPF} = \alpha \hat{X}_t + (1 - \alpha) \hat{X}_{t-1}^{LPF}
where \hat{X}_t^{LPF} is taken as the optimal state estimate at the current time and is used for the prediction at the next time step.
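For readers who prefer code to equations, the following NumPy sketch implements one fused prediction/update/low-pass cycle as formulated above. The matrices mirror the symbols in the text; the concrete values of Q, R, and the smoothing factor α are tuning choices that are not specified here.

# Sketch of the KF + first-order Low-Pass filter recursion (NumPy).
import numpy as np


class KalmanLowPass:
    def __init__(self, A, B, H, Q, R, x0, P0, alpha=0.6):
        self.A, self.B, self.H, self.Q, self.R = A, B, H, Q, R
        self.x_lpf = x0      # low-pass-filtered estimate \hat{X}_{t-1}^{LPF}
        self.P = P0          # error covariance matrix
        self.alpha = alpha   # low-pass smoothing factor

    def step(self, z, u=None):
        u = np.zeros(self.B.shape[1]) if u is None else u
        # Prediction: uses the low-pass-filtered estimate from the previous step
        x_prior = self.A @ self.x_lpf + self.B @ u
        P_prior = self.A @ self.P @ self.A.T + self.Q
        # Update with the new measurement z
        S = self.H @ P_prior @ self.H.T + self.R
        K = P_prior @ self.H.T @ np.linalg.inv(S)
        x_post = x_prior + K @ (z - self.H @ x_prior)
        self.P = (np.eye(len(x_post)) - K @ self.H) @ P_prior
        # First-order low-pass filtering of the posterior estimate
        self.x_lpf = self.alpha * x_post + (1 - self.alpha) * self.x_lpf
        return self.x_lpf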
During the actual experiments, the conveyor belt carries oysters at a certain speed along the negative direction of the camera-coordinate x-axis, and there is no relative sliding between the oysters and the conveyor belt. The external input is considered to be approximately zero. During image acquisition, the time interval between adjacent frames is so short that the speed of the target object can be assumed constant over that interval; that is, the object moves in uniform linear motion between adjacent frames. The direction of conveyor belt motion aligns with the x-axis of the camera coordinate system; although there may be a slight offset in the y-axis direction due to installation errors, this error can be considered negligible. Therefore, a two-dimensional vector (x_t, v_{x,t}) is used to describe the displacement and velocity of an oyster in the x direction at time t. The motion equation of the oysters can be represented as follows:
x_t = x_{t-1} + v_{x,t} \Delta t
The state vector of the system is as follows:
\hat{X}_t^{LPF} = \begin{bmatrix} x_t \\ v_{x,t} \end{bmatrix}
The state transition matrix A of the system is as follows:
A = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}
The state \hat{X}_t^{LPF} of the system therefore evolves as follows:
\hat{X}_t^{LPF} = \begin{bmatrix} x_t \\ v_{x,t} \end{bmatrix} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_{t-1} \\ v_{x,t-1} \end{bmatrix}
Since only the position is observed, the observation matrix is as follows:
H = \begin{bmatrix} 1 & 0 \end{bmatrix}
To ensure that the number of iterations is sufficient to obtain an optimal estimate of the oyster state, the KF + Low-Pass filter method is applied continuously. When the grasp point reaches the middle of the x-axis field of view of the camera, the position of the oyster t seconds ahead is predicted, where t is the lead time required by the gripper. This prediction is further refined by considering the average state of the oyster over a certain time horizon. Finally, the robot is controlled to intercept and capture the oyster.
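Continuing the sketch above, the constant-velocity model and look-ahead prediction described in this subsection could be configured as follows (assuming the KalmanLowPass class from the previous sketch). The frame interval, lead time, noise covariances, and example measurements are illustrative placeholders rather than values from the actual system.

# Sketch: constant-velocity model along the belt axis plus look-ahead prediction.
import numpy as np

dt = 0.04          # assumed frame interval (s)
lead_time = 1.5    # assumed time for the gripper to reach the grasping range (s)

A = np.array([[1.0, dt],
              [0.0, 1.0]])       # constant-velocity state transition
B = np.zeros((2, 1))             # no external control input
H = np.array([[1.0, 0.0]])       # only the x position is measured
Q = np.diag([1e-3, 1e-2])        # process noise covariance (placeholder)
R = np.array([[4.0]])            # measurement noise covariance (placeholder, pixels^2)

# Example pixel x-coordinates of the grasp point reported by YOLOv8-OBB
detected_x_positions = [1850.0, 1832.0, 1815.0, 1799.0, 1781.0]

kf = KalmanLowPass(A, B, H, Q, R,
                   x0=np.array([detected_x_positions[0], 0.0]),
                   P0=np.eye(2), alpha=0.6)

for x_measured in detected_x_positions:
    x_est, v_est = kf.step(np.array([x_measured]))

# Once the grasp point reaches the trigger region in the middle of the image,
# extrapolate its position lead_time seconds ahead for the interception move.
x_predicted = x_est + v_est * lead_time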

3. Results

To validate the performance of the proposed robotic oyster sorting system, a series of experiments were conducted across multiple modules. These include object detection accuracy, grasping success rates under static and dynamic conditions, and prediction performance under variable conveyor speeds. The following subsection presents the results of the object detection models evaluated on the constructed dataset.

3.1. Object Detection

Seven different models were trained and evaluated, including the original YOLOv8s network, YOLOv11s, and lightweight models using MobileNet V4, MobileNet V3, and MobileNet V2, among others. The performance of these models was compared using the mAP50-95 metric. The mAP is a composite metric that takes both precision and recall into account and provides a comprehensive assessment of model performance.
Table 1 shows that models with MobileNet V4 and similar backbone structures achieve comparable mAP scores on the dataset with only slight performance changes. However, there are significant differences in terms of model parameters, computational complexity, and accuracy. For example, YOLOv8s has 11,422,166 parameters and a computational complexity of 29.6 GFLOPs. In contrast, YOLOv11s achieves about 15% reduction in the number of parameters and 23% reduction in GFLOPs, while achieving the highest 99.99% accuracy. Compared with YOLOv8s, MobileNet V4 reduces the number of parameters by about 50% and GFLOPs by 32.4%. Although its accuracy is 99.87%, which is slightly lower than that of YOLOv8s, this small accuracy loss is an acceptable trade-off compared with the large parameter and computational complexity reduction. In addition, Shufflenetv1 and Shufflenetv2 also demonstrate advantages in parameters and computational complexity, but MobileNet V4 is particularly outstanding in terms of overall efficiency and deployment flexibility. Therefore, MobileNet V4 not only performs well in resource-constrained environments, but also significantly improves the running efficiency of the model while maintaining high accuracy, making it an ideal choice for mobile devices, embedded systems, and other applications that require efficient computation. The results indicate that reducing model parameters significantly enhances computational efficiency without compromising detection accuracy. Model selection should depend on specific application requirements. For tasks prioritizing precision, YOLOv11s offers a balanced solution with high accuracy and moderate computational demands. Conversely, MobileNet V4 excels in resource-constrained environments by minimizing computational demands while maintaining reliable detection accuracy.
As illustrated in Figure 6, all models effectively identified oysters with high precision. This underscores the fact that reducing parameters not only optimizes computational efficiency but also preserves detection reliability. Such flexibility allows for selecting the most appropriate model based on deployment needs, enhancing operational efficiency and practicality. These optimizations pave the way for better resource management and application-specific model selection in various scenarios.

3.2. Static Grasping

A static grasping experiment was conducted with four comparative test groups to assess the feasibility of the proposed method. For each group, the number of oysters on a stationary conveyor belt was gradually increased. Fifteen tests were performed for each group, with a randomly selected number of oysters used for each test. If the robot failed to sort an oyster during a test, the sorting process continued until all oysters within its field of view were successfully sorted. Due to the limited field of view of the robot when detecting objects, any oyster accidentally removed from the field of view would have to be manually repositioned.
Figure 7 shows part of the robot gripping process. Figure 7a–f show the sequence of the sorting process: the gripper starts from its initial position and, after recognition, moves to the gripping point above the oyster. The gripper remains parallel to the conveyor belt and rotates to the corresponding grasping angle before moving vertically downward to a fixed height. After closing the gripper, the robot lifts the oyster to a certain height to ensure it is fully detached from the conveyor belt surface, then moves to the placement point to complete the sorting process. In Figure 7g, the object detection results are displayed, with each oyster showing a recognition confidence above 0.8. This indicates high recognition accuracy and precise calculation of the grasping point and angle for each oyster. During actual deployment, a confidence-threshold mechanism was added: prediction and grasping are executed only when the confidence remains above 0.8 for 20 consecutive frames.
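As an illustration only, such a gate can be expressed in a few lines; the snippet below sketches the 0.8-confidence / 20-consecutive-frame rule and is not the deployed implementation.

# Hypothetical confidence gate: allow prediction and grasping only after the
# detection confidence stays above the threshold for N consecutive frames.
class ConfidenceGate:
    def __init__(self, threshold=0.8, required_frames=20):
        self.threshold = threshold
        self.required_frames = required_frames
        self.streak = 0

    def update(self, confidence):
        self.streak = self.streak + 1 if confidence >= self.threshold else 0
        return self.streak >= self.required_frames  # True -> trigger prediction/grasp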
According to the data in Table 2, the static grasping experiment consisted of four test groups, each with a progressively increasing number of oysters, and each group was tested 15 times. In each test, the robotic arm attempted to grasp all oysters within its field of view. Of the 157 grasping operations in total, 150 were successful, giving a success rate of 95.54%. The per-group data show that the grasping success rate decreased slightly as the number of oysters increased, indicating possible challenges in multi-object scenarios: as the number of oysters grows, object detection can become unstable and the closely arranged oysters can interfere with the movement of the gripper fingers. Specifically, the grasping success rates of Group 1 and Group 2 were both 100%, indicating that the grasping performance of the robotic arm was very stable when the number of oysters was small. However, as the number of oysters increased, the grasping success rates of Groups 3 and 4 decreased to 95.74% and 92.30%, respectively. This decline can be attributed to several factors: first, the robotic arm may have difficulty recognizing multiple targets, especially when the oysters occlude each other; second, the action of the gripper fingers may be affected by the positions and arrangement of neighboring oysters, leading to some grasping failures. Overall, despite the fluctuating performance in high-density object environments, the overall success rate of 95.54% indicates that the system remains highly reliable in static grasping tasks.
It is worth noting that, in the experiments, the four-fingered flexible gripper showed good robustness when grasping oysters: even when occasional deviations or other unexpected situations occurred, the oyster could still be grasped stably.

3.3. Dynamic Grasping

Accurate prediction of oyster positions on a moving conveyor is essential for successful dynamic grasping. Given the time delay between detection and actuation, the system must estimate future grasping positions with both precision and stability. The following subsection evaluates this aspect.

3.3.1. Prediction Performance of the KF

Figure 8 illustrates the dynamic grasping process on the production line and the relationship between various coordinate systems. The conveyor belt carries oysters into the vision system, where images are captured. The upper-level computer system performs object recognition on the oysters and predicts their positions. Subsequently, the lower-level computer system controls the robot to execute the grasping motion based on the signals received.
The direction of the conveyor belt is along the x-axis of the robot coordinate system, while the x-axis direction of the camera coordinate system is opposite to that of the robot coordinate system. Therefore, the oysters move from right to left in the camera field of view. After entering the field of view, the YOLOv8-OBB algorithm was used for real-time detection, and according to the initial state of oysters, the method combining KF and Low-Pass filters was used to refine the prediction results. The KF combines previous information with the most recent measurements to provide the best estimate of oyster location, while the Low-Pass filter suppresses high-frequency noise to ensure smoother and more stable predictions. This combination effectively balances accuracy and stability, addressing challenges such as noisy measurements and sudden position fluctuations.
In Figure 9, the green dots represent the grasp point of the oyster detected by YOLOv8-OBB, and the blue dots represent the grasp point predicted for the next frame by the KF and Low-Pass filter. Initially, the predicted positions lag slightly because the weight coefficients of the KF have not been fully updated and because of the smoothing effect of the Low-Pass filter. After several iterations, however, the predictions become more accurate and stable, reducing noise-induced errors.
To ensure sufficient iterations, a prediction range is established in the center of the image field of view. When the horizontal coordinate of the grasping point reaches this range, the best estimated position and the average velocity of the grasping point, computed through the combined filtering of the current and previous time frames, are used to predict the position of the grasping point t seconds ahead. Here, t represents the typical time required for the gripper to move from its initial position to the grasping range, as determined through empirical testing.
Figure 10 illustrates the effectiveness of the KF combined with two different smoothing techniques (Moving Average and Low-pass filter) across three conveyor belt speed scenarios: (a) high speed, (b) medium speed, and (c) low speed. The blue line represents the raw grasp point predictions obtained directly from the KF, showing significant noise and fluctuations. The orange line corresponds to the predictions refined using the KF combined with Moving Average, showing reduced noise but introducing minor delays in the grasp point predictions. The green line denotes predictions processed with the KF combined with the Low-Pass filter, achieving smoother and more stable results with minimal delay, particularly evident at higher speeds.
In terms of system performance, the average latency for image acquisition is 15 ms, the YOLOv8-OBB inference latency averages 42 ms, and the KF prediction latency is 10 ms. In addition, ROS RViz is used to monitor node statuses in real time and record the topic transmission timing; the average detection publishing period is 25 ms. This results in a total latency of 92 ms, which satisfies industrial-grade requirements.
At all conveyor speeds, the Low-Pass filter method consistently produces more accurate and reliable predictions by suppressing high-frequency noise more effectively than the Moving Average. However, at higher conveyor speeds, as shown in Figure 10a, the performance gap becomes more apparent as the Moving Average method struggles with delays, while the Low-Pass filter maintains stability and precision. These results highlight the advantages of integrating the Low-Pass filter with the KF for dynamic oyster grasping tasks, particularly in high-speed scenarios.
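The delay behavior summarized here can be reproduced on a toy signal, as in the short comparison below; the linear trajectory, noise level, window size, and smoothing factor are arbitrary illustrative choices, not parameters of the real system.

# Toy comparison of a causal Moving Average versus first-order Low-Pass smoothing
# on a noisy, linearly moving grasp point.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 250)
true_pos = 1800 - 120 * t                       # grasp point moving along -x (pixels)
noisy = true_pos + rng.normal(0, 8, t.size)     # noisy per-frame measurements

# Causal Moving Average over a trailing window (lags by about (window - 1) / 2 samples)
window = 15
moving_avg = np.array([noisy[max(0, i - window + 1): i + 1].mean()
                       for i in range(noisy.size)])

# First-order Low-Pass filter (exponential smoothing, smaller effective lag here)
alpha = 0.25
low_pass = np.empty_like(noisy)
low_pass[0] = noisy[0]
for i in range(1, noisy.size):
    low_pass[i] = alpha * noisy[i] + (1 - alpha) * low_pass[i - 1]

# Steady-state tracking error on the ramp (warm-up samples skipped)
print("Moving Average MAE:", np.mean(np.abs(moving_avg[50:] - true_pos[50:])))
print("Low-Pass filter MAE:", np.mean(np.abs(low_pass[50:] - true_pos[50:])))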

3.3.2. Results of Dynamic Grasping

The dynamic grasping process executed by our robotic system integrates a visual module and a KF + Low-Pass filter combination to ensure accuracy. The visual module first captures images of the oysters on the conveyor belt, after which the state of each oyster is determined by the object detection algorithm. The KF + Low-Pass filter method processes this information to refine the prediction of the optimal state of the oyster, including location and velocity. Using these refined data, the robot calculates the predicted grasping position t seconds ahead and executes the grasping action, placing the oyster into the designated box.
Because the conveyor belt speed is continuously variable, its exact value is unknown. For comparison, three different belt speeds were tested, increasing from Level 1 to Level 3, with three runs of 25 grasping operations at each speed. The results, shown in Table 3, reveal a progressive decline in success rate as the belt speed increased.
According to the data in Table 3, the dynamic grasping experiments were conducted at three belt speeds, with three runs of 25 grasping operations at each speed. As the belt speed increased from Level 1 to Level 3, the grasping success rate gradually decreased. Specifically, Level 1 averaged 21 successful grasps per run of 25, a success rate of 84%. Level 2 averaged 18 successful grasps per run, and the success rate dropped to 72%. Level 3 performed the worst, averaging roughly 13 successful grasps per run, for a success rate of 53%. Instances of failed grasping were analyzed, and several factors were identified that reduced the success rate as the conveyor speed increased. In addition to the challenges encountered in static grasping, the grasp success rate decreased significantly at higher belt speeds.

4. Discussion

This study proposes an integrated vision and robotic control system for the automatic sorting and gripping of oysters on a conveyor belt, aimed at enhancing sorting efficiency and reducing contamination risks in the aquatic food production and processing industry. The contributions of this work can be evaluated from both technical and practical perspectives.
Using CNNs for oyster image processing proved highly effective, highlighting their essential role in modern computer vision systems. By combining CNN with robotics, the intelligence of the robotic system has been significantly improved, setting the stage for future advances in automated sorting and gripping. For real-time object detection and grasping angle prediction, YOLOv8-OBB was employed due to its efficiency and practicality. Unlike two-stage detectors like Oriented R-CNN—which first generate region proposals and then refine oriented boxes—YOLOv8-OBB is based on a single-stage architecture that directly regresses oriented bounding boxes. As a result, YOLOv8-OBB offers faster inference speeds, making it more suitable for real-time applications. Additionally, its support for ONNX format conversion enables deployment on embedded systems like the Raspberry Pi, further enhancing its versatility. In the experiments, YOLOv8-OBB showed robust performance in various scenarios. However, when oysters were stacked closely together, detection instability led to inaccurate position and angle estimations, affecting grasping accuracy. Expanding the training dataset and adjusting the network structure could improve detection performance under such conditions.
Regarding the gripper, our four-fingered flexible claw is made of compliant material and is intentionally designed larger than an oyster, allowing it to compensate for minor jitter errors during grasping. This design ensures excellent robustness, enabling the gripper to maintain a stable grip even when faced with small deviations or unexpected disturbances. From a practical standpoint, the automated sorting system offers clear advantages over traditional manual methods. By reducing direct contact with oysters, it minimizes contamination risk—critical for food safety. Moreover, automation enhances productivity, reduces labor costs, and boosts operational efficiency, significantly improving the overall performance of the seafood processing industry.
While the system worked well in both static and dynamic grasping scenarios, challenges remained, particularly in multi-object situations. In static grasping, stacked oysters caused detection instability, resulting in failed grasps. In dynamic grasping, rapid motion and high conveyor speeds introduced errors due to misalignment between the gripper and the oyster, further complicated by delays in gripper closure. Despite applying the KF and Low-Pass filters, oscillations in the prediction curves persisted at higher speeds. To improve the success rate of oyster capture, this study proposes the following measures: (a) firmly securing the robot base and camera mount with a more stable mounting method, minimizing vibration and improving detection and gripping accuracy, which is crucial for dynamic grasping; (b) increasing the mounting height of the camera to expand its field of view, enabling earlier detection and providing more reaction time for the robotic arm; and (c) placing the camera further upstream from the robot so that oysters are recognized earlier, giving the robotic arm more time to adjust for a successful grasp.
This study provides a strong foundation for developing automated sorting and gripping systems in the food industry. Future research will focus on refining object detection models, particularly in multi-object and occlusion-prone environments. Additionally, optimizing system calibration and hardware setup will be key to improving performance in dynamic conditions. Integrating advanced machine learning techniques to predict and compensate for environmental disturbances like vibrations and occlusions could further enhance the system robustness and success rate. With continued optimization, the system could achieve even greater sorting efficiency and reliability in real-world applications.

5. Conclusions

This research presents the development of an innovative robotic system employing machine vision to automate the sorting of oysters on a conveyor belt within a cold-chain production setting. The system integrates the YOLOv8-OBB deep learning algorithm for object detection, effectively addressing challenges associated with oyster localization and gripper angle determination. To enhance the performance of the vision module, YOLOv8-OBB was optimized with a lightweight MobileNetV4 network, significantly reducing the number of parameters by approximately 50% and lowering GFLOPS by 32.4%. KF, combined with a Low-Pass filter, was also integrated into the system to enable the robot to accurately grasp dynamic oysters. The control framework, built on the ROS platform, successfully facilitated real-time communication between the vision system, motion planning, and hardware control, achieving a static grasping success rate of 95.54%. However, the dynamic grasping success rate was found to vary with the speed of the conveyor belt, with an 84% success rate at the lowest speed that diminished as the belt speed increased. Despite the success, several areas for improvement remain. Future research should focus on enhancing the dynamic grasping capabilities by improving the prediction accuracy of the KF and addressing system limitations related to the physical setup of the robotic arm and vision module. In addition, it is recommended to expand the training dataset to include a wider range of object orientations and illumination conditions to enhance the adaptability and robustness of the detection module. Optimizing the reaction time and gripper mechanics of the robot for faster moving objects is also critical to improving efficiency in real-world applications. Overall, this study underscores the potential of robotic systems in advancing the automation of food processing by reducing direct human interaction, enhancing operational productivity, and improving food safety. Future research will prioritize optimizing the performance of the system in high-speed scenarios, addressing hardware limitations, and scaling for large-scale industrial applications.

Author Contributions

Conceptualization, W.-H.S. and J.W.; methodology, W.-H.S. and J.W.; software, J.W.; validation, J.W., H.-R.Q., and W.-H.S.; formal analysis, J.W. and L.-R.L.; investigation, J.W.; resources, W.-H.S.; data curation, J.W.; writing—original draft preparation, J.W. and H.-R.Q.; writing—review and editing, H.-R.Q., W.-H.S., and L.-R.L.; visualization, H.-R.Q., W.-H.S., and L.-R.L.; supervision, W.-H.S.; project administration, W.-H.S.; funding acquisition, W.-H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 32371991.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are available on request due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gao, P.; Noor, N.Q.I.M.; Shaarani, S.M. Current status of food safety hazards and health risks connected with aquatic food products from Southeast Asian region. Crit. Rev. Food Sci. Nutr. 2022, 62, 3471–3489. [Google Scholar] [CrossRef] [PubMed]
  2. Wang, X.; Li, Y.; Xing, S. Ultraviolet Intelligent Intensity Control Sterilization System for Raw Aquatic Products. Nongye Jixie Xuebao/Trans. Chin. Soc. Agric. Mach. 2021, 52, 513–518+541. [Google Scholar]
  3. Du, G.; Wang, K.; Lian, S.; Zhao, K. Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: A review. Artif. Intell. Rev. 2021, 54, 1677–1734. [Google Scholar] [CrossRef]
  4. Zhang, J.; Qin, L.; Wang, G.; Wang, Q.; Zhang, X. Non-destructive Ripeness Detection of Avocados (Persea Americana Mill) using Vision and Tactile Perception Information Fusion Method. Food Bioprocess Technol. 2025, 18, 881–898. [Google Scholar] [CrossRef]
  5. Antonucci, F.; Figorilli, S.; Costa, C.; Pallottino, F.; Spanu, A.; Menesatti, P. An Open Source Conveyor Belt Prototype for Image Analysis-Based Rice Yield Determination. Food Bioprocess Technol. 2017, 10, 1257–1264. [Google Scholar] [CrossRef]
  6. Cong, S.; Zhou, Y. A review of convolutional neural network architectures and their optimizations. Artif. Intell. Rev. 2023, 56, 1905–1969. [Google Scholar] [CrossRef]
  7. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 60, 84–90. [Google Scholar] [CrossRef]
  8. Punyani, P.; Gupta, R.; Kumar, A. Neural networks for facial age estimation: A survey on recent advances. Artif. Intell. Rev. 2020, 53, 3299–3347. [Google Scholar] [CrossRef]
  9. Wang, C.; Liu, B.; Liu, L.; Zhu, Y.; Hou, J.; Liu, P.; Li, X. A review of deep learning used in the hyperspectral image analysis for agriculture. Artif. Intell. Rev. 2021, 54, 5205–5253. [Google Scholar] [CrossRef]
  10. Vo, S.A.; Scanlan, J.; Turner, P. An application of Convolutional Neural Network to lobster grading in the Southern Rock Lobster supply chain. Food Control 2020, 113, 107184. [Google Scholar] [CrossRef]
  11. Xie, T.; Li, X.; Zhang, X.; Hu, J.; Fang, Y. Detection of Atlantic salmon bone residues using machine vision technology. Food Control 2021, 123, 107787. [Google Scholar] [CrossRef]
  12. Villacrés, J.; Viscaino, M.; Delpiano, J.; Vougioukas, S.; Cheein, F.A. Apple orchard production estimation using deep learning strategies: A comparison of tracking-by-detection algorithms. Comput. Electron. Agric. 2023, 204, 107513. [Google Scholar] [CrossRef]
  13. Mohapatra, D.; Choudhury, B.; Sabat, B. An automated system for fruit gradation and aberration localisation using deep learning. In Proceedings of the 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 19–20 March 2021; IEEE: Piscataway, NJ, USA, 2021; Volume 1, pp. 6–10. [Google Scholar]
  14. Bhosale, Y.; Rao, G.; Naphade, P. Plant disease detection under challenging field conditions using enhanced deep learning approaches. Comput. Electron. Agric. 2023, 199, 107073. [Google Scholar]
  15. Iqbal, J.; Khan, Z.H.; Khalid, A. Prospects of robotics in food industry. Food Sci. Technol. 2017, 37, 159–165. [Google Scholar] [CrossRef]
  16. Aly, B.A.; Low, T.; Long, D.; Brett, P.; Baillie, C. Tactile sensing for tissue discrimination in robotic meat cutting: A feasibility study. J. Food Eng. 2023, 363, 111754. [Google Scholar] [CrossRef]
  17. Mozafari, B.; O’Shea, N.; Fenelon, M.; Li, R.; Daly, D.F.; Villing, R. An Automated Platform for Measuring Infant Formula Powder Rehydration Quality Using a Collaborative Robot Integrated with Computer Vision. J. Food Eng. 2024, 383, 112229. [Google Scholar] [CrossRef]
  18. Zhou, X.; Lan, X.; Zhang, H.; Tian, Z.; Zhang, Y.; Zheng, N. Fully convolutional grasp detection network with oriented anchor box. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 7223–7230. [Google Scholar]
  19. Guo, D.; Sun, F.; Liu, H.; Kong, T.; Fang, B.; Xi, N. A hybrid deep architecture for robotic grasp detection. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1609–1614. [Google Scholar]
  20. Zhang, H.; Zhou, X.; Lan, X.; Li, J.; Tian, Z.; Zheng, N. A real-time robotic grasping approach with oriented anchor box. IEEE Trans. Syst. Man Cybern. Syst. 2019, 51, 3014–3025. [Google Scholar] [CrossRef]
  21. Park, D.; Seo, Y.; Chun, S.Y. Real-time, highly accurate robotic grasp detection using fully convolutional neural network with rotation ensemble module. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 9397–9403. [Google Scholar]
  22. Wong, C.C.; Chien, M.Y.; Chen, R.J.; Aoyama, H.; Wong, K.Y. Moving object prediction and grasping system of robot manipulator. IEEE Access 2022, 10, 20159–20172. [Google Scholar]
  23. Ireri, D.; Belal, E.; Okinda, C.; Makange, N.; Ji, C. A computer vision system for defect discrimination and grading in tomatoes using machine learning and image processing. Artif. Intell. Agric. 2019, 2, 28–37. [Google Scholar] [CrossRef]
  24. Campos, C.B.; Arteche, M.M.; Ortiz, C.; Fernández, Á.V. Technologies for robot grippers in pick and place operations for fresh fruits and vegetables. Span. J. Agric. Res. 2011, 9, 1130–1141. [Google Scholar] [CrossRef]
  25. Cui, J.; Tian, C.; Zhang, N.; Duan, Z.; Du, H. Verifying schedulability of tasks in ROS-based systems. J. Comb. Optim. 2019, 37, 901–920. [Google Scholar] [CrossRef]
  26. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE Computer Society: Washington, DC, USA, 2016; pp. 779–788. [Google Scholar]
  27. Farhadi, A.; Redmon, J. Yolov3: An incremental improvement. In Computer Vision and Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2018; Volume 1804, pp. 1–6. [Google Scholar]
  28. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  29. Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334. [Google Scholar]
  30. Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 116–131. [Google Scholar] [CrossRef]
  31. Howard, A.G. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  32. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520. [Google Scholar]
  33. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  34. Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar]
  35. Azimi, V.; Munther, D.; Fakoorian, S.A.; Nguyen, T.T.; Simon, D. Hybrid extended Kalman filtering and noise statistics optimization for produce wash state estimation. J. Food Eng. 2017, 212, 136–145. [Google Scholar] [CrossRef]
  36. Ni, D.; Nelis, J.L.; Dawson, A.L.; Bourne, N.; Juliano, P.; Colgrave, M.L.; Juhász, A.; Bose, U. Application of near-infrared spectroscopy and chemometrics for the rapid detection of insect protein adulteration from a simulated matrix. Food Control 2023, 159, 110268. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of the robotic system.
Figure 2. Schematic diagram of the relationships between various parts of the robotic system.
Figure 3. Schematic diagram of robot control system based on ROS.
Figure 4. Data augmentation and image annotation: (a) original image, (b) image mirroring processing, (c) image flipping processing, (d) image translation processing, (e) image translation and flipping processing, (f) adjusting image brightness and adding noise, (g) image annotation.
Figure 5. The optimized network architecture diagram of YOLOv8-OBB using MobileNet V4.
Figure 6. Comparison of detection results between the original model and the optimized model (the red box is the rotation box that identifies the oyster).
Figure 7. Static gripping process: (a) the robot is in the initial position, (b) the robot is moving over the target, (c) the robot arm is grasping the oyster, (d) the robot is moving toward the end point, (e) the robot is loading oysters into the container, (f) the robot is detecting oysters, (g) static object detection results.
Figure 8. Schematic diagram illustrating the dynamic grasping process of an oyster and the relative positioning of various coordinate systems.
Figure 9. Dynamic detection results for oysters: (a) the oyster comes into view of the camera, (b) the oyster remains in motion, (c) the oyster moves out of the field of view (the green dots represent the grasp point detected by YOLOv8-OBB, and the blue dots represent the grasp point predicted for the next frame by the KF and Low-Pass filter).
Figure 10. Filtering results of KF combined with the Low-Pass filter and Moving Average under three conveyor speed conditions: (a) high speed (12 s), (b) medium speed (15 s), and (c) low speed (18 s).
Table 1. Training results of the evaluated detection models.

Backbone         Parameters     GFLOPs    Precision    mAP50-95
YOLOv8s          11,422,166     29.6      99.96%       0.858
MobileNet V4      5,703,667     20.0      99.87%       0.847
MobileNet V3     10,710,268     21.7      99.89%       0.861
MobileNet V2      8,481,142     22.1      99.88%       0.853
Shufflenetv1      8,476,998     18.9      99.94%       0.856
Shufflenetv2      7,611,250     19.6      99.91%       0.851
YOLOv11s          9,744,931     22.7      99.99%       0.879
Table 2. Results of static grasping.

Number of Oysters    Number of Samples    Number of Grabs    Accuracy (%)
1                    15                   15                 100
2                    15                   30                 100
3                    15                   47                 95.74
4                    15                   65                 92.30
Total                60                   157                95.54
Table 3. Results of dynamic grasping.

Belt Speed    Number of Samples per Run    Number of Grabs (Three Runs)    Average Success Rate
1             25                           21 / 20 / 22                    84%
2             25                           19 / 18 / 17                    72%
3             25                           12 / 13 / 15                    53%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
