A Soft Robotic Gripper for Crop Harvesting: Prototyping, Imaging, and Model-Based Control

Jiang, Yalun; Mohammadpour Velni, Javad

doi:10.3390/agriengineering7110378

Open AccessArticle

A Soft Robotic Gripper for Crop Harvesting: Prototyping, Imaging, and Model-Based Control

by

Yalun Jiang

^*

and

Javad Mohammadpour Velni

Department of Mechanical Engineering, Clemson University, Fluor Daniel Engineering Innovation Building, Clemson, SC 29631, USA

^*

Author to whom correspondence should be addressed.

AgriEngineering 2025, 7(11), 378; https://doi.org/10.3390/agriengineering7110378

Submission received: 17 September 2025 / Revised: 27 October 2025 / Accepted: 3 November 2025 / Published: 7 November 2025

(This article belongs to the Special Issue Advancing Smart Farming through Agricultural Robots and Automation Technologies)

Download

Browse Figures

Versions Notes

Abstract

The global agricultural sector faces escalating labor shortages and post-harvest losses, particularly in delicate crop handling. This study introduces an integrated soft robotic harvesting system addressing these challenges through four key innovations. First, a low-cost, high-yield fabrication method for silicone-based soft grippers is proposed, reducing production costs by 60% via compressive-sealing molds. Second, a decentralized IoT architecture with edge computing achieves real-time performance (42 fps to 73 fps) on affordable hardware (around $180 per node). Third, a lightweight vision pipeline combines handcrafted geometric features and contrast analysis for crop maturity assessment and gripper tracking under occlusion. Fourth, a Neo-Hookean-based statics model incorporating circumferential stress and variable cross-sections reduces tip position errors to 5.138 mm. Experimental validation demonstrates 100% gripper fabrication yield and hybrid feedforward–feedback control efficacy. These advancements bridge the gap between laboratory prototypes and field-deployable solutions, offering scalable automation for perishable crop harvesting.

Keywords:

soft robotic grippers; agricultural automation; real-time vision systems; pneumatic actuation; hyperelastic modeling

1. Introduction

The global agricultural sector faces a dual crisis: a projected 56% increase in food demand by 2050 [1], juxtaposed with a significant and persistent decline in agricultural labor availability over recent decades. In the United States, the agricultural workforce has shrunk remarkably, with a 20% drop in full-time farm employment between 2002 and 2014 alone [2]. This imbalance is further exacerbated by regional disparities. For instance, within the European Union, the number of farms fell by about 37% between 2005 and 2020 [3], while in the Asia–Pacific region, the proportion of agricultural employment represented by youth aged 15–24 decreased by 5.2% from 2011 to 2021 [4]. Simultaneously, inefficiencies in harvesting contribute to substantial food loss: approximately 2.5 billion metric tons of food are lost annually between harvest and retail [5]. In the United States alone, loss of the delicate crops such as soft-fruit spoilage (notably strawberries) is estimated at $2 billion per year, primarily due to bruising and improper handling [6]. Traditional approaches to agricultural automation tend to oscillate between rigid mechanization and limited manual labor assistance, both of which fall short in addressing these multifaceted challenges. Traditional approaches to agricultural harvesting oscillate between two extremes:

Labor-intensive manual harvesting (wages of $16–18 per hour in the U.S. [7]), rendered unsustainable by demographic shifts.
Industrial mechanization enhances efficiency for durable crops like grains through precision agriculture technologies (e.g., sensor-based optimization and IoT-enabled monitoring) [8], but exacerbates post-harvest losses in soft fruits due to mechanical damage. For strawberries, for instance, studies indicate that mechanical harvesting and handling can lead to significant bruising and quality degradation, primarily due to compression and impact forces [9,10].

However, many existing robotic harvesting systems struggle to achieve the necessary balance of accuracy, speed, and cost-effectiveness for real-world deployment. This is particularly true for systems reliant on resource-intensive deep learning for perception, which often fail to deliver real-time performance on affordable hardware. Beyond perception, the end-effector remains a critical bottleneck. Conventional rigid grippers often cause damage, while existing soft gripper prototypes are frequently hampered by high manufacturing complexity, cost, and poor scalability. Finally, achieving precise control is challenging. Prevailing models for soft actuators often rely on oversimplified geometries, while the integration of high-resolution tactile sensing often proves too costly for scalable deployment.

The synthesis of prior research underscores several persistent limitations in agricultural robotic harvesting: conventional rigid end-effectors cause excessive damage to delicate crops; existing soft grippers often suffer from high manufacturing complexity, low yield rates, and poor reproducibility; meanwhile, perception modules relying on deep learning are frequently resource-intensive, data-hungry, and difficult to generalize in unstructured environments. Additionally, current analytical models for soft actuators tend to oversimplify geometry and material behavior, limiting their predictive capability and control accuracy. To overcome these challenges, we present an integrated soft robotic harvesting system featuring the following key innovations:

Low-cost, high-yield fabrication of soft grippers: We propose a novel mold design integrating compressive sealing and bubble displacement mechanisms, enabling rapid, DIY-friendly production of silicone-based soft grippers in 30–40 min per unit. Each finger costs less than $4, representing a cost reduction of over 60% compared to traditional FDM-based 3D-printed molds (typically $10–$12 per finger, including materials and post-processing). Notably, our method has achieved 100% fabrication yield across five independent gripper prototypes without reliance on industrial-grade equipment.
Cost-effective, IoT-friendly system architecture: Our hardware design adopts decentralized pneumatic control nodes and edge-level visual computing, maintaining real-time responsiveness while significantly reducing system cost. Each pneumatic node costs $172–185, enabling a 50–54% cost reduction compared to commercial deep-learning-based robotic platforms. The compactness and modularity of the system promote easy deployment in field environments and scalability across different crops.
Lightweight and generalizable vision system for crop and gripper tracking: Instead of relying on deep learning models or color-coded markers, we develop a robust image processing pipeline based on handcrafted geometric and contrast features. This system is capable of:
- Assessing crop maturity, ranking visible candidates based on ripeness scores, and prioritizing targets for sequential harvesting.
- Tracking gripper prototypes of various designs without any visual markers, even during partial occlusion, rapid camera motion, or environmental disturbances in the field (e.g., wind).
This unified pipeline supports real-time tracking (42–73 fps on Raspberry Pi 5) and maintains spatial consistency across perception and actuation layers—bridging the gap between high-level visual intelligence and low-level control execution.
Accurate and generalized analytical modeling: We introduce a Neo-Hookean-based statics model that accounts for circumferential stress and variable cross-sectional geometry, enabling a more realistic estimation of actuator deformation than the constant-curvature approximation. Experimental measurements validate the theoretical pressure-angle relationship, showing an average relative error (gripper tip position change) below +4.1 mm across bending angles from 0° to 70°. This model serves as the core of a hybrid (feedforward/feedback) controller, enhancing grasp accuracy while reducing reliance on expensive tactile sensors.

This study aims to establish an integrated framework that unifies soft gripper fabrication, lightweight perception, and model-based control into a cost-effective, real-time harvesting system.

The remainder of this paper is organized as follows. Section 2 reviews the related work in soft robotic grippers, agricultural perception, and system architectures. Section 3 introduces the system design and methods. In Section 3.1, we describe the gripper prototype and manufacturing process, including our low-cost mold design, material choices, and casting workflow. Section 3.2 outlines the overall harvesting system architecture, highlighting the decentralized control setup and cost-efficient hardware configuration. Section 3.3 presents the image processing pipeline for crop detection and tracking, designed to handle occlusion and prioritize ripeness. Section 3.4 explains our mathematical model of the gripper, which accounts for stress distribution during bending, and Section 3.5 details the proposed control strategy that combines feedforward and feedback components for a reliable actuation. Section 4 presents and discusses the experimental results, providing a unified analysis of system performance across fabrication yield, model accuracy, and controller efficacy. Finally, Section 5 provides concluding remarks and outlines future research directions.

2. Related Work

Our integrated harvesting system builds upon and differentiates from prior research in soft robotic grippers, agricultural automation architectures, and robotic perception.

2.1. Robotic Perception and System Architecture in Agriculture

To address the above challenges, researchers have increasingly turned to robotic harvesting systems that promise both productivity and delicacy, often by leveraging recent advances in computer vision (CV) and deep learning (DL) to enhance perception and autonomy. Field trials have demonstrated the viability of DL-based systems in real-world scenarios; for instance, one achieved a 93.4% successful grasp rate on strawberries [11]. Despite these encouraging results and the fact that DL has become the dominant perception method in agricultural robotics [12,13], most systems remain confined to laboratories, with only a small fraction reporting real-time performance above 30 frames per second (fps) on low-cost (less than $500) embedded platforms. For example, on a $150 LattePanda Mu (Intel N100 CPU), YOLOv8-nano achieves only 4 fps to 7 fps on raw inference and about 7 fps to 9 fps after INT8 quantization [14], far below the minimum 10 fps (corresponding to 100 ms latency) required for delicate soft-gripper control. To gain a broader picture of current real-time performance on affordable hardware, we surveyed and compiled publicly available benchmarks for common detection and tracking algorithms on Raspberry Pi 4, Pi 5, and Hailo-8L modules. As shown in Table 1, most combinations of Pi hardware and lightweight models struggle to exceed 10 fps, particularly when tracking multiple objects in parallel. These findings reinforce the urgent need for practical automation frameworks that harmonize accuracy, speed, and affordability for real-world deployment.

2.2. Soft Robotic Grippers and Actuation Strategies

While recent advances in perception and motion planning have improved the autonomy of harvesting robots, the design of the end-effector, i.e., the gripper that directly interacts with the fruit, remains a major limitation in practical applications. Most existing systems rely on either rigid industrial arms equipped with mechanical claws, or semi-automated solutions requiring human assistance. The former offers high-speed operation but often causes unacceptable bruising or surface damage to delicate fruits [15]. Bag-type and suction-based grippers have been explored to address this issue, but they face practical limitations including slow operation, complexity, or poor adhesion under field conditions [16].

In contrast, soft grippers, characterized by their compliance and bio-inspired design, have emerged as a promising alternative. By leveraging materials and structures that deform passively, they can grasp a variety of fruit shapes and sizes without requiring high-precision alignment or force control [16,17]. This design flexibility makes them particularly suitable for unstructured agricultural environments, where fruit position and orientation can vary significantly. Recent studies have demonstrated the effectiveness of soft grippers in minimizing damage while maintaining sufficient gripping force [18,19], highlighting their potential as the default end-effector choice in future harvesting platforms. However, despite these advantages, the widespread adoption of soft grippers in real-world agricultural systems remains hindered by several critical challenges. These challenges involve not only the technical limitations of the grippers themselves, but also broader concerns related to maintainability, system complexity, cost, manufacturing, and the practical difficulties of implementing such systems in the field.

Among the various actuation strategies developed for soft grippers, several mainstream designs have emerged, each with distinct advantages and trade-offs. Tendon- or cable-driven grippers, for instance, offer relatively high force output and can be compactly integrated into robotic systems. However, they typically rely on external motors and exposed transmission lines (e.g., wires or Bowden cables), which are susceptible to wear, environmental interference, and mechanical failure [20]. While shape memory alloys (SMAs) have also been explored as actuators, offering compact and silent operation, their high cost, limited actuation speed, and poor energy efficiency make them less attractive for large-scale or field-deployable systems [21]. Some low-cost alternatives utilize ordinary metallic wires as substitutes for SMAs, but these are often custom-built, lacking modularity and interchangeability, thus complicating maintenance and repair once deployed in harsh agricultural environments [16]. Hydraulic-driven soft grippers, while capable of delivering strong and precise movements, tend to involve bulky and heavy components (e.g., fluid reservoirs, valves, and pumps), making them ill-suited for lightweight robotic platforms [22]. In contrast, pneumatic actuation offers a compelling balance between structural simplicity, weight, and functional adaptability [23,24]. Unlike cable-driven designs, pneumatic systems often require only a single control unit (e.g., air pump or solenoid valve), reducing the need for complex mechanical linkages and easing system integration. This simplicity not only improves reliability and maintainability in field conditions but also facilitates modular design and scalability for multi-gripper harvesting systems. As such, pneumatic-driven actuation is adopted as the primary design strategy in this work.

In considering the practical implementation of pneumatic-driven soft grippers, one critical factor is the cost and complexity associated with their fabrication. Many existing gripper prototypes rely on custom molds tailored for injection molding processes, which, even in a laboratory setting, can be prohibitively expensive and technically demanding. This reliance on industrial manufacturing techniques significantly limits scalability and accessibility, especially for research groups or small-scale deployments. As an alternative, 3D printing has been employed to lower prototyping barriers; however, additive manufacturing of elastomeric components remains slow and inefficient. The production of a single gripper can take upwards of one to two hours, making it impractical for large-batch fabrication or rapid design iteration. Furthermore, the use of silicone or other elastomeric materials requires precise control over curing conditions to ensure consistency in mechanical properties, further complicating the production pipeline. These constraints collectively present significant obstacles to rapid prototyping and real-world deployment of pneumatic grippers, particularly when multiple units are needed for distributed agricultural systems.

2.3. Modeling and Sensing for Soft Actuators

In terms of modeling, existing models for soft grippers are often built upon simplified geometries—commonly assuming constant curvature and uniform cross-sections such as circular, rectangular, or semicircular shapes [25,26]. While these models are tractable and effective for basic control tasks, they fall short in representing the complex, multi-segmented morphologies required for advanced manipulation tasks, particularly in unstructured environments. As grippers evolve towards more dexterous and adaptive designs, there is a growing need for generalized modeling frameworks capable of capturing nonlinear material behavior, variable cross-sections, and heterogeneous internal structures. Addressing this gap is critical for enabling predictive control, optimization, and the integration of feedback into closed-loop systems.

In addition to actuation and modeling, the broader system architecture and hardware integration also merit consideration. Successful agricultural harvesting requires not only reliable mechanical grasping, but also careful force modulation to avoid bruising or damaging delicate crops. Two predominant approaches have been adopted in the literature to tackle this challenge. The first relies heavily on computer vision techniques, often augmenting the gripper with high-contrast visual markers [27,28] to enable precise localization and pose estimation. However, such methods suffer from significant limitations in real-world conditions, where occlusion by foliage or overlapping fruits may render markers ineffective. The second approach incorporates tactile sensing directly onto the gripper, frequently using a combination of strain gauges and discrete pressure sensors [29,30,31]. While this method enhances force feedback, its effectiveness is limited for irregularly shaped or small-sized produce, largely due to the sparse spatial resolution of point-based pressure measurements. Some commercial solutions have explored the use of high-resolution force sensing arrays—such as the Tekscan 5027 sensor, which offers a remarkable spatial resolution of up to 284 sensing points per square centimeter [32,33]. These systems are capable of generating detailed pressure contour maps, enabling more precise and adaptive force control even for small or irregular crops. However, such products are typically proprietary, expensive, and tailored for niche applications. Data acquisition is only compatible with dedicated Tekscan hardware, and communication is limited to USB-C or Wi-Fi protocols. These constraints not only increase the cost per deployed node significantly but also complicate mechanical integration due to the added bulk of housing and wiring. Therefore, considering the constraints of system cost, maintainability, and structural complexity, the integration of force sensing resistor (FSR) arrays or other force-sensing technologies—despite their technical advantages—is deemed impractical for large-scale agricultural deployment.

The aforementioned research highlights significant progress in individual components. However, a gap remains for an integrated system that synergistically addresses the challenges of cost, scalability, and performance across fabrication, perception, and control. Our work is positioned to bridge this gap by introducing a cohesive framework that unifies low-cost gripper fabrication, a decentralized IoT architecture, a lightweight vision pipeline, and an accurate physics-based model.

3. Materials and Methods

This section describes the methodology employed to develop the integrated soft robotic harvesting system. The framework encompasses four integral components: (1) the design and low-cost fabrication process of the pneumatic soft gripper, (2) the decentralized system architecture and IoT-enabled control setup, (3) a lightweight vision pipeline for real-time crop and gripper tracking, and (4) the derivation of a mechanics-based model for accurate gripper control.

3.1. Gripper Prototype Design and Manufacturing

In this section, we provide details on the design and manufacturing process of the soft gripper, which is pivotal for enabling delicate and adaptive grasping. We begin with the material selection rationale, followed by the introduction of a novel compressive-sealing mold design that effectively addresses issues of air entrapment and structural inconsistency. The detailed workflow demonstrates a DIY-friendly approach that achieves high reproducibility and significantly reduces production cost and time compared to conventional methods.

3.1.1. Material Selection

To develop a low-cost, user-friendly, and reliable harvesting end-effector, the selection of appropriate materials for constructing the gripper is of paramount importance. Among the materials commonly adopted in prior research, thermoplastic polyurethane (TPU) and Dragon Skin 20 silicone rubber have been widely used, particularly for fabrication via 3D printing or molding techniques. However, TPU typically exhibits a significantly higher Shore hardness (ranging from 85 A to 95 A) compared to Dragon Skin 20 (20 A), indicating a substantially greater resistance to deformation. As a result, grippers made from TPU generally require higher actuation forces, torques, or energy input to achieve comparable levels of bending or deformation. This mechanical stiffness directly impacts system-level design choices. In order to deform a higher-stiffness material such as TPU to the same extent as a softer material like Dragon Skin 20, a more powerful actuator, which is often larger in size and higher in cost, is required. This conflicts with the overarching design objective of achieving a compact and cost-effective end-effector system, especially for applications where portability and economic scalability are critical.

Considering that the target crops of our harvesting system primarily include apples, oranges, and peaches, it is essential to select a material capable of handling their typical size and weight. On average, these fruits exhibit diameters ranging from 6

c

m

to 9

c

m

and weights between 130–230 g, depending on species and ripeness stage [34,35,36]. For example, medium-sized apples generally weigh from 180 g to 230 g [34], oranges 130 g to 180 g [35], and peaches 130 g to 170 g [36]. Given these parameters, a material must offer sufficient flexibility to conform to the fruit’s surface while maintaining enough mechanical strength to support its weight without permanent deformation or slippage. As a result, Dragon Skin 10, which is a platinum-cure silicone known for its biocompatibility, food safety, and exceptional flexibility, was selected for fabricating the gripper fingers. With a Shore hardness of 10 A, Dragon Skin 10 has demonstrated the capability to deform easily under low actuation forces while reliably supporting the mass of common horticultural products. Preliminary trials confirmed its ability to conform to and grasp apples, oranges, and peaches without causing damage, making it a suitable candidate for soft robotic end-effectors in crop harvesting applications.

To accommodate the use of this material, a custom mold was developed to facilitate rapid and repeatable gripper production as seen in Figure 1. While traditional methods such as 3D printing and injection molding are widely adopted in soft robotic fabrication, they pose significant limitations as 3D printing often suffers from low throughput (requiring 1–2 h per gripper), while injection molding demands costly, bulky equipment not commonly accessible in standard research labs. To circumvent these challenges, we developed a time-efficient, DIY-friendly molding process that offers a practical and scalable alternative for rapid prototyping and small-batch production.

3.1.2. Manufacturing Process

To ensure tight sealing, structural integrity, and efficient bubble removal during the molding process, we designed a male–female mold pair featuring large draft angles (45°) and a tenon–mortise structure. This configuration not only ensures sufficient compressive force on the silicone material during molding, yielding denser, more reliable gripper fingers, but also facilitates easy demolding and alignment. The relatively steep draft angle is especially important in DIY setups, where manual demolding without damage is critical.

Another key consideration in the mold design is the behavior of the silicone compound during mixing and curing. Commercially available platinum-cure silicones like Dragon Skin 10 typically consist of a two-part mixture (Part A and Part B) that, when combined, undergo a curing reaction. However, during the mixing process, air bubbles are inevitably introduced. In professional manufacturing settings, degassing via vacuum chambers is employed to eliminate these bubbles. In our setup, however, the mold’s high-contact, compressive design serves an additional function: as the mold halves are closed, excess air and bubbles are physically pushed out of the cavity, which minimizes the need for time-consuming degassing or long resting periods.

Leveraging this design, we were able to use the Smooth-On^TM Dragon Skin 10 Very Fast (Smooth-On Inc., Macungie, PA, USA) formulation, which cures in approximately 30 min at room temperature. This significantly shortens the fabrication cycle and makes the process suitable for rapid prototyping. In practice, several gripper fingers can be fabricated within a 30–40 min window demonstrating a marked efficiency improvement over traditional 3D printing, which often takes over an hour per finger.

The full molding procedure is illustrated in Figure 1. First, the female mold is aligned and clamped with the middle mold to form a negative of half the gripper geometry. After curing, a 3D-printed middle plug is inserted into the resulting cavity to define the internal channel. The male mold is then attached to form the remaining half of the gripper. This process produces a nearly finished gripper with an open base for the air inlet. The middle plug is subsequently removed and replaced by a small threaded plug, which facilitates pneumatic connection via a quick-release fitting. Finally, the air inlet section is sealed using an inlet clamper and inlet mold, resulting in a complete, ready-to-use soft actuator.

3.1.3. Gripper Mold Design Benchmark

To validate the robustness of the proposed mold structure, preliminary trials were conducted using two conventional mold configurations—a top–bottom and a side-parting design as shown in Figure 2a. Both exhibited common defects such as uneven silicone flow, trapped air, and weak seams along the parting line, resulting in a fabrication success rate below 30%. In contrast, the adopted compressive-sealing mold with high draft angles effectively displaced trapped air and maintained uniform internal pressure during casting, achieving a 100% yield across multiple fabrication attempts. These findings highlight the importance of proper mold geometry for ensuring consistent and defect-free soft-gripper fabrication.

3.2. Crop Harvesting System Architecture

To address the demands of modern agricultural management, the system is designed as an Internet-of-Things (IoT)-based multi-node control architecture. This modular design not only aligns with real-world application needs but also significantly reduces cost compared to solutions based on deep learning (DL) models. The overall system consists of two main subsystems: a host machine for data processing and command dispatching (implemented using a Raspberry Pi 5 (Raspberry Pi Ltd., Cambridge, UK)), and multiple servo units composed of Arduino Uno R4 WiFi boards (Arduino AG, Ivrea, Italy) along with necessary actuators and driver circuits.

For communication between devices, we adopted

2.4

G

Hz

WiFi using UDP protocol with handshake verification, rather than Bluetooth Low Energy (BLE). While BLE is energy-efficient, it poses limitations in scenarios requiring local data storage. WiFi not only enables real-time communication with ultra-low latency, but also allows lightweight data to be uploaded to cloud storage. Furthermore, under non-time-critical conditions, certain computations can be offloaded to the cloud. This architecture lays a scalable foundation for future system extensions.

In terms of cost-performance evaluation (as detailed in Table 2), experimental results show that the system achieves a maximum processing throughput of 73 fps, while the minimum effective frame rate, limited primarily by the webcam and UDP latency, remains around 42 fps. Given this performance, one host machine is capable of managing 4 to 5 servo units concurrently. Based on the BOM pricing, the cost per node ranges from $172.18 to $184.68. In comparison, an equivalent DL-based node, which requires additional acceleration hardware such as a Google Coral TPU (approximately $80), would cost approximately $369.24. This results in a cost reduction of approximately 49.98% to 53.39% per node, which is highly favorable for system scalability and practical adoption in agriculture. To clarify the interconnections of the mechatronic components, a schematic circuit diagram of the system architecture is provided in Figure 3.

With the system architecture and cost-efficiency validated, we designed an experimental setup (shown in Figure 4 and Figure 5) to assess its real-world performance. As illustrated in Figure 4c, the bottom layer of the test bench consists primarily of an artificial orange setup and the corresponding gripper assembly. The artificial orange model features fruits of varying shapes and sizes, with branches and leaves introducing partial occlusion and inconsistent lighting conditions when observed from different webcam angles. This configuration closely emulates the visual challenges encountered in real-world agricultural environments, such as fruit detection, ripeness estimation, and target tracking. In addition to testing the perception system for the crop itself, it also offers a highly practical scenario for evaluating the gripper’s ability to visually locate, approach, and interact with objects under complex and uncertain conditions (see Supplementary Materials for the related video demonstrations).

The top layer of the test bench, as shown in Figure 5a, primarily consists of the servo and actuation system, along with the experimental setup for validating the gripper’s static model. Regarding the choice of microcontroller, we ultimately selected the Arduino Uno R4 WiFi over the more cost-effective ESP32-S3-WROOM-2-N32R8V (featuring 32 MB Octal Flash and 8 MB Octal PSRAM, priced at $17). While the ESP32 offers powerful features, it typically supports only 5

V

or

3.3

V

power inputs. In contrast, the Arduino provides a 12

V

-compatible barrel jack input, which significantly simplifies power management in practical applications. Although using the ESP32 with a 12

V

-to-5

V

DC converter and a dedicated USB Type-C interface is technically feasible, this solution introduces additional hardware complexity, increases system bulk, and may reduce overall reliability. In comparison, the Arduino-based configuration provides a more compact, stable, and easily scalable solution for controlling the servo system.

Components labeled as “Optional” in Figure 5 are not required for the core system functionality. For instance, the DATAQ pressure transducer is only used during experimental validation to ensure the dynamic pressure inside the gripper chamber remains within safe limits. Similarly, the vacuum pump is only employed when handling large crops, where reverse bending is necessary to open the gripper by evacuating internal air. Therefore, only one solenoid valve and one motor driver are needed for basic operation.

To validate both the statics model and the control performance of the proposed soft gripper, a simple experimental setup was designed and implemented, as illustrated in Figure 6a. Despite the lightweight nature of the gripper, gravitational bending may still occur. To ensure the accuracy of the validation experiments, the gripper was suspended in the air to eliminate friction and reduce undesired contact forces. Moreover, both the gripper’s side profile and the lens of the webcam (Intel Realsense D435) were carefully aligned parallel to the horizontal plane. To further facilitate the analysis of the bending angle, the baseline of the webcam was kept parallel to the longitudinal axis of the gripper.

To evaluate the accuracy of the statics model, nine different desired bending angles were selected as references: 5°, 10°, 15°, 20°, 30°, 45°, 60°, 75°, and 90°. For each target angle, the experiment was repeated 20 times. Given a desired bending angle, the corresponding pump voltage from Equation (29) was first computed using the statics model from Equation (8). The same voltage was then applied to the pneumatic system, and the resulting bending angle was measured and compared to the reference.

The actual bending angle was determined through image processing. Under relatively simple background conditions, Otsu’s thresholding method combined with Gaussian blurring was used to segment the gripper’s contour. Subsequently, OpenCV’s convexHull function was employed to extract the key vertices from the detected contour. Among the detected points, the top-left and top-right vertices were selected as representative landmarks to characterize the bending feature

θ

of the gripper by computing

tan (θ / 2)

from Figure 7b.

3.3. Reinforced Image Processing

To enable robust visual perception across diverse agricultural conditions, we designed a modular image processing pipeline compatible with both online and offline modes as seen in Figure 8. At the core of the system, several user-defined configuration classes determine how the pipeline behaves. The iso class manages all I/O-related parameters, such as whether a webcam is available on the current host machine, and where to store processed results. In contrast, visual properties and tracking settings are encapsulated within the crop and gripper classes, which respectively store color information and tracking configurations for crops and grippers.

The initialization function init_obj() is responsible for validating user-defined configurations. Once validated, the program automatically detects connected RGB webcams. If a usable camera is found, the system enters online mode; otherwise, it switches to offline mode and processes pre-defined video streams or static images from the specified path.

The program is designed to maximize the use of limited system resources through multi-threading. The first thread is responsible for crop image processing, which includes both maturity classification and spatial tracking. Only when a crop meets predefined ripeness criteria and is successfully locked does the second thread, gripper processing, become activated. Importantly, once this thread is launched, it remains active even if the crop is lost or temporarily occluded, ensuring continuity of task execution.

Upon successful localization of both targets, the system activates the third thread for communication, handled by the com class. This class abstracts the low-level I/O between the host and the actuation units over either serial or Wi-Fi. Through this interface, control signals are sent to the servo-driven soft gripper to execute the harvesting sequence. After each harvesting cycle, the system returns to the initial state to search for the next target. The communication thread remains open to reduce latency between cycles and is only terminated when the system is powered down or when all ripe crops have been harvested.

The following two sub-sections provide further details of our image processing logic. Section 3.3.1 describes how crop maturity and position are reliably determined using a lightweight, non-supervised-learning-based strategy. Section 3.3.2 presents our gripper identification and tracking approach, which accommodates variable designs and ensures tight synchronization with the actuation process. The structured pseudocode for the core algorithms described in this section, including system initialization, crop detection and maturity analysis, gripper detection, multi-target tracking, and thread management, is provided in Appendix B for reproducibility.

3.3.1. Robust Crop Image Processing & Tracking with Confidence Validation

While deep learning-based object detection excels at fruit identification, it often falls short in assessing maturity, which requires fine-grained analysis of color, size, and surface integrity. Furthermore, these data-driven methods struggle to generalize across diverse cultivars and incur substantial overhead in data collection and training, limiting their practicality for real-world deployment.

In current practice, many semi-autonomous systems still rely on manual validation following AI-based detection, which not only increases human intervention but also undermines the goal of full automation. Additionally, certain quality criteria such as the absence of bruises or localized rot are often visually observable and can be quantitatively described using interpretable features, eliminating the necessity of end-to-end learning models. Our approach embraces this principle by leveraging explicitly defined thresholds and parametric criteria, rather than relying on opaque learned representations.

Beyond maturity estimation, real-time and robust object tracking presents another critical bottleneck in crop harvesting tasks. While numerous algorithms such as CSRT, KCF, and others developed within the OpenCV framework offer reliable tracking under controlled conditions, their performance degrades significantly in the dynamic, cluttered environments typical of agricultural fields. One notable issue arises when the target fruit experiences lateral movement (parallel to the camera’s baseline) and becomes temporarily occluded by branches or foliage. In such cases, standard trackers may fail to recognize occlusion and continue generating bounding boxes around incorrect regions, causing the system to misidentify targets or enter failure loops.

Another prevalent failure mode occurs when the fruit moves along the depth axis (perpendicular to the camera), either due to natural motion or active camera repositioning. In such scenarios, conventional trackers often fail to adjust the bounding box size accordingly as shown in Figure 9b,c, resulting in either cropped or overly large bounding boxes that compromise localization precision. Although the target may still be nominally tracked, the lack of scale adaptation and boundary accuracy renders these algorithms insufficient for systems that rely entirely on visual feedback to initiate downstream actions such as actuation or trajectory planning.

In light of these limitations, we propose a lightweight, rule-based visual processing pipeline tailored specifically for crop maturity analysis and precision tracking in field-like conditions. The following paragraphs detail our system design and implementation, highlighting how traditional image processing techniques, when carefully composed, can outperform general-purpose tracking models in targeted applications with well-defined requirements.

In our experiments using artificial oranges, we employed HSV color space segmentation combined with CLAHE to handle lighting variations (Figure 10a). The target crop color was defined as

{HSV}_{crop} = [38.8, 255, 255]

(tolerance: ±4%). Regions of interest (ROIs) were considered valid candidates if the proportion of mature-color pixels within them exceeded a predefined ripeness threshold (e.g., 80%). The system then sorted these candidates by ripeness score and selected the top-ranked one as the harvesting target, as shown in Figure 10b.

To mitigate tracking failures (Figure 9b–e), we introduced a secondary verification process. At each bounding box update, the system constrains the box to a square and defines a circle within it (Figure 9f). The first stage verifies that the proportion of mature-color pixels inside this circle exceeds a threshold, addressing failures as in Figure 9c–e.

A second verification step is applied to the non-shadowed portion of the bounding box outside the circular region. If this area still contains a small but significant percentage of pixels matching the mature color, it indicates that the crop is not fully enclosed by the circle, which was an issue observed in Figure 9b. This two-stage approach ensures more reliable crop localization during tracking.

We compared multiple tracking algorithms, including CSRT, KCF, MIL, Boosting, Median Flow, MOSSE, and TLD. Only CSRT and Median Flow achieved acceptable tracking quality in our tests. However, CSRT had a significantly higher computational cost exceeding 50% CPU usage per tracker even without the secondary verification. Due to its lower overhead (∼34% CPU usage with full verification), Median Flow was selected as the final tracking algorithm for our system, making it more suitable for real-time deployment on resource-limited platforms.

3.3.2. Universal Gripper Feature Detection via Shape Priors

A wide variety of gripper prototypes exist today, ranging from commercial products to experimental models still in laboratory development. These grippers exhibit significant diversity in structural design and appearance. If deep learning-based recognition were to be used, it would require collecting and labeling a large volume of training data for each new gripper type introduced, which is evidently not scalable or generalizable.

After an extensive review of gripper designs from a variety of sources, we identified a common pattern: in their non-actuated state, the contours of all fingers are approximately rectangular and nearly parallel to one another. To account for practical conditions, we allow for a small angular tolerance (typically up to 10 degrees), not due to perspective distortion, but because grippers can enter the camera’s field of view in arbitrary orientations (e.g., vertically, horizontally, or diagonally as shown in Figure 11). In such poses, the gripper’s own weight and spatial configuration can cause slight physical bending of the fingers, making them no longer perfectly parallel. Moreover, most fingers tend to be of approximately equal length. With this combination of parallelism and length constraints, the gripper region can be effectively segmented from complex backgrounds using purely geometric criteria, without relying on instance-level appearance learning.

Unlike the crop detection pipeline, where the color-based separation is relatively robust, grippers in our system are predominantly white. Their appearance is heavily affected by lighting conditions, making direct color segmentation unreliable. To address this, we employ a hybrid approach consisting of HSL-based enhancement, CLAHE, and an inverted HSV mask generated from the crop and leaf regions in Figure 12b,c. This allows the removal of most non-white regions. The resulting image is then refined using Gaussian blurring, Otsu thresholding, and auto-tuned Canny edge detection.

To ensure robust detection, two preconditions are enforced: (1) the gripper must be completely within the camera’s field of view, and (2) it must be in the fully unactuated state. Candidates located too close to the image borders are discarded, since the rectangular shape of the image frame often leads to false positives when combined with other objects in the scene. After computing the length of each contour and its distance to the image boundary, small contours and those near the edges are filtered out. The remaining set is subjected to the previously described parallelism and length constraints, enabling accurate extraction of the gripper region.

In addition, precise localization of the gripper tip feature points is essential to ensure a successful and stable grasping. Continuing from the detection step, we analyze the outer contour of the gripper to identify its key vertices. For every three consecutive points along the contour, we compute the internal angle. If the angle is below a certain threshold (typically 1°–2°), the middle point is considered redundant and omitted as seen in Figure 13e,f. Through this angle-based vertex refinement, we can extract a minimal set of dominant points that describe the overall contour. The tip positions of the fingers are then directly inferred from these vertices. A visual example of this process is illustrated in Figure 14. The proposed method consistently yielded reliable results across diverse gripper types and conditions.

3.4. Gripper Modeling

To realize a robust control, whether to include a system model remains to be a topic of interest to the research community. Model-free control systems are more suitable for scenarios that the models themselves are so sophisticated to establish or their working conditions contain strong uncertainties. State-of-the-art model-free control methods such as Neural Network Control (NNC) and Reinforcement Learning (RL) usually require a large amount of time and data to train and tune the model. They have been proved to be competent for highly complex systems; however, their specific target-oriented feature makes them difficult to tune, lack versatility, safety and stability. Even though model-based control systems usually have larger computation workload depending on the complexity of the model, their high precision, predictable characteristic specially for Model Predict Control (MPC), constraint design friendly make them widely useful in industry.

In the current phase, most of our experiments have been done in the lab environment with stable illumination source and relatively simpler background with less disturbance information. Since we were not able to collect high quality experimental data from crop field for model training, model-free control was not considered. On the other hand, the computation capability of micro-controllers (e.g., Arduino, STM32 and ESP32) and single board computers (such as Raspberry Pi) is limited, thereby the model needs to be simplified. One goal of the harvesting system is to control the distance between the gripper tips to ensure a firm grasp. Since the separation can fall below 10

c

m

, real-time, low-latency control is necessary. To achieve a trade-off between these factors, a precise gripper static model will generate the desired feedforward input, and the dynamics model will correct the system error.

3.4.1. Gripper Static Modeling

As our proposed system relies merely on computer vision, the precision requirement could be at the semi-millimeter level, a more elaborate physics-based model to capture the relationship between pressure input (stress) and gripper bending angle (strain) based on Neo-Hookean method. Unlike most of the metal materials with a linear elastic Stress-Strain curve before the proportionality limit (the highest stress point in which stress is proportional to stress according to Hooke’s Law), hyperelastic materials are more suitable to be modeled by Neo-Hookean’s method. In this work, we propose a novel method to model pneumatic driven soft manipulators with irregular cross-sectional area shape and prove that circumferential stress of the grippers is not negligible.

Before describing the modeling process, it is worth noting several basic hypotheses about this method itself: the energy loss due to heat transfer is ignored; the material’s mechanical property is seen to be identical at any point; the strains are small (usually less than 20%); and the material is incompressible.

The governing equations for our model are derived from the strain energy density function of a general Neo-Hookean material in Equation (A1). The complete derivation, which incorporates the incompressibility constraint via a Lagrange multiplier and proceeds through to the final expressions for axial and circumferential stress in Equation (A8), is detailed in Appendix A. This derivation incorporates additional simplifying assumptions to reduce the system’s degrees of freedom, such as negligible radial stress and constant side-layer length.

Using this foundation, and representing

ε_{1}

with

ε

the stress per unit length for the axial and circumferential directions is given by:

\{\begin{matrix} σ_{1} & = μ ε - μ {(\frac{1}{ε})}^{2} \cdot \frac{1}{ε} = μ (ε - \frac{1}{ε^{3}}) \\ σ_{2} & = μ - μ {(\frac{1}{ε})}^{2} = μ (1 - \frac{1}{ε^{2}}) \end{matrix}

(1)

In refs. [25,26], there is no proof that the circumferential direction stress is negligible. In our work, we validated the feasibility of each strain-stress assumption and provided evidence that circumferential stress is not small enough to be ignored.

Another novelty of our work is to provide a more general method to model pneumatic driven soft manipulators with irregular cross-section shape along the backbone. This is applicable for more customized-free and complicated prototype design. For our design, the geometry feature of the gripper inner hollow chamber and outline dimension is periodic about its length. Thereby, by defining an isolated line (see Figure 15), the key dimensions, the width, upper height above the isolated line, lower height below the isolated line can be written into different piecewise functions,

w (l)

,

h_{t} (l)

and

h_{b} (l)

respectively. And the equivalent (with subscript e) dimension per unit length can be expressed as

\{\begin{matrix} w_{e} & = \frac{1}{l_{e} - l_{0}} \int_{l_{0}}^{l_{e}} w (l) d l \\ h_{e} & = \frac{1}{l_{e} - l_{0}} \int_{l_{0}}^{l_{e}} h_{t} (l) d l \\ t_{b e} & = \frac{1}{l_{e} - l_{0}} \int_{l_{0}}^{l_{e}} h_{b} (l) d l \end{matrix}

(2)

Substituting

h_{e}

in Equation (2) into Equation (A8), the strain and stress for gripper bottom, side and top layers are computed as follows, where

d α

,

d β

and

d γ

are infinitesimal thickness change for gripper bottom, side and top layer deformation respectively (as seen in Figure 16).

For bottom layer:

\{\begin{matrix} ε_{1 b} (α) & = \frac{l_{e} - l_{0} + Δ l_{b}}{l_{e} - l_{0}} = \frac{l_{e} - l_{0} + [\frac{2 π (1 / κ + α) θ}{2 π} - \frac{(2 π / κ) θ}{2 π}]}{l_{e} - l_{0}} = 1 + \frac{α θ}{l_{e} - l_{0}} \\ σ_{1 b} (α) & = μ [ε_{1 b} (α) - \frac{1}{ε_{1 b}^{3} (α)}] \end{matrix}

(3)

For side layer(s):

\{\begin{matrix} ε_{2 s} (β) & = \frac{h_{e} + Δ l_{s}}{h_{e}} = \frac{h_{e} + [\frac{2 π (h_{e} / θ + β) θ}{2 π} - \frac{2 π (h_{e} / θ) θ}{2 π}]}{h_{e}} = 1 + \frac{β θ}{h_{e}} \\ σ_{2 s} (β) & = μ [1 - \frac{1}{ε_{2 s}^{2} (β)}] \end{matrix}

(4)

For top layer:

\{\begin{matrix} ε_{1 t} (γ) & = \frac{l_{e} - l_{0} + Δ l_{t}}{l_{e} - l_{0}} = \frac{l_{e} - l_{0} + [\frac{2 π (1 / κ + h_{e} + γ) θ}{2 π} - \frac{2 π (1 / κ + h_{e}) θ}{2 π}]}{l_{e} - l_{0}} = 1 + \frac{γ θ}{l_{e} - l_{0}} \\ σ_{1 t} (γ) & = μ [ε_{1 t} (γ) - \frac{1}{ε_{1 t}^{3} (γ)}] . \end{matrix}

(5)

Then, the actuation moment

M_{P}

generated by the input pressure

Δ P

acting on the internal pneumatic chamber of the gripper and the elastic restoring moment

M_{θ}

due to the hyperelastic deformation of the gripper material can be expressed as

\{\begin{matrix} M_{P} & = Δ P \cdot \{\frac{1}{l_{e} - l_{0}} \int_{l_{0}}^{l_{e}} [h_{t} (l) \cdot w (l)] \cdot h_{t} (l) d l\} = \frac{Δ P}{l_{e} - l_{0}} \int_{l_{0}}^{l_{e}} w (l) \cdot h_{t}^{2} (l) d l \\ M_{θ} & = M_{t} + M_{b} + M_{s} \end{matrix}

(6)

where the torques of gripper bottom, side and top layers can be described as

\{\begin{matrix} M_{b} & = \int_{0}^{t_{b e}} σ_{1 b} (α) \cdot (w_{e} + 2 t_{s}) α \cdot (l_{e} - l_{0}) d α \\ = (l_{e} - l_{0}) \int_{0}^{t_{b e}} [(w_{e} + 2 t_{s}) α \cdot σ_{1 b} (α)] d α \\ M_{s} & = 2 \int_{0}^{t_{s}} σ_{2 s} (β) \cdot [h_{e} \cdot \frac{1}{2} w_{e} + 2 \int_{0}^{π / 2} {(\frac{1}{2} h_{e} + β)}^{2} sin φ d φ] \cdot (l_{e} - l_{0}) d β \\ = (l_{e} - l_{0}) \int_{0}^{t_{s}} [h_{e} w_{e} + \int_{0}^{π / 2} {(h_{e} + 2 β)}^{2} sin φ d φ] \cdot σ_{2 s} (β) d β \\ M_{t} & = \int_{0}^{t_{t}} σ_{1 t} (γ) \cdot [(w_{e} + 2 t_{s}) t_{b e} + (h_{e} + γ) (w_{e} + 2 t_{s})] \cdot (l_{e} - l_{0}) d γ \\ = (l_{e} - l_{0}) \int_{0}^{t_{t}} [(w_{e} + 2 t_{s}) (h_{e} + t_{b e} + γ)] \cdot σ_{1 t} (γ) d γ \end{matrix}

(7)

Bringing all the geometry factors shown in Figure 15 into Equations (6) and (7) and equating the two moments in Equation (6) result in an explicit mapping between statics model pressure

Δ P_{stat}

and gripper bending angle

θ

as

Δ P_{stat} = \frac{log (N_{A, a} θ + N_{A, b}) \cdot \sum_{i = 0}^{5} N_{A, i} θ^{i} + \sum_{j = 1}^{8} N_{B, j} θ^{j} + N_{C}}{D_{A} {(\sum_{k = 0}^{2} D_{A, k} θ^{k})}^{2} θ^{3}}

(8)

where N and D denote numerator and denominator, respectively.

N_{X}

,

N_{X, y}

,

D_{X}

and

D_{X, y}

are all very large constants with magnitude varying from

10^{28}

to

10^{154}

. Any approximation about these numbers will cause at least

10^{46}

magnitude deviation from the correct values. Instead, a linearized expression was derived using

p o l y f i t

in MATLAB redR2022b (MathWorks, Natick, MA, USA) as

\begin{matrix} Δ P_{stat, simp} = & - 16.521 θ^{4} + 444.826 θ^{3} - 6564.534 θ^{2} + 106903.183 θ + 52.475 . \end{matrix}

(9)

Compared to the explicit expression in Equation (8), the error ratio of the simplified version is less than

3 %

. With the increasing of the bending angle, the error ratio will gradually converge to zero as shown in Figure 17d, which confirms its reliability.

3.4.2. Modeling of the Gripper Dynamics

To model the dynamics of manipulators made of soft materials is still a challenge due to material’s hyper elastic inhomogeneity. Denavit–Hartenberg (D-H) convention with constant curvature (CC) assumption treating soft manipulators as rigid ones is still a commonly used method for modeling, but usually it works for short appendages (usually about 10

c

m

). For longer manipulators, when the deformation and elongation along the axial backbone direction are not negligible, the precision will decrease as evaluated in [37]. To address this, the manipulator is divided into several segments, and piecewise constant curvature (PCC) assumption will be applied onto different segments; this, however, will result in more degrees of freedom (DoFs) which largely affects the computations and performance.

In this paper, the gripper is modeled as a single-segment RPR (comprising three links, where P and R denote prismatic and revolute joints, respectively) continuum robot manipulator. This modeling approach is grounded in the Piecewise Constant Curvature (PCC) assumption, a well-established simplification for soft robotic manipulators that enables the application of rigid-body kinematics tools, such as the Denavit–Hartenberg (D-H) convention [38,39]. While the PCC assumption has known limitations for highly dynamic motions or large deformations, it provides a favorable balance between model fidelity and computational tractability for our quasi-static grasping application. The use of the D-H convention under PCC facilitates straightforward kinematic analysis and controller design, which is crucial for real-time implementation on our resource-constrained hardware platform. Considering that the gripper’s motion is limited in 2D plane, there is only one DoF for this system since the curvature

κ

is a constant; the second joint prismatic distance

d_{2}

can be expressed as a function of bending angle

θ

as

d_{2} = \frac{1}{κ} \cdot sin \frac{θ}{2} \cdot 2 = \frac{2 sin (θ / 2)}{κ}

(10)

2 π \cdot \frac{1}{κ} \cdot \frac{θ}{2 π} = l_{e} - l_{0}

(11)

combining Equations (10) and (11), we obtain

d_{2} = \frac{2 (l_{e} - l_{0}) sin (θ / 2)}{θ} .

(12)

Then, the D-H parameters can be summarized as given in Table 3.

The homogeneous transformation matrix

H_{i}^{i - 1}

for each link with respect to its local coordinate frame is determined by the dot product of transformation matrix for each parameter as

H_{i}^{i - 1} = R_{θ_{i}} T_{d_{i}} T_{a_{i}} R_{α_{i}} .

(13)

Then, the matrices with respect to the global or base coordinate frame are computed as

\{\begin{matrix} H_{1}^{0} & = [\begin{matrix} R_{1}^{0} & {[o_{1}]}_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{matrix}] = [\begin{matrix} cos (θ / 2) & 0 & sin (θ / 2) & 0 \\ - sin (θ / 2) & 0 & cos (θ / 2) & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{matrix}] \\ H_{2}^{0} & = H_{1}^{0} = [\begin{matrix} R_{2}^{0} & {[o_{2}]}_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{matrix}] = [\begin{matrix} cos (θ / 2) & sin (θ / 2) & 0 & d_{2} sin (θ / 2) \\ - sin (θ / 2) & cos (θ / 2) & 0 & d_{2} cos (θ / 2) \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix}] \\ H_{3}^{0} & = H_{1}^{0} \cdot H_{2}^{1} = [\begin{matrix} R_{3}^{0} & {[o_{3}]}_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{matrix}] = [\begin{matrix} cos θ & 0 & 0 & d_{2} sin (θ / 2) \\ - sin θ & 1 & 0 & d_{2} cos (θ / 2) \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix}] \end{matrix}

(14)

The Jacobian of the mass matrix for a specific link k can be derived based on the following

J_{k} = {[\begin{matrix} J_{1}^{(k)} & J_{2}^{(k)} & \dots & J_{k}^{(k)} & 0_{6 \times 1} & \dots & 0_{6 \times 1} \end{matrix}]}_{6 \times n}

(15)

in which n is total number of links, each element

J_{i}^{(k)}

is a

6 \times 1

column vector which is composed of two parts defined as

\begin{matrix} J_{i}^{(k)} & = {[\begin{matrix} J_{v_{i}}^{(k)} & J_{ω_{i}}^{(k)} \end{matrix}]}^{T} = \{\begin{matrix} revolute : [\begin{matrix} z_{i - 1} \times (o_{c, k} - o_{i - 1}) \\ z_{i - 1} \end{matrix}] \\ prismatic : [\begin{matrix} z_{i - 1} \\ 0_{3 \times 1} \end{matrix}] \end{matrix} (i = 1, 2, 3) \end{matrix}

(16)

where

J_{v_{i}}^{(k)}

and

J_{ω_{i}}^{(k)}

denote the Jacobian matrices of linear and angular velocities for each center of mass (CoM) of link i, respectively.

z_{i}

is the axis of rotation with

z_{0} = {[\begin{matrix} 0 & 0 & 1 \end{matrix}]}^{T}

,

z_{1}

and

z_{2}

are the third columns in

R_{1}^{0}

and

R_{2}^{0}

.

o_{i}

is the origin of each link’s local coordinate frame with

o_{0} = {[\begin{matrix} 0 & 0 & 0 \end{matrix}]}^{T}

and the remaining can be found from (14). Furthermore,

o_{c, k}

represent the CoM coordinates of each link which can be derived based on

o_{k - 1}

and

o_{k}

as

o_{c, k} = o_{k - 1} + (o_{k} - o_{k - 1}) / 2

(17)

The general form of system’s equation of motion derived using Euler-Lagrangian method is as

\begin{matrix} {[\begin{matrix} τ_{g} \end{matrix}]}_{3 \times 1} = {[\begin{matrix} M_{g} \end{matrix}]}_{3 \times 3} \cdot {\ddot{q}}_{g} + {[\begin{matrix} C_{g} \end{matrix}]}_{3 \times 3} \cdot {\dot{q}}_{g} + {[\begin{matrix} K_{g} \end{matrix}]}_{3 \times 1} \cdot q_{g} + {[\begin{matrix} G_{g} \end{matrix}]}_{3 \times 1} \end{matrix}

(18)

where the subscript g denotes gripper,

q_{g}

is the system’s variable

q_{g} = {[\begin{matrix} - θ / 2 & d_{2} & θ / 2 \end{matrix}]}^{T}

,

M_{g}

is the inertial matrix which can be computed using Jacobian matrices, derived previously, as

M_{g} = \sum_{i = 1}^{3} \{J_{v_{i}}^{T} m_{i} J_{v_{i}} + J_{ω_{i}}^{T} [R_{i}^{0} I_{i} {(R_{i}^{0})}^{T}] J_{ω_{i}}\}

(19)

where

m_{i}

and

I_{i}

denote the mass and inertia tensor of each link, respectively. Because the first and the third links are revolute, they were modeled as point mass; they actually have zero contribution to

M_{g}

. For simplicity, the second link was modeled as a rectangular cuboid with inertia tensor

I_{2} = diag (I_{xx}, I_{yy}, I_{zz})

, each element of which is defined as

\{\begin{matrix} I_{xx} & = \frac{1}{12} m_{2} [{(w_{e} + 2 t_{s})}^{2} + {(l_{e} - l_{0})}^{2}] \\ I_{yy} & = \frac{1}{12} m_{2} [{(h_{b e} + h_{t e} + t_{t})}^{2} + {(l_{e} - l_{0})}^{2}] \\ I_{zz} & = \frac{1}{12} m_{2} [{(h_{b e} + h_{t e} + t_{t})}^{2} + {(w_{e} + 2 t_{s})}^{2}] + m_{2} {(\frac{h_{b e} + h_{t e} + t_{t}}{2})}^{2} \end{matrix}

(20)

Furthermore,

C_{g}

is the Centrifugal-Coriolis matrix; each element

C_{ij}

of

C_{g}

can be computed based on the partial derivatives of elements

M_{ij}

in

M_{g}

as

C_{ij} = \frac{1}{2} \sum_{k = 1}^{3} (\frac{\partial M_{ij}}{\partial q_{k}} + \frac{\partial M_{ik}}{\partial q_{j}} - \frac{\partial M_{kj}}{\partial q_{i}}) {\dot{q}}_{k}

(21)

to reflect the compliance of soft manipulators and establish the actual binding between static and dynamic modeling, the stiffness term

K_{g} q_{g}

will be used. Based on the Hamilton Principle and linear viscoelasticity [40], it is defined by elastic potential energy

U_{K} (θ)

as (only bending, no stretching)

\begin{matrix} K_{g} q_{g} & = \nabla U_{K} (θ) = {[\begin{matrix} \frac{\partial U_{K}}{\partial q_{g, 1}} & \frac{\partial U_{K}}{\partial q_{g, 2}} & \frac{\partial U_{K}}{\partial q_{g, 3}} \end{matrix}]}^{T} \\ = {[\begin{matrix} \frac{\partial U_{K}}{\partial θ} \cdot \frac{\partial θ}{\partial q_{g, 1}} & \frac{\partial U_{K}}{\partial θ} \cdot \frac{\partial θ}{\partial q_{g, 2}} & \frac{\partial U_{K}}{\partial θ} \cdot \frac{\partial θ}{\partial q_{g, 3}} \end{matrix}]}^{T} \\ = \frac{\partial U_{K}}{\partial θ} {[\begin{matrix} \frac{\partial (q_{g, 3} - q_{g, 1})}{\partial q_{g, 1}} & \frac{\partial (q_{g, 3} - q_{g, 1})}{\partial q_{g, 2}} & \frac{\partial (q_{g, 3} - q_{g, 1})}{\partial q_{g, 3}} \end{matrix}]}^{T} \\ = M_{θ} \cdot {[\begin{matrix} - 1 & 0 & 1 \end{matrix}]}^{T} . \end{matrix}

(22)

The term gravitational force

G_{g}

is ignored for this model for several reasons: (1) The length of all the grippers are relatively small, and their self gravity will not affect so much their bending feature; (2) The bending feature modeling including gravitational factor is determined by the gesture of the gripper mounting plane in the 3D space. These state variables (e.g., Euler angles) are not controllable and measurable in the current stage.

For the left hand side (LHS) of the motion equations, the generalized torque

τ_{g}

balances the total moment

M_{P}

from Equation (6), and by Principle of Virtual Work, we have

\begin{matrix} δ W & = M_{P} \cdot δ θ = M_{P} \cdot δ [\frac{θ}{2} - (- \frac{θ}{2})] = M_{P} (δ q_{g, 3} - δ q_{g, 1}) \\ = \sum_{i = 1}^{3} τ_{g, i} δ q_{g, i} = τ_{g, 1} δ q_{g, 1} + τ_{g, 2} δ q_{g, 2} + τ_{g, 3} δ q_{g, 3}, \end{matrix}

(23)

where the calculus

δ q_{i}

represents the virtual angular or linear displacement. From Equation (23), it can be seen that only the two revolute joints contribute to the gripper bending, thus the generalized torque

τ_{g}

can be rewritten as

τ_{g} = {[\begin{matrix} - M_{P} & 0 & M_{P} \end{matrix}]}^{T} .

(24)

Assembling Equations (12), (19), (21), (22) and (24) into Equation (18), the equation of motion can be simplified into one DoF system as (note that the first one is the geometry constraint of the translation motion)

\{\begin{cases} \ddot{θ} = & [\frac{(l_{e} - l_{0})}{4 θ} - \frac{(2 - \frac{4}{θ}) cos \frac{θ}{2} - θ sin \frac{θ}{2}}{2 sin \frac{θ}{2} (θ cot \frac{θ}{2} - 2)}] {\dot{θ}}^{2} \\ \ddot{θ} = & \frac{5000}{163} {[\frac{θ}{(l_{e} - l_{0}) sin (θ / 2)}]}^{2} (M_{P} - M_{θ}) + {(cot \frac{θ}{2} - \frac{2}{θ})}^{2} {\dot{θ}}^{2} \end{cases}

(25)

3.4.3. Sensor & Electronics Dynamics

To obtain more straightforward mapping between system’s input (gripper tip distance sensed from webcam) and output (voltage supply for each electronic), a sensor dynamics needs to be established.

Furthermore, the compressor pump used in this project is 12

V

/ 4

A

for normal working condition with flow rate 18 LPM (liters per minute), thus the instant volume

V_{g}

inside the gripper chamber at time instant

t_{n}

can be expressed as

V_{g} = n_{g} V_{g, 0} + Δ V_{g}

(26)

in which the total volume air charged

Δ V_{g}

from starting time

t_{0}

to

t_{n}

can be rewritten into a time variant function related to flow rate with unit

m^{3} / s

,

n_{g}

is the total number of grippers, and

V_{g, 0}

is the initial volume of the hollow gripper chamber. Therefore, we obtain

\{\begin{matrix} Δ V_{g} & = \frac{V_{pump} - 3.5}{12 - 3.5} \cdot \frac{18}{60} \cdot \frac{Δ t}{1000} + Δ V_{g, n - 1} \\ = \frac{3}{1.7 \times 10^{5}} (2 V_{pump} - 7) \cdot Δ t_{n} + Δ V_{g, n - 1} \\ = \frac{3 \sum_{k = 1}^{n} (2 V_{pump, k} - 7) \cdot Δ t_{k}}{1.7 \times 10^{5}} \\ V_{g, 0} & = \int_{l_{0}}^{l_{e}} h_{t} (l) \cdot w (l) d l . \end{matrix}

(27)

Then, the actual absolute pressure inside gripper chamber will be

Δ P = \frac{V_{g}}{n_{g} V_{g, 0}} \cdot P_{0} - P_{0} = \frac{P_{0}}{n_{g} V_{g, 0}} \cdot Δ V_{g}

(28)

It is noted that in real applications, the starting and stopping voltages of a motor are not identical. Usually, for a 12

V

motor, the starting voltage is around 3

V

while the stopping voltage is only about 1 V to 2 V. To ensure the pumps can be actuated, the starting voltage was chosen for reference. For this brand of pump, the experimental starting voltage is around 3.3 V to 3.8 V, as shown in Equation (27), and

3.5

V

was set as a reference value.

Combining Equations (26)–(28), the relationship between output pressure

Δ P

and compressor pump voltage input at time instant

t_{n}

is reorganized as

\begin{matrix} V_{pump} & = \frac{4.7856 n_{g}}{10^{6} Δ t_{n}} \cdot Δ P - \frac{1.7 \times 10^{5}}{3 Δ t_{n}} Δ V_{g, n - 1} + 3.5 \\ = \frac{4.7856 n_{g}}{10^{6} Δ t_{n}} \cdot Δ P - \frac{\sum_{k = 1}^{n - 1} (2 V_{pump, k} - 7) \cdot Δ t_{k}}{Δ t_{n}} + 3.5 \end{matrix}

(29)

To calculate the distance between gripper tips, considering the limited image processing capability of the host machine and the uncertainty caused by infrared camera, their distance is directly measured in pixel unit

Δ d_{pixel}

. Meanwhile, we assume the webcam lens is always facing the direction as shown in Figure 18c. The distance between each two adjacent gripper mounting position

Δ D_{pixel}

and the projected chord length

l_{c, pixel}

can also be computed by knowing the coordinates of gripper key vertices. Therefore, to solve for the real bending angle

θ

in the 3D space, we have

sin \frac{θ_{proj}}{2} = \frac{Δ D_{pixel} - Δ d_{pixel}}{2} / l_{c, pixel}

(30)

sin \frac{θ_{proj}}{2} = sin \frac{θ}{2} \cdot cos \frac{π - 2 π / n_{g}}{2}

(31)

where

n_{g}

is the number of grippers, in this work,

n_{g} = 3

. Combining Equations (30) and (31) results in

\begin{matrix} θ = & θ (Δ d_{pixel}) = 2 {sin}^{- 1} [\frac{Δ D_{pixel} - Δ d_{pixel}}{2 l_{c, pixel} sin (π / n_{g})}] = 2 {sin}^{- 1} (\frac{Δ D_{pixel} - Δ d_{pixel}}{\sqrt{3} l_{c, pixel}}) \end{matrix}

(32)

3.5. Controller Design

To achieve real-time and accurate control of the soft gripper, a combined feedforward and feedback control strategy is adopted (as seen in Figure 19). The feedforward component generates an expected pressure input based on the gripper’s static model, estimating the pressure required to reach the desired equilibrium position or bending angle. Meanwhile, the feedback loop compensates for modeling errors and external disturbances by adjusting the control input in real-time. Within this loop, an adaptive PID controller is implemented, where the gains are dynamically tuned according to the desired steady-state behavior of the gripper, ensuring a responsive and robust performance across different operating conditions.

3.5.1. System Linearization

The estimated pressure for desired or reference bending angle

θ_{ref}

can be defined by the expression

Δ P_{dyn, ref} = f (θ_{ref}, {\dot{θ}}_{ref}, {\ddot{θ}}_{ref}), f : R^{3} \to R

using Equations (9) and (25). While the nonlinear model in Equation (8) captures the essential statics of the gripper’s pressure-angle relationship, its complexity poses challenges for real-time control implementation. To derive a tractable control law while preserving the dominant system characteristics, we linearize the dynamics around the equilibrium position defined by the reference bending angle, where

M_{p} - M_{θ}

can be expressed using static pressure input from Equation (9) as

M_{P} - M_{θ} = \frac{\int_{l_{0}}^{l_{e}} w (l) \cdot h_{t}^{2} (l) d l}{l_{e} - l_{0}} (Δ P_{dyn} - Δ P_{stat, simp}) .

(33)

Since

Δ P_{dyn}

cannot be directly measured in the system, it will be expressed by the pump voltage supply in Equation (29), thus for simplification, the second equation in Equation (25) can be rewritten as

\ddot{θ} = f_{\ddot{θ}} (θ, \dot{θ}, V_{pump})

. This approach yields a simplified representation of the system behavior in the vicinity of the equilibrium state (

θ_{ref}

,

{\dot{θ}}_{ref} = 0

and

V_{pump} = V_{pump, ref}

), enabling the design of an adaptive PID controller with explicitly defined stability margins. The linearization process using first order Taylor series proceeds as follows:

\begin{matrix} \ddot{θ} \approx & f_{\ddot{θ}} (θ_{ref}, {\dot{θ}}_{ref}, V_{pump, ref}) + {\frac{\partial f_{\ddot{θ}}}{\partial θ}|}_{\begin{matrix} θ = θ_{ref}, \dot{θ} = {\dot{θ}}_{ref} \\ V_{pump} = V_{pump, ref} \end{matrix}} (θ - θ_{ref}) \\ + {\frac{\partial f_{\ddot{θ}}}{\partial \dot{θ}}|}_{\begin{matrix} θ = θ_{ref}, \dot{θ} = {\dot{θ}}_{ref} \\ V_{pump} = V_{pump, ref} \end{matrix}} (\dot{θ} - {\dot{θ}}_{ref}) + {\frac{\partial f_{\ddot{θ}}}{\partial V_{pump}}|}_{\begin{matrix} θ = θ_{ref}, \dot{θ} = {\dot{θ}}_{ref} \\ V_{pump} = V_{pump, ref} \end{matrix}} (V_{pump} - V_{pump, ref}) \\ = & C_{A} \cdot θ + C_{B} \cdot \dot{θ} + C_{C} \cdot V_{pump} + C_{D} \end{matrix}

(34)

where all the polynomial coefficients from

C_{A}

to

C_{D}

are constant, and their specific expressions are as follows:

\{\begin{matrix} C_{A} = {\frac{\partial f_{\ddot{θ}}}{\partial θ}|}_{\begin{matrix} \begin{matrix} V_{pump} = V_{pump, ref} \\ θ = θ_{ref}, \dot{θ} = 0 \end{matrix} \end{matrix}}, C_{B} = {\frac{\partial f_{\ddot{θ}}}{\partial \dot{θ}}|}_{\begin{matrix} \begin{matrix} V_{pump} = V_{pump, ref} \\ θ = θ_{ref}, \dot{θ} = 0 \end{matrix} \end{matrix}}, C_{C} = {\frac{\partial f_{\ddot{θ}}}{\partial V_{pump}}|}_{\begin{matrix} \begin{matrix} V_{pump} = V_{pump, ref} \\ θ = θ_{ref}, \dot{θ} = 0 \end{matrix} \end{matrix}} \\ C_{D} = f_{\ddot{θ}} (θ_{ref}, 0, V_{pump, ref}) - θ_{ref} \cdot {\frac{\partial f_{\ddot{θ}}}{\partial θ}|}_{\begin{matrix} \begin{matrix} V_{pump} = V_{pump, ref} \\ θ = θ_{ref}, \dot{θ} = 0 \end{matrix} \end{matrix}} - V_{pump, ref} \cdot {\frac{\partial f_{\ddot{θ}}}{\partial V_{pump}}|}_{\begin{matrix} \begin{matrix} V_{pump} = V_{pump, ref} \\ θ = θ_{ref}, \dot{θ} = 0 \end{matrix} \end{matrix}} \end{matrix}

(35)

and to eliminate the system error

e (t)

, unmodeled effects and disturbances of the environment, the control input

V_{pump}

is designed using the canonical form of classical PID controller as

u (t) = V_{pump} = K_{P} \cdot e (t) + K_{I} \cdot \int e (t) d t + K_{D} \cdot \dot{e} (t) + K_{comp}

(36)

where

K_{comp}

is the compensation term, and the system error

e (t)

and

\dot{e} (t)

are expressed as

\{\begin{matrix} e (t) & = θ_{ref} (t) - θ (t) \\ \dot{e} (t) & = {\dot{θ}}_{ref} (t) - \dot{θ} (t) = - \dot{θ} \end{matrix}

(37)

Taking Equations (36) and (37) into Equation (35) and expanding results in

\begin{matrix} \ddot{θ} = (C_{A} - C_{C} K_{P}) \cdot θ + (C_{B} - C_{C} K_{D}) \cdot \dot{θ} + [K_{comp} + C_{C} K_{P} \cdot θ_{ref} + C_{C} K_{I} \int (θ_{ref} - θ) d t + C_{D}] . \end{matrix}

(38)

Thus, the compensation term can be expressed as

K_{comp} = C_{C} K_{I} \int (θ - θ_{ref}) d t - C_{C} K_{P} θ_{ref} - C_{D}

(39)

3.5.2. Tuning of the Controller Gains

The selection of an appropriate tuning method is critical for the performance of the soft gripper controller. While classical heuristic methods such as Ziegler-Nichols (Z-N) are widely used for PID tuning, they are less suitable for our application due to several system-specific characteristics. The Z-N method relies on extracting parameters from the system’s step response or inducing sustained oscillations, which assumes quasi-linear and time-invariant dynamics. However, our soft pneumatic gripper exhibits significant nonlinearities, hysteresis, and slow pneumatic response times. Attempting to apply the Z-N method could lead to aggressive tuning, resulting in overshoot and high-frequency oscillations that would be detrimental to the vision-based control loop and could induce mechanical stress in the soft structure.

Therefore, we adopted a model-based pole placement approach. This method allows us to explicitly incorporate the known system dynamics (via linearization) and directly address practical constraints, such as the vision processing delay and the desire for smooth, non-oscillatory responses to avoid visual feedback corruption.

To identify/tune the PID coefficients, we first specify that the system should reach steady-state within 1 s. This decision is motivated by the practical constraints of our application: the control loop is entirely vision-based, and each frame acquisition and processing cycle introduces a communication delay of approximately 20 ms. A faster response would amplify the effects of this latency and potentially destabilize the system, while a slower response would degrade control accuracy and responsiveness. Therefore, 1 s serves as a reasonable compromise to reach steady-state between robustness and performance. This time-domain specification is translated into a second-order system requirement using the standard form:

\ddot{θ} + 2 ζ ω_{n} \dot{θ} + ω_{n}^{2} θ = ω_{n}^{2} θ_{ref}

(40)

where

ζ

denotes the damping ratio and

ω_{n}

is the natural frequency. The damping ratio of

ζ = 0.7

is selected as a compromise between fast response and minimal overshoot, which is a common choice for critically damped-like behavior in second-order systems. This value ensures adequate phase margin and smooth transient response without excessive oscillation. Given the requirement that the system settles within 1 s, we estimate the natural frequency based on the empirical relationship between settling time

t_{s}

, damping ratio

ζ

and natural frequency

ω_{n}

expressed as

t_{s} \approx \frac{4}{ζ ω_{n}}

. Substituting

t_{s} =

1

s

and

ζ = 0.7

, the desired natural frequency is obtained as

ω_{n} \approx 5.714 rad / s

. Bringing these parameter values back into Equation (40), the canonical form can be rewritten as

\ddot{θ} = - 8 \dot{θ} - 32.65 θ + 32.65 θ_{ref} .

(41)

Combining this equation with Equation (38) and considering only the closed-loop dynamic characteristics, we obtain

\{\begin{matrix} C_{A} - C_{C} K_{P} = - 32.65 \\ C_{B} - C_{C} K_{D} = - 8 \end{matrix} \Rightarrow \{\begin{matrix} K_{P} = \frac{C_{A} + 32.65}{C_{C}} \\ K_{D} = \frac{C_{B} + 8}{C_{C}} \end{matrix}

(42)

To ensure that the system reaches steady state within 1 s, a dominant pair of complex-conjugate poles was placed at

- ζ ω_{n} \pm ω_{d} j = - 4 \pm 4.081 j

, corresponding to a natural frequency of

ω_{n} = 5.714 rad / s

and damping ratio of

ζ = 0.7

. These parameters were used to determine the proportional and derivative gains (

K_{P}

and

K_{D}

), based on a linearized second-order model that captures the dominant dynamics of the soft gripper.

To eliminate steady-state error in position tracking, an integral term was introduced, which increases the closed-loop system order from two to three. Although this changes the characteristic equation, the system’s dominant behavior is still governed by the previously designed second-order dynamics, and the integrator is treated as a mild perturbation.

To determine the integral gain

K_{I}

, an additional real pole was placed at

s_{3} = - 2

, located further to the left of the complex pair but not too far to cause numerical stiffness or slow zero dynamics. This placement ensures:

The added pole does not dominate the system’s transient response.
The resulting third-order characteristic polynomial remains well-conditioned.
The integral gain $K_{I}$ can be uniquely solved via pole-matching with the closed-loop system matrix.

This cascaded approach of first designing

K_{P}

and

K_{D}

from a second-order model followed by incorporating

K_{I}

via third-order pole placement is commonly adopted in controls community. It balances tractability with performance, ensuring both acceptable transient response and zero steady-state error. The complete desired characteristic polynomial then is

(s^{2} + 8 s + 32.65) (s + 2) = s^{3} + 10 s^{2} + 48.65 s + 65.3 = 0,

(43)

and based on Equation (38), by letting

x_{1} = θ

,

x_{2} = \dot{θ}

and

x_{3} = \int θ d t

, we can build a third order state-space representation as

\begin{matrix} \frac{d}{d t} [\begin{matrix} x_{1} \\ x_{2} \\ x_{3} \end{matrix}] = [\begin{matrix} x_{2} \\ (C_{A} - C_{C} K_{P}) x_{1} + (C_{B} - C_{C} K_{D}) x_{2} - C_{C} K_{I} x_{3} + [K_{comp} + C_{C} (K_{P} θ_{ref} + K_{I} \int θ_{ref} d t) + C_{D}] \\ x_{1} \end{matrix}] \end{matrix}

(44)

Combining Equations (43) and (44) results in the numerical solution for integral gain:

- C_{C} K_{I} = 65.3 \Rightarrow K_{I} = - \frac{65.3}{C_{C}} .

(45)

To validate the effectiveness of the proposed controller, we first conduct numerical simulations based on the established gripper model. The simulated results provide a preliminary verification of the controller’s performance in tracking the desired bending angles. For the physical system, we estimate the actual bending angle by combining the gripper’s static model with experimental calibration data. Specifically, for desired bending angles ranging from 5° to 90°, we utilize the corresponding voltage values applied to the pump

V_{pump}

as listed in Table 4. For angles outside this calibrated range, we employ the analytical expression in Equation (8) to construct a numerical inverse function of bending angle

θ

with respect to pressure

Δ P_{dyn}

using MATLAB’s interp1 function. By feeding the real-time

V_{pump}

into this inverse mapping, we are able to estimate the real-time bending angle and thereby evaluate the controller’s closed-loop performance.

4. Results and Discussion

This section presents and discusses the experimental validation of the proposed soft robotic harvesting system. To provide a concise overview, Table 5 summarizes the key performance metrics across all evaluated aspects of the system.

As summarized in Table 5, the system demonstrates strong performance across cost, speed, reliability, and accuracy metrics. The following subsections provide detailed analysis and discussion of these findings, examining the underlying factors governing system behavior and contextualizing the results within the broader scope of agricultural robotics.

4.1. Examining the Effect of Circumferential Stress

To examine whether the circumferential direction stress is small enough to be neglected compared to the axial direction, a ratio

η_{σ}

is defined using Equation (A8) as

\begin{matrix} η_{σ} & = \frac{σ_{1}}{σ_{2}} = \frac{μ (ε - \frac{1}{ε^{3}})}{μ (1 - \frac{1}{ε^{2}})} = ε + \frac{1}{ε} \end{matrix}

(46)

The variable

ε

, introduced in Section 3.4.1, denotes the principal stretch ratio in the axial direction (

ε_{1}

). To evaluate an upper-bound scenario, we consider the maximum attainable strain in this direction, which leads to the following expression:

\begin{matrix} ε = & ε_{1, \max} = \frac{l_{e} - l_{0} + Δ l_{\max}}{l_{e} - l_{0}} \\ = & 1 + \frac{2 π (1 / κ + h_{b e} + h_{t e} + t_{t}) - 2 π \cdot (1 / κ)}{l_{e} - l_{0}} \cdot \frac{θ}{2 π} \\ = & 1 + \frac{h_{b e} + h_{t e} + t_{t}}{l_{e} - l_{0}} θ \end{matrix}

(47)

Defining a constant

c_{\max} = \frac{h_{b e} + h_{t e} + t_{t}}{l_{e} - l_{0}}

and substituting Equation (47) into Equation (46) gives

\begin{matrix} η_{σ} = c_{\max} θ + 1 + \frac{1}{c_{\max} θ + 1} . \end{matrix}

(48)

To confirm the extreme value for the ratio, the gradient of bending angle

θ

is expressed as

\begin{matrix} \frac{d η_{σ}}{d θ} = c_{\max} - \frac{c_{\max}}{{(c_{\max} θ + 1)}^{2}} \end{matrix}

(49)

The two saddle points are

θ_{η, 1} = - 2 / c_{\max}

and

θ_{η, 2} = 0

. Due to its physical meaning, negative bending angle is not analyzed since

c_{\max}

is always positive. From the left and middle in Figure 20, the minimum stress ratio will be 2 when

θ = 0

. Even though this ratio will keep increasing with larger bending angles, the tendency is not so significant, and the ratio can be less than 2.15 with a very large bending

θ = π

.

On the other hand, in real applications, there will always be some minor strain in the circumferential direction due to stretching. Assume

ε_{2} \neq 1

, and it is another variable while still taking

ε_{1}

as

ε

. Considering this in Equation (A8), the stress ratio

η_{σ}

will be updated as

\begin{matrix} η_{σ} = \frac{σ_{1}}{σ_{2}} = \frac{μ [ε - \frac{{(\frac{1}{ε_{2} ε})}^{2}}{ε}]}{μ [ε_{2} - \frac{{(\frac{1}{ε_{2} ε})}^{2}}{ε_{2}}]} = \frac{ε_{2} (ε_{2}^{2} ε^{4} - 1)}{(ε_{2}^{4} ε^{2} - 1) ε} \end{matrix}

(50)

Based on the assumptions made for the Neo-Hookean Theorem at the beginning of Section 3.4.1, the maximum strain is 20%. To simulate the impact the dominated stress

σ_{1}

has on the stress ratio

η_{σ}

, a series of

ε_{2}

’s were chosen for simulation studies as

ε_{2} = \{\begin{matrix} 1 + 0.01 n, & n \in {1, 2, 3, 4, 5} \\ 1.05 + 0.05 (n - 5), & n \in {6, 7, 8} \end{matrix}

(51)

simulation results are shown in Figure 20. Under practical conditions, when the strain in circumferential direction is also negligible, even for the maximum axial direction strain ratio (

ε_{1} = 1.2

), the maximum stress ratio

η_{σ}

still will not exceed 2. Therefore, it was shown that, by no means, the stress in circumferential direction is negligible.

4.2. Gripper Statics Model Verification

To validate the accuracy of the proposed static bending model, we conducted a set of controlled experiments under quasi-static conditions, using the soft gripper driven by voltage-modulated pneumatic actuation. The bending angle and tip displacement of the gripper were recorded under various actuation voltages and compared against the model’s predictions derived from the explicit formulation in Equation (8).

Figure 21 summarizes the comparison results. For each voltage level, the bar indicates the mean measured bending angle, the red line represents the median, and the black error bars show the standard deviation across 20 repeated trials. In general, the predicted and measured angles exhibit similar trends, with the closest agreement observed in the mid-voltage range (

V_{pump} \approx 3.5 - 4.5

V), corresponding to bending angles between

30 °

and

60 °

. In this region, the static model achieves high fidelity, with average absolute error about

6.3 °

. At lower voltages (e.g.,

V_{pump} =

3.5445

V

), the model tends to overestimate the actual bending angle, while at higher voltages (

V_{pump} >

4

V

), a reverse trend is observed. These boundary deviations remain within

\pm 7 . 75 °

, which is generally acceptable in applications where rough shape adaptation is sufficient.

Table 4 lists the reference commands and the corresponding measurements for both bending angle and finger-tip distance. Across the full set of target angles (

5 °

–

90 °

), the mean absolute tip distance error is

6.227

m

m

, and the maximum error reaches

- 11.474

m

m

at the

90 °

command. Restricting the analysis to the gripper’s typical working range (

5 °

–

60 °

) reduces the mean and maximum absolute angle errors to

5.138

m

m

and

6.975

m

m

, respectively.

These results confirm that while the simplified model may suffer from moderate boundary deviations, it captures the primary voltage–angle relationship with sufficient accuracy over the full range. The model is particularly reliable in the

5 °

–

60 °

range, which aligns well with the expected working range of the soft gripper during typical harvesting tasks.

Although the static pressure–gripper bending angle relationship was initially captured using a high-order nonlinear rational function in Equation (8) and later simplified into a polynomial form Equation (9), discrepancies remain between the predicted and observed bending angles. Notably, the bending angle tends to be overestimated at low input voltages and underestimated at higher voltages. These asymmetric deviations likely stem from a combination of modeling simplifications, actuator nonlinearities, and practical limitations in the pneumatic driven systems.

First, the derived pressure–voltage relationship assumes quasi-static air flow and ideal pressure transmission, where the pumped volume

Δ V_{g}

is linearly related to voltage above the threshold

3.5

V

. However, the compressor’s behavior under PWM-driven voltage input introduces time-dependent dynamics and nonlinearities not accounted for in the static mapping. At low voltages, the compressor may operate more efficiently per unit energy (possibly due to reduced internal losses or backpressure buildup) resulting in higher actual internal pressure and thus larger-than-predicted bending angles. Conversely, at high voltages, flow limitations such as valve resistance, chamber backpressure, and air leakage may cap the pressure increase, causing smaller-than-predicted bending.

Second, while the model presumes accurate control of internal chamber volume via Equation (28), real-world systems often suffer from leakage, unpredictable internal compliance, and backflow, particularly as the pressure differential increases. These factors limit the actual volume change despite increasing control input, leading to pressure saturation effects not captured in the model.

Third, from a mechanical standpoint, the model assumes isothermal compression and uses a constant curvature kinematic model, which becomes increasingly inaccurate as the actuator enters the nonlinear regime. At larger bending angles, material softening, axial stretch, and asymmetric wall deformation contribute to deviation from ideal behavior. Furthermore, soft actuators often exhibit hysteresis and creep, which introduce a memory effect between successive actuations that static models cannot replicate.

Fourth, although the simplified polynomial function improves computational tractability, it comes at the cost of boundary accuracy. The original fitted model (with very large coefficients, up to

10^{154}

) is numerically sensitive, and simplification necessarily trades off fidelity near the extremes of the voltage or pressure range. This may partially explain the underestimation of bending at high voltages.

Finally, the pressure model assumes instantaneous response between voltage input and pressure output. However, under stepwise or time-varying voltage commands, the system exhibits dynamic lag due to the finite pump response time, chamber filling rate, and the mechanical inertia of the structure. These transient delays, especially pronounced at higher voltage levels, further widen the gap between the static model’s predictions and real-time actuator response.

Despite the noted discrepancies, it is important to contextualize the model’s performance within its intended application. The observed overestimation of bending at low voltage inputs corresponds to scenarios where the gripper operates at smaller bending angles, which is typically associated with grasping larger, heavier crops. In such cases, a slightly larger-than-predicted deformation can be beneficial, as it improves the contact area and gripping force, contributing to a more secure grasp. From a practical standpoint, this asymmetry may therefore serve as a favorable feature rather than a flaw. Considering the trade-off between modeling complexity and real-world utility, the proposed static model offers a well-balanced solution. Its predictive accuracy is particularly robust within the 0°–70° bending range, which aligns well with the effective working envelope for most agricultural grasping tasks.

4.3. Controller Performance Validation

To evaluate the performance of the proposed controller, we conducted simulations aiming to regulate the bending angle of the soft gripper by controlling the voltage applied to the compressor pump. While the ultimate objective is to stabilize the gripper at a target bending angle, our setup lacks comprehensive experimental data directly mapping input voltage to bending angle.

To address this limitation, we refer to partial experimental data collected during the static model validation phase, which covered a limited range of bending angles (from

13 . 234 °

to

60.900 °

). For the remaining regions outside this interval, we adopted a simplified static model as a surrogate. Following the same protocol used in the validation stage, we selected nine representative angles (

5 °

,

10 °

,

15 °

,

20 °

,

30 °

,

45 °

,

60 °

,

75 °

, and

90 °

) as target setpoints.

However, due to the inherent physical properties of the soft pneumatic actuator, the bending angle is not solely determined by the instantaneous voltage or pressure, but rather by their cumulative effect over time. As a result, we chose to use the internal pressure of the gripper as an intermediate reference variable, mapping it to the actual bending angle as seen in Figure 22.

It is important to note that since this mapping combines both experimental and model-derived data, the resulting curve exhibits discontinuities at the boundaries between the data sources. These discontinuities introduce potential instability in the control system, particularly when operating near transition points between the experimental and modeled regions.

To evaluate the dynamic performance of the system under varying target bending angles, four simulation plots were generated. The first two plots in Figure 23 illustrate the simulation results over a 0–5 s time window. Specifically, Figure 23a presents the time evolution of the bending angle, while Figure 23b shows the corresponding control voltage over the same duration.

From these plots, it is evident that when the target bending angles are set to

10 °

and

75 °

, the system exhibits marginal stability characterized by persistent high-frequency oscillations. In contrast, when the desired angle is

5 °

, the response demonstrates periodic but minor oscillations with significantly reduced amplitude. This observed behavior aligns with the discontinuities previously noted in our hybrid modeling approach, which combines experimental measurements and idealized model predictions. Such discontinuities likely reflect the transition across different dynamic regimes of the soft actuator.

The last two plots (Figure 23c,d) display the simulation results over an extended time horizon of 0–10 s, again with bending angle and voltage shown in sequence. Unlike the earlier analysis, this simulation focuses only on cases where the system eventually reaches a steady-state. In these cases, both the bending angle and input voltage evolve smoothly over time. The response curves exhibit a near-parabolic trajectory, without the underdamped oscillations typically associated with second-order dynamic systems. This raises an open question regarding the plausibility of such smooth transitions, which will be further addressed in the Discussion section. Nonetheless, the observed convergence toward the target angles provides strong evidence that the proposed controller achieves both effectiveness and stability across a range of reference configurations.

The performance of the proposed controller was systematically evaluated across a range of target bending angles. Although the initial design goal was for the system to reach steady-state within 1 s, simulation results show that this criterion was met only under limited conditions, specifically when the desired bending angle lies between approximately

15 °

and

45 °

, or around

90 °

, where settling time remained within 2 s. For all other target angles, the settling time extended significantly, often approaching 10 s.

This deviation from classical second-order system behavior characterized by damped oscillations and well-defined settling time can be attributed to both deliberate design choices and intrinsic system constraints. Rather than exhibiting underdamped convergence with clear exponential decay, the response curves observed in simulation were smooth and gradual, resembling quasi-parabolic profiles. This reflects the underlying control philosophy: to drive the system in a quasi-static manner rather than toward fast dynamic convergence. The controller was intentionally tuned to minimize aggressive actuation and prioritize smooth transitions. This is particularly important in our application, where vision-based estimation of bending angles is employed. Any high-frequency oscillation or overshoot would introduce visual artifacts and noise, severely degrading the reliability of the perception pipeline. The smooth profiles observed in our results, with no signs of overshoot or secondary peaks, are therefore a desirable outcome rather than a limitation.

Additionally, hardware limitations played a non-negligible role in shaping this behavior. Notably, the pneumatic system lacks an active mechanism to quickly release air in the event of overshoot as there is no dedicated exhaust valve. As a result, pressure corrections must rely either on reverse flow through the pump or on natural leakage, both of which are inherently slow processes. While more aggressive PID gain tuning could, in theory, reduce settling time, such an approach would likely exacerbate the effects of hysteresis and viscoelastic lag intrinsic to silicone-based actuators. These nonlinearities could manifest as residual oscillations or overshoot, again counteracting the benefits of visual feedback and potentially inducing long-term mechanical fatigue.

Moreover, aggressive control would necessitate frequent switching of the pump and solenoid valves. Given the high inrush current required for valve actuation, such behavior would lead to increased thermal load, faster hardware wear, and reduced energy efficiency. These practical trade-offs further justify our preference for a quasi-static control regime. By maintaining low-frequency actuation and avoiding unnecessary component cycling, we improve overall system reliability and energy economy, which is an especially important consideration in portable or embedded robotic systems.

In summary, the controller performance, though not matching the rapid transient specifications of traditional second-order designs, aligns well with the demands and constraints of our soft robotic platform. The controller provides stable, smooth, and visually friendly actuation without inducing mechanical stress or energy inefficiencies. The lack of oscillatory behavior, the consistency in reaching and holding target angles, and the adaptability across various target states together validate the robustness and practicality of our quasi-static control strategy. Despite longer settling times under certain conditions, the system meets its design objectives and demonstrates effective and stable closed-loop performance.

4.4. Future Work

In the current prototype, the only sensor contributed to practical control is the camera. The pressure transducer did not actually participate in the control loop; it only worked as a monitor to ensure the pressure supply is within a safe threshold. The benefit of analogous system design is that with less state variables to measure or compute, it can satisfy low latency and highly real-time system requirements. Although the actual distance between the gripper tips can be directly measured by an infrared camera, this firstly requires the host machine to complete more image processing, the performance is largely restricted by the hardware CPU and GPU. Secondly, as the measurement accuracy depends on the baseline of stereo cameras, the uncertainty will vary with the distance between the camera lens and the object. Thirdly, the cost of adding cameras with infrared capability will be increased by an order of magnitude, which hinders large-scale deployment in agriculture applications. To address this, we provided an economic-friendly substitution solution.

However, the limitation of the existing prototype is that: (1) due to the stringent gripper vertex detection method, applied and unpredictable environmental illumination condition change, and hence the detection will sometimes fail; (2) The purely computer vision based control still has some chance to misjudge the crop being tightly hold state owing to lacking contact and interaction force feedback from the system.

To address the issues mentioned above, tactile force sensing resistor (FSR) matrix (e.g., Texscan 5027) will be fixed on each gripper finger tip to measure the contact force distribution (contour map). Moreover, this new function will make the grasping task more adaptable to crops with different skin texture hardness to maintain the crop quality while harvesting, especially for crops with softer skins like strawberries. And this can be also expanded to crop disease diagnosis affecting skin texture hardness.

5. Conclusions

This study addresses the dual crises of agricultural labor shortages and post-harvest losses through the development of an integrated soft robotic harvesting system tailored for delicate crops. By synergizing advancements in material science, edge computing, and control theory, the proposed framework overcomes longstanding limitations in agricultural automation. The low-cost, high-yield fabrication method for silicone-based soft grippers reduces production costs by 60% compared to traditional 3D-printed molds, while the compressive-sealing mold design ensures 100% fabrication yield across prototypes, which is a a critical milestone for scalable deployment. The decentralized IoT architecture, leveraging Raspberry Pi and Arduino platforms, demonstrates real-time performance (42 fps to 73 fps) at a per-node cost of around $175, effectively halving expenses relative to deep learning-based systems. This cost-efficiency, combined with modularity, positions the system as a viable solution for small-to-medium farms with limited capital.

The vision system’s hybrid approach, blending handcrafted geometric features with adaptive contrast analysis, achieves robust crop maturity assessment and gripper tracking under occlusion, wind disturbances, and variable lighting, that are conditions pervasive in agricultural settings. By eschewing data-hungry deep learning models, the pipeline reduces computational overhead while maintaining interpretability, a crucial factor for field technicians. Furthermore, the Neo-Hookean-based statics model, which incorporates circumferential stress and variable cross-sectional geometry, advances soft actuator modeling by addressing oversimplifications in prior constant-curvature approximations. Experimental validation confirms sub-centimeter precision (mean tip error of less than

5.2

m

m

) across bending angles up to

70 °

, enabling reliable feedforward control without reliance on expensive tactile sensors.

The hybrid feedforward–feedback controller, optimized for resource-constrained hardware, exemplifies the balance between model-based precision and real-time adaptability. By linearizing nonlinear dynamics around equilibrium points, the system achieves steady-state convergence within 1 s, which is a critical threshold for continuous harvesting cycles. However, challenges persist in extreme actuation ranges (e.g., overestimation at low voltages and saturation effects at high pressures), underscoring the need for dynamic compensation in future iterations. Overall, the presented study provides an integrated framework that connects soft gripper fabrication, vision-based perception, and model-based control into a unified, low-cost architecture tailored for real-time fruit harvesting applications.

Looking ahead, three directions emerge for practical translation: (1) integration of tactile sensing arrays to enable force-sensitive grasping and disease diagnostics via skin texture analysis; (2) field validation under diverse environmental conditions (e.g., rain, dust, and diurnal temperature shifts) to assess robustness; and (3) scalability studies for multi-arm cooperative harvesting in high-density fruit trees. Additionally, the system’s modular design invites adaptation to logistics (e.g., fragile package handling) and food processing (e.g., automated sorting), broadening its societal impact beyond agriculture.

By bridging the gap between laboratory prototypes and real-world applicability, this work establishes a foundational framework for next-generation agricultural robotics. It not only mitigates immediate challenges in labor and waste but also catalyzes a paradigm shift toward sustainable, precision farming in the face of global demographic and climatic pressures.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriengineering7110378/s1.

Author Contributions

Conceptualization, Y.J. and J.M.V.; methodology, Y.J. and J.M.V.; software, Y.J.; validation, Y.J.; formal analysis, Y.J.; investigation, Y.J. and J.M.V.; resources, J.M.V.; data curation, Y.J.; writing—original draft preparation, Y.J.; writing—review and editing, Y.J. and J.M.V.; visualization, Y.J.; supervision, J.M.V.; project administration, J.M.V.; funding acquisition, J.M.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the United States National Institute of Food and Agriculture (NIFA) under award No. 2020-67021-32461.

Data Availability Statement

Data are contained within this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Derivation of the Neo-Hookean Statics Model

This appendix provides the full mathematical derivation of the static model for the soft gripper, as referenced in Section 3.4.1. The derivation begins with the strain energy density function for a Neo-Hookean material:

W = C_{1} (I_{1} - 3)

(A1)

where

C_{1}

is the material property constant, which is determined by material’s shear modulus or second Lamé parameter

μ

(for the gripper material Dragon Skin 10 in this paper,

μ = 72 kPa

) as

C_{1} = μ / 2

(A2)

and

I_{1}

is the first invariant (trace) of right Cauchy-Green deformation tensor, which is defined as

I_{1} = \sum_{i = 1}^{3} ε_{i}^{2} (i = 1, 2, 3)

(A3)

where each ratio

ε_{i}

is the principal stretch in axial, circumferential and radial directions, respectively, and the relationship between these three ratios is

\prod_{i = 1}^{3} ε_{i}^{2} = 1 .

(A4)

Thus, a Lagrange function can be composed with parameter

λ

denoting the Lagrange multiplier as

\begin{matrix} L (ε_{i}) = W + λ \prod_{i = 1}^{3} ε_{i}^{2} = \frac{μ}{2} (ε_{1}^{2} + ε_{2}^{2} + ε_{3}^{2} - 3) + λ ε_{1} ε_{2} ε_{3} . \end{matrix}

(A5)

Then, the stress per unit length for each direction can be computed as

\{\begin{matrix} σ_{1} & = \frac{\partial L (ε_{i})}{\partial ε_{1}} = μ ε_{1} + \frac{λ}{ε_{1}} \\ σ_{2} & = \frac{\partial L (ε_{i})}{\partial ε_{2}} = μ ε_{2} + \frac{λ}{ε_{2}} \\ σ_{3} & = \frac{\partial L (ε_{i})}{\partial ε_{3}} = μ ε_{3} + \frac{λ}{ε_{3}} \end{matrix}

(A6)

To simplify the model without introducing more variables, several assumptions were made as follows:

The stress in the radial direction (which will affect the bottom layer wall thickness) and the strain in circumferential direction (the deformation will not contribute so much to the gripper bending) are negligible.
The length of the gripper bending towards side always remains unchanged.
The bending angle of the gripper side layer is deemed to be equivalent to the gripper bending angle (see Figure 17b).

Thus, we have

\{\begin{matrix} σ_{3} & = μ ε_{3} + \frac{λ}{ε_{3}} = 0 \Rightarrow λ = - μ ε_{3}^{2} \\ ε_{2} & = 1 \Rightarrow ε_{1} ε_{3} = 1 . \end{matrix}

(A7)

To further decrease the DoFs of the system,

ε

is used to represent

ε_{1}

, and hence (A6) can be rewritten as

\{\begin{matrix} σ_{1} & = μ ε - μ {(\frac{1}{ε})}^{2} \cdot \frac{1}{ε} = μ (ε - \frac{1}{ε^{3}}) \\ σ_{2} & = μ - μ {(\frac{1}{ε})}^{2} = μ (1 - \frac{1}{ε^{2}}) . \end{matrix}

(A8)

Appendix B. System Software Architecture and Algorithm Implementation

This appendix documents the core algorithmic implementation of the integrated soft robotic harvesting system presented in this work. The system is architected around four interconnected software modules, which synergize to deliver robust, real-time automation capabilities for delicate crop harvesting. The code is structured to prioritize computational efficiency, resource-aware execution, and modularity for field deployment.

The provided algorithms elucidate the transition from theoretical models to a functional embedded system. libpublic furnishes the foundational utilities for geometric computation, system configuration, and visualization. The heart of the visual perception pipeline is encapsulated within gpdipch, which details the lightweight, rule-based algorithms for concurrent crop and gripper detection, maturity assessment, and robust tracking under occlusion, all without relying on data-intensive deep learning. The libgpcombind module outlines the decentralized communication framework, enabling low-latency command and data exchange between the host computer and multiple actuator nodes via adaptive WiFi/UDP or serial protocols. Finally, the main program orchestrates the entire operation, implementing a state-aware, multi-threaded scheduler that dynamically coordinates perception, decision-making, and actuation threads.

Collectively, these algorithms demonstrate a practical integration of soft robotic actuation, real-time computer vision, and IoT-based control. They validate the system’s ability to achieve high-performance automation on low-cost hardware, bridging a critical gap between laboratory prototypes and scalable agricultural solutions. The following sections provide a complete technical specification for researchers and engineers seeking to replicate or build upon this work.

Algorithm A1 System initialization and multi-module loading (within libpublic and libgpcombind)

Require: None
Ensure: Initialized system objects: iso, com, crop, gripper

    Step 1 Dynamic loading of core modules: public function library, communication module, vision processing module.
    Step 2 Initialize system configuration objects:
         •  iso: System environment configuration (camera parameters, debug flags, Three-Strike Policy)
         •  com: Communication configuration (WiFi/serial parameters, device numbering)
         •  crop: Crop configuration (color thresholds, tracker type)
         •  gripper: Gripper configuration (geometric features, tracking parameters)
    Step 3 Automatic detection of available RGB cameras and resolution configuration.
    Step 4 Set color space thresholds (HSV environment colors, crop colors, gripper colors).
    Step 5 Initialize multi-threading events and lock mechanisms.

Algorithm A2 Crop detection and maturity analysis (within gpdipch)

Require: Raw image frame (iso.org)
Ensure: Candidate crop ROI list (crop.roi)

    Step 1 Color space conversion: BGR → LAB → HSV.
    Step 2 Extreme color filtering (white region removal).
    Step 3 Crop color protection segmentation:
         •  Apply crop-specific HSV thresholds.
         •  Soil color masking to eliminate interference.
    Step 4 Morphological processing and contour analysis:
         •  Otsu threshold segmentation.
         •  Minimum enclosing circle fitting.
    Step 5 Maturity validation:
         •  Calculate mature color pixel ratio.
         •  Apply maturity threshold filtering (default 80%).
    Step 6 Spatial constraint validation:
         •  Boundary distance checking.
         •  Size rationality verification.

    Control Logic:
        if No valid contours detected: return Empty ROI list
        else:
           for Each detected contour:
              if Contour area < Minimum threshold or > Maximum threshold: Skip contour
              else:
                 Calculate maturity ratio
                 if Maturity ratio ≥ Threshold: Add to candidate list
                 else: Skip contour
           end for
        return Sorted candidate list (largest to smallest)

Algorithm A3 Gripper detection and geometric feature extraction (within gpdipch)

Require: Raw image frame (iso.org)
Ensure: Gripper key vertex coordinates (gripper.roi)

    Step 1 Multi-level masking:
         •  Inverse crop color masking
         •  Leaf color exclusion
         •  Gripper white region enhancement
    Step 2 Geometric constraint filtering:
         •  Rectangularity verification (golden ratio constraint)
         •  Edge distance constraint
         •  Parallelism verification (angle tolerance ±10°)
    Step 3 Hierarchical contour analysis:
         •  Gaussian blur and Otsu segmentation
         •  Minimum area rectangle fitting
         •  Convex hull vertex extraction
    Step 4 Key vertex refinement:
         •  Angle threshold filtering (1°–2° redundancy elimination)
         •  Tip/base classification
         •  Bending angle calculation

    Control Logic:
        while Processing contours:
           if Contour area < Minimum threshold: Skip contour
           else: Calculate width-to-height ratio
           if Ratio not within golden ratio tolerance: Skip contour
           else: Check edge proximity
           if Too close to image border: Skip contour
           else: Calculate inclination angles
           if Angles not within parallel tolerance: Skip contour
           else: Add to valid gripper list
        end while
        if Valid grippers found:
          Select best candidates based on size consistency
          Extract key vertices using angle-based filtering
        else: return Empty vertex list

Algorithm A4 Multi-target tracking and state management (within gpdipch)

Require: Detected ROI, image sequence
Ensure: Real-time tracking status

    Step 1 Tracker selection and initialization:
         •  Support for multiple algorithms (CSRT, KCF, MedianFlow)
         •  Dynamic selection based on system resources (MedianFlow default)
    Step 2 Dual verification mechanism:
         •  Primary region: Circular maturity verification
         •  Secondary region: Non-shadowed area color verification
    Step 3 Scale adaptation:
         •  Depth axis motion compensation
         •  Dynamic bounding box adjustment
    Step 4 Occlusion handling:
         •  Temporary target loss tolerance
         •  Trajectory prediction and reacquisition

    Control Logic:
        while Tracking active:
           Update tracker with current frame
           if Tracking successful:
              Extract new bounding box
              Apply dual verification:
                 Primary: Calculate mature pixels in circular region
                 Secondary: Calculate mature pixels in rectangular complement
              if Both verifications pass:
                 Update ROI position
                 Continue tracking
              else:
                 if Consecutive failures > Threshold: Reinitialize tracker
                 else: Continue with predicted position
           else:
              if Target lost for extended period:
                 Terminate tracking thread to save resources
                 Return to detection mode (Algorithm A2)
              else: Predict position based on motion model
        end while

Algorithm A5 Vision-actuation thread control, parallel processing and resource management

Require: Image stream, control commands
Ensure: Coordinated vision-actuation loop

    Step 1 Thread initialization hierarchy:
         •  Primary: Crop processing thread (dip_crop)
         •  Secondary: Gripper processing thread (dip_gripper)
         •  Tertiary: Communication processing thread (RX_UDP/RX_serial)
    Step 2 Event-driven synchronization:
         •  Image capture event triggers all processing threads
         •  Mutex locks protect shared image data and ROI parameters
         •  Threads operate asynchronously with coordinated timing
    Step 3 Adaptive resource management:
         •  Dynamic priority adjustment based on detection success rates
         •  Graceful degradation under poor detection conditions
         •  Efficient thread termination and resource cleanup

    Thread Control Logic:
       1. Crop Processing Thread
       while System running:
          Wait for crop detection event
          Acquire processing lock
          if No crop detected: Perform crop detection
             if Detection successful: Initialize tracker
          else: Update tracking
             if Tracking failed: Reset detection
          Release processing lock
          Clear crop detection event
       end while
       2. Gripper Processing Thread
       while System running:
          Wait for gripper detection event
          Acquire processing lock
          if No gripper detected: Perform gripper detection
             if Detection successful: Lock on gripper
          else:
             if not Aligned with crop: Perform tracking
             else: Perform precise vertex detection
         Release processing lock
         Clear gripper detection event
       end while

    Resource Optimization:
         •  Monitor detection success rates continuously
         •  Reduce processing frequency for failing modules
         •  Reallocate resources to higher-success components
         •  Enter low-power mode during persistent failures

References

Food and Agriculture Organization of the United Nations. The Future of Food and Agriculture: Trends and Challenges. Available online: https://www.fao.org/3/i6583e/i6583e.pdf (accessed on 11 May 2025).
New American Economy. A Vanishing Breed: How the Decline in U.S. Farm Laborers Over the Last Decade Has Hurt the U.S. Economy and Slowed Production on American Farms. Available online: https://www.newamericaneconomy.org/research/vanishing-breed-decline-u-s-farm-laborers-last-decade-hurt-u-s-economy-slowed-production-american-farms (accessed on 11 May 2025).
Eurostat. Farms and Farmland in the European Union—Statistics. Available online: https://ec.europa.eu/eurostat/statistics-explained/index.php/Farms_and_farmland_in_the_European_Union_-_statistics (accessed on 11 May 2025).
International Labour Organization. Asia-Pacific Sectoral Labour Market Profile: Agriculture. Available online: https://www.ilo.org/resource/brief/asia-pacific-sectoral-labour-market-profile-agriculture (accessed on 11 May 2025).
Food and Agriculture Organization of the United Nations. The State of Food and Agriculture 2019: Moving Forward on Food Loss and Waste Reduction. Available online: https://www.fao.org/3/ca6030en/ca6030en.pdf (accessed on 11 May 2025).
FoodBev Media. Berry Big Problem: What’s Spoiling Our Strawberries? Available online: https://www.foodbev.com/news/berry-big-problem-what-s-spoiling-our-strawberries (accessed on 11 May 2025).
USDA Economic Research Service. Farm Labor. Available online: https://www.ers.usda.gov/topics/farm-economy/farm-labor (accessed on 11 May 2025).
Getahun, S.; Kefale, H.; Gelaye, Y. Application of Precision Agriculture Technologies for Sustainable Crop Production and Environmental Sustainability: A Systematic Review. Sci. World J. 2024, 2024, 2126734. [Google Scholar] [CrossRef] [PubMed]
Aliasgarian, S.; Ghassemzadeh, H.R.; Moghaddam, M.; Ghaffari, H. Mechanical Damage of Strawberry During Harvest and Postharvest Operations. Acta Technol. Agric. 2015, 18, 1–5. [Google Scholar] [CrossRef]
Azam, M.; Ejaz, S.; Rehman, R.N.U.; Khan, M.; Qadri, R. Postharvest Quality Management of Strawberries. In Strawberry—Pre- and Post-Harvest Management Techniques for Higher Fruit Quality; Asao, T., Asaduzzaman, M., Eds.; IntechOpen: London, UK, 2019. [Google Scholar] [CrossRef]
He, Z.; Khanal, S.R.; Zhang, X.; Karkee, M.; Zhang, Q. Real-time Strawberry Detection Based on Improved YOLOv5s Architecture for Robotic Harvesting in Open-Field Environment. arXiv 2023, arXiv:2308.03998. [Google Scholar]
Botta, A.; Cavallone, P.; Baglieri, L.; Colucci, G.; Tagliavini, L.; Quaglia, G. A Review of Robots, Perception, and Tasks in Precision Agriculture. Appl. Mech. 2022, 3, 830–854. [Google Scholar] [CrossRef]
Wakchaure, M.; Patle, B.; Mahindrakar, A. Application of AI Techniques and Robotics in Agriculture: A Review. In Proceedings of the International Conference on Robotics in Agriculture, London, UK, 14–15 August 2023. [Google Scholar]
LattePanda. Running YOLOv8 on LattePanda Mu: Performance Benchmarks. Available online: https://www.lattepanda.com/blog-323173.html (accessed on 11 May 2025).
Navas, E.; Shamshiri, R.R.; Dworak, V.; Weltzien, C.; Fernández, R. Soft gripper for small fruits harvesting and pick and place operations. Front. Robot. AI 2024, 10, 1330496. [Google Scholar] [CrossRef] [PubMed]
Zhang, D.; Zhang, W.; Yang, H.; Yang, H. Application of Soft Grippers in the Field of Agricultural Harvesting: A Review. Machines 2025, 13, 55. [Google Scholar] [CrossRef]
Shintake, J.; Rosset, S.; Schubert, B.; Floreano, D.; Shea, H. Versatile Soft Grippers with Intrinsic Electroadhesion Based on Multifunctional Polymer Actuators. Adv. Mater. 2016, 28, 231–238. [Google Scholar] [CrossRef]
Navas, E.; Fernández, R.; Sepúlveda, D.; Armada, M.; González-de Santos, P. Soft Grippers for Automatic Crop Harvesting: A Review. Sensors 2021, 21, 2689. [Google Scholar] [CrossRef]
Wang, X.; Kang, H.; Zhou, H.; Au, W.; Wang, M.Y.; Chen, C.H. Development and Evaluation of a Robust Soft Robotic Gripper for Apple Harvesting. Comput. Electron. Agric. 2023, 204, 107552. [Google Scholar] [CrossRef]
Zaidi, S.; Maselli, M.; Laschi, C.; Cianchetti, M. Actuation Technologies for Soft Robot Grippers and Manipulators: A Review. Curr. Robot. Rep. 2021, 2, 355–369. [Google Scholar] [CrossRef]
Guadalupe, J.A.; Copaci, D.; del Cerro, D.S.; Moreno, L.; Blanco, D. Efficiency Analysis of SMA-Based Actuators: Possibilities of Configuration According to the Application. Actuators 2021, 10, 63. [Google Scholar] [CrossRef]
Lin, Y.; Zhou, X.; Cao, W. 3D-Printed Hydraulic Fluidic Logic Circuitry for Soft Robots. arXiv 2024, arXiv:2401.16827. [Google Scholar] [CrossRef]
Walker, J.; Zidek, T.; Harbel, C.; Yoon, S.; Strickland, F.S.; Kumar, S.; Shin, M. Soft Robotics: A Review of Recent Developments of Pneumatic Soft Actuators. Actuators 2020, 9, 3. [Google Scholar] [CrossRef]
Wong, D.C.Y.; Li, M.; Kang, S.; Luo, L.; Yu, H. Reconfigurable, Transformable Soft Pneumatic Actuator with Tunable 3D Deformations for Dexterous Soft Robotics Applications. arXiv 2023, arXiv:2311.03032. [Google Scholar] [CrossRef]
Tian, Y.; Zhang, Q.; Cai, D.; Chen, C.; Zhang, J.; Duan, W. Theoretical modelling of soft robotic gripper with bioinspired fibrillar adhesives. Mech. Adv. Mater. Struct. 2022, 29, 2250–2266. [Google Scholar] [CrossRef]
Polygerinos, P.; Wang, Z.; Overvelde, J.T.; Galloway, K.C.; Wood, R.J.; Bertoldi, K.; Walsh, C.J. Modeling of soft fiber-reinforced bending actuators. IEEE Trans. Robot. 2015, 31, 778–789. [Google Scholar] [CrossRef]
Wang, H.; Ni, H.; Wang, J.; Chen, W. Hybrid vision/force control of soft robot based on a deformation model. IEEE Trans. Control Syst. Technol. 2019, 29, 661–671. [Google Scholar] [CrossRef]
Wu, Q.; Gu, Y.; Li, Y.; Zhang, B.; Chepinskiy, S.A.; Wang, J.; Zhilenkov, A.A.; Krasnov, A.Y.; Chernyi, S. Position control of cable-driven robotic soft arm based on deep reinforcement learning. Information 2020, 11, 310. [Google Scholar] [CrossRef]
Song, Z.; Zhou, Y.; Zhao, L.; Chang, C.; An, W.; Yu, S. A wearable capacitive friction force sensor for E-skin. ACS Appl. Electron. Mater. 2022, 4, 3841–3848. [Google Scholar] [CrossRef]
Zhou, Z.; Zuo, R.; Ying, B.; Zhu, J.; Wang, Y.; Wang, X.; Liu, X. A sensory soft robotic gripper capable of learning-based object recognition and force-controlled grasping. IEEE Trans. Autom. Sci. Eng. 2022, 21, 844–854. [Google Scholar] [CrossRef]
Visentin, F.; Castellini, F.; Muradore, R. A soft, sensorized gripper for delicate harvesting of small fruits. Comput. Electron. Agric. 2023, 213, 108202. [Google Scholar] [CrossRef]
Tekscan, Inc. Force and Pressure Sensors. Available online: https://www.tekscan.com/products-solutions/sensors (accessed on 13 May 2025).
Tekscan, Inc. Model 5027—Pressure Mapping Sensor. Available online: https://www.tekscan.com/products-solutions/pressure-mapping-sensors/5027 (accessed on 13 May 2025).
U.S. Department of Agriculture. Apples—SNAP-Ed Connection. Available online: https://snaped.fns.usda.gov/resources/nutrition-education-materials/seasonal-produce-guide/apples (accessed on 14 May 2025).
U.S. Department of Agriculture. Oranges—SNAP-Ed Connection. Available online: https://snaped.fns.usda.gov/resources/nutrition-education-materials/seasonal-produce-guide/oranges (accessed on 14 May 2025).
U.S. Department of Agriculture. Peaches—SNAP-Ed Connection. Available online: https://snaped.fns.usda.gov/resources/nutrition-education-materials/seasonal-produce-guide/peaches (accessed on 14 May 2025).
Sadati, S.H.; Naghibi, S.E.; Shiva, A.; Michael, B.; Renson, L.; Howard, M.; Rucker, C.D.; Althoefer, K.; Nanayakkara, T.; Zschaler, S.; et al. TMTDyn: A Matlab package for modeling and control of hybrid rigid–continuum robots based on discretized lumped systems and reduced-order models. Int. J. Robot. Res. 2021, 40, 296–347. [Google Scholar] [CrossRef]
Wang, C.; Wagner, J.; Frazelle, C.G.; Walker, I.D. Continuum robot control based on virtual discrete-jointed robot models. In Proceedings of the IECON 2018—44th Annual Conference of the IEEE Industrial Electronics Society, Washington, DC, USA, 21–23 October 2018; pp. 2508–2515. [Google Scholar]
Wang, C.; Frazelle, C.G.; Wagner, J.R.; Walker, I.D. Dynamic control of multisection three-dimensional continuum manipulators based on virtual discrete-jointed robot models. IEEE/ASME Trans. Mechatronics 2020, 26, 777–788. [Google Scholar] [CrossRef]
Shabana, A.A. Computational Continuum Mechanics, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2018; Chapters 4–5. [Google Scholar]

Figure 1. Crop harvesting gripper mold design and building procedure demo.

Figure 2. Soft gripper finger mold design benchmark and common failure modes for previous designs. (a) Two other preliminary mold design schemes for fabricating the soft gripper. (Left): The parting line separates the mold into top and bottom halves, enabling the formation of an internal hollow chamber while using a solid base. (Right): The parting line follows the gripper’s backbone, resulting in two laterally symmetrical mold halves. These early designs served as important steps toward optimizing molding feasibility and gripper performance. (b) In the top-bottom mold scheme, gravitational imbalance during casting may cause uncured silicone in the lower mold to stick to the semi-solid structure in the upper mold, leading to blockage of the hollow chamber. (c) Insufficient inward pressure along the parting line during mold assembly can result in weak seam regions or even large gaps; this issue is present in both mold designs.

Figure 3. Circuit schematic of the soft robotic harvesting system, showing interconnections between the Raspberry Pi host, Arduino servo controllers, pneumatic actuators, sensors, and communication modules. Dashed arrows represent data exchange via wireless connection.

Figure 4. System-level overview of the crop harvesting soft gripper experimental setup. (a) Overall view of the experimental setup which includes three basic subsystems: Raspberry Pi host machine, test bench with servo system and necessary accessories for system performance monitoring and debugging. (b) Host machine working as a computation center that is equipped with SSD for rapid data read and write, and a webcam for image processing. (c) Front view of the test bench, which includes the gripper mathematical modeling validation (on top) and general purpose crop harvesting image processing algorithm test benches, servo control and actuation subsystems.

Figure 5. Experimental setup and devices used for multi-functional test bench. (a) Rear view of the test bench, the optional devices marked are not necessarily needed to be used in real application, some of them were only used for validation during the development stage. (b) Top view of the actuation subsystem which should at least include one air pump and one solenoid valve. (c) Top view of the servo control system which includes a micro controller with IoT capability (Arduino Uno R4 WiFi) and motor driver to operate at larger current.

Figure 6. Soft gripper mathematical model validation and control performance test bench layout with intermediate image processing for bending angle calculation. (a) Gripper modeling and control evaluation test bench setup. (b) Gripper segmentation after applying Otsu and Gaussian Blur methods. (c) Gripper outline extraction. (d) Gripper key vertices extraction using convexHull function in OpenCV. (e) The two final characteristic vertices (top left and right most) are extracted for computing the gripper bending angle.

Figure 7. Gripper dynamics state variable and coordinate frame for each link definition (⊙ and ⊗ represent out of and into the page axis directions, respectively). (a) Key geometry factor variables defined for a general gripper bending case with positive pressure charged. (b) Coordinate frame definition for the simplified RPR gripper based on D-H convention.

Figure 8. Vision based control crop harvesting system workflow that shows the details involved in the entire process. Modules enclosed in dashed boxes belong to the same unified function, which processes either crop or gripper inputs by switching designated identifiers.

Figure 9. Fundamental types of crop tracking failure and secondary ROI evaluation strategy. (a) Beginning of crop tracking process with initial bounding box. (b) Crop tracking failure example due to webcam lens zoom-in. (c) Crop tracking failure example due to webcam lens zoom-out. (d) Crop ROI detection and tracking inaccuracy due to close proximity of multiple crops. (e) Crop tracking failure due to slight misalignment of the tracking bounding box. (f) Demonstration of the crop maturity and spatial validation process by calculating the percentage of mature-color pixels in both the shadowed area and the visible portion of the bounding box.

Figure 10. Image processing workflow for crop detection and selection. The images are extracted from real-time video frames. (a) ROI detection result: The system accurately outlines the contours of all potential fruits for further evaluation. (b) Target selection: After estimating the ripeness of each candidate, the system locks onto the one with the highest maturity as the harvesting target.

Figure 11. Gripper detection demonstration in the real application. (a) System is able to precisely recognize and segment gripper fingers’ ROIs when they horizontally entered the webcam frame with maximum gripper finger self-gravity impact. (b) Gripper fingers with different gesture(s) can be detected by the system.

Figure 12. Gripper detection process under sophisticated environment. (a) Original unprocessed webcam capture frame. (b) Processed frame after applying global pixel brightness balanced filter using CLAHE. (c) Processed frame after applying crop, leaf and gripper HSV masks in sequence. (d) Qualified contours outlined after applying Gaussian Blur filter with adaptive kernel size and Otsu method filter. (e) Qualified contours resembling rectangle outlined by filtering out contours with small length and inclination angle beyond 90° ± 10% (uncertainty). (f) Real gripper ROI sorted out by filtering out contours close to the frame edge and those that do not have other contours with mount base and end-effector positions on the similar horizontal or vertical level (10% uncertainty).

Figure 13. Gripper configuration and key features identification during tracking process. (a) Gripper entity being sorted out after applying crop, leaf and gripper HSV masks. (b) First round gripper outline produced after applying a slight Gaussian Blur filter without losing so much detail. (c) Second round gripper outline produced after applying an auto Canny filter by analyzing the mean, median and standard deviation of pixel intensity, a smoother gripper outline was produced by applying dilation morphological operation. (d) Convex polygon outline produced by merging all the points and opened contours. (e) Less vertices sorted out by filtering out most of small length contours. (f) Only key vertices kept after applying angle threshold between lines composed by three adjacent points.

Figure 14. Gripper key vertices detection demonstration in the real application. (a) Gripper key vertices being sorted out under unactuated state. (b) Gripper key vertices can be precisely sorted out under different deformation configuration(s).

Figure 15. Soft gripper key dimension definition and piecewise functions about length for non-constant dimensions

w (l)

,

h_{t} (l)

and

h_{b} (l)

. The dashed curves denote the cross-sectional shape at the gripper tip, used as the dimensional boundary for the piecewise definitions. (a) Bottom view of gripper cross-section area from

w o l

plane. (b) side view of gripper cross-section area from

h o l

plane.

Figure 15. Soft gripper key dimension definition and piecewise functions about length for non-constant dimensions

w (l)

,

h_{t} (l)

and

h_{b} (l)

. The dashed curves denote the cross-sectional shape at the gripper tip, used as the dimensional boundary for the piecewise definitions. (a) Bottom view of gripper cross-section area from

w o l

plane. (b) side view of gripper cross-section area from

h o l

plane.

Figure 16. Gripper deformation variable definition for computing the strain and stress for each layer. (a) Bottom and top layers’ strain changing with thickness

α

and

γ

, respectively. (b) Side layers’ strain changing with wall thickness

β

while assuming the circumferential direction stress is not negligible and their bending angles are equivalent to gripper’s bending angle

θ

.

Figure 16. Gripper deformation variable definition for computing the strain and stress for each layer. (a) Bottom and top layers’ strain changing with thickness

α

and

γ

, respectively. (b) Side layers’ strain changing with wall thickness

β

while assuming the circumferential direction stress is not negligible and their bending angles are equivalent to gripper’s bending angle

θ

.

Figure 17. Pressure supply

Δ P

as a function of bending angle

θ

over different domains: (a) Full range

θ \in (- \infty, \infty)

. (b) Bounded range

θ \in [- 2 π, 2 π]

. (c) Typical actuation range

θ \in [0, π]

. These plots illustrate the nonlinear characteristics and singularities of the pressure response. (d) Evaluation of simplified and linearized statics model’s reliability in terms of the error and error ratio compared with the original explicit expression in range

θ \in [0, π]

.

Figure 17. Pressure supply

Δ P

as a function of bending angle

θ

over different domains: (a) Full range

θ \in (- \infty, \infty)

. (b) Bounded range

θ \in [- 2 π, 2 π]

. (c) Typical actuation range

θ \in [0, π]

. These plots illustrate the nonlinear characteristics and singularities of the pressure response. (d) Evaluation of simplified and linearized statics model’s reliability in terms of the error and error ratio compared with the original explicit expression in range

θ \in [0, π]

.

Figure 18. Definition of measurable dimensions from the webcam in 2D and 3D views. (a) The grippers are mounted on the same base with the same horizontal level, by defining the origin of each gripper at the center of the bottom layer, the angle between each adjacent two is 120°. (b) Distance projection measurement on a 2D plane, e.g., plane

z_{1} o_{1} o_{3}

. (c) The relationship schematic of actual bending angle

θ

in the 3D space and projection angle

θ_{proj}

on a 2D plane.

Figure 18. Definition of measurable dimensions from the webcam in 2D and 3D views. (a) The grippers are mounted on the same base with the same horizontal level, by defining the origin of each gripper at the center of the bottom layer, the angle between each adjacent two is 120°. (b) Distance projection measurement on a 2D plane, e.g., plane

z_{1} o_{1} o_{3}

. (c) The relationship schematic of actual bending angle

θ

in the 3D space and projection angle

θ_{proj}

on a 2D plane.

Figure 19. Block diagram for the system controller design.

Figure 20. Axial-to-circumferentia l stress ratio

η_{σ}

with respect to gripper bending angle

θ

and axial direction strain

ε_{1}

. Subplots (a,b) depict the ratio and its gradient while assuming no strain on circumferential direction. The ratio is increasing with a small ascending gradient. (c) Real application is considered when

ε_{2} \neq 1

. With the increasing of axial and circumferential strain, the ratio, however, does not exceed 2, which indicates that the circumferential direction stress is not negligible.

Figure 20. Axial-to-circumferentia l stress ratio

η_{σ}

with respect to gripper bending angle

θ

and axial direction strain

ε_{1}

. Subplots (a,b) depict the ratio and its gradient while assuming no strain on circumferential direction. The ratio is increasing with a small ascending gradient. (c) Real application is considered when

ε_{2} \neq 1

. With the increasing of axial and circumferential strain, the ratio, however, does not exceed 2, which indicates that the circumferential direction stress is not negligible.

Figure 21. Single gripper finger bending feature validation under different compressor pump voltage supplies (or pressure inputs). (a) The measured bending angles of the gripper finger at various desired target angles. The blue bars represent the mean values over 20 repetitions each, red horizontal lines indicate the median, and black error bars show the standard deviation. (b) Comparison between predicted (or desired) and measured bending angles. The theoretical values are calculated based on the static model, while the experimental results are computed by averaging the mean and median experimental bending angles. Cubic spline interpolation is used to show the trend, with vertical lines indicating the deviation between the two datasets.

Figure 22. Relationship between gripper bending angle and internal pressure. For bending angles between

12.234 °

and

60.009 °

, the pressure values are inferred from experimentally measured pump voltages. Outside this range, data is obtained from static theoretical model.

Figure 22. Relationship between gripper bending angle and internal pressure. For bending angles between

12.234 °

and

60.009 °

, the pressure values are inferred from experimentally measured pump voltages. Outside this range, data is obtained from static theoretical model.

Figure 23. Simulation results evaluating the bending angle and corresponding pump voltage under various reference angles. The results demonstrate the controller’s effectiveness in achieving stable or marginally stable responses depending on the desired bending angle. (a) Simulated bending angle trajectories (0–5 s) under multiple reference angles. (b) Corresponding pump voltage profiles (0–5 s) for the simulations. (c) Bending angle trajectories (0–10 s) for selected reference angles that reached steady-state. (d) Pump voltage profiles (0–10 s) corresponding to the cases.

Table 1. Benchmarking detection and tracking fps across Raspberry Pi and edge AI modules.

Model + Algorithm	Resolution (Pixels)	Raspberry Pi Module	Detect Obj. No. (Fps)		Detect + Track Obj. No. (Fps)
Model + Algorithm	Resolution (Pixels)	Raspberry Pi Module	1	2	1	2
YOLOv5n + SORT	640 × 640	Pi 4	∼1.5–2.5	∼1–2	∼1–2	∼1
YOLOv5n + SORT	640 × 640	Pi 5	4–6	∼3.5–5	3–4	∼2–3
YOLOv8n + ByteTrack	640 × 640	Pi 4	∼2.5–3	∼1.5–2	∼2–3	∼1.5–2
YOLOv8n + ByteTrack	640 × 640	Pi 5	8–10	∼6–8	∼7–9	∼5–7
MobileNet-SSD + SORT	300 × 300	Pi 4	8–10	∼6–8	∼7–9	∼6–7
MobileNet-SSD + SORT	300 × 300	Pi 5	15–18	∼12–15	∼14–16	∼12–14
YOLOv5n + DeepSORT	640 × 640	Pi 4	∼1.5–2.5	∼0.5–1	∼1	∼0.5–1
YOLOv5n + DeepSORT	640 × 640	Pi 5	4–6	∼1–2	∼3–4	∼1–2
Tiny-YOLOv4 + SORT	416 × 416	Pi 4	3–5	∼2–4	∼3–4	∼2–3
Tiny-YOLOv4 + SORT	416 × 416	Pi 5	10–14	∼8–12	∼9–11	∼7–9
YOLOv8s + Hailo-8L	640 × 640	Pi 5	∼120	∼100	∼115	∼100

Table 2. Bill-of-material (BOM) of each host and servo machine.

Architecture	Device/Item/Material	Price Each ($) ¹	Quantity
Host	Raspberry Pi 5 8G	89.97	1
	SAMSUNG 990 PRO SSD 1T	159.99	1
	Arducam 4K 8MP IMX219 Autofocus USB Camera Module	38.99	1
Servo	Arduino Uno R4 WiFi	27.50	1
	DC 12 V 4 A Micro Air Pump	45.57	1
	Beduan 2 Way Normally Closed DC 12V Electric Solenoid Air Valve	9.99	1
	BTS7960 DC Motor Driver	7.44	1
	Gripper Mold & Accessories ²	19.99	1
	Gripper Finger ³	3.90	3
Node ⁴	$\frac{1}{n} (Host + n \cdot Servo)$	184.68	$n_{\min} = 4$
Node ⁴	$\frac{1}{n} (Host + n \cdot Servo)$	172.18	$n_{\max} = 5$

¹ The prices listed represent the official manufacturer’s suggested retail prices (MSRP) at the time of preparing this manuscript (actual prices may vary slightly depending on market conditions, availability, and regional differences). ² The gripper mold and other 3D-printed components were fabricated using 3D printing technology, and their listed prices are estimated based on the material consumption equivalent to one spool of Bambu Lab PLA Basic filament. ³ The gripper fingers were fabricated using Smooth-On Dragon Skin^TM 10 Very Fast silicone rubber. Based on a market reference price of $50.67 for a 2 lb kit, and an average mass of

65.2

g

per finger, up to 13 units can be produced from a single kit. ⁴ Each node corresponds to a single servo unit in practical implementation. However, from a system architecture and functional standpoint, the average cost of each node should also include a portion of the host machine, as multiple servo units share a common host.

Table 3. Equivalent D-H parameters of the soft gripper.

Link No. i	$θ_{i}$	$d_{i}$	$α_{i}$
1	$- θ / 2$	0	$- π / 2$
2	0	$d_{2}$	$π / 2$
3	$θ / 2$	0	0

This table summarizes the Denavit–Hartenberg parameters used to model the kinematics of the soft gripper.

Table 4. Comparison of reference and measured bending angles and tip distances.

Gripper Bending Angle $θ$ (°)		Gripper Finger Tip Distance $Δ d / 2 (mm)$ *
Reference / Desired	Measured	Reference / Desired	Measured	Error
5	13.234	3.263	8.603	+5.340
10	19.888	6.512	12.855	+6.343
15	25.853	9.738	16.596	+6.858
20	31.247	12.927	19.902	+6.975
30	38.989	19.144	24.490	+5.345
45	47.400	27.902	29.224	+1.322
60	52.495	35.724	31.944	−3.780
75	56.031	42.365	33.760	−8.605
90	60.900	47.632	36.158	−11.474

* The definition of

Δ d

is similar to

Δ d_{pixel}

as shown in Figure 18, but it was measured using metric unit.

Table 5. Summary of key experimental results and performance metrics.

Metric	Value	Description
Cost per node	$172–$185	Based on Bill of Materials (Table 2)
Vision processing speed	42 fps to 73 fps	On Raspberry Pi 5 hardware
Gripper fabrication yield	100%	Across five independent prototypes
Fabrication cost reduction	>60%	Compared to traditional 3D-printed molds
Mean tip position error	$5.138$ $m$ $m$	For bending angles $5 °$ – $60 °$
Maximum tip position error	$6.975$ $m$ $m$	For bending angles $5 °$ – $60 °$
Model bending angle error	∼6.3°	At mid-voltage range ( $30 °$ – $60 °$ )
Fabrication time	30 min to 40 min	Per gripper finger unit

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jiang, Y.; Mohammadpour Velni, J. A Soft Robotic Gripper for Crop Harvesting: Prototyping, Imaging, and Model-Based Control. AgriEngineering 2025, 7, 378. https://doi.org/10.3390/agriengineering7110378

AMA Style

Jiang Y, Mohammadpour Velni J. A Soft Robotic Gripper for Crop Harvesting: Prototyping, Imaging, and Model-Based Control. AgriEngineering. 2025; 7(11):378. https://doi.org/10.3390/agriengineering7110378

Chicago/Turabian Style

Jiang, Yalun, and Javad Mohammadpour Velni. 2025. "A Soft Robotic Gripper for Crop Harvesting: Prototyping, Imaging, and Model-Based Control" AgriEngineering 7, no. 11: 378. https://doi.org/10.3390/agriengineering7110378

APA Style

Jiang, Y., & Mohammadpour Velni, J. (2025). A Soft Robotic Gripper for Crop Harvesting: Prototyping, Imaging, and Model-Based Control. AgriEngineering, 7(11), 378. https://doi.org/10.3390/agriengineering7110378

Article Menu

A Soft Robotic Gripper for Crop Harvesting: Prototyping, Imaging, and Model-Based Control

Abstract

1. Introduction

2. Related Work

2.1. Robotic Perception and System Architecture in Agriculture

2.2. Soft Robotic Grippers and Actuation Strategies

2.3. Modeling and Sensing for Soft Actuators

3. Materials and Methods

3.1. Gripper Prototype Design and Manufacturing

3.1.1. Material Selection

3.1.2. Manufacturing Process

3.1.3. Gripper Mold Design Benchmark

3.2. Crop Harvesting System Architecture

3.3. Reinforced Image Processing

3.3.1. Robust Crop Image Processing & Tracking with Confidence Validation

3.3.2. Universal Gripper Feature Detection via Shape Priors

3.4. Gripper Modeling

3.4.1. Gripper Static Modeling

3.4.2. Modeling of the Gripper Dynamics

3.4.3. Sensor & Electronics Dynamics

3.5. Controller Design

3.5.1. System Linearization

3.5.2. Tuning of the Controller Gains

4. Results and Discussion

4.1. Examining the Effect of Circumferential Stress

4.2. Gripper Statics Model Verification

4.3. Controller Performance Validation

4.4. Future Work

5. Conclusions

Supplementary Materials

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Appendix A. Derivation of the Neo-Hookean Statics Model

Appendix B. System Software Architecture and Algorithm Implementation

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI