Article

Reinforcement Learning-Driven Digital Twin for Zero-Delay Communication in Smart Greenhouse Robotics

1 Department of Information Engineering, University of Pisa, Via G. Caruso, 16, 56122 Pisa, Italy
2 Department of Information Engineering, CNIT—University of Pisa, Via G. Caruso, 16, 56122 Pisa, Italy
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(12), 1290; https://doi.org/10.3390/agriculture15121290
Submission received: 29 April 2025 / Revised: 4 June 2025 / Accepted: 13 June 2025 / Published: 15 June 2025

Abstract

This study presents a networked cyber-physical architecture that integrates a Reinforcement Learning-based Digital Twin (DT) to enable zero-delay interaction between physical and digital components in smart agriculture. The proposed system allows real-time remote control of a robotic arm inside a hydroponic greenhouse, using a sensor-equipped Wearable Glove (SWG) for hand motion capture. The DT operates in three coordinated modes: Real2Digital, Digital2Real, and Digital2Digital, supporting bidirectional synchronization and predictive simulation. A core innovation lies in the use of a Reinforcement Learning model to anticipate hand motions, thereby compensating for network latency and enhancing the responsiveness of the virtual–physical interaction. The architecture was experimentally validated through a detailed communication delay analysis, covering sensing, data processing, network transmission, and 3D rendering. While results confirm the system’s effectiveness under typical conditions, performance may vary under unstable network scenarios. This work represents a promising step toward real-time adaptive DTs in complex smart greenhouse environments.

1. Introduction

In 2025, the world faces a dual challenge: a growing population and a worsening food crisis. With the global population projected to reach 8.23 billion at a yearly growth rate of 0.85% [1], the pressure on food systems is intensifying. At the same time, the 2025 Global Report on Food Crises reveals that 295.3 million people experienced acute food insecurity in 2024, affecting 22.6% of the analyzed population, up from 21.5% in 2023 [2]. This marks the sixth consecutive year of rising food insecurity, underscoring the urgent need for sustainable and efficient agricultural solutions. Addressing these concerns requires integrating advanced technologies into the agricultural sector. One promising approach is the use of Greenhouses (GHs), particularly hydroponic systems, a soilless cultivation method in which plants grow in a nutrient-rich water solution, allowing more efficient resource use than traditional farming. These systems also offer a controlled environment that supports the adoption of innovative Agritech solutions [3], from sensor-based monitoring to AI-driven growth management, enabling more efficient and resilient food production to meet future demands. One example is the automation of fruit harvesting by means of Robotic Arms (RAs) and their control [4,5]. To enable high-precision RA operations, one enabling technology is the Digital Twin (DT) of the controller and the RA, which allows real-time monitoring, dynamic adaptation, and predictive maintenance. Meeting the growing demand for food requires scaling up agricultural operations, which in turn calls for automation and precise control loops: as production increases, harvesting capacity must keep pace, and remotely controlled RAs are a natural way to improve the efficiency of harvesting operations.
The DT enables testing of new algorithms and optimizing movements before real-world implementation, but also provides a visual update of the real scenario and can change the physical asset. The DTs are used in many scenarios like robotics, teleoperations, and control in agriculture, but not all the works that use the word Digital Twin deploy a real DT. There are three definitions of the representations and interactions between the two worlds, the Physical and the Digital [6]:
  • The simplest is the Digital Model, a static representation with no direct connection to the physical world. It is a replica that is not synchronized in real time and is used for simulation and offline analysis;
  • The Digital Shadow instead is a Digital Model that receives data from the physical world, but the communication is one-way. The physical world updates the model, but not vice versa. It can be used for monitoring and data analysis;
  • Finally, the Digital Twin is an interactive model of the physical world with a bidirectional data exchange. The model receives data in real time from the physical world, and it can send commands to the real world, modifying it.
In this paper, we consider the scenario of hydroponic greenhouses [7] with the aim of implementing a service managed by its DT for an integrated RA and Sensor-equipped Wearable Glove (SWG) system. In previous works, we developed a 3D model of a hand to better visualize the movements transmitted to the remote RA [8,9]. Possible applications include assessing the ripeness of a tomato using an RA and measuring bioimpedance, or performing fruit harvesting. Specifically, a commercial hydroponic greenhouse [10] has been modified and adapted with a network architecture that enables the expansion of services, functionalities, and devices without the need to redesign the entire system whenever a new hardware or software component is added. Our hydroponic greenhouse comprises four Heltec ESP32 WiFi LoRa [11] devices with sensing and actuating capabilities, including temperature, humidity, water level, light spectrum, and lighting measurements.
With this work, we aim to introduce an additional functionality: the remote control of an RA with SWG, which manages and plans movements using the DT of the service. This approach minimizes the service delay, particularly in the network, potentially eliminating it in certain cases to achieve real-time interaction. Moreover, we integrate a movement prediction model based on Reinforcement Learning (RL) that is able to predict the next movement based on the past movements, and at every step, the error between the real angles and the predicted ones allows the model to improve the quality of the prediction. RL is a machine learning paradigm in which an agent learns to make decisions through interaction with an environment. Specifically, the agent performs actions in order to maximize a specific reward function by learning an optimal policy. Unlike supervised learning, in RL there are no labeled data and no a priori knowledge of what the right action is, but the agent learns by balancing exploration (trying new actions) and exploitation (using what has already been learned). It is widely used in robotics, gaming, and autonomous systems, where real-time decision-making is crucial.
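To make the exploration-exploitation balance described above concrete, the following minimal sketch shows an epsilon-greedy agent on a two-armed bandit learning purely from rewards, without labeled data. This is an illustrative toy example, not the paper's DDPG model; all names and values are ours:

```python
import random

# Epsilon-greedy agent on a 2-armed bandit: with probability epsilon it
# explores (random action), otherwise it exploits the best known action,
# updating a running mean of each action's reward.
def run_bandit(true_means=(0.2, 0.8), epsilon=0.1, steps=5000, seed=0):
    rng = random.Random(seed)
    estimates = [0.0, 0.0]   # estimated reward of each action
    counts = [0, 0]
    for _ in range(steps):
        if rng.random() < epsilon:                              # explore
            a = rng.randrange(len(estimates))
        else:                                                   # exploit
            a = max(range(len(estimates)), key=lambda i: estimates[i])
        reward = 1.0 if rng.random() < true_means[a] else 0.0
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]     # running mean
    return estimates, counts

estimates, counts = run_bandit()   # the agent settles on the better arm
```

The same learn-from-reward loop, scaled up to continuous actions and neural function approximators, is what the DDPG-based predictor in Section 3.4 relies on.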
The proposed DT framework supports three operating modes:
  • Real2Digital, which updates the DT with real-world data in real time;
  • Digital2Real, which executes commands from the DT to the physical system;
  • Digital2Digital, which simulates tasks in the DT to optimize performance and algorithms before deployment.
We first analyzed the communication delays introduced in two scenarios by leveraging edge and cloud computing. Next, we examined the computational delays caused by the SWG and DT host. Lastly, we developed a predictive model for hand movements using a reinforcement learning algorithm, enabling the reduction or even elimination of network latency by leveraging the concepts of zero-delay networks and negative delay.
The remainder of the paper is organized as follows. Section 2 presents key related works on the application and implementation of digital twins. Section 3 describes the employed devices and network architecture. Section 4 details our DT framework and the predictive model used. Finally, Section 5 analyzes the results, Section 6 discusses our obtained results, highlighting the concepts of zero/negative communication delay, and Section 7 provides the conclusions.

2. Related Work

In this section, we review the existing literature relevant to our proposed DT-based framework. We organize the review into five main areas: (i) wearable glove-based teleoperation systems, which are essential for human–robot interaction in our setup; (ii) IoT applications in smart agriculture, providing context for environmental monitoring; (iii) applications of DTs in robotics and rehabilitation; (iv) gesture recognition approaches based on computer vision and gloves; and (v) recent efforts combining digital twins with RL. We conclude by identifying the research gap that our work addresses.

2.1. Wearable Glove-Based Teleoperation Systems

We first review research on wearable glove (WG)-based systems for teleoperation, as these represent a key interaction interface in our proposed DT framework. The study [12] implements two teleoperation control frameworks: one based on a WG and another on a vision-tracking system. The glove-based method is identified as more accurate and superior in performance compared with the vision-based approach. However, it is considered less comfortable and requires greater user effort. The work in [13] presents a novel data glove developed and utilized for the teleoperation control of a robotic hand-arm setup. NASA has implemented a practical use case of the wearable-based teleoperation control framework, employing two CyberGloves to remotely control robots for tasks such as threading nuts onto bolts and tying knots [14]. Additionally, a method utilizing WGs has been introduced for virtual interaction scenarios, enabling activities such as 3D modeling and task training [15].

2.2. IoT and Smart Agriculture Monitoring

Given the agricultural context of our work, we now examine IoT-based solutions for greenhouse and hydroponic systems that inform our environmental sensing approach. Previous research has explored IoT applications in hydroponic farming systems. Studies such as [16] focus on automating greenhouse environmental monitoring, pH level adjustments, and electrical conductivity maintenance, while [17] presents a monitoring system for real-time data analysis in small to medium-sized greenhouse installations. The work in [18] introduces a solution to enhance agricultural production and reduce pesticide use by developing a smart greenhouse system for hydroponic farming. This system employs an Arduino Mega2560 and ESP-01 for data communication and control, along with an Android smartphone application for environmental monitoring while ensuring pesticide-free plant cultivation. In [19], the authors leverage IoT technology to design an environmental monitoring system for greenhouses, integrating wireless networks, mobile networks, and the Internet to facilitate remote plant condition monitoring. A layer-based data analysis method for monitoring tomato growth is proposed in [20]. The authors collected various environmental parameters, including humidity, temperature (both indoor and outdoor), soil moisture, and CO2 concentration, utilizing a Raspberry Pi 3 and the DART framework as an IoT gateway.

2.3. Digital Twin Applications in Robotics and Human Interaction

The concept of the DT is gaining increasing traction across various domains, ranging from medical rehabilitation to robotic teleoperation, demonstrating its versatility and adaptability for real-time applications [21]. By leveraging physical models and historical data, DTs comprehensively represent the entire lifecycle of physical systems and entities. Their ability to facilitate real-time, two-way interactions between the physical and virtual realms enables efficient monitoring, analysis, and prediction of system behavior and optimizes resource allocation [22]. The authors in [23] developed a double DT for hand rehabilitation, implementing a virtual glove to model both injured and healthy hands. Their system employs a DT with a healthy hand model as a reference to calibrate the injured hand’s model, integrating real-time data acquisition. Similarly, ref. [24] develops a gesture-sensing glove designed to accurately capture hand gestures for robotic teleoperation, leveraging strain gauges embedded in flexible beams to measure finger movements. This system has been utilized to create a DT of hand movements and to control an RA remotely in real time, showcasing its potential in human–robot interaction applications. Additionally, a neural network-based algorithm was developed to reduce the computational load of the motion-sensing model.
More specifically, in the context of robotic applications, the authors in [25] propose a modular, plug-and-play teleoperation framework integrating a DT interface called TELESIM. TELESIM is developed on ROS2 and, utilizing NVIDIA Isaac Sim, supports multiple robotic platforms, collision avoidance, and motion planning with flexibility and adaptability across different scenarios. Taking a simpler approach, the authors in [26] create a DT for a six-degree-of-freedom RA using the Newton–Euler method to enhance trajectory planning precision.
Beyond robotics, ref. [27] introduces a DT for sign language recognition. Their system consists of a lightweight computer vision AI and a 3D avatar capable of real-time movement detection and rendering using general-purpose hardware. The 3D avatar model is developed with Pixcap and the model-viewer JavaScript library, synchronizing real-world gestures with virtual counterparts. Using the ThingJS platform and a low-cost attitude sensor, the authors in [28] demonstrate the application of a DT to improve RA positioning accuracy. Their methodology combines 3D positional information exchange and error compensation, with the system maintaining real-time synchronization between the physical and digital worlds through WebSocket and serial communication.

2.4. Limitations of Vision-Based Gesture Recognition

In recent years, numerous studies have utilized digital cameras and depth sensors to capture three-dimensional depth data for gesture recognition [29,30]. However, challenges persist in computer vision-based gesture prediction, particularly regarding the high computational demands and processing time required by these algorithms. Additionally, visual occlusions can negatively impact the performance of vision-based systems. In contrast, glove-based or external sensor systems offer a viable solution by mitigating visual occlusion issues and reducing processing delays. In order to overcome these limitations, some works present approaches using hand gloves. Authors in [31] proposed a theoretical framework and developed a corresponding hardware system for gesture recognition by combining deep learning with a specially designed data glove. They also introduced a novel gesture recognition algorithm that employs a fusion model integrating 1-D and 2-D Convolutional Neural Networks to extract spatiotemporal features, followed by a Temporal Convolutional Network framework to enhance recognition performance. With a similar approach, the authors in [32] developed a system for real-time gesture prediction using flex sensors arranged on a data glove according to the hand’s muscle distribution to capture detailed motion data. They extracted information on position, velocity, and acceleration from the raw sensor outputs and processed these to derive adjacent finger-coupling features. They implemented a combined prediction model using a neural network and a multiclass support vector machine (SVM), designing neural network experiments that optimized the model based on prediction time and accuracy.

2.5. Digital Twins Combined with Reinforcement Learning

In the literature, we can find some works that introduce the concept of RL in the context of DT. In particular, in [33], the authors developed a DT framework that uses RL to train a virtual robot arm, which is then successfully mapped to a physical counterpart. They integrate Unity and TensorFlow to simulate, train, and test robotic tasks in a virtual environment, demonstrating that policies learned in simulation can be transferred to a real robot without prior exposure to all physical states. However, their method also has limitations. The training process is time-consuming (over 30 h), requires careful reward design to avoid undesired behaviors, and does not introduce a network system integration. With a similar approach, in [34], the authors present a digital twin-driven deep reinforcement learning (DRL) framework for adaptive task allocation in robotic construction. They demonstrate that a DRL agent, trained in a virtual environment reflecting dynamic site conditions, can learn and apply policies that outperform traditional rule-based approaches, particularly in uncertain and changing scenarios. Finally, in [35], the authors introduce the use of DRL for DT manufacturing process optimization.

2.6. Positioning of Our Work

Our work focuses on the delays introduced by individual system components, including the network architecture, aiming to minimize latency gaps for synchronizing the physical and digital worlds. Instead of relying on serial communications, which have a limited coverage range, we utilize wireless networks that offer greater flexibility across different scenarios. Moreover, we adopt a Reinforcement Learning approach that allows an adaptive and dynamic learning method that can be more flexible for different users with different hand gesture behaviors. Specifically, our research extends the application of DTs in precision agriculture by integrating and developing a DT of a WG and RA system leveraging wireless technologies. We analyze the various delays introduced by the main system components and propose a reinforcement learning-based approach to reduce the synchronization gap between the physical and digital worlds, achieving, in some cases, zero or even negative network latency.

3. Materials and Methods

This section describes the physical structure of the three main components that constitute the DT of the service: the SWG, RA, and network architecture. The SWG and RA are connected to the digital twin of the entire service through our proposed network architecture. The study scenario features a remote SWG and an RA installed in our miniature prototype hydroponic greenhouse [36], which is designed for laboratory experimentation and limited-scale testing. The greenhouse measures 50 cm × 40 cm × 50 cm and includes a water container filled with four liters of water and nutrient solutions. A submersible water pump ensures water circulation beneath the plant roots. Plants are grown within this setup, accompanied by environmental monitoring tools such as a DHT11 temperature and humidity sensor and an LLC 3779 spectrometer (Adafruit, New York, NY, USA). Above the plants, four high-power LEDs and two cooling fans maintain optimal environmental conditions. The DT, containing both the 3D model of the hand and the 3D model of the RA, is updated to track every movement in the real world.

3.1. Sensor-Equipped Wearable Glove

The SWG [37] is built using an Arduino Uno controller and is equipped with five sliding resistors, a gyroscope, and a tilt sensor, specifically designed to capture the angular inclination of the fingers and the motion state of the palm. It employs the HC-08 module for communication tasks, which operates on the Bluetooth 4.0 protocol and supports both master-slave and serial communication. Figure 1 shows the selected glove.
The human hand can perform a diverse range of movements through the coordinated contraction and relaxation of muscles, regulated by the nervous system. However, the glove integrated into our architecture is designed to capture only the movements of the Metacarpophalangeal (MCP) joints and the palm’s inclination, which is sufficient for our application scenario.

3.2. 6-Degree-of-Freedom Robotic Arm

Figure 2 illustrates the selected 6-degree-of-freedom (6DOF) RA [38], a cost-effective device capable of performing various movements. Its mobility is enabled by four Radio Controller servos with metal gears, while the remaining two servomotors control the gripper’s opening/closing and rotation. The robotic arm consists of six rotary joints: base, shoulder, elbow, wrist pitch, wrist yaw, and wrist roll.
The kinematics of the RA pose a significant challenge in automated control and manipulation. However, direct control via the SWG is intended to reduce this complexity. Rather than computing the inverse kinematics matrix to determine the flexion angles of the six motors from given spatial coordinates, the approach focuses on synchronizing the arm’s movements with the operator’s hand gestures. This architectural framework lays the groundwork for future developments, particularly for remotely controlling the RA within the hydroponic GH for applications such as fruit quality assessment, including ripeness evaluation. In our GH prototype, the devices within the system utilize Long Range (LoRa) modulation to communicate and exchange data with other devices. This is achieved by interfacing sensors, actuators, and additional components via the Heltec ESP32 WiFi LoRa board. To control the six servomotors of the RA, an additional board, the Octopus Expansion Shield [39], is required, along with an external power supply due to the high power consumption of the servomotors. This expansion board provides 16 PWM outputs and 16 digital input/output pins, enabling the seamless control of the servomotors through standard libraries.

3.3. Smart Greenhouse System Architecture

The proposed network and system architecture are illustrated in Figure 3. In general, the adoption of IoT and DT technologies has become essential in the transition from traditional GH cultivation to intelligent and optimized farming.
The main components of our scenario can be identified in the diagram. At the edges of the figure, we highlight the remote devices, including the WG, the DT, and the RA. At the center of the figure, the network infrastructure enabling interaction between these components is depicted. More specifically, in the implemented architecture, GH devices communicate via LoRa communication at 869 MHz, directly connecting to a LoRa Gateway located near the edge computing server. This server has public network connectivity, which can leverage both satellite and terrestrial networks, depending on external management beyond the scope of this research. For the transport protocol, Message Queuing Telemetry Transport (MQTT) has been employed due to its flexibility and lightweight nature, making it a widely adopted standard in the state of the art. This facilitates data exchange between remote devices, edge computing, and cloud computing servers. In our system, the computing operations are managed by a Node-RED server, which, thanks to its block-based programming, allows for rapid implementation, management, and modification of new functionalities and integration of additional devices.
All the results presented in the following sections are based on this real-world system architecture, demonstrating the feasibility of the research not only from a theoretical perspective but also through practical implementation.

3.4. Training Configurations of the RL Model

A custom Gym [40] environment was designed with a DDPG agent implemented with the following:
  • Actor-critic networks: 3-layer MLPs (256-128 units)
  • Exploration: Gaussian noise $\mathcal{N}(0,\,0.1^2)$
  • Training parameters:
    - Replay buffer: $10^5$ samples
    - Batch size: 128
    - Learning rate: 0.001
    - Training steps: 50,000
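Two of the listed ingredients, the $10^5$-sample replay buffer and the Gaussian exploration noise, can be sketched in a few lines. This is an illustrative fragment under the stated hyperparameters, not the authors' implementation; the actor-critic networks are omitted and the stored transitions are dummies:

```python
import random
from collections import deque

BUFFER_SIZE = 10**5   # replay buffer capacity from the configuration above
BATCH_SIZE = 128      # training mini-batch size

# The replay buffer stores (state, action, reward, next_state) transitions;
# a bounded deque drops the oldest transition once capacity is reached.
replay_buffer = deque(maxlen=BUFFER_SIZE)

def gaussian_noise(rng, sigma=0.1):
    """Exploration noise drawn from N(0, 0.1^2), added to the actor's action."""
    return rng.gauss(0.0, sigma)

rng = random.Random(0)
for t in range(1000):                                  # fill with dummy transitions
    replay_buffer.append((t, gaussian_noise(rng), 0.0, t + 1))
batch = rng.sample(list(replay_buffer), BATCH_SIZE)    # uniform training mini-batch
```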
During evaluation, the agent uses the deterministic policy $\pi(s_t)$ (without exploration noise). The accuracy of the predictions is measured using the MSE:
$$\mathrm{MSE}_{\mathrm{angle}_i} = \frac{1}{N}\sum_{j=1}^{N}\left(\hat{\theta}_{i,j}-\theta_{i,j}\right)^2,$$
where $N$ is the total number of prediction steps, $\hat{\theta}_{i,j}$ is the $j$-th prediction for the $i$-th angle, and $\theta_{i,j}$ is the true value for the $i$-th angle.
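The per-angle MSE above translates directly into code; a short worked example, with illustrative prediction values of ours, is:

```python
# Per-angle MSE: mean squared error between the N predictions and the
# N true values of one angle i.
def mse_per_angle(pred, true):
    n = len(pred)
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / n

# Errors -2, 1, -3 give (4 + 1 + 9) / 3.
pred = [10.0, 20.0, 30.0]
true = [12.0, 19.0, 33.0]
mse = mse_per_angle(pred, true)
```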
The simulation leverages the DDPG algorithm to model temporal dependencies and delays in dynamic hand motion trajectories.
Kinematic data comprising angular positions of five fingers and palm orientation were normalized to the $[0, 1]$ range:
$$x_{\mathrm{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}.$$
Temporal consistency was preserved using timestamps, with the dataset partitioned into overlapping windows for sequential prediction. The MSE is computed both globally and per dimension to evaluate the model’s performance in detail.
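The normalization and the overlapping-window partition can be sketched as follows; the window size and stride are illustrative choices of ours, since the text does not specify them:

```python
# Min-max normalization to [0, 1], as in the equation above.
def min_max_normalize(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

# Overlapping sliding windows for sequential prediction (stride < size).
def overlapping_windows(seq, size, stride):
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, stride)]

angles = [0.0, 15.0, 30.0, 45.0]      # sample flexion angles in degrees
norm = min_max_normalize(angles)      # scaled to [0, 1]
wins = overlapping_windows(list(range(6)), size=3, stride=1)
```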
To conclude this section, we present a qualitative comparison of classical approaches against our DDPG-based framework. Linear or semi-nonlinear predictive filters (Kalman Filter, Extended Kalman Filter, adaptive filters) offer high computational efficiency and ease of implementation, and do not require a large initial training dataset, since they rely on a predefined state-observation model. On the other hand, these filters struggle to capture complex joint dynamics: in real-world scenarios with variable network latency, filter estimates degrade quickly once the true trajectory deviates from the model assumptions. A different approach is offered by recurrent models (RNN, LSTM, GRU), which can capture nonlinear temporal patterns and long-term dependencies, making them suitable for modeling complex joint sequences. Their main limitation for our purpose is that they require a large dataset to avoid overfitting; in addition, hyperparameter tuning is complex and often demands extensive experimentation, and offline training limits rapid adaptation to changing network conditions. We therefore propose a DDPG approach that exploits online updates via a replay buffer and exploration noise to handle continuous action spaces, providing superior robustness under variable network latency and rapid movements, without the need for extensive dataset collection or continuous hyperparameter tuning.

4. Digital Twin Framework and Hand Movement Prediction Model

In this section, we provide a detailed description of the interactions between the devices in the physical world and the proposed DT. Specifically, we first analyze the functional scheme and interaction modes with our DT before delving into the computational steps executed by the SWG, RA, and DT, concluding with the mathematical details of the hand movement prediction model.

4.1. Digital Twin Framework

The proposed DT framework consists of the SWG and RA, both of which communicate with the 3D digital model of the hand and the RA. The primary objective is to completely eliminate network-induced latency, ensuring a truly real-time and synchronized interaction. To precisely define our DT framework, we distinguish three distinct modes of operation:
  • Real2Digital: The SWG and RA operate in the physical world, transmitting data to the DT to maintain synchronization between the 3D models and real-world actions. Communication is handled via the MQTT protocol, which enables the transmission of commands and status updates from the physical environment to the digital realm.
  • Digital2Real: In this mode, interaction occurs in the opposite direction, from the DT to the physical system. Specifically, manipulations performed directly on the 3D model of the hand or RA are transmitted as control commands to the RA in real time. In our implementation, human interaction with the DT is facilitated through a serial interface, which differentiates control messages from those originating from field devices.
  • Digital2Digital: This mode is particularly useful for testing and analyzing new algorithms and commands in a virtual environment without affecting the physical world. Here, mode-specific identifiers are used to indicate the operational mode. Through the serial interface, commands can be sent to directly manipulate the 3D model of the hand or RA, allowing visualization of the robotic arm’s potential response without requiring actual physical movement.
Overall, the bidirectional communication ensures seamless synchronization between digital commands and physical execution. The DT framework is illustrated in Figure 4, presenting a block diagram that details the computational processes of the physical devices, their implementation in the DT, and the interconnections enabling communication between the three main system components.
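A hypothetical dispatcher illustrates how the three modes could be told apart on the DT host; the mode identifiers ("R2D", "D2R", "D2D"), message fields, and handler names are assumptions of ours for illustration, not the paper's actual message format:

```python
# Route an incoming message to the handler for its operating mode.
def dispatch(message, update_model, send_to_ra, simulate):
    mode = message["mode"]
    if mode == "R2D":      # Real2Digital: field data updates the 3D models
        update_model(message["angles"])
    elif mode == "D2R":    # Digital2Real: DT manipulation drives the physical RA
        send_to_ra(message["angles"])
    elif mode == "D2D":    # Digital2Digital: virtual-only simulation
        simulate(message["angles"])
    else:
        raise ValueError(f"unknown mode: {mode}")
    return mode

log = []
mode = dispatch({"mode": "R2D", "angles": [0, 0, 0, 0, 0, 0, 0]},
                update_model=log.append, send_to_ra=log.append,
                simulate=log.append)
```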

4.2. SWG Computation

Referring to the detailed Figure 4, the selected SWG consists of five sliding resistors (SRs) and the MPU6050 sensor, which incorporates both an accelerometer and a gyroscope. The five sliding resistors measure the flexion of the first phalanx of the five fingers, from the MCP joint to the Proximal InterPhalangeal (PIP) joint, as illustrated in Figure 5.
The SRs produce a dimensionless output ranging from 500 to 2500, representing the Finger Inclination (FI). These values correspond, according to the dependency detailed in Equation (2), to the maximum and minimum finger Flexion Angles (FA), denoted as $\phi_{max}$ and $\phi_{min}$, respectively. For the selected SWG, these boundary angles correspond to 45° and 0°, measured relative to the vertical plane of the palm:
$$FI_j = \left(1 - \frac{\phi_j}{\phi_{max}}\right) \times 2000 + 500,$$
where j represents the j-th finger, ranging from 1 to 5.
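Equation (2) maps a flexion angle to the glove's raw output; a direct sketch of this mapping is:

```python
PHI_MAX = 45.0  # maximum MCP flexion angle for the selected SWG (degrees)

# Equation (2): a flexion angle phi in [0, 45] degrees maps to the
# glove's dimensionless FI output in [500, 2500]; full extension gives
# the maximum output, full flexion the minimum.
def flexion_to_fi(phi_deg):
    return (1 - phi_deg / PHI_MAX) * 2000 + 500

extended = flexion_to_fi(0)     # fully extended finger
flexed = flexion_to_fi(45)      # fully flexed finger
```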
In addition to these five values, the SWG provides the tilt angles of the palm, specifically $\alpha$ and $\beta$, utilizing data from the MPU6050 sensor (Adafruit, New York, NY, USA), which integrates an accelerometer and a gyroscope. The sensor data undergo processing through a Low-Pass Filter (LPF), a sensitivity adjustment, and an Exponential Moving Average (EMA) filter to minimize measurement noise. Algorithm 1 details the mathematical transformations applied to the data.
Algorithm 1 Computation of Palm Tilt Angles $\alpha$ and $\beta$
1: Input: raw accelerometer data $acc_i$, raw gyroscope data $g_i$
2: Output: tilt angles $\alpha$, $\beta$
3: Apply Low-Pass Filter
4: for $i \in \{x, y, z\}$ do
5:   $acc_i^0 \leftarrow acc_i \times 0.3 + acc_i^0 \times 0.7$
6:   $g_i^0 \leftarrow g_i \times 0.3 + g_i^0 \times 0.7$
7: end for
8: Apply Sensitivity Adjustment
9: for $i \in \{x, y, z\}$ do
10:   $acc_i^1 \leftarrow (acc_i^0 - acc_i^{offset}) / 8192$
11:   $g_i^1 \leftarrow g_i^0 - g_i^{offset}$
12: end for
13: Compute Tilt Angles
14: $temp_\alpha \leftarrow \operatorname{atan2}(acc_y^1, acc_z^1) \times 180/\pi$
15: $\alpha \leftarrow 0.8 \times (\alpha + g_x^1/16.4 \times 0.02) + temp_\alpha \times 0.2$
16: $temp_\beta \leftarrow \operatorname{atan2}(acc_x^1, acc_z^1) \times 180/\pi$
17: $\beta \leftarrow 0.8 \times (\beta + g_y^1/16.4 \times 0.02) + temp_\beta \times 0.2$
18: return $\alpha$, $\beta$

where $i$ denotes the three Cartesian axes $x$, $y$, and $z$.
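A single-axis Python sketch may clarify the data flow of Algorithm 1. The zero offsets and the complementary-filter grouping of the fusion step are our reading of the algorithm, not code from the paper:

```python
import math

# One axis of Algorithm 1: low-pass filtering, sensitivity scaling
# (8192 LSB/g accelerometer, 16.4 LSB/(deg/s) gyroscope), then
# complementary fusion of the gyro rate (0.02 s step) with the
# accelerometer tilt estimate. Filter state is kept in a dict.
def update_alpha(state, acc, gyro, acc_off=(0.0, 0.0, 0.0), gx_off=0.0):
    # Low-pass filter: new <- raw * 0.3 + old * 0.7
    for key, raw in (("acc_y", acc[1]), ("acc_z", acc[2]), ("g_x", gyro[0])):
        state[key] = raw * 0.3 + state.get(key, 0.0) * 0.7
    # Sensitivity adjustment
    acc_y1 = (state["acc_y"] - acc_off[1]) / 8192
    acc_z1 = (state["acc_z"] - acc_off[2]) / 8192
    g_x1 = (state["g_x"] - gx_off) / 16.4
    # Accelerometer tilt estimate, then complementary filter
    temp_alpha = math.degrees(math.atan2(acc_y1, acc_z1))
    state["alpha"] = 0.8 * (state.get("alpha", 0.0) + g_x1 * 0.02) + 0.2 * temp_alpha
    return state["alpha"]

state = {}
for _ in range(50):  # level palm: gravity entirely on z, no rotation rate
    a = update_alpha(state, acc=(0.0, 0.0, 8192.0), gyro=(0.0, 0.0, 0.0))
```

With the palm level, gravity lies on the z axis, so the tilt estimate stays at zero degrees.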
After capturing the five FIs and the tilt angles $\alpha$ and $\beta$, the seven angles are formatted into a JSON message (a lightweight, widely supported data format) and transmitted via the MQTT protocol to both the DT host and the corresponding RA.
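A hypothetical shape for this seven-angle message is sketched below; the field names ("FI", "alpha", "beta") are illustrative assumptions of ours, not the paper's exact schema:

```python
import json

# Serialize the five FIs plus the two palm tilt angles into a JSON
# payload, as it might be published on an MQTT topic.
def make_payload(fi_values, alpha, beta):
    assert len(fi_values) == 5        # one FI per finger
    return json.dumps({"FI": fi_values, "alpha": alpha, "beta": beta})

payload = make_payload([2500, 2100, 1800, 1500, 900], alpha=10.5, beta=-3.2)
decoded = json.loads(payload)         # what the DT host and RA would parse
```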

4.3. RA and DT Computation

The seven samples collected from hand movements by the SWG are acquired at a frequency of 50 Hz, i.e., every 20 ms, and transmitted to both the DT and the physical RA. Both recipient devices receive the seven values: two correspond to the palm tilt angles $\alpha$ and $\beta$, which are immediately ready to use, while the remaining five are FIs, represented as dimensionless values ranging between 500 and 2500.
The first computation that both devices must perform involves normalizing the FI values (FIN), scaling them to a dimensionless range between 0 and 1 to represent finger movements, and converting the FIN values into angles, specifically calculating the FAs of the fingers. These angles, expressed in degrees, are used to animate the fingers of the 3D hand model in the DT and to control the movement of both the real and 3D model of the RA, following Equation (3).
FIN_j = 1 − (FI_j − 500)/2000 = φ/φ_max,  FA_j = FIN_j × φ_max = (1 − (FI_j − 500)/2000) × φ_max
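Equation (3) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names `flex_to_angle` and `second_arm_angle` are ours, and the value of φ_max (here 180°) is an assumption, since the paper does not state it numerically.

```python
PHI_MAX = 180.0  # assumed maximum finger angle φ_max, in degrees

def flex_to_angle(fi, phi_max=PHI_MAX):
    """Map a raw FI value in [500, 2500] to a finger angle FA per Eq. (3)."""
    fin = 1.0 - (fi - 500.0) / 2000.0   # FIN: normalized to [0, 1]
    return fin * phi_max                # FA: angle in degrees

def second_arm_angle(fis, phi_max=PHI_MAX):
    """Second-arm inclination: arithmetic mean of the five FA_j values."""
    fas = [flex_to_angle(fi, phi_max) for fi in fis]
    return sum(fas) / len(fas)
```

A fully flexed sensor reading (FI = 500) maps to the maximum angle, while FI = 2500 maps to zero, matching the normalization in Equation (3).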
To summarize, the physical RA receives the same input values as the DT and uses α and β to compute the inclination angle of the first arm. The arithmetic mean of the five FAj values then determines the inclination of the second arm. Finally, the degree of opening and closing of the gripper is calculated from the FAs of the thumb and index finger, FA1 and FA2. Figure 6 provides a clearer representation of the first and second arms mentioned.
On the other hand, the DT receives the same input values as the physical RA (either via MQTT messages from the SWG in Real2Digital mode or through serial communication in the other two modes) and updates the 3D model of the RA. It computes the inclination angles of the first and second arms, as well as the gripper’s opening and closing, mirroring the behavior of the physical RA. Additionally, the DT updates the 3D model of the hand (including fingers and palm) using the individual FAj values, along with α and β , ensuring that the relative positions of all components are correctly updated through the Rotation Matrix (RM) defined in Equation (4).
$$
RM_x(\gamma)=\begin{pmatrix}1&0&0\\0&\cos\gamma&-\sin\gamma\\0&\sin\gamma&\cos\gamma\end{pmatrix},\quad
RM_y(\beta)=\begin{pmatrix}\cos\beta&0&\sin\beta\\0&1&0\\-\sin\beta&0&\cos\beta\end{pmatrix},\quad
RM_z(\alpha)=\begin{pmatrix}\cos\alpha&-\sin\alpha&0\\\sin\alpha&\cos\alpha&0\\0&0&1\end{pmatrix},
$$
$$
RM = RM_z(\alpha)\,RM_y(\beta)\,RM_x(\gamma)=
\begin{pmatrix}
\cos\alpha\cos\beta & \cos\alpha\sin\beta\sin\gamma-\sin\alpha\cos\gamma & \cos\alpha\sin\beta\cos\gamma+\sin\alpha\sin\gamma\\
\sin\alpha\cos\beta & \sin\alpha\sin\beta\sin\gamma+\cos\alpha\cos\gamma & \sin\alpha\sin\beta\cos\gamma-\cos\alpha\sin\gamma\\
-\sin\beta & \cos\beta\sin\gamma & \cos\beta\cos\gamma
\end{pmatrix}
$$
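As a concrete check of Equation (4), the three elementary rotations and their composition can be written with NumPy. This is a sketch, not the DT's rendering code; the function names `rm_x`, `rm_y`, `rm_z`, and `rm` are ours.

```python
import numpy as np

def rm_x(gamma):
    """Rotation about the x-axis by angle gamma (radians)."""
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rm_y(beta):
    """Rotation about the y-axis by angle beta (radians)."""
    c, s = np.cos(beta), np.sin(beta)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rm_z(alpha):
    """Rotation about the z-axis by angle alpha (radians)."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def rm(alpha, beta, gamma):
    """Composite rotation RM = RM_z(alpha) @ RM_y(beta) @ RM_x(gamma), Eq. (4)."""
    return rm_z(alpha) @ rm_y(beta) @ rm_x(gamma)
```

The result is always a proper rotation matrix: orthonormal, with determinant 1, and with the (3, 1) entry equal to −sin β, as in the expanded matrix above.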
More specifically, it adjusts the positions of the five fingers relative to the palm’s movement and updates the position of the second arm in response to the first arm’s movement. Finally, it predicts the subsequent hand movements that may be received from the SWG in order to proactively update the state of the various components and reduce the overall synchronization delay between the physical and digital environments (the prediction model and related details are described in Section 4.4).

4.4. Hand Movements Prediction with Reinforcement Learning

In this section, we present the use of RL for hand movement prediction, with the goal of reducing the latency in delivering real movements from the SWG to the DT and vice versa. Before presenting the prediction results, we first formalize the problem.
In order to introduce an RL problem, we need to define a few key elements:
  • the observable state of the system (state),
  • the actions that the agent can perform,
  • a reward function that quantifies the goodness of each action over time,
  • and a policy, which is the decision strategy that the agent learns and optimizes during the interaction with the environment.
In our case, the system can be modeled as a Partially Observable Markov Decision Process (POMDP), in which the agent predicts the current state of the robotic hand based on past and incomplete observations due to network latency.
Let the robotic hand’s state be defined by seven angular coordinates:
θ(t) = [θ_1(t), …, θ_5(t), θ_6(t), θ_7(t)]^T ∈ ℝ^7
where θ_1 to θ_5 represent the FAj angles, and θ_6, θ_7 denote the palm tilt orientation angles. Given a network delay Δt, the DT receives θ(t − Δt) at time t. Our objective is to find a prediction function P such that
θ̂(t) = P(θ(t − Δt)), with ‖θ̂(t) − θ(t)‖ → min.
The state at time t, denoted s t , is defined as a window of past values:
s_t = {a_{t−w}, a_{t−w+1}, …, a_{t−1}},
where w = 20 is the temporal window, and a k R 7 represents the normalized angle vector at time k.
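A minimal sketch of how such a windowed state can be maintained is shown below. The class name `StateWindow` and the zero-padding of a not-yet-full window are our own design choices, not details from the paper.

```python
from collections import deque
import numpy as np

W = 20  # temporal window w from the paper

class StateWindow:
    """Maintains s_t = {a_{t-w}, ..., a_{t-1}} as a flat observation vector."""
    def __init__(self, w=W, dim=7):
        self.buf = deque(maxlen=w)  # oldest samples are evicted automatically
        self.w, self.dim = w, dim

    def push(self, angles):
        """Append the normalized 7-dimensional angle vector a_k."""
        self.buf.append(np.asarray(angles, dtype=np.float32))

    def state(self):
        """Return the flattened window, zero-padded until w samples arrive."""
        pad = [np.zeros(self.dim, np.float32)] * (self.w - len(self.buf))
        return np.concatenate(pad + list(self.buf))
```

With w = 20 and 7 angles per sample, the observation fed to the agent is a 140-dimensional vector.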
The action represents the predicted current state, with action space
A = ℝ^7,  a_t = θ̂(t)
When the agent produces an action a_t as a prediction of the current state, the reward, which becomes available at time t + Δt once the true θ(t) has been received, is computed as the negative weighted Mean Squared Error (MSE):
r(t + Δt) = − ∑_{i=1}^{7} w_i ( θ̂_i(t) − θ_i(t) )²
where the w_i are joint-specific weights.
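This reward can be sketched as follows; the function name `reward` and the fallback to uniform weights when none are given are our own assumptions.

```python
import numpy as np

def reward(theta_pred, theta_true, weights=None):
    """Negative weighted MSE over the 7 joint angles (higher is better)."""
    theta_pred = np.asarray(theta_pred, dtype=float)
    theta_true = np.asarray(theta_true, dtype=float)
    # Uniform weights by default; the paper uses joint-specific weights w_i.
    w = np.ones_like(theta_pred) if weights is None else np.asarray(weights, float)
    return -float(np.sum(w * (theta_pred - theta_true) ** 2))
```

A perfect prediction yields a reward of 0, and any error makes the reward strictly negative, so maximizing the expected return minimizes the weighted prediction error.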
To enable exploration in continuous action spaces, we inject Gaussian noise into the action outputs (using NormalActionNoise). For each action dimension i = 1 , , 7 , the noise is sampled as follows:
ε_i ∼ N(0, σ²), with σ = 0.1.
Thus, the total noise vector is as follows:
ε ∼ N(0, 0.1² I_7),
where 0 is the zero vector in R 7 and I 7 is the 7 × 7 identity matrix. The final exploratory action becomes the following:
a_t = π(s_t) + ε,
where π ( s t ) is the deterministic output of the policy network.
The algorithm is based on the Deep Deterministic Policy Gradient (DDPG) framework, which is an actor-critic method designed for continuous control tasks. This approach combines policy optimization with value function estimation and is particularly suitable when the action space is continuous.
The policy (actor) network, parameterized by θ π , is tasked with generating continuous actions based on the current state s t . Its parameters are updated to maximize the expected return. The gradient of the objective function can be approximated as follows:
∇_{θ^π} J ≈ E_{s_t ∼ D} [ ∇_a Q(s_t, a | θ^Q) |_{a = π(s_t)} · ∇_{θ^π} π(s_t | θ^π) ],
where D is the replay buffer storing past transitions, and Q ( s t , a | θ Q ) is the critic’s Q-value estimate.
The critic network, parameterized by θ Q , evaluates the state-action pairs by approximating the Q-value function. Its loss function is defined by the Temporal-Difference error:
L(θ^Q) = E_{(s_t, a_t, r_t, s_{t+1}) ∼ D} [ ( Q(s_t, a_t | θ^Q) − y_t )² ],
with the target value y t computed as follows:
y_t = r_t + γ Q( s_{t+1}, π(s_{t+1} | θ^π) | θ^Q ).
Here, γ is the discount factor that weights future rewards. In many implementations, separate target networks are maintained for both the actor and critic, updated via a soft-update rule:
θ_target ← τ θ + (1 − τ) θ_target,
where τ 1 .
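The soft-update rule can be sketched per parameter array as follows; the function name `soft_update` and the value τ = 0.005 are our assumptions, as the paper only requires τ ≪ 1.

```python
import numpy as np

TAU = 0.005  # assumed soft-update rate; the paper only states tau << 1

def soft_update(target, source, tau=TAU):
    """theta_target <- tau * theta + (1 - tau) * theta_target, per parameter.

    target and source are lists of parameter arrays of matching shapes.
    """
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]
```

Repeated application moves the target parameters exponentially toward the online network, which stabilizes the bootstrapped targets y_t.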
A replay buffer D stores transitions of the form ( s t , a t , r t , s t + 1 ) up to a capacity (e.g., 10 5 transitions). During training, mini-batches (e.g., 128 samples per batch) are uniformly sampled from this buffer. This technique helps to
  • Break the temporal correlations between sequential samples.
  • Provide a diverse set of experiences, which stabilizes the training process.
To effectively explore the continuous action space, an exploration noise process is added to the actions generated by the actor during training. A common choice is the Ornstein–Uhlenbeck process, which produces temporally correlated noise that is well-suited for physical control systems where smooth transitions between actions are desired.
For applications such as robotic control, it is crucial to ensure that the predicted joint angles stay within physically feasible limits. Let the predicted joint angles be θ ^ i ( t ) for i = 1 , , 7 , with physical limits defined by the following:
θ_i^min ≤ θ̂_i(t) ≤ θ_i^max,  ∀ i ∈ {1, …, 7}.
This constraint is enforced by applying a scaled hyperbolic tangent activation function at the output layer of the policy network:
θ̂_i(t) = θ_i^min + ( (θ_i^max − θ_i^min) / 2 ) ( 1 + tanh(z_i) ),
where z i are the raw outputs from the final layer of the actor network. This transformation ensures that regardless of the value of z i , the final output θ ^ i ( t ) always lies within the interval [ θ i m i n , θ i m a x ] .
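The scaled tanh output activation can be sketched in one line; the function name `squash` is ours, and the example limits [0°, 180°] are illustrative.

```python
import numpy as np

def squash(z, theta_min, theta_max):
    """Scaled tanh mapping raw actor outputs z into [theta_min, theta_max].

    tanh(z) lies in (-1, 1), so (1 + tanh(z))/2 lies in (0, 1) and the
    result is guaranteed to respect the physical joint limits.
    """
    return theta_min + (theta_max - theta_min) / 2.0 * (1.0 + np.tanh(z))
```

A raw output of z = 0 lands exactly at the midpoint of the range, and arbitrarily large |z| saturates at the limits rather than exceeding them.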

5. Experimental Results

In this section, we present the various delays introduced by the individual processes described in the previous Section 4, including network latency in scenarios where the processing server Node-RED and the MQTT broker are located either at the edge or in the cloud.
Specifically, we measured the delays introduced by the SWG in sensing the FI, sensing and processing the tilt angles of the palm, formatting the message, and transmitting it. Figure 7 shows the corresponding numerical values in milliseconds for comparison.
The other key component of our system is the DT, whose delays involve the computation of the FA for the five fingers, the calculation of the average FA to control the second arm of the RA model, the computation of all new relative positions using the RM, the update of the 3D scene, and the prediction of subsequent hand movements. Figure 8 provides a detailed breakdown.
Finally, Figure 9 illustrates the one-way latency introduced by the network architecture in the two scenarios previously described, where the MQTT broker and the Node-RED computing server are deployed either at the edge or in the public cloud. Specifically, in the scenario involving the public cloud, a virtual machine was instantiated to host the Node-RED server and the MQTT broker, which was implemented using Mosquitto.
The latency values reported in Figure 7, Figure 8 and Figure 9 were obtained through two distinct measurement strategies, depending on the component under evaluation. For device-side processing times, we used the Arduino millis() function (C++), which leverages the microcontroller's internal hardware clock for millisecond-level timing, and the Python 3.10 time module for microsecond-level precision. For communication latency between the edge device and the cloud server, both systems were synchronized via a Network Time Protocol (NTP) server. Messages sent from the edge device to the server included a timestamp generated at the time of sending; upon reception, the server computed the latency by subtracting this timestamp from its current time. Periodic re-synchronization with the NTP server compensated for clock drift, allowing accurate measurement of end-to-end communication delays.
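The timestamp-based one-way measurement can be sketched as follows. This assumes both hosts are NTP-synchronized; the field names "ts" and "angles" and the function names are our own, not the paper's message schema.

```python
import json
import time

def make_message(angles):
    """Build a JSON payload with a send-time timestamp (hypothetical schema)."""
    return json.dumps({"ts": time.time(), "angles": angles})

def one_way_latency_ms(raw_message, recv_time=None):
    """Latency = receive time minus embedded send timestamp, in milliseconds."""
    recv_time = time.time() if recv_time is None else recv_time
    sent = json.loads(raw_message)["ts"]
    return (recv_time - sent) * 1000.0
```

The accuracy of this method is bounded by the residual clock offset between the two NTP-synchronized hosts, which motivates the periodic re-synchronization mentioned above.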
Without leveraging hand movement prediction to reduce network-induced latency and, consequently, the overall delay in the interaction between end devices and the DT, the average total delay would have been 67.76 ms in the edge computing scenario and 117.7 ms in the cloud computing scenario. Table 1 provides a detailed summary of the individual contributions and the total service delay.
To further reduce the total delay and improve synchronization between the real and virtual environments, we evaluated the performance of the proposed RL-based hand movement prediction model. The following describes the simulation settings and the obtained results, leveraging the DDPG algorithm introduced in Section 4.4. To ensure variability and robustness, the system was trained and tested using data collected from five different human hands, operated by five distinct users from our research group. Each user independently controlled the commercial SWG system during data acquisition, introducing natural inter-user variability in hand shape, motion dynamics, and grasp behavior. Table 2 reports the MSE values for each component of the hand. The overall MSE across all components is 0.014206, indicating a generally low prediction error. Among the individual fingers, the middle finger shows the lowest MSE (0.009180), suggesting higher prediction accuracy, while the thumb and palm components (PalmX and PalmY) exhibit the highest errors. Overall, the errors remain relatively low, indicating good model performance across all components. It is worth underlining that after training, the DDPG agent's error stabilized within the first few tens of thousands of steps. In multi-hour inference sessions, the prediction error remained constant to within ±5%, even when we added random latency spikes or small trajectory perturbations, demonstrating steady behavior.
In Figure 10, we reported the seven angles of movements showing that the predicted trajectories (orange dashed lines) generally follow the patterns of the real data (blue lines), indicating that the model captures the overall dynamics of finger and palm movement. However, some components demonstrate higher prediction stability than others. For instance, the middle and ring fingers show particularly close tracking, consistent with their lower MSE values. In contrast, components like the thumb and the palm coordinates exhibit more noticeable deviations, particularly during rapid transitions, aligning with their relatively higher MSE.
Despite the inherent challenge of capturing fast and fine-grained variations in movement, our model demonstrates good performance in predicting the overall motion patterns of hand components. This makes it particularly well-suited for applications involving repetitive or coarse movements, such as in agriculture, industrial automation, or other scenarios where high-frequency precision is not critical. In these contexts, the model’s ability to generalize and reproduce consistent motion trajectories proves sufficient and reliable, supporting its practical deployment in real-world systems.

6. Discussion

The results obtained from this work provide valuable insights into the most delay-critical processes that hinder achieving real-time interaction between the physical and digital worlds. Regarding the tasks performed by the SWG, as illustrated in Figure 7, it is evident that the process introducing the highest delay is the formatting of the message in JSON, which, despite being widely adopted, represents a significant bottleneck. Therefore, optimizing both the message format and its construction will be essential to reduce the overall system latency.
As for the DT, Figure 8 shows that the most time-consuming task is the updating of the 3D scene. This step can be considerably improved by hosting the DT on machines equipped with more efficient graphics processing units specifically tailored to accelerate such rendering operations.
Nonetheless, this work primarily focuses on optimizing the communication between the two environments. First, by identifying the most suitable deployment location for the MQTT broker and the computing server, as clearly highlighted in Figure 9, and then by exploring methods to mitigate or even anticipate the delays introduced by the network itself. This is achieved by leveraging the concepts of zero-delay and negative-delay networks.
Through the use of reinforcement learning, it becomes feasible, with negligible prediction errors, as demonstrated in Figure 10 and Table 2, to anticipate the hand movements up to 20 ms before they occur. This ensures that the 3D scene in the digital world can be updated in perfect synchronization with the real-world actions.
Figure 11 below clearly shows how the presence or absence of a predictive model significantly affects the one-way latency introduced by the network over time.
The black trace represents the one-way latency without prediction, while the green trace illustrates how the predictive model can substantially reduce this latency. In some cases, communication latency reaches zero; in others, it increases due to traffic spikes; and in certain instances, it becomes negative, a notional value indicating that the data are computed in advance of the actual physical movement, thanks to the model's highly accurate predictions.
Several recent works across different domains address latency in DT systems through diverse architectural, algorithmic, and application-specific strategies. The DT model of the Geneva Motorway system proposed in [41] in the transportation domain achieves low-latency synchronization between real-time traffic and microscopic simulation by leveraging minute-level data feeds. While this solution provides runtime adaptability, its latency mitigation depends on dense sensor availability. In contrast, our system bypasses the need for continuous sensing by anticipating user gestures via RL, shifting latency minimization from the data layer to the user-interaction layer—critical for responsive control in edge AI scenarios like precision agriculture. In the telecommunication and systems optimization context, the authors in [42] address latency through optimal DT placement, balancing data age and application response time using heuristic algorithms like Z-Congestion. Their strategy focuses on network-side latency in static infrastructures, whereas our method tackles dynamic latency caused by wireless instability and user unpredictability, using forecasting to maintain responsiveness even in constrained environments. From the construction domain, the authors in [43] present a DT framework integrated with Mixed Reality (MR) for robotic construction. Their approach ensures real-time feedback by embedding the DT in an immersive MR interface, enabling remote monitoring and commissioning. However, their latency mitigation remains tethered to fixed robot actions and pre-modeled feedback loops. Our approach generalizes better to unstructured, real-world contexts, where interaction latency must be offset in real time through user intent prediction rather than immersive visualization. The DT-SYNC system in [44] focuses on reducing decision-making delays in precast construction by synchronizing cyber-physical operations using functional tickets. 
While effective in coordinating spatially constrained resources, DT-SYNC emphasizes task-level orchestration, whereas our system emphasizes ultra-low-latency gesture prediction, aiming to actuate devices before user feedback is fully registered, which is essential for real-time responsiveness in mobile edge scenarios. Lastly, the QoS-aware DT framework in [45] ensures latency control in manufacturing by enforcing deterministic communication through semantic modeling and redundant networking. Although robust in factory settings, such assumptions are less viable in open-field agriculture. Instead, our system dynamically adapts to fluctuating network and user conditions using lightweight DRL, enabling latency compensation without deterministic channels, a crucial distinction in bandwidth-limited or mobile deployments.
Despite the promising results, our system presents several limitations and challenges. First, while the gesture prediction model using RL achieves low latency, it requires a training phase specific to user behavior patterns, which may limit immediate generalization to new users or tasks without prior adaptation. Second, the wireless communication layer can still be affected by extreme environmental conditions (e.g., interference from vegetation, weather) that were not extensively tested in this study. These factors may impact latency. Finally, while the current approach compensates for network latency using prediction, it does not yet include fail-safe mechanisms or fallback strategies in case of gesture misclassification or signal dropout, which are important for critical applications involving actuation.
Future work will address these limitations by improving model generalization, integrating adaptive learning on edge devices, testing under more diverse environmental conditions, and including redundancy and safety strategies in the control logic.
To conclude, we aim to demonstrate the actual implementation of the proposed work through Figure 12.

7. Conclusions

The study presents a network architecture for a smart hydroponic greenhouse prototype designed to enable the remote control and management of a Robotic Arm through a sensor-equipped Wearable Glove. Furthermore, a Digital Twin of the system was developed to operate in three distinct modes: Real2Digital, where the digital environment is updated based on movements in the physical world; Digital2Real, where digital commands are used to modify the state of physical objects; and Digital2Digital, a simulation mode for testing and analyzing the system's response to specific inputs. The system was validated by measuring the delays introduced by the SWG, RA, network, and DT in order to identify potential bottlenecks in real-time communication, such as the placement of the MQTT broker and computing server, message formatting on the end-device, and the 3D scene rendering within the DT. To partially mitigate communication delays in the interaction between physical end-devices and the DT, a Reinforcement Learning model was employed to accurately predict hand movements 20 ms before they occur. This predictive approach enables, in some cases, the elimination of network-induced latency, thereby enhancing synchronization between the physical and digital environments. Looking ahead, this approach opens new research avenues for designing predictive digital twin systems that can proactively adapt to user behavior and environmental dynamics, ultimately enabling near-instantaneous human–machine interaction in complex cyber-physical systems.

Author Contributions

Conceptualization, C.B., L.B., D.A. and S.G.; methodology, C.B. and L.B.; investigation, C.B. and L.B.; simulation, C.B. and L.B.; writing—original draft preparation, C.B. and L.B.; review and editing, S.G. and D.A.; supervision, D.A. and S.G.; project administration, S.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the Italian Ministry of Research in the framework of the CrossLab and Forelab Projects and by the European Union under the Italian National Recovery and Resilience Plan of NextGenerationEU, partnership on “National Research Centre for Agricultural Technologies” (CN00000022-program “AGRITECH”).

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. World Population Projections. Available online: https://www.worldometers.info/world-population/world-population-projections/ (accessed on 2 June 2025).
  2. Food Security Update and World Bank Solutions to Food Insecurity. Available online: https://www.worldbank.org/en/topic/agriculture/brief/food-security-update (accessed on 2 June 2025).
  3. Lowry, G.V.; Avellan, A.; Gilbertson, L.M. Opportunities and challenges for nanotechnology in the agri-tech revolution. Nat. Nanotechnol. 2019, 14, 517–522. [Google Scholar] [CrossRef] [PubMed]
  4. Onishi, Y.; Yoshida, T.; Kurita, H.; Fukao, T.; Arihara, H.; Iwai, A. An automated fruit harvesting robot by using deep learning. Robomech J. 2019, 6, 1–8. [Google Scholar] [CrossRef]
  5. Zhou, H.; Wang, X.; Au, W.; Kang, H.; Chen, C. Intelligent robots for fruit harvesting: Recent developments and future challenges. Precis. Agric. 2022, 23, 1856–1907. [Google Scholar] [CrossRef]
  6. Zhang, R.; Zhu, H.; Chang, Q.; Mao, Q. A Comprehensive Review of Digital Twins Technology in Agriculture. Agriculture 2025, 15, 903. [Google Scholar] [CrossRef]
  7. Rayhana, R.; Xiao, G.; Liu, Z. Internet of things empowered smart greenhouse farming. IEEE J. Radio Freq. Identif. 2020, 4, 195–211. [Google Scholar] [CrossRef]
  8. Bua, C.; Borgianni, L.; Adami, D.; Giordano, S. Empowering Remote Agriculture: Wearable Glove Control for Smart Hydroponic Greenhouses. In Proceedings of the 2024 IEEE 25th International Conference on High Performance Switching and Routing (HPSR), Pisa, Italy, 22–24 July 2024; pp. 215–220. [Google Scholar] [CrossRef]
  9. Bua, C.; Borgianni, L.; Adami, D.; Giordano, S. Digital Twin for Remote Control of Robotic Arm via Wearable Glove in Smart Agriculture. In Proceedings of the 2025 IEEE Wireless Communications and Networking Conference (WCNC), Milan, Italy, 24–27 March 2025; pp. 1–6. [Google Scholar] [CrossRef]
  10. Commercial Hydroponic Greenhouse. Available online: https://futuranet.it/prodotto/n-264-maggio-2022/ (accessed on 9 February 2024).
  11. Heltec Wi-Fi LoRa v3. Available online: https://heltec.org/project/wifi-lora-32-v3/ (accessed on 9 February 2024).
  12. Fu, J.; Poletti, M.; Liu, Q.; Iovene, E.; Su, H.; Ferrigno, G.; De Momi, E. Teleoperation control of an underactuated bionic hand: Comparison between wearable and vision-tracking-based methods. Robotics 2022, 11, 61. [Google Scholar] [CrossRef]
  13. Fang, B.; Guo, D.; Sun, F.; Liu, H.; Wu, Y. A robotic hand-arm teleoperation system using human arm/hand with a novel data glove. In Proceedings of the 2015 IEEE International Conference on Robotics and Biomimetics (ROBIO), Zhuhai, China, 6–9 December 2015; pp. 2483–2488. [Google Scholar]
  14. Diftler, M.A.; Culbert, C.; Ambrose, R.O.; Platt, R.; Bluethmann, W. Evolution of the NASA/DARPA robonaut control system. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation (Cat. No. 03CH37422), Taipei, Taiwan, 14–19 September 2003; Volume 2, pp. 2543–2548. [Google Scholar]
  15. Almeida, L.; Lopes, E.; Yalçinkaya, B.; Martins, R.; Lopes, A.; Menezes, P.; Pires, G. Towards natural interaction in immersive reality with a cyber-glove. In Proceedings of the 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), Bari, Italy, 6–9 October 2019; pp. 2653–2658. [Google Scholar]
  16. Saraswathi, D.; Manibharathy, P.; Gokulnath, R.; Sureshkumar, E.; Karthikeyan, K. Automation of hydroponics green house farming using IoT. In Proceedings of the 2018 IEEE International Conference on System, Computation, Automation and Networking (ICSCA), Pondicherry, India, 6–7 July 2018; pp. 1–4. [Google Scholar]
  17. Chen, Y.J.; Chien, H.Y. IoT-based green house system with splunk data analysis. In Proceedings of the 2017 IEEE 8th International Conference on Awareness Science and Technology (iCAST), Taichung, Taiwan, 8–10 November 2017; pp. 260–263. [Google Scholar]
  18. Andrianto, H.; Suhardi; Faizal, A. Development of Smart Greenhouse System for Hydroponic Agriculture. In Proceedings of the 2020 International Conference on Information Technology Systems and Innovation (ICITSI), Padang, Indonesia, 19–23 October 2020; pp. 335–340. [Google Scholar] [CrossRef]
  19. Li, S.L.; Han, Y.; Li, G.; Zhang, M.; Zhang, L.; Ma, Q. Design and implementation of agricultural greenhouse environmental monitoring system based on Internet of Things. Appl. Mech. Mater. 2012, 121, 2624–2629. [Google Scholar] [CrossRef]
  20. Park, J.; Choi, J.H.; Lee, Y.J.; Min, O. A layered features analysis in smart farm environments. In Proceedings of the International Conference on Big Data and Internet of Thing, London, UK, 20–22 December 2017; pp. 169–173. [Google Scholar]
  21. Xu, H.; Wu, J.; Pan, Q.; Guan, X.; Guizani, M. A Survey on Digital Twin for Industrial Internet of Things: Applications, Technologies and Tools. IEEE Commun. Surv. Tutor. 2023, 25, 2569–2598. [Google Scholar] [CrossRef]
  22. Attaran, M.; Çelik, B.G. Digital Twin: Benefits, use cases, challenges, and opportunities. Decis. Anal. J. 2023, 6, 100165. [Google Scholar] [CrossRef]
  23. Matteo, A.D.; Lozzi, D.; Mattei, E.; Mignosi, F.; Montagna, S.; Polsinelli, M.; Placidi, G. Calibration of the Double Digital Twin for the Hand Rehabilitation by the Virtual Glove. In Proceedings of the 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS), Guadalajara, Mexico, 26–28 June 2024. [Google Scholar] [CrossRef]
  24. Yu, T.; Luo, J.; Gong, Y.; Wang, H.; Guo, W.; Yu, H.; Chen, G. A Compact Gesture Sensing Glove for Digital Twin of Hand Motion and Robot Teleoperation. IEEE Trans. Ind. Electron. 2024, 72, 1684–1693. [Google Scholar] [CrossRef]
  25. Audonnet, F.P.; Grizou, J.; Hamilton, A.; Aragon-Camarasa, G. TELESIM: A Modular and Plug-and-Play Framework for Robotic Arm Teleoperation using a Digital Twin. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024. [Google Scholar] [CrossRef]
  26. Bratchikov, S.; Bratchikov, S.; Abdullin, A.; Abdullin, A.; Demidova, G.; Demidova, G.L.; Lukichev, D.V.; Lukichev, D.V. Development of Digital Twin for Robotic Arm. In Proceedings of the 2021 IEEE 19th International Power Electronics and Motion Control Conference (PEMC), Gliwice, Poland, 25–29 April 2021. [Google Scholar] [CrossRef]
  27. Abduljabbar, M.; Gochoo, M.; Sultan, M.T.; Batnasan, G.; Otgonbold, M.E.; Berengueres, J.; Alnajjar, F.; Rasheed, A.S.A.; Alshamsi, A.; Alsaedi, N. A Cloud-Based 3D Digital Twin for Arabic Sign Language Alphabet Using Machine Learning Object Detection Model. In Proceedings of the 2023 15th International Conference on Innovations in Information Technology (IIT), Al Ain, United Arab Emirates, 14–15 November 2023. [Google Scholar] [CrossRef]
  28. Wu, Z.; Yao, Y.; Liang, J.; Jiang, F.; Chen, S.; Zhang, S.; Yan, X. Digital twin-driven 3D position information mutuality and positioning error compensation for robotic arm. IEEE Sens. J. 2023, 22, 27508–27516. [Google Scholar] [CrossRef]
  29. Cheng, H.; Yang, L.; Liu, Z. Survey on 3D hand gesture recognition. IEEE Trans. Circuits Syst. Video Technol. 2015, 26, 1659–1673. [Google Scholar] [CrossRef]
  30. Kurakin, A.; Zhang, Z.; Liu, Z. A real time system for dynamic hand gesture recognition with a depth sensor. In Proceedings of the 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO), Bucharest, Romania, 27–31 August 2012; pp. 1975–1979. [Google Scholar]
  31. Dong, Y.; Liu, J.; Yan, W. Dynamic Hand Gesture Recognition Based on Signals From Specialized Data Glove and Deep Learning Algorithms. IEEE Trans. Instrum. Meas. 2021, 70, 1–14. [Google Scholar] [CrossRef]
  32. Ge, Y.; Li, B.; Yan, W.; Zhao, Y. A real-time gesture prediction system using neural networks and multimodal fusion based on data glove. In Proceedings of the 2018 Tenth International Conference on Advanced Computational Intelligence (ICACI), Xiamen, China, 29–31 March 2018; pp. 625–630. [Google Scholar] [CrossRef]
  33. Matulis, M.; Harvey, C. A robot arm digital twin utilising reinforcement learning. Comput. Graph. 2021, 95, 106–114. [Google Scholar] [CrossRef]
  34. Lee, D.; Lee, S.; Masoud, N.; Krishnan, M.; Li, V.C. Digital twin-driven deep reinforcement learning for adaptive task allocation in robotic construction. Adv. Eng. Informat. 2022, 53, 101710. [Google Scholar] [CrossRef]
  35. Khdoudi, A.; Masrour, T.; El Hassani, I.; El Mazgualdi, C. A deep-reinforcement-learning-based digital twin for manufacturing process optimization. Systems 2024, 12, 38. [Google Scholar] [CrossRef]
  36. Bua, C.; Adami, D.; Giordano, S. GymHydro: An Innovative Modular Small-Scale Smart Agriculture System for Hydroponic Greenhouses. Electronics 2024, 13, 1366. [Google Scholar] [CrossRef]
  37. Wearable Glove. Available online: https://www.hiwonder.com/products/wireless-glove-open-source-somatosensory-mechanical-glove (accessed on 9 February 2024).
  38. Robotic Arm. Available online: https://futuranet.it/prodotto/braccio-robotico-6dof-con-pinza-e-servi-rc/ (accessed on 9 February 2024).
  39. Octopus Shield. Available online: https://fishino.it/home-it.html (accessed on 9 February 2024).
  40. Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016, arXiv:1606.01540. [Google Scholar]
  41. Kušić, K.; Schumann, R.; Ivanjko, E. A digital twin in transportation: Real-time synergy of traffic data streams and simulation for virtualizing motorway dynamics. Adv. Eng. Inf. 2023, 55, 101858. [Google Scholar] [CrossRef]
  42. Vaezi, M.; Noroozi, K.; Todd, T.D.; Zhao, D.; Karakostas, G. Digital Twin Placement for Minimum Application Request Delay With Data Age Targets. IEEE Internet Things J. 2023, 10, 11547–11557. [Google Scholar] [CrossRef]
  43. Ravi, K.S.D.; Ng, M.S.; Medina Ibáñez, J.; Hall, D.M. Real-time Digital Twin of Robotic construction processes in Mixed Reality. In Proceedings of the 38th International Symposium on Automation and Robotics in Construction (ISARC), Dubai, United Arab Emirates, 2–5 November 2021; pp. 451–458. [Google Scholar] [CrossRef]
  44. Jiang, Y.; Li, M.; Li, M.; Liu, X.; Zhong, R.Y.; Pan, W.; Huang, G.Q. Digital twin-enabled real-time synchronization for planning, scheduling, and execution in precast on-site assembly. Autom. Constr. 2022, 141, 104397. [Google Scholar] [CrossRef]
  45. Li, J.; Zhang, Y.; Qian, C. The enhanced resource modeling and real-time transmission technologies for Digital Twin based on QoS considerations. Robot. Comput.-Integr. Manuf. 2022, 75, 102284. [Google Scholar] [CrossRef]
Figure 1. The Sensor-equipped Wearable Glove.
Figure 2. The 6-degree-of-freedom Robotic Arm.
Figure 3. System architecture.
Figure 4. Digital Twin Framework.
Figure 5. Hand Exoskeleton.
Figure 6. First and Second Robotic Arms.
Figure 7. Delays introduced by the wearable glove.
Figure 8. Digital Twin Delay Contributions.
Figure 9. Average One-Way Network Latency.
Figure 10. Hand components.
Figure 11. Edge Computing Network One-Way Latency Trend over Time.
Figure 12. Real Implementation: (a) and (b) show the 0-degree and 45-degree example positions, respectively.
Table 1. Average Total Service Delay.

Delay (ms)                  Edge Computing    Cloud Computing
SWG Total Delay                  12.33             12.33
Network Latency                  20.74             70.68
Digital Twin Total Delay         34.69             34.69
Total Delay                      67.76            117.70
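The totals in Table 1 are the sums of the three delay components for each deployment. A minimal sketch, using only the values reported in Table 1, that verifies this arithmetic:

```python
# Delay components from Table 1, in milliseconds.
# The total service delay for each deployment is the sum of the
# SWG, network, and Digital Twin contributions.
components = {
    "Edge Computing": {"SWG": 12.33, "Network": 20.74, "Digital Twin": 34.69},
    "Cloud Computing": {"SWG": 12.33, "Network": 70.68, "Digital Twin": 34.69},
}

totals = {name: round(sum(parts.values()), 2)
          for name, parts in components.items()}
print(totals)  # Edge: 67.76 ms, Cloud: 117.70 ms
```

Note that only the network latency differs between the two deployments, so the roughly 50 ms gap in total delay is attributable entirely to the edge-versus-cloud placement.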
Table 2. Mean Squared Error for each hand component.

Metric            MSE Value
Overall MSE        0.014206
Thumb              0.017617
Index Finger       0.015549
Middle Finger      0.009180
Ring Finger        0.010753
Little Finger      0.013323
PalmX              0.017937
PalmY              0.015083
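The overall MSE in Table 2 is consistent with the unweighted mean of the seven per-component values. The check below is a sketch using only the tabulated numbers; it assumes equal weighting of components, which may differ from the per-sample computation used in the paper:

```python
# Per-component MSE values as reported in Table 2.
mse = {
    "Thumb": 0.017617,
    "Index Finger": 0.015549,
    "Middle Finger": 0.009180,
    "Ring Finger": 0.010753,
    "Little Finger": 0.013323,
    "PalmX": 0.017937,
    "PalmY": 0.015083,
}

# The reported overall MSE matches the simple average of the components.
overall = round(sum(mse.values()) / len(mse), 6)
print(overall)  # 0.014206
```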
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bua, C.; Borgianni, L.; Adami, D.; Giordano, S. Reinforcement Learning-Driven Digital Twin for Zero-Delay Communication in Smart Greenhouse Robotics. Agriculture 2025, 15, 1290. https://doi.org/10.3390/agriculture15121290

