Article

GRU–Transformer Hybrid Model for GNSS/INS Integration in Orchard Environments

1 College of Electronic Engineering (College of Artificial Intelligence), South China Agricultural University, Guangzhou 510642, China
2 Division of Citrus Machinery, China Agriculture Research System of MOF and MARA, Guangzhou 510642, China
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(11), 1135; https://doi.org/10.3390/agriculture15111135
Submission received: 29 April 2025 / Revised: 22 May 2025 / Accepted: 23 May 2025 / Published: 24 May 2025
(This article belongs to the Special Issue Agricultural Machinery and Technology for Fruit Orchard Management)

Abstract

Precision positioning in orchards relies on Global Navigation Satellite System and Inertial Navigation System (GNSS/INS) integration. However, dense foliage often causes GNSS blockages, degrading accuracy and robustness. This paper proposes an optimized GNSS/INS integrated navigation method based on a hybrid Gated Recurrent Unit (GRU)–Transformer model (GRU-T). The GRU–Transformer hybrid dynamically adjusts the process noise covariance matrix within an error-state Extended Kalman Filter (ES-EKF) framework to address non-stationary noise and signal outages. Forest field tests demonstrate that GRU-T significantly improves positioning accuracy. Compared with the conventional ES-EKF, the proposed method achieves reductions in position root mean square error (PRMSE) of 48.74% (East), 41.94% (North), and 61.59% (Up), and reductions in velocity root mean square error (VRMSE) of 71.5% (East), 39.31% (North), and 56.48% (Up) in the East–North–Up (ENU) coordinate frame. The GRU-T model effectively captures both short- and long-term temporal dependencies and meets real-time, high-frequency sampling requirements. These results indicate that the GRU–Transformer hybrid model enhances the accuracy and robustness of GNSS/INS navigation in complex orchard environments, offering technical support for high-precision positioning in intelligent agricultural machinery systems.

1. Introduction

With the continuous advancement of precision agriculture and intelligent agricultural machinery technology, the demand for high-precision positioning in fields such as automated seeding and unmanned spraying is constantly increasing. In these application scenarios, agricultural machinery needs to obtain accurate location information to ensure the smooth operation of automation. However, most orchard environments often have dense crops, tall vegetation, and complex terrain, which frequently cause Global Navigation Satellite System (GNSS) signals to be obstructed, thus affecting positioning accuracy [1]. Although GNSSs can provide high-precision geographic location information, their limitations under signal loss or obstruction remain a major bottleneck for efficient operation of intelligent agricultural machinery. In complex environments like forests, the dense trees and shifting vegetation make GNSS signals prone to blocking or reflection, severely affecting positioning accuracy. Furthermore, the GNSS update frequency decreases in such complex environments, and once the signal is interrupted, the navigation system may fail to operate properly [2]. To address this issue, many researchers have proposed sensor fusion solutions. Jiang et al. proposed integrating Light Detection And Ranging (LiDAR) sensor data, which improved the accuracy of the traditional Global Navigation Satellite System and Inertial Navigation System (GNSS/INS) integration system by 70.4% [3], effectively compensating for the deficiencies of single sensors and improving the robustness and accuracy of navigation systems.
However, traditional visual navigation methods face similar challenges. In complex farmland or densely planted environments, due to factors such as lighting changes and plant obstruction, visual navigation often fails to provide sufficient positioning accuracy, which presents significant challenges for vision-based autonomous navigation systems [4]. Especially in forest environments, due to ground and canopy obstructions, the effectiveness of visual information is greatly reduced, making image processing and target recognition difficult [5]. Therefore, achieving efficient and precise positioning in complex environments where GNSS signals are lost has become a key issue in the development of intelligent agricultural machinery technology.
Inertial Navigation Systems (INSs) provide an effective auxiliary positioning technology that can offer continuous navigation capabilities in the absence of GNSS signals [6]. Unlike traditional GNSSs, INSs do not rely on external signals but estimate position and attitude through embedded accelerometers and gyroscopes. However, the accuracy of the inertial measurement unit (IMU) is affected by accumulated errors over time, especially during long-duration navigation. Liu et al. confirmed that IMU errors grow linearly and without bounds over time, with displacement errors reaching several tens of meters within 10 s [7]. Yang et al. found that accumulated errors make the static process noise model ineffective in complex dynamic environments [8].
Currently, integrated GNSS and INS navigation systems have become the mainstream positioning technology. By combining the high-precision positioning of GNSSs with the continuous navigation capability of INSs, efficient positioning can be achieved in various complex environments [9]. However, GNSS signals are often lost due to multipath effects, building obstructions, extreme weather, and other factors [10], and INS may accumulate errors over long-term use, leading to a decline in positioning accuracy [11]. Additionally, in relevant environments, GNSS signals are often disturbed by dense vegetation, while INS systems are prone to error accumulation in the absence of external update signals, resulting in a significant decrease in location accuracy [12].
To mitigate the impact of GNSS signal loss, numerous studies have proposed filtering methods based on the fusion of INS and GNSS, such as Kalman Filtering (KF), Unscented Kalman Filtering (UKF), and Extended Kalman Filtering (EKF). Kaviani et al. compared the attitude estimation errors of KF, Particle Filter (PF), EKF, and UKF methods for inertial sensor error identification in INS/GNSS integration [13]. Hu et al. introduced the MP-UKF method, which utilizes a model-predictive filter to establish a dynamic model for error estimation, improving the adaptive capability of UKF and reducing position errors by 2–5 m [14]. However, these filtering methods still encounter issues with low accuracy and poor convergence when handling highly nonlinear dynamic systems. Therefore, in response to the demand for high precision and reliability, the focus of research has shifted to effectively fusing sensor data to enhance positioning accuracy and overcome the limitations of individual systems, particularly in complex, densely vegetated forest environments.
The impact of process noise on position updates in KF is significant, and optimizing noise models in agricultural dynamic environments has become a key research direction in smart agricultural machinery navigation technology. To address this challenge, researchers have proposed various Adaptive Kalman Filtering (AKF) methods and deep learning fusion approaches. These methods dynamically adjust process noise parameters, improving system adaptability to environmental changes, especially in environments where GNSS signals are hindered or heavily interfered with. With the advancement of deep learning technologies, models like Long Short-Term Memory networks (LSTM), Gated Recurrent Units (GRUs) [15], and Transformer [16] models have shown remarkable abilities in handling time-series data. In INS and GNSS integrated navigation systems, these models have become crucial tools for enhancing positioning accuracy and robustness.
Cohen et al. proposed the A-KIT (Adaptive Kalman-Informed Transformer) framework, which introduces a Set Transformer network structure to dynamically regress the process noise covariance matrix, significantly improving positioning accuracy in INS fusion systems [16]. Fang et al. proposed an LSTM-based INS method to enhance integrated navigation performance in GNSS signal loss environments, achieving a 95% improvement in positioning accuracy over pure INS navigation during GNSS outages [17]. Additionally, Wang et al. introduced the Online Semi-Supervised Transformer model, which maintains positioning accuracy to some extent during GNSS signal disruptions, though with a maximum error of 9.81 m [18]. In agriculture, Zheng et al. used a hybrid GRU–Transformer model to enhance soil moisture prediction accuracy, demonstrating its clear advantages for short-term forecasting. The strength of this fusion lies in combining the GRU’s efficiency in handling time-series data and capturing local dependencies with the Transformer’s self-attention mechanism for modeling global context, thereby achieving higher prediction accuracy and robustness in complex tasks [19]. These studies demonstrate that deep learning models, especially LSTM, GRUs, and Transformer, can effectively model the temporal features and complex relationships between sensor data in INS/GNSS integrated navigation systems, offering novel solutions to positioning challenges caused by GNSS signal loss or obstruction.
This paper proposes a hybrid model based on a Gated Recurrent Unit (GRU) and Transformer architecture (hereafter referred to as GRU-T for convenience) to optimize EKF performance in complex environments such as forests. Compared to traditional EKF methods, the key contribution of this work lies in the introduction of deep learning models, GRU and Transformer, to dynamically correct process noise, reducing error accumulation caused by long-term dependencies and nonlinearity. By combining GRUs and Transformer, the model can simultaneously capture both short-term and long-term dependencies in temporal sequences, significantly improving the accuracy and robustness of GNSS/INS integrated navigation systems. The main contributions of this paper can be summarized as follows:
1. A noise correction method based on the combination of a GRU and Transformer is proposed to address the inadequacies of traditional EKF assumptions on process noise in dynamic environments, as illustrated in Figure 1;
2. The method’s effectiveness in complex environments, particularly in cases of GNSS signal blockage and error accumulation, is experimentally validated, demonstrating significant improvements in positioning accuracy;
3. The approach strikes a good balance between real-time performance and computational efficiency, meeting the demands of high-frequency sampling in real-time applications.

2. Materials and Methods

2.1. EKF Algorithm for Integrated Navigation

The Extended Kalman Filter (EKF) is a widely used classical method for state estimation in nonlinear systems, especially in integrated navigation tasks involving Inertial Navigation Systems (INSs) and Global Navigation Satellite Systems (GNSSs) [20]. Due to the drift errors in the inertial measurement unit (IMU), errors accumulate over time when using IMU navigation alone. The EKF, by fusing observation data from external sensors such as GNSS and DVL, can correct navigation errors in real time, improving the positioning accuracy and stability of the navigation system [21].

2.1.1. INS/GNSS Kinematic Model

In this study, the East–North–Up (E-N-U) coordinate system is used as the reference coordinate system for inertial navigation (n-frame). The specific force f measured by the IMU in the body frame (b-frame) is converted to the n-frame, enabling velocity and position calculations [22]. The velocity update equation is:
$$\dot{v}^n = C_b^n f^b - \left(2\omega_{ie}^n + \omega_{en}^n\right) \times v^n + g^n$$
where $v^n$ represents the velocity vector in the navigation frame; $g^n$ is the local gravity vector; and $C_b^n$ is the direction cosine matrix from the body frame to the navigation frame, which is defined as:
$$C_b^n = \begin{bmatrix} \cos\theta\cos\psi & -\cos\phi\sin\psi + \sin\phi\sin\theta\cos\psi & \sin\phi\sin\psi + \cos\phi\sin\theta\cos\psi \\ \cos\theta\sin\psi & \cos\phi\cos\psi + \sin\phi\sin\theta\sin\psi & -\sin\phi\cos\psi + \cos\phi\sin\theta\sin\psi \\ -\sin\theta & \sin\phi\cos\theta & \cos\phi\cos\theta \end{bmatrix}$$
where $\psi$, $\theta$, and $\phi$ denote yaw, pitch, and roll, respectively.
In the n-frame, the specific force can be represented as $f^n = [f_E \;\; f_N \;\; f_U]^T$, where $f_E$, $f_N$, and $f_U$ are the specific force components in the East, North, and Up directions. $\omega_{ie}$ denotes the Earth's rotation rate, with a magnitude of $7.29 \times 10^{-5}$ rad/s. The transport rate vector $\omega_{en}^n$ can then be expressed as:
$$\omega_{en}^n = \begin{bmatrix} -\dfrac{v_N}{R_M+h} & \dfrac{v_E}{R_N+h} & \dfrac{v_E\tan\varphi}{R_N+h} \end{bmatrix}^T$$
where $\varphi$ represents latitude, and $v_E$ and $v_N$ denote the eastward and northward velocities, respectively. $R_M$ and $R_N$ represent the radii of curvature of the meridian and the prime vertical, respectively. In the geodetic coordinate system, the position prediction equation for the INS is as follows:
$$\begin{bmatrix} \dot{\varphi} \\ \dot{\lambda} \\ \dot{h} \end{bmatrix} = \begin{bmatrix} 0 & \dfrac{1}{R_M+h} & 0 \\ \dfrac{1}{(R_N+h)\cos\varphi} & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} v_E \\ v_N \\ v_U \end{bmatrix}$$
where $(v_E \;\; v_N \;\; v_U)$ are the velocity components in the East, North, and Up directions, and $(\varphi \;\; \lambda \;\; h)$ are the latitude, longitude, and altitude. Using the above equations, the attitude, velocity, and position of the moving carrier can be determined.
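As a concrete illustration of the position prediction equation above, the following minimal Python sketch propagates latitude, longitude, and altitude rates from ENU velocities; the WGS-84 constants and the function name are assumptions added for this example and are not part of the original implementation.

```python
import numpy as np

def llh_rates(v_enu, lat, h, a=6378137.0, e2=6.69437999e-3):
    """Geodetic position rates from ENU velocity (illustrative sketch).

    v_enu: [v_E, v_N, v_U] in m/s; lat: latitude in radians; h: altitude in metres.
    a, e2: assumed WGS-84 semi-major axis and first eccentricity squared.
    """
    v_E, v_N, v_U = v_enu
    sin_lat = np.sin(lat)
    R_N = a / np.sqrt(1.0 - e2 * sin_lat**2)              # prime-vertical radius
    R_M = a * (1.0 - e2) / (1.0 - e2 * sin_lat**2)**1.5   # meridian radius
    lat_dot = v_N / (R_M + h)
    lon_dot = v_E / ((R_N + h) * np.cos(lat))
    h_dot = v_U
    return lat_dot, lon_dot, h_dot
```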

2.1.2. Conventional Error-State Extended Kalman Filter

To address the nonlinear problem in the system, this paper uses the error-state Extended Kalman Filter (ES-EKF) for the integrated navigation solution of INS/GNSS (for convenience, EKF is used to represent ES-EKF hereafter). EKF is an indirect filtering method, and its core is to estimate the navigation state and sensor errors by introducing an error-state vector [23]. The state vector is defined as follows:
$$\delta x(t) = \left[\, (\delta r^n)^T \;\; (\delta v^n)^T \;\; \Phi^T \;\; b_a^T \;\; b_g^T \;\; s_a^T \;\; s_g^T \,\right]^T .$$
In this equation, the components of the vector are functions of time, and for convenience, the time variable is omitted. δ r n represents the position error vector of the inertial navigation system (INS); δ v n represents the velocity error vector of the INS; Φ represents the attitude error vector; b a represents the three-axis accelerometer bias vector; b g represents the three-axis gyroscope bias vector; s a represents the accelerometer scale factor error vector; and s g represents the gyroscope scale factor error vector. Therefore, the following can be derived:
$$x_t = x_e + \delta x$$
where x t represents the true state; x e represents the estimated state. Thus, the system equation for the state vector residuals can be expressed as:
$$\delta \dot{x}(t) = F(t)\,\delta x(t) + G(t)\,w(t)$$
where F ( t ) represents the system matrix; G ( t ) represents the system noise distribution matrix; and w ( t ) represents the system noise vector. The Kalman Filter process typically consists of two distinct phases: the prediction and update steps.
During the prediction step, the error analysis assumptions for the EKF are as follows:
$$\delta x(t) = 0 .$$
Based on these assumptions, the original equation can be linearized, and Equation (7) can be discretized:
$$\delta x_k = \Phi_{k/k-1}\,\delta x_{k-1} + w_{k-1}$$
where $\Phi_{k/k-1}$ is the state transition matrix from $t_{k-1}$ to $t_k$, and $w_{k-1}$ is the equivalent discrete-time driving white noise. This allows the prediction of the state and its covariance to be written as follows:
$$\hat{x}_{k/k-1} = \Phi_{k/k-1}\,\hat{x}_{k-1}$$
$$P_{k/k-1} = \Phi_{k-1} P_{k-1} \Phi_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T$$
where $\hat{x}_{k/k-1}$ represents the predicted state vector at $t_k$; $\hat{x}_{k-1}$ represents the optimal estimate at $t_{k-1}$; $P_{k-1}$ represents the covariance matrix of the optimal estimate at $t_{k-1}$; $\Phi_{k-1}$ represents the one-step state transition matrix; $\Gamma_{k-1}$ represents the system noise driving matrix; and $Q_{k-1}$ represents the system process noise covariance matrix.
Let the discretization interval be $\Delta t = t_k - t_{k-1}$. When the system matrix varies little over the short integration interval $[t_{k-1}, t_k]$, it can be assumed that $F(t_{k-1})\Delta t \ll I$, where $I$ is the identity matrix, and the state transition matrix is computed as follows:
$$\Phi_{k/k-1} \approx I + F(t_{k-1})\,\Delta t .$$
Similarly, when there is little change within the short integration interval, Q k can be simplified to trapezoidal integration:
$$Q_k \approx \frac{1}{2}\left[\Phi_{k/k-1}\,G(t_{k-1})\,q(t_{k-1})\,G^T(t_{k-1})\,\Phi_{k/k-1}^T + G(t_k)\,q(t_k)\,G^T(t_k)\right]\Delta t$$
where $q(t)$ represents the continuous-time system noise intensity matrix, which is generally a constant matrix determined by the IMU sensor error parameters. The update step is as follows:
$$K_k = P_{k/k-1} H_k^T \left(H_k P_{k/k-1} H_k^T + R_k\right)^{-1}$$
$$\hat{x}_k = \hat{x}_{k/k-1} + K_k\left(z_k - H_k \hat{x}_{k/k-1}\right)$$
$$\delta\hat{x}_k = K_k\,\delta z_k$$
$$P_k = \left(I - K_k H_k\right) P_{k/k-1} \left(I - K_k H_k\right)^T + K_k R_k K_k^T$$
where $K_k$ represents the Kalman gain matrix at $t_k$, which determines the weighting between the measurement and the prediction computed by the system model; $H_k$ represents the observation matrix at $t_k$; $z_k$ represents the observation vector at $t_k$; and $R_k$ represents the measurement noise covariance matrix at $t_k$. The observation matrix is defined as follows:
$$H_k = \left.\frac{\partial h(x)}{\partial x}\right|_{x = x_e}$$
where $h(x)$ is the nonlinear measurement function of the state vector.
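To make the prediction and update steps concrete, the sketch below implements them with NumPy under simplifying assumptions (the noise driving matrix $\Gamma$ is absorbed into $G$, and function names are illustrative); it is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def ekf_predict(x, P, F, G, q, dt):
    """Prediction step: first-order state transition (Eq. (12)) and trapezoidal
    process noise discretisation (Eq. (13)), then state/covariance propagation."""
    n = x.size
    Phi = np.eye(n) + F * dt                                    # Eq. (12)
    Qd = 0.5 * (Phi @ G @ q @ G.T @ Phi.T + G @ q @ G.T) * dt   # Eq. (13)
    return Phi @ x, Phi @ P @ Phi.T + Qd

def ekf_update(x_pred, P_pred, z, H, R):
    """Update step (Eqs. (14)-(18)) with the Joseph-form covariance update."""
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)            # state correction
    I_KH = np.eye(x_pred.size) - K @ H
    P = I_KH @ P_pred @ I_KH.T + K @ R @ K.T     # covariance update
    return x, P
```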

2.1.3. Adaptive Extended Kalman Filter

In the traditional ES-EKF, the process noise covariance matrix (Q) and the observation noise covariance matrix (R) are set in the initial filtering stage from the model and measurement noise parameters. Performance is therefore limited by the accuracy of this prior knowledge of Q and R, and in practical applications the noise characteristics change with the dynamic environment, which can cause the filter to diverge. Moreover, numerous studies have shown that adaptive filtering often outperforms traditional methods in complex and dynamic environments. Ding et al. proposed an Adaptive Extended Kalman Filtering (AEKF) method based on covariance matching [24]. The core of this method is to use the statistical properties of the innovation sequence to correct the noise model in real time, where the innovation sequence is defined as the difference between the observed value and the predicted value:
$$d_k = \delta z_k - H_k \hat{x}_{k/k-1} .$$
Next, the actual innovation covariance $\hat{C}_k$ is calculated over a sliding window of length $N$, defined as:
$$\hat{C}_k = \frac{1}{N}\sum_{i=i_0}^{k} \delta z_i\,\delta z_i^T .$$
This is then compared with the theoretical value, and a scaling factor is proposed:
$$\alpha = \frac{\mathrm{trace}\left[H_k\left(\Phi_{k-1}\hat{P}_{k-1}\Phi_{k-1}^T + \hat{Q}_{k-1}\right)H_k^T\right]}{\mathrm{trace}\left[H_k\left(\Phi_{k-1}\hat{P}_{k-1}\Phi_{k-1}^T + Q_{k-1}\right)H_k^T\right]} .$$
This factor reflects the deviation between the actual noise and the model-predicted noise. Through a smoothing feedback mechanism, the Q matrix is dynamically adjusted, and the mechanism is as follows:
$$\hat{Q}_k = \hat{Q}_{k-1}\,\alpha .$$
This method avoids the sensitivity problem of traditional random modeling methods to short data windows.
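A simplified sketch of this covariance-matching idea is given below: a window of innovations drives a single scaling factor applied to the process noise. The trace ratio here compares the windowed innovation covariance with its theoretical counterpart, which is a common simplification of the scheme described above; the class and variable names are assumptions introduced for illustration.

```python
import numpy as np
from collections import deque

class CovarianceMatchingQ:
    """Sliding-window innovation statistics drive a scalar factor alpha
    that rescales the process noise covariance (simplified sketch)."""

    def __init__(self, Q0, window=30):
        self.Q = Q0.copy()
        self.window = deque(maxlen=window)

    def step(self, d_k, H, Phi, P_prev, Q_nominal):
        self.window.append(d_k)
        # actual innovation covariance estimated from the window
        C_hat = np.mean([d[:, None] @ d[None, :] for d in self.window], axis=0)
        # theoretical innovation covariance using the nominal process noise
        C_theo = H @ (Phi @ P_prev @ Phi.T + Q_nominal) @ H.T
        alpha = np.trace(C_hat) / max(np.trace(C_theo), 1e-12)
        self.Q = self.Q * alpha           # smoothed feedback on Q
        return self.Q
```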

2.2. Methodology

Numerous studies have proven that system state process noise has a significant impact on the performance of the EKF [25]. However, in the traditional EKF filtering framework, the process noise covariance is typically treated as a constant. This setting can achieve a certain degree of effectiveness in general indoor or outdoor open environments, but in scenarios with significant obstruction and dynamic noise variation, such as forests, the assumption of fixed or simply approximated process noise covariance severely limits adaptability in time-varying and uncertain environments [26]. This paper proposes incorporating a deep learning module combining a GRU and Transformer into the EKF process noise estimation. By correcting the process noise, this method alleviates the error accumulation of vehicle trajectory navigation in complex environments such as forests. The process noise correction factor $q_k^{adjust}$, obtained from the trained GRU-T model, adjusts the process noise as follows:
$$Q_k = \hat{Q}_k\, q_k^{adjust}$$
where $q_k^{adjust} \in \mathbb{R}^{21 \times 21}$ is the diagonal correction matrix generated by the model. By optimizing $Q_k$ during trajectory prediction, the ability of the EKF to handle long-term dependencies in trajectory sequences is enhanced. Predicting only a diagonal matrix simplifies the model's output, so a large amount of training data is not required. Furthermore, because the network predicts correction factors rather than the noise itself, the process noise obtained from physical modeling is retained, reducing dependence on the neural network model.
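A minimal sketch of this correction step is shown below; the clamping to a small positive floor stands in for the positivity constraint and is an assumption of this example rather than the paper's exact mechanism.

```python
import numpy as np

def apply_noise_correction(Q_hat, q_adjust_diag, eps=1e-9):
    """Scale the propagated process noise covariance with per-state
    correction factors predicted by the network (Eq. (19)-style correction).

    Q_hat:         (21, 21) propagated process noise covariance.
    q_adjust_diag: (21,) correction factors; clamped to stay positive.
    """
    q_adjust = np.diag(np.maximum(q_adjust_diag, eps))   # diagonal correction matrix
    return Q_hat @ q_adjust
```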

2.2.1. GRU-T Encoder Layer

GRU is a classic architecture of recurrent neural networks (RNNs), widely used for time series modeling [27]. Its core advantage lies in its ability to efficiently capture both long-term and short-term dependencies in time series through the collaborative functioning of the update and reset gates, while effectively avoiding the vanishing gradient problem.
In contrast to the EKF, which struggles to capture the strong temporal dependencies in inertial navigation data due to its linear assumptions, the GRU leverages its gating mechanism to dynamically regulate information flow, effectively extracting temporal features and learning complex dependencies in time series while maintaining high computational efficiency. Furthermore, on small datasets, the performance of the GRU is comparable to that of the LSTM model [28]. Additionally, research by Kuang et al. in the field of INS/GNSS navigation further confirms that GRU and LSTM exhibit similar accuracy in this application scenario [29]. However, a significant advantage of the GRU model lies in its reduced number of parameters, a characteristic that not only lowers computational resource demands but also enhances the efficiency of real-time processing. Therefore, based on these advantages, this study opts to implement the GRU model. The performance of traditional EKF algorithms is limited when dealing with complex nonlinearities, while the Transformer model, through its self-attention mechanism, can model global long-range dependencies between time steps [30], significantly improving the ability to model nonlinear systems. By combining a GRU with Transformer, the GRU efficiently handles short-term dependencies, while Transformer captures global features through its self-attention mechanism. Together, they enhance the model’s ability to model spatiotemporal relationships.
In this paper, we preprocess the input IMU’s angular increments and velocity increments, as well as the attitude and position data solved by the EKF. The reset gate and the update gate are used to extract the temporal dependencies in the data. The specific computation process is as follows:
$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right)$$
$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right)$$
$$\tilde{h}_t = \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]\right)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
where $z_t$ is the output of the update gate; $\sigma$ is the sigmoid activation function; $W_z$, $W_r$, and $W$ denote the corresponding weight matrices; $r_t$ is the output of the reset gate; $h_{t-1}$ is the hidden state of the previous time step; $x_t$ is the input at the current time step; and $\odot$ denotes the Hadamard product.
By coupling the encoder of GRU-T, which combines the GRU’s ability to capture temporal features in the data with Transformer’s processing capabilities [31], this approach enables a more comprehensive observation of the data dependencies in the target matrix generation process, thus reducing training time. The self-attention mechanism decouples attention into query, key, and value tuples, computing the relationships between different time data to identify complex temporal dependencies. The attention mechanism is defined by the following formula:
$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^T}{\sqrt{d_q}}\right)V$$
where $d_q$ is the dimensionality of the query vector; scaling by $\sqrt{d_q}$ keeps the softmax arguments in a numerically stable range and helps avoid vanishing gradients. Since a single head can only attend to local information features, a multi-head attention mechanism is used in the Transformer [32]. In this model, the query vector ($Q$), key vector ($K$), and value vector ($V$) are projected onto $h$ different heads. The multi-head attention mechanism is defined as:
$$\mathrm{MultiHead}(Q, K, V) = f_o\left(\left[\mathrm{head}_i\right]_{i=1}^{h}\right)$$
where f o represents a concatenation layer that integrates the outputs of the h heads. Thus, the coupling process of the GRU and the encoder is shown in Figure 2.
By introducing the multi-head self-attention mechanism, the limitation of the GRU model in capturing interdependencies between sequential data is addressed, overcoming the deficiency in capturing distance-related correlations between data.
Therefore, the design approach in this paper is to first input accelerometer data, gyroscope data, and the attitude and velocity obtained from the EKF solution into the GRU network, and then pass them to the Transformer’s encoder module. By leveraging both the temporal and spatial features of the data, the model’s ability to capture relationships within the data is enhanced, improving the accuracy of the prediction results.
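The following PyTorch sketch illustrates this design, with per-sensor GRUs feeding a Transformer encoder layer; the layer sizes, number of streams, and class name are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GRUTEncoder(nn.Module):
    """Sketch of the GRU-T encoder: per-sensor GRUs extract temporal features,
    which are concatenated and passed through a Transformer encoder layer."""

    def __init__(self, n_sensors=3, in_dim=3, hidden=64, n_heads=4):
        super().__init__()
        # one GRU per sensor stream (e.g. gyro increments, accel increments, EKF velocity)
        self.grus = nn.ModuleList(
            [nn.GRU(in_dim, hidden, num_layers=2, batch_first=True) for _ in range(n_sensors)]
        )
        d_model = hidden * n_sensors
        self.encoder = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )

    def forward(self, streams):
        # streams: list of (batch, seq_len, in_dim) tensors, one per sensor
        feats = [gru(x)[0] for gru, x in zip(self.grus, streams)]   # temporal features
        fused = torch.cat(feats, dim=-1)    # joint spatio-temporal feature tensor
        return self.encoder(fused)          # self-attention over time steps
```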

2.2.2. The Core of the Model

The GRU-T proposed in this paper effectively captures spatiotemporal relationships in the data, reduces computational costs, and thereby enhances computational efficiency. Additionally, incorporating positional encoding in the decoder allows for the preservation of essential information while retaining the original key vectors, preventing any loss of information.
As shown in Figure 2, the GRU-T architecture proposed in this study is an integrated Transformer specifically designed to process inertial data and velocity time-series data. The structure estimates the correction of $Q_k$ from small batches of data, including angular increments and velocity increments from the IMU, as well as uncorrected EKF-estimated velocities. After normalization and timestamp alignment preprocessing, the data are sequentially input into the GRU-T encoder and decoder structure, where features are captured simultaneously in both the temporal and channel domains. After processing by the GRU-T encoder, the data enter the decoder model. To enhance the performance of the decoder, we introduce supplementary positional encoding, providing necessary positional information [33]. A dropout layer is also added to prevent overfitting and improve the model’s generalization ability by randomly deactivating a portion of the neurons.
The output part addresses the dynamic correction needs of the process noise, set as one or more diagonal matrix correction factors. The network, through the final linear layer and activation function, maps the high-level features extracted into several trainable correction values, which are used to adjust the diagonal elements of the EKF process noise covariance matrix. Strict non-negative or softmax constraints are applied to ensure the positive definiteness of the corrected matrix, avoiding numerical issues caused by improper output ranges. The training objective of the model is to minimize the position loss function. The network continuously adjusts the correction factors so that the filtered position estimates progressively approach the true values, thereby reducing the overall EKF cumulative error. The complete architecture is shown in Figure 3.
This architecture first receives multi-source sensor inputs (dimension 1 × 3 × 200 ), using independent GRU layers to extract temporal dependency features from each sensor, and constructs a joint spatiotemporal feature tensor through channel concatenation. The decoder applies masked multi-head attention to constrain causal dependencies between time steps and utilizes a feedforward neural network to enhance the nonlinear representation capacity. Ultimately, the model compresses the features along the time dimension into a static vector, which is expanded to 21 dimensions through a linear layer and mapped to a diagonal process noise covariance matrix, constrained by a Softmax function to ensure the matrix’s positive definiteness.
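The output stage can be sketched as follows; Softplus is used here as one possible positivity constraint in place of the softmax/non-negativity constraint described above, and the dimensions and class name are assumptions for illustration.

```python
import torch
import torch.nn as nn

class NoiseCorrectionHead(nn.Module):
    """Sketch of the output stage: pool decoder features over time, map them to
    21 values, and keep them positive so the corrected process noise covariance
    stays positive definite."""

    def __init__(self, d_model=192, n_states=21):
        super().__init__()
        self.proj = nn.Linear(d_model, n_states)
        self.dropout = nn.Dropout(0.2)

    def forward(self, dec_out):
        # dec_out: (batch, seq_len, d_model) -> pool over the time dimension
        pooled = dec_out.mean(dim=1)
        q_adjust = nn.functional.softplus(self.proj(self.dropout(pooled)))
        return torch.diag_embed(q_adjust)   # (batch, 21, 21) diagonal correction matrices
```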
Algorithm 1 summarizes the EKF integrated with the correction factor $q_k^{adjust}$ predicted by the GRU-T.
Algorithm 1 GRU-T over EKF
1: Input: $\hat{f}^b$, $\hat{\omega}^b$, Windowsize, $Q$, $R$
2: Initialize: $\delta x^n$, $\delta b_a$, $\delta b_g$, $P$
3: for $i \leftarrow 1$ to $T$ do
4:  $\hat{f}^b \leftarrow \hat{f}^b - \delta b_a$
5:  $\hat{\omega}^b \leftarrow \hat{\omega}^b - \delta b_g$
6:  Calculate $\Phi_{k|k-1}$ (12)
7:  Perform the prediction step (11)
8:  INS processing
9:  if prediction is valid then
10:   Perform the update step (14)–(18)
11:   Get GNSS update
12:  end if
13:  if update size ≥ Windowsize then
14:   $\hat{Q}_k \leftarrow$ INSPropagate (13)
15:   $q_k^{adjust} \leftarrow$ GRU-T($\hat{f}^b$, $\hat{\omega}^b$, $x_{EKF}^n$)
16:   $Q_k \leftarrow \hat{Q}_k \cdot q_k^{adjust}$ (19)
17:  end if
18:  $x^n \leftarrow$ ErrorFeedback($K$, $Q_k$, $x_{EKF}^n$)
19: end for
20: Output: $x^n$
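Expressed in Python, the control flow of Algorithm 1 might look like the sketch below; the `ekf` and `grut_model` interfaces (method and attribute names, buffering) are assumptions introduced only to make the loop explicit, not the authors' implementation.

```python
def gru_t_over_ekf(imu_stream, gnss_stream, ekf, grut_model, window_size):
    """Control-flow sketch of Algorithm 1 with an assumed ekf/grut_model interface."""
    buffer = []
    for k, (f_b, w_b) in enumerate(imu_stream):
        f_b = f_b - ekf.accel_bias              # bias compensation (steps 4-5)
        w_b = w_b - ekf.gyro_bias
        ekf.predict(f_b, w_b)                   # INS propagation + prediction step
        gnss = gnss_stream.get(k)               # None when no GNSS fix is available
        if gnss is not None:
            ekf.update(gnss)                    # GNSS measurement update
        buffer.append((f_b, w_b, ekf.nav_state()))
        if len(buffer) >= window_size:
            q_adjust = grut_model.predict(buffer[-window_size:])  # diagonal correction factors
            ekf.Q = ekf.propagated_Q() @ q_adjust                 # Eq. (19)
        ekf.feedback()                          # error-state feedback into the navigation state
    return ekf.nav_state()
```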

2.2.3. Experimental Data Acquisition

The experiments were conducted from September 2024 to April 2025. To ensure the feasibility and accuracy of the algorithm proposed in this paper, we conducted experiments in a forest environment using a vehicle to collect data, as shown in Figure 4. The IMU sampling frequency was set to 200 Hz, and the GNSS frequency was set to 10 Hz. We collected data on attitude and position, and compared these with results from the GRU, Transformer and fixed-noise EKF. The output of a high-precision RTK system was used as the ground truth. The specific sensor parameters are listed in Table 1.
To ensure the reliability of our model’s results, and based on prior experience, we used 75% of the data for training and 25% for testing. In our experiments, we used the Adam optimizer with an initial learning rate of 0.0005, allowing for adaptive learning rate adjustment. All deep learning models were trained using the same parameters. The mean squared error (MSE) loss function was employed to evaluate the training performance. The selection of hyperparameters in this study was primarily guided by established findings within pertinent literature. Research by Tang et al. [22] indicated that while 128 neurons yielded optimal performance in their specific application context, we configured our model with 256 neurons. This decision was based on the characteristics and scale of our input data, aiming to capture more intricate data patterns. Concurrently, the number of filters in the convolutional layer was set to 64. Regarding the GRU layer architecture, studies by Qin et al. [34] demonstrated that a dual-layer GRU structure achieves superior performance, a configuration we subsequently adopted. Furthermore, for the attention mechanism, Xu et al. [35] reported that the performance variance between 8-head and 4-head attention was not significant. Given the stringent real-time processing demands of our system, a 4-head attention mechanism was selected to maintain robust performance while optimizing computational efficiency. Detailed parameter configurations are provided in Table 2. The trajectory of the training and testing datasets is shown in Figure 5.
To assess the performance of the proposed method, the following parameters were used to evaluate the model’s performance: root mean square error (RMSE), including velocity RMSE (VRMSE) and position RMSE (PRMSE). The definitions of these parameters are as follows:
$$\mathrm{PRMSE}(p_i, \hat{p}_i) = \sqrt{\frac{\sum_{i=1}^{N}\left(p_i - \hat{p}_i\right)^2}{N}}$$
where $N$ is the total number of samples, $\hat{p}_i$ is the predicted position vector, and $p_i$ is the RTK position vector.
$$\mathrm{VRMSE}(v_i, \hat{v}_i) = \sqrt{\frac{\sum_{i=1}^{N}\left(v_i - \hat{v}_i\right)^2}{N}}$$
where $\hat{v}_i$ is the predicted velocity vector, and $v_i$ is the RTK velocity vector.
Additionally, the model’s positioning accuracy was evaluated using the 50% Circular Error Probable (CEP 50%) and 2D Root Mean Square (2DRMS), defined as:
$$2\mathrm{DRMS} = 2\sqrt{\sigma_x^2 + \sigma_y^2}$$
where $\sigma_x$ and $\sigma_y$ denote the standard deviations of the $x$ and $y$ coordinate errors. Because the data do not follow a Gaussian distribution, the errors on the two-dimensional plane are ranked, and the 50th percentile (median) is taken as the CEP 50%.
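For clarity, these evaluation metrics can be computed as in the short sketch below (the function name is illustrative); CEP 50% is taken as the median planar error, consistent with the non-Gaussian treatment described above.

```python
import numpy as np

def position_metrics(pred_xy, ref_xy):
    """Per-axis PRMSE, 2DRMS, and CEP 50% (median planar error) from planar errors."""
    err = pred_xy - ref_xy                          # (N, 2) East/North errors
    prmse = np.sqrt(np.mean(err**2, axis=0))        # RMSE per axis
    two_drms = 2.0 * np.sqrt(np.std(err[:, 0])**2 + np.std(err[:, 1])**2)
    cep50 = np.median(np.linalg.norm(err, axis=1))  # median horizontal error
    return prmse, two_drms, cep50
```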
To perform a hypothesis test on the results presented in Section 3.1.2, D’Agostino’s K 2 test will first be employed to assess whether the sample data follow a normal distribution. The K 2 statistic is defined as follows:
$$K^2 = \left(\frac{S}{\sqrt{\dfrac{6(n-2)}{(n+1)(n+3)}}}\right)^{2} + \left(\frac{K-3}{\sqrt{\dfrac{24\,n(n-2)(n-3)}{(n+1)^{2}(n+3)(n+5)}}}\right)^{2}$$
where S denotes the sample skewness, K denotes the sample kurtosis, and n is the sample size. Both the skewness and kurtosis are computed directly from the sample data.
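In practice, this normality check can be run with SciPy's implementation of the D'Agostino–Pearson test, as in the sketch below; the sample file name is a hypothetical placeholder for the per-epoch error data.

```python
import numpy as np
from scipy import stats

# D'Agostino-Pearson K^2 normality test on per-epoch position error samples.
errors = np.loadtxt("prmse_samples.txt")      # placeholder input file
k2, p_value = stats.normaltest(errors)
print(f"K^2 = {k2:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Normality rejected: fall back to a non-parametric test such as Wilcoxon.")
```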
In Section 3.3, we define a cycle as the time required for the model to complete one prediction and update step. The testing algorithm was implemented in Python 3.8.20, and the deep learning model was built with PyTorch 1.12.1. All results were obtained on an NVIDIA GeForce RTX 3050 8 GB platform. In this study, we employed Visual Studio Code 2023 and Origin 2022 for data analysis.

3. Results

3.1. Model Performance Evaluation and Validation

3.1.1. Three-Dimensional Plane Localization Results

The EKF noise optimization algorithm proposed in this paper primarily optimizes the process during the motion of the carrier. After processing the obtained data, the GRU-T network was used to correct Q k , and the results are shown in Figure 6. Different algorithms are represented by different colored lines: the yellow line represents the traditional EKF algorithm; the purple line represents the AEKF algorithm; the green line represents the GRU model for predicting and correcting the process noise; the red line represents the prediction using the GRU-T model proposed in this paper; and the blue line represents the prediction using the Transformer model. RTK represents the high-precision positioning data, which is used as the ground truth in this study.
In the plane trajectory, the traditional EKF algorithm (yellow line) exhibits significant fluctuations in the trajectory due to the lack of effective correction of process noise, especially over long periods, where the error gradually accumulates. The AEKF, which adopts an adaptive mechanism, shows improved accuracy but still does not meet the precise navigation standard. In contrast, after using the GRU model to predict and correct the process noise, the fluctuations in the trajectory are reduced, but some error accumulation remains in long-duration trajectories. The trajectory corrected by the GRU-T model (red line) is much smoother, with significantly reduced error fluctuations, indicating that the GRU-T model has a distinct advantage in handling long-term dependencies and nonlinear modeling. While the Transformer model (blue line) effectively reduces error fluctuations, its precision is slightly lower than that of GRU-T in some parts of the trajectory, suggesting that relying solely on the Transformer model still has certain limitations.
In the comparison of altitude trajectories (with the results shown in Figure 7), the standard EKF algorithm also shows large error fluctuations, especially in the altitude direction, where drift was not effectively corrected. The GRU-T model effectively optimized the altitude trajectory, providing a smoother and more accurate trajectory. Table 3 compares the errors in the eastward, northward, and upward trajectories between each model and the ground truth, and Figure 8 shows the comparison of velocity errors. By comparing the positioning errors in the three directions, it is found that the traditional EKF algorithm has a noticeably larger localization error, while the GRU-T model shows a clear advantage in this regard.
As shown in Table 3, the traditional EKF algorithm in the forest environment accumulates sensor errors due to process noise, with PRMSE values of 1.12071 m, 2.71751 m, and 0.40537 m in the east, north, and upward directions, respectively. This indicates that the traditional EKF algorithm faces significant challenges when applied in complex environments such as forests, especially when signal loss or high noise interference occurs, causing severe degradation of positioning performance due to error accumulation. The AEKF shows some reduction in error, but it still has considerable deviation. The EKF algorithm optimized by the GRU or Transformer for process noise shows significant reduction in positioning errors in all directions. The hybrid model combining the advantages of the GRU and Transformer proposed in this paper exhibits superior performance in position prediction. This method not only inherits the ability of GRUs to handle long-term dependencies and dynamic adjustments in time-series data but also combines the advantages of Transformer in capturing sequence relationships and enhancing model generalization. In position prediction, the PRMSEs in the east, north, and upward directions are 0.57453 m, 1.61692 m, and 0.15572 m, respectively. Compared with the traditional EKF, the errors in the east, north, and upward directions are reduced by 48.74%, 41.94%, and 61.59%, respectively.
In speed prediction, the VRMSE values of the traditional EKF algorithm in the east, north, and upward directions are 0.36370 m/s, 0.25179 m/s, and 0.05676 m/s, respectively. As clearly seen in Figure 8, the traditional EKF model’s correction speed is slow, and it takes a long time to recover to a lower error state due to the accumulation of errors. The process noise causes relatively large errors. This is because the EKF algorithm assumes constant process and measurement noise, which often fails to adapt effectively to changes in dynamic environments, leading to increased speed prediction errors. In contrast, a GRU, as a recurrent neural network, is capable of handling long- and short-term dependencies in time-series data, so it can capture local changes well during speed prediction. However, since it mainly relies on historical information for prediction, it may still have large deviations. Through its self-attention mechanism, the Transformer model can more effectively capture global dependencies in long time series, making it more accurate in speed prediction. As seen in Figure 8, between 0 and 50 s, the proposed GRU-T model emphasizes global features. Initially, the error is relatively large because the model has not fully captured the impact of process noise. However, as time progresses, especially after 100 s, the speed error significantly decreases. At this point, the VRMSE values in the east, north, and upward directions are reduced to 0.10364 m/s, 0.15284 m/s, and 0.02470 m/s, respectively, which are 71.5%, 39.31%, and 56.48% lower than the traditional EKF model, improving the accuracy of speed prediction.

3.1.2. Hypothesis Testing

To further quantify and evaluate the performance gains of the proposed model, this study employs hypothesis testing as a rigorous statistical approach. Given that the median is robust to outliers and provides a more reliable reflection of substantive improvements in model performance, it is adopted as the representative measure of central tendency in the hypothesis testing process. Prior to selecting an appropriate hypothesis testing method, a preliminary analysis of the distribution characteristics of the sample data is necessary to determine whether it approximates a normal distribution. To this end, the D’Agostino’s K 2 test is used to assess normality. According to statistical theory, a larger value of the K 2 statistic indicates a greater deviation of the sample distribution from normality. Based on the PRMSE results obtained from each model’s predictions, the corresponding test outcomes are summarized in Table 4.
As shown clearly in Table 4, the calculated K 2 values for all models significantly exceed the critical thresholds corresponding to a normal distribution under the respective degrees of freedom. This indicates that the sample data do not follow a normal distribution. Consequently, to evaluate the performance improvements between models, the Wilcoxon signed-rank test (a non-parametric statistical method) is employed to conduct pairwise comparisons of model performance.
Within this testing framework, for any pair of models (e.g., Model X versus the baseline Model Y), the null hypothesis ( H 0 ) is defined as follows: Model X does not exhibit a statistically significant improvement over Model Y (i.e., there is no statistically significant difference in the medians of their performance metrics, or Model X is not superior to Model Y). The corresponding alternative hypothesis ( H 1 ) posits that Model X performs significantly better than Model Y. A significance level of α = 0.01 is adopted in this study, implying that if the computed p-value is less than 0.01, there is sufficient statistical evidence to reject the null hypothesis H 0 and accept the alternative hypothesis H 1 . This would indicate that the performance improvement of Model X over Model Y is statistically significant. The test results are summarized in Table 5.
As shown in Table 5, all p-values are less than 0.01, indicating that the null hypothesis is rejected. This suggests that Model X demonstrates a statistically significant performance improvement over Model Y. Furthermore, based on the differences presented in Table 5 (defined as the PRMSE of Model X minus the PRMSE of Model Y), a negative value indicates that Model X outperforms Model Y. Accordingly, it can be concluded that GRU-T outperforms the models compared in this study.
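The pairwise comparison described above can be reproduced with SciPy's Wilcoxon signed-rank test as sketched below; the error arrays are hypothetical placeholders for paired per-epoch PRMSE samples of Model X and Model Y.

```python
import numpy as np
from scipy import stats

# One-sided Wilcoxon signed-rank test on paired per-epoch errors:
# H1 is that Model X (e.g. GRU-T) has smaller errors than baseline Model Y.
err_x = np.loadtxt("model_x_errors.txt")    # placeholder paired samples
err_y = np.loadtxt("model_y_errors.txt")
stat, p_value = stats.wilcoxon(err_x, err_y, alternative="less")
if p_value < 0.01:                          # significance level alpha = 0.01
    print("Improvement of Model X over Model Y is statistically significant.")
```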

3.2. Model Robustness Evaluation

3.2.1. Validation Under Planar GPS Signal Outage and IMU Noise Injection

To further evaluate the performance of the proposed model under conditions of IMU noise and GNSS signal loss, Figure 9 illustrates the model’s behavior in such a scenario. In Region I, both IMU and GNSS signals remain intact, and no interference is introduced. Upon entering Region II (about 7 s), the GNSS signal is deliberately interrupted, leading to an increase in the error across all models. Notably, the EKF and AEKF models exhibit the largest error growth, indicating a strong reliance on GNSS data. In contrast, the proposed GRU-T model demonstrates the smallest variation in error, maintaining accurate trajectory predictions and exhibiting superior navigation performance. In Region III (about 10 s), the GNSS signal is restored while noise is injected into the IMU data. As time progresses, the navigation errors increase, suggesting a degree of sensitivity to IMU disturbances. However, both the Transformer-based model and GRU-T show comparatively stronger robustness under these conditions. Finally, in Region IV, after the IMU noise is removed, all models show a stabilization in error, and their performances converge, confirming the models’ overall robustness. Overall, the GRU-T model consistently outperforms the others under both GNSS signal interruption and IMU noise disturbance, highlighting its suitability for complex and dynamic real-world navigation scenarios.

3.2.2. Positioning Error Analysis

Furthermore, to comprehensively evaluate the proposed algorithm’s performance in localization error measurement and distribution characteristics, we tested the error of the prediction results of different models. The test results are shown in Figure 10. From Figure 10, it can be seen that, in the forest environment with GNSS signal loss, all models exhibit significant localization drift. However, the model proposed in this paper shows relatively stable localization results, with error points concentrated near the origin, while other models exhibit larger error fluctuations. To further quantify the model’s localization accuracy, this study uses the 50% CEP and 2DRMS metrics to measure the localization error of each algorithm. The calculation results are shown in Table 6. From Table 6, it can be seen that the traditional EKF algorithm performs the worst in terms of CEP and 2DRMS metrics. This is because the localization data output by EKF accumulates significant noise and bias over time, especially when GNSS signals are lost, causing the error to expand rapidly and reducing localization accuracy. In contrast, the models optimized by GRU-T for process noise show improvements in both the CEP and 2DRMS metrics. This indicates that optimizing process noise using GRU-T models can effectively suppress the accumulation and fluctuation of sensor measurement process noise, significantly improving localization accuracy under signal loss conditions. Notably, the GRU-T model proposed in this paper achieves the minimum values in both the CEP and 2DRMS metrics, indicating that the GRU-T optimization algorithm has the most significant suppression effect on process noise and provides the best performance in all dimensions of localization error.

3.3. Real-Time Performance of the Model

In this section, we mainly investigate whether the real-time performance of the model proposed in this paper meets the requirements for practical applications. Studies show that EKF, compared to other nonlinear filtering optimization algorithms (such as UKF or PF), requires less computational power and can more easily meet practical application demands. The key difference between the traditional EKF and the GRU-T model proposed in this paper lies in the updating of the process noise. In our model, because GRU and Transformer deep learning models are introduced to improve prediction accuracy, a question arises: can the proposed method meet the real-time requirements of navigation systems? Therefore, we conducted a real-time performance test of the proposed method. During the test, the model ran for 50,000 cycles, and the average time and standard deviation were recorded.
In the computation of running time, the average time per cycle for the proposed GRU-T model was 4255 μs. As seen in Figure 11, its runtime is about 9–10 times that of the traditional EKF model. Although the GRU-T model takes longer than the GRU model, it still meets the real-time sampling requirement of 200 Hz, as its running time is within the 5000 μs (5 ms) sampling period, leaving enough time for other processes. Therefore, although the computational overhead of GRU-T increases, it improves the ability to handle sensor errors and environmental uncertainties. This shows that GRU-T achieves a good balance between computational cost and performance improvement, providing a new solution for real-time navigation that combines accuracy and efficiency.
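A simple timing harness of the kind used for this test is sketched below; `step_fn` stands for one complete prediction-plus-update cycle and is an assumed callable, not part of the original code.

```python
import time
import numpy as np

def benchmark_cycle_time(step_fn, n_cycles=50_000):
    """Mean and standard deviation of the per-cycle runtime in microseconds."""
    durations = np.empty(n_cycles)
    for i in range(n_cycles):
        t0 = time.perf_counter()
        step_fn()                              # one prediction + update cycle
        durations[i] = time.perf_counter() - t0
    return durations.mean() * 1e6, durations.std() * 1e6
```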

4. Discussion

4.1. Performance Evaluation of the Model

In this study, we evaluate the performance of the proposed GRU-T model. As illustrated in Figure 6 and Figure 7, which show the predicted trajectories in various directions, GRU-T demonstrates superior prediction accuracy compared to other methods such as EKF, AEKF, and GRU. Furthermore, the results in Table 3 indicate that GRU-T outperforms its counterparts in terms of two key performance metrics: PRMSE and VRMSE. These findings are further supported by hypothesis testing.
To further assess the robustness of the proposed model, we introduced disturbances into the IMU data and simulated GNSS signal outages. Figure 9 presents a comparative analysis of the prediction errors under these challenging conditions across five methods. The performance of EKF and AEKF significantly deteriorates in the presence of disturbances, primarily due to their exclusive reliance on sensor data. In contrast, deep learning-based models exhibit greater resilience, and GRU-T demonstrates a distinct advantage in this scenario. Nevertheless, the model still requires improvement in handling scenarios involving sensor faults, which can lead to degraded position prediction. Additionally, GRU-T remains somewhat sensitive to noise in the input data. Figure 11 presents the computational times for different models. Although GRU-T is capable of meeting the 200 Hz real-time sampling requirement, its computation time remains relatively high. This becomes especially critical as the dataset size increases; in environments with limited computational resources, real-time performance may be compromised.
Barak Or et al. [36] proposed the HB-AEKF method, which dynamically adjusts the system noise covariance matrix using a deep neural network, resulting in approximately a 25% improvement in positioning accuracy. Although HB-AEKF offers advantages in resource-constrained environments, its accuracy enhancement is still lower than that achieved by the GRU-T model. Li et al. [37] proposed a GNSS/INS fusion method incorporating LiDAR odometry, achieving an average localization error reduction of 1.6 m. However, in densely vegetated environments, the GRU-T-based integrated navigation method proposed in this paper delivers superior accuracy. This advantage stems from GRU-T’s capability in temporal modeling and feature extraction, enabling it to effectively extract valuable information from discontinuous or weak sensor signals and accurately predict target trajectories.

4.2. Limitations and Future Work

Although this study has made progress in improving the positioning accuracy of integrated navigation systems, several limitations remain. First, the robustness of the proposed model under highly dynamic or high-noise conditions still requires enhancement. Sensor noise continues to significantly affect the accuracy of positioning solutions. Moreover, the deep learning model introduces considerable computational complexity, placing higher demands on the processing and memory resources of embedded hardware platforms, which may limit its applicability in resource-constrained real-time scenarios. While the model does improve positioning accuracy, there remains a gap between its performance and the centimeter-level accuracy required for precision agriculture operations, indicating a need for further algorithmic refinement.
In terms of comparative validation, this study primarily selected EKF, AEKF, GRU, and Transformer networks as baseline models. It did not include a comprehensive horizontal comparison with emerging advanced estimation methods in the integrated navigation domain, such as Graph Neural Networks (GNNs) and Factor Graph Optimization. Additionally, the effectiveness of the model was validated mainly using a wheeled mobile platform; its adaptability and generalization across agricultural machines with different structures and motion characteristics (such as drones or legged robots) have not yet been systematically evaluated.
Looking forward, future research will focus on the following directions:
1. We plan to introduce and integrate cutting-edge machine learning algorithms such as Graph Neural Networks (GNNs) to explore their potential in multi-sensor data fusion and complex dynamic modeling, with the aim of further enhancing the accuracy and robustness of state estimation;
2. Given the strong environmental perception and autonomous localization capabilities of visual and LiDAR-based technologies—such as Visual Odometry (VO), Visual–Inertial Odometry (VIO), and LiDAR Odometry (LO)—particularly in complex, dynamic, and GNSS-denied or degraded environments (e.g., indoors, canyons, and dense urban areas), future work will focus on investigating tightly coupled or deeply fused strategies integrating these sensing modalities with GNSS/INS systems to develop a more adaptable and reliable multimodal navigation framework;
3. The optimized navigation algorithm will be deployed onto various types of agricultural automation equipment (e.g., autonomous tractors, crop-spraying UAVs), with extensive testing in diverse real-world operational scenarios to thoroughly evaluate and enhance the model’s generalization capability and practical value.
Finally, the proposed GRU-T model enhances autonomous navigation and precision operation capabilities of agricultural machinery in GNSS-constrained environments such as dense orchards. It enables wheeled machines like tractors and sprayers to operate more reliably and accurately in complex orchard environments by effectively mitigating the signal interference caused by dense foliage. This capability is critical for precise path planning, autonomous obstacle avoidance, and in-field agricultural tasks.

5. Conclusions

This study proposes a GNSS/INS integrated navigation optimization method based on the GRU–Transformer hybrid model, particularly suitable for environments with unreliable GNSS signals, such as dense orchards and forests. This model utilizes deep learning technologies, including a GRU and Transformer, to dynamically adjust the process noise covariance matrix in EKF, addressing the limitations of traditional EKF methods in handling non-stationary noise and signal loss. Experimental results show that the proposed hybrid model significantly improves positioning accuracy. Compared to the traditional EKF method, its PRMSE in the east, north, and upward directions decreased by 48.74%, 41.94%, and 61.59%, respectively, and its VRMSE was reduced by 71.5%, 39.31%, and 56.48% in the east, north, and upward directions. The GRU–Transformer model effectively handles long-term dependencies and adapts to dynamic environments, showing significant performance improvements in GNSS signal degradation scenarios. The algorithm’s efficiency in real-time applications has also been validated, meeting the computational demands of high-frequency sampling rates while maintaining real-time operation and improving accuracy. This study provides an effective technical approach for high-precision autonomous navigation of intelligent agricultural equipment in complex operating environments such as densely planted orchards and under-forest areas. By optimizing multi-sensor data fusion through deep learning, it delivers reliable positioning information even when GNSS signals are limited, offering a feasible strategy for autonomous navigation, path optimization, and precision operations of agricultural machinery.

Author Contributions

Conceptualization, P.G. and J.F.; methodology, P.G.; software, P.G. and J.F.; validation, P.G., J.F., J.H. and G.W.; formal analysis, P.G.; investigation, J.F. and J.H.; data curation, J.F.; writing—original draft preparation, P.G.; writing—review and editing, P.G., J.F., J.H., S.M., G.W. and Z.L.; visualization, P.G. and J.F.; supervision, Z.L.; project administration, Z.L. and P.G.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key Technologies R&D Program of Guangdong Province (2023B0202100001); the National Natural Science Foundation of China (32271997); the Guangzhou Key Research and Development Program (2024B03J1309); the earmarked fund for CARS (CARS-26); and the "Hundreds of Thousands Project (BQW)" Special Program of South China Agricultural University (BQWZX-2024-01-14).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The relevant information in this study is available at https://github.com/zzzzzhenzzzz/GRU-T-model.git (accessed on 22 May 2025). The datasets in this study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Cremona, J.; Comelli, R.; Pire, T. Experimental evaluation of Visual-Inertial Odometry systems for arable farming. J. Field Robot. 2022, 39, 1121–1135.
2. Moore, A.; Rymer, N.; Glover, J.S.; Ozturk, D. Predicting GPS Fidelity in Heavily Forested Areas. In Proceedings of the 2023 IEEE/ION Position, Location and Navigation Symposium (PLANS), Monterey, CA, USA, 24–27 April 2023; pp. 772–780.
3. Jiang, W.; Yu, Y.; Zong, K.; Cai, B.; Rizos, C.; Wang, J.; Liu, D.; Shangguan, W. A seamless train positioning system using a LiDAR-aided hybrid integration methodology. IEEE Trans. Veh. Technol. 2021, 70, 6371–6384.
4. Kinnari, J.; Verdoja, F.; Kyrki, V. Season-invariant GNSS-denied visual localization for UAVs. IEEE Robot. Autom. Lett. 2022, 7, 10232–10239.
5. Bossy, T.; Ciais, P.; Renaudineau, S.; Wan, L.; Ygorra, B.; Adam, E.; Barbier, N.; Bauters, M.; Delbart, N.; Frappart, F.; et al. State of the art and for remote sensing monitoring of carbon dynamics in African tropical forests. Front. Remote Sens. 2025, 6, 1532280.
6. Liu, Y.; Fan, X.; Lv, C.; Wu, J.; Li, L.; Ding, D. An innovative information fusion method with adaptive Kalman filter for integrated INS/GPS navigation of autonomous vehicles. Mech. Syst. Signal Process. 2018, 100, 605–616.
7. Liu, X.; Zhou, Q.; Chen, X.; Fan, L.; Cheng, C.T. Bias-error accumulation analysis for inertial navigation methods. IEEE Signal Process. Lett. 2021, 29, 299–303.
8. Yang, H.; Li, W.; Luo, C.M. Fuzzy adaptive Kalman filter for indoor mobile target positioning with INS/WSN integrated method. J. Cent. South Univ. 2015, 22, 1324–1333.
9. Yan, X.; Yang, X.; Feng, B.; Liu, W.; Ye, H.; Zhu, Z.; Shen, H.; Xiang, Z. A navigation accuracy compensation algorithm for low-cost unmanned surface vehicles based on models and event triggers. Control Eng. Pract. 2024, 146, 105896.
10. Chen, B.; Gong, L.; Yu, C.; Du, X.; Chen, J.; Xie, S.; Le, X.; Li, Y.; Liu, C. Workspace decomposition based path planning for fruit-picking robot in complex greenhouse environment. Comput. Electron. Agric. 2023, 215, 108353.
11. Nassar, S.; El-Sheimy, N. Wavelet analysis for improving INS and INS/DGPS navigation accuracy. J. Navig. 2005, 58, 119–134.
12. Tomaštík, J.; Mokroš, M.; Surový, P.; Grznárová, A.; Merganič, J. UAV RTK/PPK method—An optimal solution for mapping inaccessible forested areas? Remote Sens. 2019, 11, 721.
13. Kaviani, S.; Salarieh, H.; Alasty, A.; Abediny, M. Comparison of nonlinear filtering techniques for inertial sensors error identification in INS/GPS integration. Sci. Iran. 2018, 25, 1281–1295.
14. Hu, G.; Ni, L.; Gao, B.; Zhu, X.; Wang, W.; Zhong, Y. Model predictive based unscented Kalman filter for hypersonic vehicle navigation with INS/GNSS integration. IEEE Access 2019, 8, 4814–4823.
15. Alaeiyan, H.; Mosavi, M.; Ayatollahi, A. Enhancing the integration of the GPS/INS during GPS outage using LWT-IncRGRU. Ain Shams Eng. J. 2024, 15, 102779.
16. Cohen, N.; Klein, I. A-KIT: Adaptive Kalman-informed transformer. arXiv 2024, arXiv:2401.09987.
17. Fang, W.; Jiang, J.; Lu, S.; Gong, Y.; Tao, Y.; Tang, Y.; Yan, P.; Luo, H.; Liu, J. A LSTM algorithm estimating pseudo measurements for aiding INS during GNSS signal outages. Remote Sens. 2020, 12, 256.
18. Wang, H.; Tang, F.; Wei, J.; Zhu, B.; Wang, Y.; Zhang, K. Online Semi-supervised Transformer for Resilient Vehicle GNSS/INS Navigation. IEEE Trans. Veh. Technol. 2024, 73, 16295–16311.
19. Zheng, W.; Zheng, K.; Gao, L.; Zhangzhong, L.; Lan, R.; Xu, L.; Yu, J. GRU–transformer: A novel hybrid model for predicting soil moisture content in root zones. Agronomy 2024, 14, 432.
20. Ruiz, A.R.J.; Granja, F.S.; Honorato, J.C.P.; Rosas, J.I.G. Accurate pedestrian indoor navigation by tightly coupling foot-mounted IMU and RFID measurements. IEEE Trans. Instrum. Meas. 2011, 61, 178–189.
21. Fernandes, M.R.; Magalhães, G.M.; Zúñiga, Y.R.C.; do Val, J.B. GNSS/MEMS-INS integration for drone navigation using EKF on Lie groups. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 7395–7408.
22. Tang, Y.; Jiang, J.; Liu, J.; Yan, P.; Tao, Y.; Liu, J. A GRU and AKF-based hybrid algorithm for improving INS/GNSS navigation accuracy during GNSS outage. Remote Sens. 2022, 14, 752.
23. Del Rosario, M.B.; Khamis, H.; Ngo, P.; Lovell, N.H.; Redmond, S.J. Computationally efficient adaptive error-state Kalman filter for attitude estimation. IEEE Sens. J. 2018, 18, 9332–9342.
24. Ding, W.; Wang, J.; Rizos, C.; Kinlyside, D. Improving adaptive Kalman estimation in GPS/INS integration. J. Navig. 2007, 60, 517–529.
25. Akhlaghi, S.; Zhou, N.; Huang, Z. Adaptive adjustment of noise covariance in Kalman filter for dynamic state estimation. In Proceedings of the 2017 IEEE Power & Energy Society General Meeting, Chicago, IL, USA, 16–20 July 2017; pp. 1–5.
26. Pang, S.; Zhang, B.; Lu, J.; Pan, R.; Wang, H.; Wang, Z.; Xu, S. Application of IMU/GPS Integrated Navigation System Based on Adaptive Unscented Kalman Filter Algorithm in 3D Positioning of Forest Rescue Personnel. Sensors 2024, 24, 5873.
27. Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 2018, 8, 6085.
28. Yang, S.; Yu, X.; Zhou, Y. LSTM and GRU neural network performance comparison study: Taking Yelp review dataset as an example. In Proceedings of the IEEE 2020 International Workshop on Electronic Communication and Artificial Intelligence (IWECAI), Shanghai, China, 12–14 June 2020; pp. 98–101.
29. Kuang, X.; Yan, B. A Novel FECAM-iTransformer Algorithm for Assisting INS/GNSS Navigation System during GNSS Outages. Appl. Sci. 2024, 14, 8753.
30. Linxuan, W.; Xiangwei, K.; Hongzhe, X.; Hong, L. INS-GNSS Integrated Navigation Algorithm Based on TransGAN. Intell. Autom. Soft Comput. 2023, 37, 91.
31. Zhang, Y.; Liu, S.; Zhang, P.; Li, B. GRU- and Transformer-based periodicity fusion network for traffic forecasting. Electronics 2023, 12, 4988.
32. Han, K.; Xiao, A.; Wu, E.; Guo, J.; Xu, C.; Wang, Y. Transformer in transformer. Adv. Neural Inf. Process. Syst. 2021, 34, 15908–15919.
33. Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-attention with relative position representations. arXiv 2018, arXiv:1803.02155.
34. Qin, Z.; Luo, Q.; Zang, Z.; Fu, H. Multimodal GRU with directed pairwise cross-modal attention for sentiment analysis. Sci. Rep. 2025, 15, 10112.
35. Xu, H.; Luo, H.; Wu, Z.; Wu, F.; Bao, L.; Zhao, F. Towards predicting the measurement noise covariance with a transformer and residual denoising autoencoder for GNSS/INS tightly-coupled integrated navigation. Remote Sens. 2022, 14, 1691.
36. Or, B.; Klein, I. A hybrid model and learning-based adaptive navigation filter. IEEE Trans. Instrum. Meas. 2022, 71, 1–11.
37. Li, Y.; Feng, Q.; Ji, C.; Sun, J.; Sun, Y. GNSS and LiDAR Integrated Navigation Method in Orchards with Intermittent GNSS Dropout. Appl. Sci. 2024, 14, 3231.
Figure 1. The framework of the GRU-T process. The red boxes represent the conventional ES-EKF estimation module. The yellow boxes represent the hybrid deep learning model combining GRU and Transformer, which is designed to dynamically adjust the process noise in real time to enhance estimation accuracy. The green boxes indicate the prediction component that forecasts future states based on the adjusted process noise and current state estimates.
Figure 2. GRU and encoder combined model diagram.
Figure 3. Complete GRU-T training architecture.
Figure 4. Experimental environment and platform.
Figure 5. Trajectories (a–g) represent the training set, and trajectories (h,i) represent the testing set.
Figure 6. Comparison of trajectories obtained by different models in the horizontal plane.
Figure 7. Comparison of trajectories obtained by different models in height.
Figure 8. Speed error comparison in the East, North, and Up directions.
Figure 9. (a) Trajectories generated by different algorithms under GNSS signal loss and IMU noise injection; (b) time-series error curves for each model. Regions I and IV represent periods where both IMU and GNSS signals are functioning normally. Region II corresponds to GNSS signal loss with normal IMU input, while Region III represents restored GNSS input with noise-contaminated IMU data.
Figure 10. Error distribution of longitude and latitude for each model.
Figure 11. Average time and standard deviation of each model’s training in one cycle.
Table 1. Main parameters of the sensors.
Sensor | Model | Sampling Frequency | Measurement Error
RTK | UM982 | 10 Hz | Horizontal: 0.8 cm + 1 ppm (RMS); Height: 1.5 cm + 1 ppm (RMS); Velocity: 0.03 m/s (RMS)
GNSS | M10050 | 1 Hz | Position: 2.0 m (CEP); Velocity: 0.1 m/s (RMS)
IMU | Accelerometer (JY901B) | 200 Hz | 0.01 g (RMS)
IMU | Gyroscope (JY901B) | 200 Hz | 0.05 °/s (RMS)
IMU | Barometer (JY901B) | 200 Hz | 0.5 m (RMS)
Table 2. The main parameters of the model.
Layers | Neurons per Layer | 1D Conv Filters | Heads | Batch Size | Dropout | Optimizer
2 | 256 | 64 | 4 | 64 | 0.1 | 0.0005
Table 3. PRMSE and VRMSE performance comparison between various models and the proposed method.
Method | PRMSE E (m) | PRMSE N (m) | PRMSE U (m) | VRMSE E (m/s) | VRMSE N (m/s) | VRMSE U (m/s)
EKF | 1.12071 | 2.71751 | 0.40537 | 0.36370 | 0.25179 | 0.05676
AEKF | 0.87772 | 2.12830 | 0.33187 | 0.32133 | 0.21301 | 0.05375
GRU | 0.67711 | 1.61692 | 0.25923 | 0.23438 | 0.20061 | 0.05036
Transformer | 0.66883 | 1.64184 | 0.33894 | 0.09657 | 0.16414 | 0.02561
GRU-T | 0.57453 | 1.57754 | 0.15572 | 0.10364 | 0.15284 | 0.02470
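As a consistency check (not part of the original table), the reduction percentages quoted in the conclusion follow directly from the Table 3 values; for the East direction, for example,

\[
\Delta \mathrm{PRMSE}_{E} = \frac{1.12071 - 0.57453}{1.12071}\times 100\% \approx 48.74\%,
\qquad
\Delta \mathrm{VRMSE}_{E} = \frac{0.36370 - 0.10364}{0.36370}\times 100\% \approx 71.5\%.
\]

The North and Up percentages are obtained with the same formula applied to the corresponding columns.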
Table 4. Prediction Results for Each Model.
Model | Median (m) | 95% CI of Median (m) | K²
EKF | 2.6260 | (2.604, 3.5630) | >100
AEKF | 2.3172 | (2.291, 2.3527) | >100
GRU | 2.0850 | (2.037, 2.1070) | >100
GRU-T | 1.7870 | (1.746, 1.8070) | >100
Transformer | 2.1310 | (2.117, 2.1410) | >100
Table 5. Comparison of Pairwise Model Performance.
Model X | Model Y | Median Difference (m) | 95% CI of Median Difference (m) | p-Value
GRU-T | EKF | −0.5270 | (−0.5310, −0.5240) | <0.01
GRU-T | AEKF | −0.4661 | (−0.4744, −0.4469) | <0.01
GRU-T | GRU | −0.3420 | (−0.3460, −0.3250) | <0.01
GRU-T | Transformer | −0.1630 | (−0.1750, −0.1450) | <0.01
Transformer | EKF | −0.3270 | (−0.3480, −0.2880) | <0.01
Transformer | AEKF | −0.2363 | (−0.2475, −0.2138) | <0.01
Transformer | GRU | −0.0550 | (−0.1020, −0.0200) | <0.01
GRU | EKF | −0.0710 | (−0.0730, −0.0700) | <0.01
GRU | AEKF | −0.0392 | (−0.0401, −0.0379) | <0.01
AEKF | EKF | −0.0320 | (−0.0330, −0.0309) | <0.01
Table 6. 50% CEP and 2DRMS results.
Metric | EKF (m) | AEKF (m) | GRU (m) | Transformer (m) | GRU-T (m)
50% CEP | 0.4933 | 0.4438 | 0.3690 | 0.2169 | 0.0790
2DRMS | 3.3280 | 2.7082 | 1.9171 | 1.6666 | 1.6361
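For clarity, 50% CEP and 2DRMS are conventionally computed from the horizontal (East/North) error samples as in the short sketch below. This uses the standard definitions with synthetic data and is not taken from the paper's code; the function name and sample values are illustrative only.

```python
import numpy as np

def horizontal_accuracy_metrics(e_east, e_north):
    """Standard 50% CEP and 2DRMS definitions from horizontal error samples (m).

    50% CEP: radius containing 50% of the horizontal errors (median radial error).
    2DRMS:   twice the RMS of the horizontal radial error.
    """
    r = np.hypot(e_east, e_north)          # horizontal radial error per epoch
    cep50 = np.median(r)
    drms2 = 2.0 * np.sqrt(np.mean(r ** 2))
    return cep50, drms2

# Illustrative use with synthetic errors (not the paper's data)
rng = np.random.default_rng(0)
cep, drms = horizontal_accuracy_metrics(rng.normal(0, 0.3, 1000), rng.normal(0, 0.3, 1000))
print(f"50% CEP = {cep:.3f} m, 2DRMS = {drms:.3f} m")
```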
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
