A Quaternion Gated Recurrent Unit Neural Network for Sensor Fusion

Recurrent Neural Networks (RNNs) are known for their ability to learn relationships within temporal sequences. Gated Recurrent Unit (GRU) networks have found use in challenging time-dependent applications such as Natural Language Processing (NLP), financial analysis and sensor fusion due to their capability to cope with the vanishing gradient problem. GRUs are also known to be more computationally efficient than their variant, the Long Short-Term Memory neural network (LSTM), due to their less complex structure and, as such, are more suitable for applications requiring more efficient management of computational resources. Many such applications require a stronger mapping of their features to further enhance the prediction accuracy. A novel Quaternion Gated Recurrent Unit (QGRU) is proposed in this paper, which leverages the internal and external dependencies within the quaternion algebra to map correlations within and across multidimensional features. The QGRU can be used to efficiently capture the inter- and intra-dependencies within multidimensional features, unlike the GRU, which only captures the dependencies within the sequence. Furthermore, the performance of the proposed method is evaluated on a sensor fusion problem involving navigation in Global Navigation Satellite System (GNSS) deprived environments as well as a human activity recognition problem. The results obtained show that the QGRU produces competitive results with almost 3.7 times fewer parameters compared to the GRU. The QGRU code is available at.


Introduction
The success of Recurrent Neural Networks (RNNs) on sequence-based problems has been emphasised in applications such as natural language processing, financial analysis and signal processing [1][2][3][4][5]. Other researchers have demonstrated the excellent performance of RNNs on various time series problems such as electronic health records [6], classification of acoustic scenes [7], cyber-security [8], human activity recognition [9,10], and vehicular localisation [11][12][13][14][15]. Although RNNs were formulated to model time-dependent relationships within basic sequential problems [16], real-world problems are often multi-dimensional and thus require a dedicated approach towards modelling the relations inherent in the data [17]. Matsui et al. [18] showed the existence of local relations within the elements of multi-dimensional data. Real-valued methods such as RNNs, however, treat the multidimensional elements as independent entities within the input vector, where local relations are considered in the same way as global dependencies [17].
Another challenge commonly faced in machine learning is the efficient computation of the representations of large data within the hidden dimensions. It is important for a good model to encode local relations efficiently within the input features, such as the relations between the red, green and blue channels of a pixel as explored in [18,19], and structural relations across pixels such as edges or shapes. Such efficient representations lead to a significant reduction in the number of neural parameters needed to facilitate the learning process, and naturally reduce the occurrence of overfitting within the model [17].
Quaternions are a number system characterised by one real and three imaginary components that form their hypercomplex structure. Their composition lends them the ability to represent and manipulate features uniquely, thus enabling efficient learning within and across multidimensional input features through the exploitation of the Hamilton product during quaternion algebraic operations [20][21][22]. Several quaternion-based learning algorithms have been proposed by researchers. Parcollet et al. [23] studied the success of the quaternion Convolutional Neural Network (CNN) by investigating the influence of the Hamilton product on colour image reconstruction from gray-scale images. Moya-Sanchez et al. proposed a bio-inspired quaternion local phase CNN layer, offering the possibility of capturing rotational linear response and contrast invariance in image classification as well as faster learning of image rotations than a regular convolution layer [24]. Chen et al. [25] studied the use of a quaternion-embedded capsule network model for knowledge graph completion. Ozcan et al. proposed a quaternion capsule network in [26], Grassucci et al. proposed a quaternion-valued variational autoencoder in [27], and Nguyen et al. proposed a quaternion graph neural network in [28]. Parcollet et al. used a quaternion-based RNN and LSTM (Long Short-Term Memory) on a challenging natural language processing task [20]. A bidirectional quaternion LSTM recurrent neural network was explored by Parcollet et al. for speech recognition in [29]. However, the Gated Recurrent Unit (GRU) network, a variant of the LSTM, is characterised by a less complex structure, making it computationally more efficient than the LSTM and justifying its suitability for computationally demanding applications.
A novel Quaternion Gated Recurrent Unit (QGRU) is thus proposed in this paper to leverage the internal and external dependencies within the quaternion algebra in order to map correlations within and across multidimensional features using fewer parameters within the hidden dimensional space. The QGRU is proposed as an improvement on the GRU to better address sensor fusion applications, as it can be used to efficiently capture the inter- and intra-dependencies within multidimensional features, unlike the GRU. The performance of the quaternion formulation of the GRU is compared with that of the GRU on a complex task involving the navigation of autonomous vehicles in challenging environments, as addressed in [16,30], and on a human activity recognition classification task, as addressed in [31], using time-based signals rather than the frequency-transformed signals used in [31].
The rest of the paper is structured as follows: Section 2 presents a brief literature review on quaternion neural networks; Section 3 discusses the formulation of the proposed QGRU network; Section 4 presents experiments with the QGRU on a challenging vehicular localisation problem as well as a Human Activity Recognition (HAR) task, and details the datasets employed. The results of the performance evaluation of the QGRU and GRU are discussed in Section 5, and finally, the paper is concluded in Section 6.

Previous Work on Quaternion Neural Networks
In the past decade, the field of complex-valued neural networks has been actively researched, but with limited influence until its recent application to RNNs. Studies show that complex-valued neural networks have better generalisation capabilities [32] and are easier to optimise [33]. Quaternion neural networks were proposed where the inputs and bias vectors, as well as the weight matrices, are quaternion-based. The quaternion-valued vanilla RNN and LSTM were shown to provide improved accuracy with a significantly reduced number of parameters on speech recognition tasks compared to their real-valued counterparts [20]. Researchers have proposed several quaternion-based learning algorithms with applications to various challenging problems [19][20][21][22]. Cui et al. [34] applied the quaternion neural network to the inverse kinematics of a robot manipulator. Luo et al. [35] compressed colour images using quaternion neural network principal component analysis. Greenblatt et al. [36] applied quaternion neural networks to prostate cancer Gleason grading. Shang and Hirose [37] proposed a quaternion neural-network-based PolSAR land classification in Poincare-sphere-parameter space. Parcollet et al. studied the applications of a deep quaternion neural network to speech recognition [38,39]. Gaudet and Maida [39] and Parcollet et al. [40] investigated the use of quaternion convolution networks for image processing on the CIFAR and KITTI datasets and for an end-to-end automatic speech recognition problem, respectively. Pavllo et al. modelled human motion using quaternion-based neural networks [40]. A quaternion convolutional neural network was used by Comminiello et al. to detect and localise 3D sound events in [41]. Zhu et al. proposed a quaternion convolutional neural network for colour image classification and denoising tasks [42]. Tay et al. explored the use of quaternion networks for lightweight and efficient neural natural language processing in [43]. Parcollet et al. investigated the use of quaternion-valued convolutional and recurrent neural networks on speech recognition in [44]. Parcollet et al. studied the use of quaternion neural networks for theme identification of telephone conversations in [45]. Tran et al. proposed a quaternion-based self-attentive long short-term user preference encoding for recommendation in [46]. The localisation of colour image splicing using a full quaternion convolutional network was explored by Chen et al. in [47]. A deformable quaternion Gabor convolutional neural network for the recognition of colour facial expressions was proposed by Jin et al. in [48]. Qiu et al. studied the use of quaternion neural networks for multi-channel distant speech recognition in [49]. A hate speech classification model using a multi-modal fusion architecture was proposed by Kumar et al. in [50]. However, the quaternion formulation is yet to be extended to the GRU, where it could find use in computationally constrained sensor fusion applications.

Proposed Quaternion Gated Recurrent Unit
This section presents a novel quaternion formulation of the GRU, which formulates the input and bias vectors as well as the weight matrices as quaternions and replaces some of the multiplicative product operators of the GRU with the Hamilton product. The weight initialisation, gated operations and backward propagation mechanism of the QGRU are discussed in this section.

Real-Valued GRU
The GRU, which was introduced by Cho et al. in 2014 [51], addresses the vanishing gradient problem of the RNN, allowing it to learn long-term dependencies. Its cell is characterised by the combination of the forget gate and the input gate of the LSTM into a single "update gate". The hidden state and the cell state are also merged to provide a more computationally efficient model compared to the LSTM. The update and reset gates in the GRU tackle the vanishing gradient problem by deciding what information should be passed to the output, thus removing information that is not relevant to the prediction.
The update gate determines the amount of previous information to be passed along to the future, while the reset gate controls how much of the previous information to forget. Memory content is introduced to store relevant information from the past using the reset gate. The operation of the gates of the GRU is governed by Equations (1)-(4).
update gate: $z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$ (1)
reset gate: $r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$ (2)
current memory state: $\tilde{h}_t = \tanh(W_h x_t + U_h (r_t * h_{t-1}) + b_h)$ (3)
final memory: $h_t = z_t * h_{t-1} + (1 - z_t) * \tilde{h}_t$ (4)
where $*$ is the Hadamard product, $h_{t-1}$ is the previous state, $W_z$, $W_r$ and $W_h$ are the weight matrices of the update gate, reset gate and current memory state, respectively, $U_z$, $U_r$ and $U_h$ are the hidden weight matrices of the update gate, reset gate and current memory state, respectively, $b_z$, $b_r$ and $b_h$ are the bias vectors of the update gate, reset gate and current memory state, respectively, $x_t$ is the input feature vector, $\sigma$ is the sigmoid (non-linear) activation function and $\tanh$ is the hyperbolic tangent activation function. Figure 1 shows the GRU's cell structure.
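The gate equations of the real-valued GRU can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses the standard form of Cho et al. with a tanh candidate state and the reset gate applied to the previous state before the hidden weights, and the parameter dictionary keys (`Wz`, `Uz`, `bz`, and so on) are hypothetical names chosen for this example.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(M, v):
    # Plain matrix-vector product for lists of lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gru_step(x_t, h_prev, p):
    """One GRU step; p holds hypothetical keys Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh."""
    def affine(W, U, b, h):
        wx, uh = matvec(W, x_t), matvec(U, h)
        return [a + c + d for a, c, d in zip(wx, uh, b)]

    z = [sigmoid(v) for v in affine(p["Wz"], p["Uz"], p["bz"], h_prev)]   # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], p["Ur"], p["br"], h_prev)]   # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]                        # r_t * h_{t-1}
    h_cand = [math.tanh(v) for v in affine(p["Wh"], p["Uh"], p["bh"], gated)]
    # Final memory: blend of the previous state and the candidate state.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, h_cand)]

def zero_params(n_in, n_hid):
    # Helper for experimentation: all-zero weights and biases.
    zeros = lambda rows, cols: [[0.0] * cols for _ in range(rows)]
    p = {}
    for g in "zrh":
        p["W" + g] = zeros(n_hid, n_in)
        p["U" + g] = zeros(n_hid, n_hid)
        p["b" + g] = [0.0] * n_hid
    return p
```

With all-zero parameters the update gate evaluates to 0.5 and the candidate state to 0, so the step simply halves the previous state, which is a convenient sanity check.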

Quaternion Algebraic Representation and Operations
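A quaternion is a four-element vector in the class of hypercomplex numbers composed of a real part and three imaginary parts defined in a four-dimensional space, as expressed in Equation (5).
$x_Q = x^{(r)} + x^{(i)} i + x^{(j)} j + x^{(k)} k$ (5)
where $x^{(r)}$, $x^{(i)}$, $x^{(j)}$ and $x^{(k)}$ are real numbers, $x_Q$ is the quaternion-valued input and $i$, $j$ and $k$ are the quaternion bases. Quaternions are further characterised by their ability to satisfy the identities (Hamilton rules) expressed in Equations (6) and (7), establishing their non-commutativity:
$i^2 = j^2 = k^2 = ijk = -1$ (6)
$ij = -ji = k, \quad jk = -kj = i, \quad ki = -ik = j$ (7)
The conjugate of the quaternion is expressed as:
$x_Q^{*} = x^{(r)} - x^{(i)} i - x^{(j)} j - x^{(k)} k$ (8)
The normalised quaternion is expressed as:
$x_Q^{\triangleleft} = \dfrac{x_Q}{\sqrt{(x^{(r)})^2 + (x^{(i)})^2 + (x^{(j)})^2 + (x^{(k)})^2}}$ (9)
The Hamilton product of two quaternions $x_Q$ and $y_Q$ can be expressed as:
$x_Q \otimes y_Q = (x^{(r)} y^{(r)} - x^{(i)} y^{(i)} - x^{(j)} y^{(j)} - x^{(k)} y^{(k)}) + (x^{(r)} y^{(i)} + x^{(i)} y^{(r)} + x^{(j)} y^{(k)} - x^{(k)} y^{(j)}) i + (x^{(r)} y^{(j)} - x^{(i)} y^{(k)} + x^{(j)} y^{(r)} + x^{(k)} y^{(i)}) j + (x^{(r)} y^{(k)} + x^{(i)} y^{(j)} - x^{(j)} y^{(i)} + x^{(k)} y^{(r)}) k$ (10)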
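The quaternion operations named in this subsection (Hamilton product, conjugate and normalisation) can be illustrated with a minimal Python sketch. Quaternions are represented here as `(r, i, j, k)` tuples, which is a representation chosen for this example rather than one taken from the paper's code.

```python
import math

def hamilton(p, q):
    """Hamilton product p ⊗ q of two quaternions given as (r, i, j, k) tuples."""
    pr, pi, pj, pk = p
    qr, qi, qj, qk = q
    return (
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    )

def conjugate(q):
    """Negate the three imaginary components."""
    r, i, j, k = q
    return (r, -i, -j, -k)

def normalise(q):
    """Scale q to unit norm (q must be non-zero)."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

# Basis quaternions, used to check the Hamilton rules.
I = (0.0, 1.0, 0.0, 0.0)
J = (0.0, 0.0, 1.0, 0.0)
K = (0.0, 0.0, 0.0, 1.0)
```

Evaluating `hamilton(I, J)` and `hamilton(J, I)` gives `K` and its negative, which is the non-commutativity the Hamilton rules establish.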

Quaternion-Valued Gated Recurrent Unit
A fully connected QGRU has its input, weights, bias and output parameters represented as quaternions. Each variable is broken down into four dimensions representing the four elements of a quaternion $x_Q = x^{(r)} + x^{(i)} i + x^{(j)} j + x^{(k)} k$. Furthermore, the multiplication operator governing the product of the input vector and the weight matrix composed of real-valued elements is replaced by the Hamilton product, as defined by Equation (10). As in real-valued layer computations, the fully connected quaternion layers are formulated as matrix multiplications. A sample multiplication is shown in Equation (11).
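To illustrate how a Hamilton product can be written as a matrix multiplication, the sketch below builds the 4 x 4 real matrix that corresponds to left multiplication by a quaternion weight. The function names are hypothetical, and the matrix layout is the standard real-matrix representation of a quaternion; the exact layout of the paper's own sample multiplication may differ.

```python
def hamilton(p, q):
    """Hamilton product of two quaternions given as (r, i, j, k) tuples."""
    pr, pi, pj, pk = p
    qr, qi, qj, qk = q
    return (
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    )

def left_matrix(w):
    """4 x 4 real matrix M such that M @ x equals w ⊗ x."""
    wr, wi, wj, wk = w
    return [
        [wr, -wi, -wj, -wk],
        [wi,  wr, -wk,  wj],
        [wj,  wk,  wr, -wi],
        [wk, -wj,  wi,  wr],
    ]

def matvec(M, v):
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

w = (1.0, 2.0, 3.0, 4.0)
x = (5.0, 6.0, 7.0, 8.0)
assert matvec(left_matrix(w), x) == hamilton(w, x)
```

The 16 entries of the matrix are generated from only 4 free parameters. This weight sharing is the usual explanation for the roughly fourfold parameter saving of quaternion layers over real-valued layers of the same width.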

Weight Initialisation
A successfully trained neural network depends on a properly designed weight initialisation method. Proper initialisation of the weight parameters is key to the performance of the network, leading to a reduced risk of vanishing and exploding gradients and to improved convergence. Due to the unique interactions between the weight parameters of a quaternion neural network, the quaternion-valued weight initialisation algorithm used in [20] is adopted, as shown in Equation (12), where $w_r$, $w_i$, $w_j$ and $w_k$ are the real and imaginary components of the initialised weights.
$w_r = \varphi \cos(\theta), \quad w_i = \varphi\, \dot{w}^{(i)} \sin(\theta), \quad w_j = \varphi\, \dot{w}^{(j)} \sin(\theta), \quad w_k = \varphi\, \dot{w}^{(k)} \sin(\theta)$ (12)
where $\varphi$ is sampled between $-\sigma$ and $\sigma$, and $\sigma$ is established according to the Glorot criterion [35] such that $\sigma = 1/\sqrt{2(n_{in} + n_{out})}$, with $n_{in}$ and $n_{out}$ as the number of neurons at the input and output layers; $\dot{w}^{(i)}$, $\dot{w}^{(j)}$ and $\dot{w}^{(k)}$ are the imaginary elements of a normalised imaginary quaternion $\dot{w}_Q$ as shown in Equations (13)-(16), with the imaginary elements of the base quaternion randomly chosen as real numbers between 0 and 1; $\theta$ is generated as a random value between $-\pi$ and $\pi$.
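The initialisation procedure described above can be sketched for a single quaternion weight as follows. This is an illustrative reading of the text, not the authors' implementation: the function name is hypothetical, and the Glorot-style value `sigma = 1 / sqrt(2 * (n_in + n_out))` is an assumption based on the quaternion initialisation in [20].

```python
import math
import random

def init_quaternion_weight(n_in, n_out, rng=random):
    """Return one initial weight (w_r, w_i, w_j, w_k) in polar form."""
    # Assumed Glorot-style bound for quaternion layers.
    sigma = 1.0 / math.sqrt(2.0 * (n_in + n_out))
    phi = rng.uniform(-sigma, sigma)          # magnitude term
    theta = rng.uniform(-math.pi, math.pi)    # phase term
    # Random purely imaginary quaternion with components in [0, 1), normalised.
    imag = [rng.random() for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in imag)) or 1.0
    ui, uj, uk = (c / norm for c in imag)
    return (
        phi * math.cos(theta),
        phi * ui * math.sin(theta),
        phi * uj * math.sin(theta),
        phi * uk * math.sin(theta),
    )
```

Because the imaginary direction has unit norm, the modulus of the returned weight equals the magnitude of `phi`, so it never exceeds `sigma`.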

Gated Operations
The operations of the gates of the QGRU are governed by Equations (17)-(20). The structure of the QGRU cell is illustrated in Figure 2. The structure of the QGRU remains similar to that of the GRU; however, the input and output of each cell gate are quaternion-based.
update gate: $z_{q,t} = \sigma(W_{q,z} \otimes x_{q,t} + U_{q,z} \otimes h_{q,t-1} + b_{q,z})$ (17)
reset gate: $r_{q,t} = \sigma(W_{q,r} \otimes x_{q,t} + U_{q,r} \otimes h_{q,t-1} + b_{q,r})$ (18)
current memory state: $\tilde{h}_{q,t} = \tanh(W_{q,h} \otimes x_{q,t} + U_{q,h} \otimes (r_{q,t} * h_{q,t-1}) + b_{q,h})$ (19)
final memory: $h_{q,t} = z_{q,t} * h_{q,t-1} + (1 - z_{q,t}) * \tilde{h}_{q,t}$ (20)
In the above equations, $\otimes$ is the Hamilton product; $*$ is the Hadamard product; $h_{q,t-1}$ is the previous quaternionic state; $W_{q,z}$, $W_{q,r}$ and $W_{q,h}$ are the quaternion weight matrices of the update gate, reset gate and current memory state, respectively; $U_{q,z}$, $U_{q,r}$ and $U_{q,h}$ are the hidden weight matrices of the update gate, reset gate and current memory state, respectively; $b_{q,z}$, $b_{q,r}$ and $b_{q,h}$ are the bias vectors of the update gate, reset gate and current memory state, respectively; and $x_{q,t}$ is the quaternionised input feature vector.
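A minimal sketch of one QGRU step is given below for a cell with a single quaternion input and a single quaternion hidden unit. Several choices here are assumptions made for illustration: the activations are applied component-wise (a "split" activation), the candidate state uses tanh, and the parameter keys are hypothetical names.

```python
import math

def hamilton(p, q):
    """Hamilton product of two quaternions given as (r, i, j, k) tuples."""
    pr, pi, pj, pk = p
    qr, qi, qj, qk = q
    return (
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    )

def qadd(*qs):
    # Component-wise sum of quaternions.
    return tuple(sum(c) for c in zip(*qs))

def split(fn, q):
    # Apply a real activation to each quaternion component.
    return tuple(fn(c) for c in q)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def qgru_step(x, h_prev, p):
    """One QGRU step; p holds quaternion-valued Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh."""
    z = split(sigmoid, qadd(hamilton(p["Wz"], x), hamilton(p["Uz"], h_prev), p["bz"]))
    r = split(sigmoid, qadd(hamilton(p["Wr"], x), hamilton(p["Ur"], h_prev), p["br"]))
    gated = tuple(a * b for a, b in zip(r, h_prev))  # Hadamard product r * h_prev
    h_cand = split(math.tanh,
                   qadd(hamilton(p["Wh"], x), hamilton(p["Uh"], gated), p["bh"]))
    return tuple(zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, h_cand))

ZERO = (0.0, 0.0, 0.0, 0.0)
zero_params = {name: ZERO for name in
               ("Wz", "Uz", "bz", "Wr", "Ur", "br", "Wh", "Uh", "bh")}
```

The only structural difference from a real-valued GRU step is that the weight-input and weight-state products are Hamilton products, while the gating itself stays element-wise.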

Quaternion Backward Propagation through Time
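The quaternion back-propagation mechanism is adapted from [21]. The gradient of the loss $e_t$ with respect to each weight matrix is expressed as shown in Equations (21)-(24), where $\Delta^t_{w_{qy}}$ is the quaternionic representation of the output weight update.
Hidden weights: $\Delta^t_{U,q} = \dfrac{\partial e_t}{\partial U_q}$ (21)
Input weights: $\Delta^t_{w_q} = \dfrac{\partial e_t}{\partial w_q}$ (22)
Output weights: $\Delta^t_{w_{qy}} = \dfrac{\partial e_t}{\partial w_{qy}}$ (23)
Bias: $\Delta^t_{b,q} = \dfrac{\partial e_t}{\partial b_q}$ (24)
The gradients can thus be generalised to $\Delta^t = \dfrac{\partial e_t}{\partial w_q}$, where:
$\dfrac{\partial e_t}{\partial w_q} = \dfrac{\partial e_t}{\partial w_r} + \dfrac{\partial e_t}{\partial w_i} i + \dfrac{\partial e_t}{\partial w_j} j + \dfrac{\partial e_t}{\partial w_k} k$.
The gradient of the loss with respect to each element of the quaternion parameters of the network is computed through the application of the chain rule, and the parameters are updated as shown in Equations (25)-(28).
Hidden weights: $U_q = U_q - \lambda \Delta^t_{U,q}$ (25)
Input weights: $w_q = w_q - \lambda \Delta^t_{w_q}$ (26)
Output weights: $w_{qy} = w_{qy} - \lambda \Delta^t_{w_{qy}}$ (27)
Bias: $b_q = b_q - \lambda \Delta^t_{b,q}$ (28)
where $\Delta^t_{U,q}$, $\Delta^t_{w_q}$ and $\Delta^t_{b,q}$ are the generalised forms of the quaternion representations of the hidden weight, input weight and bias updates, $\lambda$ is the learning rate and $U_q$, $w_q$, $w_{qy}$ and $b_q$ are the generalised forms of the quaternionic hidden weight matrices, input weight matrices, output weight matrices and bias vectors.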
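The component-wise nature of the quaternion gradient and of the update with learning rate λ can be illustrated on a toy problem. The sketch below is not the analytic back-propagation of [21]: it estimates the four partial derivatives by central finite differences for a single quaternion weight and a squared-error loss, both of which are assumptions chosen to keep the example short.

```python
def hamilton(p, q):
    """Hamilton product of two quaternions given as (r, i, j, k) tuples."""
    pr, pi, pj, pk = p
    qr, qi, qj, qk = q
    return (
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    )

def loss(w, x, target):
    # Squared error between w ⊗ x and the target quaternion (toy loss).
    return sum((a - b) ** 2 for a, b in zip(hamilton(w, x), target))

def numerical_grad(w, x, target, eps=1e-6):
    """Estimate (de/dw_r, de/dw_i, de/dw_j, de/dw_k) by central differences."""
    grad = []
    for idx in range(4):
        up, dn = list(w), list(w)
        up[idx] += eps
        dn[idx] -= eps
        grad.append((loss(tuple(up), x, target) - loss(tuple(dn), x, target)) / (2 * eps))
    return tuple(grad)

def sgd_update(w, grad, lr):
    # w <- w - lr * grad, applied to each quaternion component.
    return tuple(wc - lr * gc for wc, gc in zip(w, grad))

w0 = (0.0, 0.0, 0.0, 0.0)
x = (1.0, 0.0, 0.0, 0.0)
target = (1.0, 2.0, 3.0, 4.0)
w1 = sgd_update(w0, numerical_grad(w0, x, target), lr=0.25)
```

In this toy setting one update step moves the weight halfway towards the target, so the loss drops from 30 to about 7.5.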

QGRU Experiments on Sensor Fusion Applications
This section presents some experiments on evaluating the performance of the QGRU on two sensor fusion applications: the Vehicular Localisation problem in Section 4.1, and the HAR problem in Section 4.2.

Vehicular Localisation Using Wheel Encoders
The continuous and accurate positioning of autonomous vehicles, road-wise and lane-wise, is critical to their safe performance [52]. In urban canyons, under bridges, in tunnels, etc., the visibility of the Global Navigation Satellite System (GNSS) is obstructed. Inertial Navigation Systems (INS) and wheel odometers are amongst the systems that can be integrated with the GNSS to improve road localisation during GNSS outages. In [30], the wheel encoder was investigated as a replacement for the accelerometer of the INS in tracking the vehicle displacement in challenging GNSS environments, such as Hard-Brake (HB), Wet Road (WR), and Successive Left and Right turns and sharp cornering (SLR) [15]. However, the accuracy of the position estimation from the wheel encoder's measurement is affected by factors such as changes in tyre size and wheel slippage. A smaller tyre diameter leads to an underestimation of the vehicle's displacement and vice versa [32]. These uncertainties lead to poor positioning of the vehicle over time, as they accumulate without bound during navigation.
Due to the safety-critical nature of this problem, there is a need to minimise the error drift, thus offering a reliable positioning solution. As such, a localisation solution capable of strongly mapping the features of the motion dynamics to enhance the prediction accuracy of positioning algorithms is needed. The mathematical model of the wheel encoder-based localisation problem is presented in Equations (29)-(36).
The rear left and right wheels' angular velocity (wheel speed) measurements from the wheel encoders are represented as $\omega^b_{whrl}$ and $\omega^b_{whrr}$, respectively. The angular velocity of the rear axle, shown in Equations (30) and (31), is obtained from the average of the rear left and right wheel measurements.
Using $v = \omega r$, the vehicle's linear velocity can be found, with $r$ defined as a constant mapping the speed of the wheel to the vehicle's displacement. The term $\varepsilon^b_{whr} r$ is taken as the error in the linear velocity, and the vehicle's displacement can thus be found through the integration of the vehicle's velocity from Equation (34). The vehicle's true displacement is represented as $x^b_{GNSS}$ and calculated according to [53] using Vincenty's formula for geodesics on an ellipsoid, based on the latitudinal and longitudinal information of the vehicle position [53,54].
The focus is on learning to estimate $\varepsilon^b_{whr,x}$ to correct $x^b_{whr}$. All analyses are performed in the body frame as described in [15].
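The chain from wheel speeds to displacement described in this subsection can be sketched as follows. The example assumes rectangular (sample-and-hold) integration at a fixed sampling period, a hypothetical wheel constant `radius`, and the sign convention "reference minus wheel estimate" for the displacement error; none of these details is specified in the text.

```python
def wheel_displacement(omega_rl, omega_rr, radius, dt=1.0):
    """Integrate rear-axle wheel speed (rad/s) into displacement (m)."""
    total = 0.0
    for w_left, w_right in zip(omega_rl, omega_rr):
        omega_axle = 0.5 * (w_left + w_right)   # average of rear left and right wheels
        velocity = omega_axle * radius          # v = omega * r
        total += velocity * dt                  # rectangular integration step
    return total

def displacement_error(x_gnss, x_whr):
    """Error the network is trained to estimate (assumed sign convention)."""
    return x_gnss - x_whr

# Two seconds of hypothetical wheel-speed samples at 1 Hz.
x_whr = wheel_displacement([10.0, 10.0], [12.0, 12.0], radius=0.3)
err = displacement_error(7.0, x_whr)
```

Any error in the wheel constant or any wheel slip enters every integration step, which is why the displacement error grows over the length of a GNSS outage.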

Dataset
The Inertial Odometry Vehicle Navigation Benchmark Dataset (IO-VNBD) [55] is used in the experimentation. The dataset consists of about 98 h of driving data collected over about 5700 km of travel in different driving scenarios. The dataset describes a variety of vehicle motion dynamics using information from sensors such as accelerometers, wheel encoders, gyroscopes, GPS receivers, etc. Although the dataset is collected with a sampling frequency of 10 Hz, we down-sampled it to a frequency of 1 Hz, as in [30]. The dataset is publicly available at https://github.com/onyekpeu/IO-VNBD (accessed on 30 December 2020) and described in [55]. The training and test datasets are drawn from the IO-VNBD; the test datasets used are shown in Table 1.

Challenging Scenarios IO-VNB Data Subset
Hard Brake (HB)
Sharp Cornering and Successive Left and Right Turns (SLR)

The performance of the QGRU in comparison to the GRU on the localisation problem is evaluated using the maximum CRSE (Cumulative Root Squared Error) metric adopted in [16]. The CRSE is defined as the cumulative sum of the root squared estimation error at each second over the total duration of the GNSS outage (defined as 10 s). The maximum CRSE values from all 10 s test sequences in each challenging scenario are compared. The CRSE is given in Equation (37).
$\mathrm{CRSE} = \sum_{t=1}^{N_t} \sqrt{e_{pred}^2}$ (37)
where $N_t$ is the GNSS outage length of 10 s, $t$ is the sampling period and $e_{pred}$ is the uncertainty (error) prediction.
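The CRSE metric and its maximum over test sequences can be sketched as below. The function names are hypothetical, and the sketch assumes that each sequence holds one prediction error per second of the outage.

```python
import math

def crse(errors):
    """Cumulative root squared error over one outage sequence."""
    return sum(math.sqrt(e * e) for e in errors)

def max_crse(sequences):
    """Largest CRSE across all test sequences of a scenario."""
    return max(crse(seq) for seq in sequences)

sequences = [
    [0.1, -0.2, 0.3],   # hypothetical per-second errors (m)
    [1.0, -2.0, 3.0],
]
worst = max_crse(sequences)
```

Because the square root of the squared error is the absolute error, positive and negative errors cannot cancel within a sequence.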

Quaternion Features
All input signals are reconstructed by down-sampling the original signals from 10 Hz to 1 Hz and restructured using a sliding window of length 4 for each input signal. The quaternion input feature $X_{Q,t}$ is described in Equation (38).
$X_{Q,t} = v_1 + v_2 i + v_3 j + v_4 k$ (38)
where $v_1$, $v_2$, $v_3$ and $v_4$ refer to the wheel speed information at times $t$, $t-1$, $t-2$ and $t-3$, respectively. At any time $t$, the quaternion input feature $X_{Q,t}$ is composed of $X_{Q,1}$, $X_{Q,2}$, $X_{Q,3}$ and $X_{Q,4}$, as shown in the unrolled architecture of the QGRU in Figure 3. $X_{Q,1}$, $X_{Q,2}$, $X_{Q,3}$ and $X_{Q,4}$ denote the quaternion inputs at each time step and are defined below such that at time $t$:
$X_{Q,1} = x_t + x_{t-1} i + x_{t-2} j + x_{t-3} k$ (39)
$X_{Q,2} = x_{t-1} + x_{t-2} i + x_{t-3} j + x_{t-4} k$ (40)
$X_{Q,3} = x_{t-2} + x_{t-3} i + x_{t-4} j + x_{t-5} k$ (41)
$X_{Q,4} = x_{t-3} + x_{t-4} i + x_{t-5} j + x_{t-6} k$ (42)
At time $t + 1$:
$X_{Q,1} = x_{t+1} + x_t i + x_{t-1} j + x_{t-2} k$ (43)
$X_{Q,2} = x_t + x_{t-1} i + x_{t-2} j + x_{t-3} k$ (44)
$X_{Q,3} = x_{t-1} + x_{t-2} i + x_{t-3} j + x_{t-4} k$ (45)
$X_{Q,4} = x_{t-2} + x_{t-3} i + x_{t-4} j + x_{t-5} k$ (46)
where $x$ is the wheel speed measurement ($\omega^b_{whrr}$ and $\omega^b_{whrl}$) fed as $X_{Q,t}$ into the neural network to learn the target $\varepsilon^b_{whr,x}$. As the performance of the QGRU is compared to that of the GRU in this work, the training processes for both the QGRU and the GRU are discussed below.
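The sliding-window construction of the quaternion inputs for the localisation task can be sketched as follows. The function name and the list-based signal are hypothetical; the indexing follows the pattern in which the n-th quaternion input starts n − 1 samples before time t and holds four consecutive past samples.

```python
def quaternion_inputs(signal, t, steps=4):
    """Build [X_Q1, ..., X_Q4] at time index t from a 1 Hz signal.

    Each quaternion input is a tuple (r, i, j, k) of four consecutive
    samples, newest first; successive inputs are shifted back by one sample.
    """
    oldest = t - (steps - 1) - 3
    if oldest < 0 or t >= len(signal):
        raise ValueError("not enough samples around index t")
    return [tuple(signal[t - n - c] for c in range(4)) for n in range(steps)]

# A hypothetical signal whose value equals its time index, for easy checking.
signal = list(range(20))
at_t = quaternion_inputs(signal, 10)
at_t_plus_1 = quaternion_inputs(signal, 11)
```

With this toy signal the fourth input at t = 10 is `(7, 6, 5, 4)`, that is, the samples at t − 3 to t − 6, and moving to t + 1 shifts every component forward by one sample.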
The QGRU training process is done with a single hidden layer with a batch size of 1024 and a recurrent dropout rate of 0.005 applied according to [56]. The model optimisation was done using Adamax with an initial learning rate of 0.001. The objective function used is the mean absolute error loss function.
The GRU's training process is also done using a single hidden layer with a batch size of 1024, a recurrent dropout rate of 0.25 and a timestep of 4. The Adamax optimiser is used to optimise the model with an initial learning rate of 0.004. The mean absolute error loss function is also used as the objective function. All inputs to the QGRU and GRU are normalised to values between 0 and 1.
The number of neurons is varied from 4 to 256 to compare the performance of the QGRU to that of the GRU.

Human Activity Recognition
The identification of different activities performed by humans from sensor data records is an active research topic. Wearable devices, such as smartphones and bracelets, are used to record the actions carried out by humans whilst performing activities such as walking, running, standing, sitting, etc. Information on these activities is used to support domains such as healthcare, home automation and fitness. The challenge, however, lies in managing the huge amount of information obtained from an array of several sensors, in handling their temporal relationships and in the lack of knowledge on how to relate the recorded information to the defined activities.

Dataset
The UCI HAR dataset is the second dataset used in our experiments. The dataset, described in [31], is stored in the UCI Machine Learning Repository at http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones (accessed on 30 December 2020). The dataset contains information from waist-mounted smartphone sensors, such as the accelerometer and gyroscope, at a sampling frequency of 50 Hz. Unlike the IO-VNB Dataset, the signals were pre-processed for noise reduction with a median filter and a 3rd-order low-pass Butterworth filter using a cut-off frequency of 20 Hz. The HAR dataset captures static human activities, such as standing, sitting and laying down, as well as dynamic human activities, such as walking, walking upstairs and walking downstairs. The training set consists of 70% random samples from the original dataset, while the test set is made up of the remaining 30% of the dataset, as used in [31].

Quaternion Features
The HAR signal is also ordered by time and sampled in sliding windows of 2.56 s (length of 128) with 50% overlap between them. The quaternion input feature at time $t$, denoted as $X_{Q,t}$, is described in Equation (47).
$X_{Q,t} = v_1 + v_2 i + v_3 j + v_4 k$ (47)
where $v_1$, $v_2$, $v_3$ and $v_4$ refer to each element entry of the quarter divisions of the signal, as shown in Equations (48)-(51). As such, $X_{Q,t}$ is made up of $X_{Q,1}$, $X_{Q,2}$, $X_{Q,3}$, ..., $X_{Q,32}$, as shown in Figure 4, where $X_{Q,1}$, $X_{Q,2}$, $X_{Q,3}$, ..., $X_{Q,32}$ also denote the quaternion input at each time step and are defined below.
At every time $t$:
$X_{Q,1} = x_{T1} + x_{T33} i + x_{T65} j + x_{T97} k$ (48)
$X_{Q,2} = x_{T2} + x_{T34} i + x_{T66} j + x_{T98} k$ (49)
$X_{Q,3} = x_{T3} + x_{T35} i + x_{T67} j + x_{T99} k$ (50)
...
$X_{Q,32} = x_{T32} + x_{T64} i + x_{T96} j + x_{T128} k$ (51)
where $T1$, $T2$, $T3$, ... and $Tn$ refer to the first, second, third and $n$th element entry of the signal and $x$ is an input signal (one of the 9 signals): 3-axis linear acceleration, 3-axis angular velocity and 3-axis jerk information.
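The quarter-division construction for the HAR windows can be sketched as below: a 128-sample window is split into four quarters of 32 samples, and the n-th quaternion input takes the n-th sample of each quarter. The function name is hypothetical and the window is represented as a plain list.

```python
def quarter_quaternions(window):
    """Map a window of length 4q to q quaternion inputs (r, i, j, k).

    The n-th quaternion takes the n-th element of each of the four
    consecutive quarters of the window.
    """
    if len(window) % 4 != 0:
        raise ValueError("window length must be a multiple of 4")
    q = len(window) // 4
    return [(window[n], window[n + q], window[n + 2 * q], window[n + 3 * q])
            for n in range(q)]

# A hypothetical window whose values equal the element numbers T1..T128.
window = list(range(1, 129))
features = quarter_quaternions(window)
```

With this toy window the second quaternion input is `(2, 34, 66, 98)`, matching the element numbers T2, T34, T66 and T98, and the 128 samples are covered by 32 quaternion time steps.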
The training process of the QGRU is done with a single hidden layer, 300 epochs and a batch size of 1280. The model is optimised using the Adamax optimiser with an initial learning rate of 0.005. The objective function chosen is the mean square error loss function, and a dropout rate of 0.005 is applied. The GRU, however, is trained with a batch size of 4, a time step of 128, 100 epochs, an initial learning rate of 0.002, a categorical cross-entropy loss function, a Stochastic Gradient Descent optimiser and a recurrent dropout rate of 0.25. The neural networks are trained to accurately classify the activity of the human, i.e., standing, walking, laying down, sitting, walking upstairs and walking downstairs. As in the localisation experiment, the performance of the QGRU and the GRU is compared using a number of neurons varying from 4 to 256.

Results and Discussion
In this section, the performance of the QGRU and the GRU is evaluated on the vehicular localisation problem (regression task) as well as the HAR problem (classification task) described above.

Challenging Vehicular Localisation Task
The results from the vehicle localisation experiments are presented in Table 2. The performance of the QGRU is compared to that of the GRU and of the physical model (the directly integrated information from the wheel encoder) in estimating the positioning error (uncertainty) $\varepsilon^{b}_{whr,x}$ needed to correct the vehicle's positioning information. The evaluation covers three challenging scenarios for vehicular positioning in GNSS-deprived environments: the Hard Brake scenario (HB), the sharp cornering and Successive Left and Right turn scenario (SLR), and the Wet Road scenario (WR). For each scenario, Table 2 reports the error of the QGRU and of the GRU in estimating the positioning uncertainty, alongside the original uncertainty $\varepsilon^{b}_{whr,x}$ of the physical model. In the HB scenario, the QGRU provided the lowest estimation error of 2.86 m, compared to the GRU's estimation error of 3.15 m and the physical model's original uncertainty of 9.99 m. In the SLR scenario, the QGRU again offers the lowest error in estimating the positioning uncertainty, with 1.24 m compared to the GRU's 1.31 m and the physical model's original uncertainty of 8.19 m. The QGRU performs similarly in the WR scenario, with the lowest uncertainty estimation error of 2.09 m compared to 2.36 m for the GRU and the physical model's original uncertainty of 5.36 m. Relative to the GRU, the QGRU therefore improves the estimate by 9.2% in the HB scenario, 5.3% in the SLR scenario and 11.4% in the WR scenario. These results are in line with those presented in [30].

Remarkably, the QGRU provides better estimates than the GRU with fewer trainable parameters, as shown in Table 3. In the HB scenario, the QGRU provides its best estimates with 3809 parameters compared to 13,121 parameters for the GRU. In the SLR scenario, the QGRU provides the best estimation with 1137 parameters compared to 3489 parameters for the GRU. In the WR scenario, the QGRU estimates the position uncertainty best with 13,761 parameters compared to 50,817 parameters for the GRU.
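The percentages and parameter savings quoted above follow directly from the reported figures. The short sketch below recomputes them, assuming the improvement is measured as the relative reduction of the GRU's error; the dictionary layout and the function names are illustrative and not taken from the QGRU code.

```python
# Values copied from the text above: errors in metres, trainable parameter counts.
results = {
    "HB":  {"qgru_err": 2.86, "gru_err": 3.15, "qgru_params": 3809,  "gru_params": 13121},
    "SLR": {"qgru_err": 1.24, "gru_err": 1.31, "qgru_params": 1137,  "gru_params": 3489},
    "WR":  {"qgru_err": 2.09, "gru_err": 2.36, "qgru_params": 13761, "gru_params": 50817},
}


def relative_improvement(baseline_err, candidate_err):
    """Percentage by which candidate_err is lower than baseline_err."""
    return 100.0 * (baseline_err - candidate_err) / baseline_err


def parameter_ratio(gru_params, qgru_params):
    """How many times more parameters the GRU uses than the QGRU."""
    return gru_params / qgru_params


summary = {}
for scenario, r in results.items():
    summary[scenario] = {
        "improvement_pct": round(relative_improvement(r["gru_err"], r["qgru_err"]), 1),
        "param_ratio": round(parameter_ratio(r["gru_params"], r["qgru_params"]), 2),
    }
    print(scenario, summary[scenario])
```

Under this assumption the recomputed improvements match the quoted 9.2%, 5.3% and 11.4%, and the largest parameter ratio (WR scenario) is about 3.7.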

Human Activity Recognition (HAR) Task
The performance of the QGRU and GRU on the HAR task across different weighted connections is reported in Table 4. Both neural networks are tasked with accurately classifying the human activities in the HAR dataset, i.e., standing, walking, laying down, sitting, walking upstairs and walking downstairs. The QGRU performs slightly better than the GRU, with classification accuracies of 95.28% and 95.16%, respectively, which are in line with those presented in [31]. This highlights a 0.08% overall improvement of the QGRU over the GRU. The QGRU outperforms the GRU for every number of neurons tested except 32, where the GRU provides a better classification accuracy. As in the localisation problem, the QGRU achieves its best overall classification accuracy with significantly fewer parameters, 59,015 compared to 206,087 for the GRU, as shown in Table 5. The performance of the QGRU may be attributed to the quaternion algebra and the properties of the Hamilton product, which support a more compact neural network formulation. This reduction in the parametric complexity of the model makes it more suitable for use on low-memory embedded devices.
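To illustrate why the Hamilton product lends itself to a compact formulation, the sketch below implements the product for quaternions stored as `(r, i, j, k)` tuples and counts the weights of a single dense mapping in the real-valued and quaternion-valued cases. It is a minimal illustration rather than the QGRU implementation: the function names are hypothetical, biases are ignored, and the feature counts are assumed to be divisible by 4.

```python
def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (r, i, j, k) tuples."""
    pr, pi, pj, pk = p
    qr, qi, qj, qk = q
    return (
        pr * qr - pi * qi - pj * qj - pk * qk,  # real part
        pr * qi + pi * qr + pj * qk - pk * qj,  # i component
        pr * qj - pi * qk + pj * qr + pk * qi,  # j component
        pr * qk + pi * qj - pj * qi + pk * qr,  # k component
    )


def real_dense_weights(n_in, n_out):
    """Weights of a real-valued dense map from n_in to n_out features."""
    return n_in * n_out


def quaternion_dense_weights(n_in, n_out):
    """Weights of a quaternion dense map over the same real feature counts.

    Each quaternion weight holds 4 real numbers and links one input
    quaternion (4 features) to one output quaternion (4 features).
    """
    return 4 * (n_in // 4) * (n_out // 4)


# Basis identities: i * j = k and j * i = -k (the product is non-commutative).
i_unit, j_unit = (0, 1, 0, 0), (0, 0, 1, 0)
ij = hamilton_product(i_unit, j_unit)
ji = hamilton_product(j_unit, i_unit)

# Weight counts for an illustrative 8 -> 16 feature mapping.
real_w = real_dense_weights(8, 16)
quat_w = quaternion_dense_weights(8, 16)

# Parameter ratio reported for the HAR task (Table 5 figures quoted above).
har_ratio = round(206087 / 59015, 2)
print(ij, ji, real_w, quat_w, har_ratio)
```

In this idealised count the quaternion mapping needs a quarter of the weights because the four real components of each weight are reused across all four output components. The ratios reported in the experiments are below 4; one plausible reason, stated here as an assumption, is that bias terms and any real-valued layers are not reduced in the same way.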

Conclusions
This paper proposed a novel Quaternion Gated Recurrent Unit (QGRU) to map multi-dimensional features efficiently using fewer parameters. The QGRU leverages the Hamilton product of quaternions to capture internal and external dependencies efficiently within and across multi-dimensional features. The performance of the QGRU is evaluated on a vehicular localisation problem and a Human Activity Recognition (HAR) task. On the vehicular localisation problem, the QGRU provided the lowest error in estimating the positioning uncertainty, with a 9.2% improvement over the GRU in the hard brake scenario, a 5.3% improvement over the GRU in the sharp cornering and successive left and right turns scenario, and an 11.4% improvement over the GRU in the wet road scenario. On the HAR task, the QGRU also outperforms the GRU, with a classification accuracy of 95.28% compared to 95.16% for the GRU. The results obtained from the study show that the QGRU achieves these lower positioning uncertainty estimation errors and this higher classification accuracy with up to 3.7 times fewer parameters than the GRU. However, without the use of a carefully designed CUDA kernel, the frequent memory copy operations between the
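$\hat{\omega}^{b}_{whrl}$ and $\hat{\omega}^{b}_{whrr}$, respectively. The errors (uncertainties) corresponding to the left and right rear-wheel speed measurements are defined as $\varepsilon^{b}_{whrl}$ and $\varepsilon^{b}_{whrr}$. $\omega^{b}_{whrr}$ and $\omega^{b}_{whrl}$ are the wheel speed measurements without errors: $\hat{\omega}^{b}_{whrl} = \omega^{b}_{whrl} + \varepsilon^{b}_{whrl}$. Here, $\varepsilon^{b}_{whr,x}$ in Equation (35) represents the integral of $\varepsilon^{b}_{whr,v}$.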

Table 2. Comparison between the QGRU and GRU on each scenario of the vehicle localisation task.

Table 3. The number of trainable parameters across various numbers of neurons used in the vehicle localisation experiment.

Table 4. Comparison between the QGRU and GRU performance on the HAR task.

Table 5. The number of trainable parameters across various numbers of neurons used in the HAR task experiment.