Overcoming Bandwidth Limitations in Wireless Sensor Networks by Exploitation of Cyclic Signal Patterns: An Event-triggered Learning Approach

Wireless sensor networks are used in a wide range of applications, many of which require real-time transmission of the measurements. Bandwidth limitations result in limitations on the sampling frequency and number of sensors. This problem can be addressed by reducing the communication load via data compression and event-based communication approaches. The present paper focuses on the class of applications in which the signals exhibit unknown and potentially time-varying cyclic patterns. We review recently proposed event-triggered learning (ETL) methods that identify and exploit these cyclic patterns, we show how these methods can be applied to the nonlinear multivariable dynamics of three-dimensional orientation data, and we propose a novel approach that uses Gaussian process models. In contrast to other approaches, all three ETL methods work in real time and assure a small upper bound on the reconstruction error. The proposed methods are compared to several conventional approaches in experimental data from human subjects walking with a wearable inertial sensor network. They are found to reduce the communication load by 60–70%, which implies that two to three times more sensor nodes could be used at the same bandwidth.


Introduction
Real-time data transmission is important for a large range of applications. If multiple signals must be transmitted by a number of agents in a wireless network, bandwidth limitations impose restrictions on the number of agents that can transmit their signals in real time, as illustrated by Figure 1. One example is estimation and transmission of motion states in a wearable inertial sensor network. These systems are commonly used to provide biofeedback [1,2] or to control robotic systems [3,4] and neuroprostheses [4,5]. In such a network, a wireless inertial sensor is attached to each body segment of interest and sends measurements of its orientation to a receiving node. This receiver is typically a central unit that combines the measurements of multiple body segments to determine joint angles and similar motion parameters [6]. Whenever quick motions of several body segments are tracked, the available bandwidth of the wireless network imposes limitations that lead to a trade-off between the number of sensors and the rate at which they can send their data in real time. In the worst case, this leads to cables being used despite many disadvantages [7].
One main reason for these bandwidth limitations is that standard protocols transmit each measured sample of each signal even if it can be estimated accurately from previously transmitted samples. Intelligent protocols aim at exploiting such signal properties to reduce the communication load and thereby enable the use of more sensors or higher transmission rates, cf. Figure 1. Since the energy consumption of wireless communication is significant [8,9], a reduced communication load can also help to decrease battery sizes or increase use time.
Communication load can be reduced in two different ways:
(i) The number of transmitted values (bits) per sampling instant can be reduced.
(ii) The number of sampling instants with communication can be reduced.
While the first approach only reduces the payload of a data packet, the second approach aims at saving an entire data packet. Our main focus is, therefore, on the second approach. However, we will compare both approaches later on.
For obvious reasons, reducing the communication load should not be achieved at the cost of jeopardizing the accuracy of the transferred signal. Hence it is desirable to assure that the signal that is estimated or reconstructed in real time on the receiver side differs from the original signal only to a small, well-defined extent. To propose and validate methods that minimize the communication load while assuring such a small error bound is the primary goal of this article.
When a signal is approximately constant or linear in time, it seems straightforward to accurately estimate the current sample from previously measured ones. However, if the signal varies in a less easily predictable manner, it is more difficult to find signal properties that can be exploited. In the present contribution, we consider signals that are approximately locally periodic. Approximately means that the signal can be well approximated by repetitions of a periodic pattern. Locally means that this pattern might change slowly or even suddenly, and for some time periods there might be no periodic pattern at all. Many physiological signals exhibit this approximate local periodicity, for example respiratory, cardiopulmonary, and EMG data [10], as well as inertial measurement data from periodic motions such as walking, cycling, and swimming. Even though the shape of an ECG curve or the motion of the leg during swimming might change slowly or even suddenly, a very large portion of the signals is typically well described by cyclic patterns, and completely irregular episodes are rare.
We will aim at exploiting this approximate local periodicity by identifying the patterns in real time and transmitting only the data samples that cannot be adequately estimated from that pattern and previously transmitted data. To this end, we will use and extend recently proposed event-triggered learning (ETL) methods for cyclically excited systems [11], and we will validate the performance of the proposed methods with respect to more conventional approaches. The novel contributions of the present article are:

• We extend ETL to multidimensional nonlinear systems with cyclic excitation and a non-Euclidean state space. The states of the system are unit quaternions that represent body segment orientations.
• We propose an ETL algorithm that uses Gaussian process regression (GPR) for prediction and thereby reduces the communication load that is associated with model updates.
• We validate the proposed methods using real measurement data and compare their performance with that of more conventional or fundamentally different approaches, such as compression by adaptive differential pulse code modulation.
• We apply the method to data from a sensor network and investigate the effect of network delays and a limitation of the number of transmission channels.
To the best of our knowledge, none of these contributions has been considered or presented in previous research.
An overview of the state of research is provided in Section 1.1. In Section 2, we describe the specific application and the addressed problem. New methods that address the given problem are proposed in Section 3. The data-based validation is described and the results are discussed in Section 4 and Section 5, respectively. Finally, Section 6 provides conclusions.

Related Work
Several methods have been proposed to reduce the communication load in wireless networks. For the sake of simplicity, consider only the communication between one sender and one receiver, which might be part of a larger network. The sender measures the state of a system periodically, and the receiver must provide an estimate of that state in real time. All approaches that are considered below have in common that the sender applies some kind of compression algorithm before transmitting the measured data. Compression algorithms attempt to find structure in the measured data and remove redundant information. Subsequently, the receiver tries to reconstruct the original measured signal based on the transmitted data. We call the original signal values the measurements and the reconstructed signal values the estimates. Reconstruction might lead to an error between measurements and estimates, and this reconstruction error should be small.
The reviewed compression algorithms can be categorized based on two criteria: Firstly, algorithms that reduce the number of sampling instants with communication in contrast to algorithms that do not change this number, but reduce the amount of transmitted bits per sampling instant, and, secondly, lossless in contrast to lossy algorithms.
Event-based sampling (EBS) [12–15] and event-triggered state estimation (ETSE) [9,16–18], sometimes referred to as model-based event-based sampling [19], are well-known lossy approaches to reduce the number of sampling instants with communication and guarantee a certain accuracy. Both can work in real time. In ETSE, the sender and the receiver independently predict the state measured by the sending agent. The prediction is based on an internal state and an invariant process model. The true measured state is communicated, and updates the internal states, only if a defined event indicates that the prediction is no longer accurate enough. If the prediction is accurate, then the receiver uses the prediction as its estimate.
The EBS algorithm is a special case of ETSE in which the prediction is the last transmitted measurement. For example, ref. [12] applies EBS to a wireless wearable inertial measurement unit (IMU) network that communicates 3D accelerations and 3D angular velocities. The aim is to subsequently estimate joint angles. For a knee joint angle, they show that communication can be reduced by 66% compared to fixed-rate sampling while the accuracy of the estimation remains similar.
Furthermore, ref. [13] proposes EBS for transmission of orientations of human body segments in a wireless body sensor network. They use quaternions to represent these orientations and consider especially orientations of human feet. Depending on the parametrization, their method achieves a communication reduction of 70% at a root-mean-square error (RMSE) of 7° to 8°, or an RMSE of 1° with a communication reduction of 5% to 10%. However, the error of the estimation is not guaranteed to be bounded.
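The EBS principle described above can be sketched in a few lines. This is only an illustration for a scalar signal; the function name `send_on_delta`, the threshold, and the sample values are hypothetical and not taken from [12] or [13].

```python
def send_on_delta(samples, delta):
    """Event-based sampling for a scalar signal.

    The receiver holds the last transmitted sample as its estimate. The
    sender transmits only when that held value deviates from the current
    measurement by more than delta.
    """
    estimates, sent = [], []
    last = None
    for x in samples:
        transmit = last is None or abs(x - last) > delta
        if transmit:
            last = x  # sender and receiver update their shared estimate
        estimates.append(last)
        sent.append(transmit)
    return estimates, sent


# Hypothetical measurement sequence and threshold.
signal = [0.0, 0.5, 1.2, 2.5, 2.6, 2.7, 5.0]
estimates, sent = send_on_delta(signal, delta=1.0)
print(sum(sent), "of", len(signal), "samples transmitted")
```

By construction, the estimate never deviates from the measurement by more than the threshold, which is the kind of bounded error that the trigger-based methods in this article rely on.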
The ETL algorithm [20,21] is an extension of ETSE. It can further reduce communication or increase accuracy if the system dynamics are time-variant. The sender has the additional capability to learn a new model for prediction. Model learning is triggered, and the estimated model is sent to the receiver, if an event indicates that the system dynamics have changed.
In particular, ref. [11] proposes an ETL approach for cyclically excited systems. A setup with a single IMU on a human foot as sender, which provides a 1D pitch angle measurement, is considered. Using a base sampling rate of 50 Hz in an ideal network, communication can be reduced by more than 70% while the RMSE is below 1° and the estimation error is bounded by 2°.
For a comprehensive study, ref. [22] uses IMUs to collect acceleration data during several running sessions. They compare the compression ratios, the additional transmission delays, and the RMSE of different lossless and lossy compression algorithms. Bzip2, zlib, and Lempel-Ziv-Welch (LZW) have the advantage of being lossless compression algorithms, i.e., there is no reconstruction error. All three methods arrange consecutive data into packets before each packet is compressed individually. However, they reduce communication by 13% or less and introduce additional delays of at least 4 s. These delays prohibit real-time applications.
Lossy zlib and different (also lossy) wavelet compression schemes allow compression ratios of about 50% while introducing delays of more than 0.5 s. Again, these delays are too large for real-time applications.
Finally, refs. [23,24] propose modified adaptive differential pulse code modulation (ADPCM) for real-time compression. ADPCM does not change the number of sampling instants with communication, but reduces the amount of communication (bits) per sampling instant. For this, it codes difference values instead of absolute ones and adapts the quantization intervals to the signal. At every step, sender and receiver predict the measurement at the next sampling instant, and only the difference between the prediction and the measurement is sent as update. This is beneficial because the range of the prediction error is expected to be much smaller than the range of the measurement value. Therefore, a smaller number of bits is required to achieve a similar accuracy. The prediction function and the quantization interval are adapted after each step. For validation, refs. [23,24] consider compression of raw signals of IMUs (acceleration, angular velocity, and magnetic field), but also the effect of compression on subsequent state estimation (attitude, heading, position, etc.). Using a sampling rate of 100 Hz, they reduce communication of the raw data by 60% while the RMSE of the subsequently estimated angles is 0.36°.
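To make the ADPCM principle concrete, the following is a strongly simplified sketch and not the modified ADPCM of [23,24]. It assumes a previous-sample predictor, 4-bit difference codes, and a step size that is doubled when the code saturates and halved when the code is small. All function names and parameter values are hypothetical.

```python
def _update(pred, step, code, max_code, min_step):
    """Predictor and step-size adaptation shared by sender and receiver."""
    pred = pred + code * step          # previous-sample predictor
    if abs(code) == max_code:          # code saturated: widen the quantizer
        step *= 2.0
    elif abs(code) <= 1:               # small code: refine the quantizer
        step = max(step / 2.0, min_step)
    return pred, step


def adpcm_encode(samples, bits=4, step=0.01, min_step=1e-3):
    """Return the integer difference codes and the sender-side reconstruction."""
    max_code = 2 ** (bits - 1) - 1     # 7 for 4-bit codes
    pred, codes, recon = 0.0, [], []
    for s in samples:
        code = max(-max_code, min(max_code, round((s - pred) / step)))
        pred, step = _update(pred, step, code, max_code, min_step)
        codes.append(code)
        recon.append(pred)
    return codes, recon


def adpcm_decode(codes, bits=4, step=0.01, min_step=1e-3):
    """Rebuild the signal from the codes alone (receiver side)."""
    max_code = 2 ** (bits - 1) - 1
    pred, recon = 0.0, []
    for code in codes:
        pred, step = _update(pred, step, code, max_code, min_step)
        recon.append(pred)
    return recon


# Hypothetical constant signal: the quantizer first widens, then refines.
samples = [0.5] * 20
codes, recon = adpcm_encode(samples)
decoded = adpcm_decode(codes)
```

Because both sides run the same update rule on the same codes, the receiver reproduces the sender's reconstruction exactly, while each sample costs 4 bits instead of 16 in this sketch.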
To conclude, lossless algorithms are inefficient for motion data compression in real time. In the field of lossy compression algorithms, the number of methods that guarantee a certain accuracy and introduce no significant time delay is limited. The only approach that has already been studied for real-time compression of 3D orientation measurements does not guarantee a bounded error of the reconstructed signal and leads to errors of several degrees if a significant communication reduction should be achieved.

Specific Problem Setting
As explained above, we consider transmission of signals that exhibit unknown and time-varying cyclic patterns. In contrast to previous contributions, we consider multidimensional signals with nonlinear dynamics: three-dimensional orientations measured by a distributed network of inertial sensors.

Setup
Consider a body sensor network consisting of one receiver and seven wearable sensor units, as illustrated in Figure 2. Each sensor unit comprises at least an IMU chip, a wireless communication module, and a microcontroller. The sensors are connected to the receiver via a star network; however, the following methods are not limited to this topology. The sensor network is used to track the motion of the lower limbs during gait. A fundamental assumption is that the motion is locally approximately cyclic as described in Section 1. This does not require constant steady-state gait; the subject might occasionally change the gait velocity and the walking style, for example by limping or tiptoeing. Each sensor unit determines its orientation x[k] ∈ O at every sampling instant k ∈ ℤ. The sampling rate is 50 Hz. The set of all 3D orientations is denoted as O. In general, three real numbers are sufficient to describe all x ∈ O (cf. Section 3.1) and every number is coded with 16 bits. This leads to a payload of 48 bits for every data packet.
Sensors 2020, 20, 260
Without loss of generality, we assume that Bluetooth 5 is employed for wireless data transmission, which is a widespread standard in today's applications. We use the faster standard 2M PHY and do not enable encryption (MIC). These settings lead to a total overhead of 18 bytes = 144 bits per packet, cf.

Goals
In the given star network, and in several other network topologies, the network traffic is a superposition of many bilateral communications, in each of which one signal is being transferred from one sender to one receiver. We define goals for one such bilateral communication and demand that in a larger network these goals should hold for each bilateral communication.
The receiver must provide an accurate estimate x̂[k] ∈ O of the orientation at every sampling instant k = 1, 2, …, K in real time, where K ∈ ℕ is the total number of samples in the measurement. A sufficiently intelligent communication protocol will achieve this goal without transmitting each measurement sample completely. We denote the number of transmitted data bits at each sampling instant with b[k] ∈ ℕ and assess communication reduction by the following three measures:
• The number P of transmitted data bits, i.e., the size of the ATT Payload (cf. Figure 3).
• The number S of sampling instants with data transmission.
• The total number D = P + 144S of transmitted bits including the Bluetooth 5 overhead.
On the one hand, all three numbers should be low. On the other hand, we want to achieve a small error between the measurement of the sender and the estimate of the receiver and introduce a fourth and a fifth measure for this error:
• The RMS value of the angle difference ∢(x[k], x̂[k]) between the measured and the estimated body segment orientation.
• The maximum absolute angle difference max_k ∢(x[k], x̂[k]) between the measured and the estimated body segment orientation, i.e., the upper error bound.
We define the angle difference ∢ : O × O → [0, π] between two orientations as the shortest rotation angle that is necessary to rotate the first orientation onto the second one (cf. Section 3.1).
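The communication measures can be illustrated with a short computation. The sketch assumes that P is the sum of b[k] over all sampling instants and that S counts the instants with b[k] > 0; the 48-bit payload and the 144-bit overhead are the values stated in Section 2.1. The function name and the event pattern are hypothetical.

```python
OVERHEAD_BITS = 144  # Bluetooth 5 overhead per packet (18 bytes)
PAYLOAD_BITS = 48    # three 16-bit values per state update


def communication_measures(b):
    """Compute P, S, and D, where b[k] is the payload size in bits at instant k."""
    P = sum(b)                            # transmitted payload bits
    S = sum(1 for bits in b if bits > 0)  # sampling instants with transmission
    D = P + OVERHEAD_BITS * S             # total bits including overhead
    return P, S, D


# Full communication: 4 min at 50 Hz with one packet per sampling instant.
K = 4 * 60 * 50
P_full, S_full, D_full = communication_measures([PAYLOAD_BITS] * K)

# Hypothetical event-triggered run that transmits at every fourth instant.
b_event = [PAYLOAD_BITS if k % 4 == 0 else 0 for k in range(K)]
P, S, D = communication_measures(b_event)
print(P / P_full, S / S_full, D / D_full)
```

Since the overhead is three times the payload, saving an entire packet reduces D far more than shrinking the payload of a packet that is still sent, which motivates the focus on reducing S.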

Methods
Before the introduction of three new methods to achieve the goals described in Section 2.2, we need an efficient way to represent 3D orientations of body segments and their rotations.

Orientation Representation
Orientations and rotations can be represented by the same mathematical structure since orientations can be seen as rotations with respect to a fixed reference frame. Such a structure is the set of unit quaternions [31,32]. Unit quaternions have four real-valued components (a real part and three imaginary parts) whose Euclidean norm is one. Specifically, we use augmented quaternions q ∈ ℝ⁴, i.e., quaternions in vector style. Their first component is the real part, and components two to four are the imaginary parts.
Quaternion multiplication is denoted by ⊗ : ℝ⁴ × ℝ⁴ → ℝ⁴. If q₁ ∈ ℝ⁴ represents an orientation and q₂ ∈ ℝ⁴ a rotation, then their product q₁ ⊗ q₂ represents the orientation after the rotation. Furthermore, we define the angle difference ∢ : ℝ⁴ × ℝ⁴ → [0, π] between two quaternions as the shortest rotation angle that rotates the first quaternion onto the second one. This is twice the arccos of the absolute value of the real part of the product of the first quaternion and the inverse of the second quaternion [33].
We consider orientations represented by unit quaternions as the states x[k] ∈ ℝ⁴ that are measured by the sender and must be estimated by the receiver at every sampling instant with index k. The orientation quaternions x[k] are determined onboard from the IMU data using a sensor fusion algorithm, e.g., [34]. The estimated orientation quaternion of the receiver is denoted x̂[k] ∈ ℝ⁴. To save 25% payload data, the sender always transmits only the imaginary part of the quaternion. Subsequently, the receiver restores the full unit quaternion using the Pythagorean theorem.
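The quaternion operations of this section can be sketched as follows. The helper names are hypothetical. The sketch additionally assumes that the sender chooses the sign of the quaternion such that the real part is non-negative before dropping it, which is possible because q and −q represent the same orientation.

```python
import math


def quat_mult(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def quat_inv(q):
    """Inverse of a unit quaternion, which equals its conjugate."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def angle_difference(q1, q2):
    """Shortest rotation angle in radians, in [0, pi], between two orientations."""
    w = quat_mult(q1, quat_inv(q2))[0]
    return 2.0 * math.acos(min(1.0, abs(w)))


def restore_quaternion(imag):
    """Rebuild a unit quaternion from its imaginary part (real part >= 0 assumed)."""
    x, y, z = imag
    w = math.sqrt(max(0.0, 1.0 - (x * x + y * y + z * z)))
    return (w, x, y, z)


# A rotation of 90 degrees about the z-axis relative to the identity.
q_identity = (1.0, 0.0, 0.0, 0.0)
q_z90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
print(math.degrees(angle_difference(q_identity, q_z90)))
print(restore_quaternion(q_z90[1:]))
```

The clamping to 1.0 and 0.0 guards against rounding errors that would otherwise push the arguments of `acos` and `sqrt` slightly outside their domains.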

Event-Triggered Learning (ETL)
The two related methods EBS [12–15] and ETSE [16–19] can be applied in the setup described in Section 2.1 to address the goals established in Section 2.2. Both are represented by the white blocks in Figure 4. The sender only sends samples x[k] when a particular event occurs. However, the receiver provides the estimate x̂[k] at every sampling instant k, which requires it to perform a prediction whenever there is no communication. Specifically, the receiving agent and the sending agent independently predict the measurement x[k] based on previous estimates and possibly a process model (ETSE). The predictions of both agents are identical. However, only the sender has access to the true measurement x[k]. Therefore, it recognizes when the prediction is no longer good, e.g., when the difference between measurement and prediction is large. This is indicated by a binary state-update trigger γ_state[k] ∈ {0, 1}. If γ_state[k] = 1, then the sender communicates the true measurement x[k], which updates the estimates x̂[k] of the sender and the receiver immediately.
The quality of the model is crucial for the accuracy of the predictions and, thus, also for communication reduction. A drawback of ETSE is that it always employs the same model for prediction. This is not the case for ETL [20,21]; therefore, it can improve on ETSE for systems with time-variant dynamics, as in the considered setting with possibly changing motion patterns. Specifically, we use ETL for cyclically excited systems as described in [11]. In comparison to ETSE, two new blocks are introduced at the sender's side, which are shown in gray in Figure 4. A learning trigger detects if the dynamics of the measured process have changed, and a model learning block uses previous measurements to identify a new model when the learning trigger fires. Subsequently, the sender updates the model and transmits it to the receiver.

Figure 4. Block diagram of the ETL architecture with one sender and one receiver.
For the considered special case of repetitive quaternion signals as states, the model is a trajectory Û ∈ ℝ^(4×N̂) of rotations (quaternion increments) during one cycle with estimated cycle length N̂ ∈ ℕ. The model of the rotation at the current step is the j-th column of Û (cf. Figure 5). The index j[k] ∈ ℕ is increased after every step. If the end of the estimated model trajectory is reached (j[k] = N̂) or learning is triggered (γ_learn[k] = 1), then j[k] is reset to 1. Based on that, the event-triggered state estimation is
$$\hat{x}[k] = \begin{cases} x[k], & \gamma_{\mathrm{state}}[k] = 1, \\ \hat{x}[k-1] \otimes \hat{U}_{j[k]}, & \text{otherwise}, \end{cases} \qquad (1)$$
where Û_{j[k]} denotes the j[k]-th column of Û. We use the angle difference ∢ (cf. Section 3.1) between the measured quaternion and its prediction for the state-update trigger
$$\gamma_{\mathrm{state}}[k] = \begin{cases} 1, & \sphericalangle\left(x[k],\, \hat{x}[k-1] \otimes \hat{U}_{j[k]}\right) > \delta, \\ 0, & \text{otherwise}, \end{cases} \qquad (2)$$
with an error threshold δ ∈ ℝ₀⁺. Furthermore, we employ a binary learning trigger γ_learn[k] ∈ {0, 1} to indicate when we want to update the model Û. Ideally, we trigger model updates if and only if the cyclic excitation has changed. Our trigger is based on the so-called inter-communication times, which are the times between two consecutive state updates. If the model no longer yields valid predictions of the current data, then the inter-communication times will decrease. We aim at detecting this decrease using the Kolmogorov-Smirnov test [35] as described in [11].
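The interplay of prediction and state-update trigger can be sketched as follows. This is a minimal illustration under the assumption that the prediction is the previous estimate multiplied by the current model increment. The helper names, the constant-rate model, and the test motion are hypothetical, and model learning is omitted.

```python
import math


def quat_mult(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def angle_difference(q1, q2):
    """Shortest rotation angle in radians between two unit quaternions."""
    q2_inv = (q2[0], -q2[1], -q2[2], -q2[3])
    return 2.0 * math.acos(min(1.0, abs(quat_mult(q1, q2_inv)[0])))


def etl_step(x_meas, x_prev_est, model, j, delta):
    """One sampling instant: returns (estimate, gamma_state, next index)."""
    prediction = quat_mult(x_prev_est, model[j])   # both agents compute this
    gamma_state = angle_difference(x_meas, prediction) > delta
    estimate = x_meas if gamma_state else prediction
    return estimate, gamma_state, (j + 1) % len(model)  # 0-based cyclic index


def rot_z(deg):
    """Unit quaternion for a rotation of deg degrees about the z-axis."""
    half = math.radians(deg) / 2.0
    return (math.cos(half), 0.0, 0.0, math.sin(half))


# Hypothetical data: the model expects 1 degree per step, but the true
# motion speeds up to 4 degrees per step after the fifth sample.
model = [rot_z(1.0)] * 4
true_angles = [1, 2, 3, 4, 5, 9, 13, 17]
delta = math.radians(2.0)

estimate, j, flags, errors = rot_z(0.0), 0, [], []
for angle in true_angles:
    x = rot_z(angle)
    estimate, fired, j = etl_step(x, estimate, model, j, delta)
    flags.append(fired)
    errors.append(angle_difference(x, estimate))
print(flags)
```

As long as the model matches the motion, no state update is sent. Once the motion deviates, every sample triggers a transmission, which shortens the inter-communication times that the learning trigger monitors.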
If model learning is triggered (γ_learn[k] = 1), then the excitation trajectory that was observed during the last cycle becomes the new model Û. As the first step of the model learning, the cycle length N̂ of the previously observed cycle is estimated. The estimation is done in the frequency domain (using the autocovariance of the measured states of a small number of previous cycles) and refined in the time domain. To save more communication load, the trajectory Û is compressed using polynomial regression applied individually to each of its imaginary parts. The sender transmits the compressed model to the receiver, which restores an approximation of the full trajectory.
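The idea behind the cycle length estimation can be illustrated with a simplified sketch. Instead of the frequency-domain computation with time-domain refinement described above, it evaluates the autocovariance directly for each candidate lag and selects the lag with the largest value. The function name, the search range, and the test signal are hypothetical.

```python
import math


def estimate_cycle_length(signal, n_min, n_max):
    """Return the lag in [n_min, n_max] that maximizes the autocovariance."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]

    def autocov(lag):
        # Biased estimator: always normalized by the full signal length.
        products = (centered[i] * centered[i + lag]
                    for i in range(len(centered) - lag))
        return sum(products) / len(centered)

    return max(range(n_min, n_max + 1), key=autocov)


# Hypothetical scalar signal with a period of 55 samples (1.1 s at 50 Hz),
# observed over four cycles.
signal = [math.sin(2 * math.pi * k / 55) + 0.3 * math.sin(4 * math.pi * k / 55)
          for k in range(220)]
n_hat = estimate_cycle_length(signal, 40, 80)
print(n_hat)
```

The search range of 40 to 80 samples is an assumption that mirrors typical stride durations at a sampling rate of 50 Hz.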

Hierarchical ETL
Hierarchical ETL for cyclically excited systems [11] extends ETL (cf. Figure 6). In standard ETL, transferring the complete model Û leads to a significant amount of communication whenever model learning is triggered. To address this disadvantage, we exploit the fact that small gait velocity changes can be well described by time-warping of the current quaternion trajectory, i.e., cycle length changes of the periodic excitation. If model learning is triggered, then the model learning block optimizes the two parameters cycle length ϑ₁ = N̂ and phase shift ϑ₂ of the current model Û such that it fits the previously observed cycle best. Both parameters are estimated in the frequency domain (using the auto- or cross-covariance of the measured states of a small number of previous cycles) and corrected in the time domain with local optimization (code and detailed description at http://www.control.tu-berlin.de/EventTriggeredLearning). This extended learning strategy allows for two hierarchical levels of model updates:
• Full model updates are updates of the whole trajectory Û.
• Small model updates adjust only a small number of parameters ϑ = [ϑ₁ ϑ₂]ᵀ; in the examined problem setting, the current trajectory is warped to the new cycle length ϑ₁ = N̂ and shifted by ϑ₂ to obtain the new Û.
Full model updates should only be carried out if a small model update is not expected to improve the prediction performance sufficiently. For this, a binary learning-type trigger γ_full[k] ∈ {0, 1} differentiates between a small and a full update [11]: The quaternion trajectory that would have been predicted with the warped and shifted model during the last cycle is calculated. Subsequently, this simulated trajectory is compared with the observed trajectory. The comparison is done by calculating the angle difference between both trajectories at every point (cf. Section 3.1). The RMS value e[k] of the angle differences provides an estimate of how well the model would fit after a small model update, i.e., just an update of the model parameters ϑ. If the error e[k] exceeds a threshold α ∈ ℝ₀⁺, then a full update is triggered:
$$\gamma_{\mathrm{full}}[k] = \begin{cases} 1, & e[k] > \alpha, \\ 0, & \text{otherwise}. \end{cases} \qquad (4)$$

Figure 6. Block diagram of the hierarchical ETL architecture with one sender and one receiver [11]. In contrast to the standard ETL in Figure 4, triggered model learning can either lead to only an adjustment of certain parameters ϑ of the current model trajectory or to a completely new model trajectory Û.
Otherwise (γ_full[k] = 0), the sender sends the parameters ϑ as a small model update and both agents deform the previous trajectory to obtain the new model Û. If γ_full[k] = 1, then a full update of the model Û is carried out as described in Section 3.2.
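The two ingredients of a small model update can be sketched for a scalar profile. The sketch assumes linear interpolation for the time-warping and a phase shift in whole samples. How the quaternion increments of Û are rescaled during warping is not covered here. All names and numbers are hypothetical.

```python
import math


def warp_and_shift(profile, new_length, shift):
    """Resample one cycle to new_length samples and shift it cyclically."""
    n = len(profile)
    warped = []
    for j in range(new_length):
        pos = ((j + shift) % new_length) * n / new_length  # position in old cycle
        i0 = int(pos) % n
        i1 = (i0 + 1) % n                                  # wrap around the cycle
        frac = pos - int(pos)
        warped.append((1.0 - frac) * profile[i0] + frac * profile[i1])
    return warped


def learning_type_trigger(angle_errors_deg, alpha_deg):
    """Return (gamma_full, e), where e is the RMS of the angle differences."""
    e = math.sqrt(sum(a * a for a in angle_errors_deg) / len(angle_errors_deg))
    return (1 if e > alpha_deg else 0), e


# Stretching a four-sample cycle to eight samples halves the step between samples.
print(warp_and_shift([0.0, 1.0, 2.0, 3.0], 8, 0))

# Hypothetical angle differences (in degrees) of the simulated small update.
print(learning_type_trigger([1.0, 2.0, 2.0, 1.0], alpha_deg=5.0))  # small update
print(learning_type_trigger([6.0, 8.0, 6.0, 8.0], alpha_deg=5.0))  # full update
```

Only if the warped and shifted model would still have produced a large RMS error does the sender pay the cost of transmitting a complete trajectory.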

ETL with Gaussian Process Regression (GPR)
Hierarchical ETL reduces, but does not eliminate, the drawback of standard ETL, which is that transferring the full model trajectory Û requires the communication of a significant number of values. An alternative method with much less model communication is introduced in this section. Its fundamental idea is to use a non-parametric machine learning method, which allows the receiving agent to learn the prediction model from the available measurement data instead of receiving a complete model. Specifically, Gaussian process regression (extrapolation) [36] implements the prediction block (cf. Figure 7) instead of the trajectory-based prediction (1).


Figure 7. Block diagram of the ETL architecture with Gaussian process regression (GPR) for one sender and one receiver. In contrast to the standard ETL in Figure 4, predictions are GPR-based instead of trajectory-based. Consequently, the only information that is updated after model learning is the estimated cycle length N̂, which is the single variable hyperparameter of the GPR.
Only state updates are available at the receiving agent and no additional states should be transferred. Therefore, the prediction at sampling instant k uses the previous p ∈ ℕ measured and transferred states X = [x₁ … x_p] together with their sampling indices K = [k₁ … k_p], where k_i < k for i ∈ [1, p]. In other words, {K, X} is the training set for the GPR, which is adjusted whenever a state update occurs. The aim is to predict the output x̂[k] at index k. To make the problem suitable for GPR with one-dimensional outputs [36], we predict each imaginary part of x̂[k] independently of all others and only ensure that their norm is not larger than 1. Finally, the real part of the unit quaternion is obtained using the Pythagorean theorem (cf. Section 3.1).
A Gaussian process can be thought of as a distribution over a function space [36] and is uniquely determined by its mean function m : ℤ → ℝ and covariance (kernel) function φ : ℤ × ℤ → ℝ. Prediction (extrapolation) can be done with GPR if, in addition, the expected standard deviation σ_noise ∈ ℝ₀⁺ of the noise of the outputs is defined. We set the mean function for each imaginary part to be piecewise constant, m_i(k) = c_i ∈ ℝ, i ∈ {1, 2, 3}, as is often done according to [36], because the kernel function provides enough flexibility to model significant deviations from the mean. The constant c_i is set to the empirical mean of the training data and is updated after each state update, since we assume that the empirical mean of the training data provides a good estimate of the true mean.
For the kernel function, three effects must be considered. First, data points whose indices are close to each other are expected to be correlated. Second, every data point is assumed to be correlated with the data points one or more cycles before the current cycle. Third, it is reasonable to account for a gradual model change. All three points are addressed by using a locally periodic kernel φ(k, k′), which is the covariance between two state components at sampling instants k and k′. The multiplication of a periodic kernel with the squared-exponential kernel leads to larger weights for data points in more recent cycles, compare [37,38] and especially [39] for an application involving human gait. The estimated cycle length N̂ is the periodicity of the kernel. For good predictions, it must be known precisely. The lengthscale l_per ∈ ℝ⁺ of the periodic kernel is fixed, as is the standard deviation (scaling factor) σ_φ ∈ ℝ⁺. The lengthscale l_se ∈ ℝ⁺ of the squared-exponential kernel depends on the cycle length, l_se = ρ_se N̂, with a fixed factor ρ_se ∈ ℝ⁺. Figure 8 illustrates the influence of the individual parameters.
Just one model parameter is adapted online: the cycle length N̂. Whenever model learning is triggered, the sender estimates it in the same way as described in Sections 3.2 and 3.3, i.e., the cycle length is estimated in the frequency domain. The estimated cycle length N̂ is the only value that is transferred via the network during a model update. Consequently, each sender sends at most four values at a sampling instant (worst case of state update and model update together). This is much less than the amount of communication that is required by trajectory-based ETL to transmit a full model update of the trajectory Û (cf. Section 3.2 and [11]). It is therefore an advantage of GPR-based ETL that its communication load does not have salient peaks.
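A GPR prediction with a locally periodic kernel can be sketched as follows. The kernel is assumed to have the standard textbook form σ_φ² · exp(−2 sin²(π(k − k′)/N̂) / l_per²) · exp(−(k − k′)² / (2 l_se²)), which is not necessarily the exact parametrization used in this article. The sketch predicts a single scalar component, uses the hyperparameter values stated in Section 4.1, and solves the linear system with plain Gaussian elimination for brevity. The function names and the densely sampled training signal are hypothetical.

```python
import math


def locally_periodic_kernel(k1, k2, n_hat, l_per=0.5, rho_se=2.0, sigma_phi=20.0):
    """Periodic kernel times squared-exponential kernel (assumed standard form)."""
    d = k1 - k2
    l_se = rho_se * n_hat
    periodic = math.exp(-2.0 * math.sin(math.pi * d / n_hat) ** 2 / l_per ** 2)
    squared_exp = math.exp(-d * d / (2.0 * l_se ** 2))
    return sigma_phi ** 2 * periodic * squared_exp


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gpr_predict(indices, values, k_new, n_hat, sigma_noise=0.01):
    """Posterior mean at index k_new with a constant (empirical) mean function."""
    mean = sum(values) / len(values)
    gram = [[locally_periodic_kernel(a, b, n_hat)
             + (sigma_noise ** 2 if i == j else 0.0)
             for j, b in enumerate(indices)]
            for i, a in enumerate(indices)]
    weights = solve(gram, [v - mean for v in values])
    k_star = [locally_periodic_kernel(k_new, a, n_hat) for a in indices]
    return mean + sum(w * k for w, k in zip(weights, k_star))


# Hypothetical training data: almost three cycles of a periodic component.
n_hat = 20
indices = list(range(58))
values = [0.3 * math.sin(2.0 * math.pi * k / n_hat) for k in indices]
prediction = gpr_predict(indices, values, 58, n_hat)
truth = 0.3 * math.sin(2.0 * math.pi * 58 / n_hat)
print(prediction, truth)
```

In the actual method, the training set contains only the transmitted states within the horizon p, so it is usually much sparser than in this example.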

Experiments
We use the setup described in Section 2.1, i.e., we attach in total seven wireless IMUs (type XSENS MTW) to a human body, two at the feet, two at the shins, two at the thighs, and one at the torso (cf. Figure 2), and record data sets during two different experiments:
• The first set (variable) contains approximately 4 min (180 steps) of simulated pathological gait with frequent style, speed, and ground inclination changes.
• The second set (steady) contains approximately 4 min (200 steps) of normal walking, at first with a speed of 2 km/h, then with 4 km/h.
During these experiments, each sensor unit records all measurement samples at a rate of 50 Hz, and a proprietary data recording software (MT Software Suite, XSENS) is used to transmit all measurement samples to a PC. Thereby, we obtain complete real-world application data sets on which we simulate and compare the proposed methods using MATLAB. For these simulations, we establish two fundamental assumptions: Firstly, the underlying network protocol accounts for channel noise and collision, re-sends samples if transmissions fail, and, thus, assures that no packets are lost. Secondly, we assume that processing and transmission delays are negligible. This is clearly an ideal environment. However, we examine the impact of channel number limitations and network delays in Sections 4.5 and 4.6.

Parameterization
For all three ETL methods, the estimation error threshold for the state-update trigger (2) is δ = 2° (except for Section 4.2). This is, e.g., smaller than the error that is tolerated by neuroprosthesis controllers [5] and smaller than the state-of-the-art accuracy in foot orientation tracking [40]. Furthermore, the significance level of the Kolmogorov-Smirnov test as learning trigger is 5%, i.e., if the model is correct, then an (unwanted) model update is triggered with a probability smaller than 5%. Additionally, the test must fire for a minimum holding time of 0.35 s in order to make it more robust against false positives. Both parameters together lead to a good trade-off between the numbers of state and model updates in the considered application. For the compression of Û, polynomial regression with a degree of 18 is employed. In contrast, the full trajectory of a stride typically contains 40–80 samples. More parameters would increase the risk of overfitting.
For hierarchical ETL, the threshold of the learning-type trigger (4) is α = 5°. This value balances the numbers of small and full model updates.
ETL with GPR-based predictions uses three fixed hyperparameters for the kernel, ρ_se = 2, l_per = 0.5, and σ_φ = 20, which were obtained with cross-validation. For each prediction step, the GPR considers a dynamic training data set {K, X} with a horizon of p = 250. On the one hand, no data points with a significant covariance with respect to the current sample lie outside of this window; on the other hand, the computational effort is not too large. Finally, the noise's standard deviation σ_noise is the same for each prediction. Because this value has a physical meaning, it can be selected based on observations. With the available data, we establish an empirical mean of the standard deviation for different walking styles that is approximately σ_noise = 0.01.

Parameter Study
At first, we evaluate the influence of the angle difference threshold δ (the upper error bound) on the performance of hierarchical ETL (cf. Section 3.3). For simplification, we consider a setup with one foot-mounted sender and one receiver. Figure 9 shows the four performance measures introduced in Section 2.2 for nine different values of δ. Clearly, a higher communication reduction is achieved for steady gait than for variable gait. Additionally, the charts reveal that an error threshold slightly larger than δ = 2° enables even less data transmission for both types of gait.

Figure 9. Performance impact comparison of different estimation error thresholds δ on hierarchical ETL. Data from feet-mounted sensors during 4 min of variable gait and steady walking is considered, respectively. The charts show the number P of transmitted data bits, i.e., the size of the ATT Payload, the number S of sampling instants with data transmission, and the total number D of transmitted bits including the Bluetooth 5 overhead with respect to full communication with the values P_full, S_full, and D_full. In addition, the RMS value of the angle difference between the measured and the estimated body segment orientation is displayed.

Performance Comparison
We compare the performance of the three methods proposed in Section 3 (ETL, hierarchical ETL, and ETL with GPR) with four conventional methods from literature (full communication, decimation, IMA ADPCM, and EBS). This is done quantitatively based on simulations with real-world measurement data.
Firstly, full communication means that every sample is transmitted and no compression is applied. Secondly, for decimation, we send only every n-th sampling instant with n ∈ ℕ, i.e., the sample at sampling instant k is transmitted if and only if mod(k, n) = 0, where mod : ℤ × ℕ → ℕ is the modulo operator. We choose the decimation factor n = 2, i.e., every second sample is transmitted. Thirdly, we examine IMA ADPCM (cf. Section 1.1) from [41,42]. The algorithm uses a fixed number of 4 bits per transmitted value. As described in Section 2.1, every measured quaternion component is coded with 16 bits. In consequence, the application of IMA ADPCM to all three imaginary parts reduces the payload by a factor of four.
Finally, EBS (cf. Sections 1.1 and 3.2) is a special case of ETSE where the estimate x̂[k − 1] at the last sampling instant is employed to predict the measurement x[k], i.e., the prediction is held constant at the last estimate. Our EBS implementation uses the same state-update trigger (2) with the same estimation error threshold δ = 2° as ETL.
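The hold-last-estimate behavior of EBS can be sketched as follows. This is an illustration under assumptions, not the implementation evaluated here: orientations are taken to be unit quaternions stored as arrays `[w, x, y, z]`, the trigger is modeled as the rotation angle between measurement and estimate exceeding δ, and the names `angle_between` and `ebs_simulate` are hypothetical.

```python
import numpy as np


def angle_between(q_a, q_b):
    """Rotation angle in degrees between two unit quaternions."""
    # |<q_a, q_b>| handles the double cover (q and -q are the same rotation).
    dot = min(1.0, abs(float(np.dot(q_a, q_b))))
    return float(np.degrees(2.0 * np.arccos(dot)))


def ebs_simulate(quats, delta_deg=2.0):
    """Send-on-delta: transmit only when the held estimate is off by more than delta."""
    estimate = quats[0]          # the first sample is always transmitted
    sent = [True]
    estimates = [estimate]
    for q in quats[1:]:
        if angle_between(q, estimate) > delta_deg:
            estimate = q         # state update: receiver gets the measurement
            sent.append(True)
        else:
            sent.append(False)   # receiver keeps the last estimate
        estimates.append(estimate)
    return np.array(sent), np.array(estimates)


# Synthetic example: slow rotation about the z-axis in steps of 0.3 degrees.
theta = np.radians(np.arange(0.0, 30.0, 0.3))
quats = np.stack(
    [np.cos(theta / 2), np.zeros_like(theta), np.zeros_like(theta), np.sin(theta / 2)],
    axis=1,
)
sent, estimates = ebs_simulate(quats, delta_deg=2.0)
errors = [angle_between(q, e) for q, e in zip(quats, estimates)]
print(int(sent.sum()), "of", len(quats), "samples transmitted")
```

By construction, the error after every sampling instant is either zero (after an update) or at most δ, which is the bounded-error property that distinguishes event-based schemes from decimation.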
For simplification, we again consider a setup with one sender and one receiver and only foot-mounted sensors (cf. Section 4.2). Later, Section 4.4 demonstrates that the results for sensors attached to other lower limb segments are qualitatively the same. Figure 10 shows the four performance measures introduced in Section 2.2 for all methods. The three novel approaches reduce the total amount of transmitted data more than the baseline methods for both variable and steady gait, while the resulting estimation errors remain small, with RMS values of about 1°. Additionally, Figure 11 visualizes the trade-off between the payload and the number of transmitted packets for the different methods. While ADPCM achieves the smallest payload, ETL reduces the number of sampling instants with communication most. However, the two modified ETL methods (hierarchical ETL and GPR-based ETL) lead to the best compromises between both measures such that their total amount of transmitted data including the Bluetooth 5 overhead is the smallest.

Different Body Segments
In contrast to Sections 4.2 and 4.3, we consider all seven IMUs as senders in a star network for the following analysis. Such a star topology with one receiver and multiple senders is common in biomedical applications and human motion tracking. Additionally, more complex network structures can often be decomposed into several star networks. For simplicity, we assume that the senders communicate at the same time with the receiver. Figure 12 shows the total amount of transmitted data as defined in Section 2.2 (including Bluetooth 5 overhead) with respect to full communication for the variable data set. The tables reveal that hierarchical ETL achieves a larger communication reduction than the baseline methods not only for the feet but also for the other body segments.

Delayed Model Identification and Transmission
If ETL or hierarchical ETL with trajectory-based predictors is used, then transmission of full models causes significantly more communication than transmission of state updates. Additionally, model learning requires the most computational power of all steps of the proposed methods. Therefore, we examine how a potentially delayed identification and transmission of models affects the performance of the methods. For this, we assume a delay of ten sampling instants (0.2 s at the sampling rate of 50 Hz) for model updates in ETL, i.e., a newly learned model is only available to the receiver ten sampling instants after learning was triggered. This is more than what is expected in practice due to communication and computation delays in modern communication networks. We carry out the same simulations as for Figure 12 to calculate D/D_full. The result is that the percentage values for hierarchical ETL increase by 2% or less for all examined body segments. In consequence, they are still significantly smaller than the results for EBS and ADPCM.

Limited Number of Channels
Under poor conditions it can happen that not all senders can communicate with a single receiver at the same time. This fact has not been considered in the previous simulations and is, therefore, examined in the following. In reality, the number of IMUs that can communicate successfully at the same sampling instant is a random number. For simplicity, we define that this limit is constant. If an additional IMU attempts to communicate, then it must wait until the next sampling instant. In the simulations, the decisions which IMUs must wait are random. Figure 13 shows how often the case occurs that an IMU must wait for one or even more sampling instants when hierarchical ETL is simulated with the variable measurement data set (cf. Section 4) and different maximum channel numbers. If a state update cannot be transmitted immediately, then the error bound δ is violated. Figure 13 displays histograms of the magnitudes of these boundary violations in comparison to boundary compliances, too. If at least three channels can be used, then the error bound is never violated for more than one sampling instant by a single sender. If four channels or more are available, then the total amount of boundary violations is less than 0.5% and, in consequence, negligible.

Figure 13. Simulation results for different assumed maximum channel numbers in a star network with seven inertial measurement units (IMUs) as senders. The pie charts show how often a sender can communicate immediately (delay 0) or must wait 1, 2, or more sampling instants until its communication attempt is successful. The bar charts show the distribution of the estimation error. We consider hierarchical ETL and 4 min of variable gait.
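The waiting rule described above can be sketched as a small simulation. This is a simplified illustration under assumptions, not the code behind Figure 13: transmission requests are given as a Boolean array of shape (sampling instants, senders), a waiting sender keeps its original request time, and the name `simulate_channel_limit` and the random request pattern are hypothetical.

```python
import numpy as np


def simulate_channel_limit(requests, max_channels, seed=0):
    """Return the waiting time (in sampling instants) of every served request."""
    requests = np.asarray(requests, dtype=bool)
    rng = np.random.default_rng(seed)
    n_steps, n_senders = requests.shape
    waiting_since = np.full(n_senders, -1)   # -1 means: nothing pending
    delays = []
    for k in range(n_steps):
        # Register new requests; senders that are already waiting keep their slot.
        new = requests[k] & (waiting_since < 0)
        waiting_since[new] = k
        pending = np.flatnonzero(waiting_since >= 0)
        # Randomly choose which pending senders get one of the channels.
        served = rng.permutation(pending)[:max_channels]
        delays.extend((k - waiting_since[served]).tolist())
        waiting_since[served] = -1           # all others wait for instant k + 1
    return delays


# Hypothetical request pattern: seven senders, each requesting in 30% of instants.
rng = np.random.default_rng(1)
requests = rng.random((1000, 7)) < 0.3
for channels in (2, 3, 4, 7):
    delays = np.array(simulate_channel_limit(requests, channels))
    print(channels, "channels: share of delayed transmissions =",
          float(np.mean(delays > 0)))
```

Requests that are still pending after the last sampling instant are not counted in this sketch; for long recordings this has little effect on the delay histogram.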

Discussion
The experimental results are discussed in two stages. Firstly, we compare the three novel methods with the four baseline methods using the example of one foot-mounted sender and one receiver. Secondly, we analyze the properties of the best performing novel method in a larger sensor network.

Method Comparison
Decimation with factor two sends samples periodically and reduces the communication by 50%, but introduces a large estimation error. This is because the selection of the samples that the sender sends to the receiver is pre-determined. In contrast to that, EBS selects the transmitted samples online (event-based) to guarantee a bounded error with as little communication as possible. Therefore, this method is able to achieve a much smaller error with an almost comparable communication reduction. ETL reduces the number of samples with communication even more without causing a much bigger RMSE. This is possible with model-based predictions and model updates, which account for time-variant behavior of the system dynamics. Furthermore, a hierarchical architecture improves ETL without compromising the error. This strategy exploits the fact that a velocity change can be described with only two parameters, which reduces the communication in case of model updates. Although the model trajectories after small updates might be less accurate than after a full update, the prediction error and, therefore, the number of state updates increase only a little. With the hierarchical extension, ETL with trajectory-based predictions shows a performance comparable to that of ETL with GPR with respect to communication reduction and the RMSE. However, a disadvantage of GPR is that it requires more computational resources. On the other hand, its communication load is more evenly distributed over the sampling instants. This is because the prediction with GPR is less precise than with trajectories, so more state updates occur, but model updates are much smaller.
In addition to the four quantitative performance measures, the existence of an upper bound on the estimation error is an important assessment criterion. Only EBS and the different versions of ETL guarantee such a bound while decimation and ADPCM can cause arbitrarily large errors.
Moreover, the quantitative measures for the performance of ADPCM differ significantly from the ones for the other methods. On the one hand, ADPCM reduces the payload per sampling instant most. On the other hand, it does not reduce the number of sampling instants. Therefore, the overhead is not reduced and the total amount of communication does not shrink as much as with the ETL methods.
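The effect of the per-packet overhead can be made concrete with a short calculation. The sketch assumes that the total load is the payload plus a fixed overhead per sampling instant with transmission, D = P + S · h; this additive form, the function names, and the overhead value of 80 bits are illustrative assumptions, not the Bluetooth 5 figures or the definition from Section 2.2. The payload of 48 bits per sample follows from three quaternion components coded with 16 bits each.

```python
def total_bits(payload_bits, n_packets, overhead_bits_per_packet):
    """Total load under the assumed additive model D = P + S * h."""
    return payload_bits + n_packets * overhead_bits_per_packet


def relative_load(payload_factor, packet_factor, overhead_bits, payload_bits_full=48):
    """D / D_full, normalized to one sampling instant of full communication."""
    d_full = total_bits(payload_bits_full, 1, overhead_bits)
    d = total_bits(payload_factor * payload_bits_full, packet_factor, overhead_bits)
    return d / d_full


OVERHEAD = 80  # placeholder value in bits, NOT the actual Bluetooth 5 overhead

# ADPCM: payload reduced by a factor of four, but a packet at every instant.
adpcm = relative_load(payload_factor=0.25, packet_factor=1.0, overhead_bits=OVERHEAD)
# Decimation with n = 2: payload and number of packets are both halved.
decimation = relative_load(payload_factor=0.5, packet_factor=0.5, overhead_bits=OVERHEAD)
print("ADPCM D/D_full:", adpcm, "decimation D/D_full:", decimation)
```

Under this model, the relative load of decimation is 0.5 regardless of the overhead, whereas the relative load of ADPCM approaches 1 as the overhead grows in relation to the payload. The numbers are illustrative only and are not results of this study.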

Sensor Networks
We select hierarchical ETL for further evaluation in sensor networks with more than two agents because it is the novel method with the smallest amount of total communication in the preceding comparison. Firstly, hierarchical ETL shows a similar or superior performance for body segments other than the feet; EBS, the baseline method with the best trade-off between communication reduction and estimation error, is not superior to hierarchical ETL for any segment.
Furthermore, we demonstrate that model update delays are no practical problem, i.e., the communication load does not increase significantly even for large delays. Additionally, the robustness against channel limitations is a special property of ETL; ADPCM, for example, cannot handle such limitations because every sender must communicate at every sampling instant. With hierarchical ETL, we can use twice as many sensors as channels without overshooting the error bound for more than one sampling instant.

Limitations
The presented validation study has some limitations, which are briefly discussed in the following. Due to the small number of different experiments and subjects, the results provide only a qualitative proof of concept. They show the potential and elementary properties of the methods but do not yield precise performance quantification. For a detailed assessment, a more extensive study must be performed in a specific application scenario. The current approach of simulated-real-time processing of recorded data allowed us to investigate accuracy, communication load reduction and robustness to network delays independently and to gain insights into the performance of the algorithm in predefined situations. However, a detailed performance analysis in a specific application scenario will require testing of the proposed algorithms on embedded hardware with real-time wireless communication and uncontrolled bandwidth and delay variations. Performance should be evaluated also for a large number of agents in the network as well as for different communication protocols with different overheads and payloads.
Furthermore, it should be noted that the communication reduction achieved with hierarchical ETL in comparison to non-hierarchical ETL depends on the size of the large model update. In this work, its size is similar for all experiments. For a smaller full model update, the benefits of the hierarchical approach are expected to be limited.

Comparison with Related Work
In comparison to most of the existing algorithms for data compression in sensor networks (cf. Section 1.1), the proposed ETL methods work in real time. One of the methods that is covered in Section 1.1 and is also real-time-capable is ADPCM [23,24]. However, we demonstrated in Section 5.1 that ETL outperforms ADPCM on the given evaluation data.
Furthermore, ETL is preferable to the real-time-capable method presented in [13] (EBS), which achieves almost no compression at an RMSE of 1° or leads to a five to ten times larger error for compression rates similar to those achieved by ETL.
Finally, the relative communication reduction of the proposed ETL methods is similar to the one achieved in [12] with EBS. However, the direct comparison of EBS and ETL on the same evaluation data in Section 5.1 revealed the performance increase that ETL achieves by exploiting the approximate periodicity of the signals. Moreover, the proposed quaternion-based approach leads to a general decrease of communication load in comparison to the approach in [12] because it transmits only three-dimensional signals (imaginary parts of unit quaternions) instead of six-dimensional measurements (3D accelerometer and gyroscope data, respectively). Both advantages come at the cost of increased computational effort.

Conclusions
The proposed methods reduce the communication load in sensor networks by exploiting the fact that physiological signals are often approximately cyclic. They account for the fact that these periodic patterns might change gradually or suddenly over time.
A major advantage is that ETL guarantees a user-defined upper bound for the estimation error. In the considered experimental data, the total communication load can be reduced by more than 60% or at least twice as many sensors can be used at transmission inaccuracies as small as one degree. The relative communication load reduction is expected to further improve for sampling rates at or above 100 Hz, which are commonly used in wireless IMU networks and desirable for analysis of fast and agile motions.
A disadvantage of ETL is that it requires more computational power than full communication. However, the power consumption due to communication can easily outweigh the power consumption spent on computation [8]. The total energy demand will be reduced in many application systems, and lighter batteries can be used or the time of use can be increased.
Future work will aim at large-scale validation of accuracy and reliability in selected application scenarios in wearable sensor networks and swarm robotics. In this context, dynamic network topology changes and multi-hop network topologies will be considered.