Application of Feedforward and Recurrent Neural Networks for Fusion of Data from Radar and Depth Sensors Applied for Healthcare-Oriented Characterisation of Persons’ Gait

In this paper, the usability of feedforward and recurrent neural networks for the fusion of data from impulse-radar sensors and depth sensors, in the context of healthcare-oriented monitoring of elderly persons, is investigated. Two methods of data fusion are considered, viz., one based on a multilayer perceptron and one based on a nonlinear autoregressive network with exogenous inputs. These two methods are compared with a reference method with respect to their capacity for decreasing the uncertainty of estimation of a monitored person's position, as well as the uncertainty of estimation of several parameters enabling medical personnel to make useful inferences on the health condition of that person, viz., the number of turns made during walking, the travelled distance, and the mean walking speed. Both artificial neural networks were trained on synthetic data. The numerical experiments show the superiority of the method based on the nonlinear autoregressive network with exogenous inputs. This may be explained by the fact that, for this type of network, the prediction of the person's position at each time instant is based on the positions of that person at the previous time instants.


Introduction
The life expectancy at birth, estimated for the global population in 2019, was ca. 73 years; it has been rising during the last decades and is predicted to reach 77 years by the second half of the twenty-first century. At the same time, the global fertility rate (i.e., the number of live births per woman over a lifetime) is decreasing [1]. As a consequence, the global population is ageing: it is expected that the share of the European and North American population aged at least 65 years will reach 25% in 2050. Hence, there is a growing importance of research on new technologies that could be employed in monitoring systems supporting care services for elderly persons. The falls of elderly persons are among the most frequent reasons for their admission and long-term stay in hospitals [2]; therefore, the monitoring systems developed for the sake of those persons should enable prediction and prevention of dangerous events, such as a person's fall and a harmful long lie after the fall, as well as detection of those events [3,4]. The relevance of features related to gait analysis in the monitoring of elderly persons, and in particular in fall prevention, has been emphasised in several papers [5][6][7][8].
A system based on the impulse-radar sensors and depth sensors may not only be used for localisation of a monitored person in their apartment, but also for estimation of several healthcare-related parameters, enabling medical personnel to make useful inferences on the health condition of that person [33,34]. These parameters include, but are not limited to: the estimates of the time spent in motion and travelled distance (both helpful in assessing the level of physical activity), the number of turns made during motion (considered an indicator of mental health condition [35]) and the mean walking speed (considered highly informative with respect to the overall health status [36]).
It has been recently shown by the author and his collaborators that the fusion of measurement data from the impulse-radar sensors and depth sensors may enable reliable estimation of the monitored person's position [37], and that a multi-layer perceptron, trained on synthetic data, may be effectively used for that purpose [38]. In the case of the monitoring of elderly persons, training of neural networks may be a particularly challenging task: involving elderly persons in the experimentation aimed at the acquisition of the reference data could be problematic both for medical and ethical reasons. The use of synthetic data for training of these networks may solve this problem.
In this paper, the idea of applying artificial neural networks, trained on synthetic data, for fusion of the measurement data from the impulse-radar sensors and depth sensors, is further explored. The multilayer perceptron treats all the measurement data independently and does not take into account the movement history, i.e., the dependence of the current position and direction of the movement on the past position and direction of the movement. This characteristic of a movement may be exploited in the process of data fusion by means of a recurrent neural network, which takes into account the previous values of its output for predicting the current value of its output.
The novelty of the research results presented in this paper consists in application of a new method, based on a nonlinear autoregressive network with exogenous inputs, for the fusion of data from impulse-radar sensors and depth sensors, as well as a systematic comparison of that method with a method based on a multilayer perceptron, and one reference method within a comprehensive programme of experimentation, based on the real-world data, involving:

•
The localisation of a person walking around the monitored area according to various predefined movement scenarios.

•
The estimation of several parameters, carrying information important for medical experts, on the basis of the estimated movement trajectories.
This programme included, in particular, experiments aimed at the investigation of the influence of the obstacles, occluding a monitored person, and the walking speed of that person on the accuracy of the estimation of the parameters.

Method Based on Nonlinear Autoregressive Network with Exogenous Inputs
This method (called NARX method hereinafter) is based on a nonlinear autoregressive network with exogenous inputs, with the following structure (Figure 1a):

•
Four input neurons: the input data represent two pairs of x-y coordinates acquired by means of the impulse-radar sensor and the depth sensor;

•
Six neurons in a single hidden layer;

•
Two output neurons: the output data represent one pair of fused x-y coordinates; the output values, delayed by two time instants, are fed back to the hidden layer.

The implementation of the neural network has been based on the procedures available in the MATLAB Deep Learning Toolbox [39].
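The topology described above can be sketched as a closed-loop forward pass. The following is a minimal NumPy illustration, not the MATLAB implementation actually used; all weight matrices are random placeholders, and the exact feedback wiring (output delays of one and two time instants feeding the hidden layer) is an assumption consistent with the description above:

```python
import numpy as np

def narx_step(u, y_prev, W_in, W_fb, b_h, W_out, b_out):
    """One prediction step of a NARX network with the topology described
    above: 4 exogenous inputs (radar x-y and depth x-y), 6 hidden neurons,
    and 2 outputs; y_prev is the flattened feedback vector of the two
    delayed output pairs. All weight names are illustrative."""
    h = np.tanh(W_in @ u + W_fb @ y_prev + b_h)   # hidden layer
    return W_out @ h + b_out                       # linear output layer

def narx_fuse(U, params, y0):
    """Closed-loop simulation over a sequence of sensor readings.
    U: (N, 4) array of [radar_x, radar_y, depth_x, depth_y] per instant;
    y0: (2, 2) initial output history; returns (N, 2) fused coordinates."""
    W_in, W_fb, b_h, W_out, b_out = params
    hist = [y0[0], y0[1]]                          # y(n-1), y(n-2)
    out = []
    for u in U:
        y = narx_step(u, np.concatenate(hist), W_in, W_fb, b_h, W_out, b_out)
        out.append(y)
        hist = [y, hist[0]]                        # shift the delay line
    return np.array(out)
```

In training, the true (reference) positions would be supplied in place of the fed-back outputs (so-called open-loop or series-parallel training), and the loop would be closed only for prediction.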
The artificial neural network was trained on the synthetic data generated according to the methodology described in detail in [38]. The synthetic data representative of the measurement data acquired by means of the impulse-radar sensors were corrupted with zero-mean red noise and smoothed using the moving-average filter. The generation of the synthetic data representative of the measurement data acquired by means of the depth sensor involved the modelling of a silhouette of a moving person by means of an ellipse following a sine-shaped trajectory oscillating around the reference trajectory; the generation of those data took into account the fact that the depth sensors can "see" only one side of the body.
Four different reference trajectories, presented in Figure 2, were used for generation of the synthetic data. For each trajectory, forty realisations of the data were generated for each type of sensor, but in the case of a depth sensor, twenty of those realisations were fragmented to represent the corruption of the measurement data due to the occlusion. The data acquisition rate for the depth sensor and for the impulse-radar sensors was set to 10 Hz. All the generated trajectories are presented in Figure 3; each graph for radar data depicts 40 superimposed sequences of synthetic data, while each graph for depth data depicts 20 superimposed sequences of synthetic data.
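As a rough illustration of how one noisy radar realisation might be produced from a reference trajectory, the sketch below corrupts the reference with zero-mean red noise and applies a moving-average filter; the noise magnitude and filter window are assumptions for illustration, not the values used in [38]:

```python
import numpy as np

def red_noise(n, sigma, rng):
    """Zero-mean red (Brownian-like) noise: cumulative sum of white noise,
    re-centred to zero mean; sigma is the step standard deviation."""
    w = rng.normal(0.0, sigma, size=n)
    r = np.cumsum(w)
    return r - r.mean()

def moving_average(x, window):
    """Simple moving-average filter with a rectangular window."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

def synthetic_radar_track(ref_xy, sigma=0.05, window=11, seed=None):
    """ref_xy: (N, 2) reference trajectory sampled at 10 Hz; returns one
    noisy, smoothed realisation mimicking impulse-radar measurements."""
    rng = np.random.default_rng(seed)
    noisy = ref_xy + np.column_stack(
        [red_noise(len(ref_xy), sigma, rng) for _ in range(2)])
    return np.column_stack(
        [moving_average(noisy[:, i], window) for i in range(2)])
```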

Method Based on Multilayer Perceptron
This method (called MLP method hereinafter), introduced by the author in [39], is based on a multilayer perceptron with four input neurons, eight neurons in a single hidden layer, and two output neurons (Figure 1b).

Method Based on Kalman Filter
This method (called KF method hereinafter), described in [37], has been chosen as the reference method because it yielded very good results in an extensive experiment based on real-world data; moreover, the KF method (as the NARX method and the MLP method) is a method of data-point fusion, i.e., it is based on the assumption that the fusion of the data acquired by means of the impulse-radar sensors and the data acquired by means of the depth sensor is performed whenever new data points are available.
This method is derived from a linear transition equation modelling the movement of a monitored person walking with a nearly constant velocity, viz., with a small random acceleration between two consecutive time instants [40]:
$$\mathbf{x}_n = \mathbf{F}_n \mathbf{x}_{n-1} + \boldsymbol{\Gamma}_n \boldsymbol{\alpha}_n \tag{1}$$
where:

•
$\mathbf{x}_n = [x_n \; x_n^{(1)} \; y_n \; y_n^{(1)}]^{\mathrm{T}}$ is a state vector representative of the two-dimensional coordinates $(x_n, y_n)$, corresponding to a time instant $t_n$, and the velocities $x_n^{(1)}$ and $y_n^{(1)}$ along these dimensions;

•
$\mathbf{F}_n$ and $\boldsymbol{\Gamma}_n$ are the matrices of the form:
$$\mathbf{F}_n = \begin{bmatrix} 1 & \Delta_n & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \Delta_n \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad \boldsymbol{\Gamma}_n = \begin{bmatrix} \Delta_n^2/2 & 0 \\ \Delta_n & 0 \\ 0 & \Delta_n^2/2 \\ 0 & \Delta_n \end{bmatrix}$$
with $\Delta_n$ being the time interval between the time instants $t_{n-1}$ and $t_n$;

•
$\boldsymbol{\alpha}_n = [\alpha_{x,n} \; \alpha_{y,n}]^{\mathrm{T}}$ is a vector modelling the acceleration, whose elements are assumed to be the realisations of a zero-mean bivariate normal distribution with a known covariance matrix $\boldsymbol{\Sigma}_{\alpha} = \mathrm{diag}(\sigma_{\alpha,x}^2, \sigma_{\alpha,y}^2)$.

The output equation, complementing the transition Equation (1), models the relationship between the vector of observations $\tilde{\mathbf{z}}_n$ and the state vector $\mathbf{x}_n$:
$$\tilde{\mathbf{z}}_n = \mathbf{H} \mathbf{x}_n + \boldsymbol{\eta}_n \tag{2}$$
where:

•
$\mathbf{H}$ is an observation matrix of the form:
$$\mathbf{H} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix};$$

•
$\boldsymbol{\eta}_n = [\eta_{x,n} \; \eta_{y,n}]^{\mathrm{T}}$ is a vector representative of the observation noise corrupting the data, whose elements are assumed to be realisations of a zero-mean bivariate normal distribution with a known covariance matrix $\boldsymbol{\Sigma}_{\eta,n}$.
Under the above-formulated assumptions, the radar data and the depth data can be fused by means of a Kalman filter (cf. the references [41][42][43]) performing, for each time instant, the following sequence of operations [40,44]:

1.
Determination of the pre-estimate $\hat{\mathbf{x}}_n^{\mathrm{pre}}$ of the state vector and the pre-estimate $\mathbf{P}_n^{\mathrm{pre}}$ of its covariance matrix:
$$\hat{\mathbf{x}}_n^{\mathrm{pre}} = \mathbf{F}_n \hat{\mathbf{x}}_{n-1}, \qquad \mathbf{P}_n^{\mathrm{pre}} = \mathbf{F}_n \mathbf{P}_{n-1} \mathbf{F}_n^{\mathrm{T}} + \boldsymbol{\Gamma}_n \boldsymbol{\Sigma}_{\alpha} \boldsymbol{\Gamma}_n^{\mathrm{T}}$$

2.
Calculation of the estimate $\hat{\mathbf{z}}_n$ of the observation vector:
$$\hat{\mathbf{z}}_n = \mathbf{H} \hat{\mathbf{x}}_n^{\mathrm{pre}}$$

3.
Calculation of the so-called innovation vector:
$$\boldsymbol{\zeta}_n = \tilde{\mathbf{z}}_n - \hat{\mathbf{z}}_n$$
and the estimate of its covariance matrix:
$$\mathbf{S}_n = \mathbf{H} \mathbf{P}_n^{\mathrm{pre}} \mathbf{H}^{\mathrm{T}} + \boldsymbol{\Sigma}_{\eta,n}$$

4.
Calculation, for each vector of data $\tilde{\mathbf{z}}_n$, of the final estimate $\hat{\mathbf{x}}_n$ of the state vector:
$$\hat{\mathbf{x}}_n = \hat{\mathbf{x}}_n^{\mathrm{pre}} + \mathbf{G}_n \boldsymbol{\zeta}_n \tag{11}$$
and the final estimate of its covariance matrix:
$$\mathbf{P}_n = (\mathbf{I} - \mathbf{G}_n \mathbf{H}) \mathbf{P}_n^{\mathrm{pre}}$$
where $\mathbf{G}_n$ is the Kalman-gain matrix determined according to the formula:
$$\mathbf{G}_n = \mathbf{P}_n^{\mathrm{pre}} \mathbf{H}^{\mathrm{T}} \mathbf{S}_n^{-1}$$

It is worth noting that, in this method, the radar data and depth data are acquired asynchronously; therefore, an observation vector $\tilde{\mathbf{z}}_n$ may be representative of either the radar data or the depth data, and is always associated with a corresponding known covariance matrix $\boldsymbol{\Sigma}_{\eta,n}$ of the noise corrupting those data. The final estimate $\hat{\mathbf{x}}_n$ of the state vector is the result of the data fusion at every time instant $t_n$.
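The four steps above can be sketched as a single fusion update. The constant-velocity matrices follow the transition model of Equation (1); the numerical values of the covariance matrices in any usage are placeholders, not the ones identified in [37]:

```python
import numpy as np

def kf_fuse_step(x_prev, P_prev, z, Sigma_eta, dt, Sigma_alpha):
    """One Kalman-filter fusion step for the constant-velocity model:
    state [x, vx, y, vy]; z is a single radar or depth observation [x, y]
    with its known noise covariance Sigma_eta."""
    F = np.array([[1, dt, 0, 0],
                  [0, 1,  0, 0],
                  [0, 0,  1, dt],
                  [0, 0,  0, 1]], float)
    G = np.array([[dt**2 / 2, 0],
                  [dt,        0],
                  [0, dt**2 / 2],
                  [0, dt       ]], float)
    H = np.array([[1, 0, 0, 0],
                  [0, 0, 1, 0]], float)
    # 1. pre-estimates of the state and its covariance
    x_pre = F @ x_prev
    P_pre = F @ P_prev @ F.T + G @ Sigma_alpha @ G.T
    # 2. predicted observation
    z_hat = H @ x_pre
    # 3. innovation and its covariance
    zeta = z - z_hat
    S = H @ P_pre @ H.T + Sigma_eta
    # 4. Kalman gain and final estimates
    K = P_pre @ H.T @ np.linalg.inv(S)
    x_new = x_pre + K @ zeta
    P_new = (np.eye(4) - K @ H) @ P_pre
    return x_new, P_new
```

Because the radar and depth observations arrive asynchronously, the caller simply passes whichever observation is available at each instant together with its own `Sigma_eta`.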

Detection of Motion
Motion has been detected by comparing the distance travelled in a given period of time, $T_m$, with a given threshold, $D$. A binary sequence indicating motion has been determined according to the formula:
$$b_{m,n} = \mathbb{I}\!\left( \sum_{i=n-N_m+1}^{n} \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2} > D \right)$$
where $N_m$ is the number of data points within the time interval $T_m$, and $\{x_n\}$ and $\{y_n\}$ are the sequences of the coordinates of the monitored person's position, either acquired by means of the sensors or fused by means of the methods described in Section 2. The values of $T_m$ and $D$ should prevent small deviations in the position from being considered as motion; $T_m = 0.45$ s and $D = 0.1$ m have been selected because these values yielded satisfactory results in the experiments.
To reduce the influence of the dispersion of the position estimates on the detection of motion, morphological closing [45] has been performed on the binary sequence {b m,n }. Because the fused data sequences are acquired by means of sensors with different data acquisition rates, the length of the structuring element used for the morphological closing has been equal to the number of data points within the time interval T m . The binary sequence obtained by means of the morphological closing, b m,n , has been used for the detection of the moments when a monitored person started walking, and the moments when that person stopped walking.
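A minimal sketch of the motion detector described above, with the morphological closing implemented directly in NumPy; the window-alignment details are assumptions consistent with the description:

```python
import numpy as np

def binary_closing(b, size):
    """Morphological closing (dilation followed by erosion) of a 1-D
    binary sequence with a flat structuring element of the given length."""
    k = np.ones(size)
    dil = np.convolve(b.astype(float), k, mode="same") > 0
    ero = np.convolve((~dil).astype(float), k, mode="same") == 0
    return ero

def detect_motion(x, y, fs, T_m=0.45, D=0.1):
    """Binary motion indicator: compare the distance travelled within a
    sliding window of duration T_m with the threshold D, then apply
    morphological closing with a structuring element spanning T_m.
    x, y: position sequences sampled at fs Hz."""
    n_m = max(1, int(round(T_m * fs)))            # window length in samples
    step = np.hypot(np.diff(x), np.diff(y))       # per-sample distances
    dist = np.convolve(step, np.ones(n_m), mode="same")
    b = np.concatenate([[False], dist > D])       # align with positions
    return binary_closing(b, n_m)
```

The onsets and offsets of motion (the moments of starting and stopping) are then the rising and falling edges of the closed binary sequence.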

Estimation of Walking Direction and Moment of Turning
The walking direction has been estimated by computing the four-quadrant inverse tangent for two consecutive positions $(x_n, y_n)$ and $(x_{n-1}, y_{n-1})$ of the walking person:
$$\varphi_n = \mathrm{atan2}(y_n - y_{n-1},\; x_n - x_{n-1})$$
It is assumed that the counter domain of this function is $[-\pi, \pi]$, i.e., $[-180°, 180°]$. A turn can be detected by comparing the absolute value of the sum of the changes in the walking direction in a given period of time, $T_t$, with a given threshold, $\Phi$. A binary sequence indicating turning has been computed according to the formula:
$$b_{t,n} = \mathbb{I}\!\left( \left| \sum_{i=n-N_t+1}^{n} (\varphi_i - \varphi_{i-1}) \right| > \Phi \right)$$
where $N_t$ is the number of data points within the time interval $T_t$, and $\{x_n\}$ and $\{y_n\}$ are the sequences of the coordinates of the monitored person's position. The values of $T_t$ and $\Phi$ should be chosen so as to prevent small deviations in the position from being considered as turns; $T_t = 1$ s and $\Phi \cong 1.05$ rad (60°) have been selected because these values yielded satisfactory results in the experiments.
To reduce the influence of the dispersion of the position estimates on the detection of turns, morphological closing has been performed on the binary sequence {b t,n }. Because the fused data sequences are acquired by means of sensors with different data acquisition rates, the length of the structuring element used for the morphological closing has been equal to the number of data points within the time interval T t . The binary sequence obtained by means of the morphological closing, b t,n , has been used for the detection of the moments when a monitored person started turning, and the moments when that person stopped turning.
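The turn detector can be sketched analogously; in this sketch the angle differences are wrapped to avoid spurious jumps at the ±180° boundary, and the windowing details are assumptions consistent with the description above (the morphological closing would be applied to the result exactly as in the motion-detection step):

```python
import numpy as np

def detect_turning(x, y, fs, T_t=1.0, Phi=np.deg2rad(60)):
    """Binary turning indicator: walking direction from the four-quadrant
    inverse tangent of consecutive positions; a turn is flagged when the
    absolute accumulated direction change within a window of duration T_t
    exceeds the threshold Phi."""
    n_t = max(1, int(round(T_t * fs)))
    phi = np.arctan2(np.diff(y), np.diff(x))      # walking direction
    dphi = np.diff(phi)
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)
    acc = np.convolve(dphi, np.ones(n_t), mode="same")
    b = np.abs(acc) > Phi
    return np.concatenate([[False, False], b])    # align with positions
```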

Estimation of Travelled Distance
The travelled distance has been calculated by summing up the distances between the consecutive locations of the walking person:
$$\hat{d} = \sum_{n=2}^{N} \sqrt{(x_n - x_{n-1})^2 + (y_n - y_{n-1})^2} \tag{18}$$

Estimation of Walking Speed
The average walking speed has been estimated by dividing the estimate of the travelled distance, determined according to Equation (18), by the estimate of the time spent in motion, determined according to Equation (14).
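The two estimates can be sketched as follows, assuming a binary motion indicator such as the one produced in the motion-detection step:

```python
import numpy as np

def travelled_distance(x, y):
    """Equation-(18)-style estimate: sum of the Euclidean distances
    between consecutive positions."""
    return float(np.sum(np.hypot(np.diff(x), np.diff(y))))

def mean_walking_speed(x, y, motion, fs):
    """Mean walking speed: travelled distance divided by the time spent
    in motion, the latter counted from a binary motion indicator sampled
    at fs Hz."""
    t_motion = np.count_nonzero(motion) / fs
    return travelled_distance(x, y) / t_motion if t_motion > 0 else 0.0
```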

Acquisition of Measurement Data
In the experiments, the raw measurement data were acquired by means of:

•
Two impulse-radar sensors, based on the Novelda NVA6201 chip (Novelda, Oslo, Norway), working in the frequency range 6.0-8.5 GHz [46] and having the data acquisition rate of 10 Hz;

•
An infrared depth sensor, being a part of the Microsoft Kinect V2 device (Microsoft, Redmond, WA, USA) [47], having the data acquisition rate of 30 Hz.
The experimentation programme comprised two parts (named EXP#1 and EXP#2) involving the monitoring of the movements of a person walking within the area presented in Figure 4a:

•
In experiment EXP#1, a person walked back and forth along a serpentine trajectory, among the obstacles occluding that person (see Figure 4b); 30 walks were performed and the walking speed was kept constant at v = 0.5 m/s. Thus, the whole programme of experimentation comprised the acquisition of R = 150 sequences of x-y data. Throughout the experimentation, the constancy of the walking speed was assured by means of a metronome.

Indicators of Uncertainty of Estimation of Monitored Person's Position
For each sequence of the data representative of the person's position on the x-y plane, the corresponding sequences of the absolute errors of the position estimation have been computed by subtraction of the reference values:
$$\Delta \hat{x}_{n,r} = \hat{x}_{n,r} - \dot{x}_{n,r}, \quad \Delta \hat{y}_{n,r} = \hat{y}_{n,r} - \dot{y}_{n,r}, \quad n = 1, \ldots, N_r; \; r = 1, \ldots, R \tag{20}$$
Next, the sequences of the absolute errors of the position estimation, viz. $\{\Delta \hat{x}_{n,r}\}$ and $\{\Delta \hat{y}_{n,r}\}$, have been used for determining the $R$ sequences of the position errors $\Delta d_{n,r}$:
$$\Delta d_{n,r} = \sqrt{(\Delta \hat{x}_{n,r})^2 + (\Delta \hat{y}_{n,r})^2}, \quad n = 1, \ldots, N_r; \; r = 1, \ldots, R$$
The assessment of the performance of the compared methods of data fusion has been based on the inspection of the empirical cumulative distribution functions $F(\xi)$ [48] characterising the position errors:
$$F(\xi) = \frac{1}{M} \sum_{m=1}^{M} \mathbb{I}(\Delta d_m \le \xi)$$
where $\{\Delta d_m \mid m = 1, \ldots, M\}$ is a sequence composed of all the position errors $\Delta d_{n,r}$, and $\mathbb{I}(\cdot)$ is the function taking the value 1 if the condition inside the brackets is met, and the value 0 otherwise. The following indicators of uncertainty of estimation have been used: the mean error, the median error, the maximum error, the standard deviation of the errors, and the ECDF_A indicator based on the empirical cumulative distribution function.
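The position errors and the empirical cumulative distribution function can be computed as sketched below (aggregate indicators such as ECDF_A are not reproduced here):

```python
import numpy as np

def position_errors(est, ref):
    """est, ref: (N, 2) arrays of estimated and reference x-y positions;
    returns the sequence of Euclidean position errors."""
    d = est - ref
    return np.hypot(d[:, 0], d[:, 1])

def ecdf(errors, xi):
    """Empirical cumulative distribution function F(xi) of the pooled
    position errors: the fraction of errors not exceeding xi."""
    errors = np.asarray(errors)
    return float(np.mean(errors <= xi))
```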

Indicators of Uncertainty of Estimation of Healthcare-Related Parameters
The errors corrupting the estimates of the x-y coordinates can significantly decrease the accuracy of the estimates of the healthcare-related parameters. To reduce the effect of the propagation of the errors of position estimation, the sequences of the x-y coordinates have been smoothed by means of a method based on a moving average filter.
The following indicators of uncertainty of estimation have been used:

•
The mean error determined with respect to the corresponding reference value (ME);

•
The standard deviation of that error (SE).

Uncertainty of Estimation of Monitored Person's Position
The estimates of the two-dimensional movement trajectories obtained in experiment EXP#1 are shown in Figure 5; the corresponding empirical distribution functions are provided in Figure 6. The values of the uncertainty indicators described in Section 4.2.1 are presented in Table 1. The analogous set of results, obtained for experiment EXP#2, is provided in Figures 7 and 8, and in Table 2.

The analysis of the presented results leads to the following conclusions:

•
The fused estimates of the person's position are characterised by lower bias and dispersion when compared with the radar-data-based and depth-data-based estimates; moreover, the fused estimates are only slightly affected by the obstacles occluding a person. It should be stressed that, in the case of the methods based on the artificial neural networks, the significant increase in the accuracy of the position estimation has been achieved despite the fact that those networks were trained only on synthetic data.
• In experiment EXP#1, where the monitored person was occluded by the obstacles, the more accurate estimates of the trajectories have been obtained by means of the NARX method than by means of the MLP method. This result can be explained by the fact that in the case of the recurrent neural network, the prediction of the person's position at each time instant is based on the past position of that person. When the monitored person "disappears" behind the obstacle, larger measurement errors may corrupt the radar data and the depth data, but the recurrent neural network may react to such sudden changes and mitigate their influence on the result of the data fusion.

•
As far as the values of the uncertainty indicators calculated on the basis of the fused data are concerned, the best overall results have been obtained for the NARX method: this is reflected in the lowest values of the mean error and the median error, as well as of the ECDF_A indicator. Even though the values of the maximum error and of the standard deviation of the errors are slightly greater when compared to the other methods, this does not affect the overall performance of the NARX method.
While the differences among the values of the ECDF_A indicator may seem insignificant, it is worth noting that a decrease in its value by only a few hundredths may indicate a quite significant distortion of the estimated trajectories. In Figure 9, the estimates of the movement trajectories from EXP#1, obtained on the basis of the data fused by means of the NARX method and by means of the WINFNS method (based on the minimisation of a weighted infinity norm [37]), are presented; for the NARX method (Figure 9a), the value of the ECDF_A indicator is 0.85, while for the WINFNS method (Figure 9b), it is 0.83. Moreover, even a small reduction of the uncertainty of the position estimates may have a significant impact when those estimates are used for determining quantities carrying information important for medical and healthcare services. Finally, during the experiments, it has been observed that varying the number of neurons in the hidden layer from 6 to 10 does not significantly influence the results of the data fusion; however, in the case of a nonlinear autoregressive network with exogenous inputs, an increase in the number of hidden neurons may make the training more challenging because more complex networks are prone to overfitting.

Figure 9. Estimates of the trajectories obtained in EXP#1, by means of the NARX method (a), and the WINFNS method described in [37] (b); for the NARX method, the value of the ECDF_A indicator is 0.85, while for the WINFNS method, it is 0.83.

Experiment EXP#1
The values of the uncertainty indicators, obtained on the basis of the radar data, depth data, and fused data, acquired in EXP#1, are presented in Table 3. If the EXP#1 results, obtained for the radar data and for the depth data, are considered, it can be noticed that:

•
In the case of the radar data, the number of turns is significantly underestimated. This may be explained by the smoothing of the radar data during their preprocessing: as a result, two consecutive turns are often treated as one.

•
In the case of the depth data, the travelled distance is significantly underestimated. This may be explained by the obstacles occluding the monitored person and making the tracking impossible.

•
The radar-data-based and depth-data-based estimates of the mean walking speed are similarly accurate.
If the EXP#1 results obtained for the fused data are considered, it can be noticed that:

•
The values of the uncertainty indicators are generally lower than those obtained for the radar data and for the depth data.

•
The estimates of the number of turns, obtained for the MLP method and the NARX method, are considerably more accurate than those estimates obtained for the KF method-in the case of the last method, the number of turns for each realisation of the scenario has been overestimated by almost one. This may be explained by the corruption of the depth data related to "disappearing" of the monitored person behind the obstacles and the inability of the KF method to mitigate this phenomenon.

Experiment EXP#2
The dependence of the values of the uncertainty indicators on the walking speed is presented in Figures 10-12. The analysis of those results leads to the following conclusions:

•
In the case of the estimates of the number of turns, the values of the ME indicator determined on the basis of the radar data are generally lower than zero, meaning that, regardless of the walking speed, the number of turns is underestimated; on the other hand, the values of the ME indicator determined on the basis of the data fused by the KF method are greater than zero, which means that the number of turns is overestimated. The best results are obtained for the depth data and for the data fused by means of the MLP method and the NARX method: the values of the ME indicator determined on the basis of those data are not significantly affected by the walking speed of the person.

Conclusions
In this paper, the usability of feedforward and recurrent neural networks for the fusion of data from impulse-radar sensors and depth sensors, in the context of healthcare-oriented monitoring of elderly persons, has been investigated. A new method for the fusion of data has been proposed, viz., a method based on a nonlinear autoregressive network with exogenous inputs (NARX method), and compared with a method based on a multilayer perceptron (MLP method) as well as one reference method based on a Kalman filter (KF method). All the artificial neural networks have been trained on synthetic data. An extensive programme of experimentation, designed for comparison of all the methods, has involved the localisation of a person within a monitored area, and the estimation of several parameters of gait, considered important for medical experts, viz., the number of turns made during walking, the travelled distance, and the mean walking speed.
The results of the experimentation have confirmed that artificial neural networks trained on synthetic data may be effectively used for fusing the data from the impulse-radar sensors and from the depth sensors. The best overall results have been obtained for the NARX method: in the case of the recurrent neural network, the estimate of the position of a person is based on the estimates of that position at the previous time instants; therefore, such a network may effectively react to sudden changes in the measurement data, caused by stops and occlusions, and mitigate their influence on the estimated movement trajectory. This cannot be achieved with a feedforward neural network, which has no feedback loop and treats the measurement data independently, without taking into account the movement history.
Funding: This work has been accomplished partially within the project PL12-0001, financially supported by EEA Grants-Norway Grants, and partially within the project 2017/25/N/ST7/00411, financially supported by the National Science Centre, Poland.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The author declares no conflict of interest.