Human Posture Identification Using a MIMO Array

Dai Sasakawa 1,*,†, Naoki Honma 1,†, Takeshi Nakayama 2,† and Shoichi Iizuka 2,†
1 Graduate School of Engineering, Iwate University, Morioka 020-8551, Japan; t5616003@iwate-u.ac.jp (D.S.); honma@iwate-u.ac.jp (N.H.)
2 Panasonic Corporation, Kadoma 571-8501, Japan; nakayama.takeshi@jp.panasonic.com (T.N.); iizuka.shoichi@jp.panasonic.com (S.I.)
* Correspondence: sasadai1991@gmail.com; Tel.: +81-19-621-6945
† These authors contributed equally to this work.


Introduction
Recent studies have targeted smart home systems for safety monitoring and energy saving. However, the aging of society raises new social concerns such as lonely deaths and traffic accidents involving the elderly. The growing number of elderly people demands safety-monitoring systems that can address these concerns by detecting the posture of the elderly. Existing solutions for indoor use, such as networked video cameras [1] and wearable sensors [2,3], employ IoT (Internet of Things) devices. However, the former invade privacy, particularly in spaces such as the bathroom and restroom. Some people also have an aversion to being continuously watched. Furthermore, such a system provides only line-of-sight (LOS) coverage. The latter allow the state of the subject to be discerned. However, they force the elderly to wear a device and thus place excessive physical and mental burdens on the user. Such a system is also unsuitable for the elderly because observation is not possible when the person forgets to wear the device.
To avoid these problems, living-body sensing systems [4–6] have been studied. The use of microwaves yields several key advantages, including privacy protection, contactless observation, and non-line-of-sight (NLOS) coverage. Examples of microwave-based monitoring techniques include direction of arrival (DOA) and direction of departure (DOD) estimation based on multiple-input multiple-output (MIMO) radar systems [7,8]. Though these methods can localize targets, they suffer from poor precision because the desired signal is buried by undesired waves in multi-path environments. To solve this problem, human localization methods suitable for multi-path environments have been proposed. There are three approaches to human localization: time difference of arrival (TDOA) estimation [9,10], object localization [11] based on the multiple signal classification (MUSIC) method [12], and trigonometry methods based on DOA/DOD estimation using the MUSIC method [13–15]. Though the TDOA methods can quickly localize targets even in multi-path environments by using frequency-modulated continuous-wave (FMCW) radar, they are expensive because they require a wide bandwidth of 1.79 GHz (from 5.46 to 7.25 GHz). Additionally, they need static channels, i.e., the environment must be measured in advance without a human present. Moreover, the measurements must be repeated if the environment is changed, e.g., if a piece of furniture is shifted. Localization based on MUSIC [11] uses a low-frequency band, 250 MHz, and estimates the target location by using spherical-mode MUSIC to process the oscillating return signal. However, the array aperture is comparable to the estimated distance because of the low frequency, and this method requires observation periods of over 10 s. Trigonometry-based localization [13–15] uses MIMO radar with DOA estimation by the fast Fourier transform (FFT) technique [16]. However, this method needs to observe the channel for several tens of seconds to accurately capture human activity information.
The authors have proposed a fast living-body localization algorithm in which the time-differential channel is used to attain rapid DOA/DOD estimation in multi-path environments [17]. This algorithm exploits the channel differences between observation times, which arise from cyclic human body activities such as respiration and heartbeat. The living-body locations are estimated by applying the two-dimensional MUSIC method [18] to the time-differential channel. This algorithm needs an observation period that corresponds to only one cycle of biological activity. Its key feature is that it does not require calibration to the environment in advance. However, the above techniques can estimate only the target location. This is a limitation because estimating human posture is very important in detecting safety-related events such as falls. Therefore, human posture detection is needed for comprehensive safety-monitoring systems.
In this paper, we propose a human posture identification scheme that uses a MIMO array. This method estimates the three-dimensional target location by using the time-differential channel technique [17]; the Doppler radar cross section (RCS) is calculated from the power reflected from the target and the distances between the estimated location and the receiver/transmitter. The human posture is identified by applying the nearest neighbor algorithm [19] to the estimated height and the Doppler RCS information. The three-dimensional localization procedure and the Doppler RCS calculation are described below. Experiments are carried out in an actual indoor environment to demonstrate that the proposed method can accurately estimate the human location and identify human posture with over 90% accuracy.

Human Posture Identification Based on Height and Doppler RCS of Subject Estimated by MIMO Array
The authors previously proposed a fast localization algorithm that estimates subject locations by using the time-variant channel in multi-path environments [17]. In this study, we apply the localization method to the observed channel, and the three-dimensional location of the subject is estimated by MIMO radar. The Doppler RCS is calculated from the received power and the distances between the transmitter/receiver and the estimated target location. Human posture is estimated from the height and the Doppler RCS of the subject as estimated by MIMO radar. The following text explains this method in detail.

Three Dimensional Localization of Human Subject
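This study assumes a MIMO array consisting of an $M_r$-element receiver array and an $M_t$-element transmitter array. In a multi-path environment containing one person, the time-variant channel is generated by the fluctuation of the human body's surface due to body motion, respiration, and heartbeat. We start by expressing the $M_r \times M_t$ time-variant MIMO channel as
$$\mathbf{H}(t) = \begin{bmatrix} h_{11}(t) & \cdots & h_{1M_t}(t) \\ \vdots & \ddots & \vdots \\ h_{M_r 1}(t) & \cdots & h_{M_r M_t}(t) \end{bmatrix}, \tag{1}$$
where $h_{ij}$ is the complex channel response from the $j$-th transmitter element to the $i$-th receiver element, and $t$ represents the observation time. An $M_r \times M_t$ MIMO radar can be considered to be an $M_r M_t \times 1$ virtual single-input multiple-output (SIMO) radar [7]. The $M_r M_t \times 1$ SIMO channel is expressed as
$$\mathbf{h}(t) = \left[h_{11}(t), \ldots, h_{M_r 1}(t), h_{12}(t), \ldots, h_{M_r M_t}(t)\right]^T, \tag{2}$$
where $\{\cdot\}^T$ is transposition. Though DOA and DOD can be estimated using this virtual SIMO channel, unwanted path components disturb the estimation of the living-body location. Unwanted path components consist of the direct wave from transmitter to receiver and waves reflected from the walls, floor, and furniture, and these components are static. Therefore, we exclude the undesired components by applying the fast localization algorithm [17] to the converted SIMO channel; the time-differential channel is defined as
$$\mathbf{h}_{sb}(t, t_{sb}) = \mathbf{h}(t + t_{sb}) - \mathbf{h}(t), \tag{3}$$
where $t_{sb}$ represents the time difference. The instantaneous correlation matrix, using the time-differential channel $\mathbf{h}_{sb}(t, t_{sb})$ with observation time $t$ and time difference $t_{sb}$, is defined as
$$\mathbf{R}(t, t_{sb}) = \mathbf{h}_{sb}(t, t_{sb})\, \mathbf{h}_{sb}(t, t_{sb})^H, \tag{4}$$
where $\{\cdot\}^H$ means complex conjugate transposition. To recover the eigenvalue rank, a correlation matrix is calculated by averaging over time, where both $t$ and $t_{sb}$ are swept. This is expressed as
$$\mathbf{R} = \overline{\mathbf{R}(t, t_{sb})} \quad (T_{min} \le t_{sb} \le T_{max}), \tag{5}$$
where $\overline{\{\cdot\}}$ is the averaging operator, and $T_{max}$ and $T_{min}$ are the maximum and minimum periods corresponding to the biological activities, respectively. By eigenvalue decomposition, the averaged correlation matrix $\mathbf{R}$ is given by
$$\mathbf{R} = \mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^H, \tag{6}$$
$$\mathbf{U} = \left[\mathbf{u}_1, \ldots, \mathbf{u}_{M_r M_t}\right], \tag{7}$$
$$\boldsymbol{\Lambda} = \mathrm{diag}\left[\lambda_1, \ldots, \lambda_{M_r M_t}\right], \tag{8}$$
where $\mathbf{U}$ and $\boldsymbol{\Lambda}$ represent the eigenvector matrix and the diagonal matrix of eigenvalues, respectively. The eigenvalues are related as follows:
$$\lambda_1 > \lambda_2 = \cdots = \lambda_{M_r M_t} = \sigma_f^2, \tag{9}$$
where $\sigma_f^2$ represents the expected value of the energy of the channel fluctuation component due to noise. The eigenvectors corresponding to noise, $[\mathbf{u}_2, \ldots, \mathbf{u}_{M_r M_t}]$, are expressed as $\mathbf{U}_N$.
In this study, the subject location is estimated via three-dimensional MUSIC with a spherical mode vector, which extends the original MUSIC method [12] to the 3D domain. The three-dimensional spherical mode vector $\mathbf{a}(x, y, z)$ is expressed as
$$\mathbf{a}(x, y, z) = \mathbf{a}_t(x, y, z) \otimes \mathbf{a}_r(x, y, z), \tag{10}$$
$$\mathbf{a}_t(x, y, z) = \left[e^{-j 2\pi D_{t1}/\lambda}, \ldots, e^{-j 2\pi D_{tM_t}/\lambda}\right]^T, \tag{11}$$
$$\mathbf{a}_r(x, y, z) = \left[e^{-j 2\pi D_{r1}/\lambda}, \ldots, e^{-j 2\pi D_{rM_r}/\lambda}\right]^T, \tag{12}$$
$$D_{rp} = \sqrt{(x - x_{rp})^2 + (y - y_{rp})^2 + (z - z_{rp})^2}, \tag{13}$$
$$D_{tq} = \sqrt{(x - x_{tq})^2 + (y - y_{tq})^2 + (z - z_{tq})^2}, \tag{14}$$
where $\mathbf{a}_t(x, y, z)$ and $\mathbf{a}_r(x, y, z)$ are the steering vectors at the transmitting and receiving sides, respectively, $\otimes$ represents the Kronecker product, and $\lambda$ is the wavelength. $D_{rp}$ is the distance between position $(x, y, z)$ and the $p$-th receiver element, and $D_{tq}$ is the distance between position $(x, y, z)$ and the $q$-th transmitter element. $(x_{rp}, y_{rp}, z_{rp})$ and $(x_{tq}, y_{tq}, z_{tq})$ are the positions of the $p$-th receiver element and the $q$-th transmitter element, respectively. Here, the evaluation function of the MUSIC method (MUSIC spectrum) is calculated as
$$P_{MUSIC}(x, y, z) = \frac{\mathbf{a}^H(x, y, z)\, \mathbf{a}(x, y, z)}{\mathbf{a}^H(x, y, z)\, \mathbf{U}_N \mathbf{U}_N^H\, \mathbf{a}(x, y, z)}. \tag{15}$$
The peak of this MUSIC spectrum represents the estimated target location.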
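The localization steps described above (time-differential channel, averaged correlation matrix, noise subspace, spherical-mode MUSIC search) can be sketched in NumPy as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the 2 × 2 toy arrays, their coordinates, the column-wise stacking of the channel, the phase-only steering model with a negative exponent, and the simulated breathing signal are all assumptions made for the example.

```python
import numpy as np

WAVELENGTH = 299_792_458.0 / 2.47125e9   # carrier from the measurement setup

def planar_array(center, n=2, spacing=WAVELENGTH / 2):
    """n x n element positions in the x-z plane around `center` (toy geometry)."""
    offs = (np.arange(n) - (n - 1) / 2) * spacing
    return np.array([[center[0] + dx, center[1], center[2] + dz]
                     for dx in offs for dz in offs])

def steering(point, rx, tx):
    """Phase-only spherical-mode vector a = a_t (kron) a_r (assumed convention)."""
    a_r = np.exp(-2j * np.pi * np.linalg.norm(rx - point, axis=1) / WAVELENGTH)
    a_t = np.exp(-2j * np.pi * np.linalg.norm(tx - point, axis=1) / WAVELENGTH)
    return np.kron(a_t, a_r)

def averaged_correlation(h, lags):
    """Average h_sb h_sb^H over every time t and every lag (in snapshots)."""
    n = h.shape[1]
    acc = np.zeros((n, n), dtype=complex)
    count = 0
    for lag in lags:
        d = h[lag:] - h[:-lag]        # time-differential channel, one row per t
        acc += d.T @ d.conj()         # sum over t of d d^H
        count += len(d)
    return acc / count

def music_spectrum(R, grid, rx, tx, n_targets=1):
    """MUSIC spectrum on a list of candidate 3D points."""
    _, U = np.linalg.eigh(R)          # eigenvalues in ascending order
    U_n = U[:, :-n_targets]           # noise subspace
    spec = []
    for p in grid:
        a = steering(p, rx, tx)
        proj = U_n.conj().T @ a
        spec.append(np.vdot(a, a).real / max(np.vdot(proj, proj).real, 1e-300))
    return np.array(spec)

# Synthetic scene: static multipath plus one fluctuating (breathing) scatterer.
rx = planar_array((0.0, 0.0, 0.8))
tx = planar_array((4.0, 0.0, 0.8))
target = np.array([1.5, 2.0, 0.5])
t = np.arange(256) / 100.0                                   # 256 snapshots at 100 Hz
rng = np.random.default_rng(0)
static = rng.standard_normal(16) + 1j * rng.standard_normal(16)
breathing = 0.01 * np.exp(1j * 0.5 * np.sin(2 * np.pi * 0.3 * t))
h = static[None, :] + breathing[:, None] * steering(target, rx, tx)[None, :]

grid = np.array([[x, y, z] for x in (1.0, 1.5, 2.0, 2.5, 3.0)
                 for y in (1.0, 1.5, 2.0, 2.5, 3.0)
                 for z in (0.2, 0.5, 0.8, 1.1)])
R = averaged_correlation(h, lags=range(5, 251))              # 0.05 s to 2.5 s
estimate = grid[np.argmax(music_spectrum(R, grid, rx, tx))]
print(estimate)
```

Because the static term cancels in the difference `h[lag:] - h[:-lag]`, no empty-room calibration is needed in this sketch, which mirrors the motivation given for the time-differential channel.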

Doppler Radar Cross Section and Human Posture Identification
The first eigenvector, $\mathbf{u}_1$ of Equation (7), corresponds to the target location. The converted SIMO channel $\mathbf{h}(t)$ of Equation (2) is multiplied by the first eigenvector $\mathbf{u}_1$ to enhance the biological component of the target, and the resulting signal $y(t)$ is expressed as
$$y(t) = \mathbf{u}_1^H \mathbf{h}(t). \tag{16}$$
The observed signal $y(t)$ is Fourier-transformed, and the result is defined as $F(\omega)$. The received power $P_r(\omega)$ is expressed as
$$P_r(\omega) = |F(\omega)|^2, \tag{17}$$
where $\omega$ represents frequency. Here, we define the Doppler radar cross section (RCS) by solving the radar range equation for $\sigma$; this is expressed as
$$\sigma = \frac{(4\pi)^3 R_r^2 R_t^2}{P_t G_r G_t \lambda^2} \int_{f_1}^{f_2} P_r(\omega)\, d\omega, \tag{18}$$
where $R_r$ and $R_t$ represent the distances of the estimated target location from the centers of the receiver and transmitter, respectively. $P_t$ represents the transmitting power, and $G_r$ and $G_t$ are the gains of the receiving antenna and transmitting antenna, respectively. $f_1$ and $f_2$ define the frequency range that encompasses the vital sign effects.
First, we create training data of human posture. A dataset of the estimated height and the Doppler RCS is built from $N$ trials for each posture. To exclude sample outliers, we use only the data lying within 30–70% of the estimated height. The training data are made from every posture dataset. We then evaluate the posture identification rate of the k-nearest neighbor (k-NN) algorithm [19]. In this study, k is set to 1, i.e., the one-nearest-neighbor classifier is used.
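The Doppler RCS calculation and the one-nearest-neighbor decision described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the discrete sum standing in for the integral, the FFT normalization, the use of positive frequencies only, the unit transmit power, the test tone, and every training value are assumptions chosen for the example, and the source does not say whether the two features are rescaled before distances are computed.

```python
import numpy as np

def doppler_rcs(h, u1, fs, f1, f2, r_r, r_t, p_t, g_r, g_t, wavelength):
    """Radar range equation solved for sigma, with band power taken from y(t) = u1^H h(t)."""
    y = h @ u1.conj()                              # project each snapshot onto u1
    F = np.fft.fft(y) / len(y)                     # assumed normalization
    freqs = np.fft.fftfreq(len(y), d=1.0 / fs)
    band = (freqs >= f1) & (freqs <= f2)           # assumed: positive frequencies only
    p_r = np.sum(np.abs(F[band]) ** 2)             # discrete stand-in for the integral
    return (4 * np.pi) ** 3 * r_r**2 * r_t**2 * p_r / (p_t * g_r * g_t * wavelength**2)

def classify_1nn(train_features, train_labels, query):
    """Return the label of the single closest training sample (Euclidean distance)."""
    d = np.linalg.norm(np.asarray(train_features) - np.asarray(query), axis=1)
    return train_labels[int(np.argmin(d))]

# Synthetic check: a pure tone on DFT bin 10 with amplitude 1e-3.
fs, n = 100.0, 256
t = np.arange(n) / fs
u1 = np.ones(16) / 4.0                             # unit-norm stand-in for the first eigenvector
amp, f0 = 1e-3, 10 * fs / n
h = amp * np.exp(2j * np.pi * f0 * t)[:, None] * u1[None, :]
gain = 10 ** (4.96 / 10)                           # 4.96 dB as a linear ratio
wavelength = 299_792_458.0 / 2.47125e9
sigma = doppler_rcs(h, u1, fs, 0.39, 10.16, r_r=2.8, r_t=2.8,
                    p_t=1.0, g_r=gain, g_t=gain, wavelength=wavelength)

# Hypothetical training points: (estimated height in m, Doppler RCS in dB).
train = np.array([[0.90, -30.0], [0.75, -35.0], [0.55, -38.0], [0.10, -40.0]])
labels = ["standing", "sitting on a chair", "sitting on the floor", "lying"]
predicted = classify_1nn(train, labels, [0.12, -41.0])
print(sigma, predicted)
```

With unscaled features as in this toy example, the dB axis dominates the distance; a real system would need to decide how to weight height against RCS.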

Experimental Condition and Measurement Setup
Table 1 and Figure 1 give an overview of the measurement setup. The experiments used a 16 × 16 MIMO configuration. As shown in Figure 1, a single-pole 64 throw (SP64T) switch was used at the transmitting side. Though the exact observation time is not the same for all elements in the MIMO channel matrix, the time differences among the elements are so short compared to the vital activity that they can be ignored. A continuous wave (CW) signal of 2.47125 GHz was used. The transmitting power at the antennas was set to −28 dBm. The CW signal was also split to the receiver side because accurate synchronization between the transmitting and receiving sides is needed. At the receiver side, the received signals were input to a down-converter unit by way of a low-noise amplifier (LNA) unit.
The down-converted baseband signals ($I_1$, $Q_1$, ..., $I_{16}$, $Q_{16}$) were digitized by a data-acquisition unit (DAQ) with a sampling frequency of 20 kHz. The snapshot rate of the MIMO channel is determined by the switching speed of the SP64T. In the experiments, the rate for taking a snapshot of the MIMO channel was set to 100 Hz. In this study, the observation time for localization and for calculating the Doppler RCS was set to 2.56 s. Thus, the number of snapshots was 256. The transmission components of the S parameters were used as the propagation channel. The number of targets was determined following the MUSIC method. In this study, the time difference $t_{sb}$ was set to 0.05 s ≤ $t_{sb}$ ≤ 2.5 s for localization, while the frequency range was set from 0.39 to 10.16 Hz in calculating the Doppler RCS. The antenna gains $G_r$ and $G_t$ were both 4.96 dB, which is the gain averaged over the angular range from −40° to 40°.
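The timing parameters above fit together arithmetically, as the short check below illustrates. Reading the 0.39 Hz and 10.16 Hz limits as DFT bin frequencies is an interpretation made for this example, not a statement from the measurement description; the variable names are likewise illustrative.

```python
snapshot_rate_hz = 100.0
observation_s = 2.56

# Number of MIMO snapshots in one observation window.
n_snapshots = round(snapshot_rate_hz * observation_s)

# Frequency resolution of a DFT taken over the whole window.
bin_hz = snapshot_rate_hz / n_snapshots

# DFT bins whose center frequency lies inside the Doppler RCS band.
band_bins = [k for k in range(n_snapshots // 2) if 0.39 <= k * bin_hz <= 10.16]

# Time-difference range for localization, expressed in snapshots.
lag_snapshots = (round(0.05 * snapshot_rate_hz), round(2.5 * snapshot_rate_hz))

print(n_snapshots, bin_hz, band_bins[0], band_bins[-1], lag_snapshots)
```

Under this reading, the band limits correspond to the first and the 26th nonzero bins (0.390625 Hz and 10.15625 Hz), and the time difference spans 5 to 250 snapshots.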
Figure 3 shows the experimental environment. The experiment was carried out in a room containing desks and shelves. The room had concrete walls, and its width, depth, and height were 7.0, 6.0, and 2.7 m, respectively. One side of the room had four windows. Figure 4 shows the subject's posture when the channel was observed. When the channel was observed, the subject assumed one of four postures: standing (a), sitting on a chair (b), sitting on the floor (c), and lying on his back (d); the measurement location was set to (X, Y) = (2.0, 2.0) m. When standing and sitting, the subject faced the wall against which the antennas were set. When lying, the subject had his feet toward the wall against which the antennas were set, and the trunk of the subject was placed at the measurement location. In all postures, the number of measurements was 500, and the number of trials of posture identification was 3000.

Results of Measured Channel and Three Dimensional Localization
Figure 5 shows an example of the time-variant channel response, $h_{11}(t)$, of the observed channel, $\mathbf{H}(t)$, for all postures. In this figure, "Static" indicates the channel response without a living body. In comparison with the static channel, all time-variant channels with a living body in Figure 5 exhibit changes because the living body's activities altered the propagation paths. The variation demonstrates the periodicity of biological activities such as respiration. Additionally, the standing posture yielded the largest channel response. Moreover, standing created non-periodic components due to gross body motion. The other postures, on the other hand, yielded far cleaner patterns, as biological activity was the dominant factor.
Figure 6 shows an example of the MUSIC spectrum for living-body localization when the subject stood at (X = 2.0 m, Y = 2.0 m). Figure 6a,b are the XY plane and the ZX plane at the spectrum peak (i.e., the estimated location), respectively. In these figures, the x-mark represents the spectrum peak, and the human silhouette is displayed on the MUSIC spectrum. In this case, the estimated location is (X, Y, Z) = (2.01, 1.83, 0.88) m. These figures confirm that the estimated location lies close to the human body, and the spectrum peak appears at the subject's abdomen because the abdomen exhibits the largest surface fluctuation.
Figure 7a,b show examples of the MUSIC spectrum for living-body localization in the XY and ZX planes, respectively, when the subject sat on a chair at (X = 2.0 m, Y = 2.0 m). In this result, the estimated location is (X, Y, Z) = (1.98, 1.76, 0.74) m. Figure 7a shows that the estimated location (x-mark) appears on the subject's face. Figure 7b shows that the spectrum peak tracks the subject's abdomen, and the estimated height is lower than that of the standing position. For the sitting-on-the-floor position, Figure 8a shows that the spectrum peak appears at the subject's location. Figure 8b shows that the estimated height was lower than that in the standing or sitting-on-a-chair positions, and the peak appears close to the chest of the subject.
Figure 9 shows examples of the MUSIC spectrum for living-body localization in the XY and ZX planes when the subject lay on his back at (X = 2.0 m, Y = 2.0 m). The spectrum peak appears at (X, Y, Z) = (1.92, 1.68, 0.11) m. Although the subject's body covers a large area in the XY plane in this case, the estimated location is again the abdomen of the subject. Additionally, the spectrum peak is the lowest among all postures. Therefore, the results show that the estimated location depends on the subject's posture, and the peak generally matches the abdomen of the target.
Figure 10 shows the cumulative distribution function (CDF) of the estimation error in the XY plane. The estimation error was defined as the Euclidean distance between the actual target location and the estimated location in the XY plane. Table 2 summarizes the results of Figure 10. The median, the 90% value, and the root mean square error (RMSE) of the estimation error are used as the evaluation metrics. For the standing position, the median, the 90% value, and the RMSE are 0.177 m, 0.271 m, and 0.201 m, respectively. The median, the 90% value, and the RMSE of the sitting-on-a-chair position are 0.101 m, 0.178 m, and 0.167 m, respectively. For the sitting-on-the-floor position, the median, the 90% value, and the RMSE are 0.124 m, 0.160 m, and 0.202 m, respectively. When the subject lay on his back, the median, the 90% value, and the RMSE of the estimation error were 0.237 m, 0.254 m, and 0.236 m, respectively. The subject has a width of about 0.4 m, and the center of the subject was always set on the vertical axis of the measurement location. Therefore, these results show that the subject location can be accurately estimated for all postures.
Figure 12 shows the confusion matrix of the postures identified by the proposed method. The number of trials to identify each posture was 3000. In this figure, the true positive rates (TPRs) of the standing, sitting-on-a-chair, sitting-on-the-floor, and lying positions were 92.4%, 96.5%, 91.2%, and 100%, respectively. The lying position yielded the highest TPR because its distribution is clearly different from the others (see Figure 11). Therefore, the proposed method can be used by safety-monitoring systems to detect that a person has fallen. Note that the sitting-on-the-floor position has the smallest TPR due to the large variation in height estimates. However, the TPR of the sitting-on-the-floor position, 91.2%, still indicates high accuracy. The average TPR in our experiments was 95.0%. Therefore, it is confirmed that human posture can be accurately identified by the proposed method.
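The evaluation metrics used above can be written down compactly, as in the sketch below. The XY error is computed for the single example estimates quoted in the text (one per posture, against the measurement location at (2.0, 2.0) m), and the mean TPR is recomputed from the four reported rates. The `summarize` helper is an illustrative name; it is applied to the four example errors only to exercise the function, whereas the figures in Table 2 come from 500 measurements per posture.

```python
import numpy as np

true_xy = np.array([2.0, 2.0])

# Example estimates (X, Y) quoted in the text, one per posture.
examples = {
    "standing": (2.01, 1.83),
    "sitting on a chair": (1.98, 1.76),
    "sitting on the floor": (2.01, 1.83),
    "lying": (1.92, 1.68),
}
# Estimation error: Euclidean distance in the XY plane.
xy_error = {k: float(np.linalg.norm(np.array(v) - true_xy)) for k, v in examples.items()}

def summarize(errors):
    """Median, 90% value, and RMSE of a set of estimation errors."""
    e = np.asarray(errors, dtype=float)
    return {
        "median": float(np.median(e)),
        "p90": float(np.percentile(e, 90)),
        "rmse": float(np.sqrt(np.mean(e ** 2))),
    }

# Reported true positive rates (%), averaged over the four postures.
tpr = {"standing": 92.4, "sitting on a chair": 96.5,
       "sitting on the floor": 91.2, "lying": 100.0}
mean_tpr = sum(tpr.values()) / len(tpr)

print(xy_error)
print(summarize(list(xy_error.values())))
print(round(mean_tpr, 1))
```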

Conclusions
This paper proposes and demonstrates a human posture identification method that uses two microwave (2.47125 GHz) MIMO arrays. This method is a non-contact and non-wearable technique and is thus well suited to smart home applications. The MIMO antenna arrays measure the time-variant channel created by the subject, and the target location is estimated in three dimensions by using the time-differential channel technique. The key is detecting the target's vital signs: the Doppler RCS is calculated from the power reflected from the target and the distances between the estimated location and the receiver/transmitter. Human posture is identified by applying the nearest neighbor algorithm to the estimated height and the Doppler RCS information. Experiments were carried out in an indoor environment. The results showed that all postures had RMSE values within 0.25 m. The results also demonstrated that the proposed method identified the supine posture with 100% accuracy, and the average TPR was 95.0%. Therefore, we confirmed that the proposed method can identify human posture with high accuracy.

Figure 2 shows a photo of the array antenna. The receiver and transmitter arrays each have 16 patch antennas in a vertical 4 × 4 array. All array antennas used a PTFE substrate, and the antenna thickness, width, and height were 1.6, 60, and 240 mm, respectively. All antenna elements have vertical polarization. The element spacing of the receiver and transmitter arrays was a half wavelength. The array's center was set to h = 0.8 m, the trunk height of the subjects. The straight-line distance between the transmitting and receiving antennas was set to 4.0 m. The receiver and the transmitter faced the center of the room.
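As a small worked example of the half-wavelength spacing mentioned above, the element pitch follows from the 2.47125 GHz carrier. The variable names are illustrative, and the free-space wavelength is assumed.

```python
C = 299_792_458.0          # speed of light in vacuum (m/s)
CARRIER_HZ = 2.47125e9     # CW frequency used in the experiments

wavelength_mm = 1e3 * C / CARRIER_HZ
spacing_mm = wavelength_mm / 2          # half-wavelength element spacing
aperture_mm = 3 * spacing_mm            # 4 elements per side give 3 gaps

print(round(wavelength_mm, 2), round(spacing_mm, 2), round(aperture_mm, 2))
```

The spacing comes out at roughly 61 mm, so a four-element row spans about 182 mm between the outermost element centers.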

Figure 4. Photo of a target undergoing measurement. (a) The subject standing. (b) The subject sitting on a chair. (c) The subject sitting on the floor. (d) The subject lying on his back.

Figure 5. Example of the time-variant channel response.

Figure 6. Example of the multiple signal classification (MUSIC) spectrum for localization when the subject stood at (X = 2.0 m, Y = 2.0 m). (a) Example of the MUSIC spectrum in the XY plane. (b) Example of the MUSIC spectrum in the ZX plane.


Figure 7. Example of the MUSIC spectrum for localization when the subject sat on a chair at (X = 2.0 m, Y = 2.0 m). (a) Example of the MUSIC spectrum in the XY plane. (b) Example of the MUSIC spectrum in the ZX plane.

Figure 8 shows an example of the MUSIC spectrum for living-body localization when the subject sat on the floor at (X = 2.0 m, Y = 2.0 m). The estimated location is (X, Y, Z) = (2.01, 1.83, 0.55) m.

Figure 8. Example of the MUSIC spectrum for localization when the subject sat on the floor at (X = 2.0 m, Y = 2.0 m). (a) Example of the MUSIC spectrum in the XY plane. (b) Example of the MUSIC spectrum in the ZX plane.

Figure 9. Example of the MUSIC spectrum for localization when the subject lay on his back at (X = 2.0 m, Y = 2.0 m). (a) Example of the MUSIC spectrum in the XY plane. (b) Example of the MUSIC spectrum in the ZX plane.

Figure 10. Cumulative distribution function (CDF) of the estimation error of all postures in the XY plane.

Figure 12. Confusion matrix of the identified posture classified by the proposed method: true positive rate (TPR).

Table 2. Median, 90% value, and root mean square error (RMSE) of the estimation error for all postures.