Fall Detection System Based on Point Cloud Enhancement Model for 24 GHz FMCW Radar

Automatic fall detection plays a significant role in monitoring the health of senior citizens. In particular, millimeter-wave radar sensors are relevant for human pose recognition in an indoor environment due to their advantages of privacy protection, low hardware cost, and wide range of working conditions. However, low-quality point clouds from 4D radar diminish the reliability of fall detection. To improve the detection accuracy, conventional methods utilize more costly hardware. In this study, we propose a model that can provide high-quality three-dimensional point cloud images of the human body at a low cost. To improve the accuracy and effectiveness of fall detection, a system that extracts distribution features through small radar antenna arrays is developed. The proposed system achieved 99.1% and 98.9% accuracy on test datasets pertaining to new subjects and new environments, respectively.


Introduction
According to a report from the World Health Organization, approximately 28-35% of older adults fall each year, leading to serious injury or death [1]. Therefore, intelligently detecting falls in indoor conditions can reduce the risk of the elderly injuring themselves.
Various technologies have been adopted to detect falls. Some existing fall detection methods require wearable sensors [2]. Accelerometers have been widely used in wearable methods, and a velocity threshold can be set to detect fall events [3,4]; however, these devices may be forgotten because of their inconvenience. Vision-based methods eliminate the need to wear anything, but they are costly, sensitive to lighting conditions, and invade privacy [5,6]. Recently, radar sensors have become more popular in fall detection systems due to their advantages over other sensing technologies: (a) convenience over wearable technologies [7]; (b) high sensitivity to motion compared to depth sensors in complex living environments and weak lighting conditions; (c) privacy compliance over vision sensors [8]; and (d) low hardware cost compared with other sensors [9]. Typical radars for human fall detection are continuous wave (CW) radars and frequency-modulated continuous wave (FMCW) radars. CW radar signals have been converted into the time-frequency domain, and artificial features extracted, to detect a falling person [10,11]. Doppler-time signatures recorded in the CW signals have been used to train a machine learning algorithm [12]. However, CW radar can only provide velocity information. Due to this lack of information richness, actions with similar postures, such as sitting and squatting, may lead to inaccuracies. A better choice is an FMCW radar, which can simultaneously provide the range, Doppler, and angle information of targets and also offers high sensitivity to motion [13].
Traditionally, researchers have explored several methods based on FMCW radars operating from 57 to 85 GHz [14]. Doppler information can describe the velocity attributes of a motion; thus, the range-Doppler map has been widely used in FMCW radar-based fall detection methods proposed in the literature [15][16][17][18]. The Doppler-time map, which includes time information, has been used directly as a feature to detect fall events [18]. Many studies on FMCW radar-based fall detection rely on the time-frequency characteristics of the FMCW radar return signal, including the frequency magnitude, frequency ratio, and the duration of the motion [19]. However, similar to CW radar-based methods, these methods cannot provide spatial information, and similar motions may lead to inaccuracies. Micro-Doppler and spatial information have been used to achieve high accuracy, showing that deep learning methods are superior to traditional artificial feature extraction methods [20]. An FMCW radio has been used to obtain the 3D position information of the human body and heatmaps in both horizontal and vertical directions [17]. However, a problem remains in combining 3D location information: achieving high angular resolution requires radars with large antenna arrays.
To utilize 3D spatial information, recent innovations in human activity detection have explored point clouds from radar [21][22][23], in which each point contains a 3D position in space. However, in contrast to LiDAR and camera sensors, these studies face two main challenges: (1) the point clouds generated by mmWave radar are usually sparse and of low resolution, and (2) the point clouds include many ghost points caused by the multipath effect. As a result, classification accuracy and reliability may be negatively affected. To address these challenges, several methods have been designed for use in conjunction with high-resolution radars. Antenna arrays with 12 Txs and 16 Rxs have been used to generate high-quality point clouds [22]. Hawkeye generated 2D depth images using radar intensity maps obtained from SAR scans [23]. However, although large antenna arrays and SAR technologies can improve the resolution, they are still very slow and may not be practical in many applications that require a short response time and low-cost hardware. In addition, sparsity-related methods and deep learning-based methods have been used for point cloud quality enhancement [24]. Some sparsity-related methods, such as K-means [25] and the density-based spatial clustering of applications with noise (DBSCAN) algorithm [26], first cluster the points to remove outliers from the point clouds. However, these techniques cannot remove a sufficient number of outlier points. In recent studies, a few deep learning-based methods have been developed based on PointNet [27]. After learning a mapping from the noisy input, they can automatically generate a set of clean points. Inspired by PointNet, PCN combined a permutation-invariant, non-convolutional feature extractor to complete a point cloud from a partial input, and then used a refinement block to denoise the prediction and produce the final point cloud [28]. GPDNet denoises point clouds based on graph-convolutional layers [29]. However, most of these methods target LiDAR or other sensor applications and extract pointwise features. Hence, they may not be efficient for radar-based point clouds because of their very low resolution.
In this study, we propose an FMCW radar-based fall detection method that investigates 3D point clouds while operating at 24 GHz. Such systems have not been well studied owing to hardware limitations. First, we obtained raw point clouds from the radar. We then designed a new model to transform the raw points into high-quality point clouds that are closer to the ground truth. Next, we estimated the distribution parameters of the point clouds for classification.
The main contributions of this paper are as follows: (1) We propose an efficient fall detection system that uses a small, low-cost radar. As shown in Figure 1, the novel framework is primarily composed of three parts: point cloud enhancement (PCE) for point cloud quality improvement, a feature extractor for human pose parameter extraction, and a classifier for distinguishing normal events from fall events; (2) A PCE model is introduced to transform low-quality point clouds into high-quality point clouds and to reconstruct the shape of the point clouds using the shape of the human body. A novel 3D point-box hybrid regression loss function based on pointwise and 3D bounding box features is proposed as a substitute for the traditional loss function; (3) Our system works on sparse and noisy raw radar data without using expensive hardware or synthetic aperture radar (SAR) scans.
The remainder of this article is organized as follows. Section 2 provides an overview of the radar system and signal processing flow. In Section 3, the details of the proposed method are presented. The results are discussed in Section 4. Finally, Section 5 concludes this study.


Radar System Design
An FMCW radar is often used to estimate the range and velocity of a target using its radio frequency signals. Each transmitted signal s(t) is a chirp signal, the analytical expression [30] of which is s(t) = exp[j2π(f_0 t + (K/2)t^2)], where f_0 denotes the carrier frequency, and K is the chirp constant.
Referring to the Boulic model [31], the echo from the ith human body part is a time-delayed version of the transmitted signal, s(t - 2R_i/c), where R_i is the distance between the ith ellipsoidal center and the radar, and c is the speed of light; the received baseband signal follows from mixing this echo with the transmitted chirp. The echo from the entire human body can be expressed as s_R(t) = Σ_{i=1}^{M} η_i s(t - 2R_i/c), where η_i is the attenuation coefficient, which is governed by the radar cross section for different body regions, and M is the number of scattering points of human body parts. As shown in Figure 2, the raw data from each channel generated a 3D data cube. FMCW radar signal processing started with the sampled echoes, which were transferred to the range-Doppler matrix [32]. First, the range fast Fourier transform (FFT) assisted in estimating the range of the targets, and the second FFT determined the Doppler frequency. The moving target indicator distinguished the targets from the background. For more reliable detection, a constant false alarm rate detector was used to detect the targets against the background noise. A direction-of-arrival (DOA) estimation algorithm was used to estimate the angle of the target.
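As a concrete illustration of the two-FFT processing described above, the following sketch simulates one channel's dechirped (beat) signal for a single point target and recovers its range from the range-Doppler map. This is not the authors' code, and the radar parameters are illustrative, not those of Table 1.

```python
import numpy as np

c = 3e8
f0 = 24e9           # carrier frequency
B = 250e6           # sweep bandwidth (hypothetical)
Tc = 100e-6         # chirp duration
K = B / Tc          # chirp constant (slope)
Ns, Nc = 128, 64    # samples per chirp, chirps per frame
fs = Ns / Tc

R, v = 3.0, 1.0     # simulated target range (m) and radial velocity (m/s)
t = np.arange(Ns) / fs
cube = np.zeros((Nc, Ns), dtype=complex)
for m in range(Nc):
    tau = 2 * (R + v * m * Tc) / c          # round-trip delay for chirp m
    # beat signal: frequency 2KR/c encodes range, phase 2*f0*R/c encodes Doppler
    cube[m] = np.exp(1j * 2 * np.pi * (K * tau * t + f0 * tau))

rd = np.fft.fft(cube, axis=1)                          # 1st FFT: range
rd = np.fft.fftshift(np.fft.fft(rd, axis=0), axes=0)   # 2nd FFT: Doppler
mag = np.abs(rd[:, :Ns // 2])
dop_bin, range_bin = np.unravel_index(mag.argmax(), mag.shape)
est_R = range_bin * c / (2 * B)             # range resolution c/(2B) per bin
print(round(est_R, 2))                      # → 3.0
```

The peak of the range-Doppler magnitude map lands in the range bin corresponding to the simulated 3 m target, which is the quantity the subsequent CFAR detector would threshold against the background noise.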
For angle estimation, both the 3D FFT and the multiple signal classification (MUSIC) algorithm [33] are popular methods. Compared with the MUSIC algorithm, the 3D FFT is computationally efficient, but its resolution is not sufficient for detecting closely spaced points. To obtain a better image of the human body at a low computing cost, based on the TDM-MIMO antenna arrays in Figure 3, after we obtained eight virtual receiver samples, a 3D FFT was used to obtain the azimuth angle θ_az, and the MUSIC algorithm was used to estimate the elevation angle θ_el. The output spherical coordinates were converted into Cartesian coordinates using a transfer matrix T, where R is the distance between the target and the radar. After the transformation, we obtained the position values [x, y, z] of each point in each frame in Cartesian coordinates.
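The spherical-to-Cartesian step can be sketched as below. The exact transfer matrix T depends on the paper's axis convention, which is not reproduced here; this sketch assumes y points away from the radar, x is the azimuth (left-right) direction, and z is the elevation (up-down) direction.

```python
import numpy as np

def to_cartesian(R, theta_az, theta_el):
    """Convert a (range, azimuth, elevation) detection to [x, y, z].
    Axis convention is an assumption, not the paper's stated matrix T."""
    x = R * np.cos(theta_el) * np.sin(theta_az)
    y = R * np.cos(theta_el) * np.cos(theta_az)
    z = R * np.sin(theta_el)
    return np.array([x, y, z])

# a point 2 m away, at 30 deg azimuth and 45 deg elevation
p = to_cartesian(2.0, np.deg2rad(30.0), np.deg2rad(45.0))
print(np.round(p, 3))
```

Whatever the convention, the transformation preserves range, so the norm of the resulting [x, y, z] vector equals R.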


Proposed Method
The point clouds detected from the range-Doppler map are sensitive to the environment. Figure 4 shows the different point distributions generated in different indoor environments, in which obstacles cause severe multipath interference and affect the performance of fall detection systems in new environments. Therefore, we propose a fall detection method based on a PCE model. A flowchart of the proposed method is presented in Figure 5. The flow of the proposed method consists of four steps: (1) to satisfy the input requirements of the PCE model, the number of points in a motion pattern is extended to a fixed number; (2) after the raw point clouds from the radar are pre-processed, the PCE model removes noise points and generates high-quality point clouds; (3) these point clouds are then fed into the feature extractor for human pose parameter extraction; (4) a lightweight classifier is used to classify normal and fall events. In this section, the functionality of each component is described in detail.


Pre-Processing
The motion window accumulates L frames of the point cloud from the radar, where L is determined by the length of an action such as walking, squatting, or falling. For each action pattern, we extracted the 3D vectors [x, y, z] for each point in every frame. In each frame, the number of radar points was random owing to the nature of the FMCW radar measurement. To satisfy the input requirements of the PCE model, zero padding was used for data oversampling. Thus, the number of points can be extended to a fixed number. We obtained the motion pattern, where l is the frame index of the motion, M is the number of points from the radar in each frame, and p_l^m is the mth point in the lth frame, which is a vector of [x_l^m, y_l^m, z_l^m]. Simultaneously, the Azure Kinect was used as a reference sensor because of its high precision in extracting human skeletal keypoints. Ignoring some inessential points, the desired 27 skeletal keypoints returned by the Azure Kinect were accompanied by a radar time stamp in system time and served to label the ground truth during the training process.
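The zero-padding step above can be sketched as follows: each frame carries a variable number of radar points, so frames are oversampled with zeros to a fixed M before the L-frame window is stacked into an (L, M, 3) motion pattern. The values of L and M here are illustrative, not the paper's configuration.

```python
import numpy as np

def build_motion_pattern(frames, M=64):
    """Zero-pad each (m_l, 3) frame of [x, y, z] points to M points and
    stack the L frames into one (L, M, 3) motion-pattern tensor."""
    pattern = np.zeros((len(frames), M, 3))
    for l, pts in enumerate(frames):
        m = min(len(pts), M)        # truncate if a frame exceeds M points
        pattern[l, :m] = pts[:m]
    return pattern

rng = np.random.default_rng(0)
# simulate 8 frames with between 3 and 29 detections each
frames = [rng.random((rng.integers(3, 30), 3)) for _ in range(8)]
pattern = build_motion_pattern(frames)
print(pattern.shape)   # → (8, 64, 3)
```

The fixed shape is what allows batches of motion windows to be fed to the PCE model regardless of how many detections each frame produced.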

PCE Model
Although point clouds have already been obtained from the range-Doppler maps, the resulting points are still low-resolution and very noisy, which affects classification accuracy and reliability. In addition, it is unnecessary for a fall detection system to incur the computational cost of reconstructing each point. Thus, to improve the quality of these point clouds and increase classification reliability, we propose a PCE model that aims to minimize the difference between the shapes of the radar point clouds and the ground truth. An overview of the proposed PCE architecture is shown in Figure 6.


Encoder
The encoder is responsible for extracting global features from incomplete point clouds. As shown in Figure 6, the encoder comprises three consecutive gated recurrent units (GRUs) [34] followed by a dense layer. The GRU is a relatively new time-series architecture compared with recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). GRUs can retain long-term dependencies and are also faster to train than traditional LSTMs because there are fewer gates to update. As shown in Figure 6, the units of the three GRU layers are 32, 32, and 16. The input data X_RP = {x_RP^1, x_RP^2, . . ., x_RP^N} are first reconstructed by the hidden parts, and the output of the encoder is a series of vectors {h_1, h_2, . . ., h_B}. The encoder process is parameterized by W_e,b and b_e,b, the encoder weight matrix and bias vector for the bth node (b = 1, 2, . . ., B).
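A minimal numpy GRU cell illustrates why GRUs are cheaper to train than LSTMs: only an update gate z and a reset gate r, with no separate cell state. The hidden size of 16 matches the last encoder layer; the weights below are random stand-ins, not learned parameters, and the gate equations follow one common GRU convention.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h, W, U, b):
    """One GRU step. W: (3H, d), U: (3H, H), b: (3H,), gates stacked [z, r, n]."""
    H = h.shape[0]
    zr = sigmoid(W[:2*H] @ x + U[:2*H] @ h + b[:2*H])
    z, r = zr[:H], zr[H:]                                    # update, reset gates
    n = np.tanh(W[2*H:] @ x + r * (U[2*H:] @ h) + b[2*H:])   # candidate state
    return (1 - z) * h + z * n                               # interpolated update

rng = np.random.default_rng(0)
d, H = 3, 16                        # input dim ([x, y, z] point), hidden units
W = rng.normal(scale=0.5, size=(3*H, d))
U = rng.normal(scale=0.5, size=(3*H, H))
b = np.zeros(3*H)

h = np.zeros(H)
for x in rng.normal(size=(10, d)):  # run a short point sequence through the cell
    h = gru_step(x, h, W, U, b)
print(h.shape)                      # → (16,)
```

Because the candidate state passes through tanh and the update is a convex interpolation, the hidden state stays bounded, which keeps training stable over long windows.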

Decoder
The decoder consists of three consecutive GRUs followed by a dense layer. The units of the three GRU layers are all 32. The output from the GRU is passed through the output layer parameterized by W_d and b_d, the weights and biases of the output layer, respectively. After the sampling layer, an enhanced point cloud X_EP is obtained.
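The decoder's output stage can be sketched as a shape check: a dense layer with weights W_d and biases b_d maps the final GRU state to 3M coordinates, which the sampling layer reshapes into the enhanced point cloud X_EP. The sizes and random weights below are illustrative, not the trained model's.

```python
import numpy as np

rng = np.random.default_rng(1)
H, M = 32, 64                         # decoder GRU units, points per frame
h_last = rng.normal(size=H)           # final hidden state from the GRU stack
W_d = rng.normal(scale=0.01, size=(3 * M, H))
b_d = np.zeros(3 * M)

# dense output layer, then reshape 3M values into M enhanced [x, y, z] points
X_EP = (W_d @ h_last + b_d).reshape(M, 3)
print(X_EP.shape)                     # → (64, 3)
```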

Loss Function
A simple method for extracting pointwise features in 3D space is supervised by the mean squared error (MSE). However, directly supervising the pointwise features of point clouds may not utilize 3D receptive field information. The observations in Figure 7 indicate that the change in the 3D receptive field bounding box of a falling human body is different from that of a walking human body. Specifically, changes in the bounding box of a human body are related to its pose. Therefore, we not only consider pointwise features but also detect fall events by characterizing the uniqueness of bounding box changes for such poses. To use the bounding box of the radar point clouds for fall detection, the position and shape of the predicted box should be closely related to the corresponding ground truth. To this end, we propose a 3D point-box hybrid regression loss to reduce the error between the predicted radar bounding box and the ground truth. Previous studies [35,36] proposed intersection over union (IoU) loss and generalized IoU functions for 2D boxes using only the width and height of the boxes, without considering the direction of the boxes. In addition, it is difficult to provide a specific formula describing the intersection between two 3D bounding boxes because a variety of cases must be considered. Some previous studies projected a 3D bounding box onto two 2D bounding boxes, but this did not increase the accuracy because of the lack of direction [37].
In this study, because most of the measured human motion states were symmetrical along the x-axis (as shown in Figure 8), a 3D bounding box IoU was obtained based on the 2D rotated IoU by multiplying by the length along the x-axis. As shown in Figure 9, the intersection of the predicted box and the ground truth can be a variety of polygons, whose area is the sum of the areas of the constituent triangles. Therefore, the 3D IoU can be expressed as 3DIoU = (Area_OL · x_OL) / (Area_r · x_r + Area_GT · x_GT − Area_OL · x_OL), where Area_OL is the area of overlap between the two boxes; Area_r and Area_GT are the areas of the predicted box from the radar and the ground truth, respectively; x_OL is the overlap on the x-axis; and x_r and x_GT are the lengths of the predicted box and ground truth along the x-axis, respectively.
Additionally, an accurate center [x_center, y_center, z_center] is a prerequisite for predicting an accurate bounding box. An accurate width along the y-axis and an accurate height along the z-axis also contribute to maximizing the fall detection accuracy. Based on the above observations, the box-based loss function is expressed in terms of b_r and b_GT, the center positions of the predicted box and ground truth box, respectively; l_e, w_e, and h_e, the length, width, and height of the enclosing box, respectively; l_r, w_r, and h_r, the length, width, and height of the predicted box, respectively; and l_GT, w_GT, and h_GT, the length, width, and height of the ground truth box, respectively. In summary, the 3D point-box hybrid regression loss consists of a 3D bounding box IoU loss and a pointwise loss, which can be described as Loss_HyLoss = Loss_point + Loss_box (11), where Loss_point is the position loss of the points, used for optimizing the IoU of the human body and the ground reflection points.
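The structure of Equation (11) can be sketched as a pointwise position term plus a box term. As a simplification, the sketch uses axis-aligned boxes fitted to the point sets and takes the box term as 1 − 3DIoU; both choices are stand-ins, not the authors' rotated-box formulation.

```python
import numpy as np

def box_of(points):
    """Axis-aligned bounding box of an (n, 3) point set: (min_xyz, max_xyz)."""
    return points.min(axis=0), points.max(axis=0)

def iou3d(a, b):
    """3D IoU of two axis-aligned boxes given as (min_xyz, max_xyz) pairs."""
    lo = np.maximum(a[0], b[0])
    hi = np.minimum(a[1], b[1])
    inter = np.prod(np.clip(hi - lo, 0.0, None))     # intersection volume
    vol = lambda box: np.prod(box[1] - box[0])
    return inter / (vol(a) + vol(b) - inter)

def hybrid_loss(pred_pts, gt_pts):
    """Sketch of Loss_HyLoss = Loss_point + Loss_box (simplified box term)."""
    loss_point = np.mean((pred_pts - gt_pts) ** 2)   # pointwise MSE term
    loss_box = 1.0 - iou3d(box_of(pred_pts), box_of(gt_pts))
    return loss_point + loss_box

rng = np.random.default_rng(2)
gt = rng.random((27, 3))                             # e.g., 27 skeletal keypoints
noisy = gt + rng.normal(scale=0.05, size=gt.shape)
print(hybrid_loss(gt, gt) == 0.0, hybrid_loss(noisy, gt) > 0.0)
```

A perfect prediction yields zero for both terms, while noise increases both the pointwise error and the box mismatch, which is the behavior the hybrid supervision relies on.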


Feature Extraction
We used a lightweight CNN for classification. The architecture of the lightweight CNN is shown in Figure 10. The CNN included an input layer, one convolution layer, one dense layer, and a softmax layer. The feature parameters from one frame were [x_center, y_center, z_center, w, l, h, θ], and L frames of 7 × 1 feature parameters were first fed into the convolution layer. The convolution layer captured the movement features and consisted of eight hidden neurons with a kernel size of eight. The convolution layer employed rectified linear units as activation functions for the hidden layers. The L × 7 × 8 output from the convolution layer was fed into the dense layer. The softmax function was used in the final dense layer for classification.
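The shape flow of this classifier can be sketched as below. To reproduce the stated L × 7 × 8 output, the sketch assumes the eight kernels of size eight slide along the time axis with same padding, applied per feature channel; that kernel layout is an assumption, as is the random weighting.

```python
import numpy as np

def conv_same(x, kernels):
    """Same-padded convolution along the time axis. x: (L, 7) feature sequence,
    kernels: (n_filters, k). Returns ReLU activations of shape (L, 7, n_filters)."""
    L, F = x.shape
    n, k = kernels.shape
    pad = np.pad(x, ((k // 2, k - 1 - k // 2), (0, 0)))
    out = np.zeros((L, F, n))
    for i in range(L):
        window = pad[i:i + k]                      # (k, F) slice of the time axis
        out[i] = np.einsum('kf,nk->fn', window, kernels)
    return np.maximum(out, 0)                      # rectified linear units

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(3)
L = 10
feats = rng.normal(size=(L, 7))        # [x_c, y_c, z_c, w, l, h, theta] per frame
conv_out = conv_same(feats, rng.normal(size=(8, 8)))   # (L, 7, 8)
W = rng.normal(scale=0.01, size=(2, conv_out.size))    # dense layer, 2 classes
probs = softmax(W @ conv_out.ravel())                  # fall vs. normal
print(conv_out.shape, round(float(probs.sum()), 6))
```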

Setup and Data Collection
We used the 24 GHz mmWave FMCW radar from ICLegend Micro for data collection. The radar sensor had two transmitting antennas and four receiving antenna channels. The radar parameter configurations are listed in Table 1.
To evaluate the proposed system, we set up the experiments and collected data in two different indoor environments. The experimental setup is shown in Figure 11. Room A, shown in Figure 11b, was a relatively empty office, and room B, shown in Figure 11c, was a bedroom where obstacles caused severe multipath and motion interference. The elevation field of view (FOV) of the Kinect sensor is 135 degrees, and the elevation FOV of the radar is 90 degrees. To capture more of the scene, the radar and Kinect sensors were both mounted at a height of 1.3 m. We collected 17,600 frames of data from 13 subjects performing fall and non-fall actions. The participants were aged between 20 and 30 years, and their heights were between 165 and 177 cm.

Furthermore, we divided these frames into three datasets:
Dataset0. This dataset included 10,860 frames from five participants who performed the experiments in room A. After data augmentation, including shifting, rotating, and adding noise points, the dataset included 13,560 frames.
Dataset1. This dataset included 3040 frames from four participants who performed the experiments in room A.
Dataset2. This dataset included 3700 frames from four participants who performed the experiments in room B.
For each dataset, the motions included falling backward, falling forward, sitting on a chair, jumping, walking, and squatting. The ratio of the number of fall samples to non-fall samples was 1:2.

Evaluation
In this section, we present the experimental results of the proposed model in a fall detection system. All algorithms were implemented in Python. The computation platform was a laptop with a 2.60 GHz Intel(R) Core(TM) i7-10750H CPU.


To evaluate the quality of the point clouds, we calculated the average 3D IoU for every L-frame scenario across the N test samples in the dataset according to the formula 3DIoU_i = (1/N) Σ_{j=1}^{N} 3DIoU_{ij}, ∀i ∈ {1, 2, …, 10}. The average 3D IoU results for all L frames from Dataset1 and Dataset2 are outlined in Tables 2 and 3, respectively. In the 10th frame, the average 3D IoU is lower than in the other frames. The reason is that the number of points was sometimes too small because the motion may have already been completed. We also compared against the raw point clouds from the radar and the DBSCAN method, which is the most popular method for removing outliers and improving the quality of point clouds. The proposed method outperformed both. In addition, the PCE with the proposed HyLoss function outperformed the traditional MSE loss function.
Furthermore, we computed the centroids of the coordinates of the point clouds. For accuracy, the centroids of the predicted point clouds were compared with the centroids of the ground truth labels by calculating the mean absolute error (MAE) in the x-, y-, and z-coordinates; the centroid was taken as the mean of the point coordinates. The average MAEs for all L frames tested on Dataset1 and Dataset2 are shown in Figure 13. We also compared the raw point clouds from the radar, DBSCAN, and the PCE model using the traditional MSE loss function instead of the proposed HyLoss. The average MAE of the proposed method was lower than that of the other methods; in other words, the localization of the centroid from the PCE model was the closest to the ground truth. In addition, we computed the average MAE of the width, depth, and height of the bounding box along the x-, y-, and z-axes. As shown in Figure 14, the errors in the width, depth, and height of the predicted bounding box were significantly lower than those of the other methods for both Dataset1 and Dataset2. This result indicates that the PCE model with the proposed HyLoss offers a better image of the human body for a wide range of people and environments.
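A minimal sketch of the two evaluation measures above, assuming per-sample IoU values have already been computed:

```python
import numpy as np

def average_3d_iou(per_sample_ious):
    """The reported score for frame-count setting i:
    avg_i = (1/N) * sum_{j=1..N} IoU_ij over the N test samples."""
    return float(np.mean(per_sample_ious))

def centroid_mae(pred_clouds, gt_clouds):
    """Per-axis MAE between predicted and ground-truth cloud centroids;
    the centroid of a cloud is the mean of its point coordinates."""
    errs = [np.abs(np.mean(p, axis=0) - np.mean(g, axis=0))
            for p, g in zip(pred_clouds, gt_clouds)]
    return np.mean(errs, axis=0)  # [MAE_x, MAE_y, MAE_z]

score = average_3d_iou([0.8, 0.9, 0.7])
clouds = [np.ones((5, 3))]
err = centroid_mae(clouds, clouds)  # identical clouds -> zero error
```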

Classification for Fall Detection
In terms of fall detection, the predicted high-quality point clouds of Dataset0, Dataset1, and Dataset2 were fed into the feature extractor to obtain the human pose parameters. The lightweight CNN shown in Figure 10 was used to classify normal and fall events. Dataset0 was used to train the lightweight CNN classifier, Dataset1 was used for validation with new people, and Dataset2 was used for validation with new people and new environments. Neither Dataset1 nor Dataset2 was involved in the training process. To validate the choice of PCE architecture parameters, we compared the average accuracy, recall, and F1 score of the PCE model with various parameters. The results are summarized in Table 4. Every encoder is denoted by E, and every decoder is denoted by D; for example, the proposed PCE model can be described as raw+32E+32E+16E+32D. Clearly, the proposed architecture is suitable. In addition, a comparison of the PCE with the traditional MSE loss function revealed that the accuracy, recall, and F1 score of the proposed HyLoss function outperformed those of the traditional function, which shows that the proposed HyLoss function improves the fall detection system.
The purpose of PCE is to improve the reliability of the classification during fall detection tasks. Therefore, we compared the classification reliability of the PCE (instead of position errors or IoU) with existing point cloud models. The predicted points are shown in Figure 12, and the results are listed in Table 5. The accuracy of the raw points and the DBSCAN method for Dataset0 was higher than 0.98 because the test data included some people from the training set. The accuracy of the raw points (baseline) and the DBSCAN method for Dataset1 and Dataset2 was lower than 0.8 because the raw radar data contained many invalid points from the new test data, and DBSCAN could not remove a sufficient number of outlier points. However, the performance of the proposed method exhibited no obvious changes across the three datasets. This means that the point clouds predicted by the PCE can improve classification performance when new people and environments are involved. In terms of computational complexity, the proposed method had 0.017 million parameters, far fewer than the other methods. The floating-point operations (FLOPs) associated with the proposed method amounted to 0.303 million, which was also lower than that of the other methods. In addition, the fall detection system achieved high accuracy and required less response time. We ran all the competing methods on the same platform to measure their response times; because all of them required the same computing time for feature extraction and classification, we only measured the computing time of PCE on a single sample at a time. Although the accuracy of the competing methods was not favorable for new people and environments, their response times were also longer than that of the proposed method. In summary, our work balances accuracy and speed.
However, fall detection based on radar currently uses different data formats, such as time-frequency maps and range-Doppler maps, and there is no public fall detection dataset for radar. It is therefore difficult to find a baseline for a fall detection system, so we compared the reported performance with that of other studies. As shown in Table 6, the proposed system achieves better performance for new people and environments. Although the performances reported in [22,40] were above 0.9, they were tested only on the same people and in the same environment; their performance may vary for new people and environments, similar to the results of the DBSCAN method shown in Table 5. In addition, although CW radar has a lower cost, the accuracy of the proposed FMCW fall detection system was 0.989, higher than that of the CW radar systems [41,42]. Moreover, even though the 77 GHz 3T4R radar has a higher resolution than our sensor, the proposed fall detection system still outperformed it [22]. To investigate the potential of our fall detection system for real-time implementation, we evaluated the computational time of one sample for all the steps in our work. As shown in Table 7, although the computing time for the PCE and classification was 20.349 ms in total, the signal processing cost a significant amount of time because, in this study, we used the super-resolution algorithm MUSIC for DOA estimation, which is the bottleneck of real-time fall detection systems. Further research is required to design a system with lower computing costs for real-time detection and mobile devices. Furthermore, it is difficult to collect samples of real falls among older adults, and the limited sample size may introduce biases. In the future, we will extend our experiments to larger-scale tests in more practical environments and with more subjects.
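The reported accuracy, recall, and F1 score follow directly from the confusion counts of the binary fall / non-fall classifier; a minimal sketch with illustrative counts (not the paper's experimental numbers):

```python
def fall_metrics(tp, fp, tn, fn):
    """Accuracy, recall, and F1 for binary fall / non-fall classification,
    the three scores reported for each dataset and architecture."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)           # sensitivity to fall events
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, f1

# Illustrative confusion counts reflecting the 1:2 fall/non-fall ratio.
acc, rec, f1 = fall_metrics(tp=95, fp=5, tn=195, fn=5)
```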

Conclusions
This study demonstrated a fall detection system based on the 3D point clouds of a 24 GHz FMCW radar. Unlike conventional methods that utilize more costly hardware to improve detection accuracy, we used a low-cost small radar antenna array operating at 24 GHz to maintain high accuracy. By applying the PCE model to a fall detection system, we improved the quality of the point clouds and thus the system performance, especially the accuracy, sensitivity, and generalization ability for new users and environments. As a result of our efforts to reduce the computing costs of signal processing, the proposed method has the potential for widespread application in monitoring the health of the elderly in indoor environments while preserving privacy.

Figure 1 .
Figure 1. Proposed fall detection system based on the PCE model.


Figure 4 .
Figure 4. Raw point clouds in different environments. (a) Raw point clouds from radar collected in a relatively empty room; the number of points is 85. (b) Raw point clouds collected in a real bedroom; the number of points is 125.


Therefore, we propose a fall detection method based on a PCE model. A flowchart of the proposed method is presented in Figure 5. The flow of the proposed method consists of four steps: (1) to satisfy the input of the PCE model, the number of points in a motion pattern is extended to a fixed number; (2) after pre-processing the raw point clouds from the radar, the PCE model removes noise points and generates high-quality point clouds; (3) these point clouds are then fed into the feature extractor for human pose parameter extraction; and (4) the extracted features are classified as fall or non-fall events.

Figure 5 .
Figure 5. Flowchart of the proposed system.

W_{d,b} and b_{d,b} are the decoder weight matrix and bias vector for the b-th node (b = 1, 2, …, B), and u_i is the input of the decoder's dense layer. The recovered point cloud X through the dense layer is obtained via X = W_d u_i + b_d.
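A minimal numerical sketch of the decoder's dense output layer above; the hidden width B and the flattened output size are assumptions for illustration:

```python
import numpy as np

B = 32          # assumed number of decoder nodes (hidden width)
M_OUT = 64 * 3  # assumed flattened output: 64 points x 3 coordinates

rng = np.random.default_rng(3)
W_d = rng.normal(0.0, 0.1, (M_OUT, B))  # decoder weight matrix
b_d = np.zeros(M_OUT)                   # decoder bias vector
u_i = rng.normal(size=B)                # input to the dense layer

# Recovered point cloud: X = W_d u_i + b_d, reshaped to (points, 3).
X = (W_d @ u_i + b_d).reshape(-1, 3)
```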

Figure 7 .
Figure 7. (a) Pose based on the 3D bounding box of a standing human body; (b) change in the 3D bounding box of a falling human body; (c) change in the 3D bounding box of a walking human body.


Figure 8 .
Figure 8. Example of a polygon formed by the overlap of the predicted box and ground truth. The 3D IoU can be approximately achieved by multiplying the length along the x-axis with the directed 2D IoU.
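The approximation in the caption can be sketched for axis-aligned boxes as follows; the paper's directed 2D IoU additionally handles rotated footprints, which this simplified version does not:

```python
def approx_3d_iou(p, g):
    """Sketch of the Figure 8 idea: the intersection volume is the overlap
    length along the x-axis times the 2D intersection area in the remaining
    plane. Boxes are axis-aligned [xmin, ymin, zmin, xmax, ymax, zmax]."""
    x_len = max(0.0, min(p[3], g[3]) - max(p[0], g[0]))
    y_len = max(0.0, min(p[4], g[4]) - max(p[1], g[1]))
    z_len = max(0.0, min(p[5], g[5]) - max(p[2], g[2]))
    inter = x_len * y_len * z_len  # x-overlap times 2D intersection area
    vol = lambda b: (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol(p) + vol(g) - inter)

# Two 2x2x2 boxes offset by 1 m along x overlap in half their x-extent.
iou = approx_3d_iou([0, 0, 0, 2, 2, 2], [1, 0, 0, 3, 2, 2])
```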


Figure 9 .
Figure 9. (a-i) are examples of the directed 2D IoU with different numbers of intersection points. The intersection area is obtained by summing the triangular areas.
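The triangle-summing step in the caption can be sketched as follows for an ordered, convex intersection polygon (the intersection of two boxes is always convex):

```python
def polygon_area(vertices):
    """Area of a convex intersection polygon, obtained by summing the areas
    of triangles fanned out from the first vertex. `vertices` must be
    ordered around the polygon boundary."""
    total = 0.0
    x0, y0 = vertices[0]
    for (x1, y1), (x2, y2) in zip(vertices[1:-1], vertices[2:]):
        # Triangle (v0, v1, v2) area via half the cross-product magnitude.
        total += 0.5 * abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))
    return total

# A unit square decomposes into two triangles of area 0.5 each.
area = polygon_area([(0, 0), (1, 0), (1, 1), (0, 1)])
```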


Figure 10 .
Figure 10. Architecture of the lightweight CNN.


Figure 12 .
Figure 12. Comparison between raw point clouds, the ground truth, and the PCE model's predicted points in two frames (1st and 10th) for five different actions (walking, squatting, sitting, jumping, and falling).



Figure 13 .
Figure 13. Localization error of the centroid of the point cloud compared to the baseline.


Figure 14 .
Figure 14. MAE of the width, depth, and height of the bounding box from the point cloud compared to the baseline.


Table 1 .
Unit parameter configuration of the radar.

Table 2 .
Average 3D IoU of Dataset1 in each frame.


Table 3 .
Average 3D IoU of Dataset2 in each frame.

Table 4 .
Classification result of the proposed PCE model using different architectures.

Table 5 .
Classification performance on the same datasets using current point cloud generation methods.

Table 6 .
Comparison between the proposed method and other fall detection methods that use radar.

Table 7 .
Response time for each step in the system.