Sensors
  • Review
  • Open Access

11 November 2022

Skeleton-Based Human Pose Recognition Using Channel State Information: A Survey

College of Electronic and Information Engineering, Shandong University of Science and Technology, Qingdao 266590, China
* Author to whom correspondence should be addressed.
This article belongs to the Section Internet of Things

Abstract

With the increasing demand for human–computer interaction and health monitoring, human behavior recognition with device-free patterns has attracted extensive attention. The fluctuations of the Wi-Fi signal caused by human actions in a Wi-Fi coverage area can be used to precisely identify the human skeleton and pose, which effectively overcomes the problems of traditional solutions. Although many promising results have been achieved, no survey summarizes the research progress. This paper aims to comprehensively investigate and analyze the latest applications of human behavior recognition based on channel state information (CSI) and the human skeleton. First, we review the progress of human profile perception and skeleton recognition based on wireless perception technologies. Second, we summarize the general framework of precise pose recognition, including signal preprocessing methods, neural network models, and performance results. Then, we classify skeleton model generation methods into three categories and emphasize the crucial differences among these typical applications. Furthermore, we discuss two aspects: experimental scenarios and recognition targets. Finally, we conclude the paper by summarizing the issues in typical systems and the main research directions for the future.

1. Introduction

With the rapid development of information technology and the increase in people’s health requirements, human behavior recognition has attracted widespread attention [1]. Human behavior recognition is widely used in smart medical care, smart homes, virtual reality, and motion analysis and has strong research importance [2,3,4,5].
Human behavior recognition is divided into device-based [6,7,8] and device-free [9,10,11,12] methods, according to whether the person wears or carries a sensor. Device-based approaches, although generally accurate, are inconvenient in many important real-life scenarios, e.g., requiring the elderly or patients to carry the device at all times. Device-free methods make up for these deficiencies. For device-free human sensing, a wide range of sensing technologies is available; however, some sensors, such as microphones and cameras, raise privacy issues. Radio-frequency (RF) signals offer unique advantages compared with these sensors because they are generally ubiquitous and can protect privacy [13]. In addition, RF signals can be utilized in poor light conditions and can traverse obstacles such as walls.
In the field of noncontact and non-line-of-sight (NLoS) perception [14], analogously to data collected by sensors, many wireless signals can be used to identify human behavior [15]. Existing RF-based sensing systems use either commodity Wi-Fi devices or dedicated RF devices [16]. For example, radar sensing usually employs specialized devices and can be used to extract the target's location, shape, and motion characteristics [17,18,19,20]. Sensing systems using commercial Wi-Fi hardware can take advantage of existing Wi-Fi devices to achieve ubiquitous sensing [21]. Early Wi-Fi-based systems used the received signal strength (RSS) for coarse-grained sensing [22,23]. However, RSS varies randomly to some extent and does not change regularly with the circumstances. Due to the frequency diversity [24], signal stability, and satisfactory accuracy of CSI, it is broadly used in human behavior recognition [25,26,27,28,29,30,31,32,33].
With the improvement of wireless sensing algorithms and the application of artificial intelligence [34], a fundamental question arises: can wireless signals provide the same functionality as cameras for fine-grained human perception, such as human pose recognition? Human pose recognition (HPR) can be understood as the position estimation of the human body's pose (joints or key points, such as the head, left hand, and right foot) and has made remarkable achievements in computer vision (CV) [35,36,37]. However, CV methods may cause privacy problems. To address this issue, some researchers apply optical signals to realize human 3D skeleton recognition and achieve satisfactory results [38,39]. In addition, several radio-frequency (RF) sensing methods have been proposed for human pose recognition, such as Wi-Fi [40,41,42,43] and frequency-modulated continuous-wave (FMCW) radar [44,45,46,47]. Wireless sensing schemes estimate human joints using a confidence map to overcome the technical challenges of traditional CV-based human pose recognition solutions (i.e., occlusion, poor light, clothing, and privacy issues). The map is constructed from RF signals and demonstrates the potential to enable a new generation of applications. In Section 2, we briefly introduce the application of FMCW radar and Wi-Fi in human pose recognition before comparing their advantages and disadvantages, which provides some useful research directions for new researchers.
However, in these schemes, radar sensing is not suitable for large-scale deployment due to the high infrastructure cost. Therefore, we are interested in sensing systems utilizing commodity Wi-Fi devices. In a wireless network, the multipath effect is generated during the propagation of wireless signals. Meanwhile, the CSI value changes according to the positions of the human body. Hence, pose recognition can be realized according to the changes in the CSI amplitude and phase. In this work, we mainly discuss the Wi-Fi human pose recognition systems leveraging CSI. Many important works of Wi-Fi CSI human pose recognition have been published in recent years [33,48,49,50,51,52,53,54,55]. In general, a person’s pose is often described by a skeleton [56]. These articles investigate typical application scenarios using CSI to build human skeletons, e.g., single person, multiperson, and 2D and 3D human pose recognition.
There are some surveys on specific types of Wi-Fi CSI sensing. For example, the survey in [57] has the widest application range, covering human detection, motion detection, respiratory monitoring, etc. Wang et al. [58] present a survey of CSI-based behavior recognition applications spanning six years, which covers the longest time span. In [59], the authors classify CSI-based applications into localization, macro activity recognition, and micro activity recognition. However, there is no comprehensive survey summarizing the crucial techniques based on the skeleton and CSI. This survey differs from existing ones in its specific research scope and more recent research content. Our main contributions are summarized as follows.
  • The first review. We investigate behavior recognition methods that are based on Wi-Fi signals and human skeletons. To the best of our knowledge, this is the first review of human pose recognition using CSI and human skeletons. This paper serves as a practical guide for understanding real-life applications using CSI to identify human skeletons.
  • Comprehensive investigation. We present a general framework of precise pose recognition and comprehensively analyze the system components, which include data processing methods, neural network models, and performance results.
  • Typical skeleton generation model. We classify skeleton model generation methods into three categories: the human silhouette, key point coordinate, and point cloud. We emphasize the crucial difference among these typical applications, as well as the advantages and disadvantages of these models.
  • Discussion and future trends. We extensively discuss related factors that affect pose recognition from six aspects. In particular, we discuss the techniques that may be transferred from computer vision to CSI pose recognition.
The remainder of this work consists of the following parts, as shown in Figure 1. We make a detailed investigation of Wi-Fi CSI skeleton recognition in two parts. The first part contains related work (Section 2), skeleton recognition (Section 3), a system overview (Section 4), and typical systems (Section 5). Related work on human pose recognition using RF is discussed in Section 2. Section 3 describes the fundamental knowledge of CSI skeleton recognition. Section 4 presents the process of generating the human skeleton from CSI data in detail, which includes signal collection, signal preprocessing, and pose recognition methods. Section 5 summarizes the typical applications that are based on CSI pose recognition. The second part presents a discussion (Section 6) and future trends (Section 7) of CSI skeleton recognition. Section 6 discusses two aspects: application scenarios and recognition users. Furthermore, the techniques transferred from CV and the future application scenarios of the CSI skeleton are presented in Section 7. Finally, we conclude our survey in Section 8.
Figure 1. Taxonomy of this survey.

3. Fundamental Knowledge of CSI Skeleton Recognition

Wi-Fi is a wireless local area network (WLAN) technology that is based on IEEE 802.11 standard protocols [78]. In contrast to Bluetooth and infrared sensors, Wi-Fi devices have a wide range of coverage. In the following section, we introduce the basic concepts of CSI and summarize the typical neural network models and the performance evaluation indices in CSI skeleton recognition.

3.1. Introduction to CSI

CSI, a typical radio-frequency signal, can be calculated using Wi-Fi technology [24]. Two technologies in the 802.11 protocol, orthogonal frequency-division multiplexing (OFDM) and multiple-input multiple-output (MIMO), underlie the transmission process of wireless signals. OFDM means that the signal is transmitted across multiple orthogonal subcarriers at various subcarrier frequencies. MIMO refers to the signal being transmitted with multiple antennas. The CSI can be extracted from the signals of a commodity Wi-Fi device that is equipped with a network interface card (NIC) [79]. It represents the channel properties of the communication link between the transmitter and the receiver. The signal received by the receiver can be expressed as follows:
$$ y_i = H_i x_i + \eta_i \quad (1) $$
where $i$ is the subcarrier index; $x_i \in \mathbb{C}^{n_T}$ is the transmitted signal; $y_i \in \mathbb{C}^{n_R}$ is the received signal; $n_T$ and $n_R$ are the numbers of transmitter antennas and receiver antennas, respectively; $\eta_i$ is the noise vector; and $H_i$ denotes the CSI matrix of subcarrier $i$:
$$ H_i = \begin{bmatrix} h_i^{11} & h_i^{12} & \cdots & h_i^{1 n_T} \\ h_i^{21} & h_i^{22} & \cdots & h_i^{2 n_T} \\ \vdots & \vdots & \ddots & \vdots \\ h_i^{n_R 1} & h_i^{n_R 2} & \cdots & h_i^{n_R n_T} \end{bmatrix} \quad (2) $$
where $h_i^{mn}$ is the CSI of subcarrier $i$ for the link between receiver antenna $m$ and sender antenna $n$; each $h_i^{mn}$ is a complex value, which can be represented as:
$$ h_i^{mn} = |h_i^{mn}| \, e^{j \angle h_i^{mn}} \quad (3) $$
where $|h_i^{mn}|$ and $\angle h_i^{mn}$ denote the amplitude and phase, respectively. The CSI in (2) captures the status of the environment. The NIC collects the CSI data on each subcarrier. Common CSI measurement tools are the Intel 5300 NIC [80], Atheros 9580 NICs [81], the Broadcom chipset [82], the ESP32 microcontroller [83], and the SDR [84].
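As a concrete illustration, the amplitude and phase are simply the modulus and argument of each complex CSI entry. The following minimal NumPy sketch (the array shape and values are hypothetical, not from any specific NIC) shows the decomposition and its inverse:

```python
import numpy as np

# Hypothetical CSI batch: 30 subcarriers x 3 RX antennas x 2 TX antennas,
# each entry a complex channel coefficient h_i^{mn}.
rng = np.random.default_rng(0)
csi = rng.standard_normal((30, 3, 2)) + 1j * rng.standard_normal((30, 3, 2))

amplitude = np.abs(csi)   # |h_i^{mn}|
phase = np.angle(csi)     # angle of h_i^{mn}, in radians

# The complex value can be recovered from the two parts:
reconstructed = amplitude * np.exp(1j * phase)
assert np.allclose(reconstructed, csi)
```

Most CSI-based pose systems operate on exactly these two derived arrays rather than on the raw complex values.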

3.2. Basic Description of the Skeleton

Skeleton-based action recognition has been widely used in computer vision due to its fast execution speed and strong ability to handle large datasets. Researchers realize human action recognition by detecting changes in the positions of joints, usually using techniques such as Kinect depth sensors (Xbox 360 Kinect, Kinect V2) [51,76,77,85], OpenPose [86], and AlphaPose [87] to extract bone trajectories or poses.
OpenPose and AlphaPose are the most commonly used algorithms for generating human skeletons from images or video frames. OpenPose, a bottom-up approach, finds key points of the human body and connects them to form the human skeleton [33,48,53,54,88]. In addition, it can estimate the body and foot joints with a confidence score for each joint, e.g., 15 (OpenPose-MPI), 18 (OpenPose-COCO), or 25 (OpenPose Body-25) keypoints, as shown in Figure 10c. In contrast, AlphaPose adopts the regional multiperson pose estimation (RMPE) framework to improve pose estimation performance [49,52]. AlphaPose can estimate 17 (AlphaPose-COCO) or 26 (Halpe) key points of human joints. In CSI-based pose recognition, a suitable method is usually chosen to extract the skeleton as the ground truth.
Figure 10. Human skeleton models generated by different techniques. (a): 20 joints (Kinect for Xbox 360), (b): 25 joints (Kinect V2), and (c): 25 joints (OpenPose Body-25) [89].

3.3. Typical Neural Network Models

Currently, the deep learning model [90,91,92] is a widely recognized solution for extracting complex features from signals. It can automatically compose features at lower levels into features at higher levels to improve the accuracy of the prediction results. Since CSI does not contain direct information about human poses, human posture cannot be manually annotated in pose recognition based on Wi-Fi devices [33]. Therefore, we need to design a neural network model for extracting temporal and spatial information from Wi-Fi signals. Commonly used models are convolutional neural networks and recurrent neural networks.
Convolutional Neural Networks (CNNs): The convolution operation aims at reducing the dimensions of features in the calculation operation. It is usually used to extract the time and space features of Wi-Fi CSI and transform them into graphs. It is also broadly used in pose recognition [33,51,52,53,54,88]. Residual neural networks (ResNets) [93] are applied to image recognition tasks by introducing the concept of residual learning to CNNs. For example, ResNet [49,77] is used to extract useful data. CNNs play an important role in 3D pose recognition due to their satisfactory performances in high-dimensional data processing.
Recurrent Neural Networks (RNNs): RNNs use the time correlation among neurons to retain previous information in the current state. However, RNNs may suffer from the vanishing gradient problem. Long short-term memory (LSTM) [55] solves this problem by using gated memory units to learn the characteristics of long sequences. Compared with direct convolution, LSTM-based training is more effective and performs best in many-to-many sequence mapping.
Hybrid models [55,94] combine features from two or more primary deep neural networks and can help overcome problems that require extensive feature extraction. For example, a CNN + LSTM neural network is used for feature extraction [51] to extract deeper CSI information. With this approach, the extracted information is more comprehensive, and the generated skeleton is smoother. In the future, the design of new neural network models, or combinations of neural networks, for extracting complete and effective information will require further research. Furthermore, designing neural network models in different software development environments also has great development potential. For example, Virtual Reality (VR) platforms [95] can not only be used as learning tools but can also host deep learning applications for human activity recognition.

3.4. Performance Evaluation Indicators

Pose construction requires the establishment of pose recognition models to generate human skeletons. In this section, we introduce the performance evaluation indicators of common pose recognition models, such as the percentage of correct keypoints (PCK), the mean per joint position error (MPJPE), the percentage of correct skeleton (PCS), and the average precision (AP). The mathematical formulas of PCK, PCS, AP, and MPJPE are presented as (4)–(7). Table 1 summarizes the applications of several evaluation metrics.
Table 1. Common Evaluation Metrics.
Percentage of Correct Keypoints (PCK): The PCK index measures the accuracy of body joint positioning. If the distance between the predicted joint and the real joint is within a specified threshold, the detected joint is considered to be correct. In PCK@a, the threshold a is specified as a percentage of the torso length or the head length. The higher the PCK value is, the better the performance of the system.
Average Precision (AP): AP is used to evaluate multiperson pose recognition. In the AP measure, the predicted joint is regarded as a true positive if it falls within the threshold of the ground-truth joint location. Moreover, for multiperson pose evaluation, all predicted poses are assigned to the ground-truth poses one-by-one based on the PCKh score order.
Percentage of Correct Skeletons (PCS): PCS refers to the percentage of skeletons whose Euclidean distance from the ground truth is below a specified threshold θ. θ < 25 indicates that human poses are accurate and complete, the position of the human body is correct in the predicted image, and the predicted image looks the same as the ground truth. θ < 30 means that the posture of the person is accurate. θ < 40 indicates that the position of the person is correct, but the posture is not accurate. θ < 50 means that, as the number of limbs in the predicted image increases, the person's posture becomes incomplete, blurred, and inaccurate. PCS◦30 and PCS◦50 indicate strict matching and loose matching of the human body posture, respectively.
Mean Per Joint Position Error (P-MPJPE): P-MPJPE is the average value of the L2 distance between the predicted joint point and the corresponding ground-truth joint point. The smaller the index is, the higher the performance of the 3D human pose recognition method. However, these metrics treat each joint of the body independently and may not be able to assess the overall structure of the pose. In the future, further research is needed to propose new evaluation indicators or to combine them with previously established evaluation indicators in order to improve evaluation performance.
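To make the two most common per-joint metrics concrete, here is a minimal NumPy sketch of PCK and MPJPE (the joint coordinates and threshold are toy values, not from any cited system):

```python
import numpy as np

def pck(pred, gt, threshold):
    """Percentage of Correct Keypoints: fraction of joints whose
    Euclidean distance to the ground truth is within `threshold`."""
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float(np.mean(dist <= threshold))

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: average L2 distance per joint."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

gt = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])    # 3 toy joints
pred = np.array([[0.1, 0.0], [1.0, 1.4], [2.0, 0.0]])  # errors 0.1, 0.4, 0.0

print(pck(pred, gt, threshold=0.2))   # 2 of 3 joints within the threshold
print(mpjpe(pred, gt))                # mean of the three joint errors
```

Note that, as the text observes, both metrics score each joint independently and say nothing about the overall plausibility of the pose.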

4. Pose Recognition Procedure

To reconstruct a human skeleton using Wi-Fi devices, we need to collect CSI signals for data preprocessing, send the processed data into the model for training, and generate the human skeleton. This section introduces the process of pose estimation in detail using Wi-Fi CSI in four parts: signal collection, signal preprocessing, pose recognition methods, and system performance evaluation. The overall process of the system is illustrated in Figure 11. The complete system consists of the upper and lower parts. The upper pipe provides the ground truth for training supervision, and the lower pipe designs a neural network for extracting the time and space information of the CSI and predicting the posture of the human body.
Figure 11. System overview of CSI pose construction.

4.1. Signal Collection

According to the Fresnel zone model [96], collecting pose information in the whole space requires at least two pairs of transceivers to accurately capture the human pose. Human pose information and human position information are contained mainly in the CSI amplitude and phase, respectively. To better understand the signal collection process, we survey the signal collection procedures in current HPR articles based on Wi-Fi CSI. As presented in Table 2, we summarize them based on four aspects: experimental devices, signals and preprocessing methods, the scene, and the amount of data.
Table 2. Device, Scene, and Data for Signal Collection.

4.2. Signal Preprocessing

In signal preprocessing, the raw CSI data measured by commodity Wi-Fi devices cannot be used directly for human pose recognition due to environmental noise, unpredictable interference, and outliers. The signal preprocessing method plays a key role in extracting human posture information. We summarize the general methods of signal preprocessing from two aspects: CSI link selection and denoising, and the synchronization of video frames with CSI samples.

4.2.1. CSI Signal Characteristics Selection and Denoising

Amplitude: In CSI-based human pose recognition, it is essential to extract the information related to the human pose from the amplitude of the CSI signal. The larger the action is, the more obvious the amplitude change, and the higher the reconstruction accuracy of the human skeleton. The raw amplitude information extracted from CSI cannot be used directly. Currently, several amplitude denoising methods are available, such as principal component analysis (PCA), discrete wavelet transform (DWT), and Hampel filtering. PCA improves processing efficiency by reducing the dimensionality of the dataset, which can remove noise and data redundancy [33]. DWT can remove noise while extracting useful edge information [33,52,53]. Hampel filtering removes outliers caused by sudden changes in the device and the environment. It determines outliers through a moving average window and replaces them with the average of the data [33,55,88].
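As a sketch of the outlier-removal step, here is a minimal Hampel-style filter in NumPy. The window size and threshold are illustrative choices, and this variant replaces outliers with the local median (a common formulation; some systems replace with a local average instead):

```python
import numpy as np

def hampel(x, window=5, n_sigma=3.0):
    """Replace outliers with the local median. A sample is an outlier if
    it deviates from the window median by more than n_sigma scaled MADs."""
    x = np.asarray(x, dtype=float).copy()
    k = 1.4826  # relates the MAD to the std of a Gaussian
    half = window // 2
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        med = np.median(x[lo:hi])
        mad = np.median(np.abs(x[lo:hi] - med))
        if mad > 0 and abs(x[i] - med) > n_sigma * k * mad:
            x[i] = med
    return x

# Toy amplitude stream with one hardware-induced spike at index 3
amp = np.array([1.0, 1.1, 0.9, 9.0, 1.0, 1.05, 0.95])
filtered = hampel(amp)
print(filtered)  # the spike is replaced by the local median
```

In a real pipeline this would typically run per subcarrier before PCA or DWT.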
Phase: Phase is more sensitive to subtle changes in motion than amplitude characteristics. It is necessary to make full use of the amplitude and phase information of CSI for the robustness of the pose recognition system. However, consecutive CSI measurements can result in different time-varying phase offsets in data collection. Some articles [51,52] often adopt conjugate multiplication (CM) to eliminate the phase offset between two CSI antennas. In addition, Winect [77] used the blind source separation (BSS) [98] method to separate the phase changes of the signal and extracted the pose features.
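The offset-cancellation idea behind conjugate multiplication can be sketched as follows, assuming a synthetic unit-modulus phase offset shared by two antennas of the same receiver (real offsets arise from sampling-time and frequency misalignment, but the cancellation works the same way):

```python
import numpy as np

rng = np.random.default_rng(1)
n_sub = 30
# Hypothetical true channels seen by two RX antennas of one NIC
h1 = rng.standard_normal(n_sub) + 1j * rng.standard_normal(n_sub)
h2 = rng.standard_normal(n_sub) + 1j * rng.standard_normal(n_sub)

# Both antennas share the same random time-varying phase offset per packet
offset = np.exp(1j * rng.uniform(0, 2 * np.pi))
m1, m2 = h1 * offset, h2 * offset   # measured, offset-corrupted CSI

# Conjugate multiplication across the two antennas cancels the offset,
# because offset * conj(offset) = |offset|^2 = 1:
cm = m1 * np.conj(m2)
assert np.allclose(cm, h1 * np.conj(h2))
```

The product retains the relative phase between the two antennas, which is what carries the motion information.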
Angle of arrival (AoA): The 2D AoA spectrum contains more spatial information about the human body. The 2D AoA of the reflected signal can be used to infer the number of limbs, enabling human pose reconstruction independent of the environment. However, in contrast to amplitude and phase, the 2D AoA derived from a single Wi-Fi device can only contain a small fraction of body motion. Therefore, multiple pairs of devices are required to combine the 2D AoA spectrum contained in multiple CSI packets [77,97]. In the process of inferring human pose from a 2D AoA spectrum, the phase variation of the CSI signal is typically separated first, and then the CSI amplitude is applied through a Fast Fourier Transform (FFT).
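As a simplified illustration of AoA estimation (1D, rather than the 2D azimuth–elevation spectra these systems compute), the following NumPy sketch evaluates a conventional beamforming spectrum for a hypothetical half-wavelength linear array; the carrier frequency and array size are illustrative assumptions:

```python
import numpy as np

c = 3e8
f = 5.18e9                       # assumed 5 GHz-band Wi-Fi carrier
lam = c / f
d = lam / 2                      # half-wavelength element spacing
n_ant = 8
true_angle = np.deg2rad(30)      # signal arriving from 30 degrees

# Array response (steering vector) of the incoming signal, noise-free
k = np.arange(n_ant)
x = np.exp(-1j * 2 * np.pi * d / lam * k * np.sin(true_angle))

# Conventional (Bartlett) beamforming spectrum over candidate angles
angles = np.deg2rad(np.arange(-90, 91))
steer = np.exp(-1j * 2 * np.pi * d / lam
               * np.outer(np.sin(angles), k))   # (181, n_ant)
spectrum = np.abs(steer.conj() @ x) ** 2

est_deg = float(np.rad2deg(angles[np.argmax(spectrum)]))
print(est_deg)   # the spectrum peaks at the true arrival angle
```

A 2D spectrum extends this by scanning azimuth and elevation jointly over a planar (or L-shaped) array geometry.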
In conclusion, the amplitude, phase, and AoA spectrum of CSI contain different degrees of human motion information, and the accuracy of reconstructing human skeletons by selecting a single feature or combining these three features may vary. Developing new feature extraction methods to obtain more pose information is an important direction for future work.

4.2.2. Synchronization of Video Frames with CSI Samples

Since the pose features contained in CSI signals are relatively weak, we first select the antennas with larger dynamic responses to obtain complete pose information. Papers [33,52,53,97] choose the antenna with the largest variance value as the reference.
Moreover, since the transceiver time is not synchronized, data may be lost, the transmission may be delayed, or the phase may be shifted; hence, the data collected by the camera cannot match the CSI samples. Generally, the camera’s number of frames per second (FPS) is much larger than the CSI sampling rate. The transceivers and the camera are synchronized using the network time protocol (NTP) to ensure that the CSI samples are synchronized with the video frames. Secure-Pose [88] uses linear interpolation to resample a group of fixed-interval CSI samples between two video frames to resolve these issues based on timestamps.
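The timestamp-based resampling step can be sketched with linear interpolation in NumPy (the sampling rates and toy signal below are hypothetical, not the actual parameters of Secure-Pose):

```python
import numpy as np

# Hypothetical timestamps: CSI stream at 100 Hz, camera at 30 FPS
t_csi = np.arange(0.0, 1.0, 0.01)            # 100 CSI sample times (s)
csi_amp = np.sin(2 * np.pi * 2 * t_csi)      # toy amplitude stream
t_frames = np.arange(0.0, 1.0, 1.0 / 30.0)   # 30 video-frame times (s)

# Linearly interpolate the CSI stream at the video-frame timestamps,
# yielding one aligned CSI value per frame.
aligned = np.interp(t_frames, t_csi, csi_amp)
print(aligned.shape)
```

The same idea applies per subcarrier and per antenna; the NTP step only ensures the two timestamp clocks agree before interpolation.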

4.3. Pose Recognition Methods

The deep learning model can extract the spatial and temporal information of the CSI, locate key points of the body or the body parts, and further infer human poses in the image. This section summarizes the general framework of neural networks in CSI human pose recognition.
Networks Based on the CSI-Predicted Poses: In the process of generating the human skeleton with CSI, a neural network is designed mainly for feature extraction. We study all the articles on HPR based on Wi-Fi CSI and summarize them according to the classification of neural network models, as presented in Table 3.
Table 3. Neural Network and Training Details.
A convolutional structure is adopted to encode high-dimensional features into low-dimensional features [52,53,54,88] and to extract the most useful information. In addition, resizing convolutions are used to eliminate checkerboard artifacts and decode the poses in the camera view to realize pose reconstruction. CNNs can substantially reduce model complexity while retaining robust feature extraction ability. ResNet [77] can train deeper convolutional neural networks to solve the degradation problem of traditional CNNs.
Hybrid models solve the limitations of a single neural network and can maximize the use of neural networks to solve real-life problems [48,49,51,55]. For example, Wi-Pose [51] includes four layers of CNNs and three layers of LSTMs on top of the CNNs. This model can encode prior knowledge of the human skeleton into the pose construction process, use LSTM to output the initial skeleton model, and obtain the current pose according to the forward kinematics.
In the field of neural networks, 3D human pose recognition requires extracting richer spatial and temporal information from the CSI than 2D human pose recognition. Hence, the convolutional structure is crucial to the feature extraction process.

4.4. System Performance Evaluation

This section introduces the application of the loss function and the system recognition results based on pioneering papers in CSI human pose recognition.

4.4.1. Loss Function Selection

Loss functions are used to optimize the model to minimize the difference between the predicted and ground-truth images. We summarize the commonly used loss functions and their applications in Table 4.
Table 4. Loss Functions Selection.
The binary cross-entropy loss function is used for the dichotomy task. Formula (8) computes the probabilities of $D_m^k$ and $1 - D_m^k$. Formula (9) is the average of the binary cross-entropy loss over each pixel, where $W$ refers to the number of pixels in the figure, and $P_{i,j}$ and $S_{i,j}$ are the grayscale values of the $(i,j)$-th pixel in the two images. Equation (10) represents the average Euclidean distance error, which is used mainly to measure the distance error between the predicted joint and the real joint. $\tilde{p}_t^i$ and $p_t^i$ are the predicted and true coordinates, respectively, of joint $i$ at time $t$. The sum of the Huber loss and the average Euclidean distance error is used as the loss function to optimize the model [52]. $pPAM_x$ and $PAM_x$ are the prediction and supervision, respectively, of the pose adjacency matrix for human body key point coordinates on the x-axis; $pPAM_y$ and $PAM_y$ are defined similarly on the y-axis.
According to the formulas in Table 4, the most commonly used function is the L2 loss function. The calculation of this function is convenient, and the measurement error is low.
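For illustration, here are minimal NumPy versions of a per-pixel binary cross-entropy loss and an average-Euclidean-distance (L2) joint loss; all inputs are toy data, and the exact per-system formulations in Table 4 may differ in weighting and aggregation:

```python
import numpy as np

def bce_loss(pred, target, eps=1e-12):
    """Binary cross-entropy averaged over all pixels of a mask."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-(target * np.log(pred)
                           + (1 - target) * np.log(1 - pred))))

def l2_joint_loss(pred, gt):
    """Average Euclidean distance between predicted and true joints."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# Toy 2x2 segmentation mask prediction vs. ground truth
mask_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
mask_gt = np.array([[1.0, 0.0], [0.0, 1.0]])
print(bce_loss(mask_pred, mask_gt))

# Toy joint regression: second joint is off by distance 5
joints_pred = np.array([[0.0, 0.0], [3.0, 4.0]])
joints_gt = np.array([[0.0, 0.0], [0.0, 0.0]])
print(l2_joint_loss(joints_pred, joints_gt))  # (0 + 5) / 2
```

The L2 form is the most common choice because its gradient is simple and the measured error maps directly to joint position error.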

4.4.2. System Recognition Results

This section discusses the system performances in HPR papers based on Wi-Fi CSI, as presented in Table 5.
Table 5. The System Performance of HPR Based on Wi-Fi CSI.
Studies [48,49] could generate human poses, and the accuracy rates of the two systems exceeded 75% in identifying skeletons. Wi-Pose [51] could accurately locate the human position, and the accuracy rate was relatively improved. Compared with Wi-Pose, the study [33] had much higher accuracy and evaluated four different scenarios. DINN used [33] as the baseline comparison. In the visual and through-wall scenarios, the average percentages of DINN improved by approximately 37% and 35.7%, respectively, on PCS◦30.
Compared with Wi-Pose, the P-MPJPE value of Wi-Mose was smaller, indicating that Wi-Mose’s human skeleton precision was higher than that of Wi-Pose. However, Wi-Mose did not discuss cross-domain research. Wi-Pose used RFPose3D [68] as the baseline comparison. The average joint localization errors (unit: mm) of Wi-Pose were all smaller than those of RFPose3D in eight scenarios of various training rates, numbers of receiving antennas, occlusion, and cross-domain conditions. Wi-Pose was the most advanced deep learning model in 3D human pose construction for a series of predefined activities using Wi-Fi CSI.
Compared with Wi-Pose, Winect has higher tracking accuracy because it is designed to work with free-form activities, whereas Wi-Pose is only used for a set of predefined activities. Free-form activities are more suitable for the complex activities of daily life, but the system can currently only identify common activities, and further research is needed.
In this section, we summarize the process of human pose reconstruction by Wi-Fi CSI from four aspects: signal collection, signal preprocessing, pose recognition methods, and system performance evaluation. We analyze and compare algorithms in each aspect of these systems, which provides a complete pose research process for researchers. It will contribute to the future field of pose construction based on CSI.

5. Typical Models of Generating the Human Skeleton

In the following, we analyze the typical applications of three categories of skeleton models in pose recognition and discuss the advantages and disadvantages of each model.

5.1. Skeleton Models Based on the Human Silhouette

Silhouette models contain rough width and contour information of the limbs and torso. Human body parts are approximated by the rectangles or boundaries of human silhouettes and are usually computed by convolution. Generally, contours require preprocessing, i.e., segmentation masks (SM), to extract objects based on prior knowledge of the background. The extracted contour features are usually encoded as Fourier descriptors, geometric features, shape contexts, etc. Furthermore, silhouette-based models commonly use heatmap methods to generate human skeletons [48,55,88]. For example, [48] used Mask R-CNN to map CSI tensors to SM, Joint Heat Maps (JHMs), and Partial Affinity Fields (PAFs). JHMs refer to confidence maps of key positions of the body, and PAFs encode the positions and the orientations of the limbs. Figure 12 shows a neural network model that transforms CSI variables into SM, JHM, and PAF variables to generate human contours and the final human skeleton, which is more suitable for multi-person pose recognition. The heatmap methods directly return each type of key point’s probability, providing supervision information for each point.
Figure 12. SM, JHMs, and PAFs are used for joint association to generate multiple person skeletons [48].

5.2. Skeleton Models Based on Key Point Coordinate Regression

In contrast to the heatmap method, regression of key point coordinates needs to extract all possible body key points from CSI pose images, encode the positions and directions of human limbs, and connect the parts of the same body. The model uses the coordinates (x, y) of the target points as output to generate the human skeleton. Scholars have published three papers based on this model, namely Wi-Pose [33,53] and Wi-Mose [52]. However, this model, based on key point coordinate regression, lacks the width and contour information of the human body. As a result, skeleton migration may not accurately generate skeleton images when facing new environments and new people.
To overcome the disadvantages of regression key points, some authors proposed another model [51] based on skeletons. The model treats a skeleton as a tree, with the nodes being the joints and the edges being the body segments. In the skeleton tree, because the lengths of the limb parts are fixed, the position of each joint can be inferred simply by estimating the rotation of its associated body segment with respect to the parent joint. Figure 13 shows the movement of an arm. The shoulder is the parent joint of the elbow, which is also the parent joint of the hand.
Figure 13. The left picture shows the skeleton tree model proposed in [51], and the right picture shows the joint rotations: the elbow rotates with respect to the shoulder, and the hand rotates with respect to the elbow.
The skeleton model based on the tree structure of joints compensates for the shortcomings of direct key point regression. The model learns the rotation of the joints and uses temporal information to infer the current motion pose, ensuring that the joints naturally satisfy the constraints of the bones. Beyond human pose recognition, this model can also be applied in robotics and exo-suit engineering to detect the user's movement and direction and to recover and track the arm; such applications additionally employ IMU (Inertial Measurement Unit) devices such as gyroscopes and accelerometers.
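The parent-joint rotation idea can be sketched with a toy 2D forward-kinematics example. The actual system in [51] works in 3D and learns the rotations from CSI; the bone lengths, angles, and function names below are illustrative.

```python
import numpy as np

def rot2d(theta):
    """2D rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def forward_kinematics(shoulder, upper_len, fore_len, theta_sh, theta_el):
    """Infer elbow and hand positions from joint rotations alone.

    theta_sh: rotation of the upper arm about the shoulder (parent joint).
    theta_el: rotation of the forearm about the elbow, relative to the upper arm.
    Fixed bone lengths guarantee the result satisfies the skeletal constraints.
    """
    elbow = shoulder + rot2d(theta_sh) @ np.array([upper_len, 0.0])
    hand = elbow + rot2d(theta_sh + theta_el) @ np.array([fore_len, 0.0])
    return elbow, hand

elbow, hand = forward_kinematics(np.zeros(2), 0.30, 0.25,
                                 theta_sh=np.pi / 2, theta_el=0.0)
# arm pointing straight up: elbow near (0, 0.30), hand near (0, 0.55)
```

Estimating two angles instead of two free (x, y) pairs keeps every predicted pose anatomically plausible by construction.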

5.3. Skeleton Models Based on the Point Cloud

To overcome the limitation of the key point coordinate model, researchers have proposed another solution: using various data analysis algorithms to extract as much human spatial information as possible from CSI signals to reconstruct human posture. Specifically, Winect [77] combined AoA and point cloud methods, extracted limb and joint information from CSI signals, and reconstructed 3D human posture.
Angle-of-arrival (AoA) is a typical ranging-based localization algorithm: a linear antenna array is used to infer the angle of arrival (i.e., the elevation angle and the azimuth angle) of the received signal. A point cloud is the set of scattered digital points measured by 3D scanning equipment, from which a model of the measured object can be constructed. Winect estimates the 2D angle of arrival from the measured CSI data and then analyzes the signal power change in 3D space based on the derived 2D AoA spectrum. In addition, the system uses the density-based spatial clustering of applications with noise (DBSCAN) algorithm to identify the number of moving limbs and the specific limbs in motion according to peaks in the azimuth-elevation spectrum.
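The clustering step can be illustrated with a minimal, self-contained DBSCAN over synthetic azimuth-elevation peaks. This is a simplified stand-in for Winect's actual spectrum processing; eps, min_pts, and the peak coordinates are illustrative.

```python
import numpy as np

def dbscan(points, eps=5.0, min_pts=3):
    """Minimal DBSCAN: label each point with a cluster id (-1 = noise)."""
    n = len(points)
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        neigh = [j for j in range(n)
                 if np.linalg.norm(points[i] - points[j]) <= eps]
        if len(neigh) < min_pts:
            continue  # noise (may still be claimed by a later cluster)
        labels[i] = cluster
        seeds = list(neigh)
        while seeds:  # grow the cluster from density-reachable points
            j = seeds.pop()
            if not visited[j]:
                visited[j] = True
                neigh_j = [k for k in range(n)
                           if np.linalg.norm(points[j] - points[k]) <= eps]
                if len(neigh_j) >= min_pts:
                    seeds.extend(neigh_j)
            if labels[j] == -1:
                labels[j] = cluster
        cluster += 1
    return labels

# Two groups of peaks in the azimuth-elevation plane -> two moving limbs.
peaks = np.array([[10, 40], [11, 41], [12, 39],   # limb 1
                  [60, 20], [61, 22], [59, 21]])  # limb 2
labels = dbscan(peaks.astype(float))
print(len(set(labels.tolist()) - {-1}))  # -> 2
```

Counting the clusters yields the number of simultaneously moving limbs without fixing that number in advance, which is why a density-based method fits free-form activity.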
Given the number of identified moving limbs, Winect separates the multi-limb motion signals using the blind source separation (BSS) method. It then calculates the limb positions based on the path length changes of the separated signals and reconstructs the trajectory of each limb. Finally, the system solves the joint positions using the kinematic model of the limb joints and builds the point cloud model of limb and joint positions, as shown in Figure 14. Both the limb and joint clouds are input to a ResNet network to predict the positions of multiple joints. Figure 15 shows the 3D human skeleton generated by Winect's point cloud model. Deeply mining the available information in CSI signals provides a variety of solutions for human pose reconstruction with commodity Wi-Fi devices.
Figure 14. Examples of point clouds reconstructing limb and joint positions.
Figure 15. (ac) show the continuous 3D skeleton: the process of lifting both arms.
This section introduced three models for generating the human skeleton: silhouette-based, key point coordinate-based, and point cloud-based models. Table 6 describes the applications of each model, the process of converting CSI data into a skeleton, and the advantages and disadvantages of each model. In general, the heatmap model is limited by its slow training and inference. The model based on direct regression of key points lacks spatial generalization. Combining prior knowledge with the tree structure of the joint-based model renders the reconstructed human posture more realistic.
Choosing a skeleton model appropriate to the experimental scenario can greatly improve system performance. When the CSI changes are not obvious because of many interfering objects in the scene, the key point coordinate regression model may not achieve good accuracy. In this case, the silhouette-based model can first locate the human body and eliminate the interference, so although it is slower to train, it will perform better. The point cloud-based skeleton model is more suitable for recovering high-speed movements of human body parts, but it cannot separate complex signal reflections. In future pose recognition research, combining the skeleton model with data analysis algorithms can extract more human body information and adapt to more complex environments.
Table 6. Typical Models of Generating Human Skeleton.
Models | Applications | Generating Skeleton Process | Strengths | Weaknesses
Human Silhouette | [48,55,88] | CSI Data -> Segmentation Masks -> Heatmap (JHM and PAF) -> Silhouette -> Skeleton | Satisfactory spatial generalization ability; improves the positioning accuracy of key points. | Slow training and inference.
Key Point Coordinate | [33,49,52,53,54]; Skeleton Tree: [51] | CSI Data -> Key Point Coordinates -> Skeleton | Fast training speed. | Poor spatial generalization.
Point Cloud | [77,97] | CSI Data -> Phase: AoA Estimate -> Joint Trajectory: Point Cloud -> Skeleton | Combines a variety of data analysis algorithms to fully separate the CSI signals. | Unable to resolve complex signal reflection separation in dynamic scenes.

6. Discussion

This section discusses the influence of the experimental factors on the system's robustness. It presents some insights into the development of human pose recognition from two categories, namely, experimental scenarios and recognition users. In particular, we investigate these typical applications and emphasize their crucial characteristics. We hope this discussion can provide useful insight for developers.

6.1. Experimental Scenarios

The current models work well indoors since the related experiments are usually conducted in a single indoor scene. Due to the influence of the experimental background, lighting, and other factors, the accuracy of CSI skeleton prediction decreases when the environment changes. In addition, the CSI of Wi-Fi devices suffers from the multipath effect, and the packet rate also affects the system's identification accuracy. Therefore, we discuss the effects of the following factors on human pose recognition: whether the experimental scene is line-of-sight (LoS) or non-line-of-sight (NLoS), whether skeletons can be generated in cross-domain scenes, and the deployment of the Wi-Fi devices.

6.1.1. LoS or NLoS

CV-based pose recognition cannot reconstruct poses in NLoS scenes, but Wi-Fi signals can penetrate occlusions to recognize skeletons. We compare pose construction in the LoS and NLoS scenarios, as presented in Table 7.
Table 7. Identification Results of LoS and NLoS.
Here, the occluded scenario and the through-wall scenario refer to NLoS scenes. Obstacles (wooden screens or iron doors) are used to block the propagation and reflection of the wireless signals. When signals pass through obstacles, the pose recognition error increases due to signal attenuation. Compared with the LoS scenes, the overall performance in the NLoS scenes is slightly reduced, and CSI is still unable to generate a human skeleton through a wall. Realizing high-precision pose recognition after the signal passes through a wall is a major direction for future research.

6.1.2. Robustness Discussion: Cross-Domain Scenarios and Stranger Situations

Recently, many studies have demonstrated that wireless network devices can be used to reconstruct images of human poses, producing satisfactory results for subjects included in the training samples. However, during propagation, Wi-Fi signals may be reflected and scattered by objects in the surrounding environment. Moreover, subjects of various ages, genders, heights, and weights affect the signal differently, even when they perform the same action. Therefore, the performance decreases for new subjects, i.e., testing subjects who are not in the training sample.
Currently, several solutions are available for improving cross-domain generalization ability. One solution is to add an adversarial neural network to extract domain-independent features [54]. The other is to improve the generalization ability at the lower signal layer [51].
The more robust the system is, the greater the possibility of large-scale applications. Hence, in the future, we should design new models to address these problems and accurately extract features independent of the environment to realize cross-environment generalization.

6.1.3. Different Distances and Packet Rate of Wi-Fi Devices

The distance between the Wi-Fi transceivers has an essential effect on skeleton recognition accuracy. Due to the multipath propagation of the CSI signals, system performance improves significantly as the distance decreases. Besides the distance between the Wi-Fi devices, the rate at which the devices send packets also affects system performance, since the number of received CSI samples changes with the packet rate, as shown in Table 8.
Table 8 summarizes the average localization error (cm) of the human key points at different distances and packet rates in two typical papers [77,97]. As the distance between the two transceivers increases, the performance of the system decreases significantly. However, if the distance is too small, the system is difficult to adapt to long-distance applications. Generally, the default packet rate is set to 1000 pkts/s, because a high packet rate can effectively recover the joint movement of a body part and better represent the full-body movements of a user. According to the current experimental results, the minimum packet rate is 250 pkts/s. Therefore, the appropriate distance and packet rate should be selected according to the actual application needs.
Table 8. Effects of Different Distance and Packet Rate on System Performance.
System | Distance; Metric Performance | Packet Rate; Metric Performance
Winect [77] | 2 m; 4.6 cm | 250 pkts/s; 5.7 cm
 | 2.5 m; 4.9 cm | 500 pkts/s; 5.0 cm
 | 3 m; 5.1 cm | 1000 pkts/s; 4.5 cm
 | 3.5 m; 5.4 cm | -
GoPose [97] | 2.5 m; 4.7 cm | 250 pkts/s; 5.7 cm
 | 3 m; 5.1 cm | 500 pkts/s; 5.3 cm
 | 3.5 m; 5.8 cm | 1000 pkts/s; 4.7 cm
In Table 8, Metric Performance is the average localization error (cm).

6.2. Recognition User

The object of posture reconstruction is a human, so the user's status has an important impact on pose recognition. We consider three main aspects: 2D vs. 3D, single-person vs. multi-person, and static vs. dynamic posture. We analyze and summarize the advantages and disadvantages of each category, providing preliminary directions for future research.

6.2.1. 2D or 3D HPR

2D HPR refers to the extraction of a 2D human skeleton by locating the coordinates of the human body’s key points (x, y) and connecting these key points in a specified joint order [33,48,53,54]. 3D HPR refers to the extraction of a 3D human skeleton obtained by locating the joint’s 3D coordinates (x, y, z) and connecting these key points in a specified joint order [51,52].
Two main methods are available for 3D HPR: one is to obtain the 3D key points directly from 2D images through regression, designing a neural network to realize end-to-end mapping. The other is to first predict the 2D pose from an image and then predict the 3D pose and the trajectory of the root node based on the predicted 2D skeleton. The latter approach benefits from mature 2D pose recognition technology, which greatly reduces the complexity. Nevertheless, 3D human pose recognition faces substantial challenges, such as the huge 3D pose space and the ambiguity of single-view 2D-to-3D mapping (one 2D skeleton can correspond to multiple 3D skeletons). It is, however, more important for developing applications such as intelligent medical care and motion analysis, so these problems must be urgently solved if we seek to obtain the real human posture.
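The single-view ambiguity is easy to see with a pinhole-camera sketch: any 3D joint sliding along the same viewing ray projects to identical 2D coordinates, so one 2D skeleton is consistent with many 3D skeletons. The focal length and the points below are illustrative.

```python
import numpy as np

def project(p3d, f=500.0):
    """Pinhole projection of a 3D point (x, y, z) onto the image plane."""
    x, y, z = p3d
    return np.array([f * x / z, f * y / z])

# Two different 3D joints along the same viewing ray...
near = np.array([0.2, 0.4, 2.0])
far = near * 2.0  # same direction, twice the depth
# ...produce identical 2D image coordinates:
print(np.allclose(project(near), project(far)))  # -> True
```

This depth ambiguity is exactly what the second (2D-then-3D) method must resolve with learned body priors or temporal context.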

6.2.2. Single-Person or Multiperson HPR

Human pose recognition can be divided into two categories: single-person pose estimation (SPPE) and multiperson pose estimation (MPPE). SPPE involves the location of the key points of a person through CSI pose images to generate a human skeleton.
MPPE requires determining the key points of all people in CSI pose images containing multiple people [48]. Since the positions and the number of people in the image are unknown, MPPE is more difficult. We can draw on CV processing methods, such as the top-down and bottom-up approaches. The top-down approach adds a human detector, estimates the pose of each detected person separately, and finally collects the posture of each person. The bottom-up approach detects all body parts in the image and associates each part with a person. In general, adding a body detector is easier than designing association algorithms.
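The bottom-up association step can be sketched as a greedy matching over candidate limbs. Here plain Euclidean distance stands in for the limb affinity score; real systems such as [48] instead integrate Part Affinity Fields along each candidate limb.

```python
import numpy as np

def greedy_associate(parts_a, parts_b):
    """Greedily pair detections of two adjacent body parts (bottom-up).

    Stand-in affinity: smaller distance = better candidate limb. Each
    detection is used at most once, best-scoring candidates first.
    """
    used_a, used_b, pairs = set(), set(), []
    cands = sorted(
        ((np.linalg.norm(a - b), i, j)
         for i, a in enumerate(parts_a)
         for j, b in enumerate(parts_b)),
        key=lambda t: t[0])
    for _, i, j in cands:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        pairs.append((i, j))
    return pairs

necks = np.array([[10.0, 10.0], [50.0, 12.0]])   # two people
wrists = np.array([[52.0, 30.0], [8.0, 28.0]])
print(sorted(greedy_associate(necks, wrists)))   # -> [(0, 1), (1, 0)]
```

Each wrist ends up attached to the neck of the correct person, illustrating how per-part detections are grouped into individual skeletons without a person detector.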

6.2.3. Static or Dynamic HPR

Static-based human behavior recognition refers to obtaining the corresponding effective information from a CSI posture image to recognize activities. It involves only estimating, at each moment, the skeleton and the coordinates of the N joint points of the human body that correspond to the skeleton [33,48,53,54].
Dynamic-based human behavior recognition requires the establishment of a relationship between the previous frame and the next frame in a video and the recognition of activities [51,52]. Movement refers to continuous actions that can form daily activities, such as walking and waving. In dynamic pose recognition, pose continuity must be considered because joint point changes adhere to the human body structure. An LSTM network is commonly used because it considers the effect of time on the joint points. In recent years, static posture estimation based on Wi-Fi devices has yielded better results. In contrast, dynamic posture estimation is more challenging.
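The role of the recurrent state can be sketched with a single numpy LSTM cell: the hidden state carries pose history from frame to frame, so the estimate for the current frame is conditioned on previous joint positions. The weights here are random for illustration; a deployed system would learn them, and the joint count and hidden size are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step: hidden state h carries the pose history forward."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)          # input, forget, output, candidate
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n_joints, hidden = 14, 32
W = rng.normal(scale=0.1, size=(4 * hidden, 2 * n_joints + hidden))
b = np.zeros(4 * hidden)
h = c = np.zeros(hidden)
# feed a sequence of per-frame 2D joint coordinates (flattened to 28 values)
for frame in rng.normal(size=(10, 2 * n_joints)):
    h, c = lstm_step(frame, h, c, W, b)
print(h.shape)  # -> (32,)
```

A pose decoder reading h at each step therefore sees not just the current CSI frame but the accumulated motion context, which is what enforces the temporal continuity of the joints.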
In summary, we discuss and analyze the importance of current system recognition accuracy and system robustness for large-scale pose recognition in different scenarios. Next, we present some challenges and propose future research trends.

8. Conclusions

In this work, a survey of recent studies on human pose recognition using pervasive Wi-Fi CSI signals is conducted. This paper presents a general framework of precise behavior recognition based on the human skeleton and CSI and summarizes the main components, including the signal processing techniques, neural network models, and performance results of pose recognition. In addition, we analyze the typical applications of pose recognition according to their skeleton models and discuss typical factors ranging from the experimental environment to potential applications. We believe that these analyses and discussions help widen the vision of researchers and enable them to learn from the mature experiences of these applications. Furthermore, we deem that transferring computer vision algorithms to CSI pose recognition may be a promising strategy for solving many difficult problems. We also believe that the combination of precise skeleton recognition, localization, and complicated human behavior analysis using CSI will enable us to develop novel applications.

Author Contributions

Conceptualization, Z.W. and M.M.; methodology, M.M.; writing—original draft preparation, M.M.; writing—review and editing, Z.W., M.M., X.F., X.L. and F.L.; supervision, Y.G. and D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Shandong University Youth Innovation Supporting Program (2019KJN020), the Taishan Scholar Engineering Construction Fund of Shandong Province of China (tsqn201812066), the Natural Science Foundation of Shandong Province (ZR2020MF086, ZR2022MF315).

Institutional Review Board Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yu, Z.; Wang, Z. Human Behavior Analysis: Sensing and Understanding; Springer Nature Singapore Pte Ltd.: Singapore, 2020. [Google Scholar]
  2. Liu, J.; Liu, H.; Chen, Y.; Wang, Y.; Wang, C. Wireless Sensing for Human Activity: A Survey. IEEE Commun. Surv. Tutor. 2020, 22, 1629–1645. [Google Scholar] [CrossRef]
  3. Gupta, N.; Gupta, S.K.; Pathak, R.K.; Jain, V.; Rashidi, P.; Suri, J.S. Human activity recognition in artificial intelligence framework: A narrative review. Artif. Intell. Rev. 2022, 55, 4755–4808. [Google Scholar] [CrossRef] [PubMed]
  4. Pareek, P.; Thakkar, A. A survey on video-based Human Action Recognition: Recent updates, datasets, challenges, and applications. Artif. Intell. Rev. 2021, 54, 2259–2322. [Google Scholar] [CrossRef]
  5. Minh Dang, L.; Min, K.; Wang, H.; Jalil Piran, M.; Hee Lee, C.; Moon, H. Sensor-based and vision-based human activity recognition: A comprehensive survey. Pattern Recognit. 2020, 108, 107561. [Google Scholar] [CrossRef]
  6. Zhou, H.; Gao, Y.; Song, X.; Liu, W.; Dong, W. LimbMotion: Decimeter-level Limb Tracking for Wearable-based Human-Computer Interaction. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2019, 3, 161. [Google Scholar] [CrossRef]
  7. Mukhopadhyay, S.C. Wearable Sensors for Human Activity Monitoring: A Review. IEEE Sens. J. 2015, 15, 1321–1330. [Google Scholar] [CrossRef]
  8. Wang, J.; Chen, Y.; Hao, S.; Peng, X.; Hu, L. Deep learning for sensor-based activity recognition: A survey. Pattern Recognit. Lett. 2019, 119, 3–11. [Google Scholar] [CrossRef]
  9. Wang, J.; Gao, Q.; Pan, M.; Fang, Y. Device-Free Wireless Sensing: Challenges, Opportunities, and Applications. IEEE Netw. 2018, 32, 132–137. [Google Scholar] [CrossRef]
  10. Zhang, R.; Jing, X.; Wu, S.; Jiang, C.; Mu, J.; Yu, F.R. Device-Free Wireless Sensing for Human Detection: The Deep Learning Perspective. IEEE Internet Things J. 2021, 8, 2517–2539. [Google Scholar] [CrossRef]
  11. Jayasundara, V.; Jayasekara, H.; Samarasinghe, T.; Hemachandra, K.T. Device-Free User Authentication, Activity Classification and Tracking Using Passive Wi-Fi Sensing: A Deep Learning-Based Approach. IEEE Sens. J. 2020, 20, 9329–9338. [Google Scholar] [CrossRef]
  12. Hussain, Z.; Sheng, Q.Z.; Zhang, W.E. A review and categorization of techniques on device-free human activity recognition. J. Netw. Comput. Appl. 2020, 167, 102738. [Google Scholar] [CrossRef]
  13. Thariq Ahmed, H.F.; Ahmad, H.; Vaithilingam, C.A. Device free human gesture recognition using Wi-Fi CSI: A survey. Eng. Appl. Artif. Intell. 2020, 87, 103281. [Google Scholar] [CrossRef]
  14. Adib, F.; Katabi, D. See through walls with WiFi! SIGCOMM Comput. Commun. Rev. 2013, 43, 75–86. [Google Scholar] [CrossRef]
  15. Cianca, E.; Sanctis, M.D.; Domenico, S.D. Radios as Sensors. IEEE Internet Things J. 2017, 4, 363–373. [Google Scholar] [CrossRef]
  16. Tan, S.; Yang, J. Commodity Wi-Fi Sensing in 10 Years: Current Status, Challenges, and Opportunities. IEEE Internet Things J. 2022. [Google Scholar] [CrossRef]
  17. Zheng, T.; Chen, Z.; Luo, J.; Ke, L.; Zhao, C.; Yang, Y. SiWa: See into walls via deep UWB radar. In Proceedings of the 2021 IEEE International Conference on Big Data and Smart Computing (BigComp), New Orleans, LA, USA, 25–29 October 2021; pp. 323–336. [Google Scholar]
  18. Yamada, H.; Horiuchi, T. High-resolution Indoor Human detection by Using Millimeter-Wave MIMO Radar. In Proceedings of the 2020 International Workshop on Electromagnetics: Applications and Student Innovation Competition (iWEM), Makung, Taiwan, 26–28 August 2020; pp. 1–2. [Google Scholar]
  19. Nam, D.V.; Gon-Woo, K. Solid-State LiDAR based-SLAM: A Concise Review and Application. In Proceedings of the 2021 IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju Island, Korea, 17–20 January 2021; pp. 302–305. [Google Scholar]
  20. Guan, J.; Madani, S.; Jog, S.; Gupta, S.; Hassanieh, H. Through Fog High-Resolution Imaging Using Millimeter Wave Radar. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 13–19 June 2020; pp. 11464–11473. [Google Scholar]
  21. Zheng, T.; Chen, Z.; Ding, S.; Luo, J. Enhancing RF Sensing with Deep Learning: A Layered Approach. IEEE Commun. Mag. 2021, 59, 70–76. [Google Scholar] [CrossRef]
  22. Haseeb, M.; Parasuraman, R. Wisture: RNN-based Learning of Wireless Signals for Gesture Recognition in Unmodified Smartphones. arXiv 2017, arXiv:1707.08569. [Google Scholar]
  23. Wang, J.; Zhang, X.; Gao, Q.; Yue, H.; Wang, H. Device-Free Wireless Localization and Activity Recognition: A Deep Learning Approach. IEEE Trans. Veh. Technol. 2017, 66, 6258–6267. [Google Scholar] [CrossRef]
  24. Yang, Z.; Zhou, Z.; Liu, Y. From RSSI to CSI: Indoor Localization via Channel Response. ACM Comput. Surv. 2013, 46, 25. [Google Scholar] [CrossRef]
  25. Li, W.; Bocus, M.J.; Tang, C.; Vishwakarma, S.; Piechocki, R.J.; Woodbridge, K.; Chetty, K. A Taxonomy of WiFi Sensing: CSI vs Passive WiFi Radar. In Proceedings of the IEEE Global Communications Conference (GlobeCom), Taipei, Taiwan, 7–11 December 2020; pp. 1–6. [Google Scholar]
  26. Pham, C.; Nguyen, L.; Nguyen, A.; Nguyen, N.; Nguyen, V.-T. Combining skeleton and accelerometer data for human fine-grained activity recognition and abnormal behaviour detection with deep temporal convolutional networks. Multimed. Tools Appl. 2021, 80, 28919–28940. [Google Scholar] [CrossRef]
  27. Wu, C.; Zhang, F.; Hu, Y.; Liu, K.J.R. GaitWay: Monitoring and Recognizing Gait Speed Through the Walls. IEEE Trans. Mob. Comput. 2021, 20, 2186–2199. [Google Scholar] [CrossRef]
  28. Ngamakeur, K.; Yongchareon, S.; Yu, J.; Rehman, S.U. A Survey on Device-free Indoor Localization and Tracking in the Multi-resident Environment. ACM Comput. Surv. 2020, 53, 71. [Google Scholar] [CrossRef]
  29. Li, H.; Zeng, X.; Li, Y.; Zhou, S.; Wang, J. Convolutional neural networks based indoor Wi-Fi localization with a novel kind of CSI images. China Commun. 2019, 16, 250–260. [Google Scholar] [CrossRef]
  30. Wang, F.; Zhang, F.; Wu, C.; Wang, B.; Liu, K.J.R. ViMo: Vital Sign Monitoring Using Commodity Millimeter Wave Radio. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 8304–8308. [Google Scholar]
  31. Zhang, F.; Wu, C.; Wang, B.; Wu, M.; Bugos, D.; Zhang, H.; Liu, K.J.R. SMARS: Sleep Monitoring via Ambient Radio Signals. IEEE Trans. Mob. Comput. 2021, 20, 217–231. [Google Scholar] [CrossRef]
  32. Zeng, Y.; Wu, D.; Xiong, J.; Yi, E.; Gao, R.; Zhang, D. FarSense: Pushing the Range Limit of WiFi-Based Respiration Sensing with CSI Ratio of Two Antennas. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2019, 3, 121. [Google Scholar] [CrossRef]
  33. Guo, L.; Lu, Z.; Zhou, S.; Wen, X.; He, Z. When Healthcare Meets Off-the-Shelf WiFi: A Non-Wearable and Low-Costs Approach for In-Home Monitoring. arXiv 2020, arXiv:2009.09715. [Google Scholar]
  34. Li, C.; Cao, Z.; Liu, Y. Deep AI Enabled Ubiquitous Wireless Sensing: A Survey. ACM Comput. Surv. 2021, 54, 32. [Google Scholar] [CrossRef]
  35. Ren, B.; Liu, M.; Ding, R.; Liu, H. A Survey on 3D Skeleton-Based Action Recognition Using Learning Method. arXiv 2020, arXiv:2002.05907. [Google Scholar]
  36. Kanazawa, A.; Black, M.J.; Jacobs, D.W.; Malik, J. End-to-End Recovery of Human Shape and Pose. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7122–7131. [Google Scholar]
  37. Zhao, M.; Liu, Y.; Raghu, A.; Zhao, H.; Li, T.; Torralba, A.; Katabi, D. Through-Wall Human Mesh Recovery Using Radio Signals. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 10112–10121. [Google Scholar]
  38. Isogawa, M.; Yuan, Y.; Toole, M.O.; Kitani, K. Optical Non-Line-of-Sight Physics-Based 3D Human Pose Estimation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 13–19 June 2020; pp. 7011–7020. [Google Scholar]
  39. Ruget, A.; Tyler, M.; Mora-Martín, G.; Scholes, S.; Zhu, F.; Gyöngy, I.; Hearn, B.; Mclaughlin, S.; Halimi, A.; Leach, J. Real-time, low-cost multi-person 3D pose estimation. arXiv 2021, arXiv:1901.03953. [Google Scholar]
  40. Kotaru, M.; Satat, G.; Raskar, R.; Katti, S. Light-Field for RF. arXiv 2019, arXiv:1901.03953. [Google Scholar]
  41. Kato, S.; Fukushima, T.; Murakami, T.; Abeysekera, H.; Iwasaki, Y.; Fujihashi, T.; Watanabe, T.; Saruwatari, S. CSI2Image: Image Reconstruction from Channel State Information Using Generative Adversarial Networks. IEEE Access 2021, 9, 47154–47168. [Google Scholar] [CrossRef]
  42. Zhong, W.; He, K.; Li, L. Through-the-Wall Imaging Exploiting 2.4 GHz Commodity Wi-Fi. arXiv 2019, arXiv:1903.03895. [Google Scholar]
  43. Kefayati, M.H.; Pourahmadi, V.; Aghaeinia, H. Wi2Vi: Generating Video Frames from WiFi CSI Samples. IEEE Sens. J. 2020, 20, 11463–11473. [Google Scholar] [CrossRef]
  44. Adib, F.; Hsu, C.-Y.; Mao, H.; Katabi, D.; Durand, F. Capturing the human figure through a wall. ACM Trans. Graph. 2015, 34, 219. [Google Scholar] [CrossRef]
  45. Hsu, C.-Y.; Hristov, R.; Lee, G.-H.; Zhao, M.; Katabi, D. Enabling Identification and Behavioral Sensing in Homes using Radio Reflections. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, UK, 4–9 May 2019; pp. 1–13. [Google Scholar]
  46. Yu, C.; Wu, Z.; Zhang, D.; Lu, Z.; Hu, Y.; Chen, Y. RFGAN: RF-Based Human Synthesis. arXiv 2021, arXiv:2112.03727. [Google Scholar] [CrossRef]
  47. Zheng, Z.; Pan, J.; Ni, Z.; Shi, C.; Ye, S.; Fang, G. Human Posture Reconstruction for through-the-Wall Radar Imaging Using Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  48. Wang, F.; Zhou, S.; Panev, S.; Han, J.; Huang, D. Person-in-WiFi: Fine-Grained Person Perception Using WiFi. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 5451–5460. [Google Scholar]
  49. Wang, F.; Panev, S.; Ziyi, D.; Han, J.; Huang, D. Can WiFi Estimate Person Pose? arXiv 2019, arXiv:1904.00277. [Google Scholar]
  50. Li, C.; Liu, Z.; Yao, Y.; Cao, Z.; Zhang, M.; Liu, Y. Wi-fi see it all: Generative adversarial network-augmented versatile wi-fi imaging. In Proceedings of the 18th Conference on Embedded Networked Sensor Systems, Virtual Event, 16–19 November 2020; pp. 436–448. [Google Scholar]
  51. Jiang, W.; Xue, H.; Miao, C.; Wang, S.; Lin, S.; Tian, C.; Murali, S.; Hu, H.; Sun, Z.; Su, L. Towards 3D human pose construction using wifi. In Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, London, UK, 21–25 September 2020; pp. 1–14. [Google Scholar]
  52. Wang, Y.; Guo, L.; Lu, Z.; Wen, X.; Zhou, S.; Meng, W. From Point to Space: 3D Moving Human Pose Estimation Using Commodity WiFi. IEEE Commun. Lett. 2020, 25, 2235–2239. [Google Scholar] [CrossRef]
  53. Guo, L.; Lu, Z.; Wen, X.; Zhou, S.; Han, Z. From Signal to Image: Capturing Fine-Grained Human Poses with Commodity Wi-Fi. IEEE Commun. Lett. 2020, 24, 802–806. [Google Scholar] [CrossRef]
  54. Zhou, S.; Guo, L.; Lu, Z.; Wen, X.; Zheng, W.; Wang, Y. Subject-independent Human Pose Image Construction with Commodity Wi-Fi. In Proceedings of the ICC 2021—IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021; pp. 1–6. [Google Scholar]
  55. Avola, D.; Cascio, M.; Cinque, L.; Fagioli, A.; Foresti, G.L. Human Silhouette and Skeleton Video Synthesis through Wi-Fi signals. arXiv 2022, arXiv:2203.05864. [Google Scholar] [CrossRef]
  56. Yang, C.; Wang, L.; Wang, X.; Mao, S. Environment Adaptive RFID based 3D Human Pose Tracking with a Meta-learning Approach. IEEE J. Radio Freq. Identif. 2022, 6, 413–425. [Google Scholar] [CrossRef]
  57. Ma, Y.; Zhou, G.; Wang, S. WiFi Sensing with Channel State Information: A Survey. ACM Comput. Surv. 2019, 52, 1–35. [Google Scholar] [CrossRef]
  58. Wang, Z.; Jiang, K.; Hou, Y.; Dou, W.; Zhang, C.; Huang, Z.; Guo, Y. A Survey on Human Behavior Recognition Using Channel State Information. IEEE Access 2019, 7, 155986–156024. [Google Scholar] [CrossRef]
  59. Al-qaness, M.A.A.; Abd Elaziz, M.; Kim, S.; Ewees, A.A.; Abbasi, A.A.; Alhaj, Y.A.; Hawbani, A. Channel State Information from Pure Communication to Sense and Track Human Motion: A Survey. Sensors 2019, 19, 3329. [Google Scholar] [CrossRef]
  60. Zheng, C.; Wu, W.; Yang, T.; Zhu, S.; Chen, C.; Liu, R.; Shen, J.; Kehtarnavaz, N.; Shah, M. Deep Learning-Based Human Pose Estimation: A Survey. arXiv 2020, arXiv:2012.13392. [Google Scholar]
  61. Yang, Z.; Qian, K.; Wu, C.; Zhang, Y. Smart Wireless Sensing—From IoT to AIoT; Springer: Berlin/Heidelberg, Germany, 2021; pp. 3–234. [Google Scholar]
  62. Yanik, M.E.; Torlak, M. Near-Field MIMO-SAR Millimeter-Wave Imaging with Sparsely Sampled Aperture Data. IEEE Access 2019, 7, 31801–31819. [Google Scholar] [CrossRef]
  63. Li, T.; Fan, L.; Yuan, Y.; Katabi, D. Unsupervised Learning for Human Sensing Using Radio Signals. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 1091–1100. [Google Scholar]
  64. Wu, Z.; Zhang, D.; Xie, C.; Yu, C.; Chen, J.; Hu, Y.; Chen, Y. RFMask: A Simple Baseline for Human Silhouette Segmentation with Radio Signals. IEEE Trans. Multimed. 2022, 1–12. [Google Scholar] [CrossRef]
  65. Guo, H.; Zhang, N.; Shi, W.; Ali-Alqarni, S.; Wang, H. Real-Time Indoor 3D Human Imaging Based on MIMO Radar Sensing. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019; pp. 1408–1413.
  66. Meng, K.; Meng, Y. Through-Wall Pose Imaging in Real-Time with a Many-to-Many Encoder/Decoder Paradigm. In Proceedings of the 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 14–21.
  67. Zhao, M.; Li, T.; Alsheikh, M.A.; Tian, Y.; Zhao, H.; Torralba, A.; Katabi, D. Through-Wall Human Pose Estimation Using Radio Signals. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7356–7365.
  68. Zhao, M.; Tian, Y.; Zhao, H.; Alsheikh, M.A.; Li, T.; Hristov, R.; Kabelac, Z.; Katabi, D.; Torralba, A. RF-based 3D skeletons. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, Budapest, Hungary, 20–25 August 2018; pp. 267–281.
  69. Li, T.; Fan, L.; Zhao, M.; Liu, Y.; Katabi, D. Making the Invisible Visible: Action Recognition Through Walls and Occlusions. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 872–881.
  70. Du, H.; Jin, T.; He, Y.; Song, Y.; Dai, Y. Segmented convolutional gated recurrent neural networks for human activity recognition in ultra-wideband radar. Neurocomputing 2020, 396, 451–464.
  71. Sengupta, A.; Cao, S. mmPose-NLP: A Natural Language Processing Approach to Precise Skeletal Pose Estimation Using mmWave Radars. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–12.
  72. Ding, W.; Cao, Z.; Zhang, J.; Chen, R.; Guo, X.; Wang, G. Radar-Based 3D Human Skeleton Estimation by Kinematic Constrained Learning. IEEE Sens. J. 2021, 21, 23174–23184.
  73. Xue, H.; Ju, Y.; Miao, C.; Wang, Y.; Wang, S.; Zhang, A.; Su, L. mmMesh: Towards 3D real-time dynamic human mesh construction using millimeter-wave. In Proceedings of the 19th Annual International Conference on Mobile Systems, Applications, and Services, Virtual Event, 24 June–2 July 2021; pp. 269–282.
  74. Shi, C.; Lu, L.; Liu, J.; Wang, Y.; Chen, Y.; Yu, J. mPose: Environment- and subject-agnostic 3D skeleton posture reconstruction leveraging a single mmWave device. Smart Health 2022, 23, 100228.
  75. Fürst, M.; Gupta, S.T.P.; Schuster, R.; Wasenmüller, O.; Stricker, D. HPERL: 3D Human Pose Estimation from RGB and LiDAR. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 7321–7327.
  76. Zhang, F.; Wu, C.; Wang, B.; Liu, K.J.R. mmEye: Super-Resolution Millimeter Wave Imaging. IEEE Internet Things J. 2021, 8, 6995–7008.
  77. Ren, Y.; Wang, Z.; Tan, S.; Chen, Y.; Yang, J. Winect: 3D Human Pose Tracking for Free-form Activity Using Commodity WiFi. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2022, 5, 176.
  78. IEEE Std. 802.11n-2009; Enhancements for Higher Throughput. IEEE: New York, NY, USA, 2009; pp. 1–565.
  79. Halperin, D.; Hu, W.; Sheth, A.; Wetherall, D. Tool release: Gathering 802.11n traces with channel state information. SIGCOMM Comput. Commun. Rev. 2011, 41, 53.
  80. Halperin, D.; Hu, W.; Sheth, A.; Wetherall, D. Predictable 802.11 packet delivery from wireless channel measurements. In Proceedings of the ACM SIGCOMM 2010 Conference, New Delhi, India, 30 August–3 September 2010; pp. 159–170.
  81. Xie, Y.; Li, Z.; Li, M. Precise Power Delay Profiling with Commodity Wi-Fi. IEEE Trans. Mob. Comput. 2019, 18, 1342–1355.
  82. Gringoli, F.; Schulz, M.; Link, J.; Hollick, M. Free Your CSI: A Channel State Information Extraction Platform for Modern Wi-Fi Chipsets. In Proceedings of the 13th International Workshop on Wireless Network Testbeds, Experimental Evaluation & Characterization, Los Cabos, Mexico, 25 October 2019; pp. 21–28.
  83. Hernandez, S.M.; Bulut, E. Lightweight and Standalone IoT Based WiFi Sensing for Active Repositioning and Mobility. In Proceedings of the 2020 IEEE 21st International Symposium on “A World of Wireless, Mobile and Multimedia Networks” (WoWMoM), Cork, Ireland, 31 August–3 September 2020; pp. 277–286.
  84. Ramacher, U.; Raab, W.; Hachmann, U.; Langen, D.; Berthold, J.; Kramer, R.; Schackow, A.; Grassmann, C.; Sauermann, M.; Szreder, P.; et al. Architecture and implementation of a Software-Defined Radio baseband processor. In Proceedings of the 2011 IEEE International Symposium of Circuits and Systems (ISCAS), Rio de Janeiro, Brazil, 15–18 May 2011; pp. 2193–2196.
  85. Zhang, Z. Microsoft Kinect Sensor and Its Effect. IEEE MultiMedia 2012, 19, 4–10.
  86. Cao, Z.; Simon, T.; Wei, S.; Sheikh, Y. Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1302–1310.
  87. Fang, H.; Xie, S.; Tai, Y.-W.; Lu, C. RMPE: Regional Multi-person Pose Estimation. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2353–2362.
  88. Huang, Y.; Li, X.; Wang, W.; Jiang, T.; Zhang, Q. Towards Cross-Modal Forgery Detection and Localization on Live Surveillance Videos. In Proceedings of the IEEE INFOCOM 2021—IEEE Conference on Computer Communications, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–10.
  89. Ahad, M.A.R.; Mahbub, U.; Rahman, T. (Eds.) Contactless Human Activity Analysis; Springer: Cham, Switzerland, 2021; Volume 200.
  90. Zhang, C.; Patras, P.; Haddadi, H. Deep Learning in Mobile and Wireless Networking: A Survey. IEEE Commun. Surv. Tutor. 2019, 21, 2224–2287.
  91. Bengio, Y.; Lecun, Y.; Hinton, G. Deep learning for AI. Commun. ACM 2021, 64, 58–65.
  92. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  93. Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A Survey of the Recent Architectures of Deep Convolutional Neural Networks. Artif. Intell. Rev. 2019, 53, 5455–5516.
  94. Nirmal, I.; Khamis, A.; Hassan, M.; Hu, W.; Zhu, X.Q. Deep Learning for Radio-Based Human Sensing: Recent Advances and Future Directions. IEEE Commun. Surv. Tutor. 2021, 23, 995–1019.
  95. Luigi, B.; Francesco Carlo, M. Neural Network Design using a Virtual Reality Platform. Glob. J. Comput. Sci. Technol. 2022, 22, 45–61.
  96. Wu, D.; Zhang, D.; Xu, C.; Wang, H.; Li, X. Device-Free WiFi Human Sensing: From Pattern-Based to Model-Based Approaches. IEEE Commun. Mag. 2017, 55, 91–97.
  97. Ren, Y.; Wang, Z.; Wang, Y.; Tan, S.; Chen, Y.; Yang, J. GoPose: 3D Human Pose Estimation Using WiFi. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2022, 6, 69.
  98. Abrard, F.; Deville, Y. A time-frequency blind signal separation method applicable to underdetermined mixtures of dependent sources. Signal Process. 2005, 85, 1389–1403.
  99. Güler, R.A.; Neverova, N.; Kokkinos, I. DensePose: Dense Human Pose Estimation in the Wild. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7297–7306.
  100. Feng, M.; Meunier, J. Skeleton Graph-Neural-Network-Based Human Action Recognition: A Survey. Sensors 2022, 22, 2091.
  101. Feng, L.; Zhao, Y.; Zhao, W.; Tang, J. A comparative review of graph convolutional networks for human skeleton-based action recognition. Artif. Intell. Rev. 2021, 55, 4275–4305.
  102. Zheng, C.; Zhu, S.; Mendieta, M.; Yang, T.; Chen, C.; Ding, Z. 3D Human Pose Estimation with Spatial and Temporal Transformers. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 11636–11645.
  103. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 1–13.
  104. Liu, S.; Bai, X.; Fang, M.; Li, L.; Hung, C.-C. Mixed graph convolution and residual transformation network for skeleton-based action recognition. Appl. Intell. 2022, 52, 1544–1555.
  105. Yang, Z.; Li, Y.; Yang, J.; Luo, J. Action Recognition with Spatio–Temporal Visual Attention on Skeleton Image Sequences. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 2405–2415.
  106. Sekii, T. Pose Proposal Networks. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; pp. 350–366.
  107. Chen, Y.; Wang, Z.; Peng, Y.; Zhang, Z.; Yu, G.; Sun, J. Cascaded Pyramid Network for Multi-Person Pose Estimation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7103–7112.
  108. Yan, S.J.; Xiong, Y.J.; Lin, D.H. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence/Thirtieth Innovative Applications of Artificial Intelligence Conference/Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 7444–7452.
  109. Joo, H.; Simon, T.; Sheikh, Y. Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8320–8329.
  110. Sun, X.; Shang, J.; Liang, S.; Wei, Y. Compositional Human Pose Regression. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2621–2630.
  111. Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Skeleton-Based Action Recognition with Directed Graph Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 13–19 June 2020; pp. 7912–7921.
  112. Li, M.; Chen, S.; Zhao, Y.; Zhang, Y.; Wang, Y.; Tian, Q. Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 13–19 June 2020; pp. 211–220.
  113. Chen, Y.; Tian, Y.; He, M. Monocular human pose estimation: A survey of deep learning-based methods. Comput. Vis. Image Underst. 2020, 192, 102897.
  114. Ke, Q.; Bennamoun, M.; An, S.; Sohel, F.; Boussaid, F. A New Representation of Skeleton Sequences for 3D Action Recognition. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4570–4579.
  115. Song, L.; Yu, G.; Yuan, J.; Liu, Z. Human pose estimation and its application to action recognition: A survey. J. Vis. Commun. Image Represent. 2021, 76, 103055.
  116. Duan, H.; Zhao, Y.; Chen, K.; Shao, D.; Lin, D.; Dai, B. Revisiting Skeleton-based Action Recognition. arXiv 2022, arXiv:2104.13586.
  117. Zhang, P.; Lan, C.; Xing, J.; Zeng, W.; Xue, J.; Zheng, N. View Adaptive Neural Networks for High Performance Skeleton-Based Human Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 1963–1978.
  118. Shao, D.; Zhao, Y.; Dai, B.; Lin, D. FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 13–19 June 2020; pp. 2613–2622.
  119. Xu, J.; Rao, Y.; Yu, X.; Chen, G.; Zhou, J.; Lu, J. FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment. arXiv 2022, arXiv:2204.03646.
  120. Toshpulatov, M.; Lee, W.; Lee, S.; Haghighian Roudsari, A. Human pose, hand and mesh estimation using deep learning: A survey. J. Supercomput. 2022, 78, 7616–7654.
  121. Ge, Y.; Taha, A.; Shah, S.A.; Dashtipour, K.; Zhu, S.; Cooper, J.M.; Abbasi, Q.; Imran, M. Contactless WiFi Sensing and Monitoring for Future Healthcare—Emerging Trends, Challenges and Opportunities. IEEE Rev. Biomed. Eng. 2022.
  122. Niu, X.P.; Li, S.J.; Zhang, Y.; Liu, Z.P.; Wu, D.; Shah, R.C.; Tanriover, C.; Lu, H.; Zhang, D.Q. WiMonitor: Continuous Long-Term Human Vitality Monitoring Using Commodity Wi-Fi Devices. Sensors 2021, 21, 751.
  123. Cotton, R.J. PosePipe: Open-Source Human Pose Estimation Pipeline for Clinical Research. arXiv 2022, arXiv:2203.08792.
  124. Qiu, J.; Yan, X.; Wang, W.; Wei, W.; Fang, K. Skeleton-Based Abnormal Behavior Detection Using Secure Partitioned Convolutional Neural Network Model. IEEE J. Biomed. Health Inform. 2021.
  125. Yu, B.X.B.; Liu, Y.; Chan, K.C.C.; Yang, Q.; Wang, X. Skeleton-based human action evaluation using graph convolutional network for monitoring Alzheimer’s progression. Pattern Recognit. 2021, 119, 108095.
  126. Guo, L.; Lu, Z.; Zhou, S.; Wen, X.; He, Z. Emergency Semantic Feature Vector Extraction from WiFi Signals for In-Home Monitoring of Elderly. IEEE J. Sel. Top. Signal Process. 2021, 15, 1423–1438.
  127. Cormier, M.; Clepe, A.; Specker, A.; Beyerer, J. Where are we with Human Pose Estimation in Real-World Surveillance? In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), Waikoloa, HI, USA, 4–8 January 2022; pp. 591–601.
  128. Depatla, S.; Mostofi, Y. Passive Crowd Speed Estimation in Adjacent Regions with Minimal WiFi Sensing. IEEE Trans. Mob. Comput. 2020, 19, 2429–2444.
  129. Yu, S.; Zhao, Z.; Fang, H.; Deng, A.; Su, H.; Wang, D.; Gan, W.; Lu, C.; Wu, W. Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection. arXiv 2021, arXiv:2112.03649.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
