Article

Radar-Based Human Activity Recognition: A Study on Cross-Environment Robustness

by Reda El Hail 1,2,*, Pouya Mehrjouseresht 3, Dominique M. M.-P. Schreurs 3 and Peter Karsmakers 1,2

1 Department of Computer Science, Leuven AI, KU Leuven, B-2440 Geel, Belgium
2 Flanders Make, MPRO, B-3000 Leuven, Belgium
3 Waves: Core Research and Engineering (WaveCoRE), Department of Electrical Engineering (ESAT), KU Leuven, B-3001 Leuven, Belgium
* Author to whom correspondence should be addressed.
Electronics 2025, 14(5), 875; https://doi.org/10.3390/electronics14050875
Submission received: 25 December 2024 / Revised: 15 February 2025 / Accepted: 21 February 2025 / Published: 23 February 2025
(This article belongs to the Special Issue Feature Papers in Microwave and Wireless Communications Section)

Abstract

Indoor radar-based human activity recognition (HAR) using machine learning has shown promising results. However, deploying an HAR model in unseen environments remains challenging due to a potential mismatch between training and operational conditions. Such mismatch can be reduced by acquiring annotated training data in more diverse situations. However, since this is time intensive, this paper explores the application of data augmentation and unsupervised domain adaptation (UDA) to enhance the robustness of HAR models, even when they are trained using a very limited amount of annotated data. In the initial analysis, a baseline HAR model was evaluated using a validation set (a) from the same environment as the training data and (b) from a different environment. The results showed a 29.6% decrease in the F1-score when tested on data from the different environment. Implementing data augmentation techniques—specifically, time–frequency warping—reduced this performance gap to 17.8%. Further improvements were achieved by applying an unsupervised domain adaptation strategy, which brought the performance gap down to 13.2%. Furthermore, an ablation study examining various augmentation methods and synthetic sample quantities demonstrates the superior performance of our proposed augmentation approach. The paper concludes with a discussion on how environmental variations, such as changes in aspect angle, occlusion and layout, can affect the time-Doppler radar representation and, consequently, HAR performance.

1. Introduction

Human monitoring has attracted increasing attention over the last decades. Hospitals and assisted living facilities are in need of intelligent systems to help monitor patients and provide valuable information to medical staff. According to the European health newsletter [1], Europe faces an estimated shortage of about 1 million health workers due to brain drain and the increasing demand on healthcare systems. Furthermore, statistics [2] show that approximately 27% of elderly people (over 60 years) in the United States live alone, as do 40% of elderly women (aged 65 years or more) and 22% of elderly men in Europe [3]. This group of individuals would greatly benefit from human monitoring systems that increase their well-being and enable preventive health care measures.
Commercial millimeter wave (mmWave) radar solutions are increasingly used due to their low cost, small size and wide range of applications. They are used in medicine [4], indoor environments, and automotive and industrial applications [5]. More specifically, radar can be used for indoor monitoring solutions such as intruder detection, remote health, assisted living and smart environments.
MmWave radar technology can also be applied to indoor human activity recognition (HAR). This involves identifying human movements within indoor settings, which can include environments such as hospitals and assisted living facilities. By leveraging the high resolution and sensitivity of mmWave radar combined with deep learning techniques, it is possible to monitor and interpret a wide range of human activities. Recent studies have demonstrated its efficacy in detecting a range of activities, from gross motor movements like walking, falling, standing up, sitting down, bending and drinking [6,7] to finer actions such as hand gestures [8].
Various models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), are commonly employed [6]. These HAR models can be applied to diverse representations of radar data, including time-Doppler, range-Doppler and time-range maps and point cloud data. However, it is important to note that the performance of these models often degrades when they are deployed in a different environment, or even in the same environment when objects (such as furniture) are moved to locations that differ from those used during training of the HAR model [9]. This indicates that such models are sensitive to the environment. For instance, Bhavanasi et al. [9] showed that a CNN applied to time-Doppler maps suffered an accuracy drop from 78% to 62% when deployed in a new environment. Gorji et al. [10] showed that changing the layout of the environment altered the time-Doppler signatures, which degraded the model F1-score by 25% for some activities; they also showed that the model generalized poorly when moving to a more complex environment with more clutter and different aspect angles under which activities were observed. In Shah et al. [11], the model accuracy dropped in new rooms even though the model was trained on a dataset of three different rooms. Moreover, the aspect angle of the subject with respect to the radar while performing the activity is a parameter that affects the time-Doppler map signature, as shown by Muaaz et al. [12] and Yang et al. [13]. This indicates that radar signals carry information about the layout of the environment, which causes models to inherit dependencies on the source environment and limits their generalization performance in new (target) environments.
Cross-environment HAR relies on models that are robust to the domain shift in the acquired data caused by differences between environments. Such robustness can be obtained with large amounts of training data. However, suitable data are scarce and time-intensive to collect. Therefore, researchers are considering approaches that lower the need for annotated data when designing robust deep learning models. There are two main approaches. Either the training procedure is modified so that the resulting model is (more) agnostic to changes in the environment and can be used in multiple environments, or an existing HAR model is adapted towards a new environment prior to its use. For the former, data augmentation strategies can be used that modify training examples as if they were recorded in a different environment [14]. Kern et al. [15] showed that data augmentation on Doppler spectrograms enhanced gesture recognition accuracy. Meanwhile, She et al. [16] showed that data augmentation improved model robustness towards new, unseen people. For the latter, domain adaptation methods can be used that align the distribution of the features extracted from the source environment with those extracted from the target environment. For this purpose, the maximum mean discrepancy (MMD), the Wasserstein metric or other distribution distances are used along with adversarial training to make the extracted features environment agnostic [17]. Specifically for radar-based HAR, Wang et al. [8] suggested a method to map the features of the target environment to the features of the source environment by developing two strategies, called the maximum minimum adversarial approach and center alignment, which increased accuracy by 18%. However, they only consider hand movements of a static person. Note that the choice of input features for the HAR model also impacts its robustness. Spatial information, in particular, is challenging to generalize across different environments. For instance, an HAR model might learn that a person can only lie down at the position of the bed in Room A. If the bed is located elsewhere in Room B, the model may fail to classify the lying-down event correctly. Consequently, this study excludes methods that rely on spatial information, such as those proposed in [18].
To the best of our knowledge, no prior work on radar-based HAR modeling has evaluated the combined impact of data augmentation strategies and unsupervised domain adaptation on cross-environment robustness. This study addresses this gap by investigating these techniques in scenarios with limited annotated training data. The data augmentation method is evaluated using a newly collected real-life dataset and compared to and combined with a state-of-the-art domain adaptation strategy.
The paper is organized as follows: After the introduction, in Section 2, the methodologies used in radar signal processing and machine learning are briefly reviewed. In Section 3, a thorough description of the collected datasets is given. Section 4 discusses the technical specifications of the experimental setup followed by a comprehensive presentation and analysis of the results in Section 5 and Section 6. Lastly, a summary of the findings and future perspectives are presented.

2. Methodology

First, a typical design process used to create an HAR model is briefly reviewed. Raw radar data pertaining to human activities are pre-processed using signal processing techniques to obtain time-Doppler maps. The latter express the changes in speed of the different human body parts over a certain time span or window size (e.g., 2.5 s). Prior to feeding such a time-Doppler map as input to a machine learning HAR model, it is first normalized. In this work, the HAR model is trained to distinguish between six human movement activities: walk (A1), sit-down (A2), lay-down (A3), stand-up (A4), get-up (A5) and hand-movement (A6). Since multiple time-Doppler maps can be generated for a single activity (because the activity duration surpasses the window size), a post-processing procedure is used to give a single class prediction for each activity. Second, methods enabling the use of the HAR model in different environments are discussed: first, data augmentation is briefly touched upon; then, two domain adaptation strategies are explained. Figure 1 visualizes the way all parts fit together.

2.1. Pre-Processing

A standard signal processing chain is applied to extract time-Doppler features. As discussed in the introduction, spatial information is challenging to generalize across different environments. To address this, time-Doppler features are used in this work, as they minimize the influence of spatial information in the HAR model’s input. The time-Doppler features were calculated as follows: The raw IQ data are processed by a 2D Fast Fourier Transform (FFT) to derive a range-Doppler matrix per antenna pair. Next, all range-Doppler maps are accumulated, resulting in a single range-Doppler matrix with an improved Signal-to-Noise Ratio (SNR). Subsequently, the static object reflections are removed by subtracting the mean value over the Doppler axis. The resulting range-Doppler map is transformed to a base-10 logarithmic scale. Then, it is further processed by taking the maximum over the range dimension to end up with a Doppler vector. This process is repeated for each newly received radar pattern (a so-called frame). The resulting Doppler vectors are stacked to construct a time-Doppler map (matrix X). Examples of time-Doppler maps of the human activities collected in our dataset are shown in Figure 2.
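The chain above maps naturally onto a few lines of NumPy. The sketch below is a minimal reconstruction of the described steps, not the authors’ code: the IQ cube shape, the virtual-antenna layout and the clipping floor in the logarithm are assumptions.

```python
import numpy as np

def doppler_vector(iq_frame: np.ndarray) -> np.ndarray:
    """One radar frame -> one Doppler vector.

    iq_frame: complex IQ cube shaped (antenna_pairs, chirps, adc_samples),
    e.g. (12, 96, 64) for the 3 TX x 4 RX setup (shape is an assumption).
    """
    rd = np.fft.fft(iq_frame, axis=2)                     # range FFT over ADC samples
    rd = np.fft.fftshift(np.fft.fft(rd, axis=1), axes=1)  # Doppler FFT, zero speed centered
    rd = np.abs(rd).sum(axis=0)                           # accumulate antenna pairs -> better SNR
    rd = rd - rd.mean(axis=0, keepdims=True)              # remove static reflections (mean over Doppler)
    rd = 10.0 * np.log10(np.maximum(rd, 1e-12))           # base-10 log scale (floor avoids log(<=0))
    return rd.max(axis=1)                                 # max over range -> Doppler vector

def time_doppler_map(frames: list) -> np.ndarray:
    """Stack per-frame Doppler vectors into a time-Doppler map X (time x Doppler)."""
    return np.stack([doppler_vector(f) for f in frames], axis=0)
```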

2.2. Normalization

Time-Doppler maps show the time evolution of the Doppler bins. However, they are still dependent on the subject’s position: reflections of movements close to the radar have higher signal energies than those performed at a larger distance. Moreover, movements of small body parts like hands can generate low-energy time-Doppler signatures due to their small cross-section. To minimize these effects, normalization is used. As presented in Equation (1), the time-Doppler map is first centered and then divided by the Frobenius norm.
Consider a time-Doppler map X. The processed matrix Y is defined as

$$Y = \frac{X - \bar{x}}{\|X - \bar{x}\|_F} \quad (1)$$

where $\bar{x}$ is a scalar representing the arithmetic mean of the matrix X and $\|\cdot\|_F$ denotes the Frobenius norm.
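In code, Equation (1) is a direct two-step operation; NumPy’s default norm for a 2D array is exactly the Frobenius norm used here.

```python
import numpy as np

def normalize_td_map(X: np.ndarray) -> np.ndarray:
    """Center a time-Doppler map on its scalar mean, then scale by the Frobenius norm (Equation (1))."""
    centered = X - X.mean()
    return centered / np.linalg.norm(centered)  # Frobenius norm for a 2D array
```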

2.3. HAR Model Training

The time-Doppler map can be considered as an image. Therefore, unsurprisingly, popular image processing techniques—more specifically, deep learning methods—are used to automatically add semantics to the time-Doppler maps. Prior to training deep learning models on some representative dataset, a model architecture needs to be selected. The complexity of such a model architecture depends on the training set size: for small training sets, the model architecture complexity should also be kept small. Popular image model architectures contain a number of convolutional layers to extract image features, followed by fully connected layers that classify the image based on the previously calculated features. As deep learning models are prone to overfitting, mechanisms such as dropout, L2 regularization and batch normalization are usually employed to reduce this risk [19].

2.4. Post-Processing

Time-Doppler maps are calculated based on a fixed number of radar frames, which corresponds to a fixed time window. Each human activity has a different duration. Therefore, for each activity, a variable number of time-Doppler maps can be generated. For instance, a single “lay-down” activity generates more time-Doppler map examples in the dataset than a “sit-down” activity. In the post-processing stage, subsequent activities of the same type can be merged into a single activity label.

2.5. Data Augmentation

Data augmentation techniques inspired by computer vision, such as rotation and flipping, are not applicable to time-Doppler data since the results lack physical coherence. In radar-based HAR, it is required to have a representative variation of human movement in the training dataset. Persons perform activities differently based on their physical characteristics, personal habits and physical capabilities. For instance, an older person might rise from a chair more slowly than a younger, more agile individual. Furthermore, different body positions, furniture arrangements or radar placements can cause different aspect angles, causing changes in the measured Doppler profile. Hence, an acquired activity can be quite different depending on the environmental context and individual body mechanics. Capturing every possible variation within a single dataset would be an immense task. Rather than collecting an exhaustive range of participants, which would be prohibitively time-consuming and expensive, time–frequency warping methods can be used to simulate a richer diversity of human movement. In radar-based HAR, time and frequency warping mean stretching or shrinking the time and frequency axes of the time-Doppler map. Physically, this means that the movements are performed with lower or higher speeds, respectively [16]. However, frequency warping may also correspond to a different aspect angle of the movement, since this parameter is proportional to the radial velocity [13].
During the training process, these techniques were exploited jointly or separately so as to respect the physics of the movements. In other words, if the Doppler (frequency) axis is dilated, then the time axis is compressed proportionally. When the time or frequency axis is shrunk, zero padding is applied to retain the original time-Doppler dimensions, as shown in Figure 3; when it is stretched, cropping is applied. The warping factor is sampled from a uniform distribution over [0.4, 1.6]. This factor modulates the frequency or temporal axis, where values under unity result in axis compression while those exceeding unity produce axis dilation.
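A minimal sketch of the coupled warping is given below. It uses SciPy’s zoom for the resampling and centers the crop/zero-padding so that the zero-Doppler bin (placed at the center of the map) stays aligned; the centering choice and the linear interpolation order are assumptions rather than details taken from the paper.

```python
import numpy as np
from scipy.ndimage import zoom

def fit_to_shape(M: np.ndarray, shape: tuple) -> np.ndarray:
    """Center-crop or zero-pad a 2D map to the target (time, Doppler) shape."""
    out = np.zeros(shape, dtype=M.dtype)
    t, f = min(M.shape[0], shape[0]), min(M.shape[1], shape[1])
    ts, fs = (M.shape[0] - t) // 2, (M.shape[1] - f) // 2   # source offsets (crop)
    to, fo = (shape[0] - t) // 2, (shape[1] - f) // 2       # target offsets (pad)
    out[to:to + t, fo:fo + f] = M[ts:ts + t, fs:fs + f]
    return out

def coupled_timefreq_warp(X: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Coupled time-frequency warping: a factor a ~ U(0.4, 1.6) dilates the Doppler
    axis while compressing the time axis by 1/a, mimicking the same movement
    performed at a different speed."""
    a = rng.uniform(0.4, 1.6)
    warped = zoom(X, (1.0 / a, a), order=1)  # axes: (time, Doppler)
    return fit_to_shape(warped, X.shape)
```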
Other data augmentation methodologies, particularly frequency and time masking (Figure 4), are employed to simulate occlusion phenomena that may occur during data acquisition. Such occlusion events manifest when radar signals are impeded by objects in the sensor’s field of view, resulting in partial monitoring of body movement. These techniques serve to enhance model robustness by reproducing real-world scenarios where complete motion capture may be compromised.
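Both masking operations reduce to zeroing slices of the map; the mask widths below are illustrative assumptions.

```python
import numpy as np

def time_mask(X: np.ndarray, rng: np.random.Generator, max_width: int = 5) -> np.ndarray:
    """Zero all Doppler bins of a few consecutive frames (occlusion in time)."""
    X = X.copy()
    w = int(rng.integers(1, max_width + 1))
    t0 = int(rng.integers(0, X.shape[0] - w + 1))
    X[t0:t0 + w, :] = 0.0
    return X

def freq_mask(X: np.ndarray, rng: np.random.Generator, max_width: int = 3) -> np.ndarray:
    """Zero one or more Doppler bins across all time stamps."""
    X = X.copy()
    w = int(rng.integers(1, max_width + 1))
    f0 = int(rng.integers(0, X.shape[1] - w + 1))
    X[:, f0:f0 + w] = 0.0
    return X
```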

2.6. Unsupervised Domain Adaptation

To tackle the drop in performance due to a mismatch between the training and testing conditions of models, unsupervised domain adaptation (UDA) was proposed. The reader is pointed towards [20] for a survey. Popular UDA techniques, inspired by Generative Adversarial Networks (GANs), include Domain Adversarial Neural Networks (DANNs) [21] and Conditional Domain Adversarial Networks (CDANs) [22]. GANs involve a game in which two neural networks, the generator and the discriminator, compete against each other, with the generator creating synthetic data and the discriminator trying to distinguish between real and synthetic data, thereby improving their abilities through this adversarial process. Usually, the discriminator in GANs distinguishes data coming from the original dataset from those produced by the generator. In contrast, for both DANNs and CDANs, the discriminator tries to distinguish, based on the internal model features, between data collected from a source domain (the environments in which training data were collected) and a target domain (the environment in which the model will be used). The key difference lies in the input to the discriminator. DANN training results in an encoder that delivers a feature representation which makes it hard for the discriminator to discern whether the data came from the source or the target environment. CDANs extend this concept by conditioning the discriminator on both the feature representation and the corresponding class predictions. The DANN setup is visualized in Figure 5, and a similar architecture applies to CDANs with the addition of the conditional input to the discriminator.
Let us consider a labeled source dataset $\{X_i^{(s)}, y_i^{(s)}\}_{i=1}^{n}$ and an unlabeled target dataset $\{X_i^{(t)}\}_{i=1}^{m}$. We denote E as the encoder, D as the discriminator and C as the classifier. The objective function of DANNs is as follows:

$$\min_{E,C} \; \mathcal{L}_C\big(C(E(X^{(s)})), y^{(s)}\big) + \lambda \big[\mathcal{L}_D\big(D(f(X^{(s)}))\big) + \mathcal{L}_D\big(D(f(X^{(t)}))\big)\big]$$

$$\max_{D} \; \mathcal{L}_D\big(D(f(X^{(s)}))\big) + \mathcal{L}_D\big(D(f(X^{(t)}))\big)$$

where $f(\cdot)$ represents the feature transformation specific to each method. For DANN, $f(X) = E(X)$, focusing solely on the encoded features. In contrast, for CDAN, $f(X) = E(X) \otimes C(E(X))$ is a multilinear feature map that combines the feature representation and the classifier output of a source/target sample before being fed to the discriminator. The first objective minimizes the classifier loss ($\mathcal{L}_C$) together with the discriminator term ($\mathcal{L}_D$) by optimizing the parameter values in E and C; the discriminator term encourages the transformed source data to be similar to the transformed target data. A hyper-parameter $\lambda$ controls the trade-off between the two terms. The second objective searches for discriminator parameter values that maximize the discriminator accuracy.
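In practice, this min-max game is usually implemented with a gradient-reversal layer, so that a single backward pass updates the encoder and classifier against the discriminator [21]. The PyTorch sketch below shows one generic DANN update step under that construction; the module interfaces, the shared optimizer and the CDAN remark in the docstring are implementation assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda backwards."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def dann_step(encoder, classifier, discriminator, optimizer, x_src, y_src, x_tgt, lam=1.0):
    """One DANN update on a labeled source batch and an unlabeled target batch.
    For CDAN, the discriminator input would instead be the (flattened) outer
    product of the features and the class probabilities, f(X) = E(X) (x) C(E(X))."""
    optimizer.zero_grad()
    feat_src, feat_tgt = encoder(x_src), encoder(x_tgt)
    cls_loss = F.cross_entropy(classifier(feat_src), y_src)
    feats = torch.cat([feat_src, feat_tgt], dim=0)
    # Domain labels: source = 0, target = 1.
    dom = torch.cat([torch.zeros(len(x_src)), torch.ones(len(x_tgt))]).to(feats.device)
    dom_logit = discriminator(GradReverse.apply(feats, lam)).squeeze(1)
    dom_loss = F.binary_cross_entropy_with_logits(dom_logit, dom)
    (cls_loss + dom_loss).backward()  # the reversal makes the encoder fight the discriminator
    optimizer.step()
    return cls_loss.item(), dom_loss.item()
```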

3. Dataset

Regrettably, only a limited number of datasets [9,23,24,25] are publicly available online. These datasets do not enable performing the targeted cross-environment experiments, and they differ in the monitored activities and radar features, which makes it impossible to fuse them. For instance, some datasets provide only point clouds, radar cubes, range-Doppler maps or time-Doppler maps. For these reasons, a proprietary dataset was collected.

3.1. Recording Environment

Data collection was conducted across two distinct environments, involving a total of 12 participants. All participants were male, ranging in age from 24 to 45 years.
  • Environment A: The first room was equipped with furniture to mimic a hospital environment (Figure 6). It had dimensions of 2.3 m × 5.8 m × 3.25 m and contained two medical beds, a table and medical equipment. The radar was mounted on the ceiling at an angle of 45°. The two beds were placed perpendicular to the line of sight of the radar, at an angle of approximately 45° with respect to each bed. The chair was placed near a table underneath the radar, in between the two beds.
  • Environment B: The second room was organized to mimic an ambient assisted living situation (Figure 7). The room had dimensions of 2.3 m × 5.8 m × 3.25 m and included a sofa, a TV, a table with four chairs and a sink. The radar was again mounted on the ceiling, covering the whole room in its field of view. Environment B was more cluttered than environment A, as it contained more items and had less free space.

3.2. Radar Sensor Configuration

In this work, a Frequency Modulated Continuous Wave (FMCW) radar from Texas Instruments (Dallas, TX, USA) (IWR6843AoP) was used. The considered mmWave radar operates in the range of 60 GHz to 64 GHz and was designed for automotive and industrial applications. It allows the control of the transmitted signal waveform by defining chirp and frame parameters. Parameters can be configured to adjust the range resolution, maximum range, velocity resolution and maximum velocity [26]. In this work, the configuration in Table 1 was used.
It is essential to use a radar configuration with a Doppler bandwidth that encompasses the range of human movement speeds. Additionally, range resolution is a critical parameter for localization as it determines the minimum distance required to distinguish between two objects in the range profile. Our data collection was conducted using a configuration that provides a 12 cm range resolution and a 0.093 m/s Doppler resolution, as detailed in Table 1.
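The resolution figures in Table 1 follow directly from the chirp parameters; the short check below reproduces them. The per-chirp repetition interval is not listed in the table, so the value used here is back-computed from the Doppler figures and should be read as an assumption.

```python
C = 3e8  # speed of light, m/s

bandwidth_hz = 1180.05e6       # from Table 1
wavelength_m = 4.94e-3         # from Table 1
n_chirp_loops, n_tx = 96, 3    # from Table 1
chirp_interval_s = 92.2e-6     # assumed chirp repetition interval (not in Table 1)

range_res = C / (2 * bandwidth_hz)                                          # ~0.127 m
doppler_res = wavelength_m / (2 * n_chirp_loops * n_tx * chirp_interval_s)  # ~0.093 m/s
max_doppler = wavelength_m / (4 * n_tx * chirp_interval_s)                  # ~4.47 m/s
print(f"{range_res:.3f} m, {doppler_res:.3f} m/s, {max_doppler:.3f} m/s")
```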

3.3. Data Collection

To avoid any contrived data, participants were not given instructions on how to perform the movements. The participants were only asked to perform a specific sequence of activities; other than that, they had the freedom to choose how to execute them. Next to the radar data, camera images were also stored. During the data annotation process, the onset and offset times of activities were marked on the time-Doppler maps with the help of the camera images. For certain activities such as “lay-down” (on the bed), a participant could first “sit-down” on the bed, after which the participant lay down. In such a situation, the first part was annotated as “sit-down” and the second as “lay-down”. A similar situation occurred for “get-up” (from the bed), where some participants first sat up and then, after a few seconds, stood up.
The various human activities documented in this research are presented in Figure 8.
Informed consent was obtained from all subjects involved in the study. For the two environments, the following data were collected:
  • Environment A: Ten participants each repeatedly performed four different scenarios five times. Each scenario contained a sequence of activities. In total, the dataset contains 246 “walk”, 96 “sit-down”, 96 “stand-up”, 42 “hand-movement”, 108 “lay-down” and 108 “get-up” activities.
  • Environment B: Five participants each repeatedly performed a sequence of activities five times. These data contain 78 “walk”, 46 “sit-down”, 48 “stand-up”, 25 “hand-movement”, 25 “lay-down” and 24 “get-up” activities.

4. Experimental Setup

This section explains the rationale behind the amount of temporal context provided as input to the HAR model; then, the model architecture and training parameters are described, concluding with the evaluation metrics used to assess model performance.

4.1. Temporal Context

In order to let the neural network predict the activity based on the time-Doppler map, a sufficient amount of temporal context (or length of the time horizon, which is determined by the number of considered radar signal frames) is required. To ensure that the longest activity can be fully represented in the time-Doppler map, a temporal context of 2.5 s was chosen. Note that for the “walking” activity, which can take much longer than 2.5 s, it is expected that within 2.5 s, at least a single full gait cycle will be covered by the considered temporal context.

4.2. Model Architecture

The activities are recognized using the CNN architecture shown in Figure 9. This classifier predicts the presence of six activities (walk, sit-down, lay-down, stand-up, get-up, hand-movement) based on time-Doppler maps. The model input is composed of 50 frames (which, in our radar setup, results in a temporal context of 2.5 s) and 64 frequency bins. The architecture of the CNN model was inspired by the work of Bhavanasi et al. [9]. The model has four convolutional layers with 8, 16, 32 and 64 kernels, respectively. Each convolutional layer is followed by max pooling along the frequency axis, except for the fourth convolutional layer. The Exponential Linear Unit (ELU) activation function, a variant of ReLU, is used due to its effectiveness against vanishing gradients and its faster learning process [27]. Subsequently, there are two fully connected layers comprising 32 perceptrons each. The output layer of the model employs a softmax activation, which transforms the output values into a probability distribution over the human activities. A filter size of three by three was chosen for all four convolutional layers. As the model goes deeper, it delves more extensively into specific features and has a larger receptive field [28]. To prevent overfitting, dropout, batch normalization and L2 regularization with a factor of 1 × 10⁻⁴ are jointly utilized. Given that our task involves classification, we employ categorical cross entropy as the loss function in conjunction with an Adam optimizer. Initially, the learning rate is set to 0.0001. However, to ensure optimal convergence, we dynamically reduce the learning rate as a function of the number of epochs, especially when the loss function is no longer decreasing. The DANN and CDAN domain adaptation methods require a discriminator network. In this paper, this network was composed of two dense layers, each consisting of 32 perceptrons. The output layer is a single perceptron with a sigmoid activation function. The sigmoid activation squashes the output values between 0 and 1, enabling the discriminator to make probabilistic predictions regarding the environment of the input data.
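For concreteness, a PyTorch sketch of a network matching this description is shown below. Only the kernel counts (8/16/32/64), the 3 × 3 filters, the frequency-axis pooling, ELU, the two 32-unit dense layers and the six-way output come from the text; padding, pooling size and the dropout rate are assumptions.

```python
import torch.nn as nn

class HARNet(nn.Module):
    """Sketch of the described CNN; input is a 1 x 50 (time) x 64 (Doppler) map."""
    def __init__(self, n_classes: int = 6):
        super().__init__()
        def block(c_in, c_out, pool):
            layers = [nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ELU()]
            if pool:  # pool along the frequency axis only
                layers.append(nn.MaxPool2d(kernel_size=(1, 2)))
            return layers
        self.encoder = nn.Sequential(
            *block(1, 8, True), *block(8, 16, True),
            *block(16, 32, True), *block(32, 64, False),
            nn.Flatten())                      # 64 channels x 50 x 8 = 25600 features
        self.classifier = nn.Sequential(
            nn.Linear(25600, 32), nn.ELU(), nn.Dropout(0.5),
            nn.Linear(32, 32), nn.ELU(), nn.Dropout(0.5),
            nn.Linear(32, n_classes))          # softmax is folded into the cross-entropy loss

    def forward(self, x):
        return self.classifier(self.encoder(x))
```

An Adam optimizer with a learning rate of 1 × 10⁻⁴ and a weight decay of 1 × 10⁻⁴ (as a stand-in for the L2 regularization) would match the training settings described above.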

4.3. Classification Evaluation Metrics

To assess the performance of the HAR models learned using the different alternative learning procedures, classification experiments are targeted. This means that it is assumed that the activities are first isolated (by some segmentation process) prior to being classified. Since the classifier model expects time-Doppler maps with fixed dimensions (corresponding to 50 frames or 2.5 s and 64 frequency bins) while each human activity has a different length, the following procedure is used. When an activity is longer than 50 frames, it is chopped into overlapping parts of 50 frames each along the time dimension with a hop size of one frame. If an activity has fewer than 50 frames, then zero padding is used to fill up the time-Doppler map. To assess the performance of the models, the averaged precision, recall and F1-scores are calculated either per time-Doppler map (window-based score) or per activity (activity-based score). For the latter, majority voting is applied to the predictions from all time-Doppler maps linked to the same activity.
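The windowing and voting logic reads as follows; the sketch assumes the activities have already been segmented, with helper names chosen for illustration.

```python
import numpy as np

def activity_windows(td_map: np.ndarray, win: int = 50, hop: int = 1) -> np.ndarray:
    """Chop one segmented activity (time x Doppler) into fixed-size windows;
    zero-pad activities shorter than `win` frames."""
    if td_map.shape[0] < win:
        pad = np.zeros((win - td_map.shape[0], td_map.shape[1]), dtype=td_map.dtype)
        td_map = np.concatenate([td_map, pad], axis=0)
    starts = range(0, td_map.shape[0] - win + 1, hop)
    return np.stack([td_map[s:s + win] for s in starts])

def activity_label(window_preds: np.ndarray) -> int:
    """Activity-based label: majority vote over the per-window predictions."""
    return int(np.bincount(window_preds).argmax())
```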

5. Experiments

This section provides the results of the experiments that were conducted. Firstly, a Leave-One-Subject-Out (LOSO) experiment is carried out as a baseline to assess the model performance when it is trained and validated on data collected from the same environment. Two training procedures, one with and the other without data augmentation, are compared to each other. Secondly, an experiment is carried out where the training and validation data were collected from two different environments. The training procedures with and without data augmentation are compared to a domain adaptation strategy. For all experiments, the encoder model—consisting of the convolutional layers of the CNN model—is first pretrained in an unsupervised manner using an encoder–decoder architecture and data from both environments, in a similar fashion to Seyfioglu et al. [29]. In this way, the encoder weights are initialized properly before fine-tuning them on the specific HAR task.

5.1. Baseline Performance in Single Environment

For this experiment, the data from environment A alone were used, as it contained the most activities. The training procedures with and without data augmentation were compared using LOSO cross-validation. Trivially, in this case, the training and validation data are collected from the same environment A. In the LOSO experiment, the model is trained using the data from nine participants and validated using the data from the remaining 10th participant. This process is repeated 10 times and the evaluation metrics are averaged. This approach reduces the risk of overfitting to specific participants. The same experiment is reproduced using data augmentation during the training phase. Activities with fewer samples are augmented using the random time–frequency warping technique so that all activity classes have the same number of examples as the largest activity class (walking in this dataset). This method compensates for the class imbalance in the dataset and pushes towards faster model convergence [16]. Experiments that further increased the training dataset size in all classes (while keeping them balanced) using data augmentation did not improve the results significantly.
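A compact expression of the LOSO protocol and the augmentation-based class balancing is sketched below; `augment` would be, e.g., the coupled warping function shown earlier. This is an illustrative reconstruction of the described procedure, not released code.

```python
import numpy as np

def loso_splits(subject_ids: np.ndarray):
    """Leave-One-Subject-Out: yield (train_idx, val_idx), one split per participant."""
    for s in np.unique(subject_ids):
        yield np.where(subject_ids != s)[0], np.where(subject_ids == s)[0]

def balance_by_augmentation(X, y, augment, rng):
    """Oversample minority classes with random time-frequency warping until every
    class has as many examples as the largest one (walking in this dataset)."""
    X, y = list(X), list(y)
    labels = np.array(y)
    target = max((labels == c).sum() for c in np.unique(labels))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        for _ in range(target - len(idx)):
            X.append(augment(X[int(rng.choice(idx))], rng))
            y.append(c)
    return np.stack(X), np.array(y)
```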
Without data augmentation, the window-based scores are as follows: a mean F1-score of 90.5 ± 6.7%, a mean recall of 91.4 ± 6.0% and a mean precision of 91.4 ± 6.1%. Models trained using data augmentation show a comparable performance, with a mean F1-score of 91.9 ± 5.2%, a mean recall of 92.3 ± 4.7% and a mean precision of 92.9 ± 4.3%.

5.2. Cross-Environment Robustness

After assessing the model performance in a single environment, the model robustness to a change of environment is assessed. For this purpose, a model is trained on data from environment A and validated on data from environment B, and vice-versa. Note that environment A has fewer pieces of furniture than environment B.
For both the training procedures with and without data augmentation, the experiment is repeated 10 times with 10 different (but fixed) realizations of the training and validation sets. The same model architecture as in the baseline experiment was used. The pretrained encoder model, together with the randomly initialized fully connected layers, is trained for HAR using data from a single source environment. Subsequently, the model is evaluated on the second, target environment. Evaluation metrics of the 10 experimental runs are shown in Table 2. Compared to the baseline result, there is a clear drop in performance. For example, when training a model without data augmentation using all data from environment B and evaluating it on that of environment A, the F1-score drops by 38.5% compared to the baseline experiment. When comparing the training procedures with and without data augmentation, it can be observed that data augmentation enhanced the model’s generalization capabilities: the activity F1-score increased by 11.8% when moving from environment A to environment B, and by 28.1% when moving from B to A. This indicates that moving from the more cluttered environment B to the less cluttered environment A results in better generalization capabilities than the other way around.

5.3. Domain Adaptation

In this section, the results of four domain adaptation experiments are summarized. In the first two experiments, domain adaptation is used with and without data augmentation, where data from environment A are used as source data and those of environment B as target data. In the second set of two experiments, the roles are reversed, with B as source and A as target. The same 10 realizations of the training and validation sets are used here to compare the alternatives. The encoder and classifier networks of the DANN and CDAN models are initialized using the trained models from the previous experiment. The discriminator network was initialized randomly. A regularization parameter (λ) of one is selected; preliminary analysis indicated that this was the most suitable value. A learning rate of 1 × 10⁻⁵ is used, the models are trained for 20 epochs and a checkpoint is saved after each epoch. Then, the model checkpoint with the highest performance is chosen.
The results in Table 2 indicate the benefit of using domain adaptation. Without data augmentation, the DANN strategy improved the activity F1-score in the transition from environment A to B by 5.7% (60.9% → 65.6%) compared to a model trained without data augmentation. When adapting a model that was trained using the augmented dataset, the activity F1-score is enhanced by 4.6% (72.7% → 77.3%) for DANN and by 3.3% (72.7% → 76.0%) for CDAN compared to a model trained with data augmentation (but without adaptation). Note that, for the adaptation, a target dataset was also used that was balanced over the classes using the suggested data augmentation. When transitioning from environment B to A, adaptation without data augmentation resulted in an improvement of 23.1% (52.0% → 75.1%) for DANN and 11.4% (52.0% → 63.4%) for CDAN. With data augmentation, adaptation caused the model to further improve by 5.6% (80.1% → 85.7%) for DANN and by 2.7% (80.1% → 82.8%) for CDAN. This suggests that adapting the learned representations of a model trained on a less complex domain to a more complex one presents greater challenges than the inverse scenario.
From the results shown in Table 2, the optimal generalization performance occurs when domain adaptation and data augmentation are jointly used. For this experiment, the DANN strategy turned out to have the best performance. An 85.7% F1-score is obtained when transitioning from environments B to A and 77.3% in the other direction. Notably, a model trained only using data augmentation exhibited a comparable F1-score with only 5% average difference relative to the model trained using the combined approach. Moreover, the data augmentation training approach demonstrated superior performance compared to the domain-adapted model trained on non-augmented data.

5.4. Ablation Study on the Impact of the Data Augmentation Configuration

Additionally, we conducted an ablation study to examine two key aspects of data augmentation: (1) the effect of varying the quantity of synthetically augmented samples on model performance; (2) the comparative effectiveness of different augmentation techniques. Our investigation systematically analyzed how model performance changes when increasing the augmented dataset, while also comparing distinct augmentation methods. In total, six augmentation methods were studied: frequency warping (freq-warp), time warping (time-warp), coupled time–frequency warping (timefreq-warp), uncoupled time–frequency warping (timefreq-warp-uncoupled) and two zero-replacement strategies (replace-bins-with-zeros and replace-frame-with-zeros). The experiment was conducted using five incrementally larger numbers of augmented samples (2k, 5k, 20k, 40k, 80k) to assess how augmentation effectiveness scales with data availability. The training process is performed on the dataset from environment A and the testing is performed on environment B. We performed five independent trials of each experiment and computed the mean F1-score across all runs to assess the model’s performance.
The analysis of the data augmentation methods (Figure 10) reveals clear trends. Time–frequency warping techniques consistently outperformed the other methods, with coupled time–frequency warping achieving the highest F1-score of 74.8% at the largest dataset size. As the dataset size grows, we observe a decrease in standard deviation, indicating more reliable and consistent performance. Time warping alone also demonstrated strong performance, particularly for smaller dataset sizes, and yielded performance comparable to time–frequency warping when trained on a large dataset. While the zero-masking strategies showed consistent performance across the different dataset sizes, with F1-scores ranging from 64 to 67%, they underperformed compared to the time–frequency warping and time warping strategies. Frequency warping yielded the lowest performance across all dataset sizes. Generally, the average F1-score increases as the size of the augmented dataset grows, with time-based methods showing substantial improvements, while frequency-based approaches and masking techniques remain relatively stagnant despite the increased data size.

6. Discussion and Perspectives

When observing the confusion matrices in Table 3, it is noted that the “lay-down” and “get-up” events are often confused in environment B. Here, the sofa (on which people were asked to lay down) is oriented at a 90-degree angle compared to the bed in environment A. It was observed that the participants performed these activities at a slower pace on the sofa than on the bed, which caused the time-Doppler maps to have speeds that concentrate around the zeroth Doppler bin. Typically, lying down on a normal bed involves sitting on the edge and then swinging up the legs while reclining backward; the height of the bed makes this process easier. The sofa has a lower height and uneven cushioning, which makes these movements dissimilar to those on the bed. The employed frequency warping technique is able to compress the time-Doppler signatures of environment A to make them more similar to those from environment B, which improved the model performance in these cases. However, it is expected that an increased resolution on the frequency axis will improve the representation of slow movements, since it might yield more nuanced and detailed patterns. This can be accomplished by changing the configuration of the FMCW radar in future experiments.
Additionally, in environment B, the chairs are positioned around the table, partially constraining the “sit-down” and “stand-up” events. These activities are further complicated as the participant has to move the chair to sit down or stand up. This is different from environment A, where the table was not obstructing these movements. In addition, the presence of the table may obstruct the radar signal, as there is no direct line of sight between some body parts and the radar.
To sum up, environmental factors play a crucial role in radar-based human activity recognition, with various elements impacting the quality and characteristics of radar signatures. The physical setup, including furniture type and placement, significantly influences how activities are performed—from the speed of movements to posture variations, as demonstrated by the differences between medical bed and sofa interactions. Environmental complexities such as occlusion from furniture and the presence of moving objects like chairs create additional challenges by altering the Doppler profile captured by the radar.

7. Conclusions

In this work, different training procedures and their results are assessed in terms of their robustness to unseen environments. For this purpose, a dataset that includes 12 participants in two different environments was collected. When assessing a model trained on data from one environment using data from the other, the model performance dropped significantly (29.6% degradation in F1-score) compared to when the model was evaluated on data from the same environment. When transitioning from a less to more complex environment, our experiments indicated that both data augmentation and domain adaptation increased the model performance in unseen environments. Data augmentation manifested a strong robustness improvement with an increase of 11.8% (60.9% → 72.7%) in F1-score. The use of the DANN strategy yielded the best generalization performance when combined with data augmentation resulting in a significant F1-score increase of 16.4% (60.9% → 77.3%). Furthermore, when the transition occurs from a more to a less complex environment, the F1-score improves by 28.1% (52.0% → 80.1%) with data augmentation and by 33.7% (52.0% → 85.7%) when the DANN strategy is jointly used with data augmentation. However, there is still a performance gap with the baseline performance of a model that was trained and validated with data from the same environment.
Empirical evidence demonstrates that, in scenarios with limited annotated data, the proposed proportional time–frequency warping combined with domain adaptation significantly enhances the robustness of FMCW-based HAR models. As a result, these techniques effectively reduce the need for extensive data collection and labeling. However, it is important to acknowledge the inherent limitations of data augmentation, particularly in simulating complex environmental phenomena such as occlusion. This highlights the need for a balanced approach, combining well-chosen variations in real-world measurements with simulated data, to support comprehensive model development in radar applications. In future work, this insight can be used to add data that include situations that cannot be directly simulated with data augmentation methods, for example, situations where furniture creates challenges for radar detection in two ways: first, when objects block the radar’s view of parts of the body, and second, when furniture influences how people move—such as when someone adjusts a chair while getting up from a table. Both scenarios affect the radar’s ability to capture accurate movement patterns. Furthermore, ideas from infrared object detection, such as in [30], to efficiently recalibrate features to address domain mismatch will be explored.

Author Contributions

Conceptualization, R.E.H. and P.K.; Methodology, R.E.H. and P.K.; Software, R.E.H.; Validation, R.E.H. and P.K.; Formal analysis, R.E.H. and P.K.; Resources, P.K.; Data curation, R.E.H. and P.M.; Writing—original draft, R.E.H.; Writing—review & editing, R.E.H., P.M., D.M.M.-P.S. and P.K.; Visualization, R.E.H.; Supervision, D.M.M.-P.S. and P.K.; Project administration, D.M.M.-P.S. and P.K.; Funding acquisition, D.M.M.-P.S. and P.K. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the NextPerception project funded from H2020 ECSEL-2019-2-RIA Joint Undertaking (GA No. 876487), CZ GA No. 8A20007; by project DistriMuSe funded from HORIZON-KDT-JU-2023-2RIA Joint Undertaking (GA No. 101139769), CZ GA No. 8A20007; and by the Flemish FWO project (grant number: G0B9821N).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Privacy and Ethical Review Commission under protocol code G-2024-8332. The ethical considerations were carefully reviewed to ensure compliance with the relevant guidelines, protecting the rights and privacy of all participants involved in the study.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data supporting the findings of this study are not publicly available at this time due to ongoing research but will be made available in the future. Interested researchers may contact the corresponding author for further details.

Acknowledgments

We would like to thank MobiLab for providing the Experience Laboratory for data collection.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rys, A. Health-EU Newsletter 250—Focus. European Commission. 2024. Available online: https://health.ec.europa.eu/other-pages/basic-page/health-eu-newsletter-250-focus_en (accessed on 24 November 2024).
  2. Pew Research Center. Older People Are More Likely to Live Alone in the U.S. Than Elsewhere in the World. 2020. Available online: https://www.pewresearch.org/short-reads/2020/03/10/older-people-are-more-likely-to-live-alone-in-the-u-s-than-elsewhere-in-the-world/ (accessed on 10 June 2024).
  3. Eurostat. Ageing Europe—Statistics on Housing and Living Conditions. 2024. Available online: https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Ageing_Europe_-_statistics_on_housing_and_living_conditions (accessed on 10 June 2024).
  4. Mehrjouseresht, P.; Hail, R.E.; Karsmakers, P.; Schreurs, D.M.P. Respiration and Heart Rate Monitoring in Smart Homes: An Angular-Free Approach with an FMCW Radar. Sensors 2024, 24, 2448. [Google Scholar] [CrossRef] [PubMed]
  5. Zeintl, C.; Eibensteiner, F.; Langer, J. Evaluation of FMCW Radar for Vibration Sensing in Industrial Environments. In Proceedings of the 2019 29th International Conference Radioelektronika (RADIOELEKTRONIKA), Pardubice, Czech Republic, 16–18 April 2019; pp. 1–5. [Google Scholar] [CrossRef]
  6. Ullmann, I.; Guendel, R.G.; Kruse, N.C.; Fioranelli, F.; Yarovoy, A. A Survey on Radar-Based Continuous Human Activity Recognition. IEEE J. Microwaves 2023, 3, 938–950. [Google Scholar] [CrossRef]
  7. Cao, L.; Liang, S.; Zhao, Z.; Wang, D.; Fu, C.; Du, K. Human Activity Recognition Method Based on FMCW Radar Sensor with Multi-Domain Feature Attention Fusion Network. Sensors 2023, 23, 5100. [Google Scholar] [CrossRef] [PubMed]
  8. Wang, J.; Zhao, Y.; Ma, X.; Gao, Q.; Pan, M.; Wang, H. Cross-Scenario Device-Free Activity Recognition Based on Deep Adversarial Networks. IEEE Trans. Veh. Technol. 2020, 69, 5416–5425. [Google Scholar] [CrossRef]
  9. Bhavanasi, G.; Werthen-Brabants, L.; Dhaene, T.; Couckuyt, I. Patient activity recognition using radar sensors and machine learning. Neural Comput. Appl. 2022, 34, 16033–16048. [Google Scholar] [CrossRef]
  10. Gorji, A.; Khalid, H.; Bourdoux, A.; Sahli, H. On the Generalization and Reliability of Single Radar-Based Human Activity Recognition. IEEE Access 2021, 9, 85334–85349. [Google Scholar] [CrossRef]
  11. Shah, S.A.; Fioranelli, F. Human Activity Recognition: Preliminary Results for Dataset Portability using FMCW Radar. In Proceedings of the 2019 International Radar Conference (RADAR), Toulon, France, 23–27 September 2019; pp. 1–4. [Google Scholar] [CrossRef]
  12. Muaaz, M.; Waqar, S.; Pätzold, M. Orientation-independent human activity recognition using complementary radio frequency sensing. Sensors 2023, 23, 5810. [Google Scholar] [CrossRef] [PubMed]
  13. Yang, Y.; Hou, C.; Lang, Y.; Sakamoto, T.; He, Y.; Xiang, W. Omnidirectional Motion Classification With Monostatic Radar System Using Micro-Doppler Signatures. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3574–3587. [Google Scholar] [CrossRef]
  14. Upadhyaya, S.; Buyens, W.; Vranken, E.; Desmet, W.; Karsmakers, P. Assessment of Data Augmentation and Transfer Learning for Making PIG Cough Classifier Robust to Changing Farm Conditions. In Proceedings of the 2023 International Conference on Machine Learning and Applications (ICMLA), Jacksonville, FL, USA, 15–17 December 2023; pp. 952–957. [Google Scholar] [CrossRef]
  15. Kern, N.; Waldschmidt, C. Data Augmentation in Time and Doppler Frequency Domain for Radar-based Gesture Recognition. In Proceedings of the 2021 18th European Radar Conference (EuRAD), London, UK, 16–18 February 2022; pp. 33–36. [Google Scholar] [CrossRef]
  16. She, D.; Lou, X.; Ye, W. RadarSpecAugment: A Simple Data Augmentation Method for Radar-Based Human Activity Recognition. IEEE Sens. Lett. 2021, 5, 1–4. [Google Scholar] [CrossRef]
  17. Long, M.; Wang, J. Learning Transferable Features with Deep Adaptation Networks. arXiv 2015, arXiv:1502.02791. [Google Scholar] [CrossRef]
  18. Lin, J.; Hu, J.; Xie, Z.; Zhang, Y.; Huang, G.; Chen, Z. A Multitask Network for People Counting, Motion Recognition, and Localization Using Through-Wall Radar. Sensors 2023, 23, 8147. [Google Scholar] [CrossRef] [PubMed]
  19. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  20. Liu, X.; Yoo, C.; Xing, F.; Oh, H.; El Fakhri, G.; Kang, J.W.; Woo, J. Deep unsupervised domain adaptation: A review of recent advances and perspectives. APSIPA Trans. Signal Inf. Process. 2022, 11, 1–51. [Google Scholar] [CrossRef]
  21. Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; March, M.; Lempitsky, V. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 2016, 17, 1–35. [Google Scholar]
  22. Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional adversarial domain adaptation. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, QC, Canada, 2–8 December 2018. [Google Scholar]
  23. Yang, S.; Le Kernec, J.; Romain, O.; Fioranelli, F.; Cadart, P.; Fix, J.; Ren, C.; Manfredi, G.; Letertre, T.; Sáenz, I.D.H.; et al. The Human Activity Radar Challenge: Benchmarking based on the ‘Radar signatures of human activities’ dataset from Glasgow University. IEEE J. Biomed. Health Inform. 2023, 27, 1813–1824. [Google Scholar] [CrossRef] [PubMed]
  24. Singh, A.D.; Sandha, S.S.; Garcia, L.; Srivastava, M. RadHAR: Human Activity Recognition from Point Clouds Generated through a Millimeter-wave Radar. In Proceedings of the 3rd ACM Workshop on Millimeter-Wave Networks and Sensing Systems, ACM, Los Cabos, Mexico, 25 October 2019; pp. 51–56. [Google Scholar]
  25. Jin, F.; Sengupta, A.; Cao, S. mmFall: Fall Detection using 4D MmWave Radar and Variational Recurrent Autoencoder. IEEE Trans. Autom. Sci. Eng. 2020, arXiv:2003.02386. [Google Scholar] [CrossRef]
  26. Iovescu, C.; Rao, S. The Fundamentals of Millimeter Wave Sensors; Texas Instruments: Dallas, TX, USA, 2017; pp. 1–8. [Google Scholar]
  27. Clevert, D.A.; Unterthiner, T.; Hochreiter, S. Fast and accurate deep network learning by exponential linear units (elus). arXiv 2015, arXiv:1511.07289. [Google Scholar]
  28. Araujo, A.; Norris, W.; Sim, J. Computing Receptive Fields of Convolutional Neural Networks. Distill 2019, 4, e21. Available online: https://distill.pub/2019/computing-receptive-fields (accessed on 30 November 2024). [CrossRef]
  29. Seyfioğlu, M.S.; Gürbüz, S.Z. Deep Neural Network Initialization Methods for Micro-Doppler Classification with Low Training Sample Support. IEEE Geosci. Remote Sens. Lett. 2017, 14, 2462–2466. [Google Scholar] [CrossRef]
  30. Zhang, R.; Xu, L.; Yu, Z.; Shi, Y.; Mu, C.; Xu, M. Deep-IRTarget: An Automatic Target Detector in Infrared Imagery Using Dual-Domain Feature Extraction and Allocation. IEEE Trans. Multimed. 2022, 24, 1735–1749. [Google Scholar] [CrossRef]
Figure 1. Complete design flow from data collection in environment A and signal processing, through training the neural network with and without data augmentation, to testing it on data from another environment B. When the unsupervised domain adaptation approach is employed, the model is first adapted towards the new environment (without the need for annotated data) before it is used for classification.
Figure 2. Time-Doppler maps of different activities.
Figure 3. Impact of different warping factors in the coupled time–frequency warping transform on time-Doppler maps. (a) The original time-Doppler map; (b) the augmented version using a frequency warping factor lower than 1; (c) the augmented map using a frequency warping factor higher than 1.
Figure 4. Effects of time and frequency masking on a time-Doppler map (a). In time masking, all Doppler bins associated with a specific frame are set to zero (b), whereas in frequency masking, a single Doppler bin is zeroed across all time stamps (c).
Figure 5. DANN architecture.
Figure 6. Environment A: Hospital Room. Radar indicated by red dashed line.
Figure 7. Environment B: Ambient Assisted Living. Radar indicated by red dashed line.
Figure 8. The range of human activities analyzed in this study.
Figure 9. The employed CNN model architecture.
Figure 10. Comparative analysis of data augmentation methods across dataset sizes (mean F1-scores ± standard deviation). The number of augmented data samples is indicated in the column names.
Table 1. Radar configuration parameters.

Parameter | Value
Number of transmit antennas | 3
Number of receive antennas | 4
Number of chirp loops | 96
Number of ADC samples | 64
Frame rate | 50 ms
Start frequency | 60.75 GHz
Wavelength | 4.94 mm
Slope | 54.71 MHz/µs
Bandwidth | 1180.05 MHz
Sampling frequency | 2950 ksps
Range resolution | 0.126 m
Maximum range | 6.46 m
Doppler resolution | 0.093 m/s
Maximum Doppler | 4.471 m/s
Table 2. Model robustness to changing environmental conditions, with and without data augmentation and/or unsupervised domain adaptation (mean ± standard deviation over 10 runs, in %).

Environment | Metric | Without Data Augmentation | Data Augmentation | DANN | DANN + Augmentation | CDAN | CDAN + Augmentation
A to B | Activity F1-score | 60.9 ± 4.3 | 72.7 ± 4.4 | 65.6 ± 1.9 | 77.3 ± 1.6 | 57.5 ± 3.7 | 76.0 ± 3.2
A to B | Activity recall | 61.9 ± 5.0 | 73.7 ± 3.7 | 66.2 ± 2.5 | 78.1 ± 1.3 | 60.8 ± 2.3 | 76.9 ± 2.8
A to B | Activity precision | 65.9 ± 6.9 | 75.6 ± 2.4 | 70.0 ± 1.4 | 79.6 ± 1.4 | 66.3 ± 2.7 | 78.1 ± 1.9
A to B | Window F1-score | 55.8 ± 3.5 | 70.7 ± 2.9 | 61.0 ± 1.4 | 74.6 ± 0.9 | 55.2 ± 3.3 | 73.2 ± 3.2
A to B | Window recall | 55.9 ± 4.5 | 69.1 ± 3.7 | 59.6 ± 1.7 | 75.6 ± 0.9 | 55.5 ± 2.4 | 73.2 ± 2.6
A to B | Window precision | 62.3 ± 1.6 | 74.9 ± 1.2 | 67.2 ± 0.9 | 76.0 ± 0.8 | 65.4 ± 1.6 | 75.4 ± 2.9
B to A | Activity F1-score | 52.0 ± 6.0 | 80.1 ± 2.8 | 75.1 ± 4.2 | 85.7 ± 3.1 | 63.4 ± 7.4 | 82.8 ± 4.3
B to A | Activity recall | 56.6 ± 5.0 | 82.3 ± 2.5 | 77.2 ± 3.4 | 86.3 ± 2.4 | 68.8 ± 4.2 | 83.5 ± 4.3
B to A | Activity precision | 55.1 ± 8.6 | 83.4 ± 2.9 | 77.5 ± 5.1 | 88.5 ± 2.0 | 63.8 ± 9.4 | 83.2 ± 4.6
B to A | Window F1-score | 36.7 ± 6.4 | 64.8 ± 1.8 | 71.3 ± 3.0 | 82.4 ± 2.8 | 60.9 ± 7.0 | 78.8 ± 2.7
B to A | Window recall | 47.5 ± 4.8 | 67.5 ± 1.6 | 73.6 ± 2.5 | 83.4 ± 2.1 | 65.0 ± 4.6 | 80.9 ± 2.6
B to A | Window precision | 37.7 ± 7.3 | 65.7 ± 1.8 | 72.7 ± 4.4 | 83.9 ± 2.4 | 64.6 ± 8.9 | 77.7 ± 3.0
Table 3. Comparison of confusion matrices: Env A to Env B vs. Env B to Env A (non-augmented and augmented data). Rows are true activities and columns are predicted activities; the diagonal elements correspond to correct classifications.

Env A to Env B (Non-Augmented) | Env A to Env B (Augmented)
True \ Pred | A1 | A2 | A3 | A4 | A5 | A6 | A1 | A2 | A3 | A4 | A5 | A6
A1 | 60 | 0 | 4 | 0 | 10 | 1 | 74 | 0 | 1 | 0 | 1 | 0
A2 | 0 | 26 | 13 | 3 | 2 | 0 | 1 | 35 | 8 | 0 | 0 | 0
A3 | 0 | 0 | 10 | 0 | 14 | 0 | 3 | 0 | 9 | 0 | 12 | 0
A4 | 1 | 8 | 2 | 24 | 11 | 0 | 2 | 6 | 0 | 28 | 9 | 0
A5 | 0 | 0 | 13 | 0 | 10 | 0 | 1 | 0 | 4 | 0 | 17 | 0
A6 | 0 | 0 | 0 | 0 | 0 | 25 | 0 | 0 | 0 | 0 | 0 | 25

Env B to Env A (Non-Augmented) | Env B to Env A (Augmented)
True \ Pred | A1 | A2 | A3 | A4 | A5 | A6 | A1 | A2 | A3 | A4 | A5 | A6
A1 | 37 | 2 | 0 | 2 | 3 | 0 | 46 | 0 | 0 | 0 | 0 | 0
A2 | 0 | 26 | 1 | 5 | 0 | 0 | 0 | 33 | 0 | 1 | 0 | 0
A3 | 4 | 19 | 6 | 0 | 7 | 0 | 1 | 2 | 30 | 0 | 3 | 0
A4 | 2 | 10 | 1 | 6 | 0 | 0 | 0 | 0 | 0 | 19 | 0 | 0
A5 | 18 | 0 | 9 | 0 | 6 | 0 | 2 | 0 | 24 | 0 | 9 | 0
A6 | 2 | 0 | 0 | 1 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 14
