Article

Multi-Feature Unsupervised Domain Adaptation (M-FUDA) Applied to Cross Unaligned Domain-Specific Distributions in Device-Free Human Activity Classification

School of Computer Science, University of St. Andrews, St. Andrews KY16 9SX, UK
* Author to whom correspondence should be addressed.
Sensors 2025, 25(6), 1876; https://doi.org/10.3390/s25061876
Submission received: 30 January 2025 / Revised: 25 February 2025 / Accepted: 26 February 2025 / Published: 18 March 2025
(This article belongs to the Special Issue Advances in Wireless Sensor and Mobile Networks)

Abstract

Human–computer interaction (HCI) drives innovation by bridging humans and technology, with human activity recognition (HAR) playing a key role. Traditional HAR systems require user cooperation and dedicated infrastructure, raising privacy concerns. In recent years, Wi-Fi devices have leveraged channel state information (CSI) to decode human movements without additional infrastructure, preserving privacy. However, these systems struggle with unseen users, new environments, and scalability, limiting real-world applications. Recent research has also demonstrated that the impact of the surroundings causes dissimilar variations in the channel state information at different times of the day. In this paper, we propose an unsupervised multi-source domain adaptation technique that addresses these challenges. By aligning diverse data distributions with target domain variations (e.g., new users, environments, or atmospheric conditions), the method enhances system adaptability by leveraging public datasets with varying domain samples. Experiments on three public CSI datasets, using a preprocessing module to convert CSI into image-like formats, demonstrate significant improvements over baseline methods, with an average micro-F1 score of 81% for cross-user, 76% for cross-user and cross-environment, and 73% for cross-atmospheric tasks. The approach proves effective for scalable, device-free sensing in realistic cross-domain HAR scenarios.

1. Introduction

The vision of a smart city relies on interactive data dissemination among smart devices through human interaction. Human involvement in commanding machines has traditionally relied on manual switching. In recent years, audiovisual techniques requiring little or no human intervention have matured, bringing the concept of an ultra-modern society closer to reality. Audiovisual and sensor-based networks can monitor an environment continuously for long periods with an unobtrusive sensing mechanism. This unobtrusive sensing underpins our ability to recognize human gestures using smart technologies, motivating an advanced field of sensing termed human activity recognition (HAR). HAR-based systems propagate instructions from humans to computers through gestures/activities without physical contact. Initial research on HAR-oriented systems mainly concerned audiovisual technologies using static or moving image data captured through cameras [1,2,3]. This kind of recognition task raises serious privacy issues since the collected data are interpreted into human-perceivable form. In- and on-body sensors have also been studied intensively for human motion tracking in coarse- and fine-grained applications. However, they have technical requirements, in particular sensor-mounted devices such as accelerometers and gyroscopes, and rely on a pre-installed setup with sensing devices attached to or placed inside a person's body [4,5,6,7,8,9]. This is not a convenient way of monitoring, and continuous examination is impractical because users struggle to carry such cumbersome devices all the time [10]. Wi-Fi-based systems have led to a breakthrough in the field of sensing due to their ubiquitous availability for data communication. This availability has been successfully leveraged for a wide range of operations including, but not limited to, health monitoring [11,12,13,14], fall detection [15,16], human gesture and location identification [17,18,19,20], activity recognition [21,22,23], object detection [24,25], and humidity estimation [26]. Inspiration for this modern device-free sensing comes from (i) an easy and convenient way of sensing the environmental impacts of an activity with the help of multi-path radio wave propagation, (ii) infrastructure-less operation with readily and extensively accessible radio devices, and (iii) high adherence to privacy and governance standards, as the wireless sensor data are complex and cannot be easily understood by intended or unintended users.
Over the past few years, wireless-fidelity (Wi-Fi) sensor-based networks have seen significant demand and rapid growth for transmitting data via radio signals, owing to their ease of large-scale deployment and high throughput. Recently, these sensor networks have been utilized for tracking human motion in laboratory environments using device-free wireless sensing (DFWS) technology. Wi-Fi communication relies on multi-path propagation of radio waves: signals arrive at the receiving antenna along several paths with different time delays, undergoing reflection, refraction, and scattering by the static and moving objects present in a sensing area. Movement within the sensing environment causes fluctuations in the received signals, and these fluctuations create distinct activity patterns at the receiving end that can be leveraged for motion detection.
However, the effectiveness of this technology depends on several factors, including sensitivity to changes in the sensing environment, the presence of previously unseen subjects, the ability to track multiple targets, data collection configurations, and the specific activities being monitored [27]. In other words, differences in domain distributions between the training and testing phases arise from these variations, which are highly likely to occur in real-world scenarios. These are called the environmental effects of surroundings on radio waves [28]. Static objects produce persistent impacts that remain consistent unless the sensing environment changes, whereas moving objects create a unique pattern associated with a particular activity. These systematic fluctuations establish a solid foundation for training a deep model to learn a distinctive mapping of an activity for its classification via channel state information (CSI). The CSI defines wireless channel propagation characteristics, including the consequences of fading and scattering on the system [29]. It is a complex representation of a unique pattern of static and dynamic vectors' movement in 3D space, which can be analyzed through vector magnitude and phase in complex polar form [30]. Since radio signals are heavily dependent upon environmental specifications, the performance of such trained deep models is tied to these conditions. These limitations and challenges motivate novel active research in similar domains. The recorded CSI data are assumed to be generated for a particular environment with certain characteristics, such as the position of surrounding objects, which are mainly responsible for constructing static vectors. If their positions change from the training to the testing data, network performance degrades [31,32]. Interference caused by moving objects other than the target in the sensed area can alter the recorded CSI from training to testing because it may produce additional phase shifts or change the magnitude of the dynamic vector, affecting the system's overall accuracy [30]. Transceiver hardware introduces noise, such as sampling frequency offset (SFO), which varies across hardware from different vendors; such hardware noise can distort the measurements and ultimately the accuracy of the system [33]. Interference generated by other nearby Wi-Fi devices and microwave appliances is another source of corruption in the recorded CSI data [34]. The CSI measurements are also sensitive to atmospheric conditions and can show dissimilarities in data recorded for the same activity at different times or on different days of the week [35]. Since multiple factors can restrict a system's performance even in a controlled environment, every time a system undergoes any change, it needs to be retrained with new data samples to map the CSI variations according to the recent modifications. Collecting new data samples is impractical, as it demands heavy data storage, maintenance, human labor, and time; retraining a model from scratch is also computationally expensive. We therefore seek a concrete solution that can help to generalize this field for more practical applications.
In this paper, we attempt for the first time to apply multi-source domain adaptation to an unsupervised model using Wi-Fi-based CSI data for cross-domain applications. Given that domain shifting carries the extra burden of generating, migrating, and annotating sufficient data, we take advantage of multiple scarce source domains to map their feature spaces to a target domain. Most public CSI datasets are collected in such a way that they cannot be used for cross-domain applications due to limited data samples per domain (a particular user/environment or a specific day of the week). However, the proposed model overcomes this problem by successfully leveraging data from diverse domains and aligning the distributions of each specific source domain with a target domain. In addition, we train domain-specific classifiers using domain-invariant representations acquired from diverse source domains. When these trained classifiers are used to predict target classes, they might classify them differently due to distribution mismatch near the decision boundaries. Ultimately, we align the decision boundaries of each classifier for the target domain. Figure 1 illustrates the workflow of our proposed methodology, detailing the process from collecting raw CSI measurements to classifying the activities being performed. (1) We first extract the raw CSI magnitude values. (2) We normalize them to the [0–255] RGB scale. (3) We apply PCA to select the features with maximum variation. (4) We use t-SNE embedding to find local relationships within the high-dimensional data by projecting them into a lower-dimensional space. (5) Spectrograms are generated using the STFT and fed to the model as input. (6) The model uses such spectrograms generated from diverse sources and adapts to a new target domain. (7) Finally, once optimized, the model classifies activities in this new domain.
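To make this preprocessing pipeline concrete, the following is a minimal Python sketch of steps (1)–(5). The function name, normalization constants, and PCA/t-SNE/STFT parameters are illustrative assumptions (the PCA width follows the 80-component choice reported in Section 6.1), not a definitive implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from scipy.signal import stft

def csi_to_spectrogram(csi_complex, n_pca=80, fs=200):
    """Sketch of steps (1)-(5): magnitude -> [0, 255] -> PCA -> t-SNE -> STFT."""
    mag = np.abs(csi_complex)  # (1) CSI magnitude, shape (packets, subcarriers)
    # (2) Normalize to the [0, 255] RGB-like scale.
    mag = 255 * (mag - mag.min()) / (mag.max() - mag.min() + 1e-8)
    # (3) PCA keeps the components with maximum variance.
    pcs = PCA(n_components=min(n_pca, mag.shape[1])).fit_transform(mag)
    # (4) t-SNE projects the components to 3 dimensions, preserving
    #     local neighborhood structure.
    emb = TSNE(n_components=3, init="pca").fit_transform(pcs)
    # (5) The STFT magnitude of each embedding dimension becomes one
    #     channel of an image-like (freq, time, 3) array.
    channels = [np.abs(stft(emb[:, k], fs=fs)[2]) for k in range(3)]
    return np.stack(channels, axis=-1)
```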
The motivation behind our proposed model stems from the advantages of leveraging diverse source distributions. Models trained with multiple sources develop richer feature representations and exhibit greater robustness when adapting to unseen target domains. This is because having multiple sources increases the likelihood of overlapping features with the target domain, facilitating smoother adaptation and reducing overfitting to a single source. Moreover, extracting common domain-invariant representations becomes more challenging in a SUDA (single-source unsupervised domain adaptation) setting when dealing with highly diverse source domains. Additionally, public CSI datasets are generally unsuitable for cross-domain analysis due to data scarcity within individual domains. This limitation can potentially be mitigated by incorporating multiple sources, which provides a larger training dataset and improves model performance in low-data scenarios. The main contributions of our work in this paper are summarized as follows:
Multi-source M-FUDA outperforms all the baseline methods in most of the transfer learning tasks performed on three publicly available CSI datasets used to create cross-user, cross-environment, and/or cross-atmospheric conditions in device-free HAR.
The proposed model is applied to a multi-source unsupervised domain adaptation (MUDA) setup and contrasted against a single-source unsupervised domain adaptation (SUDA) setup in which all the underlying sources are combined into a single-source versus target setting. The proposed model produces promising results, surpassing traditional domain adaptation methods for device-free sensing. Our findings suggest that aligning multiple domain-invariant representations with domain-specific classifiers near class boundaries improves generalization. This alignment is particularly effective for each pair of source and target domains. As a result, the model performs well across various domain-shifting tasks.
Empirical evaluation of various distance minimization approaches on one of the selected CSI datasets for each pair of source and target distributions indicates the suitability of maximum mean discrepancy (MMD) over others.
Extensive evaluation shows that the predictive outputs of classifiers from different CSI sources capture target samples far from the support of underlying sources with the involvement of discrepancy and contrastive semantic alignment losses. This shows the role of proposed alignment losses in reducing the gap between the classifiers.

2. Related Work

Early research in device-free HAR was mainly developed for controlled laboratory environments without changing system characteristics from the training to the testing domain [36,37,38,39]. Although the accuracy of these systems is quite high, they cannot be applied to a domain-shifting task without retraining the model on labeled data samples from the new target domain. Without a doubt, it is impractical to retrain models for every new domain, and sufficient labeled data for each new domain are not always available. Single-source and combined-source unsupervised domain adaptation (UDA) have been widely adopted in device-free HAR to confront these shortcomings [27,40,41]. FewSense [42] is a few-shot learning (FSL) approach [43,44] that uses two methods for feature generation. The authors used a trained feature extractor to produce feature embeddings directly from a support set, called direct feature matrix generation. They also utilized an untrained classifier with the feature extractor and optimized their weight matrix using the support set, called fine-tuned feature matrix generation. They then computed the cosine similarity between the latent features of a query set and the generated feature matrix to train the model for novel classes introduced in cross-domains. WiGR [45] and DFGR [46] are also FSL-based deep similarity evaluation networks for activity classification in cross-domain Wi-Fi sensing. Features extracted from a query sample are concatenated with the features of every support sample, and their similarity is computed in a CNN-based similarity evaluation network using episode-based training. Network parameters are updated until the dissimilarity between support and query sets is small. Convolutional neural networks (CNNs) have made remarkable contributions in the field of computer vision for image recognition tasks. A CNN, despite its low preprocessing requirements, can explore the spatial and temporal dependencies in an image, which is also very helpful for HAR applications. Moshiri et al. [47] used a public raw CSI dataset [48] to generate grayscale CSI images using black-and-white colormaps. These images were fed into two 2D convolutional layers with 3 × 3 filters, each followed by a 2 × 2 max pooling layer, ReLU activation, and batch normalization. Finally, a dense neural network with softmax activation was used to produce class predictions. The model's overall accuracy was 89.22%, with 91% accuracy for fall-like in-domain activities, making it suitable for healthcare applications. Moshiri et al. [49] produced pseudo-color RGB-scale CSI images and fed them to a 2D-CNN model. The model's overall accuracy was around 95%, and it outperformed long short-term memory (LSTM) and bi-directional LSTM in both training time and recognition accuracy. However, the model's applicability was only tested on domain-specific tasks. DASAN [50] uses pseudo-color RGB-scale CSI images to analyze system sustainability under cross-user domain-shifting variations using inter-domain and intra-domain alignment techniques. Changsheng Zhang and Wanguo Jiao [51] performed five transformations on time-series CSI data to convert them into images for the first time. These CSI transformations were Gramian angular sum field (GASF) transformation [52], recurrence plot (RP) transformation [53], Gramian angular difference field (GADF) transformation [52], short-time Fourier transformation [54], and Markov transition field (MTF) transformation [52].
A three-layer 2D-CNN architecture was employed to recognize the activity from the converted images. Their results showed the superiority of the RP transformation over the other techniques; however, this was determined through in-domain testing, which is most likely to achieve good results. The recognition accuracy of these CSI transformations in cross-domains is yet to be explored. Since CSI data streams vary in time and correlate with target actions over a certain period, we can treat them as CSI images with temporal dependencies and leverage the extraordinary capabilities of CNNs to identify such actions as an image recognition problem. There has been extensive research in the last few years on this very idea of representing CSI data streams as images and making the most of CNN-based architectures for device-free HAR. In the study of [55], a four-stage fall detection mechanism was presented. (1) They collected raw CSI data. (2) They denoised these data using the discrete wavelet transform (DWT) [56], which increased the signal-to-noise ratio and reduced the mean square error. (3) Short-time Fourier transformation [57] was applied to obtain CSI time-frequency diagrams for model training. Frequency-domain analysis is effective for eliminating environmental influences and can identify the same activities performed at different venues. (4) A pre-trained GoogLeNet model [58] was further trained using the time-frequency spectrograms for new domain classifications. Likewise, the work of [59] exploited time-frequency spectrograms generated from a principal component analysis (PCA) of the first 20 impactful data features extracted from multiple sub-carriers. A generator and two asymmetric classifiers were operated in an adversarial fashion to maximize the discrepancy near the decision boundaries of the classifiers. Finally, the generator was optimized to produce target data features in support of the source samples. This increased model accuracy under domain-shifting variations caused by unseen users and new locations.
Recent solutions for cross-domain Wi-Fi sensing are intended to align a single source domain to a single target domain or a multi-source combined domain to a single target domain. Fidora [60] is a wireless localization system designed to overcome Wi-Fi fingerprint inconsistencies caused by factors like body shape variations, background objects, and environmental changes. The novelty of their approach is the generation of augmented samples from collected CSI data using a data augmenter consisting of eight variational auto-encoders (VAE), one for each location. The augmented and original data were passed through feature extraction layers followed by classification and reconstruction layers. The classifier predicted location labels, while the reconstruction layers regenerated the original input samples. Compared to baseline models such as AutoFi [61], VAE-only, and FiDo [62], Fidora demonstrated a significant performance improvement, achieving 17.8% and 23.1% higher F1 scores in cross-user and cross-environment scenarios, respectively. WiGR [45] is a Wi-Fi-based few-shot learning system for gesture recognition, capable of domain adaptation across different environments. It employs supervised learning to generalize across new tasks using limited data. The CSI phase values, preprocessed using phase unwrapping and an FIR filter, were used for training. The model consists of a feature extraction subnetwork and a similarity discrimination subnetwork. The extracted features are combined, and similarity scores determine gesture classifications. WiGR outperformed competing models, including WiGeR [63], WiCatch [64], SignFi [65], and Siamese-LSTM [66], across different users, environments, and locations.
In our previous study [50], we examined the impact of transitioning users from the training phase to the testing phase on the predictive performance of an adversarial model named DASAN-MMD. This model utilized CSI data in a single-source-domain to single-target-domain setup. However, our proposed approach struggled to achieve high accuracy in some of the cross-user tasks due to the limited number of per-domain samples available in the public dataset [49]. Combining diverse source domains into a single common domain helps to enlarge the data distribution and has confirmed the effectiveness of SUDA methods in certain cases [27,41,59]; however, the improvement might not be significant or guaranteed in many domain-shifting tasks. This demands a better way to transform multi-source domains into a diverse target domain via multi-domain feature alignment at intermediate levels and near class boundaries. Our proposed model, Multi-Feature Unsupervised Domain Adaptation (M-FUDA), makes full use of multiple source distributions. First, it minimizes the mismatch among diverse source and target domains. Second, it reduces the disparity among domain-specific classifiers at decision boundaries and increases accuracy by averaging multiple classifier outputs over the target domain. Finally, it diminishes the distance between samples of positive classes and increases the distance between samples of negative classes belonging to diverse source and target domains. Our model is a modified version of a previously published work [67] which has shown its superiority over other MUDA methods [68] in numerous domain-shifting tasks due to its multi-level alignment strategy.

3. Preliminaries

3.1. Channel State Information (CSI)

The CSI represents detailed information about the state of a communication channel as it is affected by various environmental causes during signal transmission from the transmitter (TX) to the receiver (RX). Radio transmission is a multi-path propagation of Wi-Fi signals that travel along several paths, with line-of-sight (LOS) and non-line-of-sight (NLOS) contacts, from TX to RX in an indoor environment. During multi-path propagation, the emerging signals have different time delays and attenuate differently with varying phase shifts due to reflection, scattering, diffraction, fading, and interference produced by surrounding objects. In systems that support multiple-input multiple-output (MIMO) antennas, the CSI can efficiently exploit the spatial diversity of orthogonal frequency division multiplexing (OFDM) and can carry multiple copies of the same signal characterized by sub-carriers. The channel impulse response of the estimated CSI at the receiving end is defined as [33]
$$H_{x,y,z} = \sum_{i=1}^{N} A_i \, \exp\!\left(-j \cdot 2\pi \cdot d_{x,y,i} \cdot f_z / c\right) \tag{1}$$
Also,
$$H_{x,y,z} = \underbrace{\lvert H_{x,y,z} \rvert}_{\text{magnitude}} \, \exp\!\big(j \underbrace{\angle H_{x,y,z}}_{\text{phase shift}}\big) \tag{2}$$
where $N$ is the total number of propagation paths, of which the $i$-th path, from the $x$-th transmit antenna to the $y$-th receive antenna, has length $d_{x,y,i}$ and amplitude $A_i$; $f_z$ is the $z$-th sub-carrier frequency; and $c$ is the speed of radio waves in a vacuum, approximately $3 \times 10^8$ m/s. Equation (1) is the simplified form of the estimated CSI, neglecting the impacts of cyclic shift diversity (CSD), sampling time offset (STO), sampling frequency offset (SFO), and beamforming due to signal modulation and demodulation; TX and RX hardware imperfections; and software errors.
When all the objects in an indoor environment are static, an occupant's movement can be traced through characteristic changes in the CSI magnitude and phase shift over multiple sub-carriers caused by the target movement. These characteristic variations form the principal basis on which deep models are trained for device-free sensing. Suppose $a$ is a transmitted radio signal. Upon reception, it is estimated as $b = H \times a + n$, where $n$ is the noise vector, $b$ is the received signal, and $H$ is the CSI matrix. The estimated CSI is a 3D matrix of complex values that vary along the time axis; the time axis is the fourth dimension, capturing the target movement with respect to time. In a Wi-Fi-based MIMO system with $x$ transmit antennas and $y$ receive antennas divided among $z$ sub-carriers, the 4D CSI matrix is denoted as
$$H_{x,y,z,t} = [\mathrm{MATRIX}(\mathrm{CSI})]_{x \times y \times z \times t} \tag{3}$$
where $x$ and $y$ represent the spatial diversity, $z$ the diversity in the frequency domain, and $t$ the time diversity in terms of the data packets sent to the receiver.
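As a concrete illustration of Equations (1) and (2), the following Python snippet synthesizes the per-sub-carrier CSI for a toy multipath channel; the path lengths, amplitudes, and sub-carrier grid are invented for the example, not taken from any of the datasets used later.

```python
import numpy as np

C = 3e8  # speed of radio waves in a vacuum (m/s)

def csi_response(path_lengths_m, amplitudes, subcarrier_freqs_hz):
    """Per-subcarrier CSI following Eq. (1): a sum over N propagation
    paths of A_i * exp(-j * 2*pi * d_i * f_z / c)."""
    d = np.asarray(path_lengths_m)[:, None]       # (N paths, 1)
    A = np.asarray(amplitudes)[:, None]           # (N paths, 1)
    f = np.asarray(subcarrier_freqs_hz)[None, :]  # (1, Z subcarriers)
    return (A * np.exp(-1j * 2 * np.pi * d * f / C)).sum(axis=0)

# One LOS path plus two reflections, over 52 subcarriers near 2.4 GHz:
H = csi_response([5.0, 7.3, 9.1], [1.0, 0.4, 0.2],
                 2.412e9 + 312.5e3 * np.arange(52))
magnitude, phase = np.abs(H), np.angle(H)  # the Eq. (2) decomposition
```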

3.2. Wasserstein Distance

The Wasserstein distance measures the minimal cost required to match two dissimilar probability distributions by reshaping one into the other. Suppose $P$ and $Q$ are two different probability distributions of mass over a space. The minimal cost is the least amount of work needed to move mass $m$ over a distance $d$ in order to transform distribution $P$ into distribution $Q$. It has gained popularity in domain-shifting tasks due to its robust support for transferring knowledge from the source domain to the target domain even when the distributions are far apart. Mathematically, the 1D Wasserstein distance between probability distributions $P$ and $Q$ is defined as [69,70,71]
$$W(P, Q) = \inf_{\gamma \in \Gamma(P, Q)} \; \mathbb{E}_{(x, y) \sim \gamma}\big[\, \lVert x - y \rVert \,\big] \tag{4}$$
where $\Gamma(P, Q)$ is the set of all joint distributions with marginals $P$ and $Q$, and $\lVert \cdot \rVert$ is the L1 norm. When $P$ and $Q$ are represented as finite discrete samples, say $u$ and $v$ with $n$ samples each, the 1D Wasserstein distance simplifies significantly and is given by
$$W(P, Q) = \frac{1}{n} \sum_{i=1}^{n} \lvert u_i - v_i \rvert \tag{5}$$
where $u_i$ and $v_i$ are the sorted values of the samples $u$ and $v$, pairing points optimally in terms of transportation cost in a single dimension. Sorting the samples minimizes the transportation cost in one dimension, and computing their absolute differences gives the transportation cost for each pair of matched points. Finally, taking the mean yields the average optimal transportation cost, which is the objective of the 1D Wasserstein distance [71].
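A direct implementation of Equation (5) is short; this sketch assumes two equally sized sample sets and uses PyTorch tensors so it can serve directly as a training loss.

```python
import torch

def wasserstein_1d(u, v):
    """1D Wasserstein distance between two equal-sized sample sets
    (Eq. (5)): sort both, then average the absolute differences."""
    u_sorted, _ = torch.sort(u.flatten())
    v_sorted, _ = torch.sort(v.flatten())
    return (u_sorted - v_sorted).abs().mean()
```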

3.3. Correlation Alignment

Correlation alignment is another statistical approach that minimizes the difference between two covariance matrices computed from the extracted features of the source and target domains. Suppose the extracted source and target features are $X_s$ and $X_t$, respectively. These features are used to calculate pairwise correlations, defined as [72]
$$C_s = \frac{1}{n - 1}\left( X_s^{\top} X_s - \frac{1}{n} \, (X_s^{\top} \mathbf{1})(\mathbf{1}^{\top} X_s) \right) \tag{6}$$
$$C_t = \frac{1}{n - 1}\left( X_t^{\top} X_t - \frac{1}{n} \, (X_t^{\top} \mathbf{1})(\mathbf{1}^{\top} X_t) \right) \tag{7}$$
where $\mathbf{1}$ is a vector of ones and $n$ is the number of data points. $C_s$ and $C_t$ are the covariance matrices generated from $X_s$ and $X_t$, respectively.
Finally, the CORAL loss is computed as the squared Frobenius norm of the element-wise difference between the covariance matrices $C_s$ and $C_t$. The CORAL loss is defined as [72,73]
$$\mathcal{L}_{CORAL} = \frac{1}{4d^2} \, \lVert C_s - C_t \rVert_F^2 \tag{8}$$
where $\lVert \cdot \rVert_F$ represents the Frobenius norm, and $d$ is the dimensionality of $X_s$/$X_t$.
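For reference, a compact PyTorch sketch of Equations (6)–(8) follows; it assumes `Xs` and `Xt` are (batch, d) feature matrices with the same feature dimensionality d.

```python
import torch

def coral_loss(Xs, Xt):
    """CORAL loss (Eq. (8)): squared Frobenius distance between the
    source and target feature covariance matrices, scaled by 1/(4*d^2)."""
    d = Xs.size(1)

    def covariance(X):
        # Eqs. (6)/(7): covariance with the mean term subtracted out.
        n = X.size(0)
        ones = torch.ones(n, 1, device=X.device)
        return (X.t() @ X - (X.t() @ ones) @ (ones.t() @ X) / n) / (n - 1)

    return ((covariance(Xs) - covariance(Xt)) ** 2).sum() / (4 * d * d)
```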

3.4. Maximum Mean Discrepancy and Its Variants

Maximum mean discrepancy (MMD) is a statistical method that compares two probability distributions through their mean embeddings in a high-dimensional reproducing kernel Hilbert space (RKHS). The RKHS is a special type of space in which comparing two probability distributions is simpler. Suppose we have two probability distributions $P$ and $Q$. Their mean embeddings in the RKHS are defined as
$$\mu_P = \mathbb{E}_{x \sim P}[\phi(x)] \tag{9}$$
$$\mu_Q = \mathbb{E}_{x \sim Q}[\phi(x)] \tag{10}$$
where $\phi(\cdot)$ is a mapping function which maps the features into the RKHS. $\phi(\cdot)$ is associated with a characteristic kernel function $k(X_s, X_t) = \langle \phi(X_s), \phi(X_t) \rangle$, where $X_s$ and $X_t$ are the source and target domain features, respectively. In deep learning, the MMD loss is computed to minimize the discrepancy between distributions $P$ and $Q$ by measuring the difference between the source and target mean embeddings in the RKHS [74]:
$$\mathcal{L}_{MMD}(P, Q) = \lVert \mu_P - \mu_Q \rVert_{\mathcal{H}} \tag{11}$$
where $\mathcal{H}$ is the reproducing kernel Hilbert space (RKHS). When the kernel function used in MMD is replaced by a multi-kernel function $\{k_i(X_s, X_t)\}_{i=1}^{M}$, the strengths of multiple kernels can be leveraged to adapt to complex differences between the source and target distributions. These kernels extract features from different aspects of the data, with a balancing factor that decides the contribution of each aspect to the overall alignment of the two distributions, achieving better adaptation. Such an alignment technique is called multiple-kernel maximum mean discrepancy (MK-MMD) [75,76]. Mathematically, a multi-kernel function is defined as
$$K(X_s, X_t) = \sum_{i=1}^{M} W_i \, k_i(X_s, X_t) \tag{12}$$
where $W_i$ is a balancing factor with $W_i \geq 0$ and $\sum_{i=1}^{M} W_i = 1$. MMD aligns a single feature space between two probability distributions $P(X_s)$ and $Q(X_t)$ using a single kernel function $k(X_s, X_t)$ to compare samples. Alternatively, we can map multiple feature spaces, or joint feature representations, of two probability distributions into an RKHS such that their joint probability distributions are $P(X_s^1, X_s^2, \ldots, X_s^M)$ and $Q(X_t^1, X_t^2, \ldots, X_t^M)$, where $\{X_s^j\}_{j=1}^{M}$ and $\{X_t^j\}_{j=1}^{M}$ are the source and target samples, respectively. The single-layer kernel is then replaced with a joint kernel over individual layers, $K((X_s^1, \ldots, X_s^M), (X_t^1, \ldots, X_t^M))$, and this variant of MMD is called joint maximum mean discrepancy (JMMD) [77]. This mapping enables more complex and powerful alignment; however, it is computationally heavier. None of these techniques can guarantee a robust match between source and target distributions, because they do not explicitly consider class labels or their relationships; especially with imbalanced data, they may fail to align class-specific features effectively. Therefore, finding their suitability in particular scenarios is an exercise in trial and error.
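The following PyTorch sketch estimates the (squared) MMD of Equation (11) from two feature batches. The Gaussian bandwidths `sigmas` are illustrative assumptions; summing several kernels with equal weights is a simple instance of the multi-kernel form of Equation (12).

```python
import torch

def mmd_loss(Xs, Xt, sigmas=(1.0, 2.0, 4.0)):
    """Biased estimate of the squared MMD between two batches, using a
    sum of Gaussian kernels (Eq. (12) with equal weights W_i); a single
    sigma recovers plain MMD as in Eq. (11)."""
    X = torch.cat([Xs, Xt], dim=0)
    d2 = torch.cdist(X, X) ** 2  # pairwise squared Euclidean distances
    K = sum(torch.exp(-d2 / (2 * s ** 2)) for s in sigmas) / len(sigmas)
    n = Xs.size(0)
    k_ss, k_tt, k_st = K[:n, :n], K[n:, n:], K[:n, n:]
    return k_ss.mean() + k_tt.mean() - 2 * k_st.mean()
```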

4. Problem Definition

Let us suppose that we have $N$ different Wi-Fi CSI data source distributions such that $D = \{D_1, D_2, D_3, \ldots, D_N\}$, with probability distributions $\{P_{s_i}(X_{s_i}, Y_{s_i})\}_{i=1}^{N}$. These source distributions have distinct features in terms of the sensed environmental characteristics, taken from $N$ heterogeneous spaces, so the source distributions are not uniform, i.e., $D_1 \neq D_2 \neq D_3 \neq \cdots \neq D_N$. Each source domain is labelled such that $T_s = \{(X_{s_i}, Y_{s_i})\}_{i=1}^{N}$, where $X_{s_i} = \{x_{s_i}^{j}\}_{j=1}^{M_s}$ denotes the $j$-th sample of the $i$-th source domain, which contains $M_s$ data samples in total, and $Y_{s_i} = \{y_{s_i}^{j}\}_{j=1}^{M_s}$ are the corresponding ground-truth labels. A transfer learning task is to match these diverse source distributions to a target distribution $P_t(X_t, Y_t)$ with domain data $T_t = \{X_t\}$, where $X_t = \{x_t^{j}\}_{j=1}^{M_t}$ has no ground-truth labels $\{y_t^{j}\}_{j=1}^{M_t}$. This target domain is distinct from all the source domains, i.e., $D_t \neq \{D_1, D_2, D_3, \ldots, D_N\}$. Suppose these diverse source domains are mapped into a common feature space $D_s$ and aligned with the target domain $D_t$ using traditional domain generalization methods. Such adversarial learning would mainly focus on the common domain-invariant representations shared by all domains, which is not an optimal way to generalize these diverse domains, because such systems cannot align the myriad domain-invariant multi-source data features in a single common feature space. This is likely to lead to poor performance in domain-shifting Wi-Fi sensing tasks.
This problem definition leads to the research question addressed in this paper: how best to align wireless sensor data from multi-feature domain distributions continuously to a new domain with no label observations, so that the system can maintain high accuracy under heterogeneous domain discrepancies. A reasonable solution is an unsupervised network with multiple subnetworks, each used to map a pair of source and target domains into a specific feature space and match their distributions. This entails minimizing the distance between domain-specific feature spaces by measuring the difference between two probability distributions, utilizing domain-specific classifiers to classify each domain's labels, and finally aligning the domain-specific classifiers' outputs for the target domain via domain discrepancy minimization near the domain-specific decision boundaries [67]. Domain discrepancy minimization refers to the process of reducing the differences between a source domain and a target domain, each belonging to distinct data distributions, to improve model generalization across them. Additionally, reverting to point-wise surrogates of the source and target distribution distances and similarities might be effective in multi-domain generalization problems [78]. The main objectives of this paper are to apply the multi-feature unsupervised domain adaptation techniques discussed above to cross unaligned domain-specific distributions in device-free human activity classification, so that the model can draw full benefit from a multi-source environment for an optimal solution of the problem; to investigate whether the multi-source unsupervised domain adaptation (MUDA) setup successfully matches diverse source distributions to a target domain; and to outmatch the classical single-source unsupervised domain adaptation (SUDA) methods used previously as multi-source combined solutions.

5. Materials and Methods

5.1. Proposed Method

This paper applies an unsupervised wireless sensor alignment approach to multiple cross-domains affected by one or more types of variations caused by unknown users, unseen environments, and/or surrounding atmospheric conditions in human activity recognition using multi-feature unsupervised domain adaptation (M-FUDA), inspired by the work of [67]. M-FUDA is a four-stage alignment procedure intended to efficiently use the diverse source domain features in domain transferring tasks. In this section, we will first present an overview of the proposed approach and then describe the technical details of each component.

5.2. Overview

Prior research in the field of device-free sensing has mainly focused on adversarial models trained on combined multi-source data features to extract domain-invariant representations of all domains and map them onto a new target domain, matching cross-domains with inconsistency problems [27,41,59]. However, it is difficult for an adversarial model to efficiently extract domain-invariant features from a diverse range of source domains, and one-on-one mapping of these domains in a common feature space aligned with the target domain is also not convenient. This leads to poor recognition accuracy in many domain-shifting tasks. Researchers have also employed pre-trained models for transforming domains in device-free HAR in order to minimize the computational cost and training time and to deal with training data scarcity. However, transfer learning-based approaches are not very effective at reducing the required proportion of training data, as investigated by the work of [79], which achieved results comparable to adversarial models with 500 training data samples, using a pre-trained roaming model already trained between environments A and B to learn quickly between environments A and C. Again, common feature extraction from multiple source domains is no easier with transfer learning-based approaches, especially when the training data samples of a new domain are very limited. Therefore, our proposed model combines a shared pre-trained model with several adversarially trained domain-specific sub-networks. The first stage uses the benefits of transfer learning to extract domain-invariant representations of different pairings of multiple sources with a target domain. The second stage uses domain adaptation techniques to align each pair of source and target domains into a specific feature space and match their distributions. To the best of our knowledge, this is the first attempt to test multi-source domain adaptation with a transfer learning-based method to address the domain inconsistency problem in human activity classification using device-free sensing.

5.3. Domain Invariant Feature Alignment

The first part of our proposed architecture is a common feature extractor. This sub-network is a shared framework for the $N$ mutually distinct source domains and an unseen target domain whose features differ from all the source distributions. It helps the model map common domain-invariant features from their original feature spaces into a common feature space. Suppose we have a batch of $L$ samples $\{x_{s_i}^{j}\}_{j=1}^{L}$ belonging to the $i$-th source domain and a batch of $L$ samples $\{x_t^{j}\}_{j=1}^{L}$ belonging to the target domain. After passing through the common feature extractor $F(\cdot)$ with mapping parameter $\theta_f$, they are transformed into $Z_{f_i}^{s_j} = F(\{x_{s_i}^{j}\}_{j=1}^{L}; \theta_f)$ and $Z_f^{t} = F(\{x_t^{j}\}_{j=1}^{L}; \theta_f)$. However, it would take a long time for the sub-network to converge to a common feature space from a diverse range of source distributions. Therefore, we utilize transfer learning with the help of a pre-trained convolutional neural network (CNN) model. The sub-network training starts with fine-tuning a ResNet50 [80] architecture pre-trained on the ImageNet 2012 dataset. We follow the fine-tuning steps explained in the study of [81]. ImageNet 2012, part of the ImageNet project [82], contains 1000 classes of images including animals, everyday objects, tools, etc. A ResNet50 architecture pre-trained on 1.2 million such images from the ImageNet project is the shared sub-network for all the source and target domains. We further fine-tune the sub-model weights with the diverse multi-domain source and target domain data samples to make the feature vectors of all the domains as similar as possible and minimize domain discrepancy. The input to the ResNet50 sub-network is 3D CSI pseudo-color map images/spectrograms of size 64 × 64 × 3 (height, width, RGB channels). Figure 2 shows the architectural details of ResNet50 for fine-tuning the shared sub-network weights.
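As a sketch of this stage, the shared sub-network can be built from torchvision's pre-trained ResNet50 by discarding its 1000-way classification head; the variable names are ours, and the fine-tuning loop itself is omitted. This is an illustration of the idea, not the exact implementation.

```python
import torch.nn as nn
from torchvision import models

# Stage 1 shared sub-network: an ImageNet-pre-trained ResNet50 with the
# final fully connected layer removed, so each CSI spectrogram batch is
# mapped to a shared 2048-dim feature vector (shape (B, 2048, 1, 1)).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
shared_extractor = nn.Sequential(*list(backbone.children())[:-1])  # drop fc

# During fine-tuning, these shared weights receive a learning rate 10x
# smaller than the domain-specific stages (see Section 6.2.4).
```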

5.4. Domain-Specific Feature Alignment

As mentioned earlier, it is not always possible for an adversarially trained model to reliably extract similar data features from various dissimilar source distributions. To tackle this problem, we feed the common domain-invariant representations received from the Stage 1 shared sub-network into $N$ unshared CNN-based domain-specific feature extractors in groups of batches. Each domain-specific feature extractor maps a pair of source and target domains into a specific feature space by producing target representations in support of the paired source domain. Suppose $H_i(\cdot)$ is the $i$-th domain-specific feature extractor mapping function with mapping parameter $\theta_{h_i}$, fed with the source- and target-generated common data features $Z_{f_i}^{s_j}$ and $Z_f^{t}$, respectively. The domain-specific feature extractor produces the domain-specific feature spaces $Z_{h_i}^{s_j} = H_i(Z_{f_i}^{s_j}; \theta_{h_i})$ and $Z_{h_i}^{t} = H_i(Z_f^{t}; \theta_{h_i})$. To achieve this goal, we test several statistical methods that minimize the distance between two probability distributions so that they become as closely correlated as possible. We evaluated the strength of three well-known domain discrepancy reduction techniques for our device-free cross-domain HAR, namely maximum mean discrepancy (MMD) [74], multiple-kernel maximum mean discrepancy (MK-MMD) [75,76], and joint maximum mean discrepancy (JMMD) [77], and chose the best one in terms of suitability for this HAR application. We add these distance minimization losses to the backpropagation loss of the classifier output, which forms the third stage of the proposed architecture. Figure 3 depicts the architectural details of the domain-specific feature extractors connected with the Stage 1 shared sub-network illustrated in Figure 2.

5.5. Domain-Specific Classifier Alignment

A domain-specific classifier is a classification model trained on a source domain paired with the target domain and optimized for classifying unseen samples from the target domain. It is specifically designed to handle domain shifts, ensuring good generalization to the target domain despite differences in data distribution. Adversarial models trained for shifting domains usually tend to produce a one-to-one mapping between two distinct feature spaces. However, this approach is effective for coarse-grained applications in which the two feature vectors are not significantly different. As the domain-transforming tasks considered in our case involve fine-grained variations, where the subject, environment, and/or atmospheric conditions change completely from the source to the target domain, such models are not expected to be very efficient at achieving good recognition accuracy for fine-grained domain-shifting tasks in device-free HAR. These significant variations can have an impact near the decision boundaries of the classifier output and can cause misclassification of similar source and target data features owing to dissimilarities created near the decision boundaries. Models trained to transform a single source domain into a target domain often struggle to classify distinct classes near decision boundaries due to overlapping feature distributions. Data points near these overlaps exhibit characteristics of both classes, making it difficult for the model to assign high-confidence predictions. In our case, the source samples come from diverse distributions, which are highly likely to suffer from overlapping feature distributions and uncertainty in classifying target classes near decision boundaries.
To address this, the third stage of our proposed model involves $N$ domain-specific predictors. Each receives a pair of source and target domain-specific invariant data features from the Stage 2 output of the $N$ unshared domain-specific feature extractors. Each classifier uses a softmax layer to divide the probability distribution among $K$ classes. We exploit the disagreement of multiple classifiers in order to minimize the discrepancy near the decision boundaries. Suppose $C_i(\cdot)$ is the $i$-th domain-specific classifier mapping function with mapping parameter $\theta_{c_i}$, fed with the Stage 2 paired source and target domain-specific data features $Z_{h_i}^{s_j}$ and $Z_{h_i}^{t}$, respectively. The domain-specific classifier produces the probability distribution over $K$ classes such that $Z_{c_i}^{s_j} = C_i(Z_{h_i}^{s_j}; \theta_{c_i})$ and $Z_{c_i}^{t} = C_i(Z_{h_i}^{t}; \theta_{c_i})$. With $N$ such classifiers, each trained on a pair of source and target domain-specific data features from $N$ distinct source domains, the classifiers are likely to disagree on unseen target samples because of their mismatch near the class boundaries. Since the classifiers are trained on different source domains paired with a target domain, this disagreement arises naturally. Our objective is to minimize the discrepancy among the classifiers' outputs as much as possible and then take the average of all the classifiers' outputs as the model's final outcome. This final prediction carries more confidence because it is weighted by the mutual agreement of $N$ aligned classifiers on the target domain. To minimize this discrepancy, we use a discrepancy loss based on the absolute differences among all pairs of classifiers' probabilistic outputs. Suppose $P_c^{i_1}(y_{i_1} \mid x_{i_1})$ and $P_c^{i_2}(y_{i_2} \mid x_{i_2})$ are the softmax-layer probabilistic outputs of the $i_1$-th and $i_2$-th classifiers over $K$ total classes. Their discrepancy loss is defined as
$$\mathcal{L}_{disc} = \frac{1}{N (N - 1)} \sum_{i_2 = 1}^{N - 1} \; \sum_{i_1 = i_2 + 1}^{N} \frac{1}{K} \sum_{c = 1}^{K} \Big| P_c^{i_1}(y_{i_1} \mid x_{i_1}) - P_c^{i_2}(y_{i_2} \mid x_{i_2}) \Big| \tag{13}$$
The discrepancy loss is finally added to each classifier's classification loss, which is calculated using the negative log-likelihood loss (NLLoss), defined as
$$\mathcal{L}_{NLL} = -\frac{1}{L} \sum_{i = 1}^{L} \frac{1}{K} \sum_{c = 1}^{K} \log\big( P_c^{i}[\, y_i \mid x_i \,] \big) \tag{14}$$
where $L$ is the number of data samples in a batch.
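A PyTorch sketch of Equations (13) and (14) follows; it assumes each classifier ends in a LogSoftmax layer and that all classifiers receive the same target batch. The names are ours, and constant normalization factors do not change the optimization.

```python
import torch.nn.functional as F

def discrepancy_loss(probs):
    """Eq. (13): mean absolute difference between the probabilistic
    outputs of all pairs of the N domain-specific classifiers on the
    same target batch. `probs` is a list of (batch, K) probability
    tensors, one per classifier."""
    N, total = len(probs), 0.0
    for i2 in range(N - 1):
        for i1 in range(i2 + 1, N):
            total = total + (probs[i1] - probs[i2]).abs().mean()
    return total / (N * (N - 1))

# Eq. (14) on a labelled source batch, given LogSoftmax classifier
# outputs `log_probs` and integer labels `source_labels`:
# classification_loss = F.nll_loss(log_probs, source_labels)
```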
Figure 4 depicts the architectural details of domain-specific classifiers connected with the corresponding Stage 2 unshared domain-specific feature extractors, illustrated in Figure 3. These domain-specific classifiers follow a straightforward feed-forward neural network architecture with three fully connected layers.

5.6. Contrastive Semantic Alignment

The Stage 4 alignment of our proposed model is the contrastive learning and separation of the semantic probability distributions of paired source and target data features near the critical decision boundaries. This semantically aligned yet maximally separated mapping of the embedded subspace effectively generalizes distinct domains with an extremely low number of labeled target training samples, as demonstrated by the study of [78]. However, our proposed approach is based on an unsupervised adaptation technique with no target label observations. Therefore, our first step is to pre-label target samples. The previous three alignment stages have improved our model's confidence in target label predictions. We use this setup to generate pseudo-labels on target samples from each of the domain-specific classifiers and leverage them to minimize the intra-class discrepancy from their counterpart paired source samples, reducing the distance between samples within the same classes of the source and target domains. At the same time, we maximize their inter-class differences, pushing apart samples belonging to dissimilar classes of the source and target domains. This two-way optimization of intra-class and inter-class discrepancies helps improve the domain generalization process via a contrastive semantic alignment loss, another type of distance minimization method that is added to the other losses of the model and jointly aids the adaptation strategy. Finally, averaging the multiple classifiers' outputs increases the model's performance on the recognition task. Mathematically, we define it as
$$\mathcal{L}_{CSA} = \frac{1}{2}\,(1 - Y)\, D(x_{s_i}^{j}, x_t^{j})^2 + \frac{1}{2}\, Y \, \max\!\big(0, \; margin - D(x_{s_i}^{j}, x_t^{j})\big)^2 \tag{15}$$
where $D$ is the pairwise distance between source and target embeddings; $Y$ indicates the similarity between the source label and the target pseudo-label, taking the value 0 for similar pairs and 1 for dissimilar pairs; and the margin is set to 1, so that if the distance between a dissimilar pair exceeds this margin, that pair's loss becomes 0 to avoid unnecessary separation.
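Equation (15) can be written as a contrastive loss over paired source/target embeddings. This sketch assumes equally sized, index-aligned batches and encodes $Y$ exactly as described above; the function name is ours.

```python
import torch
import torch.nn.functional as F

def csa_loss(src_emb, tgt_emb, Y, margin=1.0):
    """Contrastive semantic alignment (Eq. (15)). Y is a (batch,)
    tensor: 0 where a source sample and a pseudo-labelled target sample
    share a class (distance pulled down), 1 for dissimilar pairs
    (pushed apart up to `margin`)."""
    dist = F.pairwise_distance(src_emb, tgt_emb)
    pull = 0.5 * (1 - Y) * dist ** 2
    push = 0.5 * Y * torch.clamp(margin - dist, min=0) ** 2
    return (pull + push).mean()
```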
Figure 5 shows the proposed architecture with the four-stage alignment losses combined into the overall loss for backpropagation. It shows the overall architecture of the proposed model, whose three blocks were explained separately in the layout discussions of Figure 2, Figure 3 and Figure 4. Additionally, it introduces all the domain generalization losses used at different stages of model training. Algorithm 1 lists all the training steps of the presented model. The total loss of the proposed architecture after the four-stage alignment technique is formulated as
$$\mathcal{L}_{total} = \mathcal{L}_{classification} + \alpha\, \mathcal{L}_{Distance\_Minimization} + \beta\, \mathcal{L}_{discrepancy} + \gamma\, \mathcal{L}_{Contrastive\_Alignment} \tag{16}$$
where $\alpha$, $\beta$, and $\gamma$ are weights used to emphasize the different losses. We fixed $\gamma = 0.02$ during model training and progressively scheduled $\alpha$ and $\beta$ using the formula $\frac{2}{1 + \exp(-10p)} - 1$, which changes gradually as the training progress $p$ advances [83].
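The schedule is a two-line function; as a sketch, with the progress variable $p$ running from 0 to 1 over training, the weight rises smoothly from 0 toward 1:

```python
import math

def adaptation_weight(p):
    """Weight schedule for alpha and beta (gamma stays fixed at 0.02):
    2 / (1 + exp(-10 p)) - 1, which grows from 0 toward 1 as the
    training progress p goes from 0 to 1 [83]."""
    return 2.0 / (1.0 + math.exp(-10.0 * p)) - 1.0
```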
Algorithm 1: Multi-Source M-FUDA Training
Data:
$N$ labelled source domains $\{D_{s_i}\}_{i=1}^{N} = \{\{x_{s_i}^{(j)}, y_{s_i}^{(j)}\}_{j=1}^{M_s}\}_{i=1}^{N}$ and an unlabelled target domain $D_t = \{x_t^{(j)}\}_{j=1}^{M_t}$
Result:
Optimized pre-trained ResNet50 ($F$) [80], $N$ trained domain-specific feature extractors $\{H_{i_1}\}_{i_1=1}^{N}$, and $N$ trained domain-specific classifiers $\{C_{i_2}\}_{i_2=1}^{N}$
(The step-by-step pseudocode of Algorithm 1 is provided as a figure in the original article.)
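Since the pseudocode figure cannot be reproduced here, the following is a hedged Python sketch of one training step of Algorithm 1, assembled from the helper functions sketched in the preceding sections (`mmd_loss`, `csa_loss`, `discrepancy_loss`, `adaptation_weight`). Batching details, equal batch sizes, and the use of MMD as the distance minimization loss are our assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def training_step(shared, extractors, classifiers, src_batches, x_t,
                  optimizer, p, gamma=0.02):
    """One optimization step of multi-source M-FUDA (Algorithm 1 sketch).
    `src_batches` is a list of (x, y) pairs, one per source domain."""
    alpha = beta = adaptation_weight(p)          # Section 5.6 schedule
    f_t = shared(x_t)                            # Stage 1: shared features
    loss, probs_on_target = 0.0, []
    for (x_s, y_s), H_i, C_i in zip(src_batches, extractors, classifiers):
        z_s, z_t = H_i(shared(x_s)), H_i(f_t)    # Stage 2: pair-specific feats
        loss = loss + F.nll_loss(C_i(z_s), y_s)  # Eq. (14), LogSoftmax heads
        loss = loss + alpha * mmd_loss(z_s, z_t) # distance minimization
        log_p_t = C_i(z_t)
        pseudo = log_p_t.argmax(dim=1)           # Stage 4 pseudo-labels
        Y = (y_s != pseudo).float()              # Y in Eq. (15)
        loss = loss + gamma * csa_loss(z_s, z_t, Y)
        probs_on_target.append(log_p_t.exp())    # probabilities on target
    loss = loss + beta * discrepancy_loss(probs_on_target)  # Eq. (13)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

At inference time, the final prediction averages the $N$ classifiers' probabilistic outputs on each target sample, as described in Section 5.5.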

6. Experimental Results

The main objective of our evaluation is to assess the performance of M-FUDA in tackling the main types of domain adaptation tasks in human activity recognition using device-free sensing identified in the previous discussion: domain-shifting inconsistency produced by varying users, environments, and/or atmospheric conditions between the training and testing phases of the model. All experiments in this study were performed on an Intel Core i7-9700K CPU (Intel Corp., Santa Clara, CA, USA) equipped with 32 GB of internal memory and an Nvidia RTX 3060 GPU (Nvidia Corp., Santa Clara, CA, USA).

6.1. Datasets

We accessed three publicly available datasets [41,49,84] for our model evaluation on 24 September 2024. We have not had access to information that could identify individual participants during or after data collection. For our first dataset, a personal computer using a Raspberry Pi as the monitoring point (MP) and a TP-Link Archer C20 as the access point (AP) communicated over a single antenna pair with 52 useful sub-carriers, and the Nexmon tool [85] was used to collect CSI data for seven daily activities, each repeated 20 times by 3 volunteers, resulting in 420 samples in total. We used the magnitude values of the complex CSI to create cross-user domain transformations from the source to the target domain using this dataset, which we call the Parisafm dataset [49]. For our assessment of combined cross-user and cross-environment variations, we utilized the Alsaify dataset [84], which was collected from 12 sets of activities repeated by 30 volunteers, with 20 trials each, over 30 useful sub-carriers among 3 antenna pairs, resulting in 90 effective data features (columns) of complex CSI. Out of the 12 sets of activities, we picked 6 activities performed by subjects 1, 2, and 3 in Environment 1; by subjects 11, 12, and 13 in Environment 2; and by subjects 21, 22, and 23 in Environment 3. Environments 1 and 2 featured line-of-sight (LOS) contact between Tx and Rx, whereas Environment 3 presented non-line-of-sight (NLOS) contact with an 8 cm wooden wall between Tx and Rx. In total, we used 3240 samples of CSI magnitude values. For the Parisafm dataset, we repeated the preprocessing steps presented in [49] to produce 3D CSI images, whereas the preprocessing of the Alsaify dataset started with normalizing the data values to [0–255] and applying Hampel filtering, followed by principal component analysis (PCA) to extract 80 impactful data features. These PCA components were then used to compute t-distributed stochastic neighbor embedding (t-SNE) into 3 component dimensions. Finally, we generated CSI spectrograms from the t-SNE output embeddings using the short-time Fourier transform (STFT) [86] at a sampling rate of 200. For our last and most naturally occurring domain-shifting task, varying atmospheric conditions from the training to the testing phase, we employed the dataset we call the Brinke and Meratnia dataset [41]. A mini-PC with an Intel Ultimate Wi-Fi Link 5300 NIC as the monitoring point and an access point (TP-LINK AC1750) communicated over a 3 × 3 MIMO antenna pair with 30 useful sub-carriers, giving 180 effective data features (columns), and the CSI Tool [87] was installed to collect CSI data for 6 daily activities repeated by 9 different participants over 6 days. For our assessment, we extracted the data samples of 9 experiments with 4 activities performed by 2 participants over 3 days; each activity was recorded for 50 trials per experiment, generating 5400 samples in total. We only utilized the CSI magnitude of these samples and replicated the preprocessing steps explained in the study of [49]. Finally, we obtained 3D CSI images using MATLAB (2024) pseudocolor plots. Figure 6 presents the t-SNE plots of all three datasets with the sample proportions given in Table 1. These plots show the divergence and heterogeneity of samples in different classes coming from a diverse range of users, environments, and varying atmospheric conditions.
This also demonstrates the difficulty of the domain adaptation tasks in the case of cross-domain alignments; the densely clustered data highlight the complexity of the domain transformation. In our cross-user experiments, we worked with seven activities from the Parisafm dataset [49]. For our cross-user and cross-environment experiments, we used six activities from the Alsaify dataset [84], and for the cross-atmospheric experiments, four activities from the Brinke and Meratnia dataset [41]. These activities are labeled in Table 1, and each color in Figure 6 corresponds to an activity label in Table 1.

6.2. Configuration and Hyperparameter Tuning

Inspired by the foundational work of [67], we initiated our model simulations using the same hyperparameters as those employed in their study in order to establish a consistent baseline for comparison. The architecture begins with a shared sub-network based on a pre-trained ResNet50 [80], which outputs a feature space of size (1, 1, 2048). This shared sub-network leverages the robustness of ResNet50, trained on the ImageNet 2012 dataset, to extract generalized features applicable across different domains.

6.2.1. Domain-Specific Feature Extractors

For each domain-specific task, the feature extractor comprises a combination of convolutional layers. Specifically, it includes two 2D CNN (1 × 1) layers and one 2D CNN (3 × 3) layer, followed by an average pooling layer of size (7, 7). This configuration produces a refined feature space of size (1, 1, 2042), as depicted in Figure 3. The 1 × 1 convolutions help with dimensionality reduction and fine-grained feature selection, while the 3 × 3 convolutional layer captures spatial relationships and patterns critical for domain-specific tasks.
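A possible PyTorch layout of one such extractor is sketched below. The intermediate channel widths, the layer ordering, and the use of adaptive pooling (standing in for the (7, 7) average pool when the incoming spatial size differs) are our assumptions; only the layer types and the 2042-dim output follow the description above.

```python
import torch.nn as nn

# Sketch of one unshared domain-specific feature extractor (Figure 3):
# two 1x1 convolutions, one 3x3 convolution, then average pooling,
# producing a (B, 2042) feature vector from the shared 2048-dim features.
domain_extractor = nn.Sequential(
    nn.Conv2d(2048, 1024, kernel_size=1), nn.ReLU(inplace=True),
    nn.Conv2d(1024, 2042, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(2042, 2042, kernel_size=1), nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
```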

6.2.2. Domain-Specific Classifiers

Each domain-specific classifier is implemented as a three-layer fully connected network structured as 2042 → 256 → K, where K represents the number of output classes for the specific domain, as depicted in Figure 4. A softmax activation function is applied to the final layer to output class probabilities. This structure ensures sufficient capacity for complex classification tasks while maintaining computational efficiency.
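A minimal PyTorch rendering of one classifier follows; LogSoftmax is used so the output pairs with the NLLoss of Equation (14), and the exact layer count of the "2042 → 256 → K" structure is interpreted loosely here.

```python
import torch.nn as nn

def make_classifier(num_classes):
    """Sketch of one domain-specific classifier (Figure 4):
    2042 -> 256 -> K fully connected layers with a LogSoftmax head."""
    return nn.Sequential(
        nn.Linear(2042, 256), nn.ReLU(inplace=True),
        nn.Linear(256, num_classes),
        nn.LogSoftmax(dim=1),
    )
```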

6.2.3. Grid Search for Optimal Configuration

To enhance model performance, we conducted a thorough grid search over key architectural components. These variations included modifying the number of output channels and kernel sizes in the 2D CNN layers as well as experimenting with deeper fully connected layers in the domain-specific classifiers. We also tested different batch sizes (10, 30, 60) during training. Interestingly, despite extensive experimentation, the optimal configuration remained aligned with the one proposed in [67]. The only significant improvement was achieved by replacing the cross-entropy loss function with the negative log-likelihood loss (NLLoss), which provided a noticeable boost in classification accuracy.

6.2.4. Learning Rate Strategy and Optimization

Since the shared sub-network was pre-trained on ImageNet, its learning rate was set 10 times smaller than those for the domain-specific feature extractors and classifiers in order to preserve learned representations while allowing for fine-tuning. We tested learning rates in the range of [0.0001, 0.01]. To further optimize training, we adopted a dynamic learning rate adjustment strategy based on [67]. The learning rate was updated after each epoch using the following formula:
LR_new = LR_prev / (1 + 10p)^0.75
where p is a parameter that progresses linearly from 0 to 1 over the course of training, allowing for gradual refinement of the learning process; the exponent 0.75 is fixed by the strategy design.
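A small helper illustrating this schedule; the epoch-based definition of p is an assumption, and any linear 0-to-1 progress variable would serve equally well:

```python
# The annealing schedule above: p runs linearly from 0 to 1 over training, and
# the learning rate is divided by (1 + 10 * p) ** 0.75 after each epoch.
def adjusted_lr(lr_prev: float, epoch: int, total_epochs: int) -> float:
    p = epoch / max(total_epochs - 1, 1)  # linear progress, 0 -> 1
    return lr_prev / (1 + 10 * p) ** 0.75

lr = 0.01
for epoch in range(10):
    lr = adjusted_lr(lr, epoch, total_epochs=10)
```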

6.2.5. Optimization Algorithms

We evaluated several optimization algorithms, including
  • Adaptive Moment Estimation (Adam): Known for its adaptive learning rate capabilities.
  • Adaptive Gradient Algorithm (Adagrad): Effective for sparse data scenarios.
  • Adam with Weight Decay (AdamW): Combines Adam’s efficiency with weight decay for better regularization.
  • Stochastic Gradient Descent (SGD) with momentum (0.9): Provides stability and faster convergence by dampening oscillations.
Among these, SGD with momentum yielded the best performance, balancing optimization speed and model generalization.
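A sketch of this chosen setup, combining SGD with momentum 0.9 and the 10× smaller learning rate for the pre-trained shared block from Section 6.2.4; `shared`, `extractors`, and `classifiers` are assumed to be nn.Module containers defined earlier:

```python
# Selected optimizer: SGD with momentum 0.9, with the pre-trained shared
# sub-network trained at a 10x smaller learning rate than the newly
# initialized domain-specific blocks. Module names are illustrative.
import torch

base_lr = 0.01  # within the tested range [0.0001, 0.01]
optimizer = torch.optim.SGD(
    [
        {"params": shared.parameters(), "lr": base_lr / 10},  # pre-trained block
        {"params": extractors.parameters(), "lr": base_lr},   # feature extractors
        {"params": classifiers.parameters(), "lr": base_lr},  # classifiers
    ],
    lr=base_lr,
    momentum=0.9,
)
```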

6.2.6. Early Stopping and Data Splits

To prevent overfitting and ensure robust model training, we implemented early stopping. The model’s overall loss was monitored after an initial warm-up phase, and training was halted if the loss did not improve over five consecutive epochs. This strategy helped maintain computational efficiency while optimizing model performance. The dataset was split into 60% for training, 20% for validation, and 20% for testing. This division ensured a fair evaluation of the model’s generalization capabilities. The final model was selected based on its high predictive performance on the validation set, particularly for domain-shifting tasks that challenged the adaptability of the architecture.
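A minimal version of this early-stopping policy is sketched below; `train_one_epoch`, the warm-up length, and the epoch budget are hypothetical placeholders:

```python
# Early stopping as described above: monitor the overall loss after a warm-up
# phase and stop when it fails to improve for five consecutive epochs.
def fit(train_one_epoch, max_epochs: int = 100, warmup_epochs: int = 5,
        patience: int = 5) -> float:
    best_loss, stale = float("inf"), 0
    for epoch in range(max_epochs):
        loss = train_one_epoch()      # assumed to return the epoch's overall loss
        if epoch < warmup_epochs:
            continue                  # ignore the warm-up phase
        if loss < best_loss:
            best_loss, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break                 # five epochs without improvement
    return best_loss
```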
Our analysis revealed that while the original configuration proposed in [67] was highly effective, certain refinements—such as the adoption of NLLoss and fine-tuned learning rate strategies—led to further improvements. These modifications highlight the importance of nuanced adjustments in achieving state-of-the-art results for domain-adaptive learning tasks.

6.3. Comparison Techniques and Evaluation Metrics

Our proposed model consists of three blocks with four alignment stages. Stage 1 is a pre-trained model that optimizes its weights on a multi-source CSI data distribution, producing a shared feature subspace for Stage 2, which comprises N unshared domain-specific feature extractors. These feature extractors align the target distribution with each diverse source distribution using distance minimization approaches such as maximum mean discrepancy (MMD) [74]. We also tested two variants of MMD, namely multiple-kernel maximum mean discrepancy (MK-MMD) [75,76] and joint maximum mean discrepancy (JMMD) [77].
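As an illustration of the distance minimization stage, a compact (biased) estimator of squared MMD with a single Gaussian kernel is given below; bandwidth selection is simplified relative to the multi-kernel variants cited above:

```python
# Biased estimator of squared MMD between source and target feature batches
# using a single Gaussian kernel; MK-MMD averages such kernels over several
# bandwidths, and JMMD applies the idea jointly over features and predictions.
import torch

def gaussian_kernel(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2(source: torch.Tensor, target: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2 * k_st

loss = mmd2(torch.randn(32, 2042), torch.randn(32, 2042))
```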
We first explored the sole impact of the discrepancy and contrastive semantic alignment losses on the Parisafm dataset and then combined these losses with each of the selected distance minimization losses. Finally, we chose the two best models in terms of high predictive performance and moderate simulation time and compared state-of-the-art baseline adaptation techniques against the preferred versions of the recommended setup. For a fair comparison, all of the baseline methods followed the same deep learning architecture as the recommended version and were fine-tuned from the PyTorch-provided ResNet50 [80] model. The entire training process was repeated ten times for each domain-shifting task, and each time performance was evaluated on unseen target samples; the average of these trials is reported for an unbiased analysis.
Our first baseline model combined the source version of the proposed model, with the recommended model settings, and the MMD loss, because this loss function adapted to the target domain with the highest or second-highest predictive performance among all of our tested multi-source unsupervised domain adaptation (MUDA) techniques while requiring only moderate simulation time. We used a single feature extractor and a classifier following the pre-trained ResNet50 [80] architecture and merged all the source domains into a traditional single-source-versus-target setting to test the importance of combining multiple sources when training a prototype originating from a single-source unsupervised domain adaptation (SUDA) technique. Our second baseline model was derived from Wi-Adaptor [59], although we changed its architecture to match the proposed one in order to achieve a fair model comparison. Wi-Adaptor [59] trains a generator and two classifiers on combined-source samples to generate discriminative features, then uses the trained classifiers to maximize the discrepancy on target samples, and finally trains the generator on target samples to minimize this discrepancy. The authors' contribution was inspired by the study of maximum classifier discrepancy (MCD) for unsupervised domain adaptation [88]. We further tested our proposed architectures against two classical distance minimization approaches—correlation alignment (CORAL) [72,73] and the Wasserstein distance [69,70]—which have proven their effectiveness in domain adaptation tasks in earlier research works [72,89]. For these two baseline methods, we used the same architecture as in single-source M-FUDA, with the only difference being that the maximum mean discrepancy (MMD) loss after the domain-specific feature extractor was replaced by these distance minimization techniques. For better clarity, refer to Figure 5.
We report micro- and macro-F1 scores to assess the performance of the different models. The best results for each specific task are colored in light green, and the best average across all tasks is colored in light blue. Micro-F1 reflects the overall performance of the model, treating every instance as equally important, which is preferable for our imbalanced data distribution. Macro-F1 treats minority classes with the same importance as majority classes and reports model performance across all classes as the unweighted average of the per-class F1 scores.
Each table in our reported results presents the average of individual micro- and macro-F1 scores, calculated for each domain-shifting task performed on the selected datasets and displayed at the end of the table.
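The two metrics can be reproduced directly with scikit-learn, as in the following toy example:

```python
# Micro- vs macro-F1 as reported in the tables: micro aggregates over
# instances, macro averages the per-class F1 scores with equal class weight.
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 1, 2, 2]
y_pred = [0, 0, 1, 1, 2, 1]
print(f1_score(y_true, y_pred, average="micro"))  # instance-level aggregate
print(f1_score(y_true, y_pred, average="macro"))  # unweighted mean over classes
```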

7. Results and Discussion

This section presents and discusses our results in three cross-domain experiments performed on three selected datasets. All experiments in this study were performed on an Intel Core i7-9700K CPU with 32 GB of internal memory and an Nvidia RTX 3060 GPU.

7.1. Experiments with Varying Users

Our first experiment assessed model accuracy across different users, testing how well the model generalizes the activities of users with different physical characteristics without overfitting to any specific user in a pool of diverse source distributions. For this analysis, we chose the Parisafm dataset, which contains data from three users performing seven activities in a laboratory environment; see Table 1. This dataset was suitable for these experiments because we could train our model on the features of any two users and investigate how well it adapts to the third user as the target domain. We utilized three dissimilar source domains: two belonged to single subjects (1, 2, or 3), and the third contained the combined data features of the two chosen subjects, to achieve better commonality between the diverse sources and the target distribution. We first examined the significance of different alignment losses at various stages of the neural network architecture, starting with the discrepancy (disc) loss, then adding the contrastive semantic alignment (CSA) loss, and then including the various distance minimization losses, moving from left to right in Table 2 and Table 3. We finally determined the recommended settings for our proposed model: discrepancy loss for aligning the outputs of multiple classifiers, contrastive semantic alignment loss for reducing the distance between samples of the same classes, and maximum mean discrepancy (MMD)/multiple-kernel maximum mean discrepancy (MK-MMD) loss for mapping the target distribution onto the paired source domain. We call this selected model Multi-Feature Unsupervised Domain Adaptation (M-FUDA), which we employ for the first time for cross-domain applications in device-free HAR as the main contribution of this paper. Table 2 and Table 3 demonstrate the superiority of the recommended settings over all other combinations of loss functions and show how this model outperforms the other settings in three cross-user tasks in a multi-source environment. A multi-source environment consists of diverse source domains with related, yet distinct, data features. We leveraged these varied domains to expand the training dataset, addressing the limited availability of data within each domain of the public datasets. Figure 7 compares the different variants of multi-source M-FUDA, showcasing their respective performances under various loss functions.
In Table 4 and Table 5, we compare variants of multi-source M-FUDA to its combined-source version with MMD loss, CORAL [72], Wasserstein [89], and Wi-Adaptor [59] (i.e., MCD [88]), all trained with the M-FUDA configurations used to define the feature extractor and classifier architectures. We also used the same pre-trained ResNet50 [80] as the initial block for all the baselines and fed them the combined source domains in a traditional single-source-versus-target setting, to explore whether multiple sources are more valuable in a single-source domain adaptation (SUDA) setting or in the proposed M-FUDA in a multi-source domain adaptation (MUDA) setting applied to cross unaligned domain-specific distributions in device-free human activity classification. The micro- and macro-F1 results reported in Table 4 and Table 5 show that M-FUDA (MMD) is 13.78% better in averaged micro-F1 score and 15.47% better in averaged macro-F1 score than the combined-source MCD across the 3 domain-shifting tasks created by the 3 different users. Each task was repeated for 10 trials, and the average figure is reported. These readings further show that multi-source M-FUDA (MMD) is 3.85% higher in averaged micro-F1 score and 4.13% higher in averaged macro-F1 score than combined-source M-FUDA (MMD), calculated over 3 cross-user tasks run for 10 trials each. This substantiates that M-FUDA (MMD) achieves steadier recognition accuracy in a MUDA setting than its counterpart combined-source SUDA setups for cross-user domain-shifting tasks in device-free HAR.
Similarly, M-FUDA (MMD) achieved much higher recognition accuracy than CORAL [72] and Wasserstein [89], which ranked second-last and last in predictive performance, respectively. M-FUDA (MK-MMD), in turn, outperformed M-FUDA (MMD) by 2.47% in averaged micro- and macro-F1 scores; however, this improvement came at the cost of longer training times due to the higher computational complexity of multi-kernel evaluation, as shown in Table 10. There is clearly a large performance gap between multi-source M-FUDA and the combined-source SUDA methods, validating the superiority of the proposed architecture in cross-user conditions using device-free Wi-Fi CSI data. A detailed analysis of the various cross-user tasks on the Parisafm dataset is presented below, in which we compare the averaged micro- and macro-F1 scores of multi-source M-FUDA against the baseline models, as illustrated in Figure 8.

7.2. Experiments with Varying Users and Environments

Our second experiment was designed to train our multi-source M-FUDA model on activities performed by users with different physical characteristics. Performance also depended on environmental settings, such as the position of furniture, doors, and windows and the transmission contact between the transmitter (Tx) and the receiver (Rx). Training data alone cannot capture the diversity of users and environments when the trained model is tested under varying external factors, for instance unseen users in unknown environments. The transfer learning task was therefore to assess the trained model's robustness, adaptability, and transferability, i.e., how well it generalized in such critical yet near-real-world scenarios.
We utilized the Alsaify dataset to evaluate our model in such domain-changing applications, using nine different users executing six different activities in three environments at Jordan University of Science and Technology: (1) a research laboratory (4.7 m × 4.7 m) equipped with a Tx–Rx pair 3.7 m apart in line-of-sight contact; (2) a university hallway (7.95 m × 3.6 m) equipped with a Tx–Rx pair 7.6 m apart in line-of-sight contact; and (3) a room attached to a corridor, fitted with the Tx inside the room and the Rx outside it, where an 8 cm-thick wooden wall acts as a barrier causing non-line-of-sight contact between the Tx–Rx pair, which are spaced 5.44 m apart.
We also compared our model's performance with the baselines to investigate the impact of model configurations and arrangements in multi-source MUDA vs combined-source SUDA settings. Multi-source M-FUDA (MMD) achieved the best micro- and macro-F1 scores on 7 out of 11 tasks in which both users and environments changed from the training to the testing phase, and it even surpassed multi-source M-FUDA (MK-MMD) in accuracy while requiring shorter simulation time, as reported in Section 7.4. Our model obtained the highest averaged micro- and macro-F1 scores of 75.6% and 75.0%, which are 3.70% and 3.59% higher than combined-source MCD [59,88], the second-best-performing technique, as reported in Table 6 and Table 7. It is also worth noting that the MUDA setting with the proposed alignments achieved more balanced accuracy both on individual instances and per class, leading to much higher micro- and macro-F1 scores than all the comparison techniques. In addition, multi-source M-FUDA (MMD) outperformed combined-source M-FUDA (MMD) by 6.48% and 6.53% in micro- and macro-F1 scores; see Table 6 and Table 7. Multi-source M-FUDA (MMD) also showed superiority over combined-source CORAL [72] and Wasserstein [89]. This demonstrates the effectiveness of utilizing diverse sources in the MUDA setting and the proposed architecture's suitability for combined cross-user and cross-environment domain-changing setups in terms of capturing discriminative transfer features between classes using device-free Wi-Fi CSI data. A closer examination of the various cross-user and cross-environment tasks on the Alsaify dataset compares the averaged micro- and macro-F1 scores of multi-source M-FUDA and the baseline models, as shown in Figure 9.

7.3. Experiments with Varying Atmospheric Conditions

Our last experiment was designed to train our multi-source M-FUDA model on activities performed by users on different days of the week. Since radio signals are sensitive to atmospheric conditions such as temperature, humidity, sunlight, darkness, wind speed, and visibility, the model's adaptability to these typical scenarios is also an essential factor when analyzing recognition accuracy in MUDA vs SUDA settings. To the best of our knowledge, this is the first assessment of an adaptation model under cross-atmospheric conditions using device-free Wi-Fi sensing. We found the Brinke and Meratnia dataset suitable for this assessment because we could use the CSI data for two participants performing activities on three different days of the week, marked as day6, day7, and day8 in the public dataset. We focused on four activities: clapping, falling, nothing, and walking.
Multi-source M-FUDA (MMD) achieved the best micro- and macro-F1 scores on 5 out of 6 tasks in which we switched between day6, day7, and day8 from the training to the testing phase. It obtained the highest averaged micro- and macro-F1 scores of 73.3% and 73.1%, which are 1.38% and 1.39% higher than combined-source M-FUDA (MMD) and MCD [59,88], as reported in Table 8 and Table 9. The proposed multi-source M-FUDA (MMD) also outperformed multi-source M-FUDA (MK-MMD) as well as combined-source CORAL [72] and Wasserstein [89]. However, the difference in predictive performance between multi-source M-FUDA and the combined-source SUDA methods was small in cross-atmospheric conditions, exposing a limitation of the proposed architecture. M-FUDA is a computationally expensive architecture because each pair of source and target domains must be trained separately, and increasing the diversity of domains demands high computational cost, which is not advisable for minor accuracy improvements in scenarios such as cross-atmospheric domain-shifting tasks, as demonstrated in Table 10. For the Brinke and Meratnia dataset, Figure 10 compares the averaged micro- and macro-F1 scores of multi-source M-FUDA with those of the baseline models for the cross-atmospheric tasks. Lastly, we compared the training times of the proposed M-FUDA variants with the baseline methods across the cross-domain tasks, as illustrated in Figure 11.

7.4. Computational Complexity

Table 10 reports the average computational time for all cross-user, cross-user with cross-environment, and cross-atmospheric experiments performed in Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9 for the proposed multi-source M-FUDA and baseline methods. It is evident that M-FUDA is a complex model architecture, as it trains domain-specific feature extractors and classifiers separately for each pair of source and target domains. Consequently, incorporating more diverse sources increases model complexity and computational cost. However, this approach can also enhance predictive performance, as the model learns separately from diverse source domains. Thus, the proposed model presents a trade-off between improved recognition accuracy and higher computational cost.
To mitigate complexity, our model adopts a relatively simple architecture, utilizing a pre-trained ResNet50 [80] block while maintaining a minimal design for other blocks with only a few layers. This deliberate simplification helps reduce model complexity. However, despite these efforts, the simulation time remains higher compared to single-source SUDA methods. This highlights a limitation of our proposed model in scenarios where the predictive performance does not significantly justify the increased computational cost.

8. Conclusions

This paper introduces multi-source M-FUDA, a technique for unsupervised domain adaptation (UDA) applied to Wi-Fi-based human activity recognition. Unlike traditional single-source UDA methods, M-FUDA integrates multiple source domains, leading to better recognition accuracy and robustness. The model uses a four-stage alignment process with a pre-trained model, domain-specific feature extractors, and classifiers, along with a maximum mean discrepancy (MMD) loss for better performance and simpler architecture.
The approach minimizes the discrepancy between classifiers for target samples and uses contrastive semantic alignment loss to improve classification across source–target domain pairs. Evaluations on three public datasets show that M-FUDA outperforms single-source UDA in scenarios where domain samples belong to diverse distributions. However, the model’s higher complexity introduces a trade-off between computational cost and performance, limiting its applicability in low-gain, high-cost situations.
Future work will explore the model’s effectiveness in more challenging scenarios, such as multi-target tracking, using more complex architectures to improve predictive accuracy.

Author Contributions

Conceptualization, M.H.; methodology, M.H.; software, M.H.; validation, M.H.; formal analysis, M.H. and T.K.; investigation, M.H. and T.K.; resources, M.H.; data curation, M.H.; writing—original draft preparation, M.H. and T.K.; visualization, M.H.; supervision, M.H. and T.K.; project administration, M.H. and T.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study, due to the public availability and prior anonymization of all data used in the study.

Informed Consent Statement

The data used in this study are publicly available with no personal information about the subjects disclosed. Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available at the following uniform resource locator (URL) links: https://github.com/parisafm/CSI-HAR-Dataset, https://data.mendeley.com/datasets/v38wjmz6f6 and https://data.4tu.nl/datasets/575f95f7-abce-4be0-b4d6-29b4a683cf4c/1 (accessed on 13 January 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Guo, G.; Lai, A. A Survey on Still Image based Human Action Recognition. Pattern Recognit. 2014, 47, 3343–3361.
2. Anuradha, K.; Sairam, N. Spatio-temporal based approaches for human action recognition in static and dynamic background: A survey. Indian J. Sci. Technol. 2016, 9, 1–12.
3. Ke, S.R.; Thuc, H.L.U.; Lee, Y.J.; Hwang, J.N.; Yoo, J.H.; Choi, K.H. A review on video-based human activity recognition. Computers 2013, 2, 88–131.
4. Hassan, M.M.; Huda, S.; Uddin, M.Z.; Almogren, A.; Alrubaian, M. Human activity recognition from body sensor data using deep learning. J. Med. Syst. 2018, 42, 1–8.
5. Webber, M.; Rojas, R.F. Human activity recognition with accelerometer and gyroscope: A data fusion approach. IEEE Sens. J. 2021, 21, 16979–16989.
6. Yang, P.; Yang, C.; Lanfranchi, V.; Ciravegna, F. Activity graph based convolutional neural network for human activity recognition using acceleration and gyroscope data. IEEE Trans. Ind. Inform. 2022, 18, 6619–6630.
7. Chen, K.; Zhang, D.; Yao, L.; Guo, B.; Yu, Z.; Liu, Y. Deep learning for sensor-based human activity recognition: Overview, challenges, and opportunities. ACM Comput. Surv. (CSUR) 2021, 54, 77.
8. Wang, J.; Chen, Y.; Hao, S.; Peng, X.; Hu, L. Deep learning for sensor-based activity recognition: A survey. Pattern Recognit. Lett. 2019, 119, 3–11.
9. Lara, O.D.; Labrador, M.A. A survey on human activity recognition using wearable sensors. IEEE Commun. Surv. Tutor. 2012, 15, 1192–1209.
10. Al-Qaness, M.A.; Abd Elaziz, M.; Kim, S.; Ewees, A.A.; Abbasi, A.A.; Alhaj, Y.A.; Hawbani, A. Channel state information from pure communication to sense and track human motion: A survey. Sensors 2019, 19, 3329.
11. Abdelnasser, H.; Harras, K.A.; Youssef, M. UbiBreathe: A ubiquitous non-invasive WiFi-based breathing estimator. In Proceedings of the 16th ACM International Symposium on Mobile Ad Hoc Networking and Computing, Paderborn, Germany, 5–8 July 2015; pp. 277–286.
12. Wang, P.; Guo, B.; Xin, T.; Wang, Z.; Yu, Z. TinySense: Multi-user respiration detection using Wi-Fi CSI signals. In Proceedings of the 2017 IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom), Dalian, China, 12–15 October 2017; pp. 1–6.
13. Liu, J.; Wang, Y.; Chen, Y.; Yang, J.; Chen, X.; Cheng, J. Tracking vital signs during sleep leveraging off-the-shelf Wi-Fi. In Proceedings of the 16th ACM International Symposium on Mobile Ad Hoc Networking and Computing, Paderborn, Germany, 5–8 July 2015; pp. 267–276.
14. Boudlal, H.; Serrhini, M.; Tahiri, A. A monitoring system for elderly people using Wi-Fi sensing with channel state information. Int. J. Interact. Mob. Technol. 2023, 17, 112.
15. Wang, Y.; Wu, K.; Ni, L.M. Wifall: Device-free fall detection by wireless networks. IEEE Trans. Mob. Comput. 2016, 16, 581–594.
16. Chu, Y.; Cumanan, K.; Sankarpandi, S.K.; Smith, S.; Dobre, O.A. Deep learning-based fall detection using Wi-Fi channel state information. IEEE Access 2023, 11, 83763–83780.
17. Abdelnasser, H.; Youssef, M.; Harras, K.A. Wigest: A ubiquitous Wi-Fi-based gesture recognition system. In Proceedings of the 2015 IEEE Conference on Computer Communications (INFOCOM), Kowloon, Hong Kong, 26 April–1 May 2015; pp. 1472–1480.
18. Adib, F.; Katabi, D. See through walls with Wi-Fi! In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, Hong Kong, China, 12–16 August 2013; pp. 75–86.
19. Chen, Y.; Dong, W.; Gao, Y.; Liu, X.; Gu, T. Rapid: A multimodal and device-free approach using noise estimation for robust person identification. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2017, 1, 41.
20. Cheng, L.; Wang, J. How can I guard my AP? Non-intrusive user identification for mobile devices using WiFi signals. In Proceedings of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing, Paderborn, Germany, 4–8 July 2016; pp. 91–100.
21. Arshad, S.; Feng, C.; Liu, Y.; Hu, Y.; Yu, R.; Zhou, S.; Li, H. Wi-chase: A WiFi based human activity recognition system for sensorless environments. In Proceedings of the 2017 IEEE 18th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), Macau, China, 12–15 June 2017; pp. 1–6.
22. Feng, C.; Arshad, S.; Liu, Y. Mais: Multiple activity identification system using channel state information of WiFi signals. In Proceedings of the 12th International Conference on Wireless Algorithms, Systems, and Applications: WASA 2017, Guilin, China, 19–21 June 2017; Proceedings 12. Springer International Publishing: Cham, Switzerland, 2017; pp. 419–432.
23. Gao, Q.; Wang, J.; Ma, X.; Feng, X.; Wang, H. CSI-based device-free wireless localization and activity recognition using radio image features. IEEE Trans. Veh. Technol. 2017, 66, 10346–10356.
24. Won, M.; Zhang, S.; Son, S.H. WiTraffic: Low-cost and non-intrusive traffic monitoring system using Wi-Fi. In Proceedings of the 2017 26th International Conference on Computer Communication and Networks (ICCCN), Vancouver, BC, Canada, 31 July–3 August 2017; pp. 1–9.
25. Zhu, Y.; Yao, Y.; Zhao, B.Y.; Zheng, H. Object recognition and navigation using a single networking device. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, Niagara Falls, NY, USA, 19–23 June 2017; pp. 265–277.
26. Zhang, X.; Ruby, R.; Long, J.; Wang, L.; Ming, Z.; Wu, K. WiHumidity: A novel CSI-based humidity measurement system. In Proceedings of the First International Conference on Smart Computing and Communication: SmartCom 2016, Shenzhen, China, 17–19 December 2017; Proceedings 1. Springer International Publishing: Cham, Switzerland, 2017; pp. 537–547.
27. Zinys, A.; van Berlo, B.; Meratnia, N. A domain-independent generative adversarial network for activity recognition using WiFi CSI data. Sensors 2021, 21, 7852.
28. Tse, D.; Viswanath, P. Fundamentals of Wireless Communication; Cambridge University Press: Cambridge, UK, 2005.
29. Kumar, R.; Sinwar, D.; Singh, V. Analysis of QoS aware traffic template in n78 band using proportional fair scheduling in 5G NR. Telecommun. Syst. 2024, 87, 17–32.
30. Chen, C.; Zhou, G.; Lin, Y. Cross-domain Wi-Fi sensing with channel state information: A survey. ACM Comput. Surv. 2023, 55, 1–37.
31. Shi, Z.; Zhang, J.A.; Xu, R.; Cheng, Q.; Pearce, A. Towards environment-independent human activity recognition using deep learning and enhanced CSI. In Proceedings of the GLOBECOM 2020-2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; pp. 1–6.
32. Wang, X.; Yang, C.; Mao, S. Resilient respiration rate monitoring with realtime bimodal CSI data. IEEE Sens. J. 2020, 20, 10187–10198.
33. Ma, Y.; Zhou, G.; Wang, S. Wi-Fi sensing with channel state information: A survey. ACM Comput. Surv. (CSUR) 2019, 52, 46.
34. Huang, J.; Liu, B.; Chen, C.; Jin, H.; Liu, Z.; Zhang, C.; Yu, N. Towards anti-interference human activity recognition based on Wi-Fi subcarrier correlation selection. IEEE Trans. Veh. Technol. 2020, 69, 6739–6754.
35. Ahmed, H.F.T.; Ahmad, H.; Aravind, C.V. Device free human gesture recognition using Wi-Fi CSI: A survey. Eng. Appl. Artif. Intell. 2020, 87, 103281.
36. Shalaby, E.; ElShennawy, N.; Sarhan, A. Utilizing deep learning models in CSI-based human activity recognition. Neural Comput. Appl. 2022, 34, 5993–6010.
37. Chen, Z.; Zhang, L.; Jiang, C.; Cao, Z.; Cui, W. Wi-Fi CSI based passive human activity recognition using attention based BLSTM. IEEE Trans. Mob. Comput. 2018, 18, 2714–2724.
38. Khan, D.A.; Razak, S.; Raj, B.; Singh, R. Human behaviour recognition using Wi-Fi channel state information. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 7625–7629.
39. Zhuravchak, A.; Kapshii, O.; Pournaras, E. Human activity recognition based on Wi-Fi CSI data - A deep neural network approach. Procedia Comput. Sci. 2022, 198, 59–66.
40. Zou, H.; Yang, J.; Zhou, Y.; Spanos, C.J. Joint adversarial domain adaptation for resilient WiFi-enabled device-free gesture recognition. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 202–207.
41. Brinke, J.K.; Meratnia, N. Scaling activity recognition using channel state information through convolutional neural networks and transfer learning. In Proceedings of the First International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things, New York, NY, USA, 10–13 November 2019; pp. 56–62.
42. Yin, G.; Zhang, J.; Shen, G.; Chen, Y. FewSense, towards a scalable and cross-domain Wi-Fi sensing system using few-shot learning. IEEE Trans. Mob. Comput. 2022, 23, 453–468.
43. Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a few examples: A survey on few-shot learning. ACM Comput. Surv. (CSUR) 2020, 53, 63.
44. Song, Y.; Wang, T.; Cai, P.; Mondal, S.K.; Sahoo, J.P. A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities. ACM Comput. Surv. 2023, 55, 1–40.
45. Hu, P.; Tang, C.; Yin, K.; Zhang, X. Wigr: A practical Wi-Fi-based gesture recognition system with a lightweight few-shot network. Appl. Sci. 2021, 11, 3329.
46. Ma, X.; Zhao, Y.; Zhang, L.; Gao, Q.; Pan, M.; Wang, J. Practical device-free gesture recognition using Wi-Fi signals based on metalearning. IEEE Trans. Ind. Inform. 2019, 16, 228–237.
47. Moshiri, P.F.; Nabati, M.; Shahbazian, R.; Ghorashi, S.A. CSI-based human activity recognition using convolutional neural networks. In Proceedings of the 2021 11th International Conference on Computer Engineering and Knowledge (ICCKE), Mashhad, Iran, 28–29 October 2021; pp. 7–12.
48. Yousefi, S.; Narui, H.; Dayal, S.; Ermon, S.; Valaee, S. A survey on behavior recognition using Wi-Fi channel state information. IEEE Commun. Mag. 2017, 55, 98–104.
49. Moshiri, P.F.; Shahbazian, R.; Nabati, M.; Ghorashi, S.A. A CSI-based human activity recognition using deep learning. Sensors 2021, 21, 7225.
50. Hassan, M.; Kelsey, T.; Rahman, F. Adversarial AI applied to cross-user inter-domain and intra-domain adaptation in human activity recognition using wireless signals. PLoS ONE 2024, 19, e0298888.
51. Zhang, C.; Jiao, W. Imgfi: A high accuracy and lightweight human activity recognition framework using CSI image. IEEE Sens. J. 2023, 23, 21966–21977.
52. Wang, Z.; Oates, T. Imaging time-series to improve classification and imputation. arXiv 2015, arXiv:1506.00327.
53. Eckmann, J.P.; Kamphorst, S.O.; Ruelle, D. Recurrence plots of dynamical systems. World Sci. Ser. Nonlinear Sci. Ser. A 1995, 16, 441–446.
54. Lee, H.; Ahn, C.R.; Choi, N. Fine-grained occupant activity monitoring with Wi-Fi channel state information: Practical implementation of multiple receiver settings. Adv. Eng. Inform. 2020, 46, 101147.
55. Sharma, L.; Chao, C.H.; Wu, S.L.; Li, M.C. High accuracy Wi-Fi-based human activity classification system with time-frequency diagram CNN method for different places. Sensors 2021, 21, 3797.
56. Sundararajan, D. Discrete Wavelet Transform: A Signal Processing Approach; John Wiley & Sons: Hoboken, NJ, USA, 2016.
57. Gray, R.M.; Goodman, J.W. Fourier Transforms: An Introduction for Engineers; Springer Science+Business Media: New York, NY, USA, 2012; Volume 322.
58. Anand, R.; Shanthi, T.; Nithish, M.S.; Lakshman, S. Face recognition and classification using GoogleNET architecture. Soft Comput. Probl. Solving 2018, 1, 261–269.
59. Zhang, H.; Zhou, Z.; Gong, W. Wi-adaptor: Fine-grained domain adaptation in Wi-Fi-based activity recognition. In Proceedings of the 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021; pp. 1–6.
60. Chen, X.; Li, H.; Zhou, C.; Liu, X.; Wu, D.; Dudek, G. Fidora: Robust WiFi-based indoor localization via unsupervised domain adaptation. IEEE Internet Things J. 2022, 9, 9872–9888.
61. Yang, J.; Chen, X.; Zou, H.; Wang, D.; Xie, L. Autofi: Toward automatic Wi-Fi human sensing via geometric self-supervised learning. IEEE Internet Things J. 2022, 10, 7416–7425.
62. Chen, X.; Li, H.; Zhou, C.; Liu, X.; Wu, D.; Dudek, G. Fido: Ubiquitous fine-grained Wi-Fi based localization for unlabeled users via domain adaptation. In Proceedings of the Web Conference, Taipei, Taiwan, 20–24 April 2020; pp. 23–33.
63. Al-qaness, M.A.A.; Li, F. WiGeR: WiFi-based gesture recognition system. ISPRS Int. J. Geo-Inf. 2016, 5, 92.
64. Tian, Z.; Wang, J.; Yang, X.; Zhou, M. WiCatch: A Wi-Fi based hand gesture recognition system. IEEE Access 2018, 6, 16911–16923.
65. Ma, Y.; Zhou, G.; Wang, S.; Zhao, H.; Jung, W. SignFi: Sign language recognition using Wi-Fi. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2018, 2, 1–21.
66. Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese neural networks for one-shot image recognition. ICML Deep Learn. Workshop 2015, 2, 1–30.
67. Zhu, Y.; Zhuang, F.; Wang, D. Aligning domain-specific distribution and classifier for cross-domain classification from multiple sources. Proc. AAAI Conf. Artif. Intell. 2019, 33, 5989–5996.
68. Xu, R.; Chen, Z.; Zuo, W.; Yan, J.; Lin, L. Deep cocktail network: Multi-source unsupervised domain adaptation with category shift. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3964–3973.
69. Kolouri, S.; Nadjahi, K.; Simsekli, U.; Badeau, R.; Rohde, G. Generalized sliced Wasserstein distances. Adv. Neural Inf. Process. Syst. 2019, 32, 261–272.
70. Courty, N.; Flamary, R.; Tuia, D.; Rakotomamonjy, A. Optimal transport for domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1853–1865.
71. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 214–223.
72. Sun, B.; Saenko, K. Deep CORAL: Correlation alignment for deep domain adaptation. In Proceedings of the European Conference on Computer Vision—ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10, 15–16 October 2016; Proceedings, Part III 14. Springer International Publishing: Cham, Switzerland, 2016; pp. 443–450.
73. Sun, B.; Feng, J.; Saenko, K. Return of frustratingly easy domain adaptation. Proc. AAAI Conf. Artif. Intell. 2016, 30, 10306.
74. Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Schölkopf, B.; Smola, A. A kernel two-sample test. J. Mach. Learn. Res. 2012, 13, 723–773.
75. Li, X.; Yuan, P.; Su, K.; Li, D.; Xie, Z.; Kong, X. Innovative integration of multi-scale residual networks and MK-MMD for enhanced feature representation in fault diagnosis. Meas. Sci. Technol. 2024, 35, 086108.
76. Xia, P.; Niu, H.; Li, Z.; Li, B. Enhancing backdoor attacks with multi-level MMD regularization. IEEE Trans. Dependable Secur. Comput. 2022, 20, 1675–1686.
77. Wang, W.; Li, B.; Yang, S.; Sun, J.; Ding, Z.; Chen, J.; Dong, X.; Wang, Z.; Li, H. A unified joint maximum mean discrepancy for domain adaptation. arXiv 2021, arXiv:2101.09979.
78. Motiian, S.; Piccirilli, M.; Adjeroh, D.A.; Doretto, G. Unified deep supervised domain adaptation and generalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5715–5725.
79. Zhang, J.; Tang, Z.; Li, M.; Fang, D.; Nurmi, P.; Wang, Z. CrossSense: Towards cross-site and large-scale Wi-Fi sensing. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, New Delhi, India, 29 October–2 November 2018; pp. 305–320.
80. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
81. Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep transfer learning with joint adaptation networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 2208–2217.
82. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
83. Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; March, M.; Lempitsky, V. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 2016, 17, 1–35.
84. Alsaify, B.A.; Almazari, M.M.; Alazrai, R.; Daoud, M.I. A dataset for Wi-Fi-based human activity recognition in line-of-sight and non-line-of-sight indoor environments. Data Brief 2020, 33, 106534.
85. Gringoli, F.; Schulz, M.; Link, J.; Hollick, M. Free your CSI: A channel state information extraction platform for modern Wi-Fi chipsets. In Proceedings of the 13th International Workshop on Wireless Network Testbeds, Experimental Evaluation & Characterization, Los Cabos, Mexico, 25 October 2019; pp. 21–28.
86. Portnoff, M. Time-frequency representation of digital signals and systems based on short-time Fourier analysis. IEEE Trans. Acoust. Speech Signal Process. 1980, 28, 55–69.
87. Halperin, D.; Hu, W.; Sheth, A.; Wetherall, D. Tool release: Gathering 802.11n traces with channel state information. ACM SIGCOMM Comput. Commun. Rev. 2011, 41, 53.
88. Saito, K.; Watanabe, K.; Ushiku, Y.; Harada, T. Maximum classifier discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3723–3732.
89. Han, Y.; Liu, X.; Sheng, Z.; Ren, Y.; Han, X.; You, J.; Liu, R.; Luo, Z. Wasserstein loss-based deep object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 998–999.
Figure 1. System overview. Dark red indicates higher amplitude; blue indicates moderate amplitude; yellow and green indicate lower amplitude.
Figure 2. Architectural details of ResNet50 for fine-tuning the shared sub-network.
Figure 3. Architectural details of domain-specific feature extractors.
Figure 4. Architectural details of domain-specific classifiers.
Figure 5. Proposed model architecture with four-stage alignment losses.
Figure 6. t-SNE plots for the selected datasets. The activity labels are shown in Table 1. (a) t-SNE visualization of the Parisafm dataset. (b) t-SNE visualization of the Alsaify dataset. (c) t-SNE visualization of the Brinke and Meratnia dataset.
Figure 7. Comparison of average micro- and macro-F1 scores for variants of multi-source M-FUDA across different cross-user tasks.
Figure 8. Comparison of average micro- and macro-F1 scores for multi-source M-FUDA and baseline models across cross-user tasks.
Figure 9. Comparison of average micro- and macro-F1 scores for multi-source M-FUDA and baseline models across cross-user and cross-environment tasks.
Figure 10. Comparison of average micro- and macro-F1 scores for multi-source M-FUDA and baseline models across cross-atmospheric tasks.
Figure 11. Comparison of training time for multi-source M-FUDA and baseline models across cross-domain tasks.
Table 1. Descriptions of datasets.

| Dataset | No. of Features | No. of Samples | Antenna Pairs | No. of Users | No. of Environments | Atmospheric Impacts | Activities |
|---|---|---|---|---|---|---|---|
| Parisafm | 52 | 420 | 1 | 3 | 1 | Disregarded | (0) bending, (1) falling, (2) lie down, (3) running, (4) sit down, (5) stand up, and (6) walking |
| Alsaify | 90 | 3240 | 3 | 6 | 3 | Disregarded | (0) sit still on a chair, (1) falling down, (2) lie down, (3) stand still, (4) walking from the transmitter to the receiver, and (5) pick a pen from the ground |
| Brinke and Meratnia | 270 | 5400 | 6 | 2 | 1 | Considered | (0) clapping, (1) falling, (2) nothing, (3) walking |
Table 2. Micro-F1 scores for variants of multi-source M-FUDA. Green background indicates the best accuracy for an individual task; blue background indicates the best average accuracy; yellow background and bold font highlight the average performances.

| Source 1 | Source 2 | Source 3 | Target 1 | Disc | Disc + CCSA | Disc + CCSA + JMMD | Disc + CCSA + MK-MMD | Disc + CCSA + MMD |
|---|---|---|---|---|---|---|---|---|
| S1 | S2 | S1 + S2 | S3 | 0.74 | 0.85 | 0.69 | 0.86 | 0.86 |
| S2 | S3 | S2 + S3 | S1 | 0.62 | 0.67 | 0.67 | 0.83 | 0.73 |
| S1 | S3 | S1 + S3 | S2 | 0.75 | 0.84 | 0.69 | 0.79 | 0.84 |
| Average | | | | 0.70 | 0.78 | 0.68 | 0.83 | 0.81 |
Table 3. Macro-F1 scores for variants of multi-source M-FUDA. Green background indicates the best accuracy for an individual task; blue background indicates the best average accuracy; yellow background and bold font highlight the average performances.

| Source 1 | Source 2 | Source 3 | Target 1 | Disc | Disc + CCSA | Disc + CCSA + JMMD | Disc + CCSA + MK-MMD | Disc + CCSA + MMD |
|---|---|---|---|---|---|---|---|---|
| S1 | S2 | S1 + S2 | S3 | 0.73 | 0.81 | 0.66 | 0.86 | 0.83 |
| S2 | S3 | S2 + S3 | S1 | 0.61 | 0.65 | 0.66 | 0.82 | 0.74 |
| S1 | S3 | S1 + S3 | S2 | 0.75 | 0.83 | 0.70 | 0.80 | 0.85 |
| Average | | | | 0.69 | 0.76 | 0.67 | 0.83 | 0.81 |
Table 4. Micro-F1 scores for multi-source M-FUDA and baseline models across cross-user tasks. Green background indicates the best accuracy for an individual task; blue background indicates the best average accuracy; yellow background and bold font highlight the average performances.

| Source 1 | Source 2 | Source 3 | Target 1 | Multi-Source M-FUDA (MMD) | Multi-Source M-FUDA (MK-MMD) | Combined-Source M-FUDA (MMD) | Combined-Source CORAL [72] | Combined-Source Wasserstein [89] | Combined-Source MCD [59,88] |
|---|---|---|---|---|---|---|---|---|---|
| S1 | S2 | S1 + S2 | S3 | 0.86 | 0.86 | 0.81 | 0.83 | 0.83 | 0.78 |
| S2 | S3 | S2 + S3 | S1 | 0.73 | 0.83 | 0.64 | 0.62 | 0.62 | 0.59 |
| S1 | S3 | S1 + S3 | S2 | 0.84 | 0.79 | 0.88 | 0.66 | 0.66 | 0.77 |
| Average | | | | 0.81 | 0.83 | 0.78 | 0.70 | 0.70 | 0.71 |

Note: S1 means Subject 1, S2 means Subject 2, S3 means Subject 3.
Table 5. Macro-F1 scores for multi-source M-FUDA and baseline models across cross-user tasks. Green background indicates the best accuracy for an individual task; blue background indicates the best average accuracy; yellow background and bold font highlight the average performances.

| Source 1 | Source 2 | Source 3 | Target 1 | Multi-Source M-FUDA (MMD) | Multi-Source M-FUDA (MK-MMD) | Combined-Source M-FUDA (MMD) | Combined-Source CORAL [72] | Combined-Source Wasserstein [89] | Combined-Source MCD [59,88] |
|---|---|---|---|---|---|---|---|---|---|
| S1 | S2 | S1 + S2 | S3 | 0.83 | 0.86 | 0.78 | 0.81 | 0.82 | 0.76 |
| S2 | S3 | S2 + S3 | S1 | 0.74 | 0.82 | 0.66 | 0.63 | 0.59 | 0.59 |
| S1 | S3 | S1 + S3 | S2 | 0.85 | 0.80 | 0.88 | 0.69 | 0.69 | 0.74 |
| Average | | | | 0.81 | 0.83 | 0.77 | 0.71 | 0.70 | 0.70 |

Note: S1 means Subject 1, S2 means Subject 2, S3 means Subject 3.
Table 6. Micro-F1 scores for multi-source M-FUDA and baseline models across cross-user and cross-environment tasks. Green background indicates the best accuracy for an individual task; blue background indicates the best average accuracy; yellow background and bold font highlight the average performances.

| Source 1 | Source 2 | Source 3 | Target 1 | Multi-Source M-FUDA (MMD) | Multi-Source M-FUDA (MK-MMD) | Combined-Source M-FUDA (MMD) | Combined-Source CORAL [72] | Combined-Source Wasserstein [89] | Combined-Source MCD [59,88] |
|---|---|---|---|---|---|---|---|---|---|
| E1(S1) | E1(S2) | E1(S3) | E2(S12) | 0.69 | 0.65 | 0.67 | 0.62 | 0.62 | 0.67 |
| E1(S1) | E1(S2) | E1(S3) | E3(S21) | 0.72 | 0.74 | 0.72 | 0.70 | 0.69 | 0.75 |
| E2(S11) | E2(S12) | E2(S13) | E1(S3) | 0.71 | 0.69 | 0.57 | 0.53 | 0.50 | 0.59 |
| E2(S11) | E2(S12) | E2(S13) | E3(S21) | 0.81 | 0.81 | 0.73 | 0.68 | 0.63 | 0.71 |
| E2(S11) | E2(S12) | E2(S13) | E3(S23) | 0.81 | 0.77 | 0.73 | 0.70 | 0.68 | 0.74 |
| E3(S21) | E3(S22) | E3(S23) | E1(S1) | 0.74 | 0.68 | 0.69 | 0.64 | 0.63 | 0.80 |
| E3(S21) | E3(S22) | E3(S23) | E1(S2) | 0.90 | 0.88 | 0.84 | 0.75 | 0.72 | 0.85 |
| E3(S21) | E3(S22) | E3(S23) | E1(S3) | 0.72 | 0.74 | 0.71 | 0.68 | 0.65 | 0.73 |
| E3(S21) | E3(S22) | E3(S23) | E2(S11) | 0.70 | 0.71 | 0.67 | 0.72 | 0.71 | 0.74 |
| E3(S21) | E3(S22) | E3(S23) | E2(S12) | 0.80 | 0.79 | 0.77 | 0.66 | 0.60 | 0.75 |
| E3(S21) | E3(S22) | E3(S23) | E2(S13) | 0.73 | 0.70 | 0.71 | 0.65 | 0.62 | 0.70 |
| Average | | | | 0.76 | 0.74 | 0.71 | 0.67 | 0.64 | 0.73 |

Note: S1 means Subject 1, S2 means Subject 2, S3 means Subject 3, S11 means Subject 11, S12 means Subject 12, S13 means Subject 13, S21 means Subject 21, S22 means Subject 22, S23 means Subject 23, E1 means Environment 1, E2 means Environment 2, and E3 means Environment 3.
Table 7. Macro-F1 scores for multi-source M-FUDA and baseline models across cross-user and cross-environment tasks. Green background indicates the best accuracy for an individual task; blue background indicates the best average accuracy; yellow background and bold font highlight the average performances.

| Source 1 | Source 2 | Source 3 | Target 1 | Multi-Source M-FUDA (MMD) | Multi-Source M-FUDA (MK-MMD) | Combined-Source M-FUDA (MMD) | Combined-Source CORAL [72] | Combined-Source Wasserstein [89] | Combined-Source MCD [59,88] |
|---|---|---|---|---|---|---|---|---|---|
| E1(S1) | E1(S2) | E1(S3) | E2(S12) | 0.70 | 0.67 | 0.67 | 0.63 | 0.61 | 0.68 |
| E1(S1) | E1(S2) | E1(S3) | E3(S21) | 0.68 | 0.71 | 0.72 | 0.69 | 0.68 | 0.74 |
| E2(S11) | E2(S12) | E2(S13) | E1(S3) | 0.68 | 0.65 | 0.54 | 0.52 | 0.50 | 0.55 |
| E2(S11) | E2(S12) | E2(S13) | E3(S21) | 0.81 | 0.81 | 0.73 | 0.69 | 0.62 | 0.70 |
| E2(S11) | E2(S12) | E2(S13) | E3(S23) | 0.81 | 0.78 | 0.74 | 0.70 | 0.67 | 0.73 |
| E3(S21) | E3(S22) | E3(S23) | E1(S1) | 0.72 | 0.66 | 0.66 | 0.62 | 0.61 | 0.78 |
| E3(S21) | E3(S22) | E3(S23) | E1(S2) | 0.90 | 0.86 | 0.83 | 0.75 | 0.73 | 0.85 |
| E3(S21) | E3(S22) | E3(S23) | E1(S3) | 0.69 | 0.72 | 0.68 | 0.67 | 0.64 | 0.71 |
| E3(S21) | E3(S22) | E3(S23) | E2(S11) | 0.71 | 0.72 | 0.68 | 0.72 | 0.70 | 0.75 |
| E3(S21) | E3(S22) | E3(S23) | E2(S12) | 0.81 | 0.79 | 0.78 | 0.65 | 0.59 | 0.76 |
| E3(S21) | E3(S22) | E3(S23) | E2(S13) | 0.75 | 0.72 | 0.72 | 0.64 | 0.62 | 0.71 |
| Average | | | | 0.75 | 0.74 | 0.70 | 0.66 | 0.63 | 0.72 |

Note: S1 means Subject 1, S2 means Subject 2, S3 means Subject 3, S11 means Subject 11, S12 means Subject 12, S13 means Subject 13, S21 means Subject 21, S22 means Subject 22, S23 means Subject 23, E1 means Environment 1, E2 means Environment 2, and E3 means Environment 3.
Table 8. Micro-F1 scores for multi-source M-FUDA and baseline models across cross-atmospheric tasks. Green background indicates the best accuracy for an individual task; blue background indicates the best average accuracy; yellow background and bold font highlight the average performances.

| Source 1 | Source 2 | Source 3 | Target 1 | Multi-Source M-FUDA (MMD) | Multi-Source M-FUDA (MK-MMD) | Combined-Source M-FUDA (MMD) | Combined-Source CORAL [72] | Combined-Source Wasserstein [89] | Combined-Source MCD [59,88] |
|---|---|---|---|---|---|---|---|---|---|
| D6(S1) | D7(S1) | D8(S2) | D8(S1) | 0.70 | 0.68 | 0.69 | 0.57 | 0.58 | 0.68 |
| D7(S1) | D8(S1) | D6(S2) | D6(S1) | 0.78 | 0.75 | 0.80 | 0.71 | 0.68 | 0.89 |
| D6(S1) | D8(S1) | D7(S2) | D7(S1) | 0.76 | 0.73 | 0.75 | 0.68 | 0.66 | 0.70 |
| D6(S2) | D7(S2) | D8(S1) | D8(S2) | 0.67 | 0.66 | 0.63 | 0.58 | 0.60 | 0.62 |
| D7(S2) | D8(S2) | D6(S1) | D6(S2) | 0.74 | 0.67 | 0.74 | 0.67 | 0.65 | 0.73 |
| D6(S2) | D8(S2) | D7(S1) | D7(S2) | 0.75 | 0.72 | 0.73 | 0.69 | 0.67 | 0.72 |
| Average | | | | 0.73 | 0.70 | 0.72 | 0.65 | 0.64 | 0.72 |

Note: S1 means Subject 1, S2 means Subject 2, D6 means day6, D7 means day7, and D8 means day8.
Table 9. Macro-F1 scores for multi-source M-FUDA and baseline models across cross-atmospheric tasks. Green background indicates the best accuracy for an individual task; blue background indicates the best average accuracy; yellow background and bold font highlight the average performances.

| Source 1 | Source 2 | Source 3 | Target 1 | Multi-Source M-FUDA (MMD) | Multi-Source M-FUDA (MK-MMD) | Combined-Source M-FUDA (MMD) | Combined-Source CORAL [72] | Combined-Source Wasserstein [89] | Combined-Source MCD [59,88] |
|---|---|---|---|---|---|---|---|---|---|
| D6(S1) | D7(S1) | D8(S2) | D8(S1) | 0.70 | 0.67 | 0.68 | 0.56 | 0.57 | 0.68 |
| D7(S1) | D8(S1) | D6(S2) | D6(S1) | 0.78 | 0.75 | 0.80 | 0.69 | 0.67 | 0.89 |
| D6(S1) | D8(S1) | D7(S2) | D7(S1) | 0.76 | 0.72 | 0.75 | 0.68 | 0.66 | 0.70 |
| D6(S2) | D7(S2) | D8(S1) | D8(S2) | 0.67 | 0.64 | 0.62 | 0.58 | 0.60 | 0.62 |
| D7(S2) | D8(S2) | D6(S1) | D6(S2) | 0.74 | 0.67 | 0.74 | 0.68 | 0.65 | 0.72 |
| D6(S2) | D8(S2) | D7(S1) | D7(S2) | 0.74 | 0.71 | 0.73 | 0.67 | 0.66 | 0.72 |
| Average | | | | 0.73 | 0.70 | 0.72 | 0.64 | 0.64 | 0.72 |

Note: S1 means Subject 1, S2 means Subject 2, D6 means day6, D7 means day7, and D8 means day8.
Table 10. Average training time for multi-source M-FUDA and baseline models across cross-domain tasks. Green background indicates the shortest training time for an individual task; blue background indicates the shortest average training time; yellow background and bold font highlight the average performances.

| Cross-Domain Tasks | Multi-Source M-FUDA (MMD) | Multi-Source M-FUDA (MK-MMD) | Combined-Source M-FUDA (MMD) | Combined-Source CORAL [72] | Combined-Source Wasserstein [89] | Combined-Source MCD [59,88] |
|---|---|---|---|---|---|---|
| Cross-User | 521.97 | 632.49 | 370.29 | 324.26 | 312.34 | 353.91 |
| Cross-User + Cross-Environment | 504.23 | 615.13 | 430.47 | 409.12 | 395.56 | 426.89 |
| Cross-Atmospheric | 441.38 | 532.65 | 367.98 | 347.36 | 325.14 | 320.91 |
| Average | 489.19 | 593.42 | 389.58 | 360.25 | 344.35 | 367.24 |

Note: All the experimental values for training time are in seconds.
