1. Introduction
With the rapid advancement of the Internet of Things (IoT) and intelligent positioning technologies, Ultra-Wideband (UWB) positioning has been widely applied in fields such as indoor positioning [1,2,3], industrial navigation, robotics, and personnel tracking due to its high temporal resolution, strong resistance to multipath interference, and excellent real-time performance [4,5,6]. However, in complex real-world scenarios, signal propagation paths are easily blocked by obstacles such as human bodies, walls, and industrial equipment, leading to Non-Line-of-Sight (NLOS) propagation [7,8]. NLOS propagation can cause significant ranging errors [9], severely degrading the positioning accuracy of UWB systems. Thus, to achieve high accuracy in UWB indoor positioning systems, it is imperative to mitigate the impact of NLOS propagation [7,10]. Many methods have been proposed for this purpose. Common approaches include directly optimizing the filtering framework (e.g., Bayesian or Kalman filtering) [1,11] and preprocessing ranging data through NLOS detection to remove or attenuate NLOS components before feeding the data into the filter [12]. The latter has become the most widely used method in engineering practice due to its high compatibility with classical filtering algorithms. Accurate identification of NLOS propagation is a key prerequisite for overcoming the non-Gaussian nature of ranging errors and thereby improving UWB positioning accuracy in complex environments. Therefore, NLOS identification is crucial for UWB positioning systems [13].
In early studies, NLOS identification relied on statistical features manually extracted from signals rather than on Machine Learning (ML) algorithms [14,15]. However, this type of algorithm depends heavily on the quality of the manual feature design: poorly designed features lead to low recognition accuracy. Additionally, the core parameters of the threshold method and feature fusion decision (such as thresholds and feature weights) must be manually tuned on experimental data from specific scenarios, resulting in poor environmental adaptability. ML-based NLOS identification instead uses the automatic feature learning ability of the model to capture the essential differences between Line-of-Sight (LOS) and NLOS propagation directly from the original signal or from low-dimensional features, providing a new path for accurate recognition in complex scenes. At present, common NLOS recognition methods are mainly based on ML techniques and can be classified into two categories: methods based on collected raw physical data (e.g., the raw Channel Impulse Response (CIR)) and feature-based methods [7]. The core principle of ML-based NLOS identification using raw CIR is to let ML models automatically extract the features that differ between LOS and NLOS propagation from raw CIR signals, thereby classifying the two propagation scenarios. Jiang et al. [16] developed a UWB NLOS/LOS signal classification approach based on Long Short-Term Memory (LSTM) and a Convolutional Neural Network (CNN). The authors adopted the CNN to automatically extract features and then fed the CNN output into the LSTM to perform identification. Liu et al. [17] proposed an NLOS/LOS classification strategy employing a CNN and a Gated Recurrent Unit (GRU). The authors used the CNN and GRU to capture spatial and temporal features, respectively. With squeeze-and-excitation blocks embedded in these architectures, channel-wise features can be weighted adaptively. The core principle of feature-based ML for NLOS identification is to first manually design or computationally extract key features that distinguish LOS from NLOS signals. ML models are then employed to capture the relationship between these features and the propagation scenarios, and the classification is finally realized. Si et al. [13] developed an NLOS identification strategy that combines CNN-derived features from raw CIR with handcrafted features to distinguish NLOS from LOS conditions. In this method, a Multilayer Perceptron (MLP) was employed to fuse the CNN-extracted features with six manual features. The results show that the approach outperforms the traditional image-based CNN method, with a performance improvement of 44.16%. Yang et al. [18] put forward a two-step identification approach for classifying UWB channels, which relies on a fuzzy credibility-based Support Vector Machine (SVM) and dynamic threshold comparison. Although these existing NLOS identification methods achieve a certain level of identification performance in the scenarios where they were verified, they exhibit strong scene dependency. When the test environment changes (e.g., from an indoor office scenario to a complex outdoor environment, or from a single-obstacle scenario to a multi-obstacle scenario), their recognition accuracy decreases significantly due to differences in the distribution of environmental features.
To address this critical issue, in recent years, some researchers have proposed integrating transfer learning (TL) algorithms into the NLOS identification framework. By exploiting the features of NLOS signals that are common across scenarios and mitigating the distribution shifts caused by scenario variations, this approach offers a solution to unstable recognition accuracy in cross-scenario NLOS applications. Sun et al. [19] observed that integrating TL into the Stockwell transform and convolutional neural network (ST-CNN)-based NLOS identification approach reduces both training time and data requirements compared to the standalone ST-CNN method, while still preserving high performance. Fontaine et al. [7] proposed a TL strategy for UWB NLOS identification employing feature-based and CIR-based neural networks: a Deep Neural Network (DNN) identifies NLOS in the feature-based method, and a CNN does so in the raw CIR-based method. In addition, the authors put forward automatic optimization strategies for TL-based DNNs, enabling these models to adapt to new environments and different UWB configurations. Nkrow et al. [20] provided a robust TL-based UWB NLOS identification method, leveraging CIR data collected from two separate environments. Park et al. [21] introduced a TL framework for UWB NLOS/LOS recognition, employing an MLP and a CNN as classifiers in an unmeasured scenario. Their approach delivers about a 10% improvement in accuracy and accelerates training by nearly five times. Subsequently, Wang et al. [22] found that, compared with conventional transfer learning, introducing a Domain-Adversarial Neural Network (DANN) can achieve higher NLOS identification accuracy. They developed an enhanced DANN for recognizing UWB signal occlusion in dynamic environments, which uses both the CIR and manual CIR features. Their experimental results show that, in binary and multi-class classification and in environmental transfer scenarios, the DANN-based NLOS identification accuracy is significantly superior to that of traditional machine learning (i.e., Binary Hypothesis Testing and SVM), deep learning (i.e., CNN), and basic transfer learning methods. The strategy attains over 97.36% accuracy in LOS/NLOS binary classification across different scenarios, with multi-class classification accuracy reaching 97.07%. However, in [22] the handcrafted features are not further processed by deep learning layers (such as convolutional or fully connected layers); instead, they are fused with the CIR features solely through concatenation. In addition, the feature extractor in [22] employs four convolutional layers and four pooling layers. While this design effectively extracts deep features of CIR signals, its ability to capture temporal information of signals in dynamic environments is limited.
To address this shortcoming, we develop a novel NLOS identification strategy based on a Domain-Adversarial Neural Network for UWB positioning systems. In the proposed NLOS identification strategy, the DANN framework is used to significantly boost its cross-domain adaptation capability, with the CNN-DAE-MLP-Attention (CDM) method utilized for feature extraction. Specifically, we use a 1D CNN to capture local and temporal features from the raw CIR. The Denoising Autoencoder (DAE) is adopted to denoise and reconstruct the original handcrafted features. The handcrafted features processed by DAE are fed into the MLP for further feature extraction. Then, the attention mechanism is used to fuse the features extracted by CNN and MLP. Finally, the fused features are fed to the label predictor and domain classifier, respectively; the former is for NLOS identification with supervised training, while the latter is employed for adversarial training to learn domain-invariant features by distinguishing the source and target domains. Numerous parameters need to be tuned in the NLOS identification process. However, manual parameter tuning is excessively time-consuming, and the trial-and-error process is prone to limitations imposed by experience, making it difficult for parameter combinations to cover the optimal solution. This compromises the performance stability and generalization capability of NLOS recognition tasks. Therefore, we introduce an automated hyperparameter tuning strategy for the proposed method using Bayesian optimization to resolve this series of problems. We also compare the proposed method with other methods in diverse environments. The key contributions of this paper are summarized as follows.
1. We develop a novel strategy combining CDM with DANN (CDMD) for UWB LOS/NLOS signal identification. The proposed CDMD method not only has cross-scenario adaptability, but is also robust in identifying UWB NLOS signals.
2. We propose a novel feature extraction approach: on the one hand, it automatically extracts CIR features using the CNN network; on the other hand, it performs deep processing on manually designed features through DAE and MLP. The proposed approach leverages the denoising and reconstruction mechanism of DAE to effectively filter out noise and interference from the manually extracted features by learning the mapping relationship that recovers authentic features from noisy ones, thereby enhancing the quality and robustness of the features.
The remainder of this article is organized as follows. In Section 2, we describe the background theoretical knowledge regarding the devised approach. Then, we present the developed CDMD approach for NLOS identification in Section 3. In Section 4, we present a detailed discussion on the experimental setup and the analysis of the experimental results. Finally, the conclusions are presented in Section 5.
3. Proposed NLOS/LOS Identification Method
DANN introduces the concept of domain-adversarial learning. The DANN employs adversarial training to explicitly minimize the distribution discrepancy between the source and target domains, thereby improving the generalization ability of the model to unseen scenarios [22]. Based on the DANN framework [26], we develop a novel CDMD approach to classify LOS and NLOS conditions.
Figure 1 presents the overall framework of the developed NLOS recognition method in this paper. The basic structure of the proposed method comprises three modules, namely, the label predictor, domain classifier, and feature extractor. In the developed identification strategy, the feature extractor module mainly captures high-level features from the input CIR data and handcrafted features. The label predictor employs the derived features for LOS/NLOS discrimination. The extracted features are first fed into the Gradient Reversal Layer (GRL), which acts as an identity transformation during forward propagation to preserve feature representations for the subsequent domain classifier. During backpropagation, the gradient of the domain classification loss is multiplied by a negative coefficient via the GRL, thereby inverting the gradient signal before it propagates back to the shared feature extractor. The GRL-enabled adversarial training forces the feature extractor to learn domain-invariant features, while the label predictor (trained only on source-domain samples) learns task-relevant features for NLOS/LOS classification. The role of the domain classifier is to distinguish whether the features are derived from the target domain or the source domain.
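The behavior of the GRL described above can be illustrated with a minimal pure-Python sketch. The class name `GradientReversal`, the parameter `lam`, and the toy vectors are illustrative assumptions rather than part of the proposed implementation; a practical model would implement the same idea inside a deep learning framework's automatic differentiation.

```python
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength (hypothetical name)

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # The domain-loss gradient is negated and scaled before it
        # propagates back to the shared feature extractor.
        return [-self.lam * g for g in grad_from_domain_classifier]


grl = GradientReversal(lam=0.5)
forwarded = grl.forward([0.2, -1.3, 0.7])
reversed_grad = grl.backward([1.0, -2.0, 4.0])
```

Because only the sign of the gradient changes, the domain classifier still descends on its own loss while the feature extractor ascends on it.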
3.1. Feature Extractor
Using a single type of feature for NLOS identification makes it difficult to achieve high accuracy. To enhance LOS/NLOS identification performance, we employ both the measured CIR features and manually extracted features for NLOS identification. In this paper, we present a novel feature extraction framework leveraging CNN and DAE-MLP. The feature extraction architecture primarily consists of the CIR feature extractor, the handcrafted feature extractor and feature fusion module. The CIR feature extractor is designed to learn discriminative patterns from the CIR. It employs a 1D CNN for temporal feature extraction. The handcrafted feature extractor consists of DAE and MLP. Most existing feature extraction methods directly employ deep learning modules to process handcrafted features, or fuse handcrafted features with other types of features for NLOS identification. However, handcrafted features typically contain environmental noise and measurement errors, especially under NLOS conditions, which significantly degrade feature quality and recognition reliability. To effectively suppress noise interference in handcrafted features and learn more robust representations, the DAE is adopted in this paper to denoise and reconstruct the original handcrafted features. The handcrafted features processed by DAE are fed into the deep learning module for further feature extraction, thereby improving the accuracy and stability of NLOS identification. Specifically, the manually extracted features are used as the input of the DAE. The high-order nonlinear features generated by the DAE are fed into the MLP, which maps these features into an optimized feature space via multiple nonlinear activation functions (e.g., ReLU). Finally, we leverage an attention mechanism to perform weighted fusion of the features extracted by the CNN and DAE-MLP modules, generating a robust shared feature representation. 
Subsequently, this shared representation is fed into both a label predictor for NLOS identification and a domain classifier for domain classification.
3.1.1. CIR Feature Extractor
Since CNNs are effective at capturing local correlations and spatial patterns, they can extract local features of the CIR in the delay domain and are therefore widely used for NLOS classification tasks. 1D CNNs are frequently employed for time series data, as they perform convolutions directly along the temporal dimension to capture local temporal patterns. Therefore, this paper utilizes a 1D CNN to extract temporal features from CIR samples.
A typical CNN consists of three main layers, namely, convolutional layers, pooling layers, and fully connected (FC) layers [13,27,28]. In this paper, the input data of the CNN is CIR data. The convolutional layer employs convolutional kernels (filters) to extract local features and capture local correlations within the data. Pooling layers are usually added after convolutional layers to reduce computational load while preserving key features. The FC layer integrates the features extracted by the convolutional and pooling layers, mapping the convolutional features to a low-dimensional vector whose length equals the number of categories. The convolutional layers and FC layers (except the final layer) are sometimes followed by a ReLU activation function. By applying the ReLU activation after the convolutional layers, nonlinearity is introduced, which helps mitigate the vanishing gradient problem and accelerates convergence. By employing ReLU after the intermediate fully connected layers, the nonlinear expressive capability of features prior to classification is enhanced.
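The convolution, ReLU, and pooling steps described above can be sketched for a single filter in pure Python. The CIR values and the two-tap kernel below are made-up toy numbers, and the function names are illustrative; a trained 1D CNN would learn many such kernels.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid (no padding) 1D cross-correlation with stride 1."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


cir = [0.0, 0.1, 0.9, 0.4, 0.2, 0.1, 0.0, 0.0]  # toy CIR magnitudes (made up)
kernel = [-1.0, 1.0]  # hypothetical filter that responds to rising edges
feature_map = max_pool1d(relu(conv1d(cir, kernel)))
```

Here the only surviving activation sits at the sharp rise in the toy CIR, which illustrates how a convolutional filter localizes a pattern in the delay domain.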
3.1.2. Handcrafted Feature Extractor
We extract a series of handcrafted features from the CIR; however, these manual features contain measurement noise. A DAE can eliminate noise and redundant information from high-dimensional spaces [29]. We employ the DAE to filter noise from manual features through encoding–decoding reconstruction constraints. In addition, the DAE captures nonlinear characteristics within the manually extracted features and transforms them into higher-order features with stronger discriminability. The DAE is trained by minimizing the reconstruction loss, which ensures the encoded features retain the critical information of the original static features. After DAE processing, the feature boundaries of NLOS/LOS become clearer when input into the MLP, thereby enhancing NLOS recognition accuracy.
The MLP learns the importance of different features, automatically assigning higher weights to more discriminative ones, while also capturing nonlinear interactions among features. An MLP comprises an input layer, hidden layers, and a final output layer [21]. The input layer is designed to have a neuron count equal to the feature dimension. Typically, one to three hidden layers are used, as too many layers may lead to overfitting. The output layer dimension is consistent with the dimension of the high-dimensional feature vector. In our work, manually extracted features are first fed into a DAE to eliminate noise and redundant information, and the resulting low-dimensional representations are then input to the MLP, which outputs the final feature vectors for these preprocessed manual features.
3.1.3. Feature Fusion
In this paper, we consider two separate feature streams. The first consists of the temporal dynamic features extracted by the CNN from the raw CIR data; these features characterize the intrinsic time-domain properties of the wireless channel. The second comprises the denoised handcrafted features refined by the MLP. The two feature types possess distinct characteristics and information content, so simply concatenating them may fail to reflect their respective importance. To enhance the accuracy of NLOS identification, an attention fusion mechanism is designed to adaptively integrate the two streams for NLOS/LOS classification. Rather than relying on manually assigned fixed weights, the mechanism learns a task-specific attention weight for each of the two heterogeneous feature streams. First, it assigns a dedicated adaptive weight to the CNN-derived temporal dynamic feature stream and to the MLP-refined denoised handcrafted feature stream. These weights are learned to quantify the discriminative contribution of each stream to the NLOS/LOS classification task. Then, the two feature streams are combined by a weighted summation based on the learned weights. This adaptive weighting highlights the more informative feature stream for NLOS/LOS discrimination while suppressing the less relevant one, thereby enhancing the discriminability of the fused features. This approach fully leverages the statistical information from manual features and the dynamic information from temporal features.
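One plausible reading of this fusion step is sketched below in pure Python. It assumes that both streams have already been projected to the same length, that each stream receives a single scalar score from a shared scoring vector, and that the scores are normalized with a softmax; `score_vec` stands in for learned parameters, and all names and numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(cnn_feat, mlp_feat, score_vec):
    """Weight two equal-length feature streams and sum them element-wise."""
    # One scalar relevance score per stream.
    scores = [
        sum(w * f for w, f in zip(score_vec, cnn_feat)),
        sum(w * f for w, f in zip(score_vec, mlp_feat)),
    ]
    alpha_cnn, alpha_mlp = softmax(scores)
    fused = [alpha_cnn * c + alpha_mlp * m for c, m in zip(cnn_feat, mlp_feat)]
    return fused, (alpha_cnn, alpha_mlp)


cnn_feat = [1.0, 0.0, 2.0]     # toy CNN stream
mlp_feat = [0.5, 1.0, 0.0]     # toy DAE-MLP stream
score_vec = [0.3, -0.2, 0.4]   # hypothetical learned scoring weights
fused, weights = attention_fuse(cnn_feat, mlp_feat, score_vec)
```

Because the two weights sum to one, the fused vector stays on the same scale as its inputs, and the stream with the higher score dominates the result.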
3.2. Domain Classifier
The domain classifier mainly serves to discriminate whether features derived from the feature extractor belong to the target domain or the source domain. During training, the domain classifier updates its parameters by minimizing the domain classification loss (e.g., cross-entropy loss) for source- and target-domain discrimination. Meanwhile, a gradient reversal layer is inserted between the shared feature extractor and the domain classifier. This layer inverts the domain classification loss gradient by multiplying it with a negative coefficient during backpropagation before the gradient is propagated back to the feature extractor. This adversarial mechanism forms a minimax game between the feature extractor and the domain classifier, driving the feature extractor to capture domain-invariant and task-relevant features that the domain classifier cannot identify. This effectively aligns the feature distributions of the source and target domains, and improves the generalization ability of the model on the unlabeled target domain for NLOS recognition.
The target-domain samples have no LOS/NLOS labels and carry only domain labels; therefore, only the domain classification loss is calculated for them during training. The loss function of the domain classifier is written as [22]

$$L_d = -\frac{1}{N_s}\sum_{i=1}^{N_s}\log\left(1 - G_d(f_i^{s})\right) - \frac{1}{N_t}\sum_{j=1}^{N_t}\log G_d(f_j^{t}) \tag{17}$$

where $G_d(\cdot)$ represents the predicted probability of belonging to class 1 for a given sample, which is obtained from the second dimension of the softmax-normalized two-dimensional network output. In the domain classifier, 1 denotes the target domain and 0 denotes the source domain. In Formula (17), $G_d(f_i^{s})$ denotes the probability that the domain classifier predicts the source-domain feature $f_i^{s}$ as the target domain, and $G_d(f_j^{t})$ denotes the probability that it predicts the target-domain feature $f_j^{t}$ as the target domain. $N_t$ and $N_s$ represent the numbers of training samples in the target domain and source domain, respectively.
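The domain classification loss described above is a binary cross-entropy over domain labels (source = 0, target = 1). The following pure-Python sketch makes that concrete; averaging each domain separately and the small constant `eps` are assumptions made for illustration, and the function and argument names are hypothetical.

```python
import math


def domain_loss(p_source, p_target, eps=1e-12):
    """Binary cross-entropy over domain labels (source = 0, target = 1).

    p_source and p_target hold the predicted target-domain probabilities
    for source-domain and target-domain samples, respectively.
    """
    src_term = -sum(math.log(1.0 - p + eps) for p in p_source) / len(p_source)
    tgt_term = -sum(math.log(p + eps) for p in p_target) / len(p_target)
    return src_term + tgt_term


confident = domain_loss([0.1, 0.2], [0.9, 0.8])  # classifier separates the domains
confused = domain_loss([0.5, 0.5], [0.5, 0.5])   # classifier cannot tell them apart
```

The loss is small when the domains are easy to separate and rises to 2 ln 2 when the classifier outputs 0.5 everywhere, which is the state that domain-invariant features push it toward.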
3.3. Label Predictor
Based on the features generated by the feature extractor, the label predictor is designed to distinguish between LOS and NLOS UWB propagation states. The label predictor learns classification through features solely from the source domain and is updated by minimizing the label prediction loss (e.g., cross-entropy loss). In this work, the loss function utilizes cross-entropy between actual and predicted labels. During the training process, the parameters of the label predictor (such as the weights and biases parameters of the fully connected layer) and feature extractor are continuously updated to better distinguish between NLOS and LOS, relying on the features output by the feature extractor.
The training data $\{(x_i, y_i)\}_{i=1}^{N_s}$ is given, where $N_s$ denotes the number of source-domain training samples and $y_i \in \{0, 1\}$. In the label predictor, the true label $y_i$ is defined such that $y_i = 0$ indicates the LOS scenario and $y_i = 1$ denotes the NLOS scenario. In the case of two-class classification (LOS/NLOS), the loss function is written as [22,30]

$$L_y = -\frac{1}{N_s}\sum_{i=1}^{N_s}\left[y_i \log \hat{p}_i + (1 - y_i)\log\left(1 - \hat{p}_i\right)\right] \tag{18}$$

where $N_s$ denotes the number of training samples in the source domain, $y_i$ denotes the true label of the $i$-th sample, and $\hat{p}_i$ denotes the predicted probability of belonging to class 1 for the $i$-th sample. In Formula (18), $\hat{p}_i$ specifically represents the label predictor’s predicted probability of the $i$-th sample belonging to the NLOS class, which is obtained from the second dimension of the softmax-normalized two-dimensional network output. During prediction, for a given test sample $x$, identification is performed using $\hat{p}(x)$. If $\hat{p}(x) < 0.5$, the sample is categorized as LOS; otherwise, it is classified as NLOS. Note that the loss for the label predictor is calculated only on the source domain, as only the source domain possesses true label values for LOS/NLOS. Target-domain samples lack labels and do not contribute to the calculation of the label predictor loss.
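The label prediction loss and the decision rule can be sketched in pure Python as follows. The labels follow the convention stated above (LOS = 0, NLOS = 1); the 0.5 decision threshold is an assumption that corresponds to taking the larger of the two softmax outputs, and the function names and sample values are hypothetical.

```python
import math


def label_loss(y_true, p_nlos, eps=1e-12):
    """Mean binary cross-entropy over labelled source-domain samples."""
    total = 0.0
    for y, p in zip(y_true, p_nlos):
        total += y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return -total / len(y_true)


def classify(p_nlos, threshold=0.5):
    """Map the predicted NLOS probability to a class name (threshold is assumed)."""
    return "NLOS" if p_nlos >= threshold else "LOS"


loss = label_loss([0, 1, 1, 0], [0.1, 0.9, 0.6, 0.4])  # toy source-domain batch
decisions = [classify(p) for p in (0.9, 0.2)]
```

Only source-domain samples enter `label_loss`, matching the fact that target-domain samples carry no LOS/NLOS labels.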
To realize the domain-adversarial training and balance the task performance (NLOS classification) and domain generalization ability, the total loss function of the CDMD model is constructed by combining the classification loss of the label predictor, the domain classification loss of the domain classifier, and the reconstruction loss of the DAE. The parameters $\lambda$ and $\mu$ are used to adjust the weights for the domain-adversarial loss and the DAE reconstruction loss, respectively. The overall loss function is calculated by

$$L = L_y - \lambda L_d + \mu L_{rec}$$

where $L_y$ denotes the label classification loss, $L_d$ denotes the domain classification loss, and $\lambda$ represents a regularization parameter that prevents the trained network from overfitting. $\mu$ denotes the weight of the DAE reconstruction loss, and $L_{rec}$ denotes the DAE reconstruction loss, which is calculated by

$$L_{rec} = \frac{1}{N_s + N_t}\left(\sum_{i=1}^{N_s}\left\| h_i^{s} - g(h_i^{s}) \right\|_2^2 + \sum_{j=1}^{N_t}\left\| h_j^{t} - g(h_j^{t}) \right\|_2^2\right)$$

where $N_s$ denotes the number of training samples in the source domain and $N_t$ represents the number of training samples in the target domain. $h_i^{s}$ represents the $i$-th static feature vector from the source domain, $h_j^{t}$ denotes the $j$-th static feature vector from the target domain, and $g(\cdot)$ denotes the DAE function that maps input features to reconstructed outputs. $\|\cdot\|_2^2$ represents the squared L2 norm.
During model training, the domain classifier is independently optimized by minimizing the domain classification loss (with the feature extractor and label predictor parameters fixed), enabling it to accurately discriminate domain origins. The label predictor and shared feature extractor are jointly optimized by minimizing the total loss (with the domain classifier parameters fixed). This forces the feature extractor to simultaneously satisfy two conflicting objectives: minimizing the label classification loss to preserve classification discriminability and maximizing the domain classification loss to confuse the domain classifier, while the DAE module enhances static feature denoising via the reconstruction loss.
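The way the three loss terms interact can be sketched in pure Python. The minus sign on the domain term follows from the statement that minimizing the total loss maximizes the domain classification loss; averaging the reconstruction error over all pooled samples, and the names `lam` and `mu` for the two weights, are assumptions made for illustration.

```python
def reconstruction_loss(features, reconstructions):
    """Mean squared L2 distance between handcrafted features and DAE outputs."""
    total = 0.0
    for x, x_hat in zip(features, reconstructions):
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(features)


def total_loss(label_loss, domain_loss, recon_loss, lam, mu):
    """Objective seen by the feature extractor and label predictor.

    The domain term is subtracted, so lowering this objective raises the
    domain classification loss and makes the domains harder to separate.
    """
    return label_loss - lam * domain_loss + mu * recon_loss


recon = reconstruction_loss(
    [[1.0, 2.0], [0.0, 0.0]],   # toy handcrafted features (source and target pooled)
    [[1.0, 1.0], [0.0, 2.0]],   # toy DAE reconstructions
)
objective = total_loss(0.3, 0.7, recon, lam=1.0, mu=0.1)
```

In a framework implementation, the subtraction is typically realized by the gradient reversal layer rather than by an explicit sign in the loss.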
3.4. DANN Optimization Strategy
Manual hyperparameter tuning is not only time-consuming and labor-intensive but also prone to limitations imposed by human experience, making it difficult for parameter combinations to cover the global optimal solution. In addition, repeated testing of different hyperparameter values consumes substantial computational resources, reduces model iteration efficiency, and may even lead to unreasonable parameter configurations due to subjective judgment biases, thereby affecting the performance stability and generalization ability of the NLOS recognition. Therefore, we employed Bayesian optimization to determine the optimal combination of hyperparameters, comprising dropout rate, learning rate, the number of filters per layer in CNN, the number of convolution layers in CNN, the number of neurons in each layer of the MLP, and the number of hidden layers in MLP, etc. In this study, we adopted the prediction accuracy on the validation set as the objective function for the Bayesian optimization process.
First, we defined the search space of the hyperparameters to be optimized: the learning rates of the domain classifier, label predictor, and feature extractor lie in the range [0.0001, 0.01], and the dropout rates lie in the range [0.3, 0.5]. The following settings were fixed throughout the Bayesian optimization process: a batch size of 32, Adam as the optimizer for all models, and 250 training epochs. Under these settings, the Bayesian optimizer searched the hyperparameter space by maximizing the objective function. We first evaluated 10 randomly sampled initial points to explore the space broadly, followed by 90 Bayesian optimization iterations for targeted search, so that 100 hyperparameter combinations were sampled and evaluated in total. Each combination was trained independently with a fixed seed (42), and its validation accuracy was taken as the objective function value. The optimization terminated once all 100 evaluations were completed.
After obtaining the optimal learning rates and dropout rates from the Bayesian optimization, we configured the network structure of the model separately. Specifically, we selected the network structure by evaluating model performance under different structure configurations with the optimized hyperparameters. The search space for the network structure was defined over discrete values: the CNN component of the feature extractor has 2 or 3 convolutional layers, with the number of filters in each convolutional layer selected from [32, 64, 128]; the MLP component of the feature extractor has 1 to 3 hidden layers, with the hidden layer dimensions selected from [8, 16, 32]. The final network structure was the one with the highest validation accuracy among these configurations.
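The discrete structure search space can be enumerated with a short Python sketch. It assumes that the width of every layer is chosen independently from the listed values, which the text does not state explicitly, and `validation_accuracy` is a hypothetical callable standing in for training and evaluating one configuration.

```python
from itertools import product


def layer_configs(depths, widths):
    """Every per-layer width tuple for each allowed depth."""
    return [cfg for depth in depths for cfg in product(widths, repeat=depth)]


def select_best(configs, validation_accuracy):
    """Return the configuration with the highest validation accuracy."""
    return max(configs, key=validation_accuracy)


cnn_configs = layer_configs(depths=(2, 3), widths=(32, 64, 128))
mlp_configs = layer_configs(depths=(1, 2, 3), widths=(8, 16, 32))
structure_grid = list(product(cnn_configs, mlp_configs))
```

Under this independence assumption the grid holds 36 CNN variants and 39 MLP variants, so an exhaustive search would require 1404 trainings; tying the width across layers would shrink the grid considerably.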
Finally, we fine-tuned the optimized hyperparameters. We adopted a fixed-step learning rate decay strategy during training, where the learning rate was reduced by 30% every 120 epochs (starting from the optimal initial learning rate obtained from the first Bayesian optimization stage). The final optimal hyperparameter configuration, including the initial learning rates, dropout rates, network structure, and learning rate decay details, is presented in Table 1 and Table 2.
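The fixed-step decay described above can be written as a one-line schedule. The sketch below assumes zero-based epoch indices and uses 0.001 as a placeholder initial learning rate; the actual optimized value is the one reported in the tables.

```python
def step_decay_lr(initial_lr, epoch, step=120, factor=0.7):
    """Learning rate after fixed-step decay: reduced by 30% every `step` epochs."""
    return initial_lr * factor ** (epoch // step)


# Placeholder initial rate; the 250-epoch budget spans three plateaus.
schedule = [step_decay_lr(0.001, e) for e in (0, 119, 120, 239, 240, 249)]
```

With 250 training epochs, the rate is therefore reduced twice, at epochs 120 and 240, ending at 49% of its initial value.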