1. Introduction
With rapid technological advancements and continuous improvement of the public’s quality of life, the demand for indoor positioning technology is growing in various fields. Traditional outdoor positioning technologies, such as the global navigation satellite system (GNSS), can achieve the precise positioning of outdoor targets [
1]. However, positioning accuracy decreases significantly within enclosed spaces due to signal obstruction and multipath effects; thus, indoor positioning technology has emerged to address these challenges. Received signal strength indicator (RSSI) and angle of arrival (AoA) are two of the most frequently employed indoor positioning methods.
AoA estimation holds a significant position in the field of array signal processing. Classical subspace-based methods have been widely studied, including multiple-signal classification (MUSIC) [
2] and the estimation of signal parameters via rotational invariance techniques (ESPRIT) [
3]. In 1996, Krim summarized two decades of advancements in array signal processing, analyzing the performance and applicability of classical algorithms such as MUSIC and ESPRIT in AoA estimation and contrasting their strengths and weaknesses in various scenarios [
4]. Notably, the accuracy of these methods decreases in scenarios with low signal-to-noise ratios (SNRs) or limited snapshots. Similarly, RSSI is vital in the domains of wireless and indoor positioning. Classical RSSI-based positioning methods include the fingerprinting method, based on signal strength databases, and distance-based trilateration, which leverages path loss models for distance estimation. However, in complex indoor environments, the high variability of RSSI often leads to a decline in positioning accuracy. Researchers have proposed hybrid positioning systems that integrate these two methods to improve indoor positioning accuracy. While such fusion methods enhance accuracy somewhat, their performance remains constrained by various factors. For instance, in non-line-of-sight (NLoS) environments, AoA estimation errors increase significantly, and the variability of RSSI can result in inaccurate distance estimations [
5].
Deep learning and machine learning techniques have been widely applied to indoor positioning systems (IPS) based on AoA or RSSI [
6]. The latest deep learning models, such as artificial neural networks (ANNs), convolutional neural networks (CNNs), and graph neural networks (GNNs) [
7], have demonstrated their potential in extracting features and predicting positions in the field of indoor positioning. However, these advanced models still cannot entirely eliminate the impact of environmental factors. Traditional machine learning methods, such as k-nearest neighbors (KNNs) [
8], support vector machines (SVMs) [
9], and random forests (RFs) [
10], have also been applied to fingerprinting-based positioning.
With the rapid development of indoor positioning technologies, multi-method fusion positioning systems have gradually become a research hotspot. For example, deep learning-based WiFi fingerprinting can handle the multipath effects and signal fluctuations of complex environments [
11]. The multi-sensor fusion positioning approach for indoor mobile robots using the factor graph (MSF-FG) method, which integrates inertial measurement unit (IMU) and light detection and ranging (LiDAR) data, achieves high-precision, real-time positioning in dynamic environments [
12]. Additionally, indoor positioning schemes that fuse and collaborate WiFi and ultra-wideband (UWB) technologies have emerged [
13]. While these fusion techniques have improved positioning accuracy, they still face numerous challenges. On the one hand, complex environments can affect measurement results, leading to a decline in positioning performance [
14]. On the other hand, issues such as hardware costs, computational complexity, and the design of reasonable data fusion algorithms to fully leverage the advantages of each technology remain significant challenges [
15].
This paper proposes a deep learning-based positioning system that integrates AoA and RSSI. For data processing, the system applies the Kalman filter (KF) to reduce the angular error in AoA measurements and uses a median filter (MF) and moving average filter (MAF) to address the fluctuations in RSSI-based distance measurements, thereby minimizing the impact of signal variability on the final results. For the network architecture, a CNN–multi-head attention (CNN-MHA) model is proposed to extract features from angular and distance information. The model dynamically adjusts the weights of input features to mitigate the impact of environmental fluctuations on positioning outcomes.
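As a rough illustration of the RSSI smoothing stage described above, the following Python sketch chains a median filter and a moving average filter. The window length of 5, the function names, the edge handling, and the sample values are assumptions made for this example; they are not taken from the paper.

```python
from statistics import median

def median_filter(values, window=5):
    """Replace each sample with the median of a centered window.
    Near the edges the window is simply truncated (an assumption)."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def moving_average(values, window=5):
    """Causal moving average over the current and previous samples."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def smooth_rssi(rssi_dbm, window=5):
    """Median filter first (suppresses spikes), then moving average (reduces jitter)."""
    return moving_average(median_filter(rssi_dbm, window), window)

# Illustrative RSSI trace in dBm containing two outliers (-85 and -40).
raw = [-60, -61, -59, -85, -60, -62, -61, -40, -60, -61]
smoothed = smooth_rssi(raw)
```

Applying the median filter before the moving average keeps isolated spikes from being averaged into neighboring samples.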
The main contributions of this work are organized as follows:
(1) We employ a KF to enhance the stability of azimuth and elevation angles. At the same time, we use MF and MAF to improve the stability of RSSI signals and the accuracy of distance estimation.
(2) We propose the CNN-MHA model. First, the CNN extracts features related to orientation and distance information from the input. Then, the output of the CNN is fed into the MHA layer, whose dynamic weight adjustment automatically shifts the focus based on the input features. This allows the model to predict the position of the signal source effectively even when the environment changes or the orientation and distance information becomes distorted. Finally, the fully connected (FC) layer maps the output features of the MHA layer to the final positioning results. During training, the backpropagation (BP) algorithm updates the entire network’s parameters by computing the gradient of the loss function, gradually optimizing the model’s performance. Additionally, batch normalization (BN) and Dropout layers are incorporated to enhance training stability and improve generalization, while the Leaky ReLU activation function is applied to address the vanishing gradient problem.
(3) We present a novel indoor positioning system that leverages deep learning and Bluetooth technology, integrating RSSI and AoA characteristics to enhance accuracy and robustness. Experimental results demonstrate that the system can accurately predict the position of signal sources and exhibits strong resilience in complex environments.
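To make the dynamic weighting idea in contribution (2) more concrete, the sketch below implements scaled dot-product attention and a simplified multi-head variant in plain Python. It is only a sketch under strong assumptions: there are no learned query, key, or value projections (each head simply attends over a slice of the features), and treating each feature vector as one "token" is a choice made for this example, not the paper's architecture.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.
    Returns the attended outputs and the attention weight rows."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

def multi_head_self_attention(tokens, num_heads):
    """Split each token's features into num_heads slices, run self-attention
    on every slice, and concatenate the per-head outputs."""
    dim = len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    out = [[] for _ in tokens]
    for h in range(num_heads):
        sl = slice(h * head_dim, (h + 1) * head_dim)
        head_tokens = [t[sl] for t in tokens]
        head_out, _ = attention(head_tokens, head_tokens, head_tokens)
        for row, o in zip(out, head_out):
            row.extend(o)
    return out
```

Each row of attention weights sums to one, so the weights act as an input-dependent mixture over the feature vectors.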
The rest of this paper is organized as follows. Section 2 introduces the related work. Section 3 provides a detailed explanation of data processing and the deep learning network. Section 4 presents the practical tests and simulation experiments. Finally, Section 5 concludes the paper.
2. Related Works
IPS usually employ filters for data preprocessing, which is a crucial step due to the complexity of indoor environments and the noise in sensor data. These filters help suppress noise, smooth signals, and fuse data from multiple sensors. They can be broadly categorized into mathematical methods and machine learning-based approaches.
Mathematical methods include the KF, particle filtering, and moving average filtering. Particle filtering is widely adopted in warehouse environments due to its nonlinear tracking capability, whereas the KF and MAF are more prevalent in office settings where linear systems dominate. Researchers can use these filters to preprocess the data, improving the accuracy of the final distance calculations [
16]. For instance, in [
17], the authors utilized particle filtering technology to correct vehicle position measurement errors caused by magnetic disturbances, improving measurement accuracy by leveraging collected radio signals. In [
18], to enhance positioning accuracy, the researchers employed MAF as the primary algorithm while simultaneously utilizing the KF to fuse data obtained from UWB and IMU sensors. In [
19], the authors employed the KF to fuse GPS, IPS, and the Inertial Navigation System (INS) data. They also utilized the Extended Kalman Filter (EKF) to linearize the nonlinear model.
Machine learning-based approaches include decision trees, long short-term memory (LSTM) networks, neural networks (NNs), CNNs, SVMs, RFs, and KNNs. Across all environments, NNs and CNNs are the most commonly used. Machine learning filters can enhance data accuracy and improve predictive performance in various applications [
20]. For example, in [
21], to identify incidents in the refrigerated warehouse, the authors proposed an unsupervised deep-learning neural network system that utilizes distance and vibration data for detecting conditions within cold storage. In [
22], the authors used the CNN to filter the input data. According to [
23], given the significant impact of NLoS and multipath propagation on indoor positioning, many researchers have focused on mitigating or eliminating these effects. Some researchers proposed exploiting the characteristics of NNs and the KF to correct the errors introduced by UWB and time difference of arrival (TDOA) measurements. The authors of [
24] improved the accuracy and robustness of positioning by employing deep learning based on geometric fingerprinting methods, e.g., for predicting initial data. Among the various machine learning techniques, NNs were the most commonly used [
1].
In recent years, several scholars have integrated AoA and RSSI to improve the accuracy of indoor positioning and mitigate the impact of environmental factors. For example, the authors in [
6] combined AoA estimation and RSSI-based ranging with ANN. Unlike traditional ranging methods that measure the length of the direct signal path using time measurements, this method utilizes the signal propagation cycle. Successful experimental results have shown that accurate indoor distance measurement is feasible without requiring synchronization or broad signal bandwidth. In [
25], the authors proposed a weighted fingerprint feature-matching algorithm based on AoA and RSSI to enhance positioning accuracy. During the fingerprint database construction phase, natural breaks (NB) classification was used to identify features as fingerprint measurements. RFs were then applied to optimize the weights assigned to each feature. The final results showed substantial improvements in accuracy compared to the KF with four base stations (KF4BSs) and KNNs. In [
26], the authors first applied principal component analysis (PCA) to reduce the redundancy in RSSI measurements and used a KF to smooth AoA measurements. Subsequently, CNN was utilized for feature extraction, separately extracting deep features from RSSI and AoA measurements. The two features were then fused using a concatenation operation, followed by classification learning using a Softmax layer. Results demonstrated that this method outperformed several state-of-the-art techniques in terms of performance.
MHA is an important mechanism in deep learning and has been widely applied in natural language processing (NLP) and computer vision (CV). More recently, MHA has also been used in localization systems to enhance positioning accuracy. For example, in [
27], the authors sought to better understand and evaluate the positioning performance of GNSS and to reduce the impact of errors on positioning. The MHA mechanism and a gating operation were incorporated into a multilayer perceptron model to dynamically choose and refine features, thereby improving the model’s capacity to interpret the input data. Comparative experiments showed that the proposed method’s root mean square error (RMSE) was 39.2% lower than that of the latest LSTM model and 17% lower than that of the CNN. In [
28], the authors proposed an indoor localization algorithm combining MHA and practical channel state information (CSI). In extensive experiments, the average positioning error of the algorithm was 0.71 m in the comprehensive office and 0.64 m in the laboratory.
Our proposed method integrates AoA and RSSI, utilizing a KF to smooth the AoA orientation information while applying an MF and MAF to process the RSSI distance information. For feature extraction, we employ a CNN to extract input features and MHA to dynamically select and weight different input features, effectively mitigating the impact of distorted inputs on the results. During training, the BP algorithm is used to compute the gradient of the loss function and update the parameters of the entire network.
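The KF smoothing of AoA measurements mentioned above can be sketched with a scalar Kalman filter that treats the angle as a slowly varying constant. The state model, the process and measurement variances, the initial error variance, and the function name are illustrative assumptions; the sketch also assumes the angle does not wrap around ±180° within the sequence.

```python
def kalman_smooth(measurements, process_var=1e-3, meas_var=4.0):
    """Scalar Kalman filter with a constant-state model.

    measurements: sequence of angles in degrees (azimuth or elevation).
    process_var:  assumed variance of the angle drift between samples.
    meas_var:     assumed variance of the measurement noise.
    """
    estimate = measurements[0]   # initialize from the first measurement
    error_var = 1.0              # assumed initial estimate variance
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is unchanged, but its uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed
```

In a setup like the one described, the same function would be run independently on the azimuth and elevation sequences of each base station.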
4. Experimental Results and Analysis
The experiments in this paper were conducted in a specific indoor area with dimensions of 8 m × 5 m. As illustrated in
Figure 6, the space is relatively enclosed to minimize external interference and ensure the accuracy of the localization experiments. For AoA measurements, the experiment used three sets of Bluetooth 5.1-based 4 × 4 dual-polarized antenna arrays, specifically the RB4191A antenna boards and SLWSTK6021A development boards. For RSSI measurements, three LAUNCHXL-CC26X2R1 development boards were employed as base stations to receive signals, as shown in
Figure 7. During the experiment, the AoA and RSSI base stations were positioned in three corners of the room, with adjacent base stations spaced 5 m and 8 m apart. This arrangement ensured accurate signal source localization and comprehensive signal coverage.
The experiment was conducted via the following steps:
(1) Set up the indoor experimental environment. As shown in Figure 6, place the AoA antenna arrays and RSSI base stations at three corners and connect them to the same local area network for deployment.
(2) Move a signal source within the experimental area while the PC records the real-time timestamp, azimuth angle, elevation angle, and RSSI values.
(3) Process the collected data to create a dataset, dividing it into a training subset and a testing subset in a 9:1 ratio. After filtering the training data, train the deep learning model and evaluate the system’s effectiveness with the testing subset.
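The 9:1 division in the last step can be sketched as follows. The random shuffling, the fixed seed, and the function name are assumptions made for this example; the paper does not state how samples were assigned to each subset.

```python
import random

def train_test_split(samples, test_fraction=0.1, seed=42):
    """Shuffle sample indices with a fixed seed and hold out test_fraction for testing."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_test = max(1, round(len(samples) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test

# Example with 100 placeholder samples: 90 go to training and 10 to testing.
train, test = train_test_split(list(range(100)))
```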
We tested the trained model in a real-world scenario by randomly selecting 100 positions for evaluation. The test results are shown in
Figure 8, where blue circles represent the actual positions of the signal source and red triangles indicate the positions predicted by the system.
Figure 8 shows that the predicted coordinates are very close to the actual coordinates in most cases.
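One simple way to quantify how close the predicted coordinates are to the actual ones is the Euclidean error per test point, summarized by its mean and by the share of points below chosen thresholds. The sketch below assumes 2D coordinates in meters; the function names and default thresholds are illustrative.

```python
import math

def position_errors(actual, predicted):
    """Euclidean distance between each actual and predicted position (meters)."""
    return [math.dist(a, p) for a, p in zip(actual, predicted)]

def summarize(errors, thresholds=(0.4, 0.5)):
    """Return the mean error and the fraction of errors strictly below each threshold."""
    mean_error = sum(errors) / len(errors)
    fractions = {t: sum(e < t for e in errors) / len(errors) for t in thresholds}
    return mean_error, fractions
```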
Under the same testing conditions, we also compared the traditional AoA-based positioning method, the AoA+Deep Learning method, and the proposed RSSI+AoA+Deep Learning method. The traditional AoA method uses the MUSIC algorithm to calculate the corresponding azimuth and elevation angles from the I/Q data received by the three base stations. This angle information, combined with the known coordinates of all the base stations, is used with triangulation to determine the position coordinates of the signal source. The AoA+Deep Learning method inputs the three sets of azimuth and elevation angle data obtained from the base stations into a deep learning model to predict the location of the signal source. The experimental results are presented in
Table 2, where the first column shows the average error of the three methods, and the subsequent three columns give the proportion of each method’s results falling within different error ranges. The proposed method achieved an average error of 0.29 m, with 93% of errors below 0.4 m and 99% below 0.5 m. The results demonstrate that the proposed indoor localization method significantly improves positioning accuracy compared to the traditional AoA method and the AoA+Deep Learning method.
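The triangulation step of the traditional AoA baseline can be illustrated with a 2D least-squares intersection of bearing lines. This is a simplified sketch, not the paper's implementation: it ignores elevation, assumes azimuths are measured counterclockwise from the x-axis in a shared room frame, and places the base stations at three corners of an 8 m × 5 m room purely for the usage example.

```python
import math

def triangulate(stations, azimuths_deg):
    """Least-squares intersection of bearing lines in 2D.

    stations:     list of (x, y) base-station coordinates in meters.
    azimuths_deg: bearing from each station to the source, in degrees.
    Each bearing gives one line: sin(t)*x - cos(t)*y = sin(t)*xi - cos(t)*yi.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for (xi, yi), az in zip(stations, azimuths_deg):
        t = math.radians(az)
        a, b = math.sin(t), -math.cos(t)
        c = a * xi + b * yi
        # Accumulate the 2x2 normal equations (A^T A) p = A^T c.
        s11 += a * a
        s12 += a * b
        s22 += b * b
        r1 += a * c
        r2 += b * c
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("bearings are parallel; position is not observable")
    x = (s22 * r1 - s12 * r2) / det
    y = (s11 * r2 - s12 * r1) / det
    return x, y

# Usage with noise-free bearings to a hypothetical source at (3, 2).
stations = [(0.0, 0.0), (8.0, 0.0), (0.0, 5.0)]
target = (3.0, 2.0)
bearings = [math.degrees(math.atan2(target[1] - sy, target[0] - sx))
            for sx, sy in stations]
estimate = triangulate(stations, bearings)
```

With noisy bearings the three lines no longer meet in a single point, and the least-squares solution returns the point that minimizes the sum of squared perpendicular distances to the lines.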
To further evaluate the performance of the proposed method, we conducted experiments using the open-source Bluetooth 5.1 AoA+RSSI dataset released in [
34]. This dataset was collected in an open indoor environment covering 110 m² (13.8 m × 8 m), where four base stations were deployed in a rhombus layout around the room’s perimeter. The environment was covered by several Wi-Fi networks, mimicking a realistic indoor setting. Each base station was capable of synchronously measuring the azimuth and elevation angles of the signal source, the RSSI values in two polarization directions, and recording high-precision timestamps.
For the three-base-station model proposed in this paper, we removed the data from the east base station during preprocessing, retaining only the azimuth and elevation angles measured by the remaining three base stations. The RSSI values from the two polarization directions at each base station were averaged, and the RSSI-based distance estimation formula was then used to calculate the relative distance between each base station and the signal source. Through this processing, we obtained a dataset compatible with our system, where each sample contains azimuth, elevation, and distance information from all three base stations, giving nine features per sample.
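The preprocessing described above can be sketched as follows. The log-distance path loss model is used here as a stand-in for the RSSI-based distance estimation formula, and the reference RSSI at 1 m (-50 dBm), the path loss exponent (2.0), the dictionary keys, and the averaging of RSSI directly in dBm are all assumptions made for this example.

```python
def rssi_to_distance(rssi_dbm, rssi_at_1m=-50.0, path_loss_exponent=2.0):
    """Log-distance path loss model: d = 10 ** ((A - RSSI) / (10 * n)), in meters."""
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10.0 * path_loss_exponent))

def build_sample(stations):
    """Build one nine-feature sample from three base-station records.

    Each record is a dict with hypothetical keys:
    'azimuth', 'elevation', 'rssi_pol1', 'rssi_pol2'.
    """
    features = []
    for s in stations:
        mean_rssi = (s["rssi_pol1"] + s["rssi_pol2"]) / 2.0  # average the two polarizations
        features += [s["azimuth"], s["elevation"], rssi_to_distance(mean_rssi)]
    return features

# Illustrative records for the three retained base stations.
records = [
    {"azimuth": 35.0, "elevation": 10.0, "rssi_pol1": -68.0, "rssi_pol2": -72.0},
    {"azimuth": -20.0, "elevation": 12.0, "rssi_pol1": -60.0, "rssi_pol2": -62.0},
    {"azimuth": 80.0, "elevation": 8.0, "rssi_pol1": -75.0, "rssi_pol2": -73.0},
]
sample = build_sample(records)
```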
To compare the performance of different methods under various environments, we generate controllable noise that conforms to the characteristics of the 2.4 GHz frequency band using MATLAB R2023a. First, we generate Gaussian white noise using MATLAB’s randn function; the samples are independent and identically distributed and follow a standard normal distribution, so the noise energy is spread uniformly across the frequency domain. The generated noise undergoes amplitude calibration to achieve the desired noise power level. Subsequently, band-limited noise is generated using a finite impulse response (FIR) bandpass filter with a passband of 2.4 to 2.5 GHz, and its spectral characteristics are verified through power spectral density analysis. Following this, the required noise power is calculated based on the target SNR, and the power-calibrated band-limited noise is proportionally added to the original dataset. To maintain the reproducibility of the experiments, a fixed random seed is employed during the noise generation process. This method allows precise and controllable noise injection within the 0 to 30 dB SNR range.
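The power calibration at the core of this procedure can be sketched in Python. The sketch is not the MATLAB pipeline used in the paper: it omits the FIR bandpass filtering and the spectral check, and the function name and default seed are assumptions. It shows only how white Gaussian noise can be scaled so that the realized SNR matches a target value.

```python
import math
import random

def add_noise_at_snr(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled so the realized SNR equals snr_db.

    signal: list of floats with nonzero power; snr_db: target SNR in dB.
    A fixed seed makes the injected noise reproducible.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    target_noise_power = signal_power / (10 ** (snr_db / 10.0))
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    measured_power = sum(n * n for n in noise) / len(noise)
    # Rescale the unit-variance draw to the exact target noise power.
    scale = math.sqrt(target_noise_power / measured_power)
    return [x + scale * n for x, n in zip(signal, noise)]
```

Rescaling by the measured noise power, rather than relying on the nominal unit variance, makes the realized SNR match the target exactly for the generated sequence.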
Figure 9 illustrates the relationship between the RMSE and the SNR, with SNR ranging from 0 to 30 dB. Under low SNR conditions (0–10 dB), the error of the RSSI-based triangulation method is significantly higher. In contrast, under high SNR conditions (20–30 dB), the RMSEs of all four methods decrease, with the deep learning method based on RSSI and AoA feature fusion consistently achieving the lowest error. Overall, the proposed method reduces the average RMSE by 0.49 m compared to the RSSI-based triangulation method, representing an improvement of 55.34%; by 0.21 m compared to AoA-based localization, representing an improvement of 34.52%; and by 0.079 m compared to the AoA+Deep Learning method, representing an improvement of 13.39%.
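For reference, the RMSE and the relative improvements quoted above can be computed as in the sketch below. It assumes the improvement is defined as the RMSE reduction divided by the baseline RMSE, which is the usual convention but is not stated explicitly; the function names are illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean square of the Euclidean position errors (meters)."""
    squared = [math.dist(a, p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def improvement_percent(baseline_rmse, proposed_rmse):
    """Relative RMSE reduction of the proposed method over a baseline, in percent."""
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse
```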
Figure 10 shows the variation in RMSE with height under an SNR of 5 dB for the four methods. We collected independent datasets at heights of 0 m, 0.3 m, 0.6 m, 0.9 m, 1.2 m, and 1.5 m and divided them into training and testing sets in a ratio of 9:1. We trained the deep learning model using the independent training set collected at each height and used the testing set to validate the RMSE performance of different positioning methods across the various heights. Within the height range of 0 to 0.6 m, the localization accuracy of all four methods is relatively low, with the RSSI-based triangulation method having the highest RMSE of 1.03 m. In contrast, the proposed method achieves the lowest RMSE of 0.63 m. This phenomenon is primarily attributed to signal quality degradation and enhanced multipath effects caused by ground reflections and obstacles. As the height increases beyond 0.6 m, the signal propagation path becomes more direct, and environmental noise interference decreases, leading to improved localization accuracy for all methods. Under the 5 dB SNR condition, the proposed method reduces the average RMSE by 0.33 m compared to the RSSI-based triangulation method, representing an improvement of 52.4%; by 0.19 m compared to AoA-based localization, representing an improvement of 30.6%; and by 0.06 m compared to the AoA+Deep Learning method, representing an improvement of 11.88%.
The deep learning techniques adopted in this paper include BN layers, Leaky ReLU activation functions, and Dropout layers, which offer advantages such as reducing network volatility, accelerating convergence, enhancing generalization ability, and improving accuracy. The reasons for selecting these components to construct the deep learning network are illustrated in
Figure 11 and
Figure 12. When using only the Dropout layer, the model’s accuracy is merely 11.3%; after introducing the BN layer, the accuracy significantly increases to approximately 60%; and with the combined use of BN layers, Dropout layers, and Leaky ReLU activation functions, the network structure achieves a further improvement in accuracy, reaching 97.3%.
During training, the proposed model rapidly reduces the loss value to 0.12 within just 50 epochs and eventually converges to 0.035. In contrast, the model using only the Dropout layer reaches a minimum loss value of only 3.8, and the model combining BN and Dropout layers, although improved, reduces its loss more slowly. The experimental results demonstrate that the proposed network structure significantly outperforms the other component combinations in terms of accuracy and convergence speed, validating its effectiveness.
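The three components compared above can each be written in a few lines. The sketch below shows textbook forms of Leaky ReLU, batch normalization for a single feature (without the learned scale and shift), and inverted dropout. The negative slope of 0.01, the dropout rate of 0.5, and the epsilon value are common defaults assumed for this example, not values reported in the paper.

```python
import math
import random

def leaky_relu(x, negative_slope=0.01):
    """Pass positive inputs through; scale negative inputs by a small slope,
    so the gradient never becomes exactly zero."""
    return x if x > 0 else negative_slope * x

def batch_norm(batch, eps=1e-5):
    """Normalize one feature across a batch to zero mean and unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [(v - mean) / math.sqrt(var + eps) for v in batch]

def dropout(activations, rate=0.5, rng=None):
    """Inverted dropout (training mode): zero each activation with probability
    rate and scale the survivors by 1 / (1 - rate)."""
    rng = rng or random.Random(0)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```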
During the construction of the deep learning network, the impact of different optimizers on the accuracy of the network is shown in
Figure 13. The experimental results indicate that the model achieves the highest accuracy of 97.3% when using the RMSprop optimizer, reaching 86.7% in fewer than 200 epochs. In contrast, the final accuracy of the Adamax and AdaGrad optimizers is only around 85%, with slower convergence. Furthermore, the SGD and AdamW optimizers perform poorly under this network framework and are unsuitable for this model.
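For readers unfamiliar with RMSprop, the sketch below shows its standard update rule applied to a toy quadratic objective. The learning rate, decay factor, epsilon, step count, and objective are illustrative assumptions and are unrelated to the hyperparameters used in the experiments.

```python
import math

def rmsprop_minimize(grad_fn, x0, lr=0.01, decay=0.9, eps=1e-8, steps=2000):
    """Minimize a function with RMSprop, given a function returning its gradient.

    Each parameter keeps a running average of its squared gradient, and the
    step is the gradient divided by the root of that average.
    """
    x = list(x0)
    avg_sq = [0.0] * len(x)
    for _ in range(steps):
        g = grad_fn(x)
        for i in range(len(x)):
            avg_sq[i] = decay * avg_sq[i] + (1.0 - decay) * g[i] ** 2
            x[i] -= lr * g[i] / (math.sqrt(avg_sq[i]) + eps)
    return x

# Toy objective f(x, y) = (x - 3)^2 + (y + 1)^2, with its minimum at (3, -1).
minimum = rmsprop_minimize(lambda p: [2 * (p[0] - 3), 2 * (p[1] + 1)], [0.0, 0.0])
```

Because the step is normalized per parameter, its size depends little on the raw gradient magnitude.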
Table 3 summarizes several recent indoor localization methods, which employ various techniques such as filters, machine learning, and sensor fusion, to achieve significant improvements in localization accuracy, robustness, and applicability. Specifically, Ref. [
35] proposed, for the first time, an algorithm that integrates indoor target positioning with communication using WiFi signals. This method utilized the received signal strength (RSS) values and channel state information (CSI) values for indoor positioning, ultimately achieving a positioning error of 0.25 m. However, this method was predicated on obtaining perfect CSI values; if the CSI values were inaccurate, the positioning accuracy would be adversely affected. In [
36], the authors combined geomagnetic field strength and WiFi signals to obtain indoor localization information, offering high usability with an average error of 0.57 m. Ref. [
37] proposed an indoor positioning algorithm that employed a simulated annealing (SA) algorithm and a genetic algorithm (GA) optimized neural network (SAGA-BP) to enhance the accuracy of ZigBee indoor positioning, achieving an average error of 0.75 m. In [
38], the authors used CNN to process the RSS data constructed in image format, which resulted in an average error of 1.22 m. Ref. [
25] introduced a weighted fingerprint feature matching algorithm based on AoA and RSSI, combined with techniques such as RFs, achieving an average error of 0.42 m. Ref. [
39] proposed an indoor positioning algorithm utilizing an Adaptive Confidence-based Multi-Objective Optimization Evaluator (ACMOOE). This system enhanced positioning accuracy by adapting the impact of the two positioning techniques. The proposed ACMOOE’s positioning error was 0.45 m. Ref. [
40] estimated the position of Bluetooth Low-Energy (BLE) transmitters (tags) by utilizing the characteristics of signals received from multiple anchor points (APs); additionally, the Least Squares (LS) algorithm was employed to estimate the location accurately, leading to an average error of 0.7 m. Among the various indoor positioning techniques, the method proposed in this paper achieves a lower average error than most of the compared methods.
Notably, the last three methods mentioned in
Table 3 shared similarities with the approach presented in this paper, as they were all based on the AoA and RSSI for indoor positioning. Specifically, Ref. [
25] employed an NB classification method to process AoA and RSSI data for dataset generation, whereas this paper uses a KF to process AoA data in conjunction with an MF and MAF for RSSI data. Regarding feature extraction and weighting, Ref. [
25] extracted feature values through NB classification and employed RFs to train feature weights. Additionally, using Multi-objective Optimization, Ref. [
39] established the error propagation relationship between RSSI and AoA measurement errors and positioning errors. Similarly, Ref. [
40] utilized a CNN; this paper also employs a CNN but introduces MHA to adjust the feature weights dynamically. In the positioning prediction phase, Ref. [
25] adopted an improved KNN algorithm, while Ref. [
39] employed Multi-Objective Particle Swarm Optimization (MOPSO) to obtain a Pareto optimal solution set for predicting the target position. Finally, while Ref. [
40] implemented positioning through the Least Squares method, this paper transformed the high-dimensional data, dynamically adjusted by MHA, into specific positions using the FC layer.
Table 4 shows the structural comparison of the three methods.