Article

A Novel Method of UAV-Assisted Trajectory Localization for Forestry Environments

Department of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
* Author to whom correspondence should be addressed.
Sensors 2024, 24(11), 3398; https://doi.org/10.3390/s24113398
Submission received: 24 February 2024 / Revised: 6 May 2024 / Accepted: 21 May 2024 / Published: 25 May 2024
(This article belongs to the Section Navigation and Positioning)

Abstract

Global positioning systems often fall short in dense forest environments, leading to an increasing demand for innovative localization methods. Notably, existing methods suffer from the following limitations: (1) traditional localization frameworks necessitate several fixed anchors to estimate the locations of targets, which is difficult to satisfy in complex and uncertain forestry environments; (2) the uncertain environment severely decreases the quality of signal measurements and thus the localization accuracy. To cope with these limitations, this paper proposes a new method of trajectory localization for forestry environments with the assistance of UAVs. Based on the multi-agent DRL technique, the topology of the UAVs is optimized in real time to enable high-accuracy target localization. Then, with the aid of RSS measurements from the UAVs to the target, the least squares algorithm is used to estimate the location, which is more flexible and reliable than existing localization systems. Furthermore, a shared replay memory is incorporated into the proposed multi-agent DRL system, which can effectively enhance learning performance and efficiency. Simulation results show that the proposed method can obtain a flexible and high-accuracy localization system with the aid of UAVs, exhibits better robustness against high-dimensional heterogeneous data, and is suitable for forestry environments.

1. Introduction

Forest fires can cause severe damage to animal and plant resources as well as the ecological environment, leading to soil erosion, significant economic losses, and even casualties. During the firefighting process, the location information of the firefighters is particularly crucial for devising the best firefighting strategy and planning optimal escape routes. Normally, a global navigation satellite system (GNSS) can provide satisfactory localization and navigation performance. However, in the forestry environment, GNSS signals are highly vulnerable to environmental uncertainties, rendering the localization result unreliable, especially under emergency security scenarios [1,2].
In recent years, machine learning (ML) algorithms have been widely used [3,4], and reinforcement learning (RL), as a major branch of ML, has also attracted much attention. The rise of RL has brought great progress to the field of localization [4]. Dou et al. [5] proposed a hierarchical framework for two-dimensional localization in which DRL was used to continuously move and shrink a two-dimensional plane window until the target accuracy was achieved; this localization framework requires no prior knowledge of floor plans in the environment. Moreover, Dou et al. [6] extended the two-dimensional localization scheme into a hierarchical framework for 3D localization, which can provide more information and functionality in the IoT era: by constantly moving and shrinking a cubic window, DRL is used to continuously divide the search space, starting from the whole building until the preset target position is reached. Mohammadi et al. [7] proposed a semi-supervised deep reinforcement learning (DRL) model in which the agent moves step by step in a grid area according to the designed actions until the target is accurately located. Similarly, Li et al. [8] proposed a localization model utilizing a novel reward function based on near-field conditions and the location of the wireless gateway, which is the first DRL localization approach without a site-survey process. The above methods are mostly based on RSS measurements, which are inaccurate in complex environments. To enhance the localization accuracy achievable with RSS, some researchers consider that RL can be assisted by Unmanned Aerial Vehicles (UAVs), which can measure the RSS of objects from multiple different angles and enjoy a higher Line-of-Sight (LoS) probability. Testi et al. [9] used RSS as a localization signal source and an RL algorithm to find the best spatial configuration of UAVs to locate a target in an unknown environment. Afifi et al. [10] proposed a geometry-based localization algorithm based on 5G RSS measurements from four base stations for 3D UAV localization, which has the advantage of providing practical real-time computation for localization problems compared with typical deep learning algorithms.
Existing UAV-assisted localization frameworks fall into the category of trilateration localization frameworks, the accuracy of which highly depends on the quality of signal measurements. Moreover, the RL-based UAV-assisted localization schemes suffer from the following issues: Traditional Q-learning algorithms store state-action data in Q-tables, which can only cope with low-dimensional state RL problems. Moreover, traditional RL algorithms are typically designed for single-agent systems, while in actual UAV-assisted localization problems, multiple-UAV formulations can expand the exploring space and promote multi-agent perception capability.
To address the above problems, we propose a multi-agent deep reinforcement learning (multi-agent DRL)-based trajectory localization framework for UAVs. Firstly, a least squares (LS) algorithm is employed to estimate the location of targets based on RSS measurements. As shown in the literature [11,12], the localization accuracy of the LS estimator can approach the Cramér–Rao lower bound. Then, we utilize the multi-agent deep Q-network (multi-agent DQN) scheme to navigate the UAVs to form a better topology, which enables better localization of the target by autonomously getting rid of channel uncertainties. In the process of DQN training, we employ the labels of the trajectory data to set the reward functions and iteratively update the network parameters using gradient descent until convergence. In the simulation settings, we model different UAVs as agents with different environmental and noise parameters, which corresponds to the device heterogeneity issue in RSS localization. The main contributions of this paper are summarized as follows:
  • Unlike existing localization systems [2,3,13,14,15,16,17], the proposed multi-agent DRL-based trajectory localization framework employs easily deployed UAVs as signal anchors and eliminates the requirement for several pre-deployed anchors with fixed locations, which makes it more feasible for complicated and changeable forestry environments, especially during emergency rescue.
  • To cope with the environmental uncertainty and the heterogeneity among agents, which severely degrade the localization performance, the proposed trajectory localization method utilizes the multi-agent DRL technique to automatically navigate the UAVs to form an optimal topology in real time, allowing higher-accuracy localization of the targets.
  • Moreover, by developing a shared replay memory for multi-agent interactions, the complementary information among agents can be utilized to enhance learning efficiency and performance, which contributes to superior and robust localization performance.

2. Preliminaries and Problem Formulation

Assume that the location of the target to be positioned is $\mathbf{x} = [x, y]^T$ and that the location of the $l$-th UAV, equipped with a sensor, is $\mathbf{x}_l = [x_l, y_l]^T$, $l = 1, 2, \ldots, L$, where $L \geq 3$ is the number of sensors. The positions $\mathbf{x}_l$ are known as prior information. The measurements between the UAVs and the target, together with the known sensor positions, are used to work out the actual location of the target.
The received signal strength (RSS) is the average received power, which is widely employed in many fields [18,19] by virtue of its easy availability. It is generally assumed that the signal propagation follows an exponentially decaying path loss model, which is a function of the transmit–receive distance, the path loss exponent, and the transmitted power. RSS localization has a lower implementation cost than TOA/TDOA localization because it does not require time synchronization between transmitter and receiver. Once the distances between the transmitting and receiving nodes are estimated, the position can be solved by trilateration, as in TOA localization.
Assume that the transmitting power is $P_t$; the receiving power of the $i$-th UAV, $P_{r,i}$, can be expressed as follows:
$P_{r,i} = K_i P_t \left\| \mathbf{x} - \mathbf{x}_i \right\|_2^{-\alpha}, \quad i = 1, 2, \ldots, L,$
where $K_i$ is the transmit–receive gain, which depends on the antenna height and gain, and $\alpha \in [2, 5]$ is the path loss exponent. Empirically, $\alpha = 2$ in a free-space propagation environment. Equation (1) can be rewritten in the following logarithmic form:
$\ln(P_{r,i}) = \ln(K_i) + \ln(P_t) - \alpha \ln(d_i) + n_{RSS,i},$
where $n_{RSS,i}$ is zero-mean Gaussian noise with variance $\sigma_{RSS,i}^2$, and $d_i$ denotes the distance between the target and the $i$-th sensor, which can be calculated as follows:
$d_i = \sqrt{(x - x_i)^2 + (y - y_i)^2}.$
Defining the RSS measurement as $r_{RSS,i} = \ln(P_{r,i}) - \ln(K_i) - \ln(P_t)$, Equation (2) can be expressed as follows:
$r_{RSS,i} = -\alpha \ln(d_i) + n_{RSS,i}, \quad i = 1, 2, \ldots, L.$
For notation conciseness, it can be further written in the following vector form:
$\mathbf{r}_{RSS} = \mathbf{f}_{RSS}(\mathbf{x}) + \mathbf{n}_{RSS},$
where
$\mathbf{r}_{RSS} = [r_{RSS,1}, r_{RSS,2}, \ldots, r_{RSS,L}]^T,$
$\mathbf{n}_{RSS} = [n_{RSS,1}, n_{RSS,2}, \ldots, n_{RSS,L}]^T,$
$\mathbf{f}_{RSS}(\mathbf{x}) = -\alpha \left[ \ln\sqrt{(x - x_1)^2 + (y - y_1)^2}, \ \ln\sqrt{(x - x_2)^2 + (y - y_2)^2}, \ \ldots, \ \ln\sqrt{(x - x_L)^2 + (y - y_L)^2} \right]^T.$
The goal of trajectory localization is to estimate the location of the target in real time based on the RSS measurements. Traditional localization schemes [11,20,21] utilize linear least squares (LLS), weighted linear least squares (WLLS), or other regression methods to estimate $\mathbf{x} = [x, y]^T$, assuming that the positions of the sensors are fixed and known. However, this assumption is difficult to meet in a post-disaster rescue environment, because fire may cause drastic environmental changes at any time, which seriously interferes with the reliability of communication and sensing equipment. Therefore, relying on pre-deployed sensor networks to provide location services is not dependable. Moreover, the topology of the UAVs severely restricts the localization accuracy of the target: for example, if the RSS measurements from some UAVs are blocked by obstacles such as trees or walls, the localization performance of LLS may degrade severely. To address these problems, this paper adopts UAVs equipped with RSS sensors, which can effectively build a flexible sensor network and provide an observation platform, allowing the optimal UAV topology to be formed in pursuit of high-precision localization of the target. The goal of this paper is to predict the movement trajectory of users in real time based on the RSS data collected by the UAVs, so as to ensure the safety of personnel and assist the subsequent rescue work. The main architecture of our localization system is given in Figure 1.
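To make the estimation step concrete, the following minimal Python sketch inverts the log-distance model above to turn RSS readings into distance estimates and then solves one common linearized least squares system for the target position. It is an illustrative sketch rather than the exact estimator of [11]; the function names and the example anchor/target coordinates are ours.

```python
import numpy as np

def rss_to_distance(r, alpha=2.0):
    """Invert r = -alpha * ln(d): d_hat = exp(-r / alpha)."""
    return np.exp(-np.asarray(r, dtype=float) / alpha)

def lls_locate(anchors, d_hat):
    """Linear least squares position estimate from anchor positions and
    estimated distances (a common linearization, shown for illustration).

    anchors: (L, 2) array of UAV positions; d_hat: (L,) estimated distances.
    """
    anchors = np.asarray(anchors, dtype=float)
    d_hat = np.asarray(d_hat, dtype=float)
    x1, y1 = anchors[0]
    # Subtract the first range equation from the others to cancel the quadratic terms.
    A = 2.0 * (anchors[1:] - anchors[0])                        # (L-1, 2)
    b = (d_hat[0] ** 2 - d_hat[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - (x1 ** 2 + y1 ** 2))
    est, *_ = np.linalg.lstsq(A, b, rcond=None)
    return est                                                  # [x_hat, y_hat]

# Example: three UAVs and noiseless distances to a target at (4, 7).
uavs = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
target = np.array([4.0, 7.0])
d_true = np.linalg.norm(uavs - target, axis=1)
print(lls_locate(uavs, d_true))   # close to [4, 7]
```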

3. Proposed Multi-Agent DQN-Based Method

In this section, we elaborate on the UAV-assisted positioning procedure. Firstly, in order to find the optimal UAV topology for accurate localization, we model the positioning framework as a Markov decision process (MDP). An MDP consists of four key components: a state space $\mathcal{S}$, an action space $\mathcal{A}$, a reward function $r$, and a state transition probability $p(s_{t+1} \mid s_t, a_t)$, where $s_t \in \mathcal{S}$ and $a_t \in \mathcal{A}$ represent the state and action of the agent at time $t$, respectively. The objective of the MDP is to find an optimal policy that maximizes the expected accumulated reward $R_t = \sum_{i=1}^{\infty} \gamma^i r_{t+i}$, where $r_{t+i}$ is the reward at time $t+i$ and $\gamma \in [0, 1]$ is the discount factor. The MDP in this paper is modeled as follows:
State space: In the proposed MDP model, the state is composed of four parts: (1) the known coordinates $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_L$ of the $L$ agents; (2) the $n$ most recent actions, each encoded as a one-hot vector whose length equals the size of the action space; (3) the RSS sequences obtained by the UAVs from the target along a trajectory; (4) a flag indicating whether the target is inside the localization region.
Action space: We split the localization region into equally spaced grids, and the action space of each agent consists of nine actions, i.e., staying in the same grid or moving one grid toward the north, south, west, east, northwest, northeast, southwest, or southeast. At each step, the agent takes one action from the action space.
Reward function: The reward function is designed based on the localization accuracy, measured by the estimation error $d = \|\mathbf{x} - \hat{\mathbf{x}}\|$ between the estimated and actual locations, where $\hat{\mathbf{x}}$ denotes the location estimated by the LLS/WLLS methods. If the estimation error is larger than a predefined threshold, the current UAV topology is not beneficial for accurate localization and a penalty should be given. In contrast, a relatively small estimation error indicates that the localization performance is acceptable, so a reward should be given, and the smaller $d$ is, the larger the reward should be. On the other hand, based on the near-field condition, i.e., a strong RSS value implies a short distance between the agents and the target, the agent receives a reward or penalty only if the average distance between the agents and the target is smaller than a pre-defined threshold $d_0$. If the estimation error is within the threshold, the agent receives a positive reward equal to the reciprocal of the estimation error; otherwise, it receives a penalty equal to the negative of the estimation error. The agent receives no reward or penalty if the average distance between the agents and the target is greater than $d_0$ (a minimal code sketch of this reward is given below). The reward function is computed as follows:
$r = \begin{cases} \dfrac{1}{d}, & d_n \leq d_0 \ \text{and} \ d < d_{th} \\ -d, & d_n \leq d_0 \ \text{and} \ d \geq d_{th} \\ 0, & d_n > d_0, \end{cases}$
where $d_n$ is the average distance between the three agents and the target, and $d_{th}$ denotes the error threshold for the location estimates.
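The following short Python sketch implements this piecewise reward. The threshold values $d_0$ and $d_{th}$ used as defaults are illustrative placeholders of our own choosing, and a small constant is added to the denominator to avoid division by zero.

```python
import numpy as np

def localization_reward(uav_positions, target, target_estimate,
                        d0=30.0, d_th=5.0, eps=1e-6):
    """Piecewise reward: 1/d if the agents are close enough and the estimate is
    accurate, -d if they are close but the estimate is poor, 0 otherwise."""
    uav_positions = np.asarray(uav_positions, dtype=float)
    target = np.asarray(target, dtype=float)
    d_n = np.mean(np.linalg.norm(uav_positions - target, axis=1))    # mean agent-target distance
    d = float(np.linalg.norm(target - np.asarray(target_estimate)))  # estimation error
    if d_n > d0:                 # agents too far away: no reward or penalty
        return 0.0
    if d < d_th:                 # accurate estimate: reciprocal of the error
        return 1.0 / (d + eps)
    return -d                    # inaccurate estimate: penalty
```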
The positioning framework proposed in this paper assumes communication between the UAVs and the target, as shown in Figure 2. The model uses the least squares method to estimate the location of the target and then uses the multi-agent DRL algorithm to navigate the UAVs to autonomously form the optimal topology. The RSS measurements in the environment are used to estimate the locations of the target and the UAVs, and the labeled location information is used to compute the reward function of the multi-agent DRL algorithm during training. During execution, the target is assumed to move first; each UAV measures the RSS of the target at the current moment and estimates the target position with the LLS/WLLS algorithm. Then, each UAV takes an action based on the trained DQN to move to a better placement for localizing the target at the next time step.
As shown in Figure 2, this paper adopts the deep Q-network (DQN) model to solve the above problem. DQN is a deep-learning-based extension of Q-learning. The goal of Q-learning is to learn the following function:
$Q: s_t \mapsto Q(s_t, a_t; \theta),$
where $\theta$ denotes the parameters of the action-value model, which maps the input state to the output decision. This function gives the expected cumulative reward for taking action $a_t$ in state $s_t$. With the aid of Equation (10), we can obtain the greedy policy in the current state:
$\pi(s) = \operatorname{argmax}_a Q(s, a).$
In traditional Q-learning, the Q function can be represented by a Q table. In this task, however, due to the uncertainty and complexity of the localization problem, it is difficult to model this continuous problem with a limited state space. Therefore, this paper uses a deep network to approximate the Q function in the continuous space, namely the DQN.
In the training of the DQN, the input is the current state, and the output is an estimated Q value for each possible action. At each step, the experience tuple of state, action, reward, and next state is saved in an experience replay memory $D$, and a small batch of data is randomly drawn from $D$ to calculate the loss function and update the parameters of the DQN. An experience replay memory of capacity $N_{ep}$ is created when the model is initialized, and each experience sample is then stored in it. When the number of experiences in the replay memory reaches the threshold $N_{st}$, a batch of $N_{mb}$ samples is randomly selected to train the network. At the same time, the epsilon-greedy policy is used to select the action in the current state. This strategy balances exploiting the knowledge already captured by the DQN model (exploitation) and trying out new actions to acquire new knowledge (exploration). The exploration factor $\epsilon$ decays linearly from an initial value $\epsilon_0$ to a minimum value $\epsilon_f$. For each experience sample, the following loss function is calculated:
$L(\theta) = \mathbb{E}\left[ \left( y_j - Q(\phi(s_j), a_j; \theta_j) \right)^2 \right],$
where $\mathbb{E}[\cdot]$ denotes the expectation and $y_j$ is the target value calculated as follows:
$y_j = r_{j+1} + \gamma \max_{a'} \hat{Q}\left( \phi(s_{j+1}), a'; \theta \right).$
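For illustration, a minimal PyTorch-style sketch of this target and loss computation is given below. It assumes a mini-batch already sampled from the replay memory and a separate target network playing the role of $\hat{Q}$; the variable names and tensor shapes are ours, not the exact implementation used in this work.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Compute the loss of Equation (12) with the target value of Equation (13).

    batch: (states, actions, rewards, next_states), where states/next_states are
    float tensors of shape (B, N_S), actions is a long tensor of shape (B,),
    and rewards is a float tensor of shape (B,).
    """
    states, actions, rewards, next_states = batch
    # Q(phi(s_j), a_j; theta) for the actions that were actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # y_j = r_{j+1} + gamma * max_a' Q_hat(phi(s_{j+1}), a')
        y = rewards + gamma * target_net(next_states).max(dim=1).values
    return F.mse_loss(q_sa, y)
```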
Based on this loss function, DQN is trained using the stochastic gradient descent (SGD) method. We summarize the training process of the proposed multi-agent DQN-based trajectory localization framework in Algorithm 1.
Algorithm 1. Training process for the UAV-assisted trajectory localization framework
Input: The RSS sequence, the trajectory of the UAVs, and the trajectory for the target.
Output: DQN parameters.
1: Initialize the model parameters, environment, space, and experience replay memory.
2: for episode in 1 to M do:
3:      for each trajectory do:
4:           Utilize the LLS/WLLS scheme to estimate the initial location of the target
5:           for t in 1 to Tmax do:
6:                for agent in 1 to L do:
7:                     Select an action using the epsilon-greedy policy
8:                     Execute the action and obtain the reward r and the next state s′
9:                     Navigate the UAV to its next placement
10:                   Estimate the location of the target at time t using LLS/WLLS
11:                   Store the experience into the shared replay memory
12:                   Randomly select a batch from the replay memory
13:                   Use Equations (12) and (13) to update the DQN
14:              end for
15:         end for
16:    end for
17: end for
After training the DQN, the UAVs can autonomously navigate themselves to the optimal placement for target localization, and then, by using LLS/WLLS methods, an accurate and robust localization result can be obtained.
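As a complement to Algorithm 1, the following Python sketch shows how the shared replay memory, the epsilon-greedy policy, and the per-agent interaction loop fit together. The environment object `env`, the network `dqn`, the update routine `update_dqn`, and all numeric constants are hypothetical placeholders rather than the exact implementation used in our simulations.

```python
import random
from collections import deque

class SharedReplayMemory:
    """Single replay buffer shared by all UAV agents so that complementary
    experiences from different agents are pooled for training."""
    def __init__(self, capacity=20000):
        self.buffer = deque(maxlen=capacity)
    def push(self, experience):              # experience = (s, a, r, s_next)
        self.buffer.append(experience)
    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
    def __len__(self):
        return len(self.buffer)

def epsilon_greedy(q_values, eps, n_actions=9):
    """Random action with probability eps, otherwise the greedy action."""
    if random.random() < eps:
        return random.randrange(n_actions)
    return int(max(range(n_actions), key=lambda a: q_values[a]))

def run_episode(env, dqn, update_dqn, memory, eps,
                num_agents=3, t_max=100, min_memory=2000, batch_size=200,
                eps_final=0.1, eps_decay=0.002):
    """One training episode: every agent acts, experiences go into the shared
    memory, and the DQN is updated from randomly sampled mini-batches."""
    for t in range(t_max):
        for agent_id in range(num_agents):
            s = env.observe(agent_id)                     # state of this UAV
            a = epsilon_greedy(dqn(s), eps)               # epsilon-greedy action
            s_next, r = env.step(agent_id, a)             # move UAV, re-estimate target (LLS/WLLS)
            memory.push((s, a, r, s_next))
            if len(memory) >= min_memory:
                update_dqn(memory.sample(batch_size))     # Equations (12) and (13)
        eps = max(eps_final, eps - eps_decay)             # linear exploration decay
    return eps
```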
In order to comprehensively evaluate the proposed algorithm, we further analyze its complexity. Notably, we mainly focus on the computational complexity of the online UAV-assisted localization process rather than the training process, which typically takes place on computation-intensive central servers or simulation platforms. Firstly, we assume that the DQN in our algorithm is composed of basic fully connected layers, so that the computational complexity of Algorithm 1 mainly depends on the size of the neural network and the learning rate of the agent. For each execution, the complexity of operating the DQN is $O(N_S N_H + N_H N_A)$, with $N_S$, $N_H$, and $N_A$ denoting the dimensions of the state space, the hidden layer of the DQN, and the action space, respectively. For a system consisting of $L$ agents, the policy is evaluated $L \times T_{max}$ times to optimize the Q value in Equation (13). After the UAVs form an optimal topology, the LLS/WLLS methods are executed to localize the target, which costs a complexity of $O(L^3 + L^2)$ for two-dimensional localization. Hence, the execution complexity of the proposed method is $O(L T_{max} N_H (N_S + N_A) + L^3)$. Note that in our work $L = 3$, $T_{max} \leq 20$, $N_A = 9$, and $N_S \leq 20$, which makes the execution complexity acceptable for modern digital processors.

4. Simulation Results and Analyses

4.1. Dataset Description

The data in this simulation mainly consist of trajectory data and RSS data. The trajectory data further include the target trajectory data (used for the computation of the reward function and the evaluation of localization performance) and the agents' trajectory data (used for the computation of the reward function). The RSS data contain the RSS received by the agents at moment $t$ together with the location of the target at moment $t-1$. The trajectory data satisfy the conditions that the UAVs do not collide with each other and that their respective distances from the target do not exceed a certain limit $d_0$.
With the aid of path loss models [22], the RSS value $r_l$ from the $l$-th sensor is generated as follows:
$r_l = -\alpha \ln(d_l) + n_l, \quad l = 1, 2, 3,$
where $\alpha$ is the path loss exponent (PLE), which depends on the multi-path properties of a given environment and ranges from 1 to 5. Empirically, the PLE satisfies $2 \leq \alpha \leq 5$ under outdoor scenarios and $1 \leq \alpha < 2$ under indoor scenarios. In free space, we set $\alpha = 2$. $d_l$ denotes the Euclidean distance between the $l$-th sensor and the signal source. $n_l$ is a random variable describing the path loss and can be expressed as follows:
$n_l = 0.1 \ln(10) \, w_l.$
Without loss of generality, $w_l$ can be modeled as a zero-mean Gaussian variable with a known variance:
$\lambda_l^2 = 0.01 (\ln 10)^2 \sigma_l^2, \quad l = 1, 2, \ldots, L,$
where $\sigma_l^2$ is the known variance. Typically, we assume that small-scale fading can be ignored. Hence, we set $(\alpha, \sigma) = (2, 1)$, $(1.6, 6)$, and $(1.9, 6)$ to simulate heterogeneous situations, following the literature [21].
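The following Python sketch shows how such heterogeneous RSS samples can be generated from the model above; the UAV and target coordinates are arbitrary illustrative values, and the function name is ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_rss(uav_positions, target, alpha=2.0, sigma=1.0):
    """Generate r_l = -alpha * ln(d_l) + n_l with n_l = 0.1 * ln(10) * w_l
    and w_l ~ N(0, sigma^2)."""
    d = np.linalg.norm(np.asarray(uav_positions, dtype=float)
                       - np.asarray(target, dtype=float), axis=1)
    w = rng.normal(0.0, sigma, size=d.shape)
    return -alpha * np.log(d) + 0.1 * np.log(10.0) * w

# Heterogeneous setting: each UAV observes under its own (alpha, sigma) pair.
uavs = np.array([[1.0, 2.0], [8.0, 1.0], [4.0, 9.0]])
target = np.array([5.0, 5.0])
conditions = [(2.0, 1.0), (1.6, 6.0), (1.9, 6.0)]
rss = np.array([simulate_rss(uavs[l:l + 1], target, a, s)[0]
                for l, (a, s) in enumerate(conditions)])
```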

4.2. Environmental Setting

The parameters involved in the training of the proposed multi-agent DRL algorithm are listed in Table 1.

4.3. Evaluation of Localization Performance

4.3.1. Training Process

The normalized loss of multi-agent DQN in the training process is presented in Figure 3, where the loss monotonically decreases and converges to zero, validating the convergence of DQN.
The trend of the normalized cumulative reward during the training process is shown in Figure 4. Although the normalized cumulative reward fluctuates significantly, it generally shows an increasing trend, which is consistent with the objective of DRL training. The reward finally converges after about 150 episodes, which lays a solid foundation for the subsequent localization process.

4.3.2. Testing Process

In the testing process, we generate simulation data under different noise and environmental conditions. We utilize 100 traces, each consisting of 100 steps, and three performance indices to evaluate the trajectory estimation performance, i.e., the average localization error (ALE), the root mean square error (RMSE), and the minimum localization error (MLE) [23], which are calculated as follows:
$ALE = \frac{\sum_i d_i}{n}, \quad RMSE = \sqrt{\frac{\sum_i d_i^2}{n}}, \quad MLE = \min_i d_i,$
where $d_i$ is the normed error between the $i$-th estimated location and the $i$-th actual location, and $n$ is the number of testing samples.
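A compact Python sketch of these three indices for a single estimated trajectory is shown below; the function name and input layout are ours.

```python
import numpy as np

def trajectory_errors(estimates, ground_truth):
    """Return the ALE, RMSE, and MLE for one estimated trajectory.

    estimates, ground_truth: arrays of shape (n, 2) holding the estimated
    and actual 2D locations along the trajectory.
    """
    d = np.linalg.norm(np.asarray(estimates, dtype=float)
                       - np.asarray(ground_truth, dtype=float), axis=1)
    ale = float(d.mean())                   # average localization error
    rmse = float(np.sqrt((d ** 2).mean()))  # root mean square error
    mle = float(d.min())                    # minimum localization error
    return ale, rmse, mle
```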
In the testing process, we chose LLS, WLLS, and improved LLS as typical localization schemes assisted by the proposed multi-agent DRL technique, while trilateration [11] serves as a classical geometry-based method without the proposed optimization procedure. The testing performance of the different localization models under different environmental settings is presented in Table 2.
After analyzing the simulation results in Table 2, we can draw the following conclusions. The model is significantly affected by noise and environmental changes. In free space, and under the condition that the RSS noise follows the standard normal distribution, the ALE, RMSE, and MLE over the 100 trajectories obtained with the LLS, WLLS, and improved LLS algorithms are consistently smaller than under the other conditions. Specifically, under these conditions, the RMSEs of the multi-agent DQN combined with the LLS, WLLS, and improved LLS algorithms are 3.439 m, 3.503 m, and 3.410 m, respectively, while that of trilateration is 4.419 m. With the standard deviation of the RSS noise held constant, the closer the environmental conditions are to free space, the better the performance of the algorithms.
To more intuitively compare the performance of the algorithm under different noise and environmental conditions, as well as with different LS-solving methods, we present the positioning errors of 100 test trajectories in the form of kernel density distribution across various dimensions. From Figure 5, we can clearly observe the distribution density of the model’s positioning errors across different noise and environmental conditions. Under the same noise and environmental conditions, a similar distribution of ALEs can be found, which conforms to the results in Table 2.
Note that in practical forestry applications, data variation due to forest density and terrain features is inevitable, which greatly affects the distribution of the received RSS measurements. Hence, we subsequently evaluate the localization performance with heterogeneous data, i.e., simulations with different noise and environmental conditions. Specifically, we divide the data into seven groups based on different values of $\alpha$ and $\sigma$, train the corresponding models, and test the results. Suppose UAV1 receives RSS simulation data from the target under conditions $\alpha_1$ and $\sigma_1$, UAV2 under $\alpha_2$ and $\sigma_2$, and UAV3 under $\alpha_3$ and $\sigma_3$.
Firstly, we use kernel density estimates to show the distribution differences of the RSS data under different noise and environmental conditions, as shown in Figure 6. The first group of RSS values (obtained under $\sigma = 1$ and $\alpha = 2$) is assumed to be collected in free space, and its distribution is relatively concentrated compared with the other two groups, with smaller overall spread. The other two groups, which share the same standard deviation $\sigma = 6$ but have slightly different $\alpha$ values, have similar kernel density distributions. Their standard deviations are larger than that of the first group, resulting in greater spread and less concentrated RSS values. Therefore, we combine these data sets with markedly different RSS distributions to examine the performance of the proposed algorithm under heterogeneous situations, corresponding to the common device heterogeneity problem in RSS localization. The localization error evaluations on heterogeneous data are shown in Table 3.
As can be seen from Table 3, the proposed model shows different performance under the various least squares (LS)-solving methods, as well as under different noise and environmental conditions. It can be concluded that different noise and environmental conditions suit different LS-solving methods. The linear least squares (LLS) algorithm performs best in the third group of heterogeneous data, i.e., when the noise and environmental parameters of UAV1 are $\alpha_1 = 2$ and $\sigma_1 = 1$ and those of UAV2 and UAV3 are $\alpha = 1.6$ and $\sigma = 6$; under these conditions, the average positioning error and RMSE of the multi-agent DQN combined with LLS are 4.611 m and 3.816 m, respectively. The weighted linear least squares (WLLS) algorithm performs best in the second group of heterogeneous data, i.e., when the parameters of UAV1 and UAV2 are $\alpha = 2$ and $\sigma = 1$ and those of UAV3 are $\alpha_3 = 1.9$ and $\sigma_3 = 6$; under these conditions, the average positioning error and RMSE of the multi-agent DQN combined with WLLS are 3.990 m and 3.429 m, respectively. The improved LLS algorithm performs best in the first group of heterogeneous data, i.e., when the parameters of UAV1 and UAV2 are $\alpha = 2$ and $\sigma = 1$ and those of UAV3 are $\alpha = 1.6$ and $\sigma = 6$; under these conditions, the average positioning error and RMSE of the multi-agent DQN combined with improved LLS are 4.343 m and 3.535 m, respectively.
The kernel density distributions of the average positioning error under different heterogeneous data are depicted in Figure 7, Figure 8 and Figure 9:
From the above figures, we can draw similar conclusions to Table 2. Furthermore, by analyzing the trend of the kernel density distribution curves in the figures, it can be observed that for different groups of heterogeneous data, the change in trend of the kernel density distribution curves of the LLS algorithm is not significant, while that of the WLLS algorithm shows greater variation, and the improved LLS algorithm exhibits the largest variation in trend. Therefore, it can be inferred that in the testing simulations with heterogeneous data, the LLS algorithm is the most stable in performance, followed by the WLLS algorithm, with the improved LLS algorithm being the least stable.

4.3.3. Performance Comparison between the Proposed Scheme and the Traditional Trilateration Scheme

We use the trilateration method as a baseline to evaluate the average positioning error under different environmental and noise conditions. As illustrated in Figure 10, the method proposed in this paper adapts to environmental changes significantly better than the trilateration method. Quantitatively, under the simulation condition with the highest noise level (the set of conditions farthest from free space among the three), the multi-agent DQN combined with the three LS-solving methods incurs average positioning errors of 4.971 m, 5.159 m, and 4.879 m, respectively, whereas the average positioning error of the trilateration method is 9.225 m.
To more clearly demonstrate the advantages of the proposed algorithm, we compare it with the trilateration method in terms of positioning error under device heterogeneity, as shown in Figure 11. The kernel density distribution of the average positioning error of the proposed method is significantly better than that of the traditional trilateration method under heterogeneous device conditions. As can be seen from the figure, for both methods the average positioning error of each trajectory is approximately normally distributed under heterogeneous data. The average positioning errors of the multi-agent DQN combined with LLS, WLLS, and improved LLS are mainly concentrated between 2.5 m and 7.5 m, with probabilities of 0.914, 0.907, and 0.954, respectively. In contrast, the average positioning error of the trilateration method is mainly concentrated between 5 m and 10 m, with a probability of 0.85. This indicates that, under device heterogeneity, the average positioning error of the proposed method falls between 2.5 m and 7.5 m with a probability of over 90%, while that of the trilateration method falls between 5 m and 10 m with a probability of 85%. These results further validate the effectiveness and superiority of the proposed positioning scheme.

5. Conclusions

In this paper, we propose a UAV-assisted multi-agent DRL scheme to provide accurate location information in forestry environments, where GNSS signals are typically unstable or unavailable. Notably, the proposed positioning scheme avoids the need for fixed anchor points by using UAVs to provide ranging information, which is much more flexible and easier to deploy in forestry environments. Moreover, considering environmental uncertainty and equipment heterogeneity, we utilize the multi-agent DRL method to automatically navigate the UAVs to form an optimal topology for target localization and then estimate the target location with the aid of the LLS/WLLS algorithm. In addition, we incorporate a shared experience replay memory into the multi-agent DRL to enhance the training performance and efficiency of the different UAVs. Simulation results validate the proposed UAV-assisted multi-agent DRL scheme as an effective positioning solution for forestry environments.

Author Contributions

Methodology, X.G.; Software, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant (No. 62171086).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Guo, X.; Nirwan, A.; Hu, F.; Shao, Y.; Elikplim, N.R.; Li, L. A survey on fusion-based indoor positioning. IEEE Commun. Surv. Tuts. 2019, 22, 566–594. [Google Scholar] [CrossRef]
  2. Szrek, J.; Trybała, P.; Góralczyk, M.; Michalak, A.; Ziętek, B.; Zimroz, R. Accuracy evaluation of selected mobile inspection robot localization techniques in a GNSS-denied environment. Sensors 2020, 21, 141. [Google Scholar] [CrossRef] [PubMed]
  3. Guo, Y.; Guo, Z.; Wang, Y.; Yao, D.; Li, B.; Li, L. A survey of trajectory planning methods for autonomous driving—Part I: Unstructured scenarios. IEEE Trans. Intell. Veh. 2023, 1–29. [Google Scholar] [CrossRef]
  4. Zhao, C.; Chu, D.; Deng, Z.; Lu, L. Human-like decision making for autonomous driving with social skills. IEEE Trans. Intell. Transp. Syst. 2024, 1–16. [Google Scholar] [CrossRef]
  5. Dou, F.; Lu, J.; Wang, Z.; Xiao, X.; Bi, J.; Huang, C.-H. Top-down indoor localization with Wi-fi fingerprints using deep Q-network. In Proceedings of the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), Chengdu, China, 9–12 October 2018; pp. 166–174. [Google Scholar]
  6. Dou, F.; Lu, J.; Zhu, T.; Bi, J. On-device indoor positioning: A federated reinforcement learning approach with heterogeneous devices. IEEE Internet Things J. 2023, 11, 3909–3926. [Google Scholar] [CrossRef]
  7. Mohammadi, M.; Al-Fuqaha, A.; Guizani, M.; Oh, J.-S. Semisupervised deep reinforcement learning in support of IoT and smart city services. IEEE Internet Things J. 2017, 5, 624–635. [Google Scholar] [CrossRef]
  8. Li, Y.; Hu, X.; Zhuang, Y.; Gao, Z.; Zhang, P.; El-Sheimy, N. Deep reinforcement learning (DRL): Another perspective for unsupervised wireless localization. IEEE Internet Things J. 2019, 7, 6279–6287. [Google Scholar] [CrossRef]
  9. Testi, E.; Favarelli, E.; Giorgetti, A. Reinforcement Learning for Connected Autonomous Vehicle Localization via UAVs. In Proceedings of the 2020 IEEE International Workshop on Metrology for Agriculture and Forestry (MetroAgriFor), Trento, Italy, 4–6 November 2020; pp. 13–17. [Google Scholar]
  10. Afifi, G.; Gadallah, Y. Autonomous 3-D UAV Localization Using Cellular Networks: Deep Supervised Learning Versus Reinforcement Learning Approaches. IEEE Access 2021, 9, 155234–155248. [Google Scholar] [CrossRef]
  11. So, H.C.; Lin, L. Linear least squares approach for accurate received signal strength based source localization. IEEE Trans. Signal Process. 2011, 59, 4035–4040. [Google Scholar] [CrossRef]
  12. Guo, X.; Li, L.; Xu, F.; Ansari, N. Expectation maximization indoor localization utilizing supporting set for Internet of Things. IEEE Internet Things J. 2018, 6, 2573–2582. [Google Scholar] [CrossRef]
  13. Si, H.; Guo, X.; Ansari, N. Multi-agent interactive localization: A positive transfer learning perspective. IEEE Trans. Cogn. Commun. Netw. 2024, 10, 553–566. [Google Scholar] [CrossRef]
  14. Miao, Q.; Huang, B.; Jia, B. Estimating distances via received signal strength and connectivity in wireless sensor networks. Wirel. Netw. 2020, 26, 971–982. [Google Scholar] [CrossRef]
  15. Tarrio, P.; Bernardos, A.M.; Besada, J.A.; Casar, J.R. A new positioning technique for RSS-based localization based on a weighted least squares estimator. In Proceedings of the 2008 IEEE International Symposium on Wireless Communication Systems, Reykjavik, Iceland, 21–24 October 2008; pp. 633–637. [Google Scholar]
  16. Chitte, S.D.; Dasgupta, S.; Ding, Z. Source localization from received signal strength under log-normal shadowing: Bias and variance. In Proceedings of the 2009 2nd International Congress on Image and Signal Processing, Tianjin, China, 17–19 October 2009; pp. 1–5. [Google Scholar]
  17. Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Trans. Cybern. 2020, 50, 3826–3839. [Google Scholar] [CrossRef] [PubMed]
  18. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef]
  19. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
  20. Patwari, N.; Hero, A.O.; Perkins, M.; Correal, N.S. Relative location estimation in wireless sensor networks. IEEE Trans. Signal Process. 2003, 51, 2137–2148. [Google Scholar] [CrossRef]
  21. Chiu, W.Y.; Chen, B.S.; Yang, C.Y. Robust relative location estimation in wireless sensor networks with inexact position problems. IEEE Trans. Mob. Comput. 2012, 11, 935–946. [Google Scholar] [CrossRef]
  22. Zanca, G.; Zorzi, F.; Zanella, A.; Zorzi, M. Experimental comparison of RSSI-based localization algorithms for indoor wireless sensor networks. In Proceedings of the Workshop on Real-world Wireless Sensor Networks, Glasgow, UK, 1 April 2008; pp. 1–5. [Google Scholar]
  23. Deng, Z.; Chu, D.; Wu, C.; Liu, S.; Sun, C.; Liu, T.; Cao, D. A probabilistic model for driving-style-recognition-enabled driver steering behaviors. IEEE Trans. Syst. Man Cybern. Syst. 2020, 52, 1838–1851. [Google Scholar] [CrossRef]
Figure 1. Main architecture of the proposed UAV-assisted positioning method.
Figure 2. Architecture of the multi-agent DQN.
Figure 3. Loss convergence trend.
Figure 4. Changing trend of the normalized reward value during training iterations.
Figure 5. Kernel density distribution diagram of the average localization error of isomorphic data with different environments and noise.
Figure 6. Kernel density distribution diagram of the RSS value with different noise and environmental conditions.
Figure 7. Kernel density distribution diagram of the average localization error of multi-agent DQN with the LLS algorithm with heterogeneous data.
Figure 8. Kernel density distribution diagram of the average localization error of multi-agent DQN with the WLLS algorithm with heterogeneous data.
Figure 9. Kernel density distribution diagram of the average localization error of multi-agent DQN with the improved LLS algorithm with heterogeneous data.
Figure 10. Average localization error of multi-agent DQN with different LS algorithms in different environments and noise conditions.
Figure 11. Kernel density distribution diagram of the average localization error of multi-agent DQN with different LS algorithms and heterogeneous data.
Table 1. Algorithm parameters.

Environmental parameters:
  Number of grid rows: 10
  Number of grid columns: 10
  Grid size: 1 m × 1 m
  Number of actions: 9
  Number of agents: 3
  Number of training traces: 1000
  Number of testing traces: 100
  Number of steps in each trace: 100

LS algorithm parameters:
  Path loss exponent α: 2 / 1.6 / 1.9
  Standard deviation σ: 1 / 6

Multi-agent DRL algorithm parameters:
  Size of replay memory: 20,000
  Initial size of replay memory: 2000
  Batch size for gradient descent: 200
  Update frequency of the network: 100
  Discount factor: 0.99
  Learning rate: 0.001
  Initial exploration rate: 1
  Decay of exploration rate: 0.002
  Final exploration rate: 0.1
Table 2. Localization error evaluation for the simulation data under different noise and environmental conditions.

Conditions        Method           ALE (m)   RMSE (m)   MLE (m)
α = 2, σ = 1      LLS              4.081     3.439      1.952
                  WLLS             4.161     3.503      1.952
                  Improved LLS     3.991     3.410      1.837
                  Trilateration    5.623     4.419      1.495
α = 1.6, σ = 6    LLS              4.971     3.990      2.039
                  WLLS             5.159     4.135      2.375
                  Improved LLS     4.879     3.938      2.235
                  Trilateration    9.225     16.384     5.918
α = 1.9, σ = 6    LLS              4.925     3.982      2.014
                  WLLS             4.710     3.855      2.571
                  Improved LLS     4.571     3.697      2.323
                  Trilateration    7.274     8.919      4.860
Table 3. Localization error evaluation for simulated heterogeneous data.

Conditions (UAV1; UAV2; UAV3)            Method           ALE (m)   RMSE (m)   MLE (m)
α=2, σ=1; α=2, σ=1; α=1.6, σ=6           LLS              4.840     3.889      2.191
                                         WLLS             5.084     4.070      1.870
                                         Improved LLS     4.343     3.535      2.326
                                         Trilateration    6.724     7.270      2.459
α=2, σ=1; α=2, σ=1; α=1.9, σ=6           LLS              4.944     4.012      2.024
                                         WLLS             3.990     3.429      1.215
                                         Improved LLS     5.198     4.207      1.480
                                         Trilateration    6.262     6.282      2.388
α=2, σ=1; α=1.6, σ=6; α=1.6, σ=6         LLS              4.611     3.816      1.790
                                         WLLS             4.846     3.985      2.034
                                         Improved LLS     4.793     3.949      1.848
                                         Trilateration    8.303     13.165     4.999
α=2, σ=1; α=1.6, σ=6; α=1.9, σ=6         LLS              4.991     4.014      1.638
                                         WLLS             4.518     3.746      2.147
                                         Improved LLS     5.045     4.086      2.183
                                         Trilateration    7.852     11.743     4.847
α=2, σ=1; α=1.9, σ=6; α=1.9, σ=6         LLS              4.775     3.869      1.887
                                         WLLS             5.027     4.074      1.202
                                         Improved LLS     5.352     4.285      2.806
                                         Trilateration    7.045     9.625      4.395
α=1.6, σ=6; α=1.6, σ=6; α=1.9, σ=6       LLS              5.041     4.082      1.788
                                         WLLS             5.017     4.068      1.730
                                         Improved LLS     4.885     3.928      2.090
                                         Trilateration    8.582     12.960     5.198
α=1.6, σ=6; α=1.9, σ=6; α=1.9, σ=6       LLS              5.038     4.084      2.023
                                         WLLS             5.274     4.268      1.754
                                         Improved LLS     5.291     4.244      2.975
                                         Trilateration    8.766     11.586     5.024
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Huang, J.; Guo, X. A Novel Method of UAV-Assisted Trajectory Localization for Forestry Environments. Sensors 2024, 24, 3398. https://doi.org/10.3390/s24113398

