Adaptive Dynamic Programming-Based Multi-Sensor Scheduling for Collaborative Target Tracking in Energy Harvesting Wireless Sensor Networks

Collaborative target tracking is one of the most important applications of wireless sensor networks (WSNs), in which the network must rely on sensor scheduling to balance the tracking accuracy and energy consumption, due to the limited network resources for sensing, communication, and computation. With the recent development of energy acquisition technologies, it has become possible to build WSNs based on energy harvesting to overcome the limitation of battery energy, so that the lifetime of the network could theoretically be extended indefinitely. However, energy-harvesting WSNs pose new technical challenges for collaborative target tracking, namely how to schedule sensors over an infinite horizon under limited sensor energy harvesting capabilities. In this paper, we propose a novel adaptive dynamic programming (ADP)-based multi-sensor scheduling algorithm (ADP-MSS) for collaborative target tracking in energy-harvesting WSNs. ADP-MSS can schedule multiple sensors for each time step over an infinite horizon to achieve high tracking accuracy, based on the extended Kalman filter (EKF) for target state prediction and estimation. Theoretical analysis shows the optimality of ADP-MSS, and simulation results demonstrate its superior tracking accuracy compared with an ADP-based single-sensor scheduling scheme and a simulated annealing-based multi-sensor scheduling scheme.


Introduction
A wireless sensor network (WSN) is usually deployed to monitor the physical phenomena in the geographic area covered by a large number of sensor nodes. It has the advantages of low cost, rapid deployment, self-organization, and fault tolerance, with wide applications such as environmental monitoring [1,2], medical care [3], pension service [4], and intelligent transportation [5].
Target tracking is a typical research problem in WSNs for studying collaborative signal and information processing where the sensors are scheduled by considering the tracking performance and the constrained network resources (e.g., the wireless bandwidth and limited sensor energy).
Usually, sensor scheduling relates to tasking the appropriate sensors at the right time to achieve satisfactory performance by considering the limited sensing, computing, and communication resources of the sensors. Effective sensor scheduling algorithms are needed for collaborative target tracking in order to get accurate estimates and effective resource utilization. For example, sensors are scheduled by using dynamic clusters and duty cycling technology to economize the limited energy of the sensors in [6]. An adaptive sensor scheduling scheme was proposed in [7] to improve tracking accuracy

Related Work
In the existing work for target tracking in WSNs, various sensor scheduling schemes have been proposed. By considering the number of scheduled sensors for each time step, they can be classified into one-sensor scheduling schemes and multi-sensor scheduling schemes. Meanwhile, by considering the scheduling mechanisms, they can be classified into non-adaptive sensor scheduling schemes and adaptive sensor scheduling schemes.
For example, a periodic sensor scheduling (PSS) scheme in which sensors sense the target alternately within predefined time slots was presented to avoid the inter-sensor interference (ISI) problem and utilize the sensors more effectively [21]. The drawback of PSS is the existence of empty detections when a scheduled sensor cannot generate an effective measurement, which results in lower tracking accuracy and wasted sensor power. A distributed sensor scheduling scheme was proposed in [22] where the tasking sensor is elected spontaneously from the sensors with pending sensing tasks via random competition based on carrier sense multiple access. Each node does not need to know the locations of the other nodes, which requires less memory. However, the computation burden of the scheduling is shared among all the active nodes. Aiming to optimize the tradeoff between the tracking accuracy and the energy cost for collaborative target tracking in WSNs, a dynamic sensor selection scheme based on genetic algorithms was proposed in [23].
In the above approaches, only one sensor node is scheduled for performing the measurement at each time step. Generally, tracking performance can be further improved by multi-sensor scheduling. For example, a distributed multi-sensor target tracking algorithm was proposed in [24] by using a cluster-based Kalman filter (KF). At each time step, one sensor is selected as the head to fuse the measurements from the other sensors, estimate the target state using EKF, and send the results to the base station. A distributed-saturation-degree-based algorithm was proposed for target tracking with multiple ultrasonic sensors, where the ISI avoidance problem is converted to the problem of multiple access in a shared channel and the scheduling problem is transformed to a coloring problem [25]. In [26], probability-based prediction and sleep scheduling protocols were presented to improve energy efficiency with limited tracking performance loss. The above work does not analyze the energy consumption of the sensor node, and lacks the adaptation mechanism in response to energy changes in the network.
Some adaptive sensor scheduling solutions can be found in the literature. For example, an adaptive sensor scheduling scheme was introduced by scheduling the next tasking sensor for the next time step according to the predicted tracking accuracy derived from the trace of the covariance matrix of the state estimation [21]. In [27], an energy-efficient target tracking method was proposed, where the KF is used to predict the target location for the next time step, then the sensor node and the cluster are selected to minimize the energy consumption. A multi-step sensor scheduling scheme based on the adaptive sampling interval approach was adopted to achieve fast tracking speed and superior energy efficiency without degrading the tracking accuracy [28]. To improve energy efficiency and tracking accuracy, the authors of [29] proposed a multi-step sensor scheduling approach using the branch-and-bound algorithm. It could achieve the optimal multi-step sensor scheduling solution, but it easily led to the "curse of dimensionality" problem.
Nevertheless, the above adaptive sensor scheduling solutions only choose one sensor node for each time step. Similarly, a flexible mechanism to improve the tracking performance can be obtained by adaptive multi-sensor scheduling. For example, in [30], a distributed adaptive multi-sensor scheduling was presented to implement the target tracking with the cooperation of the sensor nodes. In the adaptive sampling interval approach for single target tracking, the sensors are scheduled in alternative tracking mode to implement energy-efficient tracking according to the predicted tracking accuracy based on EKF [31]. To minimize the estimation error over multiple time steps in a computationally tractable fashion, Huber [32] proposed an information-based pruning algorithm for multi-step sensor scheduling by using the information matrices of the sensors and the monotonicity of the Riccati equation. In [33], several suboptimal scheduling algorithms were proposed with the performance expressed by the weighted sum of the estimated error covariance matrix in KF and the energy consumption. The posterior Cramer-Rao lower bound was proposed as a sensor selection metric, which put a constraint on the total number of selected sensors to observe the target over a time window [7,34].
However, all these methods dispatch the sensors based on the optimization of local performance, instead of global performance. In energy-harvesting WSNs, novel design criteria are required to achieve an overall performance optimization over an infinite horizon.
There are some studies in the literature on energy harvesting-based WSNs. For example, the Markov decision process (MDP) was presented to maximize the long-term expected throughput

Basic Models
A sensor node usually consists of a sensing unit, a processing unit, a transceiver unit, and a power unit. In a conventional node, the energy in the power unit is finite, whereas an energy-harvesting sensor node collects ambient energy from time to time through its energy harvesting device and stores it as electric energy.
In this paper, we assume that solar energy-based harvesting technology is adopted by the sensor nodes. We also assume that the WSN of this paper is composed of one sink node and M energy-harvesting sensor nodes. Each sensor node with enough energy can sense the target in its sensing region and transmit the perceived information to the sink node. The sink node fuses the received measurements, predicts the target states, and performs sensor scheduling.

Solar Energy Harvesting Model of the Sensor Nodes
In this paper, we assume that solar energy harvesting is used by the sensor. If the sensor's energy storage capacity is unlimited, the harvested energy of a sensor node can be modeled as in [54], where $t_0$ is the starting time for energy harvesting, $\Delta T$ is the time duration, $f_e(t)$ is the statistical distribution of the solar energy, and $\eta_e$ is the conversion efficiency of the solar panel. However, an unlimited storage capacity is impractical. Suppose that the maximal energy storage capacity of sensor i is $H_{\max}$. Then, the harvested energy of sensor i together with its residual energy cannot exceed $H_{\max}$.
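The capacity-limited harvesting model can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names, the trapezoidal integration of $f_e(t)$, the multiplication by the panel area, and the constant solar power density are illustrative choices rather than the model of [54]. The numeric values are the ones quoted later in the simulation setup (5 cm × 5 cm panel, 0.1 W/cm^2, 15% efficiency, 0.05 s sampling interval).

```python
def harvested_energy(f_e, t0, dT, eta_e, area_cm2, steps=100):
    """Approximate eta_e * area * integral of f_e(t) over [t0, t0 + dT].

    f_e is assumed to be a power density in W/cm^2 (an assumption of this sketch);
    the integral is evaluated with the trapezoidal rule.
    """
    h = dT / steps
    total = 0.5 * (f_e(t0) + f_e(t0 + dT))
    for n in range(1, steps):
        total += f_e(t0 + n * h)
    return eta_e * area_cm2 * total * h


def residual_after_harvest(residual, harvested, h_max):
    """Stored energy is capped at the maximal storage capacity h_max."""
    return min(residual + harvested, h_max)


# Hypothetical constant solar power density, using the simulation-section values.
constant_sun = lambda t: 0.1  # W/cm^2
gain = harvested_energy(constant_sun, t0=0.0, dT=0.05, eta_e=0.15, area_cm2=25.0)

print(gain)                                               # energy harvested in one interval (J)
print(residual_after_harvest(2.5e-3, gain, h_max=5e-2))   # below capacity
print(residual_after_harvest(4.0e-2, gain, h_max=5e-2))   # clipped at capacity
```

With these values the harvested power is 0.15 × 0.1 × 25 = 0.375 W, so one 0.05 s interval yields about 0.01875 J, which is a sizeable fraction of the 0.05 J battery capacity.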

EKF-Based Prediction and Estimation Model for Target State
In this paper, we will apply EKF to the target tracking problem. The basic idea is to use minimum mean square error as the best estimation criterion and update the current estimated state with the previous prediction and the current measurements [55]. In this paper, we adopt a linear target motion model and a non-linear measurement model, both with Gaussian noise distributions.
The state of the target at the k-th time step $t_k$ is $X(k)$, which consists of the location coordinates $(x(k), y(k))$ of the target and its velocities $v_x(k)$ and $v_y(k)$ along the x-axis and the y-axis at $t_k$. The target motion is modeled by a constant velocity motion model driven by the process noise $w(\Delta t_k)$. If the target is detected by n sensors, then the sink will obtain n measurements $z_j(k)$ $(j = 1, \cdots, n)$. Let $Z(k) = [z_1(k), z_2(k), \cdots, z_n(k)]^T$; the measurement model then relates $Z(k)$ to $X(k)$ through the measurement functions $h_i$ and the measurement noises $v_i$. Some notations used in EKF are listed as follows:
• $\hat{X}(k+1|k)$: State prediction for the (k + 1)-th time step using the state estimation at the k-th time step.
• $\hat{X}(k+1|k+1)$: State estimation at the (k + 1)-th time step.
• $\Delta t_k$: Sampling time interval between two successive time steps.
• $w(\Delta t_k)$: Process noise at the k-th time step.
• $Q(\Delta t_k)$: Covariance matrix of the process noise at the k-th time step.
• $q$: Given scalar that represents the intensity of the process noise.
• $H(k+1)$: Jacobian matrix of h at $t_{k+1}$ with respect to $\hat{X}(k+1|k)$.
• $P(k+1|k)$: Error covariance matrix of the state prediction for the (k + 1)-th time step.
• $P(k+1|k+1)$: Error covariance matrix of the state estimation at the (k + 1)-th time step.
• $I$: Unit matrix.
• $K(k+1)$: Kalman gain at the (k + 1)-th time step.
Both $w(\Delta t_k)$ and $v_i$ are independent and assumed to have zero-mean, white, Gaussian probability distributions. $h_i$ is generally non-linear, depending on $X(k)$, the measurement characteristic, and the parameters (e.g., the location) of sensor i.
In EKF, the prediction step computes $\hat{X}(k+1|k)$ and $P(k+1|k)$ from the estimation at the k-th time step, and the estimation step then uses the Kalman gain $K(k+1)$ and the new measurements $Z(k+1)$ to obtain $\hat{X}(k+1|k+1)$ and $P(k+1|k+1)$.
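To make the prediction and estimation steps concrete, the following Python sketch implements a small EKF for a constant velocity target observed by range sensors. Several details are assumptions of this sketch and not taken from the paper: the state ordering [x, v_x, y, v_y], the white-noise-acceleration form of $Q(\Delta t_k)$ scaled by q, the range measurement function (the simulations use ranging sensors), and the sequential scalar update that processes one sensor at a time instead of stacking all measurements. The helper names and the demo numbers are hypothetical.

```python
import math


def mat_mul(a, b):
    """Plain-Python matrix product for small matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def ekf_predict(x, P, dt, q):
    """Prediction step: propagate state and covariance through the CV model."""
    F = [[1.0, dt, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, dt],
         [0.0, 0.0, 0.0, 1.0]]
    blk = [[dt ** 3 / 3.0, dt ** 2 / 2.0], [dt ** 2 / 2.0, dt]]
    Q = [[0.0] * 4 for _ in range(4)]
    for r_ in range(2):
        for c_ in range(2):
            Q[r_][c_] = q * blk[r_][c_]          # (x, v_x) block
            Q[r_ + 2][c_ + 2] = q * blk[r_][c_]  # (y, v_y) block
    x_pred = [sum(F[i][j] * x[j] for j in range(4)) for i in range(4)]
    Ft = [list(col) for col in zip(*F)]
    FPFt = mat_mul(mat_mul(F, P), Ft)
    P_pred = [[FPFt[i][j] + Q[i][j] for j in range(4)] for i in range(4)]
    return x_pred, P_pred


def ekf_update_range(x, P, z, sensor_xy, r):
    """Estimation step for one range measurement z with noise variance r."""
    dx, dy = x[0] - sensor_xy[0], x[2] - sensor_xy[1]
    d = max(math.hypot(dx, dy), 1e-9)             # predicted range, guarded against 0
    H = [dx / d, 0.0, dy / d, 0.0]                # Jacobian of the range w.r.t. the state
    PHt = [sum(P[i][j] * H[j] for j in range(4)) for i in range(4)]
    S = sum(H[i] * PHt[i] for i in range(4)) + r  # innovation variance
    K = [v / S for v in PHt]                      # Kalman gain
    x_new = [x[i] + K[i] * (z - d) for i in range(4)]
    HP = [sum(H[i] * P[i][j] for i in range(4)) for j in range(4)]
    P_new = [[P[i][j] - K[i] * HP[j] for j in range(4)] for i in range(4)]  # (I - K H) P
    return x_new, P_new


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


# Hypothetical demo: one prediction followed by updates from two range sensors.
x_est = [8.0, 0.0, 6.0, 0.0]
P_est = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x_pred, P_pred = ekf_predict(x_est, P_est, dt=0.05, q=1.0)
x_upd, P_upd = x_pred, P_pred
for sensor, z in [((7.0, 5.0), 1.9), ((9.0, 8.0), 2.0)]:
    x_upd, P_upd = ekf_update_range(x_upd, P_upd, z, sensor, r=0.05)
print(trace(P_pred), trace(P_upd))
```

Processing the sensors one at a time avoids a matrix inversion, which keeps the sketch dependency-free; a stacked update over $Z(k+1)$ would instead invert the innovation covariance once.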

Tracking Accuracy
At the k-th step, the sink node schedules the sensors to minimize the global performance index, which is composed of the energy consumption and the tracking accuracy, under the limited energy harvesting capabilities. It is impractical to calculate the tracking error as the difference between the real state and the estimated state, because the measurement is not yet available when the sink schedules the sensors. The error covariance based on EKF, however, is available before measuring, and it describes the expected spread of the estimation error. Hence, (11) can be used to evaluate the tracking accuracy: $T(k) = \mathrm{trace}(P(k|k))$.
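The point that the covariance is available before any measurement arrives can be illustrated with a small Python sketch. It is a simplification under stated assumptions: only the 2 × 2 position covariance is tracked, the sensors are range sensors with a common noise variance r, and the update is written in information form (the inverse posterior covariance equals the inverse prior covariance plus one rank-one term per sensor). The function name, sensor positions, and numbers are hypothetical.

```python
import math
from itertools import combinations


def predicted_trace(P_prior, target_xy, sensors, r):
    """Trace of the posterior 2x2 position covariance for a set of range sensors.

    Only the predicted target position is needed, not the measurements themselves.
    """
    a, b, c = P_prior[0][0], P_prior[0][1], P_prior[1][1]
    det = a * c - b * b
    i11, i12, i22 = c / det, -b / det, a / det   # information matrix = inverse prior
    for sx, sy in sensors:
        dx, dy = target_xy[0] - sx, target_xy[1] - sy
        d = max(math.hypot(dx, dy), 1e-9)
        hx, hy = dx / d, dy / d                  # linearized range Jacobian (unit vector)
        i11 += hx * hx / r
        i12 += hx * hy / r
        i22 += hy * hy / r
    det_i = i11 * i22 - i12 * i12
    return (i11 + i22) / det_i                   # trace of the inverse of a 2x2 matrix


# Hypothetical candidate sensors around a predicted target position.
P_prior = [[0.5, 0.0], [0.0, 0.5]]
target = (5.0, 5.0)
candidates = {1: (4.0, 5.0), 2: (5.0, 7.0), 3: (8.0, 9.0)}

best_pair = min(
    combinations(candidates, 2),
    key=lambda pair: predicted_trace(P_prior, target, [candidates[i] for i in pair], r=0.05),
)
print(best_pair)
```

In this toy layout the pair whose lines of sight to the target are closest to orthogonal gives the smallest predicted trace, which is the kind of comparison a scheduler can make before waking any sensor.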

System Assumptions
In our proposed algorithm, the assumptions made about the network model are as follows.
• The sensors and sink node are stationary.
• The sink node has strong computing ability and energy storage capacity with enough memory.
• The sink is aware of the locations of the sensor nodes.
• All sensor nodes are homogeneous (i.e., having the same sensing, processing, and communication capabilities).
• A sensor node and the sink node can communicate directly with each other via a single-hop link.
Figure 1 illustrates a target tracking scenario in an energy-harvesting WSN. Upon observing the target, the tasking sensors transmit the perceived measurements to the sink node. Then the sink node fuses the received measurements, predicts the target state for the next time step based on EKF, schedules the next tasking sensor nodes by ADP-MSS, and notifies them via the low-power paging channel.

The general tracking system in the energy-harvesting WSN works as follows.

1. Initialization. When the target enters the sensor field, the energy-harvesting sensor with enough energy that detects the target for the first time becomes the first tasking sensor. It sends the measurement to the sink node.
2. State estimation and prediction. When the sink gets the new measurements, it estimates and predicts the state and error covariance by EKF.
3. Sensor scheduling. Based on the above solar energy harvesting model, the sink performs the sensor scheduling by ADP-MSS to minimize the performance index, which consists of the predicted tracking accuracy and energy consumption.
4. Mode swapping. The sink wakes up the tasking sensors for the current time step and switches the others to the sleeping mode via the low-power paging channel.
5. Monitoring and transmitting. The tasking sensors monitor the target and transmit the measurements to the sink.

Energy Consumption Analysis
At the k-th time step, the detection model of sensor i compares the received signal level $E_i(k)$ with $E_h$, a threshold value for sensing the target. $E_i(k)$ depends on the constants $E_0^i$ and $\beta$ and on $d_{(x,i)}$, the Euclidean distance between the target and sensor i. The set of tasking sensors scheduled to track the target, $\Omega_T(k)$, is a subset of $\Omega_D(k) = \{i \mid D_i(k) = 1\}$, which denotes the set of all candidate sensors that possibly detect the target. At $t_k$, the energy consumption of sensor i depends on the scheduling decision $u_i(k)$: if $u_i(k) = 1$, sensor i is scheduled; otherwise sensor i is sleeping. $E_r = e_r b_r$ represents the energy consumed to receive $b_r$ bits of data. $E_t(i) = (e_t + e_d d_{(s,i)}^2) b_t$ represents the energy consumption due to transmitting $b_t$ bits of data to the sink node s. $E_p = e_p b_p$ represents the energy consumption due to sensing and data processing of $b_p$ bits, and $E_s$ represents the energy required for sleeping. $e_r$, $e_t$, $e_d$, and $e_p$ are decided by the specifications of the sensor.
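The energy terms above can be put together in a short Python sketch. Two parts are assumptions of this sketch: the power-law signal decay used for the detection test, and the rule that a scheduled sensor spends $E_r + E_t(i) + E_p$ while a sleeping sensor spends $E_s$. The class and function names and all numeric values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math
from dataclasses import dataclass


@dataclass
class EnergyParams:
    e_r: float  # energy per received bit
    e_t: float  # distance-independent energy per transmitted bit
    e_d: float  # distance-dependent energy per transmitted bit and m^2
    e_p: float  # sensing/processing energy per bit
    b_r: int    # bits received
    b_t: int    # bits transmitted
    b_p: int    # bits sensed and processed
    E_s: float  # energy spent in sleeping mode


def detects(target_xy, sensor_xy, E0, beta, E_h):
    """Detection test: received signal level against the threshold E_h.

    The decay law E0 / d**beta is an assumed form for illustration.
    """
    d = max(math.hypot(target_xy[0] - sensor_xy[0], target_xy[1] - sensor_xy[1]), 1e-9)
    return E0 / d ** beta >= E_h


def energy_consumption(u_i, dist_to_sink, p):
    """Energy spent by one sensor in one time step for scheduling decision u_i."""
    if u_i == 0:
        return p.E_s
    E_r = p.e_r * p.b_r
    E_t = (p.e_t + p.e_d * dist_to_sink ** 2) * p.b_t
    E_p = p.e_p * p.b_p
    return E_r + E_t + E_p


# Hypothetical parameter values.
params = EnergyParams(e_r=1.0, e_t=2.0, e_d=0.5, e_p=3.0, b_r=1, b_t=2, b_p=1, E_s=0.1)
print(energy_consumption(1, 2.0, params))  # scheduled sensor 2 m from the sink
print(energy_consumption(0, 2.0, params))  # sleeping sensor
print(detects((2.0, 0.0), (0.0, 0.0), E0=1.0, beta=2.0, E_h=0.25))
```

Because the transmission term grows with the squared distance to the sink, two sensors that see the target equally well can still differ noticeably in scheduling cost.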
In this paper, the design objective is to schedule the sensors for high tracking performance over an infinite horizon. Set the system state as the residual energy of the energy-harvesting sensors and the system control as the sensor scheduling solution. At $t_k$, before being scheduled, the residual energy of sensor i is determined by its residual energy after being scheduled at the (k − 1)-th time step and the energy harvested since then. If scheduled, sensor i must have enough residual energy to satisfy the energy restriction. Let $\Omega_u(k)$ be the subset of $\Omega_D(k)$ that satisfies this restriction. For time step k, the scheduled sensors must be a subset of $\Omega_u(k)$. After being scheduled at the k-th time step, the consumed energy and the residual energy of sensor i follow from (14), (15), and (17), with $g_i = E_s - E_r + E_t(i) + E_p$. Let $g = [g_1, g_2, \cdots, g_M]$, let the system state of the k-th time step be $E_{af}(k)$, and let the control of the k-th time step be $u(k) = [u_1(k), u_2(k), \cdots, u_M(k)]$. Then, the system model (19) gives $E_{af}(k+1)$ from $E_{af}(k)$ and $u(k)$, where $g \times u(k)$ means the Hadamard product (i.e., element-wise product) between g and u(k).
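The residual-energy bookkeeping behind the system state can be sketched in Python as follows. This is an illustrative sketch and not the paper's system model (19): it assumes that harvesting is applied first and capped at the storage capacity, that a scheduled sensor then pays a per-sensor active cost while a sleeping sensor pays a common sleep cost, and that a sensor is eligible only if its residual energy covers its active cost. All names and values are hypothetical.

```python
def energy_before_scheduling(e_af_prev, harvested, h_max):
    """Residual energy before scheduling: previous residual plus harvest, capped."""
    return [min(e + h, h_max) for e, h in zip(e_af_prev, harvested)]


def feasible_sensors(e_bf, detected, active_cost):
    """Sensors that detect the target and can afford to be scheduled."""
    return [i for i in detected if e_bf[i] >= active_cost[i]]


def energy_after_scheduling(e_bf, u, active_cost, sleep_cost):
    """Residual energy after applying the scheduling vector u (entries 0 or 1)."""
    e_af = []
    for e, u_i, cost in zip(e_bf, u, active_cost):
        if u_i and e < cost:
            raise ValueError("a scheduled sensor does not have enough energy")
        e_af.append(e - (cost if u_i else sleep_cost))
    return e_af


# Hypothetical two-sensor example with values that are exact in binary floating point.
e_bf = energy_before_scheduling([1.0, 0.25], harvested=[0.5, 0.5], h_max=1.25)
eligible = feasible_sensors(e_bf, detected=[0, 1], active_cost=[0.5, 1.0])
e_af = energy_after_scheduling(e_bf, u=[1, 0], active_cost=[0.5, 1.0], sleep_cost=0.125)
print(e_bf, eligible, e_af)
```

The first sensor is clipped at the capacity before it is scheduled, and the second sensor is excluded from the eligible set because its residual energy is below its active cost.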

The Proposed Algorithm
We analyzed the predicted tracking accuracy and energy consumption respectively for time step k. To trade off the potentially infinite network lifetime against the tracking accuracy, we define the utility function $U(k)$ at time step k from the energy consumption and the tracking accuracy, in which $\beta_1 > 0$ is a coefficient to adjust the weight of the tracking accuracy [7]. It is obvious that $U(k)$ is finite. Define the global performance index $J(k)$ as the weighted sum of the utility function from time step k to infinity, where $0 < \gamma \le 1$ is a discount factor. Then, we can derive a Hamilton-Jacobi-Bellman (HJB) equation that relates $J(k)$ to $J(k+1)$. Hence, the objective of the optimal multi-sensor scheduling problem for target tracking in an energy-harvesting WSN is to minimize $J(k)$ over the control $u(k)$. Let $J^*(k) = \min_{u(k)} J(k)$. Then, we can get the HJB equation (24) for $J^*(k)$ and the optimal control sequence $u^*(k)$ as its minimizer. Generally, the optimal performance index function $J^*(k)$ is nonlinear, and it is difficult to obtain the optimal control by directly solving (24). To overcome the above problem, the ADP-MSS is proposed to get the approximate optimal solution in this paper.
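For orientation, a discounted performance index and its HJB recursion are commonly written in the following form. This is the standard discounted dynamic programming formulation, given here as a sketch under the assumption that the paper's equations follow it; the exact expression of $U(k)$ and the equation numbering in the original may differ.

```latex
\begin{align}
J(k)   &= \sum_{j=k}^{\infty} \gamma^{\,j-k}\, U(j), \\
J(k)   &= U(k) + \gamma\, J(k+1), \\
J^*(k) &= \min_{u(k)} \bigl\{ U(k) + \gamma\, J^*(k+1) \bigr\}, \\
u^*(k) &= \arg\min_{u(k)} \bigl\{ U(k) + \gamma\, J^*(k+1) \bigr\}.
\end{align}
```

The second line follows from the first by splitting off the $j = k$ term, and the last two lines apply Bellman's principle of optimality to that recursion.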
A diagram of the proposed ADP-MSS is shown in Figure 2, which comprises three modules: model, critic network, and action. The model describes the relationship between the next system state $E_{af}(k+1)$ and the current system state $E_{af}(k)$ and system control $u(k)$ (i.e., the model in (19)). The critic network, which evaluates the infinite horizon performance, is realized by a neural network, in which the input is the system state and the output is the evaluated performance index $\Phi(k)$, which tends to satisfy the HJB equation defined in (22). The action module is executed to find the optimal control for the performance evaluated by the critic network.
It runs as follows. At first, let $\Phi^{[0]}(k) = 0$ for any k; then we can obtain the optimal performance index at the first iteration step, $\Phi^{[1]}(k)$, by minimizing over $u(k)$, together with the corresponding optimal control strategy. Next, for the iteration steps $i = 1, 2, \cdots$, we can obtain $\Phi^{[i+1]}(k)$ and the corresponding control strategy in the same way. The critic network is designed to approximate $\Phi^{[i+1]}$. The input is $E_{af}(k) \in \mathbb{R}^{1 \times M}$, where $\mathbb{R}$ is the set of real numbers, and the output is the approximated performance index. The error of the network is defined as the difference between the network output and its optimization target, and the objective function to be minimized in the critic network is built from this error. The steepest descent method is used for the weight update, in which $0 < \alpha_c < 1$ is the learning rate and the updated weights are $w_c(k)$ and $v_c(k)$.
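A critic network of this kind can be sketched in plain Python as a one-hidden-layer network trained by steepest descent. The details below are assumptions made for illustration: tanh hidden units with a linear output, the squared-error objective $\frac{1}{2}e^2$, initial weights drawn uniformly from (0, 0.5) as in the simulation setup, and a tiny 2-3-1 network instead of the 24-30-1 structure used in the simulations. The class name and the training target are hypothetical.

```python
import math
import random


class CriticNetwork:
    """One-hidden-layer network: inputs -> tanh hidden layer (v) -> linear output (w)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.v = [[rng.uniform(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w = [rng.uniform(0.0, 0.5) for _ in range(n_hidden)]
        self.h = [0.0] * n_hidden

    def forward(self, x):
        self.h = [math.tanh(sum(vji * xi for vji, xi in zip(vj, x))) for vj in self.v]
        return sum(wj * hj for wj, hj in zip(self.w, self.h))

    def train_step(self, x, target, alpha_c):
        """One steepest-descent step on 0.5 * e^2; returns the loss before the update."""
        e = self.forward(x) - target
        for j, hj in enumerate(self.h):
            grad_hidden = e * self.w[j] * (1.0 - hj * hj)  # uses w[j] before its update
            self.w[j] -= alpha_c * e * hj
            for i, xi in enumerate(x):
                self.v[j][i] -= alpha_c * grad_hidden * xi
        return 0.5 * e * e


# Hypothetical usage: fit the critic output for one state to a target value.
critic = CriticNetwork(n_in=2, n_hidden=3)
state = [0.5, 0.5]
losses = [critic.train_step(state, target=1.0, alpha_c=0.2) for _ in range(50)]
print(losses[0], losses[-1])
```

In ADP-MSS the training target would be the right-hand side of the iterative update for $\Phi^{[i+1]}$ rather than a fixed constant; the fixed target here only demonstrates the weight update.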

The ADP-MSS Implementation Process
The pseudocode for ADP-MSS at time step k is given in Algorithm 1. Here, the system state $E_{af}(k)$ is known, $\delta$ is a very small positive value defined by the user, and $\Phi^{[i]}(k)$ denotes the iterative global performance index from time step k to infinity at iteration step i. The iteration procedure can be terminated once a predefined maximum number of iteration steps (MI) is reached.
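The iteration with its two stopping rules can be illustrated by a tabular value-iteration sketch in Python. It is not Algorithm 1: the critic network is replaced by a lookup table over a toy state space, and the single-sensor energy model, the utility values, and all function names are hypothetical. Only the structure is meant to carry over, namely starting from $\Phi^{[0]} = 0$, minimizing the utility plus the discounted next-state index, and stopping when the change drops below $\delta$ or after MI iterations.

```python
def value_iteration(states, actions, step, utility, gamma, delta, max_iter):
    """Iterate Phi[i+1](s) = min_u { U(s, u) + gamma * Phi[i](step(s, u)) }."""
    phi = {s: 0.0 for s in states}            # Phi^[0] = 0
    history = [dict(phi)]
    policy = {}
    for _ in range(max_iter):                 # at most MI iterations
        new_phi = {}
        for s in states:
            best_u, best_val = None, float("inf")
            for u in actions(s):
                val = utility(s, u) + gamma * phi[step(s, u)]
                if val < best_val:
                    best_u, best_val = u, val
            new_phi[s], policy[s] = best_val, best_u
        change = max(abs(new_phi[s] - phi[s]) for s in states)
        phi = new_phi
        history.append(dict(phi))
        if change < delta:                    # precision delta reached
            break
    return phi, policy, history


# Toy single-sensor model: energy levels 0..4, harvest +1 per step, sensing costs 2.
E_MAX = 4
STATES = list(range(E_MAX + 1))


def actions(s):
    return [0, 1] if s >= 2 else [0]          # sensing needs at least 2 units


def step(s, u):
    return min(s - 2 * u + 1, E_MAX)


def utility(s, u):
    energy_cost = 0.2 if u else 0.0
    tracking_cost = 0.1 if u else 1.0         # weighted tracking error, worse when asleep
    return energy_cost + tracking_cost


phi, policy, history = value_iteration(
    STATES, actions, step, utility, gamma=0.7, delta=1e-3, max_iter=1000
)
print(len(history) - 1, policy)
```

Because the utility is non-negative and the iteration starts from zero, the table entries never decrease from one iteration to the next, which mirrors the monotone behavior analyzed for ADP-MSS.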

Theoretical Analysis
Now we will prove the convergence of ADP-MSS. That is,
Proof: Define a new sequence as follows: Hence, Ψ [i+1] (k) is bounded. According to (28), we can conclude that
From Theorem 1 and Theorem 2, it can be inferred that the limit $\Phi^{\infty}(k)$ of the iterative performance index sequence exists.
Theorem 3. For any k, $\Phi^{\infty}(k)$ is the optimal performance index, that is, $\Phi^{\infty}(k)$ satisfies the HJB equation (41).
Proof: From Theorem 2, we can get $\Phi^{\infty}(k) \ge \Phi^{[i+1]}(k)$. Let $i \to \infty$, then we can obtain (42). Based on the definition of $\Phi^{\infty}(k)$, $\forall \varepsilon > 0$, $\exists\, \Phi^{[p]}(k)$ such that (43) holds. Then, we have (44), in which $\varepsilon$ can be ignored because it is an arbitrary positive value. Let $p \to \infty$, then (45) follows. From (42) and (45), we can get (41), which is just the definition of $J^*(k)$ after replacing $\Phi^{\infty}(\cdot)$ by $J^*(\cdot)$. Hence, we can conclude that $\Phi^{\infty}(k) = J^*(k)$, which means that the sequence of the iterative performance indexes in the proposed ADP-MSS will converge to the optimal solution. □

Simulation Results
In this paper we used Matlab 2014 as the simulation tool and considered a numerical example in which a WSN is deployed to monitor a moving target in a closed 10 m × 10 m square region. The WSN contained 24 sensor nodes and one sink located at the center, as shown in Figure 3. For each sensor node, the sensing region was a circle centered on its own location with a radius of 3 m.
In the simulations of this paper, ranging sensors were used to measure the distance between the sensor and the target. For sensor i located at $(x_i, y_i)$, the measurement function $h_i$ is thus the Euclidean distance between the target location $(x(k), y(k))$ and $(x_i, y_i)$, and the Jacobian matrix of the measurement function follows by differentiating $h_i$ with respect to the target state. We assumed the solar panel's area was 5 cm × 5 cm. The harvested energy rate was 0.1 W/cm$^2$, and the energy conversion efficiency was 15%. The maximum capacity of each battery was $5 \times 10^{-2}$ J with the initial energy being $2.5 \times 10^{-3}$ J, and the batteries had infinite recharge cycles. Meanwhile, the variance of the measurement noise of the sensor nodes changed from 0.01 to 0.1. Except for $E_s$, the energy consumption parameters were borrowed from [31] as shown in Table 1, and the other constant parameters are given in Table 2.

Table 1. Parameters in the energy consumption model.

Table 2. The other constant parameters.
Parameter                                   Value
Process noise parameter q                   1
Coefficient β1                              0.10
Discount factor γ                           0.70
Learning rate αc                            0.20
Sampling interval                           0.05 s
Packet size in each transmission            10 bits
Number of nodes in the NN hidden layer      30
Initial location of the target              (8.81, 6.23)
Computation precision δ                     1 × 10^−3
Max iteration step MI in ADP algorithm      1000

In the simulations, the true trajectory of the target was a circle with a radius of 4 m centered at the center of the WSN. The residual energy of the 24 sensors at time step k, $E_{af}(k)$, was used as the system state, and could be obtained from the previous system state and the control according to (19). The control was the sensor scheduling scheme $u(k) = [u_1(k), u_2(k), \cdots, u_{24}(k)]$, where $u_i(k) \in \{0, 1\}$ and $u_i(k) = 1$ means that sensor i was scheduled as one of the tasking sensors at time step k; otherwise sensor i was not scheduled and could remain in the sleeping mode. While the target was moving in the monitoring area, the tracking system iteratively performed target detection by the scheduled tasking sensors, transmission of the measurements from the tasking sensors to the sink, target state estimation and prediction by the sink, and sensor scheduling by the sink. If the sensors are not properly scheduled, the result can be a failure of the tracking or a degradation of the overall tracking performance.
The structure of the adopted critic network was 24-30-1 with 24 inputs, 30 nodes in the hidden layer, and 1 output. Its initial weight values were set randomly from the range (0, 0.5). Figure 4 shows the changes of the performance indexes for the first time-step of ADP-MSS, initialized at 0. It can be found that the change of the performance indexes was monotone non-decreasing as analyzed in Theorem 2, and the curve converged after about 600 iterations.
The true trace and estimated trajectories of the target are shown in Figure 5 when the variance of the measurement noise was 0.05 and the target speed was 5 m/s. The corresponding tracking error is shown in Figure 6, which consists of the Euclidean distance from the true coordinate to the estimated coordinate of the target at time step k. To evaluate the tracking accuracy of ADP-MSS, the tracking errors of an ADP-based single-sensor scheduling algorithm (ADP-SSS) and simulated annealing algorithm-based multi-sensor scheduling (SAA-MSS) are also shown in the same figure, where it is obvious that the proposed approach ADP-MSS was more stable and accurate. Figures 7 and 8 show the tracking errors respectively while the target speed increased from 1 to 10 m/s and the variance of the measurement noise changed from 0.01 to 0.1. From the curves in these two figures, we can find that the results of the two multi-sensor scheduling schemes (ADP-MSS and SAA-MSS) were more stable and accurate than those of the single-sensor one (ADP-SSS). This is because multiple sensors can provide more information to improve the tracking accuracy using data fusion. In addition, it is easy to find that the results from ADP-MSS scheme were better than those from SAA-MSS. The main reason is that the SAA-MSS only takes the local optimization of the performance into account. In fact, the node's state is associated with its previous state and may influence its states at the following steps. Hence, from the overall performance perspective, local optimal solutions are not the most reasonable decisions, and may have a negative impact on the global performance.
From Figures 4-8, the following conclusions can be drawn:
• The performance index of ADP-MSS was monotonically non-decreasing and converged.
• The multi-sensor scheduling schemes were more stable and reliable than the single one.
• The proposed ADP-MSS could achieve global performance optimality.

Conclusions
ADP is an efficient method to solve the dynamic programming problems of discrete systems. This paper introduces the ADP approach (ADP-MSS) to the optimal multi-sensor scheduling problem for target tracking in energy-harvesting WSNs. We present an adaptive scheme to schedule the tasking sensors by considering the global optimization of the performance composed of the energy consumption and tracking accuracy over an infinite time horizon. Theoretical analysis proved that the iterative control by ADP-MSS will converge to the optimal solution. Through simulation results, we found that the multi-sensor scheduling schemes were more stable and reliable than the single-sensor scheduling scheme, and that the proposed ADP-MSS was superior to an SAA-based multi-sensor scheduling scheme from a global perspective. As future work, more advanced ADP-based cross-layer sensor network design schemes can be studied by jointly designing the network protocols with the sensor scheduling.
Author Contributions: F.L. proposed the ADP-MSS scheme for target tracking in the energy-harvesting WSN and conducted the experiments and analysis. W.X. supervised the work. S.C. and C.J. were involved in the discussions on ADP theory and its applications.

Conflicts of Interest: The authors declare no conflict of interest.