Article

A Failure Risk-Aware Multi-Hop Routing Protocol in LPWANs Using Deep Q-Network

by Shaojun Tao 1,2, Hongying Tang 1, Jiang Wang 1,* and Baoqing Li 1,*
1 Science and Technology on Micro-System Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
* Authors to whom correspondence should be addressed.
Sensors 2025, 25(14), 4416; https://doi.org/10.3390/s25144416
Submission received: 19 June 2025 / Revised: 12 July 2025 / Accepted: 13 July 2025 / Published: 15 July 2025
(This article belongs to the Special Issue Security, Privacy and Trust in Wireless Sensor Networks)

Abstract

Multi-hop routing over low-power wide-area networks (LPWANs) has emerged as a promising technology for extending network coverage. However, existing protocols face high transmission disruption risks due to factors such as dynamic topology driven by stochastic events, dynamic link quality, and coverage holes induced by imbalanced energy consumption. To address this issue, we propose a failure risk-aware deep Q-network-based multi-hop routing (FRDR) protocol, aiming to reduce transmission disruption probability. First, we design a power regulation mechanism (PRM) that works in conjunction with pre-selection rules to optimize end-device node (EN) activations and candidate relay selection. Second, we introduce the concept of routing failure risk value (RFRV) to quantify the potential failure risk posed by each candidate next-hop EN, which correlates with its neighborhood state characteristics (i.e., the number of neighbors, the residual energy level, and link quality). Third, a deep Q-network (DQN)-based routing decision mechanism is proposed, where a multi-objective reward function incorporating RFRV, residual energy, distance to the gateway, and transmission hops is utilized to determine the optimal next-hop. Simulation results demonstrate that FRDR outperforms existing protocols in terms of packet delivery rate and network lifetime while maintaining comparable transmission delay.

1. Introduction

Multi-hop routing in low-power wide-area networks (LPWANs) has emerged as a promising solution for expanding geographical coverage [1,2]. Within such networks, event-driven architectures are widely adopted to enhance energy efficiency [3,4]. However, multi-hop routing over event-driven LPWANs is challenged by high transmission disruption risk. Specifically, dynamic link quality introduces unstable link connections, while imbalanced energy consumption and nonuniform end-device node (EN) distribution lead to coverage holes that disrupt data forwarding [5,6,7]. Consequently, developing multi-hop routing protocols that guide ENs to select routes with low disruption probability is critical.
Over the past decades, numerous multi-hop routing protocols have been proposed to determine optimal relays by evaluating the intrinsic EN state and neighborhood state [8,9,10,11,12]. However, these studies primarily focus on assessing link quality within the neighborhood state. By overlooking the number and residual energy of neighbors, these methods struggle to avoid selecting ENs that introduce high routing failure risk. Specifically, ENs with few neighbors exhibit higher transmission failure probabilities due to limited next-hop availability, while those connected to low-energy neighbors are prone to instability caused by energy depletion during data forwarding. Therefore, developing a comprehensive neighborhood state assessment framework to avoid relays that introduce high routing failure risk is imperative.
Given these, we propose a failure risk-aware deep Q-network-based multi-hop routing (FRDR) protocol. In FRDR, by evaluating multiple neighborhood state characteristics, a distinct routing failure risk value (RFRV) is assigned to each EN. RFRV is then integrated with other metrics into the reward function of a deep Q-network (DQN)-based routing decision framework to determine the optimal next-hop. The DQN employs reinforcement learning (RL), where agents continuously interact with external environments to learn optimal policies that maximize cumulative rewards [13]. Furthermore, by employing deep neural networks (DNNs) to approximate the Q-function within the Q-learning framework, DQN can effectively handle multi-hop routing under dynamic and complex conditions [14,15].
The main contributions of our study are summarized as follows:
  • We design a novel power regulation mechanism (PRM) that adaptively adjusts activation ranges based on the average signal-to-noise ratio (SNR) of received signals from neighbors. This mechanism further incorporates pre-selection rules to optimize EN activations and candidate relay selection.
  • We introduce the concept of routing failure risk value (RFRV) to quantify the potential failure risk posed by each candidate next-hop EN, which is evaluated based on its neighborhood state characteristics, including the number of neighbors, residual energy level, and link quality.
  • We develop a DQN-based routing decision mechanism that integrates RFRV into the reward function. Building upon metrics such as residual energy, distance to the gateway, and transmission hop count, our mechanism prioritizes low-RFRV ENs, thereby reducing transmission failures.
  • Through meticulous evaluation across various metrics, our simulation results demonstrate the advantages of FRDR in improving packet delivery rate and network lifetime while maintaining comparable transmission delay.
The remainder of this paper is organized as follows. Related studies are discussed in Section 2. Section 3 presents a brief review of DQN, and Section 4 introduces system models. In Section 5, the details of FRDR are described. Simulation results are thoroughly analyzed in Section 6 to illustrate the superiority of FRDR over other protocols, while Section 7 concludes this paper.

2. Related Studies

Over the past decades, numerous multi-hop routing protocols have been investigated, with a focus on relay selection strategies to optimize routing performance. In [8], link state information within two hops was considered when selecting relays to minimize delay and reduce packet loss. However, this two-hop dependency incurs high computational overhead in dynamic networks with frequent topology changes. In [9], the candidate relay with the highest reliability was selected to establish high-reliability and low-latency routes. Nevertheless, due to the dependence on predefined fuzzy rules, its adaptability to unmodeled network scenarios is limited. A method based on link quality prediction was proposed in [10], where a fuzzy logic system that incorporates distance, residual energy, and link quality (estimated via Kalman filtering) was adopted in relay decisions. However, this method is susceptible to model mismatch in event-driven networks, as bursty traffic violates the Markovian assumption underlying Kalman filtering-based prediction.
Given the limitations of traditional approaches, RL-based methods have emerged as a promising solution. These methods enable agents to learn optimal routing policies through real-time interaction with the external environment and reward-driven optimization, eliminating dependence on predefined models [13,14]. A Q-learning-based routing protocol was developed in [16], where energy consumption, bandwidth utilization, throughput, and data latency are jointly considered during relay selection. Similarly, ref. [17] proposed a Q-learning framework to reduce packet losses by deprioritizing predicted faulty nodes within routing decisions. In [15], a DQN-based intelligent routing (DQIR) protocol that balances residual energy distribution while minimizing routing distance was introduced to select relays. To address challenges such as insufficient adaptability to network topology changes, high communication delays, and short network lifetime in multi-hop routing, a dueling double deep Q-network was employed in [14] to optimize routing decisions. In [18], a reinforcement learning framework that integrates different node centrality metrics was developed to optimize relay selection.
A review of existing research reveals that while neighborhood state has been incorporated into relay selection decisions, these studies primarily focus on link quality without simultaneously considering the number of neighbors and their residual energy. This narrow focus prevents these methods from effectively excluding relays that introduce high routing failure risk, particularly in dynamic and complex networks. To address this shortcoming, we propose FRDR in this article.

3. Brief Review of DQN

An RL framework is typically modeled as a Markov Decision Process (MDP), characterized by a tuple $\langle S, A, P, R \rangle$, where $S$ represents the state space, $A$ denotes the action space, $P$ is the state-transition probability, and $R$ signifies the rewards. At each time step $t$, the agent executes the action $a_t \in A$ determined by the policy $\pi$ based on the current state $s_t \in S$. Subsequently, the environment provides the agent with an immediate reward $r_t \in R$ contingent upon $a_t$ and transitions to the next state $s_{t+1}$. This process generates an experience $(s_t, a_t, r_t, s_{t+1})$. The overarching goal of the agent is to derive an optimal policy $\pi^*$ that maximizes the expected cumulative reward, thereby optimizing long-term performance within the given MDP framework [15].
Q-learning is a value-based RL algorithm that iteratively refines policies to approximate $\pi^*$. The action-value function $Q(s,a)$ estimates the expected return of taking action $a$ in state $s$, which is updated iteratively using the following formula:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a' \in A} Q(s',a') - Q(s,a) \right], \tag{1}$$

where $\max_{a' \in A} Q(s',a')$ is the maximum Q-value over all possible actions $a'$ in the subsequent state $s'$, $\gamma$ is the discount factor, and $\alpha$ is the learning rate. $\pi^*$ directs the agent toward actions that yield the highest $Q(s,a)$ in each state.
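The update rule above fits in a few lines of Python. The sketch below is a minimal tabular example on a hypothetical two-state, two-action problem; the state names, actions, and parameter values are illustrative only:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Toy example: all Q-values start at zero; one update after receiving reward 1.0.
actions = ("a0", "a1")
Q = {(s, a): 0.0 for s in ("s0", "s1") for a in actions}
q_update(Q, "s0", "a0", r=1.0, s_next="s1", actions=actions)  # Q("s0","a0") becomes 0.5
```

Repeated application of this update over many experienced transitions is what drives the Q-values toward the optimal action-value function.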
When the state space is large, exhaustively computing $Q(s,a)$ becomes infeasible. Consequently, DQN is adopted to approximate $Q(s,a)$, where the output is $Q(s,a;\omega) \approx Q(s,a)$. Here, $\omega$ represents the weights of the DNN, and the stochastic gradient descent (SGD) algorithm is used to update parameters.
However, the neural network can become unstable owing to correlations between the Q-value and the target value, or small updates to the Q-value at each step. To address this instability, experience replay and a quasi-static target network are employed in DQN [18]. In experience replay, at each time step $t$, an experience sample $e_t = (s_t, a_t, r_t, s_{t+1})$ is stored in a replay memory $M = \{e_1, e_2, \ldots, e_t\}$. During training, the agent randomly samples a minibatch of experiences from $M$, thus removing the correlations between continuous samples and improving the stability and efficiency of learning. Additionally, an independent target neural network with weights $\omega^-$ is used for the quasi-static target network. The loss function $L(\omega)$ is calculated as follows:

$$L(\omega) = \mathbb{E}_{(s,a,r,s') \sim M}\left[ \left( y(s,r) - Q(s,a;\omega) \right)^2 \right], \tag{2}$$

where $y(s,r)$ is the output of the target neural network:

$$y(s,r) = r + \gamma \max_{a' \in A} Q(s',a';\omega^-), \tag{3}$$

where $\omega^-$ is synchronized with $\omega$ every $C$ steps. This approach decouples the target value computation from the Q-network weights, thereby reducing the likelihood of divergence and ensuring more stable learning.
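As a concrete, drastically simplified illustration of these two stabilization tricks, the sketch below trains a linear Q-function (a stand-in for the DNN) on a hypothetical toy problem, with minibatches drawn from a replay memory and target weights that are only synchronized every few steps. The toy transitions, learning rate, and synchronization period are all illustrative:

```python
import random

def q_val(w, s, a):
    # Linear Q-function: Q(s, a; w) = w[a] . s (a stand-in for the DNN approximator)
    return sum(wi * si for wi, si in zip(w[a], s))

def train(w, memory, actions, gamma=0.9, alpha=0.05, batch=4, sync_every=10, steps=200):
    w_tgt = {a: list(v) for a, v in w.items()}        # quasi-static target network
    for step in range(1, steps + 1):
        for s, a, r, s_next, done in random.sample(memory, min(batch, len(memory))):
            # Target value computed from the *target* weights (cf. Equation (3))
            y = r if done else r + gamma * max(q_val(w_tgt, s_next, a2) for a2 in actions)
            td = y - q_val(w, s, a)                   # SGD step on the squared TD error
            w[a] = [wi + alpha * td * si for wi, si in zip(w[a], s)]
        if step % sync_every == 0:                    # periodic synchronization
            w_tgt = {a: list(v) for a, v in w.items()}
    return w

# Toy replay memory: from state (1, 0), action "fwd" reaches a terminal state with reward 1.
memory = [((1.0, 0.0), "fwd", 1.0, (0.0, 1.0), True),
          ((0.0, 1.0), "fwd", 0.0, (1.0, 0.0), False)]
w = {"fwd": [0.0, 0.0], "drop": [0.0, 0.0]}
w = train(w, memory, actions=("fwd", "drop"))
```

After training, the Q-value of the rewarded transition approaches 1 and the preceding state's value approaches the discounted target 0.9, illustrating how bootstrapped targets propagate reward backward while the frozen target weights keep each update stable.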

4. System Models

4.1. Network Model

Without loss of generality, we consider a network where $N$ ENs are randomly distributed within an $L \times L$ monitoring area with nonuniform density. As established in [19,20,21,22], the network model satisfies the following assumptions to construct a standardized scenario:
1. A gateway (GW) is located at the center of the network and remains powered on. Central placement simplifies the network model, providing a consistent reference point for all ENs while facilitating a more balanced distribution of data flow.
2. All ENs are homogeneous. This configuration minimizes performance variations due to hardware differences, thereby facilitating an unbiased evaluation of the logic and effectiveness of routing protocols under consistent operating parameters.
3. Both ENs and the GW are stationary after deployment. This configuration eliminates route fluctuations caused by the mobility of ENs and the GW.
4. All ENs are synchronized and can determine their locations via the Global Positioning System (GPS) or other self-localization algorithms. Synchronization is essential for ordering control and data packets in negotiation-based protocols, while geographic information is fundamental for distance-based relay selection.
5. The links are symmetric. This assumption ensures bidirectional connectivity and consistent link characteristics, thereby avoiding complications from unidirectional paths that disrupt acknowledgment-dependent routing protocols.

4.2. Routing Failure Risk Value

To reduce transmission disruption probability, ENs with higher Routing Failure Risk Value (RFRV) are deprioritized in FRDR. The effectiveness of this approach is demonstrated in Figure 1.
Definition 1. For a given EN, its neighboring ENs are all ENs located within its maximum direct communication range.
Generally, neighborhood state characteristics, including the number of neighbors $N_n$, the residual energy level of neighbors $E_n$, and link quality $LQ$, are jointly considered when evaluating RFRV. An EN with lower $N_n$, $E_n$, and $LQ$ is associated with a higher RFRV. For $EN_i$, $RFRV_i$ can be computed via Equations (4)–(6).

$$RFRV_i = \lambda_1 \tilde{N}_n^i + \lambda_2 \widetilde{LQ}_i + \lambda_3 \tilde{E}_n^i, \tag{4}$$

$$LQ_i = \frac{\overline{RSSI_i} - RSSI_{th}}{RSSI_{th}} \cdot \frac{\overline{SNR_i} - SNR_{th}}{SNR_{th}}, \tag{5}$$

$$E_n^i = \overline{e_{res}^{n_i}} \big/ e_{init}, \tag{6}$$

$$\tilde{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{7}$$

where $\overline{RSSI_i}$ and $\overline{SNR_i}$ denote the average received signal strength indicator (RSSI) and signal-to-noise ratio (SNR) at $EN_i$ from signals transmitted by its neighbors, while $RSSI_{th}$ and $SNR_{th}$ are the corresponding thresholds. $\overline{e_{res}^{n_i}}$ indicates the average residual energy of the neighbors, while $e_{init}$ is the initial energy. To eliminate dimensional differences among heterogeneous indicators, $N_n$, $E_n$, and $LQ$ are standardized by min-max normalization, as defined in Equation (7).
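Equations (4)–(7) translate directly to a short Python sketch. The weight vector and value ranges below are placeholders (the actual $\lambda_i$ values are derived via the AHP procedure described next):

```python
def minmax(x, lo, hi):
    # Equation (7): min-max normalization to [0, 1]
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def rfrv(n_neighbors, link_quality, energy_level,
         n_range, lq_range, e_range, lam=(0.5, 0.3, 0.2)):
    # Equation (4): weighted sum of the normalized neighborhood characteristics.
    # lam is a hypothetical weight vector; the paper derives it via AHP.
    return (lam[0] * minmax(n_neighbors, *n_range)
            + lam[1] * minmax(link_quality, *lq_range)
            + lam[2] * minmax(energy_level, *e_range))

risk = rfrv(5, 0.2, 0.6, n_range=(0, 10), lq_range=(0, 1), e_range=(0, 1))  # 0.43
```

In practice the min/max bounds would be taken over the current candidate set so that each characteristic is compared on the same scale.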
The weights $\lambda_i\ (i = 1, 2, 3)$ in Equation (4) are determined using the Analytical Hierarchy Process (AHP) [23]. Benefiting from its capability in establishing quantitative frameworks for complex and ambiguous decision-making problems, as well as systematically relating criterion weights to overarching objectives, AHP is widely adopted for deriving criterion weights in multi-criteria decision analysis [24].
The AHP process begins by constructing a pairwise comparison matrix $A$ for the decision criteria, as defined in Equation (8).

$$A = \left( a_{ij} \right)_{k \times k} = \begin{pmatrix} 1 & a_{12} & \cdots & a_{1k} \\ a_{21} & 1 & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{k1} & a_{k2} & \cdots & 1 \end{pmatrix}, \tag{8}$$

where each element $a_{ij}$ denotes the relative importance of the criterion associated with row index $i$ compared to the criterion associated with column index $j$. When constructing $A$, a 1–9 scale [25] is widely adopted to quantify the relative importance between each pair of criteria. This well-known AHP scale is shown in Table 1.
The weight vector $w$ is then derived by solving the characteristic equation:

$$A w = \lambda_{\max} w, \tag{9}$$

where $\lambda_{\max}$ is the largest eigenvalue of the pairwise comparison matrix $A$.
Since $w$ represents unnormalized priorities, the final criteria weights $w_i$ are obtained through normalization:

$$w_i = w_i \Big/ \sum_{j=1}^{k} w_j, \tag{10}$$

where $k$ is the order of $A$ (i.e., the number of criteria).
Since pairwise comparisons in AHP are heavily dependent on human judgment, they are susceptible to inconsistencies. To address this issue, a standard procedure is provided in [26] to check the consistency of the pairwise comparison matrix by utilizing the largest eigenvalue $\lambda_{\max}$. The deviation of $\lambda_{\max}$ from the matrix dimension $k$ is quantified by the Consistency Index (CI):

$$CI = \frac{\lambda_{\max} - k}{k - 1}. \tag{11}$$

To benchmark CI, the Random Consistency Index (RI) is also proposed in [26] (Table 2), which is derived from randomly generated reciprocal matrices of various dimensions. The Consistency Ratio (CR) is then calculated as follows:

$$CR = \frac{CI}{RI}. \tag{12}$$

According to the established threshold [27], $CR \leq 0.1$ indicates that the results are satisfactory. Otherwise, the pairwise comparison matrix must be re-evaluated.
According to Equations (8)–(10), the weights $\lambda_i\ (i = 1, 2, 3)$ in Equation (4) are determined using the pairwise comparison matrix presented in Table 3. This matrix satisfies the consistency requirement (CR = 0.0079 < 0.1), confirming the reliability of the weight results.
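The full AHP pipeline of Equations (8)–(12) fits in a short script. The 3 × 3 comparison matrix below is a hypothetical stand-in (the paper's actual matrix appears in its Table 3), and the principal eigenvector is approximated by power iteration:

```python
def ahp_weights(A, iters=200):
    # Power iteration approximates the principal eigenvector of A (Equation (9)),
    # normalized so the weights sum to one (Equation (10)).
    k = len(A)
    w = [1.0 / k] * k
    for _ in range(iters):
        w = [sum(A[i][j] * w[j] for j in range(k)) for i in range(k)]
        total = sum(w)
        w = [x / total for x in w]
    # lambda_max from A w = lambda_max * w, then CI and CR (Equations (11)-(12)).
    lam_max = sum(sum(A[i][j] * w[j] for j in range(k)) / w[i] for i in range(k)) / k
    ci = (lam_max - k) / (k - 1)
    ri = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}[k]  # random index (per dimension)
    cr = ci / ri if ri > 0 else 0.0
    return w, cr

# Hypothetical pairwise comparisons on the 1-9 scale (reciprocal matrix).
A = [[1.0, 2.0, 3.0],
     [0.5, 1.0, 2.0],
     [1.0 / 3.0, 0.5, 1.0]]
weights, cr = ahp_weights(A)
```

For this illustrative matrix the derived weights sum to one and the consistency ratio falls well under the 0.1 threshold, so the comparisons would be accepted without re-evaluation.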

5. Detailed Description of the Proposed Protocol

In this section, FRDR is introduced in detail with the overall flowchart illustrated in Figure 2.
To deliver an event-related data packet to the GW, the data-holding EN (DEN) in FRDR first checks whether the GW is within its one-hop communication range. If direct transmission is feasible, the data packet is forwarded directly. Otherwise, the DEN triggers the relay selection process, which consists of two phases: candidate router selection and optimal relay decision-making.
Given the need for real-time topology awareness in dynamic networks, FRDR utilizes the Sensor Protocol for Information via Negotiation (SPIN) [28] to manage neighbor discovery and state updates, with its message sequence detailed in Figure 3. The specific workflow is outlined as follows:
DEN initiates the relay selection process by broadcasting an Advertisement (ADV) message with transmission power regulated by the Power Regulation Mechanism (PRM). This mechanism dynamically adjusts EN activation ranges based on the average neighbor signal SNR to enhance energy efficiency. Neighboring ENs that receive the ADV message parse its metadata to extract information such as the DEN-to-GW distance.
To prevent redundant participation, a pre-selection mechanism is employed to restrict relay requests from ENs that receive the ADV message. Each EN autonomously determines whether to apply for packet forwarding based on its residual energy and distance to the GW. The eligibility criteria for application are as follows:
(1) The distance from the EN to the GW must be shorter than the DEN-to-GW distance recorded in the ADV metadata, thereby preventing data backhaul.
(2) The residual energy of the EN must exceed a predefined threshold $e_{th}$, which is derived from the energy cost of receiving and forwarding a data packet at the minimum transmission power level. This criterion prevents resource wastage caused by ENs with insufficient energy applying for relay tasks.
Neighboring ENs that fail either condition discard the ADV packet, and qualified ENs respond with a Request (REQ) message containing self-reported metrics (i.e., residual energy and distance to the GW). These responding ENs form the candidate set of next-hop routers.
If the DEN receives no REQ packet within the designated reception window, it rebroadcasts the ADV packet at its maximum power level to activate more potential relays. If this second broadcast also fails to elicit any response, the transmission is deemed a failure due to the unavailability of suitable relays.
Conversely, when REQ responses are received, the DEN executes a DQN-based routing decision mechanism, as detailed in Section 5.2, to determine the optimal relay from candidates. Following this selection, the DEN forwards the data packet to the chosen relay. Upon completing the role transition, the current DEN exits the routing process and enters a low-power sleep mode, where it awaits its next activation to minimize energy consumption.

5.1. Power Regulation Mechanism

By default, ENs operate at maximum power to maintain periodic neighbor information exchange. However, this configuration becomes inefficient during data transmission, as excessive power causes resource wastage (e.g., redundant EN activations). To address this issue, FRDR introduces a Power Regulation Mechanism (PRM) that adaptively adjusts transmission power levels based on the average SNR of received signals from neighbors, thereby reducing overhead.
As detailed in Algorithm 1, the standard Adaptive Data Rate (ADR) algorithm [29] adjusts data rates based on SNR to optimize throughput and energy efficiency.
Algorithm 1. Standard Adaptive Data Rate Algorithm.
Initialize: spreading factor SF ∈ [7, 12], transmitting power TP ∈ [2 dBm, 14 dBm]
1: SNR_req ← demodulation floor (current data rate)
2: SNR_max ← max(SNR of last 20 frames)
3: SNR_margin ← SNR_max − SNR_req − Margin_dB
4: N_step ← int(SNR_margin / 3)
5: while N_step > 0 and SF > SF_min do
6:   SF ← SF − 1
7:   N_step ← N_step − 1
8: end while
9: while N_step > 0 and TP > TP_min do
10:   TP ← TP − 3
11:   N_step ← N_step − 1
12: end while
13: while N_step < 0 and TP < TP_max do
14:   TP ← TP + 3
15:   N_step ← N_step + 1
16: end while
17: Output: TP and SF
Building upon the standard ADR framework, we propose the PRM, which redirects optimization from data rates to transmission power levels. The workflow of PRM operates as follows:
Step 1: Calculate the average SNR $SNR_{avg}$ of the most recently received signals transmitted by $n$ neighbors.
Step 2: Subtract a predefined margin $M_a$ (default: 10 dB) from the difference between $SNR_{avg}$ and $SNR_{th}$ to determine the SNR margin $SNR_{margin}$, i.e., $SNR_{margin} = SNR_{avg} - SNR_{th} - M_a$.
Step 3: Adjust the current transmission power level $l_{tx}$ based on $SNR_{margin}$. If $SNR_{margin} > 0$, it suggests that $l_{tx}$ can be reduced without compromising communication reliability.
Given the dynamic nature of the link environment, relying on a fixed neighbor count $n$ may lead to inaccurate decisions. To overcome this limitation, PRM adaptively adjusts $n$ based on link variability. Specifically, the DEN randomly selects $n$ neighbors from its routing table and calculates the volatility rate of the link environment, $R_{change}$, using Equation (13).

$$R_{change} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{RSSI_{SL}^{i} - RSSI_{L}^{i}}{RSSI_{th} - RSSI_{L}^{i}} \right|, \tag{13}$$

where $RSSI_{L}^{i}$ and $RSSI_{SL}^{i}$ denote the RSSI of the last and penultimate signals received by the DEN from the $i$-th EN, respectively.
Higher values of $R_{change}$ indicate a more volatile link environment. When $R_{change}$ exceeds the threshold $R_{change}^{th}$, to enhance decision accuracy, PRM increases $n$ by 1 to incorporate diverse information from additional neighbors into the decision-making process. This adjustment repeats until either $R_{change} \leq R_{change}^{th}$ or $n \geq N_n$, thereby achieving an adaptive balance between decision accuracy and computational overhead. The initial empirical value of $n$ is set to 3 according to [29,30]. The detailed PRM workflow is described in Algorithm 2.
Algorithm 2. Power Regulation Mechanism
Input: SNR_th, the upper limit l_tx_max and lower limit l_tx_min of l_tx (with initial value l_tx = l_tx_max), the SNR and RSSI of received signals from neighbors, and the number of neighbors N_n
Output: l_tx
1: Randomly select n neighbors and calculate R_change by Equation (13)
2: if N_n ≤ 3 then
3:   l_tx = l_tx_max
4: else
5:   if R_change > R_change_th and n < N_n then
6:     n = n + 1
7:     Go to line 1
8:   end if
9: end if
10: Calculate SNR_avg and SNR_margin
11: N_step ← round(SNR_margin / 3)
12: while N_step > 0 and l_tx > l_tx_min do
13:   l_tx = l_tx − 1 and N_step = N_step − 1
14: end while
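The two core computations of PRM, the link volatility rate of Equation (13) and the power step-down of Algorithm 2 (lines 10–14), can be sketched as follows. The threshold, margin, and power-level bounds are illustrative values, not the paper's configuration:

```python
def volatility_rate(rssi_last, rssi_penult, rssi_th):
    # Equation (13): average relative RSSI change over the sampled neighbors.
    return sum(abs((sl - l) / (rssi_th - l))
               for l, sl in zip(rssi_last, rssi_penult)) / len(rssi_last)

def regulate_power(snr_samples, l_tx, snr_th=-7.5, margin_db=10, l_tx_min=0):
    # Algorithm 2, lines 10-14: one power-level step down per 3 dB of spare SNR margin.
    snr_avg = sum(snr_samples) / len(snr_samples)
    n_step = round((snr_avg - snr_th - margin_db) / 3)
    while n_step > 0 and l_tx > l_tx_min:
        l_tx -= 1
        n_step -= 1
    return l_tx

l_new = regulate_power([10.0, 8.0, 9.0], l_tx=5)   # ample margin -> power reduced
```

With a healthy average SNR the level steps down; when the margin is negative the loop never runs and the current level is kept, mirroring the conservative behavior of the algorithm.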

5.2. DQN-Based Routing Decision Mechanism

At time step $t$, all ENs that respond with a REQ packet form the candidate relay set $\mathcal{C}_t^{cf}$, from which the DEN selects the optimal next-hop router. However, an EN selected as a relay will consume more energy, which may lead to unbalanced energy distribution across the network and potentially cause coverage holes. Furthermore, selecting candidates with high RFRV increases transmission disruption probability. Given these considerations, we propose a low-latency, long-lifetime, and high-success-rate routing decision mechanism. The optimal relay selection is formulated as follows:

$$\max\ e_{res}^{i,t}, \quad \min\ RFRV_{i,t}, \quad \min\ dst_i, \quad \min\ hop \quad \text{s.t.}\ EN_i \in \mathcal{C}_t^{cf}, \tag{14}$$

where $e_{res}^{i,t}$ denotes the residual energy of $EN_i$ at time step $t$, $dst_i$ is the distance between $EN_i$ and the GW, and $hop$ is the packet transmission hop count.

5.2.1. MDP Model for FRDR

Given the dynamic nature of network conditions, the routing decision process in FRDR is modeled as an MDP and solved using DQN. The overall framework is illustrated in Figure 4.
By modeling DEN as an agent, the corresponding states, actions, and reward functions are defined as follows:
  • States: The state integrates the hop count and the features of ENs to form a unified vector $s_t \in \mathbb{R}^{3N+1}$. To handle dynamic fluctuations in the number of candidate relays, FRDR employs a feature masking mechanism. For each $EN_i$ at time step $t$, its feature vector is defined as follows:

$$f_{i,t} = \begin{cases} \left( dst_i / L,\ e_{res}^{i,t} / e_{init},\ RFRV_{i,t} \right), & EN_i \in \mathcal{C}_t^{cf} \\ \left( 1, 0, 1 \right), & \text{otherwise} \end{cases} \tag{15}$$

    Based on the current hop count $h_t$, the overall state $s_t$ is expressed as follows:

$$s_t = \begin{cases} \big( h_t,\ \oplus_{i=1}^{N} (1, 0, 1) \big)^T, & \text{failure} \\ \big( h_t,\ \oplus_{i=1}^{N} (0, 1, 0) \big)^T, & \text{success} \\ \big( h_t,\ \oplus_{i=1}^{N} f_{i,t} \big)^T, & \text{intermediate} \end{cases} \tag{16}$$

    where $\oplus$ denotes vector concatenation.
  • Actions: By executing action $a_t$ at time step $t$, the agent selects the corresponding EN as the next-hop router, i.e., $a_t = i\ (i \in \{1, 2, \ldots, N\})$ indicates that $EN_i$ is chosen as the relay.
  • Reward Function: To determine the optimal relay in Equation (14), the reward function in FRDR is designed to guide the agent toward solutions that maximize residual energy, minimize RFRV, reduce distance to the GW, and minimize hop count. It is defined as follows:

$$r_t = \begin{cases} R_{\max}, & C_1 \\ -R_{\max}, & C_2 \\ \hat{r}_t, & C_3 \end{cases} \tag{17}$$

    In Equation (17), $C_1$ represents successful packet delivery to the GW, for which a positive reward $R_{\max}$ is granted. Conversely, $C_2$ denotes transmission disruption, which is penalized with $-R_{\max}$. All other cases fall under $C_3$, where the composite reward $\hat{r}_t$ implements the optimization objectives from Equation (14) through specific reward components:

$$r_t^1 = dst_i / L, \tag{18}$$

$$r_t^2 = e_{res}^{i,t} / e_{init}, \tag{19}$$

$$r_t^3 = RFRV_{i,t}, \tag{20}$$

$$r_t^4 = h_t / H_{\max}, \tag{21}$$

    Note that $r_t^1$, $r_t^2$, and $r_t^3$ pertain to the attributes of ENs, while $r_t^4$ is a path attribute. By integrating Equations (18)–(21), the composite reward $\hat{r}_t$ is derived as Equation (23). Here, $\tilde{r}_t^1$ and $\tilde{r}_t^3$ are the normalized versions of $r_t^1$ and $r_t^3$ via Equation (7), while $r_t^2$ is normalized using Equation (22).

$$\tilde{x} = \frac{x_{\max} - x}{x_{\max} - x_{\min}}, \tag{22}$$

$$\hat{r}_t = \sum_{i=1}^{3} \mu_i \tilde{r}_t^i - \eta r_t^4, \tag{23}$$

    The weights $\mu_i$ in Equation (23) are calculated via AHP using Equations (8)–(10). The pairwise comparison matrix for the decision criteria in Equation (23) is presented in Table 4, with a CR of 0.0158. This CR value is well below the threshold of 0.1, confirming the logical coherence of the pairwise comparisons and the reliability of the weight results.
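A minimal sketch of the reward computation of Equations (17)–(23). The weights $\mu$, $\eta$, the value $R_{\max}$, and the normalization bounds are illustrative placeholders (the paper's $\mu_i$ come from AHP via its Table 4); the normalization mapping follows the text above:

```python
def minmax(x, lo, hi):
    # Equation (7): min-max normalization
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def minmax_inv(x, lo, hi):
    # Equation (22): inverted min-max normalization
    return (hi - x) / (hi - lo) if hi > lo else 0.0

def reward(outcome, dst, L, e_res, e_init, rfrv_val, h, h_max,
           bounds, mu=(0.4, 0.35, 0.25), eta=0.2, r_max=10.0):
    if outcome == "success":        # case C1: packet delivered to the GW
        return r_max
    if outcome == "failure":        # case C2: transmission disruption
        return -r_max
    # Case C3: components r^1..r^4 (Equations (18)-(21)) combined per Equation (23).
    r1 = minmax(dst / L, *bounds["dst"])
    r2 = minmax_inv(e_res / e_init, *bounds["e"])
    r3 = minmax(rfrv_val, *bounds["rfrv"])
    r4 = h / h_max
    return mu[0] * r1 + mu[1] * r2 + mu[2] * r3 - eta * r4

bounds = {"dst": (0.0, 1.0), "rfrv": (0.0, 1.0), "e": (0.0, 1.0)}
r = reward("hop", dst=300, L=1000, e_res=0.8, e_init=1.0,
           rfrv_val=0.2, h=2, h_max=10, bounds=bounds)
```

In a deployment, the min/max bounds would be computed over the current candidate set at each step rather than fixed, so that the composite reward ranks candidates on a common scale.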

5.2.2. DQN Architecture

The DQN architecture implemented in FRDR is detailed as follows:
1. Input layer: A fully connected (FC) layer is used as the input layer. The input feature dimension is set to $3N + 1$, corresponding to the pre-masked state $s_t$.
2. Hidden layer: The hidden layer comprises two FC layers with 64 and 32 neurons, respectively. For each FC layer, the Leaky Rectified Linear Unit (Leaky ReLU) activation function with a negative slope coefficient of 0.01 is employed. Moreover, the backpropagation gradients from non-candidate ENs are set to zero.
3. Output layer: We define an FC layer with $N$ neurons as the output layer to generate raw Q-values $Q_{raw}$, where a linear activation function is utilized. Feature-based masking is then applied to compute the final values by Equation (24).

$$Q = \begin{cases} Q_{raw}, & EN_i \in \mathcal{C}_t^{cf} \\ -10^5, & \text{otherwise} \end{cases} \tag{24}$$
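The output masking of Equation (24) is straightforward to implement; the candidate indices below are illustrative:

```python
MASK_VALUE = -1e5   # large negative constant so argmax can never pick a non-candidate

def mask_q_values(q_raw, candidates):
    # Equation (24): keep raw Q-values for candidate ENs, mask out all others.
    return [q if i in candidates else MASK_VALUE for i, q in enumerate(q_raw)]

q = mask_q_values([0.3, 0.9, 0.1, 0.7], candidates={0, 3})
best_en = max(range(len(q)), key=q.__getitem__)   # index 3: best among candidates only
```

Note that EN 1 holds the highest raw Q-value, yet masking ensures the greedy choice falls on EN 3, the best EN actually present in the candidate set.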

5.2.3. Network Training and Routing Decision

At time step $t$, the agent gathers state information from candidate relays and constructs the state vector $s_t$ according to Equation (16). $s_t$ is fed into the DQN, which outputs Q-values corresponding to each EN. The agent then selects action $a_t$ using an annealing $\varepsilon$-greedy strategy, where $\varepsilon$ decays as follows [31]:

$$\varepsilon = \varepsilon_{end} + \left( \varepsilon_{start} - \varepsilon_{end} \right) \exp\left( -\tau N_{eps}^{now} \right), \tag{25}$$

where $\varepsilon_{start}$ and $\varepsilon_{end}$ represent the initial and terminal values of $\varepsilon$, respectively. $N_{eps}^{now}$ denotes the number of current training iterations, while $\tau$ is the attenuation rate. With probability $1 - \varepsilon$, the agent exploits by selecting the EN associated with the maximum Q-value. During exploration, a Weighted Probability Selection method that prioritizes candidates with lower RFRV is employed to enhance efficiency. The selection probability $p_\varepsilon^i$ for $EN_i$ during exploration is given by:

$$p_\varepsilon^i = \begin{cases} \dfrac{1 - RFRV_i}{\sum_{EN_j \in \mathcal{C}_t^{cf}} \left( 1 - RFRV_j \right)}, & EN_i \in \mathcal{C}_t^{cf} \\ 0, & \text{otherwise} \end{cases} \tag{26}$$
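The annealing $\varepsilon$-greedy selection of Equations (25)–(26) can be sketched as follows; the decay parameters are illustrative values:

```python
import math
import random

def epsilon(n_now, eps_start=0.9, eps_end=0.05, tau=1e-3):
    # Equation (25): exponential decay from eps_start toward eps_end.
    return eps_end + (eps_start - eps_end) * math.exp(-tau * n_now)

def select_action(q_masked, rfrv, candidates, n_now, rng=random):
    eps = epsilon(n_now)
    if rng.random() >= eps:
        # Exploit: candidate EN with the maximum (masked) Q-value.
        return max(candidates, key=lambda i: q_masked[i])
    # Explore: Weighted Probability Selection favoring low-RFRV candidates (Equation (26)).
    cands = list(candidates)
    weights = [1.0 - rfrv[i] for i in cands]
    return rng.choices(cands, weights=weights)[0]
```

Early in training $\varepsilon$ stays near its initial value and the agent mostly explores (with low-risk candidates favored); as the iteration count grows, $\varepsilon$ decays toward its floor and the agent predominantly exploits the learned Q-values.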
Upon determining $a_t$, the DEN forwards the data packet to the corresponding EN. At the next time step, the state transitions to $s_{t+1}$, and the environment provides the agent with the reward $r_t$ computed via Equation (17). The agent then constructs the transition tuple $(s_t, a_t, r_t, s_{t+1})$ and stores it in the replay memory $M$. When $M$ accumulates a sufficient number of samples, the agent randomly samples a minibatch from $M$ every $C_{exp}$ timesteps to train the DQN by minimizing the loss $L(\omega)$ defined in Equation (2). Additionally, the target network is periodically synchronized with the evaluation network every $C_t$ timesteps.
The overall framework of the DQN-based routing decision mechanism is illustrated in Algorithm 3. Notably, the learning process is conducted in a virtual environment hosted on computers, avoiding high computational demands on the EN hardware.
Algorithm 3. DQN-Based Routing Decision Mechanism
Input: ε_start, ε_end, τ, γ, α, experience replay update frequency C_exp, target update frequency C_t, minibatch size B, maximum training episodes N_eps^max, maximum iterations N_iter^max
Initialize: replay memory M, experience counter N_exp = 0, evaluation network with random weights ω, target network with weights ω⁻ = ω
Offline Learning
1:  for N_eps = 1 : N_eps^max do
2:    for N_iter = 1 : N_iter^max do
3:      Event-related data packet generated
4:      while GW is out of one-hop range do
5:        Determine the candidate relay set c_f and calculate the RFRV of each EN ∈ c_f by Equation (4)
6:        if c_f = ∅ then
7:          Go to line 16
8:        end if
9:        Formulate state vector s by Equation (16)
10:       Get the Q-values for s from the evaluation network
11:       Select action a via the annealing ε-greedy strategy
12:       Forward the data packet according to action a
13:       Perform lines 16–25
14:     end while
15:     Send the data packet to the GW
16:     Compute the reward r by Equation (17)
17:     State transitions to s′
18:     Store (s, a, r, s′) into M and set N_exp = N_exp + 1
19:     if |M| ≥ B and mod(N_exp, C_exp) = 0 then
20:       Sample a random minibatch of transitions (s, a, r, s′) from M
21:       Update the target Q-value of each sample: y(s′, r) = r + γ max_{a′∈A} Q(s′, a′; ω⁻) if condition C3 holds, and y(s′, r) = r otherwise
22:       Compute L̄(ω) by Equation (2)
23:       Update ω in the evaluation network using SGD
24:     end if
25:     if mod(N_exp, C_t) = 0 then ω⁻ = ω
26:   end for
27: end for
Output: Evaluation network with ω
Online Decision
Input: Trained evaluation network with ω
28: DEN determines the set of candidate next-hop ENs and calculates the RFRV of each candidate according to Equation (4)
29: DEN constructs a state vector
30: Input the current state into the evaluation network, and output the optimal action with the maximum Q-value
31: Forward the data packet according to the optimal action
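For concreteness, the replay memory and annealing ε-greedy action selection used by Algorithm 3 can be sketched as follows. This is a minimal illustration rather than the authors' implementation; in particular, the linear decay schedule and the `decay_steps` value are assumptions, since the paper only fixes ε_start = 0.5 and ε_end = 0.01 (Table 8).

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size experience replay buffer (capacity |M| = 5000 in Table 8)."""
    def __init__(self, capacity=5000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=64):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def annealed_epsilon(step, eps_start=0.5, eps_end=0.01, decay_steps=50_000):
    """Linearly anneal the exploration rate from eps_start down to eps_end."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_values, step, rng=random):
    """Annealing epsilon-greedy: explore with probability eps, otherwise
    take the action with the maximum Q-value (line 11 of Algorithm 3)."""
    if rng.random() < annealed_epsilon(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In the full protocol, the evaluation network would score the state vector of Equation (16) to produce `q_values`; here any sequence of floats works.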

6. Simulation Results and Analysis

In this section, we conduct extensive experiments to evaluate the performance of FRDR using MATLAB R2020b.

6.1. Simulation Models

It is assumed that each EN sends packets to the GW either directly or via intermediate hops using the LoRa protocol. The spreading factor (SF), bandwidth (BW), and coding rate R_c are fixed at SF = 7, BW = 125 kHz, and R_c = 1, respectively. Low data rate optimization is disabled by default, while the explicit header type (H = 0) and CRC are adopted. These parameter settings determine the simulation models adopted in the experiments, including those for packet transmission time, energy consumption, and path loss.

6.1.1. Packet Transmission Time Measurement Model

The time for an EN to transmit a packet, T_oA, is computed as follows [32]:

T_oA = (n_pr + 4.25 + 8 + P_1) · 2^SF / BW,
P_1 = max( ⌈(8·n_pl − 4·SF + 28 + 16·CRC − 20·H) / (4·SF)⌉ · (R_c + 4), 0 ),

where n_pr is the number of preamble symbols and n_pl is the payload size in bytes.
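As a sanity check, the time-on-air model above can be evaluated directly. The function below is an illustrative sketch (the function name is ours), with defaults taken from the fixed settings of this section.

```python
import math

def lora_time_on_air(n_pl, n_pr=8, sf=7, bw=125e3, rc=1, crc=1, h=0):
    """Time on air (s) of one LoRa packet for the settings above
    (SF = 7, BW = 125 kHz, R_c = 1, explicit header H = 0, CRC on,
    low data rate optimization disabled). n_pl is the payload in bytes."""
    p1 = max(math.ceil((8 * n_pl - 4 * sf + 28 + 16 * crc - 20 * h)
                       / (4 * sf)) * (rc + 4), 0)
    return (n_pr + 4.25 + 8 + p1) * (2 ** sf) / bw
```

With the packet sizes of Table 7, a 300 B data packet takes about 0.47 s on air, while a 1 B ADV/REQ packet takes about 0.026 s.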

6.1.2. Energy Consumption Measurement Model

Given that energy consumption in the dormant state is significantly lower than that in other transceiver states, the energy consumption of an EN, denoted as E c , can be simplified as the sum of transmit and receive energies [2]:
E_c = V_DD (I_tx T_tx + I_rx T_rx),
where V_DD is the nominal voltage, and I_tx and I_rx denote the transmitting and receiving currents, respectively. T_tx and T_rx indicate the transmitting and receiving durations, satisfying T_tx = T_rx = T_oA. According to [33], these parameters are configured as specified in Table 5, where P_T denotes the transmit power at transmission power level l_tx.
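A sketch of this energy model, using the currents of Table 5 (the function name is ours):

```python
def packet_energy(toa_s, i_tx_ma, i_rx_ma=14.2, v_dd=3.3):
    """E_c = V_DD * (I_tx * T_tx + I_rx * T_rx) with T_tx = T_rx = T_oA.
    Currents are in mA, toa_s in seconds; the result is in joules."""
    return v_dd * (i_tx_ma + i_rx_ma) * 1e-3 * toa_s
```

For instance, relaying a 300 B packet (T_oA ≈ 0.466 s) at the highest power level (I_tx = 38 mA) costs roughly 0.08 J per hop.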

6.1.3. Path Loss Measurement Model

Results presented in this study were computed at a carrier frequency of f = 868 MHz using the path loss model defined in [33]:

PL[dB] = 32.45 + 30(e_pl − 2) + 20 lg(f[MHz]) + 10·e_pl·lg(d[km]) + δ[dB],

where PL is the path loss at distance d, e_pl is the path loss exponent, and δ ∼ N(0, σ²) models random channel fluctuations resulting from shadowing. In this study, we use e_pl = 5 and σ = 3 dB.
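The model can be sketched as follows. Note that the constant offset 30(e_pl − 2) reflects our reading of the (typographically damaged) source equation, so treat it as an assumption; the function name is ours.

```python
import math
import random

def path_loss_db(d_km, f_mhz=868.0, e_pl=5.0, sigma_db=3.0, rng=None):
    """Log-distance path loss with log-normal shadowing. Pass rng=None to
    evaluate the deterministic (median) path loss without shadowing."""
    shadowing = rng.gauss(0.0, sigma_db) if rng is not None else 0.0
    return (32.45 + 30.0 * (e_pl - 2.0) + 20.0 * math.log10(f_mhz)
            + 10.0 * e_pl * math.log10(d_km) + shadowing)
```

Under these parameters, the median path loss reaches the receiver sensitivity within a few hundred meters, which is why multi-hop relaying is needed across the 1 km × 1 km area.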

6.1.4. RSSI and SNR Measurement Models

Based on Equation (30), the RSSI is computed as a function of P_T:

RSSI[dBm] = P_T[dBm] + G_AT[dB] + G_AR[dB] − PL[dB],

where G_AT and G_AR denote the transmitting and receiving antenna gains, respectively. In this study, we set G_AT = G_AR = 3 dBi.
During data transmission, the noise power P_n of each EN is considered in the calculation of the SNR [29]:

SNR[dB] = P_r[dBm] − P_n[dBm],
P_n[dBm] = 10 lg((T_r + T_b) · BW · κ) + 30,

where T_b is the background temperature, typically set to 290 K, κ = 1.379 × 10⁻²³ J·K⁻¹ is the Boltzmann constant, and T_r is the receiver temperature, given by the following expression:

T_r = (10^(NF/10) − 1) · T_b,

where NF is the noise figure of the receiver. According to [29], the parameters are set as NF = 6 dB, RSSI_th = −124.5 dBm, and SNR_th = −7.5 dB.
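Putting the RSSI and SNR models together, a receiver can be modeled as decoding a packet only when both thresholds are met. The sketch below is illustrative (the function name is ours); with these parameters the noise floor works out to about −117 dBm, so SNR_th = −7.5 dB lines up with the RSSI threshold of −124.5 dBm.

```python
import math

def link_ok(p_t_dbm, pl_db, g_at_db=3.0, g_ar_db=3.0,
            rssi_th=-124.5, snr_th=-7.5,
            nf_db=6.0, bw_hz=125e3, t_b=290.0, kappa=1.379e-23):
    """True if a packet sent at p_t_dbm over path loss pl_db satisfies
    both the RSSI and the SNR reception thresholds."""
    rssi = p_t_dbm + g_at_db + g_ar_db - pl_db
    t_r = (10 ** (nf_db / 10.0) - 1.0) * t_b                      # receiver temperature
    p_n = 10.0 * math.log10((t_r + t_b) * bw_hz * kappa) + 30.0   # noise power, dBm
    return rssi >= rssi_th and (rssi - p_n) >= snr_th
```

For example, at the maximum transmit power P_T = 14 dBm a 131 dB path loss is decodable, while a 155 dB path loss is not.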

6.2. Simulation Setup

In this section, FRDR is compared with Minimum Hop Routing (MHR) and DQIR [15]. MHR is a distributed routing algorithm that selects relays based on the minimum hop count to the GW. In DQIR, next-hop selection from candidate routers is performed by a DQN-based routing protocol. Notably, to ensure a fairer comparison, the learning rate and replay memory size of DQIR are set to 0.01 and 5000, respectively, after extensive hyperparameter tuning.
To comprehensively evaluate the performance of FRDR, we introduce three self-contrasting algorithms detailed in Table 6.
Additionally, to ensure fair benchmarking, all compared algorithms adopt the same pre-selection rules as FRDR and utilize the SPIN-based interaction process to determine candidate routers.
The dataset generation framework introduced in [34] is applied to deploy ENs in a stochastic and nonuniform manner across a 1 km × 1 km area, with the GW located at the center. The initial energy of all ENs is fixed at 0.5 mAh (equivalent to about 5.94 J at V_DD = 3.3 V). In each iteration, a source EN is randomly selected, and a data packet is transmitted from it to the GW under the different multi-hop routing protocols. Additional network parameters are detailed in Table 7.
During the offline training of DQN, we execute 100 training episodes, and each episode comprises 1000 complete packet transmission simulations. For each simulation, a source EN is randomly chosen from the monitoring area, which then transmits a data packet toward the GW via single-hop or multi-hop routing. Upon transmission completion (either successful delivery to the GW or failure), the simulation proceeds immediately to the next packet transmission. Other specific DQN parameters are detailed in Table 8.

6.3. Performance Analysis

The performance metrics employed in our simulation are the packet delivery rate (PDR), mean transmission delay (MTD), mean number of transmission hops (MTH), mean energy consumption for delivering a data packet (MECP), and network lifetime. Network lifetime is commonly measured in terms of the first node dead (FND), half of the nodes dead (HND), and the last node dead (LND). However, only HND is adopted in our simulation, because the network becomes inoperable long before LND, while the death of the first EN has little impact on network performance.
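As an illustration, the PDR and HND metrics might be computed from simulation traces as follows (the helper names are ours, not from the paper):

```python
def packet_delivery_rate(n_delivered, n_generated):
    """PDR: fraction of generated packets that reach the GW."""
    return n_delivered / n_generated if n_generated else 0.0

def half_node_dead(energy_history, n_nodes):
    """HND: index of the first round in which at least half of the ENs
    have exhausted their energy; None if that never happens.
    energy_history[t] is the list of residual EN energies at round t."""
    for t, energies in enumerate(energy_history):
        if sum(e <= 0 for e in energies) >= n_nodes / 2:
            return t
    return None
```

MTD, MTH, and MECP are plain averages over delivered packets and are computed analogously.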
We randomly select an experimental scenario with 300 ENs from our simulations and use the routing process of the first data packet after network deployment to visually demonstrate the FRDR protocol in Figure 5. To clarify the relay selection mechanism of FRDR, Figure 5 further illustrates the spatial distribution of candidate ENs for Router1. Specifically, the source EN first adjusts its transmission power level according to PRM, thereby optimizing the EN activation range based on the average SNR of signals received from neighbors. In this case, the source EN broadcasts an ADV packet at Level 6. Subsequently, the ENs that receive the ADV packet and satisfy the pre-selection rules form the candidate set for Router1. Table 9 quantitatively summarizes their critical attributes, including ID, RFRV, distance to the GW, and residual energy. Then, using the DQN-based routing decision mechanism (Algorithm 3), the source EN determines Router1 from the candidates. The result indicates that EN167, the candidate exhibiting the lowest RFRV and the shortest distance to the GW, is selected as Router1. This outcome is consistent with the relay selection objective of FRDR defined in Equation (14).
Under the same configuration as in Figure 5, Figure 6 further compares the routing processes of the different multi-hop routing algorithms. As shown in Figure 6, both FRDR and PRRS exhibit fewer candidate ENs for Router1. This reduction is primarily attributed to the PRM, which adjusts EN activation ranges based on the average SNR of signals received from neighbors. By utilizing PRM, the source EN in FRDR and PRRS reduces its transmission power to Level 6, thereby confining the ADV broadcast range. In contrast, non-PRM protocols broadcast ADV packets at maximum power (Level 7), which can easily cause redundant EN activations (e.g., EN263 and EN244). Following candidate screening, each algorithm applies distinct criteria for final relay selection. Notably, PFRS and PRRS exhibit higher hop counts due to random relay selection, whereas FRDR, MHR, DQIR, and PFRD achieve lower hop counts through effective selection rules. Specifically, MHR selects relays via pre-stored routing tables guided by minimum hop counts, while DQIR prioritizes relays that minimize the distance to the GW and balance the residual energy distribution. FRDR and PFRD incorporate multi-dimensional neighborhood state characteristics of candidate ENs into routing decisions, effectively avoiding relays that introduce high routing failure risk (e.g., Router1 in MHR and Router3 in DQIR).
First, the performance analysis of the proposed PRM and Algorithm 3 is provided.
Figure 7a demonstrates that PRM effectively reduces energy consumption. By employing PRM, DENs in FRDR and PRRS adjust their transmission power levels according to demand, thereby reducing redundant EN activations. Consequently, the lifetime of individual ENs can be extended, which in turn enhances the overall network lifetime and PDR. Figure 7b further illustrates that the PDR curves of PRRS and FRDR decline more slowly than those of PFRS and PFRD. Specifically, when the PDR of PFRD and PFRS drops to 0.80, FRDR and PRRS maintain values of 0.88 and 0.85, representing improvements of 10.00% and 6.25%, respectively. Additionally, as depicted in Figure 7c, both FRDR and PRRS achieve a higher HND than their respective counterparts, which confirms the effectiveness of PRM in extending network lifetime.
Table 10 presents a comparison of MTH, MTD, and MECP for delivering the first 1000 packets, during which the PDR of each algorithm remains at a relatively high level. It reveals that PRM introduces a slight increase in transmission delay. Compared to PFRD and PFRS, the MTH of FRDR and PRRS increased by 0.20 and 0.30, respectively. This increase is attributed to the fact that the execution of PRM prevents the single-hop range from consistently reaching its maximum, potentially increasing the number of hops required for data delivery. Nevertheless, through the effective combination with pre-selection rules, PRM further amplifies the benefits of reducing redundant EN activations, thereby significantly decreasing the delay and energy consumption associated with REQ packet reception. As a result, the impact of these additional hops on overall delay is minimal. Specifically, the 0.14 s increase in MTD for FRDR constitutes only 3.26% of its total MTD, while for PRRS, it accounts for 3.13%. Overall, PRM achieves an effective balance between transmission efficiency and other critical performance metrics, including energy consumption, PDR, and network lifetime.
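The stated delay overheads of PRM can be checked directly against the Table 10 values:

```python
# MTD values from Table 10 (seconds), first 1000 delivered packets.
mtd = {"PFRS": 5.58, "PRRS": 5.76, "PFRD": 4.16, "FRDR": 4.30}

# Extra delay introduced by PRM, as a fraction of the PRM variant's MTD.
frdr_overhead = (mtd["FRDR"] - mtd["PFRD"]) / mtd["FRDR"]  # ~3.26%
prrs_overhead = (mtd["PRRS"] - mtd["PFRS"]) / mtd["PRRS"]  # ~3.13%
```

Both fractions match the percentages quoted in the text.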
As for Algorithm 3, Figure 7b clearly illustrates its superiority in PDR. Specifically, when the PDR of PRRS and PFRS decreases to 0.80, FRDR and PFRD sustain values of 0.96 and 0.94, achieving improvements of 20.00% and 17.50%, respectively. These improvements arise from the multi-factor routing strategy of Algorithm 3, in which RFRV, distance to the GW, transmission hops, and residual energy are considered in tandem. Therefore, Algorithm 3 effectively reduces transmission disruption and enables faster delivery to the GW while balancing energy consumption. The results presented in Figure 7c and Table 10 further confirm the advantage of Algorithm 3. In terms of HND, FRDR improved by 16.81%, while PFRD achieved a growth rate of 9.46%. Additionally, as reported in Table 10, FRDR achieves reductions of 21.48%, 25.35%, and 24.64% in MTH, MTD, and MECP, respectively, compared to PRRS, while PFRD demonstrates reductions of 20.75%, 25.45%, and 26.03% relative to PFRS.
Second, to fully illustrate the superiority of FRDR, a comparison among FRDR, MHR, and DQIR is presented.
The residual energy of the network and the PDR under different EN densities are compared in Figure 8 and Figure 9, respectively. It is evident that FRDR significantly outperforms MHR and DQIR, maintaining a higher PDR while reducing energy consumption. Moreover, Table 11 provides a comparison of network lifetime, while a more detailed comparison of MTH, MTD, and MECP across different EN densities is presented in Table 12. Together, they demonstrate that FRDR achieves the longest network lifetime while maintaining a comparable transmission delay. These improvements are attributed to the integration of PRM and the DQN-based multi-factor routing strategy, which dynamically adjusts the activation range to reduce redundant activations and optimizes routing decisions based on RFRV, residual energy, transmission hops, and the distance to the GW.
MHR focuses solely on minimizing transmission hops, which contributes to its superiority in MTH, as shown in Table 12. However, to achieve this goal, the maximum transmission power is fixed in MHR, which leads to higher redundant activation than FRDR, particularly as EN density increases. This increased redundancy diminishes the delay and energy efficiency advantages gained by minimizing transmission hops, as higher reception delay and energy consumption occur during REQ reception. Table 12 further reveals that MHR results in a higher MECP than FRDR, while achieving a marginal reduction in MTD. Moreover, the exclusive consideration of hop count in MHR inevitably leads to hotspot issues due to the overutilization of partial ENs, which in turn leads to a shorter network lifetime and lower PDR. Conversely, by integrating RFRV and residual energy into routing decisions, FRDR effectively avoids routers that will introduce high routing failure risk and realizes a more balanced energy distribution. As a result, FRDR achieves a higher network lifetime and PDR. Specifically, when the PDR of MHR drops to 0.80, FRDR maintains a higher PDR, achieving improvements of 14.71%, 15.69%, and 18.90% at EN densities of 300, 350, and 400, respectively. Consequently, compared to MHR, FRDR effectively improves the PDR and network lifetime while maintaining a comparable transmission delay.
As for DQIR, multiple factors, including residual energy and distance to the GW, are considered when selecting the next-hop router from candidate ENs to minimize delay and balance energy distribution. Table 12 indicates that DQIR achieves lower MTH at EN densities of 350 and 400 compared to FRDR. However, DQIR requires all ENs that receive broadcast information to transmit a message to a designated agent for routing decisions. Although this method offloads the reception energy consumption from DEN to an additional agent without energy constraints, leading to more balanced energy consumption, the excessive overhead from replies significantly increases energy consumption and delay. In contrast, through the combination of PRM and pre-selection rules, FRDR effectively reduces redundant transmissions by dynamically adjusting activation ranges and requiring only ENs that meet the pre-selection rules to respond. As a result, compared to DQIR, FRDR achieves lower MTD and MECP, as well as a higher network lifetime. Moreover, by considering RFRV, FRDR effectively avoids selecting candidate routers that will introduce high routing failure risk, which further enhances the performance of PDR. Specifically, when the PDR of DQIR drops to 0.80, FRDR maintains a higher PDR, achieving improvements of 18.90%, 20.68%, and 21.96% at EN densities of 300, 350, and 400, respectively.
To summarize, the performance superiority of FRDR mainly comes from the PRM and DQN-based routing decision mechanism. PRM dynamically adjusts activation ranges, which works with pre-selection rules, further reducing unnecessary reception overhead. Meanwhile, the RFRV, in conjunction with other factors such as residual energy, distance to the GW, and transmission hops, is integrated into the DQN-based routing decision mechanism, effectively reducing transmission disruption and enabling faster delivery to the GW while balancing energy consumption. Consequently, FRDR significantly enhances PDR and network lifetime while maintaining a comparable transmission delay.

7. Conclusions

In this paper, we proposed a novel multi-hop routing protocol for LPWANs, named FRDR, which aims to reduce transmission disruption probability. FRDR comprehensively considered RFRV, distance to the GW, residual energy, and transmission hops as routing criteria, thereby deriving a low-latency, long-lifetime, and high-success-rate routing decision policy through a DQN-based framework. Simulation results confirmed that, compared with MHR and DQIR, our FRDR significantly reduces transmission disruption probability and extends network lifetime while maintaining a comparable delay. Specifically, when the PDR of MHR and DQIR drops to 0.80, FRDR maintains a higher PDR, achieving a minimum improvement of 14.71% and 18.90%, respectively.
Our current research focuses on multi-hop routing optimization under standardized scenarios with generalized assumptions. However, real-world deployments introduce non-ideal factors, such as edge-positioned GWs, mobile GWs, and asymmetric link conditions, which can significantly impact protocol performance. Consequently, future work will focus on addressing the challenges arising from these non-ideal factors to enhance the robustness and scalability of the routing protocol in practical deployments. Additionally, field trials will be conducted to evaluate the practical feasibility and performance of the proposed protocol.

Author Contributions

Conceptualization, S.T.; methodology, S.T.; validation, S.T.; formal analysis, S.T.; data curation, S.T.; writing—original draft preparation, S.T.; writing—review and editing, S.T., H.T., J.W. and B.L.; supervision, J.W. and B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to express their sincere gratitude to Xue Zhao for her contributions to data curation and valuable discussions throughout the development of this study. We also sincerely thank the reviewers for their critical comments and suggestions for improving this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADR: Adaptive Data Rate algorithm
ADV: Advertisement
AHP: Analytical Hierarchy Process
BW: Bandwidth
CI: Consistency Index
CR: Consistency Ratio
DEN: Data-holding End-device Node
DNN: Deep Neural Network
DQN: Deep Q-Network
DQIR: DQN-based Intelligent Routing protocol
EN: End-device Node
FND: First Node Dead
FRDR: Failure Risk-aware Deep Q-network-based multi-hop Routing protocol
GPS: Global Positioning System
GW: Gateway
HND: Half Node Dead
LND: Last Node Dead
LPWAN: Low-Power Wide-Area Network
MDP: Markov Decision Process
MECP: Mean Energy Consumption for Delivering a Data Packet
MHR: Minimum Hop Routing protocol
MTD: Mean Transmission Delay
MTH: Mean Number of Transmission Hops
PDR: Packet Delivery Rate
PRM: Power Regulation Mechanism
REQ: Request
RFRV: Routing Failure Risk Value
RI: Random Consistency Index
RL: Reinforcement Learning
RSSI: Received Signal Strength Indicator
SF: Spreading Factor
SGD: Stochastic Gradient Descent
SNR: Signal-to-Noise Ratio
SPIN: Sensor Protocol for Information via Negotiation

References

1. Misbahuddin, M.; Iqbal, M.S.; Budiman, D.F.; Wiriasto, G.W.; Akbar, L.A.S.I. EAM-LoRaNet: Energy aware multi-hop LoRa network for Internet of Things. Kinetik 2022, 7, 81–90.
2. Barrachina-Munoz, S.; Bellalta, B.; Adame, T.; Bel, A. Multi-hop communication in the uplink for LPWANs. Comput. Netw. 2017, 123, 153–168.
3. Guo, Z.; Chen, H. A reinforcement learning-based sleep scheduling algorithm for cooperative computing in event-driven wireless sensor networks. Ad Hoc Netw. 2022, 130, 102837.
4. Fang, W.; Zhu, C.; Zhang, W. Toward secure and lightweight data transmission for cloud–edge–terminal collaboration in artificial intelligence of things. IEEE Internet Things J. 2024, 11, 105–113.
5. Sharma, N.; Thota, V.S.P.; Tankala, Y.; Tripathi, S.; Pandey, O.J. OptRISQL: Toward performance improvement of time-varying IoT networks using Q-learning. IEEE Trans. Netw. Serv. 2024, 21, 3008–3020.
6. Wong, A.W.-L.; Goh, S.L.; Hasan, M.K.; Fattah, S. Multi-hop and mesh for LoRa networks: Recent advancements, issues, and recommended applications. ACM Comput. Surv. 2024, 56, 136.
7. Fang, W.; Zhu, C.; Guizani, M.; Rodrigues, J.J.P.C.; Zhang, W. HC-TUS: Human cognition-based trust update scheme for AI-enabled VANET. IEEE Netw. 2023, 37, 247–252.
8. Zolfaghari, D.; Taheri, H.; Rezaie, A.H.; Rezaei, M. A robust and reliable routing based on multi-hop information in industrial wireless sensor networks. Int. J. Ad Hoc Ubiquitous Comput. 2015, 19, 29–37.
9. Li, J.; Wang, M.; Zhu, P.; Wang, D.; You, X. Highly reliable fuzzy-logic-assisted AODV routing algorithm for mobile ad hoc networks. Sensors 2021, 21, 5965.
10. Xu, J.; Zhang, Y.; Jiang, J.; Kan, J. A multi-hop routing protocol based on link state prediction for intra-body wireless nanosensor networks. Ad Hoc Netw. 2021, 116, 102470.
11. Fang, W.; Zhang, W.; Yang, W.; Li, Z.; Gao, W.; Yang, Y. Trust management-based and energy efficient hierarchical routing protocol in wireless sensor networks. Digit. Commun. Netw. 2021, 7, 470–478.
12. Fang, W.; Zhu, C.; Yu, F.R.; Wang, K.; Zhang, W. Towards energy-efficient and secure data transmission in AI-enabled software defined industrial networks. IEEE Trans. Ind. Inf. 2022, 18, 4265–4274.
13. Mukhutdinov, D.; Filchenkov, A.; Shalyto, A.; Vyatkin, V. Multi-agent deep learning for simultaneous optimization for time and energy in distributed routing system. Future Gener. Comput. Syst. 2019, 94, 587–600.
14. Yang, X.; Yan, J.; Wang, D.; Xu, Y.; Hua, G. WOAD3QN-RP: An intelligent routing protocol in wireless sensor networks—A swarm intelligence and deep reinforcement learning based approach. Expert Syst. Appl. 2024, 246, 123089.
15. Geng, X.; Zhang, B. Deep Q-network-based intelligent routing protocol for underwater acoustic sensor network. IEEE Sens. J. 2023, 23, 3936–3943.
16. Pandey, O.J.; Yuvaraj, T.; Paul, J.K.; Nguyen, H.H.; Gundepudi, K.; Shukla, M.K. Improving energy efficiency and QoS of LPWANs for IoT using Q-learning based data routing. IEEE Trans. Cognit. Commun. Netw. 2022, 8, 365–379.
17. Chilamkurthy, N.S.; Karna, N.; Vuddagiri, V.; Tiwari, S.K.; Ghosh, A.; Cenkeramaddi, L.R.; Pandey, O.J. Energy-efficient and QoS-aware data transfer in Q-learning-based small-world LPWANs. IEEE Internet Things J. 2023, 10, 22636–22649.
18. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
19. Chang, Y.; Tang, H.; Li, B.; Yuan, X. Distributed joint optimization routing algorithm based on the analytic hierarchy process for wireless sensor networks. IEEE Commun. Lett. 2017, 21, 2718–2721.
20. Qu, Z.; Xu, H.; Zhao, X.; Tang, H.; Wang, J.; Li, B. A fault-tolerant sensor scheduling approach for target tracking in wireless sensor networks. Alex. Eng. J. 2022, 61, 13001–13010.
21. Yao, Y.D.; Li, H.C.; Zeng, Z.B.; Wang, C.; Zhang, Y.Q. Clustering routing protocol based on tuna swarm optimization and fuzzy control theory in wireless sensor networks. IEEE Sens. J. 2024, 24, 17102–17115.
22. Liu, X.; Cao, Q.; Jin, B.; Zhou, P. CNCMSA-ERCP: An innovative energy-efficient clustering routing protocol for improving the performance of industrial IoT. IEEE Internet Things J. 2025, 12, 11827–11840.
23. Can, G.F.; Toktas, P.; Pakdil, F. Six sigma project prioritization and selection using AHP-CODAS integration: A case study in healthcare industry. IEEE Trans. Eng. Manag. 2023, 70, 3587–3600.
24. Fang, W.; Cui, N.; Chen, W.; Zhang, W.; Chen, Y. A trust-based security system for data collection in smart city. IEEE Trans. Ind. Inf. 2021, 17, 4131–4140.
25. Saaty, T.L. Decision making with the analytic hierarchy process. Int. J. Serv. Sci. 2008, 1, 83–98.
26. Saaty, R.W. The analytic hierarchy process—What it is and how it is used. Math. Model. 1987, 9, 161–176.
27. Saaty, T.L. Theory and Applications of the Analytic Network Process: Decision Making with Benefits, Opportunities, Costs, and Risks, 3rd ed.; RWS Publications: Pittsburgh, PA, USA, 2005.
28. Kulik, J.; Heinzelman, W.; Balakrishnan, H. Negotiation-based protocols for disseminating information in wireless sensor networks. Wirel. Netw. 2002, 8, 169–185.
29. Jiang, C.; Yang, Y.; Chen, X.; Liao, J.; Song, W.; Zhang, X. A new-dynamic adaptive data rate algorithm of LoRaWAN in harsh environment. IEEE Internet Things J. 2022, 9, 8989–9001.
30. Benkahla, N.; Tounsi, H.; Ye-Qiong, S.; Frikha, M. Enhanced ADR for LoRaWAN networks with mobility. In Proceedings of the 15th International Wireless Communications & Mobile Computing Conference, Tangier, Morocco, 24–28 June 2019.
31. Hassen, H.; Meherzi, S.; Jemaa, Z.B. Improved exploration strategy for Q-learning based multipath routing in SDN networks. J. Netw. Syst. Manag. 2024, 32, 25.
32. Milarokostas, C.; Tsolkas, D.; Passas, N.; Merakos, L. A comprehensive study on LPWANs with a focus on the potential of LoRa/LoRaWAN systems. IEEE Commun. Surv. Tutor. 2023, 25, 825–867.
33. Marini, R.; Mikhaylov, K.; Pasolini, G.; Buratti, C. LoRaWANSim: A flexible simulator for LoRaWAN networks. Sensors 2021, 21, 695.
34. Sah, D.K.; Cengiz, K.; Donta, P.K.; Inukollu, V.N.; Amgoth, T. EDGF: Empirical dataset generation framework for wireless sensor networks. Comput. Commun. 2021, 180, 48–56.
Figure 1. Routing diagram with and without RFRV.
Figure 2. Overall flowchart of FRDR.
Figure 3. Diagram of SPIN-based message interaction flow.
Figure 4. Framework of DQN-based routing decision mechanism.
Figure 5. Simulation experiment scenario diagram of FRDR.
Figure 6. Simulation experiment scenario diagram of multi-hop routing: (a) FRDR, (b) MHR, (c) DQIR, (d) PFRS, (e) PFRD, and (f) PRRS.
Figure 7. Comparisons between FRDR, PRRS, PFRD, and PFRS in terms of the residual energy of the network, PDR, and HND: (a) residual energy of the network, (b) PDR, and (c) HND.
Figure 8. Residual energy of the network under different densities: (a) 300, (b) 350, and (c) 400.
Figure 9. PDR under different densities: (a) 300, (b) 350, and (c) 400.
Table 1. 1–9 scale.

| Scale | Numerical Rating | Reciprocal |
|---|---|---|
| Equal importance | 1 | 1 |
| Slight importance | 2 | 1/2 |
| Moderate importance | 3 | 1/3 |
| Moderate to strong importance | 4 | 1/4 |
| Strong importance | 5 | 1/5 |
| Strong to very strong importance | 6 | 1/6 |
| Very strong importance | 7 | 1/7 |
| Very strong to extreme importance | 8 | 1/8 |
| Extreme importance | 9 | 1/9 |
Table 2. Random consistency index.

| k | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| RI | 0 | 0 | 0.58 | 0.90 | 1.12 | 1.24 | 1.32 | 1.41 | 1.45 | 1.49 |
Table 3. Pairwise comparison matrix for three criteria in RFRV.

|  | E_n | N_n | LQ |
|---|---|---|---|
| E_n | 1 | 2 | 3 |
| N_n | 1/2 | 1 | 2 |
| LQ | 1/3 | 1/2 | 1 |
Table 4. Pairwise comparison matrix for three criteria in r̂_t.

|  | r̃_t^1 | r̃_t^2 | r̃_t^3 |
|---|---|---|---|
| r̃_t^1 | 1 | 2 | 4 |
| r̃_t^2 | 1/2 | 1 | 3 |
| r̃_t^3 | 1/4 | 1/3 | 1 |
Table 5. Energy consumption parameters.

| l_tx | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
|---|---|---|---|---|---|---|---|
| P_T [dBm] | 14 | 12 | 10 | 8 | 6 | 4 | 2 |
| I_tx [mA] | 38 | 35.1 | 32.4 | 30 | 27.5 | 24.7 | 22.3 |
| I_rx [mA] | 14.2 (all levels) | | | | | | |
| V_DD [V] | 3.3 (all levels) | | | | | | |
Table 6. Self-contrasting algorithms.

|  | Transmit Power Level | Routing Decision Mechanism |
|---|---|---|
| PFRS | Transmit with l_tx^max | Random selection |
| PFRD | Transmit with l_tx^max | Algorithm 3 |
| PRRS | PRM | Random selection |
| FRDR | PRM | Algorithm 3 |
Table 7. Network parameters.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| N | 300/350/400 | e_th | 100 mJ |
| R_change | 0.5 | n_pr | 8 symbols |
| n_pl of ADV/REQ | 1 B | n_pl of data packet | 300 B |
Table 8. Parameters of DQN.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| N_eps^max | 100 | N_iter^max | 1000 |
| α | 0.009 | γ | 0.95 |
| \|M\| | 5000 | B | 64 |
| C_exp | 10 | C_t | 400 |
| R_max | 1 | τ | 0.2 |
| η | 2 | H_max | 10 |
| ε_start | 0.5 | ε_end | 0.01 |
Table 9. Critical attributes of candidate ENs for Router1.

| ID | RFRV | Distance (m) | Residual Energy (J) |
|---|---|---|---|
| 1 | 0.1648 | 499.0684 | 5.9358 |
| 78 | 0.3042 | 461.9026 | 5.9358 |
| 88 | 0.2711 | 559.5603 | 5.9358 |
| 93 | 0.4604 | 534.8417 | 5.9358 |
| 167 | 0.1247 | 436.6732 | 5.9358 |
| 169 | 0.1389 | 444.5297 | 5.9358 |
| 245 | 0.2584 | 529.7191 | 5.9358 |
| 252 | 0.3133 | 542.9591 | 5.9358 |
| 276 | 0.2191 | 500.7071 | 5.9358 |
| 290 | 0.2280 | 550.7258 | 5.9358 |
Table 10. MTH, MTD, and MECP for delivering the first 1000 packets.

|  | PFRS | PRRS | PFRD | FRDR |
|---|---|---|---|---|
| MTH | 4.82 | 5.12 | 3.82 | 4.02 |
| MTD (s) | 5.58 | 5.76 | 4.16 | 4.30 |
| MECP (J) | 0.73 | 0.69 | 0.54 | 0.52 |
Table 11. HND under different densities.

| N | FRDR | MHR | DQIR |
|---|---|---|---|
| 300 | 2516 | 2249 | 2205 |
| 350 | 2651 | 2298 | 2445 |
| 400 | 2736 | 2420 | 2452 |
Table 12. MTH, MTD, and MECP for delivering the first 1000 packets under different densities.

| N | Metric | FRDR | MHR | DQIR |
|---|---|---|---|---|
| 300 | MTH | 4.02 | 3.76 | 4.06 |
|  | MTD (s) | 4.30 | 4.14 | 5.94 |
|  | MECP (J) | 0.52 | 0.54 | 0.70 |
| 350 | MTH | 3.88 | 3.57 | 3.60 |
|  | MTD (s) | 4.30 | 4.07 | 5.47 |
|  | MECP (J) | 0.54 | 0.56 | 0.67 |
| 400 | MTH | 3.65 | 3.39 | 3.52 |
|  | MTD (s) | 4.21 | 4.04 | 5.72 |
|  | MECP (J) | 0.55 | 0.58 | 0.72 |