Article

A Failure Risk-Aware Multi-Hop Routing Protocol in LPWANs Using Deep Q-Network

by Shaojun Tao 1,2, Hongying Tang 1, Jiang Wang 1,* and Baoqing Li 1,*
1 Science and Technology on Micro-System Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
* Authors to whom correspondence should be addressed.
Sensors 2025, 25(14), 4416; https://doi.org/10.3390/s25144416
Submission received: 19 June 2025 / Revised: 12 July 2025 / Accepted: 13 July 2025 / Published: 15 July 2025
(This article belongs to the Special Issue Security, Privacy and Trust in Wireless Sensor Networks)

Abstract

Multi-hop routing over low-power wide-area networks (LPWANs) has emerged as a promising technology for extending network coverage. However, existing protocols face high transmission disruption risks due to factors such as dynamic topology driven by stochastic events, dynamic link quality, and coverage holes induced by imbalanced energy consumption. To address this issue, we propose a failure risk-aware deep Q-network-based multi-hop routing (FRDR) protocol, aiming to reduce transmission disruption probability. First, we design a power regulation mechanism (PRM) that works in conjunction with pre-selection rules to optimize end-device node (EN) activations and candidate relay selection. Second, we introduce the concept of routing failure risk value (RFRV) to quantify the potential failure risk posed by each candidate next-hop EN, which correlates with its neighborhood state characteristics (i.e., the number of neighbors, the residual energy level, and link quality). Third, a deep Q-network (DQN)-based routing decision mechanism is proposed, where a multi-objective reward function incorporating RFRV, residual energy, distance to the gateway, and transmission hops is utilized to determine the optimal next-hop. Simulation results demonstrate that FRDR outperforms existing protocols in terms of packet delivery rate and network lifetime while maintaining comparable transmission delay.

1. Introduction

Multi-hop routing in low-power wide-area networks (LPWANs) has emerged as a promising solution for expanding geographical coverage [1,2]. Within such networks, event-driven architectures are widely adopted to enhance energy efficiency [3,4]. However, multi-hop routing over event-driven LPWANs is challenged by high transmission disruption risk. Specifically, dynamic link quality introduces unstable link connections, while imbalanced energy consumption and nonuniform end-device node (EN) distribution lead to coverage holes that disrupt data forwarding [5,6,7]. Consequently, developing multi-hop routing protocols that guide ENs to select routes with low disruption probability is critical.
Over the past decades, numerous multi-hop routing protocols have been proposed to determine optimal relays by evaluating the intrinsic EN state and neighborhood state [8,9,10,11,12]. However, these studies primarily focus on assessing link quality within the neighborhood state. By overlooking the number and residual energy of neighbors, these methods struggle to avoid selecting ENs that introduce high routing failure risk. Specifically, ENs with few neighbors exhibit higher transmission failure probabilities due to limited next-hop availability, while those connected to low-energy neighbors are prone to instability caused by energy depletion during data forwarding. Therefore, developing a comprehensive neighborhood state assessment framework to avoid relays that introduce high routing failure risk is imperative.
Given these, we propose a failure risk-aware deep Q-network-based multi-hop routing (FRDR) protocol. In FRDR, by evaluating multiple neighborhood state characteristics, a distinct routing failure risk value (RFRV) is assigned to each EN. RFRV is then integrated with other metrics into the reward function of a deep Q-network (DQN)-based routing decision framework to determine the optimal next-hop. The DQN employs reinforcement learning (RL), where agents continuously interact with external environments to learn optimal policies that maximize cumulative rewards [13]. Furthermore, by employing deep neural networks (DNNs) to approximate the Q-function within the Q-learning framework, DQN can effectively handle multi-hop routing under dynamic and complex conditions [14,15].
The main contributions of our study are summarized as follows:
  • We design a novel power regulation mechanism (PRM) that adaptively adjusts activation ranges based on the average signal-to-noise ratio (SNR) of received signals from neighbors. This mechanism further incorporates pre-selection rules to optimize EN activations and candidate relay selection.
  • We introduce the concept of routing failure risk value (RFRV) to quantify the potential failure risk posed by each candidate next-hop EN, which is evaluated based on its neighborhood state characteristics, including the number of neighbors, residual energy level, and link quality.
  • We develop a DQN-based routing decision mechanism that integrates RFRV into the reward function. Building upon metrics such as residual energy, distance to the gateway, and transmission hop count, our mechanism prioritizes low-RFRV ENs, thereby reducing transmission failures.
  • Through meticulous evaluation across various metrics, our simulation results demonstrate the advantages of FRDR in improving packet delivery rate and network lifetime while maintaining comparable transmission delay.
The remainder of this paper is organized as follows. Related studies are discussed in Section 2. Section 3 presents a brief review of DQN, and Section 4 introduces system models. In Section 5, the details of FRDR are described. Simulation results are thoroughly analyzed in Section 6 to illustrate the superiority of FRDR over other protocols, while Section 7 concludes this paper.

2. Related Studies

Over the past decades, numerous multi-hop routing protocols have been investigated, with a focus on relay selection strategies to optimize routing performance. In [8], link state information within two hops was considered when selecting relays to minimize delay and reduce packet loss. However, this two-hop dependency incurs high computational overhead in dynamic networks with frequent topology changes. In [9], the candidate relay with the highest reliability was selected to establish high-reliability and low-latency routes. Nevertheless, due to the dependence on predefined fuzzy rules, its adaptability to unmodeled network scenarios is limited. A method based on link quality prediction was proposed in [10], where a fuzzy logic system that incorporates distance, residual energy, and link quality (estimated via Kalman filtering) was adopted in relay decisions. However, this method is susceptible to model mismatch in event-driven networks, as bursty traffic violates the Markovian assumption underlying Kalman filtering-based prediction.
Given the limitations of traditional approaches, RL-based methods have emerged as a promising solution. These methods enable agents to learn optimal routing policies through real-time interaction with the external environment and reward-driven optimization, eliminating dependence on predefined models [13,14]. A Q-learning-based routing protocol was developed in [16], where energy consumption, bandwidth utilization, throughput, and data latency are jointly considered during relay selection. Similarly, ref. [17] proposed a Q-learning framework to reduce packet losses by deprioritizing predicted faulty nodes within routing decisions. In [15], a DQN-based intelligent routing (DQIR) protocol that balances residual energy distribution while minimizing routing distance was introduced to select relays. To address challenges such as insufficient adaptability to network topology changes, high communication delays, and short network lifetime in multi-hop routing, a dueling double deep Q-network was employed in [14] to optimize routing decisions. In [18], a reinforcement learning framework that integrates different node centrality metrics was developed to optimize relay selection.
A review of existing research reveals that while neighborhood state has been incorporated into relay selection decisions, these studies primarily focus on link quality without simultaneously considering the number of neighbors and their residual energy. This narrow focus prevents these methods from effectively excluding relays that introduce high routing failure risk, particularly in dynamic and complex networks. To address this shortcoming, we propose FRDR in this article.

3. Brief Review of DQN

An RL framework is typically modeled as a Markov Decision Process (MDP), characterized by a tuple $\langle S, A, P, R \rangle$, where $S$ represents the state space, $A$ denotes the action space, $P$ is the state-transition probability, and $R$ signifies the rewards. At each time step $t$, the agent executes the action $a_t \in A$ determined by the policy $\pi$ based on the current state $s_t \in S$. Subsequently, the environment provides the agent with an immediate reward $r_t \in R$ contingent upon $a_t$ and transitions to the next state $s_{t+1}$. This process generates an experience $(s_t, a_t, r_t, s_{t+1})$. The overarching goal of the agent is to derive an optimal policy $\pi^*$ that maximizes the expected cumulative reward, thereby optimizing long-term performance within the given MDP framework [15].
Q-learning is a value-based RL algorithm that iteratively refines policies to approximate $\pi^*$. The action-value function $Q(s,a)$ estimates the expected return of taking action $a$ in state $s$, which is updated iteratively using the following formula:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a' \in A} Q(s',a') - Q(s,a) \right], \tag{1}$$

where $\max_{a' \in A} Q(s',a')$ is the maximum Q-value over all possible actions $a'$ in the subsequent state $s'$, $\gamma$ is the discount factor, and $\alpha$ is the learning rate. $\pi^*$ directs the agent toward actions that yield the highest $Q(s,a)$ in each state.
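The update rule above fits in a few lines of Python. The sketch below is a minimal tabular example on a hypothetical two-state, two-action problem; the state names, actions, and parameter values are illustrative only:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Toy example: all Q-values start at zero; one update after receiving reward 1.0.
actions = ("a0", "a1")
Q = {(s, a): 0.0 for s in ("s0", "s1") for a in actions}
q_update(Q, "s0", "a0", r=1.0, s_next="s1", actions=actions)  # Q("s0","a0") becomes 0.5
```

Repeated application of this update over many experienced transitions is what drives the Q-values toward the optimal action-value function.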
When the state space is large, exhaustively computing $Q(s,a)$ becomes infeasible. Consequently, DQN is adopted to approximate $Q(s,a)$, where the output is $Q(s,a;\omega) \approx Q(s,a)$. Here, $\omega$ represents the weights of the DNN, and the stochastic gradient descent (SGD) algorithm is used to update parameters.
However, the neural network can become unstable owing to correlations between the Q-value and the target value, or small updates to the Q-value at each step. To address this instability, experience replay and a quasi-static target network are employed in DQN [18]. In experience replay, at each time step $t$, an experience sample $e_t = (s_t, a_t, r_t, s_{t+1})$ is stored in a replay memory $M = \{e_1, e_2, \ldots, e_t\}$. During training, the agent randomly samples a minibatch of experiences from $M$, thus removing the correlations between continuous samples and improving the stability and efficiency of learning. Additionally, an independent target neural network with weights $\omega^-$ is used for the quasi-static target network. The loss function $L(\omega)$ is calculated as follows:

$$L(\omega) = \mathbb{E}_{(s,a,r,s') \sim M}\left[ \left( y(s,r) - Q(s,a;\omega) \right)^2 \right], \tag{2}$$

where $y(s,r)$ is the output of the target neural network:

$$y(s,r) = r + \gamma \max_{a' \in A} Q(s',a';\omega^-), \tag{3}$$

where $\omega^-$ is synchronized with $\omega$ every $C$ steps. This approach decouples the target value computation from the Q-network weights, thereby reducing the likelihood of divergence and ensuring more stable learning.
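As a concrete, drastically simplified illustration of these two stabilization tricks, the sketch below trains a linear Q-function (a stand-in for the DNN) on a hypothetical toy problem, with minibatches drawn from a replay memory and target weights that are only synchronized every few steps. The toy transitions, learning rate, and synchronization period are all illustrative:

```python
import random

def q_val(w, s, a):
    # Linear Q-function: Q(s, a; w) = w[a] . s (a stand-in for the DNN approximator)
    return sum(wi * si for wi, si in zip(w[a], s))

def train(w, memory, actions, gamma=0.9, alpha=0.05, batch=4, sync_every=10, steps=200):
    w_tgt = {a: list(v) for a, v in w.items()}        # quasi-static target network
    for step in range(1, steps + 1):
        for s, a, r, s_next, done in random.sample(memory, min(batch, len(memory))):
            # Target value computed from the *target* weights (cf. Equation (3))
            y = r if done else r + gamma * max(q_val(w_tgt, s_next, a2) for a2 in actions)
            td = y - q_val(w, s, a)                   # SGD step on the squared TD error
            w[a] = [wi + alpha * td * si for wi, si in zip(w[a], s)]
        if step % sync_every == 0:                    # periodic synchronization
            w_tgt = {a: list(v) for a, v in w.items()}
    return w

# Toy replay memory: from state (1, 0), action "fwd" reaches a terminal state with reward 1.
memory = [((1.0, 0.0), "fwd", 1.0, (0.0, 1.0), True),
          ((0.0, 1.0), "fwd", 0.0, (1.0, 0.0), False)]
w = {"fwd": [0.0, 0.0], "drop": [0.0, 0.0]}
w = train(w, memory, actions=("fwd", "drop"))
```

After training, the Q-value of the rewarded transition approaches 1 and the preceding state's value approaches the discounted target 0.9, illustrating how bootstrapped targets propagate reward backward while the frozen target weights keep each update stable.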

4. System Models

4.1. Network Model

Without loss of generality, we consider a network where $N$ ENs are randomly distributed within an $L \times L$ monitoring area with nonuniform density. As established in [19,20,21,22], the network model satisfies the following assumptions to construct a standardized scenario:
1. A gateway (GW) is located at the center of the network and remains powered on. Central placement simplifies the network model, providing a consistent reference point for all ENs while facilitating a more balanced distribution of data flow.
2. All ENs are homogeneous. This configuration minimizes performance variations due to hardware differences, thereby facilitating an unbiased evaluation of the logic and effectiveness of routing protocols under consistent operating parameters.
3. Both ENs and the GW are stationary after deployment. This configuration eliminates route fluctuations caused by the mobility of ENs and the GW.
4. All ENs are synchronized and can determine their locations via the Global Positioning System (GPS) or other self-localization algorithms. Synchronization is essential for ordering control and data packets in negotiation-based protocols, while geographic information is fundamental for distance-based relay selection.
5. The links are symmetric. This assumption ensures bidirectional connectivity and consistent link characteristics, thereby avoiding complications from unidirectional paths that disrupt acknowledgment-dependent routing protocols.

4.2. Routing Failure Risk Value

To reduce transmission disruption probability, ENs with higher Routing Failure Risk Value (RFRV) are deprioritized in FRDR. The effectiveness of this approach is demonstrated in Figure 1.
Definition 1. For a given EN, its neighboring ENs are all ENs located within its maximum direct communication range.
Generally, neighborhood state characteristics, including the number of neighbors $N_n$, the residual energy level of neighbors $E_n$, and link quality $LQ$, are jointly considered when evaluating RFRV. An EN with lower $N_n$, $E_n$, and $LQ$ is associated with a higher RFRV. For $EN_i$, $RFRV_i$ can be computed via Equations (4)–(6).

$$RFRV_i = \lambda_1 \tilde{N}_n^i + \lambda_2 \widetilde{LQ}_i + \lambda_3 \tilde{E}_n^i, \tag{4}$$

$$LQ_i = \frac{\overline{RSSI_i} - RSSI_{th}}{RSSI_{th}} \cdot \frac{\overline{SNR_i} - SNR_{th}}{SNR_{th}}, \tag{5}$$

$$E_n^i = \overline{e_{res}^{n_i}} \big/ e_{init}, \tag{6}$$

$$\tilde{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{7}$$

where $\overline{RSSI_i}$ and $\overline{SNR_i}$ denote the average received signal strength indicator (RSSI) and signal-to-noise ratio (SNR) at $EN_i$ from signals transmitted by its neighbors, while $RSSI_{th}$ and $SNR_{th}$ are the corresponding thresholds. $\overline{e_{res}^{n_i}}$ indicates the average residual energy of the neighbors, while $e_{init}$ is the initial energy. To eliminate dimensional differences among heterogeneous indicators, $N_n$, $E_n$, and $LQ$ are standardized by min-max normalization, as defined in Equation (7).
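Equations (4)–(7) translate directly to a short Python sketch. The weight vector and value ranges below are placeholders (the actual $\lambda_i$ values are derived via the AHP procedure described next):

```python
def minmax(x, lo, hi):
    # Equation (7): min-max normalization to [0, 1]
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def rfrv(n_neighbors, link_quality, energy_level,
         n_range, lq_range, e_range, lam=(0.5, 0.3, 0.2)):
    # Equation (4): weighted sum of the normalized neighborhood characteristics.
    # lam is a hypothetical weight vector; the paper derives it via AHP.
    return (lam[0] * minmax(n_neighbors, *n_range)
            + lam[1] * minmax(link_quality, *lq_range)
            + lam[2] * minmax(energy_level, *e_range))

risk = rfrv(5, 0.2, 0.6, n_range=(0, 10), lq_range=(0, 1), e_range=(0, 1))  # 0.43
```

In practice the min/max bounds would be taken over the current candidate set so that each characteristic is compared on the same scale.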
The weights $\lambda_i\ (i = 1, 2, 3)$ in Equation (4) are determined using the Analytical Hierarchy Process (AHP) [23]. Benefiting from its capability in establishing quantitative frameworks for complex and ambiguous decision-making problems, as well as systematically relating criterion weights to overarching objectives, AHP is widely adopted for deriving criterion weights in multi-criteria decision analysis [24].
The AHP process begins by constructing a pairwise comparison matrix $A$ for the decision criteria, as defined in Equation (8).

$$A = \left( a_{ij} \right)_{k \times k} = \begin{pmatrix} 1 & a_{12} & \cdots & a_{1k} \\ a_{21} & 1 & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{k1} & a_{k2} & \cdots & 1 \end{pmatrix}, \tag{8}$$

where each element $a_{ij}$ denotes the relative importance of the criterion associated with row index $i$ compared to the criterion associated with column index $j$. When constructing $A$, a 1–9 scale [25] is widely adopted to quantify the relative importance between each pair of criteria. This well-known AHP scale is shown in Table 1.
The weight vector $w$ is then derived by solving the characteristic equation:

$$A w = \lambda_{\max} w, \tag{9}$$

where $\lambda_{\max}$ is the largest eigenvalue of the pairwise comparison matrix $A$.
Since $w$ represents unnormalized priorities, the final criteria weights $w_i$ are obtained through normalization:

$$w_i = w_i \Big/ \sum_{j=1}^{k} w_j, \tag{10}$$

where $k$ is the order of $A$ (i.e., the number of criteria).
Since pairwise comparisons in AHP are heavily dependent on human judgment, they are susceptible to inconsistencies. To address this issue, a standard procedure is provided in [26] to check the consistency of the pairwise comparison matrix by utilizing the largest eigenvalue $\lambda_{\max}$. The deviation of $\lambda_{\max}$ from the matrix dimension $k$ is quantified by the Consistency Index (CI):

$$CI = \frac{\lambda_{\max} - k}{k - 1}. \tag{11}$$

To benchmark CI, the Random Consistency Index (RI) is also proposed in [26] (Table 2), which is derived from randomly generated reciprocal matrices of various dimensions. The Consistency Ratio (CR) is then calculated as follows:

$$CR = \frac{CI}{RI}. \tag{12}$$

According to the established threshold [27], $CR \leq 0.1$ indicates that the results are satisfactory. Otherwise, the pairwise comparison matrix must be re-evaluated.
According to Equations (8)–(10), the weights $\lambda_i\ (i = 1, 2, 3)$ in Equation (4) are determined using the pairwise comparison matrix presented in Table 3. This matrix satisfies the consistency requirement (CR = 0.0079 < 0.1), confirming the reliability of the weight results.
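The full AHP pipeline of Equations (8)–(12) fits in a short script. The 3 × 3 comparison matrix below is a hypothetical stand-in (the paper's actual matrix appears in its Table 3), and the principal eigenvector is approximated by power iteration:

```python
def ahp_weights(A, iters=200):
    # Power iteration approximates the principal eigenvector of A (Equation (9)),
    # normalized so the weights sum to one (Equation (10)).
    k = len(A)
    w = [1.0 / k] * k
    for _ in range(iters):
        w = [sum(A[i][j] * w[j] for j in range(k)) for i in range(k)]
        total = sum(w)
        w = [x / total for x in w]
    # lambda_max from A w = lambda_max * w, then CI and CR (Equations (11)-(12)).
    lam_max = sum(sum(A[i][j] * w[j] for j in range(k)) / w[i] for i in range(k)) / k
    ci = (lam_max - k) / (k - 1)
    ri = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}[k]  # random index (per dimension)
    cr = ci / ri if ri > 0 else 0.0
    return w, cr

# Hypothetical pairwise comparisons on the 1-9 scale (reciprocal matrix).
A = [[1.0, 2.0, 3.0],
     [0.5, 1.0, 2.0],
     [1.0 / 3.0, 0.5, 1.0]]
weights, cr = ahp_weights(A)
```

For this illustrative matrix the derived weights sum to one and the consistency ratio falls well under the 0.1 threshold, so the comparisons would be accepted without re-evaluation.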

5. Detailed Description of the Proposed Protocol

In this section, FRDR is introduced in detail with the overall flowchart illustrated in Figure 2.
To deliver an event-related data packet to the GW, the data-holding EN (DEN) in FRDR first checks whether the GW is within its one-hop communication range. If direct transmission is feasible, the data packet is forwarded directly. Otherwise, the DEN triggers the relay selection process, which consists of two phases: candidate router selection and optimal relay decision-making.
Given the need for real-time topology awareness in dynamic networks, FRDR utilizes the Sensor Protocol for Information via Negotiation (SPIN) [28] to manage neighbor discovery and state updates, with its message sequence detailed in Figure 3. The specific workflow is outlined as follows:
DEN initiates the relay selection process by broadcasting an Advertisement (ADV) message with transmission power regulated by the Power Regulation Mechanism (PRM). This mechanism dynamically adjusts EN activation ranges based on the average neighbor signal SNR to enhance energy efficiency. Neighboring ENs that receive the ADV message parse its metadata to extract information such as the DEN-to-GW distance.
To prevent redundant participation, a pre-selection mechanism is employed to restrict relay requests from ENs that receive the ADV message. Each EN autonomously determines whether to apply for packet forwarding based on its residual energy and distance to the GW. The eligibility criteria for application are as follows:
(1) The distance from the EN to the GW must be shorter than the DEN-to-GW distance recorded in the ADV metadata, thereby preventing data backhaul.
(2) The residual energy of the EN must exceed a predefined threshold $e_{th}$, which is derived from the energy cost of receiving and forwarding a data packet at the minimum transmission power level. This criterion prevents resource wastage caused by ENs with insufficient energy applying for relay tasks.
Neighboring ENs that fail either condition discard the ADV packet, and qualified ENs respond with a Request (REQ) message containing self-reported metrics (i.e., residual energy and distance to the GW). These responding ENs form the candidate set of next-hop routers.
If the DEN receives no REQ packet within the designated reception window, it rebroadcasts the ADV packet at its maximum power level to activate more potential relays. If this second broadcast also fails to elicit any response, the transmission is deemed a failure due to the unavailability of suitable relays.
Conversely, when REQ responses are received, the DEN executes a DQN-based routing decision mechanism, as detailed in Section 5.2, to determine the optimal relay from candidates. Following this selection, the DEN forwards the data packet to the chosen relay. Upon completing the role transition, the current DEN exits the routing process and enters a low-power sleep mode, where it awaits its next activation to minimize energy consumption.

5.1. Power Regulation Mechanism

By default, ENs operate at maximum power to maintain periodic neighbor information exchange. However, this configuration becomes inefficient during data transmission, as excessive power causes resource wastage (e.g., redundant EN activations). To address this issue, FRDR introduces a Power Regulation Mechanism (PRM) that adaptively adjusts transmission power levels based on the average SNR of received signals from neighbors, thereby reducing overhead.
As detailed in Algorithm 1, the standard Adaptive Data Rate (ADR) algorithm [29] adjusts data rates based on SNR to optimize throughput and energy efficiency.
Algorithm 1. Standard Adaptive Data Rate Algorithm.
Initialize: spreading factor SF ∈ [7, 12], transmitting power TP ∈ [2 dBm, 14 dBm]
1: SNR_req ← demodulation floor (current data rate)
2: SNR_max ← max(SNR of last 20 frames)
3: SNR_margin ← SNR_max − SNR_req − Margin_dB
4: N_step ← int(SNR_margin / 3)
5: while N_step > 0 and SF > SF_min do
6:   SF ← SF − 1
7:   N_step ← N_step − 1
8: end while
9: while N_step > 0 and TP > TP_min do
10:   TP ← TP − 3
11:   N_step ← N_step − 1
12: end while
13: while N_step < 0 and TP < TP_max do
14:   TP ← TP + 3
15:   N_step ← N_step + 1
16: end while
17: Output: TP and SF
Building upon the standard ADR framework, we propose the PRM, which redirects optimization from data rates to transmission power levels. The workflow of PRM operates as follows:
Step 1: Calculate the average SNR $SNR_{avg}$ of the most recently received signals transmitted by $n$ neighbors.
Step 2: Subtract a predefined margin $M_a$ (default: 10 dB) from the difference between $SNR_{avg}$ and $SNR_{th}$ to determine the SNR margin $SNR_{margin}$, i.e., $SNR_{margin} = SNR_{avg} - SNR_{th} - M_a$.
Step 3: Adjust the current transmission power level $l_{tx}$ based on $SNR_{margin}$. If $SNR_{margin} > 0$, it suggests that $l_{tx}$ can be reduced without compromising communication reliability.
Given the dynamic nature of the link environment, relying on a fixed neighbor count $n$ may lead to inaccurate decisions. To overcome this limitation, PRM adaptively adjusts $n$ based on link variability. Specifically, the DEN randomly selects $n$ neighbors from its routing table and calculates the volatility rate of the link environment, $R_{change}$, using Equation (13).

$$R_{change} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{RSSI_{SL}^{i} - RSSI_{L}^{i}}{RSSI_{th} - RSSI_{L}^{i}} \right|, \tag{13}$$

where $RSSI_{L}^{i}$ and $RSSI_{SL}^{i}$ denote the RSSI of the last and penultimate signals received by the DEN from the $i$-th EN, respectively.
Higher values of $R_{change}$ indicate a more volatile link environment. When $R_{change}$ exceeds the threshold $R_{change}^{th}$, to enhance decision accuracy, PRM increases $n$ by 1 to incorporate diverse information from additional neighbors into the decision-making process. This adjustment repeats until either $R_{change} \leq R_{change}^{th}$ or $n \geq N_n$, thereby achieving an adaptive balance between decision accuracy and computational overhead. The initial empirical value of $n$ is set to 3 according to [29,30]. The detailed PRM workflow is described in Algorithm 2.
Algorithm 2. Power Regulation Mechanism
Input: SNR_th, the upper limit l_tx_max and lower limit l_tx_min of l_tx (with initial value l_tx = l_tx_max), the SNR and RSSI of received signals from neighbors, and the number of neighbors N_n
Output: l_tx
1: Randomly select n neighbors and calculate R_change by Equation (13)
2: if N_n ≤ 3 then
3:   l_tx = l_tx_max
4: else
5:   if R_change > R_change_th and n < N_n then
6:     n = n + 1
7:     Go to line 1
8:   end if
9: end if
10: Calculate SNR_avg and SNR_margin
11: N_step ← round(SNR_margin / 3)
12: while N_step > 0 and l_tx > l_tx_min do
13:   l_tx = l_tx − 1 and N_step = N_step − 1
14: end while
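The two core computations of PRM, the link volatility rate of Equation (13) and the power step-down of Algorithm 2 (lines 10–14), can be sketched as follows. The threshold, margin, and power-level bounds are illustrative values, not the paper's configuration:

```python
def volatility_rate(rssi_last, rssi_penult, rssi_th):
    # Equation (13): average relative RSSI change over the sampled neighbors.
    return sum(abs((sl - l) / (rssi_th - l))
               for l, sl in zip(rssi_last, rssi_penult)) / len(rssi_last)

def regulate_power(snr_samples, l_tx, snr_th=-7.5, margin_db=10, l_tx_min=0):
    # Algorithm 2, lines 10-14: one power-level step down per 3 dB of spare SNR margin.
    snr_avg = sum(snr_samples) / len(snr_samples)
    n_step = round((snr_avg - snr_th - margin_db) / 3)
    while n_step > 0 and l_tx > l_tx_min:
        l_tx -= 1
        n_step -= 1
    return l_tx

l_new = regulate_power([10.0, 8.0, 9.0], l_tx=5)   # ample margin -> power reduced
```

With a healthy average SNR the level steps down; when the margin is negative the loop never runs and the current level is kept, mirroring the conservative behavior of the algorithm.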

5.2. DQN-Based Routing Decision Mechanism

At time step $t$, all ENs that respond with a REQ packet form the candidate relay set $\mathcal{C}_t^{cf}$, from which the DEN selects the optimal next-hop router. However, an EN selected as a relay will consume more energy, which may lead to unbalanced energy distribution across the network and potentially cause coverage holes. Furthermore, selecting candidates with high RFRV increases transmission disruption probability. Given these considerations, we propose a low-latency, long-lifetime, and high-success-rate routing decision mechanism. The optimal relay selection is formulated as follows:

$$\max\ e_{res}^{i,t}, \quad \min\ RFRV_{i,t}, \quad \min\ dst_i, \quad \min\ hop \quad \text{s.t.}\ EN_i \in \mathcal{C}_t^{cf}, \tag{14}$$

where $e_{res}^{i,t}$ denotes the residual energy of $EN_i$ at time step $t$, $dst_i$ is the distance between $EN_i$ and the GW, and $hop$ is the packet transmission hop count.

5.2.1. MDP Model for FRDR

Given the dynamic nature of network conditions, the routing decision process in FRDR is modeled as an MDP and solved using DQN. The overall framework is illustrated in Figure 4.
By modeling DEN as an agent, the corresponding states, actions, and reward functions are defined as follows:
  • States: The state integrates the hop count and the features of ENs to form a unified vector $s_t \in \mathbb{R}^{3N+1}$. To handle dynamic fluctuations in the number of candidate relays, FRDR employs a feature masking mechanism. For each $EN_i$ at time step $t$, its feature vector is defined as follows:

$$f_{i,t} = \begin{cases} \left( dst_i / L,\ e_{res}^{i,t} / e_{init},\ RFRV_{i,t} \right), & EN_i \in \mathcal{C}_t^{cf} \\ \left( 1, 0, 1 \right), & \text{otherwise} \end{cases} \tag{15}$$

    Based on the current hop count $h_t$, the overall state $s_t$ is expressed as follows:

$$s_t = \begin{cases} \big( h_t,\ \oplus_{i=1}^{N} (1, 0, 1) \big)^T, & \text{failure} \\ \big( h_t,\ \oplus_{i=1}^{N} (0, 1, 0) \big)^T, & \text{success} \\ \big( h_t,\ \oplus_{i=1}^{N} f_{i,t} \big)^T, & \text{intermediate} \end{cases} \tag{16}$$

    where $\oplus$ denotes vector concatenation.
  • Actions: By executing action $a_t$ at time step $t$, the agent selects the corresponding EN as the next-hop router, i.e., $a_t = i\ (i \in \{1, 2, \ldots, N\})$ indicates that $EN_i$ is chosen as the relay.
  • Reward Function: To determine the optimal relay in Equation (14), the reward function in FRDR is designed to guide the agent toward solutions that maximize residual energy, minimize RFRV, reduce distance to the GW, and minimize hop count. It is defined as follows:

$$r_t = \begin{cases} R_{\max}, & C_1 \\ -R_{\max}, & C_2 \\ \hat{r}_t, & C_3 \end{cases} \tag{17}$$

    In Equation (17), $C_1$ represents successful packet delivery to the GW, for which a positive reward $R_{\max}$ is granted. Conversely, $C_2$ denotes transmission disruption, which is penalized with $-R_{\max}$. All other cases fall under $C_3$, where the composite reward $\hat{r}_t$ implements the optimization objectives from Equation (14) through specific reward components:

$$r_t^1 = dst_i / L, \tag{18}$$

$$r_t^2 = e_{res}^{i,t} / e_{init}, \tag{19}$$

$$r_t^3 = RFRV_{i,t}, \tag{20}$$

$$r_t^4 = h_t / H_{\max}, \tag{21}$$

    Note that $r_t^1$, $r_t^2$, and $r_t^3$ pertain to the attributes of ENs, while $r_t^4$ is a path attribute. By integrating Equations (18)–(21), the composite reward $\hat{r}_t$ is derived as Equation (23). Here, $\tilde{r}_t^1$ and $\tilde{r}_t^3$ are the normalized versions of $r_t^1$ and $r_t^3$ via Equation (7), while $r_t^2$ is normalized using Equation (22).

$$\tilde{x} = \frac{x_{\max} - x}{x_{\max} - x_{\min}}, \tag{22}$$

$$\hat{r}_t = \sum_{i=1}^{3} \mu_i \tilde{r}_t^i - \eta r_t^4, \tag{23}$$

    The weights $\mu_i$ in Equation (23) are calculated via AHP using Equations (8)–(10). The pairwise comparison matrix for the decision criteria in Equation (23) is presented in Table 4, with a CR of 0.0158. This CR value is well below the threshold of 0.1, confirming the logical coherence of the pairwise comparisons and the reliability of the weight results.
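A minimal sketch of the reward computation of Equations (17)–(23). The weights $\mu$, $\eta$, the value $R_{\max}$, and the normalization bounds are illustrative placeholders (the paper's $\mu_i$ come from AHP via its Table 4); the normalization mapping follows the text above:

```python
def minmax(x, lo, hi):
    # Equation (7): min-max normalization
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def minmax_inv(x, lo, hi):
    # Equation (22): inverted min-max normalization
    return (hi - x) / (hi - lo) if hi > lo else 0.0

def reward(outcome, dst, L, e_res, e_init, rfrv_val, h, h_max,
           bounds, mu=(0.4, 0.35, 0.25), eta=0.2, r_max=10.0):
    if outcome == "success":        # case C1: packet delivered to the GW
        return r_max
    if outcome == "failure":        # case C2: transmission disruption
        return -r_max
    # Case C3: components r^1..r^4 (Equations (18)-(21)) combined per Equation (23).
    r1 = minmax(dst / L, *bounds["dst"])
    r2 = minmax_inv(e_res / e_init, *bounds["e"])
    r3 = minmax(rfrv_val, *bounds["rfrv"])
    r4 = h / h_max
    return mu[0] * r1 + mu[1] * r2 + mu[2] * r3 - eta * r4

bounds = {"dst": (0.0, 1.0), "rfrv": (0.0, 1.0), "e": (0.0, 1.0)}
r = reward("hop", dst=300, L=1000, e_res=0.8, e_init=1.0,
           rfrv_val=0.2, h=2, h_max=10, bounds=bounds)
```

In a deployment, the min/max bounds would be computed over the current candidate set at each step rather than fixed, so that the composite reward ranks candidates on a common scale.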

5.2.2. DQN Architecture

The DQN architecture implemented in FRDR is detailed as follows:
1. Input layer: A fully connected (FC) layer is used as the input layer. The input feature dimension is set to $3N + 1$, corresponding to the pre-masked state $s_t$.
2. Hidden layer: The hidden layer comprises two FC layers with 64 and 32 neurons, respectively. For each FC layer, the Leaky Rectified Linear Unit (Leaky ReLU) activation function with a negative slope coefficient of 0.01 is employed. Moreover, the backpropagation gradients from non-candidate ENs are set to zero.
3. Output layer: We define an FC layer with $N$ neurons as the output layer to generate raw Q-values $Q_{raw}$, where a linear activation function is utilized. Feature-based masking is then applied to compute the final values by Equation (24).

$$Q = \begin{cases} Q_{raw}, & EN_i \in \mathcal{C}_t^{cf} \\ -10^5, & \text{otherwise} \end{cases} \tag{24}$$
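The output masking of Equation (24) is straightforward to implement; the candidate indices below are illustrative:

```python
MASK_VALUE = -1e5   # large negative constant so argmax can never pick a non-candidate

def mask_q_values(q_raw, candidates):
    # Equation (24): keep raw Q-values for candidate ENs, mask out all others.
    return [q if i in candidates else MASK_VALUE for i, q in enumerate(q_raw)]

q = mask_q_values([0.3, 0.9, 0.1, 0.7], candidates={0, 3})
best_en = max(range(len(q)), key=q.__getitem__)   # index 3: best among candidates only
```

Note that EN 1 holds the highest raw Q-value, yet masking ensures the greedy choice falls on EN 3, the best EN actually present in the candidate set.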

5.2.3. Network Training and Routing Decision

At time step $t$, the agent gathers state information from candidate relays and constructs the state vector $s_t$ according to Equation (16). $s_t$ is fed into the DQN, which outputs Q-values corresponding to each EN. The agent then selects action $a_t$ using an annealing $\varepsilon$-greedy strategy, where $\varepsilon$ decays as follows [31]:

$$\varepsilon = \varepsilon_{end} + \left( \varepsilon_{start} - \varepsilon_{end} \right) \exp\left( -\tau N_{eps}^{now} \right), \tag{25}$$

where $\varepsilon_{start}$ and $\varepsilon_{end}$ represent the initial and terminal values of $\varepsilon$, respectively. $N_{eps}^{now}$ denotes the number of current training iterations, while $\tau$ is the attenuation rate. With probability $1 - \varepsilon$, the agent exploits by selecting the EN associated with the maximum Q-value. During exploration, a Weighted Probability Selection method that prioritizes candidates with lower RFRV is employed to enhance efficiency. The selection probability $p_\varepsilon^i$ for $EN_i$ during exploration is given by:

$$p_\varepsilon^i = \begin{cases} \dfrac{1 - RFRV_i}{\sum_{EN_j \in \mathcal{C}_t^{cf}} \left( 1 - RFRV_j \right)}, & EN_i \in \mathcal{C}_t^{cf} \\ 0, & \text{otherwise} \end{cases} \tag{26}$$
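The annealing $\varepsilon$-greedy selection of Equations (25)–(26) can be sketched as follows; the decay parameters are illustrative values:

```python
import math
import random

def epsilon(n_now, eps_start=0.9, eps_end=0.05, tau=1e-3):
    # Equation (25): exponential decay from eps_start toward eps_end.
    return eps_end + (eps_start - eps_end) * math.exp(-tau * n_now)

def select_action(q_masked, rfrv, candidates, n_now, rng=random):
    eps = epsilon(n_now)
    if rng.random() >= eps:
        # Exploit: candidate EN with the maximum (masked) Q-value.
        return max(candidates, key=lambda i: q_masked[i])
    # Explore: Weighted Probability Selection favoring low-RFRV candidates (Equation (26)).
    cands = list(candidates)
    weights = [1.0 - rfrv[i] for i in cands]
    return rng.choices(cands, weights=weights)[0]
```

Early in training $\varepsilon$ stays near its initial value and the agent mostly explores (with low-risk candidates favored); as the iteration count grows, $\varepsilon$ decays toward its floor and the agent predominantly exploits the learned Q-values.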
Upon determining $a_t$, the DEN forwards the data packet to the corresponding EN. At the next time step, the state transitions to $s_{t+1}$, and the environment provides the agent with the reward $r_t$ computed via Equation (17). The agent then constructs the transition tuple $(s_t, a_t, r_t, s_{t+1})$ and stores it in the replay memory $M$. When $M$ accumulates a sufficient number of samples, the agent randomly samples a minibatch from $M$ every $C_{exp}$ timesteps to train the DQN by minimizing the loss $L(\omega)$ defined in Equation (2). Additionally, the target network is periodically synchronized with the evaluation network every $C_t$ timesteps.
The overall framework of the DQN-based routing decision mechanism is illustrated in Algorithm 3. Notably, the learning process is conducted in a virtual environment hosted on computers, avoiding high computational demands on the EN hardware.
Algorithm 3. DQN-Based Routing Decision Mechanism
Input: ε_start, ε_end, τ, γ, α, experience replay update frequency C_exp, target update frequency C_t, minibatch size B, maximum training episodes N_eps^max, maximum iterations N_iter^max
Initialize: replay memory M, experience counter N_exp = 0, evaluation network with random weights ω, target network with weights ω⁻ = ω
Offline Learning
1:  for N_eps = 1 : N_eps^max do
2:    for N_iter = 1 : N_iter^max do
3:      Event-related data packet generated
4:      while GW is out of one-hop range do
5:        Determine the candidate relay set c_f and calculate the RFRV of each EN ∈ c_f by Equation (4)
6:        if c_f = ∅ then
7:          Go to line 16
8:        end if
9:        Formulate state vector s by Equation (16)
10:       Get the Q-values for s from the evaluation network
11:       Select action a via the annealing ε-greedy strategy
12:       Forward the data packet according to action a
13:       Perform lines 16–25
14:     end while
15:     Send the data packet to the GW
16:     Compute the reward r by Equation (17)
17:     State transitions to s′
18:     Store (s, a, r, s′) into M and set N_exp = N_exp + 1
19:     if |M| ≥ B and mod(N_exp, C_exp) = 0 then
20:       Sample a random minibatch of transitions (s, a, r, s′) from M
21:       Update the target Q-value of each sample: y(s′, r) = r + γ max_{a′∈A} Q(s′, a′; ω⁻) if condition C3 holds, and y(s′, r) = r otherwise
22:       Compute L̄(ω) by Equation (2)
23:       Update ω in the evaluation network using SGD
24:     end if
25:     if mod(N_exp, C_t) = 0 then ω⁻ = ω
26:   end for
27: end for
Output: Evaluation network with ω
Online Decision
Input: Trained evaluation network with ω
28: DEN determines the set of candidate next-hop ENs and calculates the RFRV of each candidate according to Equation (4)
29: DEN constructs a state vector
30: Input the current state into the evaluation network, and output the optimal action with the maximum Q-value
31: Forward the data packet according to the optimal action
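For concreteness, the replay memory and annealing ε-greedy action selection used by Algorithm 3 can be sketched as follows. This is a minimal illustration rather than the authors' implementation; in particular, the linear decay schedule and the `decay_steps` value are assumptions, since the paper only fixes ε_start = 0.5 and ε_end = 0.01 (Table 8).

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size experience replay buffer (capacity |M| = 5000 in Table 8)."""
    def __init__(self, capacity=5000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=64):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def annealed_epsilon(step, eps_start=0.5, eps_end=0.01, decay_steps=50_000):
    """Linearly anneal the exploration rate from eps_start down to eps_end."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_values, step, rng=random):
    """Annealing epsilon-greedy: explore with probability eps, otherwise
    take the action with the maximum Q-value (line 11 of Algorithm 3)."""
    if rng.random() < annealed_epsilon(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In the full protocol, the evaluation network would score the state vector of Equation (16) to produce `q_values`; here any sequence of floats works.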

6. Simulation Results and Analysis

In this section, we conduct extensive experiments to evaluate the performance of FRDR using MATLAB R2020b.

6.1. Simulation Models

It is assumed that each EN sends packets to the GW either directly or via intermediate hops using the LoRa protocol. The spreading factor (SF), bandwidth (BW), and coding rate R_c are fixed at SF = 7, BW = 125 kHz, and R_c = 1, respectively. Low data rate optimization is disabled by default, while the explicit header type (H = 0) and CRC are adopted. These parameter settings determine the simulation models adopted in the experiments, including those for packet transmission time, energy consumption, and path loss.

6.1.1. Packet Transmission Time Measurement Model

The time for an EN to transmit a packet, T_oA, is computed as follows [32]:

T_oA = (n_pr + 4.25 + 8 + P_1) · 2^SF / BW,
P_1 = max( ⌈(8·n_pl − 4·SF + 28 + 16·CRC − 20·H) / (4·SF)⌉ · (R_c + 4), 0 ),

where n_pr is the number of preamble symbols and n_pl is the payload size in bytes.
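As a sanity check, the time-on-air model above can be evaluated directly. The function below is an illustrative sketch (the function name is ours), with defaults taken from the fixed settings of this section.

```python
import math

def lora_time_on_air(n_pl, n_pr=8, sf=7, bw=125e3, rc=1, crc=1, h=0):
    """Time on air (s) of one LoRa packet for the settings above
    (SF = 7, BW = 125 kHz, R_c = 1, explicit header H = 0, CRC on,
    low data rate optimization disabled). n_pl is the payload in bytes."""
    p1 = max(math.ceil((8 * n_pl - 4 * sf + 28 + 16 * crc - 20 * h)
                       / (4 * sf)) * (rc + 4), 0)
    return (n_pr + 4.25 + 8 + p1) * (2 ** sf) / bw
```

With the packet sizes of Table 7, a 300 B data packet takes about 0.47 s on air, while a 1 B ADV/REQ packet takes about 0.026 s.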

6.1.2. Energy Consumption Measurement Model

Given that energy consumption in the dormant state is significantly lower than that in other transceiver states, the energy consumption of an EN, denoted as E c , can be simplified as the sum of transmit and receive energies [2]:
E_c = V_DD (I_tx T_tx + I_rx T_rx),
where V_DD is the nominal voltage, and I_tx and I_rx denote the transmitting and receiving currents, respectively. T_tx and T_rx indicate the transmitting and receiving durations, satisfying T_tx = T_rx = T_oA. According to [33], these parameters are configured as specified in Table 5, where P_T denotes the transmit power at transmission power level l_tx.
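A sketch of this energy model, using the currents of Table 5 (the function name is ours):

```python
def packet_energy(toa_s, i_tx_ma, i_rx_ma=14.2, v_dd=3.3):
    """E_c = V_DD * (I_tx * T_tx + I_rx * T_rx) with T_tx = T_rx = T_oA.
    Currents are in mA, toa_s in seconds; the result is in joules."""
    return v_dd * (i_tx_ma + i_rx_ma) * 1e-3 * toa_s
```

For instance, relaying a 300 B packet (T_oA ≈ 0.466 s) at the highest power level (I_tx = 38 mA) costs roughly 0.08 J per hop.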

6.1.3. Path Loss Measurement Model

Results presented in this study were computed at a carrier frequency of f = 868 MHz using the path loss model defined in [33]:

PL[dB] = 32.45 + 30(e_pl − 2) + 20 lg(f[MHz]) + 10·e_pl·lg(d[km]) + δ[dB],

where PL is the path loss at distance d, e_pl is the path loss exponent, and δ ∼ N(0, σ²) models random channel fluctuations resulting from shadowing. In this study, we use e_pl = 5 and σ = 3 dB.
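The model can be sketched as follows. Note that the constant offset 30(e_pl − 2) reflects our reading of the (typographically damaged) source equation, so treat it as an assumption; the function name is ours.

```python
import math
import random

def path_loss_db(d_km, f_mhz=868.0, e_pl=5.0, sigma_db=3.0, rng=None):
    """Log-distance path loss with log-normal shadowing. Pass rng=None to
    evaluate the deterministic (median) path loss without shadowing."""
    shadowing = rng.gauss(0.0, sigma_db) if rng is not None else 0.0
    return (32.45 + 30.0 * (e_pl - 2.0) + 20.0 * math.log10(f_mhz)
            + 10.0 * e_pl * math.log10(d_km) + shadowing)
```

Under these parameters, the median path loss reaches the receiver sensitivity within a few hundred meters, which is why multi-hop relaying is needed across the 1 km × 1 km area.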

6.1.4. RSSI and SNR Measurement Models

Based on Equation (30), the RSSI is computed as a function of P_T:

RSSI[dBm] = P_T[dBm] + G_AT[dB] + G_AR[dB] − PL[dB],

where G_AT and G_AR denote the transmitting and receiving antenna gains, respectively. In this study, we set G_AT = G_AR = 3 dBi.
During data transmission, the noise power P_n of each EN is considered in the calculation of the SNR [29]:

SNR[dB] = P_r[dBm] − P_n[dBm],
P_n[dBm] = 10 lg((T_r + T_b) · BW · κ) + 30,

where T_b is the background temperature, typically set to 290 K, κ = 1.379 × 10⁻²³ J·K⁻¹ is the Boltzmann constant, and T_r is the receiver temperature, given by the following expression:

T_r = (10^(NF/10) − 1) · T_b,

where NF is the noise figure of the receiver. According to [29], the parameters are set as NF = 6 dB, RSSI_th = −124.5 dBm, and SNR_th = −7.5 dB.
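Putting the RSSI and SNR models together, a receiver can be modeled as decoding a packet only when both thresholds are met. The sketch below is illustrative (the function name is ours); with these parameters the noise floor works out to about −117 dBm, so SNR_th = −7.5 dB lines up with the RSSI threshold of −124.5 dBm.

```python
import math

def link_ok(p_t_dbm, pl_db, g_at_db=3.0, g_ar_db=3.0,
            rssi_th=-124.5, snr_th=-7.5,
            nf_db=6.0, bw_hz=125e3, t_b=290.0, kappa=1.379e-23):
    """True if a packet sent at p_t_dbm over path loss pl_db satisfies
    both the RSSI and the SNR reception thresholds."""
    rssi = p_t_dbm + g_at_db + g_ar_db - pl_db
    t_r = (10 ** (nf_db / 10.0) - 1.0) * t_b                      # receiver temperature
    p_n = 10.0 * math.log10((t_r + t_b) * bw_hz * kappa) + 30.0   # noise power, dBm
    return rssi >= rssi_th and (rssi - p_n) >= snr_th
```

For example, at the maximum transmit power P_T = 14 dBm a 131 dB path loss is decodable, while a 155 dB path loss is not.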

6.2. Simulation Setup

In this section, FRDR is compared with Minimum Hop Routing (MHR) and DQIR [15]. MHR is a distributed routing algorithm that selects relays based on the minimum hop count to the GW. In DQIR, next-hop selection from candidate routers is performed by a DQN-based routing protocol. Notably, to ensure a fairer comparison, the learning rate and replay memory size of DQIR are set to 0.01 and 5000, respectively, after extensive hyperparameter tuning.
To comprehensively evaluate the performance of FRDR, we introduce three self-contrasting algorithms detailed in Table 6.
Additionally, to ensure fair benchmarking, all compared algorithms adopt the same pre-selection rules as FRDR and utilize the SPIN-based interaction process to determine candidate routers.
The dataset generation framework introduced in [34] is applied to deploy ENs in a stochastic and nonuniform manner across a 1 km × 1 km area, with the GW located at the center. The initial energy of all ENs is fixed at 0.5 mAh (equivalent to about 5.94 J at V_DD = 3.3 V). In each iteration, a source EN is randomly selected, and a data packet is transmitted from it to the GW under the different multi-hop routing protocols. Additional network parameters are detailed in Table 7.
During the offline training of DQN, we execute 100 training episodes, and each episode comprises 1000 complete packet transmission simulations. For each simulation, a source EN is randomly chosen from the monitoring area, which then transmits a data packet toward the GW via single-hop or multi-hop routing. Upon transmission completion (either successful delivery to the GW or failure), the simulation proceeds immediately to the next packet transmission. Other specific DQN parameters are detailed in Table 8.

6.3. Performance Analysis

The performance metrics employed in our simulation are the packet delivery rate (PDR), mean transmission delay (MTD), mean number of transmission hops (MTH), mean energy consumption for delivering a data packet (MECP), and network lifetime. Network lifetime is commonly measured in terms of the first node dead (FND), half of the nodes dead (HND), and the last node dead (LND). However, only HND is adopted in our simulation, because the network becomes inoperable long before LND, while the death of the first EN has little impact on network performance.
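As an illustration, the PDR and HND metrics might be computed from simulation traces as follows (the helper names are ours, not from the paper):

```python
def packet_delivery_rate(n_delivered, n_generated):
    """PDR: fraction of generated packets that reach the GW."""
    return n_delivered / n_generated if n_generated else 0.0

def half_node_dead(energy_history, n_nodes):
    """HND: index of the first round in which at least half of the ENs
    have exhausted their energy; None if that never happens.
    energy_history[t] is the list of residual EN energies at round t."""
    for t, energies in enumerate(energy_history):
        if sum(e <= 0 for e in energies) >= n_nodes / 2:
            return t
    return None
```

MTD, MTH, and MECP are plain averages over delivered packets and are computed analogously.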
We randomly select an experimental scenario with 300 ENs from our simulations and use the routing process of the first data packet after network deployment to visually demonstrate the FRDR protocol in Figure 5. To clarify the relay selection mechanism of FRDR, Figure 5 further illustrates the spatial distribution of candidate ENs for Router1. Specifically, the source EN first adjusts its transmission power level according to PRM, thereby optimizing the EN activation range based on the average SNR of signals received from neighbors. In this case, the source EN broadcasts an ADV packet at Level 6. Subsequently, the ENs that receive the ADV packet and satisfy the pre-selection rules form the candidate set for Router1. Table 9 quantitatively summarizes their critical attributes, including ID, RFRV, distance to the GW, and residual energy. Then, using the DQN-based routing decision mechanism (Algorithm 3), the source EN determines Router1 from the candidates. The result indicates that EN167, the candidate exhibiting the lowest RFRV and the shortest distance to the GW, is selected as Router1. This outcome is consistent with the relay selection objective of FRDR defined in Equation (14).
Under the same configuration as in Figure 5, Figure 6 further compares the routing processes of the different multi-hop routing algorithms. As shown in Figure 6, both FRDR and PRRS exhibit fewer candidate ENs for Router1. This reduction is primarily attributed to the PRM, which adjusts EN activation ranges based on the average SNR of signals received from neighbors. By utilizing PRM, the source EN in FRDR and PRRS reduces its transmission power to Level 6, thereby confining the ADV broadcast range. In contrast, non-PRM protocols broadcast ADV packets at maximum power (Level 7), which can easily cause redundant EN activations (e.g., EN263 and EN244). Following candidate screening, each algorithm applies distinct criteria for final relay selection. Notably, PFRS and PRRS exhibit higher hop counts due to random relay selection, whereas FRDR, MHR, DQIR, and PFRD achieve lower hop counts through effective selection rules. Specifically, MHR selects relays via pre-stored routing tables guided by minimum hop counts, while DQIR prioritizes relays that minimize the distance to the GW and balance the residual energy distribution. FRDR and PFRD incorporate multi-dimensional neighborhood state characteristics of candidate ENs into routing decisions, effectively avoiding relays that introduce high routing failure risk (e.g., Router1 in MHR and Router3 in DQIR).
First, the performance analysis of the proposed PRM and Algorithm 3 is provided.
Figure 7a demonstrates that PRM effectively reduces energy consumption. By employing PRM, DENs in FRDR and PRRS adjust their transmission power levels according to demand, thereby reducing redundant EN activations. Consequently, the lifetime of individual ENs can be extended, which in turn enhances the overall network lifetime and PDR. Figure 7b further illustrates that the PDR curves of PRRS and FRDR decline more slowly than those of PFRS and PFRD. Specifically, when the PDR of PFRD and PFRS drops to 0.80, FRDR and PRRS maintain values of 0.88 and 0.85, representing improvements of 10.00% and 6.25%, respectively. Additionally, as depicted in Figure 7c, both FRDR and PRRS achieve a higher HND than their respective counterparts, which confirms the effectiveness of PRM in extending network lifetime.
Table 10 presents a comparison of MTH, MTD, and MECP for delivering the first 1000 packets, during which the PDR of each algorithm remains at a relatively high level. It reveals that PRM introduces a slight increase in transmission delay. Compared to PFRD and PFRS, the MTH of FRDR and PRRS increased by 0.20 and 0.30, respectively. This increase is attributed to the fact that the execution of PRM prevents the single-hop range from consistently reaching its maximum, potentially increasing the number of hops required for data delivery. Nevertheless, through the effective combination with pre-selection rules, PRM further amplifies the benefits of reducing redundant EN activations, thereby significantly decreasing the delay and energy consumption associated with REQ packet reception. As a result, the impact of these additional hops on overall delay is minimal. Specifically, the 0.14 s increase in MTD for FRDR constitutes only 3.26% of its total MTD, while for PRRS, it accounts for 3.13%. Overall, PRM achieves an effective balance between transmission efficiency and other critical performance metrics, including energy consumption, PDR, and network lifetime.
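The stated delay overheads of PRM can be checked directly against the Table 10 values:

```python
# MTD values from Table 10 (seconds), first 1000 delivered packets.
mtd = {"PFRS": 5.58, "PRRS": 5.76, "PFRD": 4.16, "FRDR": 4.30}

# Extra delay introduced by PRM, as a fraction of the PRM variant's MTD.
frdr_overhead = (mtd["FRDR"] - mtd["PFRD"]) / mtd["FRDR"]  # ~3.26%
prrs_overhead = (mtd["PRRS"] - mtd["PFRS"]) / mtd["PRRS"]  # ~3.13%
```

Both fractions match the percentages quoted in the text.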
As for Algorithm 3, Figure 7b clearly illustrates its superiority in PDR. Specifically, when the PDR of PRRS and PFRS decreases to 0.80, FRDR and PFRD sustain values of 0.96 and 0.94, achieving improvements of 20.00% and 17.50%, respectively. These improvements arise from the multi-factor routing strategy of Algorithm 3, in which RFRV, distance to the GW, transmission hops, and residual energy are considered in tandem. Therefore, Algorithm 3 effectively reduces transmission disruption and enables faster delivery to the GW while balancing energy consumption. The results presented in Figure 7c and Table 10 further confirm the advantage of Algorithm 3. In terms of HND, FRDR improved by 16.81%, while PFRD achieved a growth rate of 9.46%. Additionally, as reported in Table 10, FRDR achieves reductions of 21.48%, 25.35%, and 24.64% in MTH, MTD, and MECP, respectively, compared to PRRS, while PFRD demonstrates reductions of 20.75%, 25.45%, and 26.03% relative to PFRS.
Second, to fully illustrate the superiority of FRDR, a comparison among FRDR, MHR, and DQIR is presented.
The residual energy of the network and the PDR under different EN densities are compared in Figure 8 and Figure 9, respectively. It is evident that FRDR significantly outperforms MHR and DQIR, maintaining a higher PDR while reducing energy consumption. Moreover, Table 11 provides a comparison of network lifetime, while a more detailed comparison of MTH, MTD, and MECP across different EN densities is presented in Table 12. Together, they demonstrate that FRDR achieves the longest network lifetime while maintaining a comparable transmission delay. These improvements are attributed to the integration of PRM and the DQN-based multi-factor routing strategy, which dynamically adjusts the activation range to reduce redundant activations and optimizes routing decisions based on RFRV, residual energy, transmission hops, and the distance to the GW.
MHR focuses solely on minimizing transmission hops, which contributes to its superiority in MTH, as shown in Table 12. However, to achieve this goal, the maximum transmission power is fixed in MHR, which leads to higher redundant activation than FRDR, particularly as EN density increases. This increased redundancy diminishes the delay and energy efficiency advantages gained by minimizing transmission hops, as higher reception delay and energy consumption occur during REQ reception. Table 12 further reveals that MHR results in a higher MECP than FRDR, while achieving a marginal reduction in MTD. Moreover, the exclusive consideration of hop count in MHR inevitably leads to hotspot issues due to the overutilization of partial ENs, which in turn leads to a shorter network lifetime and lower PDR. Conversely, by integrating RFRV and residual energy into routing decisions, FRDR effectively avoids routers that will introduce high routing failure risk and realizes a more balanced energy distribution. As a result, FRDR achieves a higher network lifetime and PDR. Specifically, when the PDR of MHR drops to 0.80, FRDR maintains a higher PDR, achieving improvements of 14.71%, 15.69%, and 18.90% at EN densities of 300, 350, and 400, respectively. Consequently, compared to MHR, FRDR effectively improves the PDR and network lifetime while maintaining a comparable transmission delay.
As for DQIR, multiple factors, including residual energy and distance to the GW, are considered when selecting the next-hop router from candidate ENs to minimize delay and balance energy distribution. Table 12 indicates that DQIR achieves lower MTH at EN densities of 350 and 400 compared to FRDR. However, DQIR requires all ENs that receive broadcast information to transmit a message to a designated agent for routing decisions. Although this method offloads the reception energy consumption from DEN to an additional agent without energy constraints, leading to more balanced energy consumption, the excessive overhead from replies significantly increases energy consumption and delay. In contrast, through the combination of PRM and pre-selection rules, FRDR effectively reduces redundant transmissions by dynamically adjusting activation ranges and requiring only ENs that meet the pre-selection rules to respond. As a result, compared to DQIR, FRDR achieves lower MTD and MECP, as well as a higher network lifetime. Moreover, by considering RFRV, FRDR effectively avoids selecting candidate routers that will introduce high routing failure risk, which further enhances the performance of PDR. Specifically, when the PDR of DQIR drops to 0.80, FRDR maintains a higher PDR, achieving improvements of 18.90%, 20.68%, and 21.96% at EN densities of 300, 350, and 400, respectively.
To summarize, the performance superiority of FRDR mainly comes from the PRM and DQN-based routing decision mechanism. PRM dynamically adjusts activation ranges, which works with pre-selection rules, further reducing unnecessary reception overhead. Meanwhile, the RFRV, in conjunction with other factors such as residual energy, distance to the GW, and transmission hops, is integrated into the DQN-based routing decision mechanism, effectively reducing transmission disruption and enabling faster delivery to the GW while balancing energy consumption. Consequently, FRDR significantly enhances PDR and network lifetime while maintaining a comparable transmission delay.

7. Conclusions

In this paper, we proposed a novel multi-hop routing protocol for LPWANs, named FRDR, which aims to reduce transmission disruption probability. FRDR comprehensively considered RFRV, distance to the GW, residual energy, and transmission hops as routing criteria, thereby deriving a low-latency, long-lifetime, and high-success-rate routing decision policy through a DQN-based framework. Simulation results confirmed that, compared with MHR and DQIR, our FRDR significantly reduces transmission disruption probability and extends network lifetime while maintaining a comparable delay. Specifically, when the PDR of MHR and DQIR drops to 0.80, FRDR maintains a higher PDR, achieving a minimum improvement of 14.71% and 18.90%, respectively.
Our current research focuses on multi-hop routing optimization under standardized scenarios with generalized assumptions. However, real-world deployments introduce non-ideal factors, such as edge-positioned GWs, mobile GWs, and asymmetric link conditions, which can significantly impact protocol performance. Consequently, future work will focus on addressing the challenges arising from these non-ideal factors to enhance the robustness and scalability of the routing protocol in practical deployments. Additionally, field trials will be conducted to evaluate the practical feasibility and performance of the proposed protocol.

Author Contributions

Conceptualization, S.T.; methodology, S.T.; validation, S.T.; formal analysis, S.T.; data curation, S.T.; writing—original draft preparation, S.T.; writing—review and editing, S.T., H.T., J.W. and B.L.; supervision, J.W. and B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to express their sincere gratitude to Xue Zhao for her contributions to data curation and valuable discussions throughout the development of this study. We also sincerely thank the reviewers for their critical comments and suggestions for improving this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADR: Adaptive Data Rate algorithm
ADV: Advertisement
AHP: Analytical Hierarchy Process
BW: Bandwidth
CI: Consistency Index
CR: Consistency Ratio
DEN: Data-holding End-device Node
DNN: Deep Neural Network
DQN: Deep Q-Network
DQIR: DQN-based Intelligent Routing protocol
EN: End-device Node
FND: First Node Dead
FRDR: Failure Risk-aware Deep Q-network-based multi-hop Routing protocol
GPS: Global Positioning System
GW: Gateway
HND: Half Node Dead
LND: Last Node Dead
LPWAN: Low-Power Wide-Area Network
MDP: Markov Decision Process
MECP: Mean Energy Consumption for Delivering a Data Packet
MHR: Minimum Hop Routing protocol
MTD: Mean Transmission Delay
MTH: Mean Number of Transmission Hops
PDR: Packet Delivery Rate
PRM: Power Regulation Mechanism
REQ: Request
RFRV: Routing Failure Risk Value
RI: Random Consistency Index
RL: Reinforcement Learning
RSSI: Received Signal Strength Indicator
SF: Spreading Factor
SGD: Stochastic Gradient Descent
SNR: Signal-to-Noise Ratio
SPIN: Sensor Protocol for Information via Negotiation

References

1. Misbahuddin, M.; Iqbal, M.S.; Budiman, D.F.; Wiriasto, G.W.; Akbar, L.A.S.I. EAM-LoRaNet: Energy aware multi-hop LoRa network for Internet of Things. Kinetik 2022, 7, 81–90.
2. Barrachina-Munoz, S.; Bellalta, B.; Adame, T.; Bel, A. Multi-hop communication in the uplink for LPWANs. Comput. Netw. 2017, 123, 153–168.
3. Guo, Z.; Chen, H. A reinforcement learning-based sleep scheduling algorithm for cooperative computing in event-driven wireless sensor networks. Ad Hoc Netw. 2022, 130, 102837.
4. Fang, W.; Zhu, C.; Zhang, W. Toward secure and lightweight data transmission for cloud–edge–terminal collaboration in artificial intelligence of things. IEEE Internet Things J. 2024, 11, 105–113.
5. Sharma, N.; Thota, V.S.P.; Tankala, Y.; Tripathi, S.; Pandey, O.J. OptRISQL: Toward performance improvement of time-varying IoT networks using Q-learning. IEEE Trans. Netw. Serv. 2024, 21, 3008–3020.
6. Wong, A.W.-L.; Goh, S.L.; Hasan, M.K.; Fattah, S. Multi-hop and mesh for LoRa networks: Recent advancements, issues, and recommended applications. ACM Comput. Surv. 2024, 56, 136.
7. Fang, W.; Zhu, C.; Guizani, M.; Rodrigues, J.J.P.C.; Zhang, W. HC-TUS: Human cognition-based trust update scheme for AI-enabled VANET. IEEE Netw. 2023, 37, 247–252.
8. Zolfaghari, D.; Taheri, H.; Rezaie, A.H.; Rezaei, M. A robust and reliable routing based on multi-hop information in industrial wireless sensor networks. Int. J. Ad Hoc Ubiquitous Comput. 2015, 19, 29–37.
9. Li, J.; Wang, M.; Zhu, P.; Wang, D.; You, X. Highly reliable fuzzy-logic-assisted AODV routing algorithm for mobile ad hoc networks. Sensors 2021, 21, 5965.
10. Xu, J.; Zhang, Y.; Jiang, J.; Kan, J. A multi-hop routing protocol based on link state prediction for intra-body wireless nanosensor networks. Ad Hoc Netw. 2021, 116, 102470.
11. Fang, W.; Zhang, W.; Yang, W.; Li, Z.; Gao, W.; Yang, Y. Trust management-based and energy efficient hierarchical routing protocol in wireless sensor networks. Digit. Commun. Netw. 2021, 7, 470–478.
12. Fang, W.; Zhu, C.; Yu, F.R.; Wang, K.; Zhang, W. Towards energy-efficient and secure data transmission in AI-enabled software defined industrial networks. IEEE Trans. Ind. Inf. 2022, 18, 4265–4274.
13. Mukhutdinov, D.; Filchenkov, A.; Shalyto, A.; Vyatkin, V. Multi-agent deep learning for simultaneous optimization for time and energy in distributed routing system. Future Gener. Comput. Syst. 2019, 94, 587–600.
14. Yang, X.; Yan, J.; Wang, D.; Xu, Y.; Hua, G. WOAD3QN-RP: An intelligent routing protocol in wireless sensor networks—A swarm intelligence and deep reinforcement learning based approach. Expert Syst. Appl. 2024, 246, 123089.
15. Geng, X.; Zhang, B. Deep Q-network-based intelligent routing protocol for underwater acoustic sensor network. IEEE Sens. J. 2023, 23, 3936–3943.
16. Pandey, O.J.; Yuvaraj, T.; Paul, J.K.; Nguyen, H.H.; Gundepudi, K.; Shukla, M.K. Improving energy efficiency and QoS of LPWANs for IoT using Q-learning based data routing. IEEE Trans. Cognit. Commun. Netw. 2022, 8, 365–379.
17. Chilamkurthy, N.S.; Karna, N.; Vuddagiri, V.; Tiwari, S.K.; Ghosh, A.; Cenkeramaddi, L.R.; Pandey, O.J. Energy-efficient and QoS-aware data transfer in Q-learning-based small-world LPWANs. IEEE Internet Things J. 2023, 10, 22636–22649.
18. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
19. Chang, Y.; Tang, H.; Li, B.; Yuan, X. Distributed joint optimization routing algorithm based on the analytic hierarchy process for wireless sensor networks. IEEE Commun. Lett. 2017, 21, 2718–2721.
20. Qu, Z.; Xu, H.; Zhao, X.; Tang, H.; Wang, J.; Li, B. A fault-tolerant sensor scheduling approach for target tracking in wireless sensor networks. Alex. Eng. J. 2022, 61, 13001–13010.
21. Yao, Y.D.; Li, H.C.; Zeng, Z.B.; Wang, C.; Zhang, Y.Q. Clustering routing protocol based on tuna swarm optimization and fuzzy control theory in wireless sensor networks. IEEE Sens. J. 2024, 24, 17102–17115.
22. Liu, X.; Cao, Q.; Jin, B.; Zhou, P. CNCMSA-ERCP: An innovative energy-efficient clustering routing protocol for improving the performance of industrial IoT. IEEE Internet Things J. 2025, 12, 11827–11840.
23. Can, G.F.; Toktas, P.; Pakdil, F. Six sigma project prioritization and selection using AHP-CODAS integration: A case study in healthcare industry. IEEE Trans. Eng. Manag. 2023, 70, 3587–3600.
24. Fang, W.; Cui, N.; Chen, W.; Zhang, W.; Chen, Y. A trust-based security system for data collection in smart city. IEEE Trans. Ind. Inf. 2021, 17, 4131–4140.
25. Saaty, T.L. Decision making with the analytic hierarchy process. Int. J. Serv. Sci. 2008, 1, 83–98.
26. Saaty, R.W. The analytic hierarchy process—What it is and how it is used. Math. Model. 1987, 9, 161–176.
27. Saaty, T.L. Theory and Applications of the Analytic Network Process: Decision Making with Benefits, Opportunities, Costs, and Risks, 3rd ed.; RWS Publications: Pittsburgh, PA, USA, 2005.
28. Kulik, J.; Heinzelman, W.; Balakrishnan, H. Negotiation-based protocols for disseminating information in wireless sensor networks. Wirel. Netw. 2002, 8, 169–185.
29. Jiang, C.; Yang, Y.; Chen, X.; Liao, J.; Song, W.; Zhang, X. A new-dynamic adaptive data rate algorithm of LoRaWAN in harsh environment. IEEE Internet Things J. 2022, 9, 8989–9001.
30. Benkahla, N.; Tounsi, H.; Ye-Qiong, S.; Frikha, M. Enhanced ADR for LoRaWAN networks with mobility. In Proceedings of the 15th International Wireless Communications & Mobile Computing Conference, Tangier, Morocco, 24–28 June 2019.
31. Hassen, H.; Meherzi, S.; Jemaa, Z.B. Improved exploration strategy for Q-learning based multipath routing in SDN networks. J. Netw. Syst. Manag. 2024, 32, 25.
32. Milarokostas, C.; Tsolkas, D.; Passas, N.; Merakos, L. A comprehensive study on LPWANs with a focus on the potential of LoRa/LoRaWAN systems. IEEE Commun. Surv. Tutor. 2023, 25, 825–867.
33. Marini, R.; Mikhaylov, K.; Pasolini, G.; Buratti, C. LoRaWANSim: A flexible simulator for LoRaWAN networks. Sensors 2021, 21, 695.
34. Sah, D.K.; Cengiz, K.; Donta, P.K.; Inukollu, V.N.; Amgoth, T. EDGF: Empirical dataset generation framework for wireless sensor networks. Comput. Commun. 2021, 180, 48–56.
Figure 1. Routing diagram with and without RFRV.
Figure 2. Overall flowchart of FRDR.
Figure 3. Diagram of SPIN-based message interaction flow.
Figure 4. Framework of DQN-based routing decision mechanism.
Figure 5. Simulation experiment scenario diagram of FRDR.
Figure 6. Simulation experiment scenario diagram of multi-hop routing: (a) FRDR, (b) MHR, (c) DQIR, (d) PFRS, (e) PFRD, and (f) PRRS.
Figure 7. Comparisons between FRDR, PRRS, PFRD, and PFRS in terms of the residual energy of the network, PDR, and HND: (a) residual energy of the network, (b) PDR, and (c) HND.
Figure 8. Residual energy of the network under different densities: (a) 300, (b) 350, and (c) 400.
Figure 9. PDR under different densities: (a) 300, (b) 350, and (c) 400.
Table 1. 1–9 scale.

| Scale | Numerical Rating | Reciprocal |
|---|---|---|
| Equal importance | 1 | 1 |
| Slight importance | 2 | 1/2 |
| Moderate importance | 3 | 1/3 |
| Moderate to strong importance | 4 | 1/4 |
| Strong importance | 5 | 1/5 |
| Strong to very strong importance | 6 | 1/6 |
| Very strong importance | 7 | 1/7 |
| Very strong to extreme importance | 8 | 1/8 |
| Extreme importance | 9 | 1/9 |
Table 2. Random consistency index.

| k | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| RI | 0 | 0 | 0.58 | 0.90 | 1.12 | 1.24 | 1.32 | 1.41 | 1.45 | 1.49 |
Table 3. Pairwise comparison matrix for three criteria in RFRV.

|  | E_n | N_n | LQ |
|---|---|---|---|
| E_n | 1 | 2 | 3 |
| N_n | 1/2 | 1 | 2 |
| LQ | 1/3 | 1/2 | 1 |
Table 4. Pairwise comparison matrix for three criteria in r̂_t.

|  | r̃_t^1 | r̃_t^2 | r̃_t^3 |
|---|---|---|---|
| r̃_t^1 | 1 | 2 | 4 |
| r̃_t^2 | 1/2 | 1 | 3 |
| r̃_t^3 | 1/4 | 1/3 | 1 |
Table 5. Energy consumption parameters.

| l_tx | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
|---|---|---|---|---|---|---|---|
| P_T [dBm] | 14 | 12 | 10 | 8 | 6 | 4 | 2 |
| I_tx [mA] | 38 | 35.1 | 32.4 | 30 | 27.5 | 24.7 | 22.3 |
| I_rx [mA] | 14.2 (all levels) | | | | | | |
| V_DD [V] | 3.3 (all levels) | | | | | | |
Table 6. Self-contrasting algorithms.

|  | Transmit Power Level | Routing Decision Mechanism |
|---|---|---|
| PFRS | Transmit with l_tx^max | Random selection |
| PFRD | Transmit with l_tx^max | Algorithm 3 |
| PRRS | PRM | Random selection |
| FRDR | PRM | Algorithm 3 |
Table 7. Network parameters.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| N | 300/350/400 | e_th | 100 mJ |
| R_change | 0.5 | n_pr | 8 symbols |
| n_pl of ADV/REQ | 1 B | n_pl of data packet | 300 B |
Table 8. Parameters of DQN.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| N_eps^max | 100 | N_iter^max | 1000 |
| α | 0.009 | γ | 0.95 |
| \|M\| | 5000 | B | 64 |
| C_exp | 10 | C_t | 400 |
| R_max | 1 | τ | 0.2 |
| η | 2 | H_max | 10 |
| ε_start | 0.5 | ε_end | 0.01 |
Table 9. Critical attributes of candidate ENs for Router1.

| ID | RFRV | Distance (m) | Residual Energy (J) |
|---|---|---|---|
| 1 | 0.1648 | 499.0684 | 5.9358 |
| 78 | 0.3042 | 461.9026 | 5.9358 |
| 88 | 0.2711 | 559.5603 | 5.9358 |
| 93 | 0.4604 | 534.8417 | 5.9358 |
| 167 | 0.1247 | 436.6732 | 5.9358 |
| 169 | 0.1389 | 444.5297 | 5.9358 |
| 245 | 0.2584 | 529.7191 | 5.9358 |
| 252 | 0.3133 | 542.9591 | 5.9358 |
| 276 | 0.2191 | 500.7071 | 5.9358 |
| 290 | 0.2280 | 550.7258 | 5.9358 |
Table 10. MTH, MTD, and MECP for delivering the first 1000 packets.

|  | PFRS | PRRS | PFRD | FRDR |
|---|---|---|---|---|
| MTH | 4.82 | 5.12 | 3.82 | 4.02 |
| MTD (s) | 5.58 | 5.76 | 4.16 | 4.30 |
| MECP (J) | 0.73 | 0.69 | 0.54 | 0.52 |
Table 11. HND under different densities.

| N | FRDR | MHR | DQIR |
|---|---|---|---|
| 300 | 2516 | 2249 | 2205 |
| 350 | 2651 | 2298 | 2445 |
| 400 | 2736 | 2420 | 2452 |
Table 12. MTH, MTD, and MECP for delivering the first 1000 packets under different densities.

| N | Metric | FRDR | MHR | DQIR |
|---|---|---|---|---|
| 300 | MTH | 4.02 | 3.76 | 4.06 |
|  | MTD (s) | 4.30 | 4.14 | 5.94 |
|  | MECP (J) | 0.52 | 0.54 | 0.70 |
| 350 | MTH | 3.88 | 3.57 | 3.60 |
|  | MTD (s) | 4.30 | 4.07 | 5.47 |
|  | MECP (J) | 0.54 | 0.56 | 0.67 |
| 400 | MTH | 3.65 | 3.39 | 3.52 |
|  | MTD (s) | 4.21 | 4.04 | 5.72 |
|  | MECP (J) | 0.55 | 0.58 | 0.72 |