Future Internet
  • Article
  • Open Access

2 December 2025

Optimizing LoRaWAN Performance Through Learning Automata-Based Channel Selection

1
School of Information and Communication Technology (SoICT), College of Science and Technology, University of Rwanda, Kigali P.O. Box 3900, Rwanda
2
African Centre of Excellence in Internet of Things, College of Science and Technology, University of Rwanda, Kigali P.O. Box 3900, Rwanda
*
Authors to whom correspondence should be addressed.
Future Internet 2025, 17(12), 555; https://doi.org/10.3390/fi17120555

Abstract

The rising demand for long-range, low-power wireless communication in applications such as monitoring, smart metering, and wide-area sensor networks has emphasized the critical need for efficient spectrum utilization in LoRaWAN (Long Range Wide Area Network). In response to this challenge, this paper proposes a novel channel selection framework based on Hierarchical Discrete Pursuit Learning Automata (HDPA), aimed at enhancing the adaptability and reliability of LoRaWAN operations in dynamic and interference-prone environments. HDPA leverages a tree-structured reinforcement learning model to monitor and respond to transmission success in real time, dynamically updating channel probabilities based on environmental feedback. Simulations conducted in MATLAB R2023b demonstrate that HDPA significantly outperforms conventional algorithms such as Hierarchical Continuous Pursuit Automata (HCPA) in terms of convergence speed, selection accuracy, and throughput performance. Specifically, HDPA achieved 98.78% accuracy with a mean convergence of 6279 iterations, compared to HCPA’s 93.89% accuracy and 6778 iterations in an eight-channel setup. Unlike the Tug-of-War-based Multi-Armed Bandit strategy, which emphasizes fairness in real-world heterogeneous networks, HDPA offers a computationally lightweight and highly adaptive solution tailored to LoRaWAN’s stochastic channel dynamics. These results position HDPA as a promising framework for improving reliability and spectrum utilization in future IoT deployments.

1. Introduction

The proliferation of the Internet of Things (IoT) and the associated demand for ubiquitous, low-power, and long-range wireless communication have propelled the development and adoption of Long-Range Wide Area Network (LoRaWAN) systems [1]. LoRaWAN offers a compelling framework for IoT applications due to its ability to provide wide-area coverage with minimal energy consumption [2]. However, as deployments expand, especially in urban and industrial environments with increasing device densities, maintaining high network performance becomes a significant challenge [3]. This is particularly problematic in a LoRaWAN environment where resource constraints prevent the deployment of computationally intensive solutions. As node density increases, the probability of channel congestion and transmission failure grows, which calls for lightweight self-adapting methods for intelligent spectrum access. The core difficulty lies in effective radio channel selection amid dynamic, congested, and interference-prone environments.
LoRaWAN’s pseudo-random channel hopping spreads traffic but does not adapt to time-varying interference, so collisions and retransmissions increase with node density. Prior machine-learning approaches either assume feature-rich sensing and compute budgets, such as Q-learning with large state-action spaces, or require hardware validation in heterogeneous testbeds, which can be impractical for constrained nodes [4]. This motivates a computationally light, feedback-driven method that learns good channels with minimal state and no handcrafted features, precisely the niche addressed by HDPA’s pursuit-based hierarchical updates.
To address this, our research introduces a Learning Automata (LA)-based solution, specifically the Hierarchical Discrete Pursuit Learning Automata (HDPA), as an optimal channel selection mechanism for LoRaWAN. Learning Automata, a class of reinforcement learning algorithms, operate by interacting with a stochastic environment to identify the best actions through trial-and-error processes based on rewards and penalties [5]. The HDPA model extends traditional LA by employing a hierarchical structure that allows for faster and more accurate convergence to the optimal channel, especially in multi-step, dynamic environments like LoRaWAN. Our approach is “intelligent” in that it adapts decisions from ongoing success/failure signals rather than following fixed rules. It is “sequential” because the hierarchy decomposes the action space and refines choices level by level, balancing exploration and exploitation more efficiently than random selection; this yields faster convergence to a near-deterministic policy on the best channel under a fixed threshold [6]. Beyond adapting a general pursuit-learning idea, the proposed HDPA introduces a level-wise tree structure that narrows the search space and propagates rewards upward, which is not present in HCPA or standard pursuit schemes. This hierarchy enables fast exploitation of promising branches while retaining a small per-node state, a property well aligned with the binary-feedback nature of LoRaWAN and its need for lightweight control. Compared with HCPA’s flat update of all actions, HDPA learns through successive depth-limited refinement, which reduces unnecessary exploration and lowers iteration counts when the number of channels grows.
The objectives of this research are multifaceted: to critically review existing channel selection techniques in LoRaWAN and their limitations; to design and implement the HDPA model for LoRaWAN environments; and to evaluate the model’s performance against existing solutions, such as Hierarchical Continuous Pursuit Automata (HCPA), through rigorous simulations [7]. We hypothesize that HDPA will demonstrate superior performance in terms of throughput, convergence speed, and decision-making accuracy.
The proposed methodology combines theoretical modeling, algorithmic design, and simulation-based validation using MATLAB. The simulations are configured with realistic network scenarios, channel characteristics, and iterative experiments to assess metrics like accuracy, standard deviation, and convergence time.
Preliminary results affirm that HDPA significantly outperforms HCPA, especially under high-density and variable channel conditions. With a mean convergence of approximately 6279.64 iterations and an accuracy of 98.78%, HDPA proves to be a highly effective algorithm for channel classification and selection in LoRaWAN. Beyond outperforming HCPA, our work also addresses a different problem space compared to reinforcement learning-based approaches, such as the Tug-of-War (TOW) dynamics method. While the TOW-MAB algorithm is validated on real IoT hardware and focuses on coexistence scenarios, it inherently models multi-network interference and requires physical deployment for performance assessment. In contrast, HDPA is tailored for LoRaWAN’s stochastic channel behavior, using a simulation-driven and computationally lightweight design that scales efficiently to high channel counts without the need for physical synchronization. This positions HDPA as a complementary alternative, providing a theoretically grounded, simulation-verified solution with quantifiable gains in accuracy and convergence speed under dense deployment conditions.
The remainder of this paper is organized as follows. Section 2 provides a detailed summary of the related work. Section 3 describes and analyzes the system model along with the channel selection problems. Section 4 presents extensive simulation results that demonstrate the advantages of using HDPA for channel selection. Finally, Section 5 concludes the paper.

3. Methods

This section outlines the research methodology adopted for the study. A scientific approach forms the foundation of the work, with a strong emphasis on quantitative methods to support experimental analysis. The chosen methods ensure a systematic and objective investigation of the research objectives.

3.1. System Model

We consider a single-gateway LoRaWAN star topology with two end devices transmitting uplinks and receiving ACK-based feedback via the gateway. Each of the $N$ channels is modeled as an independent Bernoulli process with fixed success probability $P_i \in (0,1)$ over a simulation run. Independence abstracts away correlated fading and cross-technology interference; we acknowledge this as a limitation and note that correlation and cross-time variation will be addressed in future work. The gateway returns binary feedback $\beta_t \in \{0,1\}$, where $\beta_t = 0$ denotes a successful transmission (ACK) on iteration $t$, and $\beta_t = 1$ denotes failure. These signals drive the online updates of the HDPA.
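For concreteness, this channel abstraction reduces to one Bernoulli draw per uplink. A minimal MATLAB sketch follows; the probability values are illustrative placeholders, not the benchmark values of Table 1:

% Bernoulli channel environment: beta = 0 on success (ACK), 1 on failure.
P = [0.1 0.9 0.3 0.5 0.2 0.4 0.8 0.6];          % assumed per-channel success probabilities
channelFeedback = @(i) double(rand() >= P(i));  % draw beta_t for an uplink on channel i
beta = channelFeedback(2);                      % beta == 0 with probability P(2)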
The proposed model shown in Figure 1 illustrates the interaction between multiple components in a LoRa-based communication system designed to optimize data transmission through adaptive learning and feedback mechanisms.
Figure 1. Proposed LoRa Network Model.
The node is an end device equipped with a radio transmitter that sends data packets. The decision maker, represented by the robot, implements the Hierarchical Discrete Pursuit Learning Automata (HDPA) algorithm. Its role is to select the optimal radio channel for data transmission based on past transmission success rate feedback from the gateway. The decision maker updates the probability distribution over channels using the ratio of the number of times a channel yielded a reward to the number of times it was selected.
The feedback loop represents the action taken by the decision maker regarding which channel to use for the next data transmission. Beta represents the feedback received from the gateway. If the transmission is successful, the decision maker receives positive feedback reinforcing the chosen channel. If unsuccessful, the decision maker receives negative feedback, decreasing the likelihood of selecting that channel again.

3.2. Proposed Algorithm Flowchart

In terms of process flow, as illustrated in Figure 2, the node transmits data to the gateway using a channel selected by the decision maker. The gateway provides feedback on the transmission’s success. Positive feedback increases the probability of selecting a successful channel in the future, while negative feedback decreases that probability, encouraging the decision maker to explore other channels. The decision maker then continuously updates its channel selection probabilities based on the feedback, adapting to the dynamic network environment to optimize data transmission reliability.
Figure 2. Simulation flow chart.

3.3. Mathematical Development

At iteration $t$, the root automaton samples a branch according to its current probability vector and activates the corresponding child; this sequential sampling continues down the tree until a leaf corresponding to a specific channel is reached. After the node transmits on that channel, the gateway returns $\beta_t$. The leaf updates its reward estimate and selection probability via pursuit updates with a learning rate $\delta$. The same feedback is propagated upward, so each ancestor along the chosen path updates its local two-branch probabilities toward the branch with the higher inherited estimate. If, at any depth, a branch probability exceeds the convergence threshold $B$ (close to 1), the updating stops; once all depths on the active path are frozen, the overall policy has converged. The algorithmic formulation of the proposed model is presented in Algorithm 1.
Algorithm 1: Hierarchical Learning Automata-Based Channel Selection
Input: System structure (tree depth $K$, learning rates $\delta$, $\Delta$), initial probabilities, and environment feedback $\beta(t)$.
Output: A stable (converged) policy that selects the best-performing channel with the highest learned reward estimate.
Initialize: Set $t = 0$. Initialize all probability vectors and reward estimates.
Loop
1. Depths 0 to $K - 1$: $A_{0,1}$ selects a branch by randomly sampling as per its probability vector $(p_{1,1}(t), p_{1,2}(t))$. We denote $j_1(t)$ as the index chosen at depth 0, with $j_1(t) \in \{1, 2\}$. The activated automaton $A_{1, j_1(t)}$ chooses in turn and activates the next LA at depth 2. The process continues until depth $K - 1$, whose choice selects the channel.
2. Depth $K$: The index of the channel chosen at depth $K$ is denoted $j_K(t) \in \{1, \dots, 2^K\}$. Update the estimated chance of reward based on the response received from the environment at the leaf:
$u_{K, j_K(t)}(t+1) = u_{K, j_K(t)}(t) + 1 - \beta(t)$
$v_{K, j_K(t)}(t+1) = v_{K, j_K(t)}(t) + 1$
$\hat{d}_{K, j_K(t)}(t+1) = u_{K, j_K(t)}(t+1) / v_{K, j_K(t)}(t+1)$
For every other leaf $j \in \{1, \dots, 2^K\}$ with $j \neq j_K(t)$:
$u_{K, j}(t+1) = u_{K, j}(t)$
$v_{K, j}(t+1) = v_{K, j}(t)$
$\hat{d}_{K, j}(t+1) = u_{K, j}(t+1) / v_{K, j}(t+1)$
3. Define the reward estimate recursively for all automata along the path to the root, $k \in \{0, \dots, K-1\}$, where the LA at any level inherits the feedback from the LA at the level below:
$\hat{d}_{k, j}(t) = \max\left(\hat{d}_{k+1, 2j-1}(t), \hat{d}_{k+1, 2j}(t)\right)$
The reward count and attempt count of the active automaton at each level are updated as $u_{k, j_k(t)}(t+1) = u_{k, j_k(t)}(t) + 1 - \beta(t)$ and $v_{k, j_k(t)}(t+1) = v_{k, j_k(t)}(t) + 1$.
4. Update the channel probability vectors along the path toward the child with the current maximum reward estimate. Each LA $A_{k, j}$, $j \in \{1, \dots, 2^k\}$, at depth $k \in \{0, \dots, K-1\}$ has two children $\alpha_{k+1, 2j-1}$ and $\alpha_{k+1, 2j}$. We denote the index of the larger of $\hat{d}_{k+1, 2j-1}(t)$ and $\hat{d}_{k+1, 2j}(t)$ as $j^h_{k+1}(t) \in \{2j-1, 2j\}$, and the index with the lower reward estimate as $\bar{j}^h_{k+1}(t) = \{2j-1, 2j\} \setminus j^h_{k+1}(t)$. Update $p_{k+1, j^h_{k+1}(t)}$ and $p_{k+1, \bar{j}^h_{k+1}(t)}$ for all $k \in \{0, \dots, K-1\}$ as:
If $\beta(t) = 0$ Then
$p_{k+1, j^h_{k+1}(t)}(t+1) = \min\left(p_{k+1, j^h_{k+1}(t)}(t) + \Delta,\; 1\right)$
$p_{k+1, \bar{j}^h_{k+1}(t)}(t+1) = 1 - p_{k+1, j^h_{k+1}(t)}(t+1)$
Else ($\beta(t) = 1$)
$p_{k+1, j^h_{k+1}(t)}(t+1) = p_{k+1, j^h_{k+1}(t)}(t)$
$p_{k+1, \bar{j}^h_{k+1}(t)}(t+1) = p_{k+1, \bar{j}^h_{k+1}(t)}(t)$
End if
5. For each learning automaton, if either of its channel selection probabilities surpasses the threshold $B$, with $B$ a positive number close to unity, its probabilities stop updating; once this holds at every depth, convergence is achieved.
6. $t = t + 1$
End Loop
Return: Optimal channel with maximum estimated reward.
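To make the update rules concrete, the following is a minimal MATLAB sketch of one HDPA run, assuming heap-style node indexing (children of node m are 2m and 2m + 1) and illustrative values for P, Δ, and B; it is a compact illustration of Algorithm 1, not the full simulation harness of Section 4:

% Minimal HDPA sketch for N = 2^K channels.
K = 3; N = 2^K; T = 10000;                 % tree depth, channels, iteration budget
P = [0.1 0.9 0.3 0.5 0.2 0.4 0.8 0.6];     % assumed channel success probabilities
Delta = 8.7e-4; B = 0.99;                  % resolution step, convergence threshold
M = 2^(K+1) - 1;                           % tree nodes; leaves are 2^K..M
pr   = 0.5 * ones(1, M);                   % branch selection probabilities
u    = zeros(1, M); v = zeros(1, M);       % reward and attempt counters
dhat = zeros(1, M);                        % reward estimates

for t = 1:T
    m = 1;                                 % walk root-to-leaf by sampling
    while m < 2^K
        if rand() < pr(2*m), m = 2*m; else, m = 2*m + 1; end
    end
    ch = m - 2^K + 1;                      % channel index of the chosen leaf
    beta = double(rand() >= P(ch));        % 0 = ACK (reward), 1 = failure

    u(m) = u(m) + (1 - beta);              % leaf estimator: dhat = rewards/attempts
    v(m) = v(m) + 1;  dhat(m) = u(m) / v(m);

    a = floor(m / 2);                      % propagate estimates to the root
    while a >= 1
        dhat(a) = max(dhat(2*a), dhat(2*a + 1));
        a = floor(a / 2);
    end

    if beta == 0                           % discretized pursuit: update on reward only
        a = floor(m / 2);
        while a >= 1
            if max(pr(2*a), pr(2*a + 1)) < B       % freeze converged automata
                [~, h] = max([dhat(2*a), dhat(2*a + 1)]);
                best = 2*a + h - 1;  other = 4*a + 1 - best;
                pr(best)  = min(pr(best) + Delta, 1);
                pr(other) = 1 - pr(best);
            end
            a = floor(a / 2);
        end
    end
end
[~, bestChannel] = max(dhat(2^K : M));
fprintf('Selected channel: %d\n', bestChannel);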

3.4. Channel Propagation

The propagation model assumes a static radio environment with channel characteristics represented by success probabilities between 0 and 1. These probabilities simulate the likelihood of successful transmission and are shown in Table 1. The values are derived based on typical LoRaWAN conditions, incorporating packet loss due to noise, interference, and gateway distance. No physical propagation model is used explicitly; instead, this probabilistic abstraction allows simplified performance benchmarking of learning behaviors.
Table 1. List of successful data transmission probabilities for 8 channels.

3.5. Software Environment

MATLAB is chosen for its robust capabilities and extensive support for simulations involving complex algorithms and network models.

3.6. System Simulation

This process combines theoretical modeling with practical experiments to validate the hypothesis that HDPA can enhance the efficiency and reliability of channel selection in LoRaWAN networks.
The simulation setup begins with the configuration of the node to transmit data packets. This node represents an end device in the LoRaWAN network, equipped with a radio transmitter. The simulation adheres to the channel access mechanism defined in the LoRa Alliance [18]. The node interacts with the gateway, responsible for receiving the transmitted data. The gateway acts as an intermediary, forwarding the data to a server network for processing and storage.
At the heart of the simulation is the decision maker, which implements the HDPA algorithm. The decision maker selects the optimal radio channel for data transmission based on the feedback from the previous transmissions. This feedback involves the gateway providing success or failure notifications for each transmission, which the decision maker uses to update its channel selection probabilities.
The simulation is conducted in a MATLAB software environment, chosen for its robust capabilities in handling complex algorithms and for providing a platform for running extensive simulations under various conditions. The simulation parameters include the number of nodes, the available channels, and their successful data transmission probabilities.
Throughout the simulation, key performance metrics are monitored, including accuracy, overall network throughput, standard deviation, and convergence speed. By analyzing these metrics, the effectiveness of the HDPA can be evaluated.

3.7. Simulation Variable

The simulation setup relies on a set of well-defined parameters that govern the evaluation of the proposed HDPA algorithm. Table 2 summarizes the simulation variables, including the number of channels, initial probability vectors, learning rate, convergence threshold, and reward estimates, which collectively form the foundation for assessing channel selection performance in LoRaWAN.
Table 2. Simulation Variables.
This section has detailed the methodological framework used to assess the HDPA algorithm for channel selection in LoRaWAN. By combining theoretical modeling with simulation-based validation and by outlining the system model and algorithmic structure, the study establishes a solid foundation for evaluating HDPA’s performance. The next section presents and analyzes the results obtained through this methodology.

3.8. Implementation Feasibility on End Devices

LoRaWAN Class-A nodes typically provide only a few kilobytes of RAM and limited CPU cycles. HDPA’s memory footprint grows approximately linearly with the number of channels: for N channels, the automaton stores 2(N − 1) branch probabilities plus N reward counters, amounting to at most a few hundred bytes using 32-bit floats. Each update involves a handful of additions and multiplications along a path of depth log2 N, giving an O(log N) computational cost per uplink. These estimates suggest that HDPA is practical for typical STM32- or ESP32-class MCUs, though profiling on actual hardware is needed for confirmation.
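As a back-of-envelope check (a sketch under the accounting above; attempt counters, if stored separately, would roughly double the figure):

% Illustrative footprint estimate for the HDPA state.
N = 8;                          % channel count
values = 2*(N - 1) + N;         % branch probabilities plus per-channel reward counters
bytes  = 4 * values             % = 88 bytes for N = 8; still under 1 kB for N = 64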

4. Results and Discussion

4.1. Performance Evaluation and Comparison Analysis

This section presents the outcomes of employing Learning Automata into LoRaWAN, highlighting the critical importance of efficient channel selection for network performance, aiming to test the effectiveness of HDPA. We present a comprehensive result from the simulation that was conducted and discuss the implications of these findings. This analysis not only highlights the strengths of HDPA but also compares it with HCPA.
Unless stated otherwise, we use B = 0.99 as the convergence criterion and $K = \log_2 N$ hierarchy levels. A run is deemed converged when, at every depth along the active path, the larger branch probability exceeds B. We report the mean and standard deviation of the iteration count across 200 independent experiments, alongside accuracy and throughput summaries:
$$\mathrm{Mean} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
where $X_i$ is the number of iterations to convergence in the $i$-th experiment and $n$ is the total number of experiments.
$$\mathrm{Variance} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mathrm{Mean}\right)^2$$
$$\mathrm{Standard\ deviation} = \sqrt{\mathrm{Variance}}$$
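These statistics follow directly in MATLAB; run_hdpa below is a hypothetical wrapper (not part of our released code) that returns the iteration count at which the convergence criterion B was met:

% Convergence statistics across R independent experiments.
R = 200;
X = zeros(1, R);
for i = 1:R
    X(i) = run_hdpa();                           % hypothetical: iterations to convergence
end
meanIters = sum(X) / R;                          % population formulas, as defined above
stdIters  = sqrt(sum((X - meanIters).^2) / R);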
The performance of HDPA is evaluated using the formulas above. To ensure the effectiveness of our simulations, we set the iteration budgets to 9000 and 10,000 and ran 200 experiments, expecting that the HDPA would converge fastest to the channel with the highest successful transmission probability.
The simulation was performed for an environment with 8 channels, using the benchmark successful transmission probabilities shown in Figure 3; each value gives the probability that a transmission on that channel receives β = 0, i.e., a reward from the environment.
Figure 3. Reward probabilities for 8 channels. The red asterisks represent the benchmark successful transmission probabilities assigned to each channel, where higher values indicate a greater likelihood of successful data transmission.
From Table 3, our simulation shows that the HDPA with a small learning parameter converges to the channel with the highest successful data transmission probability. A higher learning parameter leads to faster convergence; however, when we set the learning parameter above 0.00087, the algorithm did not converge to the channel with the highest successful transmission probability. Therefore, to reach the optimal channel with a high speed of convergence, we decreased the learning parameter step by step until we achieved 98.78% accuracy. Below this value, the algorithm still converged to the optimal channel but consumed the full iteration budget. The mean number of iterations to converge to the optimal channel over the 200 experiments, with a convergence criterion of 0.99, was 6279.64, confirming that the HDPA reached a 0.99 probability of choosing one of the channels, with a standard deviation of 131.36 iterations on the benchmark probabilities.
Table 3. Result of our simulations for 8 channels.
The throughput curve presented in Figure 4 demonstrates the learning behavior of the HDPA algorithm over successive iterations. Initially, the throughput increases sharply, indicating that the algorithm is rapidly acquiring knowledge about the environment and selecting efficient channels. The early phase reflects the exploratory strength of HDPA in adapting to dynamic conditions. As iterations progress, the throughput gradually levels off and stabilizes around 450 bps, signifying convergence to a set of optimal channels. This steady-state performance suggests that HDPA has effectively learned the optimal channel strategy, resulting in sustained high throughput. The graph validates the effectiveness of the proposed approach in optimizing network performance through fast adaptation and robust learning in a fluctuating communication environment.
Figure 4. Throughput.
Figure 5 illustrates the process of selecting the optimal channel from a set of eight available channels. Initially, all channels are explored for communication, with one channel demonstrating a consistently higher probability of successful message transmission, while others perform with comparatively lower success rates. At the beginning of the simulation, there is no prior knowledge regarding which channel is optimal. The Learning Automata mechanism enables the system to gradually converge toward the most effective channel, thereby maximizing throughput. Over time, channels 2 and 7 are identified by the HDPA as the best and second-best channels, respectively. Around iteration 3000, channel 7 temporarily outperforms channel 2. However, due to the stochastic nature of the learning and decision-making process, channel 2 is ultimately selected as the optimal channel, highlighting the HDPA’s capability to balance exploration and exploitation, ensuring adaptability while optimizing long-term performance.
Figure 5. Channel selection updating probability for successful transmission.
The comparative analysis between HDPA and HCPA, as illustrated in Figure 6, reveals key performance differences. When the convergence criterion was set to 0.9, HCPA outperformed HDPA by converging in approximately 3500 iterations, compared to over 4500 for HDPA across 200 experiments. However, as the convergence threshold increased toward 0.99 (our target for successful data transmission), HDPA began to outperform HCPA. For instance, at a convergence level of 0.97, HDPA converged in around 8000 iterations, while HCPA required about 9000.
Figure 6. Number of iterations for convergence for 200 experiments.
As indicated in Table 4, the optimal learning rates identified were 0.00087 for HDPA and 0.00069 for HCPA. In terms of mean iterations to converge, HDPA averaged 6279.64 with a higher standard deviation, indicating greater variability, whereas HCPA averaged 6778.34 with a lower standard deviation of 117.12. Importantly, HDPA achieved higher accuracy, close to 99%, making it more effective for precise channel selection in LoRaWAN compared to HCPA.
Table 4. Comparison between HDPA and HCPA.
Although these findings demonstrate clear convergence and accuracy advantages, they were obtained in an abstract Bernoulli channel model without duty-cycle limits or fading. The current evaluation isolates learning dynamics to verify algorithmic behavior. Assessing HDPA under realistic network load, interference, and power budgets remains an essential next step.
Relative to the hardware-validated TOW-based MAB of [14], our simulation focuses on decision accuracy and convergence under controlled stochastic channels rather than FSR/FI under heterogeneous coexistence. The different instruments and environments preclude a direct overlay; however, Table 5 summarizes the methodological contrasts (learning structure, metrics, environment, and implementation). Taken together, the results demonstrate that under our LoRaWAN-focused assumptions, HDPA achieves higher selection accuracy and competitive convergence, suggesting that hierarchical pursuit is an effective low-overhead alternative when binary feedback and constrained nodes are the dominant considerations.
Table 5. Comparison between HDPA and TOW-MAB.

4.2. Sensitivity Analysis

We assess robustness to the learning rate δ, the threshold B, and the channel count N. Increasing δ accelerates early learning but can induce premature lock-in and degrade accuracy beyond δ ≈ 8.7 × 10−4; decreasing δ improves stability and final accuracy but increases the mean iteration count. Tightening B from 0.97 to 0.99 increases the iteration budget but yields more selective policies. Scaling N increases the depth K and, consequently, the exploration horizon, modestly increasing the time to convergence while retaining high accuracy, so long as δ is tuned conservatively. These findings guide parameter choices for dense deployments.
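A sweep of this kind can be scripted directly; the values and the run_hdpa_with wrapper below are illustrative assumptions, not our exact experiment harness:

% Sensitivity sweep over the learning rate delta.
deltas = [2e-4, 4e-4, 8.7e-4, 1.5e-3];            % illustrative sweep values
for d = deltas
    [acc, iters] = run_hdpa_with(d);              % hypothetical: accuracy, mean iterations
    fprintf('delta = %.1e: accuracy %.2f%%, mean iterations %.0f\n', d, acc, iters);
end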

5. Conclusions and Future Work

5.1. Conclusions

This paper presented a comprehensive study on the application of Hierarchical Discrete Pursuit Learning Automata (HDPA) for adaptive channel selection in LoRaWAN environments. Motivated by the limitations of static and pseudo-random access methods, HDPA offers a self-adjusting, learning-based strategy that dynamically improves communication reliability by tracking transmission success. Through detailed simulations, we showed that HDPA achieves over 98% accuracy in identifying optimal channels and converges significantly faster than the benchmark HCPA method. Its layered decision-making structure not only enhances learning speed but also maintains low computational overhead, making it suitable for dense and resource-constrained IoT networks. In comparison with other machine learning-based methods, such as the Tug-of-War (TOW) dynamics algorithm, HDPA distinguishes itself through its simulation-driven optimization and design tailored to LoRaWAN. While TOW prioritizes real-world coexistence scenarios, HDPA excels in stochastic environments with fewer assumptions. Our evaluation is simulation-based, with independent time-invariant channel probabilities and binary ACK feedback. We do not model realistic fading or correlated interference. These choices isolate the learning dynamics but limit external validity.
While HDPA achieved over 98% accuracy with moderate iteration counts, the study relied on stationary success probabilities and assumed ideal feedback links. We did not model correlated fading, duty-cycle restrictions, or the energy cost of maintaining the probability tree. Profiling HDPA on low-power LoRa boards and extending the analysis to time-varying interference are planned so that feasibility and robustness can be confirmed in an operational environment.

5.2. Future Works

The successful integration of HDPA in this study opens an avenue for future research. One potential direction, expanding the scope of Learning Automata to manage interference in densely populated IoT networks, represents a critical research frontier. Future work will incorporate time-varying and correlated channels, network-level metrics such as per-channel throughput distributions, and hardware-based validation to quantify performance under coexistence, bridging the gap between simulation evidence and deployable systems.

Author Contributions

Conceptualization, L.A.A., R.M. and E.H.; methodology, L.A.A.; software, L.A.A.; validation, L.A.A., R.M. and O.G.; formal analysis, L.A.A.; investigation, L.A.A.; resources, L.A.A.; data curation, L.A.A.; writing—original draft preparation, L.A.A.; writing—review and editing, L.A.A.; visualization, L.A.A.; supervision, L.A.A.; project administration, L.A.A.; funding acquisition, R.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The Article Processing Charge (APC) was funded by the African Center of Excellence in Internet of Things (ACEIoT), University of Rwanda.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cheikh, I.; Sabir, E.; Aouami, R.; Sadik, M.; Roy, S. Throughput-Delay Tradeoffs for Slotted-Aloha-based LoRaWAN Networks. In Proceedings of the 2021 International Wireless Communications and Mobile Computing (IWCMC), Harbin City, China, 28 June–2 July 2021. [Google Scholar] [CrossRef]
  2. Wang, H.; Pei, P.; Pan, R.; Wu, K.; Zhang, Y.; Xiao, J.; Yang, J. A Collision Reduction Adaptive Data Rate Algorithm Based on the FSVM for a Low-Cost LoRa Gateway. Mathematics 2022, 10, 3920. [Google Scholar] [CrossRef]
  3. Zhang, X.; Jiao, L.; Granmo, O.-C.; Oommen, B.J. Channel selection in cognitive radio networks: A switchable Bayesian learning automata approach. In Proceedings of the IEEE 24th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), London, UK, 8–11 September 2013; IEEE: New York, NY, USA, 2013. [Google Scholar] [CrossRef]
  4. Diane, A.; Diallo, O.; Ndoye, E.H.M. A systematic and comprehensive review on low power wide area network: Characteristics, architecture, applications and research challenges. Discov. Internet Things 2025, 5, 7. [Google Scholar] [CrossRef]
  5. Bai, H.; Cheng, R.; Jin, Y. Evolutionary reinforcement learning: A survey. Intell. Comput. 2023, 2, 0025. [Google Scholar] [CrossRef]
  6. Omslandseter, R.O.; Jiao, L.; Zhang, X.; Yazidi, A.; Oommen, B.J. The hierarchical discrete pursuit learning automaton: A novel scheme with fast convergence and epsilon-optimality. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 8278–8292. [Google Scholar] [CrossRef] [PubMed]
  7. Yazidi, A.; Zhang, X.; Jiao, L.; Oommen, B.J. The hierarchical continuous pursuit learning automation: A novel scheme for environments with large numbers of actions. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 512–526. [Google Scholar] [CrossRef] [PubMed]
  8. Prakash, A.; Choudhury, N.; Hazarika, A.; Gorrela, A. Effective Feature Selection for Predicting Spreading Factor with ML in Large LoRaWAN-based Mobile IoT Networks. In Proceedings of the 2025 National Conference on Communications (NCC), New Delhi, India, 6–9 March 2025. [Google Scholar] [CrossRef]
  9. Lavdas, S.; Bakas, N.; Vavousis, K.; Khalifeh, A.; Hajj, W.E.; Zinonos, Z. Evaluating LoRaWAN Network Performance in Smart City Environments Using Machine Learning. IEEE Internet Things J. 2025, 12, 27060–27074. [Google Scholar] [CrossRef]
  10. Garlisi, D.; Pagano, A.; Giuliano, F.; Croce, D.; Tinnirello, I. Interference Analysis of LoRaWAN and Sigfox in Large-Scale Urban IoT Networks. IEEE Access 2025, 13, 44836–44848. [Google Scholar] [CrossRef]
  11. Keshmiri, H.; Rahman, G.M.; Wahid, K.A. LoRa Resource Allocation Algorithm for Higher Data Rates. Sensors 2025, 25, 518. [Google Scholar] [CrossRef] [PubMed]
  12. Li, A.; Fujisawa, M.; Urabe, I.; Kitagawa, R.; Kim, S.-J.; Hasegawa, M. A lightweight decentralized reinforcement learning based channel selection approach for high-density LoRaWAN. In Proceedings of the 2021 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), Los Angeles, CA, USA, 13–15 December 2021. [Google Scholar] [CrossRef]
  13. Oyewobi, S.S.; Hancke, G.P.; Abu-Mahfouz, A.M.; Onumanyi, A.J. An effective spectrum handoff based on reinforcement learning for target channel selection in the industrial Internet of Things. Sensors 2019, 19, 1395. [Google Scholar] [CrossRef] [PubMed]
  14. Hasegawa, S.; Kim, S.-J.; Shoji, Y.; Hasegawa, M. Performance evaluation of machine learning based channel selection algorithm implemented on IoT sensor devices in coexisting IoT networks. In Proceedings of the 2020 IEEE 17th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 10–13 January 2020. [Google Scholar] [CrossRef]
  15. Loh, F.; Mehling, N.; Geißler, S.; Hoßfeld, T. Simulative performance study of slotted aloha for LoRaWAN channel access. In Proceedings of the NOMS 2022—2022 IEEE/IFIP Network Operations and Management Symposium, Budapest, Hungary, 25–29 April 2022. [Google Scholar] [CrossRef]
  16. Yurii, L.; Anna, L.; Stepan, S. Research on the Throughput Capacity of LoRaWAN Communication Channel. In Proceedings of the 2023 IEEE East-West Design & Test Symposium (EWDTS), Batumi, Georgia, 22–25 September 2023. [Google Scholar] [CrossRef]
  17. Gaillard, G.; Pham, C. CANL LoRa: Collision Avoidance by Neighbor Listening for Dense LoRa Networks. In Proceedings of the 2023 IEEE Symposium on Computers and Communications (ISCC), Gammarth, Tunisia, 9–12 July 2023. [Google Scholar] [CrossRef]
  18. LoRa Alliance, Inc. LoRaWAN® L2 1.0.4 Specification (TS001-1.0.4). White Paper, October 2020. Available online: https://lora-alliance.org/wp-content/uploads/2021/11/LoRaWAN-Link-Layer-Specification-v1.0.4.pdf (accessed on 20 May 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
