Search Results (62)

Search Parameters:
Keywords = memory replay

18 pages, 1261 KiB  
Article
Firmware Attestation in IoT Swarms Using Relational Graph Neural Networks and Static Random Access Memory
by Abdelkabir Rouagubi, Chaymae El Youssofi and Khalid Chougdali
AI 2025, 6(7), 161; https://doi.org/10.3390/ai6070161 - 21 Jul 2025
Viewed by 196
Abstract
The proliferation of Internet of Things (IoT) swarms—comprising billions of low-end interconnected embedded devices—has transformed industrial automation, smart homes, and agriculture. However, these swarms are highly susceptible to firmware anomalies that can propagate across nodes, posing serious security threats. To address this, we propose a novel Remote Attestation (RA) framework for real-time firmware verification, leveraging Relational Graph Neural Networks (RGNNs) to model the graph-like structure of IoT swarms and capture complex inter-node dependencies. Unlike conventional Graph Neural Networks (GNNs), RGNNs incorporate edge types (e.g., Prompt, Sensor Data, Processed Signal), enabling finer-grained detection of propagation dynamics. The proposed method uses runtime Static Random Access Memory (SRAM) data to detect malicious firmware and its effects without requiring access to firmware binaries. Experimental results demonstrate that the framework achieves 99.94% accuracy and a 99.85% anomaly detection rate in a 4-node swarm (Swarm-1), and 100.00% accuracy with complete anomaly detection in a 6-node swarm (Swarm-2). Moreover, the method proves resilient against noise, dropped responses, and trace replay attacks, offering a robust and scalable solution for securing IoT swarms. Full article
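
As a rough illustration of the relation-typed message passing that distinguishes RGNNs from plain GNNs (each edge type, e.g. Prompt, Sensor Data, or Processed Signal, gets its own weight matrix), the PyTorch sketch below implements one R-GCN-style layer over a toy 4-node swarm. The layer sizes, relation ids, and mean aggregation are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class RelationalGraphLayer(nn.Module):
    """One R-GCN-style layer: each edge type r gets its own weight matrix W_r."""
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.self_loop = nn.Linear(in_dim, out_dim)
        self.rel_weights = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations)]
        )

    def forward(self, h, edges):
        # h: (num_nodes, in_dim); edges: list of (src, dst, relation_id)
        out = self.self_loop(h)
        msg = torch.zeros_like(out)
        deg = torch.zeros(h.size(0), 1)
        for src, dst, rel in edges:
            msg[dst] += self.rel_weights[rel](h[src])   # relation-specific message
            deg[dst] += 1.0
        out = out + msg / deg.clamp(min=1.0)            # mean-aggregate typed messages
        return torch.relu(out)

# Hypothetical 4-node swarm with three edge types (0=Prompt, 1=Sensor Data, 2=Processed Signal)
h = torch.randn(4, 16)                       # e.g., features derived from SRAM traces
edges = [(0, 1, 0), (1, 2, 1), (2, 3, 2), (3, 0, 1)]
layer = RelationalGraphLayer(16, 32, num_relations=3)
print(layer(h, edges).shape)                 # torch.Size([4, 32])
```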

30 pages, 2843 KiB  
Article
Survey on Replay-Based Continual Learning and Empirical Validation on Feasibility in Diverse Edge Devices Using a Representative Method
by Heon-Sung Park, Hyeon-Chang Chu, Min-Kyung Sung, Chaewoon Kim, Jeongwon Lee, Dae-Won Kim and Jaesung Lee
Mathematics 2025, 13(14), 2257; https://doi.org/10.3390/math13142257 - 12 Jul 2025
Viewed by 445
Abstract
The goal of on-device continual learning is to enable models to adapt to streaming data without forgetting previously acquired knowledge, even under limited computational resources and memory constraints. Recent research has demonstrated that weighted regularization-based methods are constrained by indirect knowledge preservation and sensitive hyperparameter settings, and that dynamic architecture methods are ill-suited for on-device environments because resource consumption grows as the structure scales. To compensate for these limitations, replay-based continual learning, which maintains a compact structure and stable performance, is gaining attention. Its limitations are that (1) only a limited amount of historical training data can be stored because of memory capacity, and (2) the computational resources of on-device systems are significantly lower than those of servers or cloud infrastructures. Consequently, designing strategies that balance the preservation of past knowledge with rapid and cost-effective updates of model parameters has become a critical consideration in on-device continual learning. This paper presents an empirical survey of replay-based continual learning studies, adopting the nearest class mean classifier with replay-based sparse weight updates as a representative method for validating feasibility on diverse edge devices. Our empirical comparison on standard benchmarks, including CIFAR-10, CIFAR-100, and TinyImageNet, deployed on devices such as the Jetson Nano and Raspberry Pi, showed that the representative method achieved reasonable accuracy under limited buffer sizes compared with existing replay-based techniques. A significant reduction in training time and resource consumption was also observed, supporting the feasibility of replay-based on-device continual learning in practice.
(This article belongs to the Special Issue Computational Intelligence in Systems, Signals and Image Processing)
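
The buffer-size limitation discussed above is commonly handled with a fixed-capacity replay memory filled by reservoir sampling, so that every example seen in the stream has an equal chance of being retained. A minimal Python sketch of that generic mechanism (capacity and data layout are placeholders, not values from the survey):

```python
import random

class ReservoirReplayBuffer:
    """Fixed-size buffer; every example seen so far has equal probability of being kept."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []          # stored (x, y) pairs
        self.seen = 0           # total examples observed in the stream

    def add(self, x, y):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            # Keep the new example with probability capacity / seen
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = (x, y)

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))

# Usage: interleave replayed examples with each incoming mini-batch
buffer = ReservoirReplayBuffer(capacity=200)
for x, y in [(i, i % 10) for i in range(1000)]:
    buffer.add(x, y)
    replay_batch = buffer.sample(32)   # mix with the current batch before the update
```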

24 pages, 1645 KiB  
Article
Dual-Stage Clean-Sample Selection for Incremental Noisy Label Learning
by Jianyang Li, Xin Ma and Yonghong Shi
Bioengineering 2025, 12(7), 743; https://doi.org/10.3390/bioengineering12070743 - 8 Jul 2025
Viewed by 369
Abstract
Class-incremental learning (CIL) in deep neural networks is affected by catastrophic forgetting (CF), where acquiring knowledge of new classes leads to the significant degradation of previously learned representations. This challenge is particularly severe in medical image analysis, where costly, expertise-dependent annotations frequently contain pervasive and hard-to-detect noisy labels that substantially compromise model performance. While existing approaches have predominantly addressed CF and noisy labels as separate problems, their combined effects remain largely unexplored. To address this critical gap, this paper presents a dual-stage clean-sample selection method for Incremental Noisy Label Learning (DSCNL). Our approach comprises two key components: (1) a dual-stage clean-sample selection module that identifies and leverages high-confidence samples to guide the learning of reliable representations while mitigating noise propagation during training, and (2) an experience soft-replay strategy for memory rehearsal to improve the model’s robustness and generalization in the presence of historical noisy labels. This integrated framework effectively suppresses the adverse influence of noisy labels while simultaneously alleviating catastrophic forgetting. Extensive evaluations on public medical image datasets demonstrate that DSCNL consistently outperforms state-of-the-art CIL methods across diverse classification tasks. The proposed method boosts the average accuracy by 55% and 31% compared with baseline methods on datasets with different noise levels, and achieves an average noise reduction rate of 73% under original noise conditions, highlighting its effectiveness and applicability in real-world medical imaging scenarios. Full article
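
The paper's dual-stage selection criterion is its own; as a generic illustration of how high-confidence samples can be identified, the sketch below applies the widely used small-loss rule, keeping the fraction of a mini-batch with the lowest per-sample loss. The keep ratio and toy tensors are placeholders.

```python
import torch
import torch.nn.functional as F

def select_clean(logits, labels, keep_ratio=0.7):
    """Small-loss criterion: treat the lowest-loss samples in a batch as likely clean."""
    losses = F.cross_entropy(logits, labels, reduction="none")   # per-sample loss
    k = max(1, int(keep_ratio * labels.size(0)))
    clean_idx = torch.argsort(losses)[:k]                        # indices of the k smallest losses
    return clean_idx

# Example: train only on the selected subset of a possibly noisy mini-batch
logits = torch.randn(16, 5)            # model outputs for 16 samples, 5 classes
labels = torch.randint(0, 5, (16,))    # possibly noisy labels
idx = select_clean(logits, labels)
loss = F.cross_entropy(logits[idx], labels[idx])
```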

19 pages, 6421 KiB  
Article
Automated Deadlift Techniques Assessment and Classification Using Deep Learning
by Wegar Lien Grymyr and Isah A. Lawal
AI 2025, 6(7), 148; https://doi.org/10.3390/ai6070148 - 7 Jul 2025
Viewed by 400
Abstract
This paper explores the application of deep learning techniques for evaluating and classifying deadlift weightlifting techniques from video input. The increasing popularity of weightlifting, coupled with the injury risks associated with improper form, has heightened interest in this area of research. To address these concerns, we developed an application designed to classify three distinct styles of deadlifts: conventional, Romanian, and sumo. In addition to style classification, our application identifies common mistakes such as a rounded back, overextension at the top of the lift, and premature lifting of the hips in relation to the back. To build our model, we created a comprehensive custom dataset comprising lateral-view videos of lifters performing deadlifts, which we meticulously annotated to ensure accuracy. We adapted the MoveNet model to track keypoints on the lifter’s joints, which effectively represented their motion patterns. These keypoints not only served as visualization aids in the training of Convolutional Neural Networks (CNNs) but also acted as the primary features for Long Short-Term Memory (LSTM) models, both of which we employed to classify the various deadlift techniques. Our experimental results showed that both models achieved impressive F1-scores, reaching up to 0.99 for style and 1.00 for execution form classifications on the test dataset. Furthermore, we designed an application that integrates keypoint visualizations with motion pattern classifications. This tool provides users with valuable feedback on their performance and includes a replay feature for self-assessment, helping lifters refine their technique and reduce the risk of injury. Full article
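
A minimal PyTorch sketch of the kind of keypoint-sequence LSTM classifier described above; MoveNet produces 17 keypoints per frame, and the hidden size, clip length, and three-class head here are assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class DeadliftLSTM(nn.Module):
    """Classify a sequence of pose keypoints into a lift style."""
    def __init__(self, num_keypoints=17, hidden=64, num_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_keypoints * 3,   # (x, y, score) per keypoint
                            hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)           # conventional / Romanian / sumo

    def forward(self, keypoints):
        # keypoints: (batch, frames, num_keypoints, 3)
        b, t, k, c = keypoints.shape
        x = keypoints.reshape(b, t, k * c)
        _, (h_n, _) = self.lstm(x)        # final hidden state summarizes the repetition
        return self.head(h_n[-1])

clip = torch.randn(2, 60, 17, 3)          # two 60-frame clips of MoveNet keypoints
print(DeadliftLSTM()(clip).shape)         # torch.Size([2, 3])
```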

21 pages, 2797 KiB  
Article
Model-Driven Meta-Learning-Aided Fast Beam Prediction in Millimeter-Wave Communications
by Wenqin Lu, Xueqin Jiang, Yuwen Cao, Tomoaki Ohtsuki and Enjian Bai
Electronics 2025, 14(13), 2734; https://doi.org/10.3390/electronics14132734 - 7 Jul 2025
Viewed by 235
Abstract
Beamforming plays a key role in improving the spectrum utilization efficiency of multi-antenna systems. However, we observe that (i) conventional beam prediction solutions suffer from high model training overhead and computational latency and thus cannot adapt quickly to changing wireless environments, and (ii) deep-learning-based beamforming risks catastrophic forgetting in dynamically changing environments, which can significantly degrade system performance. Motivated by these challenges, we propose a continual-learning-inspired beam prediction model for fast beamforming adaptation in dynamic downlink millimeter-wave (mmWave) communications. More specifically, we develop a meta-experience replay (MER)-based beam prediction model that combines experience replay with optimization-based meta-learning. This approach optimizes the trade-offs between transmission and interference in dynamic environments, enabling effective, fast beamforming adaptation. Finally, the performance gains brought by the proposed model in dynamic communication environments are verified through simulations. The simulation results show that the proposed model not only retains strong performance on old tasks but also adapts quickly to new tasks.
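
In the continual-learning literature, MER couples an experience-replay buffer with a Reptile-style meta update: take a few SGD steps on a mix of replayed and current samples, then move the saved weights only partway toward the result. The sketch below shows that outer interpolation on a toy regression model; it is a generic illustration under those assumptions, not the authors' beam prediction network.

```python
import copy
import torch
import torch.nn.functional as F

def mer_style_update(model, batches, inner_lr=0.01, meta_lr=0.3):
    """Reptile-style outer step over a few inner SGD steps on replayed + current batches."""
    before = copy.deepcopy(model.state_dict())           # theta_before
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for x, y in batches:                                  # mix of replay-buffer and new samples
        opt.zero_grad()
        F.mse_loss(model(x), y).backward()
        opt.step()
    after = model.state_dict()                            # theta_after
    # theta <- theta_before + meta_lr * (theta_after - theta_before)
    interpolated = {k: before[k] + meta_lr * (after[k] - before[k]) for k in before}
    model.load_state_dict(interpolated)

# Toy usage with a linear "beam predictor" (8 channel features -> 4 beam scores)
model = torch.nn.Linear(8, 4)
batches = [(torch.randn(16, 8), torch.randn(16, 4)) for _ in range(5)]
mer_style_update(model, batches)
```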

28 pages, 1293 KiB  
Article
A Lightweight Double-Deep Q-Network for Energy Efficiency Optimization of Industrial IoT Devices in Thermal Power Plants
by Shuang Gao, Yuntao Zou and Li Feng
Electronics 2025, 14(13), 2569; https://doi.org/10.3390/electronics14132569 - 25 Jun 2025
Viewed by 327
Abstract
Industrial Internet of Things (IIoT) deployments in thermal power plants face significant energy efficiency challenges due to harsh operating conditions and device resource constraints. This paper presents gradient memory double-deep Q-network (GM-DDQN), a lightweight reinforcement learning approach for energy optimization on resource-constrained IIoT devices. At its core, GM-DDQN introduces the gradient memory mechanism, a novel memory-efficient alternative to experience replay. This core innovation, combined with a simplified neural network architecture and efficient parameter quantization, collectively reduces memory requirements by 99% and computation time by 85–90% compared to standard methods. Experimental evaluations across three realistic simulated thermal power plant scenarios demonstrate that GM-DDQN improves energy efficiency by 42% compared to fixed policies and 27% compared to threshold-based approaches, extending battery lifetime from 8–9 months to 14–15 months while maintaining 96–97% PSR. The method enables sophisticated reinforcement learning directly on IIoT edge devices without requiring cloud connectivity, reducing maintenance costs and improving monitoring reliability in industrial environments. Full article
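
The gradient memory mechanism is the paper's contribution; the double-deep-Q-network backbone it builds on is standard, with the online network selecting the next action and the target network evaluating it. A PyTorch sketch of that target computation (the state and action sizes are illustrative):

```python
import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Standard double-DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    with torch.no_grad():
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # action selection
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)  # action evaluation
        return rewards + gamma * (1.0 - dones) * next_q

# Toy usage: 6-dimensional device state, 3 transmission-policy actions (illustrative sizes)
online = torch.nn.Sequential(torch.nn.Linear(6, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3))
target = torch.nn.Sequential(torch.nn.Linear(6, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3))
y = double_dqn_targets(online, target, rewards=torch.rand(8),
                       next_states=torch.randn(8, 6), dones=torch.zeros(8))
```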

25 pages, 3667 KiB  
Article
A Long-Time Series Forecast Method for Wind Turbine Blade Strain with Incremental Bi-LSTM Learning
by Bingkai Wang, Wenlei Sun and Hongwei Wang
Sensors 2025, 25(13), 3898; https://doi.org/10.3390/s25133898 - 23 Jun 2025
Viewed by 272
Abstract
This article presents a novel incremental forecast method to address the challenges of long-time strain status prediction for a wind turbine blade (WTB) under wind loading. Taking strain as the key indicator of structural health, a mathematical model is established to characterize the long-time series forecasting process. Based on the Bi-directional Long Short-Term Memory (Bi-LSTM) framework, the proposed method incorporates incremental learning via an error-supervised feedback mechanism, enabling dynamic self-updating of the model parameters. Experience replay and elastic weight consolidation are integrated to further enhance prediction accuracy. The experimental results demonstrate that the proposed incremental forecast method achieves 24% and 4.6% improvements in accuracy over the Bi-LSTM and Transformer, respectively. This research not only provides an effective solution for long-time prediction of WTB health but also offers a novel technical framework and theoretical foundation for long-time series forecasting.
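
Elastic weight consolidation, integrated here alongside experience replay, adds a quadratic penalty that anchors important weights near their post-task values, L = L_task + (λ/2) Σ_i F_i (θ_i − θ*_i)². A minimal PyTorch sketch of that penalty, with the Fisher estimate and λ left as placeholders:

```python
import torch
import torch.nn.functional as F

def ewc_penalty(model, old_params, fisher, lam=100.0):
    """(lambda/2) * sum_i F_i * (theta_i - theta*_i)^2 over all named parameters."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# After finishing a task, snapshot theta* and a Fisher estimate (all-ones placeholder here)
model = torch.nn.Linear(10, 1)
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}

prediction_loss = F.mse_loss(model(torch.randn(4, 10)), torch.randn(4, 1))
total_loss = prediction_loss + ewc_penalty(model, old_params, fisher)
total_loss.backward()
```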

25 pages, 7158 KiB  
Article
Anti-Jamming Decision-Making for Phased-Array Radar Based on Improved Deep Reinforcement Learning
by Hang Zhao, Hu Song, Rong Liu, Jiao Hou and Xianxiang Yu
Electronics 2025, 14(11), 2305; https://doi.org/10.3390/electronics14112305 - 5 Jun 2025
Viewed by 541
Abstract
In existing phased-array radar systems, anti-jamming strategies are mainly generated through manual judgment. However, manually designing or selecting anti-jamming decisions is often difficult and unreliable in complex jamming environments. Therefore, reinforcement learning is applied to anti-jamming decision-making to solve the above problems. However, the existing anti-jamming decision-making models based on reinforcement learning often suffer from problems such as low convergence speeds and low decision-making accuracy. In this paper, a multi-aspect improved deep Q-network (MAI-DQN) is proposed to improve the exploration policy, the network structure, and the training methods of the deep Q-network. In order to solve the problem of the ϵ-greedy strategy being highly dependent on hyperparameter settings, and the Q-value being overly influenced by the action in other deep Q-networks, this paper proposes a structure that combines a noisy network, a dueling network, and a double deep Q-network, which incorporates an adaptive exploration policy into the neural network and increases the influence of the state itself on the Q-value. These enhancements enable a highly adaptive exploration strategy and a high-performance network architecture, thereby improving the decision-making accuracy of the model. In order to calculate the target value more accurately during the training process and improve the stability of the parameter update, this paper proposes a training method that combines n-step learning, target soft update, variable learning rate, and gradient clipping. Moreover, a novel variable double-depth priority experience replay (VDDPER) method that more accurately simulates the storage and update mechanism of human memory is used in the MAI-DQN. The VDDPER improves the decision-making accuracy by dynamically adjusting the sample size based on different values of experience during training, enhancing exploration during the early stages of training, and placing greater emphasis on high-value experiences in the later stages. Enhancements to the training method improve the model’s convergence speed. Moreover, a reward function combining signal-level and data-level benefits is proposed to adapt to complex jamming environments, which ensures a high reward convergence speed with fewer computational resources. The findings of a simulation experiment show that the proposed phased-array radar anti-jamming decision-making method based on MAI-DQN can achieve a high convergence speed and high decision-making accuracy in environments where deceptive jamming and suppressive jamming coexist. Full article
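
VDDPER is the authors' variant; for orientation, standard proportional prioritized experience replay samples transition i with probability p_i^α / Σ_k p_k^α and corrects the induced bias with importance weights (N · P(i))^(−β). A small NumPy sketch of that baseline mechanism (a flat array rather than a sum-tree, for brevity):

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized experience replay (flat-array version, no sum-tree)."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error=1.0):
        if len(self.buffer) >= self.capacity:        # drop the oldest transition
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(abs(td_error) + 1e-6)

    def sample(self, batch_size, beta=0.4):
        p = np.array(self.priorities) ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        weights = (len(self.buffer) * probs[idx]) ** (-beta)   # importance-sampling correction
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + 1e-6

# Usage sketch: store (s, a, r, s') tuples, sample a weighted batch, refresh priorities
replay = PrioritizedReplay(capacity=1000)
for t in range(200):
    replay.add(("s", t % 4, 0.0, "s_next"), td_error=np.random.rand())
batch, idx, w = replay.sample(32)
replay.update_priorities(idx, np.random.rand(32))
```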

26 pages, 7159 KiB  
Article
Methodology for Human–Robot Collaborative Assembly Based on Human Skill Imitation and Learning
by Yixuan Zhou, Naisheng Tang, Ziyi Li and Hanlei Sun
Machines 2025, 13(5), 431; https://doi.org/10.3390/machines13050431 - 19 May 2025
Viewed by 740
Abstract
With the growing demand for personalized and flexible production, human–robot collaboration technology receives increasing attention. However, enabling robots to accurately perceive and align with human motion intentions remains a significant challenge. To address this, a novel human–robot collaborative control framework is proposed, which utilizes electromyography (EMG) signals as an interaction interface and integrates human skill imitation with reinforcement learning. Specifically, to manage the dynamic variation in muscle coordination patterns induced by joint angle changes, a temporal graph neural network enhanced with an Angle-Guided Attention mechanism is developed. This method adaptively models the topological relationships among muscle groups, enabling high-precision three-dimensional dynamic arm force estimation. Furthermore, an expert reward function and a fuzzy experience replay mechanism are introduced in the reinforcement learning model to guide the human skill learning process, thereby enhancing collaborative comfort and smoothness. The proposed approach is validated through a collaborative assembly task. Experimental results show that the proposed arm force estimation model reduces estimation errors by 10.38%, 8.33%, and 11.20% across three spatial directions compared to a conventional Deep Long Short-Term Memory (Deep-LSTM). Moreover, it significantly outperforms state-of-the-art methods, including traditional imitation learning and adaptive admittance control, in terms of collaborative comfort, smoothness, and assembly accuracy. Full article

25 pages, 3758 KiB  
Article
An Efficient Framework for Secure Communication in Internet of Drone Networks Using Deep Computing
by Vivek Kumar Pandey, Shiv Prakash, Aditya Ranjan, Sudhanshu Kumar Jha, Xin Liu and Rajkumar Singh Rathore
Designs 2025, 9(3), 61; https://doi.org/10.3390/designs9030061 - 13 May 2025
Viewed by 1428
Abstract
The rapid deployment of the Internet of Drones (IoD) across different fields has brought forth enormous security threats in real-time data communication. To overcome authentication vulnerabilities, this paper introduces a secure lightweight framework integrating deep learning-based user behavior analysis and cryptographic protocols. The proposed framework is verified through AVISPA security verification against replay, man-in-the-middle, and impersonation attacks. Performance analysis via NS2 simulations based on changing network parameters (5–50 drones, 1–20 users, 2–8 ground stations) validates enhancements in computation overhead, authentication delay, memory usage, power consumption, and communication effectiveness in comparison with recent models such as LDAP, TAUROT, IoD-Auth, and LEMAP, thereby establishing our system as an optimal choice for safe IoD operation. Full article
(This article belongs to the Collection Editorial Board Members’ Collection Series: Drone Design)
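
The abstract does not spell out the protocol, but the usual building block for resisting the replay attacks it mentions is a keyed MAC over a message carrying a fresh nonce and timestamp, with the verifier rejecting stale timestamps and previously seen nonces. A generic Python sketch of that check; the window size, field layout, and key handling are illustrative only:

```python
import hmac, hashlib, time, secrets

SEEN_NONCES = set()
MAX_SKEW_SECONDS = 30

def make_message(key: bytes, payload: bytes) -> dict:
    nonce = secrets.token_hex(16)
    ts = int(time.time())
    tag = hmac.new(key, f"{nonce}|{ts}|".encode() + payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "nonce": nonce, "ts": ts, "tag": tag}

def verify_message(key: bytes, msg: dict) -> bool:
    expected = hmac.new(key, f"{msg['nonce']}|{msg['ts']}|".encode() + msg["payload"],
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, msg["tag"]):
        return False                                  # forged or tampered message
    if abs(time.time() - msg["ts"]) > MAX_SKEW_SECONDS:
        return False                                  # too old: possible replay
    if msg["nonce"] in SEEN_NONCES:
        return False                                  # nonce reuse: replayed message
    SEEN_NONCES.add(msg["nonce"])
    return True

key = secrets.token_bytes(32)
m = make_message(key, b"drone-7 telemetry")
print(verify_message(key, m))   # True
print(verify_message(key, m))   # False: same nonce replayed
```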

27 pages, 855 KiB  
Article
Edge Exemplars Enhanced Incremental Learning Model for Tor-Obfuscated Traffic Identification
by Sicai Lv, Zibo Wang, Yunxiao Sun, Chao Wang and Bailing Wang
Electronics 2025, 14(8), 1589; https://doi.org/10.3390/electronics14081589 - 14 Apr 2025
Viewed by 579
Abstract
Tor is the most widely used anonymous communication network. It has developed a series of pluggable transports (PTs) to obfuscate traffic and evade censorship. These PTs use different traffic obfuscation techniques, and many of them are actively maintained and updated. To achieve continual learning against PTs and their updates, this paper proposes an incremental learning model for Tor traffic detection. First, we analyze several common traffic obfuscation techniques, including randomization, mimicry, and tunneling, and design a feature set for detecting obfuscated Tor traffic. Second, we construct a Tor incremental learning framework and propose edge exemplar enhancement, which strengthens the trained model's memory of previous classes through edge feature enhancement and selective replay, alleviating the catastrophic forgetting problem of incremental learning. Finally, we combine public and self-collected datasets to simulate the evolution of Tor PTs and verify the effectiveness of the model. The experimental results show that the improved model achieves the highest accuracy of 87.6% in the simulated environment, indicating that the incremental learning model can effectively cope with PT updates.
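
The edge-exemplar criterion is the paper's own; a common baseline for choosing which old-class samples to keep for selective replay is iCaRL-style herding, which greedily picks exemplars whose running mean best tracks the class mean in feature space. NumPy sketch with illustrative sizes:

```python
import numpy as np

def herding_selection(features, m):
    """Greedily pick m rows of `features` whose running mean tracks the class mean."""
    class_mean = features.mean(axis=0)
    chosen, running_sum = [], np.zeros_like(class_mean)
    for k in range(1, m + 1):
        # Distance between the class mean and the mean obtained after adding each candidate
        candidates = (running_sum + features) / k
        dists = np.linalg.norm(class_mean - candidates, axis=1)
        dists[chosen] = np.inf                 # never pick the same exemplar twice
        best = int(np.argmin(dists))
        chosen.append(best)
        running_sum += features[best]
    return chosen

# e.g. 500 flows of one Tor PT class, each a 24-dimensional feature vector; keep 20 exemplars
feats = np.random.randn(500, 24)
exemplar_idx = herding_selection(feats, m=20)
```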

13 pages, 6521 KiB  
Article
Spatial–Adaptive Replay for Foreground Classes in Class-Incremental Semantic Segmentation
by Wei Huang, Zhuoming Gu, Mengfan Xu and Xiaofeng Lu
Electronics 2025, 14(7), 1338; https://doi.org/10.3390/electronics14071338 - 27 Mar 2025
Viewed by 250
Abstract
Class-Incremental Semantic Segmentation (CISS) addresses the challenge of catastrophic forgetting in semantic segmentation models. In autonomous driving scenarios, many structural background classes recur in new data, so the model can relearn background-class information from the new data alone. Traditional replay-based methods nonetheless store the original pixels of these background classes from old data, resulting in low memory efficiency. To improve memory efficiency, we propose Spatial–Adaptive replay for Foreground objects (SAF), a method that stores only foreground-class pixels and their spatial information. In addition, a Spatial–Adaptive Mix-up (SAM) is designed based on the spatial distribution characteristics of the foreground classes: spatial–adaptive alignment ensures that stored foreground objects are mixed into new samples in a plausible spatial arrangement, helping the model learn critical contextual information. Experiments on the Cityscapes and BDD100K datasets show that the proposed method obtains competitive results on class-incremental semantic segmentation tasks for autonomous driving scenarios.
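
SAF's spatial-adaptive placement is the paper's contribution; the underlying replay operation, pasting stored foreground-class pixels and their mask into a new training image and updating the label map, can be sketched generically in NumPy as below. The fixed offset here stands in for the paper's spatial-adaptive alignment.

```python
import numpy as np

def paste_foreground(image, label_map, fg_pixels, fg_mask, fg_class_id, top, left):
    """Copy stored foreground pixels (and their labels) into a new sample at (top, left)."""
    h, w = fg_mask.shape
    region = (slice(top, top + h), slice(left, left + w))
    image = image.copy()
    label_map = label_map.copy()
    image[region][fg_mask] = fg_pixels[fg_mask]        # overwrite only foreground pixels
    label_map[region][fg_mask] = fg_class_id
    return image, label_map

# Stored exemplar: a 64x64 crop of an old foreground class (e.g., "rider") with its mask
fg_pixels = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
fg_mask = np.zeros((64, 64), dtype=bool)
fg_mask[16:48, 16:48] = True
new_image = np.random.randint(0, 256, (512, 1024, 3), dtype=np.uint8)
new_labels = np.zeros((512, 1024), dtype=np.int64)
mixed_img, mixed_lbl = paste_foreground(new_image, new_labels, fg_pixels, fg_mask,
                                        fg_class_id=12, top=300, left=200)
```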

19 pages, 3693 KiB  
Article
Real-Time On-Device Continual Learning Based on a Combined Nearest Class Mean and Replay Method for Smartphone Gesture Recognition
by Heon-Sung Park, Min-Kyung Sung, Dae-Won Kim and Jaesung Lee
Sensors 2025, 25(2), 427; https://doi.org/10.3390/s25020427 - 13 Jan 2025
Viewed by 1379
Abstract
Sensor-based gesture recognition on mobile devices is critical to human–computer interaction, enabling intuitive user input for various applications. However, current approaches often rely on server-based retraining whenever new gestures are introduced, incurring substantial energy consumption and latency due to frequent data transmission. To address these limitations, we present the first on-device continual learning framework for gesture recognition. Leveraging the Nearest Class Mean (NCM) classifier coupled with a replay-based update strategy, our method enables continuous adaptation to new gestures under limited computing and memory resources. By employing replay buffer management, we efficiently store and revisit previously learned instances, mitigating catastrophic forgetting and ensuring stable performance as new gestures are added. Experimental results on a Samsung Galaxy S10 device demonstrate that our method achieves over 99% accuracy while operating entirely on-device, offering a compelling synergy between computational efficiency, robust continual learning, and high recognition accuracy. This work demonstrates the potential of on-device continual learning frameworks that integrate NCM classifiers with replay-based techniques, thereby advancing the field of resource-constrained, adaptive gesture recognition. Full article
(This article belongs to the Special Issue Computer Vision-Based Human Activity Recognition)
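
The Nearest Class Mean classifier at the core of the method keeps one running mean feature vector per gesture class and assigns a new sample to the class with the closest mean, which is why new gestures can be absorbed without retraining a network head. A minimal NumPy sketch (feature dimension and Euclidean distance are illustrative choices):

```python
import numpy as np

class NearestClassMean:
    """Per-class running mean in feature space; predict by nearest mean."""
    def __init__(self):
        self.sums, self.counts = {}, {}

    def update(self, feature, label):
        if label not in self.sums:
            self.sums[label] = np.zeros_like(feature, dtype=float)
            self.counts[label] = 0
        self.sums[label] += feature
        self.counts[label] += 1

    def predict(self, feature):
        means = {c: self.sums[c] / self.counts[c] for c in self.sums}
        return min(means, key=lambda c: np.linalg.norm(feature - means[c]))

# Stream in a few labelled gesture features, then classify a new one
ncm = NearestClassMean()
for label in ["swipe", "circle", "shake"]:
    for _ in range(10):
        ncm.update(np.random.randn(32) + hash(label) % 5, label)
print(ncm.predict(np.random.randn(32)))
```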

13 pages, 1769 KiB  
Article
Collaborative Beamforming with DQN for Interference Mitigation in 5G and Beyond Networks
by Alaelddin F. Y. Mohammed, Salman Md Sultan and Sakshi Patni
Telecom 2024, 5(4), 1192-1204; https://doi.org/10.3390/telecom5040060 - 3 Dec 2024
Viewed by 1920
Abstract
This paper addresses the problem of side-lobe interference in 5G networks by proposing a collaborative beamforming strategy based on Deep Q-Network (DQN) reinforcement learning. Our method, which operates in the sub-6 GHz band, optimizes beam steering and power management using a two-antenna system with DQN-controlled phase shifters. We model an OFDM cellular network environment in which inter-cell interference is managed while multiple base stations serve randomly dispersed users. To reduce interference strength and improve the signal-to-interference-plus-noise ratio (SINR), the DQN agent learns to adjust the interference angle. Our model integrates experience replay memory with a long short-term memory (LSTM) recurrent neural network for time-series prediction to enhance learning stability. Simulation results show that the proposed DQN approach performs noticeably better than existing DQN and Q-learning methods. In particular, our technique reaches a maximum of 29.18 dB and a minimum of 5.15 dB, whereas the other approaches manage only 0.77–27.04 dB. Additionally, the average interference level is reduced to 5.42 dB, compared with 38.84 dB and 34.12 dB for competing approaches. The average sum-rate capacity is also increased to 3.90 by the proposed strategy, outperforming previous approaches. These findings demonstrate how well our cooperative beamforming method reduces interference and improves overall network performance in 5G systems.
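
The SINR figures reported above follow the standard definition, the serving signal power divided by the sum of interference powers plus noise, expressed in decibels. A one-function NumPy sketch with purely illustrative power values:

```python
import numpy as np

def sinr_db(signal_power_w, interference_powers_w, noise_power_w):
    """SINR = P_signal / (sum of interference powers + noise), reported in dB."""
    sinr = signal_power_w / (np.sum(interference_powers_w) + noise_power_w)
    return 10.0 * np.log10(sinr)

# Illustrative numbers only: one serving beam, two residual side-lobe interferers
print(sinr_db(1e-6, [2e-8, 5e-8], 1e-9))   # ~11.5 dB
```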

22 pages, 517 KiB  
Article
LIRL: Latent Imagination-Based Reinforcement Learning for Efficient Coverage Path Planning
by Zhenglin Wei, Tiejiang Sun and Mengjie Zhou
Symmetry 2024, 16(11), 1537; https://doi.org/10.3390/sym16111537 - 17 Nov 2024
Cited by 2 | Viewed by 1659
Abstract
Coverage Path Planning (CPP) in unknown environments presents unique challenges that often require the system to maintain a symmetry between exploration and exploitation in order to efficiently cover unknown areas. This paper introduces latent imagination-based reinforcement learning (LIRL), a novel framework that addresses these challenges by integrating three key components within a soft actor–critic architecture: memory-augmented experience replay (MAER), a latent imagination module (LIM), and multi-step prediction learning (MSPL). MAER enhances sample efficiency by prioritizing experience retrieval, LIM facilitates long-term planning via simulated trajectories, and MSPL optimizes the trade-off between immediate rewards and future outcomes through adaptive n-step learning. Working together, these components create a dynamic equilibrium that enables efficient, adaptive decision-making. We evaluate LIRL across diverse simulated environments, demonstrating substantial improvements over state-of-the-art methods. Through this method, the agent optimally balances short-term actions with long-term planning, maintaining symmetrical responses to varying environmental changes. The results highlight LIRL’s potential for advancing autonomous CPP in real-world applications such as search and rescue, agricultural robotics, and warehouse automation. Our work contributes to the broader fields of robotics and reinforcement learning, offering insights into integrating memory, imagination, and adaptive learning for complex sequential decision-making tasks.
(This article belongs to the Section Computer)
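
MSPL's adaptive n-step learning builds on the standard n-step return, G_t = Σ_{k=0}^{n−1} γ^k r_{t+k} + γ^n V(s_{t+n}), which trades immediate rewards off against a bootstrapped estimate of future outcomes. A small Python helper (the bootstrap value stands in for a critic estimate):

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """G = r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * V(s_n)."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g + (gamma ** len(rewards)) * bootstrap_value

# 3-step return with a critic estimate of 2.0 at the bootstrap state
print(n_step_return([1.0, 0.5, 0.0], bootstrap_value=2.0))   # 1.0 + 0.495 + 0.0 + 1.941 = ~3.44
```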