Search Results (39)

Search Parameters:
Keywords = joint agent-environment systems

19 pages, 1077 KB  
Article
Research on Optimization of RIS-Assisted Air-Ground Communication System Based on Reinforcement Learning
by Yuanyuan Yao, Xinyang Liu, Sai Huang and Xinwei Yue
Sensors 2025, 25(20), 6382; https://doi.org/10.3390/s25206382 - 16 Oct 2025
Viewed by 632
Abstract
In urban emergency communication scenarios, building obstructions can reduce the performance of base station (BS) communication networks. To address such issues, this paper proposes an air-ground wireless network enabled by an unmanned aerial vehicle (UAV) and assisted by reconfigurable intelligent surfaces (RIS), which enhances the efficacy of UAV-enabled MISO networks. The UAV is treated as an intelligent agent that moves in 3D space and senses changes in the channel environment, while zero-forcing (ZF) precoding is adopted to eliminate interference among ground users. Meanwhile, the UAV movement, RIS phase shifts, and per-user power allocation are jointly designed. We propose two deep reinforcement learning (DRL) algorithms, termed D3QN-WF and DDQN-WF. Simulation results indicate that D3QN-WF achieves a 15.9% higher sum rate and 50.1% greater throughput than the DDQN-WF baseline, while also demonstrating significantly faster convergence.
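As a reading aid for the two value-based agents named above, the following is a minimal PyTorch sketch of the dueling double-DQN (D3QN) target computation that such agents conventionally use; the layer sizes, names, and hyperparameters are illustrative assumptions, not details from the paper.

```python
# Sketch of a dueling Q-network and the double-DQN target (assumed structure).
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)        # state-value stream V(s)
        self.adv = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.trunk(s)
        v, a = self.value(h), self.adv(h)
        # Dueling aggregation: Q = V + (A - mean(A))
        return v + a - a.mean(dim=-1, keepdim=True)

def double_dqn_target(online, target, r, s_next, done, gamma=0.99):
    """Double-DQN target: action chosen by the online net, valued by the target net."""
    with torch.no_grad():
        a_star = online(s_next).argmax(dim=-1, keepdim=True)
        q_next = target(s_next).gather(-1, a_star).squeeze(-1)
        return r + gamma * (1.0 - done) * q_next
```

A plain DDQN baseline would typically use the same double-Q target with a non-dueling network.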

21 pages, 2648 KB  
Article
A Hybrid Reinforcement Learning Framework Combining TD3 and PID Control for Robust Trajectory Tracking of a 5-DOF Robotic Arm
by Zied Ben Hazem, Firas Saidi, Nivine Guler and Ali Husain Altaif
Automation 2025, 6(4), 56; https://doi.org/10.3390/automation6040056 - 14 Oct 2025
Viewed by 1430
Abstract
This paper presents a hybrid reinforcement learning framework for trajectory tracking control of a 5-degree-of-freedom (DOF) Mitsubishi RV-2AJ robotic arm by integrating model-free deep reinforcement learning (DRL) algorithms with classical control strategies. A novel hybrid PID + TD3 agent is proposed, combining a Proportional–Integral–Derivative (PID) controller with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, and is compared against standalone TD3 and PID controllers. In this architecture, the PID controller provides baseline stability and deterministic disturbance rejection, while the TD3 agent learns residual corrections to enhance tracking accuracy, robustness, and control smoothness. The robotic system is modeled in MATLAB/Simulink with Simscape Multibody, and the agents are trained using a reward function inspired by artificial potential fields, promoting energy-efficient and precise motion. Extensive simulations are performed under internal disturbances (e.g., joint friction variations, payload changes) and external disturbances (e.g., unexpected forces, environmental interactions). Results demonstrate that the hybrid PID + TD3 approach outperforms both standalone TD3 and PID controllers in convergence speed, tracking precision, and disturbance rejection. This study highlights the effectiveness of combining reinforcement learning with classical control for intelligent, robust, and resilient robotic manipulation in uncertain environments.
(This article belongs to the Topic New Trends in Robotics: Automation and Autonomous Systems)
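A minimal sketch of the residual control law this abstract describes — a PID baseline plus a learned TD3-style correction. The gains, residual scale, and policy interface are assumptions for illustration, not the paper's tuned values.

```python
# Residual RL sketch: u = u_PID + scaled correction from a learned policy.
import numpy as np

class PID:
    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def __call__(self, err: float) -> float:
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def hybrid_action(pid: PID, policy, state, err, residual_scale=0.2):
    """PID supplies the baseline; the TD3-style policy adds a small residual."""
    u_pid = pid(err)
    u_res = residual_scale * float(policy(state))  # policy output assumed in [-1, 1]
    return np.clip(u_pid + u_res, -1.0, 1.0)       # actuator limits assumed
```

Keeping the residual small preserves the PID baseline's stability while letting the learned term absorb unmodeled friction and payload effects.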

26 pages, 9360 KB  
Article
Multi-Agent Hierarchical Reinforcement Learning for PTZ Camera Control and Visual Enhancement
by Zhonglin Yang, Huanyu Liu, Hao Fang, Junbao Li and Yutong Jiang
Electronics 2025, 14(19), 3825; https://doi.org/10.3390/electronics14193825 - 26 Sep 2025
Viewed by 918
Abstract
Border surveillance, as a critical component of national security, places increasingly stringent demands on the target perception capabilities of video monitoring systems, especially in wide-area and complex environments. To address the limitations of existing systems in low-confidence target detection and multi-camera collaboration, this paper proposes a novel visual enhancement method for cooperative control of multiple PTZ (Pan–Tilt–Zoom) cameras based on hierarchical reinforcement learning. The proposed approach establishes a hierarchical framework composed of a Global Planner Agent (GPA) and multiple Local Executor Agents (LEAs). The GPA is responsible for global target assignment, while the LEAs perform fine-grained visual enhancement operations based on the assigned targets. To effectively model the spatial relationships among multiple targets and the perceptual topology of the cameras, a graph-based joint state space is constructed. Furthermore, a graph neural network is employed to extract high-level features, enabling efficient information sharing and collaborative decision-making among cameras. Experimental results in simulation environments demonstrate the superiority of the proposed method in terms of target coverage and visual enhancement performance. Hardware experiments further validate the feasibility and robustness of the approach in real-world scenarios. This study provides an effective solution for multi-camera cooperative surveillance in complex environments.
(This article belongs to the Section Artificial Intelligence)
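To make the two-level division of labor concrete, here is a toy version of the global planner's target-assignment step. The greedy rule, capacity limit, and score matrix are purely illustrative stand-ins for the paper's learned GNN-based GPA policy.

```python
# Toy global assignment: each target goes to the best-scoring camera with
# spare capacity; the LEAs would then enhance their assigned targets.
import numpy as np

def global_assign(score: np.ndarray, cap: int = 2) -> dict:
    """Map target index -> camera index, given a (cameras x targets) score matrix."""
    n_cams, n_targets = score.shape
    load = np.zeros(n_cams, dtype=int)
    assignment = {}
    for t in range(n_targets):
        for c in np.argsort(-score[:, t]):   # cameras ranked per target
            if load[c] < cap:
                assignment[t] = int(c)
                load[c] += 1
                break
    return assignment

# Example: 3 PTZ cameras, 4 targets, random pairwise suitability scores.
rng = np.random.default_rng(0)
print(global_assign(rng.random((3, 4))))
```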

47 pages, 12662 KB  
Review
Strength in Adhesion: A Multi-Mechanics Review Covering Tensile, Shear, Fracture, Fatigue, Creep, and Impact Behavior of Polymer Bonding in Composites
by Murat Demiral
Polymers 2025, 17(19), 2600; https://doi.org/10.3390/polym17192600 - 25 Sep 2025
Cited by 7 | Viewed by 4954
Abstract
The growing demand for lightweight and reliable structures across aerospace, automotive, marine, and civil engineering has driven significant advances in polymer adhesive technology. These materials serve dual roles, functioning as matrices in composites and as structural bonding agents, where they must balance strength, toughness, durability, and sometimes sustainability. Recent review efforts have greatly enriched understanding, yet most approach the topic from specialized angles—whether emphasizing nanoscale toughening, multifunctional formulations, sustainable alternatives, or microscopic failure processes in bonded joints. While such perspectives provide valuable insights, they often remain fragmented, leaving open questions about how nanoscale mechanisms translate into macroscopic reliability, how durability evolves under realistic service conditions, and how mechanical responses interact across different loading modes. To address this, the present review consolidates knowledge on the performance of polymer adhesives under tension, shear, fracture, fatigue, creep, and impact. By integrating experimental findings with computational modeling and emerging data-driven approaches, it situates localized mechanisms within a broader structure–performance framework. This unified perspective not only highlights persistent gaps—such as predictive modeling of complex failure, scalability of nanomodified systems, and long-term durability under coupled environments—but also outlines strategies for developing next-generation adhesives capable of delivering reliable, high-performance bonding solutions for demanding applications.
(This article belongs to the Special Issue Polymer Composites: Design, Manufacture and Characterization)

24 pages, 11782 KB  
Article
Research on Joint Game-Theoretic Modeling of Network Attack and Defense Under Incomplete Information
by Yifan Wang, Xiaojian Liu and Xuejun Yu
Entropy 2025, 27(9), 892; https://doi.org/10.3390/e27090892 - 23 Aug 2025
Cited by 1 | Viewed by 1020
Abstract
In the face of increasingly severe cybersecurity threats, incomplete information and environmental dynamics have become central challenges in network attack–defense scenarios. In real-world network environments, defenders often find it difficult to fully perceive attack behaviors and network states, leading to a high degree of uncertainty in the system. Traditional approaches are inadequate in dealing with the diversification of attack strategies and the dynamic evolution of network structures, making it difficult to achieve highly adaptive defense strategies and efficient multi-agent coordination. To address these challenges, this paper proposes a multi-agent network defense approach based on joint game modeling, termed JG-Defense (Joint Game-based Defense), which aims to enhance the efficiency and robustness of defense decision-making in environments characterized by incomplete information. The method integrates Bayesian game theory, graph neural networks, and a proximal policy optimization framework, and it introduces two core mechanisms. First, a Dynamic Communication Graph Neural Network (DCGNN) is used to model the dynamic network structure, improving the perception of topological changes and attack evolution trends. A multi-agent communication mechanism is incorporated within the DCGNN to enable the sharing of local observations and strategy coordination, thereby enhancing global consistency. Second, a joint game loss function is constructed to embed the game equilibrium objective into the reinforcement learning process, optimizing both the rationality and long-term benefit of agent strategies. Experimental results demonstrate that JG-Defense outperforms the Cybermonic model by 15.83% in overall defense performance. Furthermore, under the traditional PPO loss function, the DCGNN model improves defense performance by 11.81% compared to the Cybermonic model. These results verify that the proposed integrated approach achieves superior global strategy coordination in dynamic attack–defense scenarios with incomplete information.
(This article belongs to the Section Multidisciplinary Applications)
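The "joint game loss" idea — embedding an equilibrium objective into PPO training — can be sketched as the standard clipped PPO surrogate plus a weighted exploitability term. The gap formulation and weight below are illustrative assumptions, not the paper's exact loss.

```python
# PPO clipped surrogate + equilibrium-gap regularizer (illustrative).
import torch

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (to be minimized)."""
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantage
    return -torch.min(unclipped, clipped).mean()

def equilibrium_gap(pi, q, payoff):
    """Exploitability of mixed strategy pi against opponent mix q under payoff matrix A."""
    against_q = payoff @ q               # payoff of each pure action vs. q
    return against_q.max() - pi @ against_q

def joint_game_loss(ratio, advantage, pi, q, payoff, lam=0.1):
    """PPO loss plus a weighted gap term pushing pi toward a best response."""
    return ppo_clip_loss(ratio, advantage) + lam * equilibrium_gap(pi, q, payoff)
```

Minimizing the gap term drives the defender's mixed strategy toward an equilibrium best response while the PPO term preserves long-term return.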

22 pages, 2972 KB  
Article
Cooperative Schemes for Joint Latency and Energy Consumption Minimization in UAV-MEC Networks
by Ming Cheng, Saifei He, Yijin Pan, Min Lin and Wei-Ping Zhu
Sensors 2025, 25(17), 5234; https://doi.org/10.3390/s25175234 - 22 Aug 2025
Viewed by 1278
Abstract
The Internet of Things (IoT) has promoted emerging applications that require massive device collaboration, heavy computation, and stringent latency requirements. Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) systems can provide flexible services for user devices (UDs) with wide coverage. The optimization of both latency and energy consumption remains a critical yet challenging task due to the inherent trade-off between them. Joint association, offloading, and computing resource allocation are essential to achieving satisfactory system performance. However, these processes are difficult due to the highly dynamic environment and the exponentially increasing complexity of large-scale networks. To address these challenges, we introduce a carefully designed cost function to balance latency and energy consumption, formulate the joint problem as a partially observable Markov decision process, and propose two multi-agent deep-reinforcement-learning-based schemes to tackle the long-term problem. Specifically, the multi-agent proximal policy optimization (MAPPO)-based scheme uses centralized learning and decentralized execution, while the closed-form enhanced multi-armed bandit (CF-MAB)-based scheme decouples association from offloading and computing resource allocation. In both schemes, UDs act as independent agents that learn from environmental interactions and historical decisions, make decisions to maximize their individual reward functions, and achieve implicit collaboration through the reward mechanism. The numerical results validate the effectiveness and show the superiority of the proposed schemes. The MAPPO-based scheme enables collaborative agent decisions for high performance in complex dynamic environments, while the CF-MAB-based scheme supports independent rapid-response decisions.
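A minimal sketch of a weighted latency-energy cost of the kind described above; the weights and normalizers are assumptions. Each user device would receive the negative cost as its per-step reward.

```python
# Illustrative latency-energy cost; minimizing it balances the two objectives.
def step_cost(latency_s: float, energy_j: float,
              w_latency: float = 0.6, w_energy: float = 0.4,
              latency_ref: float = 0.1, energy_ref: float = 1.0) -> float:
    """Normalized weighted combination of task latency and energy consumption."""
    return (w_latency * (latency_s / latency_ref)
            + w_energy * (energy_j / energy_ref))

reward = -step_cost(latency_s=0.08, energy_j=0.7)  # example per-step reward
```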

18 pages, 1040 KB  
Article
A TDDPG-Based Joint Optimization Method for Hybrid RIS-Assisted Vehicular Integrated Sensing and Communication
by Xinren Wang, Zhuoran Xu, Qin Wang, Yiyang Ni and Haitao Zhao
Electronics 2025, 14(15), 2992; https://doi.org/10.3390/electronics14152992 - 27 Jul 2025
Viewed by 727
Abstract
This paper proposes a novel Twin Delayed Deep Deterministic Policy Gradient (TDDPG)-based joint optimization algorithm for hybrid reconfigurable intelligent surface (RIS)-assisted integrated sensing and communication (ISAC) systems in Internet of Vehicles (IoV) scenarios. The proposed system model achieves deep integration of sensing and communication by superimposing the communication and sensing signals within the same waveform. To decouple the complex joint design problem, a dual-DDPG architecture is introduced, in which one agent optimizes the transmit beamforming vector and the other adjusts the RIS phase shift matrix. Both agents share a unified reward function that comprehensively considers multi-user interference (MUI), total transmit power, RIS noise power, and sensing accuracy via the CRLB constraint. Simulation results demonstrate that the proposed TDDPG algorithm significantly outperforms conventional DDPG in terms of sum rate and interference suppression. Moreover, the adoption of a hybrid RIS enables an effective trade-off between communication performance and system energy efficiency, highlighting its practical deployment potential in dynamic IoV environments.
(This article belongs to the Section Microwave and Wireless Communications)
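A structural sketch of the dual-agent split described above: one actor outputs the transmit beamformer, the other the RIS phase shifts, and both are trained against one shared reward. The dimensions and the reward composition noted in the comment are assumptions.

```python
# Dual-actor structure (illustrative): beamforming agent + RIS phase agent.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)

obs_dim = 32                               # assumed observation size
beam_actor = Actor(obs_dim, act_dim=8)     # real/imag parts of the beamformer
ris_actor = Actor(obs_dim, act_dim=16)     # one phase per RIS element

obs = torch.randn(1, obs_dim)
w = beam_actor(obs)                        # transmit beamforming action
theta = torch.pi * ris_actor(obs)          # RIS phases mapped to [-pi, pi]
# Both actors would be updated by their own critics against the SAME scalar
# reward (MUI, transmit power, RIS noise, and CRLB terms in the paper).
```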

31 pages, 1576 KB  
Article
Joint Caching and Computation in UAV-Assisted Vehicle Networks via Multi-Agent Deep Reinforcement Learning
by Yuhua Wu, Yuchao Huang, Ziyou Wang and Changming Xu
Drones 2025, 9(7), 456; https://doi.org/10.3390/drones9070456 - 24 Jun 2025
Viewed by 1569
Abstract
Intelligent Connected Vehicles (ICVs) impose stringent requirements on real-time computational services. However, limited onboard resources and the high latency of remote cloud servers restrict traditional solutions. Unmanned Aerial Vehicle (UAV)-assisted Mobile Edge Computing (MEC), which deploys computing and storage resources at the network edge, offers a promising solution. In UAV-assisted vehicular networks, jointly optimizing content and service caching, computation offloading, and UAV trajectories to maximize system performance is a critical challenge. This requires balancing system energy consumption and resource allocation fairness while maximizing cache hit rate and minimizing task latency. To this end, we introduce system efficiency as a unified metric, aiming to maximize overall system performance through joint optimization. This metric comprehensively considers cache hit rate, task computation latency, system energy consumption, and resource allocation fairness. The problem involves discrete decisions (caching, offloading) and continuous variables (UAV trajectories), exhibiting high dynamism and non-convexity, making it challenging for traditional optimization methods. Concurrently, existing multi-agent deep reinforcement learning (MADRL) methods often encounter training instability and convergence issues in such dynamic and non-stationary environments. To address these challenges, this paper proposes a MADRL-based joint optimization approach. We precisely model the problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and adopt the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm, which follows the Centralized Training Decentralized Execution (CTDE) paradigm. Our method aims to maximize system efficiency by achieving a judicious balance among multiple performance metrics, such as cache hit rate, task delay, energy consumption, and fairness. Simulation results demonstrate that, compared to various representative baseline methods, the proposed MAPPO algorithm exhibits significant superiority in achieving higher cumulative rewards and an approximately 82% cache hit rate.
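One plausible composition of the unified "system efficiency" metric, shown for concreteness: cache hit rate and fairness enter positively, latency and energy negatively. The weights and the use of Jain's index are assumptions for the sketch.

```python
# Illustrative system-efficiency metric combining the four quantities named above.
import numpy as np

def jain_fairness(alloc: np.ndarray) -> float:
    """Jain's index: 1.0 when all vehicles receive equal resources."""
    return float(alloc.sum() ** 2 / (len(alloc) * (alloc ** 2).sum()))

def system_efficiency(hit_rate, latency, energy, alloc,
                      w=(0.4, 0.25, 0.2, 0.15), lat_ref=1.0, en_ref=1.0):
    w_hit, w_lat, w_en, w_fair = w
    return (w_hit * hit_rate
            - w_lat * latency / lat_ref
            - w_en * energy / en_ref
            + w_fair * jain_fairness(alloc))

print(system_efficiency(0.82, 0.3, 0.5, np.array([1.0, 0.9, 1.1])))
```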

28 pages, 40968 KB  
Article
Collaborative Search Algorithm for Multi-UAVs Under Interference Conditions: A Multi-Agent Deep Reinforcement Learning Approach
by Wei Wang, Yong Chen, Yu Zhang, Yong Chen and Yihang Du
Drones 2025, 9(6), 445; https://doi.org/10.3390/drones9060445 - 18 Jun 2025
Cited by 3 | Viewed by 1590
Abstract
Unmanned aerial vehicles (UAVs) have emerged as a promising solution for collaborative search missions in complex environments. However, in the presence of interference, communication disruptions between UAVs and ground control stations can severely degrade coordination efficiency, leading to prolonged search times and reduced mission success rates. To address these challenges, this paper proposes a novel multi-agent deep reinforcement learning (MADRL) framework for joint spectrum and search collaboration in multi-UAV systems. The core problem is formulated as a combinatorial optimization task that simultaneously optimizes channel selection and heading angles to minimize the total search time under dynamic interference conditions. Due to the NP-hard nature of this problem, we decompose it into two interconnected Markov decision processes (MDPs): a spectrum collaboration subproblem solved using a received signal strength indicator (RSSI)-aware multi-agent proximal policy optimization (MAPPO) algorithm, and a search collaboration subproblem addressed through a target probability map (TPM)-guided MAPPO approach with an innovative action-masking mechanism. Extensive simulations demonstrate significant performance advantages over baseline methods (IPPO, QMIX, and IQL), including 68.7% and 146.2% higher throughput than QMIX and IQL, respectively, and a 16.7–48.3% reduction in search completion steps, while maintaining robust operation under dynamic interference conditions. The framework exhibits strong resilience to communication disruptions while maintaining stable search performance, validating its practical applicability in real-world interference scenarios.
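The action-masking mechanism mentioned above can be sketched compactly: invalid actions (e.g., headings that leave the search area or jammed channels) get their logits forced to -inf, so the categorical policy assigns them zero probability. How the mask is constructed is problem-specific and assumed here.

```python
# Action masking for a discrete policy (illustrative mask).
import torch

def masked_categorical(logits: torch.Tensor, valid: torch.Tensor):
    """Zero out the probability of invalid actions before sampling."""
    masked = logits.masked_fill(~valid, float("-inf"))
    return torch.distributions.Categorical(logits=masked)

logits = torch.randn(5)                     # one logit per heading/channel
valid = torch.tensor([True, True, False, True, False])
dist = masked_categorical(logits, valid)
action = dist.sample()                      # never samples indices 2 or 4
```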

25 pages, 9731 KB  
Article
Channel and Power Allocation for Multi-Cell NOMA Using Multi-Agent Deep Reinforcement Learning and Unsupervised Learning
by Ming Sun, Yihe Zhong, Xiaoou He and Jie Zhang
Sensors 2025, 25(9), 2733; https://doi.org/10.3390/s25092733 - 25 Apr 2025
Cited by 1 | Viewed by 1045
Abstract
Among 5G and anticipated 6G technologies, non-orthogonal multiple access (NOMA) has attracted considerable attention due to its notable advantages in data throughput. Nevertheless, it is challenging to find a near-optimal allocation of channel and power resources that maximizes the performance of a multi-cell NOMA system. In addition, due to the complex and dynamically changing wireless communication environment and the lack of near-optimal labels, conventional supervised learning methods cannot be directly applied. To address these challenges, this paper proposes MDRL-UL, a framework that integrates multi-agent deep reinforcement learning with unsupervised learning to allocate channel and power resources in a near-optimal manner. In the framework, a multi-agent deep reinforcement learning neural network (MDRLNN) is proposed for channel allocation, while an attention-based unsupervised learning neural network (ULNN) is proposed for power allocation. Furthermore, the joint action (JA) derived from the MDRLNN for channel allocation is fed into the ULNN as a representation for power allocation. To maximize the energy efficiency of the multi-cell NOMA system, the expectation of the energy efficiency is used to train both the MDRLNN and the ULNN. Simulation results indicate that the proposed MDRL-UL framework achieves higher energy efficiency and transmission rates than other algorithms.
(This article belongs to the Section Communications)
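Because no near-optimal labels exist, training can use the negative expected energy efficiency directly as the loss, as this abstract describes. The single-link EE expression below is a simplified placeholder for the paper's multi-cell NOMA model.

```python
# Unsupervised training signal: minimize the negative sample-mean energy efficiency.
import torch

def energy_efficiency(power, gain, noise=1e-3, p_circuit=0.1):
    """Simplified per-link EE: achievable rate divided by total power draw."""
    rate = torch.log2(1 + gain * power / noise)   # bits/s/Hz per link
    return rate / (power + p_circuit)             # rate per unit power

def unsupervised_loss(power_batch, gain_batch):
    """Negative expected EE; gradient descent then maximizes expected EE."""
    return -energy_efficiency(power_batch, gain_batch).mean()
```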

19 pages, 15931 KB  
Article
Voronoi-GRU-Based Multi-Robot Collaborative Exploration in Unknown Environments
by Yang Lei, Jian Hou, Peixin Ma and Mingze Ma
Appl. Sci. 2025, 15(6), 3313; https://doi.org/10.3390/app15063313 - 18 Mar 2025
Viewed by 2022
Abstract
In modern society, the autonomous exploration of unknown environments has attracted extensive attention due to its broad applications, such as search and rescue operations, planetary exploration, and environmental monitoring. This paper proposes a novel collaborative exploration strategy for multiple mobile robots that aims to quickly explore entire unknown environments. Specifically, we investigate a hierarchical control architecture comprising an upper decision-making layer and a lower planning and mapping layer. In the upper layer, the next frontier point for each robot is determined using Voronoi partitioning and the Multi-Agent Twin Delayed Deep Deterministic policy gradient (MATD3) deep reinforcement learning algorithm in a centralized training and decentralized execution framework. In the lower layer, navigation planning is achieved using the A* and Timed Elastic Band (TEB) algorithms, while an improved Cartographer algorithm is used to construct a joint map for the multi-robot system. In addition, improved Robot Operating System (ROS) and Gazebo simulation environments reduce simulation times, alleviating the slow training of high-precision simulation engines. Finally, the simulation results demonstrate the superiority of the proposed strategy, which achieves over 90% exploration coverage in unknown environments with a significantly reduced exploration time. Compared to MATD3, Multi-Agent Proximal Policy Optimization (MAPPO), Rapidly-Exploring Random Tree (RRT), and Cost-based methods, our strategy reduces time consumption by 41.1%, 47.0%, 63.9%, and 74.9%, respectively.
(This article belongs to the Special Issue Advanced Technologies in AI Mobile Robots)
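A toy version of the upper layer's Voronoi step: each robot only considers frontier points for which it is the nearest robot, and the learned MATD3 policy (not shown) would then select among them. The coordinates and brute-force distance computation are illustrative.

```python
# Voronoi partition of frontier points among robots (nearest-robot rule).
import numpy as np

def voronoi_partition(robots: np.ndarray, frontiers: np.ndarray) -> dict:
    """Map robot index -> frontier points lying in that robot's Voronoi cell."""
    d = np.linalg.norm(frontiers[:, None, :] - robots[None, :, :], axis=-1)
    owner = d.argmin(axis=1)                 # nearest robot per frontier point
    return {r: frontiers[owner == r] for r in range(len(robots))}

robots = np.array([[0.0, 0.0], [5.0, 5.0]])
frontiers = np.array([[1.0, 1.0], [4.0, 6.0], [2.0, 3.0]])
print(voronoi_partition(robots, frontiers))
```

Restricting each robot to its own cell avoids duplicated coverage without requiring explicit negotiation between robots.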

15 pages, 3608 KB  
Article
Trajectory Tracking Control Based on Deep Reinforcement Learning for a Robotic Manipulator with an Input Deadzone
by Fujie Wang, Jintao Hu, Yi Qin, Fang Guo and Ming Jiang
Symmetry 2025, 17(2), 149; https://doi.org/10.3390/sym17020149 - 21 Jan 2025
Cited by 4 | Viewed by 3365
Abstract
This paper proposes a deep reinforcement learning (DRL) method that combines random network distillation (RND) and long short-term memory (LSTM) to address the tracking control problem, while leveraging the inherent symmetry in robotic arm movements to eliminate the need for learning or knowing the system's dynamic model. In general, the complexity and strong coupling of robotic manipulators make trajectory tracking extremely challenging. Firstly, the prediction network and the fixed network are jointly trained using the RND method. The difference between the two networks' outputs acts as an internal reward for the robotic manipulator environment. This internal reward mechanism encourages the robotic arm agent to actively explore unpredictable and unknown environmental states, thereby boosting the performance and efficiency of tracking control for the robotic manipulator. Then, the Soft Actor-Critic (SAC) algorithm, the LSTM network, and an attention mechanism are integrated to resolve the instability problem during training and acquire a stable policy. The LSTM model effectively captures the symmetry and temporal changes in joint angles, while the attention mechanism dynamically prioritizes important features, thereby reducing the instability of the robotic manipulator during tracking tasks and enhancing feature extraction efficiency. The simulation outcomes demonstrate that the proposed method effectively performs the robot tracking task, confirming the efficacy and efficiency of the DRL algorithm.
(This article belongs to the Section Computer)
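A minimal PyTorch sketch of the RND intrinsic reward described above: a fixed, randomly initialized target network and a trained predictor, with the prediction error serving as the exploration bonus. Network sizes are assumptions.

```python
# RND sketch: novelty bonus = predictor's error against a frozen random target.
import torch
import torch.nn as nn

def make_net(in_dim=8, out_dim=16):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

target = make_net()
for p in target.parameters():
    p.requires_grad_(False)                  # target stays fixed throughout training
predictor = make_net()                       # trained to imitate the target

def intrinsic_reward(state: torch.Tensor) -> torch.Tensor:
    """Per-state novelty bonus: squared prediction error against the target."""
    with torch.no_grad():
        t = target(state)
    return ((predictor(state) - t) ** 2).mean(dim=-1)
```

States the predictor has fit well yield small bonuses, so the agent is pushed toward states it has rarely visited.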

31 pages, 7296 KB  
Article
NOMA-Based Rate Optimization for Multi-UAV-Assisted D2D Communication Networks
by Guowei Wu, Guifen Chen and Xinglong Gu
Drones 2025, 9(1), 62; https://doi.org/10.3390/drones9010062 - 16 Jan 2025
Cited by 1 | Viewed by 1192
Abstract
With the proliferation of smart devices and the emergence of high-bandwidth applications, Unmanned Aerial Vehicle (UAV)-assisted Device-to-Device (D2D) communications and Non-Orthogonal Multiple Access (NOMA) technologies are increasingly becoming important means of coping with spectrum scarcity and high data demand in future wireless networks. However, the efficient coordination of these techniques in complex and changing 3D environments still faces many challenges. To this end, this paper proposes a NOMA-based multi-UAV-assisted D2D communication model in which multiple UAVs are deployed in 3D space to act as airborne base stations serving ground-based cellular users and D2D clusters. To maximize the system throughput, this study constructs an optimization problem of joint channel assignment, trajectory design, and power control, and proposes a joint dynamic hypergraph Multi-Agent Deep Q Network (DH-MDQN) algorithm. The dynamic hypergraph method is first used to construct dynamic simple edges and hyperedges and to transform them into directed graphs for efficient dynamic coloring to optimize the channel allocation process; subsequently, for trajectory design and power control, the problem is modeled as a multi-agent Markov Decision Process (MDP), and the Multi-Agent Deep Q Network (MDQN) algorithm is used to collaboratively determine the trajectories and power control of the UAVs. Simulation results show the following: (1) the proposed algorithm achieves higher system throughput than several benchmark algorithms across different numbers of D2D clusters, different D2D cluster communication spacings, and different numbers of UAVs; (2) the 3D UAV trajectories designed by the proposed algorithm improve system throughput by 27% compared to 2D trajectories; and (3) in the NOMA scenario, the system throughput improves by 34% on average compared to the case without decoding-order constraints.
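The channel-allocation step rests on graph coloring. As a simplified stand-in for the paper's dynamic hypergraph construction, here is a greedy coloring pass in which edges mark interfering pairs that must receive different channels.

```python
# Greedy graph coloring: assign the smallest channel not used by any neighbor.
def greedy_color(n_nodes: int, edges: list) -> list:
    adj = [set() for _ in range(n_nodes)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = [-1] * n_nodes
    for v in range(n_nodes):                  # visit nodes in index order
        taken = {color[u] for u in adj[v] if color[u] >= 0}
        c = 0
        while c in taken:                     # smallest free channel index
            c += 1
        color[v] = c
    return color

# Four clusters; edges mark interfering pairs needing distinct channels.
print(greedy_color(4, [(0, 1), (1, 2), (2, 3), (0, 2)]))  # -> [0, 1, 2, 0]
```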

25 pages, 8441 KB  
Article
Reinforcement Learning of a Six-DOF Industrial Manipulator for Pick-and-Place Application Using Efficient Control in Warehouse Management
by Ahmed Iqdymat and Grigore Stamatescu
Sustainability 2025, 17(2), 432; https://doi.org/10.3390/su17020432 - 8 Jan 2025
Cited by 5 | Viewed by 4861
Abstract
This study investigates the integration of reinforcement learning (RL) with optimal control to enhance precision and energy efficiency in industrial robotic manipulation. A novel framework is proposed, combining Deep Deterministic Policy Gradient (DDPG) with a Linear Quadratic Regulator (LQR) controller, specifically applied to the ABB IRB120, a six-degree-of-freedom (6-DOF) industrial manipulator, for pick-and-place tasks in warehouse automation. The methodology employs an actor–critic RL architecture with a 27-dimensional state input and a 6-dimensional joint action output. The RL agent was trained using MATLAB's Reinforcement Learning Toolbox and integrated with ABB's RobotStudio simulation environment via TCP/IP communication. LQR controllers were incorporated to optimize joint-space trajectory tracking, minimizing energy consumption while ensuring precise control. The novelty of this research lies in its synergistic combination of RL and LQR control, addressing energy efficiency and precision simultaneously—an area that has seen limited exploration in industrial robotics. Experimental validation across 100 diverse scenarios confirmed the framework's effectiveness, achieving a mean positioning accuracy of 2.14 mm (a 28% improvement over traditional methods), a 92.5% success rate in pick-and-place tasks, and a 22.7% reduction in energy consumption. The system demonstrated stable convergence after 458 episodes and maintained a mean joint angle error of 4.30°, validating its robustness and efficiency. These findings highlight the potential of RL for broader industrial applications. The demonstrated accuracy and success rate suggest its applicability to complex tasks such as electronic component assembly, multi-step manufacturing, delicate material handling, precision coordination, and quality inspection tasks like automated visual inspection, surface defect detection, and dimensional verification. Successful implementation in such contexts requires addressing challenges including task complexity, computational efficiency, and adaptability to process variability, alongside ensuring safety, reliability, and seamless system integration. This research builds upon existing advancements in warehouse automation, inverse kinematics, and energy-efficient robotics, contributing to the development of adaptive and sustainable control strategies for industrial manipulators in automated environments.
(This article belongs to the Special Issue Smart Sustainable Techniques and Technologies for Industry 5.0)
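A compact sketch of the LQR component: solve the continuous-time algebraic Riccati equation for a linearized joint model and apply state feedback u = -Kx. The double-integrator-per-joint model and the weight matrices are assumptions standing in for the linearized manipulator dynamics.

```python
# LQR gain for one joint modeled as a double integrator (illustrative model).
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # state: [joint angle, joint velocity]
B = np.array([[0.0], [1.0]])             # input: joint torque (normalized)
Q = np.diag([10.0, 1.0])                 # penalize tracking error and velocity
R = np.array([[0.1]])                    # penalize control effort (energy)

P = solve_continuous_are(A, B, Q, R)     # Riccati solution
K = np.linalg.solve(R, B.T @ P)          # optimal state-feedback gain: u = -K x
print(K)
```

Raising R relative to Q trades tracking aggressiveness for lower control effort, which is the energy-precision trade-off the framework targets.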

40 pages, 1079 KB  
Article
Context-Adaptable Deployment of FastSLAM 2.0 on Graphic Processing Unit with Unknown Data Association
by Jessica Giovagnola, Manuel Pegalajar Cuéllar and Diego Pedro Morales Santos
Appl. Sci. 2024, 14(23), 11466; https://doi.org/10.3390/app142311466 - 9 Dec 2024
Cited by 1 | Viewed by 2637
Abstract
Simultaneous Localization and Mapping (SLAM) algorithms are crucial for enabling agents to estimate their position in unknown environments. In autonomous navigation systems, these algorithms need to operate in real-time on devices with limited resources, emphasizing the importance of reducing complexity and ensuring efficient performance. While SLAM solutions aim at ensuring accurate and timely localization and mapping, one of their main limitations is their computational complexity. In this scenario, particle filter-based approaches such as FastSLAM 2.0 can significantly benefit from parallel programming due to their modular construction. The parallelization process involves identifying the parameters affecting the computational complexity in order to distribute the computation among single multiprocessors as efficiently as possible. However, the computational complexity of methodologies such as FastSLAM 2.0 can depend on multiple parameters whose values may, in turn, depend on each specific use case scenario (i.e., the context), leading to multiple possible parallelization designs. Furthermore, the features of the hardware architecture in use can significantly influence performance in terms of latency. Therefore, the selection of the optimal parallelization modality still needs to be determined empirically. This may involve redesigning the parallel algorithm depending on the context and the hardware architecture. In this paper, we propose a CUDA-based adaptable design for FastSLAM 2.0 on GPU, in combination with an evaluation methodology that enables the assessment of the optimal parallelization modality based on the context and the hardware architecture, without the need to create separate designs. The proposed implementation includes the parallelization of all the functional blocks of the FastSLAM 2.0 pipeline. Additionally, we contribute a parallelized design of the data association step through the Joint Compatibility Branch and Bound (JCBB) method. Multiple resampling algorithms are also included to accommodate the needs of a wide variety of navigation scenarios.
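One FastSLAM building block that parallelizes naturally is resampling. Below is systematic resampling in vectorized NumPy as a readable stand-in for a CUDA kernel that would assign one thread per particle; the weights are illustrative.

```python
# Systematic resampling over particle weights (vectorized sketch).
import numpy as np

def systematic_resample(weights: np.ndarray, rng=np.random.default_rng()):
    """Return ancestor indices drawn with a single random offset per generation."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n   # stratified sample positions
    cumsum = np.cumsum(weights / weights.sum())     # normalized CDF of weights
    return np.searchsorted(cumsum, positions)       # ancestor index per slot

w = np.array([0.1, 0.4, 0.3, 0.2])
print(systematic_resample(w))
```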