Search Results (11)

Search Parameters:
Keywords = Dyna-Q

16 pages, 8397 KB  
Article
Accelerated Transfer Learning for Cooperative Transportation Formation Change via SDPA-MAPPO (Scaled Dot Product Attention-Multi-Agent Proximal Policy Optimization)
by Almira Budiyanto, Keisuke Azetsu and Nobutomo Matsunaga
Automation 2024, 5(4), 597-612; https://doi.org/10.3390/automation5040034 - 27 Nov 2024
Cited by 3 | Viewed by 2220
Abstract
Cooperative transportation that requires formation changes in a traveling environment is gaining interest, and deep reinforcement learning is widely used for formation changes in multi-robot cases. The MADDPG (Multi-Agent Deep Deterministic Policy Gradient) method is popular in known environments, but it may require re-learning in unfamiliar circumstances. Although MADDPG variants based on model-based learning and imitation learning have been applied to reduce learning time, it is unclear how the learned results transfer when the number of robots changes. For example, in the GASIL-MADDPG (Generative Adversarial Self-Imitation Learning and Multi-Agent Deep Deterministic Policy Gradient) method, it is uncertain how the results of training three robots can be transferred to the neural networks of four robots. Scaled Dot Product Attention (SDPA) has recently attracted attention for its speed and accuracy in natural language processing, and combining transfer learning with fast computation improves the efficiency of edge-level re-learning. This paper proposes a formation change algorithm that enables simple and fast multi-robot knowledge transfer using SDPA combined with MAPPO (Multi-Agent Proximal Policy Optimization). The algorithm applies SDPA to multi-robot formation learning and accelerates learning by transferring the acquired formation-change knowledge to a different number of robots. The proposed algorithm is verified in simulations of robot formation changes and achieves dramatic speedups: SDPA-MAPPO (Scaled Dot Product Attention-Multi-Agent Proximal Policy Optimization) learned 20.83 times faster than the Deep Dyna-Q method, and transfer learning from a three-robot to a five-robot case reduced the learning time by about 56.57 percent. The three-to-five-robot scenario was chosen because these team sizes are common in cooperative robotics.
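The scaled dot-product attention operation at the core of SDPA-MAPPO can be summarized in a few lines; the sketch below is a generic NumPy illustration (not the authors' implementation), and the robot-state shapes are assumed for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # (n_queries, d_v)

# Hypothetical example: 3 robots attend over each other's state embeddings.
rng = np.random.default_rng(0)
robot_states = rng.normal(size=(3, 8))             # one 8-d embedding per robot
attended = scaled_dot_product_attention(robot_states, robot_states, robot_states)
print(attended.shape)                              # (3, 8)
```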

19 pages, 3537 KB  
Article
Integral-Valued Pythagorean Fuzzy-Set-Based Dyna Q+ Framework for Task Scheduling in Cloud Computing
by Bhargavi Krishnamurthy and Sajjan G. Shiva
Sensors 2024, 24(16), 5272; https://doi.org/10.3390/s24165272 - 14 Aug 2024
Cited by 1 | Viewed by 1125
Abstract
Task scheduling is a critical challenge in cloud computing systems, greatly impacting their performance. Task scheduling is a nondeterministic polynomial time hard (NP-Hard) problem, which complicates the search for nearly optimal solutions. Five major uncertainty parameters, i.e., security, traffic, workload, availability, and price, influence task scheduling decisions. The primary rationale for selecting these uncertainty parameters lies in the challenge of accurately measuring their values, as empirical estimates often diverge from the actual values. The integral-valued Pythagorean fuzzy set (IVPFS) is a promising mathematical framework for dealing with parametric uncertainties. The Dyna Q+ algorithm is an extension of the Dyna Q agent designed for dynamic computing environments, providing bonus rewards to states that have not been exploited recently. In this paper, the Dyna Q+ agent is enriched with the IVPFS mathematical framework to make intelligent task scheduling decisions. The performance of the proposed IVPFS Dyna Q+ task scheduler is tested using the CloudSim 3.3 simulator. The execution time is reduced by 90%, the makespan time is also reduced by 90%, the operation cost is kept below 50%, and the resource utilization rate is improved by 95%, with all of these metrics meeting the desired standards. The results are further validated using an expected value analysis methodology, which confirms the good performance of the task scheduler. The Dyna Q+ agent achieves a better balance between exploration and exploitation through rigorous action-based learning.
(This article belongs to the Special Issue AI Technology for Cybersecurity and IoT Applications)
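The Dyna Q+ bonus mechanism mentioned in the abstract follows the standard tabular formulation; the sketch below shows that generic form (not the IVPFS-enriched scheduler), assuming an environment object with reset(), step(), and a discrete action list.

```python
import random
from collections import defaultdict

def dyna_q_plus(env, episodes=200, alpha=0.1, gamma=0.95,
                epsilon=0.1, kappa=1e-3, planning_steps=10):
    Q = defaultdict(float)          # Q[(state, action)]
    model = {}                      # model[(state, action)] = (reward, next_state)
    last_visit = defaultdict(int)   # time step of the last real visit
    t = 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            t += 1
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a_: Q[(s, a_)])
            s2, r, done = env.step(a)
            # direct RL update from real experience
            best_next = 0.0 if done else max(Q[(s2, a_)] for a_ in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            model[(s, a)] = (r, s2)
            last_visit[(s, a)] = t
            # planning: replay modeled transitions with an exploration bonus
            for _ in range(planning_steps):
                ps, pa = random.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                bonus = kappa * (t - last_visit[(ps, pa)]) ** 0.5   # Dyna-Q+ bonus
                best = max(Q[(ps2, a_)] for a_ in env.actions)
                Q[(ps, pa)] += alpha * (pr + bonus + gamma * best - Q[(ps, pa)])
            s = s2
    return Q
```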

25 pages, 27580 KB  
Article
Enhancing Quadcopter Autonomy: Implementing Advanced Control Strategies and Intelligent Trajectory Planning
by Samira Hadid, Razika Boushaki, Fatiha Boumchedda and Sabrina Merad
Automation 2024, 5(2), 151-175; https://doi.org/10.3390/automation5020010 - 14 Jun 2024
Cited by 5 | Viewed by 2991
Abstract
In this work, an in-depth investigation into enhancing quadcopter autonomy and control capabilities is presented. The focus lies on the development and implementation of three conventional control strategies to regulate the behavior of quadcopter UAVs: a proportional–integral–derivative (PID) controller, a sliding mode controller, and a fractional-order PID (FOPID) controller. Through careful adjustment and fine-tuning, each control strategy is customized to attain the desired dynamic response and stability during quadcopter flight. Additionally, a Dyna-Q learning approach for obstacle avoidance is introduced and integrated into the control system. Using MATLAB, the quadcopter is able to autonomously navigate complex environments, avoiding obstacles through real-time learning and decision-making. Extensive simulation experiments and evaluations, conducted in MATLAB 2018a, systematically compare the performance of the different control strategies, including the Dyna-Q learning-based obstacle avoidance technique. This comprehensive analysis clarifies the strengths and limitations of each approach, guiding the selection of the most effective control strategy for specific application scenarios. Overall, this research presents valuable insights and solutions for optimizing flight stability and enabling secure and efficient operations in diverse real-world scenarios.
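Of the three controllers listed above, the PID law is the simplest to state; the following is a generic discrete-time PID sketch with placeholder gains, not the authors' tuned quadcopter controller.

```python
class PID:
    """Discrete-time PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Placeholder gains for illustration, e.g. regulating quadcopter altitude.
altitude_pid = PID(kp=2.0, ki=0.5, kd=1.0, dt=0.01)
thrust_correction = altitude_pid.update(setpoint=10.0, measurement=9.4)
```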

27 pages, 6109 KB  
Article
An Improved Dyna-Q Algorithm Inspired by the Forward Prediction Mechanism in the Rat Brain for Mobile Robot Path Planning
by Jing Huang, Ziheng Zhang and Xiaogang Ruan
Biomimetics 2024, 9(6), 315; https://doi.org/10.3390/biomimetics9060315 - 23 May 2024
Cited by 4 | Viewed by 2523
Abstract
The traditional Model-Based Reinforcement Learning (MBRL) algorithm has high computational cost, poor convergence, and poor performance in robot spatial cognition and navigation tasks, and it cannot fully explain the ability of animals to quickly adapt to environmental changes and learn a variety of complex tasks. Studies have shown that vicarious trial and error (VTE) and the hippocampal forward prediction mechanism in rats and other mammals can serve as key components of action selection in MBRL to support “goal-oriented” behavior. Therefore, we propose an improved Dyna-Q algorithm inspired by the forward prediction mechanism of the hippocampus to solve the above problems and tackle the exploration–exploitation dilemma of Reinforcement Learning (RL). The algorithm alternately simulates potential future paths for the mobile robot and dynamically adjusts the sweep length according to decision certainty in order to guide action selection. We test the performance of the algorithm in two-dimensional maze environments with static and dynamic obstacles. Compared with classic RL algorithms such as State-Action-Reward-State-Action (SARSA) and Dyna-Q, the algorithm speeds up spatial cognition and improves the global search ability of path planning. In addition, our method reflects key features of how the brain organizes model-based reinforcement learning to effectively solve difficult tasks such as navigation, and it provides a new perspective on spatial cognitive tasks from a biological standpoint.
(This article belongs to the Special Issue Bioinspired Algorithms)
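One plausible reading of the certainty-dependent sweep described above is a forward rollout whose depth grows with the entropy of the current action values; the sketch below illustrates that interpretation only and is not the paper's algorithm. Q is assumed to be a defaultdict(float) and the model a dict of experienced transitions.

```python
import math
from collections import defaultdict

def softmax(xs, temp=1.0):
    m = max(xs)
    exps = [math.exp((x - m) / temp) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def certainty_scaled_sweep(Q, model, state, actions,
                           max_depth=10, gamma=0.95, alpha=0.1):
    """Roll the learned model forward; sweep longer when action values are ambiguous."""
    probs = softmax([Q[(state, a)] for a in actions])
    entropy = -sum(p * math.log(p + 1e-12) for p in probs)
    certainty = 1.0 - entropy / math.log(len(actions))    # 1 = confident, 0 = uncertain
    depth = max(1, round(max_depth * (1.0 - certainty)))  # uncertain -> deeper sweep
    s = state
    for _ in range(depth):
        a = max(actions, key=lambda a_: Q[(s, a_)])
        if (s, a) not in model:
            break
        r, s2 = model[(s, a)]
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, a_)] for a_ in actions) - Q[(s, a)])
        s = s2
    return depth
```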

13 pages, 2670 KB  
Article
Elucidating the Impact of Deleterious Mutations on IGHG1 and Their Association with Huntington’s Disease
by Alaa Shafie, Amal Adnan Ashour, Farah Anjum, Anas Shamsi and Md. Imtaiyaz Hassan
J. Pers. Med. 2024, 14(4), 380; https://doi.org/10.3390/jpm14040380 - 1 Apr 2024
Cited by 4 | Viewed by 2253
Abstract
Huntington’s disease (HD) is a chronic, inherited neurodegenerative condition marked by chorea, dementia, and changes in personality. The primary cause of HD is a mutation characterized by the expansion of a triplet repeat (CAG) within the huntingtin gene located on chromosome 4. Despite substantial progress in elucidating the molecular and cellular mechanisms of HD, an effective treatment for this disorder is not yet available. In recent years, researchers have been interested in studying cerebrospinal fluid (CSF) as a source of biomarkers that could aid in the diagnosis and therapeutic development for this disorder. Immunoglobulin heavy constant gamma 1 (IGHG1) is one of the CSF proteins found to increase significantly in HD. Considering this, it is reasonable to study the potential involvement of deleterious mutations in IGHG1 in the pathogenesis of this disorder. In this study, we explored the potential impact of deleterious mutations on IGHG1 and their subsequent association with HD. We evaluated 126 single-point amino acid substitutions for their impact on the structure and functionality of the IGHG1 protein using multiple computational resources such as SIFT, PolyPhen-2, FATHMM, SNPs&GO, mCSM, DynaMut2, MAESTROweb, PremPS, MutPred2, and PhD-SNP. The sequence- and structure-based tools highlighted 10 amino acid substitutions that were deleterious and destabilizing. Subsequently, out of these 10 mutations, eight variants (Y32C, Y32D, P34S, V39E, C83R, C83Y, V85M, and H87Q) were identified as pathogenic by disease phenotype predictors. Finally, two pathogenic variants (Y32C and P34S) were found to reduce the solubility of the protein, suggesting their propensity to form protein aggregates. These variants also exhibited higher residual frustration within the protein structure. Considering these findings, the study hypothesized that the identified variants of IGHG1 may compromise its function and potentially contribute to HD pathogenesis.
(This article belongs to the Special Issue Personalized Treatment for Musculoskeletal Diseases)

22 pages, 27075 KB  
Article
Deep Dyna-Q for Rapid Learning and Improved Formation Achievement in Cooperative Transportation
by Almira Budiyanto and Nobutomo Matsunaga
Automation 2023, 4(3), 210-231; https://doi.org/10.3390/automation4030013 - 10 Jul 2023
Cited by 10 | Viewed by 3899
Abstract
Cooperative multi-agent systems are now applied in academic research, disaster mitigation, industry, and transportation. A cooperative multi-agent system is a multi-agent system whose agents work together to solve problems or maximise utility. The essential question of formation control is how multiple agents can reach the desired point while maintaining their positions in the formation under dynamic conditions and environments. Cooperative multi-agent systems are closely tied to the formation change problem: the arrangement of the agents must change with the environmental conditions, for example when avoiding obstacles, operating on tracks of different sizes and shapes, or moving transport objects of different sizes and shapes. Reinforcement learning is well suited to formation change environments, but the complex formation control process requires a long learning time. This paper proposes using the Deep Dyna-Q algorithm to speed up the learning process while improving the formation achievement rate by tuning the parameters of the Deep Dyna-Q algorithm. Even though the Deep Dyna-Q algorithm has been used in many applications, it had not previously been applied in an actual experiment. The contribution of this paper is the application of the Deep Dyna-Q algorithm to formation control in both simulations and actual experiments. This study successfully implements the proposed method and investigates formation control in simulations and actual experiments. In the actual experiments, the Nexus robot with a robot operating system (ROS) was used. To verify the communication between the PC and the robots, the camera processing, and the motor controllers, the velocities from the simulation were given directly to the robots. The simulations used the same goal points as the actual experiments, so the simulation results closely approach the experimental results. The discount rate and learning rate values affected the formation change achievement rate, the number of collisions among agents, and the number of collisions between agents and transport objects. In the learning rate comparison, DDQ (0.01) consistently outperformed DQN: DQN reached the maximum −170 reward in about 130,000 episodes, while DDQ (0.01) reached this value in 58,000 episodes and achieved a maximum reward of −160. The application of an MEC (model error compensator) in the actual experiment successfully reduced the movement error of the robots, so that the robots could perform the formation change appropriately.
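For readers who want the shape of a Deep Dyna-Q iteration, the sketch below pairs a DQN-style TD update with extra planning updates drawn from a transition buffer; the network size, hyperparameters, and the buffer-as-model simplification are assumptions, and the paper's multi-robot formation setup is not reproduced.

```python
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))
    def forward(self, x):
        return self.net(x)

def td_update(qnet, optimizer, batch, gamma=0.99):
    """One DQN-style TD update on (s, a, r, s2, done) tuples of plain lists/numbers."""
    s, a, r, s2, done = map(torch.tensor, zip(*batch))
    q = qnet(s.float()).gather(1, a.long().view(-1, 1)).squeeze(1)
    with torch.no_grad():
        target = r.float() + gamma * (1 - done.float()) * qnet(s2.float()).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def deep_dyna_q_step(qnet, optimizer, model_buffer, real_transition, planning_steps=5):
    """Direct RL on the real transition, then planning on transitions drawn from the model.
    Here the 'model' is simply a buffer of experienced transitions; a full Deep Dyna-Q
    would instead sample from a learned environment model."""
    model_buffer.append(real_transition)
    td_update(qnet, optimizer, [real_transition])                  # learning
    for _ in range(planning_steps):                                # planning
        td_update(qnet, optimizer, [random.choice(model_buffer)])
```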

17 pages, 551 KB  
Article
Improved Dyna-Q: A Reinforcement Learning Method Focused via Heuristic Graph for AGV Path Planning in Dynamic Environments
by Yiyang Liu, Shuaihua Yan, Yang Zhao, Chunhe Song and Fei Li
Drones 2022, 6(11), 365; https://doi.org/10.3390/drones6110365 - 19 Nov 2022
Cited by 17 | Viewed by 6350
Abstract
Dyna-Q is a reinforcement learning method widely used in AGV path planning. However, in large, complex, dynamic environments, the sparse reward function of Dyna-Q and the large search space give this method low search efficiency, slow convergence, and even an inability to converge, which seriously reduces its performance and practicability. To solve these problems, this paper proposes an Improved Dyna-Q algorithm for AGV path planning in large, complex, dynamic environments. First, to address the large search space, this paper proposes a global path guidance mechanism based on a heuristic graph, which effectively reduces the path search space and thus improves the efficiency of obtaining the optimal path. Second, to address the sparse reward function in Dyna-Q, this paper proposes a novel dynamic reward function and an action selection method based on the heuristic graph, which provide denser feedback and more efficient action decisions for AGV path planning, effectively improving the convergence of the algorithm. We evaluated our approach in scenarios with static obstacles and with dynamic obstacles. The experimental results show that the proposed algorithm obtains better paths more efficiently than other reinforcement-learning-based methods, including the classical Q-Learning and Dyna-Q algorithms.
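A common way to realize the heuristic-graph guidance described above is a BFS distance map used for potential-based reward shaping; the toy grid and shaping function below illustrate that idea under assumed details, not the paper's exact mechanism.

```python
from collections import deque

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def heuristic_graph(grid, goal):
    """BFS distances from every free cell to the goal (the 'heuristic graph')."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES:
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in dist):
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist

def shaped_reward(base_reward, dist, s, s2, gamma=0.95):
    """Potential-based shaping: denser feedback the closer the AGV gets to the goal."""
    phi = lambda cell: -dist.get(cell, float("inf"))
    return base_reward + gamma * phi(s2) - phi(s)

# Toy 4x4 map: 0 = free, 1 = obstacle; goal in the corner.
grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 1, 0, 0]]
dist = heuristic_graph(grid, goal=(3, 3))
print(shaped_reward(-1.0, dist, s=(0, 0), s2=(0, 1)))
```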

11 pages, 2808 KB  
Article
Anti-Jamming Path Selection Method in a Wireless Communication Network Based on Dyna-Q
by Guoliang Zhang, Yonggui Li, Yingtao Niu and Quan Zhou
Electronics 2022, 11(15), 2397; https://doi.org/10.3390/electronics11152397 - 31 Jul 2022
Cited by 8 | Viewed by 2377
Abstract
To efficiently establish the optimal transmission path in a wireless communication network under malicious jamming, this paper proposes an anti-jamming algorithm based on Dyna-Q. Based on previous observations of the environment, the algorithm selects the optimal next node by searching the Q table, which reduces the packet loss rate. Because the algorithm accelerates the updating of the Q table using previous experience, the Q table converges to the optimal values quickly, which benefits the selection of subsequent nodes. Simulation results show that the proposed algorithm converges faster than the model-free reinforcement learning algorithm.
(This article belongs to the Section Networks)
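The node-selection step described above can be sketched as epsilon-greedy next-hop choice over a Q table keyed by (current node, candidate node); the topology, reward, and the omission of the Dyna-Q planning loop are simplifications for illustration.

```python
import random

def next_hop(Q, node, neighbors, epsilon=0.1):
    """Pick the next relay node from the current node's neighbors via epsilon-greedy."""
    if random.random() < epsilon:
        return random.choice(neighbors[node])
    return max(neighbors[node], key=lambda n: Q.get((node, n), 0.0))

def update_path_q(Q, node, chosen, reward, neighbors, alpha=0.1, gamma=0.9):
    """Q-learning update where the 'actions' are next-hop choices along the route."""
    best_next = max((Q.get((chosen, n), 0.0) for n in neighbors.get(chosen, [])), default=0.0)
    old = Q.get((node, chosen), 0.0)
    Q[(node, chosen)] = old + alpha * (reward + gamma * best_next - old)

# Hypothetical topology: source 0, destination 3; reward penalizes jammed/lossy hops.
neighbors = {0: [1, 2], 1: [3], 2: [3], 3: []}
Q = {}
hop = next_hop(Q, 0, neighbors)
update_path_q(Q, 0, hop, reward=-0.2, neighbors=neighbors)
```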

17 pages, 7124 KB  
Article
Reinforcement Learning Approach to Design Practical Adaptive Control for a Small-Scale Intelligent Vehicle
by Bo Hu, Jiaxi Li, Jie Yang, Haitao Bai, Shuang Li, Youchang Sun and Xiaoyu Yang
Symmetry 2019, 11(9), 1139; https://doi.org/10.3390/sym11091139 - 7 Sep 2019
Cited by 24 | Viewed by 5743
Abstract
Reinforcement learning (RL)-based techniques have been employed for the tracking and adaptive cruise control of a small-scale vehicle, with the aim of transferring the obtained knowledge to a full-scale intelligent vehicle in the near future. Unlike most other control techniques, the purpose of this study is to seek a practical method that enables the vehicle, in a real environment and in real time, to learn the control behavior on its own while adapting to changing circumstances. In this context, it is necessary to design an algorithm that symmetrically considers both time efficiency and accuracy. Meanwhile, in order to realize adaptive cruise control specifically, a set of symmetrical control actions consisting of steering angle and vehicle speed needs to be optimized simultaneously. In this paper, the experimental setup of the small-scale intelligent vehicle is first introduced. Subsequently, three model-free RL algorithms are implemented to develop, and ultimately form, the strategy that keeps the vehicle within its lane at a constant, top velocity. Furthermore, a model-based RL strategy that combines learning from real experience with planning from simulated experience is compared. Finally, a Q-learning-based adaptive cruise control strategy is integrated into the existing tracking control architecture to allow the vehicle to slow down in curves and accelerate on straightaways. The experimental results show that the Q-learning and Sarsa(λ) algorithms achieve better tracking behavior than conventional Sarsa, and Q-learning outperforms Sarsa(λ) in terms of computational complexity. The Dyna-Q method performs similarly to the Sarsa(λ) algorithm, but with a significant reduction in computational time. Compared with a fine-tuned proportional-integral-derivative (PID) controller, the well-balanced Q-learning controller performs better and can also be easily applied to control problems with more than one control action.
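Since the study compares Sarsa, Sarsa(λ), Q-learning, and Dyna-Q, a reference form of tabular Sarsa(λ) with replacing eligibility traces is sketched below; the environment interface and hyperparameters are assumptions, not the authors' vehicle setup.

```python
import random
from collections import defaultdict

def sarsa_lambda(env, episodes=100, alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1):
    """Tabular Sarsa(lambda) with replacing eligibility traces."""
    Q = defaultdict(float)

    def policy(s):
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        traces = defaultdict(float)
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = policy(s2)
            delta = r + (0.0 if done else gamma * Q[(s2, a2)]) - Q[(s, a)]
            traces[(s, a)] = 1.0                     # replacing trace
            for key in list(traces):
                Q[key] += alpha * delta * traces[key]
                traces[key] *= gamma * lam           # decay all traces
            s, a = s2, a2
    return Q
```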

19 pages, 4875 KB  
Article
A Dyna-Q-Based Solution for UAV Networks Against Smart Jamming Attacks
by Zhiwei Li, Yu Lu, Yun Shi, Zengguang Wang, Wenxin Qiao and Yicen Liu
Symmetry 2019, 11(5), 617; https://doi.org/10.3390/sym11050617 - 2 May 2019
Cited by 25 | Viewed by 5652
Abstract
Unmanned aerial vehicle (UAV) networks have a wide range of applications, such as in the Internet of Things (IoT), 5G communications, and so forth. However, communications between UAVs and from UAVs to ground control stations mainly use radio channels, and these communications are therefore vulnerable to cyberattacks. With the advent of software-defined radio (SDR), smart attacks that can flexibly select attack strategies according to the defender’s state information are gradually attracting the attention of researchers and of potential attackers of UAV networks. A smart attack can even induce the defender to take a specific defense strategy, causing even greater damage. Inspired by symmetrical thinking, a solution that uses a software-defined network (SDN) to combat software-defined radio is proposed. We propose a network architecture with dual controllers, a UAV flight controller and an SDN controller, to achieve collaborative decision-making. Built on top of the SDN, the state information of the whole network converges quickly and is fitted to an environment model used to develop an improved Dyna-Q-based reinforcement learning algorithm. The improved algorithm integrates the power allocation and track planning of the UAVs into a unified action space. The simulation data show that the proposed communication solution effectively avoids smart jamming attacks and has faster learning efficiency and higher convergence performance than the compared algorithms.
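The unified action space mentioned above can be illustrated by crossing discretized power levels with candidate waypoints; the values below are placeholders, not the paper's discretization.

```python
from itertools import product

# Hypothetical discretization (assumed values): transmit power levels and waypoints.
POWER_LEVELS = [0.1, 0.5, 1.0]                      # watts
WAYPOINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]        # grid cells on the planned track
ACTIONS = list(product(POWER_LEVELS, WAYPOINTS))    # unified (power, waypoint) action space

def greedy_joint_action(Q, state):
    """Select the (power, waypoint) pair with the highest learned value in this state."""
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

Q = {(("uav_state",), (1.0, (1, 1))): 0.7}          # toy Q-table entry
print(greedy_joint_action(Q, ("uav_state",)))       # -> (1.0, (1, 1))
```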

18 pages, 783 KB  
Article
Reinforcement Learning–Based Energy Management Strategy for a Hybrid Electric Tracked Vehicle
by Teng Liu, Yuan Zou, Dexing Liu and Fengchun Sun
Energies 2015, 8(7), 7243-7260; https://doi.org/10.3390/en8077243 - 16 Jul 2015
Cited by 96 | Viewed by 10256
Abstract
This paper presents a reinforcement learning (RL)–based energy management strategy for a hybrid electric tracked vehicle. A control-oriented model of the powertrain and vehicle dynamics is first established. Based on sample information from the experimental driving schedule, statistical characteristics at various velocities are determined by extracting the transition probability matrix of the power request. Two RL algorithms, namely the Q-learning and Dyna algorithms, are applied to generate optimal control solutions. The two algorithms are simulated on the same driving schedule, and the simulation results are compared to clarify the merits and demerits of these algorithms. Although the Q-learning algorithm is faster (3 h) than the Dyna algorithm (7 h), its fuel consumption is 1.7% higher than that of the Dyna algorithm. Furthermore, the Dyna algorithm registers approximately the same fuel consumption as the dynamic programming–based global optimal solution, while its computational cost is substantially lower than that of stochastic dynamic programming.
(This article belongs to the Special Issue Advances in Plug-in Hybrid Vehicles and Hybrid Vehicles)
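The transition-probability-matrix extraction described above can be sketched as binning the power-request trace and counting bin-to-bin transitions; the synthetic trace and bin count below are assumptions for illustration.

```python
import numpy as np

def power_request_tpm(power_trace, n_bins=10):
    """Estimate a Markov transition probability matrix from a sampled power-request trace."""
    edges = np.linspace(power_trace.min(), power_trace.max(), n_bins + 1)
    states = np.clip(np.digitize(power_trace, edges) - 1, 0, n_bins - 1)
    counts = np.zeros((n_bins, n_bins))
    for s, s2 in zip(states[:-1], states[1:]):
        counts[s, s2] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Synthetic power-request trace (kW) standing in for the experimental driving schedule.
trace = np.abs(np.cumsum(np.random.default_rng(1).normal(size=500)))
P = power_request_tpm(trace)
print(P.shape, P.sum(axis=1)[:3])   # each non-empty row sums to 1
```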
