Search Results (16)

Search Parameters:
Keywords = ϵ-greedy strategies

25 pages, 7158 KiB  
Article
Anti-Jamming Decision-Making for Phased-Array Radar Based on Improved Deep Reinforcement Learning
by Hang Zhao, Hu Song, Rong Liu, Jiao Hou and Xianxiang Yu
Electronics 2025, 14(11), 2305; https://doi.org/10.3390/electronics14112305 - 5 Jun 2025
Viewed by 618
Abstract
In existing phased-array radar systems, anti-jamming strategies are mainly generated through manual judgment. However, manually designing or selecting anti-jamming decisions is often difficult and unreliable in complex jamming environments. Therefore, reinforcement learning is applied to anti-jamming decision-making to solve the above problems. However, existing anti-jamming decision-making models based on reinforcement learning often suffer from problems such as low convergence speed and low decision-making accuracy. In this paper, a multi-aspect improved deep Q-network (MAI-DQN) is proposed to improve the exploration policy, the network structure, and the training methods of the deep Q-network. To solve the problem of the ϵ-greedy strategy being highly dependent on hyperparameter settings, and of the Q-value being overly influenced by the action in other deep Q-networks, this paper proposes a structure that combines a noisy network, a dueling network, and a double deep Q-network, which incorporates an adaptive exploration policy into the neural network and increases the influence of the state itself on the Q-value. These enhancements enable a highly adaptive exploration strategy and a high-performance network architecture, thereby improving the decision-making accuracy of the model. To calculate the target value more accurately during training and improve the stability of the parameter updates, this paper proposes a training method that combines n-step learning, target soft updates, a variable learning rate, and gradient clipping. Moreover, a novel variable double-depth priority experience replay (VDDPER) method, which more accurately simulates the storage and update mechanism of human memory, is used in the MAI-DQN. VDDPER improves decision-making accuracy by dynamically adjusting the sample size based on the value of experiences during training, enhancing exploration during the early stages of training and placing greater emphasis on high-value experiences in the later stages. These enhancements to the training method improve the model's convergence speed. Moreover, a reward function combining signal-level and data-level benefits is proposed to adapt to complex jamming environments, which ensures a high reward convergence speed with fewer computational resources. The findings of a simulation experiment show that the proposed phased-array radar anti-jamming decision-making method based on MAI-DQN can achieve a high convergence speed and high decision-making accuracy in environments where deceptive jamming and suppressive jamming coexist.
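As a reading aid, here is a minimal PyTorch sketch of three of the pieces the MAI-DQN abstract names: a dueling head, double-DQN target selection with n-step returns, and target soft updates. This is not the authors' code; the layer sizes, τ, and n are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, n_states: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_states, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # advantages A(s, a)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.trunk(s)
        a = self.advantage(h)
        # Dueling aggregation: Q = V + (A - mean A), letting the state itself
        # influence Q independently of any single action.
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(online, target, r, s_next, gamma=0.99, n=3):
    # Double DQN with n-step returns: the online net picks the argmax action,
    # the target net evaluates it; r holds the summed discounted n-step reward.
    with torch.no_grad():
        a_star = online(s_next).argmax(dim=1, keepdim=True)
        return r + (gamma ** n) * target(s_next).gather(1, a_star).squeeze(1)

def soft_update(online, target, tau=0.005):
    # Target soft update: the target net slowly tracks the online weights.
    for p, tp in zip(online.parameters(), target.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * p.data)
```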

31 pages, 363 KiB  
Article
Dynamic Stepsize Techniques in DR-Submodular Maximization
by Yanfei Li, Min Li, Qian Liu and Yang Zhou
Mathematics 2025, 13(9), 1447; https://doi.org/10.3390/math13091447 - 28 Apr 2025
Viewed by 261
Abstract
The Diminishing-Return (DR)-submodular function maximization problem has garnered significant attention across various domains in recent years. Classic methods often employ continuous greedy or Frank–Wolfe approaches to tackle this problem; however, high iteration and subproblem solver complexity are typically required to control the approximation ratio effectively. In this paper, we introduce a strategy that employs a binary search to find the dynamic stepsize, integrating it into traditional algorithm frameworks to address problems with different constraint types. We demonstrate that algorithms using this dynamic stepsize strategy can achieve approximation ratios comparable to those using a fixed stepsize strategy. In the monotone case, the iteration complexity is O(‖F(0)‖₁ ϵ⁻¹), while in the non-monotone scenario it is O(n + ‖F(0)‖₁ ϵ⁻¹), where F denotes the objective function. We then apply this strategy to solving stochastic DR-submodular function maximization problems, obtaining corresponding iteration complexity results in a high-probability form. Furthermore, theoretical examples as well as numerical experiments validate that this stepsize selection strategy outperforms the fixed stepsize strategy.
(This article belongs to the Special Issue Optimization Theory, Method and Application, 2nd Edition)
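A minimal sketch of the dynamic-stepsize idea as described in the abstract: inside a Frank–Wolfe-style loop, binary-search the largest stepsize whose one-dimensional progress still meets a target gain. The acceptance test, tolerance, and function names are assumptions, not the paper's exact rule.

```python
import numpy as np

def binary_search_stepsize(F, x, d, gamma_max, target_gain, tol=1e-6):
    """Largest gamma in [0, gamma_max] with F(x + gamma*d) - F(x) >= gamma*target_gain."""
    lo, hi = 0.0, gamma_max
    base = F(x)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(x + mid * d) - base >= mid * target_gain:
            lo = mid        # enough progress: try a larger step
        else:
            hi = mid        # too ambitious: shrink the step
    return lo

# toy usage: a concave quadratic as a stand-in surrogate objective
F = lambda x: float(-x @ x + 4.0 * x.sum())
x0, direction = np.zeros(2), np.ones(2)
print(binary_search_stepsize(F, x0, direction, gamma_max=1.0, target_gain=1.0))
```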

33 pages, 1020 KiB  
Article
Reinforcement Q-Learning-Based Adaptive Encryption Model for Cyberthreat Mitigation in Wireless Sensor Networks
by Sreeja Balachandran Nair Premakumari, Gopikrishnan Sundaram, Marco Rivera, Patrick Wheeler and Ricardo E. Pérez Guzmán
Sensors 2025, 25(7), 2056; https://doi.org/10.3390/s25072056 - 26 Mar 2025
Cited by 1 | Viewed by 1194
Abstract
The increasing prevalence of cyber threats in wireless sensor networks (WSNs) necessitates adaptive and efficient security mechanisms to ensure robust data transmission while addressing resource constraints. This paper proposes a reinforcement learning-based adaptive encryption framework that dynamically scales encryption levels based on real-time network conditions and threat classification. The proposed model leverages a deep learning-based anomaly detection system to classify network states into low, moderate, or high threat levels, which guides encryption policy selection. The framework integrates dynamic Q-learning for optimizing energy efficiency in low-threat conditions and double Q-learning for robust security adaptation in high-threat environments. A Hybrid Policy Derivation Algorithm is introduced to balance encryption complexity and computational overhead by dynamically switching between these learning models. The proposed system is formulated as a Markov Decision Process (MDP), where encryption level selection is driven by a reward function that optimizes the trade-off between energy efficiency and security robustness. The adaptive learning strategy employs an ϵ-greedy exploration-exploitation mechanism with an exponential decay rate to enhance convergence in dynamic WSN environments. The model also incorporates a dynamic hyperparameter tuning mechanism that optimally adjusts learning rates and exploration parameters based on real-time network feedback. Experimental evaluations conducted in a simulated WSN environment demonstrate the effectiveness of the proposed framework, achieving a 30.5% reduction in energy consumption, a 92.5% packet delivery ratio (PDR), and a 94% mitigation efficiency against multiple cyberattack scenarios, including DDoS, black-hole, and data injection attacks. Additionally, the framework reduces latency by 37% compared to conventional encryption techniques, ensuring minimal communication delays. These results highlight the scalability and adaptability of reinforcement learning-driven adaptive encryption in resource-constrained networks, paving the way for real-world deployment in next-generation IoT and WSN applications.
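A small numpy sketch of two mechanisms the abstract names: ϵ-greedy exploration with exponential decay, and switching between plain Q-learning (low threat) and double Q-learning (high threat). The state and action encodings and all rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 16, 4          # assumed encodings of network state / encryption level
Q1 = np.zeros((n_states, n_actions))
Q2 = np.zeros((n_states, n_actions))

def epsilon(t, eps0=1.0, eps_min=0.05, decay=1e-3):
    # exponential decay toward eps_min as training progresses
    return eps_min + (eps0 - eps_min) * np.exp(-decay * t)

def select_action(s, t):
    if rng.random() < epsilon(t):
        return int(rng.integers(n_actions))   # explore
    return int(np.argmax(Q1[s] + Q2[s]))      # exploit combined estimate

def update(s, a, r, s2, high_threat, alpha=0.1, gamma=0.95):
    if high_threat:  # double Q-learning: decouple selection and evaluation
        if rng.random() < 0.5:
            a_star = np.argmax(Q1[s2])
            Q1[s, a] += alpha * (r + gamma * Q2[s2, a_star] - Q1[s, a])
        else:
            a_star = np.argmax(Q2[s2])
            Q2[s, a] += alpha * (r + gamma * Q1[s2, a_star] - Q2[s, a])
    else:            # plain Q-learning is cheaper in low-threat conditions
        Q1[s, a] += alpha * (r + gamma * Q1[s2].max() - Q1[s, a])
```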

21 pages, 1004 KiB  
Article
A Histogram Publishing Method under Differential Privacy That Involves Balancing Small-Bin Availability First
by Jianzhang Chen, Shuo Zhou, Jie Qiu, Yixin Xu, Bozhe Zeng, Wanchuan Fang, Xiangying Chen, Yipeng Huang, Zhengquan Xu and Youqin Chen
Algorithms 2024, 17(7), 293; https://doi.org/10.3390/a17070293 - 4 Jul 2024
Viewed by 1633
Abstract
Differential privacy, a cornerstone of privacy-preserving techniques, plays an indispensable role in ensuring the secure handling and sharing of sensitive data analysis across domains such as census, healthcare, and social networks. Histograms, serving as a visually compelling tool for presenting analytical outcomes, are widely employed in these sectors. Currently, numerous algorithms for publishing histograms under differential privacy have been developed, striving to balance privacy protection with the provision of useful data. Nonetheless, the pivotal challenge of effectively enhancing the precision of small bins (intervals that are narrowly defined or contain a relatively small number of data points) within histograms has yet to receive adequate attention and in-depth investigation. In standard DP histogram publishing, adding noise without regard for bin size can result in small data bins being disproportionately influenced by noise, potentially severely impairing the overall accuracy of the histogram. In response to this challenge, this paper introduces the SReB_GCA sanitization algorithm, designed to enhance the accuracy of small bins in DP histograms. The SReB_GCA approach involves sorting the bins from smallest to largest and applying a greedy grouping strategy, with a predefined lower bound on the mean relative error required for a bin to be included in a group. Our theoretical analysis reveals that sorting bins in ascending order prior to grouping effectively prioritizes the accuracy of smaller bins. SReB_GCA ensures strict ϵ-DP compliance and strikes a careful balance between reconstruction error and noise error, thereby not only improving the accuracy of small bins but also approximately optimizing the mean relative error of the entire histogram. To validate the efficiency of our proposed SReB_GCA method, we conducted extensive experiments using four diverse datasets, including two real-life datasets and two synthetic ones. The experimental results, quantified by the Kullback–Leibler divergence (KLD), show that the SReB_GCA algorithm achieves a substantial performance enhancement compared to the baseline method (DP_BASE) and several other established approaches for differential-privacy histogram publication.
(This article belongs to the Section Randomized, Online, and Approximation Algorithms)
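A hedged sketch of the grouping idea as summarized above: sort bins ascending, greedily grow groups while a relative-error criterion holds, then publish one Laplace-noised mean per group. The grouping test and the noise scale are simplifying assumptions, not the exact SReB_GCA algorithm.

```python
import numpy as np

def greedy_group_publish(counts, eps=1.0, rel_err_bound=0.5, seed=0):
    rng = np.random.default_rng(seed)
    order = np.argsort(counts)                # smallest bins first
    groups, current = [], [order[0]]
    for i in order[1:]:
        mean = np.mean([counts[j] for j in current + [i]])
        # keep growing the group while bins stay within the relative bound
        if mean > 0 and abs(counts[i] - mean) / mean <= rel_err_bound:
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    out = np.zeros_like(counts, dtype=float)
    for g in groups:                          # one noisy mean per group
        noisy = np.mean(counts[g]) + rng.laplace(scale=1.0 / eps)  # simplified sensitivity
        out[g] = noisy
    return out

print(greedy_group_publish(np.array([2, 3, 50, 52, 400]), eps=1.0))
```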

34 pages, 76174 KiB  
Article
Cooperative Multi-Agent Reinforcement Learning for Data Gathering in Energy-Harvesting Wireless Sensor Networks
by Efi Dvir, Mark Shifrin and Omer Gurewitz
Mathematics 2024, 12(13), 2102; https://doi.org/10.3390/math12132102 - 4 Jul 2024
Cited by 8 | Viewed by 2345
Abstract
This study introduces a novel approach to data gathering in energy-harvesting wireless sensor networks (EH-WSNs) utilizing cooperative multi-agent reinforcement learning (MARL). In addressing the challenges of efficient data collection in resource-constrained WSNs, we propose and examine a decentralized, autonomous communication framework where sensors function as individual agents. These agents employ an extended version of the Q-learning algorithm, tailored for a multi-agent setting, enabling independent learning and adaptation of their data transmission strategies. We introduce a specialized ϵ-p-greedy exploration method well suited to multi-agent settings. The key objective of our approach is the maximization of report flow, aligning with specific applicative goals for these networks. Our model operates under varying energy constraints and dynamic environments, with each sensor making decisions based on interactions within the network, devoid of explicit inter-sensor communication. The focus is on optimizing the frequency and efficiency of data report delivery to a central collection point, taking into account the unique attributes of each sensor. Notably, our findings present a surprising result: despite the known challenges of Q-learning in MARL, such as non-stationarity and the lack of guaranteed convergence to optimality due to multi-agent related pathologies, the cooperative nature of the MARL protocol in our study achieves high network performance. We present simulations and analyze key aspects contributing to coordination in various scenarios. A noteworthy feature of our system is its perpetual learning capability, which fosters network adaptiveness in response to changes such as sensor malfunctions or new sensor integrations. This dynamic adaptability ensures sustained and effective resource utilization, even as network conditions evolve. Our research lays the groundwork for learning-based WSNs and offers vital insights into the application of MARL in real-world EH-WSN scenarios, underscoring its effectiveness in navigating the intricate challenges of large-scale, resource-limited sensor networks.
(This article belongs to the Special Issue Markov Decision Processes with Applications)
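A sketch of the decentralized setting the abstract describes: each sensor is an independent tabular Q-learner with no inter-sensor messaging. The ϵ-p-greedy rule is the paper's own variant and is not defined in the abstract, so a plain ϵ-greedy stand-in is used here; everything else is an illustrative assumption.

```python
import numpy as np

class SensorAgent:
    """One EH-WSN sensor acting as an independent tabular Q-learner."""
    def __init__(self, n_obs, n_act, eps=0.1, alpha=0.1, gamma=0.9, seed=0):
        self.Q = np.zeros((n_obs, n_act))
        self.eps, self.alpha, self.gamma = eps, alpha, gamma
        self.rng = np.random.default_rng(seed)

    def act(self, obs):
        if self.rng.random() < self.eps:                 # stand-in for ϵ-p-greedy
            return int(self.rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[obs]))

    def learn(self, obs, act, reward, next_obs):
        td = reward + self.gamma * self.Q[next_obs].max() - self.Q[obs, act]
        self.Q[obs, act] += self.alpha * td

# agents learn independently; coordination emerges only through the rewards
agents = [SensorAgent(n_obs=8, n_act=2, seed=i) for i in range(5)]
```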

14 pages, 18567 KiB  
Article
Toward Energy-Efficient Routing of Multiple AGVs with Multi-Agent Reinforcement Learning
by Xianfeng Ye, Zhiyun Deng, Yanjun Shi and Weiming Shen
Sensors 2023, 23(12), 5615; https://doi.org/10.3390/s23125615 - 15 Jun 2023
Cited by 15 | Viewed by 4071
Abstract
This paper presents a multi-agent reinforcement learning (MARL) algorithm to address the scheduling and routing problems of multiple automated guided vehicles (AGVs), with the goal of minimizing overall energy consumption. The proposed algorithm is developed based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, with modifications made to the action and state space to fit the setting of AGV activities. While previous studies overlooked the energy efficiency of AGVs, this paper develops a well-designed reward function that helps to optimize the overall energy consumption required to fulfill all tasks. Moreover, we incorporate the ϵ-greedy exploration strategy into the proposed algorithm to balance exploration and exploitation during training, which helps it converge faster and achieve better performance. The proposed MARL algorithm is equipped with carefully selected parameters that aid in avoiding obstacles, speeding up path planning, and achieving minimal energy consumption. To demonstrate the effectiveness of the proposed algorithm, numerical experiments were conducted comparing three methods: ϵ-greedy MADDPG, MADDPG, and Q-learning. The results show that the proposed algorithm can effectively solve the multi-AGV task assignment and path planning problems, and the energy consumption results show that the planned routes can effectively improve energy efficiency.
(This article belongs to the Special Issue Internet of Things, Big Data and Smart Systems II)
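A minimal sketch of bolting ϵ-greedy exploration onto a trained actor's action scores, as the abstract describes for the MADDPG-based router. The actor interface and the discrete action set are illustrative assumptions (the abstract says the action space was modified to fit AGV activities).

```python
import numpy as np

def epsilon_greedy_action(actor, obs, n_actions, eps, rng):
    """With probability eps take a random routing action, else trust the actor."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(actor(obs)))  # actor is assumed to return per-action scores

rng = np.random.default_rng(0)
actor = lambda obs: obs @ np.ones((4, 5))  # toy stand-in for a trained network
print(epsilon_greedy_action(actor, np.ones(4), n_actions=5, eps=0.1, rng=rng))
```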

24 pages, 10648 KiB  
Article
Slicing Resource Allocation Based on Dueling DQN for eMBB and URLLC Hybrid Services in Heterogeneous Integrated Networks
by Geng Chen, Rui Shao, Fei Shen and Qingtian Zeng
Sensors 2023, 23(5), 2518; https://doi.org/10.3390/s23052518 - 24 Feb 2023
Cited by 8 | Viewed by 3493
Abstract
In 5G/B5G communication systems, network slicing is utilized to tackle the problem of allocating network resources for diverse services with changing demands. We propose an algorithm that prioritizes the characteristic requirements of two different services and tackles the problem of allocation and scheduling of resources in a hybrid-services system with eMBB and URLLC. Firstly, resource allocation and scheduling are modeled, subject to the rate and delay constraints of both services. Secondly, a dueling deep Q network (Dueling DQN) is adopted to solve the formulated non-convex optimization problem, in which a resource scheduling mechanism and the ϵ-greedy strategy are utilized to select the optimal resource allocation action. Moreover, a reward-clipping mechanism is introduced to enhance the training stability of Dueling DQN. Meanwhile, we choose a suitable bandwidth allocation resolution to increase flexibility in resource allocation. Finally, the simulations indicate that the proposed Dueling DQN algorithm performs excellently in terms of quality of experience (QoE), spectrum efficiency (SE), and network utility, and the scheduling mechanism makes the performance much more stable. Compared with Q-learning, DQN, and Double DQN, the proposed algorithm based on Dueling DQN improves network utility by 11%, 8%, and 2%, respectively.
(This article belongs to the Special Issue Smart Mobile and Sensing Applications)
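A tiny sketch of the reward-clipping mechanism mentioned above: bounding each reward before the Q-update so a single extreme network-utility sample cannot destabilize training. The clipping bounds are assumptions.

```python
import numpy as np

def clip_reward(r, lo=-1.0, hi=1.0):
    # bound the raw per-step utility before it enters the TD target
    return float(np.clip(r, lo, hi))

rewards = [0.3, 7.2, -12.5]               # raw per-step network utilities
print([clip_reward(r) for r in rewards])  # -> [0.3, 1.0, -1.0]
```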

24 pages, 2680 KiB  
Article
A Mapless Local Path Planning Approach Using Deep Reinforcement Learning Framework
by Yan Yin, Zhiyu Chen, Gang Liu and Jianwei Guo
Sensors 2023, 23(4), 2036; https://doi.org/10.3390/s23042036 - 10 Feb 2023
Cited by 21 | Viewed by 5489
Abstract
The key modules for autonomous mobile robots are path planning and obstacle avoidance. Global path planning based on known maps has been effectively achieved. Local path planning in unknown dynamic environments is still very challenging due to the lack of detailed environmental information and unpredictability. This paper proposes an end-to-end local path planner, n-step dueling double DQN with reward-based ϵ-greedy (RND3QN), based on a deep reinforcement learning framework, which acquires environmental data from LiDAR as input and uses a neural network to fit Q-values to output the corresponding discrete actions. The bias is reduced using n-step bootstrapping based on the deep Q-network (DQN). The ϵ-greedy exploration-exploitation strategy is improved with the reward value as a measure of exploration, and an auxiliary reward function is introduced to increase the reward distribution of the sparse reward environment. Simulation experiments are conducted in the Gazebo simulator to test the algorithm's effectiveness. The experimental data demonstrate that the average total reward value of RND3QN is higher than that of algorithms such as dueling double DQN (D3QN), and the success rates are increased by 174%, 65%, and 61% over D3QN in three stages, respectively. We experimented on the TurtleBot3 Waffle Pi robot, and the strategies learned in simulation can be effectively transferred to the real robot.
(This article belongs to the Topic Advances in Mobile Robotics Navigation)
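A hedged sketch of the reward-based ϵ idea: explore more while recent rewards are poor, less once they improve. The mapping from mean recent reward to ϵ is an illustrative assumption, not the paper's exact rule.

```python
from collections import deque
import numpy as np

class RewardBasedEpsilon:
    def __init__(self, window=100, eps_min=0.05, eps_max=1.0):
        self.history = deque(maxlen=window)   # recent rewards
        self.eps_min, self.eps_max = eps_min, eps_max

    def update(self, reward):
        self.history.append(reward)

    def value(self, r_lo=-1.0, r_hi=1.0):
        if not self.history:
            return self.eps_max
        # normalize mean recent reward into [0, 1], then invert it:
        # high reward -> low epsilon (exploit), low reward -> explore
        frac = (np.mean(self.history) - r_lo) / (r_hi - r_lo)
        frac = float(np.clip(frac, 0.0, 1.0))
        return self.eps_max - frac * (self.eps_max - self.eps_min)

sched = RewardBasedEpsilon()
sched.update(0.8)
print(sched.value())   # epsilon shrinks as rewards improve
```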

26 pages, 636 KiB  
Article
Influence Maximization under Fairness Budget Distribution in Online Social Networks
by Bich-Ngan T. Nguyen, Phuong N. H. Pham, Van-Vang Le and Václav Snášel
Mathematics 2022, 10(22), 4185; https://doi.org/10.3390/math10224185 - 9 Nov 2022
Cited by 5 | Viewed by 2631
Abstract
In social influence analysis, viral marketing, and other fields, the influence maximization problem is a fundamental one with critical applications and has attracted many researchers in the last decades. This problem asks to find a k-size seed set with the largest expected influence spread size. Our paper studies the problem of fairness budget distribution in influence maximization, aiming to find a seed set of size k fairly disseminated in target communities. Each community has certain lower and upper bounded budgets, and the number of each community's elements selected into the seed set must respect these bounds. Nevertheless, resolving this problem encounters two main challenges: strongly influential seed sets might not adhere to the fairness constraint, and it is an NP-hard problem. To address these shortcomings, we propose three algorithms (FBIM1, FBIM2, and FBIM3). These algorithms combine an improved greedy strategy for selecting seeds to ensure maximum coverage with the fairness constraints by generating samples through a Reverse Influence Sampling framework. Our algorithms provide a (1/2 − ϵ)-approximation of the optimal solution, and require O(kT log((8 + 2ϵ)n(ln(2/δ) + ln C(n,k))/ϵ²)), O(kT log n/(ϵ²k)), and O((T/ϵ) log(k/ϵ) log n/(ϵ²k)) complexity, respectively. We conducted experiments on real social networks. The results show that our proposed algorithms are highly scalable while satisfying theoretical guarantees, and that the coverage ratios with respect to the target communities are larger than those of the state-of-the-art alternatives; there are even cases in which our algorithms reach 100% coverage with respect to target communities. In addition, our algorithms are feasible and effective even in cases involving big data; in particular, the results of the algorithms guarantee fairness constraints.
(This article belongs to the Special Issue Complex Network Modeling: Theory and Applications)
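A sketch of greedy seed selection under per-community lower and upper budget bounds, the constraint structure described above. The RR-set coverage oracle is abstracted into a gain function; everything else is an illustrative assumption, not the FBIM algorithms themselves.

```python
def fair_greedy(nodes, community, lower, upper, k, gain):
    seeds, used = [], {c: 0 for c in lower}
    while len(seeds) < k:
        deficit = {c for c in lower if used[c] < lower[c]}
        best = None
        for v in nodes:
            c = community[v]
            if v in seeds or used[c] >= upper[c]:
                continue
            # while some community is below its lower bound, only its
            # members are eligible, which enforces fairness first
            if deficit and c not in deficit:
                continue
            if best is None or gain(seeds, v) > gain(seeds, best):
                best = v
        if best is None:
            break
        seeds.append(best)
        used[community[best]] += 1
    return seeds

# toy usage with a unit marginal-gain oracle
nodes = [0, 1, 2, 3]
community = {0: "a", 1: "a", 2: "b", 3: "b"}
print(fair_greedy(nodes, community, {"a": 1, "b": 1}, {"a": 2, "b": 2}, k=3,
                  gain=lambda S, v: 1.0))
```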

20 pages, 3941 KiB  
Article
Reinforcement Learning for Compressed-Sensing Based Frequency Agile Radar in the Presence of Active Interference
by Shanshan Wang, Zheng Liu, Rong Xie and Lei Ran
Remote Sens. 2022, 14(4), 968; https://doi.org/10.3390/rs14040968 - 16 Feb 2022
Cited by 12 | Viewed by 2911
Abstract
Compressed sensing (CS)-based frequency agile radar (FAR) is attractive due to its superior data rate and target measurement performance. However, traditional frequency strategies for CS-based FAR are not cognitive enough to adapt well to the increasingly severe active interference environment. In this paper, we propose a cognitive frequency design method for CS-based FAR using reinforcement learning (RL). Specifically, we formulate the frequency design of CS-based FAR as a model-free partially observable Markov decision process (POMDP) to cope with the non-cooperation of the active interference environment. Then, a recognizer-based belief state computing method is proposed to relieve the storage and computation burdens in solving the model-free POMDP. This method is independent of environmental knowledge and robust to the sensing scenario. Finally, a double deep Q-network-based method whose exploration strategy integrates the CS-based recovery metric into the ϵ-greedy strategy (DDQN-CSR-ϵ-greedy) is proposed to solve the model-free POMDP. This achieves better target measurement performance while avoiding active interference compared to the existing techniques. A number of examples are presented to demonstrate the effectiveness and advantages of the proposed design.
(This article belongs to the Topic Artificial Intelligence in Sensors)
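One plausible reading of DDQN-CSR-ϵ-greedy, sketched below: on exploration steps, candidate frequencies are sampled in proportion to a CS recovery-quality score rather than uniformly. The softmax weighting is an assumption; the abstract only states that the metric is integrated into the ϵ-greedy strategy.

```python
import numpy as np

def csr_epsilon_greedy(q_values, recovery_scores, eps, rng):
    if rng.random() < eps:
        # explore, but bias toward frequencies with better CS recovery quality
        w = np.exp(recovery_scores - recovery_scores.max())
        return int(rng.choice(len(q_values), p=w / w.sum()))
    return int(np.argmax(q_values))            # exploit the learned Q-values

rng = np.random.default_rng(0)
print(csr_epsilon_greedy(np.array([0.1, 0.9, 0.3]),
                         np.array([0.2, 0.1, 0.7]), eps=0.3, rng=rng))
```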

15 pages, 2501 KiB  
Article
An Improved Dueling Deep Double-Q Network Based on Prioritized Experience Replay for Path Planning of Unmanned Surface Vehicles
by Zhengwei Zhu, Can Hu, Chenyang Zhu, Yanping Zhu and Yu Sheng
J. Mar. Sci. Eng. 2021, 9(11), 1267; https://doi.org/10.3390/jmse9111267 - 13 Nov 2021
Cited by 25 | Viewed by 5739
Abstract
Unmanned surface vehicles (USVs) have broad application prospects, and autonomous path planning, as a crucial enabling technology, has developed into an active research direction in the USV field. This paper proposes an Improved Dueling Deep Double-Q Network Based on Prioritized Experience Replay (IPD3QN) to address the slow and unstable convergence of traditional deep Q-network (DQN) algorithms in the autonomous path planning of USVs. Firstly, we use the deep double Q-network to decouple action selection from target Q-value calculation to eliminate overestimation. The prioritized experience replay method is adopted to extract experience samples from the experience replay unit, increasing the utilization rate of actual samples and accelerating the training speed of the neural network. Then, the neural network is optimized by introducing a dueling network structure. Finally, the soft update method is used to improve the stability of the algorithm, and the dynamic ϵ-greedy method is used to find the optimal strategy. The experiments are first conducted on the OpenAI Gym test platform to pre-validate the algorithm on two classical control problems: the CartPole and MountainCar problems. The impact of algorithm hyperparameters on model performance is analyzed in detail. The algorithm is then validated in a maze environment. The comparative analysis of simulation experiments shows that IPD3QN achieves a significant improvement in learning performance regarding convergence speed and convergence stability compared with DQN, D3QN, PD2QN, PDQN, and PD3QN. Also, with the IPD3QN algorithm, the USV can plan the optimal path according to the actual navigation environment.
(This article belongs to the Special Issue Artificial Intelligence in Marine Science and Engineering)
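A minimal sketch of proportional prioritized experience replay, one ingredient of IPD3QN: transitions are sampled with probability proportional to their TD-error priority. The capacity, priority exponent, and priority floor are illustrative assumptions.

```python
import numpy as np

class PrioritizedReplay:
    def __init__(self, capacity=10000, alpha=0.6, seed=0):
        self.data, self.prio = [], []
        self.capacity, self.alpha = capacity, alpha
        self.rng = np.random.default_rng(seed)

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:        # drop the oldest transition
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(transition)
        # small floor keeps zero-error transitions sampleable
        self.prio.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        p = np.asarray(self.prio)
        p = p / p.sum()                            # proportional sampling
        idx = self.rng.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx], idx
```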

22 pages, 4240 KiB  
Article
Exploring Reward Strategies for Wind Turbine Pitch Control by Reinforcement Learning
by Jesús Enrique Sierra-García and Matilde Santos
Appl. Sci. 2020, 10(21), 7462; https://doi.org/10.3390/app10217462 - 23 Oct 2020
Cited by 29 | Viewed by 4053
Abstract
In this work, a pitch controller of a wind turbine (WT) inspired by reinforcement learning (RL) is designed and implemented. The control system consists of a state estimator, a reward strategy, a policy table, and a policy update algorithm. Novel reward strategies related to the energy deviation from the rated power are defined. They are designed to improve the efficiency of the WT. Two new categories of reward strategies are proposed: "only positive" (O-P) and "positive-negative" (P-N) rewards. The relationship of these categories with the exploration-exploitation dilemma, the use of ϵ-greedy methods, and learning convergence is also introduced and linked to the WT control problem. In addition, an extensive analysis of the influence of the different rewards on controller performance and learning speed is carried out. The controller is compared with a proportional-integral-derivative (PID) regulator for the same small wind turbine, obtaining better results. The simulations show how the P-N rewards improve the performance of the controller, stabilize the output power around the rated power, and reduce the error over time.
(This article belongs to the Special Issue Intelligent Control in Industrial and Renewable Systems)
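A toy sketch of the two reward families described above: an "only positive" (O-P) reward that peaks at rated power and never goes negative, and a "positive-negative" (P-N) reward that additionally punishes large deviations. The shapes and the rated-power value are assumptions, not the paper's exact definitions.

```python
def op_reward(power, rated=5.0, width=1.0):
    # O-P: reward in [0, 1], zero once the deviation exceeds `width`
    return max(0.0, 1.0 - abs(power - rated) / width)

def pn_reward(power, rated=5.0, width=1.0):
    # P-N: reward in [-1, 1], negative for large deviations
    return 1.0 - 2.0 * min(abs(power - rated) / width, 1.0)

print(op_reward(4.6), pn_reward(4.6))   # -> 0.6 0.2 (with width=1.0)
```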

15 pages, 449 KiB  
Article
Latent Structure Matching for Knowledge Transfer in Reinforcement Learning
by Yi Zhou and Fenglei Yang
Future Internet 2020, 12(2), 36; https://doi.org/10.3390/fi12020036 - 13 Feb 2020
Cited by 1 | Viewed by 4367
Abstract
Reinforcement learning algorithms usually require a large number of empirical samples and exhibit slow convergence in practical applications. One solution is to introduce transfer learning: knowledge from well-learned source tasks can be reused to reduce sample requirements and accelerate the learning of target tasks. However, if an unmatched source task is selected, it will slow down or even disrupt the learning procedure. Therefore, it is very important for knowledge transfer to select appropriate source tasks that have a high degree of matching with target tasks. In this paper, a novel task matching algorithm is proposed to derive the latent structures of the value functions of tasks and align these structures for similarity estimation. Through latent structure matching, highly matched source tasks are selected effectively, from which knowledge is then transferred to give action advice and improve the exploration strategies of the target tasks. Experiments are conducted on a simulated navigation environment and the mountain car environment. The results illustrate the significant performance gain of the improved exploration strategy compared with the traditional ϵ-greedy exploration strategy. A theoretical proof is also given to verify the improvement of the exploration strategy based on latent structure matching.
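A hedged sketch of the selection step only: score each source task by the similarity of its value-function structure to the target's and pick the best match. Cosine similarity over flattened Q-tables is a stand-in; the paper's latent-structure derivation and alignment are more involved.

```python
import numpy as np

def best_source(target_Q, source_Qs):
    """Return the index of the most similar source task and all scores."""
    t = target_Q.ravel()
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    scores = [cos(t, s.ravel()) for s in source_Qs]
    return int(np.argmax(scores)), scores

Qt = np.eye(3)
print(best_source(Qt, [np.eye(3), np.ones((3, 3))]))  # picks the first source
```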

29 pages, 793 KiB  
Article
Competitive Influence Maximization within Time and Budget Constraints in Online Social Networks: An Algorithmic Approach
by Canh V. Pham, Hieu V. Duong, Huan X. Hoang and My T. Thai
Appl. Sci. 2019, 9(11), 2274; https://doi.org/10.3390/app9112274 - 1 Jun 2019
Cited by 16 | Viewed by 4048
Abstract
The Competitive Influence Maximization (CIM) problem, which seeks a seed set of nodes for a player or company to propagate its product's information while competitors conduct similar strategies, has received much attention recently due to its application in viral marketing. However, existing works neglect the fact that limited budgets and time constraints can play an important role in the competitive influence strategy of each company. In addition, based on the assumption that one of the competitors dominates the competitive influence process, the majority of prior studies indicate that the competitive influence function (objective function) is monotone and submodular. This led to the fact that CIM can be approximated within a factor of 1 − 1/e − ϵ by a greedy algorithm combined with a Monte Carlo simulation method. Unfortunately, in a more realistic scenario where there is fair competition among competitors, the objective function is no longer submodular. In this paper, we study a general case of the CIM problem, named the Budgeted Competitive Influence Maximization (BCIM) problem, which considers CIM with budget and time constraints under conditions of fair competition. We found that the objective function is neither submodular nor supermodular. Therefore, it cannot admit a greedy algorithm with an approximation ratio of 1 − 1/e. We propose Sandwich Approximation based on Polling-Based Approximation (SPBA), an approximation algorithm based on the sandwich framework and a polling-based method. Our experiments on real social network datasets showed the effectiveness and scalability of our algorithm, which outperformed other state-of-the-art methods. Specifically, our algorithm handles million-scale networks in only 1.5 min.
(This article belongs to the Section Computing and Artificial Intelligence)
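A compact sketch of the sandwich framework named above: run a greedy routine on a submodular upper bound and a submodular lower bound of the non-submodular objective, also on the objective itself, and keep whichever seed set scores best under the true objective. The greedy routine and bound functions are abstract placeholders, not the SPBA implementation.

```python
def sandwich(greedy, f, f_upper, f_lower, budget):
    """greedy(objective, budget) -> seed set; f, f_upper, f_lower -> float."""
    s_up = greedy(f_upper, budget)   # guarantee inherited from the upper bound
    s_lo = greedy(f_lower, budget)   # guarantee inherited from the lower bound
    s_f = greedy(f, budget)          # greedy directly on f (no guarantee alone)
    # compare all three candidates under the true objective and keep the best
    return max((s_up, s_lo, s_f), key=f)
```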

22 pages, 10960 KiB  
Article
Adaptive Object Tracking via Multi-Angle Analysis Collaboration
by Wanli Xue, Zhiyong Feng, Chao Xu, Zhaopeng Meng and Chengwei Zhang
Sensors 2018, 18(11), 3606; https://doi.org/10.3390/s18113606 - 24 Oct 2018
Cited by 2 | Viewed by 3343
Abstract
Although tracking research has achieved excellent performance from a mathematical perspective, it is still meaningful to analyze tracking problems from multiple perspectives. This motivation not only promotes the independence of tracking research but also increases the flexibility of practical applications. This paper presents a tracking framework based on multi-dimensional state–action space reinforcement learning, termed multi-angle analysis collaboration tracking (MACT). MACT comprises a basic tracking framework and a strategic framework that assists the former. In particular, the strategic framework is extensible and currently includes a feature selection strategy (FSS) and a movement trend strategy (MTS). These strategies are abstracted from a multi-angle analysis of tracking problems (the observer's attention and the object's motion). The content of the analysis corresponds to specific actions in the multi-dimensional action space. Concretely, the tracker, regarded as an agent, is trained with the Q-learning algorithm and an ϵ-greedy exploration strategy, where we adopt a customized reward function to encourage robust object tracking. Extensive comparative evaluations on the OTB50 benchmark demonstrate the effectiveness of the strategies and the MACT tracker's improvements in speed and accuracy.
