MDPI - Publisher of Open Access Journals

14 pages, 2101 KiB

Open Access Article

Policy-Based Reinforcement Learning Approach in Imperfect Information Card Game

by Kamil Chrustowski and Piotr Duch

Appl. Sci. 2025, 15(4), 2121; https://doi.org/10.3390/app15042121 - 17 Feb 2025

Cited by 1 | Viewed by 1212

Games provide an excellent testing ground for machine learning and artificial intelligence, offering diverse environments with strategic challenges and complex decision-making scenarios. This study seeks to design a self-learning artificially intelligent agent capable of playing the trick-taking stage of the popular card game Thousand, known for its complex bidding system and dynamic gameplay. Due to the game’s vast state space and strategic complexity, other artificial intelligence approaches, such as Monte Carlo Tree Search and Deep Counterfactual Regret Minimisation, are infeasible. To address these challenges, an enhanced version of the REINFORCE policy gradient algorithm is proposed. By introducing a score-related parameter β, designed to guide the learning process by prioritising valuable games, the proposed approach enhances policy updates and improves overall learning outcomes. Moreover, leveraging off-policy experience replay, along with importance weighting of the behavioural policy, enhanced training stability and reduced model variance. The proposed algorithm was applied to the trick-taking stage of the popular game Thousand Schnapsen in a two-player setup. Four distinct neural network models were explored to evaluate the performance of the proposed approach. A custom test suite of selected deals and tournament evaluations was employed to assess effectiveness. Comparisons were made against two benchmark strategies: a random strategy agent and an alpha-beta pruning tree search with varying search depths. The proposed algorithm achieved win rates exceeding 65% against the random agent, nearly 60% against alpha-beta pruning at a search depth of 6, and 55% against alpha-beta pruning at the maximum possible depth.

(This article belongs to the Special Issue Advancements and Applications in Reinforcement Learning)

25 pages, 5483 KiB

Open Access Article

Automated Negotiation Agents for Modeling Single-Peaked Bidders: An Experimental Comparison

by Fatemeh Hassanvand, Faria Nassiri-Mofakham and Katsuhide Fujita

Information 2024, 15(8), 508; https://doi.org/10.3390/info15080508 - 22 Aug 2024

Cited by 1 | Viewed by 1175

Abstract

During automated negotiations, intelligent software agents act based on the preferences of their proprietors, interdicting direct preference exposure. The agent can be armed with a component of an opponent’s modeling features to reduce the uncertainty in the negotiation, but how negotiating agents with a single-peaked preference direct our attention has not been considered. Here, we first investigate the proper representation of single-peaked preferences and implementation of single-peaked agents within bidder agents using different instances of general single-peaked functions. We evaluate the modeling of single-peaked preferences and bidders in automated negotiating agents. Through experiments, we reveal that most of the opponent models can model our benchmark single-peaked agents with similar efficiencies. However, the accuracies differ among the models and in different rival batches. The perceptron-based P1 model obtained the highest accuracy, and the frequency-based model Randomdance outperformed the other competitors in most other performance measures.

(This article belongs to the Special Issue Intelligent Agent and Multi-Agent System)

20 pages, 2485 KiB

Open Access Article

An Investigation on a Closed-Loop Supply Chain of Product Recycling Using a Multi-Agent and Priority Based Genetic Algorithm Approach

by Yong-Tong Chen and Zhong-Chen Cao

Mathematics 2020, 8(6), 888; https://doi.org/10.3390/math8060888 - 2 Jun 2020

Cited by 6 | Viewed by 2863

Abstract

Product recycling issues have gained increasing attention in many industries in the last decade, driven by environmental, governmental and economic factors. Closed-loop supply chain (CLSC) models integrate the forward and reverse flow of products. Since the optimization of these CLSC models is known to be NP-hard, competition on optimization quality, in terms of solution quality and computational time, has become one of the main focuses of the literature in this area. A typical six-level closed-loop supply chain network is examined in this paper, which has great complexity due to its large number of echelons. The proposed solution uses a multi-agent and priority-based approach which is embedded within a two-stage Genetic Algorithm (GA), decomposing the problem into (i) product flow, (ii) demand allocation and (iii) pricing bidding process. To test and demonstrate the optimization quality of the proposed algorithm, numerical experiments have been carried out based on the well-known benchmarking network. The results prove the reliability and efficiency of the proposed approach compared to LINGO and the benchmarking algorithm discussed in the literature.

(This article belongs to the Special Issue Advances in Statistical Process Control and Their Applications)
