Open Access Article
Comparative Evaluation of Bandit-Style Heuristic Policies for Moving Target Detection in a Linear Grid Environment
by Hyunmin Kang 1,2, Minho Ahn 3,4 and Yongduek Seo 2,*
1 Digital Healthcare Center, Gumi Electronics & Information Technology Research Institute, Gumi 39253, Republic of Korea
2 Department of Artificial Intelligence, Sogang University, Seoul 04107, Republic of Korea
3 Artificial Intelligence Laboratory, Konan Technology Inc., Seoul 06627, Republic of Korea
4 Department of Defense Acquisition, Konkuk University, Seoul 05209, Republic of Korea
* Author to whom correspondence should be addressed.
Sensors 2026, 26(1), 226; https://doi.org/10.3390/s26010226 (registering DOI)
Submission received: 27 November 2025 / Revised: 23 December 2025 / Accepted: 25 December 2025 / Published: 29 December 2025
Abstract
Moving-target detection under strict sensing constraints is a recurring subproblem in surveillance, search-and-rescue, and autonomous robotics. We study a canonical one-dimensional finite grid in which a sensor probes one location per time step and receives a binary observation, while the target follows reflecting random-walk dynamics. The objective is to minimize the expected time to detection using transparent, training-free decision rules defined on the belief state of the target location. We compare two belief-driven heuristics that can be implemented purely online: a greedy rule that always probes the most probable location and a belief-proportional sampling (BPS, i.e., probability matching) rule that samples sensing locations according to the belief distribution (i.e., the posterior probability of the target location). Repeated Monte Carlo simulations quantify the exploitation–exploration trade-off and provide a direct comparison between the two policies. Across the tested grid sizes, the greedy policy consistently yields the shortest expected time to detection, improving by roughly 17–20% over BPS and uniform random probing in representative settings. BPS trades some average efficiency for stochastic exploration, which can be beneficial under model mismatch. This study provides an interpretable baseline and quantitative reference for extensions to noisy sensing and higher-dimensional search.
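To make the setting described in the abstract concrete, the following is a minimal Python sketch of one detection episode under the two belief-driven rules. It is not the authors' implementation. Several details are assumptions that the abstract does not state: the target moves one cell left or right with equal probability, a move that would leave the grid keeps the target in place (one possible reading of "reflecting"), sensing is noise-free, and the initial belief is uniform. All function names are hypothetical.

```python
import random


def transition_matrix(n):
    """Row-stochastic matrix for an ASSUMED reflecting random walk on n cells."""
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for step in (-1, 1):
            j = i + step
            if j < 0 or j >= n:
                j = i  # assumption: a blocked move leaves the target in place
            T[i][j] += 0.5
    return T


def predict(belief, T):
    """Propagate the belief one step through the target dynamics."""
    n = len(belief)
    return [sum(belief[i] * T[i][j] for i in range(n)) for j in range(n)]


def update_after_miss(belief, probe):
    """Bayes update for a noise-free 'no detection' at the probed cell."""
    post = list(belief)
    post[probe] = 0.0
    total = sum(post)
    return [p / total for p in post]


def greedy(belief, rng):
    """Probe the most probable cell (ties go to the lowest index)."""
    return max(range(len(belief)), key=belief.__getitem__)


def bps(belief, rng):
    """Belief-proportional sampling: draw the probe from the belief itself."""
    return rng.choices(range(len(belief)), weights=belief, k=1)[0]


def time_to_detection(policy, n, rng, max_steps=10_000):
    """Run one episode and return the step at which the target is found."""
    T = transition_matrix(n)
    belief = [1.0 / n] * n          # assumption: uniform prior
    target = rng.randrange(n)       # assumption: uniform initial position
    for t in range(1, max_steps + 1):
        probe = policy(belief, rng)
        if probe == target:
            return t
        belief = update_after_miss(belief, probe)
        j = target + rng.choice((-1, 1))
        if 0 <= j < n:              # same reflection rule as transition_matrix
            target = j
        belief = predict(belief, T)
    return max_steps                # episode truncated without detection


if __name__ == "__main__":
    rng = random.Random(0)
    for name, policy in (("greedy", greedy), ("BPS", bps)):
        runs = [time_to_detection(policy, 20, rng) for _ in range(2000)]
        print(name, sum(runs) / len(runs))
```

Averaging `time_to_detection` over many episodes gives a Monte Carlo estimate of the expected time to detection for each rule. The numbers such a sketch produces depend on the assumed dynamics and prior, so they should not be expected to reproduce the figures reported in the article.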