Block-Wise State Encoding for Action-Masked Reinforcement Learning in Flexible Job-Shop Scheduling

Hrishchenko, Kostiantyn; Pysarchuk, Oleksii

doi:10.3390/a19060423


by Kostiantyn Hrishchenko * and Oleksii Pysarchuk

Faculty of Informatics and Computer Science, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, 03056 Kyiv, Ukraine

* Author to whom correspondence should be addressed.

Algorithms 2026, 19(6), 423; https://doi.org/10.3390/a19060423 (registering DOI)

Submission received: 23 April 2026 / Revised: 13 May 2026 / Accepted: 21 May 2026 / Published: 23 May 2026

(This article belongs to the Special Issue Machine Learning for Planning and Logistics)


Abstract

This paper addresses the flexible job-shop scheduling problem (FJSP) as a constrained combinatorial optimization task with a large discrete action space. Although action-masked reinforcement learning has shown promise for such problems, the effect of structured vector-state encoding on scheduling performance has received less attention. The main contribution of this work is a structured block-wise state representation and a multi-branch feature extraction module for action-masked Proximal Policy Optimization (PPO). The proposed representation decomposes the scheduling state into three heterogeneous blocks that capture resource availability, operation readiness, and temporal attributes of operation–machine alternatives. Instead of flattening these signals into a single vector, the proposed encoder processes each block separately before aggregation, so that the semantic structure of the state is preserved during policy learning. To isolate the effect of representation design, we compare the proposed multi-branch encoder with a baseline single-branch multilayer perceptron under identical PPO hyperparameters and training conditions. Experiments on the Brandimarte MK benchmark suite show that the proposed architecture yields a lower best-achieved makespan on nine of ten instances and improves the best baseline result by up to 27.84%. Additional validation on selected Behnke and Geiger instances indicates that the advantage of the proposed encoder extends to larger FJSP cases while preserving sub-second inference.

Keywords: artificial intelligence; reinforcement learning; deep learning; combinatorial optimization; flexible job-shop scheduling; resource allocation; action masking; state representation
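
The block-wise encoding and action masking summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name MultiBranchPolicy, the block sizes, the single dense layer per branch, the random initialization, and the toy state values are all assumptions made for illustration, and PPO training is omitted entirely. The sketch only shows the forward pass: each of the three blocks goes through its own branch, the branch outputs are concatenated, and infeasible operation–machine pairs receive zero probability.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for one dense layer (illustrative init)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Dense layer: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def masked_softmax(logits, mask):
    """Softmax over feasible actions only; masked actions get probability 0."""
    valid = [l for l, m in zip(logits, mask) if m]
    if not valid:
        raise ValueError("at least one action must be feasible")
    peak = max(valid)  # subtract the maximum for numerical stability
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


class MultiBranchPolicy:
    """Hypothetical multi-branch encoder: one branch per state block, then a shared head."""

    def __init__(self, block_sizes, hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.branches = [init_layer(rng, n, hidden) for n in block_sizes]
        self.head = init_layer(rng, hidden * len(block_sizes), n_actions)

    def action_probs(self, blocks, mask):
        features = []
        for block, (weights, bias) in zip(blocks, self.branches):
            # Each block is encoded separately (dense layer + ReLU) before aggregation.
            features.extend(max(0.0, v) for v in linear(block, weights, bias))
        logits = linear(features, *self.head)
        return masked_softmax(logits, mask)


# Toy state (assumed values): 2 machines, 3 operations, 6 operation-machine pairs.
machine_block = [0.0, 0.4]                      # resource availability
readiness_block = [1.0, 0.0, 1.0]               # operation readiness
timing_block = [0.3, 0.5, 0.0, 0.0, 0.7, 0.2]   # temporal attributes per pair
mask = [True, True, False, False, True, False]  # feasible operation-machine pairs

policy = MultiBranchPolicy(block_sizes=[2, 3, 6], hidden=4, n_actions=6)
probs = policy.action_probs([machine_block, readiness_block, timing_block], mask)
print(probs)
```

A single-branch baseline in this sketch would correspond to concatenating the three blocks first and passing the flat vector through one branch; the masking step is the same in both cases.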
