Abstract
The rapid growth of online chess has intensified the challenge of distinguishing engine-assisted from authentic human play, exposing the limitations of existing approaches that rely solely on deterministic evaluation metrics. This study introduces a proof-of-concept hybrid framework for discriminating between engine-like and human-like chess play patterns, integrating Stockfish’s deterministic evaluations with stylometric behavioral features derived from the Maia engine. Key metrics include Centipawn Loss (CPL), Maia Move Match Probability (MMMP), and a novel Curvature-Based Stability (ΔS) indicator. These features were fed into a convolutional neural network (CNN) classifier and evaluated on a controlled benchmark dataset of 1,000 games, in which ‘suspicious’ gameplay was algorithmically generated to simulate engine-optimal patterns and ‘clean’ play was modeled using Maia’s human-like predictions. The hybrid model discriminated reliably between these behavioral archetypes, achieving a macro F1-score of 0.93 and significantly outperforming the Stockfish-only baseline (F1 = 0.87), as confirmed by McNemar’s test (p = 0.0153). Feature ablation showed that the Maia-derived features reduced false negatives and improved recall, while ΔS enhanced robustness. This work establishes a methodological foundation for behavioral pattern discrimination in chess and demonstrates the value of combining deterministic and human-centric modeling. Beyond chess, the approach offers a template for behavioral anomaly analysis in cybersecurity, education, and other decision-based domains; real-world validation on adjudicated misconduct cases is the essential next step.