Abstract
Infantile Epileptic Spasms Syndrome (IESS) is a devastating epileptic encephalopathy of infancy that carries a high risk of lifelong neurodevelopmental disability. Timely diagnosis is critical, as every week of delay in effective treatment is associated with worse cognitive outcomes. Although synchronized electroencephalogram (EEG) and surface electromyography (EMG) recordings capture both the electrophysiological and motor signatures of spasms, accurate automated detection remains challenging due to the non-stationary nature of the signals and the absence of physiologically plausible inter-modal fusion in current deep learning approaches. We introduce IESS-FusionNet, an end-to-end dual-stream framework specifically designed for accurate, real-time IESS detection from simultaneous EEG and EMG. Each modality is processed by a dedicated Unimodal Encoder that hierarchically integrates Continuous Wavelet Transform, Spatio-Temporal Convolution, and Bidirectional Mamba to efficiently extract frequency-specific, spatially structured, and both local and long-range temporal features within a compact module. A novel Cross Time-Mixing module, built upon the linear recurrent attention of the Receptance Weighted Key Value (RWKV) architecture, subsequently performs efficient, time-decaying, bidirectional cross-modal integration that explicitly respects the causal and physiological properties of cortico-muscular coupling during spasms. Evaluated on an in-house clinical dataset of synchronized EEG-EMG recordings from infants with confirmed IESS, IESS-FusionNet achieves 89.5% accuracy, 90.7% specificity, and 88.3% sensitivity, significantly outperforming recent unimodal and multimodal baselines. Comprehensive ablation studies validate the contribution of each component, while the proposed cross-modal fusion requires approximately 60% fewer parameters than equivalent quadratic cross-attention mechanisms, making it suitable for real-time clinical deployment.
IESS-FusionNet delivers an accurate, computationally efficient solution with physiologically inspired cross-modal fusion for the automated detection of infantile epileptic spasms, offering promise for future clinical applications in reducing diagnostic delay.
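
To make the idea of time-decaying, bidirectional cross-modal integration more concrete, the following is a minimal sketch and not the authors' Cross Time-Mixing module. It assumes scalar features per time step, a single hand-set decay rate, and an RWKV-style split into receptance (gate), key, and value. The function name `cross_time_mix` and all of its parameters are hypothetical. For readability it uses a naive double loop over time steps, whereas recurrent formulations can typically compute sums of this kind in linear time.

```python
import math


def cross_time_mix(receptance, keys, values, decay):
    """Toy bidirectional, time-decaying cross-modal mixing (illustrative only).

    receptance: per-step gate logits from the receiving modality (e.g. EEG).
    keys, values: per-step scores and features from the other modality (e.g. EMG).
    decay: non-negative rate; larger values favor temporally closer steps.
    """
    n = len(values)
    if not (len(receptance) == len(keys) == n):
        raise ValueError("all sequences must have the same length")
    if decay < 0:
        raise ValueError("decay must be non-negative")

    out = []
    for t in range(n):
        # Log-weight of step s: its key score minus a penalty growing with |t - s|,
        # so both past and future steps contribute (bidirectional).
        logits = [keys[s] - decay * abs(t - s) for s in range(n)]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - peak) for x in logits]
        mixed = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        gate = 1.0 / (1.0 + math.exp(-receptance[t]))  # sigmoid gate
        out.append(gate * mixed)
    return out


# Hypothetical toy input: a single EMG burst at the middle time step.
emg = [0.0, 0.0, 1.0, 0.0, 0.0]
flat = [0.0] * 5  # neutral gate logits and key scores

no_decay = cross_time_mix(flat, flat, emg, decay=0.0)  # uniform average over time
sharp = cross_time_mix(flat, flat, emg, decay=50.0)    # almost purely local
```

With no decay, every output step sees the same average of the other modality. With a large decay, only the step aligned with the burst receives it, which illustrates how the decay rate controls the temporal window over which the two modalities are coupled.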