This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
Power and Phase Fusion Spectrogram with Three-Dimensional Convolution and Vision Transformer for Seizure Detection
1 School of Integrated Circuits, Shandong University, Jinan 250101, China
2 Shenzhen Research Institute of Shandong University, Shenzhen 518000, China
3 Tandon School of Engineering, New York University, New York, NY 10012, USA
4 Key Laboratory of Social Computing and Cognitive Intelligence, Dalian University of Technology, Ministry of Education, Dalian 116024, China
* Author to whom correspondence should be addressed.
Diagnostics 2026, 16(13), 2012; https://doi.org/10.3390/diagnostics16132012 (registering DOI)
Submission received: 12 May 2026 / Revised: 18 June 2026 / Accepted: 24 June 2026 / Published: 27 June 2026
Abstract
Background/Objectives: Reliable detection of epileptic seizures using electroencephalography (EEG) is crucial for clinical diagnosis and for alleviating clinicians’ workload. However, existing studies still make insufficient use of phase information, and the synergy between local time–frequency pattern extraction and global dependency modeling remains limited.
Methods: We propose a seizure detection framework based on the continuous wavelet transform (CWT), a three-dimensional convolutional neural network (3D-CNN), and a vision transformer (ViT). First, multichannel EEG segments are preprocessed, after which CWT is used to generate power spectrograms and phase spectrograms. These representations are then fused along the depth dimension into a unified power-phase volume and fed into a hybrid network composed of a 3D-CNN feature extractor and a single-layer ViT encoder to jointly learn local time–frequency–channel coupling patterns and higher-level global dependencies. Finally, seizure detection is completed by combining moving-average filtering, thresholding, and collar correction.
Results: On the public CHB-MIT dataset and the clinical SH-SDU dataset, the proposed method achieved average segment-level sensitivities of 98.68% and 92.05%, specificities of 98.33% and 97.53%, accuracies of 98.49% and 96.37%, and AUC values of 97.26% and 92.89%, respectively. In event-level evaluation, the average sensitivities were 99.13% and 96.08%, with false detection rates of 0.88/h and 0.69/h, respectively. Further multi-stage ablation experiments together with t-SNE and Grad-CAM visualizations provided qualitative and experimental support for the design rationale of the joint power-phase input and the hybrid 3D-CNN-ViT architecture.
Conclusions: The proposed framework effectively exploits the complementary discriminative value of power and phase information in epileptic EEG and demonstrates strong detection performance under patient-specific evaluation on both public and clinically collected datasets.
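The post-processing chain named in the Methods (moving-average filtering, thresholding, and collar correction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window length, the threshold, and the collar width are all hypothetical, and "collar correction" is assumed here to mean extending each detected event by a fixed number of segments on both sides and merging events that then overlap.

```python
from typing import List, Tuple

Event = Tuple[int, int]  # (start, end) segment indices, end exclusive


def moving_average(scores: List[float], window: int) -> List[float]:
    """Centered moving average; windows are truncated at the sequence edges."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def to_events(labels: List[int]) -> List[Event]:
    """Group consecutive positive labels into (start, end) events."""
    events, start = [], None
    for i, lab in enumerate(labels):
        if lab and start is None:
            start = i
        elif not lab and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(labels)))
    return events


def apply_collar(events: List[Event], collar: int, n: int) -> List[Event]:
    """Extend each event by `collar` segments per side and merge overlaps (assumed definition)."""
    merged: List[Event] = []
    for start, end in events:
        start, end = max(0, start - collar), min(n, end + collar)
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def detect_events(scores: List[float], window: int = 3,
                  threshold: float = 0.5, collar: int = 1) -> List[Event]:
    """Smooth per-segment seizure scores, binarize them, then apply the collar."""
    smoothed = moving_average(scores, window)
    labels = [1 if s > threshold else 0 for s in smoothed]
    return apply_collar(to_events(labels), collar, len(scores))


# Hypothetical per-segment classifier outputs: one isolated spike, one sustained run.
scores = [0.1, 0.2, 0.9, 0.1, 0.1, 0.8, 0.9, 0.9, 0.2, 0.1, 0.1, 0.1]
events = detect_events(scores)
print(events)  # → [(4, 9)]
```

In this toy input the isolated high score at index 2 is smoothed below the threshold and discarded, while the sustained run is kept and widened by one segment on each side. Real parameter values would have to be tuned per dataset.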
Share and Cite
MDPI and ACS Style
Jiang, Y.; Wang, Z.; Zhao, Y.; Zhou, W.; Liu, G. Power and Phase Fusion Spectrogram with Three-Dimensional Convolution and Vision Transformer for Seizure Detection. Diagnostics 2026, 16, 2012. https://doi.org/10.3390/diagnostics16132012
AMA Style
Jiang Y, Wang Z, Zhao Y, Zhou W, Liu G. Power and Phase Fusion Spectrogram with Three-Dimensional Convolution and Vision Transformer for Seizure Detection. Diagnostics. 2026; 16(13):2012. https://doi.org/10.3390/diagnostics16132012
Chicago/Turabian Style
Jiang, Yuyue, Zhuohan Wang, Yazhou Zhao, Weidong Zhou, and Guoyang Liu. 2026. "Power and Phase Fusion Spectrogram with Three-Dimensional Convolution and Vision Transformer for Seizure Detection" Diagnostics 16, no. 13: 2012. https://doi.org/10.3390/diagnostics16132012
APA Style
Jiang, Y., Wang, Z., Zhao, Y., Zhou, W., & Liu, G. (2026). Power and Phase Fusion Spectrogram with Three-Dimensional Convolution and Vision Transformer for Seizure Detection. Diagnostics, 16(13), 2012. https://doi.org/10.3390/diagnostics16132012
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.