Semantic Latent Geometry Reveals Imagination–Perception Structure in EEG
Abstract
1. Introduction
- (G1) evidence that semantically aligned representations support subject-independent decoding of imagination vs. perception across multiple sensory modalities;
- (G2) evidence that such representations expose stable brain–behavior relationships with VVIQ/BAIS beyond what is observed with traditional features and standard classifiers.
- (Q1) Do semantically aligned features enable robust, subject-independent decoding of imagination vs. perception within each modality?
- (Q2) Do VVIQ (visual) and BAIS (auditory) scores relate to (i) per-subject subject-independent decoding metrics and (ii) latent-space separability/compactness indices that quantify semantic geometry?
- (C1) A representation-centric framework for imagination/perception decoding that prioritizes semantic alignment, with multiscale embeddings learned without task labels, demonstrating subject-independent performance across three modalities (addresses G1).
- (C2) To our knowledge, the first trait-aware linkage between psychometric vividness (VVIQ/BAIS) and semantic EEG representations, assessed at two levels: (i) subject-independent decoding metrics and (ii) representation geometry (latent separability/compactness and cross-modal overlap), yielding interpretable brain–behavior associations that standard pipelines often miss (addresses G2).
- (C3) An empirical comparison against conventional models, showing that while direct accuracy–VVIQ/BAIS correlations are weak and imprecise in this cohort, representation-level semantic geometry diagnostics remain informative when outcome metrics saturate, revealing robust imagination/perception structure that conventional feature sets fail to expose.
2. Methodology
2.1. Dataset
- Pictorial: Images at three levels of complexity (simple, intermediate, and naturalistic). The flower and guitar categories have eight exemplars at the simple level and nine at each of the intermediate and naturalistic levels; the penguin category has nine exemplars at every level.
- Orthographic: Thirty text stimuli per category, presented in five colors and six font styles.
- Audio: Spoken words recorded in three voice tones (normal, low, high).
2.1.1. Trial Structure and Epoching
2.1.2. Labeling
2.1.3. Data Volume and Sessions
2.2. Preprocessing
2.3. Unsupervised Semantic Feature Learning
2.4. Supervised Imagination (I) ↔ Perception (P) Decoding Heads
2.5. Evaluation Protocol
2.6. Latent-Geometry Diagnostics
- Greedy correlation filtering: Within each subject × phase × modality cell, we iteratively prune latent channels whose absolute Pearson correlation with an already-kept channel exceeds a threshold $\tau$. This yields a retained set $S(\tau)$ of latent-channel indices at each $\tau$. We evaluate a predefined grid of eight thresholds (a code sketch of the full diagnostic pipeline follows this list).
- Keep Ratio: At each $\tau$, the fraction of latent channels retained after pruning, $\mathrm{KR}(\tau) = |S(\tau)|/D$, where $D$ is the total number of latent channels (so $0 < \mathrm{KR}(\tau) \le 1$). We also summarize the per-subject mean keep ratio, meanKR (the average of $\mathrm{KR}(\tau)$ over the grid), which is later used as a sparsity covariate.
- Semantic Sensitivity Index (SSI): The signed area under the curve (AUC) of the phase difference in Keep_Ratio, $\Delta(\tau) = \mathrm{KR}_{\mathrm{I}}(\tau) - \mathrm{KR}_{\mathrm{P}}(\tau)$, where I and P denote the imagination and perception phases. In practice, the SSI is computed by trapezoidal integration of $\Delta(\tau)$ over the predefined $\tau$ grid, preserving the sign of $\Delta(\tau)$.
- Full curves: Subject-wise $\mathrm{KR}(\tau)$ and $\Delta(\tau)$ trajectories across the threshold grid to localize where phase differences emerge (e.g., only at aggressive pruning).
- Cross-Modal Overlap (CMO): Jaccard overlap of kept-feature sets between modality pairs at matched $\tau$ (Audio–Pictorial, Audio–Orthographic, Orthographic–Pictorial). At each $\tau$, for a modality pair $(m_1, m_2)$, we compute $J_{m_1,m_2}(\tau) = |S_{m_1}(\tau) \cap S_{m_2}(\tau)| \, / \, |S_{m_1}(\tau) \cup S_{m_2}(\tau)|$ and summarize it by averaging over the grid; we report per-subject means across thresholds and their distributions across subjects.
- Using a predefined grid avoids committing to a single arbitrary correlation cutoff and reduces the risk of cherry-picking. Accordingly, our primary indices (SSI and mean CMO) integrate information over $\tau$, while threshold-resolved trajectories are reported to show where effects emerge and to verify robustness.
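The following is a minimal NumPy sketch of the diagnostics above (greedy pruning, Keep_Ratio, trapezoidal SSI, and Jaccard CMO). The function names, the channel-visiting order in the greedy pass, and the placeholder threshold grid are illustrative assumptions; the paper's exact grid values and tie-breaking rules are not specified in this section.

```python
import numpy as np

# Placeholder grid of eight thresholds (the paper's exact values are not
# listed here; eight points is the only stated constraint).
TAUS = np.linspace(0.2, 0.9, 8)

def greedy_correlation_filter(Z, tau):
    """Greedily retain latent channels whose |Pearson r| with every
    already-kept channel is at most tau.
    Z: (n_trials, n_latent) activations for one subject x phase x modality."""
    R = np.abs(np.corrcoef(Z, rowvar=False))   # latent-by-latent |r| matrix
    kept = []
    for j in range(Z.shape[1]):                # visiting order: channel index (assumption)
        if all(R[j, k] <= tau for k in kept):
            kept.append(j)
    return set(kept)

def keep_ratio(Z, tau):
    """KR(tau) = |S(tau)| / D."""
    return len(greedy_correlation_filter(Z, tau)) / Z.shape[1]

def ssi(Z_imag, Z_perc, taus=TAUS):
    """Signed trapezoidal area under Delta(tau) = KR_I(tau) - KR_P(tau)."""
    delta = [keep_ratio(Z_imag, t) - keep_ratio(Z_perc, t) for t in taus]
    return np.trapz(delta, taus)

def cmo(Z_m1, Z_m2, taus=TAUS):
    """Mean Jaccard overlap of kept-channel index sets at matched thresholds."""
    vals = []
    for t in taus:
        s1 = greedy_correlation_filter(Z_m1, t)
        s2 = greedy_correlation_filter(Z_m2, t)
        union = s1 | s2
        vals.append(len(s1 & s2) / len(union) if union else 1.0)
    return float(np.mean(vals))
```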
2.7. Trait-Aware Correlation Analysis (VVIQ/BAIS)
- Performance indices: Modality-specific LOSO balanced accuracy. We also summarize P→I and I→P transfer within LOSO.
- Geometry indices: SSI (overall and per modality), full $\Delta(\tau)$ behavior, and CMO. Questionnaires (VVIQ, BAIS) are used as released; subjects missing a questionnaire are excluded listwise for that trait. For partial correlations, we control for meanKR (global sparsity) and the other questionnaire (e.g., the partial correlation between SSI and VVIQ controls for meanKR and BAIS); a sketch of this computation follows this list.
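As a minimal sketch of the partial-correlation step, assuming per-subject vectors named ssi_scores, vviq, mean_kr, and bais (hypothetical names), one can residualize both variables on the covariates and correlate the residuals:

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, covariates):
    """Pearson partial correlation between x and y, controlling for
    each covariate vector via OLS residualization."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    C = np.column_stack([np.ones(len(x))] +
                        [np.asarray(c, float) for c in covariates])
    rx = x - C @ np.linalg.lstsq(C, x, rcond=None)[0]  # residualize x
    ry = y - C @ np.linalg.lstsq(C, y, rcond=None)[0]  # residualize y
    return stats.pearsonr(rx, ry)                      # (r, p-value)

# e.g., SSI vs. VVIQ controlling for meanKR and BAIS, after listwise
# exclusion of subjects missing either questionnaire (hypothetical arrays):
# r, p = partial_corr(ssi_scores, vviq, [mean_kr, bais])
```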
2.8. Statistics
2.9. Mixed-Effects Analysis of Keep_Ratio
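The model specification is not reproduced in this excerpt; the Abbreviations list (REML, BLUP) indicates a mixed model fit by restricted maximum likelihood with subject-level random effects. Below is a statsmodels sketch of one plausible specification, with phase × threshold fixed effects and a per-subject random intercept; the actual fixed-effect structure is an assumption.

```python
import statsmodels.formula.api as smf

# Assumed long-format dataframe `df` with one row per
# subject x phase x modality x threshold cell:
#   subject, phase ("I"/"P"), modality, tau, keep_ratio
model = smf.mixedlm(
    "keep_ratio ~ phase * tau + modality",  # fixed effects (assumed form)
    data=df,
    groups=df["subject"],                   # random intercept per subject
)
result = model.fit(reml=True)               # REML estimation
print(result.summary())
# result.random_effects holds the subject-level BLUPs.
```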
3. Results
3.1. Unsupervised Pretraining
3.2. Supervised Downstream Decoding
3.3. Correlation Analysis
3.3.1. Semantic Sensitivity Index (SSI)
3.3.2. Cross-Modal Overlap (CMO)
4. Discussion
- First, the proposed model achieves consistently high LOSO decoding performance across modalities (Figure 4), yet accuracy shows weak and imprecise associations with self-reported imagery traits (Supplementary Table S5);
- Third, these probes provide mechanistic descriptors of how cognition occupies the learned space (compactness/redundancy/overlap), complementing performance-centric evaluation.
4.1. Accuracy–Vividness Mismatch
4.2. Subject-Specific Heterogeneity and Threshold Sensitivity
4.3. Latent Geometry vs. Vividness
- SSI indicates whether imagination tends to retain more (or less) non-redundant latent support than perception as sparsification tightens. The SSI distribution shows a meaningful between-subject spread around a near-zero center (Figure 6), consistent with heterogeneous phase preferences that are not captured by accuracy.
- Full curves localize where differences occur across the threshold grid (Supplementary Figure S4). Several subjects exhibit positive imagination margins primarily under the most aggressive pruning (lowest keep ratios), suggesting that the most selective latent channels may preferentially support imagery for those individuals.
- CMO indicates substantial cross-modal overlap of retained-feature sets (Figure 7), consistent with a modality-invariant latent core. This cross-modal stability provides a representation-level explanation for why frozen-latent decoders can perform strongly across modality-specific tasks.
4.4. Why Questionnaire Alignment Is Weak in This Cohort
4.5. Implications
- High accuracy is necessary but not sufficient for scientific interpretation: strong LOSO performance can coexist with weak trait alignment when heterogeneous strategies are collapsed into a single scalar metric.
- Representation-aware diagnostics provide interpretable mechanistic descriptors that complement outcome metrics: compactness under pruning (SSI), modality-invariant structure (CMO), and threshold-localized phase differences (the full $\Delta(\tau)$ trajectories).
- The observed cross-modal overlap supports the design of decoders that explicitly target modality-invariant structure, which may be essential for generalization beyond tightly controlled laboratory settings.
- The magnitude of CMO (0.5–0.8) implies that, after redundancy pruning at matched thresholds, roughly 50–80% of the retained latent channels are shared across modality pairs, suggesting a modality-invariant semantic core. Practically, this supports parameter sharing or a unified decoder across modalities and motivates geometry-aware regularization (or multi-modal training objectives) that explicitly encourages cross-modal alignment to improve robustness when the stimulus modality varies or is partially missing.
4.6. Limitations and Future Directions
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Acronym | Definition | Acronym | Definition |
| --- | --- | --- | --- |
| AUC | Area Under the Curve | GAP | Global Average Pooling |
| BAIS | Bucknell Auditory Imagery Scale | ICA | Independent Component Analysis |
| BAIS-V | BAIS—Vividness subscale | I↔P | Imagination ↔ Perception |
| BCa | Bias-Corrected and Accelerated (bootstrap) | I→P | Train on Imagination, Test on Perception |
| BCI | Brain–Computer Interface | LR | Learning Rate |
| BIDS | Brain Imaging Data Structure | LSL | Lab Streaming Layer |
| BH | Benjamini–Hochberg (FDR control) | LOSO | Leave-One-Subject-Out |
| BLUP | Best Linear Unbiased Predictor | MNE | MNE-Python toolkit |
| CAR | Common Average Reference | MSE | Mean Squared Error |
| CI | Confidence Interval | P→I | Train on Perception, Test on Imagination |
| CMO | Cross-Modal Overlap | PyPREP | Python PREP (bad-channel handling) |
| CNN | Convolutional Neural Network | REML | Restricted Maximum Likelihood |
| CPz | Midline Centro-Parietal Electrode | RF | Random Forest |
| CSP | Common Spatial Pattern | RNN | Recurrent Neural Network |
| EEG | Electroencephalography | SD | Standard Deviation |
| EOG | Electrooculography | SEM | Standard Error of the Mean |
| ERP | Event-Related Potential | SNR | Signal-to-Noise Ratio |
| FBCSP | Filter Bank Common Spatial Pattern | SSI | Semantic Sensitivity Index |
| FDR | False Discovery Rate | SVM | Support Vector Machine |
| FFN | Feed-Forward Network | t-SNE | t-Distributed Stochastic Neighbor Embedding |
| FIR | Finite Impulse Response | VVIQ | Vividness of Visual Imagery Questionnaire |
References
- Pulvermüller, F.; Shtyrov, Y. Language outside the focus of attention: The mismatch negativity as a tool for studying higher cognitive processes. Prog. Neurobiol. 2006, 79, 49–71.
- Michel, C.M.; Brunet, D. EEG Source Imaging: A Practical Review of the Analysis Steps. Front. Neurol. 2019, 10, 446653.
- Farah, M.J. Is visual imagery really visual? Overlooked evidence from neuropsychology. Psychol. Rev. 1988, 95, 307–317.
- Kosslyn, S.M.; Ganis, G.; Thompson, W.L. Neural foundations of imagery. Nat. Rev. Neurosci. 2001, 2, 635–642.
- Dijkstra, N.; Bosch, S.E.; van Gerven, M.A. Vividness of Visual Imagery Depends on the Neural Overlap with Perception in Visual Areas. J. Neurosci. 2017, 37, 1367–1373.
- Binder, J.R.; Desai, R.H.; Graves, W.W.; Conant, L.L. Where Is the Semantic System? A Critical Review and Meta-Analysis of 120 Functional Neuroimaging Studies. Cereb. Cortex 2009, 19, 2767–2796.
- Schacter, D.L.; Addis, D.R.; Hassabis, D.; Martin, V.C.; Spreng, R.N.; Szpunar, K.K. The Future of Memory: Remembering, Imagining, and the Brain. Neuron 2012, 76, 677–694.
- Rybář, M.; Daly, I. Neural Decoding of Semantic Concepts: A Systematic Literature Review. J. Neural Eng. 2022, 19, 021002.
- Rekrut, M.; Sharma, M.; Schmitt, M.; Alexandersson, J.; Krüger, A. Decoding Semantic Categories from EEG Activity in Object-Based Decision Tasks. In Proceedings of the 2020 8th International Winter Conference on Brain-Computer Interface (BCI), Gangwon, Republic of Korea, 26–28 February 2020; pp. 1–7.
- Lee, K.-W.; Lee, D.-H.; Kim, S.-J.; Lee, S.-W. Decoding Neural Correlation of Language-Specific Imagined Speech using EEG Signals. In Proceedings of the 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Glasgow, UK, 11–15 July 2022; pp. 1977–1980.
- Ahmadi, H.; Mesin, L. Enhancing MI EEG Signal Classification With a Novel Weighted and Stacked Adaptive Integrated Ensemble Model: A Multi-Dataset Approach. IEEE Access 2024, 12, 103626–103646.
- Ahmadi, H.; Costa, P.; Mesin, L. A Novel Hierarchical Binary Classification for Coma Outcome Prediction Using EEG, CNN, and Traditional ML Approaches. TechRxiv 2024.
- Ahmadi, H.; Kuhestani, A.; Mesin, L. Adversarial Neural Network Training for Secure and Robust Brain-to-Brain Communication. IEEE Access 2024, 12, 39450–39469.
- Ahmadi, H.; Kuhestani, A.; Keshavarzi, M.; Mesin, L. Securing Brain-to-Brain Communication Channels Using Adversarial Training on SSVEP EEG. IEEE Access 2025, 13, 14358–14378.
- Marks, D.F. Visual imagery differences in the recall of pictures. Br. J. Psychol. 1973, 64, 17–24.
- Shinkareva, S.V.; Malave, V.L.; Mason, R.A.; Mitchell, T.M.; Just, M.A. Commonality of neural representations of words and pictures. NeuroImage 2011, 54, 2418–2425.
- Halpern, A.R. Differences in auditory imagery self-report predict neural and behavioral outcomes. Psychomusicol. Music Mind Brain 2015, 25, 37.
- Rueschemeyer, A. Cross-Modal Integration of Lexical-Semantic Features during Word Processing: Evidence from Oscillatory Dynamics during EEG. PLoS ONE 2014, 9, e101042.
- Huang, J.; Chang, Y.; Li, W.; Tong, J.; Du, S. A Spatio-Temporal Capsule Neural Network with Self-Correlation Routing for EEG Decoding of Semantic Concepts of Imagination and Perception Tasks. Sensors 2024, 24, 5988.
- Chen, H.; He, L.; Liu, Y.; Yang, L. Visual Neural Decoding via Improved Visual-EEG Semantic Consistency. arXiv 2024. Available online: https://arxiv.org/abs/2408.06788 (accessed on 24 October 2025).
- Ahmadi, H.; Mesin, L. Universal semantic feature extraction from EEG signals: A task-independent framework. J. Neural Eng. 2025, 22, 036003.
- Fahimi Hnazaee, M.; Khachatryan, E.; Van Hulle, M.M. Semantic Features Reveal Different Networks During Word Processing: An EEG Source Localization Study. Front. Hum. Neurosci. 2018, 12, 503.
- Zeng, H.; Xia, N.; Qian, D.; Hattori, M.; Wang, C.; Kong, W. DM-RE2I: A Framework Based on Diffusion Model for the Reconstruction from EEG to Image. Biomed. Signal Process. Control 2023, 86, 105125.
- Ahmadi, H.; Mesin, L. Decoding Visual Imagination and Perception from EEG via Topomap Sequences. In Proceedings of the 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Copenhagen, Denmark, 14–17 July 2025; pp. 1–7.
- Zeng, H.; Xia, N.; Tao, M.; Pan, D.; Zheng, H.; Wang, C.; Xu, F.; Zakaria, W.; Dai, G. DCAE: A Dual Conditional Autoencoder Framework for the Reconstruction from EEG into Image. Biomed. Signal Process. Control 2023, 81, 104440.
- Wilson, H.; Golbabaee, M.; Proulx, M.J.; Charles, S. EEG-based BCI Dataset of Semantic Concepts for Imagination and Perception Tasks. Sci. Data 2023, 10, 386.
- Wang, F.; Ke, H.; Cai, C. Deep Wavelet Self-Attention Non-negative Tensor Factorization for non-linear analysis and classification of fMRI data. Appl. Soft Comput. 2025, 182, 113522.
- Lawhern, V.J.; Solon, A.J.; Waytowich, N.R.; Gordon, S.M.; Hung, C.P.; Lance, B.J. EEGNet: A compact convolutional neural network for EEG-based brain-computer interfaces. J. Neural Eng. 2018, 15, 056013.
- Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420.
- Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic Routing Between Capsules. arXiv 2017. Available online: https://arxiv.org/abs/1710.09829 (accessed on 26 October 2025).
- Song, Y.; Jia, X.; Yang, L.; Xie, L. Transformer-based Spatial-Temporal Feature Learning for EEG Decoding. arXiv 2021. Available online: https://arxiv.org/abs/2106.11170 (accessed on 11 November 2025).
- Liu, Z.; Mao, H.; Wu, C.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. arXiv 2022. Available online: https://arxiv.org/abs/2201.03545 (accessed on 26 October 2025).
- Li, C.; Wang, B.; Zhang, S.; Liu, Y.; Song, R.; Cheng, J.; Chen, X. Emotion recognition from EEG based on multi-task learning with capsule network and attention mechanism. Comput. Biol. Med. 2022, 143, 105303.
- Song, Y.; Zheng, Q.; Liu, B.; Gao, X. EEG Conformer: Convolutional Transformer for EEG Decoding and Visualization. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 710–719.
- Miao, Z.; Zhao, M.; Zhang, X.; Ming, D. LMDA-Net: A lightweight multi-dimensional attention network for general EEG-based brain-computer interfaces and interpretability. NeuroImage 2023, 276, 120209.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.