Journal of Imaging | Article | Open Access | 15 December 2025

Sensory Representation of Neural Networks Using Sound and Color for Medical Imaging Segmentation

I. L. da Silva, N. F. Lori and J. M. F. Machado
1 IT Department, Computer Engineering, School of Engineering, University of Minho, 4704-553 Braga, Portugal
2 Algoritmi/LASI, University of Minho, 4704-553 Braga, Portugal
3 Faculty of Sciences and Technology, University of Azores, 9500-321 Ponta Delgada, Portugal
* Author to whom correspondence should be addressed.
This article belongs to the Section Medical Imaging

Abstract

This paper introduces a novel framework for sensory representation of brain imaging data, combining deep learning-based segmentation with multimodal visual and auditory outputs. Structural magnetic resonance imaging (MRI) predictions are converted into color-coded maps and stereophonic/MIDI sonifications, enabling intuitive interpretation of cortical activation patterns. High-precision U-Net models efficiently generate these outputs, supporting clinical decision-making, cognitive research, and creative applications. Spatial, intensity, and anomalous features are encoded into perceivable visual and auditory cues, facilitating early detection and introducing the concept of “auditory biomarkers” for potential pathological identification. Despite current limitations, including dataset size, absence of clinical validation, and heuristic-based sonification, the pipeline demonstrates technical feasibility and robustness. Future work will focus on clinical user studies, the application of functional MRI (fMRI) time-series for dynamic sonification, and the integration of real-time emotional feedback in cinematic contexts. This multisensory approach offers a promising avenue for enhancing the interpretability of complex neuroimaging data across medical, research, and artistic domains.

1. Introduction

Advances in medical image segmentation have substantially improved the accuracy of identifying brain structures and delineating pathological regions. Despite these technical improvements, interpreting segmentation outputs remains challenging, especially for clinicians and researchers without specialized expertise in artificial intelligence (AI) or computational visualization. Traditional visualization techniques—such as grayscale images or heatmaps—often fail to convey the multidimensional nature of neuroimaging data intuitively. This limits both accessibility and practical usability, preventing the full exploitation of AI-based neuroimaging tools in clinical and interdisciplinary applications.
Effective interpretation of complex neuroimaging data requires not only technical precision but also cognitive accessibility. Static visualizations may impose a high cognitive load, making it difficult for non-specialist users to understand brain activation patterns or morphological variations. Multimodal visualization approaches, integrating multiple perceptual channels such as color and sound, provide an opportunity to translate abstract computational results into perceptually intuitive representations. This approach aligns with human sensory integration, where the brain fuses visual, auditory, and somatosensory inputs to form coherent, context-dependent representations [1]. However, there remains a practical and scientific gap in developing frameworks that enable intuitive, multimodal interpretability of AI-based segmentation outputs, particularly in neuroimaging.
Several empirical studies confirm that non-expert users, including clinicians without radiological specialization, experience significant difficulty interpreting complex imaging outputs or AI-based heatmaps [2,3,4]. These challenges often stem from the lack of perceptual cues and cognitive overload associated with multidimensional medical data [2]. Moreover, recent research highlights that multimodal cues, such as color-coded visualizations and auditory sonification, can improve pattern recognition, enhance interpretability, and reduce decision-making time for non-technical audiences [5,6]. These findings reinforce the necessity for perceptually grounded, accessible frameworks that enable intuitive interaction with medical images.
To address this limitation, we introduce a novel sensory representation framework that translates segmentation predictions into auditory and visual modalities. The primary objective is to enhance interpretability, accessibility, and engagement with medical imaging data, allowing users from diverse backgrounds, including clinicians, neuroscientists, and creative professionals, to interact with complex brain representations in a perceptually intuitive manner. This framework builds on recent advances in multimodal neuroimaging fusion, which show that integrating multiple sensory modalities can reveal complementary aspects of neural dynamics and facilitate human–AI interpretability [7,8,9].
From a neuroscientific perspective, human perception relies heavily on prior experience and multisensory integration to interpret complex stimuli. Auditory and visual information is dynamically combined within temporal and parietal cortical regions, supporting cognition, language, emotion, and memory [10,11,12]. Functional neuroimaging studies using fMRI reveal how distributed cortical networks synchronize to represent sensory and affective experiences [9]. These paradigms extend beyond controlled laboratory conditions to naturalistic stimuli—such as films or music—enabling the investigation of brain function in ecologically valid contexts [13,14,15]. Such multimodal and dynamic scenarios highlight the importance of perceptual integration in understanding neural activity.
In computational neuroscience, sensory representation translates abstract neural activations into interpretable perceptual forms. Models leverage top-down modulation mechanisms to enhance categorization and perceptual fidelity [16,17]. Features extracted from neural networks—such as intensity, spatial extent, and anomalous activity—can be encoded into perceivable cues, including color-coded visuals and auditory signals, reflecting meaningful brain patterns. This facilitates early detection of abnormalities and intuitive comprehension of complex neurobiological structures [18,19,20].
Accordingly, the present work introduces a multimodal sensory framework that integrates deep learning-based segmentation with perceptual translation through sound and color. By bridging machine perception with human intuition, it contributes to the emerging field of interpretable AI in medical imaging.
This work makes the following key contributions:
1. We introduce a unified framework that converts deep learning-based segmentation outputs into perceptual sound (MIDI/sonification) and visual (color-coded) modalities, supporting sensory-based data interpretation.
2. We demonstrate how multimodal mappings enhance interpretability and accessibility in neuroimaging, fostering cross-disciplinary understanding among clinicians, neuroscientists, and creative professionals.
3. We propose the concept of “auditory biomarkers”, a perceptual mechanism for potential pathological identification linking sensory cues to early diagnostic features.

3. Materials and Methods

This study proposes a multimodal sensory pipeline for brain tumor analysis that extends beyond conventional segmentation methods. By leveraging outputs from deep learning segmentation models, our pipeline generates interpretable visual, auditory, and musical representations of brain MRI scans, facilitating anomaly detection and pattern recognition in clinical contexts.

3.1. Dataset

We used the Low-Grade Glioma (LGG) Segmentation Dataset, containing MRI scans with manually annotated masks highlighting abnormalities in the FLAIR sequence. The dataset includes 110 patients from the Cancer Genome Atlas (TCGA) LGG collection, available via The Cancer Imaging Archive (TCIA) [34,35].
For implementation, we accessed the curated Kaggle version (https://www.kaggle.com/datasets/mateuszbuda/lgg-mri-segmentation (accessed on 3 March 2024)), reorganized into patient-wise folders pairing MRI slices and segmentation masks. Image paths were stored in arrays, and masks were filtered via the _mask suffix using glob and pandas.
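A minimal sketch of this indexing step is shown below; the folder layout and variable names are illustrative, assuming the Kaggle release with one folder per patient and mask files ending in _mask:

```python
import glob
import os
import pandas as pd

# Assumed Kaggle layout: lgg-mri-segmentation/kaggle_3m/<patient_id>/<slice>.tif,
# with each mask named <slice>_mask.tif (paths are illustrative).
DATA_DIR = "lgg-mri-segmentation/kaggle_3m"

mask_paths = sorted(glob.glob(os.path.join(DATA_DIR, "*", "*_mask*")))
image_paths = [p.replace("_mask", "") for p in mask_paths]

df = pd.DataFrame({
    "patient_id": [os.path.basename(os.path.dirname(p)) for p in image_paths],
    "image_path": image_paths,
    "mask_path": mask_paths,
})
print(df.head())
```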

3.2. Patient-Wise Splitting and Preprocessing

To avoid slice-level data leakage, patient IDs were randomly assigned to training (90%), validation (5%), and test (5%) sets using fixed random seeds (software environment: NumPy 2.0.2, TensorFlow 2.19.0, Python 3.12.12).
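A minimal sketch of such a patient-wise split is given below; it assumes the DataFrame built in the previous sketch, and the seed value shown is illustrative rather than the one used in the study:

```python
import numpy as np

SEED = 42  # illustrative fixed seed; the study fixes seeds but the value is not stated here

patients = df["patient_id"].unique()
rng = np.random.default_rng(SEED)
rng.shuffle(patients)

n = len(patients)
n_train, n_val = int(0.90 * n), int(0.05 * n)
train_ids = set(patients[:n_train])
val_ids = set(patients[n_train:n_train + n_val])
test_ids = set(patients[n_train + n_val:])

# Every slice of a patient goes to exactly one split, preventing slice-level leakage.
train_df = df[df["patient_id"].isin(train_ids)]
val_df = df[df["patient_id"].isin(val_ids)]
test_df = df[df["patient_id"].isin(test_ids)]
```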
The preprocessing pipeline included the following:
1. Listing and loading: paired images and masks indexed by patient ID.
2. Resampling and resizing: isotropic resolution and cropping/padding to 256 × 256 pixels.
3. Normalization: intensity scaled to [0, 1].
4. Data augmentation (training set only): random flips (p = 0.5), rotations (±7°), elastic deformations, Gaussian noise, intensity shifts (±10%), and motion blur with random linear kernels (size 3–7 pixels; directions: horizontal, vertical, diagonal) to simulate patient motion artifacts (a minimal sketch of the motion-blur kernel follows this list). Validation and test sets were not augmented.
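The following sketch illustrates the motion-blur augmentation described in item 4, assuming a single-channel 2D slice; it is a plausible re-implementation of the stated kernel sizes and directions, not the exact code used in the study:

```python
import numpy as np
from scipy.ndimage import convolve

def random_motion_blur(image, rng):
    """Apply motion blur with a random linear kernel (illustrative re-implementation
    of the augmentation described above)."""
    size = rng.integers(3, 8)                       # kernel size 3-7 pixels
    direction = rng.choice(["horizontal", "vertical", "diagonal"])
    kernel = np.zeros((size, size), dtype=np.float32)
    if direction == "horizontal":
        kernel[size // 2, :] = 1.0
    elif direction == "vertical":
        kernel[:, size // 2] = 1.0
    else:                                           # main diagonal
        np.fill_diagonal(kernel, 1.0)
    kernel /= kernel.sum()                          # preserve mean intensity
    return convolve(image, kernel, mode="reflect")
```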

3.3. Proposed Model Architectures

This section provides a clear and technically detailed description of the four neural architectures evaluated in this study: U-Net, DeepLab, Graph Attention Network (GAT), and Sparse Graph Neural Network (SGNN). The aim is to balance scientific rigor with an accessible narrative while offering explicit component-level details that correspond directly to the block diagrams presented. Each model is described following the same structure: (i) input representation, (ii) core architectural components, (iii) decoding or reconstruction strategy, and (iv) segmentation output.
Overall, the four architectures share a common preprocessing pipeline (normalization and skull-stripping) and generate a binary segmentation mask, which is subsequently transformed into a visual color map and an auditory sonification output (Figure 1, Figure 2, Figure 3 and Figure 4).
Figure 1. Pipeline overview: Each model (U-Net, DeepLab, GAT, SGNN) is run individually, producing a segmentation mask that is converted into a color map and sent to the sonification module.
Figure 2. Diagrams of the U-Net, DeepLab, GAT, and SGNN architectures, scaled to fit within the MDPI page layout limits. (a) U-Net: encoder–decoder structure with symmetric skip connections. (b) DeepLab: Atrous Spatial Pyramid Pooling (ASPP) module. (c) GAT: message passing with learned attention coefficients. (d) SGNN: patch-based graph construction followed by a spectral GNN leading to the final segmentation output.
Figure 3. Sonification workflow: extraction of mask-derived features, auditory mapping, and synthesis.
Figure 4. Color map generation pipeline: normalization → colormap function → RGB visualization.
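As a concrete illustration of the Figure 4 pipeline, the sketch below normalizes a predicted probability map and applies a colormap function to obtain an RGB visualization; the colormap choice and function name are assumptions rather than the study's exact settings:

```python
import numpy as np
import matplotlib.pyplot as plt

def mask_to_colormap(prob_mask, cmap_name="jet"):
    """Normalization -> colormap function -> RGB visualization (Figure 4 in miniature)."""
    norm = (prob_mask - prob_mask.min()) / (prob_mask.max() - prob_mask.min() + 1e-8)
    rgba = plt.get_cmap(cmap_name)(norm)        # [H, W, 4] floats in [0, 1]
    return (rgba[..., :3] * 255).astype(np.uint8)
```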

3.3.1. U-Net

U-Net serves as the convolutional baseline, chosen for its robustness in biomedical segmentation. The network adopts a symmetric encoder–decoder structure, allowing the model to integrate global context while preserving spatial resolution through skip connections.
  • Encoder: four downsampling stages, each consisting of two 3 × 3 convolutions (BatchNorm + ReLU), followed by a 2 × 2 max-pooling layer.
  • Bottleneck: two convolutions with increased channel depth to capture high-level features.
  • Decoder: four upsampling stages using transposed convolutions, each concatenated with the corresponding encoder feature maps; every stage includes two convolutions (BN + ReLU).
  • Output layer: a final 1 × 1 convolution with sigmoid activation produces the binary mask.
This architecture corresponds to the block diagram in Figure 2a.
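A condensed Keras sketch of this U-Net configuration is given below; the channel widths are illustrative, as the exact filter counts are not reported here:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    """Two 3x3 convolutions with BatchNorm + ReLU, as in each U-Net stage."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def build_unet(input_shape=(256, 256, 3), base_filters=32):
    inputs = layers.Input(input_shape)
    skips, x = [], inputs
    # Encoder: four downsampling stages
    for i in range(4):
        x = conv_block(x, base_filters * 2 ** i)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    # Bottleneck with increased channel depth
    x = conv_block(x, base_filters * 16)
    # Decoder: transposed convolutions + skip concatenation
    for i in reversed(range(4)):
        x = layers.Conv2DTranspose(base_filters * 2 ** i, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skips[i]])
        x = conv_block(x, base_filters * 2 ** i)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # binary mask
    return tf.keras.Model(inputs, outputs)
```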

3.3.2. DeepLab

The modified DeepLab incorporates multi-scale feature extraction and an attention-augmented encoder to strengthen discrimination of tumor boundaries.
  • Residual encoder: four residual blocks, each composed of two 3 × 3 convolutions (BN + ReLU) and a skip connection. A global channel-attention mechanism (global average pooling + dense layers) modulates feature importance.
  • ASPP module: Atrous Spatial Pyramid Pooling with dilation rates {6, 12, 18} extracts contextual information at multiple scales.
  • Decoder: transposed convolutions progressively restore spatial resolution.
  • Output layer: a sigmoid-activated 1 × 1 convolution.
The corresponding diagram is shown in Figure 2b.
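The ASPP module can be sketched as follows, using the stated dilation rates {6, 12, 18}; the filter count and the 1 × 1 fusion branch are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def aspp(x, filters=256, rates=(6, 12, 18)):
    """Atrous Spatial Pyramid Pooling: parallel dilated 3x3 convolutions plus a
    1x1 branch, concatenated and fused (a sketch of the module described above)."""
    branches = [layers.Conv2D(filters, 1, padding="same", activation="relu")(x)]
    for r in rates:
        branches.append(
            layers.Conv2D(filters, 3, padding="same", dilation_rate=r, activation="relu")(x)
        )
    x = layers.Concatenate()(branches)
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
```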

3.3.3. Graph Attention Network (GAT)

The GAT-based architecture reformulates segmentation as a message-passing process over a graph derived from intermediate feature maps, capturing spatial relationships not easily learned by convolutions.
Graph Construction: Feature maps are reshaped into node sequences, with edges defined via cosine similarity between node features, creating a graph where nodes correspond to spatial locations and edges encode feature similarity.
Attention Layer: Multi-head attention (eight heads) computes attention coefficients $\alpha_{ij}$ that quantify the influence of node $j$ on node $i$,

$$\alpha_{ij} = \operatorname{softmax}_j\!\big(\operatorname{LeakyReLU}\big(\mathbf{a}^{\top}\,[\,\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j\,]\big)\big),$$

with the following:
  • $\mathbf{h}_i, \mathbf{h}_j \in \mathbb{R}^{F}$: node feature vectors.
  • $\mathbf{W} \in \mathbb{R}^{F' \times F}$: learnable linear transformation.
  • $[\,\cdot \,\Vert\, \cdot\,]$: concatenation.
  • $\mathbf{a} \in \mathbb{R}^{2F'}$: learnable vector for attention scoring.
  • $\operatorname{LeakyReLU}(\cdot)$: nonlinear activation.
  • $\operatorname{softmax}_j(\cdot)$: normalizes over neighbors so that $\sum_{j} \alpha_{ij} = 1$.
For multi-head attention with $K$ heads, the updated node features are

$$\mathbf{h}_i' = \big\Vert_{k=1}^{K}\, \sigma\!\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}^{(k)} \mathbf{W}^{(k)} \mathbf{h}_j\Big),$$

where $\mathcal{N}(i)$ is the set of neighbors of $i$, $\alpha_{ij}^{(k)}$ and $\mathbf{W}^{(k)}$ are the head-specific attention coefficients and weights, and $\sigma(\cdot)$ is a nonlinear activation. The outputs of all heads are concatenated ($\Vert$) to form the updated node embedding.
Decoder: Node embeddings are reshaped into spatial grids and refined via transposed convolutions to reconstruct the segmentation map. Figure 2c illustrates the conceptual graph structure and attention flow.
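A minimal TensorFlow sketch of a single attention head over a dense adjacency matrix, following the equations above, is shown below; the masking constant and the choice of sigmoid for $\sigma$ are illustrative:

```python
import tensorflow as tf

def gat_attention(h, adj, W, a, leak=0.2):
    """Single-head GAT update over a dense adjacency (sketch of the equations above).
    h:   [N, F]   node features      W: [F, F'] linear transform
    adj: [N, N]   binary adjacency   a: [2F', 1] attention vector
    """
    Wh = tf.matmul(h, W)                                       # [N, F']
    Fp = W.shape[1]
    # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]) via broadcasting of the two halves of a
    e = tf.nn.leaky_relu(
        tf.matmul(Wh, a[:Fp]) + tf.transpose(tf.matmul(Wh, a[Fp:])),
        alpha=leak,
    )                                                          # [N, N]
    e = tf.where(adj > 0, e, tf.fill(tf.shape(e), -1e9))       # mask non-neighbors
    att = tf.nn.softmax(e, axis=-1)                            # alpha_ij, rows sum to 1
    return tf.nn.sigmoid(tf.matmul(att, Wh))                   # sigma(sum_j alpha_ij W h_j)
```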

3.3.4. Sparse Graph Neural Network (SGNN)

The SGNN extends the GAT design by enforcing sparsity in edge connections, making computation more efficient while retaining key relational information.
  • Sparse graph definition: for each node, only the top-k most similar neighbors (cosine similarity) are preserved.
  • Attention mechanism: the same multi-head attention formulation as the GAT, applied to a sparsified adjacency matrix.
  • Graph decoder: embeddings are projected into spatial tensor form and decoded via convolutional upsampling.
The architecture is depicted in Figure 2d.
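A minimal sketch of the top-k cosine-similarity sparsification is given below; the value of k and the NumPy implementation are illustrative assumptions:

```python
import numpy as np

def topk_cosine_adjacency(features, k=8):
    """Build the sparse adjacency used by the SGNN: keep only the top-k
    cosine-similar neighbors of each node (k is illustrative)."""
    norm = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    sim = norm @ norm.T                           # [N, N] cosine similarity
    np.fill_diagonal(sim, -np.inf)                # exclude self-loops
    adj = np.zeros_like(sim)
    topk = np.argpartition(-sim, k, axis=1)[:, :k]
    rows = np.arange(sim.shape[0])[:, None]
    adj[rows, topk] = 1.0                         # keep the k strongest edges per node
    return adj
```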
Overall, the four architectures represent complementary modeling strategies: convolutional (U-Net), multi-scale contextual (DeepLab), relational (GAT), and sparse-relational (SGNN). Their structural differences are made explicit both in this section and in the block diagrams.

3.4. Comparison with Existing Methods

Conventional segmentation methods such as U-Net and DeepLab provide accurate visual outputs but lack multimodal interpretability. Our approach introduces the following two key innovations:
  • Graph-based spatial modeling: GAT and SGNN architectures capture non-local relationships between brain regions, improving boundary reconstruction and anomaly localization.
  • Multimodal sensory representation: model predictions are converted into auditory (stereo audio) and musical (MIDI) outputs, encoding intensity, texture, and anomaly information for intuitive interpretation.
This approach enables not only accurate segmentation but also synesthetic perception of pathological patterns, which is absent in traditional pipelines.

3.5. Advanced Brain Sonification Generation

The key steps of the multimodal representation pipeline are as follows:
  • Z-score and morphological analysis to identify anomalies relative to normal brain regions.
  • create_advanced_brain_sonification generates stereo audio incorporating melody, harmonics, rhythm, and dissonance to reflect tumor morphology and anomalies (an illustrative sketch of this mapping follows the list).
  • create_advanced_midi_from_brain converts image features into symbolic MIDI for pitch, duration, and rhythm, supporting interactive auditory exploration.
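The sketch below gives a simplified stand-in for create_advanced_brain_sonification: z-score anomaly analysis followed by stereo synthesis in which a tritone tracks anomaly severity and panning tracks tumor position. It assumes a single normalized 2D slice with non-empty brain tissue and is not the exact implementation used in the study:

```python
import numpy as np
from scipy.io import wavfile

def sonify_slice(image, mask, sr=44100, dur=4.0, fname="brain_sonification.wav"):
    """Illustrative stereo sonification of one segmented slice."""
    t = np.linspace(0, dur, int(sr * dur), endpoint=False)
    brain = image[image > 0]                                  # non-background pixels
    z = (image - brain.mean()) / (brain.std() + 1e-8)         # z-score vs. normal tissue

    base_f = 220.0 * 2.0 ** (2.0 * brain.mean())              # mean intensity -> pitch (A3..A5)
    left = 0.4 * np.sin(2 * np.pi * base_f * t)
    right = left.copy()

    if mask.any():                                            # tumor present
        severity = float(np.clip(np.abs(z[mask > 0]).mean() / 3.0, 0.0, 1.0))
        tritone = 0.4 * severity * np.sin(2 * np.pi * base_f * 2 ** (6 / 12) * t)
        pan = np.argwhere(mask > 0)[:, 1].mean() / mask.shape[1]   # horizontal centroid
        left += (1.0 - pan) * tritone                         # dissonance panned to tumor side
        right += pan * tritone

    stereo = np.stack([left, right], axis=1)
    stereo = stereo / np.abs(stereo).max()                    # normalize to [-1, 1]
    wavfile.write(fname, sr, (stereo * 32767).astype(np.int16))
    return fname
```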
Figure 5 and Figure 6 illustrate the full pipeline, integrating segmentation, graph-based modeling, and auditory/musical outputs.
Figure 5. Overview of the multimodal sensory pipeline for brain MRI segmentation. The pipeline integrates deep learning-based segmentation, anomaly detection, and visual (color-coded maps) and auditory (stereo audio and MIDI) representations.
Figure 6. Revised sensory processing pipeline showing the integration of segmentation, graph-based modeling, and multimodal outputs is presented in Appendix A. Each stage transforms MRI data into interpretable visual and auditory representations.

3.6. Training Protocol and Hyperparameters

All models were trained under identical configurations for fair comparison, as follows:
  • Batch size: 32; learning rate: 1 × 10⁻⁴ (Adam optimizer).
  • Input: 256 × 256 × 3; epochs: maximum of 150 (50 per session).
  • Loss: Dice coefficient (primary) and Jaccard distance (auxiliary); a sketch of both follows this list.
  • Scheduler: ReduceLROnPlateau (factor 0.2, patience 5). Early stopping: patience 20.
  • Checkpointing: best validation Dice saved; periodic checkpoint every 5 epochs.
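For reference, the Dice coefficient and Jaccard distance can be sketched as follows, using the smoothing factor of 100 reported in Section 3.7; this is an illustrative TensorFlow formulation rather than the exact training code:

```python
import tensorflow as tf

SMOOTH = 100.0  # smoothing factor reported in Section 3.7

def dice_coefficient(y_true, y_pred, smooth=SMOOTH):
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth
    )

def dice_loss(y_true, y_pred):
    return 1.0 - dice_coefficient(y_true, y_pred)

def jaccard_distance(y_true, y_pred, smooth=SMOOTH):
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    union = tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) - intersection
    return 1.0 - (intersection + smooth) / (union + smooth)

# model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
#               loss=dice_loss, metrics=[dice_coefficient])
```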

3.7. Evaluation Metrics and Runtime Analysis

Segmentation performance was assessed using the Dice coefficient, Intersection over Union (IoU), and Jaccard distance (smooth factor = 100). Computational metrics included inference time, GPU/CPU usage, VRAM, FLOPs, and trainable parameters. Experiments were conducted on GPU-enabled Google Colab. Summary results are shown in Table 2 and Table 3.
Table 2. Computational complexity of segmentation models. GFLOPs = giga floating-point operations; Parameters in millions.
Table 3. Inference performance comparison on CPU and GPU. FPS = frames per second; Acceleration factor = CPU time/GPU time.
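A rough sketch of how per-image inference time and FPS can be measured for a Keras model is shown below; the run count and input shape are illustrative, and the reported GPU/CPU, VRAM, and FLOPs figures come from the study's own profiling setup:

```python
import time
import numpy as np

def measure_inference(model, n_runs=100, input_shape=(1, 256, 256, 3)):
    """Average per-image latency (seconds) and FPS for a compiled Keras model."""
    x = np.random.rand(*input_shape).astype(np.float32)
    model.predict(x, verbose=0)                 # warm-up run
    start = time.perf_counter()
    for _ in range(n_runs):
        model.predict(x, verbose=0)
    latency = (time.perf_counter() - start) / n_runs
    return latency, 1.0 / latency               # seconds per image, frames per second
```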

4. Results

We present a multisensory support tool for brain anomaly segmentation. Performance was evaluated using Dice, IoU, and binary accuracy. Binary accuracy is less informative due to class imbalance; Dice and IoU better reflect segmentation quality.

4.1. Comparative Analysis of Architectures

U-Net achieved the highest performance (Dice = 93.25%, IoU = 87.43%, Figure 7), as reported in Table 4, due to skip connections that preserve multi-scale spatial features, enabling precise tumor boundaries. IoU is lower than Dice because it penalizes boundary errors more heavily (for a single prediction, IoU = Dice/(2 − Dice)): e.g., 93 correctly overlapping pixels with 7 false-positive and 7 missed pixels yield a Dice of 186/200 = 93% but an IoU of 93/107 ≈ 87%. DeepLab captures global context but loses fine details, explaining its slightly lower Dice/IoU.
Figure 7. Comparative performance of segmentation models on the test set, showing Dice coefficient, IoU, and binary accuracy. U-Net achieves the highest spatial coherence for tumor delineation.
Table 4. Segmentation model performance on the test set.
SGNN and GAT underperform due to sparse graph connectivity, diluted attention signals, and reduced ability to capture fine structures (Figure 8). High binary accuracy (>98%) across all models reflects class imbalance, not segmentation quality.
Figure 8. Conceptual comparison of CNN versus GAT/SGNN for dense image segmentation. CNN preserves local neighborhoods leading to accurate and spatially coherent segmentation, whereas GAT/SGNN may lose fine spatial information due to sparse edges and attention mechanisms.

4.2. Benchmarking with Recent Literature

Our U-Net slightly outperforms prior studies, confirming robustness for LGG segmentation. Differences in Dice/IoU are due to dataset variations, preprocessing, and evaluation methods. Figure 7 and Figure 8 visually reinforce that U-Net provides spatially coherent segmentation, while graph-based methods struggle with dense images due to local information loss and attention dilution as reported in Table 5.
Table 5. Benchmarking segmentation performance from the recent literature.

4.3. GAT and SGNN Performance Degradation

GAT and SGNN underperform in dense image segmentation due to the following limitations in capturing local spatial structures as reported in Table 6:
Table 6. Factors affecting GAT/SGNN performance vs. CNNs.
  • Sparse connectivity leads to loss of critical local information.
  • Cosine-similarity edges may introduce noisy connections.
  • Multi-head attention can dilute informative signals.
  • Dropout of nodes or edges disrupts information flow.
  • Graph abstraction reduces fine-structure learning capability.

4.4. Advanced Sonification and Multimodal Representation

Neuroimaging features were translated into coherent auditory structures to enhance interpretability. Using the region_to_musical_scale mapping, high-intensity regions corresponded to major scales, low-intensity regions to minor scales, large areas to pentatonic scales, and small or complex regions to blues scales. The anomaly_to_dissonance function encoded pathological areas with tritones (high-intensity anomalies), minor sevenths (moderate anomalies), and amplitude modulation to reflect spatial localization and anomaly severity. The integrated visual–auditory representations are summarized in Table 7 and Table 8 and illustrated in Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13.
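The sketch below illustrates one plausible form of the region_to_musical_scale and anomaly_to_dissonance heuristics described above; the thresholds are assumptions, since the exact values are not reported here:

```python
def region_to_musical_scale(mean_intensity, area, complexity):
    """Illustrative reconstruction of the heuristic scale mapping described above
    (thresholds are assumptions)."""
    scales = {
        "major":      [0, 2, 4, 5, 7, 9, 11],
        "minor":      [0, 2, 3, 5, 7, 8, 10],
        "pentatonic": [0, 2, 4, 7, 9],
        "blues":      [0, 3, 5, 6, 7, 10],
    }
    if area > 5000:                      # large regions -> pentatonic scale
        return scales["pentatonic"]
    if complexity > 0.5 or area < 200:   # small/complex regions -> blues scale
        return scales["blues"]
    return scales["major"] if mean_intensity > 0.5 else scales["minor"]

def anomaly_to_dissonance(severity):
    """Interval (in semitones) added above the base pitch: tritone for
    high-severity anomalies, minor seventh for moderate ones."""
    return 6 if severity > 0.7 else 10 if severity > 0.3 else 0
```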
Table 7. Comparison of visual versus auditory representation for brain image sonification. Panel A = visual segmentation; Panel B = spectrogram; Panel C = MIDI score.
Table 8. Estimated clinical endpoints with multimodal representation (color + sound). Interpretation time in seconds; confidence rated 1–5; NASA-TLX = subjective workload (0–100).
Figure 9. Advanced segmentation output highlighting abnormal regions (tumor) in brain MRI. Color intensity corresponds to predicted probability of abnormal tissue.
Figure 10. Advanced segmentation output for healthy brain regions. Color-coded mask reflects predicted normal tissue, supporting contrast with anomalous regions.
Figure 11. Visual representation of segmented tumor regions. Provides spatial context for abnormal areas detected in the MRI.
Figure 12. Auditory representation (spectrogram) of brain segmentation. Frequency and amplitude variations correspond to regional intensity and morphological patterns, enabling perceptual exploration of anomalies.
Figure 13. Auditory representation (PianoRoll) of brain segmentation. Pitch and Velocity variations correspond to regional intensity and morphological patterns, enabling perceptual exploration of anomalies.
The dataset, including WAV/MIDI files, JSON metadata, and processing documentation, has been made publicly available via Zenodo [39], DOI: https://doi.org/10.5281/zenodo.17116504.

4.5. User Accuracy and Learnability

U-Net’s superior Dice and IoU translate to clearer, spatially coherent segmentations, expected to improve radiologist accuracy. Sonification can reinforce ambiguous boundaries, potentially reducing false negatives/positives. Users with prior medical imaging experience are expected to learn the auditory cues rapidly; novice users may require longer exposure. Formal usability testing is planned for future work.

4.6. Discussion of Results

CNN-based models (U-Net, DeepLab) are preferred for dense segmentation tasks due to high spatial precision. Graph-based models (GAT, SGNN) may be suitable for low-latency or adaptive applications but underperform on fine spatial details. Multimodal outputs (color + sonification) enhance interpretability, potentially improving clinical decision-making.

4.7. Clinical and Translational Implications

Multimodal feedback improves interpretability, diagnostic accuracy, and efficiency. Applications extend beyond clinical settings to immersive audiovisual media and brain–computer interface systems.
Table 9 highlights trade-offs between segmentation accuracy, inference speed, and projected user-centered endpoints. Multimodal feedback is expected to increase diagnostic accuracy, reduce interpretation time, enhance confidence, and decrease cognitive workload (NASA-TLX).
Table 9. Algorithmic performance vs. estimated clinical endpoints.

5. Discussion

This study demonstrates the potential of translating cortical activation patterns into interpretable multimodal sensory representations, with applications spanning clinical diagnostics, cognitive neuroscience, and immersive audiovisual experiences. In clinical contexts, combining visual and auditory representations enhances interpretability, supports early detection of abnormalities, and may reduce cognitive fatigue during neuroimaging analysis.
CNN-based architectures, particularly U-Net, proved effective in generating high-fidelity sensory representations due to their ability to preserve spatial coherence across brain regions. Graph-based models, such as SGNN, offer low-latency advantages suitable for real-time or immersive audiovisual applications, albeit at some cost in spatial precision.

5.1. Clinical Rationale for Auditory Biomarkers

The concept of auditory biomarkers leverages human auditory perception to complement traditional visual inspection of neuroimaging data. High-intensity or anomalous regions in MRI scans can be mapped to distinctive sound patterns (e.g., dissonance, pitch modulation), enabling clinicians to detect subtle pathological changes that may be less apparent visually. This multisensory integration is grounded in cognitive neuroscience principles, where combining auditory and visual information facilitates pattern recognition, memory encoding, and anomaly detection [1,10]. Preliminary user feedback indicates that auditory cues can reinforce ambiguous boundaries and improve recognition speed, though formal usability studies are required to validate these observations in clinical practice.

5.2. Translational and Interdisciplinary Implications

Analogous to signal transformation in other high-dimensional systems—such as superconducting qubits encoding quantum states into phonons [40]—multimodal representations convert complex neural patterns into perceptually intuitive forms. This approach not only benefits clinical diagnostics but also enables applications in neurofeedback, education, and creative industries, including film and interactive media.

5.3. Deployment Considerations and Limitations

For real-world deployment, adherence to regulatory and ethical frameworks (e.g., EU Artificial Intelligence Act, 2024) is critical. Trustworthy sensory AI pipelines require interpretability, transparency, and human oversight to ensure safety and clinical reliability.
Current limitations include the following:
  • Analysis limited to 2D MRI slices; volumetric 3D data may enhance both segmentation and sonification fidelity.
  • GAT and SGNN architectures underperformed in dense segmentation due to sparse connectivity and attention dilution.
  • User-centered metrics reported here are projected estimates; formal clinical validation is needed.
  • Preliminary usability observations suggest rapid learnability for clinicians with prior imaging experience, but structured user studies are necessary.

5.4. Future Directions

Future work should explore the following:
  • Extension to 3D and multimodal neuroimaging for richer sensory representations.
  • Real-time adaptive sonification integrated with AI-assisted diagnostic tools and neurofeedback systems.
  • Formal user studies to quantify learning curves, interpretability gains, and diagnostic impact.
  • Optimization of graph-based models to improve spatial fidelity while maintaining low-latency performance.
  • Broader applications in educational and creative contexts, enabling cross-disciplinary engagement with neuroimaging data.
Overall, the proposed multimodal sensory framework demonstrates that integrating auditory and visual channels can enhance the interpretability and clinical relevance of AI-based neuroimaging analyses while providing avenues for interdisciplinary applications.

6. Conclusions

This study presents a validated pipeline for multimodal sensory representation of brain imaging data, integrating deep learning-based segmentation with complementary visual and auditory outputs. Structural MRI predictions were transformed into color-coded visual maps and stereophonic/MIDI sonifications, enabling intuitive, synesthetic interpretation of neural activity. Among evaluated architectures, the U-Net model achieved high-precision segmentation, supporting efficient generation of bimodal outputs.
The framework establishes a proof of concept linking computational image analysis with perceptual interpretation and introduces the notion of auditory biomarkers, which may facilitate early detection of pathological patterns through sound.
Key limitations include the modest dataset size, lack of formal clinical end-user validation, and heuristic sonification rules that do not yet incorporate systematic emotional or cognitive mappings. Future work should address these gaps as follows:
  • Conducting clinical validation with radiologists to quantify diagnostic accuracy, confidence, and interpretation speed.
  • Extending the approach to dynamic fMRI data, enabling time-resolved audiovisual representations of cognitive and emotional processes.
  • Integrating real-time feedback for immersive applications in film, neurofeedback, and human–machine interaction.
  • Optimizing sonification algorithms to incorporate perceptually grounded mappings, enhancing both clinical interpretability and educational utility.
Overall, this multisensory framework demonstrates the feasibility of translating neural activity into perceivable visual and auditory formats, providing a foundation for clinical, research, and artistic applications while promoting more intuitive understanding of brain function.

Author Contributions

Conceptualization, I.L.D.S., N.F.L., and J.M.F.M.; Methodology, I.L.D.S., N.F.L., and J.M.F.M.; Software, I.L.D.S.; Validation, N.F.L., and J.M.F.M.; Formal analysis, I.L.D.S., N.F.L., and J.M.F.M.; Investigation, I.L.D.S., and N.F.L.; Resources, I.L.D.S., N.F.L., and J.M.F.M.; Data curation, I.L.D.S.; Writing—original draft preparation, I.L.D.S., N.F.L., and J.M.F.M.; Writing—review and editing, I.L.D.S., N.F.L., and J.M.F.M.; Visualization, I.L.D.S., N.F.L., and J.M.F.M.; Supervision, N.F.L., and J.M.F.M.; Project administration, N.F.L., and J.M.F.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset supporting the results of this study, including WAV/MIDI files, JSON metadata, and processing documentation, has been made publicly available via Zenodo [39], DOI: https://doi.org/10.5281/zenodo.17116504 (accessed on 5 October 2025).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Attribution of Icons Used in the Figures

Table A1. Attribution of icons used in the figures. All icons were obtained from Flaticon.com under a free license with attribution, accessed on 2 November 2025.
No. | Icon/Concept | Author/Designer | Direct Link | Access Date | License
1 | Brain | Freepik | https://www.flaticon.com/free-icon/brain_6969711 | 2 November 2025 | Free with attribution
2 | Data Load | Ida Desi Mariana | https://www.flaticon.com/free-icon/loading_12663288 | 2 November 2025 | Free with attribution
3 | Notebook | Good Ware | https://www.flaticon.com/free-icon/diary_4533668 | 2 November 2025 | Free with attribution
4 | Python | Freepik | https://www.flaticon.com/free-icon/python_1822899 | 2 November 2025 | Free with attribution
5 | RMI (Music/Stimulus) | IconsMind | https://www.flaticon.com/free-icon/music_16735082 | 2 November 2025 | Free with attribution
6 | Pre-processing | Uniconlabs | https://www.flaticon.com/free-icon/data-processing_14247619 | 2 November 2025 | Free with attribution
7 | Neural Network | IconPro86 | https://www.flaticon.com/free-icon/neural-network_10321502 | 2 November 2025 | Free with attribution
8 | Neural Network (alt.) | Pojok d | https://www.flaticon.com/free-icon/neural-network_13320564 | 2 November 2025 | Free with attribution
9 | Loss Function | Paul J. | https://www.flaticon.com/free-icon/warning_11371561 | 2 November 2025 | Free with attribution
10 | Optimization | Syafii5758 | https://www.flaticon.com/free-icon/optimization_12001646 | 2 November 2025 | Free with attribution
11 | Predict Model | Freepik | https://www.flaticon.com/free-icon/data-model_12538663 | 2 November 2025 | Free with attribution
12 | Sound Save | Boris Faria | https://www.flaticon.com/free-icon/arrow_10923526 | 2 November 2025 | Free with attribution
13 | Sound Generation | Fach | https://www.flaticon.com/free-icon/ai-technology_17030840 | 2 November 2025 | Free with attribution
14 | Sound Effects | Stockes Design | https://www.flaticon.com/free-icon/sound-setting_4300505 | 2 November 2025 | Free with attribution
15 | Sound Combinations | Eucalyp | https://www.flaticon.com/free-icon/sound-control_2637807 | 2 November 2025 | Free with attribution
16 | Sound Play | Smashicons | https://www.flaticon.com/free-icon/play-button_660559 | 2 November 2025 | Free with attribution
17 | HSV Color Wheel | Creator | https://www.flaticon.com/free-icon/color-wheel_11460836 | 2 November 2025 | Free with attribution
18 | RGB | Freepik | https://www.flaticon.com/free-icon/rgb_1215827 | 2 November 2025 | Free with attribution
19 | Save Image | Pocike | https://www.flaticon.com/free-icon/photo_15473970 | 2 November 2025 | Free with attribution
20 | Visual (Display) | Smashicons | https://www.flaticon.com/free-icon/computer-monitor_4617970 | 2 November 2025 | Free with attribution
21 | Sound Wave | Freepik | https://www.flaticon.com/free-icon/music-wave_4020749 | 2 November 2025 | Free with attribution
Note: All icons were obtained from Flaticon.com under the free license with attribution. Individual designers are acknowledged as listed above.

References

  1. Mantini, D.; Marzetti, L.; Corbetta, M.; Romani, G.L.; Del Gratta, C. Multimodal integration of fMRI and EEG data for high spatial and temporal resolution analysis of brain networks. Brain Topogr. 2010, 23, 150–158. [Google Scholar] [CrossRef] [PubMed]
  2. Lundervold, A.S.; Lundervold, A. An overview of deep learning in medical imaging focusing on MRI. Z. Med. Phys. 2019, 29, 102–127. [Google Scholar] [CrossRef] [PubMed]
  3. Kwee, T.C.; Kwee, R.M. Diagnostic radiology and its future: What do clinicians need. Eur. Radiol. 2023, 33, 6322–6333. [Google Scholar] [CrossRef]
  4. Lim, S.S.; Phan, T.D.; Law, M.; Goh, G.S.; Moriarty, H.K.; Lukies, M.W.; Joseph, T.; Clements, W. Non-radiologist perception of the use of artificial intelligence (AI) in diagnostic medical imaging reports. J. Med. Imaging Radiat. Oncol. 2022, 66, 1029–1034. [Google Scholar] [CrossRef]
  5. Chen, J.; Ma, J.; Huai, X. Qualitative studies: Designing a multimodal medical visualization tool for helping patients interpret 3D medical images. Front. Physiol. 2025, 16, 1559801. [Google Scholar] [CrossRef]
  6. Schütz, L.; Matinfar, S.; Schafroth, G.; Navab, N.; Fairhurst, M.; Wagner, A.; Wiestler, B.; Eck, U.; Navab, N. A Framework for Multimodal Medical Image Interaction. arXiv 2024, arXiv:2407.07015. [Google Scholar] [CrossRef] [PubMed]
  7. Shoeibi, A.; Khodatars, M.; Jafari, M.; Ghassemi, N.; Alizadehsani, R.; Nahavandi, S.; Gorriz, J.M.; Moradi, M.; Acharya, U.R. Diagnosis of brain diseases in fusion of neuroimaging modalities using deep learning: A review. Inf. Fus. 2023, 93, 3–25. [Google Scholar] [CrossRef]
  8. Xu, M.; Zhang, J.; Li, W.; Chen, Y.; Zhou, H.; Wang, P.; Zhang, L. Deep learning aided neuroimaging and brain regulation. Sensors 2023, 23, 4993. [Google Scholar] [CrossRef]
  9. Hansen, J.Y.; Cauzzo, S.; Singh, K.; García-Gomar, M.G.; Shine, J.M.; Bianciardi, M.; Misic, B. Integrating brainstem and cortical functional architectures. Nat. Neurosci. 2024, 27, 2500–2511. [Google Scholar] [CrossRef]
  10. Richter, D.; Heilbron, M.; de Lange, F.P. Dampened sensory representations for expected input across the ventral visual stream. Oxf. Open Neurosci. 2022, 1, kvac013. [Google Scholar] [CrossRef]
  11. Pérez-Bellido, A.; Barnes, K.A.; Crommett, L.E.; Yau, J.M. Auditory frequency representations in human somatosensory cortex. Cereb. Cortex 2018, 28, 3908–3921. [Google Scholar] [CrossRef]
  12. Eickhoff, S.B.; Milham, M.; Vanderwal, T. Towards clinical applications of movie fMRI. NeuroImage 2020, 220, 116860. [Google Scholar] [CrossRef]
  13. Schaefer, S.; Rotte, M. How movies move us: Movie preferences are linked to differences in brain reactivity. Front. Behav. Neurosci. 2024, 18, 1396811. [Google Scholar] [CrossRef]
  14. Mahrukh, R.; Shakil, S.; Malik, A.S. Sentiment analysis of fMRI using automatically generated stimuli labels under naturalistic paradigm. Sci. Rep. 2023, 13, 7267. [Google Scholar] [CrossRef] [PubMed]
  15. Saarimäki, H. Naturalistic stimuli in affective neuroimaging: A review. Front. Hum. Neurosci. 2021, 15, 675068. [Google Scholar] [CrossRef] [PubMed]
  16. Zhang, N.; Xu, N.-L. Reshaping sensory representations by task-specific brain states: Toward cortical circuit mechanisms. Curr. Opin. Neurobiol. 2022, 77, 102628. [Google Scholar] [CrossRef] [PubMed]
  17. van Hulle, M.M.; Orban, G.A. Entropy driven artificial neuronal networks and sensorial representation: A proposal. J. Parallel Distrib. Comput. 1989, 6, 264–290. [Google Scholar] [CrossRef]
  18. Mazzola, G.; Göller, S. Performance and interpretation. J. New Music Res. 2002, 31, 221–232. [Google Scholar] [CrossRef]
  19. Zifko, U.A.; Slomka, P.J.; Reid, R.H.; Young, G.B.; Remtulla, H.; Bolton, C.F. The cortical representation of somatosensory evoked potentials of the phrenic nerve. J. Neurol. Sci. 1996, 139, 197–202. [Google Scholar] [CrossRef]
  20. Bittar, R.G.; Ptito, A.; Reutens, D.C. Somatosensory representation in patients who have undergone hemispherectomy: A functional magnetic resonance imaging study. J. Neurosurg. 2000, 92, 45–51. [Google Scholar] [CrossRef]
  21. Wu, Y.; Mao, Y.; Feng, K.; Wei, D.; Song, L. Decoding of the neural representation of the visual RGB color model. PeerJ Comput. Sci. 2023, 9, e1376. [Google Scholar] [CrossRef] [PubMed]
  22. Tamakoshi, S.; Minoura, N.; Katayama, J.; Yagi, A. Entire sound representations are time-compressed in sensory memory: Evidence from MMN. Front. Neurosci. 2016, 10, 347. [Google Scholar] [CrossRef] [PubMed]
  23. Riecke, L.; van Opstal, A.J.; Goebel, R.; Formisano, E. Hearing illusory sounds in noise: Sensory-perceptual transformations in primary auditory cortex. J. Neurosci. 2007, 27, 12684–12689. [Google Scholar] [CrossRef]
  24. Melchiorre, A.B.; Schedl, M. Personality correlates of music audio preferences for modelling music listeners. Front. Psychol. 2020, 11, 760. [Google Scholar] [CrossRef]
  25. Khan, P.; Kader, M.F.; Islam, S.M.R.; Rahman, A.B.; Kamal, M.S.; Toha, M.U.; Kwak, K.-S. Machine learning and deep learning approaches for brain disease diagnosis: Principles and recent advances. IEEE Access 2021, 9, 37622–37655. [Google Scholar] [CrossRef]
  26. Samee, N.A.; Mahmoud, N.F.; Atteia, G.; Abdallah, H.A.; Alabdulhafith, M.; Al-Gaashani, M.S.A.M.; Ahmad, S.; Muthanna, M.S.A. Classification Framework for Medical Diagnosis of Brain Tumor with an Effective Hybrid Transfer Learning Model. Diagnostics 2022, 12, 2541. [Google Scholar] [CrossRef]
  27. Dudai, Y. The cinema-cognition dialogue: A match made in brain. Front. Hum. Neurosci. 2012, 6, 248. [Google Scholar] [CrossRef]
  28. Vanderwal, T.; Eilbott, J.; Castellanos, F.X. Movies in the magnet: Naturalistic paradigms in developmental functional neuroimaging. Dev. Cogn. Neurosci. 2019, 36, 100600. [Google Scholar] [CrossRef]
  29. Zhang, Z.; Xu, S.; Li, L.; Zhou, Y.; Lin, D.; Zhang, M.; Zhang, L.; Huang, G.; Liu, X.; Becker, B.; et al. Functional connectivity profiles of the default mode and visual networks reflect temporal accumulative effects of sustained naturalistic emotional experience. NeuroImage 2023, 273, 119941. [Google Scholar] [CrossRef]
  30. Roginska, A.; Mohanraj, H.; Ballora, M.; Friedman, K. Immersive Sonification for Displaying Brain Scan Data. In Proceedings of the International Conference on Health Informatics (HEALTHINF-2013), SCITEPRESS, Philadelphia, PA, USA, 9–11 September 2013; pp. 24–33. Available online: https://www.scitepress.org/Papers/2013/42029/42029.pdf (accessed on 9 November 2025). [CrossRef]
  31. Cadiz, R.F.; de la Cuadra, P.; Montoya, A.; Marín, V.; Andia, M.E.; Tejos, C.; Irarrazaval, P. Sonification of Medical Images Based on Statistical Descriptors. In Proceedings of the International Computer Music Conference (ICMC), Denton, TX, USA, 25 September–1 October 2015; Available online: https://quod.lib.umich.edu/cgi/p/pod/dod-idx/sonification-of-medical-images-based-on-statistical.pdf (accessed on 9 November 2025).
  32. Chiroiu, V.; Tarnita, D.; Popa, D. Using the Sonification for Hardly Detectable Details in Medical Images. Sci. Rep. 2019, 9, 17711. [Google Scholar] [CrossRef]
  33. Matinfar, S.; Dehghani, S.; Salehi, M.; Sommersperger, M.; Faridpooya, K.; Fairhurst, M.; Navab, N. From Tissue to Sound: A New Paradigm for Medical Sonic Interaction Design. Med. Image Anal. 2025, 103, 103571. [Google Scholar] [CrossRef] [PubMed]
  34. Clark, K.; Vendt, B.; Smith, K.; Freymann, J.; Kirby, J.; Koppel, P.; Moore, S.; Phillips, S.; Maffitt, D.; Pringle, M.; et al. The Cancer Imaging Archive (TCIA): Maintaining and operating a public information repository. J. Digit. Imaging 2013, 26, 1045–1057. [Google Scholar] [CrossRef] [PubMed]
  35. Zakharova, G.; Efimov, V.; Raevskiy, M.; Rumiantsev, P.; Gudkov, A.; Belogurova-Ovchinnikova, O.; Sorokin, M.; Buzdin, A. Reclassification of TCGA diffuse glioma profiles linked to transcriptomic, epigenetic, genomic and clinical data, according to the 2021 WHO CNS tumor classification. Int. J. Mol. Sci. 2023, 24, 157. [Google Scholar] [CrossRef] [PubMed]
  36. Li, Z.; Zhang, X.; Huang, J. Automatic brain MRI tumour segmentation based on deep fusion of feature maps. Artif. Intell. Rev. 2025, 588, 154. [Google Scholar] [CrossRef]
  37. Avazov, K.; Mirzakhalilov, S.; Umirzakova, S.; Abdusalomov, A.; Cho, Y.I. Dynamic Focus on Tumor Boundaries: A Lightweight U-Net for MRI Brain Tumor Segmentation. Bioengineering 2024, 11, 1302. [Google Scholar] [CrossRef]
  38. Liu, G.; Zhang, H.; Yang, Y. mResU-Net: Multi-scale residual U-Net-based brain tumour segmentation. IEEE J. Biomed. Health Inform. 2023, 27, 3349–3359. [Google Scholar] [CrossRef]
  39. da Silva, I.L.; Lori, N.F.; Machado, J.M.F. Advanced Brain Sonification Documentation (Version 1). Available online: https://doi.org/10.5281/zenodo.17116504 (accessed on 5 October 2025).
  40. Bozkurt, A.B.; Golami, O.; Yu, Y.; Tian, H.; Mirhosseini, M. A mechanical quantum memory for microwave photons. Nat. Phys. 2025, 21, 1469–1474. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
