Author Contributions
Conceptualization, H.A., S.B. and J.F.S.; Methodology, H.A., S.B. and J.F.S.; Software, H.A. and J.F.S.; Validation, H.A., S.B. and J.F.S.; Formal analysis, H.A., S.B. and J.F.S.; Investigation, H.A., S.B. and J.F.S.; Data curation, H.A., R.R. and R.K.; Writing—original draft preparation, H.A. and S.B.; Writing—review and editing, H.A., R.N., S.M., H.C. and M.M.; Visualization, H.A.; Supervision, R.N., S.M., H.C. and M.M.; Project administration, R.N., S.M., H.C. and M.M. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Example EEG signals from the Fp1 channel for responders (R, left) and non-responders (NR, right) to SSRI therapy (top row) and rTMS therapy (bottom row).
Figure 2.
Workflow of the proposed computer-aided decision (CAD) system for predicting depression therapy outcomes from pre-treatment EEG signals. The pipeline comprises five stages: (1) Preprocessing: raw EEG recordings from the SSRI and rTMS datasets are band-pass filtered (0.5–70 Hz), notch filtered (50 Hz), motion-artifact corrected via artifact subspace reconstruction (ASR), re-referenced to the common average reference (CAR), denoised using Multiscale Principal Component Analysis (MSPCA), and segmented into non-overlapping 15-s epochs; (2) Image Generation: each epoch and channel (19 total) is transformed into a time-frequency image using Continuous Wavelet Transform (CWT), Variational Mode Decomposition (VMD), or their pixel-wise average (Fusion); (3) Cross-Validation: 6-fold CV applied at either the image level (image-independent) or patient level (subject-independent); (4) Classification: four ImageNet-pretrained CNNs (ResNet-18, MobileNet-V3, EfficientNet-B0, TinyViT-Hybrid) are fine-tuned to classify images as responder (R) or non-responder (NR); (5) Performance Metrics: accuracy, precision, recall, specificity, and F1-score are reported at the image level; subject-level accuracy with 95% CI is reported under subject-independent CV.
Figure 3.
CWT-based time-frequency images generated from the Fp1 channel for responder (R) and non-responder (NR) patients across SSRI and rTMS therapies. The images illustrate the spectral-temporal structure of EEG activity used for predicting treatment outcomes.
Figure 4.
VMD-based time-frequency images generated from the Fp1 channel for responder (R) and non-responder (NR) patients across SSRI and rTMS therapies. The images highlight intrinsic mode variations that differentiate treatment-responsive from treatment-resistant neural activity.
Figure 5.
Fusion-based EEG representations of responder (R) and non-responder (NR) across SSRI and rTMS therapies, generated from the Fp1 channel. The fused time-frequency patterns combine the information of CWT scalograms and VMD-derived spectrograms via pixel-wise averaging to emphasise discriminative structures for treatment-outcome prediction.
Figure 6.
Average 6-fold confusion matrices for SSRI treatment response prediction under image-independent CV. Results are shown for three time-frequency representations (CWT, panels a–d; VMD, panels e–h; Fusion, panels i–l) and four CNN architectures (ResNet-18, EfficientNet-B0, MobileNet-V3, TinyViT-Hybrid). Each cell reports the mean count averaged across the six folds; non-integer values arise from this averaging. Rows indicate the true class label and columns the predicted label, where Resp = responder and Non_Resp = non-responder to SSRI therapy. The colour scale reflects the normalised proportion per cell. The approximate number of test images per fold is 1799 (∼709 R and ∼1090 NR).
Figure 7.
Training and validation loss and accuracy curves for CWT-based representations on the SSRI dataset under image-independent CV. Results are averaged across six folds with standard deviation shading.
Figure 8.
Training and validation loss and accuracy curves for VMD-based representations on the SSRI dataset under image-independent CV. Results are averaged across six folds with standard deviation shading.
Figure 9.
Training and validation loss and accuracy curves for the Fusion (CWT+VMD) representation on the SSRI dataset under image-independent CV. Results are averaged across six folds with standard deviation shading.
Figure 10.
Per-channel subject-level accuracy (%) for the SSRI dataset under subject-independent CV across CWT, VMD, and Fusion representations and four CNN models.
Figure 11.
Average 6-fold confusion matrices for rTMS treatment response prediction under image-independent CV. Results are shown for three time-frequency representations (CWT, panels a–d; VMD, panels e–h; Fusion, panels i–l) and four CNN architectures (ResNet-18, EfficientNet-B0, MobileNet-V3, TinyViT-Hybrid). Each cell reports the mean count averaged across the six folds; non-integer values arise from this averaging. Rows indicate the true class label and columns the predicted label, where Resp = responder and Non_Resp = non-responder to rTMS therapy. The colour scale reflects the normalised proportion per cell. The approximate number of test images per fold is 2711 (∼1384 R and ∼1327 NR), reflecting the balanced class distribution of the rTMS dataset.
Figure 12.
Training and validation loss and accuracy curves for CWT-based representations on the rTMS dataset under image-independent CV. Results are averaged across six folds with standard deviation shading.
Figure 13.
Training and validation loss and accuracy curves for VMD-based representations on the rTMS dataset under image-independent CV. Curves show mean performance across six folds with standard deviation shading.
Figure 14.
Training and validation loss and accuracy curves for the Fusion (CWT+VMD) representation on the rTMS dataset under image-independent CV. Shaded regions represent one standard deviation across six folds.
Figure 15.
Per-channel subject-level accuracy (%) for the rTMS dataset under subject-independent CV across CWT, VMD, and Fusion representations and four CNN models.
Table 1.
A summary of previous works on predicting outcomes of depression therapy. All reported accuracy values correspond to image-level classification results.
| Ref, Year | Methods | Therapy | No. of Patients | Accuracy |
|---|---|---|---|---|
| [14] | Wavelet transform + STFT + EMD; ROC analysis; logistic regression | SSRI | 16 R vs. 18 NR | 91.60% |
| [15] | Nonlinear EEG features; Fisher discriminant ratio; multifractal analysis classifier | SSRI | 11 R vs. 11 NR | 87.40% |
| [16] | EEG biomarkers + clinical variables; Student’s t-test; SVM | SSRI | 155 R vs. 67 NR | 82.40% |
| [17] | Demographic + EEG + source-localised features; PCA; random forest | SSRI | 27 R vs. 24 NR | 88.00% |
| [18] | Absolute/relative EEG band power + beta-to-alpha ratio; gradient boosted decision tree | SSRI | 528 patients | C-index = 0.963 |
| [19] | EEG signal signatures; sequential least-squares regression | SSRI | 309 patients | RMSE = 5.68 |
| [5] | CWT scalogram images; majority voting ensemble of five pretrained TL-CNN models | SSRI | 12 R vs. 18 NR | 96.55% |
| [20] | Low-resolution brain tomography; functional connectivity + coherence; ROC | SSRI | 12 R vs. 18 NR | – |
| [21] | Cascaded pretrained TL pipeline; biologically inspired LSTM; raw EEG | SSRI | 12 R vs. 18 NR | 98.84% |
| [22] | Inter-channel brain rhythm connectivity; four sequential TL models; LSTM | SSRI | 12 R vs. 18 NR | 98.33% |
| [23] | EEG channel connectivity images; voting-based TL ensemble + LSTM | rTMS | 23 R vs. 23 NR | 99.32% |
| [7] | CWT scalogram images; fine-tuned TL backbone; biologically inspired LSTM | rTMS | 23 R vs. 23 NR | 97.10% |
| [24] | Pretrained TL ensemble; biologically inspired LSTM; raw EEG sequences | rTMS | 23 R vs. 23 NR | 98.51% |
| [25] | EEG inter-channel connectivity; pretrained TL ensemble | rTMS | 34 patients | 92.28% |
| Proposed | CWT/VMD/Fusion images; ResNet-18, MobileNet-V3, EfficientNet-B0, TinyViT-Hybrid | SSRI/rTMS | 30/46 patients | 99.43% / 98.77% |
Table 2.
Number of images per subject for SSRI and rTMS datasets.
| SSRI R: Subject | SSRI R: Images | SSRI NR: Subject | SSRI NR: Images | rTMS R: Subject | rTMS R: Images | rTMS NR: Subject | rTMS NR: Images |
|---|---|---|---|---|---|---|---|
| 1 | 380 | 1 | 361 | 1 | 380 | 1 | 380 |
| 2 | 361 | 2 | 361 | 2 | 361 | 2 | 342 |
| 3 | 209 | 3 | 361 | 3 | 323 | 3 | 361 |
| 4 | 380 | 4 | 361 | 4 | 361 | 4 | 266 |
| 5 | 380 | 5 | 361 | 5 | 342 | 5 | 285 |
| 6 | 361 | 6 | 361 | 6 | 190 | 6 | 399 |
| 7 | 361 | 7 | 380 | 7 | 760 | 7 | 361 |
| 8 | 380 | 8 | 380 | 8 | 304 | 8 | 456 |
| 9 | 361 | 9 | 361 | 9 | 380 | 9 | 323 |
| 10 | 361 | 10 | 361 | 10 | 380 | 10 | 380 |
| 11 | 361 | 11 | 380 | 11 | 361 | 11 | 361 |
| 12 | 361 | 12 | 380 | 12 | 380 | 12 | 247 |
| – | – | 13 | 361 | 13 | 361 | 13 | 361 |
| – | – | 14 | 304 | 14 | 285 | 14 | 399 |
| – | – | 15 | 380 | 15 | 380 | 15 | 304 |
| – | – | 16 | 361 | 16 | 342 | 16 | 399 |
| – | – | 17 | 361 | 17 | 342 | 17 | 342 |
| – | – | 18 | 361 | 18 | 304 | 18 | 399 |
| – | – | – | – | 19 | 361 | 19 | 190 |
| – | – | – | – | 20 | 323 | 20 | 342 |
| – | – | – | – | 21 | 342 | 21 | 342 |
| – | – | – | – | 22 | 361 | 22 | 342 |
| – | – | – | – | 23 | 380 | 23 | 380 |
| Total | 4256 | Total | 6536 | Total | 8303 | Total | 7961 |
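The per-subject counts in Table 2 follow from the segmentation described in the Figure 2 caption: one image per epoch per channel, with 19 channels and non-overlapping 15-s epochs. The short Python sketch below recovers the number of epochs, and the implied length of segmented EEG, from an image count. The function names are hypothetical and the script is only a consistency check on the tabulated values, not part of the published pipeline.

```python
# Assumptions taken from the Figure 2 caption: 19 channels, one image per
# epoch per channel, and non-overlapping 15-s epochs.
N_CHANNELS = 19
EPOCH_SECONDS = 15


def epochs_from_images(n_images: int) -> int:
    """Recover the number of epochs behind a per-subject image count."""
    epochs, remainder = divmod(n_images, N_CHANNELS)
    if remainder:
        raise ValueError(f"{n_images} is not a multiple of {N_CHANNELS}")
    return epochs


def segmented_seconds(n_images: int) -> int:
    """Length of segmented EEG (in seconds) implied by an image count."""
    return epochs_from_images(n_images) * EPOCH_SECONDS


# A few image counts copied from Table 2.
for count in (380, 361, 209, 760):
    print(count, epochs_from_images(count), segmented_seconds(count))
```

Under these assumptions, a count of 380 images corresponds to 20 epochs, that is, 300 s of segmented signal per channel.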
Table 3.
Summary of image generation parameters for CWT, VMD, and Fusion representations.
| Parameter | CWT | VMD | Fusion |
|---|---|---|---|
| Method | Continuous Wavelet Transform | Variational Mode Decomposition + STFT | Pixel-wise average of CWT and VMD |
| Mother wavelet/window | Analytic Morlet (amor) | Hamming window (256 samples) | – |
| Frequency resolution | 12 voices per octave | 7680 FFT points | Inherited from CWT and VMD |
| Frequency range | 2–60 Hz | 0–60 Hz | 0–60 Hz |
| Number of modes/overlap | – | IMFs; overlap = 250 samples (step = 6) | – |
| Penalty parameter | – | 2000 (MATLAB default) | – |
| Convergence tolerance | – | | – |
| Normalisation | Magnitude mapped to pixel intensity | Magnitude mapped to pixel intensity | Per-image min-max to [0, 1] via mat2gray; pixel-wise average |
| Colourmap | Jet | Jet | Jet |
| Output format | PNG, borderless | PNG, borderless | PNG, borderless |
| Output size (CNN input) | pixels for all models |
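The Fusion column of Table 3 combines two steps: per-image min-max normalisation followed by a pixel-wise average. A minimal Python sketch of that idea is given below. The source refers to MATLAB's `mat2gray`, so this is an illustrative re-expression rather than the authors' code; the function names are hypothetical, and mapping a constant image to zeros is an assumption made here to avoid division by zero. The last lines restate the STFT hop implied by the VMD column (window of 256 samples, overlap of 250 samples).

```python
def min_max_normalise(image):
    """Rescale a 2-D list of magnitudes to the range [0, 1]."""
    flat = [value for row in image for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Assumption for this sketch: a constant image maps to all zeros.
        return [[0.0 for _ in row] for row in image]
    return [[(value - lo) / (hi - lo) for value in row] for row in image]


def fuse(cwt_image, vmd_image):
    """Pixel-wise average of two equally sized, normalised images."""
    a = min_max_normalise(cwt_image)
    b = min_max_normalise(vmd_image)
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy 2x2 "images" with made-up magnitudes, for illustration only.
fused = fuse([[0, 2], [4, 8]], [[10, 10], [20, 30]])
print(fused)

# STFT hop size implied by Table 3: window length minus overlap.
STFT_WINDOW = 256
STFT_OVERLAP = 250
stft_step = STFT_WINDOW - STFT_OVERLAP
print(stft_step)
```

Normalising each image before averaging keeps either representation from dominating the fused result purely because of its magnitude scale.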
Table 4.
Architecture configurations and shared training hyperparameters for the four pretrained CNN models.
| Parameter | ResNet-18 | MobileNet-V3 | EfficientNet-B0 | TinyViT-Hybrid |
|---|---|---|---|---|
| Pretrained weights | ImageNet | ImageNet | ImageNet | ImageNet (backbone) |
| Backbone output dim | 512 | 960 | 1280 | 512 |
| Classifier head | Linear | 3-layer MLP | 2-layer MLP | Transformer + Linear |
| Transformer layers | – | – | – | 2 |
| Attention heads | – | – | – | 8 |
| Token dimension | – | – | – | 512 |
| Number of tokens | – | – | – | 49 (7 × 7) |
| MLP ratio | – | – | – | 4.0 |
| Dropout | – | – | – | 0.1 |
| Output classes | 2 | 2 | 2 | 2 |
| Shared Training Hyperparameters |
| Input resolution | |
| Batch size | 32 |
| Optimiser | AdamW |
| Learning rate | |
| Weight decay | |
| LR scheduler | Cosine annealing (, ) |
| Loss function | Cross-entropy |
| Max epochs | 8 |
| Early stopping patience | 3 (monitor: validation loss) |
| Normalisation (mean) | |
| Normalisation (std) | |
| Random seed | 42 (incremented by per fold) |
| Data augmentation | Not used |
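Table 4 lists a cap of 8 epochs together with early stopping (patience 3, monitoring validation loss). The sketch below shows how these two settings interact, in plain Python with no training framework. It assumes that "improvement" means a strictly lower validation loss with no minimum delta, which the table does not specify; the function name and the loss values are hypothetical.

```python
MAX_EPOCHS = 8  # from Table 4
PATIENCE = 3    # from Table 4 (monitor: validation loss)


def epochs_until_stop(val_losses, max_epochs=MAX_EPOCHS, patience=PATIENCE):
    """Return (epochs_run, best_epoch) for a sequence of validation losses.

    Assumption: an epoch counts as an improvement only if its validation
    loss is strictly lower than the best loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    stale = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` consecutive epochs
    return epochs_run, best_epoch


# Made-up validation losses, for illustration only.
print(epochs_until_stop([0.9, 0.5, 0.6, 0.55, 0.7, 0.4]))
print(epochs_until_stop([1.0 - 0.05 * i for i in range(10)]))
```

In the first call training halts after epoch 5, three epochs after the best epoch (2); in the second the loss keeps falling, so the 8-epoch cap is what ends training.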
Table 5.
Number of R and NR EEG images in the SSRI and rTMS datasets.
| Dataset | R Images | NR Images |
|---|---|---|
| SSRI | 4256 | 6536 |
| rTMS | 8303 | 7961 |
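With a stratified 6-fold split at the image level, each test fold holds roughly one sixth of each class. The arithmetic can be sketched in Python as follows, using the totals from Table 5; the dictionary layout and function name are illustrative choices, not taken from the source.

```python
# Image totals copied from Table 5.
IMAGE_COUNTS = {
    "SSRI": {"R": 4256, "NR": 6536},
    "rTMS": {"R": 8303, "NR": 7961},
}
N_FOLDS = 6


def mean_test_fold_size(counts, n_folds=N_FOLDS):
    """Average per-class and total test-fold sizes under a stratified split."""
    per_class = {label: n / n_folds for label, n in counts.items()}
    return per_class, sum(per_class.values())


for name, counts in IMAGE_COUNTS.items():
    per_class, total = mean_test_fold_size(counts)
    r_share = counts["R"] / sum(counts.values())
    print(f"{name}: ~{total:.0f} test images per fold "
          f"(R ~{per_class['R']:.0f}, NR ~{per_class['NR']:.0f}), "
          f"R share = {r_share:.1%}")
```

The responder share is about 39% for SSRI and about 51% for rTMS, which is why precision, recall, and specificity are worth reading alongside accuracy for the SSRI dataset.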
Table 6.
Summary of the two CV strategies employed in this study.
| Property | Image-Independent CV | Subject-Independent CV |
|---|---|---|
| Purpose | Assess upper-bound image-level discrimination between R and NR classes | Assess generalisation to completely unseen patients, simulating real clinical deployment |
| Partitioning unit | Individual images | Subjects (patients) |
| Number of folds | 6 | 6 |
| SSRI test fold | Stratified 1/6 of all images | 2 responders + 3 non-responders (5 subjects per fold, 30 total) |
| rTMS test fold | Stratified 1/6 of all images | Folds 1–5: 4R + 4NR; Fold 6: 3R + 3NR (46 subjects total) |
| Training/validation split (%) | 90/10 (stratified, from non-test images) | 90/10 (stratified by subject) |
| Can train and test images share the same subject? | Yes, images from the same subject may appear in both sets | No, all images of a given subject are in one set only |
| Prediction level | Each image is classified independently | Images of each test subject are aggregated by majority vote |
| Reported metrics | Accuracy, Precision, Recall, Specificity, F1-Score (mean ± std over 6 folds) | Subject-level Accuracy (95% CI) |
| Interpretation | Upper-bound estimate of image-level class separability | Clinically realistic estimate; reflects true patient-level generalisation performance |
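The subject-independent column of Table 6 can be illustrated with a small Python sketch: subjects are dealt into six folds separately per class, and the image-level predictions of each test subject are reduced to one label by majority vote. The round-robin assignment, the subject identifiers, and the tie-breaking rule are assumptions made for this example; the source does not state how subjects were assigned to folds or how ties were resolved.

```python
from collections import Counter


def stratified_subject_folds(responders, non_responders, n_folds=6):
    """Deal subjects into folds class by class (round-robin, an assumption)."""
    folds = [[] for _ in range(n_folds)]
    for group in (responders, non_responders):
        for i, subject in enumerate(group):
            folds[i % n_folds].append(subject)
    return folds


def majority_vote(image_predictions, tie_label="NR"):
    """Collapse per-image labels ('R' or 'NR') into one subject-level label.

    The tie-breaking label is a hypothetical choice for this sketch.
    """
    counts = Counter(image_predictions)
    if counts["R"] == counts["NR"]:
        return tie_label
    return "R" if counts["R"] > counts["NR"] else "NR"


# Hypothetical subject identifiers; group sizes follow Table 6.
ssri_folds = stratified_subject_folds(
    [f"R{i}" for i in range(1, 13)], [f"NR{i}" for i in range(1, 19)]
)
rtms_folds = stratified_subject_folds(
    [f"R{i}" for i in range(1, 24)], [f"NR{i}" for i in range(1, 24)]
)
print([len(fold) for fold in ssri_folds])
print([len(fold) for fold in rtms_folds])
print(majority_vote(["R", "R", "NR"]))
```

Because every image of a subject lands in exactly one fold, no subject contributes to both training and testing, which is the property that separates this protocol from image-independent CV.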
Table 7.
Image-level classification performance (%) for the SSRI dataset using three time–frequency representations (CWT, VMD, and Fusion) and four deep learning models under image-independent CV. Values are reported as mean ± standard deviation across six folds.
| Img Type | Model | Acc | Std | Prec | Std | Rec | Std | Spec | Std | F1 | Std |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CWT | EfficientNet-B0 | 97.92 | 0.43 | 97.27 | 1.01 | 97.49 | 0.55 | 98.21 | 0.68 | 97.37 | 0.53 |
| CWT | MobileNet-V3 | 98.23 | 0.47 | 97.85 | 1.04 | 97.67 | 0.72 | 98.59 | 0.69 | 97.76 | 0.59 |
| CWT | ResNet-18 | 99.43 | 0.14 | 99.37 | 0.43 | 99.20 | 0.37 | 99.59 | 0.28 | 99.28 | 0.18 |
| CWT | TinyViT-Hybrid | 99.28 | 0.16 | 99.16 | 0.49 | 99.01 | 0.26 | 99.45 | 0.33 | 99.08 | 0.21 |
| VMD | EfficientNet-B0 | 96.75 | 0.55 | 95.39 | 1.61 | 96.45 | 0.57 | 96.94 | 1.13 | 95.91 | 0.66 |
| VMD | MobileNet-V3 | 97.19 | 0.41 | 96.60 | 1.17 | 96.29 | 0.74 | 97.78 | 0.79 | 96.44 | 0.50 |
| VMD | ResNet-18 | 98.84 | 0.18 | 98.50 | 0.54 | 98.57 | 0.47 | 99.02 | 0.36 | 98.53 | 0.23 |
| VMD | TinyViT-Hybrid | 98.72 | 0.26 | 98.68 | 0.37 | 98.07 | 0.43 | 99.14 | 0.24 | 98.37 | 0.33 |
| Fusion | EfficientNet-B0 | 97.08 | 0.42 | 95.26 | 1.01 | 97.46 | 0.56 | 96.83 | 0.71 | 96.34 | 0.51 |
| Fusion | MobileNet-V3 | 97.21 | 0.34 | 96.06 | 1.01 | 96.92 | 0.36 | 97.40 | 0.71 | 96.48 | 0.41 |
| Fusion | ResNet-18 | 99.02 | 0.24 | 98.72 | 0.69 | 98.80 | 0.36 | 99.16 | 0.46 | 98.76 | 0.30 |
| Fusion | TinyViT-Hybrid | 98.82 | 0.35 | 98.50 | 0.75 | 98.52 | 0.40 | 99.02 | 0.49 | 98.51 | 0.44 |
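The five image-level metrics reported in Table 7 can all be derived from a 2 × 2 confusion matrix. The Python sketch below uses the standard definitions and treats the responder class as positive, which is an assumption, since the table does not say which class is taken as positive. The counts in the example are made up and are not results from the study.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard binary metrics from confusion-matrix counts.

    Assumption: responders (R) are the positive class.
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up counts for illustration only.
metrics = binary_metrics(tp=90, fn=10, fp=30, tn=170)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```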
Table 8.
Subject-level classification accuracy (%) for the SSRI dataset under subject-independent CV.
| Img Type | Model | Acc | Std | CI Low | CI High |
|---|---|---|---|---|---|
| CWT | EfficientNet-B0 | 79.17 | 16.44 | 64.64 | 93.70 |
| CWT | MobileNet-V3 | 71.67 | 14.34 | 55.55 | 87.79 |
| CWT | ResNet-18 | 82.50 | 14.07 | 68.90 | 96.10 |
| CWT | TinyViT-Hybrid | 78.33 | 11.79 | 63.59 | 93.07 |
| VMD | EfficientNet-B0 | 76.94 | 12.56 | 61.87 | 92.01 |
| VMD | MobileNet-V3 | 76.94 | 12.56 | 61.87 | 92.01 |
| VMD | ResNet-18 | 82.50 | 14.07 | 68.90 | 96.10 |
| VMD | TinyViT-Hybrid | 82.50 | 14.07 | 68.90 | 96.10 |
| Fusion | EfficientNet-B0 | 82.50 | 14.07 | 68.90 | 96.10 |
| Fusion | MobileNet-V3 | 82.50 | 14.07 | 68.90 | 96.10 |
| Fusion | ResNet-18 | 78.33 | 11.79 | 63.59 | 93.07 |
| Fusion | TinyViT-Hybrid | 78.33 | 11.79 | 63.59 | 93.07 |
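The interval bounds in Table 8 are numerically consistent with a normal-approximation (Wald) binomial interval that uses the number of subjects as the sample size (30 for SSRI, 46 for rTMS). This is an inference from the tabulated numbers, not a formula stated in the source, so the Python sketch below should be read as one plausible reconstruction; the function name is hypothetical.

```python
import math


def wald_ci(accuracy_pct, n_subjects, z=1.96):
    """95% normal-approximation interval for an accuracy given in percent.

    Assumption: n is the total number of subjects in the dataset.
    """
    p = accuracy_pct / 100
    half_width = 100 * z * math.sqrt(p * (1 - p) / n_subjects)
    return accuracy_pct - half_width, accuracy_pct + half_width


# SSRI, CWT + ResNet-18: accuracy 82.50% over 30 subjects.
low, high = wald_ci(82.50, 30)
print(f"{low:.2f} to {high:.2f}")

# rTMS, VMD + ResNet-18: accuracy 83.53% over 46 subjects.
low_rtms, high_rtms = wald_ci(83.53, 46)
print(f"{low_rtms:.2f} to {high_rtms:.2f}")
```

With roughly 30 to 46 subjects, the interval spans more than 20 percentage points, so differences of a few points between models at the subject level fall well within the uncertainty.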
Table 9.
Image-level classification performance (%) for the rTMS dataset using three time–frequency representations (CWT, VMD, and Fusion) and four deep learning models under image-independent CV. Values are reported as mean ± standard deviation across six folds.
| Img Type | Model | Acc | Std | Prec | Std | Rec | Std | Spec | Std | F1 | Std |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CWT | EfficientNet-B0 | 96.48 | 0.43 | 96.33 | 0.76 | 96.81 | 0.72 | 96.14 | 0.85 | 96.56 | 0.42 |
| CWT | MobileNet-V3 | 96.90 | 0.22 | 96.70 | 0.58 | 97.24 | 0.45 | 96.53 | 0.64 | 96.97 | 0.21 |
| CWT | ResNet-18 | 98.65 | 0.34 | 98.66 | 0.63 | 98.70 | 0.32 | 98.59 | 0.67 | 98.68 | 0.33 |
| CWT | TinyViT-Hybrid | 98.40 | 0.19 | 98.23 | 0.28 | 98.64 | 0.49 | 98.14 | 0.30 | 98.43 | 0.19 |
| VMD | EfficientNet-B0 | 96.88 | 0.32 | 97.20 | 0.32 | 96.66 | 0.51 | 97.10 | 0.35 | 96.93 | 0.32 |
| VMD | MobileNet-V3 | 96.98 | 0.35 | 97.15 | 0.42 | 96.93 | 0.41 | 97.04 | 0.45 | 97.04 | 0.34 |
| VMD | ResNet-18 | 98.77 | 0.14 | 98.75 | 0.32 | 98.84 | 0.30 | 98.69 | 0.34 | 98.80 | 0.13 |
| VMD | TinyViT-Hybrid | 98.59 | 0.28 | 98.64 | 0.57 | 98.60 | 0.36 | 98.58 | 0.60 | 98.62 | 0.27 |
| Fusion | EfficientNet-B0 | 96.43 | 0.34 | 96.54 | 0.53 | 96.46 | 0.59 | 96.39 | 0.58 | 96.50 | 0.34 |
| Fusion | MobileNet-V3 | 97.07 | 0.47 | 96.83 | 0.48 | 97.46 | 0.72 | 96.67 | 0.52 | 97.14 | 0.46 |
| Fusion | ResNet-18 | 98.49 | 0.12 | 98.65 | 0.39 | 98.39 | 0.50 | 98.59 | 0.42 | 98.52 | 0.12 |
| Fusion | TinyViT-Hybrid | 98.54 | 0.28 | 98.49 | 0.56 | 98.65 | 0.26 | 98.42 | 0.59 | 98.57 | 0.27 |
Table 10.
Subject-level classification accuracy (%) for the rTMS dataset under subject-independent CV.
| Img Type | Model | Acc | Std | CI Low | CI High |
|---|---|---|---|---|---|
| CWT | EfficientNet-B0 | 71.63 | 21.63 | 58.60 | 84.66 |
| CWT | MobileNet-V3 | 81.15 | 15.41 | 69.85 | 92.45 |
| CWT | ResNet-18 | 78.77 | 13.26 | 66.95 | 90.59 |
| CWT | TinyViT-Hybrid | 76.39 | 17.53 | 64.12 | 88.66 |
| VMD | EfficientNet-B0 | 83.53 | 12.30 | 72.81 | 94.25 |
| VMD | MobileNet-V3 | 77.82 | 5.58 | 65.81 | 89.83 |
| VMD | ResNet-18 | 83.53 | 12.30 | 72.81 | 94.25 |
| VMD | TinyViT-Hybrid | 81.15 | 10.06 | 69.85 | 92.45 |
| Fusion | EfficientNet-B0 | 76.39 | 21.06 | 64.12 | 88.66 |
| Fusion | MobileNet-V3 | 76.39 | 21.06 | 64.12 | 88.66 |
| Fusion | ResNet-18 | 71.63 | 21.63 | 58.60 | 84.66 |
| Fusion | TinyViT-Hybrid | 76.69 | 14.61 | 64.47 | 88.91 |
Table 11.
Pairwise statistical comparison (Wilcoxon signed-rank test, p-values) of per-fold image-level accuracy between the four CNN architectures across three time-frequency representations for the SSRI and rTMS datasets.
| Dataset | Model Pair | CWT | VMD | Fusion |
|---|---|---|---|---|
| SSRI | EfficientNet-B0 vs. MobileNet-V3 | 0.0263 | 0.303 | 0.427 |
| SSRI | EfficientNet-B0 vs. ResNet-18 | 0.0003 | 0.0003 | 0.0001 |
| SSRI | EfficientNet-B0 vs. TinyViT-Hybrid | 0.0002 | 0.0004 | <0.0001 |
| SSRI | MobileNet-V3 vs. ResNet-18 | 0.0009 | 0.0001 | 0.0001 |
| SSRI | MobileNet-V3 vs. TinyViT-Hybrid | 0.001 | 0.0009 | <0.0001 |
| SSRI | ResNet-18 vs. TinyViT-Hybrid | 0.084 | 0.384 | 0.236 |
| rTMS | EfficientNet-B0 vs. MobileNet-V3 | 0.0759 | 0.678 | 0.049 |
| rTMS | EfficientNet-B0 vs. ResNet-18 | <0.0001 | <0.0001 | <0.0001 |
| rTMS | EfficientNet-B0 vs. TinyViT-Hybrid | 0.0001 | <0.0001 | <0.0001 |
| rTMS | MobileNet-V3 vs. ResNet-18 | <0.0001 | 0.0001 | 0.0015 |
| rTMS | MobileNet-V3 vs. TinyViT-Hybrid | 0.0001 | 0.0003 | 0.0005 |
| rTMS | ResNet-18 vs. TinyViT-Hybrid | 0.224 | 0.257 | 0.666 |
Table 12.
Pairwise statistical comparison (Wilcoxon signed-rank test, p-values) of per-fold image-level accuracy between the three time-frequency representations for the SSRI and rTMS datasets.
| Representation Pair | SSRI | rTMS |
|---|---|---|
| CWT vs. VMD | <0.0001 | 0.045 |
| CWT vs. Fusion | <0.0001 | 0.732 |
| VMD vs. Fusion | 0.0826 | 0.057 |
Table 13.
Comparison with state-of-the-art studies on EEG-based prediction of depression therapy outcome at the image level.
| Study | Therapy | Model | CV | Accuracy (%) |
|---|---|---|---|---|
| [14] | SSRI | WT + STFT + EMD + LR | 10-fold | 91.6 |
| [15] | SSRI | PSD + Coherence + MFA | 80/20 | 87.4 |
| [5] | SSRI | CWT + Ensemble TL | 5-fold | 96.55 |
| [21] | SSRI | Raw EEG + TL-LSTM-Att | 5-fold | 98.84 |
| [7] | rTMS | CWT + CNN-LSTM-Attention | 5-fold | 97.1 |
| [24] | rTMS | Raw EEG + TL-BLSTM Ensemble | 5-fold | 98.51 |
| Proposed (SSRI) | SSRI | CWT + ResNet-18 | 6-fold | 99.43 |
| Proposed (rTMS) | rTMS | VMD + ResNet-18 | 6-fold | 98.77 |