1. Introduction
Post-traumatic stress disorder (PTSD) represents one of the most debilitating psychiatric conditions, characterized by intrusive re-experiencing, avoidance behaviors, negative alterations in cognition and mood, and hyperarousal symptoms [1]. Recent studies show that about 3.6% of adults in the U.S. experience PTSD each year, while lifetime prevalence is estimated at 6.8% in the U.S. and 3.9% across the world population [2]. Women are disproportionately affected by PTSD, with lifetime prevalence nearly twice as high in women (10–12%) as in men (5–6%). A similar disparity is evident in adolescence, where prevalence is 8.0% for females compared to 2.3% for males [3]. PTSD imposes a substantial economic burden, with annual costs reaching billion in the United States [4], while European data demonstrates healthcare expenditures three times higher than controls, with lifetime costs approximating €43,000 per patient [5].
The clinical differentiation of PTSD presents significant diagnostic challenges due to substantial symptom overlap with panic attacks and major depressive disorder [6]. PTSD and panic attacks share many autonomic symptoms, and panic attacks are reported as a secondary symptom in approximately 30–60% of PTSD patients [7]. The diagnostic complexity is further amplified by PTSD’s substantial comorbidity with major depressive disorder, as approximately 52% of individuals with PTSD meet criteria for comorbid depression, while 36–61% of patients presenting with primary depression harbor undiagnosed PTSD [8, 9, 10]. The diagnostic confusion stems from shared symptom clusters including anhedonia, emotional numbing, sleep disturbances, concentration difficulties, and social withdrawal [11]. Patients with dual PTSD-depression diagnoses exhibit reduced treatment response rates and longer recovery trajectories [12]. This diagnostic overlap necessitates sophisticated approaches to differential diagnosis, as misclassification can lead to suboptimal treatment selection [13].
Beyond shared symptoms, the diagnostic confusion also reflects convergent effects on shared cardiovascular pathways, particularly through dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis and sympathoadrenal system [14]. Meta-analytic evidence demonstrates that PTSD confers a 55–61% increased risk of coronary heart disease [15], with all three disorders triggering excessive catecholamine release, resulting in identical acute cardiovascular manifestations including elevated heart rate, blood pressure fluctuations, and altered heart rate variability [16]. While anxiety states activate both HPA and sympathoadrenal axes simultaneously, panic attacks demonstrate predominant sympathetic activation with minimal HPA involvement [17]. The chronic dysregulation leads to sustained cardiovascular risk through endothelial dysfunction, accelerated atherosclerosis, increased inflammatory marker expression, and altered autonomic nervous system balance [18]. Recent research utilizing the Trier Social Stress Test reveals that PTSD patients exhibit blunted acute stress responses with slower cardiovascular recovery and reduced heart rate variability [19]. These shared pathophysiological mechanisms underscore the need for objective, signal-based diagnostic approaches [20].
The emergence of deep learning (DL) technologies has revolutionized psychiatric disorder classification by providing unprecedented capabilities to extract complex patterns from neuroimaging and physiological signals [21]. Recent studies have demonstrated the effectiveness of deep learning techniques in detecting cardiovascular diseases, identifying ECG arrhythmias, and performing automated health classification [22, 23, 24, 25, 26].
Recent bibliometric analyses reveal rapid growth in DL applications for mental health disorders, with over 2811 research publications demonstrating CNN accuracies exceeding 98% for depression, schizophrenia, and anxiety disorders [27, 28]. Neuroimaging-based DL approaches utilizing EEG, fMRI, and structural MRI have demonstrated remarkable success, with CNN-LSTM hybrid architectures showing superior performance in capturing both spatial and temporal features [29, 30]. Multimodal deep learning algorithms analyzing EEG, fNIRS, and neuroimaging data yielded significant results, achieving 97.26% classification accuracy for schizophrenia detection and 94.34% for generalized anxiety disorder [31, 32]. However, current DL approaches face significant limitations, including small heterogeneous datasets, lack of external validation, and the inability to effectively differentiate between disorders with overlapping symptomologies such as PTSD, depression, and panic attacks [33]. This technological gap highlights the urgent need for innovative DL methodologies [34].
Despite remarkable achievements in psychiatric disorder classification through neuroimaging, ECG-based DL for PTSD detection remains largely unexplored [35]. While existing research has successfully utilized DL for cardiac pathology detection, psychiatric applications remain predominantly confined to binary PTSD versus control classification [36, 37]. For presenting ECG signals to image-based networks, the continuous wavelet transform (CWT) emerges as a well-suited solution: it generates scalogram representations that simultaneously preserve time localization and frequency decomposition, creating rich visual patterns that CNNs can effectively process [38]. The scalogram representation maintains non-stationary characteristics of cardiac signals, captures transient events crucial for psychiatric state identification, and provides multi-resolution analysis across different time scales [39]. Recent methodological advances demonstrate that CWT-based scalograms enable CNNs to achieve superior performance in cardiac signal classification tasks [40]. However, current ECG-based psychiatric classification research faces significant limitations, including a focus on binary classification and the absence of comprehensive frameworks for distinguishing between overlapping psychiatric conditions [41].
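As an illustration of how a scalogram preserves both time and frequency information, the following minimal sketch computes CWT magnitudes with a complex Morlet wavelet using NumPy only. The wavelet family, the centre parameter `w0`, the amplitude normalisation, the 250 Hz sampling rate, and the function name `morlet_scalogram` are all assumptions made for illustration; the wavelet and settings actually used in this study are not stated in this section.

```python
import numpy as np

def morlet_scalogram(signal, fs, freqs, w0=6.0):
    """Return |CWT| (rows: frequencies, columns: time samples).

    Uses a complex Morlet wavelet with amplitude normalisation, so a
    unit-amplitude sinusoid yields a magnitude of about 0.5 at its own
    frequency. All parameter choices here are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    scalogram = np.empty((len(freqs), n))
    for row, f in enumerate(freqs):
        scale = w0 / (2.0 * np.pi * f)            # scale (s) whose centre frequency is f
        half = int(np.ceil(4.0 * scale * fs))     # truncate the wavelet at +/- 4 std devs
        t = np.arange(-half, half + 1) / fs
        envelope = np.exp(-0.5 * (t / scale) ** 2)
        wavelet = np.exp(1j * w0 * t / scale) * envelope / envelope.sum()
        # Correlating with the wavelet equals convolving with its
        # time-reversed complex conjugate.
        kernel = np.conj(wavelet[::-1])
        full = np.convolve(signal, kernel, mode="full")
        scalogram[row] = np.abs(full[half:half + n])  # keep the samples aligned with the input
    return scalogram

# Toy input: a 5-s, 10 Hz sinusoid sampled at a hypothetical 250 Hz.
fs = 250
t = np.arange(5 * fs) / fs
toy = np.cos(2 * np.pi * 10 * t)
freqs = np.arange(2, 41)                          # 2..40 Hz in 1 Hz steps
scal = morlet_scalogram(toy, fs, freqs)
peak_hz = freqs[np.argmax(scal.mean(axis=1))]     # frequency row with the most energy
```

In a pipeline of the kind described here, the resulting 2-D array would be rendered as an image and resized to the input size expected by the CNN.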
Despite recent advances in artificial intelligence applications for mental health diagnostics, prior studies have predominantly focused on binary classification tasks (e.g., PTSD vs. control) and have not fully leveraged ECG-based features for multi-class differentiation. In addition, no comprehensive framework currently integrates time–frequency representations with hybrid deep and machine learning architectures to address overlapping psychiatric conditions.
To address these critical research gaps, this study introduces the first comprehensive framework for ECG-based multi-class psychiatric disorder classification through two key innovations. First, we develop a deep learning system capable of simultaneously differentiating PTSD, depression, panic attacks, and healthy control states from ECG signals, advancing beyond existing binary classification approaches to clinically relevant differential diagnosis. Second, we propose a novel hybrid CNN-SVM architecture that combines ResNet50’s automatic feature extraction with SVM’s robust classification performance, enhanced through PCA for optimal dimensionality reduction. This hybrid approach transforms CWT-derived scalograms into discriminative features that outperform individual CNN or traditional machine learning methods. Our framework employs rigorous 5-fold cross-validation and explores multiple ECG segment lengths to optimize diagnostic accuracy.
The main contributions of this study are summarised as follows:
A novel multi-class ECG-based diagnostic framework is introduced for differentiating PTSD, major depression, panic disorder, and healthy controls, addressing a gap left by prior binary classification studies.
Time–frequency representations (CWT-based scalograms) are used to capture patterns of autonomic dysregulation relevant to psychiatric conditions.
Multiple deep learning architectures (AlexNet, GoogLeNet, ResNet50) are systematically compared for the multi-class psychiatric ECG classification task.
A hybrid CNN–SVM pipeline enhanced by PCA is proposed to combine automatic deep feature extraction with robust machine-learning discrimination.
Four ECG segment durations (5 s, 10 s, 15 s, 20 s) are evaluated to investigate the effect of temporal resolution on diagnostic accuracy.
A comprehensive evaluation is conducted using accuracy, precision, recall, F1-score, AUC, and confusion-matrix analyses to identify the best classifier and window length.
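The segment-length comparison can be pictured with a short sketch that cuts one recording into windows of each duration. It assumes non-overlapping windows, a hypothetical 250 Hz sampling rate, and a 5-min recording; the helper `segment_signal` is illustrative and not the study's implementation.

```python
def segment_signal(samples, fs, window_s):
    """Split `samples` into consecutive, non-overlapping windows of
    `window_s` seconds; a trailing partial window is dropped."""
    win = int(round(fs * window_s))
    if win <= 0:
        raise ValueError("window must contain at least one sample")
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, win)]

fs = 250                            # Hz (assumed for illustration)
recording = [0.0] * (fs * 300)      # placeholder for one 5-min ECG recording

segments_per_length = {
    window_s: len(segment_signal(recording, fs, window_s))
    for window_s in (5, 10, 15, 20)
}
print(segments_per_length)          # {5: 60, 10: 30, 15: 20, 20: 15}
```

Under these assumptions, shorter windows yield more training examples per participant, which is one factor to keep in mind when comparing segment lengths.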
3. Results
3.1. Multi-Class Classification Performance
Three CNN architectures were evaluated on the four-class task (PTSD, depression, panic attacks, and healthy controls) at varying segment lengths. The AlexNet model achieved an overall accuracy of 94.85% with an MCC value of 0.93 (Table 5). The GoogLeNet model showed improved performance, yielding an accuracy of 96.14% and an MCC of 0.95 (Table 6). The ResNet50 model achieved the highest performance on 5-s segments, with an overall accuracy of 96.65%, an MCC value of 0.96, and a micro-AUC of 0.998 (Table 7). In PTSD classification, ResNet50 provided the best metrics, achieving an accuracy of 95.70%, a sensitivity of 94.67%, and a false positive rate of 4.30%.
False omission rate (FOR) values remained low in all models, ranging from 1.49% to 1.94%, which indicates a very low risk of missing PTSD cases. MCC values were above 0.92 in all models, indicating the strong agreement between predicted and reference labels that a psychiatric diagnostic aid requires. All models yielded their best results with 5-s segments, and performance decreased as segment duration increased. This finding is consistent with the acute cardiovascular features of PTSD attacks and supports the clinical significance of short ECG analysis windows.
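For readers less familiar with these metrics, the sketch below shows how per-class sensitivity, precision, FDR, and FOR, and a multi-class MCC can be derived from a confusion matrix. The confusion matrix is hypothetical and does not reproduce any table in this paper, and the multi-class MCC uses the commonly cited generalisation based on row and column sums, which is assumed here rather than confirmed as the formula used in the study.

```python
import math

def per_class_rates(cm, k):
    """One-vs-rest rates for class k; cm[i][j] counts true class i predicted as j."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                      # class-k cases that were missed
    fp = sum(row[k] for row in cm) - tp       # other classes predicted as k
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "FDR": fp / (tp + fp),                # false discovery rate = 1 - precision
        "FOR": fn / (fn + tn),                # false omission rate
    }

def multiclass_mcc(cm):
    """Multi-class Matthews correlation coefficient from a confusion matrix."""
    n = len(cm)
    s = sum(sum(row) for row in cm)                              # all samples
    c = sum(cm[i][i] for i in range(n))                          # correct predictions
    t = [sum(cm[i]) for i in range(n)]                           # true count per class
    p = [sum(cm[i][j] for i in range(n)) for j in range(n)]      # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                    (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0

# Hypothetical 4-class confusion matrix (control, depression, panic, PTSD).
cm = [
    [95, 1, 1, 3],
    [1, 97, 2, 0],
    [1, 2, 96, 1],
    [4, 0, 1, 95],
]
ptsd = per_class_rates(cm, 3)
mcc = multiclass_mcc(cm)
```

Note that FDR counts false positives among predicted PTSD cases, whereas FOR counts missed PTSD cases among negative predictions, so the two describe different kinds of error.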
3.2. Hybrid CNNs-SVM Multi-Class Classification
To further strengthen classification robustness, features extracted by CNN architectures were integrated with SVM classifiers following PCA dimensionality reduction. This hybrid approach combines the deep feature learning capacity of convolutional neural networks with the discriminative power of support vector machines, demonstrating superior performance in multi-class psychiatric disorder discrimination tasks.
SVM Configuration: Multi-class classification employed Error-Correcting Output Codes (ECOC) with linear kernel binary learners (MATLAB fitcecoc). Linear kernel was selected for its computational efficiency and appropriateness for PCA-transformed feature space. Features were standardized using Z-score normalization before PCA dimensionality reduction (95% variance retention), then classified via SVM.
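The configuration above can be approximated in Python as follows. This is a sketch under stated assumptions, not the MATLAB code used in the study: random vectors stand in for CNN features, scikit-learn's `SVC` with a linear kernel (which trains one-vs-one binary learners, one of the standard ECOC coding designs) stands in for `fitcecoc`, and the coding design actually used is not stated in the text.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for deep features: 4 classes, 60 samples each, 256 dimensions.
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 60, 256, 4
centers = rng.normal(scale=2.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

model = make_pipeline(
    StandardScaler(),                           # z-score normalisation
    PCA(n_components=0.95, svd_solver="full"),  # keep components explaining 95% of variance
    SVC(kernel="linear"),                       # linear binary learners, one-vs-one
)

# Simple even/odd split for demonstration only (not subject-wise).
model.fit(X[::2], y[::2])
accuracy = model.score(X[1::2], y[1::2])
n_components_kept = model.named_steps["pca"].n_components_
```

Placing the scaler and PCA inside the pipeline means both are fitted on the training portion only, which avoids leaking validation statistics into the features.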
The hybrid models consistently improved performance compared to CNN-only configurations across all segment lengths, with particularly pronounced advantages for shorter temporal segments. Notably, 5-s segments showed the most distinct class separations in PCA space, with each psychiatric condition forming well-defined clusters. As segment length increased to 20 s, class boundaries became increasingly diffuse, highlighting the critical impact of temporal resolution on feature extraction quality from ECG scalograms.
Figure 3 illustrates the PCA visualization of ResNet50 features across different ECG segment lengths, revealing the critical impact of temporal resolution on classification performance. In 5-s segments (a), the four psychiatric classes form remarkably distinct clusters in PCA space, with depression (blue) creating a compact cluster in the lower right, control group (red) concentrated in the lower left, panic attack (yellow) positioned in the middle right, and PTSD (purple) clearly separated in the upper left region. As segment length increases to 10 s (b), class separations remain preserved but cluster boundaries begin to soften slightly. At 15 s (c), inter-class distinctions show more pronounced degradation, particularly between control and depression groups, while PTSD maintains its characteristic position. In 20-s segments (d), inter-class overlaps reach maximum levels with all groups exhibiting more diffuse distributions, though general class tendencies are still preserved. This progressive degradation emphasizes that cardiovascular manifestations of psychiatric disorders require short-duration, high-resolution analysis for optimal discrimination.
For AlexNet + SVM, the best performance was obtained with 5 s segments, achieving 97.26% overall accuracy, 0.96 MCC, and a micro-AUC of 1.00 (Table 8). PTSD classification reached 95.95% precision and 96.91% recall, with a low FDR of 4.04%, highlighting the model’s ability to limit false positives while keeping recall high. Performance gradually decreased with longer durations, with accuracy dropping to 90.97% and MCC to 0.88 at 20 s.
Similarly, GoogLeNet + SVM achieved its peak performance at 5 s with 96.35% accuracy, 0.95 MCC, and 0.99 micro-AUC (Table 9). Precision and recall for PTSD were 94.20% and 94.83%, respectively. Despite robust results, a noticeable decline was observed for 15 s and 20 s, where accuracy fell to 91.33% and 90.31%, confirming the importance of short-segment ECG windows.
The ResNet50 + SVM hybrid achieved the best overall performance in terms of MCC and FDR. With 5 s segments, it reached 97.05% accuracy, 0.97 MCC, and a micro-AUC of 1.00 (Table 10). PTSD classification yielded 95.98% precision and 95.50% recall, with the lowest FDR (1.29%) among all models. Although performance slightly declined at longer durations (overall accuracy 91.48% at 20 s), ResNet50 + SVM maintained consistently higher MCC values than AlexNet + SVM and GoogLeNet + SVM.
The confusion matrices in Figure 4 illustrate that ResNet50 + SVM provided the clearest class separation with 5 s segments, while misclassifications increased at 10–20 s. Corresponding ROC curves in Figure 5 confirmed near-perfect separability at 5 s, with only minor reductions at longer durations. In summary, the CNN-SVM hybrids demonstrated superior stability and generalization over CNN-only models. Among them, ResNet50 + SVM emerged as the most reliable, combining strong precision–recall balance with minimal false discovery rates. These findings highlight the potential of hybrid frameworks as clinically meaningful diagnostic tools for psychiatric ECG analysis, especially in minimizing the risk of missing PTSD cases.
3.3. Statistical Analysis
3.3.1. Statistical Significance Analysis of ResNet50 + SVM vs. CNN Models
The statistical significance of the observed performance differences was evaluated in Table 11. Paired t-tests and Wilcoxon tests were applied to the 5-fold cross-validation results. When the effect of SVM integration was examined, a 0.49% improvement (p = 0.009) for ResNet50 and a 2.41% improvement (p = 0.007) for AlexNet were found to be statistically significant, while no significant improvement was observed for GoogLeNet (p = 0.548). In comparing the hybrid models, AlexNet + SVM achieved the highest accuracy (97.26%) and performed marginally better than ResNet50 + SVM (97.05%) (p = 0.037). The difference between ResNet50 + SVM and GoogLeNet + SVM was not statistically significant (p = 0.086). These results indicate that SVM integration provides statistically significant performance gains for the ResNet50 and AlexNet architectures.
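The fold-level comparison can be illustrated with a small standard-library sketch that computes the paired t statistic and an exact two-sided Wilcoxon signed-rank p-value by enumerating all sign patterns. The fold accuracies below are hypothetical placeholders rather than values from Table 11, and the helper names are illustrative; the exact Wilcoxon routine assumes no zero or tied differences.

```python
import math
from itertools import product

def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)   # sample variance of the differences
    return mean / math.sqrt(var / n), n - 1

def wilcoxon_exact_p(a, b):
    """Exact two-sided Wilcoxon signed-rank p-value (no zeros or ties assumed)."""
    d = [x - y for x, y in zip(a, b)]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0] * len(d)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    total = sum(ranks)
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=len(d)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return extreme / 2 ** len(d)

# Hypothetical per-fold accuracies (%) for a hybrid model and its CNN-only baseline.
hybrid = [97.3, 96.8, 97.1, 97.4, 96.9]
cnn_only = [96.9, 96.2, 96.8, 96.9, 96.7]

t_stat, dof = paired_t_statistic(hybrid, cnn_only)
p_wilcoxon = wilcoxon_exact_p(hybrid, cnn_only)
```

With five folds there are only 32 sign patterns, so the exact two-sided Wilcoxon p-value cannot fall below 2/32 = 0.0625 even when every fold favours the same model; the t-test has no such floor but relies on approximately normal differences.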
3.3.2. McNemar’s Test for Prediction-Level Comparison on Hybrid CNN + SVM Models
To complement fold-level paired tests, we performed McNemar’s tests comparing prediction-level disagreements between models. Table 12 presents the results.
McNemar’s test revealed no significant difference between ResNet50 + SVM and AlexNet + SVM (p = 0.502), indicating statistically equivalent performance despite the 0.21% mean accuracy difference. Both models significantly outperformed GoogLeNet + SVM (p < 0.05).
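As a sketch of the prediction-level comparison, the following code counts the discordant predictions of two models and applies the exact (binomial) form of McNemar's test. Whether the study used the exact or the chi-square form is not stated, so the exact form is an assumption, and the labels and predictions are toy values.

```python
from math import comb

def discordant_counts(y_true, pred_a, pred_b):
    """Return (b, c): b = A correct and B wrong, c = A wrong and B correct."""
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    return b, c

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts."""
    n = b + c
    if n == 0:
        return 1.0                       # the models never disagree
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Toy labels and predictions for two hypothetical models.
y_true = [0, 1, 2, 3, 0, 1, 2, 3]
pred_a = [0, 1, 2, 3, 0, 1, 2, 0]
pred_b = [0, 1, 2, 3, 1, 2, 3, 0]
b, c = discordant_counts(y_true, pred_a, pred_b)
p_value = mcnemar_exact(b, c)
```

Only the samples on which the two models disagree influence the test, which is why two models with different mean accuracies can still be statistically indistinguishable.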
3.4. Error Analysis and Misclassification Patterns
Error analysis was performed on hybrid CNN + SVM models with 5-s segments. These results are shown in Table 13. PTSD-Control confusion was most pronounced, while PTSD-Depression and PTSD-Panic confusions were minimal. AlexNet + SVM and ResNet50 + SVM showed the best overall discrimination. These results support the specificity of ECG biomarkers in differentiating PTSD from other psychiatric disorders. In summary, PTSD was most often confused with the control group, whereas depression and panic attacks were most often confused with each other.
3.5. Performance of Traditional Machine Learning Approaches
Table 14 presents the performance of conventional machine learning approaches using handcrafted statistical features extracted from ECG signals. These features include amplitude-based parameters (peak value, RMS, mean), variation measures (standard deviation, skewness, kurtosis), signal geometry descriptors (shape factor, crest factor, clearance factor, impulse factor), and signal quality metrics (SNR, SINAD, THD). All features were computed in MATLAB and selected based on their established effectiveness in biomedical signal processing for distinguishing pathological and healthy ECG patterns [35].
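Several of the handcrafted descriptors listed above have simple closed forms, sketched here in plain Python. The definitions follow common signal-processing conventions (for example, crest factor as peak over RMS) and are assumed to match the MATLAB features used in the study; the signal quality metrics (SNR, SINAD, THD) are omitted, and the function name is illustrative.

```python
import math

def handcrafted_features(x):
    """Compute amplitude, variation, and geometry descriptors of a segment."""
    n = len(x)
    abs_x = [abs(v) for v in x]
    peak = max(abs_x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    mean_abs = sum(abs_x) / n
    mean_sqrt = sum(math.sqrt(v) for v in abs_x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n          # population central moments
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "peak": peak,
        "rms": rms,
        "mean": mean,
        "std": math.sqrt(m2 * n / (n - 1)),           # sample standard deviation
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "shape_factor": rms / mean_abs,
        "crest_factor": peak / rms,
        "impulse_factor": peak / mean_abs,
        "clearance_factor": peak / mean_sqrt ** 2,
    }

# One full period of a unit sine wave as a toy segment.
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
features = handcrafted_features(sine)
```

For a pure sine wave the crest factor is about 1.414 and the kurtosis is 1.5, which gives a quick sanity check on an implementation before applying it to ECG segments.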
Traditional machine learning approaches achieved overall accuracies ranging from 32% to 44%, only modestly above chance level (25% for 4-class classification). The best-performing traditional method, Linear SVM, achieved 44.30% accuracy, while the worst-performing method, Neural Network, achieved only 31.65%.
The ROC analysis in Figure 6 clearly demonstrates the performance differences between classes. AUC values range from 0.405 to 0.791. The highest performance was obtained for the control class in the ensemble model with an AUC of 0.791. The lowest performance was observed for the PTSD class in the Three-Layer Neural Network with an AUC of 0.405. This wide performance range reflects the complexity of multi-class classification. These results demonstrate the necessity of deep learning for this complex multi-class psychiatric classification task, as our proposed hybrid CNN-SVM approach achieves approximately twice the accuracy of conventional methods.
4. Discussion
This study is based on ECG recordings from 79 participants in a single center. Collecting psychiatric ECG datasets is challenging, even for retrospective data, due to ethical constraints, expert requirements, and multicenter validation processes. Despite this limitation, the model demonstrated stable performance in stratified cross-validation. To address potential data leakage concerns, we employed strict subject-wise cross-validation ensuring no participant’s segments appeared in both training and validation sets. However, our study lacks an independent test set, representing an inherent trade-off given our limited sample size. The reported performance reflects internal validation rather than true external generalization. Future studies will focus on multicenter external validation to establish clinical generalizability and assess model robustness across diverse populations.
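The subject-wise splitting described above can be sketched as follows: folds are assigned per participant, so all segments from one participant fall on the same side of every split. The round-robin assignment, the helper name, and the figure of 60 segments per participant are assumptions for illustration, and the sketch does not stratify by diagnostic class.

```python
def subject_wise_folds(subject_ids, n_folds=5):
    """Yield (train_indices, val_indices) over segments, grouped by subject.

    `subject_ids[i]` is the participant that segment i came from.
    """
    subjects = sorted(set(subject_ids))
    fold_of = {s: i % n_folds for i, s in enumerate(subjects)}   # round-robin assignment
    for k in range(n_folds):
        val = [i for i, s in enumerate(subject_ids) if fold_of[s] == k]
        train = [i for i, s in enumerate(subject_ids) if fold_of[s] != k]
        yield train, val

# 79 participants with a hypothetical 60 segments each.
subject_ids = [subject for subject in range(79) for _ in range(60)]
folds = list(subject_wise_folds(subject_ids))

# Check that no participant appears on both sides of any split.
leak_free = all(
    not ({subject_ids[i] for i in train} & {subject_ids[i] for i in val})
    for train, val in folds
)
```

Splitting at the segment level instead would place near-identical segments from the same recording in both training and validation sets and inflate the reported accuracy.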
Hyperparameters were tuned on one configuration (20 s-AlexNet) and applied to the others. Nested cross-validation would provide more conservative estimates but was computationally prohibitive. However, our best performance came from a configuration that was not used for tuning (5 s-ResNet50: 97.05% vs. 20 s-AlexNet: 91.48%), which mitigates concerns about overfitting to the tuning configuration.
The second important aspect is longitudinal ECG monitoring. Repeated recordings or wearable measuring devices make it possible to assess temporal symptom dynamics and treatment response. This addresses the limitation of the single-session 5-min recordings used in this study. However, our approach partially compensates for this deficiency by dividing the 5-min ECG into shorter segments. These shorter windows capture more subtle autonomic changes and allow CNN models to extract deeper time-frequency features.
Explainability remains an open problem in deep learning. Current attribution methods do not provide fully reliable physiological interpretations for clinical decision-making. Future studies will integrate SHAP-based feature attribution and waveform-level analyses to better understand the cardiac dynamics driving model decisions. Recent studies have explored multimodal physiological fusion combining ECG with phonocardiogram (PCG), electrodermal activity (EDA), respiration, or accelerometry. The reference BSPC study [47] demonstrates that integrating complementary modalities increases feature diversity. However, these approaches require multiple synchronous physiological signals that are not routinely accessible in standard psychiatric clinics. Our study deliberately focused on a single-modality ECG framework to assess whether autonomic signatures alone carry sufficient discriminatory information for multiclass psychiatric discrimination. This design is consistent with the practical limitations of real-world psychiatric assessment, where ECG is often the only routine physiological signal collected. However, multimodal fusion represents a valuable future direction, particularly for capturing complementary autonomic and behavioral markers.
In terms of computational feasibility, the model was trained and tested on a standard CPU-based workstation (Intel Core i5, 8 GB RAM) without GPU acceleration. Despite the modest hardware, inference per scalogram required only tens of milliseconds, demonstrating suitability for real-time or near-real-time deployment in future devices. This system is not designed as a standalone diagnostic tool, but rather as a decision support tool that can complement existing scales such as PCL-5, HAM-D, and clinical interviews.