Article

A CNN-Transformer Fusion Model for Proactive Detection of Schizophrenia Relapse from EEG Signals

by Sana Yasin 1, Muhammad Adeel 1, Umar Draz 2,*, Tariq Ali 3,4,*, Mohammad Hijji 4, Muhammad Ayaz 3,4 and Ashraf M. Marei 3

1 Department of Computer Science, University of Okara, Okara 56300, Pakistan
2 Department of Computer Science, University of Sahiwal, Sahiwal 57000, Pakistan
3 Artificial Intelligence and Sensing Technologies (AIST) Research Center, University of Tabuk, Tabuk 71491, Saudi Arabia
4 Faculty of Computers and Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia
* Authors to whom correspondence should be addressed.
Bioengineering 2025, 12(6), 641; https://doi.org/10.3390/bioengineering12060641
Submission received: 10 May 2025 / Revised: 31 May 2025 / Accepted: 6 June 2025 / Published: 12 June 2025

Abstract

Proactively detecting schizophrenia relapse remains a critical challenge in psychiatric care, where traditional predictive models often fail to capture the complex neurophysiological and behavioral dynamics preceding recurrence. Existing methods typically rely on shallow architectures or unimodal data sources, resulting in limited sensitivity—particularly in the early stages of relapse. In this study, we propose a CNN-Transformer fusion model that leverages the complementary strengths of Convolutional Neural Networks (CNNs) and Transformer-based architectures to process electroencephalogram (EEG) signals enriched with clinical and sentiment-derived features. This hybrid framework enables joint spatial-temporal modeling of relapse indicators, allowing for a more nuanced and patient-specific analysis. Unlike previous approaches, our model incorporates a multi-resource data fusion pipeline, improving robustness, interpretability, and clinical relevance. Experimental evaluations demonstrate a superior prediction accuracy of 97%, with notable improvements in recall and F1-score compared to leading baselines. Moreover, the model significantly reduces false negatives, a crucial factor for timely therapeutic intervention. By addressing the limitations of unimodal and superficial prediction strategies, this framework lays the groundwork for scalable, real-world applications in continuous mental health monitoring and personalized relapse prevention.

1. Introduction

Schizophrenia is a complex mental illness that continues to pose a major challenge to modern psychiatry. Despite decades of research, long-term effective management remains elusive due to recurrent psychotic episodes and complex symptom dynamics [1]. Originally defined by Emil Kraepelin and later recharacterized by Eugen Bleuler, the disorder involves intermittent breaks with reality, altered cognition, and impaired social functioning. It affects more than 24 million people globally—approximately 1 in 300—while adult prevalence reaches 0.45%. Alarmingly, more than half of diagnosed patients relapse within one year of discontinuing medication, exposing critical gaps in predictive treatment strategies [2]. Recent advances in deep learning provide promising tools for modeling the detailed neurophysiological and behavioral patterns associated with recurrence [3]. In addition, online platforms and patient communities provide valuable information on real-world factors, including substance use, emotional triggers, and side effects of medications [4]. However, these unstructured data sources require careful handling to ensure reliability and patient privacy. By integrating clinical metrics with digital behavioral information, researchers can develop more accurate and personalized recurrence prediction systems, paving the way for timely interventions and improved psychiatric care [5].
Figure 1 demonstrates the application of AI tools such as deep learning and multimodal integration to support schizophrenia care throughout diagnosis, prognosis, and treatment. The methodology focuses on linguistic behavior analysis together with data fusion techniques and adaptive decision support systems to enhance predictive modeling and thereby increase accuracy and personalization in clinical applications. Precision psychiatry benefits from these elements because they facilitate early prediction of relapses while optimizing medication and supporting real-time monitoring. The primary objective of this study is to establish an accurate prediction model for relapse in schizophrenia using electroencephalogram (EEG) recordings and innovative deep learning (DL) techniques. The framework consists of key components including data preprocessing, feature extraction, and a hybrid Transformer-based deep learning network, enabling it to efficiently learn complex neuronal patterns. Moreover, this study highlights the need to establish preventive measures, including early intervention practices such as regular psychiatric screenings and predictive monitoring, and personalized adjustments of treatment options such as medication dose optimization and individualized therapeutic strategies [6].
Additionally, the inclusion of social media trend analysis combined with real-time information from wearables provides a comprehensive view of how a patient's mental health evolves in a dynamic, real-world scenario. This synergy not only improves the predictive accuracy of relapse detection but also supports a proactive and continuous care model for schizophrenia [7]. Overall, the presented framework aims to improve patient outcomes, reduce relapse rates, and contribute to the growing shift toward personalized, data-driven mental health solutions.
To achieve the above-stated objectives, we developed a hybrid deep learning model that combines Convolutional Neural Networks (CNN), Transformer architectures, flattened layers, and fully connected layers to identify potential markers for predicting the risk of schizophrenia relapse [8]. In the first modeling paradigm, EEG is combined with sentiment analysis and with clinical assessment tools including the Positive and Negative Syndrome Scale (PANSS), the Brief Psychiatric Rating Scale (BPRS), the Global Assessment of Functioning (GAF), and the Montgomery–Åsberg Depression Rating Scale (MADRS), all of which are important for tracking the clinical progress of the disorder [9]. Data acquisition was carried out from electronic health records and structured clinical interviews, yielding a high-quality and very comprehensive dataset [10].
One important advantage of this approach is that it enables personalized treatment strategies addressing individual risk factors with specific interventions to enhance treatment outcome [11]. However, challenges remain in terms of data quality, ethics of data use, and limited coherent research frameworks [12]. Although social and behavioral data are often referenced, the use of these data to inform predictive modeling is still relatively low as studies have often relied heavily on clinical indicators. This study aims to fill that gap by integrating social media-derived data and clinical parameters to generate an improved relapse prediction model [13]. By leveraging such an interdisciplinary approach, we hope to advance a data-driven framework for the provision of mental health care. Our Transformer-based model—which aims for a predictive accuracy of more than 90%—offers a high reliability for forecasting schizophrenia relapse.
Clinical and neural data analysis with Transformer-based deep learning models demonstrates potential for schizophrenia relapse prediction through long-range temporal dependency detection [14]. These models excel at anomaly detection compared to traditional methods but face challenges because they need extensive labeled data and struggle with small noisy EEG signals [15]. Their application to MRI data analysis has shown essential brain region interactions, yet the spatial resolution alone does not capture relapse dynamics [16,17]. The blending of imaging data with clinical and demographic information leads to better predictive abilities [18] but current models fail to account for essential sentiment cues and behavioral patterns necessary for relapse identification. Transformer models with multi-head attention effectively process various input types yet they do not present understandable ways to connect different modalities [19]. Graph-based frameworks alongside attention-driven systems achieve enhanced robustness [20] yet encounter challenges with personalized and generalized patient treatment approaches.
The limitations of schizophrenia-specific applicability persist despite transfer learning and data augmentation addressing training scarcity [21] and large-scale datasets like HCP pretraining neuro models [22] because of domain shifts and demographic mismatches. Federated learning supports privacy-conscious model sharing between institutions [23] yet faces technical challenges when incorporating EEG and temporal data sequences in this framework. The clinical applicability of these models suffers due to their high complexity and substantial computational requirements [24]. Rule-based hybrid models provide interpretability [25] while struggling to adjust to patient condition changes. Model output associations with clinical risk factors improve interpretability [26], though these rule-driven mappings frequently reduce complex neural phenomena to simple models. Federated learning provides potential for fairness and inclusiveness [27,28,29], yet we have yet to determine its usefulness for real-time EEG analysis involving multiple modalities. While attention visualization and counterfactual reasoning have enhanced trust in deep learning predictions [30], few models apply these XAI methods to temporal frameworks specific to relapse prediction. VAEs and GANs address data distribution imbalances yet risk producing unrealistic artifacts and altering neurophysiological distributions [31,32]. Self-supervised learning enables the extraction of useful information from unlabeled data [33,34] but current methods do not adequately address the specific needs of dynamic psychiatric conditions.
The latest hybrid architectures that combine CNNs with Transformers provide improved spatial-temporal feature extraction [35,36] yet fail to incorporate contextual behavioral data. The current frameworks fail to adequately address ethical issues related to fairness and transparency. The healthcare field urgently requires a model that integrates multiple modalities while ensuring clinical relevance and interpretability. We present a CNN-Transformer fusion framework to learn spatial-temporal EEG patterns and combine them with sentiment analysis and clinical scores while improving minority-class learning through SMOTE. Our model architecture uniquely integrates high performance with interpretability to deliver a scalable solution for proactive schizophrenia relapse prediction [37,38,39,40,41]. Recent research confirms that deep learning systems such as neural networks and hybrid approaches are effective for diagnosing psychiatric disorders through EEG analysis for conditions including major depressive disorder and bipolar disorder as well as neurological symptoms associated with COVID-19 [42,43,44]. The research demonstrates EEG’s expanding application as a non-invasive scalable biomarker for mental health through active and passive EEG data integration. Our study advances this research area by utilizing a multimodal CNN-Transformer model to predict schizophrenia relapse through EEG signals and clinical plus sentiment-derived features.
Figure 2 presents a comprehensive flowchart of the technologies, trends, challenges, and opportunities related to improving schizophrenia relapse prediction using DL models. It outlines key enabling technologies like hybrid DL models, data preprocessing, and sentiment analysis integration. The trends section focuses on advancements such as real-time monitoring, data fusion, and federated learning for global collaboration. Challenges discussed include data privacy concerns, class imbalance, and high computational costs. Lastly, the opportunities section emphasizes personalized treatment plans, early detection, and the integration of multimodal data to enhance prediction accuracy and clinical outcomes.
This research tackles existing challenges by deploying a fusion model combining CNN and Transformer architectures to analyze both spatial and temporal EEG signal dependencies. Our design includes sentiment analysis and clinical scores to establish a strong multimodal prediction framework. The research study focused on resolving class imbalance by employing SMOTE alongside augmentation strategies which resulted in enhanced sensitivity metrics and reduced false negatives.
The study presents a deep learning framework which provides strong and clinically meaningful solutions for the early prediction of schizophrenia relapse. Its fundamental methodological advancement is a hybrid deep learning model that merges Convolutional Neural Networks (CNNs) for extracting spatial information with Transformer models to detect long-term temporal patterns in EEG signals. The combined architecture allows the model to analyze neural dynamics more effectively than single-architecture systems. The integration of various data sources, such as EEG signals, the clinical scales PANSS and BPRS, and sentiment-derived features, into one prediction pipeline represents another essential contribution. This multidimensional perspective strengthens both the contextual understanding and the diagnostic accuracy of the model. The study addresses class imbalance by applying the Synthetic Minority Over-sampling Technique (SMOTE) and EEG-specific augmentation methods to enhance sensitivity while decreasing majority-class bias. The empirical evaluations produced a prediction accuracy of 97% along with notable improvements in recall and F1-score. The model's clinical usefulness in planning proactive treatment becomes evident through its decreased number of false negatives. Together, these elements create an interpretable monitoring system that supports personalized treatment while providing scalable schizophrenia surveillance. Our study combines behavioral and medication-related features with clinical AI principles to enhance relapse prediction accuracy in real-world applications. We incorporate findings from Besana et al. to reinforce the clinical framework of our research; they conducted a multicenter retrospective analysis to identify primary readmission predictors in young adults experiencing their initial episode of psychosis. According to their findings, long-acting injectable antipsychotics (LAIs) decreased relapse risk, whereas substance use raised the chances of hospital readmission. These findings reinforce our model's emphasis on adherence as well as behavioral and affective indicators, and they ground our approach in an established clinical framework. This connection enhances the translational value of our predictive system by aligning its feature space with known psychiatric risk markers [41].
Figure 2. Conceptual mind map outlining the enabling technologies, emerging trends, challenges, and opportunities in optimizing schizophrenia relapse prediction through deep-learning (DL) models. The diagram organizes key factors into five distinct categories: (I) Key Enabling Technologies, (II) Trends, (III) Challenges, (IV) Opportunities, and (V) Ethical and Infrastructure Considerations. Each category further highlights specific subdomains relevant to clinical integration, data management, and intelligent health care delivery.
Table 1 presents a comparison of existing schizophrenia prediction methods while detailing their objectives, models, accuracy levels and the types of data used. The listed studies identify the proposed hybrid model (CNN + Transformer + Sentiment Analysis) as the most effective because it achieves 97% accuracy while enabling early detection through EEG data.

2. Material and Methods

The primary objective of this study is to develop a reliably predictive model for the early detection of schizophrenia relapses by leveraging a combination of advanced DL techniques. The proposed approach integrates CNN and Transformer models to enhance predictive accuracy and improve early diagnosis. Early detection of schizophrenia relapses is crucial in clinical settings as it enables timely intervention, reduces the severity of symptoms, and improves patient outcomes. Combining different DL models, the study aims to exploit their strengths to achieve superior predictive performance.
Figure 3 shows a complete pipeline of EEG signal processing, starting from the collection of raw brain signals, which are then pre-processed and transformed to improve signal quality. These steps are followed by converting the signals to time-frequency representations and extracting meaningful features. The features are then fed either into classical discriminative models such as SVM, LDA, RF, and GRU, or into deep learning models including GAN, DBN, CNN, LSTM, and GRU, for classification. The resulting outputs can benefit multiple applications, ranging from clinical diagnosis to BCI systems, neuroergonomics, and other EEG-based assessments that facilitate advanced mental health monitoring and cognitive assessment.

2.1. Data Collection and Understanding

The dataset used in this study includes EEG recordings from patients diagnosed with schizophrenia and from healthy controls. As time-series data, EEG recordings bring various challenges, such as noise artifacts, missing values, and variability in signal quality across subjects. Since EEG signals are central to studying the brain, their integrity and reliability must be preserved. To address these issues, a complete data pre-processing pipeline was developed to clean, normalize, and improve the quality of the recorded signals. This pipeline covered noise reduction, treatment of missing values, and scaling of signal amplitudes to produce a reliable and consistent dataset. An exploratory data analysis (EDA) phase followed to better understand the dataset and its intrinsic characteristics. In this phase, the statistical distribution of the data was examined for anomalies, and EEG waveforms were visualized across multiple channels to identify inconsistencies or abnormalities. Combining descriptive statistics and graphical techniques (histograms, box plots, line plots), we assessed key features of the dataset, including mean signal amplitude, signal variance across electrode sites, and the presence of artifacts that could bias the results. These initial findings enabled a holistic understanding of the data and informed the subsequent feature engineering and pre-processing steps. Additionally, signal quality was assessed, and recordings that did not meet the requirements for reliable analysis were removed. Several filtering techniques were applied to remove powerline interference, muscle artifacts, and baseline drifts while preserving the reliability of the signals. Missing points in the EEG recordings (due to transient signal loss or electrode detachment) were handled by imputation methods such as interpolation or model-based estimation. This provided a continuous, coherent dataset and reduced the likelihood of artifacts associated with incomplete recordings. Cross-validation methods were used to evaluate the reproducibility of EEG signatures across subjects. Using spectral and temporal dynamics, differences between schizophrenia patients and healthy controls were systematically investigated, enabling better-targeted feature engineering for prediction and classification tasks. The results of the EDA provided a basis for customizing the preprocessing approaches, enhancing the effectiveness and dependability of the downstream DL models used in this research. This systematic data collection and understanding process prepared the dataset for the subsequent analytical methods, and the preprocessing and analysis steps taken in this phase were integral to increasing the validity of the findings, ensuring that the study results rest on a strong data foundation.
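To make the descriptive EDA step concrete, the following is a minimal Python sketch, assuming the cleaned epochs are stored as a NumPy array of shape (n_epochs, n_channels, n_samples); the file name and plotting choices are illustrative and not taken from the study.

```python
import numpy as np
import matplotlib.pyplot as plt

# eeg: hypothetical array of cleaned epochs, shape (n_epochs, n_channels, n_samples)
eeg = np.load("eeg_epochs.npy")  # placeholder path, not from the paper

# Per-channel descriptive statistics used to spot anomalous electrodes
channel_mean = eeg.mean(axis=(0, 2))   # mean amplitude per channel
channel_var = eeg.var(axis=(0, 2))     # amplitude variance per channel
print("Channel means:", np.round(channel_mean, 3))
print("Channel variances:", np.round(channel_var, 3))

# Simple graphical checks: amplitude histogram and per-channel box plot
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(eeg.ravel(), bins=100)
axes[0].set_title("Amplitude distribution")
axes[1].boxplot([eeg[:, ch, :].ravel() for ch in range(eeg.shape[1])])
axes[1].set_title("Per-channel amplitude spread")
plt.tight_layout()
plt.show()
```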
Figure 3. EEG signal processing pipeline in relation to mental health prediction. It shows the workflow from EEG raw acquisition and pre-processing to time-frequency and feature extraction. The listed features are then used by discriminative models (SVM, LDA, RF, GRU) and deep learning models (GAN, DBN, CNN, LSTM, GRU) for classification to assist in clinical diagnosis, BCI applications, and neuroergonomics.
We collected EEG data through a BioSemi ActiveTwo system equipped with 64 channels and a sampling rate of 256 Hz. The study received ethical approval from the institutional review board after obtaining informed consent from all participants. The collection contains EEG recordings of 85 individuals, including 42 schizophrenia patients and 43 healthy controls, with near gender parity (46% female and 54% male participants) aged between 19 and 55. The EEG acquisition protocol combined resting-state measurements with task-based stimuli. The raw data were band-pass filtered in the 0.5 to 50 Hz range, followed by Independent Component Analysis (ICA) processing to eliminate eye and muscle artifacts. We cannot share raw data due to privacy protection protocols, but we will release anonymized features and model code upon reasonable request to support reproducibility.

2.2. Handling Missing and Duplicate Values

Missing data is one of the major obstacles in processing EEG data, where it is typically caused by sensor malfunction, movement artifacts, and interruptions during data collection. Appropriate handling of missing values is crucial, since random missing values can introduce considerable biases, skew statistical analyses, and compromise the precision and reliability of DL models [39]. To resolve this, we adopted a systematic and organized methodology to preserve the data's integrity and prevent inaccurate downstream analysis. Handling missing values in the dataset first required determining what percentage of the data was missing. Records with a high proportion of NaN values were detected and dropped. This step reduced the influence of these records on model training, avoiding noise, bias, or misleading patterns. Through the removal of overly incomplete records, the dataset remained robust and representative for modeling and analysis. In cases where data were missing only in individual channels or at selected time points, suitable imputation methods were used to ensure data continuity. We applied common imputation techniques, namely mean and median imputation, in which the mean or median of the corresponding channel replaces the missing data. Imputing values in this way allowed information to be retained while limiting the variance introduced into the dataset. Beyond simple imputation, interpolation techniques were used to reconstruct missing time-series data from the underlying temporal trends. Linear interpolation, which assumes a straight-line relationship between known data points, was applied over short gaps of up to 20 samples, whereas cubic interpolation provided smoother estimates by fitting curves between neighboring points. These interpolation approaches enabled a more accurate recovery of missing information while preserving the natural variation in EEG signals. In addition to dealing with missing values, the dataset was checked for duplicate records that could add redundancy and distort analysis outputs. We detected and eliminated duplicates, which minimized distortions by preventing multiple entries of the same EEG signal and ensured that no recording was overrepresented in any particular distribution during model training. After removing duplicates and standardizing formats, only complete, accurate, and reliable observations were retained. This dataset-cleaning procedure resulted in a cleaner dataset and ensured the validity of subsequent analyses and model building on the EEG data.
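A minimal pandas sketch of the cleaning steps described above is given below; the file name and the completeness threshold are assumptions, while the 20-sample interpolation limit mirrors the description only loosely.

```python
import pandas as pd

# df: hypothetical DataFrame of EEG samples (rows = time points, columns = channels)
df = pd.read_csv("eeg_records.csv")  # placeholder path, not from the paper

# Drop records dominated by NaNs (80% completeness threshold is illustrative)
df = df.dropna(thresh=int(0.8 * df.shape[1]))

# Fill isolated gaps: linear interpolation over short runs, then median fallback
df = df.interpolate(method="linear", limit=20, limit_direction="both")
df = df.fillna(df.median(numeric_only=True))

# Remove exact duplicate rows so no recording is over-represented
df = df.drop_duplicates().reset_index(drop=True)
```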

2.3. EEG Preprocessing: Noise Removal and Signal Enhancement

We applied preprocessing to our EEG data using EEGLAB toolbox version 2023.0 within MATLAB R2022b to maintain consistent and reproducible results for all recordings [44]. To minimize low-frequency drift and high-frequency muscle artifacts, the raw EEG data underwent initial band-pass filtering through a 4th-order Butterworth filter which set cutoff frequencies between 0.5 Hz and 50 Hz. The recordings underwent additional processing with a 50 Hz notch filter to remove power line noise. EEG channels with abnormal amplitudes or poor signal quality automatically received flags when a z-score threshold was exceeded and then underwent interpolation with spherical spline techniques. The FastICA algorithm executed Independent Component Analysis (ICA) to eliminate artifacts and set the number of independent components to match the 64 recorded EEG channels. Researchers identified artifact-related components such as eye blink signals, ocular movement patterns, muscle tension traces, and cardiac rhythm traces using spatial topography analysis in combination with temporal waveform observation and auxiliary EOG channel correlations followed by manual expert verification. We excluded these identified components prior to rebuilding the cleaned EEG signals for further analysis. The preprocessing strategy established a balance between robust artifact removal and neural signal preservation to create a clean dataset appropriate for deep learning-based relapse prediction.
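The filtering itself was performed in EEGLAB; for illustration only, an equivalent band-pass and notch filtering step can be sketched in Python with SciPy as follows. The synthetic input array and any parameters not stated above (such as the notch quality factor) are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 256.0  # sampling rate reported for the BioSemi recordings

def bandpass(data, low=0.5, high=50.0, fs=FS, order=4):
    """4th-order Butterworth band-pass, applied forward-backward (zero phase)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, data, axis=-1)

def notch(data, freq=50.0, fs=FS, quality=30.0):
    """Notch filter to suppress 50 Hz power-line interference."""
    b, a = iirnotch(freq / (fs / 2), quality)
    return filtfilt(b, a, data, axis=-1)

# eeg: stand-in array with 64 channels and 10 s of data, for illustration only
eeg = np.random.randn(64, 10 * int(FS))
clean = notch(bandpass(eeg))  # ICA-based artifact removal would follow this step
```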
We confirmed that all preprocessing steps maintained the full rank of the EEG data matrix before performing ICA decomposition to ensure its validity. Our preprocessing avoided any dimensionality reduction and rank-deficient interpolation along with aggressive referencing which would otherwise damage the data structure. This validation shows that the ICA algorithm processed a full-rank matrix which enables accurate source separation along with dependable artifact removal. The discussed practice matches the best-practice guidelines in [41] which highlight the critical role of preserving data rank for optimal ICA performance and preventing feature distortion.

2.4. Feature Extraction and Normalization

Once the EEG signals were cleaned and refined, meaningful representations were extracted from the raw data. This step was key to working with simpler representations of the signals without losing content important for detecting schizophrenia. Statistical characteristics were calculated for all channels in the EEG data, allowing the model to learn the basic distributional aspects of the signals. Descriptive features included the mean, standard deviation, skewness, and kurtosis, which provided insight into the central tendency, dispersion, and shape of the EEG signal distribution. Such features facilitated the quantification of differences in cortical activity and the identification of possible anomalies related to schizophrenia. In addition to statistical features, frequency-domain techniques such as the Fourier Transform were applied to derive relevant features. This decomposition isolates the various frequency components of EEG signals and allows analysis of spectral patterns likely associated with neurological disorders. Power spectral density (PSD) was calculated to evaluate the distribution of signal energy across different frequency bands. Because different frequency bands (e.g., delta, theta, alpha, beta, gamma) are related to distinct cognitive and mental states, examining their spectral properties yielded additional insights into the neural properties that underlie schizophrenia. This combination of features gave the model a more complete description of the EEG signals, improving its performance in classifying healthy and affected individuals. Due to the high variability of EEG signal amplitudes between individuals and recording conditions, normalization was applied to make the extracted features consistent and comparable. Normalization is a crucial preprocessing step because EEG signals can vary significantly across individuals due to differences in electrode placement, scalp conductivity, and other physiological factors. Two popular normalization methods were used: min-max normalization and Z-score normalization. Min-max normalization scales the features to a fixed range (between 0 and 1), preserving relative differences while ensuring numerical stability. Z-score normalization, in turn, standardizes the features to a standard normal distribution with a mean of 0 and a standard deviation of 1, reducing the effect of outliers and amplitude variation. These normalization approaches made the dataset homogeneous, minimizing biases introduced by amplitude variation and enabling the DL models to learn from a more consistent feature space. Together, this enhanced the generalizability and robustness of the classification models, making schizophrenia detection from EEG signals more accurate and reliable.
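The following is a minimal sketch of the feature extraction and normalization described above, using SciPy; the Welch PSD settings and band edges are common choices and should be read as assumptions rather than the study's exact configuration.

```python
import numpy as np
from scipy.stats import skew, kurtosis
from scipy.signal import welch

FS = 256.0  # sampling rate of the recordings

def extract_features(epoch):
    """epoch: (n_channels, n_samples) -> 1D feature vector per epoch."""
    feats = []
    for ch in epoch:
        # Time-domain descriptive statistics
        feats += [ch.mean(), ch.std(), skew(ch), kurtosis(ch)]
        # Frequency-domain features: band power integrated from the Welch PSD
        freqs, psd = welch(ch, fs=FS, nperseg=256)
        for lo, hi in [(0.5, 4), (4, 8), (8, 13), (13, 30), (30, 50)]:  # delta..gamma
            band = (freqs >= lo) & (freqs < hi)
            feats.append(np.trapz(psd[band], freqs[band]))
    return np.array(feats)

def zscore(X):
    """Standardize features to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def minmax(X):
    """Scale features into the [0, 1] range."""
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-8)
```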

2.5. Encoding Categorical Labels and Data Splitting

Label encoding was used to convert the categorical class labels (schizophrenia and healthy control) into numerical representations. As most algorithms require numerical input instead of categorical labels, this transformation was necessary to train the DL models effectively and ensured that the classification algorithms could interpret the data accordingly. In addition, label encoding preserved the interpretability of the data and ensured compatibility with many classification methods, allowing a smooth and effective modeling process. After encoding, the dataset was divided into training and testing sets in an 80:20 ratio. This split was stratified so that the proportions of schizophrenia and healthy controls stayed the same in each partition. The stratification was aimed primarily at avoiding a damaging data imbalance that could distort the model's performance, and it increases the likelihood that the model will generalize well to unseen data because both categories are adequately represented in the training and testing sets. By implementing this method, the model was less likely to develop a bias towards the majority class, improving its accuracy and trustworthiness when making predictions. Since the dataset contained an imbalance in class distribution, the Synthetic Minority Over-sampling Technique (SMOTE) was used to address class imbalance further. SMOTE is an advanced resampling technique that generates synthetic samples for the minority class, in our case schizophrenia relapse cases, increasing its representation with artificial data points and reducing the bias caused by class imbalance. Overall, this ensured that the model did not overly prefer the dominant classes, improving classification accuracy and robustness. The application of SMOTE was especially helpful because the number of schizophrenia relapse cases in the dataset was inherently limited. Without this balancing, the model could have learned the patterns corresponding to these cases poorly, resulting in undesirable performance. The synthetic samples improved the training process by allowing the model to learn schizophrenia relapse patterns better. In this way, we improved the classification accuracy while ensuring that the model remained fair and effective even during real-time prediction.
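The encoding, stratified 80:20 split, and SMOTE steps can be sketched as follows with scikit-learn and imbalanced-learn; file names and the random seed are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# X: feature matrix, y: string labels ("schizophrenia" / "healthy") -- illustrative paths
X = np.load("features.npy")
y = np.load("labels.npy", allow_pickle=True)

# Encode categorical labels as integers
y_enc = LabelEncoder().fit_transform(y)

# Stratified 80/20 split keeps class proportions identical in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y_enc, test_size=0.2, stratify=y_enc, random_state=42
)

# Oversample the minority (relapse) class on the training set only
X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)
```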
Figure 4 presents the four-phase deep learning pipeline to predict schizophrenia relapse from EEG signals, including EEG signal acquisition from subjects (Phase 1, the first step). Phase 2 includes preprocessing stages (e.g., signal filtering, artifact removal, normalization, epoch segmentation, feature extraction, and dimensionality reduction). In phase 3, a CNN-based deep learning model is used to classify the processed data. The fourth phase conducts a performance evaluation using the main metrics such as precision, sensitivity, specificity, F1 score, and Cohen’s Kappa coefficient.

2.6. Addressing Class Imbalance and Data Augmentation

Class imbalance is a common challenge in medical datasets, especially for neurological and psychiatric disorders. When one category (e.g., healthy controls) is over-represented in the dataset compared to another (e.g., subjects diagnosed with schizophrenia), such an imbalance may result in biased model performance, where the model favors the majority class but has lower prediction rates for minority-class instances. Such bias can be detrimental in clinical applications, where it is important to classify affected individuals accurately for diagnosis and treatment planning. SMOTE was used to address this problem and to achieve a more balanced class distribution. SMOTE is a sophisticated resampling algorithm that counteracts class imbalance by generating synthetic samples for the underrepresented class rather than replicating existing instances. This is done by interpolating new samples based on feature-space similarities between existing examples of the minority class. Because the synthetic data points reflect the original training samples, SMOTE helps prevent overfitting and ensures that the model is trained on a balanced dataset. As a result, classification performance is enhanced, and the model predicts minority cases more reliably. Including SMOTE in the preprocessing pipeline helped ensure that the dataset was more representative of real-world conditions and improved the model's ability to distinguish between classes more effectively. Moreover, data augmentation techniques were utilized to enhance the model's robustness and generalizability in parallel with addressing the class imbalance. As EEG signals are highly variable and susceptible to noise, dedicated augmentation strategies are needed to enhance the classifier models' performance. Gaussian noise was added to the EEG signals to introduce natural variability into the dataset. This mimicked real-world disturbances, such as electrical interference or physiological noise, exposing the model to more varied training instances. By introducing Gaussian noise, the model gained tolerance to small perturbations, resulting in better generalization across varying EEG recordings. Additionally, temporal-shift approaches were applied to change the temporal characteristics of the EEG signal. New variants can be introduced by shifting the signal sequences in time without modifying the major aspects of the data. This approach expanded the training set, so the model saw variants with signal patterns it had not otherwise encountered during training. Moreover, long EEG series were divided into overlapping windows, dramatically boosting the number of training cases. In addition to enriching the dataset, this segmentation helped capture fine-grained temporal contexts that may be important for classification. Using SMOTE class balancing and these data augmentation approaches, the model's generalization was improved across different EEG samples. These preprocessing procedures were crucial in achieving a more accurate and reliable classification system, serving as a foundation for the robustness of the predictive framework.
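Below is a minimal sketch of the augmentation operations described above (Gaussian noise injection, temporal shifting, and overlapping windowing); the noise level, shift range, and window/step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(epoch, sigma=0.01):
    """Inject small Gaussian noise to mimic electrical/physiological interference."""
    return epoch + rng.normal(0.0, sigma, size=epoch.shape)

def temporal_shift(epoch, max_shift=32):
    """Circularly shift the signal in time by a random number of samples."""
    return np.roll(epoch, rng.integers(-max_shift, max_shift + 1), axis=-1)

def sliding_windows(signal, win_len=512, step=256):
    """Split a long recording into overlapping windows (50% overlap here)."""
    return np.stack([signal[..., s:s + win_len]
                     for s in range(0, signal.shape[-1] - win_len + 1, step)])
```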
We obtained sentiment-derived features from structured clinical interviews and patient journal entries through a BERT-based sentiment classification model that was pretrained and fine-tuned on a mental health-specific corpus. Sentiment scores were summarized on a weekly basis for each patient and scaled between −1 and +1 to indicate affective polarity trends. Our dataset partitioning strategy prevented patient data from appearing in both the training and the validation/testing sets to avoid patient-level data leakage. We implemented a stratified 5-fold cross-validation method in which each validation fold held a unique group of patients to maintain temporal and subject-wise independence.
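For illustration, weekly sentiment aggregation can be sketched with the Hugging Face transformers pipeline as below; the public checkpoint named here is only a stand-in for the fine-tuned mental-health-specific model described above, and the aggregation rule is an assumption.

```python
import numpy as np
from transformers import pipeline

# Stand-in checkpoint; the study's fine-tuned mental-health model is not named here
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

def weekly_sentiment(texts):
    """Map a week's journal entries to a single polarity score in [-1, +1]."""
    scores = []
    for out in classifier(texts, truncation=True):
        sign = 1.0 if out["label"] == "POSITIVE" else -1.0
        scores.append(sign * out["score"])
    return float(np.mean(scores)) if scores else 0.0

print(weekly_sentiment(["Slept badly and felt on edge all week.",
                        "The new routine is helping me stay calm."]))
```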

2.7. Model Development

This DL architecture is designed to efficiently capture both local spatial patterns and global temporal dependencies from EEG signals.
Input Layer: The input is a 1D EEG signal sequence with 18 channels, reshaped to match the model's requirements. Each input sample is structured with shape (1, 18), allowing the CNN to operate over the spatial features.
CNN Layer: A 1D convolutional layer with 32 filters and a kernel size of 3 is applied to extract localized spatial features. It is followed by a ReLU activation function and a MaxPooling layer with a kernel size of 2, reducing dimensionality and enhancing important patterns. CNNs are effective in handling high-dimensional signals and learning discriminative representations.
Transformer Encoder Block: The output from the CNN is rearranged and fed into a Transformer Encoder consisting of 2 layers with multi-head self-attention (2 heads). This module captures long-range dependencies across time and EEG channels, enabling global contextual understanding. Residual connections and layer normalization within the encoder improve model stability and convergence.
Flattened and Fully Connected Layers (Classification Head): After attention processing, the features are flattened and passed through a fully connected block. It includes a linear layer with 64 neurons, ReLU activation, dropout (0.3), and a final linear layer for classification. This component acts as the classification head, mapping learned representations to class probabilities.
Output Layer: The final output consists of logits for each class (schizophrenia or healthy). A softmax activation is applied during evaluation to derive predicted class probabilities for binary classification.
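A minimal PyTorch sketch of this architecture, matching the stated layer sizes (32 convolutional filters with kernel size 3, max-pooling of 2, a 2-layer/2-head Transformer encoder, a 64-unit dense layer, and dropout 0.3), is given below; padding, the encoder feed-forward width, and other unstated hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class CNNTransformer(nn.Module):
    """Sketch of the CNN-Transformer classifier described in Section 2.7."""
    def __init__(self, n_channels=18, n_classes=2):
        super().__init__()
        # 1D convolution: 32 filters, kernel size 3, followed by ReLU and max-pooling (2)
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # Transformer encoder: 2 layers, 2 attention heads over the 32-dim CNN features
        enc_layer = nn.TransformerEncoderLayer(d_model=32, nhead=2, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Classification head: flatten -> 64 units -> dropout(0.3) -> class logits
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (n_channels // 2), 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):            # x: (batch, 1, 18)
        h = self.cnn(x)              # -> (batch, 32, 9)
        h = h.permute(0, 2, 1)       # -> (batch, 9, 32) for the encoder
        h = self.encoder(h)
        return self.head(h)          # logits; softmax is applied at evaluation time

logits = CNNTransformer()(torch.randn(4, 1, 18))
print(logits.shape)  # torch.Size([4, 2])
```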

2.8. Experimental Setup and Computational Resources

The workstation utilized for all experiments had an NVIDIA RTX 3090 GPU (24 GB VRAM; NVIDIA Corporation, Santa Clara, CA, USA), 128 GB RAM, and an AMD Ryzen Threadripper 3970X CPU (Advanced Micro Devices, Inc., Santa Clara, CA, USA), and was located at AIST research center, University of Tabuk, Saudi Arabia.
We implemented the model with PyTorch 1.13 and trained it through 150 epochs while using a batch size of 64. The training process required 90 s for each epoch which added up to approximately 3.75 h for the entire training period. The Adam optimizer was used with a starting learning rate of 0.0001 while applying ReLU activation functions and setting dropout probability to 0.3. We utilized early stopping based on validation F1-score to avoid overfitting during training. Hyperparameters were optimized through 5-fold cross-validation.
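The stated training configuration can be sketched as follows, reusing the CNNTransformer sketch from Section 2.7; the dummy tensors, data loaders, and early-stopping patience are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import f1_score

# Dummy data standing in for the extracted EEG feature tensors
X, y = torch.randn(256, 1, 18), torch.randint(0, 2, (256,))
train_loader = DataLoader(TensorDataset(X[:200], y[:200]), batch_size=64, shuffle=True)
val_loader = DataLoader(TensorDataset(X[200:], y[200:]), batch_size=64)

model = CNNTransformer()  # architecture sketch defined earlier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

best_f1, patience, bad_epochs = 0.0, 10, 0  # patience value is illustrative
for epoch in range(150):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

    # Early stopping on the validation F1-score
    model.eval()
    preds, labels = [], []
    with torch.no_grad():
        for xb, yb in val_loader:
            preds += model(xb).argmax(dim=1).tolist()
            labels += yb.tolist()
    f1 = f1_score(labels, preds)
    if f1 > best_f1:
        best_f1, bad_epochs = f1, 0
        torch.save(model.state_dict(), "best_model.pt")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```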
Figure 5 provides a comparative view of proactive relapse detection and post-relapse analysis in the context of mental health. On the left, the proactive view puts early warning at the forefront: real-time data analysis for timely alerts, patient engagement, predictive insights, and dynamically adapted treatment. These interventions aim to preempt relapse through immediate feedback and tailored techniques. On the right, post-relapse analysis seeks insights from past events, drawing on the observation history, learned interpretation and prediction models for outcomes, and statistical examination. This side of the balance favors recovery and future risk mitigation via pattern recognition, medical history analysis, and data-driven prediction. The schema highlights the need to combine these two strategies into a complete relapse-control system.
Algorithm 1 evaluates the symptoms (hallucinations, delusions, disorganized speech, emotional withdrawal, etc.) based on a set of conditions; the remaining parameters are listed in Table 2. Since the decision-making process involves simple condition checks for each symptom, the time complexity of evaluating these conditions scales linearly with the number of symptoms n. Each condition is checked in constant time, so the overall complexity is O(n).
Algorithm 1: Schizophrenia Early Detection
Bioengineering 12 00641 i001
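The following hypothetical Python sketch only illustrates the kind of constant-time, per-symptom condition checks described for Algorithm 1; the symptom names, severity scales, and thresholds are assumptions for illustration.

```python
def early_detection_score(symptoms, threshold=3):
    """Evaluate each reported symptom independently (constant time per check), O(n) overall.
    Symptom keys, severity scales, and the threshold are illustrative assumptions."""
    flags = {
        "hallucinations": symptoms.get("hallucinations", 0) >= 2,
        "delusions": symptoms.get("delusions", 0) >= 2,
        "disorganized_speech": symptoms.get("disorganized_speech", 0) >= 2,
        "emotional_withdrawal": symptoms.get("emotional_withdrawal", 0) >= 3,
    }
    positives = sum(flags.values())
    return ("at risk" if positives >= threshold else "monitor"), flags

print(early_detection_score({"hallucinations": 3, "delusions": 2, "emotional_withdrawal": 4}))
```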
Like the early detection algorithm, this relapse prediction Algorithm 2 evaluates symptoms, medication adherence, stress levels, and sleep quality. Each of these factors is evaluated in constant time, and since they are independent checks, the overall time complexity scales linearly with the number of factors n involved in the prediction. Therefore, the time complexity is O(n).
Algorithm 2: Schizophrenia Relapse Prediction
Bioengineering 12 00641 i002
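Similarly, a hypothetical sketch of the per-factor checks described for Algorithm 2 is shown below; the weights and cut-offs for symptoms, adherence, stress, and sleep are assumptions for illustration only.

```python
def relapse_risk(symptom_severity, medication_adherence, stress_level, sleep_quality):
    """Each factor is an independent constant-time check, giving O(n) overall.
    Weights and cut-offs are illustrative assumptions, not the paper's exact rules."""
    risk = 0
    risk += 2 if symptom_severity >= 4 else 0        # worsening symptom severity
    risk += 2 if medication_adherence < 0.8 else 0   # poor adherence raises risk
    risk += 1 if stress_level >= 7 else 0            # high self-reported stress (0-10)
    risk += 1 if sleep_quality <= 3 else 0           # poor sleep quality (0-10)
    return "high" if risk >= 3 else ("moderate" if risk == 2 else "low")

print(relapse_risk(symptom_severity=5, medication_adherence=0.6,
                   stress_level=8, sleep_quality=2))
```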

3. Problem Formulation

This section presents a detailed mathematical modeling of our hybrid deep learning framework for predicting schizophrenia relapse using EEG signals, clinical features, and behavioral sentiment data. The system integrates spatial feature extraction (CNN), temporal modeling (Transformer), and multi-source data fusion. The complete process is formalized below with 45 equations, each thoroughly explained.

3.1. Input Representation and Normalization

P(R) = F(X_{EEG}, X_{clinical}, X_{sentiment}, X_{history})    (1)
Equation (1) defines the relapse probability as a function over EEG signals, clinical features, sentiment data, and historical relapse inputs. It serves as the entry point for our multimodal hybrid deep learning model. The function F encapsulates the full CNN-Transformer pipeline. It integrates diverse data types into one predictive mapping.
X_{EEG}^{norm}(t) = \frac{X_{EEG}(t) - \mu}{\sigma}    (2)
Z-score normalization, presented in Equation (2), standardizes EEG input signals by removing the mean μ and scaling by the standard deviation σ. This preprocessing step reduces inter-subject variability. It also ensures the model can generalize across individuals. Normalized EEG improves numerical stability.
F_{freq}(k) = \left| \sum_{n=0}^{N-1} X_{EEG}(n) \cdot e^{-j 2\pi k n / N} \right|^{2}    (3)
Equation (3) applies a Discrete Fourier Transform (DFT) to EEG signals. It extracts frequency-domain information such as power in the alpha, beta, and delta bands. These are relevant for detecting schizophrenia-related neural oscillations. The result highlights signal distortions.
H_{cnn}(t) = \mathrm{ReLU}(W_{conv} * X_{EEG}^{norm}(t) + b_{conv})    (4)
A convolutional neural network layer processes the normalized EEG input, as presented in Equation (4). It detects spatial and local temporal features in brain activity. The ReLU function introduces non-linearity. This layer is essential for capturing brain signal dynamics.

3.2. Transformer-Based Temporal Encoding

Z_{Q,K,V} = W_{Q,K,V} \cdot H_{cnn}(t)    (5)
CNN outputs, as presented in Equation (5), are linearly projected into query, key, and value matrices. These form the basis for the Transformer attention mechanism. Each matrix allows learning of temporal relationships across EEG segments. It sets the stage for contextual attention.
A(i,j) = \mathrm{softmax}\left( \frac{Q_i \cdot K_j^{T}}{\sqrt{d_k}} \right)    (6)
Equation (6) computes the attention score between two EEG time points i and j. A dot-product similarity is scaled and normalized. This score allows focusing on key moments in the EEG. The softmax ensures the weights sum to one.
Z_{attn} = \sum_{j=1}^{T} A(i,j) \cdot V_j    (7)
Weighted summation of the value vectors is performed in Equation (7) using the attention scores. This generates a context vector summarizing important time steps. It allows the network to incorporate temporal dependencies. The output is rich in sequential information.
H_{trans}(t) = \mathrm{LayerNorm}(Z_{attn} + H_{cnn}(t))    (8)
Equation (8) combines CNN and Transformer outputs through a residual connection. Layer normalization is applied for stable learning. It prevents internal covariate shift. This fusion strengthens model capacity.
Y_{pred} = \mathrm{softmax}(W_{fc2} \cdot \mathrm{ReLU}(W_{fc1} \cdot H_{trans}(t) + b_{fc1}) + b_{fc2})    (9)
A two-layer fully connected classifier, presented in Equation (9), maps the learned features to a relapse probability. ReLU adds non-linearity, and softmax generates class probabilities. The output is interpretable as likelihoods. This is the decision layer of the model.
S_{total} = \sum_{t=1}^{T} \alpha_t \cdot S_t^{sentiment}    (10)
Equation (10) aggregates sentiment features across time using dynamic weights α_t. It captures behavioral instability preceding relapse. By weighting emotional signals, it enhances context-awareness. Sentiment time-series are derived from patient speech/text.
M_{score} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 M)}}    (11)
Medication adherence, presented in Equation (11), is transformed into a probabilistic score using logistic regression. Non-adherence increases relapse risk, captured by β_1. This models clinical compliance effects. The output integrates into the overall relapse fusion.
R_{agg} = \sum_{i=1}^{n} w_i X_i^{clinical}    (12)
Equation (12) aggregates clinical variables such as age, diagnosis type, and comorbidities. The weights w_i are learnable and represent feature importance. The model prioritizes stronger relapse indicators. It builds a global clinical health score.
R_{total} = \lambda_1 Y_{pred} + \lambda_2 S_{total} + \lambda_3 M_{score}    (13)
A weighted sum, presented in Equation (13), combines relapse predictions from EEG, sentiment, and medication adherence. The λ_i are fusion coefficients trained during backpropagation. This hybrid index increases predictive accuracy. It yields the model's final relapse risk.
X_{aug}^{EEG} = X_{EEG} + \epsilon_{noise}    (14)
Data augmentation in Equation (14) is applied by injecting Gaussian noise ε_noise into EEG signals. This simulates real-world artifacts and improves generalization. The model learns to be robust to noisy measurements. It helps overcome data scarcity.
X_{shifted}(t) = X_{EEG}(t + \delta)    (15)
Temporal shifting of the EEG in Equation (15) mimics delayed or misaligned brain responses. The shift δ varies randomly across samples. It supports learning of time-invariant features. It aids generalization across asynchronous event patterns.
X_{window}(i) = X_{EEG}(t_i : t_i + L)    (16)
EEG signals, as presented in Equation (16), are segmented into overlapping windows of length L. Each X_window(i) captures localized signal activity. This improves spatial-temporal encoding. It supports CNN-based feature extraction.
L_{CE} = -\sum_{i=1}^{N} y_i \log(\hat{y}_i)    (17)
Cross-entropy loss in Equation (17) is used for supervised classification between true labels y_i and predictions ŷ_i. It penalizes incorrect assignments. A lower L_CE indicates better prediction alignment. It is optimized during training.
L_{total} = L_{CE} + \alpha \cdot L_{reg}    (18)
Equation (18) adds a regularization term L_reg to the main classification loss. The coefficient α balances performance and overfitting. Regularization helps generalize across test samples. It is critical in small-sample medical tasks.
F_{score} = \frac{2PR}{P + R}    (19)
The F1-score in Equation (19) combines precision P and recall R into a harmonic mean. This metric is valuable for imbalanced data. A high F1-score means fewer false negatives, which is critical for clinical reliability.
Q(s,a) = r + \gamma \cdot \max_{a'} Q(s', a')    (20)
The Q-learning update in Equation (20) is used in reinforcement learning. It models sequential decision-making for adaptive interventions. The model updates the expected reward Q(s,a) based on the observed reward r and the future value. It is useful in dynamic patient monitoring.
W_{global} = \frac{1}{N} \sum_{i=1}^{N} n_i W_i    (21)
Federated averaging in Equation (21) computes a global weight vector across N local models. Each weight W_i is scaled by the local data size n_i. This enables decentralized learning without sharing raw data, which is critical for privacy-aware hospital collaboration.
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}    (22)
The Matthews Correlation Coefficient, presented in Equation (22), balances all outcomes from the confusion matrix. It is especially robust in imbalanced scenarios. MCC ranges from −1 to 1; higher is better. It is preferred for medical diagnostics with unequal class distributions.
X_{clinical}^{norm} = \frac{X - \mu}{\sigma}    (23)
Equation (23) normalizes clinical variables such as lab values and demographics. It ensures all features contribute equally to learning. It reduces biases due to different scales and enhances fusion compatibility with EEG features.
R_t = \alpha_1 R_{t-1} + \alpha_2 R_{t-2} + \epsilon_t    (24)
An autoregressive (AR) model in Equation (24) describes relapse dynamics over time. It reflects the temporal dependency on past relapse scores. The coefficients α_1, α_2 determine memory depth. The noise term ε_t captures unmodeled variability.
Y_{binary} = \begin{cases} 1, & R_{total} \geq \theta \\ 0, & \text{otherwise} \end{cases}    (25)
A thresholding rule, presented in Equation (25), converts the continuous relapse risk R_total into a binary decision. Patients above θ are predicted to relapse. This supports clinical actionability and enables classification-based evaluation metrics.
L_{fuse} = \sum_{i=1}^{T} \| Z_i^{EEG} - Z_i^{clinical} \|^2    (26)
The fusion loss in Equation (26) minimizes the distance between EEG and clinical embeddings. It encourages the learning of shared latent patterns. The alignment enforces multimodal coherence, which is important for synchronized interpretation of biological and clinical data.
A_{multi}(t) = \sum_{h=1}^{H} W_h \cdot A_h(t)    (27)
Multi-head attention in Equation (27) combines outputs from H different attention heads. Each A_h(t) captures distinct relational patterns in time. The weighted sum aggregates global context. It improves interpretability and robustness.
E_{global} = \frac{1}{N} \sum_{i=1}^{N} \| \hat{Y}_i - Y_i \|^2    (28)
Equation (28) computes the global mean squared error across all clients in federated learning. It helps in monitoring convergence during training. Minimizing E_global promotes global model consistency, which is vital for decentralized relapse prediction.
R_{spike} = \sum_{t=1}^{T} \delta(|\nabla X_{EEG}(t)| > \tau)    (29)
Equation (29) presents a penalty term that flags sharp spikes in the EEG signal gradient. It activates when the change exceeds a threshold τ. This reduces model sensitivity to noise artifacts and encourages smoother signal interpretation.
C_{entropy} = -\sum_{j=1}^{C} p_j \log(p_j)    (30)
Classification entropy in Equation (30) quantifies uncertainty in model predictions. Higher entropy indicates indecisiveness. It is useful for trust calibration and uncertainty-based pruning, helping to filter unreliable outputs.
\Delta_{grad} = \| W_t - W_{t-1} \|^2    (31)
Equation (31) computes the squared norm of the weight differences between consecutive epochs. Large changes may indicate instability. Monitoring Δ_grad helps in diagnosing learning issues. It supports early stopping and adaptive learning rate strategies.
Z_{fused} = \phi(Z_{EEG} \oplus Z_{clinical})    (32)
A fusion function φ in Equation (32) combines EEG and clinical embeddings into a joint representation. Concatenation (⊕) aggregates multimodal features. The fused vector enables holistic understanding and supports downstream decision-making.
L_{contrast} = \sum_{i,j} \mathbb{1}[y_i = y_j] \cdot \| z_i - z_j \|^2    (33)
Contrastive loss in Equation (33) pulls together embeddings of samples from the same class. It improves class-specific cluster formation. This strengthens feature discrimination, which is especially useful for medical cases with subtle inter-class differences.
W_{EEG}^{init} = \mathrm{Pretrain}(X_{EEG}, Y_{relapse})    (34)
Equation (34) initializes the EEG-related weights via pretraining on historical relapse labels. Pretraining accelerates convergence and improves early performance. It enables reuse of prior knowledge and reduces dependency on labeled data.
\mathrm{KL}(P \| Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}    (35)
The KL divergence in Equation (35) measures how the prediction distribution P deviates from a reference Q. It is used to monitor dataset shift or training instability. It helps maintain consistency between client models in federated settings and is useful for regularization.
\theta_{final} = \arg\min_{\theta} L_{total}(\theta)    (36)
The final model parameters θ_final, presented in Equation (36), are obtained by minimizing the total loss. This marks the end of the optimization process. An optimal θ improves generalization and supports deployment readiness.
Z_{res}(t) = Z_{fused}(t) + PE(t)    (37)
Equation (37) adds a positional encoding PE(t) to the fused embeddings. It introduces temporal awareness in Transformer models. It helps distinguish events across time, which is critical for sequence modeling.
\mathrm{AUC}_{ROC} = \int_{0}^{1} TPR(FPR^{-1}(x)) \, dx    (38)
The area under the ROC curve in Equation (38) quantifies the trade-off between sensitivity and specificity. AUC is threshold-independent. A high AUC indicates strong discrimination power, which is essential for clinical model evaluation.
R_{drop} = \sum_{l} \lambda_l \| W_l \|_1    (39)
Equation (39) applies L1 regularization across model layers. It encourages sparsity in the weight matrices. It helps prune irrelevant neurons and reduces model complexity.
Y_{relapse}(t) = \sigma\left( \sum_{i=1}^{d} w_i z_i(t) + b \right)    (40)
The sigmoid classifier in Equation (40) maps fused features into a relapse probability. The weighted sum of the latent vector z_i captures the predictive signal. The output is interpretable and bounded. It forms the core relapse prediction output.
\Omega_{trust} = \frac{\text{True Positive Decisions}}{\text{All Trusted Decisions}}    (41)
This trust metric in Equation (41) quantifies the reliability of model decisions. A higher Ω_trust indicates more consistent correct predictions. It is useful for clinicians to gauge confidence in predictions and supports safe AI deployment in sensitive contexts.
L_{entropy\_mask} = \sum_{t} m_t \cdot C_{entropy}(t)    (42)
Entropy masking, presented in Equation (42), suppresses ambiguous EEG segments during training. Here m_t acts as a binary mask highlighting valid time steps. It reduces noisy attention behavior and enhances model robustness.
I_{grad} = \frac{\partial Y_{relapse}}{\partial X_{EEG}(t)}    (43)
The saliency map technique in Equation (43) measures how much the EEG input at time t affects the output. It enables interpretability by identifying critical time points. It is a key method in explainable AI (XAI) and helps gain clinician trust.
P_{stable} = P(|\theta_t - \theta_{t-1}| < \epsilon)    (44)
Training stability, presented in Equation (44), is measured by how much the model parameters change between epochs. Low fluctuation implies convergence. It is useful for early stopping and training diagnostics and indicates when learning has saturated.
F_{final}(x) = \psi(G(x_{EEG}), x_{clinical}, x_{social})    (45)
The final relapse prediction function ψ, presented in Equation (45), fuses EEG embeddings, clinical variables, and behavioral data. This comprehensive model accounts for neurological, medical, and social dimensions. It supports robust, context-aware prediction and enables multimodal decision-making. Further, Table 3 lists the acronyms used in this paper.

4. Results

This study aimed to propose and validate a robust predictive model for schizophrenia relapse based on a hybrid DL model consisting of CNN and Transformer architectures trained on EEG signals. Relapse in schizophrenia remains one of the most challenging problems in clinical practice, occurring in more than half of cases within one year after treatment cessation [2]. Relapse represents a significant barrier to improving long-term outcomes, as the recurring nature of psychosis creates a revolving door of decompensation, hospitalization, medication adjustment, and compromised quality of life. Early prediction of relapses is important, as it facilitates proactive intervention by healthcare providers, ultimately mitigating the severity of symptoms and the intensity of treatment required. However, models for predicting relapses have encountered several limitations: low accuracy, poor sensitivity, and a high rate of false positives. In this paper, we presented a novel hybrid model that combines the spatial feature extraction capacity of CNNs with the temporal dependency modeling capability of the Transformer. This fusion allowed the model to capture local EEG signal characteristics as well as large-scale interactions between EEG channels over time. The model was superior to existing state-of-the-art models, substantially improving predictive accuracy and reliability in the clinical setting.

4.1. Performance of the Hybrid Model

Recent research on schizophrenia relapse prediction has relied on Long Short-Term Memory (LSTM) and CNN-based models that, although effective to some extent, achieve roughly 70% to 85% accuracy. In addition, many of these approaches have poor recall (sensitivity) and therefore miss many relapse-prone patients. For instance, one relapse prediction model reported 75% accuracy with far lower recall than desirable, leading to false negatives and missed early-stage interventions [23]. By contrast, our hybrid CNN-Transformer model achieved an accuracy of 97%, notably higher than these previous models. The self-attention mechanism in the Transformer module attends to both spatial and temporal features of the EEG data and is therefore better able to capture subtle patterns related to relapse risk. This advancement addresses the low recall and sensitivity evident in previous work and makes our model highly reliable in detecting relapse risk. Earlier studies have demonstrated significant deficiencies in relapse-prediction models: deep learning models underperformed due to class imbalance and made biased predictions, with healthy patients frequently predicted as relapse-susceptible [7,11]. These models also had poor precision, producing more false positives and complicating clinical decisions. Figure 6 summarizes the classification performance of the proposed relapse prediction model through confusion matrices and precision analysis. Sub-figure (a) shows the confusion matrix of the proposed model, which attains a high Matthews Correlation Coefficient (MCC = 0.9318). Sub-figure (b) compares Precision and F1-score before and after model optimization. Sub-figure (c) presents the confusion matrix of the MultiViT baseline, which shows the most notable misclassification, and sub-figure (d) shows the confusion matrix of the proposed model, with fewer false positives and false negatives. Overall, these results demonstrate the higher predictive performance and reliability of the proposed deep learning approach.
Figure 7 presents a multi-dimensional performance summary of the proposed model for predicting schizophrenia relapse, using diverse measures and comparisons. Panel (a) shows the ROC curve with an area under the curve (AUC) of 1.00, indicating excellent classification. The Precision–Recall curve in sub-figure (b) indicates high precision across all levels of recall. The calibration plot in sub-figure (c) compares predicted probabilities against actual outcomes under both internal and external validation, demonstrating appropriate calibration of the model. Sub-figure (d) shows the histogram of predicted probabilities for the schizophrenia predictions, which is clearly separated toward the extremes (close to 0 or 1), again indicating confident predictions. Sub-figure (e) compares Accuracy, Precision, Recall, and F1-score across the four techniques (MultiViT, MentalRoBERTa, XGBoost, and the proposed framework), showing that the proposed model performs significantly better than the other methods. Sub-figure (f) analyzes Sensitivity (TPR) and Specificity (TNR), where the proposed model again outperforms the others in correctly distinguishing schizophrenia from non-schizophrenia cases. Collectively, these sub-figures provide evidence for the robustness, reliability, and predictive power of the developed deep learning framework.
Table 4 reports the per-class classification performance of the proposed model for the healthy (0) and schizophrenia (1) classes, including Precision, Recall, F1-Score, and the number of samples per class. The Accuracy row reflects overall prediction accuracy across all samples. The Macro Average averages each metric over the classes regardless of how many samples each contains, giving classes equal weight, whereas the Weighted Average weights each class by its number of samples. The Matthews Correlation Coefficient (MCC) provides a robust assessment of classification quality, particularly for imbalanced datasets. Finally, the perfect AUC value (1.0) confirms the model's excellent discrimination ability.
Table 5 compares the classification performance of four techniques (XGBoost, MultiViT, MentalRoBERTa, and the proposed framework) across a range of metrics including Precision, Recall, F1-Score, Sensitivity (True Positive Rate), Specificity (True Negative Rate), and Accuracy. The results demonstrate the superiority of the proposed framework over all other methods across all metrics, achieving near-perfect precision and recall and 0.95 specificity. Conversely, conventional models such as MentalRoBERTa and MultiViT show poorer sensitivity and specificity. These results underscore the robustness and generalization power of the proposed model in correctly identifying schizophrenia cases from EEG data.

4.2. Statistical Analysis

We evaluated the statistical significance of the performance differences between our CNN-Transformer fusion model and the baseline methods by applying the Wilcoxon signed-rank test to the F1-score, accuracy, and AUC-ROC results from five cross-validation iterations. We chose this non-parametric test because the performance data are paired and not normally distributed. The analysis showed that our model's performance surpassed conventional CNNs, Transformer-only models, and DenseNet baselines, with statistically significant differences (p < 0.05). These results validate that the model's performance gains are genuine rather than random deviations.
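For reference, a minimal sketch of this paired test using SciPy is shown below; the per-fold F1-scores are illustrative numbers, not the study's actual results:

```python
# Minimal sketch of a paired Wilcoxon signed-rank comparison across folds, assuming SciPy;
# the per-fold F1-scores below are illustrative placeholders.
from scipy.stats import wilcoxon

f1_proposed = [0.96, 0.97, 0.95, 0.97, 0.96]   # five cross-validation folds
f1_baseline = [0.90, 0.92, 0.89, 0.91, 0.90]

stat, p_value = wilcoxon(f1_proposed, f1_baseline, alternative="greater")
print(f"W={stat}, p={p_value:.4f}")            # p < 0.05 suggests a significant improvement
```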

4.3. Ablation Study and Contribution Analysis

We performed an ablation study that evaluated each core module's effect on final performance to clarify the novelty and isolate the contributions of specific architectural and data-handling components. Specifically, we compared the proposed CNN-Transformer fusion model against four reduced variants: (i) a CNN-only model, (ii) a Transformer-only model, (iii) a combined CNN + Transformer model without multimodal fusion, and (iv) a CNN + Transformer model with multimodal fusion but without SMOTE-based balancing. Each variant was trained under the same conditions, with identical dataset splits and hyperparameter settings. The CNN-only model excelled at processing spatial EEG structure but struggled to maintain temporal consistency when symptoms developed sequentially. The Transformer-only model was sensitive to temporal patterns but weaker at spatial differentiation, which reduced its precision in detecting event-specific patterns. Combining the CNN and Transformer yielded a 5.7% increase in F1-score over the best single-branch configuration, confirming the advantage of joint spatial-temporal learning. Fusing multimodal data sources (EEG readings alongside clinical scores and sentiment analysis) improved recall by 3.9% and boosted AUC-ROC, showing how behavioral and contextual information enhances relapse prediction accuracy. Applying SMOTE increased sensitivity for the minority class by 6.2% while reducing the model's bias toward predicting non-relapse outcomes. Overall, the evaluation shows substantial contributions from spatial learning, temporal modeling, multimodal fusion, and class rebalancing to the final model performance. Together they form a synergistic system that surpasses traditional deep learning models built on isolated signal processing or generic pipelines. This modular breakdown showcases the novelty of our method and demonstrates that the contribution goes beyond a simple combination of existing techniques.
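As a concrete illustration of the class-rebalancing step, the sketch below applies SMOTE to the training split only; it assumes Python with imbalanced-learn and scikit-learn, and the feature and label arrays are placeholders:

```python
# Minimal sketch of SMOTE-based rebalancing applied only to the training split.
# X and y are illustrative stand-ins for fused features and binary relapse labels.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))            # placeholder fused EEG/clinical features
y = (rng.random(1000) < 0.15).astype(int)  # imbalanced relapse labels (~15% positive)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample only the training data so the test distribution stays untouched.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)
print("before:", np.bincount(y_train), "after:", np.bincount(y_bal))
```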
We performed an additional ablation test to validate the model and address overfitting concerns by studying how each data type (EEG signals, clinical information, and sentiment analysis) affected model accuracy. Trained exclusively on EEG input, the model achieved an F1-score of 89.2%. Adding clinical scores increased the F1-score to 92.1%, demonstrating their predictive contribution. Including sentiment-derived features raised the F1-score to 94.8%, indicating that behavioral and affective cues offer non-redundant, complementary information. The superior performance of the CNN-Transformer model therefore results from the combined use of multiple modalities rather than overfitting to any single data source, and the results indicate that the multimodal fusion approach is both stable and broadly applicable.

4.4. External Validation and Generalizability

To assess external validity and generalizability, we used a separate dataset of EEG recordings from 50 schizophrenia patients collected at three clinical sites. This dataset involved different hardware, clinical procedures, and patient demographics, providing a comprehensive test of model flexibility. On this external cohort, the proposed CNN-Transformer model reached 94.5% accuracy, 0.96 AUC-ROC, and a 0.92 F1-score, demonstrating consistent performance between internal validation and external application across varied populations. Internal testing employed 5-fold stratified cross-validation at the patient level, keeping subjects strictly separate across folds to prevent data leakage. Together, these findings strengthen the model's trustworthiness and its applicability across clinical environments.
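A minimal sketch of patient-level, leakage-free splitting, assuming scikit-learn's StratifiedGroupKFold and placeholder patient IDs and labels, is shown below:

```python
# Minimal sketch of patient-level cross-validation that prevents subject leakage.
# groups holds a patient ID per EEG segment; all names and shapes are illustrative.
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

rng = np.random.default_rng(1)
n_segments = 500
X = rng.normal(size=(n_segments, 64))            # EEG-derived feature vectors
groups = rng.integers(0, 80, size=n_segments)    # patient ID for each segment
y = (groups % 5 == 0).astype(int)                # toy labels, constant per patient

cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(cv.split(X, y, groups)):
    # No patient appears in both the training and test folds.
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
    print(f"fold {fold}: {len(train_idx)} train segments, {len(test_idx)} test segments")
```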

5. Discussion

The proposed research introduces a combined CNN-Transformer framework to anticipate schizophrenia relapse from EEG signals while incorporating sentiment indicators and clinical evaluation metrics. Statistical tests using the Wilcoxon signed-rank method confirm that the proposed model outperforms current baselines in F1-score, sensitivity, and AUC-ROC. The study indicates that merging spatial feature learning through CNNs with temporal dependency modeling via Transformer attention mechanisms is especially useful for handling the irregular and nonlinear EEG patterns observed in schizophrenia patients. Whereas traditional deep learning models depend only on EEG signals or structural neuroimaging, our approach includes clinical behavioral markers and affective features to build a more comprehensive picture. Traditional unimodal studies [14,15,16,17,18,19,20,21] face challenges in generalizability and miss important relapse indicators. Our model uses attention-driven fusion to merge multimodal signals, which improves context-awareness and personalization. Synthetic minority over-sampling (SMOTE) reduced the performance bias toward the majority class, increasing recall for relapse cases, which is essential for clinical applications.
This study achieved good results but still has limitations. The dataset, although diverse, remains too small to ensure broad demographic applicability. Although EEG artifacts were removed with care, deep learning performance remains susceptible to the natural noise and variability of biosignals. Because the model was trained on cross-sectional data without time-series follow-up, it cannot accurately predict long-term relapse trajectories, and attention mechanisms enhance interpretability but cannot fully replace clinically validated explanations. Nevertheless, the study provides clinical evidence supporting AI-based monitoring systems for early relapse prediction. The proposed model can serve real-time neuro-monitoring environments and digital mental health services to identify high-risk episodes before symptoms worsen, allowing a timely medical response. Its multimodal approach aligns well with psychiatric evaluation, which requires the integration of objective neuro-signals with subjective assessments and behavioral analysis. Upcoming studies will enlarge the dataset through multi-center partnerships and include long-term EEG recordings to monitor how the disease progresses over time. We also plan to incorporate real-world patient-reported outcomes and to investigate federated learning methods that protect patient privacy during model training. The model's clinical effectiveness will further improve through advanced XAI modules for better explainability and through clinician feedback on model outputs. Our model strengthens clinical framing and interpretability by using attention maps and saliency visualizations to show which EEG segments and spatial regions most influence each prediction; these tools improve transparency and help clinicians better understand relapse risks. Our upcoming research objectives include SHAP-based analysis and straightforward explanations that clinicians can easily understand to assist decision-making, connecting technical advances with psychiatric practicality to enable real-world adoption of the model in mental health care.
Our model shows potential clinical usefulness but faces real-world obstacles that hinder immediate implementation. Psychiatric applications of EEG monitoring confront obstacles including the expense and complexity of continuous data collection, inconsistent patient use of EEG wearables and ambulatory devices, and technical malfunctions during extended operation. Multiple barriers exist which could obstruct widespread real-time deployment of the system, particularly in settings with limited resources. To bring our personalized treatment planning model into actual clinical practice would necessitate integration into current clinical workflows by creating user-friendly interfaces alongside EHR compatibility and establishing clinician feedback loops. The next phase of research will resolve translational gaps through the creation of strong EEG data collection systems and decision aid platforms designed to match psychiatric practice needs.

6. Conclusions

In this study, we presented a Transformer-based deep learning model for schizophrenia relapse prediction using EEG data and demonstrated its effectiveness. The model achieved state-of-the-art performance (97% accuracy) by combining CNN and Transformer architectures. Integrating EEG data with clinical evaluations and sentiment analysis allows the model to comprehensively capture the neurological and psychological dimensions of relapse prediction. The addition of multimodal data improves robustness and generalization, giving a holistic view of the patient's mental state and yielding more accurate predictions than unimodal alternatives in a clinical setting. The ability to reduce false negatives and to adjust the precision–recall tradeoff to clinical needs makes the model practical for real-life clinical use. Additionally, SMOTE and data augmentation addressed class imbalance, supporting fair model performance across diverse patient populations. Despite these encouraging results, obstacles remain, such as variations in EEG signal quality, data privacy issues, and the fusion of multiple data sources. Addressing these challenges is important for improving the model's generalizability and performance across patient populations. Further work will involve data augmentation, improved interpretability, and large-scale validation to assess its real clinical use.

Author Contributions

Conceptualization, S.Y., M.A. (Muhammad Adeel) and U.D.; methodology, U.D. and T.A.; software, S.Y. and M.H.; validation, M.H. and M.A. (Muhammad Adeel); formal analysis, T.A. and U.D.; investigation, M.A. (Muhammad Ayaz) and M.H.; resources, A.M.M.; data curation, S.Y.; writing—original draft preparation, M.A. (Muhammad Adeel) and S.Y.; writing—review and editing, M.A. (Muhammad Ayaz); visualization, M.H.; supervision, M.A. (Muhammad Ayaz) and T.A.; project administration, M.H.; funding acquisition, T.A. and M.A. (Muhammad Ayaz). All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by a research grant from the Research, Development, and Innovation Authority (RDIA), Saudi Arabia, grant no. 13010-Tabuk-2023-UT-R-3-1-SE.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data, models, or codes supporting the findings of this study are available upon request from the corresponding author.

Acknowledgments

This work is supported by a research grant from the Research, Development, and Innovation Authority (RDIA), Saudi Arabia, grant no. 13010-Tabuk-2023-UT-R-3-1-SE.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. Kendler, K.S. Eugen Bleuler’s Views on the Genetics of Schizophrenia in 1917. Schizophr. Bull. 2020, 46, 758–764. [Google Scholar] [CrossRef] [PubMed]
  2. Ranjan, R.; Sahana, B.C.; Bhandari, A.K. Deep learning models for diagnosis of schizophrenia using EEG signals: Emerging trends, challenges, and prospects. Arch. Comput. Methods Eng. 2024, 31, 2345–2384. [Google Scholar] [CrossRef]
  3. Amleshi, R.S.; Ilaghi, M.; Rezaei, M.; Zangiabadian, M.; Rezazadeh, H.; Wegener, G.; Arjmand, S. Predictive utility of artificial intelligence on schizophrenia treatment outcomes: A systematic review and meta-analysis. Neurosci. Biobehav. Rev. 2025, 169, 105968. [Google Scholar] [CrossRef] [PubMed]
  4. Zukowska, Z.; Allan, S.; Eisner, E.; Ling, L.; Gumley, A. Fear of relapse in schizophrenia: A mixed-methods systematic review. Soc. Psychiatry Psychiatr. Epidemiol. 2022, 57, 1305–1318. [Google Scholar] [CrossRef]
  5. Peritogiannis, V.; Ninou, A.; Samakouri, M. Mortality in schizophrenia-spectrum disorders: Recent advances in understanding and management. Healthcare 2022, 10, 2366. [Google Scholar] [CrossRef]
  6. Patel, K.P.; Oroszi, T.L. Schizophrenia Management Approaches: A Look at Progress and Challenges. World J. Neurosci. 2024, 15, 13–34. [Google Scholar] [CrossRef]
  7. Lamichhane, B.; Zhou, J.; Sano, A. Psychotic relapse prediction in schizophrenia patients using a personalized mobile sensing-based supervised deep learning model. IEEE J. Biomed. Health Inform. 2023, 27, 3246–3257. [Google Scholar] [CrossRef]
  8. Zhang, T.; Schoene, A.M.; Ji, S.; Ananiadou, S. Natural language processing applied to mental illness detection: A narrative review. Npj Digit. Med. 2022, 5, 1–13. [Google Scholar] [CrossRef]
  9. Shafer, A.; Dazzi, F. Meta-analytic exploration of the joint factors of the Brief Psychiatric Rating Scale–Expanded (BPRS-E) and the positive and negative symptoms scales (PANSS). J. Psychiatr. Res. 2021, 138, 519–527. [Google Scholar] [CrossRef]
  10. Oliver, D.; Arribas, M.; Perry, B.I.; Whiting, D.; Blackman, G.; Krakowski, K.; Seyedsalehi, A.; Osimo, E.F.; Griffiths, S.L.; Stahl, D. Using Electronic Health Records To Facilitate Precision Psychiatry. Biol. Psychiatry 2024, 96, 532–542. [Google Scholar] [CrossRef]
  11. Kanyal, A.; Mazumder, B.; Calhoun, V.D.; Preda, A.; Turner, J.; Ford, J.; Ye, D.H. Multimodal deep learning from imaging genomic data for schizophrenia classification. Front. Psychiatry 2024, 15, 1384842. [Google Scholar] [CrossRef] [PubMed]
  12. Wazni, L.; Gifford, W.; Perron, A.; Vandyk, A. Understanding the Physical Health Problems of People with Psychotic Disorders Using Digital Storytelling. Issues Ment. Health Nurs. 2023, 44, 690–701. [Google Scholar] [CrossRef] [PubMed]
  13. Linden, A.H.; Hönekopp, J. Heterogeneity of research results: A new perspective from which to assess and promote progress in psychological science. Perspect. Psychol. Sci. 2021, 16, 358–376. [Google Scholar] [CrossRef] [PubMed]
  14. Kinreich, S.; McCutcheon, V.V.; Aliev, F.; Meyers, J.L.; Kamarajan, C.; Pandey, A.K.; Chorlian, D.B.; Zhang, J.; Kuang, W.; Pandey, G. Predicting alcohol use disorder remission: A longitudinal multimodal multi-featured machine learning approach. Transl. Psychiatry 2021, 11, 166. [Google Scholar] [CrossRef]
  15. Greco, C.M.; Simeri, A.; Tagarelli, A.; Zumpano, E. Transformer-based language models for mental health issues: A survey. Pattern Recognit. Lett. 2023, 167, 204–211. [Google Scholar] [CrossRef]
  16. Abplanalp, S.J.; Braff, D.L.; Light, G.A.; Joshi, Y.B.; Nuechterlein, K.H.; Green, M.F. Clarifying directional dependence among measures of early auditory processing and cognition in schizophrenia: Leveraging Gaussian graphical models and Bayesian networks. Psychol. Med. 2024, 54, 1–10. [Google Scholar] [CrossRef]
  17. Iyortsuun, N.K.; Kim, S.H.; Jhon, M.; Yang, H.J.; Pant, S. A review of machine learning and deep learning approaches on mental health diagnosis. Healthcare 2023, 11, 285. [Google Scholar] [CrossRef]
  18. Zhang, A.; Yao, C.; Zhang, Q.; Zhao, Z.; Qu, J.; Lui, S.; Zhao, Y.; Gong, Q. Individualized multimodal MRI biomarkers predict 1-year clinical outcome in first-episode drug-naïve schizophrenia patients. Front. Psychiatry 2024, 15, 1448145. [Google Scholar]
  19. Bi, Y.; Abrol, A.; Fu, Z.; Calhoun, V.D. A multimodal vision transformer for interpretable fusion of functional and structural neuroimaging data. Hum. Brain Mapp. 2024, 45, e26783. [Google Scholar] [CrossRef]
  20. Shams, A.M.; Jabbari, S. A deep learning approach for diagnosis of schizophrenia disorder via data augmentation based on convolutional neural network and long short-term memory. Biomed. Eng. Lett. 2024, 14, 663–675. [Google Scholar] [CrossRef]
  21. Chowdhury, A.H.; Islam, M.F.; Riad, M.R.A.; Hashem, F.B.; Reza, M.T.; Golam Rabiul Alam, M. A Hybrid Federated Learning-Based Ensemble Approach for Lung Disease Diagnosis Leveraging Fusion of SWIN Transformer and CNN. In International Congress on Information and Communication Technology; Springer: Singapore, 2023; pp. 957–972. [Google Scholar]
  22. Miah, H.; Kollias, D.; Pedone, G.L.; Provan, D.; Chen, F. Can machine learning assist in diagnosis of primary immune thrombocytopenia? a feasibility study. Diagnostics 2024, 14, 1352. [Google Scholar] [CrossRef] [PubMed]
  23. Kerz, E.; Zanwar, S.; Qiao, Y.; Wiechmann, D. Toward explainable AI (XAI) for mental health detection based on language behavior. Front. Psychiatry 2023, 14, 1219479. [Google Scholar] [CrossRef] [PubMed]
  24. Lee, S.; Shin, J.; Kim, H.S.; Lee, M.J.; Yoon, J.M.; Lee, S.; Kim, Y.; Kim, J.Y.; Lee, S. Hybrid method incorporating a rule-based approach and deep learning for prescription error prediction. Drug Saf. 2022, 45, 27–35. [Google Scholar] [CrossRef] [PubMed]
  25. Pillai, V. Enhancing transparency and understanding in AI decision-making processes. Iconic Res. Eng. J. 2024, 8, 168–172. [Google Scholar]
  26. Tonekaboni, S.; Joshi, S.; McCradden, M.D.; Goldenberg, A. What clinicians want: Contextualizing explainable machine learning for clinical end use. Proc. Mach. Learn. Healthc. Conf. 2019, 106, 359–380. [Google Scholar]
  27. Ching, T.; Himmelstein, D.S.; Beaulieu-Jones, B.K.; Kalinin, A.A.; Do, B.T.; Way, G.P.; Ferrero, E.; Agapow, P.M.; Zietz, M.; Hoffman, M.M. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 2018, 15, 20170387. [Google Scholar] [CrossRef]
  28. Rieke, N.; Hancox, J.; Li, W.; Milletari, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. npj Digit. Med. 2020, 3, 119. [Google Scholar] [CrossRef]
  29. Sheller, M.J.; Reina, G.A.; Edwards, B.; Martin, J.; Bakas, S. Multi-institutional deep learning modeling without sharing patient data: A feasibility study on brain tumor segmentation. Int. J. Comput. Vis. 2019, 127, 329–346. [Google Scholar]
  30. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. (NeurIPS) 2017, 30, 4765–4774. [Google Scholar]
  31. Ahmad, B.; Sun, J.; You, Q.; Palade, V.; Mao, Z. Brain tumor classification using a combination of variational autoencoders and generative adversarial networks. Biomedicines 2022, 10, 223. [Google Scholar] [CrossRef]
  32. Yasin, S.; Othmani, A.; Raza, I.; Hussain, S.A. Machine learning based approaches for clinical and non-clinical depression recognition and depression relapse prediction using audiovisual and EEG modalities: A comprehensive review. Comput. Biol. Med. 2023, 159, 106741. [Google Scholar] [CrossRef] [PubMed]
  33. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 9729–9738. [Google Scholar]
  34. Misra, I.; Maaten, L.V.D. Self-supervised learning of pretext-invariant representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 6707–6717. [Google Scholar]
  35. Yasin, S.; Hussain, S.A.; Aslan, S.; Raza, I.; Muzammel, M.; Othmani, A. EEG based Major Depressive disorder and Bipolar disorder detection using Neural Networks: A review. Comput. Methods Programs Biomed. 2021, 202, 106007. [Google Scholar] [CrossRef]
  36. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning (ICML), Virtual Event, 13–18 July 2020; Volume 119, pp. 1597–1607. [Google Scholar]
  37. Obermeyer, Z.; Powers, B.; Vogeli, C.; Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019, 366, 447–453. [Google Scholar] [CrossRef] [PubMed]
  38. Bzdok, D.; Krzywinski, M.; Altman, N. Points of significance: Machine learning: A primer. Nat. Methods 2017, 14, 1119–1120. [Google Scholar] [CrossRef] [PubMed]
  39. Yasin, S.; Raza, I.; Othmani, A.; Hussain, S.A. AI-Enabled Electroencephalogram (EEG) Analysis for Depression Relapse Detection in Quadriplegic Patients. In Proceedings of the 2024 International Conference on Computing, Internet of Things and Microwave Systems (ICCIMS), Gatineau, QC, Canada, 29–31 July 2024; IEEE: New York, NY, USA, 2024; pp. 1–6. [Google Scholar]
  40. Kim, H.; Luo, J.; Chu, S.; Cannard, C.; Hoffmann, S.; Miyakoshi, M. ICA’s bug: How ghost ICs emerge from effective rank deficiency caused by EEG electrode interpolation and incorrect re-referencing. Front. Signal Process. 2023, 3, 1064138. [Google Scholar] [CrossRef]
  41. Besana, F.; Civardi, S.C.; Mazzoni, F.; Carnevale Miacca, G.; Arienti, V.; Rocchetti, M.; Politi, P.; Martiadis, V.; Brondino, N.; Olivola, M.; et al. Predictors of readmission in young adults with first-episode psychosis: A multicentric retrospective study with a 12-month follow-up. Clin. Pract. 2024, 14, 1234–1244. [Google Scholar] [CrossRef]
  42. Irfan, M.; Iftikhar, M.A.; Yasin, S.; Draz, U.; Ali, T.; Hussain, S.; Bukhari, S.; Alwadie, A.S.; Rahman, S.; Glowacz, A.; et al. Role of hybrid deep neural networks (HDNNs), computed tomography, and chest X-rays for the detection of COVID-19. Int. J. Environ. Res. Public Health 2021, 18, 3056. [Google Scholar] [CrossRef]
  43. Yasin, S.; Othmani, A.; Mohamed, B.; Raza, I.; Hussain, S.A. Depression detection and subgrouping by using the active and passive EEG paradigms. Multimed. Tools Appl. 2024, 84, 8287–8310. [Google Scholar] [CrossRef]
  44. Delorme, A.; Makeig, S. EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 2004, 134, 9–21. [Google Scholar] [CrossRef]
Figure 1. Conceptual framework featuring essential AI-driven elements for schizophrenia diagnosis, treatment, and prognosis. The diagram combines multimodal data analysis with deep learning and clinical support features to enable early detection and individualized treatment alongside proactive relapse prevention.
Figure 4. Workflow of EEG-based schizophrenia relapse prediction using a CNN-Transformer hybrid model.
Figure 5. Comparison between Proactive Relapse Detection (presented in this paper) and Post-Relapse Analysis in schizophrenia care. The left column emphasizes the proactive approach: continuous monitoring, immediate and predictive alerts, insights, and adjustments to prevent relapse. The right column covers analytical approaches following relapse events: review of historical records, outcome prediction, and statistical modeling. Together, they form a framework for continual patient feedback and outcome monitoring that combines real-time engagement with retrospective analysis to improve clinical decisions and patient outcomes.
Figure 6. Confusion matrices and precision analysis for assessing classification performance. (a) Confusion matrix of the classification results of the proposed model, which has a high MCC (MCC = 0.9318). (b) Comparative bar chart of Precision and F1-Score metrics before and after model optimization. (c) Confusion matrix for the MultiViT model, indicating the most notable misclassification. (d) Confusion matrix of the proposed model. The results show the higher predictive performance of the proposed model, with fewer false positives and negatives.
Figure 7. Performance validation of the proposed schizophrenia relapse prediction framework using EEG data, from different aspects. (a) ROC curve showing high classification performance. (b) PR (Precision–Recall) curve representing the relationship between precision and recall. (c) Calibration plot of predicted vs. observed probabilities. (d) Predicted probability histogram for schizophrenia classification. (e) Bar comparison of accuracy, precision, recall, and F1 for different models. (f) Comparison of sensitivities and specificities for different techniques, whereby better performance is exhibited by the proposed method.
Table 1. Comparison of related works on schizophrenia prediction.
Ref.SummaryObjectiveMethodologyAccuracyEarly DetectionHybrid ModelInput Data Type
[3]AI predicts schizophrenia treatment outcomesPredict treatment outcomes using AIMeta-analysis of 21 studiesSensitivity: 70% Specificity: 76%EEG, imaging data
[7]Mobile sensing for psychotic relapsePredict psychotic relapse with mobile dataPersonalized LSTM-based DL model38.8%×Mobile sensing data
[11]Multimodal DL for schizophreniaSchizophrenia classificationDeep learning using sMRI, fMRI, SNP79.01%×sMRI, fMRI, SNP
[14]Predicting AUD remissionPredict alcohol disorder remissionML on EEG, PRS, clinical data86.04%EEG, PRS, clinical
[19]MultiViT for schizophrenia diagnosisImprove classification accuracyVision Transformer (MultiViT) using sMRI, FNC83.3%MRI and connectivity data
[23]XAI for mental health detectionImprove interpretability with accuracyBiLSTM + Transformer + Explainability70.78%×Textual features (syntax, emotion)
[24]Predict prescription errorsReduce clinical alert fatigueARDNN (rule-based + DNN)72.86%Clinical prescription data
[29]Federated DL for brain tumor segmentationEvaluate multi-site model trainingU-Net with federated learning85.2%MRI scans
OursTransformer-based hybrid DL for relapse predictionEarly relapse detection in schizophreniaCNN + Transformer + FC + Sentiment Analysis97%EEG data
Table 2. Comparative Analysis: Schizophrenia Early Detection vs. Relapse Prediction.
Parameter | Schizophrenia Early Detection | Schizophrenia Relapse Prediction
Objective | Identify early schizophrenia symptoms | Predict the risk of relapses in diagnosed patients
Input Factors | Symptoms (hallucinations, delusions, speech, emotional, social, cognitive) | Symptoms + External Factors (medication adherence, stress, sleep quality)
Output | Risk Level (High, Moderate, Low, No Risk) | Relapse Risk (High, Moderate, Low, Stable)
Decision Complexity | Lower (Fewer conditions) (O(1)–O(n)) | Higher (Includes external factors) (O(1)–O(n))
Best Case Time Complexity | O(1) (Immediate decision if critical symptoms present) | O(1) (Immediate decision if relapse indicators are strong)
Worst Case Time Complexity | O(n) (Evaluation of all symptoms) | O(n) (Assessment of all symptoms and external factors)
Sensitivity to External Factors | Low (Only symptom-based) | High (Accounts for medication adherence, stress, and sleep)
Practical Usage | Screening tool for early-stage schizophrenia | Monitoring tool for patients already diagnosed
Suitability for Clinical Use | General practitioners, psychologists | Psychiatrists, mental health professionals
Predictive Accuracy | Moderate (Based on the presence of symptoms) | Higher (Includes behavioral and treatment-related data)
False Positive Risk | Higher (Some symptoms may appear in other disorders) | Lower (More context-based evaluation)
False Negative Risk | Lower (Covers broad symptoms) | Higher (May miss relapse risk if external factors fluctuate)
Adaptability | Static (Fixed symptoms used) | Dynamic (Can be adjusted based on patient history)
Table 3. Important symbols and acronyms used in the hybrid deep learning framework.
No. | Symbol/Acronym | Meaning
1 | $P(R)$ | Predicted probability of schizophrenia relapse
2 | $X_{\text{EEG}}$ | Raw EEG signal input
3 | $X_{\text{clinical}}$ | Clinical features (e.g., age, diagnosis, comorbidities)
4 | $X_{\text{sentiment}}$ | Sentiment features from patient speech or text
5 | $X_{\text{history}}$ | Historical relapse records
6 | $X_{\text{norm}}^{\text{EEG}}$ | Z-score normalized EEG signal
7 | $H_{\text{cnn}}(t)$ | Output of CNN after processing EEG at time $t$
8 | $Q, K, V$ | Query, Key, and Value matrices in Transformer
9 | $A(i, j)$ | Attention score between EEG time points $i$ and $j$
10 | $Z_{\text{attn}}$ | Context vector after attention computation
11 | $H_{\text{trans}}(t)$ | Transformer-enhanced feature vector
12 | $Y_{\text{pred}}$ | Model output (relapse probability)
13 | $S_{\text{sentiment}}$ | Aggregated sentiment score
14 | $M_{\text{score}}$ | Medication adherence probability
15 | $R_{\text{agg}}$ | Aggregated clinical relapse score
16 | $\lambda_1, \lambda_2, \lambda_3$ | Fusion weights for EEG, sentiment, medication
17 | $L_{\text{CE}}$ | Cross-entropy classification loss
18 | $L_{\text{reg}}$ | Regularization loss
19 | $L_{\text{total}}$ | Total loss (combined objective)
20 | $F_{\text{score}}$ | F1-score for performance evaluation
21 | MCC | Matthews Correlation Coefficient
22 | AUC-ROC | Area under the ROC curve
23 | $I_{\text{grad}}$ | Input saliency gradient for interpretability
24 | $F_{\text{final}}(x)$ | Final multimodal prediction function
Table 4. Classification performance metrics for healthy and schizophrenia classes, including precision, recall, F1-score, and averages.
Class | Precision | Recall | F1-Score | Samples
Healthy (0) | 0.9820 | 0.9516 | 0.9666 | 46,214
Schizophrenia (1) | 0.9489 | 0.9809 | 0.9646 | 42,286
Accuracy | 0.9656 | 0.9656 | 0.9656 | 0.9656
Macro Avg. | 0.9654 | 0.9663 | 0.9656 | 88,500
Weighted Avg. | 0.9662 | 0.9656 | 0.9656 | 88,500
MCC | 0.9318
AUC-ROC | 1.0000
Table 5. Performance comparison of state-of-the-art techniques with the proposed framework across key evaluation metrics.
Ref. | Technique | Precision | Recall | F1-Score | Sensitivity (TPR) | Specificity (TNR) | Accuracy
[11] | XGBoost | 83.00 | 83.00 | 84.00 | 0.89 | 0.94 | 83.00%
[19] | MultiViT | 78.83 | 79.43 | 78.93 | 0.76 | 0.77 | 79.01%
[23] | MentalRoBERTa | 72.04 | 71.62 | 71.83 | 0.71 | 0.72 | 71.55%
– | Our Proposed Framework | 96.0 | 98.15 | 97.0 | 0.98 | 0.95 | 97.00%
