Article

Research on a PTSD Risk Assessment Model Using Multi-Modal Data Fusion

School of Science, Hubei University of Technology, Wuhan 430068, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(11), 1901; https://doi.org/10.3390/math13111901
Submission received: 8 May 2025 / Revised: 28 May 2025 / Accepted: 30 May 2025 / Published: 5 June 2025
(This article belongs to the Section D: Statistics and Operational Research)

Abstract

Post-traumatic stress disorder (PTSD) is a complex psychological disorder caused by multiple factors, which are related not only to individual psychological states but also to physiological responses, social environments, and personal experiences. Traditional single-source assessment methods therefore struggle to fully capture and evaluate the complexity of PTSD. To overcome this challenge, this study focuses on developing a PTSD risk assessment model based on multi-modal data fusion. The importance of multi-modal data fusion lies in its ability to integrate data from different dimensions and provide a more comprehensive PTSD risk assessment. Two fusion approaches are proposed: the first extracts EEG features using B-spline basis functions and combines them with questionnaire data to construct a multi-modal Zero-Inflated Poisson regression model; the second builds a multi-modal deep neural network fusion prediction model that automatically extracts and fuses multi-modal data features. The results show that the multi-modal models are more accurate than the single-source models, with significantly improved predictive ability. Zero-Inflated Poisson models are prone to over-fitting when data are limited, whereas the deep neural network models perform well on both the training and test sets; in particular, the Hybrid LSTM-FCNN model achieves both high accuracy and strong generalization. This study demonstrates the potential of multi-modal data fusion for PTSD prediction, and the Hybrid LSTM-FCNN model stands out for its high accuracy and good generalization, providing scientific evidence for the early warning of PTSD in rescue personnel. Future research can further explore model optimization and clinical applications to promote the mental health of rescue personnel.

1. Introduction

Post-traumatic stress disorder (PTSD) is a complex psychophysiological disorder triggered by extreme psychological trauma, characterized by recurrent intrusive memories and pronounced emotional and behavioral disturbances. Owing to its high prevalence, chronic nature, and elevated suicide risk, PTSD poses a significant global public health challenge, especially among those who have experienced war, natural disasters, or violent events. Rescue personnel, who are routinely exposed to such traumatic events, face an extremely high risk of PTSD, with prevalence rates significantly exceeding those of the general population [1,2].
The onset and progression of PTSD are influenced by a complex interplay of physiological, psychological, and social factors. Traditional assessment methods often rely on subjective evaluations and self-report questionnaires, which may fail to capture the full spectrum of PTSD-related symptoms. With the rapid advancement of big data and artificial intelligence technologies, multi-modal data fusion has emerged as a promising approach for addressing these limitations. By integrating heterogeneous data sources—such as electroencephalography (EEG), physiological metrics, and behavioral traits—researchers can obtain a more comprehensive and objective understanding of PTSD symptoms and risk factors.
Among the various modalities under investigation, voice-based analysis has recently gained increasing attention as a complementary tool for mental health assessment. As a non-invasive, cost-effective, and easily deployable approach, the voice modality offers unique advantages in large-scale and remote psychological screening. Voice signals are capable of capturing nuanced emotional and cognitive states through prosodic, spectral, and temporal features, many of which are correlated with stress, anxiety, and mood disorders. Such characteristics make voice data particularly valuable for real-time and longitudinal monitoring of mental health conditions.
Although the present study primarily focuses on EEG and psychological questionnaire data, we acknowledge the emerging potential of voice signals in the context of PTSD risk prediction. Integrating voice features with other physiological and behavioral data in future research could significantly enhance the sensitivity and robustness of multi-modal assessment frameworks. Furthermore, it would provide opportunities for developing more accessible and scalable mental health interventions, especially in low-resource or high-risk populations.
Therefore, it is crucial to investigate advanced modeling approaches capable of effectively integrating information from multiple data sources. In this study, we propose a PTSD risk assessment framework that fuses EEG signals and psychological questionnaire data using a multi-modal data integration strategy. Specifically, we aim to construct two types of models: (1) a statistical model using Zero-Inflated Poisson regression, and (2) a deep learning model based on Hybrid LSTM-FCNN architecture. We hypothesize that the integration of multi-modal data can significantly improve prediction accuracy and robustness compared to single-modality models.
The key contributions of this study are as follows:
  • Application and Extension of Zero-Inflated Poisson Models: Given the distributional characteristics of the PCL total score—namely, its discreteness and excessive zeros—we employ the Zero-Inflated Poisson (ZIP) regression framework. We construct two variants: a general ZIP model based on questionnaire features alone and a multi-modal ZIP model incorporating EEG-derived features.
  • Development of Multi-Modal Deep Learning Models for EEG-Based PTSD Prediction: We propose and implement three multi-modal deep neural networks—Hybrid LSTM-FCNN, Hybrid RNN-FCNN, and Hybrid CNN-FCNN—which combine EEG time series data and psychological questionnaire data. These models are capable of automatically extracting high-level predictive features from EEG signals via LSTM, RNN, or CNN layers, and integrating them with questionnaire features through fully connected networks.
  • Systematic Comparison of Modeling Strategies: We conduct extensive comparative experiments to evaluate the performance of the proposed deep learning models against (a) each other, (b) single-modality neural network models without EEG input, and (c) the traditional ZIP regression models. The results confirm that the Hybrid LSTM-FCNN model consistently achieves superior accuracy and generalization performance.
The remainder of this paper is organized as follows. Section 2 reviews the existing literature on PTSD risk assessment and multi-modal modeling techniques. Section 3 introduces the dataset, feature extraction methods, and the proposed modeling framework. Section 4 presents the experimental results, model comparisons, and ablation analysis. Section 5 provides a discussion and interpretation of the findings, highlighting limitations and implications. Section 6 concludes this study and outlines directions for future research.

2. Literature Review

Research indicates that the prevalence of PTSD varies significantly among rescue personnel from different professional backgrounds (firefighters, police officers, and volunteers), ranging from 2.2% to 21.2% [3,4,5,6]. The incidence is especially high among those performing non-conventional rescue tasks. The occupational nature of rescue work, which involves frequent exposure to traumatic events, makes rescue personnel a high-risk group for PTSD. Specifically, a study by Zhou (2021) [7] found that the PTSD-positive rate among fire rescue personnel was as high as 2.2%, while it decreased to 0.66% among specialized military rescue personnel. Furthermore, a study by Berger et al. (2012) [8] demonstrated that the prevalence of PTSD among rescue workers is significantly higher than that of the general population. Perrin et al. (2007) [4] analyzed different occupational groups involved in the World Trade Center rescue operations and found an overall PTSD prevalence of 12.4%, with rates of 6.2% among police officers and 21.2% among non-affiliated volunteers. The studies also emphasize that early arrival at the disaster scene and prolonged work duration are critical risk factors for PTSD development. For early respondents, in particular, work duration is closely related to PTSD risk, with higher rates among those engaged in non-conventional rescue missions.
In recent years, significant progress has been made in the identification and prediction of post-traumatic stress disorder (PTSD). Traditional assessment tools, such as the International Trauma Questionnaire (ITQ) developed by Hyland et al. (2017) [9] and the CAPS scoring system proposed by Shalev et al. (2019) [10], have laid the foundation for the diagnosis of PTSD. However, with the continuous advancement of technology, traditional assessment methods have shown limitations when dealing with the complexity of PTSD [9]. Therefore, machine learning techniques have gradually become important tools for the early prediction of PTSD. Studies have shown that machine learning algorithms can accurately predict the risk and development trend of PTSD by analyzing large amounts of data. Wshah et al. (2019) [11] and Dupont et al. (2024) [12], respectively, demonstrated the application of machine learning to PTSD risk assessment, the latter in family members of ICU patients, achieving high prediction accuracy. More recently, Shahzad and Ali (2024) [13] further expanded this application. They used a variety of machine learning algorithms, including Random Forest, naive Bayes, support vector machines, decision trees, K-nearest neighbors, linear discriminant analysis, and a 3D convolutional neural network, to distinguish PTSD patients from healthy controls. The results showed that the accuracy of the 3D-CNN model on the training, validation, and test datasets reached 98.12%, 98.25%, and 98.00%, respectively, significantly outperforming the traditional algorithms. Wang et al. (2024) [14] also confirmed in their review that machine learning techniques, especially when combined with multi-modal data fusion, have significantly improved the accuracy of PTSD identification.
Furthermore, with the continuous advancement of machine learning and deep learning technologies, more possibilities have emerged for the early diagnosis and risk assessment of PTSD. For instance, Ge et al. (2016) [15] proposed the McTwo feature selection algorithm, which uses the maximal information coefficient for feature screening, helping to enhance the accuracy and efficiency of multi-modal data fusion. This algorithm can effectively identify the key features influencing the onset of PTSD. Liu et al. (2024) [16] introduced a deep neural network pruning model based on the max–min concave penalty regression, which improves the accuracy and computational efficiency of neural network models. This method can be effectively applied to the prediction of PTSD and the screening of related risk factors. Additionally, Li et al. (2023) [17] optimized the clustering analysis of functional data through dynamic and static enhanced BIRCH algorithms, providing technical support for the efficient identification of PTSD. This approach can enhance the flexibility and accuracy of clustering algorithms, thereby better uncovering the potential patterns of PTSD. These advanced machine learning methods and algorithms offer more powerful tools for the precise prediction of PTSD and clinical decision-making, playing a significant role in early identification and intervention.
In addition to physiological and behavioral data, voice-based analysis has recently emerged as a promising modality in the field of mental health assessment. As a non-invasive, cost-effective, and easily deployable tool, voice data enables large-scale and remote screening of conditions such as PTSD and depression. Acoustic features such as pitch, jitter, shimmer, speech rate, and prosody can reflect subtle emotional and cognitive changes, offering valuable insights into psychological states. Recent studies have explored the use of voice characteristics in identifying mental health disorders. For example, Scherer et al. (2013) demonstrated that voice quality could serve as a speaker-independent indicator for PTSD and depression [18]. Vilenchik et al. (2025) proposed a novel data collection framework that enhances the reliability and generalizability of voice-based depression detection [19]. Anketell et al. (2012) developed an automated voice analysis system for PTSD screening [20], while Cannizzaro et al. (2010) investigated the relationship between auditory hallucinations and dissociative symptoms in chronic PTSD [21]. These findings underscore the potential of voice signals as an emerging channel for mental health monitoring.
The complexity and multidimensional nature of PTSD pose limitations for traditional, single-source data assessment methods in predicting and understanding its pathogenesis [22]. Therefore, the strategy of multi-modal data fusion, which integrates data from diverse sources and types, offers a more comprehensive approach to unraveling the pathogenesis of PTSD and improving prediction accuracy [23]. Although the current study focuses on EEG and questionnaire data, voice features represent an important future direction for multi-modal risk assessment models. Their integration could further enhance diagnostic accuracy and provide continuous, real-time monitoring capabilities in practical applications. Multi-modal data fusion not only provides richer information dimensions but also reveals the underlying relationships between different data types, offering more precise grounds for the diagnosis, treatment, and prevention of PTSD. In recent years, multi-modal data fusion has achieved significant results across various fields. For instance, Liu et al. (2015) [24] proposed a parameterization algorithm for electrocardiograms (ECGs), which, through multi-level feature extraction from ECG data, achieved excellent results in detecting myocardial infarction. This successful application demonstrates the potential of data fusion in medical diagnostics. In neuroscience, Gao et al. (2024) [25] integrated neuroimaging data from MRI, PET, SPECT, and fMRI, successfully acquiring multi-layered information ranging from anatomical structures to physiological functions, thus advancing precision medicine, particularly in the localization and functional assessment of neurological disorders. In the context of diabetic retinopathy (DR), Zhao et al. (2024) [26] utilized a fusion of computer vision techniques and clinical data to significantly enhance the accuracy of referral decisions. In oncology, Cai et al. (2024) [27] revolutionized diagnostic and prognostic models by integrating imaging, histopathology, genomics, and electronic health record data, driving the application of precision medicine and artificial intelligence in clinical practice. Muhammad et al. (2024) [28] emphasized the central role of multi-modal medical image fusion in modern medical diagnostics, noting that the integration of data from different imaging systems provides more detailed anatomical and physiological information, overcoming the limitations of single-modal imaging.
Furthermore, the biomarker optimization method based on constraint programming proposed by Zhou et al. (2015) [29], although initially applied in the biomedical field, also offers valuable insights for the prediction of PTSD through its optimization algorithms. By applying constraints and optimization to multiple variables, this method achieves the best combination within complex datasets, providing new ideas and a technical framework for multi-modal data fusion in PTSD. Similarly, Li et al. (2023) [30] introduced a deep learning method combined with extensive sentiment analysis for quantitative investing, originally applied in the financial sector. This approach, blending deep learning with sentiment analysis, offers insights into multi-modal data fusion for PTSD prediction. By integrating sentiment analysis and data processing methods from various domains, a more comprehensive perspective can be provided for the prediction and identification of PTSD. These studies provide crucial theoretical support and practical experience for the application of multi-modal data fusion in PTSD. By drawing on successful cases from other fields, multi-modal data fusion is expected to play a key role in the diagnosis, prediction, and treatment of PTSD, offering more accurate and personalized healthcare services for patients.
These studies collectively illustrate the growing application of machine learning and multi-modal fusion in healthcare and risk prediction scenarios. However, the use of EEG signals combined with psychological questionnaire data for PTSD risk assessment remains underexplored. This study aims to address this gap by proposing a multi-modal fusion framework that leverages both EEG features and questionnaire data to enhance the accuracy and interpretability of PTSD risk prediction models.

3. Methodology

3.1. Data Description

The data for this study come from a national postgraduate case competition (http://mas.ruc.edu.cn/syxwlm/MASkx/f3a95df472314a87842ff62b6d76aa44.htm, accessed on 8 May 2025). As the data come from an official competition source, the organizers have already ensured their compliance with relevant norms. The questionnaire collected demographic information and psychological health assessment indicators, while the EEG data were recorded in a controlled experimental environment. The sample characteristics indicate that participants' ages were primarily between 19 and 34 years, with 70% being male.
PTSD symptoms were measured using the Post-Traumatic Stress Disorder Checklist for DSM-5 (PCL-5), which was developed by the Behavioral Science Division of the U.S. National Center for PTSD. This checklist comprises 17 items that assess individuals’ experiences after encountering traumatic events in their lives. The PCL-5 uses a 5-point Likert scale to measure the severity of symptoms, with total scores ranging from 17 to 85. Higher scores indicate a greater likelihood of PTSD. This scale provides a continuous score based on both the number and severity of symptoms, making it a multidimensional tool for observing PTSD. It is widely used in clinical research to evaluate the effectiveness of psychological interventions [31,32,33,34].

3.1.1. Questionnaire Data

The questionnaire data include 903 survey samples, labeled A1 to A903. There are 27 variables in total, comprising 11 continuous and 16 categorical variables, as shown in Table 1 and Table 2. Descriptive statistics for the continuous variables were computed using SPSS 27 and are summarized in Table 1.
Based on Table 1, the sample of rescue personnel shows no missing values. Ages range from a minimum of 15 years to a maximum of 56 years, a wide span, but the majority fall within the young to middle-aged category: 97% of the rescue personnel are aged between 19 and 34 years. The average height is 174.19 cm, the average weight is 71.11 kg, and the average BMI is 23.44, indicating that the participants are neither underweight nor overweight. ASD (acute stress disorder) scores reach as high as 91, falling into the severe acute stress category and indicating a high likelihood of post-traumatic stress disorder.
The PCL score is treated as the dependent variable. Examination of the PCL scores reveals many zero values, accounting for 79.734% of the data. Although the PCL score is a numerical rather than a count variable, the high proportion of zeros suggests treating the response as zero-inflated. This is illustrated in Figure 1.

3.1.2. EEG Data

To explore the characteristics of EEG signals, we visualized all waveform data for subject A1, as shown in Figure 2.
As illustrated in Figure 2, the intensity of EEG activity varies across different time periods. Certain segments exhibit more pronounced neural activity, characterized by higher peak amplitudes, while others appear relatively quiescent, with lower trough values. These fluctuations may reflect the generation of different types of EEG waveforms corresponding to distinct brain states.

3.2. Feature Extraction

3.2.1. Questionnaire Feature Selection

To filter out relevant features from the questionnaire data, a two-step approach was used. First, the information gain was calculated for all features, and variables above the median threshold were retained. Then, a Random Forest classifier was used to rank the importance of features, and the top 11 features were retained based on cumulative contribution.
  • Information Gain Calculation: To identify the most relevant questionnaire features for predicting PTSD risk, information gain (IG) was calculated for 26 variables using Python. The total PCL score was used as the response variable. Variables with IG values above the median threshold (0.0718) were retained, resulting in 13 selected features. This method enables global feature selection, capturing variables that contribute most significantly to the overall predictive model. However, IG does not reflect category-specific importance, limiting its application to local or class-specific feature selection.
    The algorithmic process for feature selection based on information gain is as follows:
    • Calculate the information entropy of the original data: $H(D) = -\sum_{i=1}^{n} p_i \log_2 p_i$.
    • Select a feature and partition the data according to its values. Then, calculate the information entropy of each subset and compute the weighted sum to obtain the conditional entropy of this partition: $H(D|A) = \sum_{j=1}^{m} \frac{|D_j|}{|D|} H(D_j)$.
    • Calculate the information gain of the selected feature: $\mathrm{Gain}(A, D) = H(D) - H(D|A)$.
    • Repeat steps 2 and 3 to compute the information gain for all features, retaining those with the highest information gain.
  • Random Forest Re-Selection: To refine the feature set obtained from the information gain method, Random Forest (RF) was employed for further feature selection, as shown in Algorithm 1; a Python sketch of the full two-step pipeline follows the algorithm. RF estimates feature importance based on the average reduction in impurity across all decision trees. The 13 variables previously selected were input into a Random Forest model implemented in Python. According to the results, the total ASD score and ASD alertness were the most important features, with the former alone contributing nearly half of the total importance. In contrast, ASD diagnosis and trauma scene emotional reaction showed minimal contributions. The top 11 features, accounting for a cumulative importance of 0.996422, were retained for subsequent analysis.
Algorithm 1: Random Forest Feature Selection Algorithm
1: Initialize $k = 1$, $K$ = total number of trees.
2: for $k = 1$ to $K$ do
3:    Generate the $k$-th bootstrap sample $D_k^{\mathrm{boot}}$ from $D$
4:    Train decision tree $T_k$ on $D_k^{\mathrm{boot}}$
5:    Identify the Out-Of-Bag (OOB) samples $D_k^{\mathrm{oob}}$
6:    Compute the baseline accuracy $A_k^{\mathrm{base}}$ of $T_k$ on $D_k^{\mathrm{oob}}$
7:    for each feature $i = 1$ to $M$ ($M = 13$) do
8:        Create the perturbed OOB dataset $\tilde{D}_{k,i}^{\mathrm{oob}}$ by permuting the values of feature $i$
9:        Compute the perturbed accuracy $A_{k,i}^{\mathrm{perm}}$ of $T_k$ on $\tilde{D}_{k,i}^{\mathrm{oob}}$
10:       Calculate the accuracy decrease: $\Delta A_{k,i} = A_k^{\mathrm{base}} - A_{k,i}^{\mathrm{perm}}$
11:    end for
12: end for
13: for each feature $i = 1$ to $M$ do
14:    Compute importance: $I_i = \frac{1}{K} \sum_{k=1}^{K} \Delta A_{k,i}$
15: end for
16: Rank features by $I_i$ in descending order
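For concreteness, the sketch below outlines the two-step selection pipeline in Python with scikit-learn. It is a minimal illustration, not the authors' exact code: mutual information stands in for information gain on a numeric target, scikit-learn's model-level permutation importance approximates the per-tree OOB permutation scheme of Algorithm 1, and the data arrays are synthetic placeholders for the 26 questionnaire variables and the PCL totals.

```python
# Sketch: information-gain filter at the median threshold, then Random Forest
# importance via feature permutation (cf. Algorithm 1). Placeholders throughout.
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(42)
X = rng.normal(size=(903, 26))          # 26 candidate questionnaire variables
y = rng.poisson(0.5, size=903)          # placeholder PCL total scores

# Step 1: keep variables whose (mutual-information) gain exceeds the median
ig = mutual_info_regression(X, y, random_state=0)
keep = np.flatnonzero(ig > np.median(ig))
X_ig = X[:, keep]

# Step 2: rank the surviving variables by permutation importance
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_ig, y)
perm = permutation_importance(rf, X_ig, y, n_repeats=10, random_state=0)
ranking = np.argsort(perm.importances_mean)[::-1]
top11 = keep[ranking[:11]]              # indices of the 11 retained features
print(top11)
```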

3.2.2. EEG Data Feature Extraction

The EEG signals of rescuers are non-periodic and exhibit complex local variation. Therefore, in this paper, the EEG time series are fitted, and features extracted, using B-spline basis functions. B-splines are particularly suitable for non-periodic signal processing due to their flexibility and efficiency [35,36,37]. Forty B-spline coefficients were extracted as initial features for each EEG sample and reduced in dimensionality via principal component analysis (PCA); the top five principal components, explaining 88.78% of the variance, were retained for modeling. Taking participant A1 as an example, the B-spline fits of eight EEG series are shown in Figure 3.
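A minimal sketch of this extraction step is given below. It assumes each subject's EEG is a single, uniformly sampled series; the series length, uniform knot placement, and variable names are illustrative assumptions, while the 40-coefficient and 5-component settings follow the text.

```python
# Sketch: least-squares cubic B-spline fit (40 coefficients per EEG series),
# followed by PCA down to 5 components for downstream modeling.
import numpy as np
from scipy.interpolate import make_lsq_spline
from sklearn.decomposition import PCA

def bspline_coefficients(signal, n_coef=40, degree=3):
    """Fit a least-squares B-spline and return its coefficient vector."""
    x = np.linspace(0.0, 1.0, len(signal))
    n_internal = n_coef - degree - 1                 # internal knots for n_coef basis functions
    internal = np.linspace(0.0, 1.0, n_internal + 2)[1:-1]
    knots = np.r_[[0.0] * (degree + 1), internal, [1.0] * (degree + 1)]
    return make_lsq_spline(x, signal, knots, k=degree).c

eeg = np.random.randn(25, 5000)                      # placeholder: 25 subjects x 5000 samples
coefs = np.vstack([bspline_coefficients(s) for s in eeg])   # shape (25, 40)

pca = PCA(n_components=5)                            # the paper reports 88.78% variance explained
eeg_features = pca.fit_transform(coefs)              # shape (25, 5)
print(pca.explained_variance_ratio_.sum())
```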

3.3. Modeling Framework

3.3.1. Zero-Inflated Poisson Regression Model

The Zero-Inflated Poisson (ZIP) regression is used for modeling count data with an excess of zero counts. The underlying theory suggests that the extra zeros are generated by a different process than the count values, and they can be modeled independently. Consequently, the ZIP model comprises two components: a Poisson count model and a logit model for predicting the excess zeros. Lambert (1992) [38] described the ZIP mixture distribution, which can be formulated as Equation (1):
$$P(PCL; \phi, \lambda) = \begin{cases} \phi + (1 - \phi)\,e^{-\lambda}, & PCL = 0 \\ (1 - \phi)\,\dfrac{\lambda^{PCL}}{PCL!}\,e^{-\lambda}, & PCL > 0 \end{cases}$$
where $\phi$ denotes the probability of an excess zero and $\lambda$ is the mean of the Poisson distribution. When $\phi = 0$, the ZIP distribution reduces to a standard Poisson distribution; when $0 < \phi < 1$, a larger $\phi$ indicates more pronounced zero inflation in the data.
The mathematical expectation and variance of the PCL scores are given by Equations (2) and (3):
$$E(PCL) = (1 - \phi)\,\lambda$$
$$Var(PCL) = E(PCL)\left(1 + \lambda - E(PCL)\right)$$
In the ZIP model, the variance exceeds the mean because of the additional structural zeros, thereby addressing the over-dispersion caused by the accumulation of zeros. For example, with $\phi = 0.8$ and $\lambda = 2$, Equations (2) and (3) give $E(PCL) = 0.4$ but $Var(PCL) = 0.4\,(1 + 2 - 0.4) = 1.04$, so the variance is well above the mean. To model the relationship between the dependent variable and the independent variables, Lambert introduced covariates into both the zero-inflation component and the Poisson component, resulting in Equation (4):
$$\begin{aligned} \log(\lambda) &= \beta_0 + \text{ASD score} \cdot \beta_1 + \cdots + \text{Household Monthly Income Per Capita} \cdot \beta_{11} + PC_1 \cdot \beta_{12} + \cdots + PC_5 \cdot \beta_{16} \\ \operatorname{logit}(\phi) &= \gamma_0 + \text{ASD score} \cdot \gamma_1 + \cdots + \text{Household Monthly Income Per Capita} \cdot \gamma_{11} + PC_1 \cdot \gamma_{12} + \cdots + PC_5 \cdot \gamma_{16} \end{aligned}$$
where the $\beta$ values are the regression coefficients of the non-zero count component and the $\gamma$ values are the regression coefficients of the zero-inflated component. Coefficients that do not pass the significance test are set to 0. The first eleven variables in the equation are questionnaire-derived features selected via Random Forest, as detailed in Table 1 and Table 2. The remaining five variables are the top five principal components extracted from the EEG data using PCA.
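As a minimal illustration of fitting a model of the form in Equation (4), the sketch below uses statsmodels' ZeroInflatedPoisson, which pairs a log-linked Poisson count component (the $\beta$ coefficients) with a logit-linked inflation component (the $\gamma$ coefficients). The data are synthetic placeholders standing in for the 11 questionnaire features and 5 EEG principal components; the paper's own fit was computed in R on the 25 EEG participants, and a toy sample size of 500 is used here only to keep the example numerically stable.

```python
# Sketch: fitting a Zero-Inflated Poisson regression with statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(0)
n = 500                                  # toy sample size (the study had 25 EEG participants)
X = rng.normal(size=(n, 16))             # stand-ins for 11 questionnaire features + 5 EEG PCs

lam = np.exp(0.3 + 0.4 * X[:, 0])                        # count mean, log link
phi = 1.0 / (1.0 + np.exp(-(1.0 - 0.8 * X[:, 1])))       # excess-zero probability, logit link
y = np.where(rng.random(n) < phi, 0, rng.poisson(lam))   # zero-inflated counts

exog = sm.add_constant(X)                # covariates of the Poisson count component
exog_infl = sm.add_constant(X)           # covariates of the zero-inflation component
model = ZeroInflatedPoisson(y, exog, exog_infl=exog_infl, inflation='logit')
result = model.fit(method='bfgs', maxiter=500, disp=False)
print(result.summary())
```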

3.3.2. Deep Neural Network Architectures

To evaluate the performance and efficiency of different architectures in processing multi-modal data, we designed and compared three model combinations. First, we developed a model based on the Hybrid LSTM-FCNN architecture, which combines Long Short-Term Memory (LSTM) networks with Fully Connected Neural Networks (FCNNs). LSTM, a specialized type of Recurrent Neural Network (RNN), is particularly suited for processing time series data. It effectively extracts temporal features from EEG signals, captures long-term dependencies, and mitigates issues such as vanishing or exploding gradients. After extracting features with LSTM, the model integrates them with the original data, and classification or prediction is performed through the FCNN, enhancing its ability to recognize complex patterns.
Next, we built the Hybrid RNN-FCNN model, which employs a standard RNN architecture to extract time series features. Like LSTM, RNNs can process sequential data and retain information from previous time steps. However, when handling longer sequences, RNNs may suffer from vanishing or exploding gradients, which can degrade performance. Nevertheless, the RNN model still plays a useful role in time series processing, with the extracted features passed to the FCNN.
Finally, we designed a Hybrid CNN-FCNN model, leveraging Convolutional Neural Networks (CNNs) for feature extraction from EEG signals. CNNs are typically used for high-dimensional data and their convolutional and pooling layers are well suited for automatically extracting local features. In this model, the features extracted by the CNN from the EEG signals are combined with the original data and classified or predicted through the FCNN. This architecture is particularly effective in capturing spatial information in EEG data, making it especially suitable for scenarios where the data exhibit strong local correlations.
Each model is composed of two main components, namely the feature extraction block (LSTM, RNN, or CNN) and the Fully Connected Neural Network (FCNN) [39,40,41]. The FCNN is a basic neural network structure in which every neuron in a layer is connected to all neurons in the previous layer. If an FCNN has $K$ layers, and each neuron in the $(k-1)$-th layer is connected to all neurons in the $k$-th layer, the activation of the $j$-th neuron in the $k$-th layer can be expressed as Equation (5):
$$y_j^{(k)} = f_{ac}\left( \sum_i \omega_{i,j}^{(k)}\, y_i^{(k-1)} + b_j^{(k)} \right)$$
where $f_{ac}$ denotes the activation function, $y_i^{(k-1)}$ is the output of the $i$-th neuron in the $(k-1)$-th layer, $\omega_{i,j}^{(k)}$ represents the weight between the connected neurons, and $b_j^{(k)}$ denotes the bias term of the $j$-th neuron. If there are $J$ neurons in the $k$-th layer, they can be collected into a vector $y^{(k)} \in \mathbb{R}^{J \times 1}$, which yields Equation (6):
$$y^{(k)} = f\left( W^{(k)}\, y^{(k-1)} + b^{(k)} \right)$$
where $y^{(k-1)} = \left( y_1^{(k-1)}, y_2^{(k-1)}, \ldots, y_I^{(k-1)} \right)^{T}$ collects the outputs of the $I$ neurons in the $(k-1)$-th layer, $W^{(k)} \in \mathbb{R}^{J \times I}$ is the weight matrix, and $b^{(k)} \in \mathbb{R}^{J \times 1}$ is the bias vector.
The specific model architectures are shown in Figure 4:
The deep learning model includes both time series and cross-sectional input branches, which are combined for regression prediction. The model uses LSTM, RNN, or CNN to handle the time series data, with an FCNN added for regression. The steps for building the multi-modal deep learning model are given in Algorithm 2.
Algorithm 2: Multi-Modal Data Fusion Based on Deep Learning Models
1: Import time series and cross-sectional datasets
2: Split cross-sectional data into discrete/continuous types
3: Apply one-hot encoding to discrete data
4: Standardize continuous features
5: Combine processed discrete/continuous features
6: Separate PCL scores and apply a Box–Cox transformation [42]
7: Perform PCA dimensionality reduction on EEG data
8: Split dataset into training/testing sets (4:1 ratio)
9: Construct:
  - Time series models: LSTM, RNN, CNN
  - Cross-sectional data input model
10: Merge outputs via fully connected layer
11: Train model using fit method + EarlyStopping callback
12: Predict on test set
13: Evaluate model performance
This method enables the handling of complex time series data while integrating other cross-sectional data, providing more accurate and comprehensive predictions. The model’s predictive performance is evaluated by predicting the test set and calculating the MSE metric, which is compared to the performance on the training set.
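To make the fusion architecture concrete, the sketch below builds the Hybrid LSTM-FCNN variant in Keras (the framework named in Section 3.5). It is a minimal illustration: the layer widths, input shapes, and regularization strengths are assumptions for demonstration, not the paper's exact settings (those appear in Table 3).

```python
# Sketch of the Hybrid LSTM-FCNN late-fusion architecture (Algorithm 2, steps 9-10).
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_hybrid_lstm_fcnn(n_timesteps=5000, n_channels=8, n_tabular=27):
    # EEG branch: LSTM extracts temporal features from the time series input
    eeg_in = layers.Input(shape=(n_timesteps, n_channels), name="eeg")
    h = layers.LSTM(64)(eeg_in)

    # Questionnaire branch: fully connected layers on cross-sectional features
    tab_in = layers.Input(shape=(n_tabular,), name="questionnaire")
    t = layers.Dense(32, activation="relu",
                     kernel_regularizer=tf.keras.regularizers.l2(1e-4))(tab_in)

    # Late fusion: concatenate both branches, then regress the (Box-Cox) PCL score
    z = layers.concatenate([h, t])
    z = layers.Dropout(0.3)(z)
    z = layers.Dense(32, activation="relu")(z)
    out = layers.Dense(1, name="pcl_score")(z)

    model = Model(inputs=[eeg_in, tab_in], outputs=out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="mse", metrics=["mae"])
    return model

model = build_hybrid_lstm_fcnn()
```

Swapping the LSTM layer for layers.SimpleRNN(64), or for a Conv1D-plus-pooling stack, yields the Hybrid RNN-FCNN and Hybrid CNN-FCNN variants, respectively.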

3.4. Multi-Modal Fusion Strategy

Among the 903 participants, complete questionnaire data were collected for all individuals. However, only 25 participants had accompanying EEG recordings. To address this issue, our modeling strategy distinguishes between full and partial modality cases.
For the Zero-Inflated Poisson regression model, we used only the 25 participants with both EEG and questionnaire data, and manual feature extraction was applied in advance. Specifically, EEG signals were processed via B-spline basis functions followed by PCA, and questionnaire variables were selected through information gain and Random Forest methods. These steps provided structured, low-dimensional features suitable for statistical modeling.
In contrast, the deep learning models employ an end-to-end architecture that performs automatic feature extraction. Each modality is passed through a dedicated subnetwork (e.g., LSTM, RNN, or CNN for EEG), eliminating the need for explicit manual feature engineering. A late-fusion strategy is adopted, where missing modality inputs are replaced with zero-filled vectors, and a masking mechanism is employed to prevent the model from utilizing non-existent features. This design enables the model to remain functional and robust even when only a single data type is available.
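A minimal sketch of this zero-fill-plus-masking idea in Keras follows; the shapes and mask value are assumptions. Keras's Masking layer skips any timestep whose features all equal the mask value, so an all-zero EEG input leaves the LSTM at its zero initial state and the fused prediction falls back on the questionnaire branch.

```python
# Sketch: handling participants without EEG via zero-filled inputs + masking.
import numpy as np
from tensorflow.keras import layers, Model

eeg_in = layers.Input(shape=(5000, 8), name="eeg")
h = layers.Masking(mask_value=0.0)(eeg_in)   # mark all-zero timesteps as missing
h = layers.LSTM(64)(h)                       # fully masked input -> zero output state
encoder = Model(eeg_in, h)

missing = np.zeros((1, 5000, 8))             # participant with no EEG recording
print(encoder.predict(missing))              # ~zero vector: the EEG branch is inert
```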

3.5. Model Training and Evaluation

In this study, the deep learning models were implemented in Python 3.8 using the TensorFlow 2.13.0 and Keras 2.13.1 frameworks, following a standardized training pipeline. The dataset was split into training and validation sets using the train_test_split method to ensure independent evaluation and establish a basic validation strategy. To mitigate the impact of feature scale differences, input features were normalized using StandardScaler. To prevent over-fitting, L2 regularization was applied to constrain model complexity, and Dropout layers were incorporated to randomly deactivate neurons during training.
The model was trained using the Adam optimizer with an initial learning rate of 0.001, a batch size of 32, and mean squared error (MSE) as the loss function. The training was run for a maximum of 600 epochs, with the loss and mean absolute error (MAE) evaluated on the validation set after each epoch. If the current MAE was lower than the best recorded value, the model was saved and the optimal MAE was updated accordingly.
To enhance generalization and prevent over-fitting, early stopping was applied: training was terminated if the validation loss failed to improve for 15 consecutive epochs. Additionally, a learning rate scheduler reduced the rate to one-quarter of its current value if no improvement in validation loss was observed for 8 consecutive epochs.
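Continuing the architecture sketch from Section 3.3.2, this training configuration maps directly onto standard Keras callbacks; the checkpoint filename and the data arrays below are placeholder assumptions consistent with the text.

```python
# Sketch: Adam (lr=0.001), batch size 32, MSE loss, up to 600 epochs,
# checkpointing on the best validation MAE, early stopping after 15 stagnant
# epochs, and a learning-rate cut to one-quarter after 8 stagnant epochs.
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

callbacks = [
    ModelCheckpoint("best_model.keras", monitor="val_mae", save_best_only=True),
    EarlyStopping(monitor="val_loss", patience=15, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.25, patience=8),
]

history = model.fit(
    [eeg_train, tab_train], y_train,
    validation_data=([eeg_val, tab_val], y_val),
    epochs=600, batch_size=32, callbacks=callbacks,
)
```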
The final model was trained and tested on a dataset of EEG recordings from 25 rescue workers, with 80% allocated for training and 20% for testing. The model architecture was designed to capture both temporal dynamics from the EEG time series and cross-sectional individual-level information. To ensure stability and reproducibility, training was repeated multiple times under the same configuration, yielding consistent results; specific parameter settings are detailed in Table 3.

4. Results

4.1. Results of the Zero-Inflated Poisson Model

Given the characteristics of the PCL total score, a ZIP regression is more appropriate than a standard Poisson regression. For the initial 25 rescue personnel sample data, the 5 EEG data features derived from PCA dimensionality reduction and the 11 questionnaire features identified through Random Forest were combined to construct the ZIP model.
Based on Table 4 and Table 5, it can be observed that the five principal components extracted from the EEG data only affect the first part of the model and have no impact on the second part. The first four principal components have a positive effect on the PCL total score, with the fourth principal component having the greatest impact. Conversely, the fifth principal component, ASD isolation, ASD avoidance, and ASD alertness have a negative effect, with ASD avoidance and the fifth principal component having the most significant negative impact.
In the second part of the Poisson model, height, ASD score, BMI, and mental toughness have positive effects, while age, weight, ASD re-experience, and household monthly income per capita have negative effects. To better visualize the model’s fitting performance, a comparison between actual values and fitted values is shown in Figure 5.
In the model fitting plot, the x-axis represents the predicted values from the model, and the y-axis represents the actual observed values. The data points are closely aligned around the fitting curve, indicating that the model’s predicted values closely match the actual observed values, suggesting a good fit of the ZIP model. The MSE of the model, calculated using R, is 0.2814173.

4.2. Output of Deep Neural Network Models

To reduce the impact of random factors and enhance the stability and reliability of the results, each model was trained 50 times, and the mean of the MSE values was taken as the comparison standard. This approach minimizes result fluctuation due to random initialization or data splits.

4.2.1. Comparison of Multi-Modal Data Deep Learning Models

After 50 rounds of repeated training, the outputs of the three deep learning models were summarized; the results are presented in Table 6:
To evaluate the performance of the three hybrid deep learning models, Table 6 presents the MSE for both the training and test sets. The models—Hybrid LSTM-FCNN, Hybrid RNN-FCNN, and Hybrid CNN-FCNN—integrate multi-modal data using different types of neural networks combined with FCNN.
From the data, it is evident that Hybrid LSTM-FCNN achieves an MSE of 0.0593406 on the training set and 0.1460319 on the test set, making it the best-performing model on the test set among the three. This indicates that it has the strongest generalization capability. Hybrid RNN-FCNN shows an MSE of 0.0477106 on the training set and 0.1650831 on the test set. While it has the lowest MSE on the training set, its performance on the test set is relatively poor, suggesting potential over-fitting and weaker generalization ability. Hybrid CNN-FCNN has an MSE of 0.0789325 on the training set and 0.1571726 on the test set. Overall, it performs well but with slightly lower accuracy on the test set compared to Hybrid LSTM-FCNN.
Overall, the Hybrid LSTM-FCNN and Hybrid CNN-FCNN models perform better on multi-modal data than the Hybrid RNN-FCNN model, with the former showing the lowest test set MSE, indicating a significant advantage in predictive performance and generalization ability.

4.2.2. Ablation Experiment

To systematically investigate the contribution of different input modalities and architectural components, we conducted a comprehensive ablation study involving six model variants. The three main multi-modal fusion models—Hybrid RNN-FCNN, Hybrid CNN-FCNN, and Hybrid LSTM-FCNN—utilize EEG data, questionnaire data, and their cross-tensor features.
In contrast, TS-only uses only EEG time series input, while CSDNN uses only questionnaire features. The RNN-FCNN wo/Cross model is a structural ablation of RNN-FCNN, where the cross-modal interaction tensor is removed to evaluate its marginal impact.
The results in Table 7 reveal that all three full multi-modal models outperform their ablated or single-source counterparts in both training and test performance. Among them, Hybrid LSTM-FCNN achieves the lowest test MSE, indicating the best overall performance. The TS-only model, which lacks questionnaire data, exhibits the worst generalization, confirming that EEG data alone is insufficient for reliable PTSD risk prediction. The CSDNN model performs better than TS-only but is still worse than any full multi-modal model. The RNN-FCNN wo/Cross model shows only a minor performance drop compared to its full version, suggesting that while cross-tensor features are beneficial, they are not strictly essential.
To verify whether the four models exhibit over-fitting, we plotted the training and validation losses over epochs for the four different models as shown in Figure 6.
For all models, both training and validation losses decrease and eventually stabilize as the number of training epochs increases. This indicates that the models progressively learn how to better fit the data during the training process. The proximity and stabilization of the training and validation loss curves suggest consistent performance on both the training and validation sets, and there is no clear sign of over-fitting in Figure 6.
These analyses indicate that by integrating different types of data sources, multi-modal models can capture the rich information that single data models fail to obtain, thereby enhancing predictive performance. The results in the table suggest that multi-modal data models have significant potential in effectively utilizing multiple information sources, enhancing data feature representation capabilities, and improving model generalization. Additionally, multi-modal models are better equipped to understand and handle the complex relationships between different data, providing greater adaptability in practical applications.

4.2.3. Output Distribution of Different Deep Learning Models

To compare the MSE distribution on the test set for different models and evaluate the performance stability and accuracy of the models, the MSE results of four different deep learning models were visualized. The box plot of MSEs from 50 iterations on the test set is shown in Figure 7.
The box plot illustrates the range of quartiles, medians, and outliers, providing a visual display of data concentration and dispersion. As shown in Figure 7, Hybrid LSTM-FCNN has the lowest and most concentrated test MSE, with a median around 0.15 and the smallest error range, indicating excellent performance. Hybrid CNN-FCNN has a slightly higher test MSE than Hybrid LSTM-FCNN but still performs well, with errors concentrated between 0.15 and 0.2. Hybrid RNN-FCNN shows a slightly higher test MSE, mainly between 0.15 and 0.2, but with some dispersion. CSDNN has the highest test MSE and the greatest dispersion, with a median close to 0.25 and several outliers.
This plot indicates that the Hybrid LSTM-FCNN model performs best on the test data, with the smallest errors and highest stability, while the CSDNN model performs the worst, with larger errors and less stability. The box plot allows us to conclude that the Hybrid LSTM-FCNN has the best performance on the test set, with the lowest MSE and the most concentrated error distribution, indicating high prediction accuracy and stability. In contrast, the CSDNN model has the poorest performance, with higher and more dispersed errors, indicating lower prediction accuracy and stability. The Hybrid RNN-FCNN and Hybrid CNN-FCNN models fall between these two extremes, with Hybrid CNN-FCNN performing better than Hybrid RNN-FCNN.
Overall, the Hybrid LSTM-FCNN remains the optimal model choice when time is sufficient, while the Hybrid CNN-FCNN shows a more balanced performance and is a more suitable model choice when there are strict training time constraints.

4.2.4. Test of Statistical Significance of Model Performance

To evaluate whether the performance differences observed among the deep learning models are statistically significant, we conducted pairwise independent sample t-tests on the test MSE values obtained from 50 repeated training iterations for each model. The results, summarized in Table 8, provide clear statistical evidence supporting the effectiveness of multi-modal fusion approaches.
Specifically, the LSTM-FCNN model significantly outperformed the CSDNN baseline (t = −4.780, p = 0.0088), confirming that the integration of EEG signals with psychological questionnaire data substantially enhances model performance. Similarly, CNN-FCNN also yielded significantly lower MSE than CSDNN (t = −3.878, p = 0.0179), reinforcing the general advantage of multi-modal modeling over single-modality input. Furthermore, a statistically significant difference was observed between LSTM-FCNN and CNN-FCNN (t = 2.891, p = 0.0445), suggesting that LSTM-based architectures are more effective in modeling temporal dependencies in EEG data than CNNs under the current experimental setup. RNN-FCNN also demonstrated statistically significant improvement over CSDNN (t = −2.785, p = 0.0496), indicating that even simpler recurrent structures can benefit from multi-modal data integration.
In contrast, other model comparisons such as RNN-FCNN versus LSTM-FCNN (p = 0.4365) and CNN-FCNN versus TS-only (p = 0.1775) did not reach statistical significance, implying that although numerical differences exist, they may not reflect genuine performance gaps but rather result from random variation. Taken together, these results strongly support the conclusion that multi-modal fusion—particularly through architectures leveraging recurrent mechanisms like LSTM—can yield statistically robust improvements in prediction accuracy for PTSD risk modeling.

4.3. Results Comparison

When using only questionnaire data, the ZIP Regression Model had an MSE of 1.118 on the training set and 1.275 on the test set. In contrast, the FCNN model reduced the MSE to 0.130 on the training set and 0.183 on the test set, clearly outperforming the ZIP model. Moreover, our LSTM-FCNN model's RMSE was particularly strong, with values far below the best RMSE of 11.33 reported for PTSD prediction models by Smith and Held (2023) [43], indicating a substantial improvement in prediction accuracy.
By integrating questionnaire and EEG data through multi-modal fusion, our prediction model’s performance significantly improved. Specifically, the ZIP model’s MSE on the multi-modal dataset decreased to 0.2814 for the training set and 0.2972 for the test set, which was a significant improvement over its performance using only questionnaire data. Furthermore, the deep learning models, especially the Hybrid LSTM-FCNN model, demonstrated higher accuracy on multi-modal data, clearly outperforming the ZIP model.
Comparing the training and test set results shows that the multi-modal dataset outperforms the single dataset, whether using the ZIP model or the deep neural network models. Additionally, statistical analysis using R 4.4.0 further confirms the advantages of multi-modal data: on single-source data, the R-squared of the ZIP model is 0.9914 and the MAE is 0.5213; on multi-modal data, the R-squared increases to 0.9978 and the MAE decreases to 0.2878, as shown in Table 9. The improvement in these metrics clearly demonstrates the potential of multi-modal data for enhancing the accuracy of PTSD predictions.

5. Discussion

This study aimed to improve the accuracy and comprehensiveness of PTSD risk assessment by developing models based on multi-modal data fusion. The two models proposed in this research—the multi-modal ZIP Regression Model and the multi-modal Deep Neural Network Model—demonstrated significant predictive capabilities. In the multi-modal ZIP Regression Model, we found that the principal components of EEG features had a significant impact on the PCL total score, particularly in the count component. Additionally, ASD-related variables from the questionnaire data also showed importance. On the other hand, the Hybrid LSTM-FCNN model exhibited the best predictive performance on multi-modal data, with an MSE of 0.1460319 on the test set, which is substantially lower than that of the single-data models. These findings align with the observations of Wang et al. [14].
Multi-modal data models have clear advantages over single-data models in predicting PTSD risk. For example, the training MSE of the ZIP Regression Model on the multi-modal dataset was 0.2814173, compared to 1.118069 for the single-data model. The Hybrid LSTM-FCNN model further enhanced predictive performance, particularly on the test set, where its MSE was 0.1460319, which is significantly lower than that of the other models.
The multi-modal data fusion approach used in this study effectively combined questionnaire data and EEG data, thereby improving the predictive capabilities of the models. During feature selection, Random Forest was used to identify the most influential variables from the questionnaire data, while PCA and LSTM were employed for dimensionality reduction and feature extraction from EEG data. This approach not only reduced the number of features but also maintained the complexity and diversity of the data, enhancing the model’s generalization ability.
Despite these promising results, four limitations should be noted. First, the models were trained on a relatively limited dataset, raising concerns about potential over-fitting to population-specific patterns. Second, although deep learning models offer high predictive power, their “black-box” nature impedes clinical trust—an issue emphasized in prior studies. Third, the use of PCA and LSTM for EEG feature extraction may overlook fine-grained temporal dynamics; alternative techniques such as wavelet transforms may better preserve the complexity of neural signals. Fourth, and crucially, PTSD symptoms and associated risk factors evolve dynamically over time, which may pose a challenge for static models trained solely on cross-sectional data. In our study, although the questionnaire data are cross-sectional, the EEG component inherently contains temporal information. This temporal structure was captured through the use of B-spline basis functions and deep sequence models such as LSTM, which allowed the model to extract short-term temporal patterns embedded within EEG signals. However, these reflect intra-session dynamics rather than long-term longitudinal changes across weeks or months. Therefore, to further improve real-world applicability, future research should incorporate repeated PTSD assessments and multi-session EEG recordings, enabling the construction of adaptive models that learn continuously and capture long-term progression trajectories.
Future work should focus on three key directions: (1) expanding datasets to include more diverse populations and multi-center collaborations; (2) enhancing model interpretability using approaches such as attention mechanisms to associate specific EEG features with PTSD symptoms; and (3) integrating additional modalities to increase the biological granularity of predictions.
Furthermore, while multi-modal data fusion offers significant performance advantages, it also raises ethical concerns, particularly when applied to vulnerable populations such as trauma survivors. The inclusion of biometric data such as EEG can lead to privacy and consent issues if not handled appropriately. To address these concerns, we adhered to strict data handling protocols during this study and propose incorporating privacy-preserving mechanisms such as data anonymization, secure storage, and consent-based access frameworks in future deployments. It is especially crucial to prevent potential misuse of sensitive data (e.g., in insurance risk profiling), which could further harm already at-risk individuals. As such, future implementations of this model must align with ethical standards and regulatory requirements to ensure both the effectiveness and protection of participants.
Notably, the data we obtained had already undergone de-identification by the data provider. Personal identifiers such as names, places of origin, and detailed family information were not included in the dataset. This minimized the risk of re-identification and ensured that the data used in model training met basic privacy protection standards from the outset. Nevertheless, we recognize that even anonymized biometric and psychological data carry residual privacy risks, particularly in small or vulnerable populations, and thus require continued attention to data governance in future applications.

6. Conclusions

By combining EEG data with questionnaire features, multi-modal fusion offers a more comprehensive and enriched data input, significantly improving the predictive performance of the model. Multi-modal data capture various types of information that single-data models cannot obtain, thereby enhancing both the model’s fitting ability and generalization performance. This study demonstrates that hybrid models incorporating multi-modal data yield lower mean squared errors (MSEs) on both training and test sets compared to single-data models, with the Hybrid LSTM-FCNN model performing best on the test set.
The ZIP model showed a good fit when handling multi-modal data, with an R2 value close to 1, indicating strong explanatory power and a good fit. However, for single-data models, although the R2 value of the ZIP model was also high, both the MSE and mean absolute error (MAE) were higher, suggesting that single-data models may be prone to over-fitting. In contrast, multi-modal data models are better able to balance fitting and generalization.
Deep neural networks, such as LSTM, RNN, and CNN, demonstrated excellent performance in processing complex time series and high-dimensional data. LSTM is particularly effective at capturing long-term dependencies, CNN excels in feature extraction, and RNN is well suited for sequential data processing. The Hybrid LSTM-FCNN model, by combining LSTM with Fully Connected Neural Networks, not only performed well on the training data but also exhibited strong generalization on the test data, highlighting its significant advantages in capturing data patterns and handling multi-modal data.
Overall, the integration of multi-modal fusion with advanced neural network architectures provides robust technical support for improving PTSD risk prediction. While the Hybrid LSTM-FCNN model is well suited for high-accuracy screening tasks, the ZIP model remains valuable for interpretive diagnostics. A tiered framework—employing deep learning models for initial screening followed by interpretable models for clinical explanation—may offer a balanced approach that maximizes both predictive performance and clinical utility in PTSD risk management among rescue personnel.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math13111901/s1. For detailed data, please refer to the Supplementary Materials.

Author Contributions

Conceptualization, C.H. and Y.L.; methodology, Y.S., D.Z. and Y.L.; software, D.Z., T.Z. and C.H.; validation, Y.S., C.H. and Y.L.; formal analysis, Y.S., D.Z. and T.Z.; data curation, Y.S. and Y.L.; writing—original draft preparation, Y.L. and Y.S.; writing—review and editing, C.H. and Y.L.; visualization, D.Z. and T.Z.; supervision, C.H. and Y.L.; funding acquisition Y.L. and C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Social Science Foundation of China (grant number 24BTJ068), the National Natural Science Foundation of China (grant number 11701161), and the Key Humanities and Social Science Fund of the Hubei Provincial Department of Education (grant number 20D043).

Institutional Review Board Statement

Ethical review and approval were waived for this study because it used publicly available datasets from a public competition that were fully de-identified and accessible without restriction. As the data contain no identifiable personal information and were obtained from a public source, formal ethical review and approval were not required. The analysis was conducted in accordance with the terms of use of the dataset and applicable guidelines.

Informed Consent Statement

Participant consent was waived because the study uses only anonymized public data containing no individually identifiable information.

Data Availability Statement

The raw data is provided within the Supplementary Materials. All data generated from interim analysis can be obtained by contacting the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kocer, D.; Farriols, N.; Cifre, I.; Nomen, M.; Lalande, M.; Calvelo, A. PTSD Among Refugee Rescue Workers: Effects of Compassion Satisfaction and Fatigue on Burnout. J. Loss Trauma 2023, 29, 421–437. [Google Scholar] [CrossRef]
  2. Bremner, J.D.; Hoffman, M.; Afzal, N.; Cheema, F.A.; Novik, O.; Ashraf, A.; Brummer, M.; Nazeer, A.; Goldberg, J.; Vaccarino, V. The environment contributes more than genetics to smaller hippocampal volume in Posttraumatic Stress Disorder (PTSD). J. Psychiatr. Res. 2021, 137, 579–588. [Google Scholar] [CrossRef] [PubMed]
  3. Chatzea, V.E.; Sifaki-Pistolla, D.; Vlachaki, S.A.; Melidoniotis, E.; Pistolla, G. PTSD, burnout, and well-being among rescue workers: Seeking to understand the impact of the European refugee crisis on rescuers. Psychiatry Res. 2018, 262, 446–451. [Google Scholar] [CrossRef] [PubMed]
  4. Perrin, M.A.; DiGrande, L.; Wheeler, K.; Thorpe, L.; Farfel, M.; Brackbill, R. Differences in PTSD Prevalence and Associated Risk Factors Among World Trade Center Disaster Rescue and Recovery Workers. Am. J. Psychiatry 2007, 164, 1385–1394. [Google Scholar] [CrossRef]
  5. Luo, T.G.; Sun, Y.; Chen, Q.; Chen, J.; Jiang, R.H.; Li, S.Z.; Sai, X.Y. Preliminary establishment of an early susceptibility screening model for post-traumatic stress disorder in rescue workers. Chin. J. Emerg. Resusc. Disaster Med. 2020, 15, 739–742+745. [Google Scholar]
  6. Zhou, S.L.; Jin, Y.H.; Xia, X.H.; Lu, C. Investigation of post-traumatic stress disorder symptoms among medical rescue workers affected by the Tianjin port explosion. Tianjin Nurs. 2018, 26, 184–187. [Google Scholar]
  7. Zhou, Z. A Study on the Symptomatology of PTSD in Chinese Fire Rescue Personnel Based on DSM-5; Institute of Psychology, Chinese Academy of Sciences, University of Chinese Academy of Sciences: Beijing, China, 2021. [Google Scholar]
  8. Berger, W.; Coutinho, E.S.F.; Figueira, I.; Marques-Portella, C.; Luz, M.P.; Neylan, T.C.; Marmar, C.R.; Mendlowicz, M.V. Rescuers at risk: A systematic review and meta-regression analysis of the worldwide current prevalence and correlates of PTSD in rescue workers. Soc. Psychiatry Psychiatr. Epidemiol. 2012, 47, 1001–1011. [Google Scholar] [CrossRef]
  9. Hyland, P.; Shevlin, M.; Brewin, C.R.; Cloitre, M.; Karatzias, T. Validation of post-traumatic stress disorder (PTSD) and complex PTSD using the International Trauma Questionnaire. Acta Psychiatr. Scand. 2017, 136, 313–322. [Google Scholar] [CrossRef]
  10. Shalev, A.Y.; Gevonden, M.; Ratanatharathorn, A.; Laska, E.; van der Mei, W.F.; Qi, W.; Lowe, S.; Lai, B.S.; Bryant, R.A.; Delahanty, D.; et al. Estimating the risk of PTSD in recent trauma survivors: Results of the International Consortium to Predict PTSD (ICPP). World Psychiatry 2019, 18, 77–87. [Google Scholar] [CrossRef]
  11. Wshah, S.; Skalka, C.; Price, M. Predicting posttraumatic stress disorder risk: A machine learning approach. JMIR Ment. Health 2019, 6, e13946. [Google Scholar] [CrossRef]
  12. Dupont, T.; Kentish-Barnes, N.; Pochard, F.; Azoulay, E. Prediction of post-traumatic stress disorder in family members of ICU patients: A machine learning approach. Intensive Care Med. 2024, 50, 114–124. [Google Scholar] [CrossRef] [PubMed]
  13. Shahzad, M.N.; Ali, H. Deep learning-based diagnosis of PTSD using 3D-CNN and resting-state fMRI data. Psychiatry Res. Neuroimaging 2024, 343, 111845. [Google Scholar] [CrossRef] [PubMed]
  14. Wang, J.; Ouyang, H.; Jiao, R.; Liu, W. The application of machine learning techniques in posttraumatic stress disorder: A systematic review and meta-analysis. NPJ Digit. Med. 2024, 7, 121. [Google Scholar] [CrossRef] [PubMed]
  15. Ge, R.; Zhou, M.; Luo, Y.; Meng, Q.; Mai, G.; Wang, G.; Zhou, F. McTwo: A two-step feature selection algorithm based on maximal information coefficient. BMC Bioinform. 2016, 17, 142. [Google Scholar] [CrossRef]
  16. Liu, X.; Zhou, L.; Luo, Y. Pruning deep neural network models via minimax concave penalty regression. Appl. Sci. 2024, 14, 3669. [Google Scholar] [CrossRef]
  17. Li, W.; Li, H.; Luo, Y. Dynamic and static enhanced BIRCH for functional data clustering. IEEE Access 2023, 11, 111448–111465. [Google Scholar] [CrossRef]
  18. Scherer, S.; Stratou, G.; Gratch, J.; Morency, L.-P. Investigating voice quality as a speaker-independent indicator of depression and PTSD. Proc. Interspeech 2013, 2013, 847–851. [Google Scholar] [CrossRef]
  19. Vilenchik, D.; Cwikel, J.; Ezra, Y.; Hausdorff, T.; Lazarov, M.; Sergienko, R.; Abramovitz, R.; Schmidt, I.; Perez, A. Method matters: Enhancing voice-based depression detection with a new data collection framework. Depress. Anxiety 2025, 2025, 4839334. [Google Scholar] [CrossRef]
  20. Anketell, C.; Dorahy, M.J.; Shannon, M.; Elder, R.; Hamilton, G.; Corry, M.; MacSherry, A.; Curran, D.; O’Rawe, B. An exploratory analysis of voice hearing in chronic PTSD: Potential associated mechanisms. J. Trauma Dissociation 2010, 11, 93–107. [Google Scholar] [CrossRef]
  21. Xu, R.; Mei, G.; Zhang, G.; Gao, P.; Judkins, T.; Cannizzaro, M.; Li, J. A voice-based automated system for PTSD screening and monitoring. Stud. Health Technol. Inform. 2012, 173, 552–558. [Google Scholar] [CrossRef]
  22. Gootzeit, J.; Markon, K. Factors of PTSD: Differential specificity and external correlates. Clin. Psychol. Rev. 2011, 31, 993–1003. [Google Scholar] [CrossRef] [PubMed]
  23. Chen, H.J.; Guo, Y.; Ke, J.; Qiu, J.; Zhang, L.; Xu, Q.; Zhong, Y.; Lu, G.M.; Qin, H.; Qi, R.; et al. Characterizing typhoon-related posttraumatic stress disorder based on multimodal fusion of structural, diffusion, and functional magnetic resonance imaging. Neuroscience 2024, 537, 141–150. [Google Scholar] [CrossRef] [PubMed]
  24. Liu, B.; Liu, J.; Wang, G.; Huang, K.; Li, F.; Zheng, Y.; Luo, Y.; Zhou, F. A novel electrocardiogram parameterization algorithm and its application in myocardial infarction detection. Comput. Biol. Med. 2015, 61, 178–184. [Google Scholar] [CrossRef] [PubMed]
  25. Gao, R.; Zhang, G.; Wang, X.; Wang, X.M.; Yu, T. Application of a multimodal neuroimaging data management system in functional neurosurgery. Chin. J. Mod. Neurol. 2024, 24, 525–531. [Google Scholar]
  26. Zhao, Y.; Jin, X.; Xiao, H.; Wang, Y. Application of a multimodal model combining computer vision and structured data in the referral of diabetic retinopathy. Chin. J. Digit. Med. 2024, 19, 29–35. [Google Scholar]
  27. Cai, C.F.; Li, J.; Jiao, Y.P.; Wang, X.X.; Guo, G.H.; Xu, J. Progress and challenges of deep learning-based multimodal data fusion methods in oncology. Data Comput. Front. 2024, 6, 3–14. [Google Scholar]
  28. Azam, M.A.; Khan, K.B.; Salahuddin, S.; Rehman, E.; Khan, S.A.; Khan, M.A.; Kadry, S.; Gandomi, A.H. A review on multimodal medical image fusion: Compendious analysis of medical modalities, multimodal databases, fusion techniques and quality metrics. Comput. Biol. Med. 2022, 144, 105253. [Google Scholar] [CrossRef]
  29. Zhou, M.; Luo, Y.; Sun, G.; Mai, G.; Zhou, F. Constraint programming based biomarker optimization. BioMed Res. Int. 2015, 2015, 910515. [Google Scholar] [CrossRef]
  30. Li, W.; Hu, C.; Luo, Y. A deep learning approach with extensive sentiment analysis for quantitative investment. Electronics 2023, 12, 3960. [Google Scholar] [CrossRef]
  31. Krüger-Gottschalk, A.; Knaevelsrud, C.; Rau, H.; Dyer, A.; Schäfer, I.; Schellong, J.; Ehring, T. The German version of the Posttraumatic Stress Disorder Checklist for DSM-5 (PCL-5): Psychometric properties and diagnostic utility. BMC Psychiatry 2017, 17, 379. [Google Scholar] [CrossRef]
  32. Keane, T.M.; Rubin, A.; Lachowicz, M.; Brief, D.; Enggasser, J.L.; Roy, M.; Hermos, J.; Helmuth, E.; Rosenbloom, D. Temporal stability of DSM–5 posttraumatic stress disorder criteria in a problem-drinking sample. Psychol. Assess. 2014, 26, 1138–1145. [Google Scholar] [CrossRef] [PubMed]
  33. Dong, X.Y.; Tao, G.Y.; Zhang, L.; Zhao, J.; Liu, J.N. Psychological crisis management for frontline nursing staff during emergencies. Chin. Med. Ethics 2021, 34, 173–177. [Google Scholar]
  34. Forkus, S.R.; Raudales, A.M.; Rafiuddin, H.S.; Weiss, N.H.; Messman, B.A.; Contractor, A.A. The Posttraumatic Stress Disorder (PTSD) Checklist for DSM–5: A systematic review of existing psychometric evidence. Clin. Psychol. Sci. Pract. 2023, 30, 110–121. [Google Scholar] [CrossRef] [PubMed]
  35. Cheng, L.; Li, D.; Li, X.; Yu, S. The optimal wavelet basis function selection in feature extraction of motor imagery electroencephalogram based on wavelet packet transformation. IEEE Access 2019, 7, 174465–174481. [Google Scholar] [CrossRef]
  36. Stewart, W.K.; Pretty, G.C.; Shaw, M.G.; Chase, J.G. Creating smooth SI. B-spline basis function representations of insulin sensitivity. Biomed. Signal Process. Control 2018, 44, 270–278. [Google Scholar] [CrossRef]
  37. Zhong, B.; Wang, P.F.; Wang, Y.Q.; Wang, X.L. A review of EEG data analysis techniques based on deep learning. J. Zhejiang Univ. Eng. Sci. 2024, 58, 879–890. [Google Scholar]
  38. Lambert, D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 1992, 34, 1–14. [Google Scholar] [CrossRef]
  39. Zhao, F.; Zhang, C.; Geng, B. Deep multimodal data fusion. ACM Comput. Surv. 2024, 56, 216. [Google Scholar] [CrossRef]
  40. Wang, M.H.; Fang, H.J.; Gong, H.X.; Luo, J.L. Feature extraction and recognition of EEG signals using a multi-scale hybrid convolutional network. J. Huaqiao Univ. (Nat. Sci.) 2023, 44, 628–635. [Google Scholar]
  41. Makhdoomi, S.M.; Rakhra, M.; Singh, D.; Singh, A. Artificial-intelligence based prediction of post-traumatic stress disorder (PTSD) using EEG reports. In Proceedings of the 2022 5th International Conference on Contemporary Computing and Informatics (IC3I), Uttar Pradesh, India, 14–16 December 2022; pp. 1073–1077. [Google Scholar] [CrossRef]
  42. Lin, Y.; Tu, Y. Functional coefficient cointegration models with Box–Cox transformation. Econ. Lett. 2024, 234, 111472. [Google Scholar] [CrossRef]
  43. Smith, D.L.; Held, P. Moving toward precision PTSD treatment: Predicting veterans’ intensive PTSD treatment response using continuously updating machine learning models. Psychol. Med. 2023, 53, 5500–5509. [Google Scholar] [CrossRef]
Figure 1. Histogram of PCL scores.
Figure 2. EEG signal graph.
Figure 3. B-spline basis function fitting of EEG data.
Figure 4. Multi-modal Hybrid LSTM-FCNN model framework.
Figure 5. Fitting plot of the multi-modal ZIP regression model.
Figure 6. Comparison of training and validation losses across epochs for different models.
Figure 7. MSE distribution for different deep learning models on the test set.
Table 1. Descriptive statistics of continuous variables.

| Variable | n | Minimum | Maximum | Mean | SD | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|
| Age | 903 | 15 | 56 | 24.21 | 3.929 | 1.779 | 6.33 |
| Height (cm) | 903 | 158 | 200 | 174.19 | 5.360 | 0.303 | 0.61 |
| Weight (kg) | 903 | 40 | 130 | 71.11 | 12.740 | 2.275 | 6.98 |
| ASD score | 903 | 19 | 91 | 21.56 | 6.466 | 4.306 | 26.2 |
| ASD isolation | 903 | 5 | 21 | 5.69 | 1.863 | 3.643 | 16.1 |
| ASD re-experience | 903 | 5 | 25 | 5.57 | 1.763 | 4.769 | 30.6 |
| ASD avoidance | 903 | 4 | 20 | 4.53 | 1.6 | 3.87 | 19.1 |
| ASD alertness | 903 | 5 | 25 | 5.78 | 1.905 | 4.011 | 22.8 |
| Mental toughness | 903 | 10 | 50 | 39.48 | 14.791 | −1.231 | −0.01 |
| BMI | 903 | 13 | 45.35 | 23.44 | 4.18 | 2.940 | 10.6 |
| PCL score | 903 | 0 | 72 | 1.40 | 5.320 | 6.776 | 61.3 |
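A Table 1-style summary is straightforward to generate with pandas; in the sketch below, random draws stand in for the 903 questionnaire records, so only the column structure (not the numbers) matches the table.

```python
import numpy as np
import pandas as pd

# Simulated stand-ins for three of the Table 1 variables (903 records).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.integers(15, 57, size=903),
    "ASD score": rng.integers(19, 92, size=903),
    "PCL score": rng.poisson(1.4, size=903),
})

# Same statistics as Table 1: n, min, max, mean, SD, skewness, kurtosis.
summary = pd.DataFrame({
    "Amount": df.count(), "Minimum": df.min(), "Maximum": df.max(),
    "Mean": df.mean(), "SD": df.std(),
    "Skewness": df.skew(), "Kurtosis": df.kurt(),
})
print(summary.round(3))
```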
Table 2. Description of categorical variables.

| Variable | Coding |
|---|---|
| Gender | 1 = Male; 2 = Female |
| Education level | 1 = College or above; 2 = High school; 3 = Technical secondary or below |
| Marital status | 1 = Single; 2 = Married; 3 = Other |
| Household monthly income per capita | 1 = <3000; 2 = 3000–4999; 3 = 5000–8000; 4 = >8000 |
| Recent accident experience | 1 = No; 2 = Yes |
| Family or close friend accident | 1 = No; 2 = Yes |
| Witnessed severe injury | 1 = No; 2 = Yes |
| Seen dead body during rescue | 1 = No; 2 = Yes |
| Trauma scene emotional reaction | 1 = No; 2 = Yes |
| Mental illness history | 1 = No; 2 = Yes |
| Medication use in the last month | 1 = No; 2 = Yes |
| Smoking history | 1 = No; 2 = Yes |
| Smoking status | 1 = Non-smoker with no secondhand smoke; 2 = Non-smoker with passive smoking; 3 = Active smoker |
| Alcohol consumption | 1 = No; 2 = Yes |
| ASD diagnosis | 0 = Negative; 1 = Positive |
| Genetic history | 1 = Negative; 2 = Positive |
Table 3. Selection of model parameters.

| Parameter | Hybrid LSTM-FCNN | Hybrid RNN-FCNN | Hybrid CNN-FCNN | FCNN |
|---|---|---|---|---|
| LSTM layers | 2 | | | |
| Dropout rate of LSTM layers | 0.5 | | | |
| RNN layers | | 2 | | |
| Dropout rate of RNN layers | | 0.5 | | |
| Convolutional layers | | | 2 | |
| Pooling layers | | | 2 | |
| Neurons in first fully connected layer | 32 | 32 | 32 | 32 |
| Dropout rate of first fully connected layer | 0.5 | 0.5 | 0.5 | 0.5 |
| Neurons in second fully connected layer | 16 | 16 | 16 | 16 |
| Dropout rate of second fully connected layer | 0.5 | 0.5 | 0.5 | 0.5 |
| Training epochs | 600 | 600 | 600 | 600 |
| Batch size | 32 | 32 | 32 | 32 |
| Optimizer | Adam | Adam | Adam | Adam |
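To show how the Table 3 settings translate into a training run, the sketch below wires the HybridLSTMFCNN class from the earlier sketch into a 600-epoch loop with batch size 32, the Adam optimizer, and an MSE objective; the random tensors are placeholders for the study's EEG and questionnaire data, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: (samples, time steps, EEG channels) and tabular features.
eeg_train = torch.randn(100, 50, 8)
tab_train = torch.randn(100, 25)
y_train = torch.randn(100)          # standardized PCL scores (placeholder)

loader = DataLoader(TensorDataset(eeg_train, tab_train, y_train),
                    batch_size=32, shuffle=True)   # batch size 32, cf. Table 3
model = HybridLSTMFCNN(eeg_dim=8, tab_dim=25)      # class from earlier sketch
opt = torch.optim.Adam(model.parameters())         # Adam, cf. Table 3
loss_fn = nn.MSELoss()

for epoch in range(600):                           # 600 epochs, cf. Table 3
    for eeg_b, tab_b, y_b in loader:
        opt.zero_grad()
        loss = loss_fn(model(eeg_b, tab_b), y_b)
        loss.backward()
        opt.step()
```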
Table 4. Key influencing variables in the Poisson (count) component.

| Variable | Coefficient |
|---|---|
| Age | −0.02988 |
| Weight | −1.68882 |
| Height | 1.33177 |
| ASD score | 0.29176 |
| BMI | 5.13504 |
| Mental toughness | 0.03725 |
| Household monthly income per capita | −0.21238 |
| ASD re-experience | −0.69806 |
Table 5. Key influencing variables in the zero-inflation component.

| Variable | Coefficient |
|---|---|
| ASD isolation | −3.133 |
| ASD avoidance | −15.532 |
| ASD alertness | −4.126 |
| PC1 | 1.047 |
| PC2 | 5.304 |
| PC3 | 2.363 |
| PC4 | 7.578 |
| PC5 | −11.018 |
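Models of this form can be fitted with the statsmodels implementation of zero-inflated Poisson regression, as in the sketch below. The simulated design matrix stands in for the questionnaire covariates and EEG principal components, so the estimates will not reproduce Tables 4 and 5.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

# Simulated stand-ins: 903 observations, 13 covariates (questionnaire + PCs).
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(903, 13)))

# Zero-inflated outcome: excess zeros mixed with a Poisson count, mimicking
# the heavily zero-inflated PCL scores in Table 1.
y = np.where(rng.random(903) < 0.8, 0, rng.poisson(3.0, size=903))

# Logit zero-inflation component plus Poisson count component, cf. Tables 4-5.
zip_fit = ZeroInflatedPoisson(y, X, exog_infl=X, inflation='logit').fit(maxiter=200)
print(zip_fit.summary())
```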
Table 6. Comparison of deep neural networks with multi-modal data.

| Deep Learning Model | MSE (Train) | MSE (Test) |
|---|---|---|
| Hybrid LSTM-FCNN | 0.0593406 | 0.1460319 |
| Hybrid RNN-FCNN | 0.0477106 | 0.1650831 |
| Hybrid CNN-FCNN | 0.0789325 | 0.1571726 |
Table 7. Comparison of prediction results for different data types.

| Data Type | Model | MSE (Train) | MSE (Test) |
|---|---|---|---|
| Multi-modal | Hybrid LSTM-FCNN | 0.0593406 | 0.1460319 |
| | Hybrid RNN-FCNN | 0.0477106 | 0.1650831 |
| | Hybrid CNN-FCNN | 0.0789325 | 0.1571726 |
| Ablation | CSDNN | 0.1302956 | 0.1830982 |
| | TS-only | 0.1738542 | 0.3044681 |
| | RNN-FCNN wo/Cross | 0.0518096 | 0.1696373 |
Table 8. Statistical test results.

| Model Comparison | t Value | p Value |
|---|---|---|
| RNN-FCNN vs. CSDNN | −2.785 | 0.0496 |
| CNN-FCNN vs. LSTM-FCNN | 2.891 | 0.0445 |
| CNN-FCNN vs. CSDNN | −3.878 | 0.0179 |
| LSTM-FCNN vs. CSDNN | −4.780 | 0.0088 |
| CSDNN vs. RNN-FCNN wo/Cross | 2.287 | 0.0842 |
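The t and p values in Table 8 are consistent with paired t-tests over five repeated runs (df = 4). The sketch below shows the computation with SciPy, using invented per-run test MSEs in place of the study's results.

```python
from scipy import stats

# Invented per-run test MSEs for two models (five runs each).
mse_lstm_fcnn = [0.141, 0.149, 0.152, 0.144, 0.146]
mse_csdnn = [0.180, 0.187, 0.179, 0.185, 0.184]

# Paired t-test, cf. the LSTM-FCNN vs. CSDNN row of Table 8.
t_val, p_val = stats.ttest_rel(mse_lstm_fcnn, mse_csdnn)
print(f"t = {t_val:.3f}, p = {p_val:.4f}")
```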
Table 9. Model error comparison.

| Data Type | Model | MSE (Train) | MSE (Test) |
|---|---|---|---|
| Single data | ZIP | 1.118069 | 1.275401 |
| | CSDNN | 0.1302956 | 0.1830982 |
| Multi-modal data | ZIP | 0.2814173 | 0.2971681 |
| | Hybrid LSTM-FCNN | 0.0593406 | 0.1460319 |
| | Hybrid RNN-FCNN | 0.0477106 | 0.1650831 |