Predicting Fear Extinction in Posttraumatic Stress Disorder

Fear extinction is the basis of exposure therapies for posttraumatic stress disorder (PTSD), but half of patients do not improve. Predicting fear extinction in individuals with PTSD may inform personalized exposure therapy development. The participants were 125 trauma-exposed adults (96 female) with a range of PTSD symptoms. Electromyography, electrocardiogram, and skin conductance were recorded at baseline, during dark-enhanced startle, and during fear conditioning and extinction. Using a cross-validated, hold-out sample prediction approach, three penalized regressions and conventional ordinary least squares were trained to predict fear-potentiated startle during extinction using 50 predictor variables (5 clinical, 24 self-reported, and 21 physiological). The predictors, selected by penalized regression algorithms, were included in multivariable regression analyses, while univariate regressions assessed individual predictors. All the penalized regressions outperformed OLS in prediction accuracy and generalizability, as indexed by the lower mean squared error in the training and holdout subsamples. During early extinction, the consistent predictors across all the modeling approaches included dark-enhanced startle, the depersonalization and derealization subscale of the dissociative experiences scale, and the PTSD hyperarousal symptom score. These findings offer novel insights into the modeling approaches and patient characteristics that may reliably predict fear extinction in PTSD. Penalized regression shows promise for identifying symptom-related variables to enhance the predictive modeling accuracy in clinical research.


Introduction
Laboratory fear extinction studies have provided the mechanistic basis of "gold-standard" exposure therapies for posttraumatic stress disorder (PTSD; [1]), but as many as half of patients do not recover [2]. Psychophysiological measures are promising clinical tools for predicting fear extinction and eventually informing extinction-based exposure treatments [3]. However, to fulfill this promise, novel statistical approaches may be needed to increase the accuracy and generalizability of fear extinction predictions [3–5]. Numerous clinical and self-report measures have been associated with fear extinction in some studies, but these findings have not replicated consistently [4,6–8]. Thus, it may be beneficial to apply novel bottom-up statistical approaches to simultaneously evaluate multiple predictors of fear extinction in a single study [4,9,10]. Penalized regressions are a class of machine learning approaches used in clinical psychology research to increase predictive accuracy, improve generalizability, and select predictors [11–13]. Additionally, prior studies have shown that penalized regression analyses can be combined with complementary OLS regression analyses to identify promising predictors in clinical research [13,14]. However, we are not aware of any studies that have applied penalized regressions, with or without complementary OLS regression analyses, to predict fear extinction in PTSD samples.
Despite the clear clinical relevance of fear extinction in PTSD treatment [15], there is no consensus regarding whether current PTSD diagnoses or symptom severity robustly predict fear extinction [7,9]. For example, some studies have found evidence of deficient fear extinction learning in PTSD patients versus controls, while others have not (as reviewed by [7]). The findings have been similarly mixed when examining PTSD symptom severity [16,17] and specific PTSD symptom clusters [16,18–20] as predictors of fear extinction. Several explanations for these inconsistent findings have been proposed, including the biological [18,21] and clinical [21,22] heterogeneity of PTSD, the insufficient statistical power of previous studies [7], and the methodological heterogeneity across previous studies [9,23]. These findings align with prior calls for an increased emphasis on the within-group differences in fear extinction in PTSD [21,22]. Further, they highlight a need for strategies to improve predictive performance and generalizability [7,9,23]. Ultimately, the accurate and generalizable prediction of fear extinction in PTSD samples may inform advances in precision extinction-based treatments [3,15].
Physiological activity during stressors that occur before fear extinction learning also may predict extinction in PTSD. One such stressor is the dark-enhanced startle paradigm, which elicits physiological responses in an anxiety-provoking environment (i.e., an unfamiliar dark room) [58]. Alterations in parasympathetic (e.g., HRV) and sympathetic (e.g., heart rate) activity evoked by dark-enhanced startle tasks have been associated with PTSD and fear extinction [58]. Similarly, fear acquisition, a pre-requisite for de novo laboratory fear extinction [1], has been associated with physiological responses during extinction in some PTSD studies [19,59]. Notably, in PTSD samples, responding to a safety cue (CS−) during fear acquisition may be a better predictor of extinction than physiological responses to a danger cue (CS+), suggesting a relationship between deficient safety inhibition and impaired fear extinction in PTSD [59,60]. In addition to startle, skin conductance response (SCR) evoked by a CS+ and CS− and individual differences in continuously measured (i.e., tonic) heart rate and HRV during acquisition are candidate predictors of extinction in PTSD [59–62]. Thus, all these physiological variables hold promise as predictors of extinction in PTSD.
Approaches that can consider a range of potential predictors for fear extinction include novel multivariable statistical techniques such as penalized regressions. Applying a data-driven approach that explores numerous variables in a single study may be more impactful than a series of studies that examine one to a few at a time [4]. Penalized regressions are machine learning algorithms that enable the simultaneous modeling of many predictors [63,64]. As such, these models hold promise for improving the accuracy and generalizability of multivariable predictions of fear extinction. Ridge Regression, Lasso Regression, and Elastic Net Regression (ENR) are three commonly used penalized regression models that have advanced clinical research in other domains [65–67]. In a study of depression treatment outcomes, each was found to be more accurate than traditional Ordinary Least Squares (OLS) regression in both a training sample and a separate holdout sample [11]. This suggests that penalized regression algorithms can increase the accuracy and generalizability of clinical predictions [11]. One advantage of penalized regression versus OLS regression is the use of regularization to address multicollinearity: this enables the development of multivariable models that can account for many intercorrelated predictor variables [63]. A second advantage is that Lasso and ENR perform variable selection by including a penalty term that drops some variables from the model [11,13,68]. The penalty term allows Lasso and ENR to select the subset of candidate predictors that minimizes the prediction error [11,13,68]. Thus, in addition to increased predictive accuracy and generalizability, Lasso and ENR are useful tools for selecting a parsimonious set of predictor variables.
For example, a prior study on depression treatment applied ENR to select the predictors of post-treatment outcomes [11]. This study investigated 51 pre-treatment patient characteristics and identified a subset of 14 as predictors using ENR [11], demonstrating that ENR can identify a parsimonious subset of predictor variables from a wide array of potential predictors [11]. Similarly, ENR has previously been used to select an optimal multivariable model to prospectively predict PTSD symptoms using imaging, demographic, and clinical patient characteristics [14]. Moreover, simulation studies have demonstrated that Lasso and ENR perform more accurate and parsimonious variable selection than other procedures, such as step-wise variable selection [69–71]. Thus, previous work has provided proof-of-concept for combining penalized regressions with complementary univariate and multivariable OLS regression analyses to identify predictors. To our knowledge, no study has yet applied this strategy to predict fear extinction in PTSD.
The aims of this study were to: (1) compare the accuracy and generalizability of penalized regression (Ridge, Lasso, and ENR) and conventional OLS regression for predicting fear extinction in a sample of trauma-exposed adults with a range of PTSD symptoms, and (2) identify the clinical, psychological, and physiological characteristics that predict fear extinction. Informed by the theory and evidence suggesting that fear-potentiated startle (FPS) during specific extinction phases may be a promising translational measure of fear extinction in PTSD samples [19,72], we used FPS during early extinction and late extinction as our outcome variables. Based on evidence that penalized regression approaches may improve the predictive accuracy in a cross-validated training sample and the generalizability to a holdout sample [11,65,67], we hypothesized (Primary Hypothesis 1) that our three penalized regression approaches would be more accurate than conventional OLS regression in both the cross-validated training sample and a separate holdout sample. We also compared our three penalized regression approaches, but did not have a specific hypothesis regarding the most accurate. Based on evidence that startle variables are highly intercorrelated [73], we hypothesized (Primary Hypothesis 2) that startle variables would be significant predictors of FPS during both early and late extinction. We regarded the comparisons between specific startle variables (e.g., baseline startle versus dark-enhanced startle) as exploratory. Given the lack of consensus regarding predictors of fear extinction [6,10], we regarded all non-startle predictor variables as exploratory.

Participants
The participants were 125 trauma-exposed adults recruited from the greater Boston metropolitan area. The inclusion criteria were: the ability to provide written informed consent, being aged 18-55 years, exposure to at least one DSM-5 PTSD criterion A trauma, and meeting the criteria for at least 2 PTSD symptom clusters, as defined by the Clinician-Administered PTSD Scale for DSM-5 (CAPS; [74]). The exclusion criteria were: a medical condition that would confound the results, history of head trauma, current treatment with an antipsychotic, benzodiazepine use within 48 h, moderate-to-severe alcohol or substance use disorder in the past month, a current psychotic disorder, current anorexia, current obsessive-compulsive disorder, a current manic or mixed mood episode, and a lifetime history of schizophrenia or schizoaffective disorder. The Mass General Brigham and Partners Human Research Committee approved the study procedures. All the participants provided written informed consent. Table 1 shows the participant demographic and clinical characteristics.

Procedures
After providing informed consent, the participants filled out self-report questionnaires and completed clinical interviews administered by doctoral-level psychologists. During a laboratory visit, the participants then completed a dark-enhanced startle task consisting of three phases, as previously described [58]. After the dark-enhanced startle, the participants completed a fear conditioning paradigm, as previously described [16].

Clinical Interviews
All the participants were administered the CAPS for DSM-5 [74] and the Mini International Neuropsychiatric Interview (MINI; [75]) by doctoral-level psychologists.
The CAPS [74] is the gold-standard semi-structured interview for PTSD assessment [76] and was used to determine a diagnosis of PTSD (Table 1). In addition, we derived the CAPS total score and scores for each of the four symptom clusters (B-E), due to prior evidence that the relationship between PTSD symptom severity and extinction may depend on the PTSD symptom cluster examined [16,18–20]. The MINI was used to assess other DSM-5 disorders, including those relevant to the exclusion criteria above.

Self-Report Measures
Table 2 shows the full list of the self-report measures that we examined. Some of these can be used to derive both total and subscale scores. Based on evidence and theory from the literature, we included a combination of total and/or subscale scores for several measures (see the list below and a detailed justification behind each decision in the Supplementary Materials).

Trauma-Exposure and PTSD Symptom Questionnaires
The Childhood Trauma Questionnaire (CTQ; [77]) is a 28-item questionnaire used to assess the occurrence and frequency of childhood abuse and neglect. We included the CTQ total score and each of the 5 CTQ subscale scores: Emotional Abuse, Emotional Neglect, Physical Abuse, Physical Neglect, and Sexual Abuse. The Life Events Checklist (LEC; [78]) is a self-report measure of exposure to 16 types of potentially traumatic events. We included the LEC Experienced + Witnessed and LEC Experienced scores. The PTSD Checklist for DSM-5 (PCL-5; [79]) is a 20-item self-report measure that assesses the 20 DSM-5 symptoms of PTSD. We included the PCL-5 total score in our main analyses and the PCL-5 cluster scores in our post hoc analyses (see Section 3.3).

Dissociation Questionnaires
The Dissociative Experiences Scale-II [80] is a 28-item measure that assesses both normative (e.g., daydreaming) and clinically significant dissociative experiences during daily life. We included the Dissociative Experiences Scale total score and its 3 subscale scores: amnesia, absorption, and depersonalization/derealization. The Multiscale Dissociation Inventory (MDI; [81]) is a 30-item self-report measure of clinically impairing dissociative symptoms. Although we did not have sufficient MDI data to include the MDI in our primary analyses (see Section 3.3 and Supplementary Materials for details), our post hoc analyses used the MDI total score and 6 MDI subscale scores: disengagement, depersonalization, derealization, emotional constriction/numbing, memory disturbance, and identity dissociation.

Depression Questionnaires
The Beck Depression Inventory-II (BDI; [82]) is a 21-item inventory of depressive symptom severity. We included the BDI total score. The Snaith-Hamilton Pleasure Scale (SHAPS; [83]) is a 14-item anhedonia symptom questionnaire. We included the SHAPS total score.

Fear and Anxiety Questionnaires
The State-Trait Anxiety Inventory (STAI; [84]) is a 40-item scale designed to measure trait and state anxiety, and we included both subscale scores. The Fear Survey Schedule-II (FSS; [85]) is a 51-item questionnaire assessing the tendency to experience fear in response to various real-world stressors and stimuli. We included the FSS total score. The Anxiety Sensitivity Index-3 (ASI; [86]) is an 18-item questionnaire assessing fear of anxiety symptoms. We included the ASI total score.

Sleep and Resilience Questionnaires
The Pittsburgh Sleep Quality Index (PSQI; [87]) is a 9-item questionnaire on sleep quality and patterns. We included the PSQI total score. The Connor-Davidson Resilience Scale (CDRISC; [88]) is a 10-item questionnaire assessing the dispositional tendency to respond to stress and adversity with resilience. We included the CDRISC total score.

Laboratory Paradigms

Dark-Enhanced Startle
We used a dark-enhanced startle paradigm, as previously published [58]. First, during a 2 min Baseline period, the participants acclimated to the laboratory environment and no startle probes occurred. Second, during a 2 min Habituation phase, 8 startle probes were delivered with the lights on. Third, during a 4 min Dark-Light phase, the participants experienced four 1 min blocks of alternating dark and light, with the order of dark and light counterbalanced between the subjects. Each dark block and light block included 4 startle probes. Across the entire dark-enhanced startle task, the inter-trial intervals ranged from 10 to 30 s.

Fear Conditioning
We used a fear conditioning and extinction paradigm, as previously published [16]. Briefly, this paradigm consisted of three phases. First, Habituation included 7 Noise Alone (NA) trials, 4 CS+ trials, and 4 CS− trials. Second, Acquisition included 12 NA trials, 12 CS− trials, and 12 CS+ trials, with a US presented 0.5 s after the CS+ termination with 100% reinforcement. Third, Extinction included 16 NA trials, 16 CS− trials, and 16 CS+ trials, with no US presentation (0% reinforcement). Acquisition and Extinction were pseudorandomized and counterbalanced between the subjects; the trials were presented in blocks, with each block containing 4 trials of each type. All the trial durations were 6 s, and each trial included an auditory startle probe. The auditory startle probes were 106 dB 40 ms white noise bursts with a near-instantaneous rise/fall time delivered 5.6 s into each trial. The inter-trial intervals varied between 9 and 22 s. The stimuli were: a 140 psi airblast delivered to the larynx, which served as the unconditioned stimulus (US), and colored shapes displayed against a white background, which served as the conditioned stimuli. The NA trials consisted of only the white computer screen and an auditory startle probe.

Physiological Data Acquisition and Processing

Startle
Electromyography (EMG) was continuously recorded from two 5 mm Ag/AgCl electrodes filled with electrolyte gel and attached below the right eye. The EMG data were acquired at a sampling rate of 1000 Hz, amplified, and digitized using the EMG module of the Biopac MP150 (Biopac Systems, Inc., Aero Camino, CA, USA). The EMG signal was filtered with low- and high-frequency cut-offs at 28 and 500 Hz, respectively. The reflexive eyeblinks to all the startle probes within a response window from 20 to 120 milliseconds were quantified by calculating the difference in the amplitude between the peak of the response and the EMG value at the response onset [89]. Based on prior recommendations [89], the individual startle probes were examined, and invalid trials (i.e., blinks in which there was excess noise, blinks which began prior to the latency window, or trials in which a spontaneous blink occurred immediately before the startle probe) were removed. Specifically, 5.47% of individual startles were deleted and treated as missing.
For each trial during Acquisition and Extinction, the FPS was calculated by subtracting the startle magnitude to the corresponding noise-alone trial from the startle magnitude to the conditioned stimulus (e.g., CS+ or CS−) trial. Thus, the FPS reflected the degree to which the reflexive startle response elicited by the startle probe was elevated when in the presence of the CS+ or CS−, relative to when no conditioned stimulus was present. For each block during Acquisition and Extinction, the mean FPS was calculated across all the trials in the block. The dark-enhanced startle response was calculated by subtracting the mean startle magnitude during the light condition of the dark-enhanced startle from the mean startle magnitude during the dark condition of the dark-enhanced startle. The baseline startle response was calculated by taking the mean startle magnitude across all 7 Habituation trials that occurred before the dark-enhanced startle.
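The startle difference scores described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' processing code: the function names, the example magnitudes, the pairing of each CS trial with the noise-alone trial at the same position in the block, and the choice to skip pairs with a removed trial are all assumptions made for the example.

```python
from statistics import mean

def fps_per_trial(cs_startle, na_startle):
    """CS-trial startle magnitude minus the paired noise-alone (NA) magnitude.

    None marks a trial that was removed as invalid; any pair with a
    missing member is skipped (an assumption for this sketch).
    """
    return [cs - na for cs, na in zip(cs_startle, na_startle)
            if cs is not None and na is not None]

def block_mean_fps(cs_startle, na_startle):
    """Mean FPS across the usable trials of one block (None if no usable trial)."""
    diffs = fps_per_trial(cs_startle, na_startle)
    return mean(diffs) if diffs else None

def dark_enhanced_startle(dark, light):
    """Mean startle in the dark condition minus mean startle in the light condition."""
    dark_valid = [x for x in dark if x is not None]
    light_valid = [x for x in light if x is not None]
    return mean(dark_valid) - mean(light_valid)

# Hypothetical magnitudes for one block of four CS+ and four NA trials.
cs_plus = [120.0, 95.0, None, 80.0]
noise_alone = [60.0, 55.0, 50.0, 40.0]
block_fps = block_mean_fps(cs_plus, noise_alone)   # mean of 60, 40, 40

# Hypothetical magnitudes for the dark and light conditions.
des = dark_enhanced_startle([100.0, 110.0, 90.0, 100.0], [80.0, 70.0, 90.0, 80.0])
```

A positive value in either score indicates a larger startle response under threat (CS or darkness) than in the comparison condition.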

Skin Conductance
Skin conductance was continuously recorded from two 5 mm Ag/AgCl electrodes filled with isotonic paste and attached to the palm of the non-dominant hand. The skin conductance data were acquired at a sampling rate of 1000 Hz, amplified, and digitized using the Galvanic Skin Response module of the Biopac MP150 (Biopac Systems, Inc., Aero Camino, CA, USA). The SCR to all the conditioned stimuli within a response window from 0.9 to 6 s after the stimulus onset was quantified by calculating the difference in the skin conductance level between the response peak and the response trough [73,90]. The SCR to each trial was square root transformed. Based on recommendations [73,90], each individual SCR trial was examined, and invalid SCRs (i.e., excessive noise) were treated as missing. SCR trials for which no detectable response occurred (i.e., the recorded amplitude was less than the minimum detectable amplitude of 0.2 µS) were treated as non-responses with an amplitude of 0. Applying these criteria, 6.93% of individual SCRs were deleted and treated as missing.
For each block during Acquisition, the SCR difference score was calculated by subtracting the mean SCR to the CS− trials from the mean SCR to the CS+ trials. Additionally, the SCR habituation was calculated by taking the mean SCR across all the Habituation trials before Acquisition.
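As an illustration of the SCR scoring rules above, the following Python sketch scores single trials and computes a CS+ minus CS− difference score. It is a hedged example only: the function names and the sample values are hypothetical, and applying the 0.2 µS criterion to the raw amplitude before the square-root transform is an assumption, since the text does not state the order of these two steps.

```python
import math
from statistics import mean

MIN_AMPLITUDE_US = 0.2  # minimum detectable amplitude stated in the text (microsiemens)

def score_scr(peak, trough):
    """Square-root-transformed SCR for one trial.

    None inputs mark a trial rejected for excessive noise; it stays missing.
    Amplitudes below the detection threshold are scored as non-responses (0).
    """
    if peak is None or trough is None:
        return None
    amplitude = peak - trough
    if amplitude < MIN_AMPLITUDE_US:
        return 0.0
    return math.sqrt(amplitude)

def scr_difference(cs_plus_scrs, cs_minus_scrs):
    """Mean SCR to CS+ trials minus mean SCR to CS- trials, ignoring missing trials."""
    plus = [s for s in cs_plus_scrs if s is not None]
    minus = [s for s in cs_minus_scrs if s is not None]
    return mean(plus) - mean(minus)

# Hypothetical (peak, trough) pairs in microsiemens for one Acquisition block.
cs_plus = [score_scr(1.5, 0.5), score_scr(4.5, 0.5), score_scr(None, None)]
cs_minus = [score_scr(1.0, 0.0), score_scr(0.6, 0.5)]
diff = scr_difference(cs_plus, cs_minus)
```

Keeping non-responses as zeros while dropping noisy trials, as the text describes, means the two kinds of trial affect the block mean differently: zeros pull the mean down, whereas missing trials are simply left out.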

Cardiography
Electrocardiography (ECG) was continuously recorded from three 11 mm Ag/AgCl electrodes filled with electrolyte gel and attached under each clavicle and on the left forearm. The ECG data were acquired at a sampling rate of 1000 Hz, amplified, and digitized using the Galvanic Skin Response module of the Biopac MP150 (Biopac Systems, Inc., Aero Camino, CA, USA). For all the physiological tasks, the heart rate and HRV features were extracted from 1 min intervals of continuous ECG data. Automated QRS detection was performed in MindWare [91], and errors were detected using visual screening and corrected manually. Based on recommendations [91], each 1 min segment of the ECG data was examined for erroneous R peaks and cardiac arrhythmia. Any segment with greater than 10% erroneous R peaks or greater than 2% cardiac arrhythmia was treated as missing. Applying these criteria, 15.20% of individual ECG segments were deleted and treated as missing.
The baseline heart rate and HRV were calculated as the mean of the heart rate and HRV across all the 1 min segments during Habituation prior to the dark-enhanced startle. The dark-enhanced heart rate and dark-enhanced HRV were calculated by subtracting the mean of the heart rate across all the 1 min segments during light and the HRV across all the 1 min segments during light, respectively, from the mean of the heart rate across all the 1 min segments during dark and the HRV across all the 1 min segments during dark, respectively. The heart rate during Acquisition and HRV during Acquisition were calculated by taking the average across all the 1 min segments of the heart rate during Acquisition and the HRV during Acquisition.

Statistical Analyses

Outcome Variables
Given the evidence from previous studies on the importance of considering temporal dynamics when using FPS to measure fear extinction in PTSD samples [18,19], we followed that precedent to operationalize fear extinction. Specifically, we examined the FPS during both early and late extinction as response variables. Early extinction was calculated as the mean FPS to the CS+ across the first two extinction blocks and late extinction as the mean FPS to the CS+ across the last two blocks of extinction [19].
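The two outcome variables can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name and the block values are hypothetical, and the input is assumed to be the per-block mean FPS to the CS+ in presentation order (four blocks, given 16 CS+ trials in blocks of 4).

```python
from statistics import mean

def early_late_extinction(block_fps):
    """Return (early, late) extinction scores from per-block mean CS+ FPS.

    Early = mean of the first two blocks; late = mean of the last two blocks.
    At least four blocks are required so that the two windows do not overlap.
    """
    if len(block_fps) < 4:
        raise ValueError("need at least four extinction blocks")
    early = mean(block_fps[:2])
    late = mean(block_fps[-2:])
    return early, late

# Hypothetical block means showing a decline in FPS across extinction.
early, late = early_late_extinction([80.0, 60.0, 40.0, 20.0])
```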

Predictor Variables
The predictor variables included clinical symptom measures, self-report measures, demographic characteristics, baseline psychophysiological measures, psychophysiological measures taken during the dark-enhanced startle task, and psychophysiological measures taken during fear Acquisition. See Table 2 for a complete list.

Analysis Pipeline
Figure 1 displays a schematic of our analysis pipeline. We split our sample of 125 participants into a training sample and a holdout sample. Following precedent [11], the 20% of the participants who most recently visited the lab were assigned to the holdout sample (i.e., a temporal validation). The predictor variables were z-transformed. The data preparation and model development only used data drawn from the training sample. The missing data were imputed separately for the training and holdout samples using the missForest [92] package in R [93]. For the 50 predictor variables in our models, the missingness rates were as follows: 12 variables with 0 missing, 25 variables with less than 5% missing, 5 variables with between 5 and 10% missing, and 10 variables with between 10 and 18% missing. For the outcome variables, all the participants had complete data.
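The temporal split described above might look like the following Python sketch. It is illustrative only and was not taken from the study code (which used R): the record layout, the field name visit_date, and both function names are hypothetical. The text does not say which statistics were used to z-transform the holdout predictors, so standardizing the holdout sample with the training mean and sample SD is an assumption of this example.

```python
from datetime import date, timedelta
from statistics import mean, stdev

def temporal_split(records, holdout_fraction=0.2):
    """Assign the most recent visits to the holdout sample; the rest form the training sample."""
    ordered = sorted(records, key=lambda r: r["visit_date"])
    n_holdout = round(len(ordered) * holdout_fraction)
    cut = len(ordered) - n_holdout
    return ordered[:cut], ordered[cut:]

def zscore_from_training(train_values, new_values):
    """Standardize both samples using the training mean and sample SD (assumption)."""
    mu, sd = mean(train_values), stdev(train_values)
    train_z = [(v - mu) / sd for v in train_values]
    new_z = [(v - mu) / sd for v in new_values]
    return train_z, new_z

# Hypothetical records: 125 participants with consecutive visit dates.
start = date(2020, 1, 1)
records = [{"id": i, "visit_date": start + timedelta(days=i)} for i in range(125)]
train, holdout = temporal_split(records)   # 100 training, 25 holdout

# Hypothetical predictor values for the two samples.
train_z, holdout_z = zscore_from_training([1.0, 2.0, 3.0], [4.0])
```

Splitting by visit date rather than at random means the holdout sample tests whether a model built on earlier participants generalizes to later ones.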

Comparing Predictive Models
For both of our response variables (i.e., early extinction and late extinction), we compared the performances of 4 different types of predictive models, each of which was implemented using the glmnet package in R [94] with the wrapper function provided by the caret package in R [95]. We implemented three different types of penalized regression model: least absolute shrinkage and selection operator (Lasso) regression, Ridge regression, and ENR. For comparison, we also applied conventional ordinary least squares (OLS) linear regression. To minimize the overfitting, we used repeated cross-validation with 10 folds and 100 repeats. Importantly, the cross-validation procedure ensured that all the predictions of the FPS during early extinction and the FPS during late extinction for all the participants were generated from models trained without using their own data. Within the cross-validation step, we used the caret package's resampling grid search to select the optimal tuning hyperparameters, alpha and lambda [95]. Specifically, for ENR, each combination of alpha and lambda was tested (from 0 to 1 by 0.05 increments) and the optimal values were selected (i.e., the values that minimized the MSE in the cross-validated training sample) [95]. The same procedure was applied for the Lasso and Ridge regression, with the exception that only the optimal value for lambda was identified, with alpha fixed at 1 for Lasso and at 0 for Ridge [95]. We compared the models' performances in the testing sample based on the cross-validated mean squared error (MSE). As a lower MSE indicates a lower predictive error, models with a lower MSE are more accurate predictive models. We also examined the mean absolute error (MAE), which measures the predictive error such that a lower MAE indicates less error and a higher predictive accuracy. Finally, we examined the R² (coefficient of determination) values based on the following formula: R² = 1 − [MSE/var(y)]. The R² value provides a measure of predictive accuracy on a standardized scale with a maximum score of 1, but it is not lower bounded [96]. An R² of 1 indicates a perfect predictive performance, an R² of 0 is equivalent to chance, and an R² < 0 indicates a worse predictive performance than chance.
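The three fit indices can be written out directly from their definitions. The Python sketch below is a minimal illustration, not the caret implementation: the function names and example values are hypothetical, and the variance convention is an assumption. It uses the population variance by default, so that predicting the mean for everyone gives an R² of exactly 0; R's var() divides by n − 1 instead, which can be selected with sample=True and shifts the result slightly.

```python
from statistics import pvariance, variance

def mse(y, y_hat):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def mae(y, y_hat):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def r_squared(y, y_hat, sample=False):
    """1 - MSE / var(y); has a maximum of 1 and no lower bound."""
    var_y = variance(y) if sample else pvariance(y)
    return 1.0 - mse(y, y_hat) / var_y

# Hypothetical observed and predicted outcome values.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 2.0, 2.5, 4.0]

fit_mse = mse(y, y_hat)
fit_mae = mae(y, y_hat)
fit_r2 = r_squared(y, y_hat)

# Predicting the observed mean for every case is the "chance" reference.
chance_r2 = r_squared(y, [2.5, 2.5, 2.5, 2.5])
```

Because the MSE can exceed var(y) when predictions are poor, R² computed this way can fall below 0.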
After the cross-validation, all the models were tuned on the entire training sample to derive the final model parameters, which were then used to predict the outcome for the holdout sample. The model performance in the holdout sample was evaluated as described in the previous paragraph.
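For reference, the roles of the two tuning hyperparameters can be made concrete with the elastic net objective in the parameterization used by glmnet for a continuous outcome. The symbols N (number of training participants), y_i (outcome), x_i (vector of standardized predictors), beta_0 (intercept), and beta (coefficients) are notation introduced here for illustration and do not appear in the text above.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2
+\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2+\alpha\,\lVert\beta\rVert_1\right]
```

Setting alpha = 1 leaves only the L1 term (Lasso), alpha = 0 leaves only the squared L2 term (Ridge), and intermediate values give ENR. Lambda scales the overall penalty, and lambda = 0 reduces the objective to ordinary least squares.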

Examining Specific Predictor Variables
For both early extinction and late extinction, we used 8 different criteria to identify the predictor variables: (1) we identified the variables that were significant univariate predictors of extinction at a Bonferroni-corrected threshold of p < 0.001, (2) we identified the variables that were significant univariate predictors at a nominal significance threshold of p < 0.05, (3) we identified the variables that were retained as predictors in Lasso, (4) we identified the variables that were retained as predictors in ENR, (5) we identified the variables that were statistically significant predictors when including the predictors retained by Lasso in a multivariable regression model and applying a Bonferroni correction based on the number of variables included in the model, (6) we identified the variables that were statistically significant predictors when including the predictors retained by Lasso in a multivariable regression model using a nominal significance threshold of p < 0.05, (7) we identified the variables that were statistically significant predictors when including the predictors retained by ENR in a multivariable regression model and applying a Bonferroni correction based on the number of variables included in the model, and (8) we identified the variables that were statistically significant multivariable predictors when including the predictors retained by ENR in a multivariable regression model using a nominal significance threshold of p < 0.05.
To evaluate criteria 1 and 2, we performed 100 univariate regressions in the whole study sample. Specifically, we performed 50 univariate regressions to predict early extinction and 50 univariate regressions to predict late extinction (i.e., one univariate regression per predictor variable). The predictor variables were z-transformed before their inclusion in the models. Participants missing the predictor variable for a given univariate model were dropped from the corresponding univariate regression. Criteria 3 and 4 were evaluated based on cross-validated Lasso and ENR performed in the training sample only. Criteria 5-8 were evaluated based on multivariable OLS regressions performed in the whole sample. For these multivariable OLS regressions, we used the R default setting, which dropped participants with missing data on any predictor from the model. The predictor and response variables were z-transformed before their inclusion in the models, such that the estimated regression coefficients were fully standardized and comparable across the predictors and responses. In addition to our 8 criteria, we also examined the coefficients applied to each variable within each model. Our decision to evaluate predictor significance at both an uncorrected threshold of p < 0.05 and a Bonferroni-corrected threshold was based on the importance of considering both type I and type II errors in exploratory analyses [97]. Currently, there is no clear consensus with regard to p-value correction in exploratory research [97]. While some have argued that exploratory analyses should always apply a Bonferroni correction to control for type I errors (e.g., [98][99][100]), others have argued that corrected p-values lead to excessive type II errors and have recommended that no correction be applied (e.g., [101][102][103][104]). Thus, to balance these considerations and maximize transparency, we reported the significance of the findings at both corrected and uncorrected thresholds.
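The univariate step described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses were run in R); the function names and the toy data are hypothetical, and it assumes that z-scoring is computed on the complete cases of each model after participants missing that predictor are dropped.

```python
from statistics import mean, stdev

def zscore(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def univariate_slope(predictor, outcome):
    """OLS slope of the outcome on a z-scored predictor.

    Participants whose predictor value is None are dropped first
    (complete-case analysis for this one model only).
    Returns (slope, number of participants used).
    """
    pairs = [(x, y) for x, y in zip(predictor, outcome) if x is not None]
    zx = zscore([x for x, _ in pairs])
    ys = [y for _, y in pairs]
    y_bar = mean(ys)
    # zx has mean 0, so the slope is sum(zx * (y - y_bar)) / sum(zx^2).
    slope = sum(a * (y - y_bar) for a, y in zip(zx, ys)) / sum(a * a for a in zx)
    return slope, len(pairs)

# Hypothetical data: the fifth participant is missing the predictor.
predictor = [1.0, 2.0, 3.0, 4.0, None]
outcome = [2.0, 4.0, 6.0, 8.0, 100.0]
slope, n_used = univariate_slope(predictor, outcome)
print(n_used, round(slope, 3))
```

Because the predictor is standardized, the slope is expressed per one standard deviation of the predictor, which is what makes coefficients comparable across predictors measured on different scales.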

Post Hoc Analyses
Although our study included the Multiscale Dissociation Inventory (MDI; [81]), this measure was not included in the primary analyses due to the excessive missing data in the training sample (see the Supplementary Materials for details). We initially found that the Dissociative Experiences Scale, and its depersonalization/derealization subscale, predicted early extinction. Following up on this finding, we performed post hoc analyses of the MDI and its subscales.
Finally, we performed a post hoc analysis of self-reported PTSD symptom severity using the PCL-5 [105]. We initially found that PTSD Cluster E scores on the CAPS predicted early extinction; this is consistent with two prior studies that used CAPS [18,20], but inconsistent with two prior studies that used a self-report measure called the PTSD Symptom Scale (PSS) [16,19]. Although we did not have PSS data in this study, we checked to see if a different self-report measure, the PCL-5, would yield a finding consistent with our CAPS finding.

Results
In the whole sample, the FPS levels were variable during early extinction (range = −37.56 to 236.9; mean = 52.54; standard deviation = 48.57) and late extinction (range = −27.07 to 147.73; mean = 33.46; standard deviation = 37.04). In the training sample, the FPS had a mean of 53.23 during early extinction (standard deviation = 48.96) and a mean of 32.51 during late extinction (standard deviation = 35.49). In the holdout sample, the FPS had a mean of 49.96 during early extinction (standard deviation = 48.01) and a mean of 37.12 during late extinction (standard deviation = 43.22). See Supplementary Figures S1 and S2 for the distributions of the FPS during early extinction and late extinction.

Early Extinction
To predict the FPS during early extinction, the 10-fold repeated cross-validation procedure indicated that the optimal tuning parameters were as follows: Lasso Regression (alpha 1, lambda 3.59); Ridge Regression (alpha 0, lambda 40.37); and ENR (alpha 0.1, lambda 23.55). In the training sample, the cross-validation results indicated that the model with the lowest prediction error was ENR (MSE = 1296.41; MAE = 28.30; and R² = 0.50). Likewise, the model with the lowest prediction error in the holdout sample was ENR (MSE = 726.91; MAE = 21.29; and R² = 0.57). The Conventional OLS Linear Regression was the model with the highest prediction error in both the cross-validated training sample and holdout sample, and was less accurate than chance in the holdout sample (R² = −0.29). For a comparison of the fit indices across all the models for predicting early extinction in the cross-validated training and holdout samples, see Table 3.
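For readers less familiar with the two tuning parameters, the reported values (alpha 1 for Lasso, alpha 0 for Ridge, and an intermediate alpha for ENR) are consistent with the widely used glmnet-style parametrization of the elastic net. The software is not named in this section, so the following objective is stated under that assumption, with N participants, outcome y_i, predictor vector x_i, intercept β_0, and coefficient vector β:

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^{2}
\;+\;\lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
+\alpha\,\lVert\beta\rVert_1\Bigr]
```

Under this form, lambda sets the overall strength of the penalty and alpha sets the mix between the ridge (squared L2) and lasso (L1) terms. Only the L1 term can shrink coefficients exactly to zero, which is why Lasso and ENR, but not Ridge, can be said to retain or drop predictors.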

Late Extinction
To predict the FPS during late extinction, the 10-fold repeated cross-validation procedure indicated that the optimal tuning parameters were as follows: Lasso Regression (alpha 1, lambda 2.98); Ridge Regression (alpha 0, lambda 48.63); and ENR (alpha 0.1, lambda 17.50). In the training sample, the cross-validation results indicated that the model with the lowest prediction error was ENR (MSE = 966.55; MAE = 23.60; and R² = 0.29). However, the model with the lowest prediction error in the holdout sample was Lasso Regression (MSE = 2037.19; MAE = 32.18; and R² = 0.29). The Conventional OLS Linear Regression was the model with the highest prediction error in both the cross-validated training sample and holdout sample, and was less accurate than chance in the holdout sample (R² = −0.61). For a comparison of the fit indices across all the models for predicting late extinction in both the cross-validated training sample and holdout sample, see Table 4.
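The three fit indices reported above can be sketched in a few lines of Python. This is an illustration only: the function names and data are hypothetical, and it assumes that R² was computed as one minus the ratio of residual to total sum of squares, which is the definition that allows the negative values reported for OLS in the holdout samples (a squared correlation could not be negative).

```python
from statistics import mean

def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; negative when predictions are worse than the mean."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical outcome values and two sets of predictions.
observed = [10.0, 20.0, 30.0, 40.0]
close_fit = [12.0, 18.0, 33.0, 39.0]
poor_fit = [40.0, 10.0, 50.0, 0.0]

print(mse(observed, close_fit), mae(observed, close_fit))
print(r_squared(observed, close_fit) > 0, r_squared(observed, poor_fit) < 0)
```

Under this definition, a model that always predicts the observed mean scores exactly 0, so a negative R² indicates predictions that are worse than that constant baseline.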

Early Extinction
Table A1 shows the variables that met at least one criterion used to identify the predictors of early extinction. Dark-enhanced startle was the only variable that met all eight criteria. No variable met seven or six out of the eight criteria. Four variables met five out of the eight criteria: the Depersonalization and Derealization subscale of the Dissociative Experiences Scale, the Severity of CAPS Cluster E Symptoms (i.e., Alterations in Arousal and Reactivity), the FPS to the CS+ during block 1 of the Acquisition, and the FPS to the CS+ during block 3 of the Acquisition. Four variables met four out of the eight criteria: Baseline Startle, the FPS to the CS+ during block 2 of the Acquisition, the FPS to the CS− during block 2 of the Acquisition, and the FPS to the CS− during block 3 of the Acquisition. Three variables met three out of the eight criteria: the total score on the Pittsburgh Sleep Quality Index, the Physical Neglect subscale on the Childhood Trauma Questionnaire, and female sex. Five variables met two out of the eight criteria, ten variables met one out of the eight criteria, and twenty-three variables did not meet any of the eight criteria.
Across the 50 univariate regression analyses performed on the whole sample to predict the FPS during early extinction, 7 were significant at the Bonferroni-corrected significance threshold of p < 0.001, and an additional 10 were significant only at the uncorrected (nominal) significance threshold of p < 0.05. Across the 50 potential predictor variables included in the cross-validated training sample, Lasso selected 14 predictor variables and ENR selected 23 predictor variables. When including the 14 predictor variables selected by Lasso in a multivariable regression model using the whole sample, 1 variable was significant at the Bonferroni-corrected significance threshold of p < 0.00357 (0.05/14), and an additional 3 variables were significant only at the uncorrected significance threshold of p < 0.05. When including the 23 predictor variables selected by ENR in a multivariable regression model using the whole sample, 1 variable was significant at the Bonferroni-corrected significance threshold of p < 0.00217 (0.05/23), and an additional 6 variables were significant only at the uncorrected significance threshold of p < 0.05. For detailed statistics from the univariate and multivariable regression models, see Supplementary Tables S1 and S2. For a heatmap comparing the coefficient weights of all 50 predictors using a simple univariate regression and the four cross-validated machine learning models, see Supplementary Figure S3.
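The Bonferroni-corrected thresholds quoted above follow directly from dividing 0.05 by the number of tests or model terms. A minimal Python sketch of that arithmetic, with hypothetical function names and labels, is:

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def classify(p_value, n_tests, alpha=0.05):
    """Label a p-value against the corrected and nominal thresholds."""
    if p_value < bonferroni_threshold(n_tests, alpha):
        return "corrected"
    if p_value < alpha:
        return "nominal only"
    return "not significant"

# 50 univariate tests, 14 Lasso-selected terms, 23 ENR-selected terms.
for n in (50, 14, 23):
    print(n, round(bonferroni_threshold(n), 5))
```

The same p-value can therefore fall on different sides of the corrected threshold depending on how many tests the correction counts, which is why the text reports each threshold alongside its denominator.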

Late Extinction
Table A2 shows the variables that met each criterion used to identify the predictors of late extinction. Across the eight criteria, zero variables met all eight criteria, and zero variables met seven criteria. Baseline startle was the only variable to meet six criteria. Two variables met five criteria: the FPS to the CS+ during block 2 of the Acquisition and the FPS to the CS− during block 3 of the Acquisition. The CAPS Cluster C score was the only variable to meet four criteria. Four variables met three criteria: the Physical Neglect Subscale of the Childhood Trauma Questionnaire, dark-enhanced startle, the FPS to the CS− during block 1 of the Acquisition, and the FPS to the CS− during block 2 of the Acquisition. Ten variables met two criteria, four variables met one criterion, and twenty-eight variables met zero criteria.
Across the 50 univariate regression analyses performed on the whole sample to predict the FPS during late extinction, 4 were significant at the Bonferroni-corrected significance threshold of p < 0.001, and an additional 5 were significant only at the uncorrected significance threshold of p < 0.05. Across the 50 potential predictor variables included in the cross-validated training sample, Lasso selected 14 predictor variables and ENR selected 22 predictor variables. When including the 14 predictor variables selected by Lasso in a multivariable regression model using the whole sample, no variables were significant at the Bonferroni-corrected significance threshold of p < 0.00357 (0.05/14), but 2 variables were significant at the uncorrected threshold of p < 0.05. When including the 22 predictor variables selected by ENR in a multivariable regression model using the whole sample, no variables were significant at the Bonferroni-corrected significance threshold of p < 0.00227 (0.05/22), but 5 variables were significant at the uncorrected threshold of p < 0.05. For detailed statistics from the univariate and multivariable regression models, see Supplementary Tables S3 and S4. For a heatmap comparing the coefficient weights of all 50 predictors using a simple univariate regression and the four cross-validated machine learning models, see Supplementary Figure S4.

Post Hoc Analyses Early Extinction
The univariate regression analysis of the Multiscale Dissociation Inventory (MDI) found an association between the MDI total score and early extinction that was significant at the p < 0.05 level, but would not have survived correction for multiple comparisons (B = 0.012, p = 0.03081). Across the six univariate regression analyses examining the six MDI subscales, two were significant at the p < 0.05 level, but not at the multiple comparison threshold: MDI Depersonalization (B = 0.051, p = 0.02486) and MDI Disengagement (B = 0.056, p = 0.01914).
Across the four univariate regression analyses examining the four PCL-5 PTSD symptom cluster scores, only cluster E was significant (B = 0.052, p = 0.00692). This finding survived correction for multiple comparisons across the four symptom clusters (0.05/4 = 0.01250), but would not have survived correction across all the univariate regressions used to examine early extinction.

Discussion
Identification of the statistical modeling approaches and patient characteristics that predict fear extinction in PTSD may eventually inform advances in precision extinction-based treatments [3,15]. Building on prior evidence that penalized regression modeling may increase the predictive accuracy in clinical research [11,65,67], we compared the accuracy of fear extinction predictions from three types of penalized regressions and traditional OLS regression in a cross-validated training sample and holdout sample. In line with our first hypothesis, all three penalized regression models were more accurate than the OLS regression in both samples. In line with our second hypothesis, the startle variables were more likely to be selected as predictors relative to the non-startle variables. Exploratory comparisons between the patient characteristics highlight three consistent predictors of early extinction: dark-enhanced startle, trait depersonalization/derealization, and PTSD hyperarousal symptom severity. Overall, our study yields novel insights into which modeling approaches and patient characteristics may reliably predict fear extinction in PTSD.

Modeling Approaches
The model comparisons indicated that the penalized regressions predicted fear extinction with greater accuracy than the conventional (OLS) regression. Based on the MSEs for the models predicting early extinction, the OLS regression had 85% more predictive error than the least accurate penalized regression in the cross-validated training sample (2454.13 − 1325.76 = 1128.37; 1128.37/1325.76 = 0.8511, or 85.11%). In the holdout sample, the OLS regression had 394% more predictive error than the least accurate penalized regression. Similarly, for the models predicting late extinction, the level of error for the OLS regression was more than double that for the least accurate penalized regression in both samples. In summary, all three penalized regression models were substantially more accurate and more generalizable to a holdout sample than the conventional regression. In contrast, the difference in the MSEs between the most and least accurate penalized regression models was within 4% in both samples and during both phases. Thus, the three penalized regressions had relatively comparable predictive performances [106]. Overall, these results suggest that penalized regressions may hold promise for helping to develop clinically useful predictions of exposure therapy responses in PTSD. However, treatment studies are needed to test this theory directly.
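The relative-error comparison in this paragraph reduces to a single ratio. As a minimal sketch (the function name is hypothetical; the two MSE values are the ones quoted above for early extinction in the cross-validated training sample):

```python
def percent_excess_error(mse_reference, mse_other):
    """How much larger mse_other is than mse_reference, in percent."""
    return (mse_other - mse_reference) / mse_reference * 100

# Least accurate penalized regression vs. OLS, training sample, early extinction.
excess = percent_excess_error(mse_reference=1325.76, mse_other=2454.13)
print(f"{excess:.1f}%")
```

Expressing the gap relative to the least accurate penalized model gives a conservative comparison, since any other penalized model would yield a larger percentage.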

Predictor Variables
Our study is the first to demonstrate that adults with PTSD symptoms who exhibit heightened, unconditioned fear in an anxiety-inducing context also display deficient conditioned fear extinction learning. Across the 50 variables examined during early extinction, dark-enhanced startle was the only variable identified as a predictor across all eight criteria in our study. Notably, the effect of dark-enhanced startle on early extinction was significant in the multivariable models that controlled for baseline startle and the FPS during Acquisition, suggesting that it has an effect above and beyond individual differences in general startle reactivity and conditioned fear before starting extinction learning. One possible explanation is that early extinction in an uninstructed paradigm like ours, where participants are not explicitly told that the CS+ will not be followed by the US during extinction, has been found to partially capture a participant's response to an uncertain threat [107]. Because dark-enhanced startle falls under the RDoC construct of a potential threat ("anxiety") [108][109][110], our findings may suggest that individual differences in response to these potential threats partially modulate early extinction. Although fear conditioning falls under the RDoC construct of an acute threat ("fear") [109,111], it has been proposed that the RDoC domains of potential and acute threats conceptually overlap within a higher-order internalizing dimension [112]. Our finding that dark-enhanced startle consistently and robustly predicted early extinction aligns with this theory. Importantly, prior treatment studies have found that conditioned physiological responses to trauma-related threats could be valuable for developing personalized exposure therapies for trauma-induced psychopathology [3,5]. Our finding suggests that an elevated startle in an unconditioned anxiety-inducing context, measured before treatment, may have additional utility in identifying trauma-exposed patients who are likely to have difficulty extinguishing fear during exposure therapy. However, clinical treatment studies are needed to test this hypothesis directly.
Our study is also the first to show that trauma-exposed individuals with elevated dissociation, specifically depersonalization and derealization, may experience deficient fear extinction learning. Among the 50 variables examined during early extinction, the depersonalization and derealization subscale of the Dissociative Experiences Scale emerged as one of the two non-startle variables predicting extinction across all the modeling approaches. Our finding that the depersonalization subscale of the MDI was also associated with early extinction increases confidence in this result and extends it to a clinical measure of dissociation that has previously been found to be relevant to PTSD treatment [113], physiology [37,114,115], and clinical presentation [116]. Elevated dissociation has been theorized to hinder safety learning in PTSD, leading to heightened fear responses to nonthreatening stimuli [37]. Our study supports this theory, suggesting that individuals prone to dissociation may be less attentive during early extinction, and therefore more likely to experience a delay in learning that the CS+ no longer signals danger. Additionally, we found that the disengagement subscale of the MDI was also associated with an elevated FPS during early extinction in the univariate regression. However, it is important to note that these univariate and multivariable effects did not survive the Bonferroni correction, emphasizing the need for replication in larger samples.
Our finding that PTSD hyperarousal symptoms predicted the FPS during early extinction partially aligns with the prior literature and may have clinical implications. Prior evidence has indicated that PTSD patients with elevated arousal-related symptoms may benefit from tailored treatment approaches designed to address these specific symptoms (for review, see [117]). Therefore, our finding suggests that targeted treatments for PTSD patients with elevated hyperarousal should account for the possibility of delayed or deficient fear extinction. In line with this finding, a previous study by Galatzer-Levy et al. (2017) found that a statistically identified latent subgroup of trauma-exposed adults who had elevated FPSs to a CS+ during early extinction also had elevated DSM-IV hyperarousal symptoms [18]. Similarly, Richards et al. (2022) found that a higher FPS across both conditioned stimuli (CS+ and CS− combined) during early extinction was correlated with elevated DSM-IV hyperarousal symptoms [20]. However, two prior studies found that DSM-IV intrusion, but not hyperarousal symptoms, were associated with the FPS to a CS+ during early extinction [16,19], contrasting with our findings. A post hoc analysis of our data found that the association of early extinction with PTSD hyperarousal symptoms (and no other symptom clusters) was consistent across two measures of PTSD symptoms, suggesting that these divergent findings may stem from sample heterogeneity, rather than measurement differences (see the Supplementary Materials for additional details).

Methodological Considerations
Our findings suggest that it may be more challenging to identify the modeling approaches and clinical characteristics that robustly predict late extinction relative to early extinction. Overall, our machine learning prediction models had worse accuracy and generalizability for late extinction. Although the MSE and MAE could not be compared across the different outcome variables, the coefficient of determination (R²) provided a standardized measure of the model performance relative to chance [106]. A comparison of the R² values across the penalized regression models suggests that the predictions were more precise in the training sample for early extinction (R² range 0.48-0.50) versus late extinction (R² range 0.28-0.29). Similarly, in the holdout sample, the R² values were higher for early extinction (R² range 0.53-0.57) versus late extinction (R² range 0.25-0.29). A similar pattern extended to the OLS models (see Tables 3 and 4). Additionally, there were fewer consistent predictor variables for late extinction compared to early extinction. For example, across all the 50 variables tested, the average number of the predictor criteria met was 1.46 for early extinction versus 1.12 for late extinction. When excluding the startle variables, this difference increased, with the average number of the predictor criteria being 1.12 for early extinction versus 0.5 for late extinction. This aligns with previous research, indicating that the relationship between fear extinction and clinical variables is influenced by the temporal dynamics and operationalization of fear extinction [9,18]. Therefore, future studies focusing on clinical correlates and predictors of fear extinction may be more likely to find an effect during early extinction, where there is generally a greater variability.
Although intuitive, the higher likelihood of finding predictors when using startle versus non-startle variables underscores two critical challenges for FPS studies of fear extinction: (1) the importance of controlling for differences in general startle reactivity [57], and (2) the difficulty in identifying consistent relationships across different measurement methods [6]. For both early and late extinction, the startle variables met an average number of 10 predictor criteria, while the non-startle variables (clinical, self-reported, and demographic variables) met only 0.81 criteria. Across the 13 non-startle physiological variables (i.e., heart rate, HRV, and SCR variables), none met more than 2 out of the 8 predictor criteria, indicating a lack of consistent physiological predictors that were not startle variables. In contrast, the Lasso and ENR models for early and late extinction did retain multiple non-startle physiological measures as predictors. Further, each modality of psychophysiological measure examined (heart rate, HRV, and SCR) was retained in at least one cross-validated machine learning model, suggesting that these measures may still have contributed meaningfully to the variance in the FPS during extinction. Overall, these observations are consistent with prior evidence that individual differences in fear extinction may result from the combined effects of numerous individual difference variables, with small but meaningful individual impacts [6]. Furthermore, the finding that most candidate predictors were only identified using a subset of criteria and modeling approaches contributes to the growing evidence that methodological differences can lead to inconsistent findings in fear extinction research [9,23,118,119].

Limitations and Future Directions
Our study's limitations need to be considered when interpreting its results. Statistical power has been a concern in psychophysiology research, especially when investigating individual differences and conducting multiple comparisons [120,121]. Our study was not well-powered for detecting small effects that may be reflected in the broader PTSD population. As a result, although most variables examined in our study were not consistent predictors of extinction, this does not mean that a consistent effect would not be found in a larger sample. Thus, the small sample size limits the generalizability of the findings, which will require replication in larger samples. Moreover, our sample was predominantly white and female, limiting the generalizability of our findings to samples with different racial and sex compositions. To address this limitation, replication with more diverse samples remains an important future direction. Further, our limited statistical power precluded us from examining modeling approaches that account for interaction effects [106]. Because the relationships between fear extinction and many of the variables in our study are likely to be complex and interactive [6], we propose that future studies with larger samples should build on this work. For example, future studies may expand upon the methodological framework employed in this study by including interaction effects within a penalized regression framework and by exploring machine learning approaches, such as decision trees, designed to identify these interactions. Furthermore, it is worth noting that, while FPS is a promising translational measure of conditioned fear in PTSD samples [72], it indexes only one facet of the fear response [73]. The variables that were not consistent predictors of the FPS during extinction in our study may be consistent predictors of other fear extinction measures, such as amygdala activity or subjective fear [73]. Thus, we propose that future studies extend this work by applying penalized regressions with supplemental univariate and multivariable regressions to identify the consistent predictors of other fear extinction measures.

Conclusions
In summary, we conducted a series of cross-validated penalized regressions, cross-validated OLS regressions, and multivariable- and univariate-regression-based significance tests to identify the modeling approaches and participant characteristics that predict fear extinction in traumatized adults with a continuum of PTSD symptoms. The penalized regressions outperformed the conventional OLS regression during both the early and late extinction of the FPS, as demonstrated in the training and holdout samples. We identified two novel predictors of early extinction: dark-enhanced startle and trait depersonalization/derealization. Additionally, we extended the previous findings that arousal-related PTSD symptom severity may predict early extinction. Future studies are needed to replicate and extend these findings, particularly regarding their clinical implications. Despite its limitations, our study demonstrates the effectiveness of penalized regressions and offers valuable insights into predicting fear extinction in PTSD samples. In time, this line of work may inform the development of precision extinction therapies for individuals with post-traumatic stress.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/brainsci13081131/s1. Supplementary Background provides a more detailed literature review on factors predicting fear extinction. Supplementary Methods provide more details regarding clinical and self-report measures included in this study. Supplementary Tables show beta weights and p-values from all univariate regression models. Supplementary Figures display the distribution of FPS during early extinction and late extinction in the full, training, and holdout samples. Supplementary Discussion provides more detailed interpretation of post hoc analyses of PTSD symptoms using the PCL-5. Supplementary References list those that appear in the supplement in addition to those in the manuscript.

Figure 1. Schematic of the analysis pipeline (adapted from [11]). The total sample (n = 125) was split into a training sample (80%) and a holdout validation sample (20%). We compared the performance of 3 machine learning algorithms (in addition to conventional linear regression) via 100 iterations of 10-fold cross-validation. The best-performing model (lowest mean squared error; MSE) was tuned and implemented (without modification) in the holdout validation sample.
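The resampling scheme in this caption can be sketched as index bookkeeping in Python. This is a hypothetical illustration rather than the authors' pipeline: it assumes a simple random (unstratified) 80/20 split, reads "100 iterations of 10-fold cross-validation" as 100 reshuffled repeats, and uses made-up function names and seeds.

```python
import random

def train_holdout_split(n, train_fraction=0.8, seed=0):
    """Randomly assign participant indices to training and holdout sets."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]

def repeated_kfold(indices, k=10, repeats=100, seed=0):
    """Yield (fit, validation) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i, validation in enumerate(folds):
            fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield fit, validation

training, holdout = train_holdout_split(125)
splits = list(repeated_kfold(training))
print(len(training), len(holdout), len(splits))
```

The holdout indices never enter the cross-validation loop, which is what allows the holdout error to serve as a check on generalizability.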

Table 1. Sample demographic and clinical characteristics.

Table 2. Candidate predictors and outcome measures examined in this study, including estimates of internal consistency for multi-item scales and psychophysiological measures.

Table 3. Performance of algorithms predicting early extinction in the (A) cross-validated training sample, and (B) holdout sample.

Table 4. Performance of algorithms predicting late extinction in the (A) cross-validated training sample, and (B) holdout sample.