Neural Networks for Early Diagnosis of Postpartum PTSD in Women after Cesarean Section

Featured Application: Early diagnosis and warning mechanisms are essential in every health condition. The research described in this paper can provide the means for the development of medical assistance applications.

Abstract: The correlation between the kind of cesarean section and post-traumatic stress disorder (PTSD) in Greek women after a traumatic birth experience has been recognized in previous studies along with other risk factors, such as perinatal conditions and traumatic life events. Data from early studies have suggested some possible links between some vulnerable factors and the potential development of postpartum PTSD. The classification of each case in three possible states (PTSD, profile PTSD, and free of symptoms) is typically performed using the guidelines and the metrics of the version V of the Diagnostic and Statistical Manual of Mental Disorders (DSM-V) which requires the completion of several questionnaires during the postpartum period. The motivation in the present work is the need for a model that can detect possible PTSD cases using a minimum amount of information and produce an early diagnosis. The early PTSD diagnosis is critical since it allows the medical personnel to take the proper measures as soon as possible. Our sample consists of 469 women who underwent emergent or elective cesarean delivery in a university hospital in Greece. The methodology which is followed is the application of random decision forests (RDF) to detect the most suitable and easily accessible information which is then used by an artificial neural network (ANN) for the classification. As is demonstrated from the results, the derived decision model can reach high levels of accuracy even when only partial and quickly available information is provided.


Introduction
Post-traumatic stress disorder (PTSD) is a mental health problem that can develop after a person goes through a life-threatening event. The disorder can also develop when the person witnesses such an event, is exposed to it through information, or experiences extreme or repeated exposure in the workplace [1]. Regardless of the type of exposure to trauma, the disorder causes symptoms of re-experiencing, avoidance, negative cognitions and mood, and arousal. The duration of symptoms lasts more than a month, not due to the action of any substance

Participants
The participants were all postpartum women who gave birth by one of the 2 types of CS and gave their written consent for their participation. A total of 469 postpartum women were examined in this research. For each case, several demographic, prenatal health, and mental health variables were collected through questionnaires that were completed in interviews during hospitalization in the departments and 6 weeks later. The exclusion criteria of the research were difficulties at a cognitive level, speaking a language other than Greek, and being an underage mother.

Data and Measures
The data were collected in 2 stages: the first stage was the 2nd day after CS, and the second stage was the 6th week after CS. During the first stage, from the 469 women, we collected medical and demographic data from the socio-demographic questionnaire, past traumatic life events from the Life Events Checklist-5 (LEC-5) of DSM-V, and Criterion A from the adapted first criterion of PTSD. At the second stage, the PTSD symptoms were collected with the Post-Traumatic Stress Checklist (PCL-5) of DSM-V. The dataset that was used can be found at https://users.uowm.gr/chorovas/appsci/nn_ptsd.html (accessed on 20 June 2022).
The life events checklist (LEC) is the only measure with which individuals can indicate different levels of exposure to a traumatic event in their lives [25]. For a PTSD diagnosis, 8 criteria must be met. For the first criterion (Criterion A), the individual must have been exposed to death, threatened death, serious injury, or sexual violence in one of the following ways: (a) direct exposure, (b) witnessing the event, (c) learning about the event, and (d) exposure in the workplace [26]. For this study, Criterion A was adjusted accordingly. The post-traumatic stress checklist (PCL-5) is a self-report scale, which was developed to measure and evaluate PTSD and PTSD profile symptoms [1,27]. In the present study, the postpartum women replied via telephone to 20 questions during the 6th postpartum week, corresponding to 20 symptoms of the criteria B (re-experiencing), C (avoidance), D (negative thoughts and feelings), and E (arousal and reactivity). All replies are scored on 5-point scales (range zero to four). A score of one or more in the categories of criteria B and C and two or more in categories D and E is considered to indicate PTSD symptoms. Depending on the symptoms, the postpartum women were given (a) a provisional diagnosis of PTSD or (b) a PTSD profile [27,28].
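The symptom rule above can be illustrated with a short sketch. It is not the scoring code used in the study. The grouping of items into clusters follows the standard PCL-5 layout (items 1-5 for B, 6-7 for C, 8-14 for D, 15-20 for E), the required counts 1, 1, 2, 2 are one reading of the rule stated above, and the rating from which an item counts as a symptom (`min_rating`, set here to 2) is an assumption that the text does not state. The rule that separates a PTSD profile from a provisional diagnosis is not defined in the text and is therefore left out.

```python
# PCL-5 item numbers (1-based) grouped by symptom cluster.
CLUSTERS = {
    "B": range(1, 6),    # re-experiencing: items 1-5
    "C": range(6, 8),    # avoidance: items 6-7
    "D": range(8, 15),   # negative thoughts and feelings: items 8-14
    "E": range(15, 21),  # arousal and reactivity: items 15-20
}
# Minimum number of endorsed symptoms per cluster (one reading of the text).
REQUIRED = {"B": 1, "C": 1, "D": 2, "E": 2}


def endorsed_counts(answers, min_rating=2):
    """Count endorsed symptoms per cluster.

    answers: list of 20 integers in 0..4, one per PCL-5 item.
    min_rating: ASSUMPTION, the rating from which an item counts as a symptom.
    """
    if len(answers) != 20 or any(a not in range(5) for a in answers):
        raise ValueError("expected 20 answers scored 0-4")
    return {
        name: sum(1 for item in items if answers[item - 1] >= min_rating)
        for name, items in CLUSTERS.items()
    }


def meets_symptom_criteria(answers, min_rating=2):
    """True if every cluster reaches its required number of endorsed symptoms."""
    counts = endorsed_counts(answers, min_rating)
    return all(counts[name] >= REQUIRED[name] for name in REQUIRED)


# Hypothetical respondent, for illustration only.
answers = [0] * 20
answers[0] = 3                     # item 1 (B)
answers[5] = 2                     # item 6 (C)
answers[7], answers[8] = 2, 4      # items 8 and 9 (D)
answers[14], answers[19] = 2, 3    # items 15 and 20 (E)
print(endorsed_counts(answers), meets_symptom_criteria(answers))
```

Keeping the threshold as a parameter makes it easy to check how sensitive the resulting labels are to that assumption.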
The demographics, prenatal health, and mental health variables that were collected are presented in Tables 1-3 (with statistical tests). In total, 70 data fields were available for each case, as shown in Table 4. As mentioned in Section 1, a diagnostic model that could give an early indication of a possible PTSD case using a minimum amount of information could be very useful for preparing the health personnel for such a scenario, so that appropriate measures could be taken in advance. With this in mind, we initially trained an artificial neural network (ANN) [18,23] with all the available information to check whether the traditionally confirmed diagnosis could be replicated. Since that was easily achieved by a two-layered feed-forward ANN (Table 5), the focus moved to finding a proper subset of the data that could still achieve high classification accuracy. Random forest classification [29] was performed with the initial set of 70 data fields (variables). The goal was to derive Gini importance values [30] that could assist with the selection of the proper subset of variables. The criteria for the selection of these variables were how directly available they are and how few questions need to be asked to obtain them. This procedure produced the datasets that we used to train the ANN models. A schematic diagram of the above processing is depicted in Figure 1. The corresponding results and additional details of the above methodology are presented in the following section.
Table 4. The total of 70 available data fields (columns: Number of Data Fields, Coded Labels).

Initial Classification Using the ANN
As mentioned above, the complete dataset was used initially to examine whether the original classification according to the DSM-V could be reproduced.
Of the 469 cases in the collected data, 379 (80.81%) were manually diagnosed as free of symptoms, 34 (7.24%) had traces and were characterized as profile, and 56 (11.94%) were diagnosed as PTSD cases. For the training and testing phases, a stratified ten-fold cross-validation scheme was employed.
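The stratified ten-fold scheme can be sketched as follows. This is an illustration in plain Python, not the splitting code used in the study: the function name `stratified_folds`, the class labels, and the seed are assumptions. Only the class sizes (379, 34, 56) are taken from the text.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of sample indices with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds,
        # continuing where the previous class stopped to balance fold sizes.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


# Class sizes reported in the text: 379 free of symptoms, 34 profile, 56 PTSD.
labels = ["free"] * 379 + ["profile"] * 34 + ["ptsd"] * 56
folds = stratified_folds(labels, k=10)
for test_fold in folds:
    held_out = set(test_fold)
    train_indices = [i for i in range(len(labels)) if i not in held_out]
    # A model would be trained on train_indices and evaluated on test_fold here.
```

With only 34 profile cases, each test fold holds three or four of them, which is worth keeping in mind when reading per-class results for that class.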
The ANN was created using the PyTorch (v1.9.0 + cu11) library in Python. It had seventy input units (for the complete set of data fields shown in Table 4), six hidden units, and three output units, where exactly one of the three output bits was set to "1" to indicate the diagnosis (one-hot coding). The connections were feed-forward from one layer to the next, the sigmoid function (with α = 1.0) was used for activation, and the mean squared error (MSE) was minimized by the stochastic gradient descent (SGD) optimization algorithm during training. The learning rate was set to 1.0 and the momentum to 0.9. The hyperparameters were tuned on a trial-and-error basis after several initial experiments.
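A minimal sketch of the architecture described above, written in plain Python so that it runs without any dependency. It is not the authors' PyTorch code: the class `TinyMLP`, the helper `one_hot`, the weight initialization range, the random input, and the single-sample update are illustrative assumptions. The update rule uses the common convention v = momentum * v + gradient, w = w - lr * v, which is assumed here to correspond to the SGD with momentum mentioned in the text.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def one_hot(class_index, n_classes=3):
    return [1.0 if i == class_index else 0.0 for i in range(n_classes)]


class TinyMLP:
    """Feed-forward net: n_in inputs -> n_hidden sigmoid units -> n_out sigmoid units."""

    def __init__(self, n_in=70, n_hidden=6, n_out=3, seed=0):
        rng = random.Random(seed)
        # Each row holds the incoming weights of one unit; the last entry is its bias.
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        # Momentum buffers with the same shapes as the weights.
        self.v1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.v2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]

    @staticmethod
    def _layer(weights, inputs):
        # zip stops at len(inputs), so the bias (last entry) is added separately.
        return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1]) for row in weights]

    def forward(self, x):
        hidden = self._layer(self.w1, x)
        return hidden, self._layer(self.w2, hidden)

    @staticmethod
    def _update(weights, velocity, deltas, inputs, lr, momentum):
        extended = list(inputs) + [1.0]  # the constant 1.0 feeds the bias
        for row, vel, delta in zip(weights, velocity, deltas):
            for i, value in enumerate(extended):
                vel[i] = momentum * vel[i] + delta * value
                row[i] -= lr * vel[i]

    def train_step(self, x, target, lr=1.0, momentum=0.9):
        """One momentum-SGD step on the MSE of a single sample; returns the loss before the update."""
        hidden, out = self.forward(x)
        n_out = len(out)
        loss = sum((o - t) ** 2 for o, t in zip(out, target)) / n_out
        # Gradients of the loss with respect to the pre-activations.
        d_out = [2.0 * (o - t) / n_out * o * (1.0 - o) for o, t in zip(out, target)]
        d_hid = [
            h * (1.0 - h) * sum(d_out[k] * self.w2[k][j] for k in range(n_out))
            for j, h in enumerate(hidden)
        ]
        self._update(self.w2, self.v2, d_out, hidden, lr, momentum)
        self._update(self.w1, self.v1, d_hid, x, lr, momentum)
        return loss


# Hypothetical input: one case with 70 random feature values, labelled as class 2.
rng = random.Random(1)
x = [rng.random() for _ in range(70)]
target = one_hot(2)
net = TinyMLP()
losses = [net.train_step(x, target) for _ in range(50)]
_, out = net.forward(x)
print(losses[0], losses[-1], out)
```

The predicted class is the index of the largest of the three outputs, which mirrors the one-hot coding of the targets.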
Initially, we estimated precision, recall, specificity, and accuracy for the complete set of the 70 variables from the confusion matrices; these are presented in Tables 5 and 6. Precision estimates how many positive predictions were correct. Recall estimates how many positives are correctly predicted, while specificity estimates how many negatives are correctly predicted. Precision is calculated as the fraction TP/(TP + FP), recall (sensitivity) as TP/(TP + FN), specificity as TN/(TN + FP), and the total accuracy as (TP_1 + TP_2 + TP_3)/(P_1 + P_2 + P_3), where TP, FP, TN, and FN are the true positives, false positives, true negatives, and false negatives, respectively, and the subscripts index the three classes. The results for both phases are averaged over ten sessions of the experiments, each one with a different initialization of the weights of the ANN. The averaged learning curve for the training process is depicted in Figure 2.
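The formulas above can be applied to a three-class confusion matrix in a one-vs-rest fashion, as in the following sketch. The function name `per_class_metrics` and the toy matrix are illustrative assumptions; the numbers are not results from the study.

```python
def per_class_metrics(confusion):
    """Compute one-vs-rest metrics from a square confusion matrix.

    confusion[i][j] is the number of cases of true class i predicted as class j.
    Returns a list with one metrics dict per class, and the total accuracy.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    metrics = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # class c predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other classes predicted as c
        tn = total - tp - fn - fp
        metrics.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        })
    # Sum of the diagonal over all cases: (TP_1 + TP_2 + TP_3)/(P_1 + P_2 + P_3).
    accuracy = sum(confusion[c][c] for c in range(n)) / total
    return metrics, accuracy


# Toy matrix for illustration only (rows: true class, columns: predicted class).
toy = [
    [8, 1, 1],
    [2, 5, 3],
    [0, 2, 8],
]
metrics, accuracy = per_class_metrics(toy)
print(metrics[0], accuracy)
```

For the first class of the toy matrix this gives a precision of 0.8, a recall of 0.8, and a specificity of 0.9, with a total accuracy of 0.7.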
From Tables 5 and 6 and Figure 2, we can see that the ANN manages to easily learn the classification procedure of the DSM-V. However, we need to perform the same classification with as few variables as possible. Therefore, we employ the RDF importance values.

Importance Values Using Random Decision Forests
All the data from the initial set (469 × 70) were used for random decision forest classification, which was performed using the function randomForest from the library randomForest version 4.6-14 in RStudio (v1.3.1093). The number of trees was 500 and the number of variables tried at each split (mtry) was 20. These parameters were also selected on a trial-and-error basis. As RDF classification is stochastic, ten sessions were run, and the average estimated error rate was 1.13%. The average confusion matrix is shown in Table 7. A powerful feature of RDF classification is that it also returns an importance vector containing the Gini importance values (mean decrease in impurity, MDI) [30] of the variables used. This is very useful for seeing which variables contribute most to the classification process: the higher the Gini value, the higher the importance of the variable. This matters for our research because our aim was to reach a competitive level of classification using as few, and as directly acquired, variables as possible.
The Gini values for the 70 variables sorted from highest to lowest can be seen in Figure 3 and in Table 8 for more precision.
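To make the notion of impurity decrease concrete, the sketch below computes the Gini impurity of the class distribution reported earlier (379, 34, 56) and the decrease produced by one split. The split itself is hypothetical and chosen only for illustration, and the function names are assumptions. A random forest accumulates such decreases per variable over all splits and trees; the exact weighting used by the randomForest package is not reproduced here.

```python
def gini_impurity(counts):
    """Gini impurity 1 - sum(p_i^2) of a node with the given class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = sum(parent)
    children = (sum(left) * gini_impurity(left) + sum(right) * gini_impurity(right)) / n
    return gini_impurity(parent) - children


# Class counts from the text: free of symptoms, profile, PTSD.
parent = [379, 34, 56]
# Hypothetical split of these cases by some variable (illustration only).
left = [370, 20, 10]
right = [9, 14, 46]

print(gini_impurity(parent))
print(impurity_decrease(parent, left, right))
```

A variable that often produces splits with a large decrease ends up with a high Gini importance, which is the quantity used for ranking in this section.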

Classification Using a Subset of the Available Data
The values in Table 8 show an expectedly high level of importance for the variables that are used directly in the typical diagnosis procedure in DSM-V (indicated by bold variable labels). As these are only available after six weeks, our effort is to avoid them and concentrate on what is quickly and easily acquired with as few questions as possible. This gives us the candidate variables listed in Table 9. All twenty-four variables presented in Table 9 were used to construct eight datasets (called D1-D8) in steps of three. The variables in each dataset and the corresponding sum of the Gini values of these variables can be seen in Table 10. The precision, recall (sensitivity), specificity, and accuracy during the training and testing phases in a stratified ten-fold cross-validation scheme can be seen in Tables 11 and 12 and Figures 4 and 5.
In order to have an idea of the best level of classification that could be achieved with RDF using only those variables of the complete set which are not related to DSM-V (i.e., v41-v60 and v36-v39), ten sessions were run using the complete dataset for training. Comparing the classification errors in Table 13 (which are one minus recall) with the best values for recall in Table 12, we can observe a slightly better performance from the ANN using datasets D6 and D7 with only 18 and 21 variables, respectively. This is an indication of the validity of the variable selection method that was performed based on Table 8.
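The construction of the datasets D1-D8 in steps of three can be sketched as below. The sketch assumes that the datasets are cumulative (each one adds the next three variables to the previous one), which is consistent with D6 and D7 having 18 and 21 variables, and that the variables are added in the order of Table 9. The variable names are placeholders, not the coded labels of the study.

```python
def nested_datasets(ranked_variables, step=3):
    """Build cumulative variable subsets D1, D2, ... of sizes step, 2*step, ..."""
    n_sets = len(ranked_variables) // step
    return {f"D{k}": ranked_variables[: k * step] for k in range(1, n_sets + 1)}


# Placeholder names standing in for the 24 candidate variables of Table 9.
ranked = [f"var{i:02d}" for i in range(1, 25)]
datasets = nested_datasets(ranked)
for name, variables in datasets.items():
    print(name, len(variables))
```

Each dataset would then be used to train and evaluate a separate ANN whose number of input units equals the number of variables in that dataset.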

Discussion
The aim of the present study was to present a model that can produce an early diagnosis, detecting and flagging a possible case so that proper measures can be taken as soon as possible. According to our findings, emergency cesarean section, pathology of gestation, preterm birth, admission of the neonate to the NICU, absence of breastfeeding, psychiatric history, expectations from childbirth, and support from the partner are included in the set of important decision factors.
Additionally, as can be seen from the results (graphs in Figures 4 and 5, Tables 11 and 12), the ANN model arrives at a correct conclusion at a very satisfactory level (around 97% in training and 94% in testing) for the cases which are free of symptoms. For the cases that are diagnosed with PTSD, the recognition level reaches 83% in training and 66% in testing. The area between these two categories, which collects the PTSD profile cases, has a low percentage of recognition. As can be observed from the results, the PTSD profile cases are the only ones that really need the late questionnaire data (after 6 weeks). Accordingly, a policy that could be followed to arrive at a conclusion as soon as possible is to characterize any case that is not classified as free of symptoms as a possible PTSD case. If the case is indeed classified as PTSD, this would probably denote an increased likelihood of PTSD symptoms appearing after six weeks, when the second part of the data is collected. More focused treatment could be applied in such a case, and it could start six weeks in advance, providing a beneficial period of medical care.
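The policy described above can be written as a small decision function. The sketch assumes that the three ANN outputs are ordered as free of symptoms, PTSD profile, PTSD, and that the predicted class is the one with the largest output; both the ordering and the function name `early_warning` are illustrative assumptions.

```python
CLASSES = ("free of symptoms", "PTSD profile", "PTSD")  # assumed output order


def early_warning(outputs):
    """Map the three ANN outputs to (predicted class, flagged as possible PTSD)."""
    if len(outputs) != len(CLASSES):
        raise ValueError("expected one output per class")
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    predicted = CLASSES[best]
    # Policy from the text: any case that is not classified as free of
    # symptoms is treated as a possible PTSD case.
    flagged = predicted != "free of symptoms"
    return predicted, flagged


print(early_warning([0.9, 0.05, 0.05]))
print(early_warning([0.1, 0.6, 0.3]))
```

Under this policy, a case predicted as PTSD profile is flagged in the same way as a case predicted as PTSD, so the low recognition rate of the profile class affects the label but not the warning itself.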
The use of random decision forests for associating an importance value with each data field is very useful as well. The ordering of the early accessible variables according to their Gini values in Table 9 is the result of that process, and this ordering is indeed meaningful. Criterion A, which is also a basic decision factor in the typical DSM diagnosis, is ranked first, and its related parts (A1 and A2) come just after it. Although there is one more data field related to Criterion A (v34, number of similar stressful experiences), we decided not to use it, as defining it requires extra effort from the woman. The rest of the data fields that are used for the datasets are all important, as shown by the gradual increase in PTSD sensitivity observed in the training phase (Figure 4). This is expected, and it denotes the usefulness of the extra information that is added to every dataset. This increase in information is also depicted as the sums of the Gini values of the datasets in Figure 6.

Conclusions
Our aim in this research was to examine whether ANN modeling of the classification process of postpartum PTSD could provide a diagnostic model for the early detection of possible cases. The high accuracy obtained using as little and as readily available information as possible demonstrates that this is possible, and this marks a successful scenario for the application of ANNs in psychological data modeling. Future research could incorporate additional machine learning tools for the classification to obtain even more precise classification percentages. The development of mobile device applications to make the process faster would also be desirable. The benefit for the persons who would finally be diagnosed positively is important as well, since the extra period gained could be used for their preliminary treatment.