PsychArray-Based Genome Wide Association Study of Suicidal Deaths in India

Background: Suicide is a preventable but escalating global health crisis. Genome-wide association studies (GWAS) studies to date have been limited, and some are underpowered. In this study, we aimed to perform the PsychArray-based GWAS study to identify single nucleotide variations associated with suicide in the Indian population. Methods: We recruited unrelated subjects who died by suicide as cases (N = 313) and the non-suicidal deaths as controls (N = 294). The 607 samples were genotyped, including cases and controls using the Illumina Infinium PsychArray-24 BeadChip v1.3 Results: In our study, four single nucleotide polymorphisms (SNPs) crossed the threshold of significance level <1 × 10−5. One of them is intronic at Chromosome2:rs1901851 and three are intergenic at Chromosome12:rs3847911, Chromosome8:rs2941489, Chromosome8:rs1464092. At a significance level of 5 × 10−5, we found a few more SNPs, with the majority of them being intergenic variants. The associated genes were associated with various important functions ranging from cell signaling, GTP binding, GPCR binding, and transcription factor binding. Conclusions: The SNPs identified in our study were not reported earlier. To our best knowledge, this study is one of the first GWAS for suicide in the Indian population. The results indicate few novel SNPs that may be associated with suicide and require further investigation. Their clinical significance is to be studied in the future.


Introduction
Suicide is the fourth most common cause of death for people between 15 to 19 years of ages, with >700,000 deaths each year worldwide [1]. The national suicide rate has risen to 12 per 100,000 population in the year 2021 from 11.3 in the year 2020 [2]. Suicide is most frequently associated with a psychiatric condition in 90% of cases. It is further linked to mood disorders in up to 60% of cases, but the majority of people who experience mood issues never attempt suicide [3]. There is a low willingness to seek psychiatric care in our nation because of the lack of awareness and stigma around mental health. In India, there is a 9% to 25% correlation between psychiatric illness and suicide, according to past studies [4].
There are genetic risks for suicidal behavior, evidenced by the research on twins [5], family-based studies [6], and adoption studies [7]. Until now, there have been no known markers or tools available to evaluate an individual's risk of suicide. However, according to McGuffin et al. (2010), about 40% of all complex interactions of suicidal behavior can be attributed to the genetic components [8]. Mann et al. conceptualized a stress-diathesis model of suicidal behavior that considers a predisposition to the act (diathesis) and an acute precipitating factor (i.e., stress) [3,9,10]. Mood disorders, alcoholism, schizophrenia [11], The entire study was carried out at the Department of Forensic Medicine and Toxicology, collaborating with the Departments of Gastroenterology (Molecular) and Psychiatry at the All India Institute of Medical Sciences (AIIMS), New Delhi, India. As per institutional review board guidelines, the closest available family member's (i.e., legally authorized relatives, LAR) written consent was obtained for sample collection and clinical data usage in the study. The semi-structured psychological autopsy proforma was created based on earlier published work by Ebert and Shneidman [24,25]. Details of the deceased were collected under 12 domains: (1) Socio-demographic details and family background, (2) history of suicidal behavior, (3) overall personality description, (4) perceived psychopathology 2 weeks prior to death, (5) self-care routine 2 weeks prior to death, (6) general coping behavior, (7) intrapersonal relationship, (8) medical history, (9) post-mortem findings, (10) history of genetic disorders, (11) history of drug use, and (12) reflective mental status. These details were obtained for each subject after interviewing one or more close family members or informants. All subjects were autopsied at AIIMS-New Delhi from 2017 to 2019. The samples were collected within 48 h of death with exclusion criteria, i.e., unknown/unidentified dead bodies, decomposed bodies, causes of death that were unclear, and deceased non-Indian descendants.

Suicide Cases (n = 313)
Autopsies of suicide victims were performed at the Department of Forensic Medicine and Toxicology. The final verdict of the death reason (completed suicide) was made by the medical examiner and the government investigation agency (Police). Trained personnel interviewed LARs (family/informants) and reviewed medical records to gather background information on completed suicides using a semi-structured psychological autopsy protocol. Cases where there were very few details available about the deceased were excluded from the study.

Non-Suicide Controls (n = 294)
Autopsies of non-suicide controls were also done in the same department. The nonsuicide controls included death of the deceased by natural causes (myocardial infarction, CAD, etc.) or accidental (road traffic accidents, drowning, or electrocution) ensured by the post-mortem report. In addition to this, two other parameters were also considered to have non-biased controls: (i) A psychological autopsy was performed on the deceased, and (ii) medical records were reviewed to determine if any kind of psychiatric illness or suicidal behavior was present in the deceased; if such behavioural was discovered (for example, any mental illness, suicide attempts, or a desire to die), the literature suggests a strong association between psychiatric illness and suicidal behavior. Hence, at the time of recruitment of the subjects, individuals having psychiatric illness (n = 7) were excluded from the study and not genotyped.

SAS Population from 1000 G Project
The South Asian population (SAS) from the 1000 Genomes Project was included as additional control samples for this study for population stratification and also for imputation purposes [26]. There are five sub-populations of the SAS population, including Sri Lankan Tamil from the UK (STU), Indian Telugu from the UK (ITU), Gujarati Indian from Houston, Texas (GIH), Punjabi from Lahore, Pakistan (PJL), and Bengali from Bangladesh (BEB).

Sample Collection and DNA Extraction
After taking informed consent from the LAR, 3 mL of femoral blood was collected in an EDTA vial and stored at −20 • C until extraction. DNA was extracted using the organic extraction method from 500 µL of whole blood. The quality and quantity of the DNA extracted were assessed using a Nano Drop ND-1000 spectrophotometer (Nano Drop Technologies, Wilmington, DE, USA). A DNA sample with a concentration of 5 ng/µL and an OD value (260/280 ratio) of around 1.8 is considered passable in terms of quality.

PsychArray Genotyping
The 607 samples (cases and controls) were genotyped using the Illumina Infinium PsychArray-24 BeadChip v1.3 (Illumina, San Diego, CA, USA) according to the manufacturer's instructions. Genome Studio 2.0 was used to read the genotypes that were generated as .idat files.

Quality Control (QC)
The workflow for the analysis is illustrated in Figure 1. During genotyping analysis, 18 samples failed the array quality check and could not pass. The genotyped data (589 samples) were subjected to an extensive quality check using Genome Studio v2.0 [27,28] and converted into PLINK files for further processing. The second stage of Brain Sci. 2023, 13, 136 4 of 16 QC procedures was performed using PLINK [29]. The SNPs were mapped with GRCh37 plus strand. Strand orientation has been corrected, and SNPs with >5% missing calls, and SNPs that failed the Hardy-Weinberg equilibrium (case: p-value < 1 × 10 −6 , control: p-value < 1 × 10 −10 ) were excluded. A filter for samples with more than 5% missing calls was also applied. In both the case and control data sets, SNPs with minor allele frequency below 1% and those in build-specific LD regions were also removed from our study. After pre-processing, 574 (449 males and 125 females) samples remained for the analysis, which included 300 cases (212 males and 88 females) and 274 controls (237 males and 37 female). Similar QC steps were applied to the SAS population, which was later used for population stratification and additional control samples. Multidimensional Scaling Analysis (MDS) was performed using PLINK, including the SAS population, to identify individuals that deviate from the majority of sample structures. ated as .idat files.

Quality Control (QC)
The workflow for the analysis is illustrated in Figure 1. During genotyping analysis, 18 samples failed the array quality check and could not pass. The genotyped data (589 samples) were subjected to an extensive quality check using Genome Studio v2.0 [27,28] and converted into PLINK files for further processing. The second stage of QC procedures was performed using PLINK [29]. The SNPs were mapped with GRCh37 plus strand. Strand orientation has been corrected, and SNPs with >5% missing calls, and SNPs that failed the Hardy-Weinberg equilibrium (case: p-value < 1 × 10 −6 , control: p-value < 1 × 10 −10 ) were excluded. A filter for samples with more than 5% missing calls was also applied. In both the case and control data sets, SNPs with minor allele frequency below 1% and those in build-specific LD regions were also removed from our study. After pre-processing, 574 (449 males and 125 females) samples remained for the analysis, which included 300 cases (212 males and 88 females) and 274 controls (237 males and 37 female). Similar QC steps were applied to the SAS population, which was later used for population stratification and additional control samples. Multidimensional Scaling Analysis (MDS) was performed using PLINK, including the SAS population, to identify individuals that deviate from the majority of sample structures. Starting with the raw image data file described in the methodology, pre-processing was performed using GenomeStudio v.2.0 and later using (B) PLINK v.1.9. Quality checking and preprocessing of data. The SAS population from 1 K genome phase 3 data was used for imputation. A logistic model was applied for the association studies using the MDS component and gender as covariates. Top markers in the study were annotated by Annovar, MAGMA, and more information was added using GeneCards and MalaCards. Starting with the raw image data file described in the methodology, pre-processing was performed using GenomeStudio v.2.0 and later using (B) PLINK v.1.9. Quality checking and preprocessing of data. The SAS population from 1 K genome phase 3 data was used for imputation. A logistic model was applied for the association studies using the MDS component and gender as covariates. Top markers in the study were annotated by Annovar, MAGMA, and more information was added using GeneCards and MalaCards.

Imputation
Imputation was carried out over the Michigan Imputation Server using the SAS population of 1000 Genomes phase 3 dataset [30]. The Michigan Imputation Server uses Eagle v2.4 for phasing and Minimac4 for imputation [31,32]. Hard call genotype output was filtered using the R2 score. SNPs with R2 < 0.3 were excluded, and the remaining genotypes were converted to PLINK format for further analysis.

Statistical Analysis and Functional Mapping
Demographic variables of the suicidal cases and non-suicidal controls were analyzed by a chi-square test for dichotomous variables and an independent t-test for continuous variables with the help of R software. Any value, i.e., a p value of 0.05 or lower, was considered significant. GWAS Analysis: The remaining SNPs were converted into PLINK format for further processing after a quality check and pre-processing in Genome Studio. Additional QC and pre-processing were performed as recommended for the MIT imputation server for input files, and 447,072 SNPs passed the criteria and were ready for imputation. SNPs with an R2 score >= 0.3 were considered for filtering. Imputed dosage files converted back to hard call SNPs using DosageConvertor [33]. PLINK has been used for further quality checking as mentioned earlier, and similar steps were applied to the SAS population from the 1000 Genome dataset. The average genotyping call rate in the final dataset was 99.998%. The logistic regression model from PLINK was implemented to identify associations. The resulting association file was used for Manhattan plot and QQ-plot to identify potential associations and the quality of the analysis. A GWAS analysis has been carried out, including the SAS population. After identity by descent (IBD)analysis, the remaining SNPs were used for PLINK MDS analysis to identify and discard subjects that failed to follow population homogeneity. Those subjects were identified using componentwise outlier detection and by manual inspection of the MDS plot ( Figure 2). The logistic regression model of PLINK has been applied using an additive genetic model adjusting for gender and the first ten MDS components. The GWAS logistic model SNP results were submitted to Annovar for functional annotation of SNPs [34], and gene set analysis was performed using MAGMA [35] with 50 Kb upstream and 10 Kb downstream regions. A total of 18,152 autosomal genes were available for analysis. For annotation and enrichment, GRCh37 build datasets were used in either tool. Gene-related data were obtained from GeneCard [36] and disease-related data were obtained from MalaCards and OMIM [37,38].

Psycho-Social Profiles of the Subjects
The demographic attributes of both cases and controls are listed in Table 1. The majority of subjects, i.e., 81.46% of the suicidal group, fell into the age range of 11-40 years,

Psycho-Social Profiles of the Subjects
The demographic attributes of both cases and controls are listed in Table 1. The majority of subjects, i.e., 81.46% of the suicidal group, fell into the age range of 11-40 years, while only 54.42% of the control group subjects fell into the same window. Males outnumbered females in both groups. There is a much larger significant difference (p < 1 × 10 −5 ) in marital status between the two groups (59.74% married; 39.93% unmarried) in the suicidal cases than in the non-suicidal controls (76.87% married; 22.10% unmarried). Most of the cases were reported from the lower and lower-middle socio-economic strata of society. Hanging was the most common method of suicide, contributing 92.97%, followed by poisoning, falling from heights, and burning (single case). Only 5.75% of cases had a diagnosed psychiatric illness. Previous suicide attempts were present in 12.14% of the cases. Proportion tests were used to examine correlations between substance abuse and both groups. No significant relationship between substance abuse and the suicide or non-suicide group was observed (p = 0.463). Among the 313 suicidal deaths, subgroup analysis of suicidal behavioural predisposition was analyzed on the basis of previous attempt (12.14%), presence of a suicide note (4.15%), genetic inheritance (6.07%), psychological behavior prior to the commission of the act, and perceived or proven triggering factors.

Population Stratification/Multidimensional Scaling Analysis
We observed that, from the five sub-populations of SAS (Sri Lankan Tamil from the UK (STU) and Indian Telugu from the UK (ITU), Gujarati Indian from Houston, Texas (GIH), Punjabi from Lahore, Pakistan (PJL), and Bengali from Bangladesh (BEB)), many samples from GIH and ITU were found to be different from the population under study using the MDS plot ( Figure 2) and were excluded from the study.

Inflation Rate of the Data/QQ Plot
A Quantile-Quantile plot was used to check the quality of the data used for association analysis. The genomic inflation rate was approximately 1.04, which falls within the acceptable range and is an indicator of good quality data ( Figure 3).
Brain Sci. 2023, 13, x FOR PEER REVIEW 8 of 17 Figure 3. For association analysis, a quantile-quantile plot was used to evaluate the quality of the data. The genomic inflation rate was about 1.04, which is within an acceptable limit and is a reflection of high-quality data.

Association of SNPs with the Suicidal Deaths
We conducted an association analysis on 447,072 hard-called SNPs that passed quality control and were shared between controls and suicidal cases in this study. We used a relaxed p-value (1 × 10 −5 ) for identifying any significant SNP (Figure 4). The top SNPs found with the relaxed p-value cut-off are enlisted in Table 2, which might be interesting for suicidal behavior. For association analysis, a quantile-quantile plot was used to evaluate the quality of the data. The genomic inflation rate was about 1.04, which is within an acceptable limit and is a reflection of high-quality data.

Association of SNPs with the Suicidal Deaths
We conducted an association analysis on 447,072 hard-called SNPs that passed quality control and were shared between controls and suicidal cases in this study. We used a relaxed p-value (1 × 10 −5 ) for identifying any significant SNP (Figure 4). The top SNPs found with the relaxed p-value cut-off are enlisted in Table 2, which might be interesting for suicidal behavior.
Four SNPs crossed the threshold of significance level <1 × 10 −5 spanning across the three chromosomes. One of them is intronic rs1901851 (p-value = 1.606 × 10 −6 , OR = 1.799), located on the chromosome 2 in close proximity to MIR3681HG, and three are intergenic, one being rs3847911 (p-value = 3.86 × 10 −6 , OR 1.814) on chromosome 12 located in the proximity of two genes, PPM1H and AVPR1, and the remaining two are located on chromosome 8 rs2941489 (p-value = 8.572 × 10 −6 , OR = 1.754) and rs1464092 (p-value = 8.572 × 10 −6 , OR = 1.754), whose nearest gene is HNFG. Additionally, at a significance level of 5 × 10 −5 , we found fourteen more SNPs, the majority of them being intergenic variants. The SNPs were functionally annotated using Annovar and MAGMA for their related genes. There were no SNPs found in the exome region. All SNPs were either intronic or intergenic. For most alleles, a relatively high odds ratio (range = 1.63-2.68) was observed ( Table 2). As shown in Table 3, the associated genes of these SNPs have a variety of important functions, ranging from cell signaling to GTP binding, GPCR binding, and transcription factor binding.    POU6F1: DNA-binding transcription factor activity. An important paralog of this gene is POU6F2. DAZAP2. This gene encodes a proline-rich protein which interacts with the deleted in azoospermia (DAZ) and the deleted in azoospermia-like gene through the DAZ-like repeats. This protein also interacts with the transforming growth factor-beta signaling molecule SARA (Smad anchor for receptor activation), eukaryotic initiation factor 4G, and an E3 ubiquitinase that regulates its stability in splicing factor containing nuclear speckles. The encoded protein may function in various biological and pathological processes, including spermatogenesis, cell signaling and transcription regulation, formation of stress granules during translation arrest, RNA splicing, and pathogenesis of multiple myeloma. Multiple transcript variants encoding different isoforms have been found for this gene.  Four SNPs crossed the threshold of significance level <1 × 10 −5 spanning across the three chromosomes. One of them is intronic rs1901851 (p-Value = 1.606 × 10 −6 , OR = 1.799), located on the chromosome 2 in close proximity to MIR3681HG, and three are intergenic, one being rs3847911 (p-Value = 3.86 × 10 −6 , OR 1.814) on chromosome 12 located in the proximity of two genes, PPM1H and AVPR1, and the remaining two are located on chromosome 8 rs2941489 (p-value = 8.572 × 10 −6 , OR = 1.754) and rs1464092 (p-value = 8.572 × 10 −6 , OR = 1.754), whose nearest gene is HNFG. Additionally, at a significance level of 5 × 10 −5 , we found fourteen more SNPs, the majority of them being intergenic variants. The SNPs were functionally annotated using Annovar and MAGMA for their related genes. There were no SNPs found in the exome region. All SNPs were either intronic or intergenic. For most alleles, a relatively high odds ratio (range = 1.63-2.68) was observed ( Table  2). As shown in Table 3, the associated genes of these SNPs have a variety of important functions, ranging from cell signaling to GTP binding, GPCR binding, and transcription factor binding.  The logistic regression model from PLINK was used to identify associations, and odds ratio after adjusting sex as a confounder.

Discussion
Our study is one of the PsychArray-based GWAS study in India in the context of suicide completers compared against non-suicide deaths. The objective of the study was to find out key variations that might be associated with a suicidal tendency. We selected subjects after a post-mortem and an extensive psychological evaluation of the deceased for the creation of a high-confidence dataset. In our study, we did not get many subjects with confirmed psychological disease. In the Indian population, due to the influence of cultural practices, psychological disorders do not gather much attention and are usually ignored; even close relatives of the deceased are unaware of the deceased's mental health. However, the psychological autopsy proforma helped us identify subjects with strong signs of mental/psychological illness. Therefore, the share of diagnosed psychiatric illness among our study's suicidal death group was found to be only in 5.75% of cases. To our best knowledge, this study is the first attempt to explore SNP-based heritability in completed suicide for any case-control GWAS study for Indian population exclusively. There was only one such study available in the Asian region (Japan) that tried to address suicidal heritability by exploring the GWAS for completed suicides (Otsuka et al., 2019). The area of completed suicide is largely unexplored, especially in India. Hence, our work is one of the first well-designed scientific efforts to identify genetic variations among suicidal deaths and non-suicidal deaths in India.
For high-quality data, extensive pre-processing was used, as recommended by the literature. SAS population has been incorporated in our study for imputation as well as additional control sets to strengthen our methodology and sample size. The samples are nearly moderate size due to strict selection criteria as, per power analysis, it is able to find strong associations. However, none of the SNPs in the study managed to reach the conventional significance level; each identified SNP requires further clinical study.
In our study, with the relaxed genome wide significance level <1 × 10 −5 , the top hit in our study, rs1901851 (Chr2; OR = 1.799; p-value = 1.606 × 10 −6 ; MAF = 0.58), is in close proximity of the RNA gene MIR3681HH that may have regulatory function. The second SNP, i.e., rs3847911 on Chr12, was intergenic to the PPM1H and AVPR1A genes, whose molecular functions include phosphatase activity, RET signaling, and GPCR signaling. PPM1H is associated with neurological disorders such as ADHD [39] and endocrine diseases, while AVPR1A (Arginine Vasopressin) is a 7-transmembrane domain G-protein polypeptide that was reported to be involved in many neurological functions, including aggression, bonding, sex behavior, autism, and schizophrenia [40]. The third and fourth top SNPs (rs2941489, rs1464092) have a significance value <1 × 10 −5 . Both are associated with HNF4G and LINCO1111. HNF4G is associated with DNA binding, transcription factor binding, and steroid hormone receptor activity. It is associated with metabolic disorders such as onset of diabetes in the young or hyperuricemia and gastric or pancreatic cancers [41] (Table 1). LINCO1111 is associated with neurological diseases. rs1805098 is only exonic variant found in our study. It is also associated with the gene HNF4G, which has multiple roles, including carbohydrate metabolism, hormone receptor, and DNA-binding. Other related variants found in our study, which are related to this gene, are rs2922766 and rs1464092. A few SNPs with significance value < 1 × 10 −5 were also found to be interesting. The rs2072781 variant found in the MYLIP gene's UTR3 region is linked to the immune system. This gene is also called a post-transcriptional regulator of LDLR abundance. The LXR-MYLIP-LDLR pathway makes a complementary pathway available to sterol regulatory element-binding proteins for the feedback inhibition of cholesterol uptake [42]. Rs9541141 is located in close proximity to the RNA gene LINC00364, which may be regulatory in nature. Another important variation found is rs1722636 (CHR = 2, OR = 1.703; p-value = 9.19 × 10 −5 ; MAF = 0.3567), whose nearest genes are RBMS1 and TANK. RBMS1 is associated with DNA binding and is associated with the blood disease 'blue toe syndrome', while TANK is associated with signal transduction and the TRAF pathway and is associated with infectious neurological disorders (Nipah virus encephalitis). One important variation is rs2816376, which is found near GCM1 and ELOVL5. GCM1 plays a role in DNA binding; the rs2816372 is another variant associated with this gene. ELOVL5 plays an important role in the elongation of fatty acid chains. Both are associated with cardiovascular blood disease, and rs296646 is associated with immune system signaling and is responsible for immunological disorders ( Table 1).
As mentioned earlier, like many studies on attempted suicide cases, we obtained genes that are associated with various immunological, neurological, and infectious diseases. None of the identified SNPs, gene, or loci were earlier identified in previous GWAS studies. However, most of GWAS study has been conducted for the European and American ancestry and very limited attempts have been made for the Asian population, especially the Indian population. The major functional ontologies of identified genes, however-cell signaling, neurological effect, cell adhesion, and transcriptional activities-were discovered to overlap with previous studies.

Conclusions
Among the top candidate genes, we identified functional domains of transferase, kinases, immune responses, DNA binding, and signaling. Many top candidates are associated with CNS. We observed that MAF values in in-house controls are close to the SAS population, but there were cases (i.e., suicidal death group) where MAF values were relatively higher, as reflected by the odds ratio. The SNPs identified in our study were not previously reported. To the best of our knowledge, this study is one of the first GWAS conducted on suicidal death subjects in the Indian population. The results indicate few novel SNPs that may be associated with suicide and require further investigation.

Limitation
The moderate sample size of our GWAS study is the limitation of our study. As per conventional GWAS studies, we ended up with a smaller number of samples. Another limitation of the study is population stratification. The present SAS population does not entirely represent the Indian population. With a diversified ethnic Indian population, Delhi is a cosmopolitan city. The Indian population represents genetic diversity [43], and variable stratification levels complicate such studies, making them difficult to quantify. This is one of the study's shortcomings.

Data Availability Statement:
The data that support the findings of this study are available on request from the corresponding author [C.B.]. The data are not publicly available due to due to confidential restrictions applicable to health data but could be available under request and prior consent of the Institute's Ethics Committee.