Whole-Exome Sequencing Reveals Migraine-Associated Novel Functional Variants in Arab Ancestry Females: A Pilot Study

Migraine is the seventh most disabling neurological disease and has a prevalence of 26.9% among Saudi females, yet studies identifying the genes and pathways associated with migraine in the Arab population are lacking. This case-control study aimed to identify novel migraine-associated genes and risk variants. More than 1900 young female college students of Arab ancestry were screened: 103 fulfilled the ICHD-3 criteria for migraine, and 20 cases confirmed in the neurology clinic were included in the study together with age-matched healthy controls. DNA from blood samples was subjected to paired-end whole-exome sequencing. After quality control, 3,365,343 variants (missense, frameshift and missense splice region variants and insertion–deletion (indel) polymorphisms) were tested for association with migraine. Significant variants were validated using Sanger sequencing. A total of 17 functional variants (p-value 9.091 × 10⁻⁵) in 12 genes (RETNLB, SCAI, ADH4, ESPL1, CPT2, FLG, PPP4R1, SERPINB5, ZNF66, ETAA1, EXO1 and CPA6) were associated with higher migraine risk, including a stop-gained frameshift (-13-14*SX) variant in RETNLB (rs5851607; p-value 3.446 × 10⁻⁶). Gene analysis revealed that half of the significant novel migraine risk genes were expressed in the temporal lobe of the cerebral cortex (p-value 0.0058). This is the first study to explore the migraine risk of 17 functional variants in 12 genes among Saudi female migraineurs of Arab ancestry using whole-exome sequencing. The expression of half of the significant genes in the temporal lobe expands current understanding of migraine pathophysiology and opens research possibilities for early identification using biomarkers and for personalised genetics.


Introduction
Migraine is a multifactorial chronic neurological disorder with a global prevalence of 14% to 21.7% [1]. It places a burden on the economy as well as on the daily lives of individuals [2]. It is considered the seventh most disabling disease worldwide, accounting for 2.9% of years lived with disability (YLD) [3-5]. In Saudi Arabia, the prevalence of migraine in females is around 26.9%, which is higher than the global average [6,7].
In addition, recent studies in this population have suggested that high oestrogen levels could be involved in mediating non-menstrual-related migraine among young Saudi females [8], and serum ApoE has been reported as an excellent diagnostic marker in the same subjects with migraine in the ictal or interictal phase [9]. However, genetic studies of migraine in this population are lacking [8].
Migraine is classified into two main forms: migraine with aura (MA) and migraine without aura (MO). It can also be classified as chronic or episodic. Hemiplegic migraine, a rare and severe subtype of MA, affects one side of the body and causes temporary numbness [9]. Migraine was once believed to be a vascular disease; however, many recent studies have confirmed the involvement of multiple mechanisms related to brain structures and processes [10].
Furthermore, accumulating evidence demonstrates a genetic background to the transmission of the disease. Many epidemiological studies have shown familial transmission of migraine, with a 1- to 2-fold concordance rate in monozygotic twins compared with dizygotic twins [11]. Different approaches have been used to determine the genetic factors and DNA variants that might be responsible for migraine. Some studies used candidate gene association studies (CGAS) to identify specific genetic markers associated with migraine, but the negative results of these studies, together with the development of cost-effective genotyping techniques, led to genome-wide association studies (GWAS) [12]. In these studies, hundreds of thousands of single-nucleotide polymorphisms (SNPs) were screened for their association with migraine. Studies of 10,747 people from headache clinics in the Netherlands, Germany and Finland led to the identification of a single significant SNP, rs835740 [13]. A similar, more recent study identified 38 different SNPs and 7 loci (TRPM8, LRP1, FHL5, TSPAN2, ASTN2, near FGF6, and PHACTR1), but none of them was found to be related to migraine without aura [14]. However, there are no detailed studies of migraine-associated genes in the Arab population [8]. Focused studies of SNPs, especially in females, are needed, as women are the patients most affected by migraine.
Identifying migraine-associated SNPs requires collaboration across different regions of the world to collect sufficient information from populations of different ethnicities and to draw a full picture of the genetic predisposition to migraine. Therefore, this study was designed to identify gene variants and single-nucleotide polymorphisms associated with migraine in young Saudi females using whole-exome sequencing.

Materials and Methods
The study protocol was approved by the Institutional Review Board of Imam Abdulrahman Bin Faisal University (IRB number: IRB-2021-01-250), and the study was conducted in accordance with the Declaration of Helsinki. More than 1900 young female college students of Arab ancestry were screened; 103 fulfilled the ICHD-3 criteria for migraine, and 20 cases confirmed in the neurology clinic were included in the study with age-matched controls. This exome-wide association study involved 40 participants, 20 controls (healthy subjects) and 20 cases (migraineurs), and was conducted at the female campus of the College of Medicine and the Institute for Research and Medical Consultations (IRMC) of Imam Abdulrahman Bin Faisal University (IAU), Dammam, Saudi Arabia. Subjects were recruited during 2021 by convenience sampling; all were Saudi female college students aged 18–30 years. To be included in the study, cases had to be diagnosed with migraine by a neurologist and satisfy the criteria of the International Classification of Headache Disorders, 3rd edition (ICHD-3). Controls were healthy female subjects of the same age group with no complaints of headache.
All participants signed written informed consent for their enrolment in the study; they were then interviewed and asked to complete an electronic datasheet (Supplementary Material S1). The datasheet covered demographic characteristics (age, marital status, college level, height, weight, BMI). It also included migraine-specific questions, such as the frequency of attacks per month, the severity of attacks (on a visual scale of 1–10), associated symptoms, the presence of triggers (stress, lack of sleep, missed meals or fasting, physical activity, noise, smell, strong lights, fluctuations in weather or temperature, food, relation to the menstrual cycle), type of migraine (with or without aura), family history of migraine, medical history (other chronic diseases) and use of medications (for migraine and for other diseases).

Whole-Exome Sequencing and Statistical Data Analysis
Blood samples were collected from the study participants in EDTA vacutainers. DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany); DNA purity was checked using a NanoDrop spectrophotometer, and concentration was determined using a Qubit fluorometer. DNA integrity was assessed by agarose gel electrophoresis. All samples underwent quality screening and paired-end whole-exome sequencing. Each qualified DNA sample was randomly fragmented (150 to 200 bp), adapters were ligated to both ends, and fragments of about 200 bp were purified and selected with AMPure beads. The fragmented DNA was amplified by ligation-mediated polymerase chain reaction (PCR) and enriched using the SureSelect library. The captured ligation-mediated PCR products were quantified using a Bioanalyzer and loaded onto the HiSeq platform for high-throughput sequencing. Samples with a mean depth ≥ 7.5, a variant call rate ≥ 0.5 and a mean genotype quality ≥ 28 were considered good-quality samples. Variants were retained if they had a Phred quality score ≥ 30, a raw read depth ≥ 10 and a mapping quality ≥ 30. The Hail Python package was used for the entire association analysis pipeline, including quality control of samples, variants and genotypes. Manhattan and QQ plots of the association p-values of SNPs with migraine across the genome were constructed. Gene-level linkage disequilibrium analysis of the SNPs in the significant gene was conducted using Haploview 4.2.
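The sample- and variant-level thresholds above amount to simple pass/fail filters. The minimal sketch below illustrates them in plain Python rather than reproducing the authors' Hail pipeline; it assumes the per-sample summaries (mean depth, call rate, mean genotype quality) and per-variant fields (Phred quality, raw read depth, mapping quality) have already been computed, for example with Hail's `hl.sample_qc`. The record types `SampleQC` and `VariantQC` and the example values are hypothetical.

```python
from dataclasses import dataclass

# Thresholds as stated in the Methods text.
MIN_SAMPLE_MEAN_DEPTH = 7.5
MIN_SAMPLE_CALL_RATE = 0.5
MIN_SAMPLE_MEAN_GQ = 28
MIN_VARIANT_PHRED_QUAL = 30
MIN_VARIANT_RAW_DEPTH = 10
MIN_VARIANT_MAPPING_QUAL = 30


@dataclass
class SampleQC:
    """Hypothetical per-sample summary (not a Hail type)."""
    sample_id: str
    mean_depth: float
    call_rate: float
    mean_gq: float


@dataclass
class VariantQC:
    """Hypothetical per-variant summary (not a Hail type)."""
    variant_id: str
    phred_qual: float
    raw_depth: int
    mapping_qual: float


def sample_passes(s: SampleQC) -> bool:
    """True if a sample meets all three sample-level thresholds."""
    return (s.mean_depth >= MIN_SAMPLE_MEAN_DEPTH
            and s.call_rate >= MIN_SAMPLE_CALL_RATE
            and s.mean_gq >= MIN_SAMPLE_MEAN_GQ)


def variant_passes(v: VariantQC) -> bool:
    """True if a variant meets all three variant-level thresholds."""
    return (v.phred_qual >= MIN_VARIANT_PHRED_QUAL
            and v.raw_depth >= MIN_VARIANT_RAW_DEPTH
            and v.mapping_qual >= MIN_VARIANT_MAPPING_QUAL)


if __name__ == "__main__":
    # Hypothetical records: S02 fails on mean depth, v2 fails on Phred quality.
    samples = [SampleQC("S01", 42.0, 0.98, 55.0), SampleQC("S02", 6.9, 0.97, 50.0)]
    variants = [VariantQC("v1", 60.0, 35, 60.0), VariantQC("v2", 25.0, 40, 60.0)]
    print([s.sample_id for s in samples if sample_passes(s)])
    print([v.variant_id for v in variants if variant_passes(v)])
```

Keeping each threshold as a named constant makes it easy to check the filter against the values stated in the text.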

Statistical Analysis
Data were analysed using the Statistical Package for the Social Sciences (SPSS) software, version 21. Student's t-test was used to test for significant differences in demographic characteristics between migraineurs and controls. The top 10 most significant genes were analysed using the DAVID gene functional classification tool to identify significantly enriched sites of expression under the GNF_U133A_QUARTILE category (p-value < 0.05) [15]. Functional annotation of the top 50 most highly associated genes (p-value < 0.00023) was performed using the UniProt database. GO and pathway enrichment analyses were carried out using the Enrichr server, and the pathway involvement of the genes was examined using the KEGG Search&Color Pathway tool. The genes harbouring all significant markers identified in the exome-wide association study were selected for expression profile analysis in the brain and related tissues using DAVID. Genes expressed in the brain and related tissues were then separated and analysed for KEGG pathway enrichment.
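As an illustration of the demographic comparison, the Student's t statistic for two independent groups with pooled variance can be computed as below. This is a minimal sketch, not the SPSS procedure itself; the function name `students_t` and the example ages are hypothetical. The p-value would then be read from a t distribution with n1 + n2 − 2 degrees of freedom; for example, `scipy.stats.ttest_ind` with its default `equal_var=True` returns both the statistic and the p-value.

```python
import math
from statistics import mean, variance


def students_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic assuming equal variances (pooled variance).

    Returns (t, degrees_of_freedom) with degrees_of_freedom = n1 + n2 - 2.
    """
    n1, n2 = len(x), len(y)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / standard_error, n1 + n2 - 2


if __name__ == "__main__":
    # Hypothetical ages (years), not the study data.
    cases = [20.0, 21.0, 22.0]
    controls = [19.0, 20.0, 21.0]
    t, df = students_t(cases, controls)
    print(f"t = {t:.3f}, df = {df}")
```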

Study Population
Study participants were of Arab ancestry. The demographic characteristics of the 40 selected Saudi Arabian subjects, 20 migraine patients and 20 controls, are presented in Table 1. The clinical characteristics of the migraineurs and the frequency of the precipitating factors for headache attacks are presented in Table 2.
Table 2. Clinical characteristics of the migraineurs of the study and the frequency of the precipitating factors.

Single-Variant Analysis
After variant quality control of the whole-exome sequencing data, 3,365,343 variants were retained for exome-wide association analysis. To prioritise the top migraine-associated variants in Arab ancestry, the analysis focused on functional variants, such as missense, frameshift and missense splice region variants, and applied a p-value threshold of < 0.00001 in the exome-wide association analysis (Table 3). The complete list of migraine-associated variants (p-value < 0.00001) identified through exome sequencing is shown in Supplementary Table S1. Seventeen variants were the most significant functional variants (p-value 9.091 × 10⁻⁵), distributed among twelve genes (RETNLB, SCAI, ADH4, ESPL1, CPT2, FLG, PPP4R1, SERPINB5, ZNF66, ETAA1, EXO1 and CPA6) (Table 3 and Figure 1). The stop-gained frameshift (-13-14*SX) variant in RETNLB was the most significant functional variant (rs5851607; p-value = 3.446 × 10⁻⁶).
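The text does not state which association test produced the per-variant p-values. As one common choice for a small case-control design, the sketch below runs a two-sided Fisher's exact test on allele counts (two alleles per participant). This is an assumption for illustration, not necessarily the test run in Hail for this study; the helper names (`fisher_exact_two_sided`, `allelic_fisher`) and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pmf(k: int, row1: int, col1: int, n: int) -> float:
    """P(top-left cell = k) for a 2x2 table with fixed margins."""
    return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_value = 0.0
    for k in range(lo, hi + 1):
        p_k = hypergeom_pmf(k, row1, col1, n)
        # Small relative tolerance so tables tied with the observed one are counted.
        if p_k <= p_observed * (1 + 1e-7):
            p_value += p_k
    return min(1.0, p_value)


def allelic_fisher(alt_cases: int, n_cases: int, alt_controls: int, n_controls: int) -> float:
    """Allelic test for diploid samples: rows = cases/controls, columns = alt/ref alleles."""
    ref_cases = 2 * n_cases - alt_cases
    ref_controls = 2 * n_controls - alt_controls
    return fisher_exact_two_sided(alt_cases, ref_cases, alt_controls, ref_controls)


if __name__ == "__main__":
    # Hypothetical counts: 14 alternate alleles in 20 cases vs 2 in 20 controls.
    p = allelic_fisher(14, 20, 2, 20)
    print(f"p = {p:.3g}; below the 0.00001 threshold: {p < 1e-5}")
```

With 20 cases and 20 controls, each variant yields a 2 × 2 table of 40 case alleles and 40 control alleles. If SciPy is available, `scipy.stats.fisher_exact` should give the same two-sided p-value.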

Gene Analysis
The 12 genes carrying the most significant variants were analysed using the DAVID gene functional classification tool to identify significant sites of expression. The analysis revealed that 6 of the 12 genes (Table 3) were significantly expressed in the temporal lobe (p-value = 0.00582) (Figure 2). Among the functional variants with p-values between 9.091 × 10⁻⁵ and 0.05 in the genes identified as expressed in the temporal lobe, FLG stood out with 37 functional variants (Table 4). The significant SNPs observed in FLG and their amino acid positions are presented in Figure 3. Gene-level linkage disequilibrium analysis (Haploview 4.2) of the 53 SNPs in FLG revealed a significant association for all these SNPs (Chi-Square = 8.556; p-value = 0.0034) (Figure 3). The most significant three-marker haplotype, rs3126075G, rs7532285T and rs7540123G (Chi-Square = 7.64; p-value = 0.0057), appears to be significantly associated with migraine, whereas the haplotype carrying the opposite alleles, rs3126075C, rs7532285C and rs7540123C (Chi-Square = 3.81; p-value = 0.0407), appears to be protective. The presence of variants in FLG was confirmed by Sanger sequencing with the designed primers (forward primer, FLGF: 5′ CCTCTACCAGGTGAGCACTCATGAACAGTCTG 3′; reverse primer, FLGR: 5′ TCTCTGACTGCAGATGAAGCTTGTCCGTGCC 3′). Sanger sequencing revealed additional variants in the gene, indicating the need to check their association with migraine (Figure 4).
Genes (Table 3) with functionally associated variants were used as input to identify their expression sites using the DAVID gene functional classification tool with the 'GNF_U133A_QUARTILE' category. The most significant functional variants (p-value < 9.091 × 10⁻⁵) are presented in the middle path. The functional variants with p-values between 9.091 × 10⁻⁵ and 0.05 in the genes identified as expressed in the temporal lobe are also presented. A total of 37 functional variants were found to be significant in FLG (full list presented in Table 4).
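To make the reported haplotype statistics concrete: a case-control haplotype test of this kind can compare the count of one haplotype against all other haplotypes in cases and controls as a 2 × 2 table, and a chi-square statistic with one degree of freedom converts to a p-value through P(X > x) = erfc(√(x/2)). The sketch assumes haplotype counts are already available (it does not perform the haplotype frequency estimation that Haploview carries out from genotypes) and uses a Pearson chi-square without continuity correction, so Haploview's exact computation may differ. As a check, the conversion gives p ≈ 0.0034 for a chi-square of 8.556 and p ≈ 0.0057 for 7.64, in line with the gene-level and top-haplotype FLG results reported above. Function names and example counts are hypothetical.

```python
import math


def chi2_1df_pvalue(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For 1 df the chi-square survival function reduces to erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def haplotype_chi2(hap_cases: float, other_cases: float,
                   hap_controls: float, other_controls: float) -> float:
    """Pearson chi-square (no continuity correction) for a 2x2 haplotype-count table.

    Rows are cases and controls; columns are the tested haplotype and all other
    haplotypes. Counts may be fractional if they come from estimated frequencies.
    """
    table = [[hap_cases, other_cases], [hap_controls, other_controls]]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    # Reported FLG statistics converted back to p-values.
    for statistic in (8.556, 7.64):
        print(statistic, round(chi2_1df_pvalue(statistic), 4))
    # Hypothetical haplotype counts (40 chromosomes per group).
    x2 = haplotype_chi2(15, 25, 5, 35)
    print(round(x2, 3), chi2_1df_pvalue(x2))
```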

Pathways Analysis
Gene ontology and pathway enrichment of the top 50 genes revealed moderately significant enrichment for the organic hydroxy compound catabolic process (GO:1901616; p-value = 9.12893 × 10⁻⁵; adjusted p-value = 0.026108731) and the quinone metabolic process (GO:1901661; p-value = 0.000272123; adjusted p-value = 0.038913602) (Table 5; Supplementary Table S2). The genes harbouring all significant markers were checked for their expression profile in brain-related tissues using DAVID: of the 11,958 genes submitted for expression analysis, 6305 were present in DAVID. The 1349 genes expressed in the brain and related tissues were analysed for KEGG pathway enrichment, which revealed that 34 genes were significantly associated with systemic lupus erythematosus (term p-value = 8.071 × 10⁻¹²; adjusted p-value = 2.364 × 10⁻⁹) (Table 6; Supplementary Table S3; Figure S3). Furthermore, pathways such as focal adhesion, ECM-receptor interaction, human papillomavirus infection, alcoholism, pathways in cancer, the PI3K-Akt signalling pathway and cholesterol metabolism were significantly associated with migraine in the Saudi subjects (adjusted p-value ≤ 2.192 × 10⁻⁵) (Table 5).
Table 5. KEGG pathway enrichment from the top 50 genes associated with the exome-wide association study analysis.

* The p53 signalling pathway with the significant genes is presented in Figure S1. $ The fatty acid degradation with the significant genes is presented in Figure S2.
Table 6. KEGG pathway enrichment from the 1349 genes based on the expression in brain-related tissues associated with the exome-wide association analysis.
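The enrichment p-values and adjusted p-values in the pathway analysis were produced by DAVID and Enrichr. To show the kind of computation such over-representation tools build on, the sketch below computes a one-sided hypergeometric (Fisher's exact) enrichment p-value for a single term and a Benjamini-Hochberg adjustment across terms. It is an illustrative assumption, not the tools' own code: DAVID, for instance, reports a modified Fisher exact p-value (the EASE score), and background gene sets differ between tools. Function names and example numbers are hypothetical.

```python
from math import comb


def enrichment_pvalue(hits: int, list_size: int, term_size: int, background: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    X is the number of term genes in a random list of `list_size` genes drawn
    from a background of `background` genes, `term_size` of which carry the term.
    """
    upper = min(term_size, list_size)
    total = comb(background, list_size)
    tail = sum(comb(term_size, i) * comb(background - term_size, list_size - i)
               for i in range(max(hits, 0), upper + 1))
    return tail / total


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical term: 5 of 50 listed genes fall in a 100-gene term (20,000-gene background).
    p_term = enrichment_pvalue(5, 50, 100, 20_000)
    print(p_term)
    print(benjamini_hochberg([p_term, 0.01, 0.2, 0.04]))
```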

Discussion
Genetics plays a significant part in migraine, in addition to other factors [16]. However, the genetic predisposition to migraine does not follow a simple Mendelian pattern. The common form of migraine is most probably polygenic, involving multiple variants at several genetic loci that may interact with multiple environmental factors. Exome-wide association studies are among the most successful methods for identifying the genes involved in a disease. In this approach, migraine cases and controls are compared for differences in the allele frequencies of single-nucleotide polymorphisms (SNPs) to identify genetic risk factors. No single genetic variant can explain the heterogeneity of migraine across populations. We performed the first exome-wide association study of migraine in Arab ancestry. Through exome sequencing, we identified a list of migraine-associated variants (p-value < 0.00001) and prioritised 17 functional variants as the most significant (p-value 9.091 × 10⁻⁵), distributed among 12 genes (RETNLB, SCAI, ADH4, ESPL1, CPT2, FLG, PPP4R1, SERPINB5, ZNF66, ETAA1, EXO1 and CPA6) in Saudi females with migraine. All of these variants are novel and have not been documented in earlier studies of other populations, such as Europeans [17] and Chinese [18].
The most significant functional variant, rs5851607, is a stop-gained frameshift variant in RETNLB. This gene encodes the bactericidal protein resistin-like molecule β (RELMβ), which is released from colonic cells to destroy Gram-negative bacteria. This gut-related function is of interest because migraine may be associated with gastrointestinal diseases such as irritable bowel syndrome (IBS), inflammatory bowel disease and celiac disease [19]. The ADH4 gene encodes an alcohol dehydrogenase enzyme, and variations in this gene are associated with alcohol dependence [20]. The CPA6 gene encodes the carboxypeptidase A6 enzyme, and its mutations can predispose to various types of epilepsy [21]. As shown in Figure 2 and Table 4, a total of 37 functional variants were found to be significant in FLG. This gene encodes profilaggrin, a protein of the epidermis that is important for the skin's barrier function. Functional variants in this gene can cause skin sensitisation or atopic dermatitis [22] and might be the underlying mechanism of migraine-associated allodynia [23]. The ETAA1 gene encodes Ewing tumour-associated antigen 1, which functions as a DNA replication stress response protein [24]. The CPT2 gene encodes carnitine palmitoyltransferase 2, an enzyme essential for fatty acid oxidation. The ESPL1 gene encodes separase (separin), a protease that triggers the separation of sister chromatids in mitosis [25]. The SERPINB5 gene encodes maspin, a tumour suppressor that binds directly to extracellular matrix components and inhibits tumour-induced angiogenesis, invasion and metastatic spread [26]. SCAI encodes a protein that suppresses cancer cell invasion [27]. The significant association of multiple genes with migraine might help to explain the wide spectrum of migraine phenotypes.
Morphological changes in the temporal lobe have been reported in association with migraine [27], and a reduction in grey matter volume in the temporal lobe was recently observed in migraine patients [28]. Gene analysis revealed that 6 of the 12 significant novel migraine risk genes were expressed in the temporal lobe of the cerebral cortex. The present study thus adds molecular insight to previous observations on the temporal lobe [29] and on migraine-associated genes [30,31], and opens new avenues for migraine research. The current study may inform power calculations in future studies and provides candidate loci for replication studies. This may facilitate a thorough understanding of migraine pathophysiology and its underlying molecular mechanisms, and open avenues for more precise diagnostic and therapeutic strategies for migraine patients of Arab ancestry. The findings may also contribute to polygenic risk scoring of patients. The small number of study subjects (20 cases and 20 controls) is a notable limitation of the study.

Conclusions
This is the first study to explore migraine-associated genetic variation in Arab ancestry. Seventeen significant functional variants in twelve genes, including a stop-gained variant in RETNLB, were identified as migraine risk variants in Arab ancestry. Half of the significant novel migraine risk genes are expressed in the temporal lobe of the brain.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/brainsci12111429/s1, Figure S1: The p53 signaling pathway with the significant genes from the study; Figure S2: The fatty acid degradation pathway with the significant genes from the study; Figure S3: Systemic lupus erythematosus (hsa05322) pathway and significant genes (highlighted) identified from the study; Table S1: List of migraine-associated (p < 0.00001) variants identified through exome sequencing; Table S2: Gene ontology pathway analysis of the top 50 genes; Table S3: KEGG pathway enrichment from the 1349 genes based on the expression in brain-related tissues associated in the GWAS analysis.
Data Availability Statement: All data will be available on reasonable request from the corresponding author.