Unravelling Genes and Pathways Implicated in Working Memory of Schizophrenia in Han Chinese

Working memory deficit is the core neurocognitive disorder in schizophrenia patients. To identify the factors underlying working memory deficit in schizophrenia patients and to explore the implication of possible genes in the working memory using genome-wide association study (GWAS) of schizophrenia, computerized delay-matching-to-sample (DMS) and whole genome genotyping data were obtained from 100 first-episode, treatment-naïve patients with schizophrenia and 140 healthy controls from the Mental Health Centre of the West China Hospital, Sichuan University. A composite score, delay-matching-to-sample total correct numbers (DMS-TC), was found to be significantly different between the patients and control. On associating quantitative DMS-TC with interactive variables of groups × genotype, one SNP (rs1411832), located downstream of YWHAZP5 in chromosome 10, was found to be associated with the working memory deficit in schizophrenia patients with lowest p-value (p = 2.02 × 10−7). ConsensusPathDB identified that genes with SNPs for which p values below the threshold of 5 × 10−5 were significantly enriched in GO:0007155 (cell adhesion, p < 0.001). This study indicates that working memory, as an endophenotype of schizophrenia, could improve the efficacy of GWAS in schizophrenia. However, further study is required to replicate the results from our study.


Introduction
Schizophrenia is a psychiatric disorder that affects about 1% of the world population [1]. Both win and adoption studies have shown that genetic factors play an important role in the pathogenesis of schizophrenia (heritability close to 0.8) [2]. Schizophrenia, having a high heritability, is a multi-dimensional disorder that cannot be explained by Mendelian genetics. It is now generally agreed that it is caused by combined effects of hundreds or thousands of gene variants with modest effects [3]. Advances in whole-genome genotyping and related analysis methods have enabled identification of these variants. Since 2009, many genome-wide association studies (GWAS), with case-control designs, have been carried out to study the genetic architecture underlying schizophrenia [4][5][6]. The results from a GWA study with the largest sample size so far, identified 108 variants that are associated with schizophrenia. Although some variants, such as ones in ZNF804A and variants in major histocompatibity complex (MHC), have been reported to be replicated, many SNPs were found to be non-overlapping in this study [7]. The inconsistency was attributed to factors such as population stratification, sample size, and clinical heterogeneity, with clinical heterogeneity being the most difficult confounding factor in case-control GWASs. Unlike other complex disorders, diagnosis of schizophrenia lacks reliable biomarkers and animal models. In fact, most of its diagnoses are based on subjective judgments from clinical practitioners. Additionally, lack of objective laboratory measures tends to create a barrier between the disorder diagnosis and its management. Although large sample size is a good strategy to map related variants, many case-control studies still failed to produce satisfactory results. Alternatively, various researchers used quantitative traits (QTs) as endophenotype or intermediate phenotype in order to enhance the efficacy of the GWAS in schizophrenia [8,9]. In comparison to dichotomous phenotype (subjective and random-prone diagnoses), QT is easy to measure and can be standardized. In a continuum from gene to clinical outcome, quantitative pathological changes are supposed to be more proximal for genetic underpinnings of the disease. Various studies have shown that QTs improve the efficacy of GWAS [6,[9][10][11][12]. Further, they were also found to be beneficial in mapping the genes associated with schizophrenia. Dickinson et al. [13] used g score, a composite score combining six neuro-cognitive dimensions as QTs in a genome association study to map potential genetic variants associated with schizophrenia. This approach led to unveiling the effects of SCN2A on general cognitive ability, brain physiology, and mRNA expression in schizophrenia. Potkin et al. [14] carried out a genome-wide study on schizophrenia by using QTs with results showing that variants in six genes (POU3F2, TRAF, GPC1, POU3F2, TRAF, and GPC1) were associated significantly with working memory task-related bold signal of functional MRI in schizophrenia, with a p-value threshold of 10 −6 . Furthermore, they identified these six genes/regions involved in pathways related to neurodevelopment and response to stress. These studies demonstrated the efficacy of using QTs as endophenotype in exploring the pathogenesis of schizophrenia.
Previous studies have demonstrated that working memory deficit exists both in schizophrenic patients and their biological relatives [15][16][17]. In the present study, we used the total correct numbers of delay-matching-to-sample (DMS-TC) as QTs, which is a composite score generated from delay-matching-to-sample (DMS) test in the Cambridge Neuropsychological Test Automated Battery (CANTAB), to explore common genetic variants underlying the working memory deficit in schizophrenia by using a hypothesis-free GWAS analysis. Furthermore, we used ConsensusPathDB (available on line: http://consensuspathdb.org/), one of the most widely applied pathway databases, to analyze pathway over-representation of genes that the associated variants are located in or close to. It is assumed that this downstream strategy will increase validity of the study and might shed light on the mechanism of how these biological pathways bridge genes with neurocognitive deficits in schizophrenia.

Demographic Characteristics and Delayed-Matching-to-Sample (DMS) Test
A hundred and forty first-episode and drug-naive patients with schizophrenia and 100 healthy controls were included in the study. There was no difference between patients and controls in term of sex, age, and education years (Table 1). Nineteen scores of 125 individuals were successfully uploaded into the result dataset (67 controls, 58 cases) after DMS test. Of these 19 measurement scores from DMS, 10 scores remained significantly different between patients and healthy controls with age, sex, and education year being adjusted for. However, after multiple testing, only one of these scores, DMS-TC, survived Bonferroni correction, as shown in Table 1 (p < 0.05). From the study, it was observed that DMS-TC score can be used for QTs in subsequent analysis.

Analysis of Quantitative Traits (QTs) and Over-Representation Study
A total of 742,805 SNPs passed the quality control, with a mean call rate of 98.9%. However, seven patients and six controls failed to pass the quality control and cryptic relatedness, and thus were excluded from subsequent analysis. Multidimensional scaling (MDS) in PLINK showed that individuals were tightly clustered, indicating that individuals were of the same ancestral Chinese Han origin ( Figure 1). Following quality control, inflation factor (λ), generated from PLINK analysis was found to be 1.0099, which showed that confounding factors were well-adjusted ( Figure 2). Since there was no conspicuous population structure among the study samples and no significant deviation of the observed distribution, principal components from MDS was not chosen as a covariate for the linear regression analysis. Finally, genotyping data of 93 patients and 134 controls passed the quality control and were included for the subsequent association study. Results from the association test of genotype × group and DMS-TC QTs demonstrated that rs1411832 was found to be the most significant SNP, which was located at the downstream of YWHAZP5 (p = 2.02 × 10 −7 ) ( Table 2). 121 autosomal variants, located in or within flanking areas of 46 genes, passed the significant threshold of 5 × 10 −5 using DMS-TC as quantitative trait (Figure 3). These annotated genes were used to map the significant pathways for the study. Pathway over-representation analysis of annotated genes by ConsensusPathDB showed that nine genes (SCARB1, DSCAM, LGALS4, COL14A1, PTPRT, IGSF11, DSCAML1, SMOC2, and FAT3) were significantly over-represented in one GO, cell adhesion (GO:0007155, p < 0.001).

Association Study of Working Memory Deficit as QT
The current study is the first GWAS utilizing one of the composite scores of the working memory task paradigm as QT to map common variants associated with schizophrenia. In this study, significant differences in computerized DMS were found between schizophrenia patients and healthy controls. One Composite scores, DMS-TC, was therefore chosen as QTs to associate with interaction of genotypes × group. Additionally, genes containing variants of significant importance were mapped by one GO term using pathway database. Results of the study highlighted the validity of the method used for genetic underpinning in the pathology of schizophrenia.
We found rs1411832 with the smallest p value in one of the intergenic regions downstream of gene YWHAZP5 (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta pseudogene 5). Most pseudogenes are very similar protein-coding genes and some are believed to be extra copies of genes; a recent study demonstrated that some pseudogenes are functional [18]. YWHAZ has been well-studied in schizophrenia [19][20][21], its gene product belongs to the 14-3-3 family of proteins which mediate signal transduction by binding to phosphoserine-containing proteins. 14-3-3 is abundant in brain which points to its critical role in neuronal functions [22,23]. Initial finding of association can be followed up by (i) sequence similarity between YWHAZ and YWHAZP5; (ii) better defining the nature of the potential association between YWHAZ and YWHAZP5; and (iii) attempting to replicate it in two additional samples. Furthermore, some studies have shown that intergenic variants might play an important role in regulating expression of nearby genes [24,25] and they are also often in linkage disequilibrium with causal variants in the gene. Further study should be conducted on causal variants in LD with this intergenic variant. Meanwhile, Since the study was trait-associated GWAS and both environment and gene can contribute to complex traits, it is likely that this significant variant regulates the expression of YWHAZP5 through epigenetic progress. Thus further studies are required to validate this presumption.

Over-Representation of Genes in Single Pathway
Pathway over-representation analysis of annotated genes by ConsensusPathDB showed that nine genes (SCARB1, DSCAM, LGALS4, COL14A1, PTPRT, IGSF11, DSCAML1, SMOC2, and FAT3) were significantly over-represented in one GO, cell adhesion (GO:0007155, p < 0.001). Of these nine genes, DSCAM and DSCAML1 (Down's syndrome cell adhesion molecule gene) are associated with Down's syndrome. Various studies have demonstrated a relationship between DSCAM and neurobehavioral phenotype of Down's syndrome which includes working memory deficit [26][27][28][29]. Recently, exome sequencing has detected genetic overlapping between schizophrenia and neurodevelopmental disorder [30,31]. From the above studies, we concluded that there should at least be a subgroup in schizophrenia that shares the same molecular pathological pathway with neurodevelopmental disorder. Beside this, PTPRT was often found to be related to neurodevelopment and cell growth [31,32]. A neurodevelopment model of schizophrenia has been frequently verified in studies involving different research strategies. Findings from our study add further evidence that neurodevelopmental deficit is associated with the pathogenesis of schizophrenia.

Functional Characterization of Genes Post GWAS
In recent years, various arguments were made on the extrapolation of the GWAS results in order to clarify the current vague picture. After GWAS, a posteriori functional pathway is one of the most authentic strategies to study the role of mapped genes in the complex disorder [33][34][35]. Pathway approaches have been adopted in many studies [36,37] and are shown to be robust to detect the joint action of variants of small effect clustering within biological pathways that play a major role in predisposing to complex genetic disorders and they can increase power by summarizing combined effects of all SNPs within a pathway in attempt to make biological meaningful interpretations of the data [38][39][40]. Even very large GWAS may lack power to identify small SNP effects, but these may be detectable at a pathway level.
In our study, subsequent functional enrichment from ConsensusPathDB highlighted that the tagged genes were mostly enriched in cell adhesion (GO:0007155). Cell adhesion molecules in nervous system and neural cell adhesion molecules (NCAMs) play a critical role in neural development such as cell adhesion, growth, and migration as scaffold for novel binding proteins [41][42][43], and various studies have demonstrated changes in NCAMs in schizophrenia patients. Although NCAM abnormalities in schizophrenia patients have been widely described, very few studies focus on the relationship between extracellular matrix and paradigm of the visual working memory test.

Participants
One hundred first-episode schizophrenia patients (recruited from West China Hospital of Sichuan University, Chengdu, China) and 140 demographically matched healthy controls without family history of psychological disorders (recruited from the local neighborhood), were included in the study. Patients were interviewed with SCID (structured clinical interview) by trained psychiatrists to ensure the diagnosis was based on DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, fourth edition) criterion, whereas healthy controls were screened with structured clinical interview-non-patient (SCID-NP) to ensure the absence of psychiatric illness. Patients were followed up to 6 months in order to confirm the diagnosis. No anti-psychotic medication was administered to patients at the time of clinical and neuro-cognitive evaluation. Both patients and controls were excluded if they had one of following conditions: (i) organic cerebral diseases; (ii) neurological diseases; (iii) severe endocrine diseases; (iv) axis I and II diagnosis other than schizophrenia according to DSM-IV; and (v) single or double limb palsy. Informed written consent was obtained from all the participants after explaining the study. This study was approved by the Institutional Review Broad (IRB) of West China Hospital, Sichuan University.

DMS Test Paradigm
DMS is one of the computerized tests in the Cambridge Neuropsychological Test Automated Battery (CANTAB) (available on line: http://www.cantab.com), which is used to assess working memory. The paradigm of the DMS test is described elsewhere in detail [44]. DMS test data was obtained for 125 out of 227 genotyped individuals (58 patients and 67 healthy controls). There are 19 scores from the DMS test which belongs to three categories: correctness in terms of number and percentage (total correct percentage, correct percentage on simultaneous level, correct percentage on all delays, correct percentage on 0 ms delay, correct percentage on 4000 ms delay, correct percentage on 12,000 ms delay, total correct numbers, total correct numbers on all delays, total correct numbers on simultaneous level, total correct numbers on 0 ms delay, total correct numbers on 4000 ms delay, total correct numbers on 12,000 ms delay); latency at both simultaneous and delayed level (mean correct latency, mean correct latency on all delays, mean correct latency on simultaneous level); and statistical analysis of discrimination between signal and noise (prob error given correct, prob error given error, DMS A', DMS B'). DMS-TC, we are choosing here as QTs for association study, belongs to correctness in terms of number and percentage. It indicates the total number of trials in which subjects selected correct stimulus as their first response and is calculated by combination of total correct numbers on simultaneous level and total correct numbers on all delays.

DMS Test
Analysis of variance (ANOVA) using statistical package for the social sciences (SPSS21.0, SPSS, Inc., Chicago, IL, USA) was used to compare DMS test scores between patients and healthy controls, adjusting age, education years, and gender as covariates. The multiple test was accounted for by using Bonferroni correction. To justify the inter-group differences, statistically significant values (p-value) were set to p < 0.05 (Bonferroni correction).

Genotyping and Quality Control
DNA was extracted from whole blood samples using the standard isolation method and the genotyping was performed on the HumanOmniZhongHua-8 Bead Chip platform. Genotyping data were systematically filtered according to their genotyping rates (per sample and per marker), minor allele frequency (MAF), and Hardy-Weinberg equilibrium tests (only in controls) in PLINK. Participants with low genotyping rate (<97%), markers with missing rate >5% per individual and/or with MAF >0.05, and markers that failed to pass Hardy-Weinberg equilibrium tests (p ≤ 0.00001) were excluded from the study. Furthermore, gender of each participant, with genotyping data on gender-specific loci, was confirmed by genotyping platform and the participants with the lower genotype call rate were excluded if pairs of participants with identical genotypes were found. After systematic filtering, 109,923 SNPs and four individuals (two patients and two healthy controls) were excluded based on the set threshold value.

Correction for Population Stratification
For further analysis of the genetic relationship between SNPs and population structure within the sample, remaining SNPs were further pruned to ensure linkage equilibrium between the SNPs. SNPs with r 2 > 0.5, were consequently excluded from the study (PLINK command-indep-pairwise 50 10 0.5). The multi-dimensional scaling algorithm (MDS) for cryptic relatedness in PLINK was applied to a pruned sample. Matrix generated from MDS was visualized by plotting R package (available on line: http://www.r-project.org). Thirty eight thousand five hundred and fifty seven SNPs and nine subjects (five cases and four healthy controls) were excluded after failing the genetic relationship test and cryptic relatedness. Inflation factor, lambda (λ) was estimated by calculating the mean of observed and expected chi-square test statistics to analyze the results of whole-genome association studies for over-dispersion due to population substructure and other confounding factors.

Linear Regression Analysis
DMS-TC scores, which significantly differed between patients and health controls, were used as QTs in regression analysis. Of the total of 227 subjects with high-quality genotyping data, 125 with DMS-TC were included for QT analysis. The QT analysis was based on comparing the differential effects of SNPs on the DMS QTs. Out of the four possible models (additive, co-dominant, dominant, and recessive), additive components (reflecting the additive contribution of risks for complex diseases) were included in the linear model. Multivariate linear regression model in PLINK1.07 [45] was used to assess the correlation of QT and interaction of group and additive genetic risk of minor allele of 742,805 SNPs which passed genetic quality control while adjusting for age, sex, and education year. The group was labeled as discrete variable (1 = patients, 2 = healthy controls) to define the interaction with the genetic risk of allele. Finally, a Manhattan plot was plotted using R to illustrate the whole genome-wide association scans.

Over-Representation Study
To map the pathway over-represented by genes from association study, genes with SNPs passing a threshold of 5 × 10 −5 were further explored by web server of ConsensusPathDB (available on line: http://cpdb.molgen.mpg.de/CPDB) to detect the set most enriched by these genes. Hyper-geometric test detect the gene set in predefined lists of functionally associated genes (pathways, gene ontology categories, and neighborhood-based entity sets) with p-value indicating the extent of corresponding enrichment. Previous studies shown that the p value of 5 × 10 −8 is rather conservative which may lead to excessive false negative especially for gene-or pathway-based analysis in smaller sample size [46]. Some genes missed in the single-locus association test can be detected significantly when integrated into pathway association analysis [47][48][49]. In addition, Nicolae et al. [50] found that trait-associated SNPs below 5 × 10 −4 are more likely to be eQTLs, and these signals from GWA study are not exhausted after fully investigating the results of expression quantitative trait locus (eQTL) studies in lymphoblastoid cell lines from HapMap samples. Enlightened by these studies, we set 5 × 10 −5 as the empirical cut-off p value to identify potential pathways involved in working memory deficit in schizophrenia.

Conclusions
In our study, GWAS was conducted to find the genes that were significantly associated with working memory deficit of schizophrenia. The results showed that variants near one gene, which encodes YWHAZP5, were most significantly associated with working memory deficits in schizophrenia patients. Additionally, downstream pathway analysis showed that NCAM is essential in working memory-related pathogenesis of schizophrenia.
Although the study was promising in various aspects, there are some limitations associated with it. First, we chose only one part of results from DMS, i.e., DMS-TC, as our QTs. We are fully aware that there are scores of many kinds generated from DMS of CANTAB, we multiple-corrected all p-values generated from DMS test using Bonferroni correction, and only the total correct numbers of the DMS could survive after the correction, which imply the strongest effect size in all indicators of the DMS. The other indicators could not survive after Bonferroni correction, so they were not included in subsequent analysis.
Second, this study is an explorative study of methodology involving variant mapping and annotation of its related biological pathway. SNP with the most significant p-value in our study is rs1411832 in chromosome 10 (p < 2.02 × 10 −7 ), which is acceptable considering our sample size and missing phenotype data [51]. However, given the limitations of GWAS and its sample size, larger sample size with inclusion of both common and rare variants is required to validate the results of this study. Additionally, patients included in the study were all first-episode and drug-naïve. Some of them even reported difficulty in completing the task, thus resulting in missing phenotype data; given the study design, this is an inevitable trade-off for not being cofounded by medication and chronicity of the disease. In summary, this study showed that working memory might be one of the endophenotypes of schizophrenia and these endophenotypes can increase the efficiency of GWAS in neuropsychiatric disorders such as schizophrenia.

Author Contributions
The whole study was conceived and designed by Hongyan Ren, Qiang Wang and Tao Li. Recruitment of patients and healthy controls was carried out by Hongyan Ren, Chengcheng Zhang, Chaohua Huang, Na Li, Mingli Li, Yinfei Li, Deng Wei and Xiaohong Ma. Both genotypic data and cognitive data were analyzed by Hongyan Ren, Chengcheng Zhang, Qiang Wang and Bo Xiang Extraction of DNA was implemented by X.M. and Bo Xiang. The manuscript was written by Hongyan Ren, Qiang Wang and Tao Li.