APOE Variants in an Iberian Alzheimer Cohort Detected through an Optimized Sanger Sequencing Protocol

The primary genetic risk factor for late-onset Alzheimer’s disease (LOAD) is the APOE4 allele of the apolipoprotein E (APOE) gene. The three most common variants of APOE are determined by the single nucleotide polymorphisms (SNPs) rs429358 and rs7412. Our aim was to estimate allele and genotype frequencies of APOE variants in an Iberian cohort, thus helping to understand the differences in APOE-related LOAD risk observed across populations. We analyzed saliva or buccal swab samples from 229 LOAD patients and 89 healthy elderly controls (≥68 years old) from Northern Portugal and the Castile and León region, Spain. Genotyping was performed by Sanger sequencing, optimized to overcome the drawbacks of high GC content. The results obtained in our Iberian LOAD and control cohorts are in line with previous large meta-analyses of APOE frequencies in Caucasian populations; however, we found differences in allele frequencies between our Portuguese and Spanish subgroups of AD patients. Moreover, when comparing studies from Iberian and other Caucasian cohorts, differences in APOE2 and APOE4 frequencies, and the resulting differences in APOE-related LOAD risk, must be clarified. These results show the importance of studying genetic variation at the APOE gene in different populations (including analyses at a regional level) to increase our knowledge about its clinical significance.


Introduction
Alzheimer's disease (AD) is a progressive neurodegenerative disorder associated with cognitive decline and the leading cause of progressive dementia (60-80% of cases) [1]. It is one of the most severe brain disorders of the elderly and a major public health problem due to the increased life expectancy observed in most populations [2,3]. It is estimated that

Iberian Cohort
The cohort of this study was composed of a total of 229 clinically diagnosed Alzheimer patients with an age of onset ≥62 years from Northern Portugal (N = 148) and Castile and León, Spain (N = 81), and 89 healthy controls aged ≥68 years (N = 60 from Northern Portugal and N = 29 from Castile and León). AD patients were clinically diagnosed following the criteria of the National Institute on Aging and Alzheimer's Association (NIA-AA) [18] and, with the exception of 6 subjects, all of them completed a Mini-Mental State Exam (MMSE) [19]. The main characteristics of the studied individuals are shown in Table 1. Written informed consent was obtained from all participants or from their family and/or legal representatives. This project was approved by the Ethics Committee of the University of Porto (report #38/CEUP/2018), Portugal.

Sample Collection and DNA Extraction
Saliva and buccal swab samples were collected from patients and controls by using the saliva collector Oragene DNA OG-500 (DNA Genotek Inc., Ottawa, ON, Canada) or sterile brush/cotton buccal swabs. The choice of collection method was based on the ability of the patient to provide a 2 mL saliva sample with the specific saliva collector. In cases where this was not possible (often in patients in a severe/advanced state of the disease), sterile buccal swabs were used.
DNA was extracted according to the manufacturer's instructions using the prepIT-L2P (DNA Genotek Inc., Ottawa, ON, Canada) for the saliva collectors and the Citogene Buccal Swab protocol from the Citogene Blood Kit (Citomed Lda., Odivelas, Portugal) for the buccal swabs. For the buccal swab extraction procedure, an initial incubation step at 55 °C for 1-3 hours was introduced to ensure that most of the dried epithelial cells were removed from the swab. After DNA extraction, samples were subjected to nucleic acid quantification and quality control assessment by absorbance ratios (260/230 and 260/280) on a spectrophotometer (NanoDrop™ 1000, Thermo Fisher Scientific, Waltham, MA, USA). When necessary (i.e., poor Abs 260/230 ratio), a standard ethanol-based purification protocol was performed.

Whole Genome Amplification
Prior to the sequencing reaction, genetic material can be amplified to yield a sufficient concentration of double-stranded DNA (dsDNA) in cases of low concentration [20] or simply to allow generation of more DNA for sample preservation (i.e., biobanking). To accomplish that aim, a genome-wide amplification protocol was performed according to the manufacturer's instructions, using the Illustra GenomiPhi V2 DNA Amplification kit (GE Healthcare, Chicago, IL, USA). In summary, 9 µL of sample buffer was added to 1 µL of DNA (10 ng), and the mix was heated at 95 °C for 3 minutes before snap-cooling on ice. Subsequently, 9 µL of reaction buffer and 1 µL of enzyme were added to the mix, followed by an incubation at 30 °C for 1.5 hours and at 65 °C for 10 minutes to inactivate the enzyme.

APOE Genotyping
Although next generation sequencing (NGS) techniques display numerous advantages, the first generation methodology Sanger sequencing [21] is still considered a solid and reliable method, especially in clinical settings. Sanger sequencing provides a way to confirm variants attributed by NGS, while also being able to cover regions poorly screened by those new technologies [22,23].
To assess APOE variants, we sequenced a region encompassing both SNPs rs429358 (NC_000019.10:g.44908684T>C, GRCh38.p13) and rs7412 (NC_000019.10:g.44908822C>T, GRCh38.p13), after designing a single pair of primers (Primer3 tool; [24]) for amplification (Figure 1). Given the GC-rich DNA segment of our target region, a PCR amplification protocol was optimized as shown in Figure 2: 3 µL of DNA (5 ng/µL) was added to a mix of 0.5 µL of Forward Primer (4 µM) (5′-GCCTACAAATCGGAACTGGA, Merck SA, Algés, Portugal), 0.5 µL of Reverse Primer (4 µM) (5′-CTGCCCATCTCCTCATC, Merck SA, Algés, Portugal), 5 µL of Qiagen master mix, and 1 µL of Q Solution (Qiagen® Multiplex PCR kit, Qiagen, Hilden, Germany). The following thermocycling conditions were used, optimized from those reported by Kushioka et al. [12]: 95 °C for 15 minutes; 38 cycles of 98 °C for 20 seconds, 62 °C for 30 seconds, and 68 °C for 45 seconds; followed by a final extension step at 68 °C for 10 minutes, and cooling down to 4 °C. PCR efficacy was confirmed for all samples on a standard polyacrylamide gel (prepared from a 40% (w/v) acrylamide:bisacrylamide (19:1) solution) followed by a silver staining protocol. Next, we proceeded to an enzymatic clean-up step by adding 2 µL of the DNA product to 1 µL of 1:5 Exonuclease I (Thermo Fisher Scientific, Waltham, MA, USA) and FastAP Thermosensitive Alkaline Phosphatase (Thermo Fisher Scientific), followed by incubation at 37 °C and 80 °C, 15 minutes each. Samples were then submitted to a cycle sequencing reaction (BigDye Terminator v3.1 Cycle Sequencing kit, Applied Biosystems, Foster City, CA, USA). The cleaned-up PCR product (2.5 µL) from the last step was added to the sequencing reaction mix containing the reverse or forward primer for sequencing (0.5 µL at 3.2 µM primer stock), 1 µL of Big Dye v3.1 Terminator, and 1 µL of Big Dye v3.1 Sequencing Buffer, for a final volume of 5 µL.
The thermocycler conditions used were 96 °C for 1 minute; 35 cycles of 96 °C for 15 seconds, 50 °C for 5 seconds, and 60 °C for 2 minutes; followed by a final cool down step to 4 °C. A final purification step was performed with an Illustra™ Sephadex™ G-50 Fine DNA Grade solution (GE Healthcare), with 750 µL of Sephadex solution pipetted into columns inserted in 2 mL microcentrifuge tubes and centrifuged at 1000 × g for 4 minutes. Sephadex columns were then transferred to clean tubes and the total sequencing reaction product was carefully pipetted into the middle of the column before a new centrifugation under the same conditions (1000 × g, 4 minutes). Finally, 10 µL of highly deionized formamide (Hi-Di™ Formamide, Applied Biosystems™) was added to each sample, followed by capillary electrophoresis of the sequenced products (3500 Series Genetic Analyzer, Applied Biosystems, Thermo Fisher Scientific).

Sequencing Results and Statistical Analysis
The resulting sequencing data was analyzed using Unipro UGENE bioinformatic tool, v.33, available at www.ugene.net [25], with the designed primers of APOE SNPs for SNP localization, and the reference sequence GRCh38/hg38 for alignment.
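Once the genotypes at the two SNPs have been read from the chromatograms, they can be translated into APOE genotypes with a simple lookup. The following is a minimal sketch, not the workflow used in this study: the function names (`call_apoe`, `allele_frequencies`) are hypothetical, genotypes are assumed to be reported on the reference strand (T>C for rs429358, C>T for rs7412), and the double heterozygote is assumed to be APOE2/APOE4 because phase cannot be resolved from unphased Sanger reads.

```python
from collections import Counter

# (rs429358 genotype, rs7412 genotype) -> APOE genotype.
# Haplotypes assumed: E2 = T-T, E3 = T-C, E4 = C-C (rs429358-rs7412).
APOE_CALLS = {
    ("TT", "TT"): ("E2", "E2"),
    ("TT", "CT"): ("E2", "E3"),
    ("TT", "CC"): ("E3", "E3"),
    ("CT", "CT"): ("E2", "E4"),  # assumption: the rare C-T haplotype is absent
    ("CT", "CC"): ("E3", "E4"),
    ("CC", "CC"): ("E4", "E4"),
}


def _normalize(genotype):
    """Upper-case and sort the two bases so that 'tc' and 'CT' are equivalent."""
    g = "".join(sorted(genotype.upper()))
    if len(g) != 2 or set(g) - {"C", "T"}:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return g


def call_apoe(rs429358, rs7412):
    """Return the APOE genotype implied by the two unphased SNP genotypes."""
    key = (_normalize(rs429358), _normalize(rs7412))
    if key not in APOE_CALLS:
        # These combinations would require the rare C-T haplotype.
        raise ValueError(f"combination not explained by E2/E3/E4: {key}")
    return APOE_CALLS[key]


def allele_frequencies(apoe_genotypes):
    """Allele frequencies from a list of APOE genotypes (two alleles each)."""
    counts = Counter(allele for g in apoe_genotypes for allele in g)
    total = sum(counts.values())
    return {allele: counts[allele] / total for allele in sorted(counts)}
```

For example, `call_apoe("TC", "CC")` yields an APOE3/APOE4 call, and combinations that cannot be explained by the three common alleles raise an error so that they can be inspected manually.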
Testing for deviations from Hardy-Weinberg equilibrium (HWE) was performed for both rs429358 and rs7412 SNPs in Portuguese and Spanish control populations, after Bonferroni correction, which resulted in a significance level α equal to 0.0125. To test for non-random distribution of haplotypes into population samples under the hypothesis of panmixia, we performed exact tests of differentiation (100,000 Markov steps; α = 0.05). Both HWE and exact tests were computed in Arlequin software [26].
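The corrected significance level follows from four tests (two SNPs in two control populations): 0.05/4 = 0.0125. The sketch below illustrates that arithmetic together with a Hardy-Weinberg check for a biallelic SNP. Note that it uses a chi-square goodness-of-fit approximation (1 degree of freedom), not the exact test computed in Arlequin for this study, and the function names are hypothetical; for small samples an exact test remains preferable.

```python
import math


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic SNP.

    Takes observed genotype counts (AA, AB, BB) and returns (chi2, p_value).
    This is an approximation swapped in for illustration, not Arlequin's exact test.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Two SNPs tested in two control populations: four tests in total.
alpha = bonferroni_alpha(0.05, 4)
```

A SNP would be flagged as departing from equilibrium only when its p-value falls below `alpha`.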
To estimate odds ratios (ORs) of developing Alzheimer's disease according to APOE genotypes and alleles, we used the online statistical software MedCalc® (https://www.medcalc.org/calc/odds_ratio.php). ORs are presented with a 95% confidence interval (95% CI) using the APOE3/APOE3 genotype and the APOE3 allele as the reference in the analysis of APOE genotypes and alleles, respectively. When comparing ORs between males and females harboring the risk allele APOE4, we used APOE4 non-carriers as the reference.
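For readers who wish to reproduce this type of estimate offline, the sketch below computes an OR with a 95% CI from a 2×2 table using the standard log (Woolf) method, which we assume is equivalent to what the online calculator reports. The function name is hypothetical, and the example counts are not taken from the tables: they are back-calculated from the allele frequencies and cohort sizes reported for the Portuguese subgroup (148 patients and 60 controls, i.e., 296 and 120 alleles), so they should be treated as illustrative.

```python
import math


def odds_ratio_ci(case_exposed, case_reference, control_exposed, control_reference, z=1.96):
    """Odds ratio with a confidence interval from the log (Woolf) method.

    'exposed' counts carry the allele or genotype of interest; 'reference' counts
    carry the reference category (here, APOE3). z = 1.96 gives a 95% CI.
    """
    cells = (case_exposed, case_reference, control_exposed, control_reference)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (case_exposed * control_reference) / (case_reference * control_exposed)
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Assumed allele counts, back-calculated from reported frequencies (illustrative):
# patients: 72 APOE4 and 212 APOE3 alleles; controls: 15 APOE4 and 96 APOE3 alleles.
or_e4, lower_e4, upper_e4 = odds_ratio_ci(72, 212, 15, 96)
```

With these assumed counts, the values round to an OR of 2.17 with a 95% CI of 1.19-3.99, matching the estimate reported for APOE4 in the Portuguese cohort.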

Optimization of APOE Genotyping
Obtaining genotypes for the APOE variants rs429358 and rs7412, which are responsible for the APOE isoforms E2, E3, and E4, may present challenges due to difficulties in amplifying the targeted fragment. The known difficulty in genotyping these two SNPs is most likely due to the high GC content of the region. In the present case, the amplicon flanked by the two primers used for both amplification and sequencing has a GC content of 74% (Figure 1). Initial attempts to amplify and sequence this region using a standard amplification protocol were not successful. In these attempts, we followed a basic PCR consisting of (1) an initial denaturation step at 95 °C, optimal temperature for our DNA polymerase activity; (2) 35 rounds of a three-step temperature cycle for denaturation at 94 °C, a fixed annealing temperature, and elongation at 68 °C; and (3) a final extension step at 68 °C. In a first attempt to solve the ineffective amplification, a touchdown PCR approach was used: a first annealing temperature above the projected primer melting temperature (Tm), then a transition to a lower, more permissive temperature over the course of 10-15 cycles [27]. Following this strategy, we tested an initial annealing temperature of 69 °C for 90 seconds, transitioning to 60 °C after 10 cycles by a decrement of 1 °C per cycle. The second phase consisted of a generic amplification stage: 25 cycles at an annealing temperature of 60 °C for 90 seconds. This approach did not improve the amplification results (in all PCR attempts, both positive and negative controls were included and worked correctly). After a deeper search of the literature, we applied the approach of Kushioka et al. [12], which uses a temperature of 98 °C for 10 seconds for denaturation (instead of a standard step at 94 °C) in every PCR cycle. We adapted this protocol to optimize our amplification reactions.
Finally, we obtained very strong and specific amplification bands by performing the key steps of denaturation at 98 °C for 20 seconds and annealing at 62 °C (Figure S1).
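The touchdown program tested above can be written out as a per-cycle list of annealing temperatures. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the touchdown phase runs from 69 °C down to and including 60 °C (10 cycles) before the 25 plateau cycles at 60 °C, which gives 35 cycles in total under this reading.

```python
def touchdown_schedule(start_c=69.0, end_c=60.0, step_c=1.0, plateau_cycles=25):
    """Annealing temperature (°C) for each cycle of a touchdown PCR program.

    Assumption: the touchdown phase includes both the start and end temperatures,
    so 69 -> 60 °C in 1 °C steps spans 10 cycles; the plateau phase follows.
    """
    if step_c <= 0 or start_c < end_c:
        raise ValueError("need step_c > 0 and start_c >= end_c")
    n_touchdown = int(round((start_c - end_c) / step_c)) + 1
    touchdown = [start_c - i * step_c for i in range(n_touchdown)]
    plateau = [end_c] * plateau_cycles
    return touchdown + plateau


schedule = touchdown_schedule()
# The first ten entries step down from 69 to 60; the remaining 25 stay at 60.
```

Writing the schedule out in this way makes it easy to compare a touchdown program with the fixed 62 °C annealing step of the final protocol.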
By modifying and optimizing a traditional APOE genotyping protocol, we expect to make the assessment of the variants that define APOE isoforms more straightforward. Moreover, by describing a step-by-step protocol for genotyping such a GC-rich sequence, we aim to help others analyze similar regions, not only for research purposes but also in routine practice (Figure 2).
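When planning the analysis of a similar region, the GC fraction of a candidate sequence can be checked with a few lines of code. The sketch below uses a hypothetical helper, `gc_content`, and applies it to the two primers listed in the APOE Genotyping section, because the full amplicon sequence is not reproduced in the text.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = sequence.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return sum(base in "GC" for base in seq) / len(seq)


# Primer sequences as printed in the methods.
forward_gc = gc_content("GCCTACAAATCGGAACTGGA")
reverse_gc = gc_content("CTGCCCATCTCCTCATC")
```

Applying the same function to the amplicon sequence would allow the 74% value reported above to be checked, and values in that range suggest that a higher denaturation temperature or a PCR additive may be needed.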

APOE Genetic Variation and LOAD ORs in Northern Portugal and in Castile and León Region, Spain
We tested conformity with HWE expectations after Bonferroni correction for rs429358 and rs7412 in the Portuguese and Spanish control populations: no significant departures from equilibrium were found. Next, APOE allele and genotype frequencies were estimated for both patients and controls in each population group. Of the six possible APOE genotypes, only APOE2/APOE4 was not observed in any of the cohorts (Table 2). Analyses of differentiation of allele and genotype distributions were computed within and between the Portuguese and Spanish populations, for both case and control subgroups (Table 3). When analyzing cases versus controls, we found different allele and genotype distributions in both the Portuguese and Spanish populations. We also found differences in allele frequencies between Portuguese and Spanish AD patients, which prevented us from pooling these populations in further genotype analyses. To gain insight into the risk of developing LOAD as a function of APOE alleles and genotypes in our Iberian populations, we estimated the ORs for disease associated with APOE in our cohorts and compared them to those calculated from the frequencies reported for a large Caucasian population [31] (Table 4). As expected, high ORs for LOAD were associated with APOE4 alleles (2.17; CI, 1.19-3.99, in the Portuguese cohort; and 1.57; CI, 0.65-3.80, in the Spanish cohort). Moreover, given the importance of studying sex-related risk factors for AD, we estimated APOE4-associated ORs for males and females from both Iberian populations (Table 5). Curiously, in the Portuguese cohort, male APOE4 carriers showed slightly higher odds of developing LOAD (3.17; CI, 0.89-11.31) than female APOE4 carriers (2.49; CI, 1.04-5.93); in the Spanish cohort, the corresponding values were 1.50 (CI, 0.36-6.17) for males and 6.49 (CI, 0.79-53.57) for females.
These results should, however, be interpreted with caution given the non-significance of most estimated ORs and the small sample size of some analyzed subgroups.

Table 3. Apolipoprotein E (APOE) allelic and genotypic differences of late onset Alzheimer's disease (LOAD) patients versus controls for Portuguese and Spanish cohorts. For allele distribution, we also tested differences of Portuguese versus Spanish for cases and controls. Significant p-values ± SD (α = 0.05) obtained from exact tests of differentiation are marked with an asterisk. After finding differences in APOE allele frequencies between Portuguese and Spanish patients (P = 0.03265), we did not perform further interpopulation analyses (marked in the table as NC: not calculated). N: Number of individuals or alleles when APOE genotypes or APOE alleles are analyzed, respectively.

Discussion
In this study, we developed an optimized protocol for APOE genotyping by Sanger sequencing and described a step-by-step procedure to overcome the inherent drawbacks of the high GC content in APOE exon 4, where the SNPs rs429358 and rs7412 (which define APOE alleles) are located. Then, we estimated APOE genotypic and allelic frequencies in our cohorts of LOAD patients from Northern Portugal and Castile and León (Spain), in comparison to geographically matched control populations.
The three major APOE alleles (APOE2, APOE3, and APOE4) display worldwide frequencies of approximately 8%, 78%, and 14%, respectively; nevertheless, a non-random global distribution has been reported in several studies [15,31-34]. In Europe, a north-to-south gradient of APOE4 and APOE3 alleles has been observed, with the frequency of APOE4 increasing and that of APOE3 decreasing with latitude, whereas the frequency of APOE2 seems to be independent of European latitude [34]. Interestingly, based on linkage disequilibrium evidence, Seixas et al. [29] suggested that the evolution of APOE alleles in humans followed an E4→E3→E2 pathway, thus confirming APOE4 as the ancestral allele. In their study, the authors reported APOE frequencies in the general population from Northern Portugal (APOE2: 4.4%; APOE3: 88.2%; and APOE4: 7.4%; Table 2). Recently, their work was cited in a comprehensive and elucidative review on APOE pleiotropy by Belloy et al. [15]; however, Portugal was placed distant from other European countries, with a considerably deviating APOE4 allele frequency. We note here that the frequencies reported in this review for the Portuguese population were mistakenly taken from those of the African population of São Tomé e Príncipe (APOE2: 10.0%; APOE3: 65.2%; and APOE4: 24.8%; Seixas et al. [29]). APOE frequencies in Portuguese and Spanish populations are rarely described in the literature, with a considerable gap in the comparison between AD patients and carefully selected controls. Our study was a prospective work with the aim of performing this comparison in the regions of Northern Portugal and Castile and León (Spain), to bridge the gap regarding Iberia and to analyze whether or not APOE frequencies deviate from those reported for Caucasians.
APOE2 alleles have been shown to confer a protective effect against AD. In addition to being associated with decreased amounts of AD-related brain pathology and a later age of onset in patients, APOE2 variants are markedly underrepresented among affected individuals [35]. In our Portuguese subgroup, the frequency of APOE2 in patients is about half of that in controls (4.1% versus 7.5%; OR, 0.60; CI, 0.25-1.48), in line with the protective effect of these alleles. A similar reduction has been observed in a large meta-analysis comprising 5107 AD patients and 6262 controls from different Caucasian populations (OR, 0.61; CI, 0.54-0.69) (Tables 2 and 4, [31]). On the other hand, the frequency of APOE2 in the general population from Northern Portugal and in a non-AD Portuguese cohort has been reported to be as low as 4.4% [29] and 6.3% [28], respectively. When retrieving APOE frequencies from other studies, we must take into account the aim of each study and also bear in mind that, even in case-control studies, control cohorts may include individuals who were young at the time of examination and who may develop LOAD later in life; when this happens, the frequencies described in such controls are biased by the presence of presymptomatic patients. In our study, we tried to overcome this issue by selecting controls aged ≥68 years, free from any sign of dementia, who had completed the Mini-Mental State Exam (values ≥ 26). Interestingly, this protective allele was detected in a single LOAD patient (heterozygous APOE2/APOE3) in our Spanish cohort, which resulted in a more pronounced decrease of APOE2 in patients (0.6%) versus controls (8.6%), with an OR of 0.07 (95% CI, 0.01-0.62; Table 4). This might, however, be explained by a random underrepresentation of rarer alleles, as a frequency of 6% has been previously reported for APOE2 among LOAD patients from Madrid, Spain (even with a small sample size of 47 patients; [30]).
On the other hand, when looking at APOE2 frequency in their geographically matched control cohort, their results are not in line with the protective role ascribed to these alleles (6% in patients versus 5% in controls; Table 2). Again, the problem of including individuals examined at a young age (controls ≥52 years old) can be raised, but in the study by Ibarreta et al. [30], the small sample size may also underlie the difficulty in interpreting the results. Finally, the hypothesis that differences in detection methodologies may be causing some of these discrepancies cannot be ruled out. To clarify whether such differences in APOE2 allele frequencies do occur at a regional level, it would be important to extend this study to other Spanish regions.
Our results for APOE4 alleles, the strongest known genetic risk factor for LOAD, are similar to those previously reported in Caucasians [31]. In our Portuguese subgroup, the APOE4 frequency in patients is about twice that in controls (24.3% versus 12.5%; OR, 2.17; CI, 1.19-3.99), slightly below that reported for Caucasians (36.7% versus 13.7%; OR, 3.51; CI, 3.29-3.75; Tables 3 and 4). In our Spanish subgroup, this increase is about 1.6-fold in AD patients (19.1%) when compared to controls (12.1%), very close to the ratio reported for Hispanic populations (19.2% versus 11%) [31]. Interestingly, in a study including subjects from the Spanish area of Madrid, a striking 8.5-fold increase in APOE4 allele frequency in LOAD patients (34%) has been reported when compared to controls (4%) (Table 2, [30]). While the E4 frequency in Madrilenian patients is close to that reported among Caucasians (36.7%), the remarkably low frequency in controls is hard to explain in light of the frequencies found in Caucasians (13.7%), in our Spanish controls (12.1%), or among controls from the area of Barcelona (11.1%) (Table 2). The results should be interpreted with caution given the small sample size of the Madrid cohort (47 patients and 42 controls); nevertheless, it is very interesting to note such differences in APOE distribution at a regional level, highlighting the importance of these studies.
Globally, the most common isoform of the APOE protein is E3, believed to be neutral, not leading to an increased or decreased risk of developing AD. As expected, our results showed APOE3 allele frequencies as the highest when compared to those of other variants, both in patients and controls from the two studied regions. Similar frequencies of APOE3 in controls and patients are also in line with its neutral role in AD (Northern Portugal patients: 71.6%; controls: 80.0%; Castile and León patients: 80.2%; controls: 79.3%; Table 2).
In addition to the APOE genotype, sex-based differences in LOAD risk have been studied, but the sex-dependent association of LOAD and APOE is still controversial and needs to be clarified. Several epidemiological studies have suggested that female APOE4 carriers display a higher risk for LOAD when compared to their male counterparts [35-37], but complexity is added by studies showing that this finding may additionally depend on age [38,39]. For instance, in a large analysis comprising data on ~58,000 subjects from 27 independent studies in the Global Alzheimer's Association Interactive Network, men and women carrying the APOE4/APOE4 genotype had similar odds of developing AD across the age span of 55 to 85 years, with women showing an increased risk at younger ages [38]. In this study, we attempted to shed light on sex differences in APOE4 effects on LOAD by comparing odds between males and females carrying the risk APOE4 allele. We observed lower ORs in female APOE4 carriers in the Portuguese cohort; however, the small sample size of some subgroups may underlie this result and the non-significant ORs observed.

Conclusions
This work aimed to study the genetic variation of APOE in a cohort of late-onset AD patients from Northern Portugal and Castile and León (Spain), in comparison to geographically matched control populations. The genetic characterization of APOE provides information on the landscape of AD in these regions based on the haplotype data obtained from APOE alleles at SNPs rs429358 and rs7412. We developed an optimized protocol for APOE genotyping through Sanger sequencing, overcoming the inherent drawbacks of the high GC content in this genomic region. Our results on APOE frequencies are in line with previous reports, although differences regarding the rare APOE2 allele may be further explored by extending this study to other Iberian subgroups. Moreover, discrepancies in the APOE4 allele frequencies reported in other studies of Portuguese and Spanish cohorts deserve a closer look. Finally, we have drawn attention to the importance of carefully selecting control individuals in AD studies, given the late onset of this disease. By capturing a broader and more accurate picture of APOE allelic variation in multiple populations (including at regional levels), we hope to contribute to a comprehensive evaluation of the utility of APOE in a clinical context.

Funding: This work was supported by 'European Commission' and 'European Regional Development Fund' (FEDER) under the project "Análisis y correlación entre el genoma completo y la actividad cerebral para la ayuda en el diagnóstico de la enfermedad de Alzheimer" (Project 0378_AD_EEGWA_2_P), (Cooperation Programme INTERREG V-A Spain-Portugal POCTEP 2014-2020) and the COMPETE 2020 - Operational Programme for Competitiveness and Internationalisation (POCI), Portugal 2020. Portuguese funds are supporting this work through FCT - Fundação para a Ciência e a Tecnologia/Ministério da Ciência, Tecnologia e Inovação in the framework of the project "Institute for Research and Innovation in Health Sciences" (POCI-01-0145-FEDER-007274). IG, AML, SM, and NP are funded by FCT: CEECIND/02609/2017, IF/01262/2014, CEECIND/00684/2017, and through the Decreto-Lei n.º 57/2016 de 29 de Agosto, respectively. Spanish funds are supporting this work through 'Ministerio de Ciencia e Innovación - Agencia Estatal de Investigación' and FEDER under project PGC2018-098214-A-I00 and by "CIBER en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN)" through "Instituto de Salud Carlos III" co-funded with FEDER funds.