Genomic Sequencing and Analysis of Eight Camel-Derived Middle East Respiratory Syndrome Coronavirus (MERS-CoV) Isolates in Saudi Arabia

Middle East respiratory syndrome coronavirus (MERS-CoV) causes severe respiratory illness in humans; the second-largest and most deadly outbreak to date occurred in Saudi Arabia. The dromedary camel is considered a possible host of the virus and also to act as a reservoir, transmitting the virus to humans. Here, we studied evolutionary relationships for 31 complete genomes of betacoronaviruses, including eight newly sequenced MERS-CoV genomes isolated from dromedary camels in Saudi Arabia. Through bioinformatics tools, we also used available sequences and 3D structure of MERS-CoV spike glycoprotein to predict MERS-CoV epitopes and assess antibody binding affinity. Phylogenetic analysis showed the eight new sequences have close relationships with existing strains detected in camels and humans in Arabian Gulf countries. The 2019-nCov strain appears to have higher homology to both bat coronavirus and SARS-CoV than to MERS-CoV strains. The spike protein tree exhibited clustering of MERS-CoV sequences similar to the complete genome tree, except for one sequence from Qatar (KF961222). B cell epitope analysis determined that the MERS-CoV spike protein has 24 total discontinuous regions from which just six epitopes were selected with score values of >80%. Our results suggest that the virus circulates by way of camels crossing the borders of Arabian Gulf countries. This study contributes to finding more effective vaccines in order to provide long-term protection against MERS-CoV and identifying neutralizing antibodies.


Introduction
Coronaviruses are a family of single-stranded, positive-sense RNA genomes that range from 26,000 to 32,000 basepairs in length. This family can infect both humans and animals, and has been found in many mammal hosts, including bats, camels, and pigs [1]. Coronaviruses are also known to have high genetic diversity influenced by mutation and recombination, which can lead to the development of new viruses. Before 2003, human coronaviruses were primarily known for causing common colds. In 2003, severe acute respiratory syndrome coronavirus (SARS-CoV) was first diagnosed in southern China as an epidemic, and went on to infect almost 8000 people in 29 countries, with a fatality rate of approximately 10% [2,3]. Almost ten years later, in 2012, another novel and deadly human coronavirus emerged, named the Middle East respiratory syndrome coronavirus (MERS-CoV); it infected more

Genome Organization
The consensus MERS-CoV genome obtained through analyzing isolates from eight infected camels from different locations across Saudi Arabia was 30,118 nt in length with a GC content of 41%. The eight constituent genomes shared >99% identity. The camel MERS-CoV genome is similar to that of human MERS-CoV, containing ten ORFs (ORF1ab, ORF3, spike [S], ORF5, ORF4a, ORF4b, envelope [E], membrane [M], nucleocapsid [N], and ORF8b). BLAST comparison to other camel and human MERS-CoV strains available from NCBI revealed that all share ∼99% identity with one another. The comparative analysis of homologous sequences of spike proteins and genes has been performed for the 31 Betacoronavirus strains ( Table 1). The MERS-CoV strains were nearly identical across their spike proteins and genes (>99.50% and >81%, respectively). Indicative of a very recent emergence into the human population.

Phylogenetic Analysis
Phylogenetic analysis was performed on the complete genomes and spike proteins of the eight new MERS-CoV sequences and 23 representative viruses from the genus Betacoronavirus. The eight camel-derived MERS-CoV sequences clustered in a distinct clade and were all relatively close to other MERS-CoV sequences from human and camel hosts ( Figure 1). The 2019-nCoV falls in a distinct clade and shows a closer relationship to Bat-SARS-Like (GU190215.1) than the MERS-CoV strains. In agreement with the complete genome results, analysis based on the spike proteins showed that 2019-nCoV has a closer relationship to Bat-SARS-Like and SARS-CoV strains than MERS-CoV strains ( Figure 2). We observed very high conservation between the new eight sequenced MERS-CoV genomes, with sequence identity >99%, despite the different geographical locations of sampling. In addition, a specific sample collected in 2015 from a human in Qatar was revealed to be more closely related to the eight new MERS-CoV isolates than to other UAE and Oman MERS-CoV strains. Genetically, the MERS-CoV genome can overall be considered more diverse than those of other betacoronavirus family members. Phylogenetic analysis of the MERS-CoV spike protein and its counterparts showed that the four clades formed four well-supported branches ( Figure 2).

MERS-CoV B Cell Epitope Analysis
B cell epitopes are significant in the defense of the immune system against viral diseases, allowing B cells to identify different viral infections and activate responses against them. There has been many studies aiming to design MERS-Cov vaccine [24][25][26]. In this study, epitope analysis of MERS-CoV showed that the MERS-CoV spike protein has 24 discontinuous regions in total, from which just six epitopes with score values of >0.8 could be selected. The higher the score, the greater the potential of a true discontinuous epitope; the highest probability obtained was 0.898. Table 2 details the peptides of these high-scoring discontinuous epitopes and their sequence locations, epitope lengths, and associated scores. Their positions on the 3D structure of the MERS-CoV spike protein (PDB ID: 5X5F) are illustrated in Figure 3.

Discussion
MERS-CoV is a zoonotic virus for which ongoing studies indicate major roles in transmission for direct contact with live camels and with humans having symptomatic MERS [27][28][29], with dromedary camels being considered intermediate hosts that are the main source of human infections. The second largest outbreak to date occurred in Saudi Arabia, and MERS-CoV is widespread among humans and camels in the Gulf countries [30], where it remains a potential hazard to regional and global health and welfare. The present study seeks to determine the evolutionary relationships between eight new MERS-CoV isolates obtained from Arabian camels in different regions of Saudi Arabia and 23 other previously sequenced betacoronavirus strains, based on phylogenetic trees constructed with BEAST for the complete viral genome and the spike protein.
In the phylogenetic analysis, the eight new strains clustered in a separate clade, exhibiting close relationships with strains obtained from MERS-CoV-infected camels and humans in Saudi Arabia, the United Arab Emirates (UAE), Qatar, and Oman. This finding supports the expectation that the virus circulates via camels crossing borders with Oman and other Gulf countries, and is also consistent with the report of another study that movement of camels between Saudi Arabia, the UAE, and Oman contributes to the spread of the virus throughout the Arabian Peninsula [31]. Based on these results, it is recommended that restrictions be implemented to prevent these animals from migrating between Gulf countries.
In conclusion, this study determined the evolutionary relationships of newly sequenced MERS-CoV, along with 23 previously sequenced isolates of the genus Betacoronavirus. We also performed genomic characterization of MERS-CoV, highlighting its origin and likely means of transmission through Gulf countries. In addition, we analyzed the MERS-CoV spike protein for suitable epitopic candidates and identified six viral peptides with good antigen presentation scores (>80%).

Sample Collection
The samples were collected from 30 dromedary camels at different locations in Saudi Arabia, following the guidelines of the Food and Agriculture Organization of the United Nations. The camel was appropriately restrained, and the camel nasal wing was then held and raised by the thumb and index finger, followed by gentle insertion of the nasal swab by the upper half of the nasal vestibule as deep as possible until the swab is handled. The swab was then laterally rotated, transferred into a tube containing virus transport media, and kept at 4 • C in preparation for PCR screening.

RNA Extraction
Total RNA was successfully extracted from eight samples of camel nasal fluids using a MagMax-96 viral RNA extraction kit (Thermo Fisher Scientific, Waltham, MA, USA) as per the manufacturer's protocol. The final DNAase-treated RNA product was stored at −80 • C.

Sequence Preparation
Eight complete genomes of MERS-CoV from C. dromedarius have been sequenced in our lab. The samples came from multiple geographical locations in Saudi Arabia: four from Jeddah city, one from Northern Saudi Arabia, two from Mecca city, and one from Riyadh city (Figure 4). For the sake of comparison, we also retrieved 21 complete genomes of betacoronavirus strains from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/genomes/ VirusVariation/Database/nph-select2.cgi), along with those of two 2019-nCoV strains isolated in Saudi Arabia (https://www.gisaid.org). In total, 31 betacoronavirus sequences were analyzed (Table 3).

RNA Quantification and cDNA Conversion
The concentration of each RNA sample was measured on a Qubit 3.0 using the Qubit RNA BR assay kit (Thermo Fisher Scientific, Waltham, MA, USA) and reverse transcribed with a SuperScript II cDNA kit (Thermo Fisher Scientific, Waltham, MA, USA). A minimum RNA concentration of 100 ng/uL was used for cDNA conversion.

Library Preparation and Sequencing
High-quality cDNA was subjected to Ampliseq library preparation as per the manufacturer's protocol using MERS Ampliseq Panels (Thermo Fisher Scientific, Waltham, MA, USA) with a mean insert size of 200 bp. Each library was assigned to a distinct barcode using the Ion Xpress Barcode Adapters 1-16 kit and purified using Agencourt AMpure XP beads (Beckman Coulter, Brea, CA, USA). Purified libraries were efficiently quantified on a StepOnePlus Real-Time PCR system using an Ion Library TaqMan quantification kit (Thermo Fisher Scientific, Waltham, MA, USA) as per the manufacturer's protocol. Libraries were normalized to 25 pM, and template-positive Ion Sphere particles (ISPs) were obtained by pooling all ten normalized libraries and performing clonal amplification on a OneTouch 2 system using the Ion PGM HiQ OT2 200 kit (Thermo Fisher Scientific, Waltham, MA, USA). Template-positive ISPs were then loaded on Ion 318 chips and sequenced on an Ion PGM instrument using the Ion PGM HiQ Sequencing kit (Thermo Fisher Scientific, Waltham, MA, USA). The resulting data were preprocessed (base calling, base quality recalibration, alignment and consensus sequence assembly) using Torrent Suite Server v 5.6. Ninety percent of the consensus sequences were obtained from this data; the remaining gaps were filled by Sanger sequencing on a 3730xl Genetic Analyzer using custom-designed primers (Macrogen LLC, Seoul, Korea) and BigDye Terminator v 3.1 Cycle Sequencing chemistry (Thermo Fisher Scientific, Waltham, MA, USA).

Gap Filling and Sanger Sequencing
To fill the gaps that remained after mapping Ion reads, we designed 11 primer sets (Table 4) against the NCBI MERS-CoV reference genome assembly (NC_019843.3). Primers were designed and checked using Primer Express software (v3.0) (Applied Biosystems, Foster City, CA, USA) and NetPrimer (Premier Biosoft, Palo Alto, CA, USA), with gap flanking regions incorporated into the primers. Sequences for all the major gaps were amplified by PCR using Platinum TM Hot Start PCR Master Mix 2X (Thermo Fisher Scientific, Waltham, MA, USA). PCR products were sequenced using the BigDye ® Terminator Sequencing Kit (Applied Biosystems, Foster City, CA, USA) on an Applied Biosystems 3730xl DNA Analyzer. Most minor gaps were sample-specific; to ensure complete genome coverage for phylogenetic analysis, these gaps were closed manually after performing multiple sequence alignment (MSA) on the fully-sequenced genomes. MSA was carried out using Geneious software [34]. Table 4. Primer sets used for filling sequence gaps.

Phylogenetic Trees and Evolutionary Dynamics
We first performed multiple sequence alignment of the 31 strains using MAFFT (v7) [35]. Next, we estimated the phylogenetic relationships using a time-framed Bayesian evolution analysis approach via a Markov Chain Monte Carlo (MCMC) inference method; this approach was implemented in the BEAST package (v1.8.2) [36]. For analysis of whole genomes, we used the GTR+I substitution model, a strict clock model, and a constant-size tree prior. For the spike protein dataset, we used the Blosum62+Gamma substitution model, a strict clock model, and a constant-size tree prior. MCMC analyses were run for 20 million iterations, sampling every 100 thousand iterations after a 10% burn-in. We assessed each run's convergence using Tracer (v1.6) [37]. The BEAST package TreeAnnotator was utilized to summarize a maximum clade credibility (MCC) tree for every dataset using MCMC tree samples. Phylogenetic clustering patterns were visualized and analyzed with FigTree (v1.4.3) (http://tree.bio.ed.ac.uk/software/figtree/).

Prediction of MERS-CoV B Cell Epitopes
The IEDB [38] and BCPRED [39] servers were utilized to predict B cell epitopes of the MERS-CoV spike protein. IEDB software can predict both linear and discontinuous epitopes using a protein 3D structure (PDB format) as input. The software associates each epitope with its score, which is defined as a Protrusion Index (PI) method. For example, score value 0.8 means the predicted epitope would include within 80% of the protein amino acids and 20% of the protein amino acids will be outside of the epitope. Only epitopes that were located on the outer surface of the protein were chosen. Six residues in length was regarded as adequate to prompt a defensive immune reaction. We focused on discontinuous antigenic epitopes as they are increasingly explicit and have higher dominant attributes over linear epitopes [40,41].