Genomic Characterization of a Novel Alphacoronavirus Isolated from Bats, Korea, 2020

Coronaviruses, important zoonotic pathogens, raise concerns about future pandemics. Bats are considered a source of notable viruses that cause human and livestock infections, especially coronaviruses. Therefore, surveillance and genetic analysis of coronaviruses in bats are essential to reduce the risk of future outbreaks. In this study, the genome of HCQD-2020, a novel alphacoronavirus detected in a bat (Eptesicus serotinus), was assembled and described using next-generation sequencing and bioinformatics analysis. Comparison of the whole-genome sequence and the conserved amino acid sequences of the replicase proteins revealed that the new strain was distantly related to other known species in the Alphacoronavirus genus. Phylogenetic reconstruction indicated that this strain formed a branch separate from other species, suggesting a new species of Alphacoronavirus. Additionally, in silico prediction suggested a risk of cross-species infection by this strain, especially in the order Artiodactyla. In summary, this study provides the genetic characteristics of a possible new species belonging to Alphacoronavirus.


Introduction
Coronaviruses, a group of enveloped viruses with positive-sense single-stranded RNA genomes of approximately 30 kb in length, belong to the subfamily Orthocoronavirinae, family Coronaviridae, and are classified into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus [1]. Alphacoronaviruses have been recognized as the causative agents of mild respiratory syndromes in humans, such as the human coronavirus (HuCoV) NL63 and HuCoV 229E, and serious respiratory diseases in livestock [2]. Transmissible gastroenteritis coronavirus (TGEV), porcine epidemic diarrhea virus (PEDV), and, recently, porcine enteric alphacoronavirus (PEAV) are the major viruses responsible for most of the pandemics in pigs, causing huge economic losses [3,4]. Betacoronaviruses cause several deadly diseases in humans, such as severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS), and coronavirus disease 2019 (COVID-19) [2]. Most of these diseases originated from wild animals. Due to their large genome size, high mutation rates, and recombination between homologous RNA regions, coronaviruses are considered among the most diverse viruses [5]. Extreme diversity of coronaviruses has been observed in rodents and bats worldwide [6][7][8]. In detail, almost all coronavirus-positive samples collected in Asia, Africa, and North America were from bats, which accounted for 91 of 100 different taxonomic units [6]. Three different coronavirus groups were observed in 16 rodent species belonging to seven different genera. In addition to the two well-known reservoirs mentioned above, coronavirus was also prevalent in rabbits and hedgehogs in France [7]. Although these studies focused on the highly conserved region of RNA-dependent RNA polymerase (RdRp)-encoding genes, many potentially new coronavirus species have been detected in wild animals around the world [9][10][11]. Recently, six novel species/variants were obtained in rodents [12] and bats [11] in China.
Bat species richness, for instance, was correlated with the diversity of coronaviruses [6]. The diversity of coronaviruses and their high recombination rates, in turn, increase the risk of host switching and ecological niche adaptation [13]. Sequence analysis of multiple complete genomes of camel coronavirus revealed traces of rabbit coronavirus and rodent coronavirus [14]. Additionally, viral quasispecies followed by selection might have played an important role for coronaviruses during adaptation to new hosts [15].
Bats are unique mammals (with distinct characteristics such as a long lifespan, being the only mammals capable of true flight, and a gregarious nature), and they are widely distributed worldwide. As a result, they come into easy contact with other animals. Bats are important original reservoirs for a vast number of zoonotic viruses, which cause serious infections in humans and livestock [16]. Several strains of Alphacoronavirus and Betacoronavirus detected in bats have been known to induce diseases such as COVID-19, responsible for the current pandemic, as well as many serious infectious diseases in livestock [17,18]. In fact, more than one-third of viruses detected in bats belong to the Coronaviridae family [19]. The prevalence of coronaviruses in bats was 15.2% in Korea [20] and 6.8% in China [11]. Coronavirus infection rates in bats were estimated at 1.7% in Gabon [21] and 3.7% in Brazil [22]. Host restriction of bat coronaviruses is still under debate. Some studies suggest that coronaviruses might be restricted at the level of bat genus or below [23,24]. A genetic analysis of RdRp-encoding fragments of bat coronaviruses in Northern Germany indicated that closely related coronavirus strains were more likely associated with the bat species than with the location of the sampling sites [25]. On the other hand, other studies supported a wide host range of coronavirus groups [5,9]. BatCoV HKU10 strains detected in different bat species shared a highly similar sequence throughout the genome, except for the genes encoding the spike protein, which contributed to adaptation to the new host [26]. Similar phenomena were also observed in the cases of SARS-CoV-2 infections in humans and SARS-related coronaviruses detected in bats [27].
Therefore, it is necessary to continue active surveillance and genetic analysis of newly detected coronaviruses in bats. In Korea, previous studies based on partial RdRp sequences indicated that bats harbor a diversity of alpha- and beta-coronaviruses [20,28]. However, a genome-based approach provides a more in-depth analysis of the diversity of bat coronaviruses in terms of genetic variation. Here, we describe the complete genomic characteristics of HCQD-2020, a novel Alphacoronavirus species isolated from a Korean bat species, Eptesicus serotinus.

Sampling, RNA Extraction, and RT-PCR
From July to September 2020, six carcasses of different microbat species (Eptesicus serotinus, Myotis petax, M. ikonnikovi, and Pipistrellus abramus) were collected from Kangwon and Gyeongbuk provinces (Table S1). Samples were kept on ice packs and then transferred to the College of Veterinary Medicine, Seoul National University. Organs of the bat carcasses (lung, intestine, and liver) were homogenized in 1 mL of Dulbecco's Modified Eagle Medium (DMEM), followed by three freeze-thaw cycles. A 150 µL aliquot of this homogenate was used for RNA extraction using a DNA/RNA extraction kit (Intron Biotech, Gyeonggi-do, Korea) according to the manufacturer's protocol. Previously published pancoronavirus primers [29] were used to screen for the presence of coronavirus in the bat samples (Table S2). RT-PCR reactions using the TOPScript Onestep RT-PCR kit (Enzynomics, Daejeon, Korea) were performed under the following conditions: initial heating at 50 °C for 30 min and 95 °C for 10 min; followed by 40 cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 30 s; and a final elongation step at 72 °C for 7 min. Bands of the expected size were purified by gel extraction, followed by direct DNA sequencing.
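As a reading aid, the thermal profile above can be written down as a small data structure. The sketch below is illustrative only: the class and variable names are hypothetical, and the computed time counts hold times only, ignoring the ramp time between temperatures, which depends on the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # hold time in seconds


# Steps run once before cycling (50 °C is presumably the reverse-transcription step).
PRE_CYCLING = [Step(50, 30 * 60), Step(95, 10 * 60)]
# One amplification cycle: 95 °C, 55 °C, 72 °C, each held for 30 s.
CYCLE = [Step(95, 30), Step(55, 30), Step(72, 30)]
N_CYCLES = 40
# Final elongation step.
FINAL = [Step(72, 7 * 60)]


def total_hold_seconds() -> int:
    """Sum of all hold times in the program, excluding ramp time."""
    once = sum(s.seconds for s in PRE_CYCLING + FINAL)
    cycled = N_CYCLES * sum(s.seconds for s in CYCLE)
    return once + cycled


if __name__ == "__main__":
    print(f"Total hold time: {total_hold_seconds() / 60:.0f} min")
```

Under these assumptions the hold times add up to 107 min, of which the 40 cycles account for 60 min.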

Whole-Genome Sequencing, Genome Assembly, and Annotation
To prepare RNA samples for next-generation sequencing (NGS), 0.5 mL of the homogenized solution was treated with 10 µL of RNase (4 mg/mL) (Biosesang, Gyeonggi-do, Korea) and 10 µL of DNase (10 U/µL) (Promega, Madison, WI, USA) for 30 min. The nuclease-treated solution was filtered through a 0.2 µm filter (Sartorius, Goettingen, Germany). Finally, particle-associated RNA was extracted as described above. The RNA sample was sent to Macrogen for NGS using a library of 346 bp in size.
Raw data from 101 bp paired-end sequencing were filtered to remove low-quality base calls by FastQC using the recommended parameters. Filtered reads were assembled de novo using SPAdes software [30]. A scaffold related to coronavirus was detected by Blastn comparison against the coronavirus database. Next, 3′-end sequencing was performed as described elsewhere [31].

Sequence Alignment and Phylogenetic Construction
Recombination events are commonly detected and have continuously played roles in the evolution of coronaviruses [35,36]. Because the purpose was classification relative to previously known, well-defined viruses of the Alphacoronavirus genus, this study did not perform recombination analysis prior to phylogenetic reconstruction. All phylogenetic trees were inferred based on the whole genome and on structural and nonstructural protein-encoding genes rather than on the genomic fragments between predicted breakpoints.

Potential Host Prediction
To investigate the potential cross-species infection of this strain, an online web tool (available at http://host-predict.cvr.gla.ac.uk/, accessed on 2 May 2021) was used to predict the potential reservoir host [41]. A model combining genomic biases and phylogenetic neighborhood was applied in this study for greater accuracy. In detail, the coding sequences of all putative genes and the whole genome of the HCQD-2020 strain were used for prediction. The results are represented as a box-plot graph displaying the minimum, 25th percentile, median, 75th percentile, and maximum probability scores of each group of reservoir hosts. The higher the score, the higher the probability that a group of hosts acted as a reservoir.
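The five values shown in each box plot can be computed from a list of probability scores as sketched below. This is a minimal illustration, not the web tool's own code: the function name is hypothetical, the example scores are made up, and the quartile convention (Python's "inclusive" method) is an assumption, since different plotting tools interpolate quartiles slightly differently.

```python
import statistics


def five_number_summary(scores):
    """Return min, 25th percentile, median, 75th percentile, and max."""
    ordered = sorted(scores)
    if len(ordered) < 2:
        raise ValueError("need at least two scores")
    # n=4 splits the data into quartiles; 'inclusive' treats the data
    # as containing the population minimum and maximum.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Hypothetical probability scores for one host group.
    example_scores = [0.0, 0.1, 0.2, 0.3, 0.4]
    print(five_number_summary(example_scores))
```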

Coronavirus Detection in Bat Samples
To examine the presence of coronavirus in the bat samples collected in this study, organ samples from each bat, including the lungs, intestine, and liver, were subjected to RNA extraction followed by RT-PCR using the pan-CoV primers. Of these, only the intestinal sample from E. serotinus collected from Gyeongbuk exhibited a single band of 440 bp, as expected. All other samples were negative for coronavirus. Therefore, we extracted this band for Sanger sequencing. A phylogenetic analysis indicated that this isolate belonged to the Alphacoronavirus genus (Figure S1).

Whole-Genome Assembly and Annotation
Whole-genome sequencing using the Illumina platform was carried out to further analyze the genomic characteristics of the isolate detected in this study. A total of 7.2 Gbp of data was obtained, with 98.74% and 96.46% of base calls at or above Q20 and Q30, respectively. A near-complete genome (28,752 nucleotides excluding the poly-A tail, with an average depth of 30X) of the alphacoronavirus strain HCQD-2020 (GenBank accession number: MW924112) was obtained and annotated. Sequence annotation showed that this strain contains seven common open reading frames (ORFs) in the typical order 5′-UTR-ORF1ab-S-ORF3-E-M-N-ORF7-3′-UTR (Figure 1). Hexanucleotide transcriptional regulatory sequences (TRSs) required for the transcription of complete and subgenomic RNA were also identified (Table 1). Additionally, putative signal sequences, including a partial 5′-UTR, a 3′-UTR, and a coronavirus frameshifting stimulation element with a conserved slippery sequence, were identified (Table 2). The characteristics of putative nonstructural proteins (NSP) 1-16 are described in Table 3. The presence of a small ORF (or ORFs) of unknown function (normally named ORF7) downstream of the nucleocapsid-encoding gene has been reported in some other species belonging to this genus [42]. Nevertheless, in this study, neither Blastn nor Blastp revealed homologous sequences for the putative ORF7 of the HCQD-2020 strain.
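For readers unfamiliar with the Q20 and Q30 figures, the sketch below shows how such percentages are typically derived from FASTQ quality strings. It is an illustration under stated assumptions, not the sequencing provider's pipeline: the function name is hypothetical, and the Phred+33 encoding is assumed because it is the usual encoding for current Illumina data.

```python
def fraction_at_or_above(quality_strings, threshold, offset=33):
    """Fraction of base calls whose Phred score is >= threshold.

    quality_strings: iterable of FASTQ quality lines (one per read).
    offset: ASCII offset of the encoding (33 assumed here).
    """
    total = 0
    passing = 0
    for quality in quality_strings:
        for char in quality:
            total += 1
            # Phred score = ASCII code minus the encoding offset.
            if ord(char) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


if __name__ == "__main__":
    # Made-up quality line: 'I' = Q40, '5' = Q20, '+' = Q10.
    reads = ["II5+"]
    print(f"Q20: {100 * fraction_at_or_above(reads, 20):.2f}%")
    print(f"Q30: {100 * fraction_at_or_above(reads, 30):.2f}%")
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so Q20 and Q30 mean about 1 error in 100 and 1 in 1000 base calls, respectively.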

Phylogenetic Analysis Suggesting That HCQD-2020 Strain Might be a Novel Species Belonging to the Alphacoronavirus Genus
A whole-genome comparison indicated that our strain was most closely related to BatCoV Anlong57 (KY770851) and SAX2011 (NC_028811), with percentage identity values of 56.1% and 55.7%, respectively (Figure 2). Additionally, an amino acid comparison of seven highly conserved regions in the replicase proteins of HCQD-2020 with other members of Alphacoronavirus indicated that the regions most similar to other known species were in Nsp12, Nsp13, and Nsp14 (Figure 3B), at approximately 80%, which is far below the cutoff value for a new species of 90% amino acid identity according to the International Committee on Taxonomy of Viruses (ICTV). In other regions, the amino acid sequence similarity was under 75% (Figure 3A,B). More specifically, Nsp3 was the most distantly related between HCQD-2020 and other strains, with amino acid identity ranging from 27 to 52%, followed by Nsp5, with a range of 40-67% (Figure 3A). On the other hand, the conserved regions located in ORF1b were more conserved between the present strain and others, with differences of about 19-45% (Figure 3B). Phylogeny based on the whole-genome sequence also indicated that our strain is distantly related to other known species belonging to Alphacoronavirus (Figure 4, Figure S2). This result suggests that HCQD-2020 might be a new species belonging to Alphacoronavirus.
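The identity comparison against the 90% cutoff can be sketched as follows. This is a simplified illustration, not the software used in the study: the function names are hypothetical, the toy sequences are made up, and identity is computed over columns where neither sequence has a gap, which is only one of several conventions in use.

```python
SPECIES_CUTOFF = 90.0  # percent amino acid identity, as cited in the text


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def below_species_cutoff(identity: float) -> bool:
    """True if the identity falls below the species demarcation cutoff."""
    return identity < SPECIES_CUTOFF


if __name__ == "__main__":
    # Toy aligned fragments: 5 gap-free columns, 4 identical.
    identity = percent_identity("MKT-LV", "MKSALV")
    print(identity, below_species_cutoff(identity))
```

With this convention, a conserved domain at roughly 80% identity, as reported for Nsp12, Nsp13, and Nsp14, falls below the cutoff.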
Further classifications were conducted based on the topology of phylogeny constructed on the basis of two main nonstructural protein-encoding genes, ORF1a and ORF1b, and four main structural protein-encoding genes. Except for the highly similar topological trees based on ORF1a and ORF1b ( Figure 5A,B), the remaining phylogenetic trees revealed the change in the position of the HCQD-2020 strain within the Alphacoronavirus genus ( Figure 5C-F). Even so, this strain was placed on a separate branch from the other known species of Alphacoronavirus, further supporting that HCQD-2020 is likely a novel species.

In Silico Cross-Species Infectious Ability Examination
Bat coronaviruses are significant mainly because of their risk of causing zoonotic diseases. In this study, we applied an in silico analysis to predict the potential of this virus to infect other hosts. The results indicated that, in addition to its natural reservoir of microbats (Vespertilioniformes), this strain might also infect hosts belonging to the order Artiodactyla (Figure 6) with comparable probability. In detail, the q1, median, and q3 of the probability scores of the Artiodactyla host group were 0.03, 0.13, and 0.43, respectively, while the corresponding values for the Vespbat host group were 0.02, 0.1, and 0.44. These results indicate a risk of cross-species infection by the HCQD-2020 strain.
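One simple way to see why the two host groups look comparable is to measure how much their interquartile ranges overlap, using the quartiles reported above. The sketch below is only a descriptive aid with hypothetical names; it is not a statistical test and was not part of the original analysis.

```python
def iqr_overlap(range_a, range_b):
    """Length of the interval shared by two (q1, q3) ranges, or 0.0 if disjoint."""
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    return max(0.0, high - low)


# Quartiles of the probability scores reported in the text.
ARTIODACTYLA = {"q1": 0.03, "median": 0.13, "q3": 0.43}
VESPBAT = {"q1": 0.02, "median": 0.10, "q3": 0.44}

if __name__ == "__main__":
    shared = iqr_overlap(
        (ARTIODACTYLA["q1"], ARTIODACTYLA["q3"]),
        (VESPBAT["q1"], VESPBAT["q3"]),
    )
    median_gap = abs(ARTIODACTYLA["median"] - VESPBAT["median"])
    print(f"Shared IQR length: {shared:.2f}; median difference: {median_gap:.2f}")
```

Here the Artiodactyla interquartile range lies entirely within the Vespbat range, and the medians differ by 0.03.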

Discussion
Bats are considered to be a source of several viral pathogens transmissible to humans and livestock. The first evidence of rabies virus transmission from bats was reported in 1921 [43]. After that, an increasing number of viruses causing human and animal diseases, such as the Hendra virus [44], Nipah virus [45], and pteropine orthoreovirus [43], were detected in bats. Since the development of sequencing technology, as well as the emergence of the deadly pathogen SARS-CoV, studies have elucidated the diversity of the bat virome, including members with risks of zoonotic infection belonging to Coronaviridae, Paramyxoviridae, Reoviridae, Rhabdoviridae, and Filoviridae [19]. Climate change and human activities result in close contact between wild animals and humans, consequently increasing the risk of cross-host transmission of viruses. For instance, serological evidence revealed multiple infections with SARS-related coronaviruses transmitted from bats to humans [46,47]. Therefore, active vigilance against bat-borne viruses, with added attention to coronaviruses, is essential in the prevention of other widespread zoonotic diseases.
In this study, several species belonging to the genera Eptesicus, Myotis, and Pipistrellus were investigated for the presence of coronavirus. These species share their habitat niches with other wild and/or livestock animals, thereby increasing the risk of cross-contamination to humans involving any of the viruses they carry. Of these samples, a genetically distant viral isolate belonging to Alphacoronavirus was detected in E. serotinus. This result adds to the known genetic diversity of bat coronaviruses in general and of Alphacoronavirus in particular.
It is generally accepted that the Alphacoronavirus genus is extremely diverse. To date, 19 different species belonging to 14 sub-genera of Alphacoronavirus have been officially accepted by ICTV. In this study, a whole-genome comparison indicated that the HCQD-2020 strain was distantly related to other known species of Alphacoronavirus (Figure 2). Genome-based and functional gene-based phylogenetic reconstructions also indicated that this strain formed a separate branch in the phylogenetic trees (Figures 4 and 5). Recent metagenomic studies of the bat virome revealed several potential novel species within Alphacoronavirus detected in bats around the world [27,48-50]. This result, along with other recent studies, once again supports the genetic heterogeneity of this genus.
All members of this genus have a similar genomic organization, containing ORF1ab-S-ORF3-E-M-N [42]. Furthermore, additional ORFs located downstream of the nucleocapsid-encoding gene have also been observed in many species of this genus, such as TGEV, BatCoV-HKU2, BatCoV-512, and Shrew coronavirus [12]. In addition to the common ORFs found in other members of Alphacoronavirus, a putative ORF7 was found at the 3′ terminus of the HCQD-2020 genome (Figure 1). Its sequence at the amino acid level was not homologous with any known protein sequence. It should also be noted that this putative ORF was likely the most distinct ORF among the currently known alphacoronaviruses [12].
Alphacoronavirus contains several harmful viruses, such as TGEV, PEDV, and PEAV, that cause serious economic losses in pig production [4]. The last two species are considered to have originated from bats [51,52]. Evidence of host jumping of a coronavirus from bats to even-toed ungulates was characterized in the case of PEAV, which shares high nucleotide identity (approximately 95% sequence similarity) with bat-HKU2 strains [53]. In this study, an in silico analysis indicated that HCQD-2020, a distantly related species belonging to Alphacoronavirus, might infect other hosts, especially those in the order Artiodactyla, which includes species such as camels and pigs (Figure 6). Camels were previously identified as intermediate hosts of the MERS virus [54,55]. Focusing on the genus Alphacoronavirus, strains closely related to the human alphacoronavirus 229E were detected in domestic camels [56]. Recently, a novel alphacoronavirus belonging to the species Alphacoronavirus 1, which is usually found in pigs, dogs, and cats, was detected in children with pneumonia in Malaysia [57]. Therefore, it is important to investigate potential hosts besides bats for newly detected coronaviruses.

Conclusions
In summary, this study reported and described the nearly complete genome of an Alphacoronavirus species originating from bats. Based on the low sequence identity, the presence of a putative ORF7 with no homology to any known genes in GenBank, and the distant relation to other representative species of Alphacoronavirus, the HCQD-2020 strain was proposed as a novel strain of this genus. In silico analyses suggested that this newly identified coronavirus strain could infect other hosts, not limited to bats. Future studies should focus on understanding the diversity of coronaviruses and the probability of host jumping.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/v13102041/s1, Figure S1: Initial classification of HCQD-2020 strain within subfamily Coronavirinae using the conserved region of RdRp-encoding fragment, Figure S2: Phylogenetic tree construction based on whole genome sequence analysis, Table S1: Information of bat samples collected in this study, Table S2: List of primers used in this study.