Complete Genome and Molecular Characterization of a New Cyprinid Herpesvirus 2 (CyHV-2) SH-01 Strain Isolated from Cultured Crucian Carp

Cyprinid herpesvirus 2 (CyHV-2) is the causative agent of herpesviral hematopoietic necrosis (HVHN) in farmed crucian carp (Carassius carassius) and goldfish (Carassius auratus). In this study, we analyzed the genomic characteristics of a new strain, CyHV-2 SH-01, isolated during outbreaks in crucian carp at a local fish farm near Shanghai, China. In our previous research, both goldfish and crucian carp were highly susceptible to CyHV-2 SH-01. The complete genome of SH-01 is 290,428 bp with 154 potential open reading frames (ORFs) and terminal repeat (TR) regions at both ends. Compared to the sequenced genomes of other CyHVs, Carassius auratus herpesvirus (CaHV) and Anguillid herpesvirus 1 (AngHV-1), several variations were found in SH-01, including nucleotide mutations, deletions, and insertions, as well as gene duplications, rearrangements, and horizontal transfers. Overall, the genome of SH-01 shares 99.60% sequence identity with that of ST-J1. Genomic collinearity analysis showed that SH-01 has a high degree of collinearity with the other three CyHV-2 isolates, and it is generally closely related to CaHV, CyHV-1, and CyHV-3, although it contains many differences in locally collinear blocks (LCBs). The lowest degree of collinearity was found with AngHV-1, despite some homologous LCBs, indicating that they are evolutionarily the most distantly related. The results provide new clues to better understand the CyHV-2 genome through sequencing and sequence mining.


Introduction
Crucian carp (Carassius carassius) is one of the most widely farmed freshwater fish species of the Cyprinidae family in China, alongside goldfish (Carassius auratus), and the annual worldwide production reached 2748.6 thousand tonnes in 2020 (FAO, 2022). Viral diseases are common in aquaculture, including the crucian carp farming industry, and they cause serious harm to wild lower vertebrate populations worldwide.
Previously, we isolated a new strain, CyHV-2 SH-01, during outbreaks in crucian carp at a local fish farm near Shanghai, China and confirmed that goldfish also showed high susceptibility with symptoms including acute gill hemorrhages and high mortality, similar to HVHN caused by SH-01 isolated from crucian carp [27]. To further explore the genetic properties and potential molecular pathogenic mechanisms of CyHV-2, we sequenced and analyzed the genome of CyHV-2 SH-01 in the present work.

Isolation of Virus and DNA Extraction
In previous work in our laboratory (Shanghai Ocean University), moribund crucian carp (13-15 cm in length) were collected during a disease outbreak at a fish farm near Shanghai, China. Diseased tissues, including kidney, spleen, muscle, and blood, were collected for testing; the disease was identified as HVHN caused by CyHV-2, and the isolated strain was named SH-01. DNA was then extracted from purified viral particles using the TIANamp Genomic DNA Kit (DP304, Tiangen, Beijing, China) according to the manufacturer's instructions, and PCR amplification was performed using the PrimeSTAR® Max system (R045Q, TaKaRa, Beijing, China) for sequencing, as previously reported [27].

Analysis of the Genome Structure and Molecular Characterization of CyHV-2 SH-01
The frequency of codon usage in the SH-01 genome was analyzed using the CUSP program (https://www.bioinformatics.nl/cgi-bin/emboss/cusp, accessed on 17 June 2022). The genome map of SH-01 was drawn with Adobe Illustrator 2021 (Adobe, San Jose, CA, USA). A graph showing the lengths (in amino acids) of the proteins encoded by the open reading frames (ORFs) on the X-axis and the number of proteins of each length on the Y-axis was generated with Geneious Prime v2022.2.1 (Biomatters, Auckland, New Zealand).
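As an illustration of the codon usage statistics described above, the following minimal sketch counts codons and computes the GC content at the first, second, and third codon positions. It is not the CUSP implementation; the function names and the toy input sequence are hypothetical, and the input is assumed to be a list of in-frame coding sequences.

```python
from collections import Counter


def codon_usage(cds_list):
    """Count codons across a list of in-frame coding sequences (hypothetical helper)."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def gc_by_codon_position(cds_list):
    """Return (overall coding GC%, [GC1%, GC2%, GC3%]) for in-frame coding sequences."""
    gc = [0, 0, 0]     # G/C counts at codon positions 1, 2, 3
    total = [0, 0, 0]  # all bases seen at each codon position
    for cds in cds_list:
        seq = cds.upper()
        usable = len(seq) - len(seq) % 3
        for i in range(usable):
            pos = i % 3
            total[pos] += 1
            if seq[i] in "GC":
                gc[pos] += 1
    per_position = [100.0 * g / t if t else 0.0 for g, t in zip(gc, total)]
    overall = 100.0 * sum(gc) / sum(total) if sum(total) else 0.0
    return overall, per_position


# Toy input (not SH-01 data): two codons, ATG and GCC.
overall, (gc1, gc2, gc3) = gc_by_codon_position(["ATGGCC"])
print(f"coding GC = {overall:.2f}%, GC1 = {gc1:.2f}%, GC2 = {gc2:.2f}%, GC3 = {gc3:.2f}%")
```

Applied to the full set of predicted coding sequences, the same calculation would yield values comparable to those reported by CUSP, although differences in how partial codons or ambiguous bases are handled could shift the result slightly.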

Comparison of Genomic Structure and Evolutionary Relationships among SH-01 and the Other Seven Strains
The sequence identities of the genomes and ORFs (or CDS) between SH-01 and seven related viruses, namely six closely related strains in the genus Cyprinivirus and CaHV (whose classification status has not been clarified according to ICTV 2021), were calculated from MAFFT alignments in Geneious Prime v2022.2.1 (Biomatters, Auckland, New Zealand).
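To make the identity calculation concrete, the sketch below computes percent identity from two rows of an existing alignment. It is a minimal illustration under stated assumptions, not the procedure used by Geneious Prime: the function name is hypothetical, columns where both sequences have a gap are skipped, and a gap paired with a residue is counted as a mismatch. Other tools may treat gaps differently, which changes the reported value.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity between two aligned sequences of equal length."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:  # a gap paired with a residue never matches
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned pair (not viral sequence): 5 matches over 6 compared columns.
print(round(percent_identity("ACGT-A", "ACGTTA"), 2))
```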
The evolutionary patterns among the homologous or heterologous regions of the genomes of eight isolates, including SH-01, were analyzed by Mauve alignment in DNASTAR Lasergene v17.3 (DNASTAR, Madison, WI, USA). Furthermore, the comparison of locally collinear blocks (LCBs) among SH-01, CyHV-1, and CyHV-3 was conducted using the progressive Mauve algorithm in Geneious Prime v2022.2.1 (Biomatters, Auckland, New Zealand). Then, a phylogenetic tree was constructed based on the amino acid sequences of helicase (ORF71) using the neighbor-joining method in MEGA v11 (https://megasoftware.net, accessed on 10 July 2022) with 1000 bootstrap replicates.

Genome Structure and Composition
We obtained a total of 46,723,798 raw reads (7,008,569,700 raw bases) by high-throughput sequencing (HTS), and after removing low-quality data, 46,357,846 clean reads (6,461,792,984 clean bases) remained, with a G+C content of 59.50%. Then, we successfully assembled the complete genome sequence of CyHV-2 SH-01 and submitted it to GenBank (Accession No. BankIt2436221).
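The read statistics above can be checked with simple arithmetic. The sketch below derives the mean read length, the fraction of data retained after filtering, and an upper bound on sequencing depth. The function names are hypothetical, and the depth figure assumes that every clean base maps to the 290,428 bp viral genome, which is unlikely to hold exactly because host-derived reads are not reported here.

```python
RAW_READS, RAW_BASES = 46_723_798, 7_008_569_700
CLEAN_READS, CLEAN_BASES = 46_357_846, 6_461_792_984
GENOME_LENGTH = 290_428  # bp, assembled SH-01 genome


def mean_read_length(bases, reads):
    """Average number of bases per read."""
    return bases / reads


def fraction_retained(clean, raw):
    """Fraction of reads (or bases) kept after quality filtering."""
    return clean / raw


def max_depth(bases, genome_length):
    """Upper bound on mean depth, assuming all bases map to the genome."""
    return bases / genome_length


print(f"mean raw read length:   {mean_read_length(RAW_BASES, RAW_READS):.1f} bases")
print(f"mean clean read length: {mean_read_length(CLEAN_BASES, CLEAN_READS):.1f} bases")
print(f"reads retained:         {fraction_retained(CLEAN_READS, RAW_READS):.2%}")
print(f"bases retained:         {fraction_retained(CLEAN_BASES, RAW_BASES):.2%}")
print(f"depth (upper bound):    {max_depth(CLEAN_BASES, GENOME_LENGTH):,.0f}x")
```

The raw counts correspond to exactly 150 bases per read, which is consistent with a fixed read length before trimming.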
The genome of CyHV-2 SH-01 is a linear double-stranded DNA, 290,428 bp in length, with an overall G+C content of around 51.60%, comprising a unique (U) region flanked by terminal repeat (TR) regions at both ends. We analyzed the frequency of codon usage in the SH-01 genome using the CUSP program, and the results showed that the coding GC content of the SH-01 open reading frames (ORFs) was 51.64%, while the GC content of the 1st, 2nd, and 3rd positions of the triplet codons was 52.61%, 52.38%, and 49.93%, respectively. The SH-01 genome contains 154 predicted ORFs, of which four duplicated ORFs (ORF5, ORF6, ORF7, and ORF8) are located in the TR (Figure 1A), similar to ST-J1 [33]. The ORFs encode proteins ranging in length from 63 (ORF106) to 4123 (ORF62) amino acids (aa), with an average length of 527.37 aa (Figure 1B and Table S1). Thirty-two ORFs contain introns, of which ORF79 has four introns, ORF6, ORF33, and ORF3 have three introns, and the other 28 ORFs have one or two introns (Table S1). In line with ST-J1 and SY, there are 86 ORFs located on the positive strand in SH-01 and 68 ORFs on the negative strand. Additionally, there are seven core ORFs (ORF19, ORF55, ORF72, ORF88, ORF92, ORF93, and ORF142) in the SH-01 genome (Figure 1A and Table S1), of which ORF72 and ORF92 are significantly conserved among alloherpesviruses.

Chronological Characteristics of Gene Expression
Gene expression during lytic replication of herpesviruses follows a distinct chronological sequence involving three main temporal classes of genes: immediate-early (IE), early (E), and late (L). These expression patterns are the result of complex interactions between viruses and cytokines at the transcriptional and post-transcriptional levels, as well as structural differences in the promoters (cis- versus trans-acting elements) among the three types of genes [34,35]. With reference to CyHV-2 ST-J1 [36], we marked five IE (red), 34 E (green), and 39 L (blue) genes in the genome map of SH-01 (Figure 1A and Table S1). Further understanding of the chronological characteristics of the gene expression of CyHV-2 could provide insight into the viral replication mechanisms and interactions with hosts.

Features of the Predicted Functional Protein Encoded by SH-01
Among the 154 ORFs of CyHV-2 SH-01, the proteins encoded by 26 ORFs were predicted to possess a signal peptide (SP), and 75 putative proteins were predicted to possess one or more transmembrane domains (TMDs). All 154 predicted proteins were also analyzed through the CDD database, and 55 putative proteins were predicted to contain one or more conserved domains (Table S1). These results revealed that the proteins encoded by six ORFs (ORF25C, ORF34, ORF52, ORF119, ORF127, and ORF151A) contain an SP but no TMD, suggesting that these proteins are secreted. Notably, ORF64, ORF114, ORF152A, ORF16, and ORF153B have 10, 9, 8, 7, and 6 TMDs, respectively, and these were predicted to be important membrane proteins of SH-01, similar to CyHV-3 [33]. As shown in Table S1, there are 10 putative TMDs in ORF64, similar to the 12 TMDs in ORF64 of CyHV-3, which indicates a nucleoside transporter domain or similar and suggests that the protein may be essential for CyHV-2 replication, although this needs to be verified experimentally. In addition, there are three ORFs (ORF41, ORF144, and ORF150) with a really interesting new gene (RING) domain, which is associated with ubiquitin-protein transferase activity, one ORF (ORF4) belonging to the tumor necrosis factor receptor (TNFR) family, and 143 unclassified ORFs (Figure 1A and Table S1).
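The reasoning used above to flag candidate secreted and membrane proteins can be written as a simple rule. The sketch below is illustrative only: the function name, the category labels, and the cut-off of two TMDs for a multi-pass protein are assumptions, and the SP status of the five membrane ORFs is not stated in the text, so it is recorded as None.

```python
def classify(has_sp, n_tmd):
    """Assign a coarse localization label from SP and TMD predictions."""
    if n_tmd >= 2:
        return "multi-pass membrane candidate"
    if n_tmd == 1:
        return "single-pass membrane candidate"
    if has_sp is None:
        return "undetermined"
    if has_sp:
        return "secreted candidate"  # SP present, no TMD
    return "no SP or TMD predicted"


# (has_sp, n_tmd) per ORF, taken from the text where available.
predictions = {
    "ORF25C": (True, 0), "ORF34": (True, 0), "ORF52": (True, 0),
    "ORF119": (True, 0), "ORF127": (True, 0), "ORF151A": (True, 0),
    # TMD counts from the text; SP status is not reported, hence None.
    "ORF64": (None, 10), "ORF114": (None, 9), "ORF152A": (None, 8),
    "ORF16": (None, 7), "ORF153B": (None, 6),
}

labels = {orf: classify(sp, tmd) for orf, (sp, tmd) in predictions.items()}
for orf, label in labels.items():
    print(f"{orf}: {label}")
```

Such labels are only predictions from sequence features; localization would still need to be confirmed experimentally.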
Furthermore, ORF71 encodes a DEAD (Asp-Glu-Ala-Asp)-like helicase-primase subunit, in which the N-terminal domain contains an ATP-binding region involved in ATP-dependent RNA or DNA unwinding, and ORF79 encodes a putative DNA polymerase catalytic subunit, which functions in viral replication. Additionally, ORF19 and ORF140 encode nucleotide kinases; ORF122 and ORF123 encode trimeric dUTP diphosphatases that function in the preservation of chromosomal integrity; ORF51, ORF68, and ORF91 possess structural maintenance of chromosomes (SMC) domains; ORF141 and ORF142 encode ribonucleotide reductases; and ORF52 and ORF55 encode a ribonuclease and thymidine kinase, respectively, which are involved in DNA degradation and provide a nucleotide feedstock for viral DNA synthesis. ORF68 encodes SbcC [37], part of an exonuclease complex involved in DNA metabolism, replication, recombination, and damage repair, as well as signal transduction and immune responses, and there is a macrodomain in ORF68 and ORF69 with the same function. ORF128 encodes rad18, a DNA repair protein that plays a role in DNA metabolism, replication, recombination, and repair. ORF129 has a putative SPRY (B30.2) domain, part of tripartite motif 5α (TRIM5α), considered to be a specific anti-retroviral determinant [38]. ORF145 has an RNA recognition motif (RRM) involved in post-transcriptional gene expression processes including mRNA and rRNA processing, RNA export, and RNA stability [39]. ORF28A, ORF146, and ORF147 encode proteins with immunoglobulin (Ig) domains and are members of the Ig superfamily. ORF104 encodes a protein kinase (PKc-like) associated with protein phosphorylation and regulation of many cellular signaling pathways. ORF20 and ORF28 contain an NAD(P)(+)-binding (NADB) Rossmann-fold domain related to redox metabolic pathways. ORF98 encodes a uracil-DNA glycosylase (UDG)-like protein, which initiates the repair of uracil bases in DNA and maintains the integrity of genetic information.
Moreover, ORF62 contains an ovarian tumor (OTU) domain, which may be involved in the suppression of the innate immune responses of hosts [40].
In addition, ORF99 is predicted to encode a protein similar to the torovirus spike protein; the spike is a transmembrane protein of coronaviruses that mediates the binding of viruses to host cell receptors and participates in membrane fusion. ORF139 encodes a protein homologous to the C-terminal structure of the poxvirus B22R protein. Furthermore, ORF155 encodes a protein homologous to the major outer-envelope glycoprotein (BLLF1) of Epstein-Barr virus (EBV), also known as gp350, which is abundantly expressed in the envelope of EBV and is the antigen responsible for stimulating neutralizing antibody production in the host [41].
ORF30 encodes a protein homologous to the late lytic protein BDLF3 of EBV associated with immune evasion [42]. The protein encoded by ORF49 is homologous to infected-cell polypeptide 4 (ICP4), a major transcriptional regulator of the herpes simplex virus type 1 (HSV-1), forming a tripartite complex with the TATA-binding protein (TBP) and the transcription factor IIB (TFIIB), related to the activation of L genes [35,43].

Comparison of Genome Structure and ORFs Arrangement
The genomes of three isolates of CyHV-2, namely ST-J1, SY-C1, and SY, have been sequenced and annotated through comparative genomics [33,44,45]. They are 290,304, 289,365, and 290,455 bp in length, encoding 150, 140, and 150 unique ORFs, respectively, and all genomes possess U and TR features at each end. Consistently, our results showed that the genome of SH-01 is 290,428 bp in length, with U and TR at each end, encoding 150 unique ORFs, sharing 99.60%, 98.53%, and 98.35% sequence identity with ST-J1, SY-C1, and SY, respectively (Table 1 and Table S1).
a Reprinted with permission from Ref. [33]. 2013, Andrew J. Davison; b Reprinted with permission from Ref. [44]. 2015, Lijuan Li; c Adapted with permission from Ref. [45]. 2018, Bo Liu; d Adapted with permission from Ref. [46]. 2007, Takashi Aoki; e Adapted with permission from Ref. [24]. 2010, Steven J. van Beurden; f Adapted with permission from Ref. [26]. 2016, Xiaotao Zeng; g U represents the unique region in the genome (except in that of CaHV); h TR represents the terminal repeat in the genome (except in that of CaHV); i number of ORFs in U plus two copies of TR; j number of ORFs in U plus one copy of TR; k the genome identities of eight strains were aligned through MAFFT by Geneious Prime v2022.
Compared with CyHV-1 and CyHV-3, similar to ST-J1, the counterparts of some ORFs in the TR in CyHV-3 (ORF1, ORF2, ORF3, and ORF4) or CyHV-1 (ORF2 and ORF3) [33] are located closely downstream of the U in SH-01, related to flanking genes, with only ORF5, ORF6, ORF7, and ORF8 in the TR of SH-01. Moreover, the CyHV-3 genome remains the largest overall, and CyHV-1 contains the fewest ORFs among CyHVs, but unlike CyHV-1 and CyHV-3, the CyHV-2 SH-01 genome has more complexity in terms of copy size and arrangement of the U and the TR. Similar to ST-J1, SH-01 also contains a 220 bp inverted repeat region downstream of ORF25C and ORF48, but it was not observed in SY, SY-C1, or CaHV [26], nor in CyHV-1 and CyHV-3. Compared with the genomic structure of CyHV-3, ORF4 is located downstream of the U rather than in the TR in SH-01, while ORF4 is deleted in CyHV-1. Importantly, ORF140 undergoes a large translocation in CyHV-1 and is located upstream of the U, while in the CyHV-2 and CyHV-3 strains it is located downstream. In addition, ORF2A is inserted between ORF2 and ORF3 in SH-01, and there are many insertions, deletions, rearrangements, and inversions in the arrangement of ORFs in SH-01; for example, the orientation of ORF128-133 and ORF135-138 is opposite compared to CyHV-1 and CyHV-3 (Figure 2 and Table S1).
Moreover, the genome of SH-01 also shares a high identity (92.63%) with CaHV [26], although CaHV has no TRs, and ORF1-13 at the downstream end of its genome correspond to ORF153B-ORF8 at the upstream end of the SH-01 genome, while ORF144 in CaHV corresponds to ORF4 in SH-01 (Table 1 and Figure 2). Additionally, within the genus Cyprinivirus, AngHV-1 is distantly related to SH-01, with only 53 homologous ORFs, all with less than 60% identity; for example, ORF5 in the TR of AngHV-1 is homologous to ORF123 in the U of SH-01 (Table S1 and Figure 2).

Genomic Evolutionary Relationships among SH-01 and the Other Seven Strains
Sequencing and analysis of viruses of the genus Cyprinivirus have revealed multiple types of genetic change in the genomes of CyHVs, including gene recombination (deletions, duplications, rearrangements, and horizontal transfers) and nucleotide mutations (base substitutions, insertions, and deletions), which have resulted in a high complexity and diversity of these genomes through evolution. Typically, orthologous genes can "jump" within genomes, which is known as gene rearrangement, or change genome structure by inserting as "new genes" into other genomes. Maximal collinear sets of homologous sites are regarded as locally collinear blocks (LCBs), each of which covers a "block" of sequence without any internal genome rearrangement [47].
Herein, we analyzed the evolutionary patterns among the homologous or heterologous regions of the genomes of eight isolates, including SH-01, and explored the evolutionary relationships between the genome of SH-01 and those of the other seven isolates (Figure 3A). Different LCBs are marked with regions in different colors, and these can "jump" or rearrange across the genome as a complete unit. Relative to SH-01, LCBs on the same side of the centerline indicate the same transcriptional direction, while those on the opposite side indicate an inverted transcriptional direction. The spacer region (outside the LCBs) is considered to have no or very low homology, and the colored part inside the LCBs shows the corresponding parts of the homologous gene sequences (Figure 3A,B). The genome of SH-01 contains 19 LCBs that vary in sequence length, with at least one ORF inside each LCB. SH-01 showed high consistency in orientation and alignment of LCBs compared with the other three CyHV-2 isolates, despite differences in the number of LCBs. In particular, the corresponding lengths of the same LCBs were also highly similar, suggesting that these four isolates may have similar pathways in genome evolution. Consistent with the results in Figure 2, LCB14-18 (downstream end of the genome) of SH-01 correspond to LCB1-8 (upstream end of the genome) of CaHV. In contrast, compared with CyHV-1, CyHV-3, and AngHV-1, there are many differences in the number, orientation, alignment order, and corresponding length of the LCBs. Specifically, compared with CyHVs, AngHV-1 has the most divergent LCBs, and only seven are homologous. Notably, consistent with Figure 2, ORF140 jumped to LCB4 (upstream end of the genome) in CyHV-1, while it is located at LCB7, LCB22, and LCB15 (downstream end of the genome) in CyHV-2, CyHV-3, and CaHV, respectively. In addition, there are many inversions of the LCBs in CyHV-1 and CyHV-3, as shown in Figure 3A.
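The comparison of LCB order and orientation can be pictured as a signed permutation: each block is numbered by its position in the reference genome, and a negative sign marks a block that is inverted in the compared genome. The sketch below, using toy block orders rather than the actual Mauve output, lists inverted blocks and counts breakpoints, that is, adjacencies in the compared genome that are not conserved relative to the reference. The function names are hypothetical.

```python
def inverted_blocks(signed_order):
    """Return the reference numbers of blocks that appear in reverse orientation."""
    return [abs(block) for block in signed_order if block < 0]


def breakpoints(signed_order):
    """Count adjacencies that differ from the reference order 1, 2, ..., n.

    The order is framed by 0 and n + 1; a neighboring pair (a, b) is
    conserved only when b - a == 1, which also holds for an inverted
    run such as (-3, -2).
    """
    n = len(signed_order)
    framed = [0] + list(signed_order) + [n + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


# Toy examples (hypothetical block orders).
collinear = [1, 2, 3, 4, 5]
one_inversion = [1, -2, 3, 4, 5]   # block 2 inverted in place
one_jump = [1, 3, 4, 2, 5]         # block 2 moved downstream

for name, order in [("collinear", collinear),
                    ("one inversion", one_inversion),
                    ("one jump", one_jump)]:
    print(name, inverted_blocks(order), breakpoints(order))
```

A fully collinear pair gives zero breakpoints, whereas inversions and jumps each introduce new adjacencies, so the count gives a rough indication of how rearranged two genomes are.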
To better understand the genomic evolutionary differences, we further compared the LCBs of SH-01, CyHV-1, and CyHV-3 (Figure 3B). Compared with the genome of SH-01 (290,428 bp), the genomes of CyHV-1 and CyHV-3 are 716 bp and 4718 bp longer, respectively, with 55 and 40 differential genes, including deletions and insertions, and there are 122 homologous genes among the three viruses (Tables 1 and S1, Figure 2). The comparison of the LCBs revealed that both SH-01 and CyHV-3 contained 10 LCBs, with one lacking in CyHV-1, corresponding to positions 209,037-220,265 in the SH-01 genome (11,229 bp, containing ORF138 and ORF139). Additionally, there are some differences in the alignment order and corresponding length of LCBs among the three viruses. Moreover, LCB4 (1911 bp, containing ORF21-23, corresponding to LCB2 of SH-01) and LCB6 (7421 bp, containing ORF114-115, ORF120-121, ORF123-124, and ORF126-129, corresponding to LCB5 of SH-01) in CyHV-1, as well as LCB6 (10,781 bp, containing ORF128-133 and ORF135-136, corresponding to LCB7 of SH-01) in CyHV-3, are oriented opposite to their counterparts in SH-01 (Figure 3B and Table 2). Interestingly, consistent with Figures 2 and 3A, LCB2 (472 bp, containing ORF140, corresponding to LCB6 and LCB8 of SH-01 and CyHV-3, respectively) in CyHV-1 jumped to the interval between LCB1 and LCB3, located at the upstream end of the genome. This suggests a possible molecular mechanism in which there may be potential mutational hotspot sites on both sides of the LCB. Furthermore, a phylogenetic tree was constructed based on the aa sequences of helicase (ORF71). SH-01 clustered clearly with the previously described CyHV-2 isolates and was most closely related to ST-J1; the CyHV-2 isolates clustered with CyHV-1 and CyHV-3, separate from AngHV-1 (Figure 3C).
a "None" indicates that no parallel LCB exists in the genome of CyHV-1; b the changes in base pairs in the parallel LCBs of the CyHV-1 or CyHV-3 genome relative to SH-01. "+" indicates an increase in base pairs, while "−" indicates a decrease in base pairs.
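The lengths quoted in this section follow from simple coordinate arithmetic, sketched below. Genome coordinates are assumed to be 1-based and inclusive, which is the convention that reproduces the reported 11,229 bp; the function and variable names are hypothetical.

```python
SH01_LENGTH = 290_428  # bp


def interval_length(start, end):
    """Length of a 1-based, inclusive genomic interval."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# LCB present in SH-01 and CyHV-3 but lacking in CyHV-1.
missing_lcb = interval_length(209_037, 220_265)

# Genome lengths implied by the reported differences relative to SH-01.
cyhv1_length = SH01_LENGTH + 716
cyhv3_length = SH01_LENGTH + 4718

print(f"LCB lacking in CyHV-1: {missing_lcb:,} bp")
print(f"CyHV-1 genome: {cyhv1_length:,} bp")
print(f"CyHV-3 genome: {cyhv3_length:,} bp")
```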

Discussion and Conclusions
In our previous research, we isolated a new strain, named CyHV-2 SH-01, during outbreaks in crucian carp at a local fish farm near Shanghai, China, and confirmed that goldfish also showed high susceptibility and mortality with symptoms similar to HVHN [27]. Here, we present the complete genome structure and molecular characterization of SH-01. Although CyHV-2 is distributed worldwide, different CyHV-2 isolates have not been comprehensively compared. Our results will provide more background for future research.
Our results showed that, similar to the genomes of other viruses in the genus Cyprinivirus, the complete genome of SH-01 is 290,428 bp in length with an overall G+C content of around 51.60%, including a U and TR region at both ends. It contains 154 predicted ORFs, of which four (ORF5, ORF6, ORF7, and ORF8) are duplicated in the TRs. Notably, the genome of SH-01 shares 99.60% sequence identity with that of ST-J1, and similar to ST-J1, SH-01 also contains a 220 bp inverted repeat region downstream of ORF25C and ORF48, which was not observed in SY, SY-C1, or CaHV, nor in CyHV-1 and CyHV-3. Moreover, we found several variations in the SH-01 genome compared to the other seven closely related viruses, as discussed below.
Gene expression during the lytic replication of herpesviruses is characterized by a distinct chronological sequence involving three main temporal classes of genes: IE, E, and L. Tang et al. (2020) recently identified five IE genes (ORF54, ORF121, ORF141, ORF147, and ORF155), 34 E genes, and 39 L genes in the CyHV-2 ST-J1 genome using HTS combined with the inhibitors cycloheximide (CHX) and cytarabine (Ara-C). They found that all five IE genes were transcribed within 30 min after infection with CyHV-2; E genes, such as ORF80, ORF89, and ORF97, could be detected at 1 h post-infection (hpi), and most of the other E genes appeared at 1-2 hpi, while L genes such as ORF7 appeared at 6 hpi, and replication was completed within 8 h [36]. With reference to ST-J1, we marked five IE, 34 E, and 39 L genes on the genome map of SH-01 (Figure 1A and Table S1), but these assignments need to be confirmed experimentally. The expression patterns are the result of complex interactions between herpesviruses and the cytokines of hosts. After the virus invades the host cell, IE genes initiate transcription immediately, relying on the transcription and translation system of the host to provide transcriptional activating proteins that control the transcription of the E and L genes. The E genes encode proteins that are involved in regulating the physiological state of host cells to facilitate viral DNA replication and metabolism. Subsequently, the L genes, which primarily encode structural proteins, begin to be transcribed, ultimately leading to the assembly and release of infectious virions [35,50,51]. Future work could focus on the transcriptional regulatory functions of viral genes during CyHV-2 replication and the mechanisms of interaction with the host.
The prediction of the functional features of proteins encoded by the virus is essential for further understanding of the pathogenic properties and infection mechanism of CyHV-2, and it is indispensable for the development of targeted antiviral drugs or vaccines. In the present work, 55 putative proteins encoded by SH-01 were predicted to contain one or more conserved domains (Table S1). The proteins encoded by 26 ORFs of SH-01 were predicted to possess an SP; SPs are mainly short peptides located at the N-terminus of proteins that may serve as potential targets for drugs [52]. Specifically, we found that the proteins encoded by six ORFs (ORF25C, ORF34, ORF52, ORF119, ORF127, and ORF151A) contain an SP but no TMD, which indicates that these proteins could be secreted. Moreover, ORF64, ORF114, ORF152A, ORF16, and ORF153B of SH-01 have 10, 9, 8, 7, and 6 TMDs, respectively. These proteins contain many TMDs that span the membrane multiple times, indicating that they may be important membrane proteins for CyHV-2; they may have important functions in substance transport, signal transduction, and membrane receptor recognition and serve as potential antiviral drug targets. Interestingly, latent membrane protein-1 (LMP1) encoded by the Epstein-Barr virus (EBV) contains a domain belonging to the TNFR family that participates in many signaling pathways of host cells to influence their proliferation and differentiation to meet the demands of virus replication [53]. The TNFR domain of ORF4 predicted in SH-01 may have a similar function. In addition, we predict that many proteins encoded by SH-01 play a role in viral DNA replication, metabolism, and repair; however, further research on the mechanisms of CyHV-2 replication is needed.
Furthermore, genome sequence comparisons demonstrate clear variations between SH-01 and the other seven strains. These variations comprise gene recombination, including deletions, duplications, rearrangements, and horizontal transfers, and/or nucleotide mutations, including base substitutions, insertions, and deletions (Figures 2 and 3A,B). We found that 71 out of 150 ORFs are identical (100%) between SH-01 and ST-J1, and only 23 (of the total 154 ORFs) have no variation in SH-01 compared with ST-J1, SY, and SY-C1 (Table S1). This suggests that CyHV-2 has co-evolved with its host, and host adaptation has led to genomic diversity among the strains isolated from different hosts [45]. Notably, one group proposed that CyHV-2 could be divided into two genotypes, Chinese (C) and Japanese (J), according to the isolation sites, based on differences between the genomes of SY-C1 isolated from China and ST-J1 isolated from Japan [44]. Given the high genomic similarity between SH-01 and ST-J1, SH-01 may be classified as a J genotype.
Moreover, the derived genome sizes (Table 1) are those of the sequences obtained by sequencing and do not precisely equate to the actual sizes of the viral genomes, because the genome of each CyHV contains many tandem direct reiterations of short sequences, often in complex forms containing partial or scattered repeats; repeated sequences are characteristic of most herpesvirus genomes and their lengths are often variable, leading to heterogeneity in genome size [33]. We further compared the LCBs of SH-01, CyHV-1, CyHV-3, and AngHV-1; there are several differences in the number, orientation, alignment order, and corresponding length of the LCBs. This implies that these four viruses contain inserted and deleted genes, and that they underwent gene jumping events and/or differences in evolution rates in LCBs under long-term host selection pressure, allowing the viruses to occupy more diverse niches [54]. Specifically, compared with CyHVs, AngHV-1 has the most divergent LCBs, and only seven are homologous, suggesting that these viruses share a common ancestor but diverged early in genome evolution and evolved separately in different directions to adapt to environmental stress. Genome alignment facilitates downstream evolutionary inferences, such as rearrangement history, phylogeny, prediction of ancestral states, and detection of selective pressures influencing coding and noncoding sequences [55,56].
Overall, the complete genome sequence and structure of CyHV-2 SH-01 were analyzed and compared with those of CyHVs, AngHV-1, and CaHV. Several variations were found in SH-01, including nucleotide mutations, deletions, and insertions, as well as gene duplications, rearrangements, and horizontal transfers. Notably, the genome of SH-01 isolated from crucian carp shares 99.60% sequence identity with that of ST-J1 isolated from goldfish, implying that SH-01 may have originated from goldfish and been introduced to crucian carp, which is consistent with our previous work [27]. Our findings provide information to further understand the CyHV-2 genome through sequencing and sequence mining.