Selective Depletion of ZAP-Binding CpG Motifs in HCV Evolution

Hepatitis C virus (HCV) is a bloodborne pathogen that can cause chronic liver disease and hepatocellular carcinoma. The loss of CpGs from virus genomes allows escape from restriction by the host zinc-finger antiviral protein (ZAP). The evolution of HCV in the human host has not been explored in the context of CpG depletion. We analysed 2616 full-length HCV genomes from 1977 to 2021. During the four decades of evolution in humans, we found that HCV genomes have become significantly depleted in (a) CpG numbers, (b) CpG O/E ratios (i.e., relative abundance of CpGs), and (c) the number of ZAP-binding motifs. Interestingly, our data suggest that the loss of CpGs in HCV genomes over time is primarily driven by the loss of ZAP-binding motifs, pointing to an as yet unknown role for ZAP-mediated selection pressures in HCV evolution. The HCV core gene is significantly enriched for CpGs and ZAP-binding motifs. In contrast to the rest of the HCV genome, the loss of CpGs from the core gene does not appear to be driven by ZAP-mediated selection. This work highlights CpG depletion in HCV genomes during their evolution in humans and the role of ZAP-mediated selection in HCV evolution.


Introduction
Hepatitis C infection is a common cause of chronic liver disease and cirrhosis worldwide [1,2]. Current estimates of the global prevalence of hepatitis C virus (HCV) infection indicate that 0.7% of the global population is infected, accounting for 56.8 million cases as of January 2020 [3]. The WHO estimated that in 2019, around 290,000 people died from hepatitis C, mostly from cirrhosis and hepatocellular carcinoma [4]. HCV is a positive-sense, single-stranded RNA virus belonging to the family Flaviviridae. Although there are 7 genotypes with over 65 subtypes, genotype 1 is the most prevalent, followed by genotypes 2 and 3 [5]. Other genotypes are geographically restricted [6].
The abundance of CpG dinucleotides in viruses and their association with virus evolution and host-evasion strategies have been explored for both RNA [7,8] and DNA viruses [9,10]. The CpG content differs significantly amongst viruses [11,12]. The depletion of CpG dinucleotides in DNA viruses is attributed to several mechanisms, including the stimulation of TLR-9 immune responses and the deamination of cytosines that are methylated by DNA methyltransferases [10,11]. Although CpG dinucleotides in RNA virus genomes are not methylated, most RNA viruses infecting humans are CpG depleted [12]. A recent report identified selective binding of the zinc-finger antiviral protein (ZAP) to CpG-enriched motifs in viral genomes, leading to the restriction of virus replication [13]. Furthermore, the sequences of specific ZAP-binding motifs have also been identified [14]. The underlying mechanisms of ZAP-mediated restriction of RNA viruses are not well understood. CpG depletion in SARS-CoV-2 has been linked to ZAP-mediated selection pressure [7].
The majority of individuals infected with HCV progress to chronic HCV infection. Therefore, HCV has long-term interactions with the human host. Although the impact of HCV on the methylation of host genes has been studied [15], the loss of CpG dinucleotides from HCV genomes during the last four decades of evolution in humans has not been studied. Furthermore, the role of ZAP-mediated pressures, if any, in the evolution of HCV remains unknown. The treatment of HCV infection has changed rapidly since the introduction of direct-acting antivirals (DAAs), which were first approved in 2011 for genotype 1 virus variants, followed by second-generation DAAs in December 2013 [16]. Most patients on DAA regimens attained a sustained virological response (SVR) [17]; however, treatment failure was seen in about 5-10% [18]. Moreover, DAA-mediated SVR does not eliminate the risk of developing hepatocellular carcinoma (HCC). The availability of over 3000 complete HCV genome sequences allows us to investigate CpG depletion, ZAP-mediated selection pressures, and the effect of DAAs, if any, on virus evolution.

Materials and Methods
We retrieved 3983 complete HCV genomes available in the LANL's HCV database (https://hcv.lanl.gov/; accessed on 28 September 2022; Last GenBank update: 1 July 2022). We used the default conservative criteria of the database to exclude sequences with too many Ns (high content of non-ACTG characters), contaminants (likely contamination with a laboratory strain), synthetic sequences, sequences containing an artifactual deletion of >100 NTs, and tiny sequences (<50 bp). We analysed 2616 sequences with information on the year of sample collection for this study (Accession numbers of sequences are listed in Supplementary Table S1).
Mononucleotide and dinucleotide frequencies were calculated as percentages of the sequence length, excluding gaps (-) and Ns in the MSA of HCV sequences. The dinucleotide O/E ratio is the abundance of a dinucleotide normalized to the abundance of its constituent mononucleotides, which helps to establish whether a change in dinucleotide frequency merely reflects changes in the constituent mononucleotides. The dinucleotide O/E ratios were calculated using the formula:

$$\mathrm{O/E}_{XpY} = \frac{f(XpY)}{f(X)\,f(Y)} \times G$$

where f(XpY) = observed frequency of the dinucleotide, f(X) = frequency of nucleotide X, f(Y) = frequency of nucleotide Y, and G = genome length. ZAP-binding motifs (i.e., C(n_m)G(n)CG, where m = 4/5/6/7/8) in the sequences were found using Python's re module (v 2.2.1). GC content was calculated using Biopython's (v 1.79) built-in method.
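The O/E calculation described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the code from the study's repository: it assumes the conventional form O/E = f(XpY) / (f(X) f(Y)) × G with f taken as raw counts, and the function name `dinucleotide_oe` is hypothetical.

```python
def dinucleotide_oe(seq: str, x: str = "C", y: str = "G") -> float:
    """Observed/expected ratio of the dinucleotide XpY in one sequence.

    Assumed form: O/E = f(XpY) / (f(X) * f(Y)) * G, with counts for f
    and G = number of unambiguous bases (gaps and Ns excluded).
    """
    # Alignment gaps are removed; RNA 'U' is treated as 'T'.
    s = seq.upper().replace("-", "").replace("U", "T")
    bases = [b for b in s if b in "ACGT"]
    g = len(bases)
    n_x = bases.count(x)
    n_y = bases.count(y)
    # Count adjacent X-Y pairs; pairs touching an ambiguous base never match.
    n_xy = sum(1 for a, b in zip(s, s[1:]) if a == x and b == y)
    if n_x == 0 or n_y == 0:
        return float("nan")  # ratio undefined without both mononucleotides
    return n_xy * g / (n_x * n_y)


if __name__ == "__main__":
    # Toy sequence: 2 CpGs, 2 Cs, 2 Gs, length 6 -> 2 * 6 / (2 * 2)
    print(dinucleotide_oe("ACGCGT"))
```

A ratio below 1 indicates that CpGs are rarer than expected from the C and G content alone, which is why the ratio separates CpG depletion from a general drop in GC%.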
The multiple sequence alignment (MSA) of 2616 sequences was created using mafft v7.490 [19] in a single step. Each sequence was individually aligned to the H77 reference genome (Accession ID: NC_004102.1). A Python script was used to generate a mapping between pre-alignment and post-alignment reference sequence positions. The resulting MSA, with a total of 2616 full-length sequences, was used to analyse the number of sequences that lost a ZAP-binding motif at each of the ZAP-binding motif sites (i.e., n = 258 ZAP-binding motif sites present in the H77 HCV reference sequence).
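The mapping between pre-alignment and post-alignment reference positions can be illustrated as follows. This is a minimal sketch under the assumption that gaps are written as "-" and that positions are 1-based; the function name `ungapped_to_aligned` is hypothetical and is not taken from the study's scripts.

```python
from typing import Dict


def ungapped_to_aligned(aligned_ref: str) -> Dict[int, int]:
    """Map 1-based positions in the ungapped reference to 1-based MSA columns."""
    mapping: Dict[int, int] = {}
    ref_pos = 0
    for column, char in enumerate(aligned_ref, start=1):
        if char != "-":          # gap columns do not consume a reference base
            ref_pos += 1
            mapping[ref_pos] = column
    return mapping


if __name__ == "__main__":
    # Toy aligned reference with three gap columns.
    print(ungapped_to_aligned("AC--GT-A"))
```

With such a mapping, a region given in reference coordinates (for example, nucleotide positions 342-914 for the core gene) can be translated into alignment columns before slicing every sequence in the MSA.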
The violin plots, line plots, scatter plots, and bar plots were created using seaborn (v 0.11.2). The moving average plots for the number of CpGs and the number of ZAP-binding motifs in the reference sequence were generated in seaborn (v 0.11.2), and the calculations were done using the pandas (v 1.3.5) rolling function with a window size of 500 bp (with window labels set at the centre of the window and no points in the window excluded from the calculations). Percentages of mononucleotides and dinucleotides were plotted along a time axis of 1-year intervals. Since multiple sequences are reported each year, a 95% confidence interval band was plotted alongside the mean values. The boxplots within the violin plots depict the lower quartile, the median, and the upper quartile. p < 0.01 was considered statistically significant. The Mann-Whitney U test was conducted using scipy (v 1.7.3) to compare the medians in the violin plots, and Pearson's correlation coefficient was determined for the numbers of CpGs and ZAP-binding motifs using scipy (v 1.7.3). Bar plots were created using seaborn (v 0.11.2): the loss of CpGs and ZAP-binding motifs was obtained from the median values of the violin plots, and the numbers of CpGs and ZAP-binding motifs were plotted as found in the reference sequence. The code used for the analysis in this study is available at the GitHub repository (https://github.com/iamakhilverma/hcv_seqs_analysis.git; uploaded on 7 December 2022). Chi-square tests for the bar plots were performed in GraphPad.
Pathogens 2023, 12, 43
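The centred moving average used for the genome-wide CpG profile can be sketched without third-party libraries. The study used the pandas rolling function with a 500 bp window; the version below is a standard-library stand-in for illustration only. The function names are hypothetical, and leaving the edges undefined (None) is an assumption about how incomplete windows are handled.

```python
from typing import List, Optional, Sequence


def cpg_indicator(seq: str) -> List[int]:
    """Return 1 at every position where a CpG dinucleotide starts, else 0."""
    s = seq.upper()
    return [1 if s[i:i + 2] == "CG" else 0 for i in range(len(s))]


def centred_rolling_mean(values: Sequence[float],
                         window: int) -> List[Optional[float]]:
    """Mean over each full window, labelled at the window centre."""
    n = len(values)
    out: List[Optional[float]] = [None] * n
    if window < 1 or window > n:
        return out
    # Prefix sums make every window sum an O(1) lookup.
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    for start in range(n - window + 1):
        centre = start + window // 2
        out[centre] = (prefix[start + window] - prefix[start]) / window
    return out


if __name__ == "__main__":
    profile = centred_rolling_mean(cpg_indicator("ACGCGTTTCGA"), window=5)
    print(profile)
```

Multiplying each mean by the window size gives the number of CpGs per window. The same function applies to a per-position indicator of ZAP-binding motif starts.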

Loss of CpG Content and ZAP-Binding Motifs in HCV Genome
Full-length HCV genomes were downloaded from the HCV sequence database. We analysed a total of 2616 full-length HCV genomes available from 1977 to 2021 (please see Materials and Methods for details). To investigate the changes in the CpG content of HCV genomes during their evolution in humans, we analysed historical HCV sequences (i.e., all HCV sequences available from samples collected in or before 2001; n = 201 sequences) and contemporary HCV sequences (i.e., all HCV sequences available from samples collected in or after 2010; n = 1319 sequences). Interestingly, the median CpG numbers from historical HCV sequences were significantly higher than those from contemporary HCV sequences (Figure 1A; median 586 vs. 536; p < 0.0001). Our data suggest that over 40 (median) CpGs were lost from HCV genomes during their four decades of evolution in humans.
A reduction in the GC% of virus genomes can result in a reduction of CpG numbers. To investigate whether the reduction in CpG numbers in HCV genomes during their evolution in humans is merely a reflection of variation in the GC%, we estimated the relative abundance of CpG dinucleotides, which is calculated as the O/E (observed/expected) ratio (i.e., CpG dinucleotide content normalized to the numbers of the constituent C and G mononucleotides). If the loss of Cs and Gs from the genome were responsible for the loss of CpG dinucleotides from HCV genomes, the O/E ratios of the historical and contemporary HCV sequences would be comparable. On the contrary, decreasing O/E ratios of HCV genomes over time suggest that the loss of CpGs is independent of the GC% (Figure 1B). The validity of this finding is further strengthened by the fact that there is indeed a significant reduction in the GC content of HCV genomes during this period (Supplementary Figure S1). We then wanted to investigate the CpG content of HCV genotypes. However, the availability of very few sequences (i.e., fewer than 25 sequences) from before 2001 precluded this analysis for HCV genotypes other than genotype 1. Loss of CpGs, CpG O/E ratios, and ZAP-binding motifs with time was observed for HCV genotype 1 (Supplementary Figure S2). The lack of an adequate number of sequences from other HCV genotypes/subtypes did not allow us to elucidate genotype/subtype-specific differences in CpG content and ZAP-binding motifs, if any, across HCV genomes. Although there is no genotype-specific selection bias in the sequences analysed in our study, we cannot rule out a role for differences in the distribution of HCV genotypes over time and their potential impact on our findings.
CpG-rich regions can be targets for ZAP-mediated virus restriction. ZAP was recently shown to bind to diverse sequence motifs with a terminal CpG dinucleotide (i.e., C(n4-8)G(n)CG, where n = a/c/g/t) [14]. ZAP is known to inhibit viral replication by binding to CpG-rich regions of the viral genome [13]. Contemporary HCV sequences had significantly lower numbers of ZAP-binding motifs as compared to historical HCV sequences (Figure 1C). Together, our data suggest that during evolution in humans, HCV genomes have lost CpGs (both absolute numbers and relative abundance) as well as ZAP-binding motifs. HCV is one of the few RNA viruses that has a long-term relationship with the host. Pronounced depletion of CpG content and ZAP-binding motifs in the HCV genome during its evolution in the human host may represent a strategy for evasion of host innate immune responses.
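The motif definition C(n4-8)G(n)CG translates directly into a regular expression. The sketch below uses Python's re module, which the Methods name as the search tool, but the exact pattern and counting rule used in the study are not stated here. Counting overlapping matches and counting each spacer length separately are assumptions, and `find_zap_motifs` is a hypothetical name.

```python
import re
from typing import List, Tuple


def find_zap_motifs(seq: str,
                    spacers: range = range(4, 9)) -> List[Tuple[int, int, str]]:
    """Return (0-based start, spacer length m, matched text) for every
    C(n_m)G(n)CG occurrence, including overlapping ones."""
    s = seq.upper().replace("-", "").replace("U", "T")
    hits: List[Tuple[int, int, str]] = []
    for m in spacers:
        # A zero-width lookahead lets matches overlap each other.
        pattern = re.compile(rf"(?=(C[ACGT]{{{m}}}G[ACGT]CG))")
        for match in pattern.finditer(s):
            hits.append((match.start(), m, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence containing one motif with a 4-nt spacer.
    print(find_zap_motifs("CAAAAGACG"))
```

Because motifs with different spacer lengths can share the same terminal CpG, an analysis may prefer to count distinct terminal CpG positions rather than raw matches. Which convention applies here would need to be checked against the study's repository.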

Correlation between Loss of ZAP Binding Motif and CpG Loss
ZAP-binding motifs contain a terminal CpG. Nonetheless, it is not known whether the number of CpGs in virus genomes correlates with the number of ZAP-binding motifs. Since ZAP-binding motifs and CpGs are both lost from HCV genomes with time, we investigated the correlation, if any, between CpG numbers and the number of ZAP-binding motifs. We found a strong correlation between CpG numbers and the number of ZAP-binding motifs in the HCV genome (Figure 2; r = 0.84; p < 0.0001; Pearson's r statistical calculations). Our results indicate that the CpG numbers in the HCV genome may serve as a surrogate for the number of ZAP-binding motifs.

CpG Depletion in HCV Genomes Is Primarily Driven by the Loss of ZAP-Binding Motifs
The analysis of CpG dinucleotides in historical and contemporary HCV sequences reveals that over 40 CpGs were lost during evolution in the human host (Figure 1A). To understand whether ZAP-mediated selection pressures are major drivers of CpG depletion in HCV genomes, we assessed the median numbers and the proportion of CpGs lost from within ZAP-binding motifs and outside ZAP-binding motifs. Most of the CpGs lost over time (38 of 50; 76%) were located within ZAP-binding motifs (Figure 3A,B), suggesting that ZAP-mediated selection pressures may be the major drivers of CpG depletion during the evolution of HCV genomes in the human host. Our findings clearly support evolutionary or survival advantages for the loss of CpGs from ZAP-binding motifs in the HCV genome as opposed to the loss of CpGs that lie outside ZAP-binding motifs (Figure 3C).
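The split between CpGs inside and outside ZAP-binding motifs can be illustrated as follows. This is a sketch under stated assumptions: a CpG counts as "within" a motif when both of its bases fall inside at least one C(n4-8)G(n)CG match, and overlapping matches are allowed. The function names are hypothetical, and the study's own classification rule may differ.

```python
import re
from typing import List, Tuple

# One lookahead pattern per spacer length, so overlapping motifs are found.
ZAP_PATTERNS = [re.compile(rf"(?=(C[ACGT]{{{m}}}G[ACGT]CG))")
                for m in range(4, 9)]


def motif_spans(s: str) -> List[Tuple[int, int]]:
    """Half-open (start, end) spans of every motif match in s."""
    spans = []
    for pattern in ZAP_PATTERNS:
        for hit in pattern.finditer(s):
            spans.append((hit.start(), hit.start() + len(hit.group(1))))
    return spans


def split_cpgs(seq: str) -> Tuple[List[int], List[int]]:
    """Return 0-based CpG start positions (inside motifs, outside motifs)."""
    s = seq.upper().replace("-", "")
    spans = motif_spans(s)
    inside, outside = [], []
    for i in range(len(s) - 1):
        if s[i:i + 2] != "CG":
            continue
        if any(a <= i and i + 2 <= b for a, b in spans):
            inside.append(i)
        else:
            outside.append(i)
    return inside, outside


def percent(part: float, total: float) -> float:
    """Share of `part` in `total`, as a percentage."""
    return 100.0 * part / total


if __name__ == "__main__":
    print(split_cpgs("CG" + "T" * 10 + "CAAAAGACG"))
    print(percent(38, 50))  # share of lost CpGs that lay within motifs
```

Applying `split_cpgs` to historical and contemporary sequences and comparing the median counts in each class yields the kind of breakdown reported above, where 38 of the 50 CpGs lost fall within motifs.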

Selective Conservation of Specific ZAP-Binding Motifs during HCV Evolution
Temporal loss of ZAP-binding motifs from HCV genomes is evident in our results (Figure 1E). However, to visualize the loss of ZAP-binding motifs at specific genome locations, data on the conservation of each ZAP-binding motif along the length of the whole HCV genome were plotted. These data from all 2616 sequences were mapped to the reference HCV genomic sequence H77 (Accession number: NC_004102) (Figure 4A). Of note, only about 5% of the 258 predicted ZAP-binding sites in the HCV reference genome were conserved in ≥90% of the HCV genomes analysed (Figure 4A). Interestingly, of the 30 ZAP-binding sites conserved in >75% of the genomes, 18 were present within the first 1000 nucleotides of the genome, showing a clear enrichment in the core protein region of the genome (Figure 4A). The moving average plot of CpGs and ZAP-binding motifs across the genome also revealed a similar picture, where the gene encoding the HCV core protein (nucleotide positions 342-914) is enriched for CpGs and ZAP-binding motifs (Figure 4B,C).
Despite a good correlation between the number of CpGs and ZAP-binding motifs in HCV genomes (Figure 2), specific genomic regions appear to be exceptions. For example, the genes coding for the E1 and NS3 proteins appear to have high CpG numbers but relatively fewer ZAP-binding motifs (Figure 4B,C). The genes encoding the HCV core protein, membrane protein, and NS5B are enriched for ZAP-binding motifs (Figure 4C). Interestingly, the conservation of ZAP-binding motifs is much more pronounced in the HCV core gene than in the genes encoding the membrane protein or NS5B (Figure 4A).
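Per-site conservation of a ZAP-binding motif across an alignment can be sketched as below. This is a hypothetical illustration rather than the study's method: it assumes that a site is "conserved" in a sequence when the ungapped bases in the alignment columns spanned by the reference motif still form a complete C(n4-8)G(n)CG motif, and the name `site_conservation` is invented for the example.

```python
import re
from typing import Sequence

ZAP_MOTIF = re.compile(r"C[ACGT]{4,8}G[ACGT]CG")


def site_conservation(aligned_seqs: Sequence[str],
                      col_start: int, col_end: int) -> float:
    """Fraction of aligned sequences that retain a full ZAP-binding motif
    in the 0-based, half-open column range [col_start, col_end)."""
    retained = 0
    for seq in aligned_seqs:
        window = seq[col_start:col_end].upper().replace("-", "")
        if ZAP_MOTIF.fullmatch(window):
            retained += 1
    return retained / len(aligned_seqs)


if __name__ == "__main__":
    toy_alignment = [
        "CAAAA-GACG",  # reference: motif present
        "CAAAA-GATG",  # terminal CpG lost
        "CAAAATGACG",  # insertion, spacer of 5 still fits the motif
        "CAAAA-GACA",  # terminal CpG lost
    ]
    print(site_conservation(toy_alignment, 0, 10))
```

Repeating this for each of the motif sites in the reference and plotting the fractions against genome position gives a conservation profile of the kind described for Figure 4A.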

Enhanced Loss of CpGs from the HCV Core Gene from Outside ZAP-Binding Motifs
The high numbers of CpGs and ZAP-binding motifs in the HCV core gene may imply that genes encoding key structural proteins of a virus are more resistant to mutations and/or loss of CpGs. Previous reports on the HCV core gene suggest a higher degree of conservation to maintain the structural integrity of the virus [20,21]. We therefore investigated the loss of CpGs and ZAP-binding motifs in this region as compared to the rest of the genome. There were significant decreases in CpG numbers, CpG O/E ratios, and the number of ZAP-binding motifs in the core protein-encoding region (nucleotide positions 342-914) of the viral genome (Figure 5A-C). Both CpG dinucleotides and ZAP-binding motifs are enriched in the core protein region as compared to the rest of the genome (Figure 5D,E). When we plotted the proportion of CpGs lost (median) from the core gene and the rest of the genome, the proportion of CpGs lost from the core gene was almost four-fold higher than that from the rest of the HCV genome (Figure 5F). Of note, the proportion of ZAP-binding motifs lost from the core gene is marginally lower than that from the rest of the genome (Figure 5G). Interestingly, although the proportion of total CpGs lost from the HCV core gene is much higher than that from the rest of the genome, the loss of ZAP-binding motifs is not the primary driver of CpG loss within the core gene; this is in contrast to the finding at the whole-genome level. This finding explains, at least in part, the conservation of ZAP-binding motifs, a phenomenon that was unique to the HCV core gene (Figure 4A). Nonetheless, the mechanisms underlying the enhanced CpG loss from outside the ZAP-binding motifs within the HCV core gene merit investigation.

Discussion
High mutation rates, recombination, and mutations in the virus polymerase have been associated with the high genetic diversity of the HCV genome [22]. As a result, HCV genotypes may have up to 30% genetic diversity at specific genomic regions [23]. In addition to subtypes within a genotype, HCV quasispecies, or viral variants within an infected host, add to its diversity [24].
Studies on CpG depletion in other RNA viruses have provided interesting insights into virus evolution, pathogenesis, and adaptation to the host [25-28]. Therefore, the evolution of CpG content in HCV in humans over the last four decades represents an interesting but as yet unexplored opportunity.
We analysed a total of 2616 HCV genomes from 1977 to 2021 and found a significant reduction in CpG numbers, CpG O/E ratios, and ZAP-binding motifs over time. Contemporary HCV genomes have significantly reduced CpG content and fewer ZAP-binding motifs as compared to historical HCV sequences. These findings suggest a role for CpG depletion in shaping the evolution of HCV. Previous studies have shown that CpG depletion in virus genomes is pronounced during host adaptations [7,27]. In addition, CpG content remains stable for well-adapted human viruses such as influenza B virus [27]. Our findings indicate that the CpG content of HCV genomes still appears to be evolving, suggesting ongoing adaptations to the human host. This is consistent with a report suggesting that the most recent common ancestor of HCV (subtype 1b) infections in humans may date back to the early 1900s [29].
The trend of declining CpG numbers, CpG O/E ratios, and the number of ZAP-binding motifs in the HCV genome over time was briefly reversed during the period 2013-2015 (Figure 1D-F), when an increase in CpG content and ZAP-binding motifs in HCV genomes is evident. Interestingly, this period overlaps with the timeline for the approval of combination DAA therapy [30]. The introduction of antiviral drugs may limit the genetic diversity of viruses in the host, as only a small subset of the virus population with resistance mutations is able to survive. This genetic bottleneck may also lead to the emergence of new drug-resistant variants [31]. We speculate that evolutionary constraints associated with the introduction of combination DAA therapy for HCV may have impacted the evolution of CpGs and ZAP-binding motifs from 2013 to 2015. Previous reports indicate that ribavirin (an anti-HCV agent) leads to the accumulation of mutations at specific genomic locations [32]. Furthermore, some of the anti-HCV DAAs target the HCV RNA-dependent RNA polymerase [30], which may directly impact the type of mutations occurring in the HCV genome.
Apart from ZAP-mediated selection pressures, other selection pressures, including TLR7-mediated immune selection pressures [7], host-specific selection pressures [12], and tissue-specific selection pressures [26], may be associated with the depletion of CpGs in RNA viruses. Therefore, the number of CpGs may not necessarily correlate with the number of ZAP-binding motifs for a given RNA virus. Nonetheless, we found a good correlation between the CpG numbers and the number of ZAP-binding motifs in HCV genomes (Figure 2). This finding suggests that CpG numbers in HCV genomes may be surrogates for the number of ZAP-binding motifs.
The role of ZAP-mediated selection pressures in shaping RNA virus evolution has not been well studied. In SARS-CoV-2, the depletion of CpGs has been primarily attributed to pressures acting outside the ZAP-binding motifs. Our finding that ZAP-mediated selection pressures are a major driver of CpG depletion in HCV (Figure 3) highlights that CpG depletion in different RNA viruses infecting humans can arise from fundamentally different evolutionary pressures. The underlying reasons for the contrasting roles of ZAP-mediated selection among human viruses remain elusive. The liver is one of the human tissues where ZAP is highly expressed (Tissue atlas) [33]. A potential role for tissue-specific expression of ZAP and the necessary co-factors for ZAP-mediated restriction merits further investigation.
Among the HCV genes, the HCV core gene is enriched for both CpGs and ZAP-binding motifs. Although ZAP-binding motifs in the HCV genome are depleted with time due to selection pressures, ZAP-binding motifs within the HCV core gene appear to be well conserved (Figure 4). The HCV core protein is a basic protein that interacts with HCV RNA, oligomerizes, and facilitates virus assembly [34]. In addition, the HCV core protein is a nucleic acid chaperone [35]. Mutations and deletions in the N-terminus of the HCV core protein have been shown to impact virus assembly [36]. We have not identified the specific reasons for the conservation of ZAP-binding motifs in the HCV core gene. Nonetheless, the selective conservation of ZAP-binding motifs in specific genes in virus genomes may indicate the existence of as yet unknown constraints that minimize the loss of CpGs/ZAP-binding motifs. Importantly, this finding also suggests that, in such genes, the benefits of retaining CpGs/ZAP-binding motifs may outweigh the survival/replication advantages associated with escaping ZAP-mediated restriction in the host. We also found that the loss of CpGs within the HCV core gene occurs primarily outside ZAP-binding motifs (Figure 5), suggesting the existence of gene-specific differences in selection pressures.

Conclusions
In conclusion, here we identify a role for CpG depletion in shaping HCV evolution in the human host. Our results also suggest that ZAP-mediated selection pressures are the major drivers of CpG depletion in the HCV genome. The conservation of ZAP-binding motifs in the HCV genome is unique to the HCV core gene, where CpG depletion is primarily driven by selection pressures that are independent of ZAP-mediated restriction. This work highlights the underlying mechanisms of CpG depletion in HCV genomes in humans and sheds light on the contrasting role of different selection pressures at specific genomic locations within a virus genome.