Viral and Host Small RNA Response to SARS-CoV-2 Infection

: After two years into the pandemic of the coronavirus disease 2019 (COVID-19), it remains unclear how the host RNA interference (RNAi) pathway and host miRNAs regulate severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and impact the development of COVID-19. In this study, we proﬁled small RNAs in SARS-CoV-2-infected human ACE2-expressing HEK293T cells and observed dysregulated host small RNA groups, including speciﬁc host miRNAs that are altered in response to SARS-CoV-2 infection. By comparing dysregulated miRNAs in different SARS-CoV-2-infected samples, we identiﬁed miRNA-210-3p, miRNA-30-5p, and miR-146a/b as key host miRNAs that may be involved in SARS-CoV-2 infection. Furthermore, by comparing virally derived small RNAs (vsmRNAs) in different SARS-CoV-2-infected samples, we observed multiple hot spots in the viral genome that are prone to generating vsmRNAs, and their biogenesis can be dependent on the antiviral isoform of Dicer. Moreover, we investigated the biogenesis of a recently identiﬁed SARS-CoV-2 viral miRNA encoded by ORF7a and found that it is differentially expressed in different infected cell lines or in the same cell line with different viral doses. Our results demonstrate the involvement of both host small RNAs and vsmRNAs in SARS-CoV-2 infection and identify these small RNAs as potential targets for anti-COVID-19 therapeutic development.


Introduction
Two years into the global coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a new zoonotic coronavirus strain with humans as a host [1], many aspects of this disease are still unknown. Questions that remain to be answered include what is the origin of SARS-CoV-2, how the broad range of symptoms is induced in response to SARS-CoV-2 infection, and what is the contribution of genetic background to the development of COVID-19. One intriguing question is whether this virus interacts with the host endogenous RNA interference (RNAi) pathway during infection and COVID-19 pathogenesis. Additional related questions include whether the host can use RNAi to control SARS-CoV-2-induced pathogenesis, whether SARS-CoV-2 encodes viral suppressors of RNAi (VSR) to antagonize the host RNAi response, whether SARS-CoV-2 encodes viral small RNAs (vsmRNAs), and if yes, whether these vsmRNAs play a role in the infectious cycle of SARS-CoV-2. COVID-19 is a disease associated with complicated immune reactions from the host immune system, including hyperinflammation. In addition to modulating the life cycle of the virus, the host RNAi system may also play an important role in regulating the immune system in response to SARS-CoV-2 infection [2,3]. Answers to these questions could reveal novel mechanisms lated regions (UTR) for other positive viral transcripts. These sequences are also the shared sequence for all viral transcripts, and it is likely that the prime region will be targeted by RNAi or used for encoding vsmRNAs.
Our data from HEK293T-hACE2 and published data indicate that the SARS-CoV-2 infection can produce abundant putative vmiRNAs but much fewer vsiRNAs, and similar to vsiRNAs, the biogenesis of vmiRNAs can be dependent on the antiviral isoforms of Dicer (aviD). Moreover, the reanalysis of datasets used to clone vmiR-ORF7a indicates that the abundance of vmiR-ORF7a varies largely with different infection doses and cell lines, and the abundance does not always correlate with the potential scale of immune response of the cell lines [17,18]. Hence, more studies are needed to unveil the biogenesis and function of vsmRNAs, including vmiR-ORF7a, in SARS-CoV-2 infection, and their role in COVID-19 pathogenesis.

SARS-CoV-2 Infection Dysregulates Host Small RNAs in HEK293T-hACE2 Cells
To achieve a high infection rate and maintain high viability of infected cells to better detect small RNA changes that may be overshadowed by strong host immune responseinduced cell death or viral RNA degradation, we chose HEK293T cells that have a low level of immune response to SARS-CoV-2 infection. HEK293T cells have low hACE2 expression. We addressed this limitation by overexpressing the hACE2 gene in HEK193T cells, which allowed potent SARS-CoV-2 infection, resulting in up to 50% of RNA sequencing reads mapped to the viral genome at 72 h post-infection (hpi) [25]. In the current study, we present our smRNAseq data in the SARS-CoV-2-infected HEK293T-hACE2 cells.
We first observed that read length distribution was altered upon SARS-CoV-2 infection in HEK293T-hACE2 cells. The percentage of reads with lengths shorter than 27 nucleotides (nts) was reduced, while the percentage of reads with lengths longer than 32 nts was increased ( Figure 1A). Next, we compared small RNA population distribution between SARS-CoV-2-infected and mock-infected samples. We found that the relative amounts of several small RNAs in the total small RNA population were significantly dysregulated in SARS-CoV-2-infected samples. The statistical analysis of reads annotation showed that there was a significant change in most small RNAs after SARS-CoV-2 infection. Specifically, the percentage of host miRNAs in total small RNAs was significantly reduced, whereas the amount of ribosome RNA-derived fragments (rRNAs), tRNA-derived fragments (tRFs), non-coding-RNA-derived fragments from ncRNAs, and other smRNAs (such as signal recognition particle RNAs -srpRNAs, small nuclear RNAs-snRNAs, small conditional RNAs-scRNAs, and vsmRNAs) was significantly increased ( Figure 1B). Since small RNAs other than miRNAs generally do not have a specific process that can control their biogenesis and produce consistent mature sequences, it is hard to know which sequences derived from them were differentially changed unless one looks at each sequence individually. Using the small RNA analysis tool miRge3 that can combine all tRFs mapped to a tRNA to produce a combined-tRF read (tRFcomb), we performed a differential expression analysis of tRFcombs and miRNAs by DESeq2. There were 181 upregulated and 44 downregulated host tRFs and miRNAs with a fold change ≥2 [abs(log2FC) ≥ 1] and an adjusted p-value (p-adj) < 0.05. This analysis also showed that there are more tRFs than miRNAs with significant changes, as shown by the volcano plot, for which only miR-210-3p and miR-30d-5p made it into the list with -log10 (p-value) > 20. (Figure 1C, Supplementary File S1A,B). Next, using small RNA qRT-PCR, we measured the amounts of several highly expressed and significantly altered miRNAs, tRFs, and rRNAs, relative to two highly expressed host miRNAs in HEK293T cells, miR-10a-3p, and miR-484 (miR-10a-3p and miR-484 showed similar amounts between SARS-CoV-2-infected and mock-infected samples, therefore they were used as the normalization controls). The qRT-PCR data agreed with the sequencing results in general ( Figure 1D). In summary, our data indicate that the relative amounts of tRFs, rRNAs, and other small RNAs were increased in SARS-CoV-2-infected HEK293T-hACE2 cells, which could have resulted from the SARS-CoV-2-infection-induced cell stress response that leads to the cleavage or degradation of their parental RNAs. The increase in tRFs, rRNAs, and other small RNAs in relative amounts to the total small RNA population could contribute to the decrease in the relative amount of miRNA population in the total small RNA population.
Microbiol. Res. 2022, 13, FOR PEER REVIEW 4 miR-484 (miR-10a-3p and miR-484 showed similar amounts between SARS-CoV-2infected and mock-infected samples, therefore they were used as the normalization controls). The qRT-PCR data agreed with the sequencing results in general ( Figure 1D). In summary, our data indicate that the relative amounts of tRFs, rRNAs, and other small RNAs were increased in SARS-CoV-2-infected HEK293T-hACE2 cells, which could have resulted from the SARS-CoV-2-infection-induced cell stress response that leads to the cleavage or degradation of their parental RNAs. The increase in tRFs, rRNAs, and other small RNAs in relative amounts to the total small RNA population could contribute to the decrease in the relative amount of miRNA population in the total small RNA population.

Host miRNAs Respond to SARS-CoV-2 Infection in HEK293T-hACE2 Cells
Host miRNAs are the largest family of host small RNAs that depend on the RNAi mechanism for biogenesis and function and participate in the interplay between host and viruses. Due to the fact that the above data revealed that SARS-CoV-2 infection in HEK293T-hACE2 cells could decrease the relative amount of miRNA population in all small RNAs, to determine the specific expression changes in miRNAs, we performed a differential expression analysis of miRNAs in SARS-CoV-2-infected versus mock-infected HEK293T-hACE2 cells. Despite robust infection, the scale of changes in the host miRNA expression was mild and only a small number of miRNAs exhibited a substantial change in their expression level. There were 27 miRNAs with a ≥1.5-fold change [abs(log2FC) ≥ 0.6], of which 15 miRNAs exhibited a ≥2-fold change [abs(log2FC)≥ 1] ( Figure 1E, Supplementary File S1C). These results indicate that the SARS-CoV-2 infection in HEK293T-hACE2 cells only causes significant changes in the expression level of a small number of specific miRNAs but not a large population of miRNAs, despite a significant decrease in the percentage of miRNAs in all small RNAs.
To identify miRNAs that were specifically regulated by SARS-CoV-2 infection, we examined the expression levels of the above 27 miRNAs in small RNA profiling data from other SARS-CoV-2-infected human cell lines of COVID-19 patient samples (Table 1). First, we reanalyzed datasets that reported host miRNA level changes in response to SARS-CoV-2 infection. Specifically, the dataset of SARS-CoV-2-infected Calu-3 (Calu3-EW in Figure 2A) [10], miRNA profiling data from the first plasma dataset of COVID-19 patients (COVID-19 in Figure 2A) [13], and the second plasma dataset of COVID-19 patients ( (Table 1). While the difference in small RNA profiles from different datasets indicates that SARS-CoV-2 infection could dysregulate miRNA expression in a cell type-and tissue-dependent manner, we observed overlapping changes in these datasets. We found that miR-1275 and miR-4484 were downregulated whereas miR-1246, miR-7-5p, miR-1258, miR-877-3p, and miR-2115-3p were upregulated in multiple datasets, despite that they were altered at different scales and the change is not always statistically significant.
Based on the expression levels of the above 27 miRNAs in human lung tissues documented by miRmine [30] and their seed sequences matched to sequences on the SARS-CoV-2 genome (Figures 1C-E and 2A, Supplementary File S1C), we identified miR-210-3p and miR-30d-5p as the two host miRNAs that could be involved in or regulated by SARS-CoV-2 infection because their expression levels were significantly altered by SARS-CoV-2 infection, exhibiting higher expression levels in both HEK293T cells and lung tissues. These two miRNAs also harbor multiple seed sequences that match the SARS-CoV-2 genome (Figure 2A, Supplementary File S3A,B), and miR-30d-5p is the miRNA with the highest number of matches among the above 27 miRNAs. miR-210-3p has been shown to be a major player in the hypoxia pathway by regulating the adaptive response to the low oxygen condition [31]. miR-210-3p was slightly downregulated in the SARS-CoV-2-infected Calu3-EW dataset and was also downregulated in the first plasma dataset of COVID-19 patients. However, the same laboratory reported it as an upregulated miRNA in the plasma of SARS-CoV-2-infected ferret samples [13]. In the second plasma dataset of COVID-19 patient samples, miR-210-3p was upregulated in both mild and severe COVID-19 patients, with a higher level in the plasma of severer cases ( Figure 2A) [14]. Therefore, miR-210-3p is a host miRNA that can respond to SARS-CoV-2 infection. Whether it will be upregulated or downregulated upon SARS-CoV-2 infection may be dependent on whether the infection conditions lead to hypoxia or COVID-19 stages with or without the life-threatening hypoxemia symptom [32].    Table 1 for detailed information on infection and abbreviations of samples.
miR-30d-5p has fourteen 7-mer seed sequences and seven additional 6-mer seed sequences that are complementary to the SARS-CoV-2 genome ( Both miR-30a-5p and miR-30e-5p, expressed at relatively high levels in HEK293T cells, were significantly downregulated by the SARS-CoV-2 infection, albeit at a smaller scale (Supplementary File S1B,C). Since all family members of miR-30-5p share the seed sequence for target gene recognition, all other family members of miR-30-5p could also target SARS-CoV-2, thus, miR-30-5p may target the virus in any tissues or cells through different family members that are expressed in different tissues. In addition, we found that both miR-30b-5p and miR-30c-5p were among the significantly downregulated miRNAs in the second plasma dataset of COVID-19 patients [14], and miR-30a/c/d/e-5p were present at high levels in the upper respiratory tract [19]. Therefore, miR-30-5p may be a host miRNA that can respond to SARS-CoV-2 infection and may have antiviral effects or be involved in the host immune response [33].
We did not observe a significant change in the expression of immune-related miRNAs miR-155-3p and miR-146b-5p in SARS-CoV-2-infected HEK293T-hACE2 cells, although both miRNAs have been shown to be dysregulated by the SARS-CoV-2 infection in previous studies [10,14,34], presumably because HEK293T cells lack the immune response. Our reanalysis of the smRNAseq dataset from the second plasma dataset of COVID-19 patients confirmed that miR-146b-5p was downregulated in the plasma of both mild and severe COVID-19 patients. In addition, we found that miR-146a-5p was also downregulated in these samples [14]. There was more than two-fold reduction in the expression of both miR-146a-5p and miR-146b-5p in the plasma of severe COVID-19 patients compared to mild patients, indicating that their expression levels are correlated to the severity of the disease (Supplementary File S4). Therefore, there is a consistent observation of miR-146 reduction in both cell lines and patient samples, indicating that there may be a correlation between the reduced expression level of miR-146 and the exacerbated immune response observed in severe COVID-19 patients [34].
Next, we looked at the expression level changes in the above differentially expressed miR-NAs in the datasets used to clone vmiR-ORF7a, SARS-CoV-2-infected hACE2-overexpressing A549 (A549-hACE2) cells, and Caco2 cells from the laboratory of Germano Cecere (datasets of GC: A5549-GC, Caco2-GC) [17] and SARS-CoV-2-infected A549-hACE2 cells, Calu-3 cells, and PC9 cells from the laboratory of Joan Steitz (datasets of JS: A549-JS, Calu3-JS, and PC9-JS) [18]. We reanalyzed our data and the above datasets in parallel using the same procedure and parameters (Table 1). To focus on miRNAs, we performed small RNA read length filtering, as both laboratories did to their datasets. All the above datasets were filtered to read lengths between 15-nucleotides (nts) and 30-nts before running miRge2 to generate miRNA counts and perform DESeq2 to identify differentially expressed miR-NAs. Although the small RNA population was reduced to a smaller size and the list of differentially expressed small RNAs was slightly different to unfiltered data, we still observed significantly altered host miRNAs in SARS-CoV-2-infected HEK293T-hACE2 cells ( Figure 2B, Supplementary File S5). However, for all datasets of GC and JS, there were only a few highly expressed host miRNAs with substantial change (DESeq2 calculated mean read count ≥50, ≥2-fold change [abs(log2FC) ≥ 1], p-adj ≤ 0.05) in SARS-CoV-2infected A549-hACE2 (GC or JS) and Caco2 (GC) cells using high multiplicity of infection (MOI = 3 or 5), consistent with their conclusion that SARS-CoV-2 infection only affects the expression of a small number of host miRNAs ( Figure 2B, Supplementary Files S2B and S6). Our reanalysis showed that miR-210-3p was only downregulated in the dataset of Caco2-GC-48h but upregulated in the rest of the GC's and JS's datasets. However, the change in miR-210-3p expression in all these datasets was not significant, indicating that a special change in miRNA-210-3p under our infection condition and the production of vmiR-ORF7a may not be associated with the expression change in specific host miRNAs. We also found that the amount of tRFs and rRNAs was reduced from read length filtered data compared to unfiltered data in our HEK293T-hACE2 dataset, but the percentage of reads for rRNAs was increased and the percentage of reads for miRNAs was decreased in infected samples (Supplementary File S5). This observation could probably explain, at least in part, why we have observed significant changes in the levels of tRFs and rRNAs in our studies.

SARS-CoV-2 Infection Produces vsmRNAs in HEK293T Cells
We detected 0.85% of reads that were mapped to the SARS-CoV-2 genome in SARS-CoV-2infected samples, consistent with the low percentage of vsmRNAs, most of which are <1% of total small RNA reads, in 41 hosts infected by six RNA viruses [35]. Except for a small portion of the vsmRNA that was matched to the -ssRNA genome or sgmRNA (-vsmRNA) ( Figure S1A), most of the vsmRNA sequences were matched to the +ssRNA genome or +sgmRNA (+vsmRNA) ( Figure S1B).
Most of these vsmRNAs were consistently present in both replicates of infection in HEK293T-hACE2 cells. Regardless of the polarity, there were many more vsmRNAs located in the region from the envelope (E) to ORF10 genes (Figure S1C,D). Next, we compared the presence of vsmRNAs in the dataset of Calu3-EW with the datasets of mouse brain organoids (BO), and in the second plasma dataset of COVID-19 patients [10,12,14]. We observed that vsmRNAs exhibited a similar presence in different replicates of the same infection, suggesting that these vsmRNAs are likely not randomly degraded viral RNA fragments. However, only some of them showed similar sequence read distribution across datasets of infections, including the datasets from Calu3-EW, BO, and COVID-19 patient plasma, and they are mainly +vsmRNAs ( Figure 3). We also checked potential bindings of RNA binding proteins (RNP) to these vsmRNA-encoding regions in the SARS-CoV-2 genome using the chromatin isolation by RNA purification (ChIRP) data from SARS-CoV-2-infected Vero-E6 and Huh7 cells. We would like to check whether these vsmRNA derivation regions are covered and protected by RNPs [29] ( Figure S2). The ChIRP data have a more robust coverage of the SARS-CoV-2 genome than smRNAseq data. There are almost 10 times more reads from ChIRP than that from smRNAseq for any region of the SARS-CoV-2 genome. However, the vsmRNA derivation regions with abundant vsmRNAs are not always regions that are heavily bound by RNPs, indicating that most vsmRNAs are not always derived from RNA fragments bound by RNPs and protected from degradation. Therefore, these vsmRNAs may have their own biogenesis pathways and SARS-CoV-2 infection could trigger the production of vsmRNAs. However, we did not observe a flat plateau-shaped distribution of the vsmRNAs in the read coverage tracks of the integrative genomics viewer (IGV), a typical feature of smRNAseq reads for miRNAs when the majority of reads mapped to a specific genome region have similar sequences. Instead, these vsmRNAs exhibited bell-shaped distributions, indicating that most of these reads have partially identical sequences and were unlikely generated in a precisely controlled manner, such as canonical miRNAs or siRNAs. The diversified vsmRNA sequences could result from inconsistent processing, the parental precursor RNAs have unstable in vivo structures, or they could be products of RNA degradation. genomics viewer (IGV), a typical feature of smRNAseq reads for miRNAs when the majority of reads mapped to a specific genome region have similar sequences. Instead, these vsmRNAs exhibited bell-shaped distributions, indicating that most of these reads have partially identical sequences and were unlikely generated in a precisely controlled manner, such as canonical miRNAs or siRNAs. The diversified vsmRNA sequences could result from inconsistent processing, the parental precursor RNAs have unstable in vivo structures, or they could be products of RNA degradation.   Table 1 for detailed information on infection and abbreviations of samples.
Next, to examine how vsmRNAs were processed and the potential function of vsmR-NAs, we compared vsmRNAs in the datasets of SARS-CoV-2-infected HEK293T-hACE2, BO-WT-CoV, and Calu3-EW, and chose three vsmRNAs that were present in all these datasets to look at their detailed sequences and test their function. Two of the vsmRNAs were derived from the N gene and one from the ORF10. We denoted them as vsmRNA-N1-5p, vsmRNA-N3-5p, and vsmRNA-ORF10-3p ( Figure S3A). They were chosen based on their high abundance, pattern of overlapped reads across different samples, and the sequences located in the arm region of a hairpin structure, which may favor Drosha/DGCR8 processing. The hairpin structure was predicted with LinearFold and mFold using the mature vsmRNA sequences, plus both upstream and downstream sequences ( Figure S3B). The vsmRNA-ORF10 is in a large hairpin structure that has been verified experimentally [36].
Analysis of the sequences of all three vsmRNAs revealed that they were processed at several specific sites. These processed sequences partially overlap, but unlike mature miRNA sequences that usually have one to three major forms and are mainly different at the 3 end, these vsmRNAs have more major isoforms and are different at both the 5 and 3 ends ( Figure S3C).
To examine whether these three vsmRNAs have silencing functions and have the potential to modulate viral infection, we used reporter assays to measure their silencing effects. To test whether vsmRNAs can target the −ssRNA genome or −sgmRNA, we generated reporters that carry the viral N gene sequence (+strand reporters) or its reverse complementary sequences (−strand reporters) in the 3 UTR of the Renilla luciferase gene (Rluc), then measured Rluc expression in the presence of either nucleus-localized (expressed from transfected plasmid DNA) or cytoplasm-localized (from transfected in vitro transcribed mRNA) N gene-3 UTR transcripts ( Figure S3D). We only observed mildly reduced expression in strand-specific reporters ( Figure S3E). To test whether the reduction was caused by antisense activity of reverse complementary sequences, or vsmRNAs derived from viral transcripts, or both, we constructed reporters carrying the short antisense or sense sequences of vsmRNA-5p and 3p for these three vsmRNAs. This time we only observed a very low reduction in reporter signal in the presence of viral transcripts, indicating that these viral transcripts may only have the antisense-mediated target repression activities. The low RNAi activities observed from these three vsmRNAs suggest that these viral transcripts may not be efficiently processed to yield high levels of vsmRNAs or that these vsmRNAs are not potent RNAi reagents ( Figure S3F).

The Biogenesis of vsmRNAs and vmiR-ORF7a
We looked into the datasets of SARS-CoV-2-infected mouse brain organoids that revealed the production of vsiRNAs with aviD processing [12]. Our reanalysis found only low-level expressions of vsiRNAs in all three SARS-CoV-2-infected samples, including infected wildtype brain organoids that express both endogenous Dicer and aviD (BO-WT-CoV), infected Dicer-null brain organoids that overexpressed full-length Dicer exogenously (BO-Dicer-CoV), and infected Dicer-null brain organoids that express aviD exogenously (BO-aviD-CoV). There were abundant vsmRNAs in these samples. There were 30 times more vsmRNAs in BO-WT-CoV than that in BO-aviD-CoV. The BO-WT-CoV had 1/3 of the total reads of the BO-aviD-CoV but had at least 10 times more vsmRNA reads. Furthermore, there was only a low level of SARS-CoV-2-derived vsmRNAs in BO-Dicer-CoV, similar to that in wildtype brain organoids infected by the Zika virus as a negative control (BO-WT-Zika) (Figures 4 and S4). These data support the intriguing idea that many virally derived RNA fragments are not favorable substrates of Dicer but can be efficiently processed by aviD to vsmRNAs, presumably because the endogenous aviD is much more efficient in processing than the exogenously expressed aviD or that the endogenous aviD is expressed at a much higher level than the exogenously expressed aviD. The Hel2i domain, which is deleted in aviD, is used by Dicer to bind TRBP, indicating a potential role of TRBP in the SARS-CoV-2 infection [37].
Among all SARS-CoV-2-derived vsmRNAs, vmiRNA-ORF7a has the best credibility because it was identified by smRNAseq in four cell lines and patient samples and was cloned independently by two groups [17,18]. We reanalyzed it in several datasets ( Figures 5 and S5).
Firstly, we found the flat-plateau read distribution of vmiR-ORF7a in datasets of both GC and JS but not others ( Figure 5A,B). Usually, bona fide miRNAs have precise 5 ends and their 3 ends usually have 1-3 isoforms that form a flat-shaped read distribution, and random degradation leads to products variable in length, and the read distribution usually forms a curved shape. To rule out RNA degradation in our samples, we compared several human miRNAs in our infected samples with datasets of A549-JS-24h and A549-GC-48h ( Figure S6). In both read distribution and read coverage, we observed our data and JS's data showed a perfect flat top shaped read distribution for eight human miRNAs, while GC's data may have some RNA degradation due to the extra step of size fraction for RNA ( Figure S6A-E).
Secondly, we found that the abundance of vmiR-ORF7a varied dramatically cell-bycell-line. It was most abundant in the A549-hACE2 cells and less abundant in the Caco2 cells. Moreover, both PC9 and Calu-3 cells, especially Calu-3 cells, a model cell line that has been extensively used for studying SARS-CoV-2 infection, had many fewer reads of miR-ORF7a, similar to their levels in SARS-CoV-2-infected HEK293T-hACE2 cells ( Figure S5A). Since SARS-CoV-2 can achieve robust infection in Calu-3, Caco2, HEK293T-hACE2, and A549-hACE2 cells [10,25], the differential abundance of vmiR-ORF7a in these cells suggests that the production of miR-ORF7a is not directly correlated with viral load in different cells.  Table 1

for detailed information on infection and abbreviations of samples.
Among all SARS-CoV-2-derived vsmRNAs, vmiRNA-ORF7a has the best credibility because it was identified by smRNAseq in four cell lines and patient samples and was cloned independently by two groups [17,18]. We reanalyzed it in several datasets ( Figures  5 and S5).  Table 1 for detailed information on infection and abbreviations of samples. Thirdly, we found that the abundance of vmiR-ORF7a was directly correlated with the MOI of infection within the same cell line, which suggests that the high level of vmiR-ORF7a was directly correlated with the high viral load at the initial infection stage in the same cell lines ( Figure 5A,C,D). It is worth noting that a high level of vmiR-ORF7a was already detected at 6 hpi in A549-hACE2 cells as shown in the A549-JS-6h dataset, similar to the level in HEK293T-hACE2 cells at 72 hpi, which indicates that vmiR-ORF7a can be produced in a large amount at a very early stage of infection, at least in A459-hACE2 cells, also supporting the idea that it may be processed from an early expressed, specifically transcribed sgmRNAs ( Figure 5B,C).
According to the read distribution pattern on other areas of the SARS-CoV-2 genome in datasets of GC and JS, we predict that there may be two more vmiRNAs encoded by ORF7a that are sequentially processed out from the same transcript as vmiR-ORF7a: vmiR-ORF7a-2 and vmiR-ORF7a-3. Both are downstream of vmiR-ORF7a and less abundant than vmiR-ORF7a. vmiR-ORF7a-2 was the fifth top cloned 22-nt read in the dataset of A549-GC-48h [17] (Figure 5E,F). However, in the predicted RNA structure, vmiR-ORF7a is located at the 5 arm of a hairpin structure, while vmiR-ORF7a-2 and vmiR-ORF7a-2 are not in a structure that would be a favorite Drosha/DGCR8 substrate ( Figure S7).  Table 1 for detailed information of infection and the abbreviations of the samples.

Discussion
How the host RNAi pathway responds to the SARS-CoV-2 infection is an important question that remains to be answered, because answers to this question may reveal additional molecular mechanisms underlying the SARS-CoV-2 infection, the COVID-19 development and progression, and post-acute sequelae of COVID-19. Furthermore, RNAi molecules have the potential to be developed into effective therapeutic tools for combating COVID-19 and future infectious pandemics. However, conflicting data have been produced using COVID-19 patient samples or cell lines, which have caused confusion in the field and cast shadows on the therapeutic potential of molecules identified from these findings. In the current study, we conducted SARS-CoV-2 infection in HEK293T-hACE2 cells and performed a comparative study of small RNA profiles from SARS-CoV-2-infected cell lines and patient samples.
Firstly, we found many small host RNAs, such as miRNAs, tRFs, and rRNAs, that were perturbed by SARS-CoV-2 infection. The increased levels of tRFs and rRNAs in SARS-CoV-2-infected cells suggest that SARS-CoV-2 infection may induce the cleavage of tRNAs and rRNAs. The increase in tRFs, rRNAs, and other small RNAs in relative amounts to the total small RNA population could contribute to the decrease in the relative amount of miRNA population in the total small RNA population.
Our data indicate that SARS-CoV-2 infection can affect the expression of specific miRNAs but it unlikely affects host miRNA expression globally, which needs to occur by perturbing the miRNA biogenesis machineries. In this sense, our smRNAseq data together with published smRNAseq data do not provide evidence to support the idea that SARS-CoV-2 encodes VSRs to affect host miRNA biogenesis, although it has been reported that Dicer, Ago2, and Drosha were downregulated in COVID-19 patients, while DGCR8 was upregulated [9,38]. It is unlikely that changes in these RNAi components have significant effects on miRNA biogenesis. It remains unknown whether SARS-CoV-2 encodes VSRs that can affect RNAi silencing function by binding to Argonaute proteins and associated factors to affect the activity of the RNA-induced silencing complex.
Our analyses indicate that the SARS-CoV-2 infection can induce changes in the expression of both hypoxia and immune response-related host miRNAs in a cell-or tissuedependent manner. Both miR-210-3p and miR-146 could contribute to the progression of COVID-19, thus, warranting further studies using patient samples.
The dysregulation of miR-210-3p expression in the SARS-CoV-2 infection was not consistent, both up and down regulation were observed. Whether the dysregulation of miR-210-3p expression in the infected cells and plasma of COVID-19 patients is a natural host response to hypoxia induced by the SARS-CoV-2 infection or whether it is correlated with the life-threatening hypoxemia symptoms in COVID-19 patients with severe cases remains to be investigated. It is possible that the expression of miR-210-3p is affected by the stage of the disease, the type of tissues, and/or the status of hypoxia conditions. miR-146 is the most consistently observed host miRNA that exhibits reduced expression in response to the SARS-CoV-2 infection. It was downregulated in cell lines and COVID-19 patient samples. Considering the well-characterized role of miR-146 in the anti-inflammatory and anti-immune responses, the reduced expression level of miR-146 in severe COVID-19 patients may explain, at least in part, the exacerbated immune response observed in these patients.
While there are multiple potential site-specific RNA-RNA interaction sites between the SARS-CoV-2 RNA and miR-30-5p, whether miR-30-5p can modulate SARS-CoV-2 infection remains to be validated experimentally. One thing that is clear is that miR-30 family members are expressed in many tissues and have a high expression level in the lungs, the primary tissue of the SARS-CoV-2 infection. Our data revealed that all the cell lines used for infection, including HEK293T-hACE2, A549-hACE2, Calu-3, Caco2, and PC9 cells, had high miR-30d-5p expression. Therefore, SARS-CoV-2 transcripts are likely to be exposed to miR-30-5p regulation in infected cells. The effectiveness of this regulation remains to be validated experimentally. It will be interesting to find out whether miR-210-3p and miR-30-5p miRNAs are also downregulated in COVID-19 patients because their downregulation could lead to reduced targeting of the SARS-CoV-2 genome, thus, diminishing their antiviral effect. In addition, a recent study showed that herpesviruses can downregulate multiple miR-30 family members to induce type I interferon response, which is necessary for the latency and reactivation of the virus [33]. Thus, the involvement of miR-30 family members in the host immune response to SARS-CoV-2 infection deserves further study in the future.
Secondly, we performed studies on SARS-CoV-2-derived vsmRNAs and found that many species of vsmRNA are derived from SARS-CoV-2 genomes or transcripts. We showed that the ORF3a to ORF10 region had the highest expression of vsmRNAs. The generation of these vsmRNAs is likely dependent on aviD expression. The existence of SARS-CoV-2-derived vsmRNAs suggests that the viral genome or transcripts can be subjected to Drosha/DGCR8 processing, which requires the redistribution of Drosha/DGCR8 into the cytoplasm or the transportation of the viral genome or transcripts into the nucleus, and implies that the host can use the RNAi machineries to destroy the viral RNA genome of SARS-CoV-2, supporting the idea that Drosha/DGCR8 can have antiviral effects on SARS-CoV-2 infection.
The discovery of aviD in mammalian cells has brought up many questions for future studies, including, for example, whether the induction or activation of aviD is specific to viral infection, how aviD affects the biogenesis of other small RNAs, and how the switch from Dicer to aviD processing is regulated through splicing. Answers to these questions may provide insights into virology in general and mechanisms and applications of RNAi in viral infections.
Thirdly, the studies of vmiR-ORF7a raise many unanswered questions regarding its biogenesis, tissue distribution, and biological function in SARS-CoV-2 infection. These ques-tions include, for example, whether pre-vmiR-ORF7a is processed out of the SARS-CoV-2 genome or sgmRNA or both, whether vmiR-ORF7a maturation is dependent on aviD or Dicer? Are different aviD expression levels in different cell lines or different infections correlated with different abundances of vmiR-ORF7a? Moreover, it will be interesting to know whether vmiR-ORF7a is present in COVID-19 patients across different tissues, at different disease stages, and how it affects the pathogenesis of COVID-19. If yes, whether it can be used as a therapeutic target.
It remains unclear why vmiR-ORF7a is only present at a high level in A549-hACE2 cells and Caco2 cells. One possible explanation is that vmiR-ORF7a is processed from specifically transcribed sgmRNAs in some cell types and the processing is independent of the abundance of the viral genome and mediated by Dicer directly. This explanation supports the idea that the biogenesis of vmiR-ORF7a is Drosha-independent but Dicerdependent [17,18]. Other possible explanations include that different cell lines may exhibit different degrading viral RNAs and that only some degraded viral RNAs can be processed by Dicer into vmiR-ORF7a. It is worth noting that the level of vmiR-ORF7a reads is very low in nasopharyngeal swabs of patients that tested positive for COVID-19 [17] and were not detectable in the second datasets of plasma. Since miRNA is much more stable than mRNA in general, it is possible that vmiR-ORF7a exists in the RNAemia of SARS-CoV-2 and contributes to the post-acute sequelae of COVID-19. More work is needed to define the expression and function of vmiR-ORF7a in SARS-CoV-2-infected cells and tissues, its role in the development of COVID-19, and its potential to be used as a diagnostic marker and/or a therapeutic target.

Materials and Methods
All methods were carried out in accordance with relevant guidelines and regulations.

Cell Lines
HEK293T (ATCC CRL-11268) cell line was obtained from the American Type Culture Collection (ATCC).
The HEK293T-hACE2 cell line was created by transiently transfecting HEK293T cells with an hACE2 expressing vector (Addgene #1786). Total RNA from SARS-CoV-2 that infected the HEK293T-hACE2 cell were prepared for smRNAseq.

SARS-CoV-2 Infection, Expansion, and Quantification of Viral RNA
SARS-CoV-2 virus expansion, preparation, and quantification of viral RNA genome were performed as previously described [25,39,40] and were performed in the BSL3 facility of UCLA with institutional biosafety approvals. The RNA samples for smRNAseq were the same samples as the SARS-CoV-2 infected HEK293T-hACE2 cells that were used for RNAseq [25]. Briefly, SARS-CoV-2, isolate USA-WA1/2020, was used at MOI of 1 (1 plaque forming unit per cell) to infect HEK293T-hACE2 cells for 72 h. Trizol reagent (Thermo Fisher, Waltham, Ma, USA) was used for total RNA isolation.

Plasmids and In Vitro Transcripts Were Used for Reporter Assay
All sequences of DNA fragments and primers are listed in Supplementary Table S1. siCheck reporters: The multiple clone site in siCheck2.0 (Promega) was modified to make siCheck2.4 to construct siCheck reporters. N sequence of SARS-CoV-2 was PCRcloned from the GFP reporter that carries the SARS-CoV-2 fragment-2, including the sequence of N [41]. The 3 UTR sequence was synthesized with IDT (integrated DNA technologies) as a gBlock fragment. N was first cloned to siCheck 2.4, then the gBlock fragment of 3 UTR was ligated to N to reconstitute the N-ORF10-3 UTR sequence in siCheck 2.4 to create siC-N. PCR was used to amply N-ORF10-3 UTR sequence in siC-N and reverse ligated into siCheck 2.4 to create siC-revN.
In vitro transcription (ivt): The 17-nt T7 promoter sequence was added to the 5 end PCR primers and amplified from siC-N to createt a template for ivt. AmpliScribe T7-Flash transcription kit (Epicentre) was used for ivt reaction according to the manufacturer's instruction. The reaction was performed using 1 µg template at 37 • C, for 2 h. The ivt control template in the kit was used to generate ivt control transcripts for reporter assays.
Luciferase reporter-based RNAi knockdown assays: For all siCheck-2.4-based reporters, SV40-driven Renilla luciferase (Rluc) was used as the primary reporter gene, with the sequence of interest cloned into the MCS in the 3 UTR region of Rluc. The HSK-TK promoter driven Firefly luciferase (Fluc) on the same vector was included as a control reporter. The expression of Rluc was normalized to the expression of Fluc (Rluc/Fluc). The percentage was calculated relative to the Rluc/Fluc ratio of the control plasmid or the ivt control transcripts.

Small RNA Deep Sequencing Dataset and Raw Data Processing
For HEK293T-hACE2 datasets: smRNAseq was carried out using a customized protocol [21,22]. Briefly, 100-1.0 µg of total RNA was used to construct smRNAseq libraries. smRNAseq was performed on a HiSeq 2500 (Illumina, San Diego, CA, USA). Single reads, flow cell cluster generation, and 51 cycles were used. The sequencing depth was 30 M reads per sample. To construct smRNAseq libraries, the 5 adaptor used in the Illumina Truseq smRNAseq protocol was replaced with a customized 5 adaptor that three random nucleotides at the 3 end of the 5 adaptors were added to in order to reduce sequencing bias [21,22]. The 3 adaptor provided in the kit was used for 3 end ligation and sample barcoding. Cutadapt [42] was used to remove the 5 and 3 adaptors from raw data files to generate unfiltered data that small RNAs of all lengths were kept. In order to compare with published datasets (GC and JS) that only contain small RNAs with a specific range of reading length, we also generated filtered 15-mer to 30-mer processed reads files for HEK293T-hACE2 datasets. For small RNA length filtered data, only reads with lengths between 15 nts and 30 nts were kept during the cutadapt processing step.
For Calu3-EW datasets from GSE148729: Raw data files were download from GEO [10]. Since there are no mock infection data available, infection at 4 h were used as control and infection at 12 h and 24 h were pooled together as infected samples.
For Mouse brain organoids datasets: All six sets of infection data of SARS-CoV-2 and Zika were downloaded from GEO (GSE173946) [12]. Due to the fact that there is no replicate for these datasets and there are mouse datasets, no miRNA analysis was performed on these datasets. These datasets were only used for viewing vsmRNA and datasets for Zika were used as control for SARS-CoV-2.
For datasets from GC: Raw sequence files of mock infection and infection (24 hpi and 48 hpi) for A549-GC and Caco2-GC from GSE162318 were downloaded from GEO.
For datasets from JS: Raw sequence files for mock infection and infection (6 hpi, 24 hpi, and 48 hpi) for A549-JS, Calu3-JS, and PC9-JS from GSE183280 were downloaded from GEO.

smRNAseq Data Analysis (Length Distribution, Reads Count, Annotation)
Most of the above raw smRNAseq read files were first processed with Cutadapt then analyzed with miRge 2.0 to generate files with reads length distribution and reads count [43]. HEK293T-hACE2 samples were also analyzed with miRge 3.0 to generate files with reads length distribution and reads count [44].

Small RNA qRT-PCR
A published S-Poly(T) small RNA qRT-PCR detection protocol was adapted for small RNA qRT-PCR. The protocol was modified for multiplex in the reverse transcription step [22,45]. Briefly, 100 ng total RNA was used. We first added poly(A) tail to the RNA using a poly(A) tailing kit from Epicentre (Madison, WI, USA). Since miR-10a-5p or miR-484 had similar abundance in the four HEK293T-hACE2 samples, we chose miRNA-10a-5p as an RNA sample loading control to normalize relative expression of other small RNAs, and miR-484 was used to validate similar miRNA-10a-5p expression levels across different samples. Primers for reverse transcription for all 10 small RNAs were pooled and were added to the same reverse transcription reaction. Primers used for reverse transcription of a specific small RNA were designed according to the most abundant isoforms of a given small RNA in smRNAseq data. All oligoes were ordered from IDT. Universal TaqMan probe and PCR mixtures were employed as published [45]. The ∆Cq (quantitation cycle) value of a small RNA relative to miR-10a-5p in each sample was calculated as Cq small RNA -Cq miRNA-10a-5p . The ∆∆Cq value of a small RNA represents the ∆Cq of small RNAs in SARS-CoV-2 infected samples-∆Cq of small RNAs in mock infected samples. The value of 2 −∆∆Cq was calculated as the relative fold change of a small RNA in infected samples versus that in mock infected samples. All primers for small RNA qRT-PCR are listed in Supplementary Table S1.

SARS-CoV-2 ChIRP Data Analysis
All six datasets ChIRP data for Vero E6 and Huh7 cells [29] were downloaded from GEO (GSE167341). Subreads aligner (version 2.1) [46] was used to align reads to SARS-CoV-2 genome to generate BAM files. IGV (version 2.12.2) tools [47] was used to sort and index BAM files for IGV track loading and viewing. After comparing all six datasets, data from replicate #1 of Vero E6 cells was used in all related figures.

DESeq2 Analysis of Differtially Expressed Small RNAs
Raw (unnormalized) miRNA read counts generated by miRge2 or 3 were used as input for the Bioconductor package DESeq2 (version 1.30.0) in R (version 4.1/4.2) environment to produce the differentially expressed small RNA results (infected versus mock) at the default setting. The p-value for differential gene expression was assessed with the Wald-test method. The Benjamini and Hochberg approach was used to obtain the adjusted p-value (p-adj) [48].

IGV View of vsmRNAs
Cutadapt processed smRNAseq data files were first aligned to SARS-CoV-2 reference (NC-045512.2) using Bowtie (version 1.3) to generated SAM files. Samtools was used to convert SAM files to BAM files, sort BAM files, and generate BAI files [47,49]. Bowtie2 (v2.4.5) was used to align reads to hg19 genome to view the human miRNA read distribution by IGV, as shown in Figure S4.

Other Bioinoformatics Tools Used for Data Analysis and Plots
R (version 4.0 to 4.2), R studio (version 2022.07.1), and many R or Bioconductor software (version 3.11 to 3.15) packages were used for data analysis and visualization. R package of EnhancedVolcano (https://github.com/kevinblighe/EnhancedVolcano, accessed on 5 October 2022) was used to generate the volcano plots.

Statistics
Inferential statistics of differential small RNA expression in virally infected samples versus mock-infected samples were calculated with the default statistical settings in the DESeq2 as stated in the DESeq2 analysis section. Descriptive statistics of two different datasets were calculated with a two-way ANOVA followed by a post-hoc test (multiple unpaired t-tests). Unless specified in the figure legends, p-adj < 0.05 for DESeq2 data or p-value < 0.05 for others, was used as the cutoff for significance in gene expression. The error bar is expressed as the standard deviation.  Data Availability Statement: All raw and some processed small RNA sequencing data generated in this study for HEK293T-hACE2 cells were deposited to the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/, accessed on 5 October 2022) under the accession num-ber GSE189920. Reanalyzed public data were provided in Supplementary Files. Materials and Correspondence. Correspondence and requests for materials should be addressed to G.S. or Y.S.