Viruses in Laboratory Drosophila and Their Impact on Host Gene Expression

Drosophila melanogaster has one of the best characterized antiviral immune responses among invertebrates. However, relatively few easily transmitted natural virus isolates are available, and so many Drosophila experiments have been performed using artificial infection routes and artificial host–virus combinations. These may not reflect natural infections, especially for subtle phenotypes such as gene expression. Here, to explore the laboratory virus community and to better understand how natural virus infections induce changes in gene expression, we have analysed seven publicly available D. melanogaster transcriptomic sequencing datasets that were originally sequenced for projects unrelated to virus infection. We have found ten known viruses—including five that have not been experimentally isolated—but no previously unknown viruses. Our analysis of host gene expression revealed that numerous genes were differentially expressed in flies that were naturally infected with a virus. For example, flies infected with nora virus showed patterns of gene expression consistent with intestinal vacuolization and possible host repair via the upd3 JAK/STAT pathway. We also found marked sex differences in virus-induced differential gene expression. Our results show that natural virus infection in laboratory Drosophila does indeed induce detectable changes in gene expression, suggesting that this may form an important background condition for experimental studies in the laboratory.

Virus infection may be associated with dramatic changes in gene expression.First, infections trigger host signalling cascades that eventually alter the expression of host immune effector molecules.For example, wild-type D. melanogaster infected with Drosophila C Virus (DCV) displays increased expression of genes such Spaetzle (Spz) and Drosomycin (Drs) that encode a cytokine and an antimicrobial peptide, respectively, and which are important in the toll pathway [9].There is also increased expression of the immune-associated genes, CG12780, vir-1, and listericin (CG9080) [10].Second, the expression of non-immune genes changes as a consequence of infection, either through viral manipulation of the host or through the consequences of disease.For example, female flies infected with Drosophila Kallithea Nudivirus display decreased expression of chorion proteins as they cease laying eggs [11].Much of what is known about the Drosophila expression response to viral infection has come from transcriptome profiling studies e.g., [11][12][13][14].While a minority of studies use natural infection routes, most rely on systemic viral infections established through microinjection (or septic puncture) into the body cavity.However, the infection route of pathogens can substantially affect the outcome of infection, and can trigger different antiviral responses [15][16][17].This may in part be because injection bypasses immune responses at the natural site of pathogen entry, such as the cuticle, trachea, gut, and reproductive organs.For example, the Toll pathway appears necessary for resistance to oral DCV infection via activation of the transcription factor Dorsal, whereas it is not apparently required in resistance to systemic DCV infection [18].The route of infection is also important in shaping viral tropism, host response, and pathology.For example, DCV infection is almost 100% lethal when injected but causes only 10-25% mortality when flies are infected orally-even with very high viral titres [18,19].Remarkably, oral DCV infections at the larval stage are reported to protect flies during reinfection by injection as adults [16].In contrast, it has been reported that flies injected with a sublethal dose of DCV are not protected against subsequent DCV injection, indicating the importance of oral infection in priming the immune response [20].
A relative lack of natural virus isolates may also limit studies of host-virus interaction using the Drosophila model.Such interactions can be highly host-specific [21]; for example, it has been suggested that the Virus protein 1 (VP1) of Drosophila immigrans nora virus is unable to suppress the antiviral RNAi pathway in D. melanogaster, whereas it can suppress it in D. immigrans [22].However, many studies in Drosophila have used non-native viruses such as Flock House Virus (a beetle virus, e.g., [2]), Sindbis virus (a mosquito alphavirus, e.g., [23]), Invertebrate Iridescent Virus 6 (a moth iridovirus, e.g., [24]), and Cricket Paralysis Virus (a moth dicistrovirus, e.g., [25]).
Some of these viruses are known to occur in laboratory flies and cell culture.The most common viruses reported from laboratory fly stocks include DCV, nora virus, DAV, Newfield virus, DMelSV, and Thika virus [29].In addition, Drosophila X virus, Drosophila Totivirus, Newfield virus, American Nodavirus, Bloomfield virus, nora virus, DCV, and DAV have all been found laboratory cell cultures [29,36].The widespread occurrence of such viruses in experimental stocks raises the question of whether changes in gene expression induced by these viruses can impact laboratory experiments.For example, viruses can affect fecundity and shorten development time and lifespan of Drosophila [11, [37][38][39] and can change fruit fly behaviour and mobility [11, [39][40][41], which may negatively impact the interpretability of life history and developmental studies.
Here, we survey seven publicly available adult transcriptome project datasets from laboratory D. melanogaster to quantify the prevalence of viruses in experimental studies, and to assess the impact of viruses on patterns of host gene expression.As these infections were incidental and unintended by the original authors, they reflect natural laboratory infection routes and host-virus combinations.The resulting changes in gene expression are suggestive of a reduction in host reproductive investment, and nora-virus-induced gut pathology and host repair.These results imply that background changes in gene expression due to viral infection may be relevant to laboratory experiments.

Data Sources
We initially selected nine D. melanogaster RNAseq 'projects' that each comprised at least 130 sequencing libraries, and downloaded them from the European Nucleotide Archive [42], Table 1).The selected datasets reflected a range of original experimental purposes.For example, an exploration of natural genetic variation in expression regulation in the Drosophila Genetic Reference Panel (DGRP) [43,44], the relationship between genotype and circadian gene expression [45], the utility of different bioinformatics pipelines [44], the impact of lead exposure on expression across development [46], and the role of selection in sex-biased gene expression [47].Two projects lacked any evidence of viral infection (PRJNA527284 and PRJNA52728), and these were excluded.Two projects comprised a mix of developmental stages and/or cell culture (PRJNA305983 and PRJNA75285), and from these, we retained only adult data.

Virus Detection and Quantification of Host Expression
To identify unknown and potentially novel viral infections, we de novo assembled non-fly reads.We first excluded any reads mapping to D. melanogaster or to well-studied contaminating cellular organisms such as bacteria, fungi, trypanosomatids, and microsporidia (a 'hologenome' as reported in [58]) using Bowtie 2 [59].We then assembled the remaining read pairs using Trinity [60], and retained all scaffolds with a length of at least 500 nt.We grouped the resulting scaffolds into clusters meeting at least a 95% sequence identity threshold using CD-HIT [61].Cluster representatives were then used to search against a custom database using Diamond blastx [62], retaining clusters with a score at most 5% lower than the best alignment score for each query.This custom target database comprised all viral proteins from the Genbank non-redundant (nr) protein database [63] and all the prokaryote, protist, fungal, nematode, hymenopteran, and dipteran sequences from NCBI refseq protein database [64].Contigs from each virus were then assembled together using Geneious Prime.These manually curated viral references were then used as targets for viral quantification by mapping using Bowtie 2 [59].To mitigate the potential impact of barcode-switching [65], each virus was considered to be present in a library if the number of reads from that virus was at least 1% of the number in the library with the highest number of reads from that virus.We additionally applied a minimum threshold of 150 reads, chosen as this threshold reduced the inconsistency between duplicate samples (Figure S1).For comparison, we also estimated virus presence/absence in each library by estimating the rate of barcode switching based on sex-specific genes, but this approach gave similar results (Document S1).

Association between Coinfecting Viruses in Each Dataset
To test if there is an interaction between co-infecting viruses, we generated a contingency table of the total number of infected and uninfected libraries for each pair of viruses in each dataset.We then tested for statistical significance of association between the viruses using Fisher's exact test in the R stats package [69].

Virus Prevalence and Diversity
We quantified virus prevalence as the proportion of sequencing libraries within each published project for which the virus read number passed our presence/absence threshold (selected based on the inferred barcode switching rate, above).We then assumed binomial sampling to obtain confidence intervals for the proportion of affected libraries in each project.To explore the phylogenetic relationships between laboratory viruses and previously published virus sequences, we selected the sample within each project that had the highest virus read number for each virus.For galbut virus, which has two distinct strains, we selected the sample with the highest read number for each haplotype.We then performed phylogenetic analyses using the sequences from these samples and incorporating all the other sequences from each virus available in GenBank [63].For analysis of segmented viruses, we selected the segment that had the most examples in genbank.The number of sequences available varied between 3 (Brandeis virus) and 144 (DMelSV), and the aligned sequence length varied between 1.4 kb (galbut virus) and 11.3 kb (nora virus).DNA sequences were aligned using MAFFT [70], and we inferred maximum likelihood trees using iqtree version 2 [71], using the best supported substitution model according to the Bayesian information criterion.Note that recombination is common in many of these viruses [29], such that trees should be interpreted in terms of overall similarity rather than relationship, and that branch lengths will not be proportional to divergence time.All alignments, model details, and trees are presented in (Zip File S2).

Differential Host Gene Expression
We performed a differential gene expression analyses using 'Dream' [72], an R package [69] for gene expression analysis in R that permits the use of mixed effect models.Each virus was analysed separately by combining all the project datasets in which it was detected, treating the presence/absence of each virus in each library as a fixed effect in the differential expression model.Host sex was also fitted as a fixed effect, as was its interaction with the presence/absence of the virus.The genotype and dataset project codes were combined and fitted as a random effect to account for differences between genetic background and laboratory environment.If different treatment methods such as lead exposure, mating statuses, or tissues were reported, these were also combined and included in the random effect.If both sexes were present, uninfected females were treated as the reference condition for comparisons.For example, if two viruses were present in a study, and both sexes were sampled from multiple genotypes and tissues, our analysis model would be: If the virus was present in only one sex in a mixed-sex project dataset, or the infected project datasets had only one sex, then the sex term and its interaction with virus were excluded.If a virus was present in only one dataset, then the dataset term was excluded from the model, and if the dataset had only one genotype, then the genotype term was also removed accordingly.Fly age was not incorporated into the models because these datasets are from published studies that do not report fly age.Cut-off log-fold changes, logFC > 0.5 and logFC < −0.5, were used to define increased and decreased expression, respectively, and a Benjamini-Hochberg adjusted p-value < 0.001 (to account for multiple testing) was used to indicate statistical significance.The genes with significant changes in expression from the differential expression analysis were subsequently analysed for gene ontology enrichment using VISEAGO [73], which depends on TOPGO [74].

Correlation Analysis
To compare the effects on the host by viral infection post hoc, we calculated a Pearson correlation matrix between the estimated changes in gene expression in response to viruses and their interactions with sex using the rcorr function in the Hmisc package [75] in R [69].

Ten RNA Viruses Were Detectable in Laboratory RNAseq Datasets
We examined a total of nine publicly available RNAseq projects, but two of these contained no viral reads and were excluded from further analyses (Table 1).One project (PRJNA258012) contained reads from Flock House virus, Newfield virus, Drosophila Totivirus, and Drosophila X Virus (DXV).All of these viruses are common cell culture contaminants [29], and three have only previously been reported from cell culture.In addition, read numbers were very highly correlated among these viruses across sequencing libraries (r ≥ 0.96, p-value = 2.2 × 10 −16 ), consistent with a common source (Figure S2).We therefore chose to exclude these viruses from further analysis, as we believe they are likely to represent RNA contamination or barcode hopping from unreported cell culture libraries, rather than infections of the sequenced flies.Neither of the remaining two viruses seen in this project have been reported from cell culture.After excluding these putatively contaminating reads, we detected a total of 10 viruses with a high level of confidence, i.e., at high copy number in multiple libraries.These included galbut virus (Partitiviridae), Motts Mill virus (Solemoviridae), DAV (Permutotetraviridae), DCV (Dicistroviridae), nora virus (unclassified Picornovirales), Dmel sigma virus (Rhabdoviridae), Bloomfield virus (Reoviridae), Craigie's Hill virus (Nodaviridae), Thika virus (unclassified Picornavirales), and Brandeis virus (cf.Negevirus).Out of a total of 2586 libraries across the seven datasets analysed, 625 (24%) were infected with only one virus and 104 (4%) were infected with two or more viruses.Using a p-value < 0.01 as a test of significance, we found that co-infecting viruses were independent of each other in the datasets used in this study, except for galbut virus and Motts Mill virus in PRJNA258012 (p-value = 3.4 × 10 −5 ) and PRJNA281652 (p-value = 3.2 × 10 −5 ), DAV and Thika virus in PRJNA261333 (p-value = 5 × 10 −3 ), DAV and DCV in PRJNA281652 (p-value = 5 × 10 −4 ), and DAV and nora virus in PRJNA518903 (p-value = 4 × 10 −4 ).However, the number of coinfections was small; for example, in PRJNA258012, 49 (6.8%), and 171 (23.9%) libraries were infected with Motts Mill virus and galbut virus, respectively whilst one (0.1%) library was infected with both viruses.Similarly, in PRJNA281652, 45 (9.8%) and 79 (17.2%) libraires were infected with DCV and DAV, respectively whilst 17 (3.7%)libraries were infected with both viruses.In PRJNA518903, 10 (1.3%) and 31 (4%) libraires were infected with DAV and nora virus, respectively whilst four (0.5%) libraries were infected with both viruses.Finally, in PRJNA261333, 24 (6.1%) and 39 (9.8%) libraires were infected with DAV and Thika virus, respectively whilst seven (1.8%) libraries were infected with both viruses.Thus, the linear model should successfully disentangle the effects of the viruses, and any (unmodelled) interaction between viruses would only add to the residual variance.Some samples displayed an extremely high virus titre (Zip File S1).Most extreme was library SRR3654766 (heads from lead-treated males), in which ca. 70% of non-ribosomal RNA reads derived from DCV.Additionally, at an occasionally very high level were nora virus and DAV, which constituted 46% of non-ribosomal reads in SRR8522465 and 25% in SRR8522473, respectively (adult guts from DGRP lines 309 and 359).The other viruses generally had a much lower copy-number, with Motts Mill virus reaching a maximum of 4.8% of non-ribosomal RNA reads (SRR7620105; adult males from DGRP line 93), Brandeis virus 3.8% (SRR3654657; heads from adult males), Thika virus 2.3% (SRR1577470; adult male Df(2L)ED247/+ flies), and Bloomfield virus 2.4% (SRR8522432; adult guts from DGRP line 380).
Of these 10 viruses, nora virus had the highest prevalence, detectable in five projects, with a prevalence of up to 53% (PRJNA261333; Figure 1).In PRJNA518903, 100% of the noravirus-infected libraries were gut tissue-consistent with its known gut tropism [76].DAV had the second highest prevalence, occurring in four projects with a prevalence of up to 17% (PRJNA281652).The rarest viruses were Craigie's Hill virus, Bloomfield virus, DMelSV and Brandeis virus, each appearing in one project at prevalences as low as 0.3% (Bloomfield virus in PRJNA518903; Figure 1).The two viruses that are thought to be exclusively vertically transmitted (DMelSV and galbut virus; [35,77] were only found in DGRP lines, consistent with those lines' relatively recent wild origin.And, consistent with previous studies of galbut virus [29], we detected two distinct strains with pairwise sequence identity of 85% in DGRP projects PRJNA258012 and PRJNA483441.It is noteworthy that no laboratory isolates have been published for Thika virus, Motts Mill virus, Brandeis virus, Craigie's Hill virus, and Bloomfield virus, potentially allowing us to analyse Drosophila expression in response to these pathogens for the first time.

Laboratory Viruses Are Closely Related to Each Other
Our phylogenetic analyses showed that although laboratory viruses were very rarely identical across projects, viruses from different projects sometimes clustered together with each other and previously published laboratory isolates.This was most notable for DAV, DCV, and Nora virus (Figure S3A-C), and may suggest that there are clades of these viruses circulating in the laboratory environment.The same may be true for Motts Mill virus, as the four sequences formed two near-identical pairs, but previous laboratory isolates have not been reported (Document S2).Prevalence is given as a percentage of libraries in which the virus was detected, plotted on a log scaled axis for clarity, with 95% confidence intervals assuming binomial sampling.

Laboratory Viruses Are Closely Related to Each Other
Our phylogenetic analyses showed that although laboratory viruses were very rarely identical across projects, viruses from different projects sometimes clustered together with each other and previously published laboratory isolates.This was most notable for DAV, DCV, and Nora virus (Figure S3A-C), and may suggest that there are clades of these viruses circulating in the laboratory environment.The same may be true for Motts Mill virus, as the four sequences formed two near-identical pairs, but previous laboratory isolates have not been reported (Document S2).Prevalence is given as a percentage of libraries in which the virus was detected, plotted on a log scaled axis for clarity, with 95% confidence intervals assuming binomial sampling.

Many Laboratory Virus Infections Did Not Induce Detectable Changes in Gene Expression
We did not detect any significant changes in gene expression in response to DMelSV (Spreadsheet S1; Figure S4), and most of the expression changes in response to Bloomfield virus (Figure S4), galbut virus, and DCV were similarly not significant (Figure 2).Tret1-2, which encodes a sugar transporter [78], was the only gene with a significant change in expression in response to DCV, and dpr6, CG4676, and Apl were the only genes with significant increase in response to galbut virus infection.Defective proboscis extension response 6 (dpr6) is involved in synapse organisation [79] whilst CG4676 and Apl are involved in protein transport and localisation [80,81].In Bloomfield-virus-infected flies, only three genes, Ets21C, CG16995, and CG18649, were significantly upregulated (Figure S4).Ets at 21C is involved in response to stress such as infection or oncogene activation, CG16995 is involved in sexual reproduction, while the function of CG18649 is still unknown.
We did not detect any significant changes in gene expression in response to DMelSV (Spreadsheet S1; Figure S4), and most of the expression changes in response to Bloomfield virus (Figure S4), galbut virus, and DCV were similarly not significant (Figure 2).Tret1-2, which encodes a sugar transporter [78], was the only gene with a significant change in expression in response to DCV, and dpr6, CG4676, and Apl were the only genes with significant increase in response to galbut virus infection.Defective proboscis extension response 6 (dpr6) is involved in synapse organisation [79] whilst CG4676 and Apl are involved in protein transport and localisation [80,81].In Bloomfield-virus-infected flies, only three genes, Ets21C, CG16995, and CG18649, were significantly upregulated (Figure S4).Ets at 21C is involved in response to stress such as infection or oncogene activation, CG16995 is involved in sexual reproduction, while the function of CG18649 is still unknown.B-I).The virus names and total number of variables (genes) used in each expression analysis is shown.A significance threshold of p < 0.001 was used and a logFC cutoff of 0.5.The genes in red are both nominally significant and have a 0.5 < logFC < −0.5.

Motts Mill Virus and Craigies Hill Virus Affected the Expression of Genes Related to Development
Of the 454 genes that displayed a logFC of more than 0.5 in response to Craigies Hill virus infection, only 4 were nominally significant (adjusted p-value threshold of 0.001; Figure 2).The significantly upregulated genes in response to Craigies Hill virus infection included esc and wus, which are involved in development [82,83], and CG31704 that is involved in sexual reproduction [84].Microtubule-associated protein 205 (Map-205) that is involved in mitosis [85], was the only significantly downregulated gene in response to Craigies Hill virus infection.
Of the 1865 genes that showed a logFC of more than 0.5 in response to Motts Mill virus infection, only 7 were nominally significant (adjusted p-value threshold of 0.001; Figure 2).The upregulated genes in response to Motts Mill virus infection include Lsp1alpha and Fbp1 that encode macromolecular components [86,87], SdicC that is involved in microtubule transport, and tipE that enhances para sodium ion channel function and is required during pupal development to rescue adult paralysis [88].Of the 3275 genes that showed displayed a logFC of less than 0.5 in response to Motts Mill virus infection, only 2 were nominally significant (adjusted p-value threshold of 0.001; Figure 2).These two downregulated genes were lncRNA:CR45631 and CG11263, which encode a long, non-coding RNA and an RNA binding protein.

Brandeis Virus and Thika Virus Affected the Expression of Genes Related to Reproduction and Immunity
Out of 1082 genes that showed a logFC of more than 0.5 in response to Brandeis virus, 14 were nominally significant (adjusted p-value threshold of 0.001; Figure 2).These included CG31704 and CG3349, which are involved in male sexual reproduction [84,89].Other significantly upregulated genes in response to Brandeis virus include SPH93 and Ir56b, which take part in host antimicrobial defence [90] and the response to chemical stimuli, respectively [91].There were no significantly downregulated genes in response to Brandeis virus.
Out of 240 genes that showed a logFC of more than 0.5 in response to Thika virus infection, 3 were nominally significant (p-value threshold of 0.001; Figure 2).These were CG12970, which is involved in the STING antiviral response [3] and Drsl2, which encodes a peptide with homology to the antifungal peptide encoded by Drs [92].Another gene, Sox21a, which is involved in stem cell differentiation in the midgut, was also significantly upregulated in response to Thika virus infection [93].Out of the 394 genes that displayed a logFC of more than −0.5 (adjusted p-value threshold of 0.001; Figure 2) in response to Thika virus infection, 5 were nominally significant.Most of the significantly downregulated genes in response to Thika virus infection are involved in sexual reproduction.These included genes encoding chorion proteins such as CG4066, CG12716, and Mur11Da [94], and Abd-B, which is involved in regulating post-mating responses in females [95].

The Enteric Viruses Nora Virus and DAV May Trigger Host Innate Immune Response and Gut Epithelium Repair
Of the 340 genes that showed a logFC of more than 0.5 in response to nora virus infection, 330 were significant (adjusted p-value threshold of 0.001; Figure 2).Many of the significantly upregulated genes in response to nora virus infection are involved in midgut stem cell differentiation.These include Ser12, which is predicted to encode a protein implicated in wound healing [96], and Ptx1, Sox21a, and GATAe, which encode proteins implicated in differentiation in the midgut stem cell [93,97,98].Unpaired3 (upd3), which is involved in gut epithelium repair via the JAK/STAT pathway was also significantly upregulated in response to nora virus infection.Numerous immune response genes were significantly upregulated in response to nora virus infection.These include Nazo, which encodes an antiviral effector protein that is expressed downstream of Sting and relish antiviral response [99], DptA and DptB, which both encode antimicrobial peptides regulated by the ImD pathway [100,101], and Mtk, which encodes an antifungal peptide regulated by the ImD and Toll pathways [102].
Of the 314 genes that showed a logFC of less than −0.5 in response to nora virus infection, 313 were significant (adjusted p-value threshold of 0.001; Figure 2).These significantly downregulated genes include numerous genes that encode structural constituent of the chitin-based larval cuticle such as Lcp65Ac, TwdlR, and TwdlS [103][104][105].There was also a downregulation of genes involved in mitochondrial function such as Mics1 and fzo, which encode proteins that enhance oxidative phosphorylation and enable the fusion of the mitochondrion during spermatid differentiation, respectively [106,107].
Of the 43 genes that showed a logFC of more than 0.5 in response to DAV, 12 were significant (adjusted p-value threshold of 0.001; Figure 2).Antimicrobial genes such as Srg1, CG6429, and TotM were significantly upregulated in response to DAV infection.Srg1 is involved in STING antiviral signalling, while CG6429 and TotM are predicted to be involved in immune response [108,109].Of the 397 genes that showed a logFC decrease of more than −0.5 in response to DAV infection, 25 were significant (adjusted p-value threshold of 0.001; Figure 2).Most of the significantly downregulated genes in response to DAV were long, non-coding RNA such as lncRNA:CR45910 and lncRNA:CR44953, with unknown function.In response to DAV, many genes predicted to be involved in fatty acid-CoA metabolism such as CG31091 [110], CG6300 [111], and Traf-like [112] were significantly downregulated.We also observed a significant downregulation of Peritrophin-15a and E(spl)malpha-BFM, which are involved in chitin binding and development [113] and cell fate specification and sensory organ development via Notch signaling [114].

Sex-Virus Interaction
The interaction between virus and sex could only be inferred for viruses that were present in both sexes, namely galbut virus, Motts Mill virus, nora virus, DCV, DAV and Thika virus (Spreadsheet S2; Figure 3).The effect of sex on the host response to DCV was not significant for any genes in our analysis.However, in the flies infected with DAV, Thika virus, Motts Mill virus, galbut virus, and nora virus, the effect of sex on virus infection induced significant changes in the expression of 1, 4, 83, 95, and 1504 genes, respectively (adjusted p-value threshold of 0.001; Figure 3).Only MFS9, which is predicted to be involved in transmembrane anion transport [115], was significantly downregulated in males infected with DAV.Four genes, Mur11Da, CG4066, CG31661, and CG6508, were significantly upregulated in Thika-virus-infected males.Mucin related 11Da, CG4066, and CG31661 are predicted to be involved in chorion eggshell assembly [94] and CG6508 is predicted to enable endopeptidase activity [116].
In galbut-virus-infected males, there was a significant upregulation of numerous genes involved in cuticle development such as Lcp4 and Cpr65Ec [104] and immune response such as AttC, which shows homology to several antimicrobial peptides [117].In galbutvirus-infected males, there was also a significant downregulation of male reproductive genes such as Sfp24Ba and Acp76A [84], metal ion transport genes such as dpr3 [118], and several long, non-coding RNA genes (Figure 3).In Motts-Mill-virus-infected males, there was a significant upregulation of genes involved in the JAK/STAT pathway, such as GILT2 and GILT3 [119]; and the Imd pathway, such as Def and IKKbeta ( [120,121]; Figure 3).There was also a significant upregulation in the expression of tipE, which encodes a protein that enhances sodium ion channel function [88] and a significant downregulation of genes implicated in development such as otk2 and lov in Motts-Mill-virus-infected males [122,123].
In nora-virus-infected males, there was a significant upregulation of genes such as Mur82C and Cpr56F, which are predicted to encode structural components of extracellular matrix ( [124,125]; Figure 3).There was also a significant upregulation of genes involved in systemic immune response such as such as CecA1 and Def, which encode an antimicrobial peptide regulated by the ImD and Toll pathways [120] and male reproductive genes such as Sfp84E and Sfp33A2 [84,126].Nora-virus-infected males displayed a significant downregulation of immune genes such as Ser12, which is involved in wound healing [127] and Send2, which encodes a protein stored in the seminal receptacle and which is predicted to be involved in proteolysis [128].The genes, narya and ImpE2, which are involved in DNA repair and embryogenesis, respectively, were also significantly downregulated in males infected with Motts Mill virus [129,130].
bial peptide regulated by the ImD and Toll pathways [120] and male reproductive genes such as Sfp84E and Sfp33A2 [84,126].Nora-virus-infected males displayed a significant downregulation of immune genes such as Ser12, which is involved in wound healing [127] and Send2, which encodes a protein stored in the seminal receptacle and which is predicted to be involved in proteolysis [128].The genes, narya and ImpE2, which are involved in DNA repair and embryogenesis, respectively, were also significantly downregulated in males infected with Motts Mill virus [129,130].

Figure 3.
Sex-virus interaction effects on host gene expression.Using uninfected females as baseline, the volcano plots show the sex interaction with viral infection (A-F), that is, the difference in expression between virus-infected males and the expression that would be predicted from the combined effects of sex and infection alone.The virus names and total number of variables (genes) used in each expression analysis are shown.A significance threshold of p < 0.001 was used as well as a logFC cutoff of 0.5.The genes in red are both significant and have a 0.5 < logFC < −0.5.

Changes in Gene Expression Were Positively Correlated between Flies Infected with Galbut Virus, Nora Virus, and Motts Mill Virus
To determine whether changes in gene expression in response to virus challenge were consistent across viruses, we compared the inferred changes between each of the  Sex-virus interaction effects on host gene expression.Using uninfected females as baseline, the volcano plots show the sex interaction with viral infection (A-F), that is, the difference in expression between virus-infected males and the expression that would be predicted from the combined effects of sex and infection alone.The virus names and total number of variables (genes) used in each expression analysis are shown.A significance threshold of p < 0.001 was used as well as a logFC cutoff of 0.5.The genes in red are both significant and have a 0.5 < logFC < −0.5.

Changes in Gene Expression Were Positively Correlated between Flies Infected with Galbut Virus, Nora Virus, and Motts Mill Virus
To determine whether changes in gene expression in response to virus challenge were consistent across viruses, we compared the inferred changes between each of the viruses (Figure 4).The magnitude of such correlations was generally small, with the exception of the expression changes induced by galbut virus, DAV, and Motts Mill virus.Changes in gene expression were significantly positively correlated (r = 0.53, p-value < 0.001) between the DAV, galbut virus, and Motts Mill infection, although none were significant in all three (Figure 4).Changes in gene expression were significantly positively correlated (r = 0.53, p-value < 0.001) between the galbut:sex and Motts Mill virus:sex.That is, the effect of sex on virus infection induced similar changes in the expression of 209 genes in galbutvirus-and Motts-Mill-virus-infected flies, with four genes being nominally significant in both infections.These genes include three long, non-coding RNA, lncRNA:CR40465, lncRNA:CR44631, lncRNA:CR44289, and Yp2, which encodes the major yolk protein of eggs.
ception of the expression changes induced by galbut virus, DAV, and Motts Mill vi Changes in gene expression were significantly positively correlated (r = 0.53, p-val 0.001) between the DAV, galbut virus, and Motts Mill infection, although none were nificant in all three (Figure 4).Changes in gene expression were significantly positi correlated (r = 0.53, p-value < 0.001) between the galbut:sex and Motts Mill virus:sex.T is, the effect of sex on virus infection induced similar changes in the expression of genes in galbut-virus-and Motts-Mill-virus-infected flies, with four genes being no nally significant in both infections.These genes include three long, non-coding R lncRNA:CR40465, lncRNA:CR44631, lncRNA:CR44289, and Yp2, which encodes the m yolk protein of eggs.

Significantly Enriched GO Terms
There were no significantly enriched GO terms among the genes that changed pression in response to DmelSV, Bloomfield virus and DCV.Six GO terms related to proliferation were significantly enriched in Craigies-Hill-virus-infected flies (Spreads S3; Figure 5).In general, the enriched GO terms were consistent with the genes that w differentially expressed in our analysis.For example, the significantly enriched term response to Thika virus and DAV infection include development and fatty acid met lism, respectively (Figure 5).Similar consistencies between the significantly enriched terms and expression analysis were observed on the effect of sex on virus infec (Spreadsheet S4; Figure 6).For example, the effect of sex on Motts Mill virus and ga virus infection significantly enriched GO terms such as ion transport, developmental cess, and cuticle development (Figure 6).Similarly, the effect of sex on nora virus infec enriched GOP terms related to meiotic cell cycle and response to stimulus (Figure 6).

Significantly Enriched GO Terms
There were no significantly enriched GO terms among the genes that changed expression in response to DmelSV, Bloomfield virus and DCV.Six GO terms related to cell proliferation were significantly enriched in Craigies-Hill-virus-infected flies (Spreadsheet S3; Figure 5).In general, the enriched GO terms were consistent with the genes that were differentially expressed in our analysis.For example, the significantly enriched terms in response to Thika virus and DAV infection include development and fatty acid metabolism, respectively (Figure 5).Similar consistencies between the significantly enriched GO terms and expression analysis were observed on the effect of sex on virus infection (Spreadsheet S4; Figure 6).For example, the effect of sex on Motts Mill virus and galbut virus infection significantly enriched GO terms such as ion transport, developmental process, and cuticle development (Figure 6).Similarly, the effect of sex on nora virus infection enriched GOP terms related to meiotic cell cycle and response to stimulus (Figure 6).

Discussion
Most studies of how Drosophila gene expression changes in response to virus infection either use non-native viruses or inject natural viruses into the host-which may not be a good reflection of what happens in nature.To understand how Drosophila gene expression changes in response to native virus infection, we analysed seven publicly available RNA-seq datasets that were serendipitously infected with ten different viruses.

Prevalence
The viruses found in this study were galbut virus, Motts Mill virus, DAV, DCV, nora virus, DmelSV, Bloomfield virus, Craigie's Hill virus, Thika virus, and Brandeis virus.The most prevalent viruses in the datasets analysed were nora virus and DAV, with prevalences of up to 52% and 18%, respectively.This is higher than has been observed in wild flies; a study of wild D. melanogaster suggested that the average global prevalences of nora virus and DAV is about 7% each [29], although the authors also reported that DAV had a prevalence of about 50% in wild D. melanogaster collected from Athens (Georgia, USA).The prevalence of galbut virus in this study was 14%, which is much lower than has been reported in wild flies, where the prevalence of galbut is often over 50% [29,35].The difference in virus prevalence between laboratory and wild flies indicates that some viruses may be better suited than others to the ecology of the lab, suggesting the need to isolate more natural viruses from wild Drosophila to better understand host-virus interactions.

Changes in Gene Expression in Virus-Infected Flies
In response to DAV and nora virus infection, we found evidence of increased expression of immune genes involved in the antiviral STING pathway and changes in the expression of genes involved in lipid metabolism.Recent work has shown that flies mutant in STING display reduced lipid storage and downregulated expression of lipid metabolism genes [131].The authors also reported that Drosophila STING interacts with lipidsynthesizing enzymes such as acetyl-CoA carboxylase [131].It has also been reported that genes regulated by IKKβ, such as STING, restrict infection by picorna-like viruses in Dro-

Discussion
Most studies of how Drosophila gene expression changes in response to virus infection either use non-native viruses or inject natural viruses into the host-which may not be a good reflection of what happens in nature.To understand how Drosophila gene expression changes in response to native virus infection, we analysed seven publicly available RNA-seq datasets that were serendipitously infected with ten different viruses.

Prevalence
The viruses found in this study were galbut virus, Motts Mill virus, DAV, DCV, nora virus, DmelSV, Bloomfield virus, Craigie's Hill virus, Thika virus, and Brandeis virus.The most prevalent viruses in the datasets analysed were nora virus and DAV, with prevalences of up to 52% and 18%, respectively.This is higher than has been observed in wild flies; a study of wild D. melanogaster suggested that the average global prevalences of nora virus and DAV is about 7% each [29], although the authors also reported that DAV had a prevalence of about 50% in wild D. melanogaster collected from Athens (Georgia, USA).The prevalence of galbut virus in this study was 14%, which is much lower than has been reported in wild flies, where the prevalence of galbut is often over 50% [29,35].The difference in virus prevalence between laboratory and wild flies indicates that some viruses may be better suited than others to the ecology of the lab, suggesting the need to isolate more natural viruses from wild Drosophila to better understand host-virus interactions.

Changes in Gene Expression in Virus-Infected Flies
In response to DAV and nora virus infection, we found evidence of increased expression of immune genes involved in the antiviral STING pathway and changes in the expression of genes involved in lipid metabolism.Recent work has shown that flies mutant in STING display reduced lipid storage and downregulated expression of lipid metabolism genes [131].The authors also reported that Drosophila STING interacts with lipid-synthesizing enzymes such as acetyl-CoA carboxylase [131].It has also been reported that genes regulated by IKKβ, such as STING, restrict infection by picorna-like viruses in Drosophila S2 cells [132], which is consistent with our findings given that nora virus is a picorna-like virus.
As in previous studies, we also detected a significant downregulation of genes involved in mitochondrial function in response to nora virus infection.For example, using DNA microarray, [12] reported a significant downregulation of CG15434 that is predicted to be involved in mitochondrial electron transport.The authors also found significant upregulation of chorion protein genes [12] such as Femcoat and Cp18, which were not differentially expressed in our study.In contrast to our findings, a study on Drosophila's response to nora virus infection reported a downregulation of antimicrobial peptide genes regulated by the IMD pathway, such as DptA [133], which were upregulated in our study.The authors also observed a significant upregulation of immune response genes such as Tep4 and Drs [133], which were not differentially expressed in our study.
Although we detected no significant changes in expression in response to DMelSV, it has been reported that DMelSV infection increases the rates of expression of chorion protein genes such as Cp18 and downregulation of ribosomal protein genes [134].The authors also reported no differentially expressed immune genes in response to DMelSV [134].However, another study found an increase in DMelSV replication in PGRP-LC and domeless knockout flies, suggesting the recruitment of IMD and JAK/STAT pathways respectively in DMelSV infection [135].
In our study, only the gene encoding a trehalose transporter, Tret1-2, was significantly upregulated in response to DCV infection.This is in contrast with published work that showed that systemic DCV infection decreases the expression of many genes involved in trehalose metabolism, such as Tret1-1, CG5177, and Tps1 [136].Using Drosophila S2 cells, another study reported that only a few genes were differentially expressed in response to DCV infection, with many of the upregulated genes encoding for heat shock proteins [13].DCV infection has also been found to increase the expression genes immune genes such as Spz [9] and vir-1 [10] that are involved in the toll and JAK/STAT pathways, respectively.

Potential Virus-Induced Pathologies
The changes in expression we have observed may be indicative of virus-induced pathologies in the host.For example, GO analysis showed that changes in the expression of metal ion transport was significantly enriched in the flies infected with Motts Mill virus.Perhaps, virus infection may have induced the deregulation of metal ion transport in flies infected with Motts Mill, leading to their change in expression.Conversely, there was an increase in the expression of tipE in response to Motts Mill infection, which may have induced changes in the function of the metal ion channels.Temperature-induced paralytic E (tipE) encodes a protein with a cysteine-stabilized αβ scaffold (CSαβ).Drosophila peptides with CSαβ scaffold are implicated in the regulation of voltage-gated sodium ion channels [137].
We observed an increase in the expression of genes involved in chitin biogenesis and binding, wound healing, and differentiation in the midgut stem cell, respectively.Perhaps, in response to nora-virus-induced damage of the gut epithelium [76], the host may activate the JAK/STAT pathway via upd3 in the gut to regulate the repair response.Published work has shown that upd genes are expressed by haemocytes upon gut septic injury, to remotely stimulate stem cell proliferation and the expression of Drosomycin-like genes in the intestine [138].This is consistent with work showing that upd3 and the chitin metabolism gene, CG32302, were significantly upregulated in Drosophila infected with nora virus [12,133].Genes involved in midgut stem cell differentiation such as Sox21a were also upregulated in response Thika virus infection in our study, which is in contrast with published work.For example, differential gene expression analysis on Drosophila midgut cells detected a significant downregulation of Socs36E, which is upregulated following gut septic injury [138] and encodes a negative regulator of the JAK/STAT pathway [139].
Although not detected in our study, it has been reported that systemic DCV infection induces expression of genes consistent with nutritional stress in the digestive tract [136].The authors found that many of the repressed genes in response to DCV infection are also downregulated in flies undergoing starvation, which include the protease gene, Jon65Ai, and the lysozyme gene, LysE [136].
Overall, our work confirms that viruses are often naturally present in laboratory Drosophila fly culture, and that they can induce detectable changes in host gene expression.It may therefore be useful to consider their effects in experimental studies, especially those that generate transcriptomic data.

Figure 1 .
Figure 1.Prevalence of Viruses in the laboratory D. melanogaster.The figure shows the prevalence of galbut virus, Motts Mill virus, DAV, DCV, nora virus, DmelSV, Bloomfield virus, Craigie's Hill virus, Thika virus, and Brandeis virus in the seven projects analysed.The dataset project codes are shown for each, and brown, green, and mauve bar charts represent dsRNA, +ssRNA, and −ssRNA viruses.Prevalence is given as a percentage of libraries in which the virus was detected, plotted on a log scaled axis for clarity, with 95% confidence intervals assuming binomial sampling.

Figure 1 .
Figure 1.Prevalence of Viruses in the laboratory D. melanogaster.The figure shows the prevalence of galbut virus, Motts Mill virus, DAV, DCV, nora virus, DmelSV, Bloomfield virus, Craigie's Hill virus, Thika virus, and Brandeis virus in the seven projects analysed.The dataset project codes are shown for each, and brown, green, and mauve bar charts represent dsRNA, +ssRNA, and −ssRNA viruses.Prevalence is given as a percentage of libraries in which the virus was detected, plotted on a log scaled axis for clarity, with 95% confidence intervals assuming binomial sampling.

Figure 2 .Figure 2 .
Figure 2. Effects of viral infection on host gene expression.Using uninfected females as baseline, the volcano plots show the main effects on Drosophila gene expression of sex (i.e., being male) in panel (A), and viral infection in panels (B-I).The virus names and total number of variables (genes) used

Figure 3 .
Figure 3.Sex-virus interaction effects on host gene expression.Using uninfected females as baseline, the volcano plots show the sex interaction with viral infection (A-F), that is, the difference in expression between virus-infected males and the expression that would be predicted from the combined effects of sex and infection alone.The virus names and total number of variables (genes) used in each expression analysis are shown.A significance threshold of p < 0.001 was used as well as a logFC cutoff of 0.5.The genes in red are both significant and have a 0.5 < logFC < −0.5.

Figure 4 .
Figure 4. Correlation.The triangular matrix shows significant (p < 0.001) Spearman rank correla coefficients between the estimated changes in gene expression induced by 10 Drosophila viruses virus names are shown above and to the side.Panel (A) shows the correlation between chang expression in response to viruses.Panel (B) shows the correlations between the changes in exp sion induced by the effects of sex on viral infection.Blank squares have non-significant correlat

Figure 4 .
Figure 4. Correlation.The triangular matrix shows significant (p < 0.001) Spearman rank correlation coefficients between the estimated changes in gene expression induced by 10 Drosophila viruses.The virus names are shown above and to the side.Panel (A) shows the correlation between changes in expression in response to viruses.Panel (B) shows the correlations between the changes in expression induced by the effects of sex on viral infection.Blank squares have non-significant correlations.

Figure 5 .
Figure 5. Main effects significantly enriched GO terms.GO enrichment on the effect of virus infection on Drosophila gene expression.The virus names are shown.(A-D) show a −log10(p-value) of functional enrichment analysis and a dendrogram based on hierarchical clustering of the enriched GO terms.The heatmaps of functional GO terms show a clustering combining a description of the first common GO ancestor of each set of GO terms.The heatmap shows the number of GO terms in each set and the dendrogram is based on BMA semantic similarity distance and ward.D2 aggregation criterion (E-G).

Figure 5 .
Figure 5. Main effects significantly enriched GO terms.GO enrichment on the effect of virus infection on Drosophila gene expression.The virus names are shown.(A-D) show a −log10(p-value) of functional enrichment analysis and a dendrogram based on hierarchical clustering of the enriched GO terms.The heatmaps of functional GO terms show a clustering combining a description of the first common GO ancestor of each set of GO terms.The heatmap shows the number of GO terms in each set and the dendrogram is based on BMA semantic similarity distance and ward.D2 aggregation criterion (E-G).

Figure 6 .
Figure 6.Sex-virus interaction significantly enriched GO terms.GO enrichment on the effect of sex on virus infection.The virus names are shown.(A,B) show a −log10(p-value) of functional enrichment analysis and a dendrogram based on hierarchical clustering of the enriched GO terms.(C,D) are heatmaps of functional GO terms clusters combining a description of the first common GO ancestor of each set of GO terms.The heatmap shows the number of GO terms in each set and the dendrogram is based on BMA semantic similarity distance and ward.D2 aggregation criterion.

Figure 6 .
Figure 6.Sex-virus interaction significantly enriched GO terms.GO enrichment on the effect of sex on virus infection.The virus names are shown.(A,B) show a −log10(p-value) of functional enrichment analysis and a dendrogram based on hierarchical clustering of the enriched GO terms.(C,D) are heatmaps of functional GO terms clusters combining a description of the first common GO ancestor of each set of GO terms.The heatmap shows the number of GO terms in each set and the dendrogram is based on BMA semantic similarity distance and ward.D2 aggregation criterion.