In between: Gypsy in Drosophila melanogaster Reveals New Insights into Endogenous Retrovirus Evolution

Retroviruses are RNA viruses that are able to synthesize a DNA copy of their genome and insert it into a chromosome of the host cell. Sequencing of different eukaryote genomes has revealed the presence of many such endogenous retroviral sequences. The mechanisms by which these retroviral sequences have colonized the genome are still unknown, and the endogenous retrovirus gypsy of Drosophila melanogaster is a powerful experimental model for deciphering this process in vivo. Gypsy is expressed in a layer of somatic cells and then transferred into the oocyte by an unknown mechanism. This critical step is the start of the endogenization process. Moreover, gypsy has been shown to have infectious properties, probably due to its envelope gene acquired from a baculovirus. Recently we have also shown that gypsy maternal transmission is reduced in the presence of the endosymbiotic bacterium Wolbachia. These studies demonstrate that gypsy is a unique and powerful model for understanding the endogenization of retroviruses.


Introduction
Retroviruses are enveloped RNA viruses that copy their genome into DNA and then insert it into their host cell's chromosomes as an essential part of their replication cycle. Classical retroviruses, like HIV or HTLV, propagate through extracellular particles that infect fresh cells in the host and ensure transmission to other individuals of the host species. When only somatic cells are infected, replication does not alter the genetic structure of the host species. If germline cells are infected but remain reproductively competent, however, the viral genetic information can be passed to successive generations and may eventually become a feature of all members of the host species. Such genetic sequences are called endogenous retroviruses, and the dynamic process of their acquisition is called viral endogenization (see Box 1). In silico analysis of eukaryote genomes has revealed that such endogenous retroviruses are widespread [1].

Box 1. Endogenization: However could a viral sequence get into my genes?
For a viral sequence to become an endogenous element of the host's genome, the germinal cells themselves must be infected by the virus. The pathway from fusion of germinal cells to the birth of viable and reproductively competent offspring is exceedingly complex and finely balanced. Major disruption of function is very likely to result in fatal errors in embryogenesis, and the chromosome bearing a newly inserted viral sequence will then not be transmitted to subsequent generations. This may go a long way to explain why the insertion sites of endogenous retroviral elements do not show a totally random pattern: even if insertion of a retroviral provirus occurs randomly, insertions at sites that cause too much disruption will be eliminated before they can be observed. Population genetic models could then explain the diffusion and fixation of an endogenous provirus within a host population and then within a species. The viral genes themselves may be expressed, and the resultant peptides can change the dynamic economy of the host cell. This does not preclude all modification of germ cell function, however; some discrete modifications of viral genes may be tolerated or even beneficial. Indeed, several "domesticated" retroviral genetic elements now contribute to genome expression in many species, including humans.
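As a rough illustration of how such population genetic models treat a newly inserted provirus, the sketch below uses Kimura's diffusion approximation for the probability that a single new insertion reaches fixation in a diploid population. It is not taken from the studies reviewed here: the function name, the assumption that the effective population size equals the census size N, and the additive selection coefficient s are illustrative choices only.

```python
import math


def fixation_probability(n_diploid: int, s: float = 0.0) -> float:
    """Fixation probability of one new insertion (hypothetical helper).

    Assumptions (illustrative, not from the review):
      - diploid population of constant size N, with Ne = N
      - the insertion starts as a single copy, frequency p0 = 1/(2N)
      - additive selection coefficient s (s < 0: deleterious insertion)
    Uses Kimura's diffusion approximation:
      u(p0) = (1 - exp(-4*N*s*p0)) / (1 - exp(-4*N*s))
    """
    if n_diploid <= 0:
        raise ValueError("population size must be positive")
    p0 = 1.0 / (2 * n_diploid)
    if s == 0.0:
        # Neutral limit of the formula: fixation probability equals p0.
        return p0
    # expm1(x) = exp(x) - 1; the two minus signs cancel in the ratio.
    # Note: very large values of -4*N*s would overflow; fine for a sketch.
    return math.expm1(-4 * n_diploid * s * p0) / math.expm1(-4 * n_diploid * s)


if __name__ == "__main__":
    for s in (-0.01, 0.0, 0.01):
        print(f"s = {s:+.2f}  P(fixation) = {fixation_probability(1000, s):.3e}")
```

Under these assumptions a neutral insertion fixes with probability 1/(2N), while even a mildly deleterious insertion is far less likely to fix, which is consistent with the idea that strongly disruptive insertions are rarely observed.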
Endogenous retroviruses vary in abundance in eukaryote genomes, and pose many interesting questions. What mutations in their genomes permit their co-existence with their hosts? What impact do these sequences have upon the integrity and function of the host genome? How do the host genomes regulate endogenous viral sequences that have conserved their replicative potential? Important results have recently been obtained from the analysis of human endogenous retroviruses permitting, among other things, the dating of the incursion of human endogenous retroviral families [2], the "revival" of the postulated infectious retroviral ancestor of the most prolific family of human endogenous retroviruses by in vitro directed mutagenesis [3] and a demonstration of the retroviral origin of the syncytin gene, which plays an essential part in placental morphogenesis [4]. Nevertheless, studies on the crucial first step of "colonization" of gametes by retroviral sequences leading to endogenization are still sparse. One interesting case is the present active invasion of the koala genome by an exogenous virus closely related to the gibbon ape leukemia virus (GALV) [5], which is present in the genome of some, but not all, animals and appears to be actively infecting their germ-line cells. This animal model is obviously not accessible to an experimental approach. In this review we propose that the gypsy retroelement is an exceptional model for understanding the mechanisms of retroviral endogenization.

Gypsy: An Errantivirus of Drosophila melanogaster
A helpful model for endogenization and for the balancing regulation needed for stability is provided by the gypsy retroelement in Drosophila melanogaster. Gypsy is classified as an errantivirus, a division of Metaviridae, which are a sister group to the Retroviridae in the LTR retroelements (Figure 1).

Figure 1. Phylogenetic relationships between LTR-retroelements based on Reverse Transcriptase domains
The genome organization of gypsy is very similar to that of the classical retroviruses, with typical LTRs flanking three open reading frames (ORFs). The first ORF corresponds to a gag-like gene with a recognizable NC domain, although the MA and CA domains are unrecognizable, as are those of the HSV spumavirus [6]. Again, like the Spumaviridae and some other retroviruses such as ASLV or EIAV [7], the gag gene lacks a myristoylation sequence, suggesting a different mechanism for assembly of viral particles at the plasma membrane. The gag gene is followed by a recognizable protease (pro) and polymerase (pol) reading frame, read by a frameshift from the gag sequence, and by an envelope gene (env) expressed from a spliced mRNA lacking the gag and pro-pol sequences (Figure 2).
Interestingly, gag- and env-related sequences have been identified in Drosophila genomes [8,9], mirroring the viral gene domestication events described in vertebrates. However, the cellular roles of these putative Drosophila proteins remain to be demonstrated. Several gypsy cis-regulatory sequences are known. Gypsy contains two internal ribosome entry sites (IRES) [10] and two distinct primer binding sites at the 5' and 3' ends of its genome [6]. An unusual characteristic of gypsy is that its 5' UTR contains 12 repeats of a sequence that binds the chromatin insulator protein Suppressor of Hairy-Wing [11]. This insulator completely or partially blocks the activity of an enhancer when it lies between the enhancer and the promoter of a gene, which indicates that gypsy proviruses can affect gene regulatory pathways. Nevertheless, other crucial cis-regulatory sequences remain unknown: for example, the nucleo-cytoplasmic export signal for the full-length unspliced gypsy genomic RNA, and the psi packaging RNA element necessary for its encapsidation.

Gypsy Regulation by Flamenco
All individuals of Drosophila melanogaster carry defective gypsy proviruses in the centromeric regions of their X chromosome [12], and certain individuals from wild populations also carry euchromatic integrated gypsy proviruses [13]. The number of these novel insertions varies from one to five, and their integration sites vary, suggesting that they represent new germline integrations of an active circulating virus. In addition, some laboratory strains of D. melanogaster carry a large number (around twenty) of gypsy proviruses and show a high rate of mutation due to the disruption of cellular genes by these mobile proviruses [14]. Genetic analysis of these flies identified a locus named flamenco in the heterochromatic region of the X chromosome, which provides maternal regulation of gypsy expression [15]. In permissive flies (flamP/flamP homozygotes) gypsy elements multiply unrestricted, but they are controlled in the presence of the dominant restrictive allele flamR. The restriction occurs in a special tissue of the gonads: the follicular cells, which are of somatic origin and surround the oocyte in the Drosophila ovary (Figure 3). Immunolabeling shows polarized expression of gypsy Env antigens on the surface of the follicular cells in proximity to the oocyte in permissive flies and its absence in flies of the restrictive phenotype, correlating with the acquisition of new gypsy proviruses in the permissive flies. It should be noted that other Drosophila errantiviruses show the same tissue expression and are similarly controlled from the same locus (Table 1). The mechanism of transfer does not, however, involve Env directly, because a gypsy provirus lacking Env is transferred to oocytes in permissive flies and integrates new copies in the progeny at a rate similar to that of the intact provirus [16].
The authors verified the absence of gypsy Env products in this strain, but could not rule out the presence of a heterologous envelope protein able to pseudotype the env-defective gypsy. Another Drosophila retroelement, ZAM, has a replicative cycle similar to that of gypsy [17] and "hijacks" the vitellogenin pathways to enter oocytes [18].

The Genetic "Music" of flamenco
The flamenco locus is located in the heterochromatic 20A region of the X chromosome and contains numerous co-oriented defective sequences of gypsy and other transposable elements. It is not a classical gene that directs the production of a conventional mRNA, but generates a long non-coding RNA containing many truncated transposable element sequences in the antisense orientation [19,20]. Transcription is carried out by RNA pol II, is regulated by the transcription factor cubitus interruptus [21] and generates a number of different RNA precursors by differential splicing. The precursors are exported to a perinuclear region in follicular cells, near the Yb bodies, called flam bodies [22], where they are processed into 25-27 nt fragments that are loaded onto the Piwi protein under the control of the co-chaperone Shutdown [23], forming the piRNA-induced silencing complex (piRISC). It was first proposed that piRISC was able to target and cleave sense transcripts from active errantiviruses like gypsy, inducing gene silencing at the post-transcriptional level [24]. However, several recent results strongly suggest that this complex is imported into the nucleus and silences transposon transcription by establishing H3K9me3 heterochromatic marks [25,26,27,28] (Figure 4). This mechanism of errantivirus regulation by flamenco is called primary piRNA-mediated transcriptional gene silencing (TGS) and operates only in somatic follicular cells; another type of piRNA pathway, involving amplification by the "ping pong loop", is active in the germline [29] but does not concern gypsy and is not discussed further here. The genesis of flamenco-like clusters is a fascinating question, and recent results concerning the dynamics of flamenco alleles in Drosophila reveal recurrent insertions and deletions of transposon sequences at the flam locus [30].

Gypsy: An Endogenous Retrovirus with Infectious Properties
The gypsy element in Drosophila provides a good experimental system for studying the various steps in endogenization. In order to invade the genome of a new species, the potential endogenous element must first establish itself in at least one individual of that species. Gypsy is an errantivirus, a subtype of the Metaviridae division of LTR transposons; errantiviruses possess a third ORF corresponding to an env-like gene. This might permit classical retrovirus-like infectivity.
To test this, Kim et al. developed an experimental system in which culture media containing homogenized gypsy-expressing Drosophila were fed to permissive larvae lacking active gypsy proviral sequences. A highly selective genetic test allowed the frequency of transfer of new proviral copies into the germline of progeny to be estimated by observing mutations caused by insertion of a gypsy provirus into the ovo locus [31]. Similar results were obtained by Song et al. using purified gypsy particles from permissive adult females; they also showed that infection was abrogated by pre-treatment of the particles with an anti-Env antibody, suggesting an active role for gypsy Env in the infection process [32].
A defining characteristic of the errantiviruses, as compared to the metaviruses and semotiviruses, is that they possess a third open reading frame coding for an envelope glycoprotein and expressed from a sub-genomic spliced mRNA similar to that of vertebrate retroviruses [33]. The gypsy Env is atypical for retroviruses, but shows significant homology to the baculovirus fusion protein FP [34] and was probably "captured" by insertion of an LTR retrotransposon, which lacked the env gene, into the dsDNA genome of a baculovirus infecting the host cell. Baculoviruses have a replication strategy and a cellular tropism quite different from those of the errantiviruses. Baculovirus particles exist in two different forms: the occlusion-derived viruses (ODV) and the budded viruses (BV). ODV are released from occlusion bodies and initiate intestinal infection after ingestion by the insect host, whereas BV bud out of infected cells and mediate cell-to-cell spread throughout the insect. BV and ODV differ mainly in the origin of their envelopes. The errantiviral envelope shows similarities with the FP envelope protein of the BV particles present in the group II nucleopolyhedroviruses, which allows cell penetration by a pH-dependent fusion. This similarity is highly significant for the fusion peptide domain, which is present in all viral fusion proteins of class I (Figure 5) [35] and allows fusion between the virus and the target cell membrane [36]. In silico analysis predicts, however, only a weak fusion potential for these peptides, in comparison to that of the HRSV paramyxovirus [37].

Wolbachia Influences Gypsy Maternal Transmission
We have been considering the overall host and parasite context of the interactions between gypsy and Drosophila and have recently concentrated upon another maternally-transmitted agent in Drosophila: Wolbachia, which passes from mother to offspring in the oocyte cytoplasm [38]. Interestingly, Wolbachia reduces the replication of several exogenous RNA viruses [39,40] and we have shown that, in the presence of one Wolbachia variant (wMel), which is at present becoming a major strain in Drosophila melanogaster [41], the maternal transmission of gypsy, as measured by its insertion into the ovo gene, is substantially reduced [42]. This diminution does not involve flamenco restriction, because rates of gypsy insertion do not differ between wMel+ and wMel− flamenco restrictive flies. We hypothesize that Wolbachia competes efficiently with gypsy for a posterior position within the oocyte, and thereby impedes gypsy maternal transmission into the offspring germline cells. We are considering whether this mechanism operates for other endogenous retroviruses of Drosophila, or indeed for other cases of maternal transmission of exogenous viruses.

Conclusions
Gypsy is a relevant in vivo model of endogenization because it is naturally present in Drosophila melanogaster, for which powerful molecular and genetic tools are available. Because it is quite easy to induce gypsy mobilization and transfer to the germline, gypsy is a unique model for studying not only the genesis of endogenous retroviruses but also a mechanism of envelope gene acquisition, which makes it an excellent model of the "exogenization" process. The acquisition or loss of an envelope gene can be seen as a dynamic system showing an unstable equilibrium between an LTR retrotransposon lacking the envelope gene and an infectious retrovirus (Figure 6). The identification of an errantivirus, burdock, closely related to gypsy but lacking an envelope gene reinforces this hypothesis [43]. However, the precise role of gypsy Env remains an open question. We have obtained some preliminary results suggesting that Env is incorporated into gypsy particles produced in a Drosophila cell culture line, which makes this in vitro tool suitable for Env analysis.
Gypsy and the errantiviruses represent a hybrid type of viral element, which resembles a retroelement in its enzymatic replication machinery but has opportunistically acquired a viral gene coding for a protein with fusogenic properties. The finding that gypsy also interacts with Wolbachia provides the first evidence that a novel factor such as an endosymbiotic bacterium can influence colonization of the genome by retroelements.