2A and 2A-like Sequences: Distribution in Different Virus Species and Applications in Biotechnology

2A is an oligopeptide sequence that mediates a ribosome “skipping” effect and thereby a co-translational “cleavage” of polyproteins. These sequences are widely distributed, from insect to mammalian viruses, and may accelerate viral adaptive capacity. They have been used in many heterologous co-expression systems because they are versatile tools for separating proteins of biotechnological interest. In this work, we review and update the occurrence of 2A/2A-like sequences in different groups of viruses by screening the sequences available in the National Center for Biotechnology Information database. We report the occurrence of 2A-like motifs for the first time in 69 sequences. Among these, 62 corresponded to positive-sense single-stranded RNA species, six to double-stranded RNA viruses, and one to a negative-sense single-stranded RNA virus. The importance of these sequences for viral evolution and their potential in biotechnological applications are also discussed.


Introduction
2A and 2A-like sequences are oligopeptides of approximately 18-25 amino acids that can mediate a co-translational "cleavage" of polyproteins in eukaryotic cells. The "core" sequence at the C-terminus of 2A, together with the N-terminal proline of the downstream protein, contains the canonical motif $(G/H)_1 D_2 (V/I)_3 E_4 X_5 N_6 P_7 G_8 P_9$, which is involved in a ribosome "skipping" effect during translation that separates two proteins without needing a proteinase [1,2].
The 2A cleavage occurs between the $G_8$ site of the upstream protein (P1) and the $P_9$ site of the downstream protein (P2). During elongation, the 2A sequence can cause a structural modification at the ribosome peptidyl-transferase center (PTC), making the ribosome "skip" the proline codon. This inhibits the formation of the glycine-proline peptide bond: the peptidyl(2A)-tRNA$^{Gly}$ ester linkage is hydrolyzed instead, releasing the polypeptide from the translational complex [3,4]. In this way, the first amino acid of the downstream encoded protein, proline, is specified by the third codon of the $P_7 G_8 P_9$ sequence, and the C-terminal amino acid of the upstream encoded protein is the glycine encoded by the second codon of that sequence [5,6]. This ribosome "skipping" effect is also referred to as "Stop-Carry On" or "StopGo" translation [6]. Thus, the ribosome activity does not depend on structural elements within the mRNA but on a peptide sequence, which differentiates this mechanism from other forms of non-canonical translation. Because of this activity, the 2A and 2A-like sequences can be named CHYSELs (cis-acting hydrolase elements) [7].
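The position of the skipping event described above can be illustrated with a short sketch. It assumes that a match to the nine-residue core motif is sufficient for skipping and that skipping is complete, which is a simplification; the flanking residues in the toy polyprotein and the function name `skip_products` are made up for illustration.

```python
import re

# Canonical core motif from the text: (G/H) D (V/I) E X N P G, followed by P9.
# Skipping occurs between G8 and P9, so a lookahead keeps P9 out of the match
# and leaves it as the first residue of the downstream product.
CORE_2A = re.compile(r"[GH]D[VI]E.NPG(?=P)")

def skip_products(polyprotein: str) -> list[str]:
    """Split a polyprotein after G8 of every core 2A motif (idealized model)."""
    products = []
    start = 0
    for match in CORE_2A.finditer(polyprotein):
        products.append(polyprotein[start:match.end()])  # ends in ...NPG
        start = match.end()                              # next product starts at P9
    products.append(polyprotein[start:])
    return products

# Toy polyprotein: hypothetical flanking residues around the GDVEENPGP motif.
toy = "MKTAYIAK" + "GDVEENPGP" + "LSQRW"
print(skip_products(toy))
```

In this model the upstream product keeps the 2A residues up to the glycine as a C-terminal extension, and the downstream product begins with proline, as stated in the text.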
Originally, the term "2A" was assigned to define a specific region of the genome of the foot-and-mouth disease virus (FMDV), a positive-sense single-stranded RNA (pssRNA) virus and member of the Picornaviridae family [1,4,8-10]. Similar sequences discovered in other viruses were named "2A-like." These sequences have been described in other Picornaviridae, such as Equine rhinitis A virus and Porcine teschovirus-1, in viruses of the Dicistroviridae and Iflaviridae families [2], and even in the infectious myonecrosis virus (IMNV), a double-stranded RNA (dsRNA) virus belonging to the Totiviridae family [11].
From these first discoveries, the 2A and 2A-like proteolytic cleavage activities have been demonstrated in several eukaryotic systems in vitro and in vivo [2,12]. Because of their mechanism of action, some authors also refer to 2A and 2A-like peptides as cis-acting hydrolase elements [7,13].
In 2017, Yang et al. reviewed the 2A sequence structures and functions of Picornaviridae members [14]. The latest works analyzing 2A and 2A-like sequences, including viruses from other families, were conducted by Luke et al. in 2008, 2009, and 2014 and by Luke and Ryan in 2013 [2,15-17]. With advances in sequencing technology, there has been a significant increase in recent years in the number of viral sequences added to the National Center for Biotechnology Information (NCBI) database. Therefore, the goal of this article was to carry out a new screening of 2A and 2A-like sequences in viral genomes available from the NCBI database in order to revise the principal 2A and 2A-like sequences, describe their occurrence in different viral families, and discuss their potential applications in biotechnology.

Materials and Methods
The sequences used in this study were obtained from the viral databank (https://www.ncbi.nlm.nih.gov/genome/viruses/, accessed on 9 January 2021). To find 2A/2A-like sequences, the viral genomes were aligned against some of the classical 2A/2A-like motifs (GDVEENPGP; GDVESNPGP; HDIETNPGP; GDVELNPGP; GDIELNPGP; GDIESNPGP; HDVEMNPGP) using the Blastp tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed on 9 January 2021) and the non-redundant protein sequences database (nr), restricted to viruses (taxid:10239). Search parameters were set to return a maximum of 500 sequences for each query. Repeated viral sequences were excluded from the analysis.
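The screening logic can be approximated locally with a minimal sketch. It does not reproduce the Blastp search used here: it replaces alignment with exact substring matching against the seven query motifs and only mirrors the exclusion of repeated sequences. The record format, the accession labels, and the function name `screen` are hypothetical.

```python
# The seven classical query motifs listed in the Methods.
QUERY_MOTIFS = [
    "GDVEENPGP", "GDVESNPGP", "HDIETNPGP", "GDVELNPGP",
    "GDIELNPGP", "GDIESNPGP", "HDVEMNPGP",
]

def screen(records):
    """Scan (accession, protein_sequence) pairs for exact motif matches.

    Returns a list of (accession, motif, 1-based start position).
    Sequences identical to one already seen are skipped, mirroring the
    exclusion of repeated viral sequences. Exact matching stands in for
    Blastp here and will miss divergent 2A-like motifs.
    """
    seen = set()
    hits = []
    for accession, sequence in records:
        sequence = sequence.upper()
        if sequence in seen:
            continue  # repeated sequence, excluded
        seen.add(sequence)
        for motif in QUERY_MOTIFS:
            position = sequence.find(motif)
            while position != -1:
                hits.append((accession, motif, position + 1))
                position = sequence.find(motif, position + 1)
    return hits

# Toy records with made-up labels and sequences.
toy_records = [
    ("toy_1", "MAAGDVEENPGPTT"),
    ("toy_2", "MAAGDVEENPGPTT"),             # duplicate of toy_1
    ("toy_3", "MKKHDIETNPGPAAGDIESNPGPLL"),  # two motifs
    ("toy_4", "MKLV"),                       # no motif
]
for hit in screen(toy_records):
    print(hit)
```

A real screen would still need an alignment-based search such as Blastp to recover motifs that differ from the seven queries, for example the GDIEQNPGP motif reported in this work.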
After the initial screening, an active search was performed on the publication linked to the sequence annotation in the NCBI database to identify whether the sequences found had already been reported in the literature. If no report was found, an active search was performed using the Google Scholar search tool, with each respective virus name plus the word "2A" as keywords. If no articles reported the presence of 2A/2A-like in the query virus, we considered this finding novel.
Table 1 shows the principal 2A or 2A-like motifs that had their self-cleavage efficiencies tested in vitro, confirming that these sequences are widely distributed among the pssRNA and dsRNA viruses, ranging from insect to mammalian viruses. Luke et al. were the first to report this wide distribution and identified motifs similar to those found in the FMDV [2].
Table 1 (excerpt): Infectious myonecrosis virus (IMNV); Unassigned; Totiviridae; GDVEENPGP; ~99%; [2,11]
The search for these motifs in the viral genomes available in the NCBI database revealed 69 sequences containing 2A-like motifs that had not previously been reported. Among these, 62 corresponded to pssRNA, six to dsRNA, and one to a negative-sense single-stranded RNA (nssRNA) virus. Additionally, 2A-like motifs previously described in 102 sequences were confirmed. All 2A/2A-like motifs and their respective species resulting from the search are described in Tables 2 and 3.

pssRNA Viruses
Here, we recorded 62 new occurrences of 2A-like sequences in pssRNA viruses, as presented in Table 2 (underlined). The positions in each respective genome are shown in Figure 1.
In most pssRNA viruses, 2A/2A-like segments are used in primary polypeptide processing. The pssRNA viruses commonly possess one 2A/2A-like sequence, but some viruses have two, three, or even four motifs (Table 2). Many of them belong to families of the order Picornavirales, such as Picornaviridae, Dicistroviridae, and Iflaviridae. Currently, the Picornaviridae family has 63 assigned genera [28].
In aphthoviruses and cardioviruses, the 2A-like region self-cleaves at its own C-terminus, meaning that the 2A-like polypeptide remains as a C-terminal extension of the upstream polyprotein (P1) until it is removed by secondary proteinase cleavage [8,9]. However, in parechoviruses, the 2A-like region has no protease or protease-like activity, and its apparent function is to alter host cell metabolism because it possesses a high homology to the cellular protein H-rev107, which regulates cell proliferation (H-box 2A) [29].
In insect Iflaviruses, the 2A-like sequence separates the capsid and replicative protein domains. The Dicistroviridae family is composed of the Aparavirus, Cripavirus, and Triatovirus genera, in which the 2A-like sequences occur at the N-terminal region of the replicative protein open reading frame (ORF) [2,14].
Members of the Permutotetraviridae and Carmotetraviridae families (previously Tetraviridae), Thosea asigna virus and Euprosterna elaeasa virus, encode a 2A-like sequence at the N-terminus of the structural ORF [1]. The Providence virus has three 2A-like sequences: $2A_2$ and $2A_3$, located in the capsid protein precursor (VCAP), and $2A_1$, located at the N-terminus of the p130 ORF, which encodes the viral replicase [30].

dsRNA Viruses
Among the dsRNA viruses, 2A-like sequences not yet reported were found in six species. The new 2A-like sequences are underlined in Table 3, and their localization inside the genome is schematized in Figure 2. In double-stranded RNA viruses, 2A-like sequences are present in two families: Totiviridae and Reoviridae. In Totiviridae, 2A-like sequences are distributed in all representatives of the IMNV-like group [31]. These viruses predominantly infect arthropods, such as penaeid shrimp [32], mosquitoes [33,34], and the fruit fly Drosophila melanogaster [35], except for the golden shiner totivirus, which infects the fish Notemigonus crysoleucas [36]. The genome of IMNV-like viruses is composed of two ORFs, and the 2A-like sequences separate an RNA-binding protein from other putative proteins in ORF1 [37].
In the Reoviridae family, 2A-like sequences are found in cypoviruses and rotaviruses, in one of the segments encoding a non-structural protein. In Operophtera brumata cypovirus 18 and Bombyx mori cypovirus 1, 2A-like sequences occur within segment 5. In type C rotaviruses, 2A-like sequences link the ssRNA-binding protein NSP3 to a dsRNA-binding protein (dsRBP). In porcine and human rotavirus C, the 2A-like sequences are present in segment 6, although in the adult diarrhea virus, the sequence appears in segment 5 [1,2]. All cypoviruses and rotaviruses possess only one 2A-like sequence (Table 3).

nssRNA Virus
Surprisingly, one 2A-like motif (GDIEQNPGP) was found in a virus tentatively assigned to the Bunyaviridae family (Accession number: APG79245.1). This motif is located in the RNA-dependent RNA polymerase (RdRp) sequence (Figure 3). This is the first report of a 2A-like sequence in an nssRNA virus.

2A/2A-Like Sequences and Viral Evolution
Previous studies concerning RNA viruses and 2A-like peptides have reported that these sequences emerged independently during the evolution of viral families [2,14]. However, in a previous study [31], we showed sequences very similar to functional 2A-like sequences in some RNA viruses that could be the precursors of 2A sequences.
In particular, RNA viruses depend on the activity of RNA-dependent RNA polymerases. These enzymes have a significant error rate ($10^{-3}$ to $10^{-5}$ mutations per inserted nucleotide) because they lack exonuclease proofreading activity [38]. This results in a high degree of genetic heterogeneity in populations of RNA viruses, which is believed to favor adaptability to different environments and hosts [39]. Considering this, the 2A/2A-like sequences could have emerged through successive mutation events that resulted in a cleavage function, providing the advantage of releasing more than one protein from the same ORF. Therefore, this could directly impact viral adaptation potential and viral infection mechanisms to favor their fitness in complex multicellular systems [31].
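To give a sense of scale for the error rates quoted above, the following sketch computes the expected number of mutations per genome copy and the probability that a nine-codon core motif acquires at least one mutation in a single replication. The 10,000-nucleotide genome length is an assumed round figure, not a value from this work, and the model treats all sites as independent with a uniform error rate.

```python
def expected_mutations(rate: float, length_nt: int) -> float:
    """Expected number of misincorporations when copying length_nt nucleotides."""
    return rate * length_nt

def p_at_least_one(rate: float, length_nt: int) -> float:
    """Probability of one or more misincorporations, assuming independent sites."""
    return 1.0 - (1.0 - rate) ** length_nt

GENOME_NT = 10_000  # assumption: illustrative genome size only
MOTIF_NT = 9 * 3    # nine codons of the core motif, G/H1 through P9

for rate in (1e-3, 1e-4, 1e-5):
    per_genome = expected_mutations(rate, GENOME_NT)
    in_motif = p_at_least_one(rate, MOTIF_NT)
    print(f"rate={rate:.0e}  mutations/genome={per_genome:.2f}  "
          f"P(motif hit)={in_motif:.5f}")
```

Under these assumptions, the expected load ranges from about ten mutations per genome copy at the upper rate to about one mutation per ten copies at the lower rate.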
Yang et al. also suggested that picornaviruses with more complex infection mechanisms than other viruses of the same family have more than one 2A-like sequence in their genomes [14]. Taking this evidence into account, it seems that 2A/2A-like sequences may be a key element in viral genome evolution and that, once acquired, their loss of function may impair viral fitness.

Biotechnology Applications
Various approaches have been employed to co-express multiple proteins in cells, including the use of internal ribosomal entry site (IRES) elements [40,41], dual promoter systems [42,43], and transfection of multiple vectors [44]. Each of these is associated with several limitations, such as uneven or unreliable protein expression levels, silencing of some promoters [45,46], and increased toxicity to cells (with multiple transfections) [47].
Co-expression systems that include 2A/2A-like peptides could be an alternative strategy for expressing multiple genes under the control of a single promoter. These constructs could have the additional advantage of producing proteins at near-stoichiometric levels, unlike IRES-mediated polycistronic expression, where ribosomes are independently recruited at distinct regions within the mRNA [1,4,48,49]. IRES-based systems therefore require optimization by testing several combinations of promoters and/or IRES elements and the order of genes within the expression cassette [46]. Furthermore, IRES activity can be affected by cell type, and variable expression can be observed in the downstream coding sequence [50].
In yeasts, more than two 2A sequences have been used to co-express proteins from the same vector. As seen in [68] and [69], three proteins were produced using this strategy in S. cerevisiae. Surprisingly, up to nine proteins have been linked and successfully cotranslated and separated with 2A sequences in the yeast Pichia pastoris [70].
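The layout of such multi-protein constructs can be sketched as follows. This is an idealized model that assumes complete skipping at every junction and uses only the nine-residue GDVEENPGP core as the linker, whereas functional 2A peptides are longer (approximately 18-25 amino acids); the protein strings and function names are placeholders.

```python
LINKER_2A = "GDVEENPGP"  # assumption: core motif only, for illustration

def build_polyprotein(proteins, linker=LINKER_2A):
    """Join the protein sequences into one ORF product with a 2A linker between each."""
    return linker.join(proteins)

def predicted_products(proteins, linker=LINKER_2A):
    """Predict the separated products, assuming complete skipping at each linker.

    Every protein except the last keeps the linker up to its final glycine
    as a C-terminal extension; every protein except the first gains the
    linker's terminal proline at its N-terminus.
    """
    tail, head = linker[:-1], linker[-1]
    last = len(proteins) - 1
    products = []
    for index, protein in enumerate(proteins):
        prefix = head if index > 0 else ""
        suffix = tail if index < last else ""
        products.append(prefix + protein + suffix)
    return products

# Three placeholder proteins linked by two 2A motifs.
cassette = ["MAAA", "MBBB", "MCCC"]
print(build_polyprotein(cassette))
print(predicted_products(cassette))
```

The extra residues left on each product are worth checking when designing a cassette, since the C-terminal 2A extension or the N-terminal proline may affect the function or localization of some proteins.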
Researchers have also attempted to use 2A for multi-gene transformation in staple crops [71,72]. 2A sequences can also be used for gene fusion, as seen in tomatoes, potatoes, and other plants [73,74].

Conclusions
In this article, we reviewed the 2A/2A-like sequence distribution of viruses and described the occurrence of these motifs in viral species where these sequences have not been previously reported. These findings need to be confirmed through in vitro tests to verify they are active 2A-like sequences.
Because of their cleavage function, the 2A/2A-like sequences appear to directly affect the complexity of the viral genome, which plays a decisive role in viral evolution. Additionally, they are excellent alternatives for developing new biotechnological tools that depend on the expression of multiple products, such as vaccines, transgenic approaches, cell/gene therapy, and optogenetic tools.
Funding: We would like to thank Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for financial support.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.