New Insights into Evolution of the ABC Transporter Family in Mesostigma viride, a Unicellular Charophyte Algae

ATP-binding cassette (ABC) transporters play an important role in driving the exchange of multiple molecules across cell membranes. The plant ABC transporter family is among the largest protein families, and recent progress has advanced our understanding of ABC classification. However, the ancestral form and deep origin of plant ABCs remain elusive. In this study, we identified 59 ABC transporters in Mesostigma viride, a unicellular charophyte alga that represents the earliest diverging lineage of streptophytes, and 1034 ABCs in genomes representing a broad taxonomic sampling from distantly related plant evolutionary lineages, including chlorophytes, charophytes, bryophytes, lycophytes, gymnosperms, basal angiosperms, monocots, and eudicots. We classified the plant ABC transporters by comprehensive phylogenetic analysis of each subfamily. Our analysis revealed the ancestral types of ABC proteins as well as gene duplications and losses during plant evolution, contributing to our understanding of the functional conservation and diversity of this family. In summary, this study provides new insight into the origin and evolution of plant ABC transporters.


Introduction
The ATP-binding cassette (ABC) transporter family is an ancient and large family of transmembrane transport proteins [1][2][3] that are present in all cellular organisms and participate in the transport of a variety of substances [4][5][6][7][8][9][10]. Based on their structural organization, ABC proteins are mainly divided into full- and half-size transporters. Full-size transporters are composed of two transmembrane domains (TMDs) and two nucleotide-binding domains (NBDs), whereas half-size transporters are composed of one TMD and one NBD [11,12].
In plants, ABC transporters form one of the largest protein families and include eight subfamilies (ABCA-G and ABCI) [11,13]. During evolution, plant ABC transporter genes have undergone multiplication and functional diversification [14][15][16]. The large number and complexity of ABC proteins require comprehensive phylogenetic analysis to resolve their evolution and function. Several studies have reported the classification of ABC transporters in plants [17][18][19]. However, ABCs have not yet been investigated in unicellular charophyte algae, although such an investigation is crucial for understanding the ancestral form and original function of the plant ABC family.
Green plants can be divided into two groups: chlorophyte green algae and Streptophyta. Streptophyta are further divided into charophyte green algae and land plants, which evolved from unicellular charophyte algal predecessors [20,21]. To date, six charophyte green algal genomes have been reported [22][23][24][25][26]. Mesostigma viride, whose genome was recently sequenced, is the only unicellular charophyte alga with a sequenced genome and represents the earliest diverging lineage of streptophytes [24].

Sequence Retrieval
We performed BLASTP and TBLASTN searches using well-studied Arabidopsis ABC proteins [11,27] from each subfamily as queries to identify ABCs (e-value < 1e-10) in the 11 species. We used this relatively strict criterion to collect high-quality ABC sequences. Subsequently, the candidate sequences were searched against the Conserved Domain Database [28], SMART [29], and Pfam [30] to annotate their domains.
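As a minimal sketch of the e-value filter described above, the Python snippet below reads BLAST+ tabular output (`-outfmt 6`, default 12-column layout) and keeps subject sequences with an e-value below 1e-10, grouped by the subfamily of the Arabidopsis query. The same filter would apply to BLASTP and TBLASTN output alike. The query-to-subfamily lookup and the sequence IDs in the example are hypothetical, and the subsequent domain check against CDD, SMART, and Pfam is not reproduced here.

```python
import csv
import io

# Default 12-column layout of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

EVALUE_CUTOFF = 1e-10  # threshold stated in the text


def candidate_hits(blast_tsv, query_subfamily, cutoff=EVALUE_CUTOFF):
    """Group subject IDs with e-value < cutoff by the subfamily of their query.

    blast_tsv: an iterable of lines in -outfmt 6 format (e.g. an open file).
    query_subfamily: hypothetical lookup {Arabidopsis query ID: subfamily}.
    """
    hits = {}
    reader = csv.DictReader(blast_tsv, fieldnames=OUTFMT6_COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) < cutoff:
            subfamily = query_subfamily[row["qseqid"]]
            hits.setdefault(subfamily, set()).add(row["sseqid"])
    return hits


if __name__ == "__main__":
    # Two made-up hits: the second (e-value 2e-08) fails the cutoff.
    example = io.StringIO(
        "AtABCA1\tMv_0001\t45.2\t900\t400\t10\t1\t900\t5\t905\t0.0\t800\n"
        "AtABCG1\tMv_0002\t30.1\t300\t200\t5\t1\t300\t1\t300\t2e-08\t60\n"
    )
    print(candidate_hits(example, {"AtABCA1": "ABCA", "AtABCG1": "ABCG"}))
    # {'ABCA': {'Mv_0001'}}
```

Because the comparison is strict, a hit at exactly 1e-10 is discarded, matching the "e-value < 1e-10" wording.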

Phylogenetic Analysis of Gene Families
For each subfamily, multiple alignments of candidate proteins were performed using MAFFT version 7 with the G-INS-i algorithm [31], followed by manual editing in MEGA 7 software [32]. Only unambiguously aligned positions were included in further analyses. Neighbor-joining (NJ) phylogenetic trees were constructed in MEGA 7 from the multiple alignments of candidate proteins. To assess statistical reliability, bootstrap analysis was conducted with 1000 replicates using the p-distance model and pairwise deletion. In addition, Maximum Likelihood (ML) phylogenetic trees were constructed using IQ-TREE with 1000 bootstrap replicates to validate the NJ results [33].
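The distance settings above can be illustrated with a minimal Python sketch, assuming protein sequences that are already aligned and treating "-", "?", and "X" as gaps or missing data (an assumption of this sketch). It computes p-distances, the proportion of differing sites, with pairwise deletion, so a site is dropped only for a pair that has a gap there, and it draws bootstrap pseudo-alignments by resampling columns with replacement. In a bootstrap analysis a tree is built from each replicate, and the support for a clade is the percentage of replicate trees that contain it; tree building itself (NJ in MEGA, ML in IQ-TREE) is not reimplemented here.

```python
import random

# Symbols treated as gaps or missing data (an assumption of this sketch).
MISSING = set("-?X")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a site is skipped only if this pair has a gap or
    missing residue there, so each pair may be compared over a different
    number of sites.
    """
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites for this pair")
    return differences / compared


def distance_matrix(alignment):
    """Return (names, matrix) of p-distances for {name: aligned sequence}."""
    names = sorted(alignment)
    matrix = [[p_distance(alignment[i], alignment[j]) for j in names] for i in names]
    return names, matrix


def bootstrap_replicate(alignment, rng):
    """One bootstrap pseudo-alignment: columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    aln = {"seqA": "MKV-LLA", "seqB": "MKVTLIA", "seqC": "MRV-LLG"}
    names, matrix = distance_matrix(aln)
    for name, row in zip(names, matrix):
        print(name, [round(d, 3) for d in row])
    # seqA [0.0, 0.167, 0.333]
    # seqB [0.167, 0.0, 0.5]
    # seqC [0.333, 0.5, 0.0]
    rng = random.Random(42)
    replicates = [bootstrap_replicate(aln, rng) for _ in range(1000)]
```

Note how seqA and seqB are compared over six sites (the gap column is dropped for that pair only), which is the practical difference between pairwise and complete deletion.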

Identification of ABC Transporters in Green Plants
To investigate the origin and evolution of the ABC family in plants, we carried out a genome-wide survey of ABC proteins in 11 representative species: chlorophytes (C. reinhardtii and V. carteri), charophytes (M. viride and K. flaccidum), bryophytes (P. patens and M. polymorpha), a lycophyte (S. moellendorffii), a gymnosperm (P. abies), a basal angiosperm (A. trichopoda), a monocot (O. sativa), and a eudicot (A. thaliana). After removing incomplete and/or redundant sequences and alternative splice variants, we identified a total of 1093 ABC proteins in the 11 species (Figure 1). Based on their domain annotations, these proteins belong to eight ABC subfamilies: ABCA, ABCB, ABCC, ABCD, ABCE, ABCF, ABCG, and ABCI. Interestingly, the 59 ABCs identified in M. viride were distributed across all subfamilies, and 55 of them were supported by RNA-seq data, with expression that varied across environmental conditions (Supplementary Figure S1) [24], suggesting that most ABC genes in M. viride are functional. Notably, chlorophyte green algae also had members in all eight subfamilies, suggesting that the ABC subfamilies had already diversified, and were likely functional, in the common ancestor of green plants. Comparing the number of ABC proteins among these species, we found that it increased substantially from aquatic green algae to land plants, coinciding with the functional diversification of ABC transporters in land plants [15,16].
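The bookkeeping behind these counts can be sketched in Python as follows. The record layout (protein ID, gene ID, species, subfamily, length) is hypothetical, and keeping only the longest isoform per gene is an assumed rule for discarding splice variants, not necessarily the procedure used here. The tally then gives copy numbers per species and subfamily, totals per species, and any subfamily a species lacks.

```python
from collections import Counter
from typing import NamedTuple

SUBFAMILIES = ("ABCA", "ABCB", "ABCC", "ABCD", "ABCE", "ABCF", "ABCG", "ABCI")


class AbcRecord(NamedTuple):
    """Hypothetical record layout for one annotated ABC protein."""
    protein_id: str
    gene_id: str
    species: str
    subfamily: str
    length: int


def collapse_isoforms(records):
    """Keep one protein per gene; the longest-isoform rule is an assumption."""
    best = {}
    for rec in records:
        kept = best.get(rec.gene_id)
        if kept is None or rec.length > kept.length:
            best[rec.gene_id] = rec
    return list(best.values())


def copy_numbers(records):
    """Counter of proteins keyed by (species, subfamily)."""
    return Counter((rec.species, rec.subfamily) for rec in records)


def totals_by_species(counts):
    """Total number of ABC proteins per species."""
    totals = Counter()
    for (species, _subfamily), n in counts.items():
        totals[species] += n
    return totals


def missing_subfamilies(counts, species):
    """Subfamilies with no member in the given species."""
    return [sf for sf in SUBFAMILIES if counts[(species, sf)] == 0]


if __name__ == "__main__":
    records = [
        AbcRecord("Mv1.1", "Mv1", "M_viride", "ABCA", 900),
        AbcRecord("Mv1.2", "Mv1", "M_viride", "ABCA", 850),  # shorter splice variant
        AbcRecord("Mv2.1", "Mv2", "M_viride", "ABCG", 700),
        AbcRecord("At1.1", "At1", "A_thaliana", "ABCG", 1400),
    ]
    counts = copy_numbers(collapse_isoforms(records))
    print(totals_by_species(counts))
    print(missing_subfamilies(counts, "M_viride"))
```

With the real annotation table, an empty result from `missing_subfamilies` for M. viride would correspond to the observation that its 59 ABCs span all eight subfamilies.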

Evolutionary Analyses of ABC Subfamilies
We further performed phylogenetic reconstruction of each subfamily to better understand the evolutionary relationships of ABC transporters and investigate their origin. Because there is no standard nomenclature for the clades within each subfamily, we use "group" to name each clade (G1, G2, and so on).

ABCA Subfamily
Subfamily A consists of forward-oriented (TMD-NBD) transporters that are mainly involved in handling metabolic and signaling lipids in organisms [34,35]. The plant ABCA subfamily contains one full-size member, AtABCA1, which is the only full-size ABCA protein in Arabidopsis and the largest Arabidopsis ABC protein, together with several half-size ABCAs, also called ABC2 homologues (ATHs) [27,36]. Our analysis showed that plant ABCA subfamily proteins can be classified into three groups, G1-G3, with strong bootstrap support. G1 and G3 comprise half-size ABCAs and were found in all species examined. G1 proteins were present as multiple copies in most species, except for a single copy in P. abies, K. flaccidum, and M. viride (Figure 2 and Supplementary Figure S2). For G2, we observed four copies in the unicellular alga M. viride and one copy each in K. flaccidum, S. moellendorffii, and A. thaliana, but none in many other species, including rice and early land plants, suggesting multiple losses of G2 proteins during evolution. G3 proteins were present as a single copy in most species, except for multiple copies in P. abies, S. moellendorffii, and A. thaliana. Notably, G1-G3 proteins were all found in M. viride (Figure 2 and Supplementary Figure S2), suggesting that all three groups have an ancient origin.
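Inferences such as "multiple losses of G2" can be made explicit with Dollo parsimony, which assumes that a group arose only once and counts the minimum number of losses needed to explain its absences. The Python sketch below is an illustration under stated assumptions: the nested-tuple species tree encodes a standard topology for the 11 species (chlorophytes sister to streptophytes, M. viride earliest diverging among streptophytes, followed by K. flaccidum, the bryophytes, the lycophyte, and the seed plants), and the tip names are shorthand invented for this example. The gain is placed at the smallest clade containing every carrier, and one loss is counted per maximal clade that lacks the group.

```python
# Assumed species tree (nested tuples); tip names are shorthand for this sketch.
SPECIES_TREE = (
    ("C_reinhardtii", "V_carteri"),                    # chlorophytes
    ("M_viride",                                       # earliest-diverging streptophyte
     ("K_flaccidum",
      (("P_patens", "M_polymorpha"),                   # bryophytes
       ("S_moellendorffii",                            # lycophyte
        ("P_abies",                                    # gymnosperm
         ("A_trichopoda", ("O_sativa", "A_thaliana"))))))),
)


def leaves(node):
    """Set of tip names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def gain_node(node, present):
    """Dollo assumption: single gain at the smallest clade holding all carriers."""
    if not isinstance(node, str):
        for child in node:
            if present <= leaves(child):
                return gain_node(child, present)
    return node


def count_losses(node, present):
    """One loss per maximal clade whose tips all lack the group."""
    if not (leaves(node) & present):
        return 1
    if isinstance(node, str):
        return 0
    return sum(count_losses(child, present) for child in node)


def dollo_losses(tree, present):
    """Minimum number of losses for a group carried by the taxa in `present`."""
    present = set(present)
    if not present:
        raise ValueError("at least one carrier taxon is required")
    if not (present <= leaves(tree)):
        raise ValueError(f"unknown taxa: {sorted(present - leaves(tree))}")
    return count_losses(gain_node(tree, present), present)


if __name__ == "__main__":
    # ABCA G2, treating the four species named in the text as the only carriers.
    g2_carriers = {"M_viride", "K_flaccidum", "S_moellendorffii", "A_thaliana"}
    print(dollo_losses(SPECIES_TREE, g2_carriers))  # 4
```

Under that carrier assumption (the text names the absences only as "many species, including rice and early land plants"), the gain falls at the streptophyte ancestor and the four losses map to the bryophyte lineage, P. abies, A. trichopoda, and rice. A different carrier set or tree topology would change the count.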

ABCC Subfamily
ABCC subfamily proteins, also known as multidrug resistance-associated proteins (MRPs), are full-size transporters in plants, and most of them contain an additional N-terminal hydrophobic region [11,49]. In plants, ABCCs are localized to the vacuolar membrane and plasma membrane [18,50]. They transport not only plant-derived compounds but also chlorophyll degradation metabolites and phytochelatins [50,51]. Accordingly, ABCCs have so far been described as transporters involved in internal detoxification [15]. Our phylogenetic analysis classified ABCCs into four groups, G1-G4 (Figure 4 and Supplementary Figure S4). The presence of ABCCs in green algae allowed us to identify the ancestral forms. Green algal ABCCs were found in G3 and G4, and the chlorophytes C. reinhardtii and V. carteri were represented only in G4. Because all G1 and G2 sequences came from land plants (Figure 4 and Supplementary Figure S4), G3 and G4, particularly G4, appear to be more ancient, and G1 and G2 may have evolved after plants colonized land, or green algae may have lost them during evolution. Interestingly, G4 contained nine ABCC copies in M. viride, the highest copy number among all species, suggesting that ABCC duplication occurred at the base of Charophyta. Consistently, we observed multiple copies of ABCCs in early land plants (Figure 4 and Supplementary Figure S4). However, the copy number did not increase markedly in seed plants, which have generally undergone whole genome duplication (WGD), suggesting possible purifying selection on ABCCs during evolution.

ABCD Subfamily
The ABCD subfamily contains predominantly half-size proteins with the orientation TMD-NBD. ABCDs, also known as PMPs (peroxisomal membrane proteins), are associated with the peroxisomal import of fatty acids [52][53][54][55]. The phylogenetic analysis identified three groups, G1-G3 (Figure 5 and Supplementary Figure S5). Proteins from M. viride and chlorophyte green algae were observed in all groups, indicating the ancient origin of ABCDs. Every species examined contained half-size G1 and full-size G3 proteins. Interestingly, the copy number increased during the transition from water to land and subsequently decreased during the transition to flowering plants (Figure 5 and Supplementary Figure S5). G2 proteins were found in most species except A. thaliana, O. sativa, and P. abies, suggesting that G2 was lost in flowering plants after the divergence of the basal angiosperm lineage, and separately in P. abies.

ABCE and ABCF Subfamilies
Atypical ABC proteins are present in the ABCE and ABCF subfamilies, which lack TMDs and consist of two NBDs [4,13]. ABCE was first identified as an RNase L inhibitor (RLI) in Homo sapiens and is a highly evolutionarily conserved protein [56]. In plants, ABCE is the smallest of the ABC subfamilies (Figure 1), and ABCEs were found in all species, including M. viride (Figure 6 and Supplementary Figure S6). Most species contain 1-3 copies; the exception is P. abies, which has seven ABCEs. This suggests that ABCE function is highly conserved and possibly redundant in only a few species.
In humans and yeast, ABCF proteins are involved in ribosome assembly and protein translation [57]. Our phylogenetic analysis classified the plant ABCFs into five groups, G1-G5 (Figure 7 and Supplementary Figure S7). The presence of green algal ABCFs suggests that all groups have an ancient origin. All species contained G1-G3 and G5 proteins, whereas G4 included only green algae and P. patens (Figure 7 and Supplementary Figure S7), suggesting that G4 was lost in most land plants. Notably, no group showed a large expansion of ABCFs: every species contained 1-3 copies per group (Figure 7 and Supplementary Figure S7).

ABCG Subfamily
ABCG is the largest ABC transporter subfamily in plants and has a reverse (NBD-TMD) domain architecture [6,7]. ABCGs have been reported to transport a wide range of compounds in plants [6], including secondary metabolites, compounds involved in detoxification, hormones, and lipids [16,58,59,60,61]. The plant ABCGs include two major groups: the white-brown complex (WBC) proteins, named after the Drosophila melanogaster white and brown proteins, which are half-size ABCGs; and the pleiotropic drug resistance proteins (PDRs), named after the yeast prototype, which form a large group of full-size ABCGs [62,63]. Our phylogenetic analysis revealed six WBC groups, W1-W6, and one PDR group (Figure 8 and Supplementary Figure S8). Among the WBC groups, M. viride proteins were found in W1-W3, W5, and W6, and the copy number of ABCGs was greatly expanded in multicellular plants. In W3, we observed 11 ABCGs in S. moellendorffii, the largest number in any WBC group (Figure 8). The PDR group was strongly supported by bootstrap analysis, suggesting that PDRs evolved from a full-size common ancestor. M. viride contained a single PDR, whereas in the other species PDRs were greatly expanded, even more so than WBCs: the number of PDRs ranged from nine in K. flaccidum to 22 in S. moellendorffii (Figure 8 and Supplementary Figure S8). Based on the topology and bootstrap support, the PDR group contains six smaller clusters, P1-P6, whose copy numbers vary between species. For example, most rice PDRs fell in P1, and most P. patens PDRs fell in P6 (Figure 8 and Supplementary Figure S8), suggesting that duplications may have occurred within specific species.

ABCI Subfamily
The ABCH subfamily is found in animals but not in plants. Instead, plants contain a group of non-intrinsic ABC proteins, named ABCIs, which contain only NBDs [11,18,64,65]. The phylogenetic analysis classified the ABCIs into 14 groups, G1-G14, with high bootstrap support (Figure 9 and Supplementary Figure S9). M. viride ABCIs were found in 11 of the 14 groups and were absent only from G2, G5, and G12, suggesting that most groups have a deep origin and that the expansion of ABCIs may have occurred in the common ancestor. In contrast to the high copy numbers and frequent duplications in other subfamilies, the species examined usually had one or two ABCIs per group; the highest copy number, three, was found in Arabidopsis for G6 and G12 and in P. abies for G2. Notably, G5 and G12 ABCIs were not identified in many species, which may indicate multiple gene loss events during evolution.

Conclusions
As key players in plant growth and development, ABC transporters are of interest for their potential applications in agriculture. Here, we report the identification and evolution of ABCs in Mesostigma viride. Our comprehensive and updated phylogenetic analysis provides new insights into the evolutionary mechanisms underlying the origin and expansion of plant ABC transporters and a valuable resource for investigating the physiological functions of ABC genes.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cimb44040112/s1, Figure S1: Expression profiles of ABC genes in M. viride. The gene IDs are shown on the right. The different environmental conditions used for expression analysis are indicated at the bottom of each column. Grey color indicates no expression. Figure S2: Phylogenetic analysis of ABCA subfamily proteins from 11 evolutionarily representative plant species. A Maximum Likelihood (ML) tree was generated using IQ-tree with 1000 bootstrap replicates, and bootstrap values >50% are shown on the branches. The M. viride proteins are highlighted by triangles. Figure S3: Phylogenetic analysis of ABCB subfamily proteins from 11 evolutionarily representative plant species. A Maximum Likelihood (ML) tree was generated using IQ-tree with 1000 bootstrap replicates, and bootstrap values >50% are shown on the branches. The M. viride proteins are highlighted by triangles. Figure S4: Phylogenetic analysis of ABCC subfamily proteins from 11 evolutionarily representative plant species. A Maximum Likelihood (ML) tree was generated using IQ-tree with 1000 bootstrap replicates, and bootstrap values >50% are shown on the branches. The M. viride proteins are highlighted by triangles. Figure S5: Phylogenetic analysis of ABCD subfamily proteins from 11 evolutionarily representative plant species. A Maximum Likelihood (ML) tree was generated using IQ-tree with 1000 bootstrap replicates, and bootstrap values >50% are shown on the branches. The M. viride proteins are highlighted by triangles. Figure S6: Phylogenetic analysis of ABCE subfamily proteins from 11 evolutionarily representative plant species. A Maximum Likelihood (ML) tree was generated using IQ-tree with 1000 bootstrap replicates, and bootstrap values >50% are shown on the branches. The M. viride proteins are highlighted by triangles.
Figure S7: Phylogenetic analysis of ABCF subfamily proteins from 11 evolutionarily representative plant species. A Maximum Likelihood (ML) tree was generated using IQ-tree with 1000 bootstrap replicates, and bootstrap values >50% are shown on the branches. The M. viride proteins are highlighted by triangles. Figure S8: Phylogenetic analysis of ABCG subfamily proteins from 11 evolutionarily representative plant species. A Maximum Likelihood (ML) tree was generated using IQ-tree with 1000 bootstrap replicates, and bootstrap values >50% are shown on the branches. ABCGs include two major groups: the white-brown complex (WBC), and pleiotropic drug resistance proteins (PDRs). The M. viride proteins are highlighted by triangles. Figure S9: Phylogenetic analysis of ABCI subfamily proteins from 11 evolutionarily representative plant species. A Maximum Likelihood (ML) tree was generated using IQ-tree with 1000 bootstrap replicates, and bootstrap values >50% are shown on the branches. The M. viride proteins are highlighted by triangles.
Author Contributions: X.G. contributed to the conception and design of the work. X.G. performed data collection. X.G. and S.W. conducted the statistical analyses. X.G. wrote the draft of the manuscript. S.W. made critical revisions to the manuscript. All authors have read and agreed to the published version of the manuscript.