Genome-Wide Identification of Soybean ABC Transporters Relate to Aluminum Toxicity

ATP-binding cassette (ABC) transporter proteins are a gene super-family in plants and play vital roles in growth, development, and response to abiotic and biotic stresses. The ABC transporters have been identified in crop plants such as rice and buckwheat, but little is known about them in soybean. Soybean is an important oil crop and is one of the five major crops in the world. In this study, 255 ABC genes that putatively encode ABC transporters were identified from soybean through bioinformatics and then categorized into eight subfamilies, including 7 ABCAs, 52 ABCBs, 48 ABCCs, 5 ABCDs, 1 ABCEs, 10 ABCFs, 111 ABCGs, and 21 ABCIs. Their phylogenetic relationships, gene structure, and gene expression profiles were characterized. Segmental duplication was the main reason for the expansion of the GmABC genes. Ka/Ks analysis suggested that intense purifying selection was accompanied by the evolution of GmABC genes. The genome-wide collinearity of soybean with other species showed that GmABCs were relatively conserved and that collinear ABCs between species may have originated from the same ancestor. Gene expression analysis of GmABCs revealed the distinct expression pattern in different tissues and diverse developmental stages. The candidate genes GmABCB23, GmABCB25, GmABCB48, GmABCB52, GmABCI1, GmABCI5, and GmABCI13 were responsive to Al toxicity. This work on the GmABC gene family provides useful information for future studies on ABC transporters in soybean and potential targets for the cultivation of new germplasm resources of aluminum-tolerant soybean.


Introduction
ATP-binding cassette (ABC) transporters are an ancient and huge transmembrane protein family that widely exist in natural organisms, and they have attracted extensive attention due to their diverse biological functions. ABC proteins release energy mainly by binding and hydrolyzing ATP, leading to conformational changes of proteins, thus realizing the transmembrane transport of a wide array of substrates, including carbohydrates, lipids, peptides, terpenes, cell metabolites, heavy metal chelates, and metal ions [1].
ABC transporters generally contain two domains: hydrophobic transmembrane domains (TMDs) containing four to six transmembrane helices and nucleotide-binding domains (NBDs) containing several highly conserved characteristic motifs, such as Walker A motif [GXXGXGKS/T], Walker B motif [(RK)X 3 GX 3 L(hydrophobic) 3 ], ABC signature motif, the Q-loop, and the H-loop [2]. TMDs serve as recognition agents and channels for substrate transport lipid bilayers, whereas NBDs provide energy for substrate transport or non-transport processes through ATP binding and ATP hydrolysis.
On the basis of the phylogenetic analysis of ABC transporters, the similarity of conserved sequences, and the organization of their domains, ABC transporters are divided into eight subfamilies: ABCA-ABCG and ABCI. To date, the number of ABC transporter genes identified in plants is much higher than that of animals and microorganisms [3].

Genome-Wide Identification of Soybean ABC Transporters
A total of 255 GmABC protein sequences were identified in the genome of soybean and then classified into eight subfamilies. These genes were denoted as GmABCA1-GmABCA7, GmABCB1-GmABCB52, GmABCC1-GmABCC48, GmABCD1-GmABCD5, GmABCE1, GmABCF1-GmABCF10, GmABCG1-GmABCG111, and GmABCI1-GmABCI21, according to their position in the chromosome (Figure 1). Basic information on the GmABC genes based on their subfamilies, including gene ID number, number of amino acid (aa) residues, domain, gene position, molecular weight (kDa), isoelectric point (pIs), and subcellular localization, is shown in Table S1. The proteins ranged from 127 to 1897, and molecular weight varied from 13.51 kDa to 211.71 kDa. Their pIs ranged from 5.10 to 9.88. Approximately 70% of GmABC proteins was predicted to be located in the plasma membrane, and many of them were in the chloroplast and cytoplasm. A few of them were the mitochondrion, vacuole, nucleus, endoplasmic reticulum, and Golgi apparatus.

Genome-Wide Identification of Soybean ABC Transporters
A total of 255 GmABC protein sequences were identified in the genome of soybean and then classified into eight subfamilies. These genes were denoted as GmABCA1-GmABCA7, GmABCB1-GmABCB52, GmABCC1-GmABCC48, GmABCD1-GmABCD5, GmABCE1, GmABCF1-GmABCF10, GmABCG1-GmABCG111, and GmABCI1-GmABCI21, according to their position in the chromosome (Figure 1). Basic information on the GmABC genes based on their subfamilies, including gene ID number, number of amino acid (aa) residues, domain, gene position, molecular weight (kDa), isoelectric point (pIs), and subcellular localization, is shown in Table S1. The proteins ranged from 127 to 1897, and molecular weight varied from 13.51 kDa to 211.71 kDa. Their pIs ranged from 5.10 to 9.88. Approximately 70% of GmABC proteins was predicted to be located in the plasma membrane, and many of them were in the chloroplast and cytoplasm. A few of them were the mitochondrion, vacuole, nucleus, endoplasmic reticulum, and Golgi apparatus.

Phylogenetic Analysis of the GmABC Family
To better understand the evolutionary relationship of the GmABC proteins further, we constructed a phylogenetic tree by using the GmABCs' amino acid sequences and the 77 ABC transporters in plants with known functions, which had been previously reported ( Figure 2, Table S2). Similar to those in Arabidopsis and rice, the GmABC proteins in soybean were also divided into eight subfamilies (ABCA-ABCG and ABCI) ( Figure 2, Table S1). Among the 255 GmABC proteins, subfamily ABCG ranked first and accounted for 43.5%; ABCB and ABCC came in second, and ABCE was the least common. Subfamily ABCG was divided into two clades: clade PDR (36 members) and clade WBC (75 members); subfamily ABCB was divided into three clades: clade MDR, clade ATM, and clade TAP, containing 43, 2, and 7 members, respectively ( Figure 2). In summary, this classification is consistent with the perspective of phylogenetic branch and classification criteria in other species.

Phylogenetic Analysis of the GmABC Family
To better understand the evolutionary relationship of the GmABC proteins further, we constructed a phylogenetic tree by using the GmABCs' amino acid sequences and the 77 ABC transporters in plants with known functions, which had been previously reported ( Figure 2, Table S2). Similar to those in Arabidopsis and rice, the GmABC proteins in soybean were also divided into eight subfamilies (ABCA-ABCG and ABCI) ( Figure 2, Table  S1). Among the 255 GmABC proteins, subfamily ABCG ranked first and accounted for 43.5%; ABCB and ABCC came in second, and ABCE was the least common. Subfamily ABCG was divided into two clades: clade PDR (36 members) and clade WBC (75 members); subfamily ABCB was divided into three clades: clade MDR, clade ATM, and clade TAP, containing 43, 2, and 7 members, respectively ( Figure 2). In summary, this classification is consistent with the perspective of phylogenetic branch and classification criteria in other species.

Gene Structure and Conserved Motif Analysis
The gene structure and conserved motif organization of GmABC were analyzed to characterize the structural diversity of GmABC genes further ( Figure 3 and Figure S1). We found that GmABCs showed diversification between and within subfamilies through gene structural analysis. In terms of the distribution of exons, most GmABC transporters had 1-27 exons, and numerous exons were observed in GmABCC23 (36) and GmABCA1 (40). The number of exons was highly conserved on the same branches, especially the nearby protein-coding genes that have the same gene structure. Similarly, the number of introns among the different family genes also varied markedly ( Figure 3). Specifically, 15 GmABC transporter genes, including GmABCC42, GmABCF7, GmABCF9, GmABCG81-GmABCG54, and GmABCG51-GmABCG99, only had one exon and no intron. Except for GmABCI18, GmABCI14, GmABCG85, GmABCA2, GmABCB18, GmABCC40, and GmABCC45, most GmABCs had UTR, including 97.25% (2248/255) of GmABCs.

Gene Structure and Conserved Motif Analysis
The gene structure and conserved motif organization of GmABC were analyzed to characterize the structural diversity of GmABC genes further (Figures 3 and S1). We found that GmABCs showed diversification between and within subfamilies through gene structural analysis. In terms of the distribution of exons, most GmABC transporters had 1-27 exons, and numerous exons were observed in GmABCC23 (36) and GmABCA1 (40). The number of exons was highly conserved on the same branches, especially the nearby protein-coding genes that have the same gene structure. Similarly, the number of introns among the different family genes also varied markedly ( Figure 3). Specifically, 15 GmABC transporter genes, including GmABCC42, GmABCF7, GmABCF9, GmABCG81-GmABCG54, and GmABCG51-GmABCG99, only had one exon and no intron. Except for GmABCI18, GmABCI14, GmABCG85, GmABCA2, GmABCB18, GmABCC40, and GmABCC45, most GmABCs had UTR, including 97.25% (2248/255) of GmABCs.  To gain the conserved domain of GmABCs, we analyzed them by using the MEME program. A total of 10 conserved motifs were predicted and comprised approximately 21-50 conserved amino acids ( Figure 4, Table S3). As shown in Figure S1, GmABCs in the same branches displayed a similar motif order and composition as the gene structure. Both of them further supported the classification of subfamilies. Motifs -2, -4, -6, and -10 were prevalent in most GmABCs except in some GmABCGs, GmABCB, GmABCD, GmABCE, and GmABCF, respectively; motifs -1, -4, and -5 were arranged in the same order within most GmABCCs and GmABCGs, followed by motifs 6 and -9 separately. Common motifs occur in most GmABCs, and some members in separate subfamilies also contain their unique motifs; for instance, motif -7 only existed in the GmABCBs; motifs -8 and -9 basically only existed in GmABCA, GmABCG, GmABCD, and GmABCG, respectively, whereas motif -3 was mainly found in GmABCA-GmABCD. Interestingly, some putative GmABCs (e.g., GmABCI20) did not contain the above domain but contained the typical domain of ABC transporters while containing dozens of amino acids. To gain the conserved domain of GmABCs, we analyzed them by using the MEME program. A total of 10 conserved motifs were predicted and comprised approximately 21-50 conserved amino acids ( Figure 4, Table S3). As shown in Figure S1, GmABCs in the same branches displayed a similar motif order and composition as the gene structure. Both of them further supported the classification of subfamilies. Motifs -2, -4, -6, and -10 were prevalent in most GmABCs except in some GmABCGs, GmABCB, GmABCD, GmABCE, and GmABCF, respectively; motifs -1, -4, and -5 were arranged in the same order within most GmABCCs and GmABCGs, followed by motifs 6 and -9 separately. Common motifs occur in most GmABCs, and some members in separate subfamilies also contain their unique motifs; for instance, motif -7 only existed in the GmABCBs; motifs -8 and -9 basically only existed in GmABCA, GmABCG, GmABCD, and GmABCG, respectively, whereas motif -3 was mainly found in GmABCA-GmABCD. Interestingly, some putative GmABCs (e.g., GmABCI20) did not contain the above domain but contained the typical domain of ABC transporters while containing dozens of amino acids.

Gene Duplication and Synteny Analysis
Gene duplication, especially tandem and segmental duplications, play an indispensable role in expanding the gene family during the evolutionary process. Thus, we analyzed the locations of GmABC duplicates in the soybean genome and observed four different types of gene duplications, including 17 proximal, 20 tandem, 41 dispersed, and 177 whole genome duplications (WGD) (segmental) ( Table S5). None of the genes was a singleton. Hence, segmental duplication was the main reason for the expansion of the GmABC transporter.

Gene Duplication and Synteny Analysis
Gene duplication, especially tandem and segmental duplications, play an indispensable role in expanding the gene family during the evolutionary process. Thus, we analyzed the locations of GmABC duplicates in the soybean genome and observed four different types of gene duplications, including 17 proximal, 20 tandem, 41 dispersed, and 177 whole genome duplications (WGD) (segmental) ( Table S5). None of the genes was a singleton. Hence, segmental duplication was the main reason for the expansion of the GmABC transporter.
A total of 125 GmABC gene pairs were detected, of which 15 pairs were tandem duplications, and 110 were segmental duplications (Figures 1 and 5, Table S6). Two groups, including six genes (GmABCB35, GmABCB36, and GmABCB37; GmABCG59, GmABCG60, and GmABCG61), each contained two tandem duplication gene pairs located on the same chromosome Gm08 and Gm19 and were adjacent to each other. Chromosomes Gm03, Gm08, Gm13, and Gm19, each contained three tandem duplication gene pairs, whereas Gm02, Gm12, and Gm15 all contained one (Figure 1). A total of 125 GmABC gene pairs were detected, of which 15 pairs were tandem duplications, and 110 were segmental duplications (Figures 1 and 5, Table S6). Two groups, including six genes (GmABCB35, GmABCB36, and GmABCB37; GmABCG59, GmABCG60, and GmABCG61), each contained two tandem duplication gene pairs located on the same chromosome Gm08 and Gm19 and were adjacent to each other. Chromosomes Gm03, Gm08, Gm13, and Gm19, each contained three tandem duplication gene pairs, whereas Gm02, Gm12, and Gm15 all contained one (Figure 1). To detect which selection type promoted GmABC transporter evolution, we calculated Ka/Ks values for GmABC genes in duplicated genes. In addition to the To detect which selection type promoted GmABC transporter evolution, we calculated Ka/Ks values for GmABC genes in duplicated genes. In addition to the GmABCA2/GmABCA5, GmABCA4/GmABCA5, and GmABCA5/GmABCA7 gene pairs, the Ka/Ks ratios of all GmABC gene pairs were less than 1, suggesting that intense purifying selection was accompanied by the evolution of GmABC genes (Table S6). In the GmABCA2/GmABCA5, GmABCA4/GmABCA5, and GmABCA5/GmABCA7 gene pairs, most of the synonymous mutation sites occurred. That is, the sequence divergence was large and the evolutionary distance was long. The divergence times of GmABC gene pairs were also evaluated. The divergence time of GmABC tandem duplication pairs ranged from 4.2 MYA to 120.5 MYA, with an average time of 28.8 MYA, whereas the segmental duplication pairs diverged between 5.3 and 106.6 MYA with an average time of 27.9 MYA (Table S6). In addition, the GmABCG77/GmABCG78 gene pair might have occurred at approximately 120.5 MYA, which is an ancient event. Since then, no tandem duplication event occurred in 65 MYA until 54.5 MYA. However, segmental duplication events arose nearly 13 MYA later than tandem duplication and occurred all the time. The result showed that tandem duplications have a relatively earlier origin (Table S6).
To better understand the evolutionary origin of the GmABC gene family further, we analyzed the genome-wide collinearity of soybean with three other representative species and constructed a collinearity plot ( Figure 6). A total of 88, 18, 98, and 7 collinear gene pairs were identified in A. thaliana, O. sativa, S. tuberosum, and Z. mays, respectively (Table S7). We also found the syntenic genes of GmABCs locating on almost all chromosomes of A. thaliana and S. tuberosum, but only on a few chromosomes of O. sativa and Z. mays. In addition, more collinearity blocks at the genome scale were found between soybean and A. thaliana and between soybean and S. tuberosum (Figure 6), indicating that the GmABCs have remained closely related to A. thaliana and S. tuberosum during evolution. The Ka/Ks ratios of the orthologous gene pairs were calculated and the Ka/Ks values of all ortholog pairs were below 0.5, confirming that the GmABC family across the species was under a stabilizing or strong purifying selection during its evolution (Table S7). Interestingly, several collinear relationships were found in different species, and some collinear gene pairs were available in dicotyledonous plants (soybean and A. thaliana/S. tuberosum) but absent in the monocotyledonous ones (sugarcane and O. sativa/Z. mays). For example, 74 GmABCs and 46 AtABCs had a collinear relationship, and one-to-one matches had the most relationships, such as GmABCG39/AT1G31770.1. Many-to-one cases and one-to-many matches, such as (GmABCG12, GmABCG17, GmABCG20)/ AT1G15520.1 and GmABCG54/ (AT2G47000.5, AT3G62150.2, AT4G01830.1), also existed (Table S7), indicating that GmABCs were relatively conserved and that collinear GmABCs between species may have originated from the same ancestor.

Expression Pattern of GmABC Genes in Different Tissues
To better understand the biological functions of the identified genes further, we downloaded the transcriptional data of fifteen soybean tissues from a publicly available gene expression browser website and analyzed the expression patterns (Table S8). As shown in Figure 7, the GmABCs exhibited different expression levels in the suspensor, cotyledon, hypocotyls, embryo, endosperm, seed coat, seedling, root, shoot, leaves, nodule, flower, pod, seed, and callus. For example, a total of seven genes in GmABCA were all expressed in the suspensor and seed coat, among which GmABCA1 and GmABCA5 were expressed in all tissues, indicating that the GmABCA members might have multiple biological functions. GmABCB, GmABCC, and GmABCG exhibited that some members were expressed in almost all tissues, and some were almost not expressed. However, almost all the members of GmABCD, GmABCE, GmABCF, and GmABCI showed relatively high expression in all tissues, especially GmABCE and GmABCF. Most homologous gene pairs broadly exhibited similar expression patterns, but there were some exceptions. For example, the gene pair GmABCG10/GmABCG27 exhibited no expression in all tested tissues, and the gene pair GmABCG10/ GmABCG18, GmABCG18 exhibited relatively high expression in almost all tissues (Figure 7). Among the GmABC genes examined, 51 were expressed in all fifteen tissues, 29 were not expressed in the tissue samples, and 18 were expressed only in the single tissue that showed tissue specificity (Tables S8 and S9, Figure  8A,B).

Expression Pattern of GmABC Genes in Different Tissues
To better understand the biological functions of the identified genes further, we downloaded the transcriptional data of fifteen soybean tissues from a publicly available gene expression browser website and analyzed the expression patterns (Table S8). As shown in Figure 7, the GmABCs exhibited different expression levels in the suspensor, cotyledon, hypocotyls, embryo, endosperm, seed coat, seedling, root, shoot, leaves, nodule, flower, pod, seed, and callus. For example, a total of seven genes in GmABCA were all expressed in the suspensor and seed coat, among which GmABCA1 and GmABCA5 were expressed in all tissues, indicating that the GmABCA members might have multiple biological functions. GmABCB, GmABCC, and GmABCG exhibited that some members were expressed in almost all tissues, and some were almost not expressed. However, almost all the members of GmABCD, GmABCE, GmABCF, and GmABCI showed relatively high expression in all tissues, especially GmABCE and GmABCF. Most homologous gene pairs broadly exhibited similar expression patterns, but there were some exceptions. For example, the gene pair GmABCG10/GmABCG27 exhibited no expression in all tested tissues, and the gene pair GmABCG10/ GmABCG18, GmABCG18 exhibited relatively high expression in almost all tissues (Figure 7). Among the GmABC genes examined, 51 were expressed in all fifteen tissues, 29 were not expressed in the tissue samples, and 18 were expressed only in the single tissue that showed tissue specificity (Tables S8 and S9, Figure 8A,B).   GmABCs are expressed in each tissue (suspensor, cotyledon, hypocotyls, embryo, endosperm, seed coat, seedling, root, shoot, leaves, nodule, flower, pod, seed, and callus). Each color represents a tissue, and its length refers to the number of expressed genes. The black dots indicate the number of genes expressed in each tissue. The only dot (red) represents the number of genes unique to this tissue, and two or more dots connected by a line represent the number of genes expressed in several tissues. The green, blue, golden yellow, and pink lines connecting the dots represent the number of genes expressed in two, three, four, and five tissues, respectively. The dark line represents the number of genes expressed in at least six tissues. (B) The Venn diagram shows the specific expressed GmABCs in different tissues of soybean.

Expression of GmABC Genes in Response to Al Toxicity
According to the phylogenetic tree, we chose GmABCB23, GmABCB25, GmABCB48, GmABCB52, GmABCI1, GmABCI5, and GmABCI13 for further gene expression analysis by qRT-PCR (Figure 9). These genes were chosen because they were on the same evolutionary branch as genes with known functions related to aluminum toxicity. Therefore, we analyzed and then performed qRT-PCR to identify their expression patterns related to aluminum toxicity. After treatment, the expression of all selected genes was induced by Al in roots, but the induced expression was different. GmABCB25 and GmABCB52 were upregulated; GmABCB25, GmABCB48, and GmABCI1 were first upregulated and then downregulated; GmABCB23 and GmABCI13 were first upregulated and then downregulated, and finally recovered to a high level (Figure 9). The roots of soybean exposed to a 0.5 mM CaCl 2 solution (pH 4.5) containing 50 µM AlCl 3 for 0, 3, 6, 12, and 24 h were used for analysis. The expression was determined by qRT-PCR and β-tubulin was used as an internal control. The treatment without AlCl 3 (0 µM AlCl 3 ) was used for control. All data shown are means ± SD of three biological replicates. Bars with different lowercase letters indicate significant differences (p < 0.05).

Discussion
The ABC transporter gene family is one of the largest and oldest known protein families. In plants, ABC transporters have many types, complex structures, and diverse functions, which are involved in all the life activities of plants. In previous studies, the ABC transporter gene family has been systematically analyzed in multiple plants and has been found to play vital roles in the growth, development, and response to abiotic and biotic stresses of plants [6,[40][41][42][43]. Here, we performed a comprehensive analysis of the GmABC genes and provided a reference for further exploration in soybean.
In our study, a total of 255 putative GmABC genes were identified in soybean via genome-wide analysis, accounting for 0.296% of the total 86,247 annotated genes of its reference sequence (Table S1) , indicating that the number of ABC transporters is inconsistent with their genome sizes. However, ABC transporter numbers in soybean are higher than those in all the other known species. Soybean has undergone several ancestral WGD events in the long history of evolution, and~60% sequences are repetitive. Approximately 75% of the genes are multi-copy genes, and~25% of soybean genes have experienced two times of polyploidization, and their structure and function are highly conserved [39,44,45], probably leading to the greater number of ABC genes in soybean compared with other species. During the evolutionary history of the GmABC gene family, gene diversification resulted in huge differences in the length and type of amino acid. Thus, the protein characteristics are remarkably different, including the MWs, isoelectric points and localization. For example, most GmABC proteins are predicted to be located in the plasma membrane, followed by the chloroplast (Table S1).
The ten conserved motifs of GmABC proteins were also identified and analyzed ( Figure 4 and Figure S1, Table S3) and were found to be basically the same as that of Brassica rapa, tomato, and flax [6,46,47]. This result indicates that GmABCs may have similar functional characteristics. At the same time, we found that each subfamily has unique motifs, which were in a different order according to the analysis of the motif position of GmABC proteins ( Figure S1). This finding suggests that each subfamily has specific functions [48]. Putative functions of conversed motifs were shown in Table S3 by ScanProsite analysis, and basically all of them were related to the ABCB and ABCG subfamily. Some GmABCs have no conserved motifs and only contain dozens of amino acids coding for the typical domain of ABC transporters, indicating that the GmABC family may be expanding and is possibly caused by WGD. In the evolutionary history, many higher plants underwent several WGD events [44] and polyploidization. For species that experienced WGDs, the subsequent WGD event would accelerate the gene loss of the previous WGD. The genes retained after WGDs could explain how the duplicated genes generated by WGD contributed to the organism's evolution and enhanced their adaption to the environment [49][50][51][52]. The dozens of amino acids coding for the typical domain of ABC transporters may provide some clues to the origin and evolution of the GmABC gene family.
WGDs, which serve as evolutionary switches, play vital roles in the development of different plant species and their adaptation to various environments. Moreover, gene duplications are an important source of power in gene diversification and genomic evolution [53,54]. Similar to various plants [47,[55][56][57], WGD/segmental duplications (69.41%) contribute the most to the expansion of GmABC gene families in soybean (Table S5), suggesting the high conservation of the GmABC family. Synteny analysis results showed that the Ks values and the number of homologous gene pairs between dicotyledons were much higher than that in monocotyledons (Table S7). The same is true for collinearity blocks ( Figure 6). The above results indicated that a close relationship with ABC genes existed among dicotyledonous plants and supported the view that dicotyledons were more ancient than monocotyledons. It also suggested that the ABC family mainly expanded during the evolution of dicotyledons. The gene duplication results revealed that 125 GmABC gene pairs were duplicated genes. We found that the span of the Ks value and the number of segmental duplicates (0.07-1.38, 110) were larger than those in tandem duplicates (0.05-1.57, 15) ( Table S6), indicating that the segmental duplication events ran through the whole evolutionary history of soybean. It also showed that the GmABC expansion originated from different periods over the course of soybean evolution through the evaluated divergence times of the duplication events (Table S6). Syntenic analysis and gene duplication authenticated that GmABC genes underwent strong purifying selection. Our results are close to those in other plants [47,58], indicating that GmABC genes are relatively conservative in the process of evolution and maintain functional stability. According to the expression profiles of GmABCs in different tissues, the majority of GmABCs differed in their expression patterns, and the minority displayed similarities (Figure 7). These phenomena further explained the functional diversity and expression-site specificity of GmABC genes. Therefore, further experiments are needed to obtain whether functional differentiation occurred.
Many reported studies have shown that ABC transporters are involved in abundant functions, including the transport of secondary metabolites, antibiotics, drugs, carbohydrates, lipids, and ions, as well as heavy metal detoxification [59][60][61][62][63][64]. Aluminum toxicity, which is a global agricultural problem, can limit crop productivity by inhibiting root growth. In recent years, the progress of genome sequence research has provided convenience for the cloning and identification of genes related to Al tolerance in plants [26,[65][66][67], e.g., AtALS3, AtSTAR1, OsALS1, FeALS1.1, FeALS1.2, FeSTAR2, FeSTAR1, OsSTAR1, and Os-STAR2 [32,33,36,37]. Taken together, these research results indicate that ABC transporters play important roles in plant resistance to Al toxicity. However, whether this internal detoxification mechanism exists in soybean and which gene is responsible for this mechanism have yet to be investigated. The soybean GmABCs' functions could be inferred on the basis of phylogenetic trees of GmABCs and 77 known plant ABC transporters (Figure 2). For example, GmABCB44, GmABCB45, and three known ABC transporter proteins (i.e., AtABCB23, AtABCB24, and AtABCB25, all of which are known as modulating Fe-S cluster biogenesis) [68] are located on the same branch of the phylogenetic tree ( Figure 2, Table S2), suggesting that GmABCB44 and GmABCB45 may also be involved in modulating Fe-S cluster biogenesis. Thus, we selected GmABCB23, GmABCB25, GmABCB48, GmABCB52, GmABCI1, GmABCI5, and GmABCI13 as candidate genes ( Figure 2) according to previous genes related to aluminum detoxification [14,31,32,36,61]. We found that the aluminum toxicity considerably enhanced the candidate genes' expressions ( Figure 9), indicating that they might be involved in Al toxicity. GmABCB48, GmABCB52, OsALS1, and AtALS1, which are located in the same evolutionary branch, suggest that GmABCB48 and GmABCB52 are likely to be located in the tonoplast and are responsible for the sequestration of Al into the vacuoles [31,35,36]. GmABCI5 (homolog of AtSTAR1 and OsSTAR1) may interact with GmABCI1 or GmABCI13 (homolog of AtALS3 and OsSTAR2) and form an ABC transporter to regulate Al tolerance through the vesicular transport of UDP-glucose, which affects hemicellulose metabolism by regulating XET activity [32][33][34]. Whether they work similar to known genes must be determined for validation. Our research provides the potential roles of GmABC transporters in Al toxicity and a foundation for further functional characterization.

Identification of Putative ABC Proteins in Soybean
Information on the whole soybean genome sequence, protein sequence, and gene annotation (Wm82.a4.v1) was downloaded from the Phytozome v12.1 database (https: //phytozome.jgi.doe.gov/pz/portal.html, last accessed 12 April 2020). To identify soybean ABC protein candidates, the Hidden Markov Model (HMM) analysis was used for the search. From UniPort and RGAP, all ABC protein sequences of Arabidopsis and rice were downloaded as seed sequences to establish HMM profiles [69][70][71]. On the basis of this model, HMMER 3.0 software was used to search all ABC proteins in the soybean genome [72]. The identified ABC proteins from soybean were also filtered and validated by the presence of conserved domains (Pfam: PF00005; Pfam: PF01061; Pfam: PF00664; Pfam: PF01458; Pfam: PF02470) using the Pfam (http://pfam.xfam.org/, last accessed 8 May 2020) and the Simple Modular Architecture Research Tool (SMART, http://smart.embl-heidelberg.de/smart/batch.pl, last accessed 8 May 2020). A total of 255 soybean ABC proteins were identified.
The members of the soybean ABC family were named in accordance with their chromosome position. The number of amino acids, theoretical molecular weight (MW), and theoretical isoelectric point (pI) of the soybean ABC family were obtained by ExPASy [73]. Their subcellular localization was predicted by WoLFPSORT (https://www.genscript.co m/wolf-psort.html?src=leftbar, last accessed 12 June 2020). The Phytozome database could extract the information on ABC genes, including their chromosomal distribution and their start and end positions on the chromosomes.

Gene Structure, Conversed Motif, and Phylogeny Analysis
The software TBtools (ver 1.0692) was used to draw the gene structures by comparing the cDNA sequences and its corresponding genomic DNA sequences of ABC transporter members [74,75]. The amino acid sequences were analyzed by MEME Suite v5.2.0, and the conserved motifs were identified [76]. The parameters were set as the default value of the software, and the number of motifs was 10. The discovered motifs were searched in the Expasy-Prosite database with ScanProsite server (https://prosite.expasy.org/scanprosite/, last accessed 11 July 2020) [77].
All the protein sequences of 255 soybean ABC proteins and 77 previously reported ABC transporters from other plant species were used for multiple sequence alignments by MAFFT 7.0 (https://mafft.cbrc.jp/alignment/server/, last accessed 11 July 2020). The unrooted phylogenetic tree was then constructed by MEGA 7.0 using the ML algorithm with 1000 bootstraps. The result of the ML tree was viewed and edited by Evolview [78].

Chromosomal Locations, Gene Duplication, and Gene Collinearity Analysis
The chromosomal locations of GmABC genes were illustrated by MapInspect. Gene duplication events and genome collinearity were analyzed by the MCScanX program in TBtools and visualized in CIRCOS using default parameters. The syntenic relationship of ABC genes was determined by Dual Synteny Plotter software in TBtools. The putative duplication events were detected for the ABC genes. To evaluate the selection pressure, the ratios of non-synonymous (Ka) substitutions to synonymous (Ks) substitutions of each duplicated ABC gene were calculated by the NG method of TBtools. Ks values > 2.0 must be discarded to avoid the saturation of substitutions [79,80]. The occurrence time of duplicated GmABC gene pairs was calculated by "T = Ks/(2λ × 10 −6 )" (λ = 6.5 × 10 −9 ) [81].

Expression Pattern Analysis of the GmABC Genes
The expression data of all the 255 GmABC genes in fifteen different tissues (suspensor, cotyledon, hypocotyls, embryo, endosperm, seed coat, seedling, root, shoot, leaves, nodule, flower, pod, seed, and callus) of soybean were obtained from the web interface (http://ve nanciogroup.uenf.br/resources/, last accessed 23 August 2020) to detect the expression profile [82]. The gene expression was calculated and normalized in TPM (Transcripts Per Million) using StringTie (-e option) [67]. Lastly, all the median TPMs per tissue were used to make heatmaps through TBtools to investigate their expression patterns.

Plant Materials and QRT-PCR Analysis
The soybean (Glycine max L.) seeds from our lab were used for analysis and the growth conditions are previously described [11]. The uniform seedlings were selected and exposed to 0.5 mM CaCl 2 solution (pH 4.5) containing either 0 µM AlCl3 (control) or 50 µM AlCl 3 (treatment) for 3, 6, 12, and 24 h, respectively. Then, the collected root tips (1 cm) were quickly frozen in liquid nitrogen and stored at −80 • C for subsequent RNA extraction. The experiment was performed in triplicate.
All the samples' total RNA were extracted by using Trizol reagent (Invitrogen, Carlsbad, CA, USA), as described by the manufacturer's protocol. The cDNAs were synthesized by Prime Script reverse transcriptase (TaKaRa, Kusatsu, Shiga, Japan). The number of transcripts of the selected GmABCsand gene β-tubulin (internal control) were determined by using SYBR Premix Ex Taq™ II kit (TaKaRa, Kusatsu, Shiga, Japan). qPCR conditions were as follows: stage 1, 95 • C for 5 min; stage 2, 40 cycles of 10 s at 95 • C, 30 s at 60 • C; stage 3, 95 • C for 15 s, 60 • C for 60 s, 95 • C for 15 s. Stage 3 was for melting curve analysis. The formula 2 −∆∆CT was utilized to calculate the relative expression levels. The β-tubulin was used to normalize the expression levels of the genes detected in this study. The gene-specific primers used for this analysis are shown in Table S10.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/1 0.3390/ijms22126556/s1. Figure S1. Conserved motifs in GmABC proteins. The 10 conversed motifs were identified by MEME and displayed by 10 various different colors. Table S1. Basic information about the ABC transporter gene family in soybean. Table S2. The 77 known ABC transporters in plants. Table S3. Conserved motifs present in ABC transporter in soybean. Table S4. Chromosome distribution of GmABC superfamily genes in soybean genome. Table S5. The duplicated type of GmABC genes in soybean. Table S6. KaKs calculation and estimated divergence time for each pair of segmentally and tandemly duplicated GmABC gene pairs. Table S7. One-to-one orthologous relationships between G.max and other four plant species. Table S8. RNA-seq data of 242 soybean ABC genes in fifteen tissues as shown in median TPM (transcripts per million). Table S9. List of highly expressed ABC transporter genes in different tissues. Table S10. Primers used for QRT-PCR.
Author Contributions: J.H. and H.W. contributed to the conception of the study. X.L. and Y.G. performed the experiments. X.L., X.C., Y.G., and J.H. contributed significantly to analysis and manuscript preparation. X.L., X.C., Y.G., J.H., and H.W. performed the data analyses and wrote the manuscript. J.H., W.L., and H.W. helped perform the analysis with constructive discussions. All authors have read and agreed to the published version of the manuscript.
Funding: This work was financially supported by the National Natural Science Foundation of China (32071499; 31701508) and Program for Science & Technology Innovation Talents in Universities of Henan Province (20HASTIT038). These funding supported the study design, data collection and analysis, manuscript writing, and the open access payment.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The whole soybean genome sequence, protein sequence, and gene annotation information (Wm82.a4.v1) were available in the phytozome v12.1 database (https:// phytozome.jgi.doe.gov/pz/portal.html, last accessed 12 April 2020). The expression data of all the 255 GmABC genes in fifteen different tissues of soybean were obtained from the web interface (http://venanciogroup.uenf.br/resources/, last accessed 23 August 2020) and included in Table S7 of this article.