Genome-Wide Identiﬁcation, Characterization, and Expression Analyses of P-Type ATPase Superfamily Genes in Soybean

: P-type ATPases are transmembrane pumps of cations and phospholipids. They are energized by hydrolysis of ATP and play important roles in a wide range of fundamental cellular and physiological processes during plant growth and development. However, the P-type ATPase superfamily genes have not been characterized in soybean. Here, we performed genome-wide bioinformatic and expression analyses of the P-type ATPase superfamily genes in order to explore the potential functions of P-type ATPases in soybean. A total of 105 putative P-type ATPase genes were identiﬁed in the soybean genome. Phylogenetic relationship analysis of the P-type ATPase genes indicated that they can be divided into ﬁve subfamilies including P1B, P2A/B, P3A, P4 and P5. Proteins belonging to the same subfamily shared conserved domains. Forty-seven gene pairs were related to segmental duplication, which contributed to the expansion of the P-type ATPase genes during the evolution of soybean. Most of the P-type ATPase genes contained hormonal- and/or stress-related cis-elements in their promoter regions. Expression analysis by retrieving RNA-sequencing datasets suggested that almost all of the P-type ATPase genes could be detected in soybean tissues, and some genes showed tissue-speciﬁc expression patterns. Nearly half of the P-type ATPase genes were found to be signiﬁcantly induced or repressed under stresses like salt, drought, cold, ﬂooding, and/or phosphate starvation. Four genes were signiﬁcantly affected by rhizobia inoculation in root hairs. The induction of two P2B-ATPase genes, GmACA1 and GmACA2 , by phosphate starvation was conﬁrmed by quantitative RT-PCR. This study provides information for understanding the evolution and biological functions of the P-type ATPase superfamily genes in soybean.


Introduction
The P-type ATPases are cation and lipid pumps energized by hydrolysis of ATP. They were named P-type ATPases because they form a phosphorylated (hence P-type) reaction cycle intermediate during catalysis [1]. The ATP hydrolyzing machinery of P-type ATPases consists of three interconnected cytoplasmic domains: the phosphorylation (P), nucleotide binding (N), and actuator (A) domain. A conserved sequence motif, DKTGT, is contained in the P domain of P-type ATPases, where the Asp (D) residue is phosphorylated during catalysis [1]. The P-type ATPases form a large superfamily that can be divided into five distinct evolutionarily related subfamilies, P1-P5. Each of the subfamilies can be further divided into subgroups [2,3]. The major subgroups of P-type ATPases include P1A-ATPases and development, such as nutrient uptake and transportation, stomatal movement, cell elongation, and environmental stress tolerance [6,7,28,30,[34][35][36].
The P4-ATPases, also known as phospholipid flippases, are unique to eukaryotic organisms. They are present in the plasma membrane and membranes of the secretory pathway to generate phospholipid asymmetry, which is required for vesicle budding and fusion in the secretory pathway [4,37]. A characteristic feature of the P4-ATPase subfamily proteins is the high frequency of hydrophobic or aromatic residues within those transmembrane domains that are commonly involved in cation transport in other P-type ATPases [4]. Mutation of some of these residues causes changes in lipid specificity, but none of these residues seem to be involved in direct lipid binding [37,38]. There are 12 P4-ATPase subfamily proteins in Arabidopsis, designated as Aminophospholipid ATPase1 (ALA1) to ALA12 [12]. Ten ALA proteins were identified in rice [11]. Some Arabidopsis ALA proteins have been characterized to be phospholipid flippases and have diverse physiological functions [37]. For example, AtALA1 is involved in chilling tolerance [39]; AtALA3 is a broad-specificity phospholipid flippase localized in the Golgi apparatus, where it contributes to the formation of secretory vesicles [10]; AtALA10 transports up to six different phospholipids, including sphingomyelin, a ceramide-derived phospholipid [40]. In contrast to the well-known cation transporters of the P-type ATPases, the substrate of P5 ATPases is still unclear [5]. In both Arabidopsis and rice, there is only one P5 ATPase gene [11]. In addition to model plants Arabidopsis and rice, genes encoding P-type ATPase proteins have been identified at the whole-genome scale in some other plants [41,42].
Soybean is an important economic crop providing oil-and protein-rich food for humans around the world. The genome sequence of a cultivated soybean developed in the 1980s (Glycine max L. var. Williams 82) has been released ten years ago [43]. The reference genome has opened the door for soybean functional genomics [44,45]. The entire P-type ATPase superfamily in the whole genome of soybean has not been identified, and its roles in developmental regulation and environmental stress tolerance are largely unknown. In this study, we carried out a genome-wide search and characterized and analyzed all the putative P-type ATPase genes in the soybean genome. RNA-sequencing and microarray data were used to analyze their expression patterns in different tissues of soybean and their transcriptional responses to environmental stresses and rhizobia infection. Quantitative reverse-transcription polymerase chain reaction (qRT-PCR) was used to validate the expression of two genes in response to phosphorus deficiency. These results provide useful information for further functional studies on P-type ATPase genes and their possible roles in stress tolerance, nutrient utilization, and symbiosis.

Identification and Annotation of P-Type ATPase Genes in the Soybean Genome
Proteins encoding putative P-type ATPase proteins in the soybean genome were identified by searching the soybean genome database SoyBase (https://www.soybase.org/) by BLASTP with the representative P-type ATPase proteins (including all five subfamilies of P-type ATPases, P1B, P2A/B, P3A, P4, and P5) from Arabidopsis and rice. The default settings were used. The nucleotide and protein sequences were obtained from the SoyBase website (https://www.soybase.org/). Protein domains were analyzed by using HMMScan (https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan) [46] and the InterPro database (http://www.ebi.ac.uk/interpro/) [47]. The TMHMM2.0 program (http://www.cbs.dtu. dk/services/TMHMM/) was used to predict transmembrane helices [48]. Proteins that do not contain the characteristic domains and motifs of the P-type ATPase subfamilies were eliminated. Putative genes encoding P-type ATPase proteins were named according to their orthologues in Arabidopsis and their positions in the phylogenetic trees.

Phylogenetic Tree Construction and Chromosome Distribution
The full-length amino acid sequences of P-type ATPase proteins from soybean, Arabidopsis, and rice were retrieved from TAIR (http://www.arabidopsis.org/), RGAP Agronomy 2021, 11, 71 4 of 25 (http://rice.plantbiology.msu.edu/index.shtml), and SoyBase (https://www.soybase.org/), respectively. These sequences were aligned using ClustalW, and the neighbor-joining trees were constructed using the Molecular Genetics Analysis (MEGA) 6.0 with the "pairwise deletion" option and "Poisson correction" model [49]. Bootstrap replications were set to 1000 to evaluate the reliability of internal branches. Chromosomal locations of all the putative genes encoding P-type ATPase proteins were obtained from the SoyBase website (https://www.soybase.org/). The P-type ATPase genes were mapped on the 20 soybean chromosomes based on their physical positions on chromosomes.

Gene Duplication Analysis and Calculation of Ka/Ks Values
The duplication of putative P-type ATPase genes on segmentally duplicated regions was determined according to the results of phylogenetic trees and through searching the plant genome duplication database (PGDD) [50]. If the distance between two neighboring paralogous genes were less than 100 kb and separated by five or less genes, they were considered to be tandemly duplicated genes [51]. The history of selection performed on coding sequences can be measured by the Ka (non-synonymous substitution rates)/Ks (synonymous substitution rates) ratio, and the Ka/Ks can be used to identify pairwise combinations of genes, where encoded proteins may have changed functions [52]. The direction and magnitude of natural selection enforcing on the different genes could be interpreted by the Ka/Ks ratio. A pair of sequences having Ka/Ks < 1 indicates purifying selection; Ka/Ks = 1 implies both sequences drift neutrally; and Ka/Ks > 1 indicates positive or Darwinian selection [53]. The Ks and Ka of each gene pair was calculated by using TBtools [54]. The divergence time (T) was calculated by T = Ks/(2 × 6.1 × 10 −9 ) × 10 −6 MYA (million years ago), in which 6.1 × 10 −9 is divergence rate in millions of years translated from Ks value [53].

In Silico Expression Analysis of P-Type ATPase Genes in Soybean
The expression patterns of P-type ATPase genes in different tissues of soybean were retrieved from two published soybean RNA-seq datasets [55,56]. The normalized Illumina-Solexa reads number (The Reads/Kb/Million, RPKM) was log2-transformed and visualized in the heatmap, which was drawn by the MultiExperiment Viewer program (http://mev. tm4.org/). Microarray and high-throughput sequencing datasets of short-term salt stress (1 h, 6 h, and 12 h) in soybean roots [57], 1 d cold stress in soybean shoots [58], 4 d dehydration stress in soybean shoots [58], 7 d flooding stress in soybean leaves [59], 7 d drought stress in soybean leaves [59], and 7 d phosphate starvation stress in soybean roots [60], were acquired from published datasets. The published dataset of transcriptome response to symbiotic bacteria (Bradyrhizobium japonicum) in soybean root hairs was also retrieved [61].

Analysis of Cis-Acting Regulatory Elements
Promoter sequences of 1.5 kb upstream from the transcription start sites of putative genes encoding P-type ATPase proteins were obtained from the SoyBase database (http: //www.soybase.org/). The location of cis-acting regulatory elements was analyzed using Regulatory Sequence Analysis Tools (RSAT) (http://rsat.eead.csic.es/plants/) [62].

Plant Growth and qRT-PCR Analysis
Soybean seeds (Glycine max L. var. Williams 82) were germinated at room temperature in the dark between two layers of filter paper moistened with sterilized water. After four days, soybean seedlings were grown hydroponically in half-strength modified Hoagland nutrient solution, the pH of the nutrient solution was adjusted to 5.6, and the nutrient solution was changed every three days [63]. The seedlings were grown in a growth chamber with a photoperiod set at 16-h-light/8-h-dark at 26/22 • C and with a light intensity of 150 µmol m −2 s −1 . After cultivating for two weeks, soybean seedlings were used for phosphorus deficiency treatment. Seedlings were transferred into nor- For the phosphorus deficient treatment, we replaced KH 2 PO 4 with K 2 SO 4 to keep the concentration of K + . Roots and leaves of soybean seedlings were separately sampled after treatment of phosphorus deficiency for one and seven days. Quantitative RT-PCR (qRT-PCR) was performed on a real-time PCR system (CFX96 system, Bio-Rad Company, Hercules, California, USA) as described previously [64]. Relative expression levels were normalized to that of an internal control GmACTIN11 (Glyma.18g290800). The primers used for amplification were as follows: 5 -CGGTGGTTCTATCTTGGCATC-3 and 5 -GTCTTTCGCTTCAATAACCCTA-3 for GmACTIN11; 5 -TCGAGGCTTTCTTGGGTTGG-3 and 5 -GCAGTTTGCCATGTCTCAGTC-3 for GmACA1; and 5 -ACTACTTTTGGGGAAGT GGTAG-3 and 5 -CATTTGGCCACTGTTACTATCG-3 for GmACA2. The calculated efficiency (E) of these primers was all around 2.0.

Identification of Putative Genes Encoding P-Type ATPase Proteins in the Soybean Genome
To investigate the P-type ATPase gene superfamily in the soybean genome, the BLAST algorithm was used to search the soybean genome database in the SoyBase website (https://www.soybase.org/) using Arabidopsis and rice P-type ATPase protein sequences as the query. In total, 105 genes were identified to putatively encode P-type ATPase proteins (Table S1). These genes can be classified into five subfamilies, e.g., P1B, P2A/B, P3A, P4, and P5, based on their homologues in Arabidopsis and rice ( Figure 1 and Table S1). There were nineteen P1B genes, seven P2A genes, twenty-six P2B genes, twenty-four P3A genes, twenty-seven P4 genes, and two P5 genes identified in the soybean genome (Figures 1 and  2). The total number of P-type ATPase proteins in soybean is about 2.2 and 2.3 folds that in Arabidopsis and rice, respectively ( Figure 2).

P1B Subfamily Genes in Soybean
Twenty P1B subfamily genes have been identified previously in soybean and they were sequentially named GmHMA1 to GmHMA20 according to their chromosomal positions [65]. Here, we used P1B proteins from Arabidopsis and rice as queries to BLAST against the soybean genome database (SoyBase, https://www.soybase.org/). These 20 genes were all covered in the BLAST result, but the gene or protein length of these GmHMAs in the latest version of soybean genome annotation (Wm82.a4.v1) is different from the previous report (Wm82.a1.v1) [65]. For example, the protein length of GmHMA16 is 278 amino acids, while the length was 793 in the previous report. Because of the lack of characteristic domains of P1B ATPases, GmHMA16 was removed in the P1B subfamily members. The gene and protein sequences of the other 19 GmHMAs in the latest version were then used for further analysis. The coding region of these soybean P1B

P1B Subfamily Genes in Soybean
Twenty P1B subfamily genes have been identified previously in soybean and they were sequentially named GmHMA1 to GmHMA20 according to their chromosomal positions [65]. Here, we used P1B proteins from Arabidopsis and rice as queries to BLAST against the soybean genome database (SoyBase, https://www.soybase.org/). These 20 genes were all covered in the BLAST result, but the gene or protein length of these GmH-MAs in the latest version of soybean genome annotation (Wm82.a4.v1) is different from the previous report (Wm82.a1.v1) [65]. For example, the protein length of GmHMA16 is 278 amino acids, while the length was 793 in the previous report. Because of the lack of characteristic domains of P1B ATPases, GmHMA16 was removed in the P1B subfamily members. The gene and protein sequences of the other 19 GmHMAs in the latest version were then used for further analysis. The coding region of these soybean P1B ATPase genes was interrupted by 4 to 16 introns (Table S2). The protein length of GmHMAs ranged from 505 to 1096 amino acids ( Table S2). All of the GmHMA proteins were predicted to contain a E1-E2 ATPase domain ( Figure S1). Twelve GmHMA proteins were predicted to contain one to three heavy-metal-associated domains ( Figure S1). All GmHMA proteins were predicted to contain two to eight transmembrane helices by using the TMHMM2.0 program (http://www.cbs.dtu.dk/services/TMHMM/), with the exception that there was no transmembrane helix for GmHMA4 (Table S2), which is consistent with a previous report [65]. An unrooted phylogenetic analysis was conducted to explore the evolutionary relationships of HMAs in soybean, Arabidopsis and rice plants. The topology of the phylogenetic tree revealed that the P1B subfamily can be divided into six clades (I-VI) ( Figure 3). In each clade, genes from the three plant species were present, suggesting a common ancestor of the same clade in these plants.
the TMHMM2.0 program (http://www.cbs.dtu.dk/services/TMHMM/), with the exception that there was no transmembrane helix for GmHMA4 (Table S2), which is consistent with a previous report [65]. An unrooted phylogenetic analysis was conducted to explore the evolutionary relationships of HMAs in soybean, Arabidopsis and rice plants. The topology of the phylogenetic tree revealed that the P1B subfamily can be divided into six clades (I-VI) ( Figure 3). In each clade, genes from the three plant species were present, suggesting a common ancestor of the same clade in these plants. Figure 3. Phylogenetic tree of P1B family proteins from soybean (Glycine max), rice (Oryza sativa), and Arabidopsis thaliana: the full-length amino acid sequences were aligned by ClustalW. The neighbor-joining phylogenetic tree was constructed by MEGA6 with 1000 bootstrap replications. GmHMAs are denoted by red circles, AtHMAs are denoted by purple triangles, and OsHMAs are denoted by green diamonds. The locus name of a soybean gene is shown in brackets. Roman numerals designate the subfamilies.

P2 Subfamily Genes in Soybean
The P2 subfamily ATPases can be classified in two groups: P2A (ECA) and P2B (ACA). Previously, we identified 8 P2A subfamily genes and 25 P2B subfamily genes in the soybean genome [66]. The soybean P2 ATPase proteins were predicted to contain five  . Phylogenetic tree of P 1B family proteins from soybean (Glycine max), rice (Oryza sativa), and Arabidopsis thaliana: the full-length amino acid sequences were aligned by ClustalW. The neighborjoining phylogenetic tree was constructed by MEGA6 with 1000 bootstrap replications. GmHMAs are denoted by red circles, AtHMAs are denoted by purple triangles, and OsHMAs are denoted by green diamonds. The locus name of a soybean gene is shown in brackets. Roman numerals designate the subfamilies.

P2 Subfamily Genes in Soybean
The P2 subfamily ATPases can be classified in two groups: P2A (ECA) and P2B (ACA). Previously, we identified 8 P2A subfamily genes and 25 P2B subfamily genes in the soybean genome [66]. The soybean P2 ATPase proteins were predicted to contain five to ten transmembrane helices (Table S3). Domain prediction analysis showed that five domains characteristic for P2 subfamily ATPase, such as N-terminal autoinhibitory domain, N-terminus cation transporter/ATPase, E1-E2 ATPase, haloacid dehalogenase-like hydrolase, and C-terminus cation transporter/ATPase, were predicted to exist in most of the soybean P2 subfamily proteins [66]. Most of the soybean P2B proteins were predicted to contain an N-terminal autoinhibitory domain [66]. A phylogenetic tree was constructed using deduced amino acid sequences to analyze the evolutionary relatedness among the members of P2 subfamily from soybean, rice, and Arabidopsis ( Figure 4). The phylogenetic analysis suggested that the P2 subfamily formed five distinct clades; P2B can be divided into four clades, i.e., P2B-I, P2B-II, P2B-III, and P2B-IV ( Figure 4). Gene members in the same clade have a high degree of evolutionary relatedness in these three plant species, suggesting the common ancestry of P2 type ATPases in these plants.
using deduced amino acid sequences to analyze the evolutionary relatedness among the members of P2 subfamily from soybean, rice, and Arabidopsis ( Figure 4). The phylogenetic analysis suggested that the P2 subfamily formed five distinct clades; P2B can be divided into four clades, i.e., P2B-I, P2B-II, P2B-III, and P2B-IV ( Figure 4). Gene members in the same clade have a high degree of evolutionary relatedness in these three plant species, suggesting the common ancestry of P2 type ATPases in these plants. . Phylogenetic tree of P2 subfamily proteins from soybean (Glycine max), rice (Oryza sativa), and Arabidopsis thaliana: the full-length amino acid sequences were aligned by ClustalW. The neighbor-joining phylogenetic tree was constructed by MEGA6 with 1000 bootstrap replications. Soybean genes are denoted by red circles, Arabidopsis genes are denoted by purple triangles, and rice genes are denoted by green diamonds. The locus name of a soybean gene is shown following the gene name. The P2 subfamily is divided into six clades, i.e., P2A, and P2B-I to P2B-IV.

P3 Subfamily Genes in Soybean
A total of 24 putative genes encoding PM H + -ATPases which belong to the P3 subfamily were identified in soybean (Table S4). These genes were designated as GmHA1 to GmHA24. The intron number of GmHA genes varied from 8 to 21, and the protein length of GmHAs ranged from 888 to 1017 amino acids (Table S4). All of the GmHA proteins were predicted to contain 6 to 10 transmembrane helices ( Table S4). All of the GmHA proteins were predicted to contain an N-terminus cation transporter/ATPase domain, a the full-length amino acid sequences were aligned by ClustalW. The neighbor-joining phylogenetic tree was constructed by MEGA6 with 1000 bootstrap replications. Soybean genes are denoted by red circles, Arabidopsis genes are denoted by purple triangles, and rice genes are denoted by green diamonds. The locus name of a soybean gene is shown following the gene name. The P2 subfamily is divided into six clades, i.e., P2A, and P2B-I to P2B-IV.

P3 Subfamily Genes in Soybean
A total of 24 putative genes encoding PM H + -ATPases which belong to the P3 subfamily were identified in soybean (Table S4). These genes were designated as GmHA1 to GmHA24. The intron number of GmHA genes varied from 8 to 21, and the protein length of GmHAs ranged from 888 to 1017 amino acids ( Table S4). All of the GmHA proteins were predicted to contain 6 to 10 transmembrane helices ( Table S4). All of the GmHA proteins were predicted to contain an N-terminus cation transporter/ATPase domain, a E1-E2 ATPase domain, and a haloacid dehalogenase-like hydrolase domain, with the exception that there is no N-terminus cation transporter/ATPase domain for GmHA18 ( Figure S2). By sequencing alignment, the protein sequences of PM H + -ATPases were found to be highly conserved among the P3 subfamily proteins in Arabidopsis, rice and soybean ( Figure S3). All of the 24 GmHA proteins possess a penultimate Thr, a conserved region I, and a conserved region II in the C-terminal regions with the exception of GmHA13 ( Figure  5). Region I and region II have been identified to be important for autoinhibitory effects on the PM H + -ATPase [67]. The phosphorylation of penultimate Thr or Ser is required for 14-3-3 protein binding, which results in the activation of PM H + -ATPases [32,68]. A phylogenetic tree was constructed using full-length amino acid sequences of all the PM H + -ATPase in soybean, Arabidopsis, and rice. These proteins could be grouped into five distinct clades ( Figure 6). In each clade, PM H + -ATPase genes from soybean, Arabidopsis, and rice are all included ( Figure 6).
Agronomy 2021, 11, 71 9 of 25 5). Region I and region II have been identified to be important for autoinhibitory effects on the PM H + -ATPase [67]. The phosphorylation of penultimate Thr or Ser is required for 14-3-3 protein binding, which results in the activation of PM H + -ATPases [32,68]. A phylogenetic tree was constructed using full-length amino acid sequences of all the PM H + -ATPase in soybean, Arabidopsis, and rice. These proteins could be grouped into five distinct clades ( Figure 6). In each clade, PM H + -ATPase genes from soybean, Arabidopsis, and rice are all included ( Figure 6).

GmHA1
NNLLENKTAFTTKKDYGKEEREAQWALAQRTLHGLQPPETSNIFNEKSSYRELTEIAEQAKRRAEVARLRELHTLKGHVESVVKLKGLDIDTIQQHYTV NTVINQRTAFINKNDFGKEAREAAWATEQRTLHGLQSAESKG-FTDKHTFREINTLAEEARRRAEIARLRELHTLKGRVESFAKLRGLDIDAMNGHYTV Clustal Consensus : :.: Region I Region II 14-3-3 Figure 5. Alignment of the C-terminal regions of 24 soybean PM H + -ATPases and three representative PM H + -ATPases from Arabidopsis and rice (AHA2, AHA8, and OsA1): region I, region II, and the 14-3-3 protein binding site within the C-terminal regions are indicated by lines. Region I and region II in the C-terminal region have been identified to be critical for the autoinhibitory effects on the H + -ATPase. The phosphorylation of penultimate Thr or Ser is required for 14-3-3 protein binding, which results in the activation of H + -ATPases.

P4 Subfamily Genes in Soybean
In this study, 27 P4 subfamily genes were identified in the soybean genome (Table S5). These genes were designated as GmALA1 to GmALA27 according to their positions in the phylogenetic tree (Figure 7). The intron number of GmALA genes varied from 6 to 24, and the protein length of GmALAs ranged from 943 to 1297 amino acids (Table S5). All the P4 ATPase proteins in soybean were predicted to contain seven to ten transmembrane helices ( Table S5). All of these proteins were predicted to contain several conserved domains of P4-ATPases, such as an N-terminal phospholipid-translocating P-type ATPase domain, a E1-E2 ATPase domain, a cation-transport ATPase domain, a haloacid dehalogenase-like hydrolase, and a C-terminal phospholipid-translocating P-type ATPase domain ( Figure 8). A phylogenetic tree was constructed using the neighbor-joining method to determine the homology between the P4 subfamily genes in soybean, Arabidopsis and rice. The topology of the phylogenetic tree showed that the P4 subfamily can be divided into five clades (I-V) (Figure 7). Genes coming from the monocot rice, the dicotyledon Arabidopsis, and the legume soybean were all included in each of these clades (Figure 7), suggesting common ancestors of P4 ATPases in these angiosperms.

P5 Subfamily Genes in Soybean
By homolog searching, two putative genes were identified to encode P5 subfamily ATPases in soybean; they were named GmP5a and GmP5b (Table S6). The coding regions of these two genes are both interrupted by 20 introns. The encoded proteins of these two genes were predicted to contain 10 transmembrane helices (Table S6). GmP5a and GmP5b were both predicted to contain a E1-E2 ATPase domain and a haloacid dehalogenase-like hydrolase domain ( Figure 9A). The protein sequences of GmP5a and GmP5b are similar to the P5 proteins in Arabidopsis and rice ( Figure S4). A phylogenetic tree suggested that GmP5a and GmP5b are closely related to their orthologous proteins in Arabidopsis and rice ( Figure 9B).

P5 Subfamily Genes in Soybean
By homolog searching, two putative genes were identified to encode P5 subfamily ATPases in soybean; they were named GmP5a and GmP5b (Table S6). The coding regions of these two genes are both interrupted by 20 introns. The encoded proteins of these two genes were predicted to contain 10 transmembrane helices (Table S6). GmP5a and GmP5b were both predicted to contain a E1-E2 ATPase domain and a haloacid dehalogenase-like hydrolase domain ( Figure 9A). The protein sequences of GmP5a and GmP5b are similar to the P5 proteins in Arabidopsis and rice ( Figure S4). A phylogenetic tree suggested that GmP5a and GmP5b are closely related to their orthologous proteins in Arabidopsis and rice ( Figure 9B).

Chromosomal Distribution and Gene Duplication of Soybean P-Type ATPase Genes
The identified 105 putative genes encoding P-type ATPases were distributed on 19 of the 20 chromosomes in soybean ( Figure 10). There is no P-type ATPase gene on chromosome 20. On each of the other 19 chromosomes, the number of P-type ATPase genes ranged from one to eleven; only one gene (GmACA10) was found on chromosome 10, while 11 genes were found on chromosome 8 ( Figure 10). The duplication of genes resulted in gene family expansion. GmHMA1 and GmHMA2 were found to be tandem duplicated because their distance is less than 100 kb and no intervening gene was found ( Figure 10). In addition, 47 segmentally duplicated gene pairs were identified with higher bootstrap values (>90%) ( Table S7). The synonymous substitution rates (Ks), the nonsynonymous substitution rates (Ka) and the Ka/Ks ratio for the 47 duplicated gene pairs indicated high similarities in their coding sequence alignments. The Ks values of these 47 gene pairs ranged from 0.056 for gene pair GmALA26/GmALA27 to 0.372 for gene pair GmACA21/GmACA22 with an average Ks of 0.115 (Table S7). The Ka/Ks of all the 47 duplicated gene pairs were found to be between 0.023 and 0.338 (Table S7), suggesting the influence of purifying selection on the evolution of these gene pairs, because a pair of genes having Ka/Ks < 1 could indicate purifying selection enforcing on the different protein coding genes during the evolution [53]. The duplication time for each gene pairs was calculated based on the divergence rate of λ = 6.1 × 10 −9 proposed for soybean [53]. All the segmentally duplicated gene pairs showed a time frame between 4.59 and 30.52 MYA, having an average time of 9.4 MYA (Table S7).

Chromosomal Distribution and Gene Duplication of Soybean P-Type ATPase Genes
The identified 105 putative genes encoding P-type ATPases were distributed on 19 of the 20 chromosomes in soybean ( Figure 10). There is no P-type ATPase gene on chromosome 20. On each of the other 19 chromosomes, the number of P-type ATPase genes ranged from one to eleven; only one gene (GmACA10) was found on chromosome 10, while 11 genes were found on chromosome 8 ( Figure 10). The duplication of genes resulted in gene family expansion. GmHMA1 and GmHMA2 were found to be tandem duplicated because their distance is less than 100 kb and no intervening gene was found ( Figure 10). In addition, 47 segmentally duplicated gene pairs were identified with higher bootstrap values (>90%) ( Table S7). The synonymous substitution rates (Ks), the nonsynonymous substitution rates (Ka) and the Ka/Ks ratio for the 47 duplicated gene pairs indicated high similarities in their coding sequence alignments. The Ks values of these 47 gene pairs ranged from 0.056 for gene pair GmALA26/GmALA27 to 0.372 for gene pair GmACA21/GmACA22 with an average Ks of 0.115 (Table S7). The Ka/Ks of all the 47 duplicated gene pairs were found to be between 0.023 and 0.338 (Table S7), suggesting the influence of purifying selection on the evolution of these gene pairs, because a pair of genes having Ka/Ks < 1 could indicate purifying selection enforcing on the different protein coding genes during the evolution [53]. The duplication time for each gene pairs was calculated based on the divergence rate of λ = 6.1 × 10 −9 proposed for soybean [53]. All the segmentally duplicated gene pairs showed a time frame between 4.59 and 30.52 MYA, having an average time of 9.4 MYA (Table S7).

Tissue Expression Patterns of Soybean P-Type ATPase Genes
The tissue expression patterns of putative genes encoding P-type ATPase proteins in soybean were analyzed by using two published RNA-seq datasets. The first dataset contains nine tissues, i.e., leaves, shoot apical meristem (SAM), flower, green pods, root, root tip, mature nodules, and root hair cells isolated 84 and 120 h after sowing (HAS) [56]. The second dataset contains 14 tissues, i.e., three vegetative tissues (leaves, root, and nodules) and whole seed at 11 stages of reproductive tissue development (flower, pod, and seeds) [55]. In the first dataset, with the exception of GmACA18 and GmACA24, which do not have corresponding locus names of version 1.0 (Wm82.a1.v1, used in the published RNA-seq data), the expression of the other 103 putative P-type ATPase genes could be detected in at least one of the nine tissues ( Figure 11 and Table S8), suggesting authentic expression of these identified P-type ATPase genes. In the second dataset, the expression of 93 putative P-type ATPase genes could be detected in at least one of the 14 tissues, while the expression of ten genes (GmHMA20, GmACA3/4/7/8, and GmHA9/10/11/12/16) could not be found ( Figure S5). Consistently, the expression of the ten genes showed very low expression levels in tissues of soybean in the first dataset ( Figure 11 and Table S8), suggesting that these genes could be expressed under specific conditions and that their functions may be very weak under normal conditions. It is notable that the previously identified GmHMA16 could not be detected in any of these two datasets, suggesting that it is a pseudogene. As shown in both datasets, many genes were expressed ubiquitously in various tissues, such as GmHMA5-8/10-12/14/15/19, GmACA1/2/5/6/9-14/19/20/23/25/26, GmP5a/b, and GmHA3-8/20-23, whereas some genes were tissue-specifically expressed. For example, GmACA3/4, and GmHA11-15 were predominantly expressed in flowers; GmHMA1/2 and GmHA1/2 were strongly expressed in roots; and GmHMA18, GmACA21/22, and GmALA15/16 were predominantly expressed in root nodules (Figure 11 and Figure S5).

Tissue Expression Patterns of Soybean P-Type ATPase Genes
The tissue expression patterns of putative genes encoding P-type ATPase proteins in soybean were analyzed by using two published RNA-seq datasets. The first dataset contains nine tissues, i.e., leaves, shoot apical meristem (SAM), flower, green pods, root, root tip, mature nodules, and root hair cells isolated 84 and 120 h after sowing (HAS) [56]. The second dataset contains 14 tissues, i.e., three vegetative tissues (leaves, root, and nodules) and whole seed at 11 stages of reproductive tissue development (flower, pod, and seeds) [55]. In the first dataset, with the exception of GmACA18 and GmACA24, which do not have corresponding locus names of version 1.0 (Wm82.a1.v1, used in the published RNAseq data), the expression of the other 103 putative P-type ATPase genes could be detected For example, GmACA3/4, and GmHA11-15 were predominantly expressed in flowers; GmHMA1/2 and GmHA1/2 were strongly expressed in roots; and GmHMA18, GmACA21/22, and GmALA15/16 were predominantly expressed in root nodules ( Figure  11 and Figure S5).

Analysis of Cis-Acting Element in the Promoter Regions of Soybean P-Type ATPase Genes
The presence of 10 classes of cis-elements related to hormonal signal and/or stress responses was surveyed in the 1.5 kb promoter regions upstream from the predicted transcription start sites of these putative P-type ATPase genes. These cis-elements include dehydration and cold response element DRE/CRTE, abscisic acid response element ABRE, ethylene responsive element GCC-box, auxin signaling component ARF1 binding site (AuxRE), salicylic acid-responsive element SARE, phosphate starvation signaling transcription factor PHR1 binding site (P1BS), sulfur-responsive element (SURE), stress-related CAMTA transcription factor binding site CG-box, and stress-related WRKY transcription factor binding site W-box (Table S9) [69][70][71][72][73][74][75][76][77][78]. The positions of cis-elements located in the promoter regions are shown in Table S10. With the exception of GmALA4 and GmALA21, all of the other P-type ATPase genes contained at least one of the ten cis-elements in their promoter regions ( Figure S6). More than 60 of these genes contained more than four kinds of cis-elements in their promoters, especially for GmACA20 and GmHA6; each of them contained eight kinds of cis-elements ( Figure S6). Among the P-type ATPase genes, more than half of them contained SARE, W-box and/or SURE, while only around 15% of them contained DRE/CRT or P1BS ( Figure S6). Some cis-elements have more than one copy in the promoter region. For example, four ABRE elements were found in the promoter of GmHA11; seven SARE elements were found in the promoter of GmACA14; and nine SURE elements were found in the promoter of GmHMA17 ( Figure S6 and Table S10).
The expression profile of P-type ATPase genes in response to rhizobia (Bradyrhizobium japonicum) inoculation (12-48 h after inoculation (HAI)) was also analyzed in soybean root hairs by acquiring published transcriptome data [61]. With the exception of GmHA12, GmHA14, GmACA4, GmACA22, and GmECA2, almost all of the P-type ATPase genes (98 genes) could be detected in the root hairs with or without rhizobia inoculation ( Figure  13 and Table S11). It is interesting that some of these genes were significantly affected during the inoculation of rhizobia. For example, GmHMA2 was significantly induced at 12 and 24 HAI; GmACA19 and GmALA1 were remarkably repressed at 12 HAI and 48 HAI, respectively; and GmALA7 was dramatically increased at 12 HAI ( Figure 13). Two P-type ATPase genes, GmACA1 and GmACA2, were suggested to be responsive to phosphorus deficiency by RNA-seq ( Figure 12). In order to confirm their transcriptional responses to phosphorus deficiency, we analyzed their expression patterns under phosphorus deficiency stress in soybean leaves and roots by qRT-PCR. Consistent with the RNA-seq data, GmACA1 and GmACA2 were found to be significantly induced by phosphorus deficiency one day and seven days after phosphorus deficiency treatment in both leaves and roots ( Figure 14).

Figure 12.
Heatmap representation for the expression profiles of putative genes encoding P-type ATPase proteins under environmental stresses, like short-term salt stress (1 h, 6 h, and 12 h) in roots, 1 d cold stress in shoots, 4 d dehydration stress in shoots, 7 d flooding stress in leaves, 7 d drought stress in leaves, and 7 d phosphate starvation stress in roots: the intensities of the color represent the relative magnitude of fold changes in log2 values according to microarray or highthroughput sequencing data (fold change > 2, and p-value < 0.05). Red color indicates induction, blue color indicates repression, and gray color means no significant change in expression level.
The expression profile of P-type ATPase genes in response to rhizobia (Bradyrhizobium japonicum) inoculation (12-48 h after inoculation (HAI)) was also analyzed in soybean root hairs by acquiring published transcriptome data [61]. With the exception of GmHA12, GmHA14, GmACA4, GmACA22, and GmECA2, almost all of the P-type ATPase genes (98 genes) could be detected in the root hairs with or without rhizobia inoculation ( Figure 13 and Table S11). It is interesting that some of these genes were significantly affected during the inoculation of rhizobia. For example, GmHMA2 was significantly induced at 12 and 24 HAI; GmACA19 and GmALA1 were remarkably repressed at 12 HAI and 48 HAI, respectively; and GmALA7 was dramatically increased at 12 HAI ( Figure 13).   to phosphorus deficiency by RNA-seq ( Figure 12). In order to confirm their transcriptional responses to phosphorus deficiency, we analyzed their expression patterns under phosphorus deficiency stress in soybean leaves and roots by qRT-PCR. Consistent with the RNA-seq data, GmACA1 and GmACA2 were found to be significantly induced by phosphorus deficiency one day and seven days after phosphorus deficiency treatment in both leaves and roots ( Figure 14).

Discussion
The P-type ATPase superfamily comprises a large number of ATP-hydrolyzing transporters of cations and phospholipids [1]. Based on their evolutionary relationships and transporting targets, the superfamily genes are divided into five subfamilies, e.g., P1 to P5 [2,3]. Many P-type ATPase genes have been indicated to be involved in various cellular and physiological processes during the whole life of plants [6][7][8][9]37]. The P-type ATPase superfamily genes have been characterized genome-wide in the model plants Arabidopsis and rice, as well as in some other plants [3,12,41,42,[79][80][81]. The total number of P-type ATPase genes in Arabidopsis and rice is 47 and 45, respectively [11][12][13]. In this study, a total of 105 genes were identified to encode putative P-type ATPases in the soybean genome ( Figure 1 and Table S1-S6). Phylogenetic analysis suggested that soybean P-type ATPase genes can be grouped into five subfamilies, i.e., P1B, P2A/B, P3A, P4, and P5, which is consistent with that in Arabidopsis and other flowering plants [11,41]. The P2Cand P2D-type Na + pumps exist in Chlorophyceae (e.g., Ostreococcus tauri and Chlamydomonas reinhardtii) and moss (e.g., Physcomitrella patens) but appear to have been lost in vascular plants [3].
The number of the total P-type ATPase genes in soybean is about 2.2 and 2.3 times higher than that in Arabidopsis and rice, respectively ( Figure 2). Soybean is a paleopolyploid with 75% of the genes present in multiple copies [43]. The larger size of the P-type ATPase gene family in soybean could be attributed to the two whole-genome duplication events occurred approximately 59 and 13 MYA [43]. Among the P-type ATPase genes in

Discussion
The P-type ATPase superfamily comprises a large number of ATP-hydrolyzing transporters of cations and phospholipids [1]. Based on their evolutionary relationships and transporting targets, the superfamily genes are divided into five subfamilies, e.g., P1 to P5 [2,3]. Many P-type ATPase genes have been indicated to be involved in various cellular and physiological processes during the whole life of plants [6][7][8][9]37]. The P-type ATPase superfamily genes have been characterized genome-wide in the model plants Arabidopsis and rice, as well as in some other plants [3,12,41,42,[79][80][81]. The total number of P-type ATPase genes in Arabidopsis and rice is 47 and 45, respectively [11][12][13]. In this study, a total of 105 genes were identified to encode putative P-type ATPases in the soybean genome ( Figure 1 and Tables S1-S6). Phylogenetic analysis suggested that soybean P-type ATPase genes can be grouped into five subfamilies, i.e., P1B, P2A/B, P3A, P4, and P5, which is consistent with that in Arabidopsis and other flowering plants [11,41]. The P2C-and P2D-type Na + pumps exist in Chlorophyceae (e.g., Ostreococcus tauri and Chlamydomonas reinhardtii) and moss (e.g., Physcomitrella patens) but appear to have been lost in vascular plants [3].
The number of the total P-type ATPase genes in soybean is about 2.2 and 2.3 times higher than that in Arabidopsis and rice, respectively ( Figure 2). Soybean is a paleopolyploid with 75% of the genes present in multiple copies [43]. The larger size of the P-type ATPase gene family in soybean could be attributed to the two whole-genome duplication events occurred approximately 59 and 13 MYA [43]. Among the P-type ATPase genes in soybean, one pair of genes (GmHMA1 and GmHMA2) was found to be tandemly duplicated, and 47 pairs of genes were possibly duplicated segmentally ( Figure 10 and Table  S7). The time frame of segmental duplication of these genes was estimated to be between 4.59 and 30.52 MYA (Table S7). Thus, the whole genome segmental duplication could play a major role in the expansion of P-type ATPase genes during the evolution of soybean. Other gene families like SWEET transporters, calcium transporters, and WRKY transcription factors were also found to be expanded mainly by segmental duplication in the soybean genome [52,66,82]. Duplicated genes could enhance the adaptation to various environmental conditions during the evolution of soybean.
In this study, nearly all of the P-type ATPase genes could be detected in at least one of the soybean tissues and most of them were expressed in multiple tissues ( Figure 11 and Figure S5), suggesting that these genes are authentic and that they may have functions during the growth and development of soybean. Some genes in the same subfamily showed different expression patterns in various tissues, suggesting that they may have distinct functions during the growth and development of soybean. It is notable that many of the duplicated gene pairs showed similar tissue expression patterns, suggesting that they may have redundant functions. However, they may have potential different roles at specific developmental stages or under specific environmental stresses. The ion transport of each P-type ATPase is dependent on the ion binding capabilities and affinity constants [15,83]. Further biochemical experiments are needed to address the specific roles of homologous proteins. In addition, a number of genes were found to be constitutively expressed in various tissues ( Figure 11 and Figure S5), such as GmHA3-8/20-23, GmHMA5-8/10-12/14/15/19, and GmACA1/2/5/6/9-14/19/20/23/25/26, suggesting that they may have general functions during the whole life of soybean. However, some soybean P-type ATPase genes were expressed in a tissue-specific manner ( Figure 11 and Figure S5). The genes predominantly expressed in the flowers, like GmACA3/4 and GmHA11-15, could play potential roles in reproductive growth of soybean. Genes strongly expressed in the roots, like GmHMA1/2 and GmHA1/2, could have putative functions in nutrient uptake and translocation. Some P-type ATPase genes have been suggested to be expressed strongly in specific tissues and to play critical roles in diverse physiological processes, such as stomatal regulation, root development, pollen tube growth, and flower coloration [25,40,[84][85][86][87]. For example, Arabidopsis PM H + -ATPase genes AHA1 and AHA2 are the dominating isoforms in the majority of tissues, whereas AHA8, AHA7, AHA6, and AHA9 exhibited high expression levels in pollen [88]. Recently, it has been found that the simultaneous loss of function mutation of AHA6, AHA8, and AHA9 delays pollen germination, causes pollen tube growth defects, and drastically reduces fertility in Arabidopsis [84]. Some P-type ATPase genes play important roles in plant symbiosis with arbuscular mycorrhizal fungi and rhizobial bacteria [89][90][91][92]. For example, Medicago truncatula P2A-ATPase gene MCA8 has been suggested to be critical for the generation of Ca 2+ spiking in the nucleus, which is essential for legume symbiosis [91]; tomato PM H + -ATPase SlHA8 is critical for arbuscule development and energizing the periarbuscular membrane for symbiotic phosphate and nitrogen uptake [92]. It is interesting that almost all of the putative P-type ATPase genes could be detected in soybean root hairs with or without rhizobia inoculation (Bradyrhizobium japonicum) ( Figure 13). Some genes, like GmHMA18, GmACA21/22, and GmALA15/16 were predominantly expressed in root nodules ( Figure 11 and Figure S5). In addition, the expression of some genes, like GmACA19, GmHMA2, and GmALA1/7 were significantly changed at 12 or 24 HAI (Figure 13). These results suggest that P-type ATPase genes could participate in the establishment of symbiosis and nodule development in soybean. Further research is needed to investigate their potential roles in symbiosis.
The production of soybean all over the world is frequently influenced by various environmental stresses. By retrieving published microarray and RNA-seq transcriptome data, the expression of 51 putative P-type ATPase genes was found to be remarkably affected by environmental stresses like salt, drought, cold, flooding, and/or phosphate starvation. These genes included five GmHMA genes (GmHMA6/7/10/11/15) that were induced by cold; six GmHMA genes (GmHMA2/5/8/13/14/18) that were repressed by salt or dehydration; four GmACA genes (GmACA1/2/23/25) and three GmHA genes (GmHA1/3/19) that were induced cold, salt, and/or drought; and eight GmHA genes (GmHA6/16/17/20/21/22/23/24) that were repressed by cold, drought, and/or salt ( Figure 12). Modification of the expression of HMA genes could affect the transport and distribution of Cu and Zn in plant cells and could further affect the activities of Cn/Zn superoxide dismutase, which are closely related to stress tolerance. Many P-type ATPase genes have been indicated to be involved in stress response and tolerance in plants [26,39,93,94]. For example, AtHMA1, an Arabidopsis P 1B -type ATPase localized in the chloroplast envelope, is involved in Cu and Zn transport, and the loss of function mutant hma1 is sensitive to high light and excess Zn stress [95,96]; rice OsACA6 is involved in salt and drought tolerance by increasing reactive oxygen species scavenging [24]; and PM H + -ATPase plays a key role in salt and alkali resistance [97,98]. Ca 2+ signals play a very important role in plant response and tolerance to various stresses. The Ca 2+ -ATPase has been indicated to be associated with the generation of Ca 2+ signaling and cellular Ca 2+ homeostasis [8]. It is interesting that many Ca 2+ -ATPase genes were found to be induced by stresses like salt, cold, drought, and phosphate starvation (Figure 12). By using qRT-PCR, GmACA1 and GmACA2 were confirmed to be induced by phosphate starvation after treatment for short and moderate periods ( Figure  13). Similarly, the expression of a tomato Ca 2+ -ATPase gene LCA1 was also increased by phosphate starvation [99]. Whether the induction of Ca 2+ -ATPase genes is involved in the generation of Ca 2+ signaling under phosphate starvation, and whether the increased expression of specific Ca 2+ -ATPases can enhance phosphate starvation adaptation deserve further investigations.

Conclusions
In this study, 105 putative P-type ATPase genes were identified in the soybean genome. According to the phylogenetic relationships and the conserved protein domains, they were classified into five subfamilies. All of the P-type ATPase genes identified included nineteen P1B, seven P2A, twenty-six P2B, twenty-four P3A, twenty-seven P4, and two P5 genes. Comprehensive bioinformatic and expression analyses of the P-type ATPase superfamily provide a foundation for future exploration of the potential cellular and physiological functions of P-type ATPase genes in growth and development, symbiosis establishment, and environmental stress tolerance in soybean. Considering the important roles of P-type ATPases in stress resistance and nutrient utilization in model plants like Arabidopsis and rice, further research is needed to develop stress-resistant and nutrient-efficient plants by modulating the expression of P-type ATPase genes in soybean.

Supplementary Materials:
The following data is available online at https://www.mdpi.com/20 73-4395/11/1/71/s1. Figure S1: Schematic representation of functional domains of soybean P1B subfamily proteins; Figure S2: Schematic representation of functional domains of soybean P3A subfamily proteins; Figure S3: Alignment of the amino acid sequences of 24 soybean PM H + -ATPases and three representative PM H + -ATPases from Arabidopsis and rice (AHA2, AHA8, and OsA1); Figure S4: Alignment of the amino acid sequences of P5 ATPases from soybean, Arabidopsis, and rice; Figure S5: Heat map representation for tissue-specific expression patterns of the predicted P-type ATPase family genes in soybean according to Illumina RNA sequencing data (https://www.soybase. org/soyseq/); Figure S6: The distribution of cis-elements in the 1.5 kb promoter regions of soybean P-type ATPase family genes, Table S1: A list of all the putative genes encoding P-type ATPase proteins in the soybean genome; Table S2: A list of putative P1B subfamily ATPase genes and their sequence details in soybean; Table S3: A list of putative P2 subfamily ATPase genes and their sequence details in soybean; Table S4: A list of putative P3 subfamily ATPase genes and their sequence details in soybean; Table S5: A list of putative P4 subfamily ATPase genes and their sequence details in soybean; Table S6: A list of putative P5 subfamily ATPase genes and their sequence details in soybean; Table  S7: Identification of substitution rates for soybean P-type ATPase superfamily gene pairs; Table S8: Expression patterns of soybean P-type ATPase genes in 9 different tissues; Table S9: Summary of cis-acting elements that are related to plant hormone and stress response; Table S10: Distribution of hormone-and stress-related cis-elements in promoters of putative P-type ATPase genes in soybean; Table S11: Expression pattern of soybean P-type ATPase genes in root hair (RH) and stripped roots in response to B. japonicum.