Genome-Wide Identification and Characterization of Polygalacturonase Gene Family in Maize (Zea mays L.)

Polygalacturonase (PG, EC 3.2.1.15) is a crucial enzyme for pectin degradation and is involved in various developmental processes such as fruit ripening, pollen development, cell expansion, and organ abscission. However, information on the PG gene family in the maize (Zea mays L.) genome and the specific members involved in maize anther development are still lacking. In this study, we identified 55 PG family genes from the maize genome and further characterized their evolutionary relationship and expression patterns. Phylogenetic analysis revealed that ZmPGs are grouped into six Clades, and gene structures of the same Clade are highly conserved, suggesting their functional conservation. The ZmPGs are randomly distributed across maize chromosomes, and collinearity analysis showed that many ZmPGs might be derived from tandem duplications and segmental duplications, and these genes are under purifying selection. Furthermore, gene expression analysis provided insights into possible functional divergence among ZmPGs. Based on the RNA-seq data analysis, we found that many ZmPGs are expressed in various tissues while 18 ZmPGs are highly expressed in maize anther, and their detailed expression profiles in different anther developmental stages were further investigated by using RT-qPCR analysis. These results provide valuable information for further functional characterization and application of the ZmPGs in maize.


Introduction
Polygalacturonases (PGs, EC 3.2.1.15) belong to one of the largest hydrolase families, which are polysaccharide lyases, and catalyze α-1,4 linkages among D-galacturonic acid residues in homogalacturonan [1,2]. PGs are mainly divided into three categories: endo-PGs, exo-PGs, and rhamno-PGs. Generally, rhamno-PGs present from algae to land plants, endo-PGs appear in terrestrial plants, while exo-PGs only exist in angiosperms [3]. There are four conserved domains in plant PG proteins, and the core amino acid sequences of domains I and II are SPNTDG and GDDC, respectively. The three aspartic acids (D) in domain I and domain II may be the components of the catalytic sites [4]. Domain III is composed of CGPGHG, of which the histidine residue (H) is supposed to be involved in catalytic reaction [5]. The amino acid sequence of domain IV is RIK, which may be related to ion interaction at the carboxyl end of substrates [5]. Generally, proteins that have the above four conserved domains are identified as PGs, but domain III is relatively less

Identification of PG Genes from Maize Genome
Accurate identification and a unified nomenclature are essential for future research into the PG gene family in maize. Here, we identified a total of 55 PG genes from the maize genome and named them from ZmPG1 to ZmPG55 according to their chromosomal locations ( Table 1). The 55 ZmPG genes are randomly distributed across 10 chromosomes, while only one gene is scattered on a constitutive chromosome (Table 1). Fourteen genes (25.5%) are distributed on chromosome 6, which contains the largest number of ZmPGs, while the smallest number of ZmPGs appears on chromosomes 2 and 10 and Contig 206. 3 of 20 Clade E and Clade D ZmPG genes are mapped to seven chromosomes, while all Clade C ZmPG genes are mapped to chromosome 3. ZmPG genes in Clade A are located on chromosomes 1, 3, 5, 6, and 9, and in Clade B, they are located on chromosomes 3, 8, and 9, whereas one Clade G gene is clustered on chromosome 2. The length of maize PG proteins ranged from 148 to 766 amino acids and the molecular weight ranged from 15.42 kDa to 79.73 kDa with the predicted isoelectric point ranging from 4.93 to 9.22. Subcellular localization prediction showed that the 55 PG proteins are localized in seven different cellular compartments, including extracellular space, organelle membrane, chloroplast, plasma membrane, nucleus, mitochondrion, and an anchored component of the plasma membrane. Notably, 37 out of 55 maize PG proteins are localized at extracellular space, followed by 9 predicted to be localized at the plasma membrane. Interestingly, semi-autonomous organelles mitochondria and chloroplasts each have one PG protein (Table 1).

Phylogenetic Analysis of ZmPGs
The molecular evolution of the PG family is decided mainly by the evolution of increasingly sophisticated organs in plants [30,44]. To investigate the phylogenetic relationship of the PG gene family in maize, an unrooted phylogenetic tree was constructed from the alignment of full-length PG proteins of maize, Arabidopsis thaliana, and rice ( Figure 1A, File S1). The results showed that 165 PGs are grouped into seven Clades and correspondingly named Clades A to G based on a previous study [3]. Clade A and Clade B each contain 9 ZmPGs, and Clade C to Clade E contain 2, 19, and 15 ZmPGs, respectively ( Figure 1B, Table S2). The gene numbers of the three species in Clade B and Clade G are roughly the same, and PG genes of the three species are evenly distributed in each branch. The numbers of maize and Arabidopsis PG genes in Clade D are approximately twice those of rice, suggesting that the duplication of these Clade PG genes in maize and Arabidopsis occurred after specification. Interestingly, Arabidopsis has eleven members in Clade F, while maize and rice are absent from this Clade, indicating that these genes probably emerged after monocot and dicot plant separation. In contrast, Arabidopsis has fewer PG members in Clade A than maize and rice do. It is speculated that these monocot-specific or dicot-proliferated PG genes are probably functionalized for monocots and dicots development during evolution ( Figure 1B).
An unrooted phylogenetic tree was reconstructed using full-length protein sequences of the 55 ZmPGs with the neighbor-joining (NJ) method. The tree was divided into six main Clades (Clades A to G; Clade F was not included herein and hereafter, as maize does not have a Clade F member) ( Figure 2). To explore the relationship of gene structure and phylogeny, we investigated gene structures by GSDS [45]. The results showed that there are a different number of exons in ZmPG genes (Figure 2), ranging from 1 to 9. ZmPGs in Clades C and D containing exo-PGs have shorter gene sequences and fewer exons, while the ZmPGs in Clade E containing oligo-PGs generally have longer intron sequences. Consistent with the phylogenetic relationship, closely related genes usually have common gene structures and intron lengths. However, some ZmPGs in Clade B show a significant difference in gene structural arrangements. For instance, ZmPG49 contains only two exons, while ZmPG45 contains 8 exons ( Figure 2).  showed that there are a different number of exons in ZmPG genes (Figure 2), ranging from 1 to 9. ZmPGs in Clades C and D containing exo-PGs have shorter gene sequences and fewer exons, while the ZmPGs in Clade E containing oligo-PGs generally have longer intron sequences. Consistent with the phylogenetic relationship, closely related genes usually have common gene structures and intron lengths. However, some ZmPGs in Clade B show a significant difference in gene structural arrangements. For instance, ZmPG49 contains only two exons, while ZmPG45 contains 8 exons ( Figure 2).

Collinearity and Amino Acid Substitution Selection Pressure Analyses
Segmental duplications are long DNA fragments that are nearly identical and present in distant chromosome locations. They occur most frequently in plants because most plants are diploidized polyploids and retain a great deal of duplicated chromosomal blocks within their genomes [46,47]. Tandem duplications mainly occur in the region of

Collinearity and Amino Acid Substitution Selection Pressure Analyses
Segmental duplications are long DNA fragments that are nearly identical and present in distant chromosome locations. They occur most frequently in plants because most plants are diploidized polyploids and retain a great deal of duplicated chromosomal blocks within their genomes [46,47]. Tandem duplications mainly occur in the region of chromosome recombination [48]. Gene family members generated from tandem replications are usually closely arranged on the same chromosome, forming a gene cluster with similar sequences and similar functions [49].
Segmental duplications and tandem duplications of the 55 ZmPGs were investigated by MEGAX and McscanX. Our analysis revealed 24 tandem duplication pairs in ZmPGs (Table 2). These tandem duplication pairs formed from 10 genes, which are located on chromosome 6 and belong to Clade D. In addition to the tandem duplication pairs, 9 segmental duplication pairs were identified ( Figure 3B, Table 2). A total of 13 ZmPGs with 9 pairs associated with segmental duplications account for 23.63% (13/55) of all the ZmPGs, and 9 ZmPGs with 24 pairs associated with tandem duplications account for 16.36% (9/55) of all the ZmPGs. The total duplication ratio of ZmPGs is 39.99%, which is much lower than the maize genome duplication ratio (25,000 Mb, 60-80%), suggesting that segmental and tandem duplications contribute little to the expansion of the ZmPG gene family. Ks (synonymous substitution rate) and Ka (nonsynonymous substitution rate) parameters of duplication events were calculated through KaKs Calculator, and the date of the duplication events were calculated (T) using the formula T = Ks/2λ (λ represents the estimated clock-like rate of synonymous substitution, which is 1.65 × 10 −8 substitutions/synonymous site/year for cereals) [50][51][52]. Codon alignment of duplicated genes was performed by MEGAX [53]. The approximate dates of the estimated duplication events are shown in Table 2. The origin of the 24 tandem duplication pairs of ZmPGs on the same chromosome was 0.521 to 18.424 million years ago. The dates of other segmental duplication pairs were 4.151 to 56.732 million years ago. In addition, the Ka/Ks ratios of the 24 pairs of ZmPG tandem duplications are less than 1, and the Ka/Ks ratios of most ZmPG segmental duplications are also less than 1. As the Ka/Ks ratio gives an indication of what selection has been placed on this gene, these results indicate that the duplicated maize genes are under purifying selection, and this selection would eliminate deleterious mutations in the species. chromosome 6 and belong to Clade D. In addition to the tandem duplication pairs, 9 segmental duplication pairs were identified ( Figure 3B, Table 2). A total of 13 ZmPGs with 9 pairs associated with segmental duplications account for 23.63% (13/55) of all the ZmPGs, and 9 ZmPGs with 24 pairs associated with tandem duplications account for 16.36% (9/55) of all the ZmPGs. The total duplication ratio of ZmPGs is 39.99%, which is much lower than the maize genome duplication ratio (25,000 Mb, 60-80%), suggesting that segmental and tandem duplications contribute little to the expansion of the ZmPG gene family.

Expression Analysis of Maize PG Genes in Different Tissues and Developmental Anthers
To investigate the expression patterns of ZmPGs, RNA-seq data of 20 maize tissues were retrieved from the SRA (Sequence Read Archive) database, and RNA-seq analysis was further performed to obtain FPKM values of ZmPGs (Table S3). The results showed that 48 ZmPG genes were expressed in at least one tissue and the expression levels of the 48 genes were represented by a heatmap, as shown in Figure 4. However, there are 7 genes (ZmPG14, ZmPG24, ZmPG25, ZmPG32, ZmPG39, ZmPG50, and ZmPG53) that had no detectable expression or low expression (FPKM less than 1 in both tissues) were filtered out. As shown in Figure 4A, ZmPG genes in Clade E were constitutively expressed in various tissues, while ZmPG genes in Clade D were specifically expressed in anther, and ZmPG34 was specifically expressed in the meiotic tassel. Clade C has only two members, ZmPG9 was constitutively expressed at high levels, but the expression of the other member ZmPG14 could not be detected in any of the analyzed tissues, indicating their variations in cis-regulation. The orphan gene ZmPG7 of Clade G was highly expressed in most analyzed tissues, including the meiotic tassel.  As many ZmPG genes are specifically expressed in anther, we further analyzed the expression profiles in developmental anthers. According to the cytological characteristics of maize anthers, the development of maize anthers can be divided into 14 stages, the meiosis starts from stage 7 and ends at stage 8b, and the microspore undergoes its first mitosis at stage 11 [54]. The RNA_seq data from S5 to S11 were used for the analysis (Fig-Figure 4. (C) Relative expression of 18 ZmPGs from developmental anther stages 5 to 12 (S5-S12) detected by RT-qPCR. Each bar represents the mean and SD of three repeats. Similar results were obtained from three independent biological experiments.
As many ZmPG genes are specifically expressed in anther, we further analyzed the expression profiles in developmental anthers. According to the cytological characteristics of maize anthers, the development of maize anthers can be divided into 14 stages, the meiosis starts from stage 7 and ends at stage 8b, and the microspore undergoes its first mitosis at stage 11 [54]. The RNA_seq data from S5 to S11 were used for the analysis ( Figure 4B, Table S4). Consistent with the constitutive expression pattern in different tissues, most Clade E ZmPGs were also constitutively expressed at all the analyzed anther developmental stages. Several anther-specific Clade D ZmPG genes peaked their expression at specific anther developmental stages: ZmPG34 (S8b-S10), ZmPG9 (S8a-S10), ZmPG52 (S7-S9), ZmPG13 (S8b-S10), and ZmPG53 (S7-S8b).

Validation of the PG Gene Expression in Developmental Anthers via RT-qPCR
To further investigate the ZmPGs involved in anther development, 18 ZmPGs that showed expression in developmental anthers according to RNA_seq were validated via RT-qPCR. Consistent with the RNA_seq data, all the 18 genes were expressed in anthers ( Figure 4C). According to their expression peak occurrences at early (S5-S6), middle (S7-S9), and late (S10-S12) stages, these genes can be divided into four groups. Eleven of the 18 analyzed genes, ZmPG6, ZmPG11, ZmPG17, ZmPG22, ZmPG27, ZmPG37, ZmPG38, ZmPG44, ZmPG46, ZmPG47, and ZmPG53, showed high expression at early stages in anther development, while ZmPG15 was highly expressed at late stages. The expression of ZmPG7, ZmPG9, ZmPG13, ZmPG34, and ZmPG52 peaked at the middle stages (S7-S9) of anther development. ZmPG19 expressed highly at both early and late stages but not at the middle stage.

Cis-Regulatory Motif Analysis of ZmPG Genes
The expression analysis above showed variations in expression patterns of the maize PG genes. As cis-elements are important in gene expression regulation, we analyzed the cis-elements in the ZmPG promoters by using Plant CARE [55] (Figure 5). Generally, two categories can be distinguished according to their expression from life beginning to end: one is that the proteins are encoded by the early genes, mainly including cytoskeletal proteins, as well as proteins related to cell wall synthesis, starch accumulation, the other Clades including late genes, which encode proteins involved in pollen tube growth and pollen maturation [56]. In Arabidopsis, MYB transcription factor MS188 directly regulates QRT3 to affect pectin wall degradation and pollen exine synthesis [45]. It is suggested that the MS188 homologous gene ZmMYB84 may regulate PGs in maize. Indeed, as we found that 35 ZmPG promoter regions have MBS (MYB binding site CAACTG_motif). There are 35 ZmPG genes containing (GCN4_motif) and one gene containing (AACA_motif) that are involved in endosperm expression. In addition, we also identified 32 (GA(T)TGA(T)C(T)A(G)TGG(A)_motif) and 11 (CACGTT_motif) cis-acting regulatory elements involved in zein metabolism (O2-site) and seed-specific regulation (RY-element), respectively. In maize PG gene promoters, cis-acting regulatory elements involved in light responsiveness (G-box/G-Box) were identified in 52 PG genes. These results indicated that ZmPG genes might be involved in different biological processes of plant development. AACA_motif is associated with endosperm-specific negative expression, the G-box/G-Box cis-acting regulatory element is involved in light responsiveness, GCN4_motif is involved in endosperm expression, MBS represents a MYB binding site, O2-site represents a cis-acting regulatory element, which is involved in zein metabolism and regulation, and RY-element is involved in seed-specific regulation.

Conserved Motif and Structure Prediction of the Maize PG Proteins
Amino acid sequence alignment indicated that the vast majority of ZmPGs contain four conserved domains (I: SPNTDGI, II: GDDC, III: CGPGHGISIGSLG, and IV: RIK) (Figure 6A). However, not all ZmPGs have all the four conserved domains, e.g., in Clade D, ZmPG25 lacks domains I and II, while ZmPG55 lacks domain III. All PG proteins of Clade E lack the conserved III domain, which is also the case in apple [36]. In addition, individual amino acid substitutions are found in the conserved domains. For instance, the serine (S) of the conserved domain I is substituted by alanine (A) or threonine (T) in many Clade E PGs. Overall, the Clade E ZmPGs are the most variable members of the family. It is worth noting that the Clade G member ZmPG7 does not contain any of the four conserved domains. We then used MEME to scan conserved motifs in ZmPG proteins. Ten conserved motifs are identified ( Figure S1) and none of the ZmPGs contain all 10 motifs. Genes in the same group tend to share common motifs. For example, the PGs in Clade D contain eight same motifs, except for ZmPG20. Motif 5 covers conserved domain I and a portion of conserved domain II, and motif 10 covers conserved domain III and domain IV. Besides these two conserved motifs, motif 8 and motif 4 are the most conservative that present in Figure 5. Schematic model of seven cis-elements in the promoter sequences of ZmPGs. AACA_motif is associated with endosperm-specific negative expression, the G-box/G-Box cis-acting regulatory element is involved in light responsiveness, GCN4_motif is involved in endosperm expression, MBS represents a MYB binding site, O2-site represents a cis-acting regulatory element, which is involved in zein metabolism and regulation, and RY-element is involved in seed-specific regulation.

Conserved Motif and Structure Prediction of the Maize PG Proteins
Amino acid sequence alignment indicated that the vast majority of ZmPGs contain four conserved domains (I: SPNTDGI, II: GDDC, III: CGPGHGISIGSLG, and IV: RIK) ( Figure 6A). However, not all ZmPGs have all the four conserved domains, e.g., in Clade D, ZmPG25 lacks domains I and II, while ZmPG55 lacks domain III. All PG proteins of Clade E lack the conserved III domain, which is also the case in apple [36]. In addition, individual amino acid substitutions are found in the conserved domains. For instance, the serine (S) of the conserved domain I is substituted by alanine (A) or threonine (T) in many Clade E PGs. Overall, the Clade E ZmPGs are the most variable members of the family. It is worth noting that the Clade G member ZmPG7 does not contain any of the four conserved domains. We then used MEME to scan conserved motifs in ZmPG proteins. Ten conserved motifs are identified ( Figure S1) and none of the ZmPGs contain all 10 motifs. Genes in the same group tend to share common motifs. For example, the PGs in Clade D contain eight same motifs, except for ZmPG20. Motif 5 covers conserved domain I and a portion of conserved domain II, and motif 10 covers conserved domain III and domain IV. Besides these two conserved motifs, motif 8 and motif 4 are the most conservative that present in the majority of PGs, including the Clade E PGs. Next, we predicted three-dimensional structures of ZmPG proteins. The results showed that ZmPGs are structurally conserved and have a single-stranded right-handed beta-helix structure ( Figure 6B and Figure S2), also known as a pectin lyase-like CATH superfamily 5 [54,57]. This superfamily is mostly found in bacteria, plants, and fungus, and scarcely on invertebrates and environmental samples. This is consistent with the structure of polygalacturonase from Erwinia carotovora and Aspergillus niger [58,59]. Parallel β-helically folded enzymes can recognize and hydrolyze large polysaccharides [60]. Consistent with structure conservation, the GO annotations showed that the functions of ZmPGs are highly conserved. Almost all genes are involved in the process of pectin catabolism in the biological process and associated with the cell wall in the cellular component (Table S5). All ZmPG proteins may have hydrolyase and polygalacturonase activity ( Figure S3). the majority of PGs, including the Clade E PGs. Next, we predicted three-dimensional structures of ZmPG proteins. The results showed that ZmPGs are structurally conserved and have a single-stranded right-handed beta-helix structure ( Figure 6B and Figure S2), also known as a pectin lyase-like CATH superfamily 5 [54,57]. This superfamily is mostly found in bacteria, plants, and fungus, and scarcely on invertebrates and environmental samples. This is consistent with the structure of polygalacturonase from Erwinia carotovora and Aspergillus niger [58,59]. Parallel β-helically folded enzymes can recognize and hydrolyze large polysaccharides [60]. Consistent with structure conservation, the GO annotations showed that the functions of ZmPGs are highly conserved. Almost all genes are involved in the process of pectin catabolism in the biological process and associated with the cell wall in the cellular component (Table S5). All ZmPG proteins may have hydrolyase and polygalacturonase activity ( Figure S3).

Discussion
Plant PG genes were first identified more than 50 years ago, and the gene products are multifunctional proteins that play an important role in the decomposition of pectin. Previous studies have genome-wide identified PG gene family members from several plant species. In the current study, we identified 55 ZmPG genes, and they are randomly distributed on 11 chromosomes. Subcellular localization prediction analysis indicated that the 55 ZmPG proteins are localized in different cellular compartments. Most of them are predicted to be localized at the extracellular space, suggesting that they are secretory proteins and are associated with the degradation of the cell wall.
Phylogenetic analysis revealed that the 55 maize PG genes are grouped into six Clades, and members in the same group have similar gene structures. The number of ZmPG genes in Clade B is approximately the same in the three species, and they are evenly distributed in various branches of the phylogenetic tree, suggesting that duplication of the PG genes in Clade B may have occurred before monocots and dicots separation. Meanwhile, multiple PG genes in Clade D are clustered together in the same species, indicating that duplication of these genes occurred after specification.
Homologous genes distributed on farther locations are usually referred to as segmental duplication events, while those located together are considered as tandem duplication events [51]. Our analysis showed that the total segmental and tandem duplication ratio (39.99%) is much lower than the maize whole-genome duplication ratio. Therefore, tandem and segmental duplication events have little effect on ZmPGs expansion. However, tandem duplication-generated ZmPGs are mostly presented in Clade D, which is similar to other species [30,39], suggesting that a biased expansion occurred in Clade D genes. Our analysis showed that the KaKs ratio of 6 pairs of segmental duplications and 24 pairs of tandem duplications of ZmPG genes are less than 1. This indicates that the duplication of the ZmPG genes occurred through purifying selection, and the corresponding ZmPG proteins are considered to be relatively conserved [50][51][52][53]55,61]. Thus far, the origin of maize has not been extensively studied. It is not clear whether duplication events of the PG gene family predated the formation of Maize Species. However, the predicted earliest dates of duplication events in the 9 maize PG gene segmental duplication pairs ranged from 4.151 to 56.732 million years ago, and 24 tandem duplication pairs ranged from 0.521 to 18.424 million years ago. These results suggest that this is an ancient gene family.
Generally, PG proteins contain four conserved domains except for the Clade E members, which are less conserved [36]. Consistent with the previous studies, all ZmPGs in Clade E lack the conserved domain III. Clade G is a special category because it does not have any of the four typical domains of ZmPGs, but only has two conserved motifs. However, Clade E and G PGs are widely found in different organisms [3]. Therefore, they may have undergone extensive natural selection during the long evolutionary process. ZmPGs of each Clade may have their specific biochemical activity. It is speculated that Clade A and Clade B contain endo-PGs, Clade C and Clade D contain exo-PGs, Clade E contains rhamno PGs, and Clade F cannot be defined as exo-PGs or endo-PGs [3]. Different types of PGs have different substrates (such as HG (homogalacturonan, HG) and RG (Homogalacturonan, HG)) and products (such as OGS (oligogalacturonides, OGs) and RHA (rhamno-polygalacturonase, RHA)) [3]. Therefore, the evolutionary differences among Clades may indicate the variations in pectin components that they catalyze. Structural models may contribute to understanding the evolutionary history and biological function of ZmPGs. Important aspects to explore the structural study further are to disclose their catalytic active sites, as well as to characterize their substrates and kinetic properties.
Gene expression patterns are significant clues for clarifying gene function. In this study, we analyzed expression profiles of the 55 ZmPG genes. Previous studies showed that PG genes in Clade E present from algae to flowering plants, while the PGs in Clades C, D, and F present only in flowering plants [3]. Coding sequences of Clade E PG genes are highly conserved and most of the Clade E PG genes tend to be constitutively expressed, which reflect their important roles in plant development. These characteristics are similar to those of housekeeping genes (HK) in humans and mice [62]. Many HK genes have early origins, and the slower evolution rate of these very early originated ancient proteins is a typical feature [63]. These results suggest that members of Clade E are probably ancient proteins, whereas members of Clade C, Clade D, and Clade F may be critical for the development of specific organs in flowering plants. Most ZmPGs in Clade D are highly and specifically expressed in anthers, as well as in Arabidopsis, poplar, and cucumber [14,30,39], indicating the functional conservation of PG genes in male reproductive development across species.
Pectin is the main component of the pollen wall in angiosperms, and the pollen tube wall is an extension of the pollen inner wall [64]. In addition to the presence of pectin, many pectin-degrading enzymes have been found in plant anthers, including pectinase, polymethylgalacturonase, and pectin methylesterase [65][66][67]. Previous studies have shown that the expression of PGs is higher in the late stage of plant anther development [65,[68][69][70]. During pollen development, meiosis-generated tetrads require a separation event to form independent microspores. In Arabidopsis, microspores fail to separate in qrt1and qrt2deficient mutants, where callose can be degraded normally during tetrad pollen formation. However, pectin still exists after the degradation of callose, thereby demonstrating that QRT1 and QRT2 are required for the pectin degradation during microspore separation [19]. In our study, we showed that ZmPG52(Zm00001d048079) shares high homology with Arabidopsis QRT2(At3g07970) and is highly expressed in developmental anthers, suggesting that ZmPG52 could also be involved in maize tetrad separation. Polygalacturonase is required for the degradation of the cell wall of the pollen mother cell. In Arabidopsis thaliana, microspore isolation is impaired by the knocking out of QUARTET2 (QRT2) and QRT3 [18,19]. ZmPG7 (Zm00001d002342) shares high homology with QRT3, RNA-seq and RT-qPCR indicated that its expression peaks at the beginning of stage 7 of meiosis, suggesting that ZmPG7 may play the same role as its orthologs in Arabidopsis. Studies have shown that PGA4 is involved in the pollen development process and pollen tube growth [71]. ZmPG34 (Zm00001d006818) shares high homology with PGA4 and is highly expressed at the trinucleate stage in the fertile anthers, so it is speculated that ZmPG34 may also be involved in pollen tube growth in the same manner [28]. Notably, ZmPG7, ZmPG34, and ZmPG52 promoters contain MBS. In Arabidopsis, MYB transcription factor MS188 directly regulates QRT3 to affect pectin wall degradation and pollen exine synthesis [45]. It is suggested that MYB84 may regulate PGs in maize. Further validation of their functions will greatly advance our understanding of male sterility in maize.

Genome-Wide Identification of PG Genes in Maize
Two methods and a four-step analysis were conducted to identify PG genes from the maize genome. First, Arabidopsis PG protein sequences obtained from the TAIR website (https://www.arabidopsis.org/, accessed on 1 April 2020) were used as probes to search in the maize genome with blastP on the Gramene website (http://ensembl.gramene.org/Zea_ mays/Info/Index, accessed on 4 April 2020) [1]. Second, the maize genome was scanned and predicted for proteins corresponding to the Pfam PG family (PF00295) using Hmmer V3 (http://pfam.xfam.org/, accessed on 11 April 2020) [72]. Candidates were obtained from the original PG HMM, the high-quality proteome (E value < 1·e −10 ) was aligned with the manual verification of the complete PG domain, and hmmbuild was used to construct the maize-specific PG HMM. Putative PG genes were selected from maize-specific HMM results with E-values below 0.01. Genes acquired from the two above methods were taken as maize candidate PG genes. Then, the isolated candidate PG genes were further confirmed via online tools Pfam (http://pfam.sanger.ac.uk/search, accessed on 20 April 2020) and SMART (http://smart.embl-heidelberg.de/, accessed on 20 April 2020). In addition to the PG genes obtained by using the methods above, an Arabidopsis PG gene At4g20050 had been investigated and was used to search for orthologs in the maize database, although it did not contain any domain of classic PG proteins [18]. After deduplication, the genes left were considered as maize PG genes. To determine the physical and chemical parameters of each maize PG protein, ExPASY (https://web.expasy.org/protparam/, accessed on 1 May 2020) was used to calculate molecular weight (MW), isoelectric point (PI), and the number of amino acids [73]. BUSCA was used to predict protein subcellular localization (http://busca.biocomp.unibo.it/, accessed on 1 May 2020) [74].

Phylogenetic Analysis and Chromosomal Location of Maize PG Genes
Multiple alignments of maize, Arabidopsis, and rice PG protein sequences were performed by ClustalW of MEGAX [75]. Phylogenetic trees were constructed with MEGAX using the neighbor-joining (NJ) method, and bootstrap values were based on 500 replicates. Information of chromosome length and chromosomal location of maize PG genes were obtained from the Ensemble the Plants online website (http://ensembl.gramene.org/Zea_ mays/Info/Index, accessed on 2 February 2021) and displayed by using the online software MA2C (http://mg2c.iask.in/mg2c_v2.0/, accessed on 3 February 2021).

Gene Structure and Conserved Motif Analysis
Sequence and chromosome annotation information of maize PG genes were obtained from Gramene website (http://ensembl.gramene.org/Zea_mays/Info/Index, accessed on 3 April 2021). The web-based bioinformatic tool GSDS2.0 (http://gsds.cbi.pku.edu.cn/ index.php) was used to graphically display the exon/intron genomic structures of maize PG genes [48]. An online tool MEME motif analysis (http://meme-suite.org/tools/meme, accessed on 3 April 2021) was carried out to identify the conserved motifs of maize PG proteins [76]. The maximum number of patterns determined in the MEME program was adjusted to 10 and the width of the domain was set from 6 to 100. Default parameters were used for these bioinformatic tools, unless otherwise specified. DNAMAN software was used to display four PG conserved domains.

Collinearity Analysis and Selective Pressure for Duplicated Genes
To explore the evolutionary dynamics of the coding sequences of ZmPGs, algorithms for vertical and horizontal comparisons were performed. Two genes located in the same chromosomal fragment within 100 kb and separated by five or fewer genes were identified as tandem-duplicated genes [77]. MCScanX was used to analyze the segmental and tandem duplication events [78]. Circos were used to draw the sequence segmental duplication homology [79].

Cis-Elements Analysis of Maize PG Gene Promoters
ZmPG promoter sequences (3 Kb upstream the start codon) were retrieved from the Gramene database (http://bl.gramene.org/zea_mays/info/index, accessed on 23 May 2020). Online software PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/, accessed on 18 April 2021) was used to analyze the cis-elements in the isolated promoter sequences. The models of cis-elements in the promoters were displayed with software GSDS [48].

Expression Analysis of Maize PG Genes
The transcriptome data of Maize at different developmental stages were obtained from the SRA (Sequence Read Archive) database at NCBI (National Center for Biotechnology Information) under the accession code PRJNA171684 [81]. Transcriptome data of seed, coleoptile, root, stem, intemode, leaf, anthers, silk, cob, tassel, and tips were analyzed. First, the high-throughput sequencing data were converted into fastq files using the fast-dump parameter of the sratoolkit software, and the raw sequencing data were assessed for quality using FastQC software, followed by adaptor removing [82]. Low-quality and the excessive number of unknown bases were filtered using Trimmomatic software to obtain clean reads. Clean reads were aligned to the maize B73 reference genome using Hisat2 software [83]. Maize reference genome sequence and annotation information were downloaded from the Ensembl database (ftp://ftp.ensemblgenomes.orgpub/plants/release-27/GenBank/ Zeamays/, accessed on 18 February 2021), and transcripts were assembled using stringtie software, after which the balltown package of R software was used to calculate the transcript expression of genes in each tissue. FPKM (fragments per kilobase of exon per million fragments mapped) values were used to measure the expression levels of genes.
Maize (B73) was grown under natural conditions in Beijing, China (Experimental base of Research Center of Biology and Agriculture, University of Science and Technology Beijing, China, 116"38 E, 40"06 N). Total RNA was isolated using TRIzol reagent (Invitrogen, Waltham, MA, USA) from maize anthers. One microgram of RNA was used to synthesize first-strand cDNA. RT-qPCR was performed using SYBR TB Green TM Premix Ex Taq TM (TaKaRa, Dalian, China) with a QuantStudio 5 Real-Time PCR System (ABI, Waltham, MA, USA). ZmActin7 was used as an internal control. Primers were designed by Primer3 (version 4.1.0) and are listed in Table S6.

Conclusions
In conclusion, we systematically conducted a genome-wide exploration of maize PG genes through various bioinformatic analyses, including elucidating the physicochemical properties, phylogeny, chromosomal location, gene structure, selective pressures, collinearity analysis, and expression profiles of the ZmPG genes.
A total of 55 PG genes were identified from the maize genome, and all the maize PG genes were randomly distributed across the maize chromosomes. Phylogenetic analysis revealed that these maize PG genes were clustered into six Clades. The gene structures of the ZmPGs were highly conserved in each of the Clades, reflecting their functional conservation. Collinearity analysis showed that a high proportion of the ZmPG genes might be derived from tandem and segmental duplications with purifying selection, providing insights into possible functional divergence among members of the ZmPG gene family. Furthermore, comprehensive analyses of the expression profiles revealed that ZmPG7, ZmPG34, and ZmPG52 have an expression peak in anther development. Promoters of the three ZmPG genes have MBS cis-elements, suggesting that orthologs of MYB84 in maize may regulate these ZmPGs and probably be relative to fertility. These data provide valuable information for future functional investigations of this gene family.

Conflicts of Interest:
The authors declare no conflict of interest.