Genome-Wide Identification and Analysis of Class III Peroxidases in Allotetraploid Cotton (Gossypium hirsutum L.) and their Responses to PK Deficiency

Class III peroxidases (PODs), commonly known as secretable class III plant peroxidases, are plant-specific enzymes that play critical roles in not only plant growth and development but also the responses to biotic and abiotic stress. In this study, we identified 198 nonredundant POD genes, designated GhPODs, with 180 PODs being predicted to secrete into apoplast. These POD genes were divided into 10 sub-groups based on their phylogenetic relationships. We performed systematic bioinformatic analysis of the POD genes, including analysis of gene structures, phylogenetic relationships, and gene expression profiles. The GhPODs are unevenly distributed on both upland cotton sub-genome A and D chromosomes. Additionally, these genes have undergone 15 segmental and 12 tandem duplication events, indicating that both segmental and tandem duplication contributed to the expansion of the POD gene family in upland cotton. Ka/Ks analysis suggested that most duplicated GhPODs experienced negative selection, with limited functional divergence during the duplication events. High-throughput RNA-seq data indicated that most highly expressed genes might play significant roles in root, stem, leaf, and fiber development. Under K or P deficiency conditions, PODs showed different expression patterns in cotton root and leaf. This study provides useful information for further functional analysis of the POD gene family in upland cotton.


Introduction
Peroxidases (EC 1.11.1.x) are encoded by multigenic families and are involved in several important physiological and developmental processes. Among them, class III peroxidases (EC 1.11.1.7), belonging to the haem peroxidase subfamily, exist only in plants and have an extremely widespread presence in the plant kingdom [1]. They are members of a large multigenic family with more than 200 members in

Sequence Retrieval for POD Proteins in Cotton
The local BLAST database was established with protein sequences of upland cotton (G. hirsutum L. acc. TM-1) whole genome (download from http://mascotton.njau.edu.cn). The protein sequences of POD family members in the genome of Arabidopsis were retrieved from the TAIR database (http://www.arabidopsis.org/). The candidate sequences of POD in cotton were acquired by BLASTP with each of the 73 different amino acid sequences of Arabidopsis POD gene family as query sequences (screening threshold value/E-Value: 1e −10 ). To verify the reliability of the initial results, the acquired candidate sequences were further submitted against PFAM (http://pfam.xfam.org/) to verify the domains for identifying the POD gene family members in cotton. The theoretical molecular weights (MWs) and isoelectric points (pIs) of the proteins were collected through an online program (http://www.ebi.ac.uk/Tools/seqstats/emboss_pepstats/).

Phylogenetic Analysis
Multiple sequence alignments were conducted on the amino acid sequences of POD proteins in G. hirsutum genomes using Cluster W of MEGA 5.0 software with the default settings [21]. Subsequently, the software was employed to construct an unrooted phylogenetic tree based on alignments using the Neighbor-Joining (NJ) method with the following parameters: model (p-distance), bootstrap (1000 replicates), and gap/missing data (pairwise deletion).

Gene Structure Analysis
The genomic and CDS sequences of cotton PODs, extracted from G. hirsutum genome databases, were compared by using the Gene Structure Display Server program (http://gsds.cbi.pku.edu.cn/) to infer the exon/intron organization of POD genes.

Analysis of Chromosomal Location and Gene Duplication
Information about the physical locations of all POD genes on chromosomes was obtained through BLASTn searches against the G. hirsutum genome database. All GhPOD genes were then mapped on the chromosomes using the software MapInspect (http://mapinspect.software.informer.com). The detection of POD gene duplication events was also carried out and paralogous POD gene pairs were identified based on the alignment results. The criteria were as follows: the shorter sequence covers over 80% of the longer sequence after alignment and the minimum identity of aligned regions is equal to or above 80%. In addition, to explore the selection pressures among POD duplicated genes, we calculated the nonsynonymous mutation rate (Ka), synonymous mutation rate (Ks), and Ka/Ks values for the duplicated gene pairs with Mega 5.0.

Cotton Culture and Expression Analysis of POD Genes under PK Deficiency
Cotton (G. hirsutum L. TM-1) was planted in a growth chamber (day/night of 14/10 h with temperature 30/25 • C and photo intensity 450 µmol/m2·s) under liquid culture. The solution composition was as follows (mmol/L): 2.5 Ca(NO 3 ) 2 , 1 MgSO 4 , 0.5 NH 4 H 2 PO 4 , 2.5 KCl, 2 NaCl, 2 × 10 −4 CuSO 4 , 1 × 10 −3 ZnSO 4 , 0.1 EDTA-FeNa, 2 × 10 −2 H 3 BO 3 , 5 × 10 −6 (NH 4 ) 6 Mo 7 O 24 and 1 × 10 −3 MnSO 4 . The seedlings with four expanded leaves were treated separately with the original solution (control), low K solution (0.05 KCl with NaCl to balance the Cl ion and others the same as in the original solution), and low P solution (0.005 NH 4 H 2 PO 4 with NH 4 Cl to balance the NH 4 + and others the same as in the original solution). On the 7th day of treatment, the third leaf from the uppermost was counted and the young roots of all treatments were sampled and stored in −80 • C for RNA extraction and gene expression analysis. Expression profiles of POD genes response to PK deficiency were analyzed by using the Illumina Hiseq2000 (Illumina, San Diego, CA, USA) to perform high-throughput RNA-seq of the root and leaf of control, P deficiency, and K-deficiency. In total, 26.95 Gb of raw RNA-seq data were generated (BGI-Tech., Shenzhen, China). RNA-seq reads were mapped to the cotton genotype TM-1 genome using Tophat (v2.0.8; Top Hat, Toronto, Canada). To measure the gene expression level in sampled tissues, we calculated the expression of each gene using FPKM (Fragments per Kilobase of exon model per Million mapped reads) with Cufflinks (v2.1.1; http://cole-trapnell-lab.github.io/cufflinks/). We analyzed the POD gene expression changes in root and leaf under control, P-deficiency, and K-deficiency by using software MultiExperiment Viewer (MeV; http://mev.tm4.org).

Localization of POD Proteins
Secretion of POD proteins to the apoplast or to the vacuole were predicted by combinations of using SignalP (www.cbs.dtu.dk/services/SignalP/) with a signal peptide, SecretomeP (www.cbs.dtu. dk/services/SecretomeP) without a signal peptide and TargetP (www.cbs.dtu.dk/services/TargetP). The secreted POD proteins were further investigated in the xylem saps, separately, from field cotton [8] and chamber cotton.

Identification of POD Genes
We used the 73 Arabidopsis POD genes to acquire 264 cotton POD genes by BLASTP and further verified their domains with PFAM. A total of 198 non-redundant POD genes with conserved POD domains were identified in cotton. This number is greater than that in Arabidopsis (73) [5], Populus (93) [3], Chinese Pear (94) [22], maize (119) [23], and rice (138) ( [4]; but it was similar with that in switchgrass (200) [2]. For convenience, we assigned names to these POD genes (GhPOD01-198) according to their chromosomal positions. The length of the 198 newly identified POD proteins varies from 160 to 1098 amino acid (aa) with an average of 332 aa. There is only one POD containing more than 672 aa. The isoelectric point (PI) varied from 4.12 to 10.50 with a mean of 7.73 and >7.0 of 67.2% POD proteins. Other information of chromosomal location, molecular weight (MW) gene size, coding sequence (CDs) size of each GhPOD gene/protein is shown in Table 1.

Phylogenetic Analysis
The phylogenetic tree was constructed with the NJ method based on multiple sequence alignment of entire amino acid sequence of 198 upland cotton POD protein sequences in order to acquire a better understanding of evolutionary history and the phylogenetic relationship of POD in upland cotton. Based on the phylogenetic tree (Figure 1), we identified 10 major clusters with high bootstrap probabilities (BPs) ranging from 59 to 100%, among them six clusters had 100% BPs, one had 95% BPs, and two had 71% or nearly 71% BPs (Figure 1). The POD genes were not evenly distributed in some groups in upland cotton, Cluster I had the most members (68), which could be divided into seven subgroups, but Cluster VII only had two members (Figures 1 and 2). The phylogenetic tree was constructed with the NJ method based on multiple sequence alignment of entire amino acid sequence of 198 upland cotton POD protein sequences in order to acquire a better understanding of evolutionary history and the phylogenetic relationship of POD in upland cotton. Based on the phylogenetic tree (Figure 1), we identified 10 major clusters with high bootstrap probabilities (BPs) ranging from 59 to 100%, among them six clusters had 100% BPs, one had 95% BPs, and two had 71% or nearly 71% BPs (Figure 1). The POD genes were not evenly distributed in some groups in upland cotton, Cluster I had the most members (68), which could be divided into seven subgroups, but Cluster VII only had two members (Figures 1 and 2).

Gene Structures
POD genes with classical conserved intron/exon gene structure were observed. The coding sequence of 100 of the 198 peroxidase genes are disrupted by three introns at conserved positions ( Figure 2). However, variations in this basic gene structure were observed for another 98 of the family, implicating loss of one or more introns (64) or gain of one or more introns (34). Forty genes lost one of the three putative ancestral introns, while fifteen genes lost two introns. Additionally, nine genes (among them, eight genes were closely related, belonged to VI subgroup) were devoid of

Gene Structures
POD genes with classical conserved intron/exon gene structure were observed. The coding sequence of 100 of the 198 peroxidase genes are disrupted by three introns at conserved positions ( Figure 2). However, variations in this basic gene structure were observed for another 98 of the family, implicating loss of one or more introns (64) or gain of one or more introns (34). Forty genes lost one of the three putative ancestral introns, while fifteen genes lost two introns. Additionally, nine genes (among them, eight genes were closely related, belonged to VI subgroup) were devoid of any introns. In comparison with the classical three introns, the number of POD genes gaining one to nine more introns (except eight), were 5, 3, 2, 4, 13, 2, 4, and 1, respectively. These differences may be derived from a single intron loss or gain events during the long evolutionary period. In addition, 23 of 24 genes with more than seven introns constitute group X, which contains the largest numbers of introns. Sub-clusters with conserved intron/exon gene structure were also observed.

Responses of POD Genes to PK Deficiency in Cotton Roots and Leaves
Among 198 GhPOD genes, the expression of 30 genes was not detected in all samples including roots and leaves under control conditions, and under K and P deficient conditions. In roots, K and P deficiency, respectively, induced 10 and 11 POD gene expression from zero to slight level (FPKM value < 1). In leaves, K and P deficiency, respectively, induced 28 and 11 POD gene expression from zero to slight level (FPKM value < 1 for most of them). For roots, compared with controls, 12 and 42 POD gene expression, respectively, was above 2-fold and below 0.5-fold under K deficiency; the expression of 18 POD genes, respectively, was above 2-fold and below 0.5-fold under P deficiency. In leaves, compared with controls, the expression of 25 and 16 POD genes, respectively, was above 2-fold and below 0.5-fold being subjected to K deficiency; the expression of 12 and 33 POD genes, respectively, was above 2-fold and below 0.5-fold being subjected to P deficiency. The same POD gene expressed itself with obviously different patterns in leaves or roots under K or P deficiency. For example, under K deficiency, the expression of Gh_A08G1806 was 0.2 times as much as the controls; however, under P deficiency, it was

Responses of POD Genes to PK Deficiency in Cotton Roots and Leaves
Among 198 GhPOD genes, the expression of 30 genes was not detected in all samples including roots and leaves under control conditions, and under K and P deficient conditions. In roots, K and P deficiency, respectively, induced 10 and 11 POD gene expression from zero to slight level (FPKM value < 1). In leaves, K and P deficiency, respectively, induced 28 and 11 POD gene expression from zero to slight level (FPKM value < 1 for most of them). For roots, compared with controls, 12 and 42 POD gene expression, respectively, was above 2-fold and below 0.5-fold under K deficiency; the expression of 18 POD genes, respectively, was above 2-fold and below 0.5-fold under P deficiency. In leaves, compared with controls, the expression of 25 and 16 POD genes, respectively, was above 2-fold and below 0.5-fold being subjected to K deficiency; the expression of 12 and 33 POD genes, respectively, was above 2-fold and below 0.5-fold being subjected to P deficiency. The same POD gene expressed itself with obviously different patterns in leaves or roots under K or P deficiency. For example, under K deficiency, the expression of Gh_A08G1806 was 0.2 times as much as the controls; however, under P deficiency, it was 6.1 times as much as controls in roots. K deficiency induced the expression of Gh_A08G1806 from zero to 1.3 FPKM, but P deficiency did not change its transcription level.

Secret Traits of Cotton POD
Among 198 GhPOD genes, 142 POD enzymes were predicted with signal peptide by using SignalP, 147 with signal peptide by using TargetP, another 38 POD enzymes were predicted being secreted into apoplast with SecretomeP among no signal peptides by SignalP prediction (Table 1). In xylem sap, 61 POD enzyme isoforms were identified. Among them, 31 isoforms were found not only in field conditions but also in greenhouse conditions (Figure 4).
Genes 2019, 10, x FOR PEER REVIEW 11 of 16 6.1 times as much as controls in roots. K deficiency induced the expression of Gh_A08G1806 from zero to 1.3 FPKM, but P deficiency did not change its transcription level.

Secret Traits of Cotton POD
Among 198 GhPOD genes, 142 POD enzymes were predicted with signal peptide by using SignalP, 147 with signal peptide by using TargetP, another 38 POD enzymes were predicted being secreted into apoplast with SecretomeP among no signal peptides by SignalP prediction (Table 1). In xylem sap, 61 POD enzyme isoforms were identified. Among them, 31 isoforms were found not only in field conditions but also in greenhouse conditions (Figure 4).

Identification of Cotton POD Genes and Their Expansion
Members of POD gene family are involved in the regulation of a variety of biological processes. POD proteins are classified into apoplast type and vacuole type [25]. Apoplast type PODs participate in plant cell wall lignification, defense to abiotic and biotic stresses, plant growth and development, etc. The majority of PODs (90%) was predicted to be secreted to apoplast by using SignalP plus SecretomeP, and 61 PODs in total were detected in cotton xylem sap from adult cotton plants in the field and cotton seedlings in the greenhouse, indicating their different roles in cotton growth and development. The predictive tools for localization show very different results, indicating the fact that plant localization signals are very variable.
Previously, Delannoy et al. (2003) characterized nine POD genes, found them showing differential expressions in response to the pathogen and suggested that they may have various functions in cotton defense to bacterial blight disease [9]. Furthermore, Delannoy et al. (2006) analyzed 12 POD genes from cotton and found two of them played a role in the oxidative burst response of cotton to bacterial blight [19]. Mei et al. (2009) also investigated 10 POD genes in cotton fiber development [6].
Systematic and comprehensive analyses of POD gene families have been published for Populus trichocarpa [3], Zea mays [23], Arabidopsis thaliana [5], and Oryza sativa [4]. The genome data of allotetraploid upland cotton [20] provides a useful tool for analysis of the upland cotton POD gene family. In our study, 198 POD genes were identified and characterized in upland cotton. The number in the upland cotton is higher than that in Arabidopsis (73) [5], Poplar (93) [3], Chinese Pear (94) [22], maize (119) [23], and rice (138) [4] and similar to that in switchgrass (more than 200) [2]. This is probably due to the fact that upland cotton and switchgrass are tetraploid with larger genomes containing two-type sub-genomes (respectively, 26 chromosomes, A and D; 18, A and B), and that Arabidopsis, maize and rice were diploid with smaller genomes (respectively, 5 chromosomes; 10; 12). However, this cannot explain the fact that the tetraploid Populus with 19 chromosomes in the genome has only 93 POD isoforms.
Gene duplications are one of the primary driving forces in the evolution of genomes and genetic systems [24]. Certain studies have shown that segmental duplication was largely responsible for the expansion of cotton gene families such as the TCP transcription factors in G. raimondii, YABBY and GhHsp20 in G. hirsutum [26][27][28]. By contrast, tandem duplication has contributed significantly to the expansion of this gene family in poplar [3]. However, for nsLTPs, both tandem and segmental duplication contributed to its expansion in G. arboreum and G. hirsutum, while tandem duplication was the dominant pattern in G. raimondii [29]. Interestingly, in this study, we determined that the number of GhPOD genes involved in segmental duplication and tandem duplication is similar, suggesting that both segmental and tandem duplications were equal contributors to the expansion of the POD gene family in upland cotton. It showed a similar gene duplication with POD genes in maize [23]. The Ka/Ks < 1 of the most GhPOD duplicated pairs showed that negative selection may be largely responsible for maintaining the functions of upland cotton POD enzymes.
Phylogenetic analysis of the GHPOD gene family revealed that the exon/intron structures of these genes are relatively conserved due to one half of the GhPOD genes with the 4 exons/3 introns structures. Similarly, 48 of the 73 (65.8%) peroxidase encoding genes in Arabidopsis consist of 4 exons/3 introns [5], and 38 of 138 (27.5%) in rice constitute this structure. It suggests a common ancestral gene with a classical pattern of 4 exons/3 introns [4]. Many studies have shown that introns were specifically inserted into plants and were retained in the genome during the course of evolution [30]. Another half of GhPOD genes gained or lost one or more introns from the POD coding region in a subfamily specific manner, and what is most serious is that some genes contain no introns. This extreme case for POD genes also exist in Arabidopsis [5], rice [4], maize [23], which might be explained either by the loss of all introns or by the occurrence of a reverse transcription event followed by the integration of the cDNA copy back in the genome, as described in mammals, yeast, and maize [31,32]. It is well known that the structural diversity of genes drives the evolution of multigene families. Also, the differences in these characteristics detected between different subfamilies suggest that upland cotton POD members are functionally diversified.  * the POD protein was detected in the field cotton xylem sap; # the POD protein was detected in the greenhouse cotton. Sources of the samples are as follows: leaf-1 (true leaves, Accession: SRX849561); leaf-2 (leaves at 2-week-old plants, Accession: SRX797901); root (roots at 2-week-old plants, Accession: SRX797899); stem (stems at 2-week-old plants, Accession: SRX797900); petals (petals of mature flowers, Accession: SRX797903); fibers (fibers of 25 days post-anthesis).

Expression Profiles of GhPOD Genes
Gene expression patterns can provide important clues about gene function. We used publicly available [20] (https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA248163) and our own genome-wide transcripts profiling data from upland cotton tissues as a resource to investigate the expression patterns of GhPODs. Based on the public data, 28 of the 198 identified GhPOD genes were not expressed in leaves, roots, stems, petals or fibers (Figures 4 and 5), indicating their functional loss among all organs. In comparison between our results and the results of public data by using data from normal culture conditions, 56, 35, and 19 of the 198 GhPOD genes, respectively, exhibited no expression in leaves or/and roots (Figures 4 and 5). It indicated that a part of GhPODs are expressed coincidently under different conditions or at different developmental stages. Few POD genes demonstrate tissue or organ specificity. In the Arabidopsis genome, 73 POD genes have been annotated, 65 of which were expressed in various tissues, and only three (AtPrx12, AtPrx62, AtPrx65) identified as specific to roots [33]. In the upland cotton genome, 17 of 198 POD genes were identified as specific to roots. However, only 4 and 1 were expressed in leaves and fibers, respectively (Figures 4 and 5).
PODs are expressed in different patterns when facing different biotic and abiotic stresses [19,23,34,35]. This was also confirmed in roots and leaves when plants were subjected to K or P deficiency. For example, expression level of gene Gh_D10G1158 decreased obviously in roots under K deficiency and was about 30% of the controls; under P deficiency, its expression was increased obviously and was 202% as much as controls. Conversely, its expression level was increased by 20.9-fold in the leaf under K deficiency, but very few of them show expression changes under P deficiency. Additionally, different subfamily genes showed obviously different responses to K or P deficiency. In comparison with the controls, 16.7% and 45.2% of GhPOD genes had, respectively, more than 2 and less than 0.5 times the expression level in the root under K deficit in the I subfamily; 48.0% and 33.3% of GhPOD genes had, respectively, more than 2 and less than 0.5 times the expression level in the root under P deficit in the I subfamily. Interestingly, few GhPOD genes with higher expression level with FPKM > 30 showed changes of more than 2 or less than 0.5 times in leaf or root under K or P deficit, indicating that these genes play important roles in the maintenance of basic plant growth.
Author Contributions: P.D. and Z.Z. designed the experiments and wrote the manuscript, B.Z., G.W. and M.C. contribute substantially in data analysis and interpretation of the data. All authors reviewed and approved the final manuscript.
Funding: This research was supported by the National Natural Science Foundation of China (31571600 and 31271648).

Conflicts of Interest:
The authors declare no conflict of interest.