Genome-Wide Identification of Cyclophilin Gene Family in Cotton and Expression Analysis of the Fibre Development in Gossypium barbadense

Cyclophilins (CYPs) are a member of the immunophilin superfamily (in addition to FKBPs and parvulins) and play a significant role in peptidyl-prolyl cis-trans isomerase (PPIase) activity. Previous studies have shown that CYPs have important functions in plants, but no genome-wide analysis of the cotton CYP gene family has been reported, and the specific biological function of this gene is still elusive. Based on the release of the cotton genome sequence, we identified 75, 78, 40 and 38 CYP gene sequences from G. barbadense, G. hirsutum, G. arboreum, and G. raimondii, respectively; 221 CYP genes were unequally located on chromosomes. Phylogenetic analysis showed that 231 CYP genes clustered into three major groups and eight subgroups. Collinearity analysis showed that segmental duplications played a significant role in the expansion of CYP members in cotton. There were light-responsiveness, abiotic-stress and hormone-response elements upstream of most of the CYPs. In addition, the motif composition analysis revealed that 49 cyclophilin proteins had extra domains, including TPR (tetratricopeptide repeat), coiled coil, U-box, RRM (RNA recognition motif), WD40 (RNA recognition motif) and zinc finger domains, along with the cyclophilin-like domain (CLD). The expression patterns based on qRT-PCR showed that six CYP expression levels showed greater differences between Xinhai21 (long fibres, G. barbadense) and Ashmon (short fibres, G. barbadense) at 10 and 20 days postanthesis (DPA). These results signified that CYP genes are involved in the elongation stage of cotton fibre development. This study provides a valuable resource for further investigations of CYP gene functions and molecular mechanisms in cotton.


Introduction
Cyclophilins (CYPs) are a member of the immunophilin superfamily (in addition to FKBPs and Parvulins) and are identified by their highly conserved cyclophilin-like domain (CLD) [1]. In 1984, Handschumacher et al. first identified cyclophilin A (CyPA) from bovine thymocytes, which is the receptor of the immunosuppressive drug cyclosporine A (CsA) [2]. Subsequently, an 18-kDa protein with PPIase (peptidyl-prolyl cis-trans isomerase) activity, which can catalyse the cis/trans isomerisation process of Xaa-proline peptide bonds, was found by Fischer and Takahashi et al. This process is a rate-limiting step in protein folding. Fischer and Takahashi et al. also discovered that PPIase is the to determine whether there was a complete or partially conserved CLD motif, removing sequences below the significance threshold. Finally, 75, 78, 40 and 38 CYP genes were identified from G. barbadense, G. hirsutum, G. arboreum, and G. raimondii, respectively. According to the naming method of AtCYPs by He and Romano et al. [9,31], 231 CYP genes encoding proteins from cotton were named by their relative molecular weight; we used immature proteins because different tools predict different cleavable signal peptides. Furthermore, proteins with similar molecular weights were consecutively numbered. The predicted CYP genes encoded proteins ranging between 64 and 185 amino acids, with predicted molecular weights ranging between 8.50 and 142.50 kDa. The minimum isoelectric point was 4.5 (GbCYP40-2), and the maximum isoelectric point was 11.5. Statistics showed that 62.3% of proteins were basic proteins. Of these, 120 cyclophilin proteins were localised in the cytoplasm, and the rest were localised in the nucleus, chloroplast, mitochondria, plasma membrane and extracellular tissues. Detailed information about the CYP gene family in cotton is displayed in Additional File 1: Table S1.

Gene Structure and Domain Analysis
To obtain the gene structures of the cotton CYP genes, the intron/exon structures of 231 CYP genes were analysed by the GSDS online server, as displayed in Figure 2C and Additional File 2-4: Figure S1-S3. The analysis showed that 34 genes (GbCYP14-2, GbCYP16-1, GbCYP18-1 to GbCYP18-9, GhCYP12, GhCYP18-2 to GhCYP18-9, GhCYP18-11, GhCYP18-12, GaCYP15, GaCYP18-3 to GaCYP18-7 and GrCYP18-2 to GrCYP18-7) had only one exon, and there were 21 exons in GbCYP142, which had the largest number of exons. Conserved motif prediction showed that cyclophilins could be divided into single-domain (SD) and multi-domain (MD) groups. Among the 231 CYP genes, 182 genes encoded a single CLD domain, and the remaining 49 CYPs possessed other functional domains, such as TPR, coiled coil, U-box, RRM, WD40 and zinc finger domains ( Figure 3). The CLD domain was composed of motifs 1, 2, 3, 4, 5, 6 and 7 ( Figure 2B, Additional File 2-5: Figure S1-S3, Table S2). In contrast to the results of the phylogenetic analysis, most members belonging to the same subgroup had similar gene structures and functional domains.  (38) and Arabidopsis (29). MEGA 6.0 was used to build the neighbour-joining (NJ) tree with 1000 bootstrap replicates. Different line colours indicate different subgroups of CYPs. AtCYPs are represented by red triangles.

Gene Structure and Domain Analysis
To obtain the gene structures of the cotton CYP genes, the intron/exon structures of 231 CYP genes were analysed by the GSDS online server, as displayed in Figure 2C and Additional File 2-4: Figures S1-S3. The analysis showed that 34 genes (GbCYP14-2, GbCYP16-1, GbCYP18-1 to GbCYP18-9, GhCYP12, GhCYP18-2 to GhCYP18-9, GhCYP18-11, GhCYP18-12, GaCYP15, GaCYP18-3 to GaCYP18-7 and GrCYP18-2 to GrCYP18-7) had only one exon, and there were 21 exons in GbCYP142, which had the largest number of exons. Conserved motif prediction showed that cyclophilins could be divided into single-domain (SD) and multi-domain (MD) groups. Among the 231 CYP genes, 182 genes encoded a single CLD domain, and the remaining 49 CYPs possessed other functional domains, such as TPR, coiled coil, U-box, RRM, WD40 and zinc finger domains ( Figure 3). The CLD domain was composed of motifs 1, 2, 3, 4, 5, 6 and 7 ( Figure 2B, Additional File 2-5: Figures S1-S3 , Table S2). In contrast to the results of the phylogenetic analysis, most members belonging to the same subgroup had similar gene structures and functional domains.

Gene Expression Analysis
Based on the RNA-seq transcriptome data, the RPKMs of different tissues (seeds, stems, leaves and flowers) and fibre developmental stages (0 DPA ovules and 5 DPA, 10 DPA and 15 DPA fibres) were used to investigate the expression profiles of CYP family genes in sea-island cotton (Additional File 10: Table S6). The heat map shows that these GbCYP genes clustered in six pattern groups. In general, genes within the same subgroup showed similar expression patterns (Figure 7). The first group included GbCYP8, GbCYP21-1 and GbCYP37-2, which were highly expressed in seeds. In the second group, 14 genes were highly expressed in 5 and 10 DPA fibres. In the third group, 14 genes were highly expressed in leaves, while 4 genes were highly expressed in stems or flowers in the fourth group. The other genes in groups five and six were highly expressed in 0 DPA ovules. The various expression patterns suggested the functional divergence of different groups of GbCYP members in cotton.

Gene Expression Analysis
Based on the RNA-seq transcriptome data, the RPKMs of different tissues (seeds, stems, leaves and flowers) and fibre developmental stages (0 DPA ovules and 5 DPA, 10 DPA and 15 DPA fibres) were used to investigate the expression profiles of CYP family genes in sea-island cotton (Additional File 10: Table S6). The heat map shows that these GbCYP genes clustered in six pattern groups. In general, genes within the same subgroup showed similar expression patterns (Figure 7). The first group included GbCYP8, GbCYP21-1 and GbCYP37-2, which were highly expressed in seeds. In the second group, 14 genes were highly expressed in 5 and 10 DPA fibres. In the third group, 14 genes were highly expressed in leaves, while 4 genes were highly expressed in stems or flowers in the fourth group. The other genes in groups five and six were highly expressed in 0 DPA ovules. The various expression patterns suggested the functional divergence of different groups of GbCYP members in cotton. To further study the relationship between GbCYPs and cotton fibre development, we isolated 35 genes with RPKM ≥ 5 at 5, 10 and 20 DPA for further real-time quantitative RT-PCR (qRT-PCR) analysis. Of these, 15 genes were highly expressed in 5 DPA fibres ( Figure 8A, Figure 9), and GbCYP18-6, GbCYP27-2, GbCYP28, GbCYP37-4 and GbCYP37-5 were highly expressed in 10 DPA To further study the relationship between GbCYPs and cotton fibre development, we isolated 35 genes with RPKM ≥ 5 at 5, 10 and 20 DPA for further real-time quantitative RT-PCR (qRT-PCR) analysis. Of these, 15 genes were highly expressed in 5 DPA fibres ( Figures 8A and 9), and GbCYP18-6, GbCYP27-2, GbCYP28, GbCYP37-4 and GbCYP37-5 were highly expressed in 10 DPA fibres ( Figures 8B  and 9). GbCYP24-1 and GbCYP26-3 were highly expressed in 15 DPA fibres ( Figure 8B). In addition, 13 genes were mainly expressed in 20 DPA fibres (Figures 8C and 9). The results of qRT-PCR proved that the fold-change values of GbCYPs were reliable. It is notable that differentially expressed GbCYPs were mainly distributed in the second group.  Table S7.
Additionally, Xinjiang is the only long-staple cotton-producing area in China. Xinhai21 is a hybrid of the Giza variety from Egypt and has been the major cultivar in Xinjiang over the years, and Ashmon is a resource material that was introduced from Syria. Our research team investigated the quality of these two sea-island cotton fibres over the three years under field conditions and found that Xinhai21 had a greater fibre length than Ashmon (Table 1) [32]. To understand the role of GbCYPs during different developmental stages of cotton fibre, we chose Ashmon as the control material and analysed the expression profiles of six GbCYPs (GbCYP18-5, GbCYP18-6, GbCYP27-2, GbCYP36, GbCYP47-1 and GbCYP58) between Xinhai21 and Ashmon. As shown in Figure 9, there were significant differences in some fibre development stages. Generally, the expression levels of six Xinhai21 CYPs were higher than those of the same six Ashmon CYPs. GbCYP47-1 showed a progressive decrease from 5 DPA to 25 DPA in two cotton species, and GbCYP18-6 showed the highest overall expression levels at 10 DPA and 15 DPA and a 5-fold increase between 5 DPA and 10 DPA. The expression of GbCYP27-2 peaked at 10 DPA, and the expression levels of GbCYP18-5, GbCYP36 and GbCYP58 peaked at 20 DPA.   GbUBQ7 was used as an internal reference to normalise the expression data. Error bars were calculated from the difference in the expression patterns of three biological replicates. The asterisks and double asterisks indicate correlations at the 0.05 and 0.01 significance levels, respectively (t-test, * p < 0.05, ** p < 0.01).

Identification of the CYP Gene Family
In this study, we identified a total of 75, 78, 40 and 38 CYP genes from G. barbadense, G. hirsutum, G. arboreum, and G. raimondii, respectively. The number of CYP genes in tetraploid cotton was approximately twice that in diploid cotton. The evolutionary tree showed that more CYP proteins appeared in pairs and clustered together with 2 GbCYPs, 2 GhCYPs, 1 GaCYP and 1 GrCYP, which supported the cotton species polyploidisation event that occurred 1.5 million years ago [24,25]. Moreover, there were 25 genes far from other members of the CYP family in cotton, implying that these CYP family genes underwent different evolutionary processes. Some newly generated genes and pre-existing genes might have been deleted during cotton evolution. The chromosomal locations and collinearity showed that segmental duplications played a significant role in the expansion of CYP members in cotton, which was the same as the results reported in A. thaliana, rice, B. napus and other plants [9][10][11][12][13]. Approximately half (49.3%) of the common orthologous CYP genes are shared by all four Gossypium species, indicating that these collinear pairs might have already existed before the genome rearrangement. The promoter analysis results suggest that GbCYPs may play an important role in responses to light, stress and hormone signalling.

Gene Function Analysis Revealed CYP Contributions to Various Cellular Processes
Cyclophilins are present in all subcellular compartments, are involved in various metabolic processes, especially protein interactions, and act as molecular chaperones [5,33]. The CLD has an overall beta-barrel structure, which consists of eight antiparallel beta-strands and two alpha-helices Figure 9. Expression profiles of six GbCYPs between Xinhai21 and Ashmon based on qRT-PCR. GbUBQ7 was used as an internal reference to normalise the expression data. Error bars were calculated from the difference in the expression patterns of three biological replicates. The asterisks and double asterisks indicate correlations at the 0.05 and 0.01 significance levels, respectively (t-test, * p < 0.05, ** p < 0.01).
Additionally, Xinjiang is the only long-staple cotton-producing area in China. Xinhai21 is a hybrid of the Giza variety from Egypt and has been the major cultivar in Xinjiang over the years, and Ashmon is a resource material that was introduced from Syria. Our research team investigated the quality of these two sea-island cotton fibres over the three years under field conditions and found that Xinhai21 had a greater fibre length than Ashmon (Table 1) [32]. To understand the role of GbCYPs during different developmental stages of cotton fibre, we chose Ashmon as the control material and analysed the expression profiles of six GbCYPs (GbCYP18-5, GbCYP18-6, GbCYP27-2, GbCYP36, GbCYP47-1 and GbCYP58) between Xinhai21 and Ashmon. As shown in Figure 9, there were significant differences in some fibre development stages. Generally, the expression levels of six Xinhai21 CYPs were higher than those of the same six Ashmon CYPs. GbCYP47-1 showed a progressive decrease from 5 DPA to 25 DPA in two cotton species, and GbCYP18-6 showed the highest overall expression levels at 10 DPA and 15 DPA and a 5-fold increase between 5 DPA and 10 DPA. The expression of GbCYP27-2 peaked at 10 DPA, and the expression levels of GbCYP18-5, GbCYP36 and GbCYP58 peaked at 20 DPA.

Identification of the CYP Gene Family
In this study, we identified a total of 75, 78, 40 and 38 CYP genes from G. barbadense, G. hirsutum, G. arboreum, and G. raimondii, respectively. The number of CYP genes in tetraploid cotton was approximately twice that in diploid cotton. The evolutionary tree showed that more CYP proteins appeared in pairs and clustered together with 2 GbCYPs, 2 GhCYPs, 1 GaCYP and 1 GrCYP, which supported the cotton species polyploidisation event that occurred 1.5 million years ago [24,25]. Moreover, there were 25 genes far from other members of the CYP family in cotton, implying that these CYP family genes underwent different evolutionary processes. Some newly generated genes and pre-existing genes might have been deleted during cotton evolution. The chromosomal locations and collinearity showed that segmental duplications played a significant role in the expansion of CYP members in cotton, which was the same as the results reported in A. thaliana, rice, B. napus and other plants [9][10][11][12][13]. Approximately half (49.3%) of the common orthologous CYP genes are shared by all four Gossypium species, indicating that these collinear pairs might have already existed before the genome rearrangement. The promoter analysis results suggest that GbCYPs may play an important role in responses to light, stress and hormone signalling.

Gene Function Analysis Revealed CYP Contributions to Various Cellular Processes
Cyclophilins are present in all subcellular compartments, are involved in various metabolic processes, especially protein interactions, and act as molecular chaperones [5,33]. The CLD has an overall beta-barrel structure, which consists of eight antiparallel beta-strands and two alpha-helices and catalyses a rate-limiting step of the cis/trans isomerisation process in protein folding [34]. Interestingly, cyclophilins contain some special domains in addition to the CLD, suggesting that cyclophilin proteins have various biological functions in cotton.
As shown in Figure 3, we identified 28 cotton CYPs containing an additional tetratricopeptide repeat (TRP) domain at the C-terminus, and these CYPs were highly homologous to AtCYP40 (Additional File 1: Table S1). The TPR motif consists of a repetitive 34-amino acid sequence that is essential for AtCYP40 binding to Hsp90 (the heat shock protein 90). The TPR domain can regulate Hsp90 ATPase activity and aid in protein folding, trafficking and the assembly of multiprotein complexes [35]. Moreover, a complex of AGO1, HSP90, CYP40, with a small RNA duplex is a key intermediate of RNA-induced silencing complex (RISC) assembly and participates in posttranscriptional gene silencing [36]. The mutants of AtCYP40 show development defects [37]. Thus, we speculated that the 28 cotton CYPs and AtCYP40 may have similar functions.
In contrast, GaCYP70, GrCYP63, GhCYP70-1, GhCYP70-2 and GbCYP58 contained four WD40 (tryptophan-aspartate) repeats at the N-terminus and were highly homologous to AtCYP71. WD40 domains, which were originally identified in G-proteins, widely participate in protein-protein interactions, signal transduction and transcription regulation [38] and are involved in cotton fibre development during the initiation and elongation stages [30,39]. In addition, AtCYP71 interacts with histone H3 and is involved in chromatin assembly, determining the development of Arabidopsis [40].
Lastly, GbCYP142, the largest CYP in Gossypium, contains a transmembrane domain (TR) and two ribonuclease MRP protein subunits (POP1/POPLD), suggesting that GbCYP142 might be involved in the metabolism of RNA molecules [45]. In addition, GbCYP142, AtCYP21-4 and OsCYP21-4 are homologous genes. Research has shown that OsCYP21-4 expression levels are increased in response to various abiotic stresses [46]. These results underline the important functions of CYPs in many essential physiological processes of cotton growth, including fibre development.

Expression Analysis Revealed CYPs Involved in Cotton Fibre Development
Of the two tetraploid cultivated Gossypium species, G. barbadense is prized for its extra-long fibres.
The RPKM values of various expression patterns from different G. barbadense tissues showed that lost CYPs were highly expressed in ovules and fibres. qRT-PCR showed that 35 GbCYPs were highly expressed in 5-20 DPA fibres. Furthermore, the expression levels of six Xinhai21 CYPs were higher than those of Ashmon CYPs. These results signify the potential function of CYPs in the elongation stage of cotton fibre development.
Recently, research has confirmed that the phloem constitutes a long-distance transport system for trafficking plant hormones and signals and for the distribution of nutrients throughout the plant. Giavalisco et al. identified 140 soluble phloem proteins of B. napus by 1-DE and high-resolution 2-DE, which included 9 CYP proteins [47], and Hanhart et al. performed LC-MS/MS analysis of phloem protein extract of B. napus and found 12 BnCYPs in the phloem [11]. Investigating the dgt (auxin-resistant diageotropica) mutant of tomato and using Fourier transform infrared spectroscopy revealed that cyclophilin (LeCYP) is a component of auxin signalling [48]. Yeast two-hybrid and pull-down assays demonstrated that AtCYP20-2 interacted with BZR1 and changed its conformation to regulate the activity of Brassinolide (BR) [49]. BR and auxin are integrant elements for cotton fibre initiation and elongation [50,51]. These results indicate that some cyclophilins probably have essential functions in long-distance hormone signalling that are involved in cotton fibre development.
In addition, Ramsay et al. reported that WD40 repeat proteins interacted with MYB and bHLH in a double direction to regulate cotton fibre development during the initiation and elongation stages [39]. We found that 5 CYPs contained WD40 repeats (Figure 3), and real-time qRT-PCR also confirmed that GbCYP58 was more highly expressed in Xinhai21 than in Ashmon. This finding indicates that some cyclophilins may be involved in cotton fibre development through protein-protein interactions. However, the real molecular mechanisms between cyclophilins and cotton fibre remain to be further studied.

Phylogenetic and Possible Functional Analysis
Multiple protein sequence alignment of the CLD domains from G. raimondii, G. arboretum, G. hirsutum, G. barbadense and Arabidopsis was performed by ClustalW2 (http://www.genome.jp/tools-bin/ clustalw), and the alignment was imported into MEGA 5.0 (Center for Evolutionary Medicine and Informatics, Arizona State University, Tempe, AZ, USA) with pairwise distance and the NJ algorithm to construct a phylogenetic tree using the p-distance method with 1000 bootstrap replications [52]. The exon/intron structure of the cotton CYP genes was analysed by GSDS 2.0 (http://gsds.cbi.pku.edu.cn/). The MEME programme (http://meme-suite.org/) was employed to identify the conserved motifs in the CYP sequences, with the following parameters: number of unique motifs: 15; maximum and minimum search widths: 32; and 6 amino acid residues. The promoter sequences (2 kb upstream from the translation start site) of the CYP genes were obtained from the G. barbadense genomes, and 75 GbCYP gene cis-acting regulatory elements were predicted by PlantCARE (http: //bioinformatics.psb.ugent.be/webtools/plantcare/html/).

Chromosomal Locations and Gene Collinearity Analysis
The physical chromosome locations of all CYPs were obtained from the gff3 files of four cotton species databases. Mapchart 2.0 (https://mapchart.net/) software was adopted to visually map the chromosomal location [53]. Gene duplication events were analysed using the Multiple Collinearity Scan toolkit (MCScanX: http://chibba.pgml.uga.edu/mcscan2/) [54]. To exhibit segmentally duplicated pairs and orthologous pairs of CYP genes from G. barbadense and the three other Gossypium species, we used Dual Systeny Plotter software (https://github.com/CJ-Chen/TBtools) to draw collinearity maps. Thereafter, the synonymous (Ks) and nonsynonymous (Ka) substitution rates of each duplicated CYP gene were estimated by the PAL2NAL programme (http://www.bork.embl.de/pal2nal/).

Gene Expression Analysis
The reads per kb per million reads (RPKM) values were obtained from the transcriptome data of Gossypium barbadense cv. Xinhai21 [25], which were provided by Tian-Zhen Zhang and his research group. We isolated the GbCYPs with RPKM ≥ 5 for further expression analysis.

RNA Isolation and qRT-PCR Analysis
Total RNA was isolated using the RNAprep Pure Plant Kit (TIARGEN, Beijing, China) and was treated with DNase I to remove genomic DNA. The RNA quality and purity were measured by a NanoDrop 2000 spectrophotometer (Thermo Scientific, massachusetts, USA), and 1 µg of total RNA was used to synthesise first-strand cDNA with the Transcriptor First Strand cDNA Synthesis Kit and oligo-dT primers at 42 • C for 60 min and 72 • C for 10 min. Real-time PCR was performed in a 7500 Fast Real-Time PCR system (Applied Biosystems, State of California, USA) using SYBR Green Master Mix. Three biological replicates were performed per cDNA sample, and each reaction was prepared in a total volume of 20 µl containing 10 µL of SYBR Green PCR Master Mix, 1 µL of each primer, 2 µL of diluted cDNA template, and 6 µL of ddH 2 O. The cotton UBQ7 gene was used as a reference, and the PCR conditions were as follows: 95 • C for 5 min followed by 40 cycles of 95 • C for 5 s and 60 • C for 34 s. The 36 primers (Table S1) were designed by Primer Premier 5, and the results were analysed with the 2 −∆∆Ct method.

Conclusions
CYPs are ubiquitous proteins that are found in bacteria, plants and animals. There are a large number of CYPs in plants, but the functions of most plant CYPs are still elusive. For the first time, we performed a genome-wide analysis of CYP gene families in four Gossypium species. A total of 231 CYP genes were retrieved based on the conserved CLD domain and were clustered into three major groups and eight subgroups. Subsequently, a detailed analysis of their phylogeny, structures, conserved motifs, chromosomal locations, gene collinearity, cis-acting elements, and expression patterns in different fibre development stages was performed. The results revealed that most cyclophilin proteins are SD proteins, segmental duplications that played a significant role in the expansion of CYP members, and 52% of all CYPs are localised in the cytoplasm in cotton. Cyclophilin proteins may be involved in various processes of cotton growth and development. The expression patterns observed via qRT-PCR analysis showed that the expression levels of six Xinhai21 (long fibre) CYPs were higher than those of Ashmon (short fibre) CYPs, and GbCYP genes may be involved in the elongation stage of cotton fibre development. In addition, these results provide a valuable resource for further investigation of CYP gene functions in cotton.