Transcriptome Analysis Implicates Involvement of Long Noncoding RNAs in Cytoplasmic Male Sterility and Fertility Restoration in Cotton

The cytoplasmic male sterility (CMS)/restorer-of-fertility system is an important tool to exploit heterosis during commercially hybrid seed production. The importance of long noncoding RNAs (lncRNAs) in plant development is recognized, but few analyses of lncRNAs during anther development of three-line hybrid cotton (CMS-D2 line A, maintainer line B, restorer-of-fertility line R) have been reported. Here, we performed transcriptome sequencing during anther development in three-line hybrid cotton. A total of 80,695 lncRNAs were identified, in which 43,347 and 44,739 lncRNAs were differentially expressed in A–B and A–R comparisons, respectively. These lncRNAs represent functional candidates involved in CMS and fertility restoration. GO analysis indicated that cellular hormone metabolic processes and oxidation–reduction reaction processes might be involved in CMS, and cellular component morphogenesis and small molecular biosynthetic processes might participate in fertility restoration. Additionally, 63 lncRNAs were identified as putative precursors of 35 miRNAs, and quantitative reverse transcription polymerase chain reaction (qRT-PCR) showed a similar expression pattern to RNA-seq data. Furthermore, construction of lncRNA regulatory networks indicated that several miRNA–lncRNA–mRNA networks might be involved in CMS and fertility restoration. Our findings provide systematic identification of lncRNAs during anther development and lays a solid foundation for the regulatory mechanisms and utilization in hybrid cotton breeding.


Introduction
Previous studies have shown that large portions of the eukaryotic genomic sequences consist of noncoding RNAs (ncRNAs). The ncRNAs lack apparent coding potential and could be divided into microRNAs (miRNAs) of 20-30 nt in length, medium ncRNAs of 50-200 nt, and long noncoding RNAs (lncRNAs) with transcripts longer than 200 nt [1][2][3][4]. The lncRNAs are a vital regulatory component of gene expression during many biological developmental processes [3,5,6]. In recent years, numerous lncRNAs have been identified in many plant species using RNA sequencing (RNA-seq) data. For example, Jun et al. identified 6480 long intergenic noncoding RNAs (lincRNAs) in Arabidopsis by analyzing transcriptome data [7]. In maize, 20,163 lncRNAs were identified in the complete genome

Genome-Wide Identification and Characterization of LncRNAs during Anther Development of Three-Line Hybrid Cotton
To identify the lncRNAs involved in cytoplasmic male sterility (CMS) and restoration of fertility during anther development, high-throughput sequencing was performed for the CMS line (A), maintainer line (B), and fertility restoration line (R), each with three biological replicates. In total, 910 million clean reads (284 million in A, 315 million in B, and 311 million in R) passed the quality filters and were retained for further analysis. Almost 86%, 87%, and 88% of the reads were aligned to the TM-1 reference genome for A, B, and R, respectively (Table 1). On the basis of the transcripts assembly, a total of 940,696 transcripts were identified. To identify putative lncRNAs, we first filtered out the single-exon transcripts located within 500 bp of other transcripts, then excluded short transcripts (length < 200 bp), and screened the transcripts based on expression level (FPKM ≥ 0.5 for multi-exon transcripts, or FPKM ≥ 2 for single-exon transcripts), and finally filtered out the known non-lncRNAs. After these basic filtering processes, 93,546 transcripts were retained as lncRNA candidates. We evaluated the coding potential of the remaining transcripts by means of CPC (coding potential calculator) and Pfam protein domain analyses (Figure 1a). After the five-step filter process, 80,695 lncRNAs were identified during anther development (Table S1).
To describe explicitly the characteristics of lncRNAs, we compared the characteristics of lncRNAs and mRNAs in the following aspects (Figure 1b). In transcript length, the lncRNAs length ranged from 200 to 18,412 bp, and more than 97% of the lncRNAs were smaller than 2000 bp in length, whereas the length of mRNAs ranged from 150 to 21,501 bp, but almost 85% of mRNAs were between 150 and 2000 bp in length. In exon number, lncRNAs had fewer exons than mRNAs on average, almost 83% of the lncRNAs contained one exon, and 17% had multiple exons, whereas almost 26% of the mRNAs contained one exon and 74% had multiple exons. The overall open reading frame (ORF) length of lncRNAs was typically smaller than that of mRNAs. Comparison of FPKM distribution between lncRNAs and mRNAs showed that the FPKM of lncRNAs was lower than that of mRNAs during anther development (Table S1).

Identification and Functional Analysis of Differentially Expressed LncRNAs in A, B, and R Lines
The following three comparisons of lncRNA expression levels were performed: A-B, which had the isogenic nuclear genomes (containing the recessive non-functional rf1 allele) but different cytoplasm and fertility; A-R, both of which had the same CMS-D2 cytoplasm but differed in fertility and Rf1 alleles; and B-R, both of which were isogenic and fertile but differed in cytoplasm and Rf1 alleles. A total of 18,755, 20,837, and 21,346 unique lncRNAs were specifically expressed in the A, B, and R lines, respectively, and 15,692 lncRNAs were expressed in common among the A, B, and R lines (Figure 2a). On the basis of the expression level, lncRNAs with a greater than two-fold change and p-value < 0.01 were considered to be differentially expressed in different samples. A total of 43,347, 44,739, and 46,431 differentially expressed lncRNAs were identified in the A-B, A-R, and B-R comparisons, respectively, and 1713 lncRNAs were differentially expressed in common among the A-B, A-R, and B-R comparisons ( Figure 2b, Table S2). The differentially expressed lncRNAs represented a total of 67,021 non-redundant lncRNAs that were distributed among the 26 chromosomes of G. hirsutum, of which 33,142, 29,689, and 4190 differentially expressed lncRNAs were located in the A and D subgenomes and different scaffolds, respectively (Table S2). The distribution of differentially expressed lncRNAs for the three comparisons is shown in Figure 2c.
Previous studies indicate that the Rf1 gene is located on chromosome Gh_D05, and the nearest flanking simple sequence repeat (SSR) markers to Rf1 are BNL3535 at a genetic distance of 0.049 cM and NAU3652 with a genetic distance of 0.078 cM [26,27]. In the present study, a total of 2452 differentially expressed lncRNAs were identified on chromosome Gh_D05, of which 806 lncRNAs were differentially expressed in the A-R group. Furthermore, 86 lncRNAs were identified in the Rf1 region flanked by the two SSR markers, of which 65 lncRNAs were down-regulated and 21 lncRNAs were up-regulated in the R line compared with the A and B lines. For example, the RNA-seq data showed that TCONS_00779116, TCONS_00797453, TCONS_00796983, and TCONS_00797056 were up-regulated in the fertile R line, compared with the A and B lines, and the qRT-PCR results were approximately consistent (Figure 2d). 150 and 2000 bp in length. In exon number, lncRNAs had fewer exons than mRNAs on average, almost 83% of the lncRNAs contained one exon, and 17% had multiple exons, whereas almost 26% of the mRNAs contained one exon and 74% had multiple exons. The overall open reading frame (ORF) length of lncRNAs was typically smaller than that of mRNAs. Comparison of FPKM distribution between lncRNAs and mRNAs showed that the FPKM of lncRNAs was lower than that of mRNAs during anther development (Table S1).

Identification and Functional Analysis of Differentially Expressed LncRNAs in A, B, and R Lines
The following three comparisons of lncRNA expression levels were performed: A-B, which had the isogenic nuclear genomes (containing the recessive non-functional rf1 allele) but different cytoplasm and fertility; A-R, both of which had the same CMS-D2 cytoplasm but differed in fertility and Rf1 alleles; and B-R, both of which were isogenic and fertile but differed in cytoplasm and Rf1 alleles. A total of 18,755, 20,837, and 21,346 unique lncRNAs were specifically expressed in the A, B, and R lines, respectively, and 15,692 lncRNAs were expressed in common among the A, B, and R lines (Figure 2a). On the basis of the expression level, lncRNAs with a greater than two-fold change and p-value < 0.01 were considered to be differentially expressed in different samples. A total of It has been shown that lncRNAs are preferentially located close to genes that they regulate, and that lncRNAs might overlap with the promoter region and may regulate the expression profile of their target genes at the transcriptional or post-transcriptional level [6,17,[28][29][30]. In the present study, to evaluate the function of these differentially expressed lncRNAs, we selected the genes located less than 10 kb from the differentially expressed lncRNAs as corresponding target genes and performed GO analyses among the A-B, A-R, and B-R comparisons ( Figure 3a, Table S3). The GO terms oxidation-reduction process, regulation of hormone levels, and hormone metabolic process were the three most highly enriched terms in the A-B comparisons, whereas cell morphogenesis, cellular component morphogenesis, and the oxidation-reduction process were the three most enriched terms in the A-R comparisons. Similarly, in B-R comparisons, the three most enriched terms were cell morphogenesis, cellular component morphogenesis, and the single-organism biosynthetic process. 6 differentially expressed lncRNAs were identified on chromosome Gh_D05, of which 806 lncRNAs were differentially expressed in the A-R group. Furthermore, 86 lncRNAs were identified in the Rf1 region flanked by the two SSR markers, of which 65 lncRNAs were down-regulated and 21 lncRNAs were up-regulated in the R line compared with the A and B lines. For example, the RNA-seq data showed that TCONS_00779116, TCONS_00797453, TCONS_00796983, and TCONS_00797056 were up-regulated in the fertile R line, compared with the A and B lines, and the qRT-PCR results were approximately consistent (Figure 2d). It has been shown that lncRNAs are preferentially located close to genes that they regulate, and that lncRNAs might overlap with the promoter region and may regulate the expression profile of their target genes at the transcriptional or post-transcriptional level [6,17,[28][29][30]. In the present study, to evaluate the function of these differentially expressed lncRNAs, we selected the genes located less than 10 kb from the differentially expressed lncRNAs as corresponding target genes and performed GO analyses among the A-B, A-R, and B-R comparisons ( Figure 3a, Table S3). The GO terms oxidation-reduction process, regulation of hormone levels, and hormone metabolic process were the three most highly enriched terms in the A-B comparisons, whereas cell morphogenesis, cellular component morphogenesis, and the oxidation-reduction process were the three most enriched terms in the A-R comparisons. Similarly, in B-R comparisons, the three most enriched terms were cell morphogenesis, cellular component morphogenesis, and the single-organism biosynthetic process. To identify lncRNAs associated with CMS or fertility restoration, differentially expressed lncRNAs between A-B and A-R comparisons were analyzed. In total, 43,347 and 44,739 differentially expressed lncRNAs were identified in the A-B and A-R comparison groups, respectively. Of these differentially expressed lncRNAs, 328 and 431 were unique in the A-B and A-R comparisons and thus might be involved in CMS or fertility restoration during anther development of cotton. In addition, we detected significant enrichment of GO terms involved in CMS or fertility restoration (p < 0.05). In the A-B comparison, we observed GO term enrichment for biological processes, including oxidation-reduction process (GO: 0055114), photosynthetic electron transport chain (GO: 0009767), regulation of hormone levels (GO: 0010817), cellular hormone metabolic process (GO: 0034754), and cytokinin metabolic process (GO: 0009690) (Figure 3b). These results showed that the greatest difference between normal Upland cotton cytoplasm and sterile cytoplasm was enrichment in the cellular hormone metabolic process and oxidation-reduction reaction process. In the A-R comparison, the most significantly enriched GO terms among biological processes were cellular component morphogenesis (GO: 0032989) and small molecular biosynthetic process (GO: 0044283), of which cellular amino acid biosynthetic process (GO: 0008652) and tricarboxylic acid biosynthetic process showed significant enrichment (GO: 0072351) ( Figure 3c, Table S3). These results indicated that differentially expressed lncRNAs may regulate functional genes involved in cellular component morphogenesis for fertility restoration in cotton.  To identify lncRNAs associated with CMS or fertility restoration, differentially expressed lncRNAs between A-B and A-R comparisons were analyzed. In total, 43,347 and 44,739 differentially expressed lncRNAs were identified in the A-B and A-R comparison groups, respectively. Of these differentially expressed lncRNAs, 328 and 431 were unique in the A-B and A-R comparisons and thus might be involved in CMS or fertility restoration during anther development of cotton. In addition, we detected significant enrichment of GO terms involved in CMS or fertility restoration (p < 0.05). In the A-B comparison, we observed GO term enrichment for biological processes, including oxidation-reduction process (GO: 0055114), photosynthetic electron transport chain (GO: 0009767), regulation of hormone levels (GO: 0010817), cellular hormone metabolic process (GO: 0034754), and cytokinin metabolic process (GO: 0009690) (Figure 3b). These results showed that the greatest difference between normal Upland cotton cytoplasm and sterile cytoplasm was enrichment in the cellular hormone metabolic process and oxidation-reduction reaction process. In the A-R comparison, the most significantly enriched GO terms among biological processes were cellular component morphogenesis (GO: 0032989) and small molecular biosynthetic process (GO: 0044283), of which cellular amino acid biosynthetic process (GO: 0008652) and tricarboxylic acid biosynthetic process showed significant enrichment (GO: 0072351) ( Figure 3c, Table S3). These results indicated that differentially expressed lncRNAs may regulate functional genes involved in cellular component morphogenesis for fertility restoration in cotton.

Predicted Interactions between LncRNAs and MiRNAs during Anther Development
MiRNAs regulate gene expression at the post-transcriptional level by interacting with the complementary binding sites on target sequences, resulting in mRNA cleavage, decoy activity, and translation repression [15,31]. A previous study indicated that lncRNAs may act as targets or eTMs by binding with miRNAs and thus inhibit the interaction between miRNAs and the target genes [18]. In the current study, we predicted the potential of lncRNAs as miRNA eTMs by integration of previous miRNA sequence data during anther development [25]. In total, two lncRNAs (TCONS_00342368 and TCONS_00148576) were predicted to be potential eTMs for five miRNAs, of which TCONS_00342368 was a putative eTM for ath-miR171c-5p, osa-miR171c-5p, and stu-miR171d-5p, and TCONS_00148576 was a putative eTM for ath-miR399b and cme-miR399d (Figure 4a). The predicted miRNA binding sites and the bulge region in lncRNAs were identical among the different miRNAs in the same miRNA family. To analyze the sequence evolutionary conservation of these eTM lncRNAs, we aligned the sequences of the predicted eTM-binding sites for miR171 and miR399 from Arabidopsis and rice ( Figure 4b). The miRNA binding sequences were well conserved, and the bulge region frequently varied among different species, consistent with previous studies of Arabidopsis and rice [18] and Brassica [19]. Further analysis is necessary to understand the role of the two lncRNAs predicted to be eTMs.
In addition, lncRNAs might have functions associated with a role as miRNA precursors [32]. Those lncRNAs that act as precursors of miRNAs might perform an indirect regulatory function through corresponding miRNAs. Moreover, differential expression of lncRNAs might result in the differential expression of corresponding mature miRNAs [15,32]. In the present study, 63 lncRNAs were predicted to be putative precursors of 35 miRNAs belonging to 26 miRNA families (Table S4). Of these lncRNAs, 13 lncRNAs were identified as the putative precursor of multiple miRNAs, with 1-2 nt difference in the mature miRNA sequence. Interestingly, 19 of the precursor lncRNAs showed differential expression among the A, B, and R lines, of which five lncRNAs (TCONS_00600850, TCONS_00807084, TCONS_01123999, TCONS_01109996, and TCONS_01148734) showed a consistent expression pattern with the corresponding mature miRNA (gra-miR8753, gma-miR160b, ghr-miR7506, ghr-miR7511, and gra-miR8638). The results of qRT-PCR analysis showed that two lncRNAs (TCONS_00807084 and TCONS_01148734) and the corresponding miRNAs (gma-miR160b and gra-miR8638) showed a similar expression pattern in three-line hybrid cotton (Figure 5a). Interestingly, gma-miR160b, which is derived from TCONS_00807084, was up-regulated in the A and B lines compared with the R line and may regulate an auxin response factor GhARF17 (Gh_D06G0360) (Figure 5b). Thus, TCONS_00807084 and gma-miR160b might play critical roles in anther development by influencing the auxin regulatory pathway. These findings indicate the complex regulatory mechanisms by which lncRNAs and the corresponding mature miRNAs function during anther development, although the underlying regulatory network remains unclear.

8
In the current study, we predicted the potential of lncRNAs as miRNA eTMs by integration of previous miRNA sequence data during anther development [25]. In total, two lncRNAs (TCONS_00342368 and TCONS_00148576) were predicted to be potential eTMs for five miRNAs, of which TCONS_00342368 was a putative eTM for ath-miR171c-5p, osa-miR171c-5p, and stu-miR171d-5p, and TCONS_00148576 was a putative eTM for ath-miR399b and cme-miR399d (Figure 4a). The predicted miRNA binding sites and the bulge region in lncRNAs were identical among the different miRNAs in the same miRNA family. To analyze the sequence evolutionary conservation of these eTM lncRNAs, we aligned the sequences of the predicted eTM-binding sites for miR171 and miR399 from Arabidopsis and rice (Figure 4b). The miRNA binding sequences were well conserved, and the bulge region frequently varied among different species, consistent with previous studies of Arabidopsis and rice [18] and Brassica [19]. Further analysis is necessary to understand the role of the two lncRNAs predicted to be eTMs. In addition, lncRNAs might have functions associated with a role as miRNA precursors [32]. Those lncRNAs that act as precursors of miRNAs might perform an indirect regulatory function through corresponding miRNAs. Moreover, differential expression of lncRNAs might result in the differential expression of corresponding mature miRNAs [15,32]. In the present study, 63 lncRNAs were predicted to be putative precursors of 35 miRNAs belonging to 26 miRNA families (Table S4). Of these lncRNAs, 13 lncRNAs were identified as the putative precursor of multiple miRNAs, with 1-2 nt difference in the mature miRNA sequence. Interestingly, 19 of the precursor lncRNAs showed differential expression among the A, B, and R lines, of which five lncRNAs (TCONS_00600850, TCONS_00807084, TCONS_01123999, TCONS_01109996, and TCONS_01148734) showed a consistent expression pattern with the corresponding mature miRNA (gra-miR8753, gma-miR160b, ghr-miR7506, ghr-miR7511, and gra-miR8638). The results of qRT-PCR analysis showed that two lncRNAs (TCONS_00807084 and TCONS_01148734) and the corresponding miRNAs (gma-miR160b and gra-miR8638) showed a similar expression pattern in three-line hybrid cotton (Figure 5a). Interestingly, gma-miR160b, which is derived from TCONS_00807084, was up-regulated in the A and

The miRNA-LncRNA-mRNA Regulatory Networks between A, B, and R Lines
To explore the regulatory networks of lncRNAs involved in CMS and fertility restoration, we selected the target genes of the differentially expressed lncRNAs and miRNAs based on the RNA-seq data and constructed a putative miRNA-lncRNA-mRNA regulatory network using Cytoscape software ( Figure 6, Table S5). The networks were composed of 66 miRNAs, 161 lncRNAs, and 658 mRNAs (mRNAs regulated by miRNAs and lncRNAs), of which 58 lncRNAs regulated by 15 miRNAs and 77 lncRNAs regulated by 35 miRNAs were specifically differentially expressed in the A-B and A-R comparisons, respectively; in addition, 26 lncRNAs regulated by 16 miRNAs were in common among the A-B and A-R comparisons. These predicted target genes regulated by miRNAs and lncRNAs were divided into multiple groups. First, several genes showed critical roles in oxidation-reduction processes, of which a glutamyl-tRNA reductase (Gh_A08G0634), cupredoxin superfamily protein (Gh_D03G0412), and cinnamyl alcohol dehydrogenase (Gh_D11G3399) show oxidoreductase activity during anther development. Second, several genes were transcription factors and involved in cellular hormone response and metabolic processes. For example, bZIP

The miRNA-LncRNA-mRNA Regulatory Networks between A, B, and R Lines
To explore the regulatory networks of lncRNAs involved in CMS and fertility restoration, we selected the target genes of the differentially expressed lncRNAs and miRNAs based on the RNA-seq data and constructed a putative miRNA-lncRNA-mRNA regulatory network using Cytoscape software ( Figure 6, Table S5). The networks were composed of 66 miRNAs, 161 lncRNAs, and 658 mRNAs (mRNAs regulated by miRNAs and lncRNAs), of which 58 lncRNAs regulated by 15 miRNAs and 77 lncRNAs regulated by 35 miRNAs were specifically differentially expressed in the A-B and A-R comparisons, respectively; in addition, 26 lncRNAs regulated by 16 miRNAs were in common among the A-B and A-R comparisons. These predicted target genes regulated by miRNAs and lncRNAs were divided into multiple groups. First, several genes showed critical roles in oxidation-reduction processes, of which a glutamyl-tRNA reductase (Gh_A08G0634), cupredoxin superfamily protein (Gh_D03G0412), and cinnamyl alcohol dehydrogenase (Gh_D11G3399) show oxidoreductase activity during anther development. Second, several genes were transcription factors and involved in cellular hormone response and metabolic processes. For example, bZIP (Gh_A01G1768), ARF (Gh_D06G0360), and SPL (Gh_A11G0706) transcription factors and SAUR-like auxin-responsive protein (Gh_A12G2237) and a gibberellin-regulated family protein (Gh_D02G0666) were regulated by miRNAs and lncRNAs involved in hormone response and metabolic processes. Third, several genes were functional proteins, including an ABORTED MICROSPORES (AMS) protein (Gh_D12G0328), nucleotide/sugar transporter family protein (Gh_A04G0407), and NB-ARC domain-containing disease resistance protein (Gh_A08G1378). Several functionally unknown genes regulated by miRNAs and lncRNAs were differentially expressed in the A, B, and R lines. These above-mentioned genes may play critical roles in CMS and fertility restoration in cotton. The expression levels of several miRNA-lncRNA-mRNA regulatory networks were validated by qRT-PCR and were concordant with the transcriptome data ( Figure 7). For example, a bHLH transcription factor, ABORTED MICROSPORES (AMS) gene (Gh_D12G0328), was a specific phytochrome-interacting factor regulated by ath-miR414 and TCONS_01118841. The gene Gh_A06G1391, which was regulated by ghr-miR2950 and TCONS_00233548, shows oxidoreductase activity and may participate in fatty acid biosynthesis. The qRT-PCR results showed that both genes were down-regulated in the A line compared with the B line. These genes might be involved in CMS during anther development. In the A-R comparison, a glutamyl-tRNA reductase (Gh_A08G0634) functions in oxidation-reduction processes, as the target gene of gra-miR166d and TCONS_00327889 and was down-regulated in the A line compared with the R line. The gene Gh_D11G3015, which encodes a calcium-dependent lipid-binding (CaLB domain) protein and was the target gene of zma-miR171b-3p and TCONS_01086083, may participate in the calcium signaling pathway and was down-regulated in the A line compared with the R line. These genes might be involved in pollen development and fertility restoration during anther development. In addition, in our previous study, the PPR (Gh_D05G3392) gene located near the region of Rf1 was regulated by gra-miR7505b [25]. The expression levels of several miRNA-lncRNA-mRNA regulatory networks were validated by qRT-PCR and were concordant with the transcriptome data ( Figure 7). For example, a bHLH transcription factor, ABORTED MICROSPORES (AMS) gene (Gh_D12G0328), was a specific phytochrome-interacting factor regulated by ath-miR414 and TCONS_01118841. The gene Gh_A06G1391, which was regulated by ghr-miR2950 and TCONS_00233548, shows oxidoreductase activity and may participate in fatty acid biosynthesis. The qRT-PCR results showed that both genes were down-regulated in the A line compared with the B line. These genes might be involved in CMS during anther development. In the A-R comparison, a glutamyl-tRNA reductase (Gh_A08G0634) functions in oxidation-reduction processes, as the target gene of gra-miR166d and TCONS_00327889 and was down-regulated in the A line compared with the R line. The gene Gh_D11G3015, which encodes a calcium-dependent lipid-binding (CaLB domain) protein and was the target gene of zma-miR171b-3p and TCONS_01086083, may participate in the calcium signaling pathway and was down-regulated in the A line compared with the R line. These genes might be involved in pollen development and fertility restoration during anther development. In addition, in our previous study, the PPR (Gh_D05G3392) gene located near the region of Rf1 was regulated by gra-miR7505b [25]. Interestingly, we detected the lncRNA TCONS_00797453 located in the promoter region of the Gh_D05G3392 gene and may strongly regulate the expression of that gene in the R line. This result showed that PPR (Gh_D05G3392) may be regulated by both gra-miR7505b and TCONS_00797453 to participate in fertility restoration.

Discussion
Although an increasing number of lncRNAs have been identified in cotton, including those involved in fiber development [14,20,21], response to drought and salt stress [22,23], and disease resistance [33], no lncRNAs have been previously identified during anther development in three-line hybrid cotton. In the present study, transcriptome sequencing was performed during anther development (at the stage of male meiosis) in Upland cotton harboring the CMS-D2 cytoplasm to systematically identify lncRNAs involved in CMS and fertility restoration. The differences between the CMS line (A), maintainer line (B), and fertility restoration line (R) are as follows: A vs. B, both have different cytoplasm and fertility; A vs. R, both harbor different Rf1 alleles and fertility; and B vs. R, both differ in cytoplasm and Rf1 alleles. Thus, three-line hybrid cotton represents suitable material to explore the molecular mechanism of nucleo-cytoplasmic interaction. In our previous studies, differentially expressed genes and miRNAs were analyzed during anther development in three-line hybrid cotton, and many candidate genes and miRNAs were discovered [24,25]. In the current study, identification of lncRNAs differentially expressed between the A, B, and R lines provides a novel perspective for understanding the molecular mechanism of CMS and fertility restoration in Upland cotton.

Overview of LncRNAs Identification and Function in Anther Development
In the present study, 80,695 lncRNAs were identified by analyzing almost 910 million clean reads, of which 18,755, 20,837, and 21,346 unique lncRNAs were specifically expressed in the A, B, and R lines, respectively. As previously reported [14], we observed that the number of identified lncRNAs during anther development is larger than the numbers identified in Arabidopsis, maize, The putative regulation mechanism and expression pattern analysis in miRNAs-lncRNAs-mRNAs networks during anther development. (a) Regulation mechanism prediction of functional miRNAs-lncRNAs-mRNAs networks in anther development; (b) The qRT-PCR validate four miRNAs-lncRNAs-mRNAs regulatory networks in anther development of cotton.

Discussion
Although an increasing number of lncRNAs have been identified in cotton, including those involved in fiber development [14,20,21], response to drought and salt stress [22,23], and disease resistance [33], no lncRNAs have been previously identified during anther development in three-line hybrid cotton. In the present study, transcriptome sequencing was performed during anther development (at the stage of male meiosis) in Upland cotton harboring the CMS-D2 cytoplasm to systematically identify lncRNAs involved in CMS and fertility restoration. The differences between the CMS line (A), maintainer line (B), and fertility restoration line (R) are as follows: A vs. B, both have different cytoplasm and fertility; A vs. R, both harbor different Rf1 alleles and fertility; and B vs. R, both differ in cytoplasm and Rf1 alleles. Thus, three-line hybrid cotton represents suitable material to explore the molecular mechanism of nucleo-cytoplasmic interaction. In our previous studies, differentially expressed genes and miRNAs were analyzed during anther development in three-line hybrid cotton, and many candidate genes and miRNAs were discovered [24,25]. In the current study, identification of lncRNAs differentially expressed between the A, B, and R lines provides a novel perspective for understanding the molecular mechanism of CMS and fertility restoration in Upland cotton.

Overview of LncRNAs Identification and Function in Anther Development
In the present study, 80,695 lncRNAs were identified by analyzing almost 910 million clean reads, of which 18,755, 20,837, and 21,346 unique lncRNAs were specifically expressed in the A, B, and R lines, respectively. As previously reported [14], we observed that the number of identified lncRNAs during anther development is larger than the numbers identified in Arabidopsis, maize, and Gossypium arboreum [7,8,20] but similar to the numbers identified in wheat [15], Brassica rapa [19], and cotton in response to drought stress [22]. Thus, we suspect that the size and complexity of the genome and strict screening criteria may have led to an increase in the number of lncRNAs identified. In addition, comparison of the characteristics of lncRNAs and mRNAs revealed that lncRNAs share many common characteristics, such as fewer exons, typically smaller length, and lower expression level than mRNAs during anther development.
Few studies have investigated the involvement of lncRNAs in plant reproductive development. In rice, Zhang et al. identified and verified a set of lncRNAs involved in sexual reproduction [9]. Huang et al. identified lncRNAs during pollen development and fertilization in Brassica. The GO enrichment analysis indicated that genes that show transcription regulator activity (GO: 0030528) and involved in morphogenesis (GO: 0010927) might perform critical roles in pollen exine formation [19]. In the current study, 43,347 and 44,739 lncRNAs were differentially expressed in the A-B and A-R comparisons, respectively. Previous studies indicate that Rf1 is located on chromosome Gh_D05 and is flanked by the BNL3535 and NAU3652 SSR markers [26,27]. In the present study, 21 lncRNAs in the Rf1 region flanked by these two markers were up-regulated in the R line compared with the A and B lines, and the results of qRT-PCR analysis were approximately consistent with the RNA-seq data.
To explore the function of the differentially expressed lncRNAs, we performed a GO enrichment analysis. In the A-B comparison, enrichment was observed in the GO terms oxidation-reduction process, photosynthetic electron transport chain, and cellular hormone regulation and metabolic process. These results indicated that the differences between the normal Upland cotton cytoplasm and sterile cytoplasm may influence cellular hormone and metabolic processes and oxidation-reduction reaction processes. Energy supply and cellular hormone content and metabolism during anther development might be involved in CMS. In the A-R comparison, the most significantly enriched GO terms were cellular amino acid biosynthetic process and tricarboxylic acid biosynthetic process. These results indicated that differentially expressed lncRNAs may participate in cellular component morphogenesis and small molecular biosynthetic processes for fertility restoration in cotton. These lncRNAs represent functional candidates for CMS and fertility restoration for further investigation.

Relationship between LncRNAs and MiRNAs in Anther Development of Cotton
Previous studies indicate that lncRNAs may act as eTMs to prevent interaction between miRNAs and the target genes by competitively binding with the corresponding miRNAs [16,18,19]. For example, the lncRNA IPS1 in Arabidopsis acts as an eTM of ath-miR399 to regulate PHOS2 and a 3 nt bulge is present in the 10th and 11th nt positions of the miRNA [16]. In Brassica, two lncRNAs are eTMs of miR160 and function in pollen formation and male fertility [19]. In tomato, a lncRNA acts as an eTM of miR166 and may regulate Tomato yellow leaf curl virus resistance [34]. In the present study, two lncRNAs (TCONS_00342368 and TCONS_00148576) were predicted to be potential eTMs for five miRNAs, of which TCONS_00342368 was a putative eTM for ath-miR171c-5p, osa-miR171c-5p, and stu-miR171d-5p, and TCONS_00148576 was a putative eTM for ath-miR399b and cme-miR399d. The predicted miRNA binding sites and the bulge region are conserved among different miRNAs and the same miRNAs in different plant species [18,19,35]. We observed that the sequence of the eTMs of miR171 and miR399 were well conserved and the bulge region frequently varied among different species, which is consistent with previous studies [18,19,35]. Therefore, we hypothesize that certain interactions between these lncRNAs and miRNAs may play a fundamental role in anther development of cotton.
In addition to functioning as eTMs of miRNAs, lncRNAs are also predicted to be precursors of miRNAs and differential expression of lncRNAs might result in the differential expression of the corresponding mature miRNAs [13][14][15]. For instance, Wang et al. systematically analyzed the expression of a lncRNA that generates miR397 during fiber development of cotton [14]. Cagirici et al. reported that a stress-responsive lncRNA was the precursor of miR1117 and miR1127a [15]. In the present study, 63 lncRNAs were identified as putative precursors of 35 miRNAs, of which five miRNAs (gra-miR8753, gma-miR160b, ghr-miR7506, ghr-miR7511, and gra-miR8638) showed an expression level consistent with that of the precursor lncRNAs. For example, gma-miR160b, derived from TCONS_00807084, was up-regulated in the A and B lines compared with the R line. While miRNA gma-miR160b regulates an auxin response factor, GhARF17 (Gh_D06G0360), was down-regulated in the A line compared with the B and R lines. Several studies indicate that miR160 and ARF17 perform critical roles during pollen development. For example, overexpress miR160-resistant ARF17 show male sterility in Arabidopsis [36]. Jun et al. observed that ARFA17 is essential for primexine formation and that primexine was defective in the arf17 mutant, which caused pollen wall-patterning defects and pollen degradation in Arabidopsis [37]. Ding et al. overexpressed miR160 in cotton, which leads to anther indehiscence, suppression of ARF10 and ARF17 expression, and thus increased cotton sensitivity to high temperature stress by activation of the auxin response [38]. Huang et al. increased ARF17 expression levels by overexpressing the lncRNA bra-eTM160 for inhibition of bra-miR160 and caused male sterility in Brassica and observed that potential dosage-dependent regulation may render lncRNAs as an endogenous modulator for miRNA functions [19]. The above-mentioned results indicate that the lncRNAs-miR160-ARF17 regulatory network might participate in pollen development through involvement in auxin regulation, although the underlying regulatory mechanisms are incompletely understood.
Several studies indicate that lncRNAs and miRNAs are involved in complex regulatory pathways during plant development processes [15,35,39,40]. For example, Reina et al. observed that more than 700 lncRNAs in rice were cleaved by miR2118 and processed by the DCL4 protein, resulting in production of phasiRNAs [39]. Furthermore, the lncRNA PMS1T is targeted by miR2118 to produce phasiRNAs that preferentially accumulate in a photoperiod-sensitive male sterility line under long-day conditions, and the elevated phasiRNAs eventually cause male sterility in rice through an unknown regulatory network [41]. Liu et al. reported a regulatory network of miR3954-lncRNA-phasiRNAs-NAC, which causes early flowering in citrus. Overexpression of miR3954 causes down-regulation of the corresponding lncRNAs (Cs1g09600 and Cs1g09635), up-regulation of phasiRNAs, and a reduced expression level of NAC genes [42]. These results indicate that miRNA regulation of lncRNAs may function as part of a complex regulatory pathway during plant development. Thus, in the present study, we constructed several putative miRNAs-lncRNAs-mRNAs regulatory networks involved in CMS and fertility restoration. Fifty-eight lncRNAs regulated by 15 miRNAs and 77 lncRNAs regulated by 35 miRNAs were specifically differentially expressed in the A-B and A-R comparisons, respectively. Several miRNAs-lncRNAs-mRNAs regulatory networks were validated by qRT-PCR analysis. For example, in the A-B comparison, a transcription factor AMS gene (Gh_D12G0328) and a fatty acid biosynthesis-related gene (Gh_A06G1391), regulated by the corresponding miRNAs and lncRNAs, were down-regulated in the A line compared with the B line. These genes might be involved in CMS during anther development. In the A-R comparison, a glutamyl-tRNA reductase (Gh_A08G0634) and a calcium-dependent lipid-binding (CaLB domain) protein (Gh_D11G3015), regulated by the corresponding miRNAs and lncRNAs, function in oxidation-reduction processes and the calcium signaling pathway, respectively. An additional regulatory network, in which PPR (Gh_D05G3392) located near the Rf1 gene mapping region is regulated by gra-miR7505b and TCONS_00797453, might play a critical role in fertility restoration. However, these regulatory networks require validation in the future.

Plant Materials and Transcriptome Sequence
The CMS-D2 three-line hybrid cotton system was developed at the Cotton Research Institute, Chinese Academy of Agricultural Science (Anyang, China). In our previous study, the CMS line harboring CMS-D2 cytoplasm was crossed with the restorer line, and the maintainer line with normal fertile Upland cotton (AD1) cytoplasm as the recurrent male parent to backcross with the F 1 plants to construct a BC 8 F 1 population. From this segregating population, the sterile and fertile plants were selected as the CMS-D2 line (A) and restorer line (R), respectively [27]. The A line is homozygous for the recessive (i.e., nonfunctional) fertility restorer alleles (rf1rf1), whereas the maintainer line (B) harbors normal fertile Upland cotton cytoplasm and has the same nuclear allelic composition (rf1rf1). The R line is homozygous for dominant (i.e., functional) fertility restorer alleles (Rf1Rf1) to allow recovery of fertility in CMS-D2 cotton plants in the cross A × R. The three lines were grown under normal production conditions. For sample collection, as described in previous studies [43,44], each genotype was grown side-by-side in field, and floral buds approximately 3 mm in length (corresponding roughly to the stage of male meiosis) were collected from about 100 plants (one floral bud was collected per plant) and combined, with three independent biological replicates. All collected floral buds were cut above the ovaries, immediately frozen in liquid nitrogen, and stored at −80 • C until use.
Total RNA was extracted using the Spectrum™ Plant Total RNA Kit (Princeton, NJ, USA) in accordance with the manufacturer's instructions. Equal amounts of RNA from the three biological replicates were used to construct transcriptome libraries (A1-3, B1-3, and R1-3) and small RNA libraries (A, B, and R) [25]. Both transcriptome and small RNA sequencing were performed on an Illumina HiSeq 2500/2000 platform (Tianjin, China). The raw sequence data for the transcriptome and small RNAs have been submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) under accession numbers SRX3421007 and SRX3422274, respectively.

Annotation of Transcripts and Identification of Long Noncoding RNAs
For the transcriptome data, after filtering of low-quality reads and trimming the adaptor sequences, a total of 887,090,420 clean reads were obtained to map to the Upland cotton "Texas Marker-1" (TM-1) reference genome (http://www.cottongen.org) [45] using TopHat 2.1.1 (Seattle, WA, USA) [46]. Transcripts assembly was accomplished using Cufflinks 2.2.1 (Berkeley, CA, USA) [47] based on the results of alignment to the TM-1 reference genome. All transcripts without strand information were discarded. The remaining transcripts were used to identify lncRNAs on the basis of the following rigorous criteria. First, single-exon transcripts located within 500 bp of other transcripts were excluded. Second, transcripts smaller than 200 bp in length were removed. Third, transcripts with a fragments per kilobase of transcript per million mapped reads (FPKM) score higher than 2 with a single exon or 0.5 with multiple exons in at least one sample were retained. Fourth, transcripts that overlapped with known genes and other non-mRNAs (rRNA, tRNA, snRNA, snoRNA, and pseudogenes) were excluded. Fifth, CPC [48] and Pfamscan [49] software were used to calculate the coding potential of the remaining transcripts. Only transcripts with CPC score < 0 and Pfamscan e-value < 0.001 were considered as putative lncRNAs for subsequent analysis.

Expression and Target Gene Analysis of LncRNAs
Cuffdiff 2.1.1 software was used to estimate fragments per kilobase of exon per million fragments mapped (FPKMs) of both lncRNAs and coding genes in each sample [50].
Based on the genome location of the lncRNAs and coding genes, we identified the target genes 10 kb upstream and downstream of lncRNAs and then analyzed their function. Gene ontology (GO) enrichment analysis was performed using the GOseq R package (version 3.6, Melbourne, Australia). The GO terms with a corrected p-value less than 0.05 were considered to be significantly enriched [51].

Prediction of Putative Targets and Endogenous Target Mimics for MiRNAs
The miRNA targets of lncRNAs or mRNAs were predicted using the psRNATarget server (http://plantgrn.noble.org/psRNATarget/), for which less than three mismatches in targets and miRNA pairing regions were permitted. The eTMs for miRNAs were predicted by combing psRNATarget and application of the rules established by Wu et al [18].
The miRNAs precursors were predicted by mapping the mature miRNAs sequence to the putative lncRNAs sequence through local BLAST and mismatch not allowed. We then utilized Mfold (http://unafold.rna.albany.edu/?q=mfold) to predict stem-loop structures as suggested by Jones-Rhoades [52].

Construction of MiRNA-lncRNA-mRNA Regulatory Networks
To understand the role of lncRNAs, miRNA-lncRNA-mRNA networks were constructed based on differentially expressed lncRNAs and miRNAs, and the target pairs of miRNAs-lncRNAs, miRNAs-mRNAs, and lncRNAs-mRNAs.
The regulatory networks contained miRNAs, lncRNAs acting as miRNA targets, mRNAs acting as lncRNA targets, and mRNAs acting as miRNA targets. The miRNA-lncRNA-mRNA regulatory networks were visualized using Cytoscape 3.7.1 software [53].

Quantitative RT-PCR Validation of LncRNAs, MiRNAs, and mRNAs Expression
To validate the relative expression of lncRNAs, miRNAs, and mRNAs, quantitative reverse transcription PCR (qRT-PCR) was performed with specific primers as described previously [25]. Total RNAs and miRNAs were extracted from the same samples by RNA-seq and reverse-transcribed to cDNA using the TransScript ® miRNA First-Strand cDNA Synthesis SuperMix kit (TransGen, Beijing, China) and PrimeScript™ RT reagent kit (Takara, Dalian, China) following the manufacturers' guidelines. The qRT-PCR mixture contained 1 µL diluted cDNA, 10 µL 2× SYBR ® Green Mix (Takara), 0.5 µM of each primer, and ddH 2 O to make up the volume to 20 µL. All reactions were performed with three biological replicates. Ubiquitin 6 (GhUBQ6) was used as the reference gene, and relative expression levels were calculated using the 2 −∆∆Ct method [54]. The qRT-PCR primers used are listed in Table S6.

Conclusions
In this study, systematic transcriptome sequencing was performed during anther development of Upland cotton harboring the cytoplasmic male sterile Gossypium harknessii (D2) cytoplasm. In total, 80,695 lncRNAs were identified, of which 43,347 and 44,739 lncRNAs were differentially expressed in the A-B and A-R comparisons, respectively. These lncRNAs represent functional candidates involved in CMS and fertility restoration. We analyzed the putative relationship between lncRNAs and miRNAs and observed that lncRNAs may act as miRNA precursors, miRNA targets, and miRNA eTMs. Sixty-three lncRNAs were identified as putative precursors of 35 miRNAs, and qRT-PCR results showed a similar expression level to that of RNA-seq data. To explore the functions of lncRNAs, we constructed putative miRNA-lncRNA-mRNA regulatory networks involved in CMS and fertility restoration. However, further functional analyses are needed to elucidate the regulatory networks. This study lays a solid foundation for exploration of the functions and regulatory mechanisms of lncRNAs in anther development of cotton.