Integrated Analysis of Single-Molecule Real-Time Sequencing and Next-Generation Sequencing Eveals Insights into Drought Tolerance Mechanism of Lolium multiflorum

Lolium multiflorum is widely planted in temperate and subtropical regions globally, and it has high economic value owing to its use as forage grass for a wide variety of livestock and poultry. However, drought seriously restricts its yield and quality. At present, owing to the lack of available genomic resources, many types of basic research cannot be conducted, which severely limits the in-depth functional analysis of genes in L. multiflorum. Therefore, we used single-molecule real-time (SMRT) and next-generation sequencing (NGS) to sequence the complex transcriptome of L. multiflorum under drought. We identified 41,141 DEGs in leaves, 35,559 DEGs in roots, respectively. Moreover, we identified 1243 alternative splicing events under drought. LmPIP5K9 produced two different transcripts with opposite expression patterns, possibly through the phospholipid signaling pathway or the negatively regulated sugar-mediated root growth response to drought stress, respectively. Additionally, 13,079 transcription factors in 90 families were obtained. An in-depth analysis of R2R3-MYB gene family members was performed to preliminarily demonstrate their functions by utilizing subcellular localization and overexpression in yeast. Our data make a significant contribution to the genetics of L. multiflorum, offering a current understanding of plant adaptation to drought stress.


Introduction
Drought is a periodic and growing natural disaster that impacts extensive subject areas [1], including water resources [2,3] and crop yield [4][5][6], as well as a range of environmental systems [7], resulting in serious harm to ecological security and human society [8,9]. Over the past half-century, drought has become more and more serious all over the world [10], which has greatly reduced the productivity of grazing grasslands and artificial mowed grasslands.
L. multiflorum is one of the most important forage grasses and is widely grown in temperate and subtropical regions worldwide [11]. It is an excellent annual gramineous species recommended for planting, and its forage yield and quality cannot be replaced by other forage grasses at present [12]. It plays an important role in restoring degraded grasslands and establishing artificial grasslands and has both high ecological value and high economic benefit [12]. In recent years, its phytoremediation [13][14][15], bio-ethanol production [16] and anti-inflammatory medicinal properties [17] have also been reported. L. multiflorum is a highly self-incompatible plant, corresponding to the complex structure of its genome [18]. Meanwhile, the lack of available genomic resources hinders its improvement by breeders. Thus, there is an urgent need to construct valuable gene data sets and screen key candidate genes in L. multiflorum.

of 20
At present, vast quantities of data have been generated through second-generation high-throughput sequencing platforms. However, owing to the short reads generated by these technologies, it is quite difficult to obtain full-length sequences, and such sequencing technologies cannot work well in complex regions [19], which limits the ability of researchers to study gene function throughout the entire genome [20]. SMRT sequencing is a third-generation sequencing technology, which is more and more used in full-length sequencing [21,22]. It makes up for the deficiency from short reads and can generate full-length cDNA sequences (4-8 kb on average) without assembly or a reference genome, which greatly increases the potential for the discovery of genes and deeper studies of cell transcription [23,24]. There are many examples of SMRT sequencing being used to explore key genes or pathways to promote molecular breeding. In Iris halophila, metal ion transporters were found to be involved in the response to Pb stress using SMRT sequencing [25]. A key synthase in the benzylisoquinoline alkaloid biosynthesis pathway, the main active substance in Corydalis yanhusuo, was identified by SMRT sequencing [26]. SMRT sequencing has become an ideal way to construct transcriptomes and analyze novel genetic material, especially for species with no reference genomes.
We utilized SMRT sequencing data generated with the PacBio Sequel platform, which produces long reads, to reveal full-length transcriptome information in L. multiflorum under drought. Such SMRT sequencing data can be complemented by NGS short reads [27]. Then, high-quality full-length transcript and drought-regulated genes in multiple tissues of L. multiflorum were obtained. On the basis of transcriptome data, we carried out functional annotation, lncRNA prediction, coding sequence prediction, TF analyses and functional validation of candidate genes by subcellular localization and overexpression in yeast. This is the first time to use SMRT sequencing to generate full-length transcription data from L. multiflorum under drought and is a very useful and important resource for further research of this important forage. This research can be utilized to elucidate the mechanisms of drought response of L. multiflorum and to create drought-tolerant, water-saving and environmentally friendly forage.

Assembly of the Sequence Datasets and Functional Annotation
We obtained a total of 60.89 Gb of raw data from PacBio ISO-Seq and generated 13,787,867 subreads with a total size of 30.72 Gb. Furthermore, by self-correcting multiple single-molecule sequencing sequences, we obtained 1,026,983 CCS sequences after filtering. Based on their inclusion of 5 -primer, 3 -primer and poly-A tail sequences, CCS sequences were divided into full-length and non-full-length reads. A total of 879,040 fulllength reads were found among the CCS sequences, and a total of 834,868 full-length non-chimera (FLNC) reads were obtained through ICE. Finally, LoRDEC software was used to identify sequencing errors, and the RNA-Seq short reads with high accuracy were combined for further correction. A total of 385,645 error-corrected consensus reads were obtained, with an average length of 2571 bp (Table 1, Figure 1C). Finally, CD-HIT software was used to eliminate redundancy among consensus reads, and 385,645 full-length nonredundant transcripts and 207,995 Unigenes of samples were obtained (Table 1). A total of 12,433 transcripts shared by the two samples were found by cluster analysis of transcripts with redundant sequences between the two samples ( Figure 1A).
To further investigate gene function, non-redundant sequences were annotated by using CD-HIT. Overall, the transcripts with annotations corresponding to the Nr database were most common, with a total of 182,952 ( Figure 1B). Additionally, 77,258 transcripts were annotated in all databases, and 189,898 transcripts were annotated in at least one database (Table S1). To investigate whether drought stress affects AS in L. multiflorum, 26,169 non-redundant transcripts (Table S2) were processed into 10,629 UniTransModels (Table S2), and 1243 AS events (Table S2) were identified based on UniTransModels. Before and after drought stress, the occurrence of AS events was essentially uniform in both type and quantity. In descending order of quantity, the types can be ranked as follows: retained intron (RI), alternative 3 splice sites (A3), alternative 5 splice sites (A5), skipped exons (SE), alternative first exons (AF), and alternative last exons (AL).

Differentially Expressed Genes in L. multiflorum Leaves and Roots under Drought
To evaluate the reliability of the DEG analysis, Pearson correlation analysis was performed for pairs of samples. The coefficients of determination (R2) among the three biological replicates in each condition were at least greater than 0.97, confirming the high correlation among biological replicates and the stability and reliability of the DEG analysis ( Figure 2A). In leaves and roots of L. multiflorum under drought stress, 41,141 DEGs (19,155 down-and 21,986 up-regulated genes) ( Figure 2C) and 35,559 DEGs (17,402 downand 18,157 up-regulated genes) ( Figure 2D) were found, respectively. All DEGs involved in the response to drought stress were analyzed, and 12 significant profiles were obtained by trend analysis ( Figure 2B). Profile 15 represents 3955 genes up-regulated in response to drought stress in both leaves and roots ( Figure S1A, Table S3). Profiles 14 and 11 represent 3993 up-regulated genes only in leaves and 3397 up-regulated genes only in roots, respectively ( Figure S1B,C, Table S3). Profile 5 represents 4534 genes down-regulated in response to drought stress in both leaves and roots ( Figure S1D, Table S3). These results indicated that there were more DEGs in L. multiflorum leaves under in vitro drought treatment and that the response of leaves to stress was broader and more active. At the same time, the trend analysis showed that more genes were down-regulated in both leaves and roots. To further investigate gene function, non-redundant sequences were annotated by using CD-HIT. Overall, the transcripts with annotations corresponding to the Nr database were most common, with a total of 182,952 ( Figure 1B). Additionally, 77,258 transcripts were annotated in all databases, and 189,898 transcripts were annotated in at least one database (Table S1).
To investigate whether drought stress affects AS in L. multiflorum, 26,169 non-redundant transcripts (Table S2) were processed into 10,629 UniTransModels (Table S2), and 1243 AS events (Table S2) were identified based on UniTransModels. Before and after drought stress, the occurrence of AS events was essentially uniform in both type and quantity. In descending order of quantity, the types can be ranked as follows: retained intron (RI), alternative 3′ splice sites (A3), alternative 5′ splice sites (A5), skipped exons (SE), alternative first exons (AF), and alternative last exons (AL). respectively ( Figure S1B,C, Table S3). Profile 5 represents 4534 genes down-regulated i response to drought stress in both leaves and roots ( Figure S1D, Table S3). These resul indicated that there were more DEGs in L. multiflorum leaves under in vitro drought trea ment and that the response of leaves to stress was broader and more active. At the sam time, the trend analysis showed that more genes were down-regulated in both leaves an roots.  A comparison of differences between the two treatment groups in leaves and roots (i.e., DRL versus CKL and DRR versus CKR) revealed that 11,940 DEGs responded to drought treatment in both tissues ( Figure 3A, Table S4). To explore the biological functions of DEGs, GO and KEGG enrichment analyses were performed on all the differentially expressed genes in the two treatment groups ( Figure 3B,C, Table S5). Among molecular function (MF) terms, 'catalytic activity' and 'coenzyme binding' were significantly enriched in both the DRL versus CKL and DRR versus CKR comparisons, simultaneously. Among cellular component (CC) terms, 'membrane', '1,3-beta-d-glucan synthase complex', 'integral component of membrane', 'intrinsic component of membrane' and 'membrane part' were significantly enriched in the DRL versus CKL and DRR versus CKR comparisons, simultaneously. No biological process (BP) terms were enriched in both comparisons. riched in both the DRL versus CKL and DRR versus CKR comparisons, simultan Among cellular component (CC) terms, 'membrane', '1,3-beta-d-glucan syntha plex', 'integral component of membrane', 'intrinsic component of membrane' and brane part' were significantly enriched in the DRL versus CKL and DRR versus CK parisons, simultaneously. No biological process (BP) terms were enriched in bo parisons.

Transcription Factor Statistics and Identification of R2R3-MYB Family Members
Transcription factors play an important role in regulating gene expression and have been an active research focus for decades. Full-length ryegrass transcriptome data were analyzed using the iTAK database, and the transcripts of 6512 transcription factors in 90 families were identified under CK conditions ( Figure 4A, Table S6). A total of 6567 transcription factors in 88 families were identified under DR conditions ( Figure 4B, Table S6). Under CK and DR conditions, the families represented by the most members were SNF2, SET and FAR1 and FAR1, C3H and SNF2, respectively. Under CK and DR conditions combined, four families ranked in the top ten in terms of the number of family members represented, namely SNF2, FAR1, MYB-related and C3H.

Transcription Factor Statistics and Identification of R2R3-MYB Family Members
Transcription factors play an important role in regulating gene expression and have been an active research focus for decades. Full-length ryegrass transcriptome data were analyzed using the iTAK database, and the transcripts of 6512 transcription factors in 90 families were identified under CK conditions ( Figure 4A, Table S6). A total of 6567 transcription factors in 88 families were identified under DR conditions ( Figure 4B, Table S6). Under CK and DR conditions, the families represented by the most members were SNF2, SET and FAR1 and FAR1, C3H and SNF2, respectively. Under CK and DR conditions combined, four families ranked in the top ten in terms of the number of family members represented, namely SNF2, FAR1, MYB-related and C3H. Further identification of R2R3-MYB gene family members among MYB-related genes revealed 29 R2R3MYB genes expressed in L. multiflorum, sequentially named LmMYB1 through LmMYB29. The ORF lengths of LmMYB members ranged from 720 to 2889 bp, and the predicted protein lengths ranged from 239 to 962 amino acids, with molecular weights ranging from 27.23 to 105.65 kDa, respectively ( Table 2). Further identification of R2R3-MYB gene family members among MYB-related genes revealed 29 R2R3MYB genes expressed in L. multiflorum, sequentially named LmMYB1 through LmMYB29. The ORF lengths of LmMYB members ranged from 720 to 2889 bp, and the predicted protein lengths ranged from 239 to 962 amino acids, with molecular weights ranging from 27.23 to 105.65 kDa, respectively ( Table 2).

Genetic Analysis of Members of the R2R3-MYB Gene Family
To investigate the phylogenetic relationships of the R2R3MYB family members of L. multiflorum and Arabidopsis thaliana, a phylogenetic tree was constructed. Thus, R2R3MYB members from the two species could be divided into 13 subgroups, numbered S1 to S13, according to their phylogenetic relationships ( Figure 5). In S1, there is only one gene family member, LmMYB6, and it is distantly related to all other members. Meanwhile, S3 and S5 subgroups only have members in L. multiflorum. All other subgroups consist of members in both L. multiflorum and A. thaliana. The tree was generated with MEGA 7.0 software using the neighbor-joining (NJ) method based the inferred amino acid sequences. R2R3-MYB members in L. multiflorum are labeled with red st S1 to S13 represent each of the different subgroups.

The Expression Patterns of R2R3-MYB Family Members under Drought
In order to better understand the role of R2R3-MYB proteins in L. multiflorum in response to stress, the expression patterns of LmMYB genes were measured using qR PCR. The samples of L. multiflorum under drought stress included nine time points (0 m 15 min, 30 min, 1 h, 2 h, 3 h, 6 h, 12 h, 24 h). The experimental data are briefly summariz as follows. The expression patterns of the 29 LmMYB genes can be roughly divided i three categories: early response (ER), intermediate response (IR) and late response (L ( Figure 6). The ER genes (10 members) showed a very fast response to drought stress a were significantly upregulated at 15 and 30 min. The IR genes (11 members) were sign cantly up-regulated at 1-6 h, and some members were continuously upregulated. T (LR) genes (eight members) were up-regulated to a significant level after 6 h and of continued to be up-regulated under subsequent stress. The tree was generated with MEGA 7.0 software using the neighbor-joining (NJ) method based on the inferred amino acid sequences. R2R3-MYB members in L. multiflorum are labeled with red stars. S1 to S13 represent each of the different subgroups.

The Expression Patterns of R2R3-MYB Family Members under Drought
In order to better understand the role of R2R3-MYB proteins in L. multiflorum in the response to stress, the expression patterns of LmMYB genes were measured using qRT-PCR.

LmMYB Transcripts Localized to the Nucleus Enhanced Abiotic Stress Tolera
The subcellular localization of LmMYB1, LmMYB8 and LmMYB9 in t was further studied, revealing that all three genes were expressed in the n was consistent with the function of transcription factors ( Figure 7A). To furt ize the response of LmMYB1, LmMYB8 and LmMYB9 to stress, we cloned t them into the pYES2 vector, and then heterologously expressed them in the line. All yeast-harboring empty vectors as well as factors containing LmM and LmMYB9 were able to grow normally on SD-URA (2 g/L galactose) m 7B). Only the INVScI strains with LmMYB1, LmMYB8 and LmMYB9 were a SD-URA (2 g/L galactose) medium under 3 M sorbitol ( Figure 7C) and 1.5 ments ( Figure 7D). This result indicated that overexpression of LmMYB1, LmMYB9 could significantly improve the resistance of INVScI yeast stra stress compared with the strain harboring the empty pYES2 vector. This also LmMYB1, LmMYB8 and LmMYB9 may be involved in the plant response to o

LmMYB Transcripts Localized to the Nucleus Enhanced Abiotic Stress Tolerance in Yeast
The subcellular localization of LmMYB1, LmMYB8 and LmMYB9 in tobacco leaves was further studied, revealing that all three genes were expressed in the nucleus, which was consistent with the function of transcription factors ( Figure 7A). To further characterize the response of LmMYB1, LmMYB8 and LmMYB9 to stress, we cloned them, inserted them into the pYES2 vector, and then heterologously expressed them in the INVScI yeast line. All yeast-harboring empty vectors as well as factors containing LmMYB1, LmMYB8 and LmMYB9 were able to grow normally on SD-URA (2 g/L galactose) medium ( Figure 7B). Only the INVScI strains with LmMYB1, LmMYB8 and LmMYB9 were able to grow on SD-URA (2 g/L galactose) medium under 3 M sorbitol ( Figure 7C) and 1.5 M NaCl treatments ( Figure 7D). This result indicated that overexpression of LmMYB1, LmMYB8 and LmMYB9 could significantly improve the resistance of INVScI yeast strains to osmotic stress compared with the strain harboring the empty pYES2 vector. This also suggests that LmMYB1, LmMYB8 and LmMYB9 may be involved in the plant response to osmotic stress.

Discussion
As one of the most extensive forms of stress, drought has a huge impact on the survival of grasses. L. multiflorum is a vanguard grass species in artificial grassland construction, a preferred grass species for mowed grasslands and an important grass species for soil restoration. Understanding its response mechanism to drought stress is of substantial value for agricultural production, animal husbandry and ecological restoration. Currently, there is little available genomic information on L. multiflorum, so SMRT sequencing and NGS sequencing were used to construct a novel Unigene database for L. multiflorum under drought stress in this study. This extensive and comprehensive unigene database provides strong support for future research on the molecular mechanisms of drought responses in L. multiflorum. We identified 207,955 Unigenes and 1243 AS events and further characterized a number of key candidate genes involved in the response of L. multiflorum to drought stress, which can be used to advance the breeding of L. multiflorum to better adapt to increasingly arid environments.

Discussion
As one of the most extensive forms of stress, drought has a huge impact on the survival of grasses. L. multiflorum is a vanguard grass species in artificial grassland construction, a preferred grass species for mowed grasslands and an important grass species for soil restoration. Understanding its response mechanism to drought stress is of substantial value for agricultural production, animal husbandry and ecological restoration. Currently, there is little available genomic information on L. multiflorum, so SMRT sequencing and NGS sequencing were used to construct a novel Unigene database for L. multiflorum under drought stress in this study. This extensive and comprehensive unigene database provides strong support for future research on the molecular mechanisms of drought responses in L. multiflorum. We identified 207,955 Unigenes and 1243 AS events and further characterized a number of key candidate genes involved in the response of L. multiflorum to drought stress, which can be used to advance the breeding of L. multiflorum to better adapt to increasingly arid environments.

A More Extensive and Complete Transcriptome Dataset
Compared to other L. multiflorum transcriptome studies using the Illumina platform, our results provide a more extensive and complete transcriptome dataset, with several critical strengths. First, our full-length transcriptome can later be used as a reference for annotating and assembling subsequent genomes of L. multiflorum and related species. Second, 207,955 accurate and high-quality full-length sequences were obtained, providing particularly valuable data for gene structure and gene function analyses. Third, the fulllength transcripts generated in this study can be used to study the response of L. multiflorum to environmental changes with greater clarity. In this study, 207,955 Unigenes with an average length of 2571 bp were obtained by PacBio SMRT-Seq. Through the combination of Illumina and PacBio platform data, Unigenes are significantly increased in number and length compared with previous studies ( Figure S2) [28,29]. The average predicted lengths of Unigenes using Illumina data were only 871 bp and 575.24 bp, respectively. The increase in number and length greatly enriched the L. multiflorum unigene library, making it both more extensive and more complete, thus providing solid data support for subsequent gene function studies.

Alternative Splicing Plays an Important Role in Complex Transcriptional Regulation
Alternative splicing plays an important role in stress responses and could enhance transcript diversity dramatically. The phospholipid signaling pathway is involved in regulating plant growth and aging [30]. Additionally, phosphatidylinositol phosphate 5-kinase (PIP5K) is a key part of the phospholipid signaling pathway [31]. PIP5K proteins play important roles in plants and have many functions. For example, AtPIP5K1 is involved in responses to water stress and the abscisic acid signaling pathway in A. thaliana [32]. AtPIP5K4 is involved in regulating stomatal opening in A. thaliana [33]. In this study, 1243 AS events were identified, and some notable phenomena were found. LmPIP5K9 produced two different transcripts, i.e., CDS1 (i2_LQ_DE_c113853/f1p10/2988) and CDS2 (i3_LQ_DE_c41394/f1p1/3368), through alternative splicing of a retained intron (RI), each of which may perform different functions in L. multiflorum ( Figure 8A). Read counts of the two transcripts indicated that the expression levels of CDS1 and CDS2 were low in both leaf and root samples under normal growth conditions, which was consistent with a recent study [34]. This finding implies that LmPIP5K9 is not required under normal growth conditions. Additionally, CDS1 is rapidly upregulated, about four-fold, after induction by drought stress, which is also corroborated by other studies [34,35]. However, the expression of CDS2, which itself is low under normal growth conditions, is further down-regulated and almost not expressed at all under drought stress. Thus, two transcripts of the same gene have different expression patterns. Accordingly, LmPIP5K9 may up-regulate the CDS1mediated phospholipid signaling pathway to participate in the drought stress response. In contrast, CDS2 interacts with a cytosolic invertase to negatively regulate sugar-mediated root growth, as previously reported [36]. In future work, we will confirm the functions of these two transcripts, initially in A. thaliana.

Key Distinctive Candidate Genes Involved in the L. multiflorum Drought Stress Response
Drought stress greatly affected gene expression in L. multiflorum, and the affected genes exhibited different expression patterns. We combined differential gene analysis, alternative splicing analysis and gene family analysis to identify some distinctive candidate genes involved in the response to drought stress. Homeobox (HOX) genes have been identified and characterized in many eukaryotes and are involved in regulating various aspects of growth and development [37]. We found that HOX22 (i1_LQ_DE_c57408/f1p2/1087) and HOX24 (i3_HQ_DE_c33357/f3p2/3113) had higher expression levels in DRL samples than other samples (Figure 8B). At the same time, there was an enrichment of DEGs in the Intrinsic/Integral component of Golgi membrane identified by GO analysis, suggesting that HOX22/24 responds to drought stress by participating in the formation of the Golgi apparatus. High-affinity K+ transporters (HAK) are present in all plants with known genomes, but not in animals [38]. These proteins play an important role in K+ uptake and transport [39,40] and are also involved in osmotic regulation [41], stabilizing plants by balancing K+ homeostasis during cell growth and drought stress responses. In the present study, the expression of LmHAK7 in root tissues after drought stress was remarkably high compared with leaf tissues (Figure 8C), which was consistent with a recent study [42]. Characterizing intense responses to drought stress (i.e., fold change DRR:CKR = 71.5) and ultra-high expression in roots (i.e., fold change DRR:DRL = 9.45) will be important focuses of our future work. In the present study, we also found that LmMYB1, LmMYB8 and LmMYB9 were able to improve drought tolerance and salt tolerance in yeast (Figure 7), suggesting that they might have the same function in plants. Future experiments will aim to genetically transform A. thaliana and L. multiflorum for functional verification.

Key Distinctive Candidate Genes Involved in the L. multiflorum Drought Stress Response
Drought stress greatly affected gene expression in L. multiflorum, and the affected genes exhibited different expression patterns. We combined differential gene analysis, alternative splicing analysis and gene family analysis to identify some distinctive candidate

Material Cultivation and Sample Collection
In this study, L. multiflorum cultivar 'Chuannong No. 1 was used. The experimental subjects for transcriptional sequencing included 300 individual plants during their flowering period, all of which were cultivated at the Ya'an research station of Sichuan Agricultural University. In order to obtain more comprehensive full-length transcriptome sequence information, roots, stems, leaves and flowers were collected from 30 randomly selected individual plants in the control group (CK) and in vitro drought treatment group after 6 h (DR).
Samples for expression pattern analysis were planted in pots (25 × 19 × 6 cm) in a growth chamber with day/night cycles of 16 h/8 h and 22/20 • C. When the number of leaves on plants reached three or four, stress treatments were initiated. The 15% PEG6000 is used to simulate drought stress. The leaves were sampled at 0 h, 15 min, 30 min, 1 h, 2 h, 3 h, 6 h, 12 h and 24 h after drought stress. All tissues were immediately stored in liquid nitrogen after sampling and then stored at −80 • C.

Illumina cDNA Library Construction and Next-Generation Sequencing
After RNA was extracted from all samples, 3 µg of RNA was used for Illumina cDNA library construction. Total RNA was extracted by grinding tissue in TRIzol reagent (Invitrogen, Carlsbad, CA, USA) on dry ice and processed following the protocol provided by the manufacturer. For NGS, a total of 12 Illumina cDNA libraries were constructed by using the Illumina Stranded RNA Library Prep Kit (New York, NY, USA). Six libraries were constructed from root tissue and the other six were from leaf tissue. The cDNA library samples were sequenced by Gene Denovo Biotechnology Co. (Guangzhou, China). The quality control of the NGS raw data was conducted as follows: the sequenced raw reads with the adapter and primer sequences and poly-N were filtered out. The clean data were used to correct the PacBio sequencing data in the next step.

PacBio cDNA Library Construction and Single-Molecule Real-Time (SMRT) Sequencing
After RNA was extracted from all samples, 1 µg of RNA was used for PacBio cDNA library construction. Total RNA was extracted from each sample by using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). We constructed two libraries, one from a mixture of roots, stems, leaves and flowers under drought treatment, and the other from a mixture of roots, stems, leaves and flowers under the control treatment. The SMRT sequencing library was prepared with the Clontech SMARTer PCR cDNA Synthesis Kit and the BluePippin Size Selection System protocol. The mRNA enriched by Oligo (dT) magnetic beads was reverse transcribed into cDNA. Then, double-stranded cDNA was generated with the optimum cycle number. In addition, >4 kb size selection was performed using the BluePippinTM Size-Selection System. Then, large-scale PCR was performed for the subsequent SMRTbell library construction. The SMRTbell library was sequenced by Gene Denovo Biotechnology Co. (Guangzhou, China).
The PacBio Sequel sequencing platform, based on SMRT-Seq technology, was used to generate full-length sequence data. The whole data processing process is carried out based on SMRTlink 5.1 (https://www.pacb.com (accessed on 20 May 2020)). Circular consensus sequence (CCS) data were generated from the subread BAM files. CCS.BAM files that were output were then classified into full-length and non-full-length reads using pbclassify.py. The non-full length and full-length fasta files produced were then fed into the cluster step, which performs isoform-level clustering (ICE), followed by final Arrow polishing. Additional nucleotide errors in consensus reads were corrected using Illumina RNA-Seq data with the software LoRDEC. Any redundancy in corrected consensus reads was removed by CD-HIT to obtain final transcripts for the subsequent analysis ( Figure 9). pbclassify.py. The non-full length and full-length fasta files produced were then fed into the cluster step, which performs isoform-level clustering (ICE), followed by final Arrow polishing. Additional nucleotide errors in consensus reads were corrected using Illumina RNA-Seq data with the software LoRDEC. Any redundancy in corrected consensus reads was removed by CD-HIT to obtain final transcripts for the subsequent analysis ( Figure 9).

Identification of Alternative Splicing Events and lncRNA Prediction
The non-redundant transcripts were processed with the Coding GENome reconstruction Tool [45]. Each transcript family was further reconstructed into one or more unique transcript model(s) (referred to as UniTransModels). Error-corrected non-redundant transcripts were mapped to UniTransModels. Splicing junctions for transcripts mapped to the same UniTransModels were examined, and transcripts with the same splicing junctions were collapsed. Collapsed transcripts with different splicing junctions were detected as transcription isoforms of UniTransModels. Alternative splicing (AS) events

Identification of Alternative Splicing Events and lncRNA Prediction
The non-redundant transcripts were processed with the Coding GENome reconstruction Tool [45]. Each transcript family was further reconstructed into one or more unique transcript model(s) (referred to as UniTransModels). Error-corrected non-redundant transcripts were mapped to UniTransModels. Splicing junctions for transcripts mapped to the same UniTransModels were examined, and transcripts with the same splicing junctions were collapsed. Collapsed transcripts with different splicing junctions were detected as transcription isoforms of UniTransModels. Alternative splicing (AS) events were identified with SUPPA (https://github.com/comprna/SUPPA (accessed on 20 May 2020)) [46].
We used the Coding-Non-Coding-Index (CNCI) [47], Coding Potential Calculator (CPC) [48], Pfam-scan [49] and PLEK [50] software tools to predict lncRNA. CNCI profiles adjoin nucleotide triplets to effectively distinguish protein-coding from non-coding sequences independent of known annotations. CPC mainly assesses the extent and quality of the ORF in a transcript and searches the sequences in known protein sequence databases to clarify the coding and non-coding transcripts. Each transcript was translated in all three possible frames and mapped in the Pfam database using Pfam Scan. Any transcript with a Pfam hit was excluded in the following steps. The PLEK support vector machine classifier uses an optimized K-mer approach to construct the best classifier to assess the coding potential for species that lack high-quality genome sequences and annotations. Default parameters were used for four tools. Transcripts predicted with protein-coding potential by any or all of the three tools above were filtered out, and those without any identified coding potential were our candidate set of lncRNAs.

Quantification of Gene Expression Levels and Differential Expression Analysis
Reference sequences were mapped prior to the quantification of gene expression. The reference sequence mapping used CD-HIT to identify the transcripts that are redundant after generating the corrected consensus sequence. The clean reads from RNA-Seq were mapped to the reference sequence. The process used RSEM software [51], and the parameters for bowtie2 were end-to-end and sensitive mode. Default parameters were used for all other settings.
The read count for each transcript was obtained from the mapping results. Differential expression analysis of the two groups was performed using the DESeq2 R package (1.34.0) (http://bioconductor.org/packages/release/bioc/html/DESeq2.html (accessed on 20 May 2020)). The resulting P-values were adjusted using Benjamini and Hochberg's approach for controlling the false discovery rate. Genes with an adjusted p-value < 0.05 as determined by DESeq were identified as differentially expressed.

Transcription Factor Analysis and Identification of R2R3-MYB Gene Family Members
Transcription factors (TFs) are groups of protein molecules that can bind to specific sequences in the 5 upstream sequences of genes to ensure that the target genes are expressed at a specific intensity at a specific time and location [52]. We used iTAK software [53] to predict plant TFs. R2R3-MYB genes were identified in the PFAM protein family database using HMMER 3.0 software [54], with the MYB-like DNA-binding domains (Pfam, PF00249) as the search query [55] and an initial threshold value of E ≤ 10-10. Basic information about these genes, including PIs, MWs and subcellular localization, was predicted using the ExPASy website (https://web.expasy.org/protparam/ (accessed on 10/02/2021)) [56].

Phylogenetic Analysis
The inferred protein sequences of the R2R3-MYB from L. multiflorum and A. thaliana were aligned using ClustalW with its default parameters. MUSCLE [57] and Clustal Omega [58] were also used to verify alignment results. The proteins of A. thaliana MYB were downloaded from the Arabidopsis Information Resource (TAIR) (http://www.aabidopsis. org/ (accessed on 10 February 2021)). Based on the alignment of the MYB domain, a phylogenetic tree was constructed with MEGA 7.0 [59] using the neighbor-joining (NJ) method. Bootstrap values (>50%) were estimated using 1000 replicates. Interactive Tree Of Life (iTOL) software was used to optimize the obtained phylogenetic tree [60].

Expression Pattern Analysis of the R2R3-MYB Gene Family
Total RNA was extracted from leaves under drought stress. The cDNA was synthesized using a MonScript™ RTIII Super Mix with dsDNase (Two-Step) Kit from Monad Biotech Co., Ltd. (Suzhou, China). Subsequently, real-time quantitative PCR (qRT-PCR) was performed using SYBR qPCR Master Mix (Vazyme, China). Gene-specific primers (shown in Supplementary Table S1) were designed to avoid the conserved region. Both eIF4A and HIS3 were used as reference genes [61].

Subcellular Localization and Heterologous Expression
Subcellular localization prediction of gene family members was conducted using the ExPASy website. To confirm the predicted subcellular localization, we inserted the fulllength sequences of LmMYB1, LmMYB8 and LmMYB9 into the pYBA1132 vector. Then, all of them were transformed into tobacco leaves by Agrobacterium-mediated transformation. The primers used are listed in Supplementary Table S7. Empty vector GFP was used as a control. The fluorescence images of GFP fusion proteins were observed by FV10 confocal microscopy.

Heterologous Expression of LmMYB1, LmMYB8 and LmMYB9 in Yeast
LmMYB1, LmMYB8 and LmMYB9 were amplified from the PacBio cDNA library of L. multiflorum constructed using the primers listed in Supplementary Table S7. The correct CDS regions were cloned into a pYES2 vector for expression in yeast. Cells of the sorbitoland NaCl-sensitive yeast strain INVScI were obtained from MiaoLing Plasmid Platform. The plasmids empty pYES2, pYES2-LmMYB1, pYES2-LmMYB8 and pYES2-LmMYB9 were transformed into INVScI yeast by Carrier DNA (Vazyme, China).
Yeast transformants were screened using SD-Ura plates. The whole process lasted for 2 days at 28 • C. To analyze sorbitol and NaCl resistance, the validated single colonies were cultured in liquid SD-Ura medium (2% galactose) at 28 • C and 150 rpm. When the medium OD600 value reaches 2.0, gradient dilution is performed (10 −1 , 10 −2 , 10 −3 ). Diluted yeast was spotted onto SD-Ura plates containing 3 M sorbitol or 2 M NaCl in turn, which were photographed and observed by the naked eye.

Conclusions
Through the analysis of transcriptome changes in leaves and roots of annual ryegrass under drought stress using SMRT and NGS sequencing technologies we identified many genes with potentially related functions and revealed complex transcript responses such as alternative splicing and lncRNA expression. These results contribute to the current understanding of the complexity of transcriptional regulation in plant drought responses. In particular, the R2R3-MYB transcription factor family was identified and analyzed using the high-quality full-length transcriptome. The candidate drought response regulatory factors LmMYB1, LmMYB8 and LmMYB9 in L. multiflorum were preliminarily verified in both tobacco and yeast. These results help clarify the mechanism of plant responses to stress. Future studies will focus on plant responses to stress to improve our understanding of plant adaptation to climate change.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/ijms23147921/s1. Author Contributions: X.Z. and L.H. contributed to the conception of the study; Q.L., F.W. and Y.S. performed the experiment; Q.L. and F.W. contributed significantly to analysis and manuscript preparation; Q.L. and F.W. performed the data analyses and wrote the manuscript; Q.L., X.Z. and L.H. helped perform the analysis with constructive discussions. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement:
The datasets generated in this study have been uploaded to the NCBI database under the accession number PRJNA634598.