Identification and Comparative Analysis of Genes and MicroRNAs Involved in the Floral Transition of the Xinjiang Early-Flowering Walnut ( Juglans regia L.)

: For tree crops, shortening the juvenile phase is a vital strategy to advance fruit bearing and enhance the breeding efficiency. Walnut ( Juglans regia L.) seedlings usually take at least eight to 10 years to flower, but early-flowering (EF) types can flower one or two years after planting. In this study, RNA sequencing (RNA-Seq) and microRNA sequencing (miRNA-Seq) were used for a tran-scriptome-wide analysis of gene and miRNA expression in hybrids of the Xinjiang EF walnut variety ‘Xinwen 81’ and later-flowering (LF) walnut. Based on a high-quality chromosome-scale reference genome, a total of 3009 differentially expressed genes (DEGs) were identified, of which 933 were upregulated (accounting for 31%) and 2076 were downregulated (accounting for 69%). DEGs were functionally annotated, and the flowering-related genes FLOWERING LOCUS T ( FT ), SUP-PRESSOR OF OVEREXPRESSION OF CO 1 ( SOC1 ), and LEAFY ( LFY ) showed remarkable upregu-lation in EF compared with in the LF walnut. In addition, miRNAs associated with floral transition were screened as candidates for flowering time regulation in the walnut. This work provides new insights into walnut floral transition, which may ultimately contribute to genetic improvement of the walnut.


Introduction
In flowering plants, the floral transition from the juvenile to the adult phase is controlled by several exogenous and endogenous developmental signals to ensure an appropriate flowering time [1,2]. In the model plant Arabidopsis thaliana, this transition is regulated by the photoperiodic, vernalization, autonomous, and gibberellin pathways [1,3]. Crosstalk between those pathways in this critical reproductive stage is regulated by a network of flowering-related genes [1,[4][5][6]. For example, FLOWERING LOCUS C (FLC) functions as a flowering repressor in Arabidopsis; in contrast, FLOWERING LOCUS T (FT), or florigen, promotes flower initiation and is directly repressed by FLC under long-day photoperiod (LD) conditions [7][8][9][10]. In addition to the four major pathways described above, floral transition is influenced by the miR156-dependent aging pathway [11,12]. Recent studies have revealed that miR156 and miR172 play key roles in the regulation of the floral transition. The expression level of miR156 is high in the juvenile stage and gradually decreases during the adult stage, whereas miR172 exhibits the opposite expression pattern [12,13]. It is crucial to understand how these pathways function in response to diverse developmental and environmental signals in model and crop plants. However, few flowering time genes have been identified in perennial woody plants.
Perennial woody plants require longer time periods to complete the transition from the juvenile to the adult phase. After the floral transition, they can generate both flowering and vegetative buds each year. In perennial plants, FLC-like genes are expressed at different levels that correspond with evolutionary relationships between plants [14][15][16][17]. It has been shown that, in biennial-to-perennial wavy bittercress (Cardamine flexuosa), integrated age and vernalization pathways generally regulate flowering with CfFLC and miR156-SPL-miR172 modules [18]. Most woody fruit trees have long juvenile vegetative phases, which is the major limitation to early economic benefits for fruit production. Despite the differences in lifespan between annual and perennial plants, similar genetic and epigenetic processes seem to underlie flower transition from the vegetative to reproductive stage.
Persian walnut (Juglans regia L.) is a well-known nut fruit with high nutritive value [19]. It originated in the mountain valleys of Central Asia and is now grown worldwide in countries such as the USA, Iran, and China [20]. This species takes approximately eight to 10 years to bloom for fruit setting after seedlings are planted; the relatively long juvenile phase delays economic income for several years after planting, making early-flowering (EF; short juvenile phase) walnut germplasm resources more desirable for walnut breeders and farmers. Compared with other plants, the mechanism of floral transition in walnut remains less studied. Using RNA sequencing (RNA-Seq) analyses, recent studies have proposed several candidate genes associated with the juvenile-adult phase change in walnut trees [21,22], but the critical genes regulating the precocious characteristic of EF walnut germplasm are still unknown. A high-quality chromosome-scale reference genome was recently released for the walnut [23]. The combination of RNA-Seq data with this reference genome will provide more informative profiles for understanding the floral transition.
In the present study, we created hybrid offspring between the EF cultivar 'Xinwen 81' and the late-flowering (LF; or long juvenile phase) type, the Xinjiang local walnut. RNA-Seq and microRNA-Seq (miRNA-Seq) analyses were used to identify differentially expressed genes (DEGs) and differentially expressed miRNAs associated with the floral transition in walnut varieties. Together, these results not only lay the foundation for further studies on juvenile-adult phase change regulation in walnut varieties but, ultimately, may contribute to the development of walnut breeding programs for the selection of EF germplasm.

Plant Materials and Sample Collection
In this study, all walnut materials were planted and collected in a field in Alar city, Xinjiang province, China (40°32′27.74239″ N, 81°18′3.40166″ E, altitude 957.37 m). Briefly, all walnut materials used in this study were obtained following the procedure. Firstly, we hybridized a bred EF variety ('Xinwen 81') and a local LF walnut variety by manual pollination to obtain the F1 offspring in 2013. Then, seeds from the F1 offspring were harvested and sown. We observed whether the F1 seedling plants were able to complete the floral transition with visible floral buds in 2014 and 2015. The F1 seedling walnuts that flowered for one or two years were marked as EF offspring, while LF F1 seedling walnuts were marked as those that flowered beyond 2 years. For RNA-Seq and miRNA-Seq analyses, EF and LF bud samples were separately mixed with three repeats from the F1 hybrid population. The mixed samples of EF and LF walnut offspring buds were harvested, flashfrozen in liquid nitrogen, and then stored at −80 °C for further analyses.

RNA Extraction, Library Preparation, RNA Sequencing, and microRNA Sequencing
Total RNA was extracted from the flower buds of EF and LF walnut plants using the RNA Easy Fast Plant Tissue Kit (TIANGEN, Beijing), and cDNA was generated with the FastKing RT Kit (TIANGEN, Beijing), both according to the manufacturer's instructions. The Agilent Bioanalyzer 2100 was used to measure the RNA quality, and only high-integrity RNA was used for further analyses. For RNA-Seq, the library was prepared using a TruSeq RNA kit (Illumina, San Diego, CA, USA) following the manufacturer's recommendations. For miRNA-Seq, the library was prepared using the MicroRNA Sample Pre Kit as instructed by the manufacturer. High-throughput sequencing was performed using the Illumina HiSeq2500 platform.

RNA-Seq and miRNA-Seq Data Analysis
RNA-Seq libraries were evaluated for quality using FastQC. Low-quality reads, and adapter sequences were filtered using Trim-galore v0.6.2 (https://github.com/FelixKrueger/TrimGalore, accessed on 4.3.2021). Cleaned reads were aligned to the high-quality chromosome-scale assembly of the walnut reference genome [23]  miRNA-Seq data were evaluated for quality, and low-quality reads and adapter sequences were removed. Reads that were 18 to 30 nt long were included in subsequent analyses. Cleaned reads mapped to ribosomal RNA (rRNA), transfer RNA (tRNA), small nucleolar RNA (snoRNA), and small nuclear RNA (snRNA) sequences were filtered using Bowtie2, and then the remaining reads were aligned to the walnut genome using miRDeep2 [25]. TPM values were calculated for each miRNA, and then the miRNA abundance was compared between EF and LF samples using the DEseq2 package in R, as described above.

Functional Enrichment Analysis
DEGs were annotated using Gene Ontology (GO) terms from the agriGO v2.0 database [26], and functional enrichment was determined using Fisher's test and Yekutieli's multi-test adjustment method. GO terms with false detection rate (FDR) values < 0.05 were defined as significantly enriched. The Kyoto Encyclopedia of Genes and Genomes (KEGG) [27] was used for biochemical pathway annotation and enrichment analysis, with significantly enriched terms also being defined as those with FDR < 0.05.

miRNA Target Prediction
miRNA target genes were predicted using TargetFinder [28] with default parameters.

Quantitative Real-Time PCR Validation
Total RNA was extracted and first-strand cDNA was synthesized as described above. Diluted cDNA was used as a template for quantitative real-time (qRT)-PCR, which was performed using a SYBR Premix ExTaq II Kit on an ABI 7500 qPCR instrument (Applied Biosystems, Waltham, MA, USA). The 2 −ΔΔCt method was used to evaluate the relative expression levels of target genes in each sample [5]. All primers used in this study are shown in Table S1.

RNA Sequencing and Transcription Analysis
RNA was extracted from six samples (LF and EF walnut flower buds, each with three biological replicates: EF1-3 and LF1-3, respectively). Illumina sequencing yielded a total of 33.71 million paired-end reads in a total of 42.47 Gb sequencing data. These raw data sequences have been submitted to the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (NCBI) under the accession number GSE193061. After quality checking and cleaning, the remaining reads (those with bases of Q ≥ 20) constituted over 93% of the samples, comprising between 4.63 and 6.55 million reads per sample. The percentage of uniquely mapped reads ranged from 96.61 to 96.78% (Table S2), and only the uniquely mapped reads were further used to calculate the normalized gene expression level as fragments per kilobase of transcript per million mapped reads (FPKM).
To gain insight into transcriptomic differences between EF and LF varieties, we performed a principal component analysis (PCA) ( Figure 1A), correlation analysis ( Figure  1B), protein-coding read annotation ( Figure 1C), expression level analysis ( Figure 1D), and expression level density analysis ( Figure 1E). The PCA yielded two principal components (PCs) that explained 87% and 5% of the variance for all six samples. Using the PCA and correlation analysis, we found that the three biological replicates of EF and LF walnut varieties had very high levels of replicability ( Figure 1A,B). The average number of reads annotated as protein coding genes in the LF walnut samples was higher than in the EF walnut samples ( Figure 1C). To minimize transcriptional noise, we defined expressed genes as those with FPKM ≥ 1. Using this cutoff, the number of expressed genes in the LF walnut samples was lower than in the EF walnut samples ( Figure 1D,E).

Identification and Enrichment Analysis of DEGs between EF and LF Walnut Varieties
A total of 3009 genes were identified as significant DEGs between the EF and LF samples, 933 of which were upregulated (accounting for 31%) and 2076 of which were downregulated (accounting for 69%) (Figure 2A-E). We visualized the locations of these DEGs among all chromosomes using a circos plot (Figure 2A) and directly show gene expression levels with a volcano plot ( Figure 2C) and a heatmap ( Figure 2D). To determine the functions of the identified DEGs, we performed a GO analysis. Significant GO terms were categorized into three main groups: biological processes, cellular components, and molecular functions ( Figure 3A). In the biological process category, 'microtubule cytoskeleton' was the most highly enriched term. In the molecular function category, 'microtubule motor activity' and 'oxidoreductase activity' were the two most highly enriched terms. In the cellular component category, there were five enriched terms, of which 'microtubule-based process' was the most significantly enriched, followed by 'cell cycle process', 'movement of cell or subcellular component', and 'microtubule-based movement'. We further selected the top 20 terms with the smallest FDR values, and most of the terms were found to be highly relevant to the microtubule and cytoskeleton. Four DNA replication-associated terms and two cell cycle-associated terms were shown to be also enriched. These results indicate that the DEGs identified are associated with growth and development, likely reflecting the differing regulation of plant development and flowering time between the EF and LF offspring walnuts. showing the significantly enriched KEGG terms, which were divided into four groups: environmental information processing, human diseases, genetic information processing, and metabolism.
We next used the KEGG biochemical pathway analysis [27] to further investigate DEG functions. All DEGs were mapped to terms in the KEGG database and then filtered with a cutoff of p < 0.05 to identify significantly enriched terms. DEGs annotated as being involved in 'Plant hormone signal transduction', 'DNA replication', 'Anthocyanin biosynthesis', and 'Phenylpropanoid biosynthesis' were the most significantly enriched ( Figure  3B), suggesting that these genes may function in walnut plant development.

miRNA Sequencing and Data Analysis
miRNA was extracted from six samples (EF and LF walnut flower buds, each with three biological replicates) and sequenced, yielding a total of 91.98 Mb of sequenced data points. These raw data sequences have been submitted to the GEO under the accession number GSE193061. Only reads that were 18 to 30 bp in length were used in subsequent analyses ( Figure 4A). Two PCs explained 71.0% of the total sample variance, and the three biological replicates of each variety had very high levels of replicability ( Figure 4B). Clean reads were mapped to ribosomal RNA (rRNA), transfer RNA (tRNA), small nucleolar RNA (snoRNA), and small nuclear RNA (snRNA) sequences ( Figure 4C), and then the remaining reads were aligned to the walnut genome. miRNAs from 18-23 bp had a first nucleotide bias towards U, whereas the bias was for A and G in 24 and 25 bp miRNAs, respectively ( Figure 4D). TPM values were calculated for each miRNA, and a density analysis was also performed ( Figure 5A). The heatmap ( Figure 5B) directly shows the Pearson correlation coefficients between samples and again indicates a high level of repeatability among the biological replicates of each sample, consistent with the PCA results.   Table S3). miRNA expression cluster analysis ( Figure S1) and GO and KEGG function analyses were also performed ( Figure S2). The top 10 most highly enriched GO terms in each category are shown in Figures 6F and Figure S2A, and the most enriched KEGG terms are shown in Figures 6G and Figure S2B.

Identification of Genes and miRNAs Associated with Flowering Time
To demonstrate the regulatory network between miRNAs and mRNAs, miRNA target genes were predicted. We constructed an interaction network between differentially expressed miRNAs and DEGs ( Figure 6A) based on the predicted results, and the miRNAs in the jre-MIR156 family were verified using qRT-PCR ( Figure 6B). We collected the gene list including the flowering genes and homologous genes of flowering genes in Arabidopsis thaliana, visualized it with a heatmap (Figure 6C), and validated gene expression levels by comparing results from qRT-PCR and RNA-Seq ( Figure S3). A GO enrichment analysis was also performed ( Figure S4). In Arabidopsis, the regulation of the flowering time involves a comprehensive network, in which FT [9,29] is a key hub gene that promotes flowering. LFY (floral meristem identity control protein LEAFY) [30] is a transcriptional regulator that plays a regulatory role in flowering. SOC1 (AGAMOUS-like 20) [31] acts downstream of FT and controls flowering. We validated the expression of SOC1 (Jr1DG00265800), LFY (Jr6SG00015000), and FT (Jr4DG00095200) with qRT-PCR, and found that the three genes were more highly expressed in the EF than in the LF walnut variety ( Figure 6D). These results suggest that these candidate genes play vital roles in the regulation and promotion of flowering in the EF walnut variety.

Discussion
In woody horticultural plants (fruit trees), the early-flowering/precocious trait is desirable because it permits early fruit setting and timely harvest. The selection of earlyflowering walnut cultivars may increase or stabilize production, and some walnut genotypes originating from central Asia (including Xinjiang province) display an early flowering phenotype [32]. In vitro propagation of these EF walnut varieties and observations of flower bud formation has confirmed the EF phenotype [33][34][35]. The identification of genetic mechanisms underlying the modulation of flowering time can contribute to walnut breeding goals by promoting fruit production [20] and aids in the selection of lines to study the genetic cues controlling flowering in woody perennials more broadly.
Currently, in walnut varieties, very little is known about the signals that underlie the transition between the juvenile and adult phases. Recent studies have identified flowering-related genes associated with this transition in walnut trees [21,22]. Remarkably, transcriptome profiling (RNA-seq) has revealed thirty-one DEGs associated with this floral transition in J. regia [22]. However, that study performed a RNA-seq analysis that lacked a high-quality walnut reference genome. Thanks to technology advances, a walnut reference genome has been released with the new version [23], which allowed us to easily and conveniently identify profiles of DEGs and differentially expressed miRNAs associated with the floral transition. For example, with a high-quality walnut reference genome, it was possible to mark all DEGs and differentially expressed miRNAs as special location IDs in our study (Table S2 and Table S3). Moreover, our study performed RNA-seq and microRNA-seq followed by a bulk segregant analysis (BSA) of the F1 offspring population obtained from a bred EF walnut 'Xingwen81' and a local LF walnut cultivar [22].
Recently, epigenetic factors have been a major driver for understanding the mechanisms associated with floral transition [1]. For example, various histone modifications have been revealed as being associated with the regulation of the flowering time in model plants [5,36,37]. In woody plants, it has been shown that the complex regulatory networks associated with floral transition are controlled by epigenetic mechanisms [38,39], implying that specific genes and epigenetic mechanisms have essential roles in Xinjiang precocious walnuts. Thus, future research will focus on identifying the functional impacts of these DEGs and miRNAs and investigating the chromatin states of the candidate genes.
In conclusion, we profiled differentially expressed genes and miRNAs associated with the floral transition in the Xinjiang EF walnut. Using RNA-Seq and validating the data with qRT-PCR analyses, we found that FT, SOC1, and LFY are highly expressed in the EF walnut variety. We additionally identified a set of miRNAs associated with the floral transition. Together, these results not only lay the foundation for further studies into the regulation of the juvenile-adult phase change but could also contribute to the development of walnut breeding programs for selecting EF germplasm.

Supplementary Materials:
The following are available online at www.mdpi.com/article/10.3390/horticulturae8020136/s1. Figure S1: miRNA expression trend analysis. Figure S2: Bubble diagram of the miRNA enrichment analysis using GO (A) and KEGG (B). Figure S3: Validation of candidate genes with qRT-PCR. Figure S4: GO enrichment analysis of candidate genes. Table S1: List of primers used in this study. Table S2: DEGs between the EF and LF walnut varieties by RNA-seq. Table S3: Differentially expressed miRNAs between the EF and LF walnut varieties by mi-croRNA-seq.