Transcriptome Analysis of Genes Involved in Fatty Acid and Lipid Biosynthesis in Developing Walnut (Juglans regia L.) Seed Kernels from Qinghai Plateau

Walnut (Juglans regia) is an important woody oil-bearing plant with high nutritional value. For better understanding of the underlying molecular mechanisms of its oil accumulation in the Qinghai Plateau, in this study we monitored walnut fruit development, and 15 cDNA libraries were constructed from walnut seed kernels collected at 72, 79, 93, 118 and 135 days after flowering (DAF). The candidate genes were identified using sequencing and expression analysis. The results showed that the oil content in the kernels increased dramatically in late July and reached the maximum value of 69% in mature seed. More than 90% of the oils were unsaturated fatty acids (UFAs) and linoleic acid (18:2) was the predominant UFA accumulated in mature seed. Differentially expressed genes (DEGs) in 15 KEGG pathways of lipid metabolism were detected. We identified 119 DEGs related to FA de novo biosynthesis (38 DEGs), FA elongation and desaturation (39 DEGs), triacylglycerol (TAG) assembly (24 DEGs), oil bodies (12 DEGs), and transcription factors (TFs, 6 DEGs). The abundantly expressed oleosins, caleosins and steroleosins may be important for timely energy reserve in oil bodies. Weighted gene coexpression network analysis (WGCNA) showed that AP2/ERF and bHLH were the key TFs, and were co-expressed with ACC1, α-CT, BCCP, MAT, KASII, LACS, FATA, and PDCT. Our transcriptome data will enrich public databases and provide new insights into functional genes related to the seed kernel lipid metabolism and oil accumulation in J. regia.


Introduction
Walnut (Juglans regia L.), as an important woody oil plant, is widely cultivated in many regions around the world. Walnut is rich in several beneficial compounds, e.g., unsaturated fatty acids (UFAs), proteins, minerals, and tocopherols [1], and therefore it has gained tremendous interest due to its nutritional and medicinal benefits [2]. Beneficial effects of walnut consumption have been reported, including anti-atherogenic, anti-inflammatory, antimutagenic, and anticancer effects [3], and protection from diabetes [4,5] or cardiovascular diseases [6]. A mature walnut kernel contains a high oil content, which can vary from 52% to 72% [7][8][9]. The application of vegetable oils is determined by FA composition, which is one of the most important characteristics of vegetable oil quality and which determines the suitability of a vegetable oil for nutrition [2].
The vertical diameter of walnut fruit slowly increased to 50.73 mm and then decreased to 50.33 mm in the last period; the fruit transverse diameter slowly increased to 45.69 mm and then decreased to 45.52 mm in the last period ( Figure 1C, Table S1).
The total oil content of the walnut kernel samples from 10 time points was calculated ( Figure 1D, Table S1). The total oil content continuously increased from 20.20% to 69.17% from 12 July to 11 September, and then slightly decreased to 68.43% on 20 September. The total oil content sharply increased from 19 July (24.10%) to 5 August (51.25%), and then showed a slowly rising trend. The maximum oil increase in the walnut kernels appeared Single fresh fruit weight rapidly increased from 51.983 g to 59.375 g (from 12 July to 9 August), then slowly increased to 61.479 g (3 September), and then slightly decreased from 61.479 g to 59.121 g (from 11 September to 20 September). Single nut weight and single walnut kernel weight showed slow upward trends from 3.63 g to 12.11 g, and 0.72 g to 5.43 g, respectively ( Figure 1B, Table S1).
The vertical diameter of walnut fruit slowly increased to 50.73 mm and then decreased to 50.33 mm in the last period; the fruit transverse diameter slowly increased to 45.69 mm and then decreased to 45.52 mm in the last period ( Figure 1C, Table S1).
The total oil content of the walnut kernel samples from 10 time points was calculated ( Figure 1D, Table S1). The total oil content continuously increased from 20.20% to 69.17% from 12 July to 11 September, and then slightly decreased to 68.43% on 20 September. The total oil content sharply increased from 19 July (24.10%) to 5 August (51.25%), and then showed a slowly rising trend. The maximum oil increase in the walnut kernels appeared between 79 days after flowering (DAF, G2) and 89 DAF ( Figure 1D). The mature walnut seed oil content was more than 69% of the total composition of the walnut seed. According to the embryo morphology development and the oil content, seed kernels at the G1-G5 stages were chosen for RNA-seq analysis.
The FA composition and content of walnut oil in the mature walnut seed kernels were analyzed (Table 1), and a total of 16 FAs were detected and measured. There were five kinds of main FAs in the mature walnut seeds: 16:0, 18:0, 18:1, 18:2 and 18:3. The total content of these FAs accounted for more than 99% of the total FAs, while the content of other FAs was lower. The two predominant compositions of saturated fatty acids (SFAs) in mature seeds were 16:0 (6.41 %) and 18:0 (2.04 %), which were maintained at low levels, while the main UFAs were 18:2, 18:1 and 18:3. The total content of the three UFAs accounted for more than 90% of the total FAs. PUFAs were the major FAs, accounting for more than 70%. A proportion of 57.67% of the total FAs was 18:2, and this was the primary fatty acid in walnut oil.

Identification of DEGs and Enrichment Analysis during Seed Development
A total of 709,394,820 raw reads were obtained from 15 sequencing libraries and 103.82 G clean bases were obtained after quality control. All of the clean reads were mapped to the walnut genome sequence; the average total map rate was 97.55% and the proportion of the total map rate ranged from 97.09% to 98.06%. The mapping results of each sample are shown in Table S2. The average reads number mapped to the exon region of the genome was 6,441,750,032 and the average proportion mapped to the exon region of the genome was 95.606%. The mapping results of each sample are shown in Table S3. Over 92% of sequences aligned to the reference genome located in the exon regions. The percentage of the sequences located in intergenic regions ranged from 1.35% to 4.27%; a small proportion of sequences were located in intron regions ( Figure S1).
Raw data of all raw reads are available in the NCBI SRA database with the accession number PRJNA781571.
The gene expression levels of all samples from five fruit development stages were qualified and compared, and the DEGs were detected. The clustering heatmaps of all DEGs were analyzed (Figure 2A). The numbers of DEGs identified in samples from different developmental stages can be seen in Figure 2B. It is interesting that the numbers of DEGs increased with the walnut seed development. A total of 4035 DEGs were found in G1 vs. G2, including 1789 upregulated genes and 2246 downregulated genes. A total of 7537 DEGs were found in G3 vs. G1, including 3411 upregulated genes and 4126 downregulated genes; 9279 DEGs were found in G4 vs. G1, including 4219 upregulated genes and 5060 downregulated genes. The most DEGs were found in G5 vs. G1, including 6333 upregulated genes and 7164 downregulated genes. A Venn diagram of DEGs was drawn, and a total of 2168 genes were expressed in all five stages ( Figure 2C). To analyze the gene expression pattern, clustering analysis was performed, and all the genes were clustered into four major patterns ( Figure 2D).
Raw data of all raw reads are available in the NCBI SRA database with the accession number PRJNA781571.
The gene expression levels of all samples from five fruit development stages were qualified and compared, and the DEGs were detected. The clustering heatmaps of all DEGs were analyzed ( Figure 2A). The numbers of DEGs identified in samples from different developmental stages can be seen in Figure 2B. It is interesting that the numbers of DEGs increased with the walnut seed development. A total of 4035 DEGs were found in G1 vs. G2, including 1789 upregulated genes and 2246 downregulated genes. A total of 7537 DEGs were found in G3 vs. G1, including 3411 upregulated genes and 4126 downregulated genes; 9279 DEGs were found in G4 vs. G1, including 4219 upregulated genes and 5060 downregulated genes. The most DEGs were found in G5 vs. G1, including 6333 upregulated genes and 7164 downregulated genes. A Venn diagram of DEGs was drawn, and a total of 2168 genes were expressed in all five stages ( Figure 2C). To analyze the gene expression pattern, clustering analysis was performed, and all the genes were clustered into four major patterns ( Figure 2D).  GO (Gene Ontology) annotation classed all the DEGs. The intracellular organelle parts and organelle parts were the top two terms in G2 and G1, followed by cytoskeletal protein binding, tubulin binding, microtubule binding, microtubule-based process, motor activity, microtubule motor activity, microtubule-based movement, and movement of cell or subcellular component ( Figure S2A). Molecular function regulator and cytoskeletal protein binding were the top two terms in G3 and G1, followed by enzyme regulator activity, tubulin binding, motor activity and microtubule binding ( Figure S2B). Ribonucleoprotein complex, ribosome, carboxylic acid metabolic process, oxoacid metabolic process and organic acid metabolic processes were the top five terms in G4 and G1 ( Figure S2C). Cytoskeletal protein binding, carboxylic acid biosynthetic process, and organic acid biosynthetic process were the top three terms in G5 and G1, followed by tubulin binding, microtubule binding, and pyridoxal phosphate binding ( Figure S2D).  (Figure 3). Carbon metabolism (ath01200) was the top pathway that DEGs enriched, followed by plant hormone signal transduction pathway (Ko04075) in G2 and G1 ( Figure 3A). Plant hormone signal transduction pathway (Ko04075) was the most enriched pathway in G3 and G1, followed by glycolysis/gluconeogenesis (ath00010) ( Figure 3B). Ribosome (ath03010) was the top pathway that DEGs enriched, followed by biosynthesis of amino acid pathway (ath01230) and carbon metabolism in G4 and G1 ( Figure 3C). For G5 and G1, the carbon metabolism pathway and biosynthesis of amino acid pathway were the top two pathways, followed by the plant hormone signal transduction pathway (Ko04075) ( Figure 3D). There are a few pathways involved in FA biosynthesis and lipid metabolism shown in the figure. A total of 15 pathways of these involved DEGs were annotated to lipid metabolism in our work ( Figure S3, Table 2).

Identification and Expression Profiling of DEGs for Fatty Acid Biosynthesis, Elongation, and Desaturation
In this study, we focused on the key genes associated with lipid synthesis and oil accumulation. Based on the gene annotation for the transcriptome of the five developmental walnut seeds, DEGs related to FA biosynthesis, FA elongation, biosynthesis of UFAs, TAG biosynthesis, oil body and transcription factors (TFs) were screened to further investigate their expression patterns during walnut seed development ( Figure 4).
Acetyl-CoA carboxylase (ACCase) catalyzes the first step of the FA biosynthesis pathway in the plastids. Acetyl-CoA conversion to malonyl-CoA is catalyzed by ACCase that consists of four subunits, including biotin carboxylase (BC), carboxyl transferase subunit alpha (α-CT), carboxyl transferase subunit beta (β-CT) and biotin carboxyl carrier protein (BCCP). A total of 13 DEGs encoding ACCase and its subunits, including three ACC1, six BCCP, and four α-CT, were found in this research. Our analysis revealed that ACC1 (109007141), BCCP (108991825, 108988222) and α-CT (109000808) were the dominant ones because their transcriptional levels were maintained at high levels (KPFM > 80) from G3 to G4, and then reduced afterwards at the G5 stage. This may promote rapid increases in oil content during this stage. The transcriptional levels of six BCCP repeats were similar, and were high at the G1-G4 stage and lower at the G5 stage.  Among the four α-CT, three α-CTs (108981788, 109000808 and 109004195) showed a similar pattern which was upregulated continuously from the G1 to G4 stage, but downregulated at the G5 stage. The transcription of the other α-CT (109018146) was maintained at a low level during G1-G4, but was high at the G5 stage. Then, MAT further transfers malonly-CoA to malonly-ACP. Two DEGs was identified as MAT, were highly expressed from G1 to G4, and showed the lowest level at the G5 stage. Subsequently, six continuous condensation reactions are catalyzed by KAS, KAR, HAD and EAR. The 14:0-ACP is transformed to 16:0-ACP, after seven cycles of those reactions, 4:0-ACP is transformed to 18:0-ACP. Then, 16:0-ACP and 18:0-ACP are transformed to 16:0 and 18:0 by FATB and FATA, respectively. We identified 23 DEGs, including two KASIII, four KASII, three KASI, four KAR, two HAD, three EAR, three FATB, and two FATA ( Figure 4A, Table S4). Among these DEGs, one KAR (108987089) and one FATB (109004120) showed similar transcriptional patterns, and their transcriptional levels were the lowest at G1 and higher from G2 to G5. The transcription of the other DEGs mentioned maintained high levels from G1 to G4, but had the lowest level at G5. These expression patterns provided an explanation for the continuous and rapid oil accumulation during early seed developmental stages (G1 to G4).

Identification and Expression Profiling of DEGs for TAG Assembly and Oil Accumulation
TAG assembly takes place in the ER and is synthesized by the acyl-CoA-dependent pathway (Kennedy pathway) and the acyl-CoA-independent pathway. Glycerol-3-phosphate (G3P) and acyl-CoAs are taken as primary substrates [32]. Glycerol-3-phosphate acyltransferase (GPAT), lysophospholipid acyltransferases (LPLATs), phosphatidic acid phosphatase (PAP), phospholipase A2 (PLA2), DGAT, LPCAT, PDAT and phosphatidylcholine: diacylglycerol cholinephosphotransferase (PDCT) are involved in TAG assembly. Seven, six, one, one, three, four and two DEGs were identified as GPAT, LPLAT, PAP, PLA2, DGAT, PDAT and PDCT, respectively. Among the seven GPATs, GPAT3 (109006642) exhibited higher expression levels than the other GPATs during seed development, with FPKM values ranging from 19.15 to 39.04, reaching a peak at G3 and then significantly decreasing. The expression levels of GPATs (108982438, 109020163 and 108981658) decreased significantly after attaining a peak at G4. One PAP and one PLA2 were highly expressed at G3, with FPKM values of 12.11 and 132.92, respectively, and which then decreased. LPLAT superfamily members are acyltransferases of de novo and remodeling pathways of glycerophospholipid biosynthesis. The incorporation of an acyl group from either acyl-CoAs or acylACPs into acceptors such as glycerol 3-phosphate, dihydroxyacetone phosphate was catalyzed by the proteins mentioned above. LPLATs such as LPCAT-1, lysophosphatidylethanolamine acyltransferase (LPEAT, also known as MBOAT2) are included in this superfamily [33][34][35]. The FPKM (fragments per kilobase of transcript sequence per millions base pairs sequenced) values of six LPLATs ranged from 0.73 to 124.65, of which the FPKM of LPLAT (108984432) was higher (24.82 to 124.65), while the remaining DEGs expression levels were relatively low. LPLATs (108984432, 108997125, 109011026 and 108988977) were highly expressed at G1, and the others (108995554, 108991699) were highly expressed at G5 ( Figure 4D, Table S4).
The expression levels of PLA2s were high during all the developmental stages, with FPKM values from 65.72 to 132.92. The expression levels of PDCT (108998331) were maintained at a stable, high level (78.49~89.29) at G1-G4, and then significantly declined to a low level (FPKM < 9). Another PDCT was less expressed during the seed development, with FPKM values from 0.16 to 2.09. DGAT and PDAT are essential enzymes for TAG biosynthesis. In walnuts, among the three DGAT1s, the expression level of DGAT1 (109011752) was significantly higher than that of other members during seed development (FPKM > 27), while the transcription of one DGAT1 (109009971) was maintained at a low level all along (FPKM < 1). The expression levels of the three DGAT1s were most abundant at G5. The two PDATs (109000668, 109008819) exhibited higher expression levels than others during the seed development (FPKM > 12), and one PDAT (109000668) was upregulated continuously at G1-G4, then downregulated at G5. These results suggested that TAG assembly occurred during the entire seed development (G1-G5) under the synergistic effect of these genes ( Figure 4D, Table S4).
Seed oils are mainly stored in the form of TAG, which is stored in oil bodies (OBs). Oleosins (OLEs), caleosins (CLOs) and steroleosins (STEs) are associated with seed OBs. Twelve DEGs (four CLO, five OLE and three STE) were identified in this work. Throughout development, the five gene homologs encoding OLE were highly expressed in the seed kernel (595 < FKPM < 7678) ( Figure 4E, Table S4), and had the highest expression levels at G3 or G4. In addition, the OLE-encoding genes were more strongly upregulated than the CLOor STE-encoding genes from G1 to G4. Among the four CLOs, CLO (10893007) exhibited higher expression levels than other CLOs (FPKM > 573), reached a peak at G4, and then declined at G5. The expression level of STE (108984079) was significantly higher than the other STEs and was expressed to the highest level at G4. TFs are important activators which modulate gene expression at the transcriptional level. Besides the lipid-related genes, six crucial TFs were differentially transcribed during the seed development, including two B3 domain-containing transcription factor ABI3 (ABI3), one B3 domain-containing transcription factor FUS3 (FUS3), and three ethylene-responsive transcription factor WRI1 (WRI1). The two ABI3s were highly expressed at G1-G5. The differentially transcribed WRI1 (109021237) maintained a relatively low level at G1-G5 (FKPM < 1). FUS3 and WRI1 (108983551, 109010003) were highly expressed at G1-G4, but declined sharply at the following stage.

Identification and Expression Profiling of DEGs of HSPs and HSFs
Because of its unique geographical location, in the Qinghai Plateau there are large temperature differences between day and night ( Figure S4). Temperatures are typically lowest in the morning and reach a maximum in the afternoon. There may be a protective mechanism during seed kernel development to eliminate the impacts of large temperature differences and protect seed development and the accumulation of oil. Interestingly, differentially expressed HSPs were found in developing seeds, and they were identified as HSP70 and small HSPs (SHSPs). Many HSPs were highly expressed at G1, G2, and G5 ( Figure 4F, Table S4), which may be caused by the large fluctuation of day and night temperature. In addition, 17 DEGs were identified as heat stress TFs or heat shock factors (HSFs). The transcriptional level of most of the HSFs had a relatively low level at G3 ( Figure 4G, Table S4).

Validation of RNA-Seq Results by Quantitative Real-Time PCR (qRT-PCR)
The expression level and temporal transcription patterns of 30 DEGs (Table S5) involved in lipid biosynthesis and metabolism were analyzed to verify the accuracy of our RNA-seq data. Figure 5A shows the expression profiles of the 30 genes related to lipid biosynthesis and metabolism. The expression patterns of the DEGs (CER10, ACC1, LPLAT1, KCS1, PDAT, FATB, HAD, KAR, GPD1, KAS II, LOX6, LPP3 and PAS2) detected by qRT-PCR were consistent with those estimated by RNA-seq ( Figure 5A). Correlation analysis between the RNA-seq and qRT-PCR expression levels of DEGs was conducted ( Figure 5B). The expression levels of the DEGs were positively correlated (p < 0.01) between the RNA-seq and qRT-PCR results, and the Pearson correlation coefficient was 0.6301 ( Figure 5B).

Weighted Gene Co-Expression Network Analysis
Weighted gene co-expression network analysis (WGCNA) was carried out in order to study the relationship between the DEGs and the oil content in walnut seed kernels. The results indicated that 35 modules were identified and marked with different colors ( Figure 6A, Figure S5A). The module-trait correlation relationships were analyzed, which showed that the blue module had the highest correlation with oil content of J. regia ( Figure S5B,C). The most abundant TF families in the blue module were AP2/ERF (36), WD40 (32), MYB (30), bHLH (16), bZIP (16), NAM (14), BTB (13), and WRKY (12) (Table S6).

Discussion
Walnut (J. regia) is an important oil-bearing plant with high nutritional value. Recently, RNA-seq has been shown to be one of the effective methods of revealing the biological mechanism of lipid metabolism and oil accumulation in the oilseed crops. In our study, the oil contents of walnut seed kernels at different developmental stages growing at high altitude were analyzed. DEGs and TFs related to lipid biosynthesis and metabolism were screened and obtained through transcriptomics analysis. Gene co-expression analysis indicated that the TFs AP2/ERF (109006724) and bHLH (108987327) were the hub genes of the blue module ( Figure 6B). Gene expression analysis showed that the expression trends of the two TFs were similar with those of the genes related to oil biosynthesis, that is, were highly expressed at G4-G5 ( Figure S5D). Moreover, AP2/ERF (109006724) and bHLH (108987327) were co-expressed with genes related to oil biosynthesis, such as ACC1 (109007141) (Figure 6C).

Discussion
Walnut (J. regia) is an important oil-bearing plant with high nutritional value. Recently, RNA-seq has been shown to be one of the effective methods of revealing the biological mechanism of lipid metabolism and oil accumulation in the oilseed crops. In our study, the oil contents of walnut seed kernels at different developmental stages growing at high altitude were analyzed. DEGs and TFs related to lipid biosynthesis and metabolism were screened and obtained through transcriptomics analysis.
In this study, the walnuts flowered in early May in the Qinghai Plateau, and the fruit was harvested in mid-September. The time from full blossom to fruit maturity was about 126 days. The walnut fruit growth period in the Qinghai Plateau in this research is shorter than that in Xinjiang, which was about 150 days in a site 1394 m above sea level [36]. The plant growth period is shorter with the increase in altitude. The sampling site of this study is at an elevation of 2102 m, and there is also a large temperature difference between day and night ( Figure S4). Therefore, we believe that the plateau environment led to the shorter growth period in this study. The fruit development, single fruit weight, nut weight, kernel weight, and fruit vertical and transverse diameters showed increasing trends in general ( Figure 1B,C). The single fruit weight and the fruit vertical and transverse diameters slightly decreased after reaching maximum values in the late developmental phase (G5), due to the moisture content decreasing during fruit development, accompanied by the decrease in single fruit weight and the fruit vertical and transverse diameters caused by dehydration and shrinkage of seed and pericarp.
In the process of walnut fruit development, the oil content showed a trend of slowfast-slow. The oil accumulated slowly in the early stages of fruit development, then accumulated rapidly at 70 to 110 DAF. Thereafter, the oil accumulated slowly until fruit ripening [37]. The oil bodies were first observed at 60 DAF in the embryo, and then the number of oil bodies gradually increased until the fruit ripening [36]. In the present study, the oil content increased gradually during fruit development and maturation, which was consistent with previous studies [37] on the dynamic accumulation of walnut oil. The mature walnut nut was made up of more than 69% oil. The FA composition of the mature walnut oil was further investigated, and we found more than 90% were UFAs and more than 70% were PUFAs ( Table 1). The contents of the two predominant compositions of SFAs were maintained at low levels. These results showed that the walnut kernel contained more PUFAs and fewer MUFAs, which was similar to previous descriptions of walnuts cultivated in other regions in China and in other countries (Italy [16], Turkey [38], New Zealand [11]). Therefore, walnuts from the Qinghai Plateau are good edible oil sources with high amounts of PUFAs. However, Geng et al. [39] reported that MUFA content was the highest in walnuts from Yunnan. Previous studies showed that FA composition and content are affected by growth environments [40,41]. It has been found that the content of 18:3 in soybean was the most vulnerable to environmental changes [41]. Temperature is one of the essential environmental factors affecting plant metabolism. Therefore, we speculate that the temperature of the sampling site affected the UFA content. In this study, HSPs were highly expressed at G1, G2, and G5, and this may be caused by the large fluctuation of day and night temperature in Qinghai. How walnut plants adapt to large diurnal temperature fluctuations to protect seed development and oil accumulation needs more research in the future.
In this study, 15 cDNA libraries for transcriptome sequencing of walnut seed kernels at five developmental stages were constructed, more than 97% of clean reads in each library mapped to the walnut reference genome. We obtained four sets of DEGs between the different seed development stages. There were more downregulated genes than upregulated genes in lipid synthesis pathways. We focused on FA biosynthesis and oil accumulation in the developing walnut seed kernels, which involves FA biosynthesis, TAG assembly and lipid storage.
The 18C UFAs are important constituents of vegetable oils (e.g., camellia, hickory, walnut), and the regulatory mechanisms are different among plants. With dehydrogenation by SAD, 18:0-ACP is transformed to 18:1-ACP in the plastid, and then 18:1-ACP is transformed to 18:2-ACP and 18:3-ACP in the plastid by FAD6 and FAD7/8, respectively. In another pathway, 18:1-ACP is transformed to 18:1-PC by a series of enzymes, which is desaturated by FAD2 and FAD3 to form 18:2-PC and 18:3-PC in the ER. The results of studies on mutation, overexpression, or heterotopic expression of these genes showed that they have an important influence on oil accumulation or FA compositions. The 18:1 FA is dominant in Camellia oleifera and Carya cathayensis Sarg. [42,43], and the perfect coordination of high SAD levels with low FAD2 levels enhanced the accumulation of 18:1. High 18:2 accumulation in the seeds of Artemisia sphaerocephala and Gossypium hirsutum L. is due to the high expression of FAD2 and the low expression of FAD3 [44,45]. Previous studies indicated that the expression of FAD3 and FAD7/8 are variable in plants with high 18:3 content. The upregulation of FAD3 was associated with 18:3 accumulation in seeds of Paeonia ostii and Linum usitatissimum [46,47]. In addition, the accumulation of 18:3 was associated with the expression of FAD2 and FAD3 in Perilla frutescens seeds [48].
It has been reported that J. regia L is rich in PUFAs, which may be due to high expression levels of FAD2 and FAD3 [29]. The results of subcellular localization confirmed that JrFAD3 plays a part in the ER rather than the plastid [29]. However, what causes high 18:2 and low 18:3 is still unknown.
In this study, UFAs accounted for more than 90% of the total FAs, and more than 70% were PUFAs (Table 1). Four, two, three, one and two DEGs were identified as SAD, FAD2, FAD3, FAD6 and FAD7/8, respectively. The transcriptional levels of SAD, FAD2 and FAD3 were high, while FAD6 and FAD7/8 were less expressed in developing walnut seeds, which was consistent with previous results [29]. In our work, the expression levels of SAD, FAD2, and FAD3 were high from G1 to G4 ( Figure 4C), and reached their peak at G4. This trend is consistent with oil accumulation, where the oil continuously and rapidly increased at G1-G4. However, the expression levels of FAD7/8 (108994930) were less than one during walnut seed development ( Figure 4C, Table S4). Based on this, we speculated that 18:2 and 18:3 in walnut seeds were mainly produced by the highly expressed FAD2 and FAD3 in the ER rather than by FAD6 and FAD7/8 in plastids.
DGAT and PDAT are essential enzymes for TAG biosynthesis, and their contributions to TAG assembly varies in different species. Studies show that PDAT exhibits higher expression levels than DGAT in Carthamus tinctorius L., G. hirsutum L., J. regia L., and P. ostia. This indicates that PDAT might play a more important role in the process of TAG biosynthesis [29,44,46,49]. In Torreya grandis kernels, PDAT showed a higher correlation with oil content than DGAT, indicating that PDAT contributed more to the accumulation of oil than DGAT [50]. However, it was found that DGAT was more important for the biosynthesis of TAG in many oilseeds, such as Brassica napus and Paeonia lactiflora [51,52]. Furthermore, PDAT and DGAT simultaneously regulated TAG biosynthesis in A. sphaerocephala [45]. The expression level of PDAT in walnut is much higher than that of DGAT1 and DGAT2, and PDAT is highly expressed at 63−133 DAP [29]. From this study, two PDATs (109000668, 109008819) exhibited higher expression levels than the others during the seed development. One PDAT (109000668) was upregulated continuously at G1-G4 then downregulated at G5. This expression trend of PDAT was similar to that of PDAT reported by Huang et al. [29]. Further, in our work, the expression level of DGAT (109011752) was much higher than that of the other two DGATs (109009388, 109009971) during seed kernel development, and reached a peak at G5. These results indicated that it might be that PDAT and DGAT simultaneously regulate TAG biosynthesis in walnut.
It has been reported that oleosins controlling oil body structure and lipid accumulation are important proteins in seed [53]. In this study, the OLEs were highly expressed from G1 to G4, and were expressed at much higher levels than the CLOs and STEs. The expression trend of OLEs, CLO (10893007) and STE (108984079) were consistent with oil accumulation. Thus, they might play an important role in lipid accumulation, and high expression levels of OLEs may be closely related to high oil content of walnut kernels.
Previous studies have shown that TFs, such as LEC, WRI1, FUS3 and ABI3, play roles in seed oil synthesis and deposition, but this varies from species to species. WRI1 and NF-YB6 were considered to be elite TFs in the regulation of lipid metabolism in G. hirsutum L. [44]. WRI1 and FUS3 may be crucial in the regulation of lipid biosynthesis in the kernel of P. ostii and T. grandis [46,50]. In walnut cultivar "Linzaoxiang", the expression pattern of WRI1 aligned with the accumulation of oil, and WRI1 may play an important role in the synthesis of walnut oil [29]. Wang et al. [54] identified five important TFs (WRI1, ABI3, FUS3, PKL and VAL1) which might highly regulate ACCase, KASII, LACS, FAD3 and LPAAT. In this study, the results of gene co-expression analysis showed that AP2/ERF and bHLH were the key TFs for the walnut oil accumulation during seed kernel development.
The differential activity of one or more enzymes in each step might lead to the variation in the oil content in developing walnut seeds. Thus, it would be extremely important to study the genes related to oil content and FA composition and content of walnut seed varieties, in respect to copy numbers, allelic combination, transcriptional and post-translational regulations [48]. The oil synthesis is a complicated process that involves a series of enzymes and molecules. Some important genes differentially expressed in the developing walnut seed kernels were identified. Additionally, some TFs that might be related to the FA biosynthesis and oil accumulation of J. regia were identified. Nevertheless, the regulating mechanism of oil accumulation in developing seed kernels is still unclear. Further studies on their functions in FA biosynthesis and oil accumulation are required.

Plant Materials
All of the fruits were collected from a local walnut variety (30-year-old walnut trees), which is located in Jianzha county (101 • 57 E, 36 • 01 N, elevation 2102 m, Qinghai province, China). The annual average temperature is 8.3 • C, the annual average precipitation is 350~400 mm, and the annual sunshine hours are around 2500 h.

Sampling and Fruit Growth Analysis
Samples were collected every 7~10 days during the seed development in 2019. The fruits were collected from areas with the same sunlight conditions. Walnut fruits were harvested on 12 July, 19 July, 26 July, 5 August, 9 August, 16 August, 23 August, 3 September,11 September,and 20 September (65,72,79,89,93,100,107,118,126, and 135 DAF, respectively). According to the results of the pilot study, due to the low oil extraction rate at the early stages, the oil content was continuously analyzed from early July to mid-September. According to the embryo morphology development and oil content, the seed kernels collected at 72, 79, 93, 118 and 135 DAF (designated as G1, G2, G3, G4 and G5, respectively) ( Figure 1A) were chosen for RNA-seq analysis. The walnut husk and hull were removed quickly, and the peeled walnut kernel samples for transcriptome sequencing and qRT-PCR were immediately frozen in liquid nitrogen and then taken back to the laboratory, and stored in a refrigerator at −80 • C until use. All of the samples were collected with three biological replicates. At the same time, the vertical and transverse diameter of walnut fruit were measured using vernier calipers. One percent electronic balance was used to determine the weight of fruit, nut, and seed kernel. The mean values of 20 fruits were calculated. Meanwhile, the fruit morphology was recorded using a camera (Canon PowerShot SX50 HS, Oita Prefecture, Japan).

Kernel Oil Content and Fatty Acid Composition Detection
The walnut seed kernels dried to constant weight were ground to a homogenized powder. Then, 1.50 g of the powder was put in a water-free and oil-free filter paper bag used to extract crude fat using a Soxhlet apparatus at 75 • C for 8 h with anhydrous ether (boiling point 34.5 • C) as the extractant. Fatty acid methyl esters (FAMEs) were assayed by gas chromatography with flame ionization detection (GC-FID) and DB-23 chromatographic column (Agilent Technologies, Santa Clara, CA, USA) using the method described by Poggetti et al. [16]. Relative percentage of FAs was calculated according to the peak areas. All analyses were carried out in triplicate. The data were expressed as mean ± standard error (SE).

Total RNA Isolation, Transcriptome Sequencing Library Construction, and the Next Generation Sequencing
The total RNA of the walnut kernel samples was extracted using a Plant RNA Kit (OMEGA Bio-Tek, Norcross, GA, USA) according to the instructions. Three methods were used to detect RNA quality and quantity: agarose gel electrophoresis was used to detect the RNA degradation and contamination, the Nano Photometer ® spectro photometer (IMPLEN, Munich, Germany) was used to check RNA purity, and the Bioanalyzer 2100 system (Agilent Technologies, CA, USA) was used to check RNA integrity. A transcriptome sequencing library was constructed using a NEBNext ® Ultra TM RNA Library Prep Kit for Illumina ® (Illumina, San Diego, CA, USA) following the producer's recommended technology process. The library quality was assessed on the Agilent Bioanalyzer 2100 system. Illumina sequencing was performed at Novogene Bioinformatics Technology Co., Ltd., Beijing, China. The library preparations were sequenced on the Illumina Novaseq 6000 platform and 150 bp paired-end reads were generated.

Analysis of Transcriptome, Quality Control and Clean Reads Mapping to the Reference Genome
All sequencing results were assessed for quality control by removing reads containing adapter, reads containing ploy-N, and low-quality reads from the raw data, and thus we obtained clean data that were used for downstream analyses. All clean data were mapped to the walnut genome reference sequence (GenBank assembly accession: GCF_001411555.1) using Hisat2 v2.0.5 [55].

Quantification of Gene Expression Level and Differential Expression Analysis
The reads numbers mapped to each gene were counted using feature Counts v1.5.0-p3 [56]. FPKM values for each gene were calculated based on the length of the gene and reads count were mapped to this gene, and were used for estimating gene expression levels [57]. Differential expression analysis of five groups (three biological replicates per group) was performed using the DESeq2 R package (1.16.1) [58]. The Benjamini and Hochberg's approach was used to adjust the resulting p-values for controlling the false discovery rate [59]. Genes with an adjusted p-value < 0.05 found by DESeq2 were said to be differentially expressed.

GO and KEGG Enrichment of DEGs
The clusterProfiler R package was used to analyze the DEGs and correct the gene length bias. GO terms with corrected p-values less than 0.05 were considered significantly enriched by DEGs. The clusterProfiler R package was used to test the statistical enrichment of DEGs in KEGG pathways [60].

Quantitative Real-Time PCR and Correlation Analyses
The total RNA was extracted using a TaKaRa MiniBEST Plant RNA Extraction Kit according to the manufacturer's instructions, and cDNA was synthesized from the total RNA (500 ng) using TaKaRa PrimeScript TM RT Master Mix (Takara Biotechnology Co., Ltd., Dalian, China) to reach 10 µL total volume following the instructions (Takara). The qRT-PCR was performed with TaKaRa TB Green ® Premix Ex Taq TM II (Tli RNaseH Plus) (Takara Biotechnology Co., Ltd., Dalian, China) according to the manufacturer's instructions, and was performed on a CFX Connect TM Real-Time System (Bio-Rad Laboratories, CA, USA). As the housekeeping gene, walnut GAPDH was used as the reference gene [29]. The primers for qRT-PCR were designed with Oligo software (Table S5). The relative expression level of the target genes was calculated by 2 −∆∆CT method [61]. Pearson correlation analysis of the expression levels between qRT-PCR and RNA-seq was conducted by GraphPad Prism 6.0., and p ≤ 0.01 was the threshold for statistical significance (**).

Conclusions
In this study, the oil contents of J. regia seed kernels from the Qinghai Plateau in different developmental stages were measured. The results indicated that walnut oils increased dramatically in late July and reached the maximum value of 69% in mature seeds. The 18:2 was the predominant UFA accumulated in mature seeds. The transcriptome of J. regia at five developmental stages was sequenced and annotated using the Illumina RNA-seq technology. Through transcriptomics profiling, four sets of DEGs in the different seed development stages were obtained. Compared to G1, the number of DEGs increased with the seed kernel development. The DEGs related to lipid biosynthesis and metabolism were screened by the DESeq method. We counted the number of DEGs associated with the lipid metabolic and oil accumulation. The key regulatory enzymes were identified, and they may play crucial roles in FA biosynthesis and oil accumulation in walnut seed kernels. FA de novo biosynthesis-related genes, including ACCase, MAT, KAS, KAR, HAD, EAR, FATB and FATA, were highly expressed from G1 to G4 stage. LACS, KCS, KCR, HCD, ECR, SAD, and FADs were related to FA elongation and desaturation. GPAT, LPLAT, PAP, PLA2, DGAT, PDAT, and PDCT were involved in TAG assembly pathway. The synergy of these genes promoted oil synthesis, and highly expressed oleosins, caleosins and steroleosins may be important for timely energy reserve in oil bodies. This transcriptome data will enrich public databases and provide new insights into functional genes related to the seed kernel lipid metabolism and oil accumulation in J. regia. Heat shock protein and heat stress TFs may protect walnuts from damage caused by large temperature differences and provide a guarantee for high lipid content. WGCNA showed that AP2/ERF and bHLH were the key TFs in lipid biosynthesis in walnut seed kernels, and were co-expressed with ACC1, α-CT, BCCP, MAT, KASII, LACS, FATA, and PDCT. Our results will serve as a basis to investigate regulation networks of J. regia to clarify the molecular mechanisms of oil accumulation process and to accelerate the genetic engineering to increase seed oil content and quality. Our results may also provide a useful reference for the molecular breeding of woody oil plants.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/plants11233207/s1, Figure S1: Distribution of reads in different regions of the genome at five developmental stages, Figure S2: Gene ontology classification of the walnut transcriptome data, Figure S3: Bubble diagram of enrichment of KEGG pathway, Figure S4: Temperature variation in the process of walnut seed development, Figure S5: Weighted gene coexpression network analysis of genes and the expression of TF at different walnut seed development stages; Table S1: The morphological and physiological characteristics, oil accumulation in the process of walnut embryo development, Table S2: The mapping result of each sample, Table S3. Distribution of sequencing reads in genome regions, Table S4: Detailed information regarding gene annotation and FPKM values of genes in developing walnut seeds, Table S5: The designed primers of the key enzymes involved in lipid biosynthesis and metabolism for qRT-PCR, Table S6