1. Introduction
Carya illinoinensis (Wangenh.) K. Koch, commonly known as the American pecan or long pecan, belongs to the genus Carya Nutt. in the family Juglandaceae. Its dried fruit is called Bigen fruit, also known as the longevity fruit, and it is currently one of the best-known dried-fruit oil species worldwide [1]. North American Indians have eaten pecans for centuries, and pecan is the only commercially important nut species native to North America [2]. With a production of 122,500 tons, pecan was the sixth-largest tree nut crop in the world in 2018 [3]. The fruit can be sold whole, in the shell or shelled, or as flakes or crushed nuts, and the kernel is commonly used to make desserts, sweets, ice cream, and breakfast cereals [4]. Pecan nuts are rich in unsaturated fatty acids, and eating the nuts with their skins also supplies dietary cellulose [5]. The high content and large proportion of phospholipids and glycerolipids in mature pecan kernels provide a theoretical basis for the processing and utilization of plant and edible oils. Because mature kernels of various pecan cultivars are rich in triacylglycerol (TG), phosphatidylcholine, and other lipids, they have unique potential in food nutrition and health care [6].
Fatty acids (FAs) are aliphatic carboxylic acids composed of carbon, hydrogen, and oxygen. According to whether the hydrocarbon chain is saturated, FAs can be divided into saturated fatty acids (SFAs) and unsaturated fatty acids (UFAs). SFAs have no double bonds in their hydrocarbon chain. According to the number of double bonds in the hydrocarbon chain, UFAs can be further divided into monounsaturated fatty acids (MUFAs) and polyunsaturated fatty acids (PUFAs). The human body can synthesize MUFAs but not PUFAs, which are therefore essential in human physiology [7]. PUFAs are indispensable nutrients for human growth and development, and plant oils and marine organisms are the main dietary sources from which the human body obtains them [8]. In higher plants, lipid synthesis can be divided into three main stages. First, fatty acids are synthesized in the plastids. Then, free fatty acids are assembled into TGs in the endoplasmic reticulum. Finally, TGs are encapsulated and bound by oil droplet proteins to form oil droplets, in which they are stored. Fatty acids are usually found in plant seeds in the form of triacylglycerols (TAGs); in non-seed oil crops such as olive (Canarium album) and palm (Trachycarpus fortunei), fatty acids accumulate in the fleshy peel of the fruit [9]. The leaves and other vegetative tissues of plants may also accumulate a small amount of TAG [10]. Fatty acids are energy sources for the human body. Cells use glucose or free fatty acids for phospholipid and sphingolipid biosynthesis. Phospholipids and sphingolipids play an essential role in cell signal transduction and are major components of the cell membrane [11]. Under low-temperature stress, the cell membrane changes from a liquid phase to a gel phase, which slows metabolism; consequently, cold-sensitive plants suffer injury or death [12]. Additionally, many plant lipids and their metabolic derivatives have biological activities that are closely related to cell recognition, specificity, and tissue immunity [13].
Wang used thin-film drilling–vacuum filtration technology [14], and Geng used surfactant- and salt-aided aqueous extraction technology to extract walnut oil [15]. Jia used comparative transcriptome analysis of female and male pecan inflorescences to improve understanding of the gene specialization of flowers of different sexes [16]. However, the genes related to fatty acid synthesis in pecan kernels remain unknown, and few studies have examined how the fatty acid components of the kernel change during development. In this study, we determined the fatty acid composition of ‘Mahan’ pecan kernels and its changing trends during development, and used RNA-Seq to analyze the kernel transcriptome at 80, 95, 110, and 130 days after anthesis. The results provide an overview of the molecular regulatory network of fatty acid synthesis across ‘Mahan’ kernel development. The differentially expressed genes identified here are candidates for further functional verification and provide potential gene resources for the genetic improvement and breeding of pecan.
2. Materials and Methods
2.1. Plant Material and Treatment
The plant materials were fruits of the ‘Mahan’ pecan collected in Heyue Garden, Baoying County, Yangzhou, Jiangsu Province, China (N 33°02′46″~33°24′55″, E 119°07′43″~119°42′51″). Trees with good growth and relatively consistent vigor were selected and marked. Samples were collected eight times from 50 to 140 days after anthesis, and full, plump fruits without obvious disease or pest damage were selected from each tree. After sampling, the fruit surface was rinsed with water to remove dust; half of the fruits were placed in an ice box and the other half in liquid nitrogen, and all were taken back to the laboratory quickly. The fruits from the ice box were photographed in transverse and longitudinal section and then cracked with a hammer to remove the seeds, which were pooled into mixed samples and stored at −20 °C until testing. The fruits frozen in liquid nitrogen were stored at −80 °C until use. Three biological replicates were set up for each experiment, and five fruits were measured for each biological replicate.
2.2. Measurement of Biochemical Parameters
Pecan oil was extracted using the Soxhlet method. The seeds were first dried in an oven at 105 °C to a constant weight. A filter paper cartridge containing the sample was then placed in the extraction bottle of the Soxhlet extractor, and petroleum ether was added. The extractor was heated at 80–85 °C for 6–8 h, until the liquid in the extraction bottle was colorless and transparent. The round-bottom flask was then removed, and the oil–ether mixture was rotary-evaporated to a constant weight; the light-yellow transparent liquid remaining in the flask was the pecan oil.
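As an illustration of how such an extraction is usually turned into a number, the sketch below assumes that oil content is expressed as the mass of recovered oil relative to the dried sample mass. This definition, the function name, and the replicate masses are assumptions made for the example, not values or formulas taken from this study.

```python
def oil_content_percent(dry_sample_g: float, oil_g: float) -> float:
    """Oil content as a percentage of dry sample mass (assumed definition)."""
    if dry_sample_g <= 0:
        raise ValueError("dry sample mass must be positive")
    if oil_g < 0 or oil_g > dry_sample_g:
        raise ValueError("oil mass must lie between 0 and the dry sample mass")
    return 100.0 * oil_g / dry_sample_g


# Hypothetical (dry kernel mass, recovered oil mass) in grams
# for three biological replicates of one sampling date.
replicates = [(5.00, 3.40), (5.10, 3.52), (4.95, 3.35)]

values = [oil_content_percent(sample, oil) for sample, oil in replicates]
mean_oil = sum(values) / len(values)
print(f"mean oil content: {mean_oil:.2f}%")
```

Averaging over the three biological replicates mirrors the replicate design described in Section 2.1.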
Fatty acid components were identified by gas chromatography–mass spectrometry (GC-MS) on a Trace GC DSQII instrument fitted with an HP-5MS column (30.0 m × 0.25 mm × 0.25 μm). The chromatographic conditions were as follows: the injection port temperature was 250 °C, helium was used as the carrier gas, and the flow rate was 1.0 mL·min−1. The oven temperature was held at 50 °C for 2 min, raised at 4 °C/min to 200 °C and held for 5 min, and finally raised at 5 °C/min to 220 °C and held for 20 min. The mass spectrometry conditions were as follows: electron impact ion source, electron energy of 70 eV, scan range of 30–450 amu, and full-scan mode.
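The oven program can be checked with a few lines of arithmetic. The sketch below assumes the program reads as an initial hold at 50 °C for 2 min, a 4 °C/min ramp to 200 °C with a 5 min hold, and a 5 °C/min ramp to 220 °C with a 20 min hold; the `Step` class and function name are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Step:
    ramp_c_per_min: float  # heating rate used to reach target_c (0 = no ramp)
    target_c: float        # temperature reached at the end of the ramp
    hold_min: float        # hold time at target_c


# Assumed reading of the oven program described in the text.
OVEN_PROGRAM = [
    Step(0.0, 50.0, 2.0),
    Step(4.0, 200.0, 5.0),
    Step(5.0, 220.0, 20.0),
]


def total_run_time_min(program):
    """Sum ramp and hold times over all steps of a temperature program."""
    total = 0.0
    current_c = program[0].target_c
    for step in program:
        if step.ramp_c_per_min > 0:
            total += (step.target_c - current_c) / step.ramp_c_per_min
        total += step.hold_min
        current_c = step.target_c
    return total


print(total_run_time_min(OVEN_PROGRAM))  # 68.5
```

Under this reading, one injection occupies the oven for 68.5 min (2 + 37.5 + 5 + 4 + 20), excluding cool-down.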
Kernel samples collected at 80, 95, 110, and 130 days after flowering and stored at −80 °C were transported on dry ice to Biomarker Technologies Co., Ltd. (Beijing, China) for transcriptome sequencing, with three replicates for each period. SPSS 26, Excel 2016, and Origin 2018 were used for data processing and plotting.
2.3. RNA Extraction, Library Construction, and Sequencing
Total RNA was extracted from ‘Mahan’ pecan kernels at the four developmental stages using the Biomarker Plant Total RNA Isolation Kit (polysaccharides and polyphenolics-rich). The purity, concentration, and integrity of the RNA samples were examined using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Waltham, MA, USA), Qubit 2.0 (Thermo Scientific, Waltham, MA, USA), and Agilent 2100 (Agilent, Santa Clara, CA, USA). Only RNA of adequate quality was used for library construction, which proceeded as follows: (1) mRNA was isolated using oligo(dT)-attached magnetic beads. (2) The mRNA was randomly fragmented in fragmentation buffer. (3) First-strand cDNA was synthesized with the fragmented mRNA as a template and random hexamers as primers, followed by second-strand synthesis with the addition of PCR buffer, dNTPs, RNase H, and DNA polymerase I; the cDNA was purified using AMPure XP beads. (4) The double-stranded cDNA was end-repaired, an adenosine was added to the 3′ end, and adapters were ligated; AMPure XP beads were then used to select fragments in the 300–400 bp size range. (5) The cDNA library was obtained by PCR amplification of the fragments generated in step 4. Qubit 2.0 and Agilent 2100 were used to check the cDNA concentration and insert size to ensure library quality, and Q-PCR was performed to obtain a more accurate library concentration. Libraries with a concentration above 2 nM were accepted. Qualified libraries were pooled according to the pre-designed target data volume and sequenced on the Illumina (San Diego, CA, USA) sequencing platform. After the sequencing data were generated, the bioinformatics analysis pipeline provided by BMKCloud (www.biocloud.net, accessed on 10 March 2023) was used for data analysis.
2.4. Bioinformatics Analysis of RNA-Seq Data
cDNA libraries were sequenced on the Illumina high-throughput platform using sequencing-by-synthesis (SBS) technology, generating a large amount of raw data. Because raw data contain sequences that are not useful for analysis, such as primers and adapters, read quality must be ensured before further analysis. Quality control proceeded as follows: (1) adapter contamination was trimmed, and (2) nucleotides with low quality scores were removed. The data remaining after these steps are referred to as “clean data”.
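The two quality-control steps can be sketched in a few lines. The example below is a simplified illustration, not the pipeline used by the sequencing provider: it assumes exact adapter matching, Phred+33 quality encoding, 3′-end quality trimming, and illustrative thresholds (`min_q`, `min_len`); the adapter string is a commonly used Illumina adapter prefix chosen for the example.

```python
# Assumption: adapter prefix used only for illustration.
ADAPTER = "AGATCGGAAGAGC"


def phred_scores(qual, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def clean_read(seq, qual, adapter=ADAPTER, min_q=20, min_len=20):
    """Return the cleaned (seq, qual) pair, or None if the read is discarded."""
    # Step 1: cut the read at the first exact adapter occurrence.
    idx = seq.find(adapter)
    if idx != -1:
        seq, qual = seq[:idx], qual[:idx]

    # Step 2: trim low-quality bases from the 3' end.
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    seq, qual = seq[:end], qual[:end]

    # Discard reads that became too short to map reliably.
    if len(seq) < min_len:
        return None
    return seq, qual


# Hypothetical read: 24 insert bases, then adapter read-through.
read_seq = "ACGT" * 6 + ADAPTER + "TTTT"
read_qual = "I" * 22 + "#" * 2 + "I" * (len(ADAPTER) + 4)
print(clean_read(read_seq, read_qual))
```

Production tools additionally tolerate mismatches in the adapter and handle paired-end reads, which this sketch deliberately omits.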
HISAT2 [17] is a highly efficient system for mapping RNA-seq reads and a successor to TopHat2/Bowtie2. It searches with an index based on the Burrows–Wheeler transform and the Ferragina–Manzini (FM) index. HISAT2 uses one global graph FM index (GFM) to represent the general population, together with small local indexes and several alignment strategies, to achieve more efficient alignment. StringTie [18] was applied to assemble the mapped reads. Its algorithm is based on optimization theory: it uses a network flow algorithm and an optional de novo assembly step to assemble and quantify transcripts representing the multiple splice variants of each gene locus. Novel transcripts and genes were discovered with StringTie on the basis of the reference genome, in order to improve the genome annotation. The mapped reads were assembled and compared with the original genome annotation, and transcribed regions without annotation were defined as novel transcripts.
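To make the FM-index idea concrete, the following minimal sketch builds a Burrows–Wheeler transform for a short sequence and finds exact matches by backward search. It illustrates the classical linear FM index only; it is not HISAT2's graph index, it omits the local indexes and spliced alignment, and the naive suffix sorting is suitable only for toy inputs. All function names are hypothetical.

```python
def bwt_and_suffix_array(text):
    """Build the suffix array and BWT of text + '$' by naive suffix sorting."""
    text += "$"  # sentinel, lexicographically smaller than A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    return bwt, sa


def fm_search(pattern, bwt, sa):
    """Return sorted start positions of exact matches via backward search."""
    alphabet = sorted(set(bwt))

    # first[c] = number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += bwt.count(c)

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)

    # Narrow the suffix-array interval one pattern character at a time,
    # processing the pattern from its last character to its first.
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


bwt, sa = bwt_and_suffix_array("ACGTACGA")
print(fm_search("ACG", bwt, sa))
```

Each backward-search step costs constant time once the occurrence table exists, which is why the search time depends on the read length rather than the genome length.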
Novel genes were annotated using DIAMOND [19] against the Non-Redundant Protein Sequence Database (NR) [20], Swiss-Prot [21], Clusters of Orthologous Groups of proteins (COG) [22], Clusters of orthologous groups for eukaryotic complete genomes (KOG) [23], and Kyoto Encyclopedia of Genes and Genomes (KEGG) [24] databases; the KEGG orthology of novel genes was obtained from these results. The Gene Ontology (GO) [25] annotation of novel genes was obtained using InterProScan [26], based on the InterPro database. The amino acid sequences of novel genes were searched against the Pfam [27] database using HMMER [28] to obtain annotation information.
The number of fragments from a transcript is affected by the sequencing data volume (or number of mapped reads), the length of the transcript, and the expression level of the transcript. To reflect the expression level of each transcript accurately, the number of mapped reads must be normalized by transcript length and by the total number of mapped reads. Fragments per kilobase of transcript per million fragments mapped (FPKM), calculated using the StringTie maximum flow algorithm, was used to measure the expression level of each gene or transcript [29].
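The normalization behind FPKM can be written out directly. The sketch below uses the standard definition of FPKM (fragments divided by transcript length in kilobases and by total mapped fragments in millions); it does not reproduce StringTie's flow-based estimation of fragment counts, and the input numbers are hypothetical.

```python
def fpkm(fragments, transcript_len_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_len_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions of fragments)
    return fragments * 1e9 / (transcript_len_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2 kb transcript
# in a library of 20 million mapped fragments.
print(fpkm(500, 2000, 20_000_000))  # 12.5
```

Because both transcript length and library size appear in the denominator, a long transcript and a short transcript with the same FPKM are interpreted as being expressed at a similar level within a sample.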
Gene expression is influenced by both external stimuli and the internal environment, and is highly time- and tissue-specific. Genes whose expression differs significantly between conditions, such as treatment vs. control, wild type vs. mutant, or different time points or tissues, are defined as differentially expressed genes (DEGs), and the collection of genes obtained from a differential expression analysis is called a DEG set. Similarly, transcripts with significantly different expression levels are called differentially expressed transcripts (DETs). For experiments with biological replicates, differential expression analysis was performed using DESeq2 [30]. The criteria for DEGs were a fold-change (FC) ≥ 2 and a false discovery rate (FDR) < 0.05. FC is the ratio of gene expression between two samples, and FDR is the adjusted p-value, which is used to measure the significance of the difference.
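The thresholding step can be illustrated independently of the statistical model. The sketch below assumes that p-values are adjusted with the Benjamini–Hochberg procedure and that FC ≥ 2 is applied in both directions (|log2 FC| ≥ 1); both are assumptions made for the example. It does not reproduce DESeq2's negative binomial modelling, and the gene records are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, min_fc=2.0, max_fdr=0.05):
    """Keep genes with |log2 FC| >= log2(min_fc) and FDR < max_fdr."""
    fdrs = benjamini_hochberg([g["pvalue"] for g in genes])
    degs = []
    for gene, fdr in zip(genes, fdrs):
        log2fc = math.log2(gene["fold_change"])
        if abs(log2fc) >= math.log2(min_fc) and fdr < max_fdr:
            direction = "up" if log2fc > 0 else "down"
            degs.append((gene["id"], direction, fdr))
    return degs


# Hypothetical test results for four genes in one pairwise comparison.
genes = [
    {"id": "geneA", "fold_change": 4.0, "pvalue": 0.001},
    {"id": "geneB", "fold_change": 0.25, "pvalue": 0.004},
    {"id": "geneC", "fold_change": 1.5, "pvalue": 0.0005},
    {"id": "geneD", "fold_change": 3.0, "pvalue": 0.20},
]
for gene_id, direction, fdr in select_degs(genes):
    print(gene_id, direction, round(fdr, 4))
```

In this toy set, geneC is excluded by the fold-change threshold despite its small p-value, and geneD is excluded by the FDR threshold despite its large fold change.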
2.5. Validation of RNA-Seq Data by qRT-PCR
Eight genes were selected from the significantly enriched DEGs in the fatty acid anabolic pathways for real-time quantitative PCR (qRT-PCR) analysis. Specific primers were designed using the GenScript online website (https://primer3.ut.ee/, accessed on 5 May 2023), and CiActin was selected as the reference gene following Mo [31]. qRT-PCR was performed using SYBR Green PCR Master Mix (Takara, Japan) on an iQ™5 Multicolor Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA), with melting curve analysis of the reactions. Relative gene expression was calculated using the 2^−ΔΔCT method. Each gene analysis was repeated three times.
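The 2^−ΔΔCT calculation can be sketched as follows. The Ct values, the choice of the earliest sampling date as the calibrator, and the function names are assumptions made for the example and are not data from this study.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of Ct values."""
    return sum(values) / len(values)


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the 2^-ddCt method.

    ct_target, ct_ref: Ct of the target and reference gene in the sample.
    ct_target_cal, ct_ref_cal: the same two Ct values in the calibrator.
    """
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical technical triplicates (Ct values).
target_sample = [24.1, 23.9, 24.0]      # target gene, later sampling date
reference_sample = [18.0, 18.1, 17.9]   # reference gene, same sample
target_calibrator = [26.0, 26.1, 25.9]  # target gene, calibrator sample
reference_calibrator = [18.0, 17.9, 18.1]

fold = relative_expression(
    mean(target_sample),
    mean(reference_sample),
    mean(target_calibrator),
    mean(reference_calibrator),
)
print(f"relative expression: {fold:.2f}")
```

The method assumes that the target and reference amplicons amplify with similar, near-100% efficiency, which is why a stable reference gene is needed.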
4. Conclusions
In this study, we comprehensively analyzed the changes in fatty acid composition during pecan fruit development. At the physiological level, oil accumulation in the ‘Mahan’ kernel followed an ‘M’-shaped curve as the fruit developed, and the fatty acid fractions, from highest to lowest, were oleic acid, linoleic acid, palmitic acid, stearic acid, and linolenic acid. At the molecular level, a total of 83.82 Gb of clean data was obtained by RNA-seq of kernels at 80, 95, 110, and 130 days after flowering; 5376 new genes were discovered, of which 2761 were annotated in at least one database. SAD and FAD2 were significantly upregulated from 80 to 95 and from 95 to 110 days after flowering, and downregulated from 110 to 130 days after flowering. The DEGs were enriched in fatty acid biosynthesis, elongation, and degradation. These results suggest that these genes play an essential role in fatty acid accumulation in pecan. By analyzing changes in gene expression with RNA-Seq, this work sheds light on the mechanism by which pecan kernels synthesize high levels of oil and unsaturated fatty acids. It is expected to provide a theoretical reference for studies of plant oil synthesis and potential gene resources for further research and for the genetic improvement and breeding of pecan. The genes identified were not functionally analyzed in this paper. In future work, the functions of the related genes can be verified by overexpression and gene silencing; yeast one-hybrid and dual-luciferase assays can be used to verify interactions between genes, and yeast two-hybrid assays to verify protein interactions, in order to resolve the regulatory network of pecan fatty acid synthesis.