Integrated Metabolomic and Transcriptomic Analysis to Characterize Cutin Biosynthesis between Low- and High-Cutin Genotypes of Capsicum chinense Jacq

Habanero peppers constantly face biotic and abiotic stresses such as pathogen/pest infections, extreme temperature, drought and UV radiation. In addition, the fruit cutin lipid composition plays an important role in post-harvest water loss rates, which in turn causes shriveling and reduced fruit quality and storage. In this study, we integrated metabolome and transcriptome profiling pertaining to cutin in two habanero genotypes: PI 224448 and PI 257145. The fruits were selected by the waxy or glossy phenotype on their surfaces. Metabolomics analysis showed a significant variation in cutin composition, with about 6-fold higher cutin in PI 257145 than PI 224448. It also revealed that 10,16-dihydroxy hexadecanoic acid is the most abundant monomer in PI 257145. Transcriptomic analysis of high-cutin PI 257145 and low-cutin PI 224448 resulted in the identification of 2703 statistically significant differentially expressed genes, including 1693 genes upregulated and 1010 downregulated in high-cutin PI 257145. Genes and transcription factors such as GDSL lipase, glycerol-3 phosphate acyltransferase 6, long-chain acyltransferase 2, cytochrome P450 86A/77A, SHN1, ANL2 and HDG1 highly contributed to the high cutin content in PI 257145. We predicted a putative cutin biosynthetic pathway for habanero peppers based on deep transcriptome analysis. This is the first study of the transcriptome and metabolome pertaining to cutin in habanero peppers. These analyses improve our knowledge of the molecular mechanisms regulating the accumulation of cutin in habanero pepper fruits. These resources can be built on for developing cultivars with high cutin content that show resistance to biotic and abiotic stresses with superior postharvest appearance.


Introduction
Hot peppers (Capsicum chinense Jacq.), popularly known as habaneros, are domesticated from tropical regions of Central America and have great economic significance in terms of culinary, pharmaceutical and ornamental perspectives. Their fruits are a good source of vitamins, antioxidants and other phytonutrients, including major important alkaloids such as phenolics, carotenoids, flavonoids and capsaicinoids [1,2]. Habanero pepper fruits are subjected to desiccation and postharvest wilting because of their hollow shape and limited water-holding capacity. The abundance of cuticle in the pericarp can resist desiccation, but pepper cuticle development is not yet well understood [3,4]. Cuticle is known to play a critical role in plant survival because its primary physiological function is as a sealant around plant tissues to protect against drought conditions and prevent desiccation by reducing nonstomatal water loss [5][6][7]. The cuticle structure is diverse among different species but is made up of a polyester cutin that is covered with waxes (intracuticular and epicuticular). Cuticle, a hydrophobic polymer synthesized by the epidermal cells, is a major physiological trait acquired by plants during evolution for survival in dehydrated conditions. It also coordinates the interaction between a plant and its environment by limiting UV radiation and mechanical damage and is a defense against pathogen entry. In terms of chemical composition, the cuticle is a polyester matrix of cutin embedded with waxes [8].
Cutin is the major constituent of the plant cuticle and makes up about 80% of the plant cuticle. It is an insoluble, covalently cross-linked polymer that is synthesized by epidermal cells in higher plants [5,6,9]. Cutin is made up of organic chemicals that include glycerol, hydroxylated fatty acids and hydroxylated epoxy compounds with carbon atom chains of lengths 16 and 18 and phenolic compounds [5,[10][11][12][13]. Cutin composition and its genetic basis have been studied in model plants such as Arabidopsis, tomato and rice [14][15][16][17]. In Arabidopsis, several genes including GPAT6, GDSL lipase, LACS, CYP86A, CYP77A, ABCG32 and ABCG11 involved in cutin initiation and development have been identified [7,18,19]. In peppers, Parsons et al. [4] reported that the cuticle of Capsicum annum fruit show variations in composition among species and cultivars [4,7,20]. Additionally, cuticle composition varies across pepper cultivars, which in turn affects the response to biotic and abiotic stresses [21]. However, a better understanding of the molecular basis of this monomer composition is important for using cutin for crop improvement in pepper [4].
Recent progress in "omics" approaches is being utilized for tracking the metabolites and genes involved in cutin biosynthesis, transport and regulation in plant tissues [4,19,22]. Owing to the widely proposed significance of cuticle in plant physiology and metabolism, the metabolite profile of cutin has been explored extensively in the model plant Arabidopsis and other crops such as barley, tomato, rice and maize [15,17,23]. Different aspects of cuticle biosynthesis have been considered in Arabidopsis and tomato fruits [5,12,13], however, there are no studies reported in habanero peppers in terms of whole fruit transcriptome and metabolome to understand cutin accumulation and metabolism. Hence, the current study aimed to understand cutin biosynthesis in habanero peppers by taking advantage of integrated RNA-Seq and metabolome analysis to study cutin biosynthesis in fruit tissues of diverse C. chinense genotypes. Here, we used gas chromatography-mass spectrometry (GC-MS) of two different habanero peppers, PI 224448 from Costa Rica and PI 257145 from Peru, for metabolome analysis to study cutin composition across genotypes. We also performed deep paired-end RNA-Seq of the two samples by using the Illumina Nextseq 500 platform to identify differentially expressed genes (DEGs) and pathways associated with cutin and other traits by comparing PI 257145 and PI 224448. This is the first study to generate transcriptome and metabolome data pertaining to cutin in habanero peppers. These results can be used by plant breeders for hot pepper fruit quality improvement via biotechnological modifications and can also serve as a model for the other Solanaceae crops.

Metabolomic Analysis of Cutin Monomers
Raw cutin from two habanero genotypes, PI 224448 and PI 257145, were depolymerized in 3N methanolic hydrochloride (Me-OH-HCl), and cutin composition was analyzed and quantified by using GC-MS. The compositions of cutin monomers identified from the two habanero cultivars are given in Table 1. The cutin monomers from these genotypes mostly consisted of long-chain aliphatic ω-hydroxy acids, especially dihydroxy hexadecanoic acids, considered the most important component of most plant cutin materials, especially in fruits [7,24]. Parsons et al. [4] showed 16-fold differences in cutin monomer amounts between the most extreme accessions studied. Similar to this report, the metabolic analysis of cutin composition between our selected genotypes revealed significant variations in both total cutin monomer content and relative proportion of cutin. PI 257145 had the most abundant cutin content, with about 1284 mg/g dry weight (DW), and PI 224448 had the lowest cutin content, 232.4 mg/g DW. Total cutin composition and relative proportion of individual monomers varied between the two cultivars, with about 6-fold higher cutin content in PI 257145 versus PI 224448. Reports by Kissinger et al. [21] and Parsons et al. [4], showed that 10,16-dihydroxy hexadecanoic acid was the dominant cutin monomer with portions from 50% to 82% total cutin in Capsicum annum. Of note, our study showed a similar pattern, with PI 224448 having the lowest amount of dihydroxy hexadecanoic acid, about 114 mg/g DW (49%), and genotype PI 257145 showing the highest amount, 1060mg/g DW (83%). Among the octadecanoic acids, 9,10,12,13,18-pentahydroxy octadecanoic acid was dominant, with PI 257145 showing the highest amount, 35.3 mg/g DW. This compound was detected only in fruits, which suggests that they might play a major role in cutin composition of plants. Levels of p-coumaric acid, a phenolic compound, also showed significant variations between the two pepper genotypes. These variations between the samples provided a good background to investigate the cutin biosynthesis mechanisms by examining variations in expression of the some of the key players in this pathway. Data are mean ± standard deviation mg/g dry weight from three independent biological replications.

Fruit Transcriptome Sequencing and Analysis
Total RNA was isolated from the green fruit tissues from the two habanero pepper genotypes, PI 257145 (high cutin) and PI 224448 (low cutin). An RNA-Seq library was prepared for each genotype separately by using total RNA pooled from three biological replicates. The library was subjected to paired-end sequencing (2 × 75 bp) with Illumina NextSeq 500 platform (Illumina, USA). A total of 22,550,145 and 24,056,689 reads were generated for PI 257145 and PI 224448, respectively ( Table 2). The raw RNA sequencing data for the two genotypes were deposited in the Short Read Archive (SRA) database of NCBI with the accession numbers SRX6761116 and SRX6761113 for PI 257145 and PI 224448, respectively, under the bioproject PRJNA562491. The raw reads were subjected to stringent quality filtering, which resulted in 21,411,561 and 19,981,360 high-quality reads for PI 257145 and PI 224448, respectively. The Q30 percentage of reads in each library was ≥95%. The reads from the two genotypes were aligned to the C. chinense reference genome [25] by using the STAR universal RNA-Seq alignment tool with default parameters [26]. A total of 21,021,870 (98.18%) and 19,649,669 (98.34%) quality-filtered reads were mapped to the reference genomes for PI 257145 and PI 224448, respectively; 2% of the reads remained unmapped.

DEGs Between PI 257145 and PI 224448
The individual read count tables across genes for the two genotypes were created by genome alignment with the HTSeq R package and RSEM [27] with RPKM normalization. DEGs were identified by pair-wise combinations by comparing PI 257145 and PI 224448 with the use of NOISeq R/Bioc package [28] with three simulated replicates having a variability of 0.02 and CPM value 1. The DEGs were filtered based on the minimum Log2FC of 1 and p-value 0.9 as per the NOISeq R/Bioc package. A total of 2703 statistically significant DEGs were identified including 1693 upregulated and 1010 downregulated genes in PI 257145 versus PI 224448 ( Figure 1). The top 20 upregulated genes in PI 257145 versus PI 224448 are in Table 3. The top 50 differentially expressed genes between PI 224448 and PI 257145 based on the FPKM-normalized-Log10 transformed counts are in Figure 1.

Functional Annotation and Classification of DEGs
DEGs were annotated by using the BLASTx algorithm and nonredundant protein database at NCBI. Gene annotation and gene ontology (GO) enrichment analysis was performed with BLAST2GO (https://www.blast2go.com/). The DEGs were classified under the three major GO terms such as biological process, molecular function and cellular components. GO classification showed significant functions of the identified DEGs in PI 257145 versus PI 224448. A total of 1071 upregulated genes in PI 257145 were classified under top ten categories of biological process. The significant categories upregulated in PI 257145 included "cell wall biogenesis", "polysaccharide biosynthetic process", "carbohydrate metabolic process", "cell wall biogenesis", "cell wall organization",

Functional Annotation and Classification of DEGs
DEGs were annotated by using the BLASTx algorithm and nonredundant protein database at NCBI. Gene annotation and gene ontology (GO) enrichment analysis was performed with BLAST2GO (https://www.blast2go.com/). The DEGs were classified under the three major GO terms such as biological process, molecular function and cellular components. GO classification showed significant functions of the identified DEGs in PI 257145 versus PI 224448. A total of 1071 upregulated genes in PI 257145 were classified under top ten categories of biological process. The significant categories upregulated in PI 257145 included "cell wall biogenesis", "polysaccharide biosynthetic process", "carbohydrate metabolic process", "cell wall biogenesis", "cell wall organization", "response to hormone" and "external encapsulating structure organization". These categories are important for the structural stability of the fruits. The molecular function category included 941 DEGs with enriched terms of "oxidoreductase activity", "transferase activity" and "transmembrane receptor protein kinase activity". All these enriched molecular functions are important for fruit quality and plant defense. The major cellular components enriched in DEGs upregulated in PI 257145 included "cell periphery", "plasma membrane", "cell wall", "cell-cell junction", "plasmodesma" and "external encapsulating structure". All these cellular component terms enriched in DEGs upregulated in high-cutin PI 257145 are essentially involved in maintenance of cellular structure and fruit quality. The statistically enriched GO terms (false discovery rate (FDR) < 0.05) among the DEGs upregulated in PI 257145 versus PI 224448 are shown in Figure 2. important for the structural stability of the fruits. The molecular function category included 941 DEGs with enriched terms of "oxidoreductase activity", "transferase activity" and "transmembrane receptor protein kinase activity". All these enriched molecular functions are important for fruit quality and plant defense. The major cellular components enriched in DEGs upregulated in PI 257145 included "cell periphery", "plasma membrane", "cell wall", "cell-cell junction", "plasmodesma" and "external encapsulating structure". All these cellular component terms enriched in DEGs upregulated in high-cutin PI 257145 are essentially involved in maintenance of cellular structure and fruit quality. The statistically enriched GO terms (false discovery rate (FDR) < 0.05) among the DEGs upregulated in PI 257145 versus PI 224448 are shown in Figure 2.

Pathway Analysis of DEGs
Pathway analysis of DEGs involved using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database with KOBAS and MapMan. The DEGs upregulated (1693) and downregulated (1010) in PI 257145 versus PI 224448 were assigned to 100 and 97 pathways, respectively. KEGG pathway analysis shown that many of the upregulated genes are enriched in

Pathway Analysis of DEGs
Pathway analysis of DEGs involved using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database with KOBAS and MapMan. The DEGs upregulated (1693) and downregulated (1010) in PI 257145 versus PI 224448 were assigned to 100 and 97 pathways, respectively. KEGG pathway analysis shown that many of the upregulated genes are enriched in pathways relevant to cutin biosynthesis and its regulation [11]. The pathways enriched in upregulated genes of high-cutin PI 257145 were "phenylpropanoid biosynthesis", "plant hormone signal transduction", "oxidative phosphorylation", "biosynthesis of secondary metabolites", "linoleic acid metabolism", "cutin, suberine and wax biosynthesis", "fatty acid biosynthesis", "sesquiterpenoid and triterpenoid biosynthesis", "alpha-linolenic acid metabolism" and "brassinosteroid biosynthesis. The top 20 enriched KEGG pathways among upregulated and downregulated DEGs in PI 257145 compared to PI 224448 are shown in Figure 3. Pathway analysis using MapMan [29] showed differences in the activity of different cellular metabolisms between PI 257145 and PI 224448. Many of the DEGs involved in lipid metabolism and secondary metabolism were highly upregulated in PI 257145 ( Figure 4). Cutin composition and its genetic basis have been studied in model plants such as Arabidopsis, tomato and rice [14][15][16][17]23]. Studies have shown that cuticle composition varies across pepper cultivars and this variation in turn affects their responses to biotic and abiotic stresses [21,30]. The genes involved in management of biotic and abiotic stresses are highly upregulated in the high-cutin PI 257145 versus low-cutin PI 224448. The genes regulating redox state and TFs involved in regulating defense genes were upregulated in the high-cutin PI 257145 ( Figure 5). pathways relevant to cutin biosynthesis and its regulation [11]. The pathways enriched in upregulated genes of high-cutin PI 257145 were "phenylpropanoid biosynthesis", "plant hormone signal transduction", "oxidative phosphorylation", "biosynthesis of secondary metabolites", "linoleic acid metabolism", "cutin, suberine and wax biosynthesis", "fatty acid biosynthesis", "sesquiterpenoid and triterpenoid biosynthesis", "alpha-linolenic acid metabolism" and "brassinosteroid biosynthesis. The top 20 enriched KEGG pathways among upregulated and downregulated DEGs in PI 257145 compared to PI 224448 are shown in Figure 3. Pathway analysis using MapMan [29] showed differences in the activity of different cellular metabolisms between PI 257145 and PI 224448. Many of the DEGs involved in lipid metabolism and secondary metabolism were highly upregulated in PI 257145 ( Figure 4). Cutin composition and its genetic basis have been studied in model plants such as Arabidopsis, tomato and rice [14][15][16][17]23]. Studies have shown that cuticle composition varies across pepper cultivars and this variation in turn affects their responses to biotic and abiotic stresses [21,30]. The genes involved in management of biotic and abiotic stresses are highly upregulated in the high-cutin PI 257145 versus low-cutin PI 224448. The genes regulating redox state and TFs involved in regulating defense genes were upregulated in the high-cutin PI 257145 ( Figure 5).

Functional Network Analysis of DEGs
Ten functional network clusters were obtained, including response to organic substance, signal transduction, multicellular organism development, cell wall biogenesis, lipid metabolic process, cell surface receptor signaling pathway, phenylpropanoid metabolic process, monocaboxylic acid metabolic process, seed oil body biosynthesis and lipid localization ( Figure 6A). Most of the functional network groups were well connected to the enzymes and proteins involved in cutin biosynthesis and its regulation. Figure 6B shows the network representing an interaction between cutin genes and TFs. Bar plots are used to denote the gene expression profiles between the two genotypes and show DEGs between high-and low-cutin habanero pepper based on FPKM.

Functional Network Analysis of DEGs
Ten functional network clusters were obtained, including response to organic substance, signal transduction, multicellular organism development, cell wall biogenesis, lipid metabolic process, cell surface receptor signaling pathway, phenylpropanoid metabolic process, monocaboxylic acid metabolic process, seed oil body biosynthesis and lipid localization ( Figure 6A). Most of the functional network groups were well connected to the enzymes and proteins involved in cutin biosynthesis and its regulation. Figure 6B shows the network representing an interaction between cutin genes and TFs. Bar plots are used to denote the gene expression profiles between the two genotypes and show DEGs between high-and low-cutin habanero pepper based on FPKM. were considered important positive regulators of cutin biosynthesis [11,31,32]. These TFs playing an important role in regulating cutin biosynthesis were highly upregulated in high-cutin genotype PI 257145 versus low-cutin genotype PI 224448.

Functional Network Analysis of DEGs
Ten functional network clusters were obtained, including response to organic substance, signal transduction, multicellular organism development, cell wall biogenesis, lipid metabolic process, cell surface receptor signaling pathway, phenylpropanoid metabolic process, monocaboxylic acid metabolic process, seed oil body biosynthesis and lipid localization ( Figure 6A). Most of the functional network groups were well connected to the enzymes and proteins involved in cutin biosynthesis and its regulation. Figure 6B shows the network representing an interaction between cutin genes and TFs. Bar plots are used to denote the gene expression profiles between the two genotypes and show DEGs between high-and low-cutin habanero pepper based on FPKM.

Genes Involved in Cutin Biosynthesis
The cutin monomers are mostly composed of long-chain aliphatic ω-hydroxy acids, especially dihydroxy hexadecanoic acids, and have been considered the most important component of most plant cutin materials, especially in fruits [7,24]. Cutin is synthesized by epidermal cells in higher plants and is an insoluble, covalently cross-linked polymer consisting of organic chemicals including glycerol, hydroxylated fatty acids and hydroxylated epoxy compounds with carbon atom chains of lengths 16 and 18 [5,12,33]. Their monomers consist of C16 or C18 aliphatic fatty acids, their derivatives and glycerol and phenolic compounds. These monomers are generated from fatty acyl-CoA by a series of hydroxylation and epoxidation reactions that are catalyzed primarily by cytochrome-P450-dependent enzymes [5,34]. Cutin polymers are essential for plant development and are synthesized via the cutin biosynthetic pathway [11]. C16/C18 fatty acid precursors are initially catalyzed by long-chain acyl-CoA synthetase (LACS) genes, and further catalysis by downstream genes yields various monomers along the cutin pathway. Several enzymes for the biosynthesis of cutin polymer have been identified in Arabidopsis, involving cascade of activities from long-chain acyl-CoA synthetase (LACS1/LACS2) to cutin synthase/GDSL lipase [11,35]. Arabidopsis homologs for genes involved in cutin biosynthesis were used to identify the corresponding homologs from C. chinense, and their differential expression between the two genotypes in terms of fold change were calculated (Table 4). The CoAs are esterified to fatty acids by long-chain acyl-CoA synthetase (LACS1 and LACS2) to give acyl-CoA [35,36]. Mutations in LACS2 showed a reduced amount of cutin monomers and slight reduction in amount of cuticular wax in Arabidopsis [37]. LACS2 is important for the biosynthesis of cutin monomer, and in our study the expression of LACS2 was highly correlated with cutin content for the two genotypes. LACS2 was expressed three-fold higher in PI 257145 than PI 224448. We located the expression of three LACS2 genes in habanero genotypes and all three were significantly upregulated in high-cutin PI 257145 versus low-cutin PI 224448. The Log2FC values for the three LACS2 genes were 2.9, 4.2 and 3.5. Cytochrome-P450-dependent enzymes (particularly members of the CYP86A family and CYP77A) catalyze a series of hydroxylation and epoxidation reactions in epidermal cells in plants [5]. In the cutin biosynthetic pathway, CYP86A encodes a ω-hydroxylase that incorporates a hydroxyl group to give 16-hydroxy or 18-hydroxy fatty acids, whereas CYP77A carries a midchain hydroxylase or epoxidase activity for the synthesis of dihydroxy fatty acids [11,15,38]. Both enzyme families were upregulated in PI 257145 with fold-change range from 2.8 to 5.5 (Table 4).
Another enzyme in the pathway encodes the activity of an acyltransferase, glycerol-3-phospahate acyl transferase 6 (GPAT6), which adds the glycerol moieties into cutin. GPAT6 enzymes are involved in the transfer of fatty acids from acyl-CoA to glycerol-3-phosphate [11,24,39]. A gpat6-a mutant showed a striking phenotype in tomato fruit, with greatly altered cuticle thickness, composition and properties [40,41]. GPAT6 gene was expressed four-fold higher in high-cutin PI 257145 than low-cutin PI 224448. The enzyme GDSL esterase or lipase/cutin deficient 1 (CD1) encodes α-hydroxylase that is involved in the polymerization of various acyl-glycerols to give the cutin polymers. Previous reports have clearly demonstrated the role of this enzyme in cutin biosynthesis and showed a marked reduction of cutin content in GDSL lipase mutant tomato genotypes [12,[42][43][44]. GDSL is considered one of the major rate-limiting enzymes for cutin biosynthesis. We have found two genes for GDSL esterase or lipase in C. chinense, and both were highly expressed in high-cutin PI 257145. The FC ranged from 3 to 7 in PI 257145 compared with low-cutin PI 224448. Certain ATP binding cassette (ABC) transporters, the ABCG subfamily (ABCG11 and ABCG32), have also been associated with cutin biosynthesis and are involved in the export of cutin precursors across the plasma membrane in plants [45][46][47][48]. These transporter genes are important for cutin biosynthesis. All are highly expressed in PI 257145 versus PI 224448, which agreed with the cutin content.
The transcriptional regulators in the cutin biosynthesis pathway play major roles in regulating biosynthetic genes. The WIN/SHN TFs were first identified in Arabidopsis, and there are three major SHN genes for cuticle biosynthesis (SHN1, SHN2 and SHN3) [31]. These sets of genes belong to the Arabidopsis APETALA 2 (AP2) family TFs and they regulate cutin and epidermal cells. WIN1/SHN1 is an activator of the promoter region of several cutin genes, and in tomato, SISHN3 has been reported to upregulate multiple genes involved in cutin metabolism, e.g., CYP86A gene of the cytochrome P450 [17,18,33,38,49]. Hence, the SHN1 TF is considered a strong positive regulator of cutin biosynthesis. Of note, SHN1 was expressed at a higher level in PI 257145 than in PI 224448, with 8-fold difference. Another set of TFs, the homeodomain leucine zipper IV (HD-Zip IV) TFs, were identified in Arabidopsis. They are highly expressed in epidermal cells and their functions are epidermis-related. One of these TFs, nuclear factor X-like 2 (NFXL2), has been identified as a negative repressor for all SHN genes, ultimately leading to negative alterations in cutin composition [18]. The expression of NFXL2 in PI 257145 was not significant, which agrees with high cutin content in this genotype. Another member of the class IV homeodomain-leucine-zipper proteins TFs regulating cutin biosynthesis discovered in Arabidopsis was anthocyaninless2 (ANL2). In [50], the leaf cutin composition in the ANL2 mutant was 40% less than in the Arabidopsis wild-type. Supporting this, in our study, ANL2, a positive regulator of cutin biosynthesis was expressed at higher level in PI 257145 than PI 224448, with about 6-fold difference. Overexpression of MYB30 in Arabidopsis was also reported to stimulate the synthesis of long chain fatty acids and cutin [51]. In our study, MYB protein had higher expression in high-cutin PI 257145 than low-cutin PI 224448, which further strengthens its role as a candidate regulatory factor in cutin metabolism. The putative cutin biosynthetic pathway genes predicted for habanero peppers based on RNA-Seq data are shown in Figure 7A; their expression based on FPKM-normalized Log10-transformed counts is shown in Figure 7B. Analysis of all genes for the cutin biosynthesis pathway revealed that all the genes experimentally validated to positively regulate cutin biosynthesis were significantly upregulated in high-cutin PI 257145 versus low-cutin PI 224448. The RNA-Seq based gene expression data and metabolic data showed significant correlation in cutin content and gene expression between the two habanero genotypes, which in turn identified the important genes and TFs contributing to the increased cutin content in PI 257145.
Integrating metabolomic and transcriptomic analysis revealed significant differences in cutin biosynthesis between the habanero genotypes PI 257145 and PI 224448. Metabolomics analysis revealed about 6-fold higher cutin content in PI 257145 versus PI 224448. Transcriptomic analysis revealed several significant DEGs between the high-and low-cutin genotypes. Genes such as GDSL lipase, glycerol-3 phosphate acyltransferase 6, long-chain acyltransferase 2 and cytochrome P450 86A/77A were found to be important for cutin biosynthesis. TFs such as SHN1, ANL2 and HDG1 are found to be the key regulators of the cutin biosynthetic pathway.
Integrating metabolomic and transcriptomic analysis revealed significant differences in cutin biosynthesis between the habanero genotypes PI 257145 and PI 224448. Metabolomics analysis revealed about 6-fold higher cutin content in PI 257145 versus PI 224448. Transcriptomic analysis revealed several significant DEGs between the high-and low-cutin genotypes. Genes such as GDSL lipase, glycerol-3 phosphate acyltransferase 6, long-chain acyltransferase 2 and cytochrome P450 86A/77A were found to be important for cutin biosynthesis. TFs such as SHN1, ANL2 and HDG1 are found to be the key regulators of the cutin biosynthetic pathway.

Collection of Plant Material
Seeds from two different habanero pepper genotypes (PI 224448 from Costa Rica and PI 257145 from Peru) from a worldwide collection of habanero peppers were obtained from USDA GRIN. Ten plants for each line were started in the greenhouse as surface-sterilized seeds in pots. The seeds were sown in moderately wet soil and covered with black paper bags in the dark for about 3 to 4 days to germinate in a temperature-and humidity-controlled incubator. After 4 days of germination in darkness, the pots were removed from the incubator, uncovered and left to grow under controlled conditions in the greenhouse, watered daily and finally transplanted to the Sissonville field plots. The plants were allowed to mature, and the appearance of waxes or glossiness guided our selection for the fruit sample collection. Mature green fruit tissues from each of the genotypes flowered at the same time were collected, frozen in liquid nitrogen and stored at −80 °C.

Cutin Isolation and GC-MS Analysis
Detailed metabolite profiling involved GC-MS. Cutin composition of the fruit tissues of the genotypes were examined with three replications according to the protocol reported by Parsons et al. [4] with slight modifications. Cuticle was isolated from 50 mg frozen fruit tissue powder obtained from lyophilized matured green fruits. Enzymatic digestion of the powdered samples involved using 2% pectinase and 0.1% cellulase in 0.2 mM citrate buffer, 3.7 pH (using 0.001% phenylmercuric nitrate as an antimicrobial agent). An incubator-shaker was set at 35 °C and 100 rpm for several days until the discs had little or no debris on them. Acetone with 50 mg L -1 butylated hydroxytoluene was used to rinse the isolated cuticles three times, followed by refluxing delipidation of the discs in chloroform:methanol (1:1, v/v). Depolymerization in 3N methanolic hydrochloride (Me-OH-HCl) was then performed by using a protocol by [52] with 6.5 mL of 3 N Me-OH-HCl for each depolymerization reaction and left for 16 h at 60 °C. The reaction vials were cooled to room temperature, and 6 mL saturated aqueous NaCl was added to stop the depolymerization reaction. The individual cutin monomers were removed as methyl esters in two different extractions by using distilled dichloromethane [53] Centrifugation at 3000 rpm for 3 min was used to separate the different phases, followed by washing the organic phase with 0.9% aqueous NaCl three times and incubation with 2,2-dimethoxy propane at 60 °C to remove dissolved water in the organic phase and then drying

Collection of Plant Material
Seeds from two different habanero pepper genotypes (PI 224448 from Costa Rica and PI 257145 from Peru) from a worldwide collection of habanero peppers were obtained from USDA GRIN. Ten plants for each line were started in the greenhouse as surface-sterilized seeds in pots. The seeds were sown in moderately wet soil and covered with black paper bags in the dark for about 3 to 4 days to germinate in a temperature-and humidity-controlled incubator. After 4 days of germination in darkness, the pots were removed from the incubator, uncovered and left to grow under controlled conditions in the greenhouse, watered daily and finally transplanted to the Sissonville field plots. The plants were allowed to mature, and the appearance of waxes or glossiness guided our selection for the fruit sample collection. Mature green fruit tissues from each of the genotypes flowered at the same time were collected, frozen in liquid nitrogen and stored at −80 • C.

Cutin Isolation and GC-MS Analysis
Detailed metabolite profiling involved GC-MS. Cutin composition of the fruit tissues of the genotypes were examined with three replications according to the protocol reported by Parsons et al. [4] with slight modifications. Cuticle was isolated from 50 mg frozen fruit tissue powder obtained from lyophilized matured green fruits. Enzymatic digestion of the powdered samples involved using 2% pectinase and 0.1% cellulase in 0.2 mM citrate buffer, 3.7 pH (using 0.001% phenylmercuric nitrate as an antimicrobial agent). An incubator-shaker was set at 35 • C and 100 rpm for several days until the discs had little or no debris on them. Acetone with 50 mg L −1 butylated hydroxytoluene was used to rinse the isolated cuticles three times, followed by refluxing delipidation of the discs in chloroform:methanol (1:1, v/v). Depolymerization in 3N methanolic hydrochloride (Me-OH-HCl) was then performed by using a protocol by [52] with 6.5 mL of 3 N Me-OH-HCl for each depolymerization reaction and left for 16 h at 60 • C. The reaction vials were cooled to room temperature, and 6 mL saturated aqueous NaCl was added to stop the depolymerization reaction. The individual cutin monomers were removed as methyl esters in two different extractions by using distilled dichloromethane [53] Centrifugation at 3000 rpm for 3 min was used to separate the different phases, followed by washing the organic phase with 0.9% aqueous NaCl three times and incubation with 2,2-dimethoxy propane at 60 • C to remove dissolved water in the organic phase and then drying under nitrogen gas. BSTFA was used for derivatization followed by GC-FID analysis as previously described [54]. An Agilent 5975C GC-MS instrument with an HP-5 MS column (30 m, 0.25 mmID, 0.25 µm film) was used, and methyl heptadecanoate and methyl tricosanoate were used as internal standards. Published mass spectra of methyl ester and trimethyl silyl derivatives were used to identify the monomers ( [55]; http://lipidlibrary.aocs.org/). The amount of individual cutin monomers was expressed in milligrams/gram dry weight.

RNA Isolation, Library Preparation and Transcriptome Sequencing
Total RNA was isolated from 100 mg matured green fruit tissues of the two genotypes PI 257145 and PI 224448 with three biological replicates by using the Nucleospin RNA plant kit (Macherey Nagel). Total RNA was treated with DNAseI (Qiagen) to remove co-isolated genomic DNA and purified by using the RNeasy MinElute Cleanup Kit (Qiagen). The Qubit 4 Fluorometer (Invitrogen) and Agilent 2100 Bioanalyzer were used to detect the concentration and integrity of total RNA. Total RNA from three replicates was pooled for each genotype before RNA-Seq library preparation. Libraries for the RNA-Seq of the two habanero genotypes were prepared by using the NEBNext Ultra II RNA Library Prep Kit according to the manufacturer's specification. Taking 1 µg total RNA, mRNA enrichment for poly-A involved using magnetic beads with Oligo (dT) with NEBNext Poly (A) mRNA Magnetic Isolation Module (NEB, E7490) followed by fragmentation into shorter fragments by using fragmentation buffer. Oligo dT primers were used for synthesis of first-strand cDNA. Sequencing adapters were added to the resulting cDNA followed by amplification of the library using sequencing primers. After constructing the RNA-seq library, the Agilent 2100 Bioanalyzer (Invitrogen) was used to analyze the library insert size, and the Qubit 4 Fluorometer (Invitrogen) was used to quantify the library concentration. RNA-Seq for each of the samples involved using the Illumina NextSeq 500 platform with a paired-end sequencing protocol. The resulting image files in the bcl format were converted to FASTQ with 2 × 75 bp reads with the bcl2fastq tool (Illumina).

Transcriptome Analysis
The quality of raw reads was ascertained by checking the adapter, GC distribution, average base content and quality score of the distribution by using fastqc (https://www.bioinformatics.babraham. ac.uk/projects/fastqc/). The adapter sequences and low-quality reads (Phred score QV < 30) were removed and the clean reads were filtered from the raw data by using the software cutadapt (https://cutadapt.readthedocs.io/en/stable/guide.html) and sickle (https://github.com/najoshi/sickle), respectively. The quality-filtered reads were mapped to the C. chinense reference genome v1.2 (http://peppergenome.snu.ac.kr/) by using the STAR universal RNA-Seq alignment tool with default parameters [26] to generate BAM alignment. The read count tables for the genes across all the samples were created by using BAM alignment and the general feature format (GFF) of genome annotation with the HTSeq R package [27] and RSEM (https://deweylab.github.io/RSEM/). The counts were normalized by using reads per kilobase of transcripts per million (RPKM). The gene expression based on the read counts were studied using fragments per kilobase of transcripts per million (FPKM). The FPKM values for each of the genes were calculated based on the read count table, the total number of reads per sample and gene length in kb.
The DEGs resulting from the comparison of PI 257145 and PI 224448 were identified using the NOISeq R/Bioc package [28] with three simulated replicates having variability of 0.02 and counts per million (CPM) of 1. The DEGs were filtered based on the minimum Log2FC of 1 and p-value of 0.9 as per the NOISeq R/Bioc package. Gene annotation, gene ontology (GO) enrichment analysis was performed with BLAST2GO (https://www.blast2go.com/). Transcription factor (TF) prediction, and TF enrichment analysis was involved using the Plant Transcription Factor Database (http://planttfdb.cbi.pku.edu.cn/). Heatmaps were generated by using mev (http://mev.tm4.org/). Gene network analysis involved using Cytoscape (https://cytoscape.org/) and the STRING database (https://string-db.org/) with Arabidopsis as the reference to retrieve protein-protein interactions. Functional networks for DEGs were derived by using the ClueGO plugin [56] available in Cytoscape. Pathway mapping involved using KOBAS [57] and MapMan (https://mapman.gabipd.org/). Sequences of genes involved in cutin biosynthesis were identified by using the Arabidopsis homolog and C. chinense mRNA sequences [11].

RT-Quantitative PCR (RT-qPCR)
Total RNA was isolated from frozen matured green fruit tissues of habanero pepper by using the Plant RNA mini spin kit (Macherey-Nagel). The NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, MA, USA) was used to measure RNA concentrations. The Super Script First-Strand Synthesis system (Invitrogen) was used for first-strand cDNA synthesis with 6 µg total RNA per sample. An amount of 1 µL cDNA diluted 1:6 was used for RT-qPCR analysis. In a final volume of 20 µL, diluted cDNA was mixed with 10 µL SYBR Green PCR master mix (Applied Biosystems, Foster City, CA, USA) and 10 pmol each of forward and reverse primers and completed with nuclease free water. Primer3Plus software (http://www.primer3plus.com/) was used to design gene-specific primers for the randomly selected genes involved in cutin biosynthesis. Details of the genes with primer sequences are available in Supplementary Table S1. Semiquantitative RT-PCR amplification to test primers was performed in a total reaction volume of 20 µL containing 1 µL cDNA, 10 µL colorless GoTaq and 10 pmol each of forward and reverse primers and completed with nuclease free water. Thermocycling conditions were an initial denaturing step of 95 • C for 1 min, followed by 25 cycles of 95 • C for 15 s, corresponding annealing temperature 60 • C for 70 s and 72 • C for 30 s, with a final extension step of 72 • C for 25 min. An amount of 1% agarose gel pre-stained with ethidium bromide was used to confirm the amplified fragments by visualization under UV light. Transcript-level expression was detected by RT-qPCR with SYBR Green PCR Master mix (ROX) (Roche, Shanghai) on a StepOnePlus Real-Time PCR System (Applied Biosystems, Foster City, USA). PCR involved a total reaction volume of 20 µL containing 1 µL cDNA, 1 µL of the forward and reverse primers (10 µM), 10 µL of SYBR Green PCR Master mix (ROX) (Roche, Shanghai, China) and 8 µL sterile distilled water. Amplification conditions were 95 • C for 10 min, followed by 40 cycles of 95 • C for 15 s, and 60 • C for 1 min. The reactions were performed in three technical replications and three biological replicates to compute the average Ct values. The gene expression for each gene was normalized against beta-tubulin expression and data analysis for the relative gene expression was computed with the 2 -∆∆CT method. The results are expressed as Log2foldchange (Log2FC) ± mean standard error (SEM).

Conclusions
Integrating metabolomic and transcriptomic analysis revealed significant differences in cutin biosynthesis between the habanero genotypes PI 257145 and PI 224448. Metabolomics analysis revealed significant variations in cutin composition between the two genotypes, with about 6-fold higher cutin content in PI 257145 versus PI 224448. Cutin monomer 10,16-dihydroxy hexadecanoic acid was present at the highest percentage (82.6%) in PI 257145. Transcriptomic analysis with RNA-Seq revealed significant gene expression differences between the high-and low-cutin genotypes. In this study, we report transcriptome and metabolome data pertaining to cutin in habanero peppers along with the predicted putative cutin biosynthetic pathway for habanero peppers. Genes such as GDSL lipase, glycerol-3 phosphate acyltransferase 6, long-chain acyltransferase 2 and cytochrome P450 86A/77A and TFs such as SHN1, ANL2 and HDG1 are found to be the key genes highly contributing to the high cutin content in PI 257145. These genes previously showed a similar pattern of regulation in tomato and Arabidopsis. These analyses advance our knowledge on the molecular mechanisms regulating the accumulation of cutin in habanero pepper fruits. These resources can be built on for developing habanero fruit cultivars with high cutin content that show resistance to biotic and abiotic stresses.