Longer Duration of Active Oil Biosynthesis during Seed Development Is Crucial for High Oil Yield—Lessons from Genome-Wide In Silico Mining and RNA-Seq Validation in Sesame

Sesame, one of the ancient oil crops, is an important oilseed due to its nutritionally rich seeds with high protein content. Genomic scale information for sesame has become available in the public databases in recent years. The genes and their families involved in oil biosynthesis in sesame are less studied than in other oilseed crops. Therefore, we retrieved a total of 69 genes and their translated amino acid sequences, associated with gene families linked to the oil biosynthetic pathway. Genome-wide in silico mining helped identify key regulatory genes for oil biosynthesis, though the findings require functional validation. Comparing sequences of the SiSAD (stearoyl-acyl carrier protein (ACP)-desaturase) coding genes with known SADs helped identify two SiSAD family members that may be palmitoyl-ACP-specific. Based on homology with lysophosphatidic acid acyltransferase (LPAAT) sequences, an uncharacterized gene has been identified as SiLPAAT1. The key regulatory genes associated with high oil content were also validated using publicly available transcriptome datasets of genotypes contrasting for oil content at different developmental stages. Our study provides evidence that a longer duration of active oil biosynthesis is crucial for high oil accumulation during seed development. This underscores the importance of early onset of oil biosynthesis in developing seeds. Up-regulating the identified key regulatory genes of oil biosynthesis early in seed development should help increase oil yields.


Background
Plants accumulate oil primarily in the form of triacylglycerols (TAG) [1]. Triacylglycerols have nutraceutical value and are the main source of edible oils for households and industrial applications. With the increasing human population, the global consumption for vegetable oils has increased by >50% over the past decade and is expected to double by 2040 [2,3]. The enzymatic reactions for TAG biosynthesis in plants are well established with the pathways for fatty acid (FA) biosynthesis in the plastid followed by TAG formation in the endoplasmic reticulum (ER) (Figure 1). For the final step in TAG biosynthesis, in addition to the reaction catalyzed by diacylglycerol O-acyl transferase (DGAT) (Figure 1), The first committed enzyme is acetyl-CoA carboxylase (ACCase), which acts as a control point over the carbon flux into FAs [8]. In plants, it is present in two functional forms, the heteromeric form in plastids and the homomeric form in the cytosol [11]. Heteromeric ACCase participates in FA synthesis, while the homomeric ACCase is involved in the synthesis of long-chain FAs and secondary metabolites [11]. There are four subunits present in heteromeric ACCase, encoded by accC for biotin carboxylase (BC), accB for biotin carboxyl carrier protein (BCCP), accA for α-carboxyl transferase, and accD for β-carboxyl transferase [12]. Of these four, only accD is chloroplast-encoded, while the others are nuclear encoded [8]. The malonyl-CoA-acyl carrier protein transacylase (MCAT) coding gene, FabD, is one of the key regulators in FA biosynthesis [13]. It aids in the formation of malonyl-ACP, the building block of FA synthesis, from malonyl-CoA [14]. Subsequently, β-ketoacyl-ACP synthase III (KASIII; encoding gene: FabH) catalyzes the condensation and transacylation of acetyl-CoA with malonyl-ACP to form 3-ketobutyryl-ACP. β-ketoacyl-ACP synthase I (KASI; FabB) elongates 3-ketobutyryl-ACP to palmitoyl-ACP in six condensation cycles. 
In the final elongation step, β-ketoacyl-ACP synthase II (KASII, FabF) converts palmitoyl-ACP to stearoyl-ACP in the plastid [15]. The acyl-ACP thioesterases (acyl-ACP TE; FatA, FatB) catalyze the FA chain termination through hydrolysis of the thioester bond of acyl-ACP to release the free FAs [9].
These free FAs are reactivated by a long-chain acyl-CoA synthetase (LACS) to form acyl-CoA esters and are exported to the ER [16]. FA desaturases (FAD) desaturate fatty acyl chains to form unsaturated fatty acids [17]. Plant FADs are of two types, soluble and membrane-bound desaturases. Stromal ∆9 acyl-ACP desaturases (AADs) introduce a double bond in saturated acyl chains to form cis-monoenes. The stearoyl-ACP-desaturase (SAD) is the archetype for this family and desaturates stearoyl-ACP (18:0) at C-9 to form oleic acid (18:1; omega-9). An AAD isoform, palmitoyl-ACP ∆9 desaturase (PAD), exhibits substrate specificity for palmitoyl-ACP, thereby producing palmitoleic acid (16:1; omega-7) [18]. The substrate-specific activity of SAD enzymes determines the ratio of omega-7 and omega-9 FAs in plants [19]. Membrane-bound desaturases are localized to the ER and plastid membranes, and they are grouped into several FAD subfamilies including omega-3-FAD (FAD3, FAD7, and FAD8) and omega-6-FAD (FAD2 and FAD6) [20]. In the ER, the sequential transfer of fatty acyl moieties to a glycerol-3-phosphate backbone generates TAG in four steps, also known as the Kennedy pathway [21]. Glycerol-3-phosphate is acylated by glycerol-3-phosphate acyltransferase (GPAT) to form lysophosphatidic acid (LPA), and LPA is further acylated by lysophosphatidic acid acyltransferase (LPAAT) to produce phosphatidic acid (PA) [1]. Subsequently, PA is hydrolyzed by phosphatidic acid phosphatase (PAP) to generate diacylglycerol (DAG). Plant PAPs are classified into two types: soluble PAPs, called phosphatidate phosphohydrolases (PAH), dephosphorylate PA, whereas membrane-bound PAPs, called lipid phosphate phosphatases (LPP), dephosphorylate different lipid phosphates non-specifically. In the final step, diacylglycerol acyltransferases (DGAT) convert DAG to TAG through the transfer of an acyl group [7] (Figure 1).
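The four Kennedy pathway steps described above can be written down as a small ordered data structure. The following is only an illustrative sketch: the abbreviation G3P for glycerol-3-phosphate and the names KENNEDY_STEPS and trace_pathway are hypothetical and are not part of the study.

```python
# Each step is (enzyme, substrate, product), in the order given in the text.
# "G3P" is a hypothetical shorthand for glycerol-3-phosphate.
KENNEDY_STEPS = [
    ("GPAT", "G3P", "LPA"),   # first acylation
    ("LPAAT", "LPA", "PA"),   # second acylation
    ("PAP", "PA", "DAG"),     # dephosphorylation
    ("DGAT", "DAG", "TAG"),   # final acylation
]


def trace_pathway(start="G3P", steps=KENNEDY_STEPS):
    """Follow the steps from `start` and return the list of intermediates."""
    by_substrate = {substrate: product for _, substrate, product in steps}
    route = [start]
    current = start
    while current in by_substrate:
        current = by_substrate[current]
        route.append(current)
    return route


def enzymes_from(start="G3P", steps=KENNEDY_STEPS):
    """Return the enzymes needed to reach the end product from `start`."""
    route = trace_pathway(start, steps)
    return [enzyme for enzyme, substrate, _ in steps if substrate in route[:-1]]


print(" -> ".join(trace_pathway()))
print(enzymes_from("PA"))
```

Listing the steps as data makes it easy to see which enzyme families (GPAT, LPAAT, PAP, DGAT) lie downstream of a given intermediate.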
Sesame (Sesamum indicum L., 2n = 2x = 26) is an ancient oilseed crop of the Pedaliaceae family. Sesame seeds are nutritionally rich and possess 50-60% oil, 25% protein, and antioxidants such as sesamolin and sesamin [22]. Sesame is a resilient crop with strong adaptation to environmental changes and is considered the 'queen of oil seeds' [23]. Sesame seed production in 2019 was 6.5 million tons, as per FAO statistics (https://www.fao.org/faostat/en/#data; accessed on 28 September 2022). Understanding the genetic basis of oil biosynthesis and developing stable high oil yielding sesame varieties have been key objectives in sesame breeding in recent years [22]. To accelerate crop improvement, omics and molecular tools play key roles through the availability of high-throughput sequencing technologies. Sesame has gained increasing attention, and new genomic information is becoming available for the identification of genes and markers underlying the traits of interest, especially oil yield and productivity, with the help of omics data [24,25]. Here, we mined the existing genomic information for genes (and their translated amino acid sequences) and their families involved in oil biosynthesis and analyzed them with different bioinformatic tools to understand the characteristic features of the nucleotide and translated protein sequences. On a homology basis, our study identified sesame orthologs of genes that were previously unknown, as well as of genes associated with high oil yield that have the potential to enhance oil yield in sesame.

Physicochemical Properties of Retrieved Proteins
The translated amino acid sequences of genes involved in the oil biosynthesis pathway and their paralogs in sesame were mined from the NCBI database (Table S1). A total of seven genes that encode four subunits of heteromeric ACCase enzymes were found, including two for SiaccB, one for SiaccC, three for SiaccA, and one for chloroplastic SiaccD (Table S1). One MCAT, eight KAS, and thirteen FAD genes, including seven SAD, two FAD2, and one each of FAD3, FAD6, FAD7, and FAD8, were retrieved. Six LACSs, seventeen GPATs, six LPAATs, three PAHs, and three DGATs were retrieved in sesame (Table S1). Translated amino acid sequences (Seq_File_S1.docx) were used for the analysis. The analyzed protein sequences of sesame ranged between 271 amino acids (29.06 kDa) for SiaccB-2 and 1004 amino acids (110.47 kDa) for SiPAH2 (Table S2). The predicted aliphatic index of the studied sesame proteins ranged from 72.20 to 121.27. The aliphatic index indicates the thermostability and half-life of a protein [26].

Subcellular Localization
The subcellular localization for the translated AA sequences was predicted in silico using the DeepLoc tool. For the nuclear-encoded subunits of ACCase, SiaccB1, SiaccB2, and SiaccC were localized to the plastid while SiaccA localized in the plastid membrane (Table S2). All the members of SiKASI and SiKASII were plastid localized (Table S2).
The TMHMM tool predicted the existence of transmembrane helix domains in the FAD, GPAT, LPAAT, and DGAT families. Among the membrane-bound FADs (SiFAD2-1, SiFAD2-2, SiFAD3, SiFAD6, SiFAD7, and SiFAD8), the number of predicted transmembrane domains ranged from five in SiFAD2-1 to only one in SiFAD7 (Figure S1). SiGPAT proteins were predicted to have two to four transmembrane domains (Figure S2). The TMHMM tool predicted nine and two transmembrane domains for SiDGAT1-1 and SiDGAT-2, respectively (Figures S1-S3).

Homology Relationship
A tree was constructed based on an amino acid sequence alignment to understand the homology relationships between gene families of oil biosynthesis of sesame and other related oilseed crops (Seq_File_S1.docx). The gene products involved in oil biosynthesis were grouped into 11 distinct clades (Figure 2). All desaturase proteins formed a single clade with four subclades. The first subclade consists of two groups, FAD2 in one and FAD3 and FAD7/8 in the second. The second subclade contains most of the SAD proteins; the remaining six SADs (SiSAD5, SiSAD6, SiSAD7, AtSAD7, AhSAD2, and AhSAD3) are grouped in the third subclade. Lastly, the fourth subclade grouped all FAD6 proteins (Figure 2). KAS and LACS proteins formed a separate clade (Figure 2). All GPAT proteins were grouped into a single clade, except SiGPAT1 and AtATS1. In a clade containing FatA and FatB proteins, α-CT and β-CT subunits of ACCase were grouped in two distinct subclades (Figure 2). The other subunit proteins of ACCase, BCCP and BC, were grouped in a different clade. Among the Kennedy pathway gene products, DGAT3 and PAH were more closely related and fall into one clade. Similarly, DGAT1 and DGAT2 form a distinct clade (Figure 2). The protein identified by us as SiLPAAT-B on the basis of its homology relationship with class-B LPAAT of Ricinus communis (RcLPAAT-B, Figure 2) is annotated in the NCBI protein database as 1-acyl-sn-glycerol-3-phosphate acyltransferase (XP_020554192.1).

Multiple Sequence Alignment
Multiple sequence alignments of BCCP proteins indicated that the C-terminal region was conserved with a typical biotinyl domain (CIIEAMKLMNEIE) (Figure S4A), while sequence alignments of BC (accC) proteins indicated their highly conserved nature (Figure S4B). The N-terminal domain of α-CT proteins showed more conserved sites than the C-terminal one, whereas in the β-CT proteins, the C-terminal domain was more conserved (Figure S5).
The amino acid residues conserved in the active sites of FabD proteins are present in SiFabD as Gln112, Ser197, Arg222, His310, and Gln359, corresponding to Gln89, Ser174, Arg199, His287, and Gln336 in AhFabD [13,28] (Figure S6). An active site triad Cys-His-Asn and the motif GNTSAAS were found to be conserved in SiKASIII proteins (Figure S7). The seven catalytic site residues (Cys, His, Thr, Thr, Lys, His, and Phe) and cation site residues [29] are highly conserved in the KASI and KASII proteins (Figure S8). The presence of two histidine boxes, EENRHG and DEKRHE, in the SiSAD proteins is consistent with plastidial stearoyl-ACP desaturases (Figure S9; Table 1).
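Short conserved sequences such as the two histidine boxes and the GNTSAAS motif can be located in a protein sequence with a simple exact-match scan. The sketch below is illustrative only and is not the alignment-based procedure used in this study: the function name find_motifs and the toy sequence are hypothetical, and only the motif strings themselves are taken from the text.

```python
import re

# Motif strings quoted in the text; everything else here is illustrative.
HISTIDINE_BOXES = ("EENRHG", "DEKRHE")


def find_motifs(sequence, motifs):
    """Return {motif: [1-based start positions]} for exact matches.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    sequence = sequence.upper()
    hits = {}
    for motif in motifs:
        pattern = "(?=" + re.escape(motif.upper()) + ")"
        hits[motif] = [m.start() + 1 for m in re.finditer(pattern, sequence)]
    return hits


# Hypothetical toy peptide, NOT a real sesame sequence.
toy_sequence = "MAAA" + "EENRHG" + "TTTT" + "DEKRHE" + "KK"
print(find_motifs(toy_sequence, HISTIDINE_BOXES))
print(find_motifs(toy_sequence, ["GNTSAAS"]))
```

An exact-match scan only finds fully conserved copies; substitutions within a motif would need an alignment or a position-specific model such as those produced by motif discovery tools.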
Plants 2022, 11, x FOR PEER REVIEW 5 of 23
Figure 2. Circular dendrogram for the gene products involved in oil biosynthesis constructed using the N-J method with 10,000 bootstraps using MEGAv10 [27]. For accession number details of each gene product, please refer to Supplementary


Conserved Motif Analysis
Proteins having related functions contain highly conserved short (<20 amino acids) amino acid sequences recurring in fixed-length patterns called motifs. A motif may represent important biological features, such as protein-binding or targeting a particular subcellular location. Conserved motifs in all studied genes (Figures S19-S36) were identified using the MEME web tool.

Promoter Analysis
Based on the homology relationships for genes associated with oil biosynthesis in model plants and genotypes exhibiting higher oil yield, we have identified 12 genes in sesame that could be potential targets for enhancing sesame oil yield (Table 3). The upstream sequences (3 kb from the start codon) for each of the 12 genes were subjected to promoter analysis using the PlantCARE database to predict the potential regulatory features [36]. A total of 79 CAREs (cis-acting regulatory elements) were detected in the upstream sequences of these 12 genes (Figure 3; Table S3). Several TFs associated with the regulation of FA biosynthesis and TAG accumulation are well documented, including DNA binding with one finger (Dof), LEAFY COTYLEDON1 (LEC1), LEC2, sequences over-represented in light-induced promoters 5 (SORLIP5), AGAMOUS-LIKE15 (AGL15), BASIC LEUCINE ZIPPER 67 (bZIP67), SPATULA (SPT), WRINKLED1 (WRI1), MYB96, ABSCISIC ACID INSENSITIVE3 (ABI3), and FUSCA3 (FUS3) [37,38]. Of these 11 TFs, binding sites for DOF, SORLIP5, AGL15, and LEC1 were identified in the promoter regions of all 12 genes studied (Figure 4). Although DOF binding sites are over-represented in all the promoters studied, the literature suggests that not all the AAG motifs in plant promoters are targets for DOF domain-containing proteins, and that they are variably regulated [39]. DOF4 and DOF11 positively regulate ACCase and LACS and thereby increase seed oil content in Arabidopsis, while negatively regulating CRA1, which is associated with seed storage protein [40]. LEC1 and LEC2 positively regulate WRI1 and AGL15, respectively [41,42]. WRI1 in turn positively regulates TAG biosynthesis through DGAT1 [43]. Our transcriptome analysis also revealed higher expression levels of WRI1 (SIN_1023649), specifically in high oil yielding genotypes.
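Counting candidate binding sites in an upstream sequence can be illustrated with a simple scan of both DNA strands. This is a minimal sketch under stated assumptions, not the PlantCARE procedure: the function names and the toy sequence are hypothetical, and the motif "AAG" is used only because the text refers to AAG motifs. As noted above, a sequence match alone does not show that a site is a genuine DOF target.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def _find_all(text, pattern):
    """1-based start positions of every (possibly overlapping) match."""
    positions = []
    index = text.find(pattern)
    while index != -1:
        positions.append(index + 1)
        index = text.find(pattern, index + 1)
    return positions


def scan_both_strands(upstream, motif):
    """Report motif matches on the forward (+) and reverse (-) strand.

    Reverse-strand hits are given as forward-strand coordinates of the
    leftmost base of the reverse-complemented motif.
    """
    upstream = upstream.upper()
    motif = motif.upper()
    return {
        "+": _find_all(upstream, motif),
        "-": _find_all(upstream, reverse_complement(motif)),
    }


# Hypothetical toy upstream fragment, NOT a real sesame promoter.
toy_upstream = "TTAAAGCCCTTTGG"
print(scan_both_strands(toy_upstream, "AAG"))
```

For a real 3 kb upstream region, short motifs like this occur frequently by chance, which is one reason over-representation alone should be interpreted cautiously.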
Binding sites for MYB96, a positive regulator of seed oil accumulation and activator of the DGAT1 gene in Arabidopsis seeds [44], were detected in the upstream sequences of the SiDGAT2 and SiDGAT3 genes in sesame (Figure 4), but not in the SiDGAT1 gene. It is reported that MYB96 binds to the promoter of ABI4 and activates its expression [45], and ABI4 has been shown to directly regulate DGAT1 expression in Arabidopsis [46].

JcKASII: Virus-induced gene silencing of JcKASII significantly altered TAG biosynthesis [51] (sesame gene: SiFabF-3)

AtDGAT1: Overexpression of AtDGAT1 under the control of a seed-specific promoter in Arabidopsis and canola increased seed oil content by 28 and 16%, respectively. A 4% increase in seed oil content was reported in transgenic C. sativa overexpressing CsDGAT1B [56]

TmDGAT1 (Tropaeolum majus): Overexpression of embryo-specific TmDGAT1 increased the storage oil content in transgenic Arabidopsis and rapeseed by ~8 and ~15%, respectively [

Validation of In Silico Predicted Oil Biosynthesis Genes Using RNA-Seq
The key regulatory genes associated with oil biosynthesis (Table 3) were identified through genome-wide in silico mining approaches. Expression of these genes was validated through RNA-seq based differential expression studies using the publicly available transcriptome data for three genotypes (high oil yielding: ZZM4728; low oil yielding: ZZM3495 and ZZM2161) at four developmental stages (10, 20, 25, and 30 days post-anthesis; DPA) [24]. In total, 501 genes were found to be significantly differentially expressed in the high oil yielding sesame genotype (ZZM4728) when compared to the low oil yielding genotypes (ZZM3495 and ZZM2161) (Table S4). Of these 501 genes, 27 significantly differentially expressed genes were associated with the lipid biosynthesis (including lipid droplet biogenesis) pathway. From these, DGAT, LPAAT, FAD, SAD, KCS (β-ketoacyl-CoA synthase), Oleosin, nsLTP (non-specific lipid transfer protein), and DIR1 (defective in induced resistance1) genes were selected manually.
The SAD gene showed higher expression at the early stage (10 DPA) in the high oil yielding genotype, while its higher expression was found during the middle stages (20 and 25 DPA) in the low oil yielding genotypes (Figure 5). Similar to SAD, FAD2 expression increased consistently from the early seed developmental stage (10 DPA) and decreased at 30 DPA in the high oil yielding genotype, whereas in the low oil yielding genotypes its expression increased only during the middle developmental stage (20 DPA) (Figure 5). We also found genes associated with lipid droplet biogenesis (oleosin gene family) to be significantly more highly expressed in the high oil yielding genotypes at 10 DPA. Interestingly, these oleosin genes exhibited higher expression levels in both high and low oil yielding genotypes during later developmental stages (25 and 30 DPA). The genes WSD1/DGAT, LPAAT, and FAD4L2 exhibited higher expression levels during the early and middle developmental stages (10 and 20 DPA) in both high and low oil yielding genotypes. However, the expression level of these genes decreased towards later developmental stages (25 and 30 DPA) in low oil yielding genotypes, while expression in high oil yielding genotypes was sustained at comparatively higher levels than in the low oil yielding genotypes. Genes for lipid transfer proteins include nsLTP-encoding genes and DIR1 [61,62]. The gene product of KCS catalyzes the first step of very long chain FA biosynthesis [63,64]. These three genes, nsLTP, DIR1, and KCS, tended to show higher expression in the high oil content sesame genotype when compared to the low oil yielding genotypes (Figure 5J-L).
Of these, expression of the nsLTP1 and DIR1 transcripts was lower at 10 DPA in the low oil yielding genotypes than in the high oil yielding genotype, whereas KCS expression was lower at 30 DPA in the low oil yielding genotypes than in the high oil yielding genotype (Figure 5M-O). These results suggest that early onset of oil biosynthesis during seed development is important for higher oil yield, and that this is regulated transcriptionally by the key regulatory genes studied here. In low oil yielding genotypes, the capacity for oil biosynthesis evidently initiates during the mid to late seed developmental stage, which may explain the low oil yield. Since the time between anthesis and capsule maturity is uniformly 30-45 days [24], early onset of oil biosynthesis during seed development results in a longer duration of active oil formation and, thereby, higher oil yield.
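The comparison between genotypes at an early stage can be illustrated with a log2 fold-change calculation. This is only a sketch under simple assumptions and is not the differential expression pipeline used for the published dataset: the function names, the pseudocount, the threshold, the stage labels, and all expression values below are hypothetical.

```python
import math


def log2_fold_change(high, low, pseudocount=1.0):
    """log2 ratio of expression in the high-oil vs. the low-oil genotype.

    The pseudocount (an assumption) avoids division by zero.
    """
    return math.log2((high + pseudocount) / (low + pseudocount))


def higher_in_high_oil(expression, stage="10DPA", min_log2fc=1.0):
    """Genes whose expression at `stage` is higher in the high-oil genotype."""
    selected = []
    for gene, by_group in expression.items():
        lfc = log2_fold_change(by_group["high"][stage], by_group["low"][stage])
        if lfc >= min_log2fc:
            selected.append((gene, round(lfc, 2)))
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical expression values (arbitrary units), NOT data from the study.
toy_expression = {
    "geneA": {"high": {"10DPA": 7.0, "30DPA": 3.0},
              "low": {"10DPA": 1.0, "30DPA": 3.0}},
    "geneB": {"high": {"10DPA": 2.0, "30DPA": 5.0},
              "low": {"10DPA": 2.0, "30DPA": 1.0}},
}

print(higher_in_high_oil(toy_expression, stage="10DPA"))
print(higher_in_high_oil(toy_expression, stage="30DPA"))
```

Running the same filter per stage shows whether a gene differs between genotypes early, late, or throughout seed development, which is the kind of pattern the section above discusses.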

Discussion
With the availability of high-throughput technologies and the sesame genome datasets, various genes involved in oil biosynthesis were mined from the publicly available sesame genome and RNA-seq datasets and studied using various bioinformatics tools. The present report focused on conserved sites, catalytic sites, domains and motifs, physicochemical properties, subcellular location, and homology with genes of other oilseed crops and model plants. TAG biosynthesis in oilseeds can be summarized in three major steps (Figure 1): (i) biosynthesis of FA in plastids, (ii) desaturation of FA in plastid and ER, and (iii) TAG assembly in ER, where the free fatty acids are exported from the plastid to undergo stepwise acylation onto the glycerol backbone, forming TAG.

Plant de novo FA synthesis occurs in the stroma of the plastid in a series of reactions: initiation, elongation, and termination [5]. The heteromeric ACCase subunit genes have been characterized in different crop plants, including peanut [13], soybean [12], Gossypium species [48], and pea [65]. Overexpression of the BCCP gene modulates the oil content in seeds of Arabidopsis and cotton [47]. Additionally, overexpression of the accD gene through chloroplast transformation resulted in increased ACCase levels that led to higher oil production in transgenic tobacco [49].

Characterization of the In Silico Mined Genes of Oil Biosynthesis
An evaluation of the physicochemical properties of the proteins encoded by FA synthesis genes highlighted that, on an isoelectric point (pI) basis, most of the proteins were alkaline in nature except SiaccB-2, SiaccC, SiaccD, SiKASIII, SiSAD, and FATB (Table S2). Similarly, FAD proteins from sunflower (pIs ranged from 6.24 to 9.61) and Brassica napus (7.8 to 9.5) were reported to be alkaline in nature [66,67]. The instability index is a measure that correlates with the in vivo half-life of a protein [68]. The aliphatic index is defined as the relative volume occupied by aliphatic side chains (alanine, valine, leucine, and isoleucine). Aliphatic amino acids are hydrophobic; therefore, a high aliphatic index indicates that a protein is thermostable over a wide temperature range and is regarded as a positive factor for the thermostability of globular proteins [69]. The GRAVY (grand average of hydropathicity) value indicates the solubility of a protein, and a low GRAVY value indicates the hydrophilicity of the protein [69]. Most studied proteins have negative GRAVY values, revealing their hydrophilic nature (Table S2).
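The two indices discussed above can be computed directly from a sequence. The sketch below assumes the commonly used definitions, which are not spelled out in the text: the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)] with X in mole percent, and GRAVY as the mean Kyte-Doolittle hydropathy value per residue. The function names are hypothetical, and the tool actually used to produce Table S2 may differ in detail.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aliphatic_index(sequence):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    sequence = sequence.upper()
    length = len(sequence)

    def mole_percent(residue):
        return 100.0 * sequence.count(residue) / length

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def gravy(sequence):
    """Mean hydropathy per residue; negative values suggest hydrophilicity.

    Raises KeyError for characters outside the 20 standard amino acids.
    """
    sequence = sequence.upper()
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


# The biotinyl domain motif quoted in the text, used here only as a short input.
peptide = "CIIEAMKLMNEIE"
print(round(aliphatic_index(peptide), 2), round(gravy(peptide), 3))
```

Values for a short peptide are not meaningful on their own; the indices are intended for full-length proteins such as those listed in Table S2.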
The catalytic residues are well conserved within KASI, KASII, and KASIII proteins (Figures S7 and S8). Each of these residues is reported to have a key function: cysteine (C) is a substrate-binding residue, two histidines (H) are involved in the decarboxylation, and two threonines (T) are involved in forming hydrogen bonds with the ACP phosphopantetheine moiety [15] (Figure S8). In the C-terminal region, a conserved Gly-rich GNTSAAS motif was also found, which has been reported to be involved in forming oxide anion free radicals [15]. The oxide anion free radicals thus formed participate in redox reactions leading to oxidative modifications. The plant KASIII enzyme-specific catalytic triad composed of Cys-His-Asn was conserved in all SiKASIII proteins [70] (Figure S7). Although the N-terminal domains of FatA and FatB showed high divergence, the specific catalytic residues were conserved (D268, N270, H272, and E306 in SiFatA; D282, N284, H286, and E320 in SiFatB-1; and D319, N321, H323, and E357 in SiFatB1-2) (Figure S10).
The FAD gene family is well characterized in oilseed crops, with 31 members in peanut [71], 68 in rapeseed [66], 29 in soybean [72], 40 in sunflower [67], 12 in B. juncea, and 8 in black mustard [73]. SAD enzymes catalyze the first desaturation in the plant FA biosynthesis pathway. Members of the SAD family have been identified in oilseed crops, including peanut, B. juncea, B. rapa, B. nigra, olive, and camelina, which possess 3, 12, 7, 8, 3, and 3 SAD genes, respectively [74]. Among seven AtSAD members, SSI2/FAB was characterized as a typical 18:0-ACP-specific acyl-ACP ∆9 desaturase [18], while two AtSAD members, AtAAD2 and AtAAD3, catalyze 16:0-ACP ∆9 desaturation [19]. The key amino acid residues located in the catalytic domain are predicted to shape a deep substrate-binding pocket for 18:0-ACP, while in 16:0-ACP specific enzymes, a shorter substrate-binding pocket was observed [75]. Based on the alignment of eight AA residues at the substrate-binding pocket, five SiSADs (SiSAD1-5) might be of 18:0-ACP specificity and two SiSADs (SiSAD6 and 7) possibly exhibit 16:0-ACP specificity (Table 2; Figure S9). Of the two, structural and functional association analysis of the amino acid residues (for amino acid positions refer to Table 2) reveals the presence of a bulky amino acid in SiSAD6 (W198), similar to AtAAD2 (F224), AtAAD3 (F216), and MuPAD (W151), but not in SiSAD7. This observation favors SiSAD6 as a better candidate for PAD than SiSAD7. Since orthologous genes from different species tend to group together rather than with the paralogous genes within a species, the homology relationships (Figure 2) indicate that SiSAD6 and AtAAD2 and 3 would have evolved their catalytic specificity independently. Hence, SiSAD6 is not as closely related to AtAAD2 and AtAAD3 (with respect to orthologous relationship) as BrSAD6 is to AtAAD2 or BrSAD7 is to AtAAD3 (Figure 2).
As reported for cotton (GhA-SAD6 and GhD-SAD8) and Arabidopsis (AtAAD2 and 3), whose respective gene copies are preferentially expressed in the endosperm and aleurone tissues [19,31,75-77], the gene copies exhibit tissue-specific expression patterns in addition to their specificity for 16:0-ACP or 18:0-ACP.
Membrane-bound FADs include omega-6-FADs (FAD2 and FAD6), omega-3-FADs (FAD3, FAD7 and FAD8), and palmitate desaturase (FAD4) [67]. FAD2 and FAD6 synthesize linoleic acid from oleic acid in the ER and plastids, respectively [9]. In this study, three omega-6-FADs were retrieved from the sesame genome (Table S2). The presence of multiple copies of FAD2 is common in oilseed crops. For example, six FAD2 have been reported in peanut, five in soybean, six in safflower, three in sunflower, four in cotton, and two in flax [20]. Previously, FAD2 from sesame was isolated and characterized [78]. Furthermore, screening of the variants for SiFAD2-1 from 705 accessions detected a mutation causing an amino acid change (R142H), which probably affects the desaturase activity, with mutant accessions accumulating extremely high levels of oleic acid (48%) in seed oil [25]. In the present study, a FAD2 member, designated as SiFAD2-2, was retrieved from the sesame genome (Table 1). The omega-3 desaturase genes (FAD3, FAD7, and FAD8) were found as single copies in sesame (Table S1).
The physicochemical properties of the FA desaturase proteins revealed that both SiSAD and SiFAD families were hydrophilic in nature. SiFAD proteins were predicted to be alkaline, while SiSADs were slightly acidic (Table S2). The motifs identified using MEME software for the SiFAD genes were highly conserved within each subfamily (Figure S28), in accordance with a previous report [71]. Generally, soluble and membrane-bound desaturases were found to possess two and three histidine boxes, respectively (Table 1). The histidine boxes are essential for FAD catalytic activity [79]. The presence of conserved transmembrane domains is a typical characteristic of membrane-bound FADs [20]. Altered expression of FAD family members is known to alter fatty acid profiles in various oil crops such as Brassica [80], cotton [81], and soybean [82]. The mechanism through which the regulation of FAD family members alters protein content, oil content, and yield is not yet known.

Identification of Key Regulatory Genes
The MCAT gene has been isolated and characterized in rapeseed, soybean, and peanut [13,83]. MCAT is a key enzyme of the FA pathway that forms malonyl-ACP from malonyl-CoA, and its expression level is closely associated with increased storage oil content [50]. The total seed yield and oil content were increased in transgenic Arabidopsis plants overexpressing MCAT [50].
β-ketoacyl-acyl carrier protein synthases (KAS) catalyze chain-initiation, -elongation, and -condensation steps and are classified as KASIII, KASI, and KASII [14]. A total of eight KAS gene family members, including two KASIII, three KASI, and three KASII, were found in sesame (Table S2). In silico genome-wide analysis of the KAS gene family in flax (Linum usitatissimum L.) led to the identification of twelve genes consisting of four KASIII, six KASI, and two KASII [84]. KAS genes are reported to affect oil production. KASI-deficient mutant Arabidopsis seeds have been reported to produce significantly lower oil content [85]. Overexpression of NtKASI-1 significantly enhanced oil accumulation in tobacco [15]. Virus-induced gene silencing in Jatropha curcas demonstrated that silencing of JcKASII significantly altered FA composition and TAG biosynthesis [51]. Two major genes, KASI (SiFabB-2) and SiDGAT2, were reported to show statistically significant associations with the unsaturated to saturated fat ratio [25]. No sequence-level variability for SiFabB-2 was found among the 705 sesame accessions studied [25].
Acyl-ACP thioesterases play an essential role in FA chain termination and fall into two distinct classes (FatA and FatB) [86]. FatA has higher specificity for unsaturated acyl groups, while FatB is more active on saturated acyl-ACPs. FatA and FatB grouped with the α-CT and β-CT subunits of ACCase (Figure 2), which might be due to the presence of a conserved motif in these proteins (Figure S36). Genetic engineering of acyl-ACP thioesterases has been demonstrated to be effective for oil improvement [87]. Overexpression of FatB1 from Umbellularia californica (California Bay laurel) and MlFatB from Madhuca longifolia in Brassica juncea increased the laurate levels by over 50% and stearate levels by 16-fold, respectively [87,88]. A genome-wide association study of 705 sesame accessions reported that the candidate genes underlying the variation in FA composition and oil content include SiFatB, SiFatA, KASII, KASI, and SAD [25]. Based on homology relationships, we identified them as SiFatB1-1, SiFatA, SiFabF-1, SiFabB-2, and SiSAD-1, respectively. Sequence information is very helpful for identifying, on a homology basis, which gene copy (among the copies in a genome) is transcriptionally regulated with tissue (space) and developmental stage (time) specificity. Hence, evolutionary and comparative genomics tools help establish homology relationships and thereby predict the gene copy that is specifically expressed in space and time in an organism. Unraveling such biological information on a homology basis leads to novel insights due to the exceptional conservation of synteny among ortholog blocks [89-94]. Adding such homology-based information through bioinformatic tools enriches our understanding of the expression patterns of biological pathways, with reference to space and time, in the organism of study.
TAGs are the major form of energy storage in plants and contribute to many specific developmental stages of the plant [1]. Long-chain acyl-coenzyme A synthetase (LACS) catalyzes the formation of acyl-CoAs from free fatty acids, which is pivotal for TAG biosynthesis [33]. LACS enzymes esterify free fatty acids to fatty acyl-CoA thioesters [95]. These fatty acyl-CoA thioesters are utilized in many metabolic pathways, including FA elongation and the synthesis of triacylglycerols, membrane lipids, wax, cutin, and suberin, as well as FA catabolism [33]. Arabidopsis contains one of the largest known LACS families, with nine LACS genes [95]. In sesame, we found six LACS genes. In comparison to Arabidopsis, three LACS types, LACS3, LACS5, and LACS7, were absent in sesame (Table S2). Likewise, substantial variation among 629 LACS homologs from 122 species has been reported [96].
The functions of most of the LACS members in Arabidopsis are well characterized. AtLACS9 functions redundantly with either AtLACS1 or AtLACS4 in seed TAG biosynthesis [16]. AtLACS-6 and -7 localize in the peroxisome and are involved in β-oxidation. AtLACS-1, -2, and -4 are involved in surface wax and cutin biosynthesis [16]. The overall functions of AtLACS3 and AtLACS5 remain unknown [96]. In oilseed crops, the sunflower LACS genes HaLACS1 and HaLACS2 showed high sequence homology with AtLACS9 and AtLACS8 and are essential in oil production [97]. BnLACS2 from rapeseed is predominantly expressed during seed development and is involved in seed oil synthesis [98]. The specific gene homologs of LACS and DGAT in flax, LuLACS8A and LuDGAT2, respectively, have been reported to contribute to the enrichment of flaxseed oil with α-linolenic acid [99].
LPAAT incorporates FA at the sn-2 position of PA and is a crucial enzyme in membrane phospholipid and storage lipid biosynthesis [107]. In plants, LPAATs are classified into two classes, plastid LPAATs and ER LPAATs, with the ER LPAATs further classified into Class-A and Class-B [108]. In Arabidopsis, five LPAATs have been reported, one plastidial (AtLPAAT1) and four (AtLPAAT2-5) microsomal targeted [109]. Class-A LPAATs are ubiquitously present in all parts of most plants and show substrate specificity for C18 unsaturated FAs [107], whereas class-B LPAATs are expressed in seeds and are associated with acylation of unusual acyl-CoAs [110]. Among eight sesame LPAATs, one (SiLPAAT1) was predicted to be plastid localized in this study and exhibits similarity with Arabidopsis AtLPAAT1 (Figure 2). We have identified the protein annotated in the NCBI protein database as 1-acyl-sn-glycerol-3-phosphate acyltransferase (XP_020554192.1) as SiLPAATB because it was found to be homologous with the class-B LPAAT of Ricinus communis (RcLPAATB) (Figure 2). The microsomal LPAAT2/3 group includes SiLPAAT2-1 and SiLPAAT2-2, along with AhLPAAT2 [53], BnLPAAT2 of B. napus [54], and AtLPAAT2 of Arabidopsis [110] (Figure 2). We retrieved SiLPAAT3 from the sesame genome, a probable ortholog of the male gametophyte isoenzyme AtLPAAT3 characterized in Arabidopsis [110] (Figure 2). Overexpression of LPAAT genes has been shown to increase seed oil content. Specifically, LPAAT2 from rapeseed and peanut enhanced oil content by 13% and 7.4%, respectively, when transformed into Arabidopsis [53,54]. The overexpression of yeast SLC1 and SLC1-1 genes (homologs of ER AtLPAATs) in Arabidopsis, soybean, and rapeseed resulted in an 8-48% increase in seed oil content [111].
The dephosphorylation of PA to produce DAG catalyzed by PAPs is a committed step in TAG biosynthesis [112]. Transient expression of N-terminal green fluorescent protein (GFP) fused AtPAH1 in tobacco leaves showed signals in cytosol and is predicted to become localized to the ER membrane to bind with PA [113], which is consistent with previous reports in yeast Pah1 and Tetrahymena thermophila TtPah1 [114]. A double mutant of the two AtPAH genes (AtPAH1 and 2) is affected in phosphatidylcholine production at the ER [113]. These two proteins have been reported as being involved in phospholipid homeostasis in the ER and in TAG synthesis, especially under N starvation [115].
DGATs are studied for enhancing oil production as they are one of the rate-limiting factors in plant storage lipid accumulation. Three structurally unrelated classes of DGATs were identified in plants: DGAT1, DGAT2, and DGAT3. DGAT1 and DGAT2 are associated predominantly with membranes, while DGAT3 is cytosolic [6,116]. Three members of the DGAT1 and four members of the DGAT2 family were found in soybean [60], and seven DGAT1, eight DGAT2 and three DGAT3 were identified in peanut [117].
The putative C-terminal ER retrieval motif is detectable in SiDGAT1 (-YYHDL) and in other plant DGAT1s [118]. These putative ER retrieval motifs possess hydrophobic amino acid residues [119]. SiDGAT2 was found to have five conserved domains (Figure S17) as signature motifs within the DGAT2 subfamily. The Pfam analysis revealed that SiDGAT1 is a membrane-bound O-acyltransferase (MBOAT) protein with nine transmembrane domains, consistent with the DGAT1 proteins of Arabidopsis, B. napus, castor, peanut, and soybean, as reported earlier [116]. In SiDGAT2, two transmembrane domains at the N-terminus of the protein were predicted (Figure S1), as in the other characterized plant DGAT2 proteins [120]. Overexpression of DGAT1 genes from Arabidopsis and Camelina sativa in Arabidopsis, rapeseed, and Camelina led to a 5-25% increase in storage oil content [55,56,121]. Similar results were documented when DGAT1 from Tropaeolum majus was transformed into B. napus [57] and DGAT1 from B. napus and Sesamum indicum was transformed into Arabidopsis [58,122]. Overexpression of soybean and oil palm DGAT2 in Arabidopsis also increased TAG biosynthesis [59,60].

Validation of the Identified Regulatory Genes through RNA-Seq Studies
Oil biosynthesis in plants proceeds through de novo fatty acid biosynthesis, TAG assembly, and lipid droplet biogenesis ([5] and citations therein). LPAAT and WSD1/DGAT are known to regulate key steps of TAG or lipid biosynthesis in oilseed plants [123,124]. Higher expression levels of genes involved in TAG biosynthesis, such as LPAAT, GPAT, and DGAT, are known to be associated with enhanced oil accumulation in the seeds of oilseed crops [24,123,125]. SAD and FAD2 desaturate fatty acyl chains to produce unsaturated fatty acids. SAD desaturates stearoyl-ACP (18:0) to form oleic acid (18:1), and FAD2 catalyzes synthesis of linoleic acid (18:2) from oleic acid (18:1) [24]. The expression pattern of SAD and FAD2 corresponds with oleic acid (18:1) and linoleic acid (18:2) levels at the early and middle stages of developing sesame seeds [126]. The oleosins have been linked to lipid biosynthesis/metabolism and the size of seed oil bodies [127,128]. The expression patterns observed for oleosins are consistent with their critical roles in oil body biogenesis.
Validation of the identified key regulatory genes for oil biosynthesis was completed using RNA-seq data. The transcriptome dataset used for this study was downloaded from publicly available datasets for the high and low oil yielding genotypes at four developmental stages [24]. Our results indicate that 27 of the 501 genes that are significantly differentially expressed between the high- and low-oil yielding genotypes are associated with oil biosynthesis and lipid droplet biogenesis (Table S4; Figure 5). These include those identified through our genome-wide in silico studies. In general, oil accumulation is said to increase rapidly during seed maturation [129]. Transcriptomic studies performed by Wang et al. (2019) underscore the importance of higher expression levels of oil biosynthetic genes at 30 DPA for increased oil accumulation in high oil yielding genotypes, compared to the low oil yielding genotypes [24]. In addition, our analysis of the same dataset suggests that an early onset of higher expression of oil biosynthetic genes is also crucial for higher oil yield. With early onset by 10 DPA and sustained expression until 30 DPA, the high oil yielding genotype has a broader time window for active oil accumulation than the low oil yielding ones. Hence, for higher oil yield, triggering the expression of oil biosynthesis pathway genes at the early seed developmental stage (10 DPA), in addition to the expression patterns at 30 DPA, also appears to be a determining factor. This allows oil to accumulate for a longer duration within the anthesis-capsule maturity window, thereby yielding higher oil content in the seeds at maturity.
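The notion of a broader window of active expression can be illustrated with a minimal Python sketch. It is not the analysis performed in this study: the function name `active_window`, the threshold, and the expression values are all hypothetical and serve only to show how an onset stage and a window length could be read from a per-stage expression profile.

```python
def active_window(profile, threshold):
    """Return (onset_dpa, last_active_dpa, span_in_days), or None if never active.

    `profile` maps a sampling stage (DPA) to an expression value for one gene.
    The span is measured between the first and last sampled stages at or above
    the threshold; it says nothing about days that were not sampled.
    """
    active = sorted(dpa for dpa, value in profile.items() if value >= threshold)
    if not active:
        return None
    return active[0], active[-1], active[-1] - active[0]


# Made-up expression values for illustration only (NOT taken from the dataset).
high_oil = {10: 8.0, 20: 12.0, 25: 15.0, 30: 14.0}
low_oil = {10: 1.0, 20: 6.0, 25: 9.0, 30: 4.0}

print(active_window(high_oil, threshold=5.0))  # (10, 30, 20)
print(active_window(low_oil, threshold=5.0))   # (20, 25, 5)
```

With these illustrative numbers, the high-oil profile is above the threshold from 10 to 30 DPA, whereas the low-oil profile is above it only from 20 to 25 DPA.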

Sequence Retrieval
Using the available sesame genome sequence information [25,130], the gene family members involved in oil biosynthesis were retrieved. The sequence information for genes (and their translated amino acid sequences, Table S1 and Seq_File_S1) involved in oil biosynthesis in Arabidopsis, peanut, soybean, and other plant species was also retrieved. Transcriptome raw reads (90 bp paired-end sequencing using the Illumina HiSeq 2000 platform) of 12 samples representing different stages of seed development (10, 20, 25, and 30 DPA) for one high oil-yielding (ZZM4728) and two low oil-yielding (ZZM3495 and ZZM2161) genotypes were downloaded from the GenBank SRA database [24] for validation studies.
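The sampling design described above (three genotypes at four stages, giving 12 samples) can be written out as a small sample sheet. The following Python sketch is illustrative only: the genotype names, oil classes, and stages come from the text, while the sample labels and field names are hypothetical and do not correspond to SRA accessions.

```python
from itertools import product

# Genotypes and their oil class, and sampling stages (DPA), as stated in the text.
GENOTYPES = {"ZZM4728": "high", "ZZM3495": "low", "ZZM2161": "low"}
STAGES_DPA = [10, 20, 25, 30]


def build_sample_sheet():
    """Return one record per genotype x stage combination."""
    return [
        {
            "sample": f"{genotype}_{dpa}DPA",  # hypothetical label, not an SRA ID
            "genotype": genotype,
            "oil_class": GENOTYPES[genotype],
            "dpa": dpa,
        }
        for genotype, dpa in product(GENOTYPES, STAGES_DPA)
    ]


samples = build_sample_sheet()
print(len(samples))  # 12
```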

Phylogenetic Analysis and Conserved Motifs Screening
The conserved domains of the retrieved proteins were identified from the data reported in the NCBI conserved domain database [133]. Domain analysis of the retrieved proteins was performed using the Pfam database [134]. Retrieved sequences were also verified using the Simple Modular Architecture Research Tool (SMART; http://smart.embl.de; accessed on 28 September 2022), an annotation resource [135]. Multiple sequence alignments (MSA) of the retrieved protein sequences were performed using Clustal [136] with default parameters. Sequence analyses were conducted in MEGA v10 [27] with a bootstrap value set to 10,000. The MEME v4.12.0 [137] server was utilized to identify the conserved amino acid motifs (Table S1) with the following settings: maximum number of motifs, ten; minimum motif width, six.

Analysis of Cis-Acting Regulatory Elements
The retrieved candidate promoter regions (3 kb upstream) were used to predict the potential cis-acting regulatory elements (CAREs) using the PlantCare tool (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/; accessed on 28 September 2022). Functional annotation of the identified CAREs was retrieved from the PlantCare database (Lescot et al. 2002).
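Taking a 3 kb upstream region depends on gene orientation: for a minus-strand gene, "upstream" lies at higher genomic coordinates and the sequence must be reverse-complemented. The Python sketch below shows one way this could be done. It is an assumption-laden illustration, not the procedure used here; the function names and the 1-based inclusive coordinate convention are hypothetical choices.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def promoter_interval(gene_start, gene_end, strand, chrom_length, upstream=3000):
    """Return the 1-based inclusive (start, end) of the upstream region, or None.

    Assumes gene_start <= gene_end in 1-based inclusive genomic coordinates.
    The region is clipped at the chromosome ends.
    """
    if strand == "+":
        start, end = max(1, gene_start - upstream), gene_start - 1
    elif strand == "-":
        start, end = gene_end + 1, min(chrom_length, gene_end + upstream)
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return (start, end) if start <= end else None


def promoter_sequence(chrom_seq, gene_start, gene_end, strand, upstream=3000):
    """Return the upstream sequence, oriented 5'->3' relative to the gene."""
    interval = promoter_interval(gene_start, gene_end, strand, len(chrom_seq), upstream)
    if interval is None:
        return ""
    start, end = interval
    region = chrom_seq[start - 1:end]
    if strand == "-":
        region = region.translate(_COMPLEMENT)[::-1]  # reverse complement
    return region


print(promoter_interval(5001, 7000, "+", 100000))  # (2001, 5000)
print(promoter_interval(5001, 7000, "-", 100000))  # (7001, 10000)
```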

RNA-Seq Analysis between High and Low Oil Content Yielding Sesame Genotypes
The FASTQ files of raw reads [24] were processed to check the quality parameters using FastQC [140]. Adapter and low-quality sequences were trimmed using sickle (v1.33) [141]. The reads were subjected to a second quality check using FastQC after trimming. The filtered high-quality reads were mapped against the reference sesame genome [130] using TopHat (v2.1.1) [142] with default parameters. A reference-guided assembly of the transcriptome data from all 12 samples was performed using Cufflinks (v2.1.1) [143], and a consensus assembly was generated using Cuffmerge. The differentially expressed genes were identified using Cuffdiff [144] with default parameters. Transcripts exhibiting differences of at least two-fold with an FDR value ≤ 0.05 were considered to be significantly differentially expressed. The transcriptome database (cDNA sequences) for sesame genotype Zhongzhi-13 [130] was downloaded and functionally annotated using OmicsBox (v1.4.11) [145]. All significantly differentially expressed gene IDs were searched in the annotated file to assign functions to the unknown differentially expressed genes.
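The significance criterion stated above (at least a two-fold difference with FDR ≤ 0.05) corresponds to |log2 fold change| ≥ 1 together with an adjusted p-value cutoff. A minimal Python sketch of such a filter follows. The record field names and gene labels are hypothetical and chosen for illustration; they are not taken from the output files of this study.

```python
import math


def is_significant(log2_fold_change, q_value, min_fold=2.0, max_fdr=0.05):
    """True if the change is at least `min_fold` in either direction and FDR <= max_fdr."""
    return abs(log2_fold_change) >= math.log2(min_fold) and q_value <= max_fdr


def filter_degs(records, min_fold=2.0, max_fdr=0.05):
    """Return the gene labels of records passing the fold-change and FDR cutoffs."""
    return [
        r["gene"]
        for r in records
        if is_significant(r["log2_fold_change"], r["q_value"], min_fold, max_fdr)
    ]


# Hypothetical records for illustration only (not results from this study).
example = [
    {"gene": "geneA", "log2_fold_change": 1.8, "q_value": 0.010},   # up, significant
    {"gene": "geneB", "log2_fold_change": 0.6, "q_value": 0.001},   # below two-fold
    {"gene": "geneC", "log2_fold_change": -2.3, "q_value": 0.040},  # down, significant
    {"gene": "geneD", "log2_fold_change": 3.0, "q_value": 0.200},   # FDR too high
]

print(filter_degs(example))  # ['geneA', 'geneC']
```

Using the absolute value of the log2 fold change treats up- and down-regulation symmetrically, so a gene expressed at half the level passes the same cutoff as one expressed at twice the level.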

Conclusions
We report here on the predicted physicochemical properties, subcellular locations, conserved sites, and homology relationships for the gene products involved in oil biosynthesis in sesame, using the available genome information. Of eight SAD members, which are known as stearoyl-(acyl-carrier-protein) 9-desaturases, two members were predicted to be possibly 16:0-ACP specific (Table 2). Similarly, SiLPAAT1, previously uncharacterized, has been identified by its homology to LPAAT1 from Arabidopsis (Figure 2). Moreover, SiLPAATB, previously annotated as 1-acyl-sn-glycerol-3-phosphate acyltransferase, is a class-B LPAAT, identified through a homology relationship with Ricinus communis RcLPAATB (Figure 2). The genome-wide in silico mining revealed key regulatory genes associated with the oil biosynthesis pathway (Table 3). Validation of these genes was performed through RNA-seq approaches using the publicly available transcriptome dataset [24]. Our validation studies underscored the need for an early onset of oil biosynthesis during seed development to obtain a longer duration of oil accumulation, resulting in higher oil yield. This is especially important when the anthesis-capsule maturity window does not vary much. Therefore, identifying genotypes with early onset (by 10 DPA) and sustained expression until 30 DPA of the genes involved in oil biosynthesis would help enhance oil yield in the oilseed crop sesame.