Integrative Analysis of Oleosin Genes Provides Insights into Lineage-Specific Family Evolution in Brassicales

Oleosins (OLEs) are a class of small but abundant structural proteins that play essential roles in the formation and stabilization of lipid droplets (LDs) in seeds of oil crops. Despite the proposal of five oleosin clades (i.e., U, SL, SH, T, and M) in angiosperms, their evolution in eudicots has not been well-established. In this study, we employed Brassicales, an economically important order of flowering plants possessing the lineage-specific T clade, as an example to address this issue. Three to 10 members were identified from 10 species representing eight plant families, which include Caricaceae, Moringaceae, Akaniaceae, Capparaceae, and Cleomaceae. Evolutionary and reciprocal best hit-based homologous analyses assigned 98 oleosin genes into six clades (i.e., U, SL, SH, M, N, and T) and nine orthogroups (i.e., U1, U2, SL, SH1, SH2, SH3, M, N, and T). The newly identified N clade represents an ancient group that has already appeared in the basal angiosperm Amborella trichopoda, which are constitutively expressed in the tree fruit crop Carica papaya, including pulp and seeds of the fruit. Moreover, similar to Clade N, the previously defined M clade is actually not Lauraceae-specific but an ancient and widely distributed group that diverged before the radiation of angiosperm. Compared with A. trichopoda, lineage-specific expansion of the family in Brassicales was largely contributed by recent whole-genome duplications (WGDs) as well as the ancient γ event shared by all core eudicots. In contrast to the flower-preferential expression of Clade T, transcript profiling revealed an apparent seed/embryo/endosperm-predominant expression pattern of most oleosin genes in Arabidopsis thaliana and C. papaya. Moreover, the structure and expression divergence of paralogous pairs was frequently observed, and a good example is the lineage-specific gain of an intron. These findings provide insights into lineage-specific family evolution in Brassicales, which facilitates further functional studies in nonmodel plants such as C. papaya.


Introduction
Oleosins are a class of highly abundant structural proteins of lipid droplets (LDs), which represent a major carbon reserve and are widely present in various plant organs such as seeds, pollen, flowers, fruits, and certain tubers [1][2][3][4][5].Oleosins are typical for their small molecular weight (MW) of 14-30 kDa [5][6][7][8][9][10][11][12][13].Nevertheless, all of them share a conserved central hydrophobic portion of approximately 72 residues, which could form a hairpin penetrating the surface phospholipid monolayer of an LD into the matrix.The hydrophobic hairpin is composed of two arms (each of about 30 residues) connected by a 12-residue loop with the pattern of PX 5 SPX 3 P, where X represents a nonpolar residue.By contrast, N-and C-terminal peptides, which lie on the phospholipid surface and may act as a receptor for metabolic enzymes or regulatory proteins, are amphipathic and usually variable [8,14].Genome-wide surveys reveal that oleosin genes have already appeared in the single-celled algae, e.g., Chlamydomonas reinhardtii, and have diverged into at least six clades known as P (primitive), U (universal), SL (seed low), SH (seed high), T (tapetum), and M (mesocarp) during later evolution [4,8,15].The most primitive Clade P was only found in green algae, mosses, and ferns, whereas Clade U, which is typical for the C-terminal AAPGA, is universally present in all land plants including Selaginella moellendorffii.Clade SL, which is present in seeds of both gymnosperms and angiosperms, was named after the low MW.This clade was proposed to first evolve from Clade U and later gave rise to Clades SH, M, and T. Clade SH, which is usually present in seeds of angiosperms, is typical for the high MW and C-terminal insertion relative to Clade SL.By contrast, Clades M and T were reported to be lineage-specific, which are confined to Lauraceae and Brassicaceae, respectively [4,8].Comparative genomics analyses indicated that, for most clades, gene expansion was mainly contributed by whole-genome duplications (WGDs) especially those lineage-specific recent WGDs, e.g., the Brassicaceae-specific α WGD and the ρ WGD shared by cassava (Manihot esculenta) and rubber tree (Hevea brasiliensis) in Euphorbiaceae [5,6,12], in stark contrast to a key role of tandem duplication for Clade T in Brassicaceae [3,8,16].Brassicaceae belongs to the order Brassicales, which includes 17 families, 398 genera, and 4450 species that have experienced multiple independent WGDs [17].Thus far, genomewide identification of oleosin family genes has been reported in 10 species within Brassicales.However, most of them (80%) belong to the Brassicaceae family [3,8,10].Although it was established that Clade T is absent from papaya (Carica papaya, Caricaceae) and spider flower (Tarenaya hassleriana, Cleomaceae) [8], whether it is present or has been lost in other families within Brassicales is yet to be addressed.Recently available or updated genome assemblies for species in five Brassicales families beyond Brassicaceae, i.e., papaya [18], horseradish (Moringa oleifera, Moringaceae) [19], Bretschneidera sinensis (Akaniaceae) [20], caperbush (Capparis spinosa, Capparaceae) [21], Cleome violacea (Cleomaceae), acaya (Gynandropsis gynandra, Cleomaceae) [22], and spider flower [23], provide a good chance to uncover lineage-specific evolution of the oleosin gene family in this important plant order.
This study presents a comprehensive comparative analysis of the oleosin gene family in Brassicales.Significantly, our results showed that Clade M is actually not Lauraceae-specific but an ancient group that has already been present in the basal angiosperm Amborella trichopoda and is preserved in the early-diverging eudicot Aquilegia coerulea and all Brassicales species examined in this study.Moreover, a novel but ancient group named N was identified in most tested species, i.e., A. trichopoda, papaya, horseradish, C. violacea, acaya, and spider flower.In papaya, an economically and nutritionally important tree fruit crop widely cultivated in tropical and subtropical areas [18], this group was shown to be constitutively expressed, which includes pulp and seeds of the fruit.Herein, we report our findings.

Identification of Oleosin Genes in A. trichopoda, Avocado, A. coerulea, and Representative Brassicales Species
To gain insight into lineage-specific family evolution in Brassicales, recently available chromosome (Chr)-level genome assemblies of A. trichopoda (a single living representative within the sister lineage Amborellales to all other flowering plants) [24], avocado (Persea americana, a Laurales member of an early-branching lineage of angiosperms that includes one M oleosin) [25], and A. coerulea (a Ranunculales member of the basal-most eudicot clade) [26] were first employed to identify oleosin family genes, resulting in five, three, and five members, respectively (Table 1).Five members identified in A. trichopoda and A. coerulea are consistent with what is found in previous assemblies [8], whereas only two avocado oleosin genes (i.e., PaOLE2 and -3) have been reported by previous studies [4,27].Moreover, an allele for PaOLE2 that was discarded for further analyses in this study was also identified from tig00003364, and their coding sequences (CDS) were shown to exhibit  1).Notably, compared with the previous study [8], one more member was identified in both papaya and spider flower, which were named CpOLE6 and ThOLE8, respectively (Table 1).
Physiochemical parameters and conserved domains of deduced oleosin proteins are summarized in Table 1.In contrast to the great majority of oleosins featuring a single oleosin domain, MoOLE6 harbors two instead.Since the sequence was also found in two other genome assemblies [28,29], it is more likely to be a true gene that was resulted from tandem duplication.The sequence length of oleosins varies from 115 (CsOLE3) to 267 (MoOLE6) amino acids (AA) with an average of 151 AA, and correspondingly, their theoretical MW varies from 11.92 (CsOLE3) to 28.01 (MoOLE6) kDa with an average of 16.01 kDa.It is worth noting that CpOLE6, MoOLE6, CvOLE6, GgOLE7, and ThOLE8 possess unexpected low pI values of 4.43-6.56, in striking contrast to the alkaline characteristic of 9.23-11.00for others.Except for BsOLE9, which exhibits an unusual GRAVY value of −0.144, the values for others are greater than 0, varying from 0.078 to 0.784 (Table 1).Nevertheless, all proteins possess relatively high aliphatic index (AI) values of 88.90-123.83(Table 1) as well as similar Kyte-Doolittle hydrophobicity plots (except for MoOLE6) (Figure S1), which is in accordance with their amphipathic property.

Evolutionary Analysis and Definition of Orthogroups
To uncover their relationships, an unrooted evolutionary tree was first constructed using full-length protein sequences of five AtrOLEs, three PaOLEs, five AcOLEs, six CpOLEs, six MoOLEs, 10 BsOLEs, eight CsOLEs, six CvOLEs, seven GgOLEs, eight ThOLEs, eight MeOLEs, nine PtOLEs, and 17 AtOLEs.As shown in Figure 1A, they were clustered into six clades, five of which were previously defined as U, SL, SH, T, and M [4,8].Whereas Clade T is restricted to Arabidopsis (Arabidopsis thaliana), Clade M, which was first described in the Lauraceae family [4,27], was unexpectedly found in all species examined in this study.The presence of Clade M in A. trichopoda (i.e., AtrOLE2) supports its early origin before the radiation of angiosperms.Moreover, a novel clade denoted N is not only present in papaya (i.e., CpOLE6), horseradish (i.e., MoOLE6), C. violacea (i.e., CvOLE6), acaya (i.e., GgOLE7), and spider flower (i.e., ThOLE8) but also in A. trichopoda (i.e., AtrOLE5), implying its early origin and lineage/species-species gene loss during later evolution.Structural features of Clade N relative to other CpOLEs are shown in Figure 1B.In contrast to AtrOLE5 possessing the conserved PX 5 SPX 3 P pattern, other members of Clade N exhibit PX 5 S/GPX 3 G/F variants.Moreover, an 18-residue insertion that is present in Clade SH was not detected in this clade as well as CpOLE4, MoOLE5, and CsOLE6, implying their divergence.Notably, AtrOLE4 possesses a 22-residue insertion instead (Figures 1B and S2).Additionally, whereas the majority of U oleosins feature the C-terminal AAPGA, AcOLE1, and GgOLE1 harbor the AAPSA instead (Figure S3).Furthermore, the BRH (best reciprocal hit) method was used to identify orthologs across different species.Except for T oleosins that were proven to be widely present in Brassicaceae plants [16], the criterion of at least one member present in more than one species examined in this study was used to define orthogroups (OGs).As shown in Figure 2 and Table S1, a total of nine OGs were obtained, i.e., U1/-2, M, SL, SH1/-2/-3, N, and T, where five AtrOLE genes belong to U1, M, SL, SH1, and N, respectively, supporting early diversification of this family in angiosperms.During later evolution, linage-specific expansion and concentration were found.Notably, only two OGs (i.e., U1 and M) are preserved in avocado, whereas four OGs (i.e., U1, M, SL and SH1) are retained in A. coerulea (Figure 2).

Analysis of Exon-Intron Structure
To learn more about structure divergence, the exon-intron structures were analyzed on the basis of revised gene models.As shown in Table 1, a single intron was found in 27 out of 64 identified oleosin genes, occupying approximately 42.19%, smaller than 88.24% found in Arabidopsis (At-T8 represents the sole member possessing two intron) (Table S2).These intron-containing genes belong to Clades U, SL, SH, and N, which seems to be independent.Notably, no intron was found in Clade M as well as any member of A. trichopoda, avocado, and A. coerulea, whereas one intron is present in all SL members of papaya, horseradish, B. sinensis, and other Brassicales species.Moreover, in C. violacea, acaya, and spider flower, all U and SH members harbor an intron, whereas GgOLE7 represents the unique N member with one intron (Table 1).Interestingly, the intron position appears to be conserved within clades but differs between different clades.Whereas Clade SL features one intron immediately after the sequence encoding the hydrophobic hairpin, the intron found in Clade N is located at the C-terminus of the hydrophobic hairpin; the intron found in Clade SH is located before the hydrophobic hairpin; and the intron found in Clade U is located at the C-terminus of the proline knot.These results imply an independent and lineage-specific gain of an intron (Figures 1 and  S3).

Gene Localization, Synteny Analysis, and Lineage-Specific Family Evolution in Brassicales
Gene localization revealed that identified oleosin genes are distributed across two-to-six chromosomes of A. coerulea, avocado, B. sinensis, A. trichopoda, papaya,

Analysis of Exon-Intron Structure
To learn more about structure divergence, the exon-intron structures were analyzed on the basis of revised gene models.As shown in Table 1, a single intron was found in 27 out of 64 identified oleosin genes, occupying approximately 42.19%, smaller than 88.24% found in Arabidopsis (At-T8 represents the sole member possessing two intron) (Table S2).These intron-containing genes belong to Clades U, SL, SH, and N, which seems to be independent.Notably, no intron was found in Clade M as well as any member of A. trichopoda, avocado, and A. coerulea, whereas one intron is present in all SL members of papaya, horseradish, B. sinensis, and other Brassicales species.Moreover, in C. violacea, acaya, and spider flower, all U and SH members harbor an intron, whereas GgOLE7 represents the unique N member with one intron (Table 1).Interestingly, the intron position appears to be conserved within clades but differs between different clades.Whereas Clade SL features one intron immediately after the sequence encoding the hydrophobic hairpin, the intron found in Clade N is located at the C-terminus of the hydrophobic hairpin; the intron found in Clade SH is located before the hydrophobic hairpin; and the intron found in Clade U is located at the C-terminus of the proline knot.These results imply an independent and lineage-specific gain of an intron (Figures 1 and S3).

Expression Divergence of Oleosin Genes
Global expression profiles of AtOLE genes were first examined from the Arabidopsis RNA-seq Database, which includes 28,164 libraries.As shown in Figure S5, most members of Clade T are preferential to be expressed in flowers, though At-T8 is also expressed in embryos and seeds.By contrast, other members are predominantly expressed in seeds, embryos, and endosperms, as well as in silique.Notably, At-Sm2 and At-Sm3 were also shown to be expressed in pollen and flowers.Moreover, during embryo development, transcripts of most members (including At-T8) increase gradually, peaking at the stage of mature green.At the stage of 8-cell/16-cell, At-S5, At-Sm2, and At-Sm1 represent the three most expressed isoforms, contributing 83.23% of total transcripts.Then, a sudden drop of total transcripts was observed at the globular stage, where At-S5, At-Sm2, and At-Sm1 also contribute 75.98% of total transcripts.At stages from early heart to late torpedo, At-S5 represents the most expressed isoform that contributes 44.69-62.58% of total transcripts.At the stage of bent cotyledon, At-S1, At-S3, and At-S5 represent the three most expressed isoforms, contributing 76.94% of total transcripts.At the stage of mature green, At-S3, At-S4, and At-S1 represent the three most expressed isoforms contributing 80.93% of total transcripts (Figure S6).
Then, papaya was used as an example of a fruit plant to study the expression evolution of oleosin genes.The RNA-seq data of various tissues, i.e., callus, shoot, hypocotyl, leaf, root, phloem sap, stamen, pollen, ovule, and pulp of mature fruit, were first investigated.As shown in Figure 5, their transcripts were detected in at least one of the tested tissues, though gene abundances are highly diverse.Total transcripts of the whole gene family were most abundant in shoot (100%), followed by callus (5.21-20.37%),and they were considerably low in other tissues (0.12-0.65%).In contrast to the constitutive expression of CpOLE6, CpOLE1 was rarely expressed in sap and pulp.Whereas CpOLE6 represents the unique isoform expressed in sap, three members were shown to be expressed in pulp, i.e., CpOLE6, -3, and -5.In the shoot and callus, CpOLE3, -4, and -2 represent three dominant isoforms, which contribute 80.98-91.69% of total transcripts.On the contrary, CpOLE4 was rarely expressed in other tissues; CpOLE3 was rarely expressed in sap, stamen, pollen, and ovule; CpOLE2 was rarely expressed in root, sap, and pulp; and CpOLE5 was rarely expressed in root and sap.As expected, according to their expression patterns over various tissues, six CpOLE genes were grouped into three main clusters: Cluster I includes the two most expressed genes in shoot and callus, i.e., CpOLE3 and -4; Cluster II includes two moderately expressed isoforms, i.e., CpOLE2 and -5; and Cluster III includes CpOLE6 and -1, which were constitutively expressed in most tissues (Figure 5A).
Since no transcriptome data are available for the seed tissue, qRT-PCR analysis was further conducted using seeds derived from mature fruits.As shown in Figure 5B, except for CpOLE6 and -1, the expression levels of other CpOLE genes were significantly higher than the reference gene CpEIEF, varying from 2.61-36.81folds, implying their divergence.Notably, CpOLE3 and -4 were shown to represent two dominant isoforms whose transcript levels were comparable (Figure 5B).
In the present study, we used Brassicales, an economically important order of flowering plants that harbors the lineage-specific T clade [3,8,17], as an example to address evolution patterns of the oleosin gene family.In addition to 34 members reported in Arabidopsis, cassava, and poplar [12; this study], a number of 64 oleosin family genes were identified from ten species representing eight plant families, i.e., Amborellaceae (A. trichopoda), Lauraceae (avocado), Ranunculaceae (A. coerulea), Caricaceae (papaya), Moringaceae (horseradish), Akaniaceae (B.sinensis), Capparaceae (caperbush), and Cleomaceae (C.violacea, acaya, and spider flower), while gene numbers of the family vary from three to ten.Interestingly, the family amounts are usually higher in species that Since no transcriptome data are available for the seed tissue, qRT-PCR analysis was further conducted using seeds derived from mature fruits.As shown in Figure 5B, except for CpOLE6 and -1, the expression levels of other CpOLE genes were significantly higher than the reference gene CpEIEF, varying from 2.61-36.81folds, implying their divergence.Notably, CpOLE3 and -4 were shown to represent two dominant isoforms whose transcript levels were comparable (Figure 5B).
In the present study, we used Brassicales, an economically important order of flowering plants that harbors the lineage-specific T clade [3,8,17], as an example to address evolution patterns of the oleosin gene family.In addition to 34 members reported in Arabidopsis, cassava, and poplar ( [12], this study), a number of 64 oleosin family genes were identified from ten species representing eight plant families, i.e., Amborellaceae (A. trichopoda), Lauraceae (avocado), Ranunculaceae (A. coerulea), Caricaceae (papaya), Moringaceae (horseradish), Akaniaceae (B.sinensis), Capparaceae (caperbush), and Cleomaceae (C.violacea, acaya, and spider flower), while gene numbers of the family vary from three to ten.Interestingly, the family amounts are usually higher in species that experienced recent WGDs.According to comparative genomics analysis, after the split with A. coerulea, the last common ancestor of core eudicots underwent the γ whole-genome triplication (WGT) event at around 117 million years ago (MYA) [39].Furthermore, Brassicaceae species, represented by Arabidopsis, experienced two more WGDs named At-β (60-65 MYA) and At-α (~35 MYA) [30,40], where the At-β WGD was shown to be shared by caperbush, C. violacea, acaya, and spider flower [17,[21][22][23].In the Capparaceae lineage, caperbush further experienced one independent WGD known as Cs-α at 18.6 MYA [21].In the Cleomaceae lineage, after the split with C. violacea, the last common ancestor of acaya and spider flower first experienced one independent WGD known as Gg-α (~22 MYA), which was followed by an addition of a third genome (Th-α, ~18.4 MYA) to spider flower but not acaya [41].After the split with papaya, B. sinensis in the Akaniaceae lineage was also shown to experience one independent WGD known as Bs-α [20].Correspondingly, compared with five members present in both A. trichopoda and A. coerulea, one more was identified in papaya, horseradish, and C. violacea.By contrast, more than seven members were identified in B. sinensis, acaya, and spider flower, which are comparative to eight and nine reported in cassava and poplar, respectively [12].
According to evolutionary analysis, 98 oleosin genes were grouped into six clades, one more than that described before [4,8,12].Interestingly, this novel and so-called N clade are present in A. trichopoda and most Brassicales species examined in this study, implying its early origin and lineage-specific gene loss.Besides Clade N, four other AtrOLE genes were assigned into four clades, i.e., U, SL, SH, and M, instead of only two as proposed by Huang and Huang (2015) [8].The updated classification is not only supported by evolutionary analysis but also by BRH-based orthologous and synteny analyses.Whereas Clades SL, M, N, and T contain a single OG, U and SH have evolved to form two and three, respectively, a high member of which are still located within syntenic blocks.As for Clade M, PaOLE3, AtrOLE2, AcOLE2, MeOLE2, PtOLE2a, and PtOLE2b were shown to be located within syntenic blocks, whereas CpOLE2, MoOLE3, BsOLE3, BsOLE4, CsOLE2, CvOLE2, GgOLE3, ThOLE3, and At-Sm3 were also characterized as syntelogs, implying a highly conserved evolution of this clade, which argues Lauraceae-specific distribution proposed by Huang and Huang (2016) [4].Moreover, this clade has expanded in B. sinensis and poplar via recent WGDs, which were shown to be Akaniaceae and Salicaceae-specific, respectively [20,42].As for Clade N, despite a frequent loss in species examined in this study, CvOLE6, GgOLE7, and ThOLE8 are still located within syntenic blocks, implying possible functions in specific biological processes that are yet to be studied.As for Clade U, which is typical for the C-terminal AAPGA [8,12], gene expansion was observed in avocado, cassava, B. sinensis, acaya, spider flower, and Arabidopsis, which were contributed by WGDs and dispersed duplication.Among them, though BsOLE1 and -2 are no longer located within syntenic blocks, both of them were characterized as syntelogs of CsOLE1, which is consistent with their comparable Ks value to three Bs-α WGD repeats identified in this species, i.e., BsOLE3/4, BsOLE5/6, and BsOLE9/10, implying their WGD-derivation and chromosome rearrangement of the BsOLE2-encoding region.Similar cases were also observed for GgOLE1/-2 and ThOLE1/-2, where GgOLE1, GgOLE2, and ThOLE1 were characterized as syntelogs of CsOLE1, At-Sm1, and At-Sm2, though ThOLE2 is no longer located within syntenic blocks.As for Clade SL, gene expansion was observed in A. coerulea, B. sinensis, caperbush, spider flower, Arabidopsis, cassava, and poplar, which were contributed by WGDs, as well as tandem and dispersed duplication.Notably, BsOLE6 and -7 represent the unique pair of tandem repeats beyond Clade T. Compared with other clades, Clade SH has extensively expanded in core eudicots, forming three OGs as identified in this study.Among them, SH1 and SH2 are more likely to arise from the γ event [39], and MoOLE4, MoOLE-5, BsOLE8, BsOLE9, BsOLE10, MeOLE4a, and MeOLE5 are still located within syntenic blocks with similar Ks values, whereas SH3 appears to be generated by the At-β event [30].Moreover, SH1 has further expanded in caperbush, Arabidopsis, cassava, and poplar via lineage-specific recent WGDs, i.e., Cs-α, At-α, ρ, and p, respectively [20,30,42,43].It is worth noting that, despite the wide presence of Clade T in Brassicaceae plants [8,16], no ortholog was identified in any other Brassicales species examined in this study, implying its appearance sometime after the Brassicaceae-Cleomaceae divergence.Nevertheless, At-T1 and At-T8 were characterized as syntelogs of CpOLE3, ThOLE4, ThOLE5, GgOLE4, CsOLE3, CsOLE4, and CsOLE5, implying that Clade T was indeed derived from Clade SL.
In addition to species-specific retention of repeats after WGDs, structural divergence was also shown to play a role in the evolution of the oleosin family.In contrast to no intron that is present in oleosin genes of A. trichopoda, avocado, and A. coerulea, Clade SL has gained one intron immediately after the sequence encoding the hydrophobic hairpin stretch in all Brassicales species examined in this study, which is similar to that reported in Salicaceae and Euphorbiaceae [12].Interestingly, the intron position found in Clade SL is different from that observed in several members of Clades U, SL, SH, and N, implying an independent gain of an intron.Since all SH members in C. violacea, acaya, spider flower, and Arabidopsis feature the intron that is located before the hydrophobic hairpin, its gain may occur sometime after the split with Capparaceae but before Brassicaceae-Cleomaceae divergence.The absence of the Cleomaceae U intron in Arabidopsis, which is located at the C-terminus of the proline knot, implies that its gain occurred sometime after the split with Brassicaceae.By contrast, the intron found in GgOLE7, which is located at the C-terminus of the hydrophobic hairpin, may be Gynandropsis-specific, since it is absent from its orthologs CvOLE6 and ThOLE8.
Expression divergence also plays an important role in the evolution of oleosin family genes in Brassicales.Among six oleosin genes identified in papaya, CpOLE6 in Clade N and CpOLE1 in Clade U have evolved to be constitutively expressed, whereas CpOLE3 in Clade SL and CpOLE4 in Clade SH have evolved into two dominant isoforms in seeds, calluses, and shoots, though CpOLE4 is more likely to be a WGD (γ) repeat of CpOLE5, another SH member.The constitutive expression of U oleosin genes has been widely reported in other species, e.g., castor bean (Ricinus communis), physic nut (Jatropha curcas), cassava, rubber tree, safflower (Carthamus tinctorius), rapeseed (Brassica napus), and tigernut (Cyperus esculentus) [5,[9][10][11][12][13].Nevertheless, to our surprise, CpOLE1 was shown to be rarely expressed in both sap and pulp, which is different from CpOLE6.Compared with CpOLE4, transcript levels of CpOLE5 were shown to be considerably lower in seeds, calluses, and shoots.By contrast, it was also moderately expressed in pollen, stamens, and ovules, as well as pulp.Notably, though Clade M was previously reported to be mesocarp-abundant [8,42], the expression of CpOLE2 was rarely detected in pulp or roots and sap.Interestingly, the transcript level of CpOLE2 is usually higher than that of CpOLE5, CpOLE6, and CpOLE1 in most tissues.By contrast, its ortholog in Arabidopsis (At-Sm3) is always less expressed than most members beyond Clade T.Moreover, among several repeat pairs identified in Arabidopsis, i.e., At-Sm1/-Sm2, At-S3/-S5, At-S1/-S2/-S4, and At-T1/-T2/-T3/-T4/-T5/-T6/-T7/-T8/-T9, At-Sm1, At-S5, At-S4, and At-T5 have evolved into dominant isoforms, respectively.In Brassicales, the lineage-specific expansion and tissue-specific expression of oleosin genes reflect their roles in the oil accumulation of seeds and anther [3,34].In seeds, the accumulation of oleosins is usually negatively correlated with LD size but positively associated with oil content, which could not only affect seed germination but also the freezing tolerance of seeds [34,35].Moreover, Brassicaceae-specific T oleosins are acquired for tapetosome formation, which confer additive benefits of pollen vigor [44].

Sequence Alignment, Evolutionary Analysis, and Definition of Orthogroups
Multiple sequence alignment was conducted using MUSCLE [47], which was subject to evolutionary tree construction using MEGA 6.0 [48] with the maximum likelihood method, Jones-Taylor-Thornton (JTT) model, uniform rates, complete deletion of gaps, nearest-neighbor interchange (NNI), and bootstrap of 1000 replicates.Orthologs between species were identified using the BRH (best reciprocal hit) method, and OGs across different species were defined as described before [49,50], which were assigned only when they were present in at least two species tested.

Gene Localization, Synteny Analysis, and Calculation of Evolutionary Rate
Gene locations on chromosomes and/or scaffolds were inferred from the revised genome annotation and displayed using TBtools [51].For synteny analysis, duplicate pairs between or within species were identified using the all-to-all BLASTp [52] method with E-value cutoff of 1 × 10 −10 , and gene colinearity was inferred using MCScanX [53] with the cutoff of five BLAST hits.Duplication modes such as tandem, proximal, transposed, dispersed, and WGD were identified using the DupGen_finder pipeline as previously described [54], and Ks (synonymous substitution rate) and Ka (nonsynonymous substitution rate) of duplicate pairs were calculated using codeml [55].

Gene Expression Analysis
Expression profile data of AtOLE genes were accessed from Arabidopsis RNA-seq Database (https://plantrnadb.com/athrdb/, accessed on 31 October 2023) and Arabidopsis Embryo eFP Browser (https://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi, accessed on 31 October 2023), whereas global expression profiles of CpOLE genes were analyzed using transcriptome datasets as shown in Table S4.Raw sequence reads in the FASTQ format were obtained using fastq-dump, and quality control was performed using Trimmomatic [56].Read mapping was conducted using HISAT2 [57], and the FPKM (fragments per kilobase of exon per million fragments mapped) method was used to determinate relative transcript levels.
To uncover the relative expression levels of CpOLE genes in the seed tissue, mature seeds were collected from the yellow fruits of Zhongbai cultivar as described before [58].Total RNA extraction, synthesis of the first-strand cDNA, and qRT-PCR analysis were conducted as previously described [59], where CpEIEF was used as the reference gene.Primers used in this study are shown in Table S5.Relative gene expression levels were estimated with the 2 −∆∆Ct method, and statistical analysis was performed using SPSS Statistics 20 as described before [60].

Conclusions
In this study, a focus on a comparative analysis of the oleosin gene family in Brassicales was conducted, which includes 13 species representing 10 plant families.Ninety-eight oleosin genes were assigned into six clades (i.e., U, SL, SH, M, N, and T) and nine OGs (i.e., U1, U2, SL, SH1, SH2, SH3, M, N, and T).The newly identified Clade N represents an ancient group that diverged before the radiation of angiosperm.Interestingly, this group was constitutively expressed in papaya, which includes the fruit and sap.Moreover, the previously defined Clade M is not Lauraceae-specific but an ancient and widely distributed group that has already appeared in the basal angiosperm A. trichopoda.Compared with A. trichopoda, the family expansion in Brassicales was largely contributed by lineage-specific recent WGDs but also the ancient γ event shared by all core eudicots.The expression of Clade T was shown to be flower-preferential, whereas other members exhibit an apparent seed/embryo/endosperm-predominant expression pattern.The structure and expression divergence of paralogous pairs was frequently observed, and a good example is a lineagespecific gain of an intron.These findings provide insights into lineage-specific family evolution in Brassicales, which facilitates further functional studies in papaya and other nonmodel species.

Figure 1 .
Figure 1.Multiple sequence alignment and evolutionary analysis of oleosins.(A) Evolutionary analysis of oleosins.Shown is an unrooted evolutionary tree resulting from full-length oleosins with MEGA6 (maximum likelihood method and bootstrap of 1000 replicates), where the distance scale denotes the number of AA substitutions per site and the name of each clade is indicated next to the corresponding clade.(B) Sequence alignment and structural features of N oleosins together with AtrOLE4, MoOLE4, MoOLE5, and other CpOLEs.MoOLE6N and MoOLE6C represent Nand C termini of the MoOLE6 protein, whereas sequence alignment and display were conducted using MUSCLE and Boxshade, respectively.Identical and similar residues are highlighted in black

Figure 1 .
Figure 1.Multiple sequence alignment and evolutionary analysis of oleosins.(A) Evolutionary analysis of oleosins.Shown is an unrooted evolutionary tree resulting from full-length oleosins with MEGA6 (maximum likelihood method and bootstrap of 1000 replicates), where the distance scale denotes the number of AA substitutions per site and the name of each clade is indicated next to the corresponding clade.(B) Sequence alignment and structural features of N oleosins together with AtrOLE4, MoOLE4,

Figure 2 .
Figure 2. Species-specific distribution of nine oleosin orthogroups identified in this study.Taxonomy relationships of tested species follow that of NCBI Taxonomy (M: mesocarp; N: novel; SH: seed high-molecular-weight; SL: seed low-molecular-weight; T: tapetum; U: universal).

Figure 2 .
Figure 2. Species-specific distribution of nine oleosin orthogroups identified in this study.Taxonomy relationships of tested species follow that of NCBI Taxonomy (M: mesocarp; N: novel; SH: seed high-molecular-weight; SL: seed low-molecular-weight; T: tapetum; U: universal).

Figure 5 .
Figure 5. Expression profiles of CpOLE genes.(A) Tissue-specific expression profiles of CpOLE genes.Color scale represents FPKM normalized log2 transformed counts where blue indicates low expression and red indicates high expression.(B) CpOLE transcript abundance relative to the reference gene CpEIEF.Bars indicate SD (N ≥ 3) and uppercase letters indicate difference significance tested following Duncan's one-way multiple-range post hoc ANOVA (p < 0.01).(Cp: C. papaya; FPKM: fragments per kilobase of exon per million fragments mapped; OLE: oleosin).

Figure 5 .
Figure 5. Expression profiles of CpOLE genes.(A) Tissue-specific expression profiles of CpOLE genes.Color scale represents FPKM normalized log 2 transformed counts where blue indicates low expression and red indicates high expression.(B) CpOLE transcript abundance relative to the reference gene CpEIEF.Bars indicate SD (N ≥ 3) and uppercase letters indicate difference significance tested following Duncan's one-way multiple-range post hoc ANOVA (p < 0.01).(Cp: C. papaya; FPKM: fragments per kilobase of exon per million fragments mapped; OLE: oleosin).