BOX38, a DNA Marker for Selection of Essential Oil Yield of Rosa × rugosa

Rosa rugosa L. is a famous aromatic plant whose cultivars (Rosa × rugosa) have been widely used in the perfume industry in Asia. The perfume market seeks rose cultivars with a higher essential oil yield, but the oil yields of most R. × rugosa cultivars have not been evaluated because of limiting conditions such as insufficient cultivation areas. Here, we tested the yield and the aroma components of the essential oil of 19 R. × rugosa cultivars. The results indicated that the combined yield of nerol, citronellol, and geraniol could serve as an alternative index of the total essential oil yield. Sequence synteny analysis indicated that the Rosa genus-specific cis-element Box38 was highly polymorphic. Isolation of the Box38 region of Rosa × rugosa with flanking primers showed that the Box38 repeat number was significantly positively correlated with the essential oil yield of the corresponding cultivar. In the breeding of Rosa × rugosa, six Box38 repeats could be a robust threshold for the selection of high-essential-oil roses. Together, we found that Box38 is a DNA marker for essential oil yield that would be helpful in the early selection and breeding of essential oil roses.


Introduction
Essential oil plants are favored in industrial crop cultivation because of the great value of their essential oils for food preservation [1], aromatherapy [2], medicine [3], and flavors [4]. The perfume rose is an ancient essential oil plant and is famous for its rose essential oil. Among the approximately 200 species of the genus Rosa, only dozens with a strong fragrance have attracted particular attention, e.g., Rosa chinensis cv. 'Old Blush' (Old Blush), Rosa rugosa (Rugosa, R. rugosa), Rosa damascena, Rosa centifolia, and Rosa alba [5,6]. These fragrant species have been widely used in the breeding of perfume roses [7]. R. damascena, which originated in the Middle East (Damascus), and its hybrid cultivars (R. × damascena) contribute most to the European market [8]. For example, R. × damascena 'trigintipetala', a famous oil-bearing rose, has been planted in Turkey and Bulgaria since the 16th century and was introduced to Asia (China) in the 1970s [9]. Meanwhile, R. rugosa, which originated in East Asia (China and Japan), and its hybrid cultivars (R. × rugosa) are more popular in the Asian market [10,11].
Plant essential oil composition is related to many factors, e.g., geographical area of production [12,13], harvest year [13], irrigation [14], and extraction system [15]. For essential oil production from R. × rugosa, the methods are largely unified, including harvest timing (3- to 5-year-old plants), hydrodistillation extraction, and open-field culture without irrigation, which suggests that geographical area and cultivar differences are the key remaining factors. The abundant cultivars of R. × rugosa arose from allele recombination among several wild species and from spontaneous bud mutations of cultivars. For example, the national geographic indication cultivar of China, R. × rugosa 'Kushui', planted in Kushui (Gansu Province, China) since the 18th century, is a natural hybrid of Rosa setate × R. rugosa [10,16]. Another Chinese geographic indication cultivar, R. × rugosa 'plena' (Plena), planted in Pingyin (Shandong Province, China) since the 16th century, is a spontaneous variety of wild R. rugosa [17]. In addition to these famous cultivars, some oil roses have been bred in or introduced to China since the 1970s. Most of these oil roses lack a known paternal genetic background because of mixed pollination during breeding, and their oil yield urgently needs to be evaluated [18]. Considering the high cost and long duration of commercial planting, a fast and low-cost method based on genetic markers should be a practical solution [7,19,20].
Geraniol is the key index of commercial rose essential oils. Most plants convert geranyl diphosphate (GPP) to geraniol with a plastid monoterpene synthase [21], whereas modern roses rely on a specific cytosolic pathway [22,23] in which GPP is dephosphorylated to geranyl phosphate (GP) by a Nudix hydrolase (NUDX1) and then converted to geraniol by an uncharacterized phosphatase. The NUDX1 family includes four subfamilies, i.e., NUDX1-1(a/b), -2, -3, and -4, but only the NUDX1-1a gene clusters are specific to geraniol-producing rose species, which carry more NUDX1-1a copies than non-geraniol-producing species [24,25]. The NUDX1-1a copy number is positively correlated with NUDX1-1a expression and could be a candidate marker for cultivars or mutations with high geraniol levels [24]. Besides the NUDX1-1a gene, transposable elements (TEs) in promoters could also be candidate markers for scented-rose breeding [21,24]. Here, based on an analysis of the conservation and variation of motifs related to transposon fragments in R. × rugosa, a rose-specific motif, Box38, was selected as a robust marker for the early selection of high-oil-yield cultivars.

The Rose Cultivars and Sampling
Samples were collected from the rose germplasm resource nursery (116.457676° E, 36.288978° N) of the Pingyin Rose Research Institute (Shandong Province, China). All 19 R. × rugosa cultivars used in this study and their accession names are listed in Supplementary Table S1. The plants were cultivated in the open air under the natural climate (temperate monsoon climate) with no fertilizers. Over 15 g of leaves from each four-year-old plant were collected in three biological replicates and frozen in liquid nitrogen for DNA extraction. At the same time, fresh flowers were picked before 6:00 am (before sunrise) in May-July of 2011 and used immediately for oil extraction. The flower collection and oil extraction were repeated in May of 2022, except for several continuous-flowering cultivars.

Hydrodistillation Extraction and Gas Chromatography with Mass Spectrometry (GC-MS) Analyses of Essential Oil
At least 500 g of flowers were distilled with deionized water (4 mL water per gram of flowers) at 240 °C for 0.5 h and at 180 °C for 1.5 h using a Clevenger-type apparatus. Then, the volatile oil layer settling on top of the aqueous layer was collected. Residual water in the crude extract was removed with anhydrous sodium sulfate to obtain the essential oil. The essential oils were diluted with n-hexane to the appropriate concentration (v/v = 1:1000). For GC-MS analysis, 50 µL of diluted essential oil and 50 µL of 3-nonanone (internal standard) were added to the sample injector of a Trace DSQ GC-MS (Thermo Corporation, Waltham, MA, USA). The chromatographic column (ECONO-CAP, EC-1000, Alltech Corporation, Lexington, KY, USA), at a helium flow rate of 1.00 mL/min, was programmed as follows: 50 °C for 1 min (initial temperature), then increased to 140 °C at 10 °C/min for 5 min, and finally increased to 210 °C at 4 °C/min for 15 min. Then, 1 µL of sample was injected at a split ratio of 100:1 and an injector temperature of 240 °C. The mass spectral ionization temperature was 200 °C, with automatic scanning at m/z 50-550 amu. The electron energy was 70 eV. Qualitative analysis based on the NIST17 MS database and quantitative analysis based on peak areas normalized to the internal standard were performed according to our previous methods [26].
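The internal-standard normalization mentioned above can be illustrated with a minimal sketch. It is not the authors' script: the function name `relative_amount`, the unit response factor, the peak areas, and the amount of 3-nonanone are all hypothetical, because the source does not report them.

```python
def relative_amount(analyte_area, istd_area, istd_amount_ug, response_factor=1.0):
    """Estimate an analyte amount from its peak area relative to the internal standard.

    Assumes a linear detector response; response_factor=1.0 (an assumption)
    treats analyte and internal standard as responding equally.
    """
    if istd_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return analyte_area / istd_area * istd_amount_ug / response_factor


# Hypothetical peak areas (arbitrary units), not measured values.
peak_areas = {"citronellol": 4.0e6, "geraniol": 2.0e6, "nerol": 1.0e6}
istd_area = 2.0e6        # hypothetical 3-nonanone peak area
istd_amount_ug = 0.5     # hypothetical amount of 3-nonanone in the injected mix

amounts_ug = {
    name: relative_amount(area, istd_area, istd_amount_ug)
    for name, area in peak_areas.items()
}

for name, amount in amounts_ug.items():
    print(f"{name}: {amount:.2f} ug")
```

With a response factor of 1, the estimate is semi-quantitative; a calibrated response factor per compound would be needed for absolute quantification.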

Distinguishing B38 Copy Numbers by Length of PCR Products
DNA of all cultivars was isolated from the corresponding leaves using a DNAprep Pure plant kit (Tiagen, China) following the manufacturer's instructions. The B38 repeat region was cloned by PCR amplification using the high-fidelity polymerase PrimeSTAR (Takara, Japan). The PCR parameters with primers (5′-TTTGCAAGAAACTAATGCTG-3′ and 5′-GTTACGAATTATTACAAATA-3′) were as follows: 98 °C for 1 min, 25 cycles of (98 °C for 5 s, 58 °C for 5 s, and 72 °C for 10 s), and 72 °C for 1 min. Then, the lengths of the PCR products were resolved by 6% polyacrylamide gel electrophoresis (PAGE). The two primers were located in the sequence upstream of the first Box38 repeat and the sequence downstream of the last repeat. The expected PCR products would therefore be 40 bp longer than the Box38 repeat region.
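The length-based reading of the PAGE bands can be sketched as follows. The 40 bp offset equals the combined length of the two 20-nt primers, and the per-repeat lengths (38 bp, or 34 bp when overlapping repeats cannot be neglected, i.e., when the repeat number exceeds 4) are those given in the results. The decision rule, the rounding, and the function name `repeat_number_from_product` are assumptions made for illustration, not the authors' procedure.

```python
FORWARD_PRIMER = "TTTGCAAGAAACTAATGCTG"
REVERSE_PRIMER = "GTTACGAATTATTACAAATA"

# Both primers are 20 nt, which accounts for the 40 bp of flanking sequence.
FLANK_BP = len(FORWARD_PRIMER) + len(REVERSE_PRIMER)

FULL_REPEAT_BP = 38      # repeat length when overlap is negligible
OVERLAP_REPEAT_BP = 34   # effective repeat length when repeats overlap


def repeat_number_from_product(product_bp):
    """Deduce the B38 repeat number from a PCR product length (one possible reading)."""
    region_bp = product_bp - FLANK_BP
    if region_bp <= 0:
        raise ValueError("product is not longer than the primer flanks")
    # Try the overlapping unit first; accept it only if it implies more than 4 repeats.
    n_overlap = int(region_bp / OVERLAP_REPEAT_BP + 0.5)
    if n_overlap > 4:
        return n_overlap
    return int(region_bp / FULL_REPEAT_BP + 0.5)


# Hypothetical product lengths (bp), chosen only to exercise the rule.
for product_bp in (116, 192, 244, 346):
    print(product_bp, "bp ->", repeat_number_from_product(product_bp), "repeats")
```

Real gel bands carry sizing error, so lengths that fall between two repeat counts would need sequencing to resolve.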

The Alternative Index of Essential Oil Yield
Most commercial rose essential oil products are produced from natural flowers by water-steam distillation. We isolated essential oils by this traditional method to determine the oil yield (Table S1). Among the 19 cultivars, Plena_alba (0.4162 µL/g), Russia, Fenghua, and Pingyin were the top four, with oil yields exceeding 0.3 µL/g (0.3 µL oil per gram of flowers). The oil yields of six cultivars, i.e., Tuwei, Kushui, HENSA, Daguo, Zizhi_DH, and Zizhi (0.0314 µL/g), were below 0.1 µL/g, and the others ranged from 0.1 µL/g to 0.3 µL/g. The highest oil yield (0.4162 µL/g), the lowest oil yield (0.0314 µL/g), and a coefficient of variation of 58.36% indicated that oil yield differed markedly among these cultivars.
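For readers unfamiliar with the coefficient of variation, a minimal sketch of the calculation is given here. The yields in the example are hypothetical placeholders rather than the values in Table S1, and the use of the sample standard deviation is an assumption, since the source does not state which form was used.

```python
import statistics


def coefficient_of_variation(values):
    """Return the coefficient of variation (%) as sample SD divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return statistics.stdev(values) / mean * 100.0


# Hypothetical oil yields in uL per g of flowers (not the Table S1 data).
hypothetical_yields = [0.05, 0.10, 0.15, 0.20, 0.25]
cv_percent = coefficient_of_variation(hypothetical_yields)
print(f"CV = {cv_percent:.2f}%")
```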
GC-MS analysis of the essential oils indicated that, compared with the low-content compounds (arenes, ketones, aldehydes, acids), hydroxy compounds (alcohols), alkanes, and esters were the top three compound classes, contributing 31-76%, 7-31%, and 0.27-32% of the oil content, respectively (Table S2). The alcohol yield was positively correlated with the oil yield (Figure 1A). Among the alcohols, we selected nerol, citronellol, and geraniol as the key compounds of the essential oils since they were much more abundant than the minor compounds, e.g., linalool, diphenyl ethanol, farnesol, and bisabolol (Tables S3 and S4). Moreover, the yield of nerol, citronellol, and geraniol (yNCG) was significantly positively correlated with the alcohol yield (Figure 1B and Table S4). yNCG could therefore serve as an alternative index of essential oil yield. Xihu1, Xihu3, Dnabanhong, and Linagyehong were the richest in yNCG. All three compounds contributed to the yNCG in every cultivar except Banchongban, Daguo, and Pingyin (Figure 1A).
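The relationship between the units of Table S3 (µg per µL oil), the oil yield (µL per g of flowers), and Table S4 (µg per g of flowers) suggests that yNCG can be obtained by multiplying each compound concentration by the oil yield and summing. The sketch below follows that unit analysis; it is an assumption about how the tables relate, and all numbers and names (`compound_yield`, `yncg`) are hypothetical.

```python
KEY_ALCOHOLS = ("nerol", "citronellol", "geraniol")


def compound_yield(conc_ug_per_ul_oil, oil_yield_ul_per_g):
    """Convert a concentration in oil (ug/uL) to a yield per gram of flowers (ug/g)."""
    return conc_ug_per_ul_oil * oil_yield_ul_per_g


def yncg(concentrations, oil_yield_ul_per_g):
    """Sum the per-gram yields of nerol, citronellol, and geraniol."""
    return sum(
        compound_yield(concentrations[name], oil_yield_ul_per_g)
        for name in KEY_ALCOHOLS
    )


# Hypothetical concentrations (ug per uL oil) and oil yield (uL per g flowers).
hypothetical_conc = {"nerol": 2.0, "citronellol": 10.0, "geraniol": 8.0}
hypothetical_oil_yield = 0.25

value = yncg(hypothetical_conc, hypothetical_oil_yield)
print(f"yNCG = {value:.2f} ug/g")
```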

Conserved NUDX1-1a Clusters and Variable B38
Citronellol can be formed (by dehydrogenation or oxidation) from nerol (the cis-trans isomer of geraniol) and geraniol [26]. Geraniol therefore appeared to be the basis of a high yNCG in cells. Moreover, geraniol production has been identified as a rose-specific pathway mediated by the expression of NUDX1-1a paralogs. We searched for candidate DNA markers based on the NUDX1-1a clusters of Old Blush, Rugosa, the first haplotype of Zizhi (Zizhi_H1), and the second haplotype of Zizhi (Zizhi_H2). According to the gene cluster on Chr2 of Old Blush (RcNUDX1-1, 1st-5th copies), the NUDX1-1a clusters of Rugosa (1st-6th copies), Zizhi_H1 (1st-4th copies), and Zizhi_H2 (1st-5th copies) were identified on the homologous chromosomes (Figures 2A and S1). The Zizhi_H2 5th and Zizhi_H1 4th NUDX1-1a copies were pseudogenes containing premature stop codons, and their collinear RcNUDX1-1 5th copy has been reported as a pseudogene. When the long regions between the 2nd and 3rd copies (or between the 1st and 2nd copies of Zizhi_H1) were ignored, the synteny of the NUDX1-1a clusters was conserved. In addition, 'Copia-NUDX1-1a-MITE' or 'MITE-Copia-NUDX1-1a' formed conserved blocks (except that the Zizhi_H1 4th block lacked the Copia element). The 150 bp or 136 bp linker region between TGA (stop codon) and P580.2030 (MITE) was highly conserved (Figure 2B), whereas the linker region between P580.2030 and Copia carried different indels, including an 'A'-type simple sequence repeat (SSR-A) (Figure 2C).
Box38 (B38) is a rose-specific cis-element produced by Copia. Sequence alignment indicated that the B38 elements were clustered as partially overlapping repeats and that their location was stable (Figure 2D). The first B38 overlapped with Copia (33 bp), and the last B38 was located 138 bp upstream of the ATG. Compared with the four B38 repeats of Old Blush, the 6-7 repeats of Rugosa, 5-6 repeats of Zizhi_H1, and six repeats of Zizhi_H2 indicated that the B38 repeat number was variable. In addition to B38a and B38c, other B38 repeats (c1, c2, d2, f, e) carrying 1-2 SNPs were named and found in Rugosa. In contrast to the little information available about MITE G13534 or the unknown insert sequences, promoter activity of B38 repeats has been observed in Old Blush. This indicated that the expression of NUDX1 relies on the interaction of potential trans-acting factors with B38 repeats. Considering that a higher repeat number would provide a stronger activation potential, we selected B38 as the candidate marker for further study.

Positive Correlation between Repeat Number of B38 and Essential Oil Yield
According to the PAGE of PCR products, the sequence length of the B38 repeat region and the repeat number of B38 (RNB) were deduced (Table 1) as sequence length/B38 length (38 bp or 34 bp). When RNB exceeded 4, the B38 length was taken as 34 bp because the overlap could not be neglected. Repeat number polymorphism was commonly observed among cultivars (and even among individuals of the same cultivar), whereas length polymorphism was only observed when RNB was 6. The RNB, ranging from 2 to 9, showed a significant positive correlation with yNCG (R = 0.803, p < 0.001) among the 19 cultivars (Figure 3). Four cultivars had an RNB of 4, and their yNCG values differed. When RNB exceeded 6, the yNCG was higher than 2.414 µg/g. Together, these results indicated that RNB was a positively correlated marker of yNCG.
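The reported significance level can be checked from R and the sample size alone. The sketch below assumes that R is a Pearson correlation coefficient computed over 19 independent data points (the source does not state the correlation type) and compares the resulting t statistic with the standard two-tailed critical value of Student's t for 17 degrees of freedom at α = 0.001.

```python
import math


def t_statistic(r, n):
    """t statistic for testing a Pearson correlation r against zero with n points."""
    if n < 3:
        raise ValueError("at least three data points are required")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


R_REPORTED = 0.803       # from the text
N_CULTIVARS = 19         # from the text; assumed to be the number of data points
T_CRIT_DF17_P001 = 3.965 # two-tailed critical t, df = 17, alpha = 0.001

t_value = t_statistic(R_REPORTED, N_CULTIVARS)
is_significant = abs(t_value) > T_CRIT_DF17_P001
print(f"t = {t_value:.3f}, exceeds critical value: {is_significant}")
```

Under these assumptions the t statistic is well above the critical value, which is consistent with the reported p < 0.001.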

Discussion
The perfume industry has been looking for roses with higher essential oil content to increase the essential oil yield and satisfy market demand [8,17]. Over 18,000 modern roses have been generated by hybridization, breeding, or introgression since the 19th century, and some of them have a pleasant fragrance [5,29]. To identify and select roses with high essential oil content, a DNA marker associated with oil yield would be helpful. We compared the promoters, CDSs, and TEs of the NUDX1-1a cluster and found that most elements, including the CDSs of all NUDX1-1a copies, Copia R24588, MITE P580.2030, and the linker region, were conserved. Microsatellite genotyping has shown that R. damascena accessions from Europe have identical profiles [9]. Microsatellites therefore seemed unsuitable as markers for the selection of high-oil-yield roses because of the narrow gene pool. A simple sequence repeat (SSR-A) was located in the linker region between P580.2030 and Copia, but its polymorphism showed no correlation with oil yield (the SSR polymorphism results for the 19 cultivars are not shown). Interestingly, the repeat polymorphism of the cis-element B38 was associated with the yNCG, which is a key index of oil yield. Although there were exceptions (such as the high-oil-yield Russia with only 4 B38 repeats), the positive correlation between the repeat number and the yNCG was clear. Interestingly, when the two cultivars not belonging to the Rugosa pedigree (Russia, R. × centifolia, and Damask_Oil, R. × damascena) were excluded, the correlation was more significant. This indicated that a higher repeat number of B38 could be a selection criterion for oil roses in the Rugosa pedigree. A higher repeat number of B38 contributed to the yNCG, and seven repeats seemed to be a threshold for high oil yield. In addition, Plena_alba and Russia were strongly recommended by local managers in Pingyin based on their sensory experience of a more intense floral fragrance, although the two cultivars were not as famous as Fenghua or Pingyin. Our study showed that they were the top two cultivars in oil yield, which contradicts their Box38 repeat numbers to some degree. Whether these patterns hold for oil roses of other pedigrees needs to be tested in more species, e.g., the famous Damask pedigree.
B38 repeats originated from the 3′ long terminal repeat (LTR) of Copia R24558, and the repeat number in the promoter should tend toward neutral evolution in wild species [24]. Seven repeats were also found in the wild Rugosa, which is the common ancestor of the cultivars of the Rugosa pedigree. Although Rugosa was a high-oil rose, its bushy plant architecture was not suitable for cultivation [11]. Approximately 1300 years ago, flower growers in Pingyin selected an oil cultivar (namely Pingyin) that stands erect (Table S1). We found that it had lost one repeat compared to Rugosa and still maintained a high yNCG. Plena, with higher oil content, and its bud mutation Plena_alba were selected later and were the main cultivars in the Pingyin area until the 1980s (Table S1). Other cultivars were mostly derived from the hybridization or introgression of Pingyin, Plena, and other unknown parents, e.g., Fenghua and Zizhi, the current main local cultivars, which were selected by the Pingyin Rose Research Institute in the 1980s. In the newly bred cultivars, a high number of B38 repeats (8-9) was maintained in the Rugosa background, and selection for high oil yield seemed to favor higher repeat numbers, since only seven repeats were found in other rose species.

Conclusions
GC-MS analysis indicated that the main compounds (nerol, citronellol, and geraniol) represent an alternative index of essential oil yield in Rosa × rugosa. Genome comparison and polymorphic PCR detection showed that the repeat number of the cis-element Box38 was significantly positively correlated with the essential oil yield. The essential oil yield of Rosa × rugosa cultivars was determined by the genetic background under the same environment and with the same oil extraction method. We suggest that a high number of B38 repeats (>6) should be an important selection standard for high-essential-oil roses in regions with a similar climate and under the same cultivation conditions. B38 marker-based selection could be a time-saving and efficient breeding tool for perfume roses.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biom13030439/s1, Figure S1: The synteny block of NUDX1-1a gene clusters of R. rugosa and R. chinensis by genome browser; Table S1: Accessions and oil yield of rose cultivars; Table S2: Percentage of organic compounds by GC-MS; Table S3: Main compounds of oil (µg per µL oil); Table S4: Yield of main compounds (µg per g flower).