Phenylalkanoid Glycosides (Non-Salicinoids) from Wood Chips of Salix triandra × dasyclados Hybrid Willow

Salix triandra (almond-leaved willow) is an established crop, grown in coppicing regimes for basket-making materials. It is known as a source of non-salicinoid phenolic glycosides, such as triandrin and salidroside. A spontaneous natural hybrid of S. triandra and S. dasyclados was subjected to metabolite profiling by high-resolution LC-MS, and 22 phenolic glycosides, including 18 that are new to the Salicaceae, were identified. Structures were determined by HPLC isolation and NMR methods. The hybridisation process has introduced novel chemistry into the Salix phenolic glycoside palette, in particular the ability to generate disaccharide conjugates in which the glycosyl group is further extended by a range of sugars, including apiose, rhamnose, xylose, and arabinose. Also of note is the appearance of chavicol derivatives, which have likewise not previously been seen in Salix spp. The work demonstrates the plasticity of the phenolic glycoside biosynthetic pathway, and the potential to improve established crops such as S. triandra and S. dasyclados, via high-value metabolites, for both basketry and bioenergy markets.


Introduction
The Salicaceae family is a distinct taxon of perennial woody, dioecious shrubs and trees that can primarily be divided into two genera: willows (Salix spp.) and poplars (Populus spp.). Both groups have a rich secondary chemistry based on phenolic glycosides. The genus Salix is the largest and includes over 400 species that are variable in growth form, from large trees to small shrubs, and are distributed over a wide range of habitats [1]. Willow bark preparations have been used for the treatment of fever and pain since ancient times [2], and these bioactivities have mostly been related to their constitutive salicinoids, which are defined as derivatives of salicyl alcohol with β-D-glucopyranose moieties (e.g., salicin and salicortin) [3]. Although salicinoids are the most commonly studied class of secondary metabolites in the Salicaceae, other phenolic glycosides, as well as lignans, flavonoids, and terpenes, have been characterised [4]. Considering the variety of compounds described in willows, the beneficial effects of herbal products may not be ascribed only to salicinoids. Hydroxycinnamic acid and benzoic acid derivatives, for example, have potential antioxidant, antimicrobial and anticancer activities. These and other mono- and 1,4-di-substituted benzenoids may add value to short rotation [...]
One of the best known species of willow that is not rich in salicinoids is S. triandra L. (almond-leaved willow) [6,7]. This straight-stemmed shrubby species is commonly grown in plantations that are coppiced annually and is used for basketry; varieties such as "Black Maul", which can grow 2-3 m/year, are major contributors to this industry. As a part of our ongoing research towards adding value to willow crops, we are surveying the S. triandra species contained in the 1500+ U.K. National Willow Collection (NWC), maintained as a short-rotation coppice plantation at Rothamsted Research, Harpenden, United Kingdom.
In this paper, we report on the secondary metabolite profile of wood-chips of an unusual Salix triandra × dasyclados hybrid (S. × schaburovii I.Beljaeva) (NWC1283). This naturally occurring hybrid originates from the flood plain of the Kama River, Perm, Russia, and came to the NWC in 2004 from the Botanical Gardens of the Urals. The study builds on our previous work [8,9] in the analysis of secondary metabolites in willow, particularly in the polar metabolites obtained from aqueous alcohol extractions [10].

Results and Discussion
Freeze-dried wood chips derived from multiple whole stems of S. triandra × dasyclados (NWC1283), grown in the field and harvested at the dormant stage (January), were extracted with aqueous ethanol and analysed using UHPLC-HRMS. The total ion chromatogram (Figure 1) showed that this extract is a complex mixture of compounds and, unlike most willow extracts, is not dominated by salicinoids: salicin is a relatively minor peak, while salicortin, normally appearing at 20.5 min, is totally obscured by more abundant peaks. Peaks 1, 2, 11 and 14 were identified as picein, salidroside, rosarin and rosavin, respectively, by comparison with the 1H-NMR and LC-MS data (Table 1) of authentic standards run under identical conditions. Picein (1) has been detected in different parts of Salix species, such as in the leaves of S. matsudana [11] and in the bark of S. purpurea [12]. Salidroside (2) is also a major component in the leaves of S. triandra and of its hybrids with S. viminalis and S. purpurea [13].

Molecules 2019, 24, x FOR PEER REVIEW 4 of 12
In order to identify further peaks, a portion of the extract was fractionated by semi-preparative HPLC via repeated injections. After the solvent removal, these fractions were analysed by NMR. In total, 22 phenylalkanoid glycosides were identified on the basis of the obtained HRMS and/or NMR data, compared with those found in the literature. Table 1 includes the UHPLC-HRMS data and the method of identification for these compounds, while their chemical structures can be found in Figure 2. Nineteen of the identified compounds are described for the first time in a Salix genotype, and one of them (16) is a novel molecule.
Peak 3 at 15.2 min with [M − H]− at m/z 325.0930 had the molecular formula C15H18O8, and was identified as p-coumaroyl-β-D-glucopyranoside based on its 1H-NMR peaks (Table 2) [14]. The molecule contained a pair of aromatic doublets, each with an 8.6 Hz coupling, at δ 6.96 and 7.60, indicating a para-substituted molecule. A β-glucoside moiety was confirmed via inspection of the anomeric signal, which appeared as a doublet at δ 5.65 with an 8 Hz coupling. Peak 4 at 15.8 min corresponded to the known compound triandrin, in which m/z 357.1187 arises from its formate adduct and 311.1133 from its [M − H]− ion [15]. Triandrin (4) has recently been found in the leaves and twigs of S. reticulata [16], but is better known in S. triandra [6,7], a parent of NWC1283. [...] (Table S1).
[...] (Table 2) similar to those of compound 5. This indicates that this molecule contains a para-disubstituted aromatic ring, and that the glucose moiety is O-linked to this ring. In addition, signals of an allylic side chain (terminal olefinic proton at δ 5.08 ppm; multiplet at δ 6.01 ppm and doublet at δ 3.35 ppm) were also detected, and the structure of compound 21 was confirmed to be chavicol-β-D-glucopyranoside, a compound previously isolated from Cedronella canariensis (L.) Webb and Berth. (Lamiaceae) [18] and Alpinia officinarum Hance (Zingiberaceae) [19].
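The accurate-mass assignments quoted above for peaks 3 and 4 can be cross-checked with a short calculation. The sketch below is illustrative only and is not the software used in this study: the function names are hypothetical, and the formula of triandrin (C15H20O7) is taken from general knowledge rather than from the text.

```python
import re

# Monoisotopic masses (u) of the lightest stable isotopes, and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON = 0.00054858

def mono_mass(formula):
    """Monoisotopic mass of a neutral CHO formula such as 'C15H18O8'."""
    pairs = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return sum(MONO[element] * int(count or 1) for element, count in pairs)

def mz_deprotonated(formula):
    """m/z of [M - H]-: remove one hydrogen atom, keep its electron."""
    return mono_mass(formula) - MONO["H"] + ELECTRON

def mz_formate_adduct(formula):
    """m/z of [M + HCOO]-: add CHO2 plus one electron."""
    return mono_mass(formula) + mono_mass("CHO2") + ELECTRON

def ppm_error(observed, calculated):
    """Signed mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6

if __name__ == "__main__":
    # Observed values are those quoted in the text; C15H20O7 is an assumption.
    checks = [
        ("peak 3 [M-H]-", 325.0930, mz_deprotonated("C15H18O8")),
        ("triandrin [M-H]-", 311.1133, mz_deprotonated("C15H20O7")),
        ("triandrin [M+HCOO]-", 357.1187, mz_formate_adduct("C15H20O7")),
    ]
    for label, observed, calculated in checks:
        print(f"{label}: calc {calculated:.4f}, error {ppm_error(observed, calculated):+.2f} ppm")
```

Under these assumptions, the three observed values agree with the calculated ones to within about 1.2 ppm, which is consistent with the formula assignments given above.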
A further four peaks can be assigned to chavicol with a diglycosidic chain. Compounds 18 and 20 were isolated, and the MS/MS data of both compounds showed ions at m/z 133, corresponding to the loss of a hexose-pentose fragment (m/z 295). In compound 18, two sugar moieties were confirmed from the 1H- and 13C-NMR data of the anomeric signals (δH/δC 5.04/103.5 and 5.00/111.2) (Tables 3 and S2). A coupling constant of 7.7 Hz for the anomeric signal at δ 5.04 suggested a β-glucoside linkage to the chavicol, whilst the smaller (1.3 Hz) coupling of the anomeric signal at δ 5.00 indicated an α-linkage of the pentose entity. The 1H-NMR signals agreed well with the published literature data for chavicol-α-L-arabinofuranosyl-(1→6)-β-D-glucopyranoside [20]. The furanosyl form of the arabinose moiety was confirmed by the downfield shift of the 1"-anomeric proton, appearing at δ 5.00, and the 13C resonance at δ 111.2 [21]. Chavicol-β-D-apiofuranosyl-(1→6)-β-D-glucopyranoside was the structure assigned to compound 20. This was confirmed by the coalescence of the 5"-H2 signals to a singlet at δ 3.60, which is characteristic of the apiose moiety. In addition, a quaternary signal was evident in the 13C-NMR data at δ 82.3 (Tables 3 and S2). Both compounds 18 and 20 have previously been found in Betula papyrifera Marsh. (Betulaceae) [20], and the NMR data were consistent with the literature for both compounds. Detailed 1H- and 13C-NMR spectroscopic data, including 2D correlations, can be found in Table S2. Peak 19, with the same molecular formula (C20H28O10), was putatively annotated as chavicol-α-L-arabinopyranosyl-(1→6)-β-D-glucopyranoside, because of its proximity to 18 in the chromatographic run (less than 0.2 min later), and because 19 appears as an impurity in the 1H-NMR spectrum of 18, where it is detected via the anomeric proton of the arabinopyranosyl moiety (δH 4.46 ppm, d, J = 8.0 Hz).
The fourth compound eluted at 23.0 min with m/z at 441.1766, and was identified as chavicol-rutinoside (compound 22) [19].
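The MS/MS reasoning for the chavicol diglycosides can be illustrated with simple neutral-loss arithmetic. The following is a minimal sketch under stated assumptions, not the authors' workflow: it assumes that the m/z 133 ion is deprotonated chavicol (C9H10O, a formula not given in the text), that each sugar is lost as its anhydro residue (sugar minus H2O), and that the rutinoside has the formula C21H30O10; all function and variable names are hypothetical.

```python
# Monoisotopic masses (u) and electron mass.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON = 0.00054858

def mass(c, h, o):
    """Monoisotopic mass of a neutral species with c carbons, h hydrogens, o oxygens."""
    return c * MONO["C"] + h * MONO["H"] + o * MONO["O"]

def deprotonated(c, h, o):
    """m/z of the [M - H]- ion of a neutral CcHhOo molecule."""
    return mass(c, h, o) - MONO["H"] + ELECTRON

# Anhydro sugar residues (sugar minus H2O), assumed to be lost on glycosidic cleavage.
RESIDUES = {
    "hexose": mass(6, 10, 5),       # e.g. glucose
    "pentose": mass(5, 8, 4),       # e.g. arabinose, apiose, xylose
    "deoxyhexose": mass(6, 10, 4),  # e.g. rhamnose
}

def aglycone_ion(precursor_mz, sugars):
    """m/z remaining after loss of the listed sugar residues from a precursor ion."""
    return precursor_mz - sum(RESIDUES[name] for name in sugars)

if __name__ == "__main__":
    chavicol = deprotonated(9, 10, 1)       # assumed formula C9H10O
    diglycoside = deprotonated(20, 28, 10)  # C20H28O10, compounds 18-20
    rutinoside = deprotonated(21, 30, 10)   # assumed formula C21H30O10, compound 22
    print(f"[chavicol - H]-           {chavicol:.4f}")
    print(f"18/20 minus hex + pent    {aglycone_ion(diglycoside, ['hexose', 'pentose']):.4f}")
    print(f"22 minus hex + deoxyhex   {aglycone_ion(rutinoside, ['hexose', 'deoxyhexose']):.4f}")
```

With these assumptions, both precursors reduce to the same nominal m/z 133 aglycone ion, and the calculated [M − H]− of the assumed rutinoside formula matches the m/z 441.1766 quoted above.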
Monosubstituted aromatic glycosides were also identified in this study. Two compounds with molecular ions at m/z 401 (compounds 6 and 7) and three at m/z 415 (compounds 8-10) were assigned the molecular formulas C18H26O10 and C19H28O10, respectively, based on accurate masses.
Compound (12) [...] 10. The NMR data of this isolated compound were comparable to those of dihydrocinnamyl alcohol-α-L-arabinofuranosyl-(1→6)-β-D-glucopyranoside (dihydro-rosarin), formerly described in Juniperus communis var. depressa [26]. Compound (16) showed the same molecular formula; however, the NMR data indicated a different sugar moiety (Table 5). The 1D 1H-NMR spectrum revealed the presence of a monosubstituted benzene ring and a hydroxypropyl group, as well as two sugars, a β-glucopyranose (anomeric proton at δH 4.42/δC 105.9) and an α-arabinopyranose (anomeric proton at δH 4.53/δC 104.4). Long-range correlations observed between H-7/C-1 (δH 2.71/δC 145.3) and H-1/C-9 (δH 4.42/δC 72.8) in the heteronuclear multiple bond correlation (HMBC) spectrum indicated how each moiety was connected in the molecule. Thus, the structure of 16 was determined to be dihydrocinnamyl alcohol-α-L-arabinopyranosyl-(1→6)-β-D-glucopyranoside (dihydro-rosavin), and this compound has not previously been reported in the literature.
Although compound 15 could not be purified, the UHPLC-MS chromatogram of the fraction that contains 16 indicates that a small proportion of 15 can be detected in it (two peaks not well resolved, data not shown). In order to find more evidence of the structure of 15, a careful analysis of the 1D 1H and 2D 1H-13C HSQC and HMBC NMR spectra of 16 was performed. Some minor peaks corresponding to a β-apiofuranosyl moiety were observed, namely an anomeric signal at δH 5.07 ppm (d, J = 3.2 Hz)/δC 111.9 ppm (CH-1"); δH 3.96 ppm (d, J = 3.2 Hz)/δC 79.6 ppm (CH-2"); δC 82.3 ppm (C-3"); δH 4.00 ppm (d, J = 10.1 Hz), 3.85 ppm (d, J = 10.1 Hz)/δC 76.7 ppm (CH2-4"); and δH 3.62 ppm (s)/δC 66.5 ppm (CH2-5"). The relative integration of these peaks to the corresponding signals of 16 was 1:3.
Taking these data together with the fact that 15 presented the same molecular formula and mass fragments as 16, and the proximity of the two peaks in the UHPLC run (less than 0.2 min), we suggest that 15 is dihydrocinnamyl alcohol-β-apiofuranosyl-(1→6)-β-glucopyranoside.
In summary, the hybrid of S. triandra and S. dasyclados (NWC1283) contains a much larger array of chemistry, particularly of phenylalkanoid disaccharides, than has previously been reported in either of the parental species. Apart from salidroside (2), which was the only phenylethanoid glycoside previously known in the Salicaceae [27], many of the compounds reported here have not been described from willow before. Of particular note is the appearance of the rosavin, rosarin and chavicol analogues, and the propensity of these and others to contain disaccharide groups, where the "normal Salix" glycosyl moiety is further substituted at C-6 by arabinose, apiose, xylose or rhamnose. This second sugar can be present in either the pyranosyl or the furanosyl form, and evidence of both forms is present for several of the compounds isolated. As evidenced in this study, such compounds can be isolated directly from the chipped biomass with relatively straightforward extraction techniques. Extraction was achieved using 20% aqueous ethanol, a solvent system that is cheap and therefore applicable to larger-scale extractions. Similarly, heating steps were not required to release the compounds reported, and extraction at room temperature preserved the integrity of the compounds. This work also demonstrates the potential of plant breeding to introduce traits that result in new chemistry and potential bioactivities, and that provide the opportunity to add value to basketry and bioenergy crops.
Table 5. NMR data of dihydro-rosarin (12) and dihydro-rosavin (16).

General Experimental Procedures
The 1D 1H and 2D 1H-1H and 1H-13C NMR spectra of each compound were acquired, using a 5 mm triple resonance (TCI) cryoprobe, on a Bruker Avance 600 MHz NMR spectrometer (Bruker Biospin, Germany), operating at 600.05 MHz for 1H and 150.9 MHz for 13C. Typical one-dimensional 1H spectra were obtained with an acquisition time of 4.6 s, a sweep width of 7142.9 Hz and 65,536 data points. A total of 16 scans were recorded using the zgpr pulse sequence with a 90° angle. A relaxation delay of 5 s was used to suppress the residual HOD signal. The spectra were transformed using an exponential window with a line broadening of 0.5 Hz. 1H-1H correlation spectroscopy (COSY) spectra were run using the pulse sequence cosyprqf for 3 h, and the frequency was 600.05 MHz in both dimensions. The acquisition times were 0.1434 and 0.0896 s, and the sweep widths were 7142.9 Hz. There were 1024 data points collected in each dimension using 32 transients. 1H-13C heteronuclear single quantum coherence (HSQC) spectra were acquired using the pulse sequence hsqcetgpsi2 for 10 h, at 600.05 and 150.9 MHz, with acquisition times of 0.1433 and 0.00212 s. The data were acquired using sweep widths of 7142.9 and 30,120.5 Hz. There were 2048 and 1024 data points collected using 128 transients. The 1H-13C heteronuclear multiple bond correlation (HMBC) spectra were obtained using the pulse sequence hmbcgpndqf for 22 h. The acquisition parameters were the same as stated for the HSQC data collection. For comparability with previous work [10], all of the spectra were collected at 300 K in D2O:CD3OD (8:2), and chemical shifts are given in δ, relative to TSP-d4 ((trimethylsilyl)propionic acid, 0.01% w/v) added as a chemical shift reference standard. The compound concentration was typically 1 mg/mL. Phasing and baseline correction were carried out within TOPSPIN v. 2.1 (Bruker Biospin, Germany).
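The one-dimensional acquisition parameters quoted above are mutually consistent, as a quick calculation shows. This sketch assumes the usual Bruker convention in which the acquisition time equals the number of time-domain points divided by twice the sweep width in Hz; the function names are hypothetical and the code is not part of the original methods.

```python
def acquisition_time(points, sweep_width_hz):
    """Acquisition time in seconds, assuming AQ = TD / (2 * SW), with SW in Hz."""
    return points / (2.0 * sweep_width_hz)

def sweep_width_ppm(sweep_width_hz, spectrometer_mhz):
    """Spectral width in ppm: width in Hz divided by the observe frequency in MHz."""
    return sweep_width_hz / spectrometer_mhz

if __name__ == "__main__":
    # Values quoted for the 1D 1H experiment.
    aq = acquisition_time(65536, 7142.9)
    print(f"1H acquisition time: {aq:.2f} s")
    print(f"1H sweep width: {sweep_width_ppm(7142.9, 600.05):.1f} ppm")
    # 13C dimension of the HSQC and HMBC experiments.
    print(f"13C sweep width: {sweep_width_ppm(30120.5, 150.9):.1f} ppm")
```

Under this assumption, 65,536 points over a 7142.9 Hz window correspond to roughly 4.6 s, matching the stated acquisition time, and the sweep widths correspond to about 11.9 ppm for 1H and about 200 ppm for 13C.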
Structural assignments of carbohydrate moieties were made with reference to the authentic standards and the use of characteristic chemical shift data [21].
UHPLC-MS data were recorded on an LTQ-Orbitrap Elite mass spectrometer (Thermo Fisher, Bremen, Germany) coupled to a Dionex UltiMate 3000 RS UHPLC system, equipped with a DAD-3000 photodiode array detector. Separation was carried out in reversed-phase mode using a Hypersil GOLD™ column (1.9 µm, 30 × 2.1 mm i.d., Thermo Fisher Scientific, Germany), which was maintained at 35 °C. The solvent system consisted of water/0.1% formic acid (A) and acetonitrile/0.1% formic acid (B), both Optima™ grade (Thermo Fisher Scientific, Germany). The injection volume was 10 µL and separation was carried out for 40 min with a flow rate of 0.3 mL/min under the following gradient: 0-5 min, 0% B; 5-27 min, 31.6% B; 27-34 min, 45% B; and 34-37.5 min, 75% B. The mass spectra were collected in negative ion mode using a heated electrospray source (Thermo Fisher Scientific, Germany). The resolution was 120,000 over m/z 50-1500, and the source voltage, sheath gas, auxiliary gas, sweep gas and capillary temperature were set to 2.5 kV, 35 (arbitrary units), 10 (arbitrary units), 0.0 (arbitrary units) and 350 °C, respectively. Automatic MS-MS was performed on the four most abundant ions using an isolation width of m/z 2. The ions were fragmented with a normalised collision energy of 65 and an activation time of 0.1 ms, using high-energy C-trap dissociation. The data were collected and inspected using Xcalibur v. 2.2 (Thermo Fisher Scientific, Germany).
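For anyone re-implementing the UHPLC method, the gradient can be written as a table of breakpoints. The sketch below is one possible reading, not a statement of the instrument programme: it assumes that each listed %B value is reached at the end of its time segment by a linear ramp, and the names GRADIENT and percent_b are hypothetical.

```python
from bisect import bisect_right

# (time in min, %B) breakpoints. Assumption: each stated %B is the value reached
# at the END of its segment, with linear ramps between breakpoints.
GRADIENT = [(0.0, 0.0), (5.0, 0.0), (27.0, 31.6), (34.0, 45.0), (37.5, 75.0)]

def percent_b(t_min, table=GRADIENT):
    """Linearly interpolated %B at time t_min (minutes) within the programmed gradient."""
    times = [t for t, _ in table]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError(f"time {t_min} min is outside the programmed gradient")
    i = bisect_right(times, t_min)
    if i == len(times):  # exactly at the final breakpoint
        return table[-1][1]
    (t0, b0), (t1, b1) = table[i - 1], table[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

if __name__ == "__main__":
    for t in (0, 5, 16, 27, 30.5, 37.5):
        print(f"{t:>5} min: {percent_b(t):.1f}% B")
```

On this reading, the glycosides reported here, which elute between about 15 and 23 min, leave the column while the mobile phase contains roughly 14 to 26% acetonitrile.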
The compounds were isolated by repeated injection into an HPLC system (Dionex UltiMate 3000, Thermo Fisher Scientific) equipped with an Ascentis C-18 column (5 µm, 5 × 250 mm i.d., Sigma-Aldrich, Gillingham, UK). The column was maintained at 35 °C and chromatographic separation was performed using a constant flow rate of 1 mL/min. The mobile phases were water (A) and acetonitrile (B), both containing 0.1% formic acid. To achieve separation, the gradient used was as follows: 0-2 min, 5% B; 2-5 min, 12% B; 5-10 min, 12% B; and 10-60 min, 40% B. The peaks were detected using UV wavelengths of 210 to 360 nm, and fractions corresponding to the target compounds were collected into glass tubes. Twelve injections (100 µL each) were performed, the fractions from repeated runs were combined, and the solvent was evaporated using a Speedvac concentrator (Genevac, Suffolk, UK).

Plant Material and Metabolite Extraction
The previous crop was winter wheat. The soil type within the field is silty clay loam with flints over clay. Planting and agronomy followed conventional SRC best practice. Willows were planted as cuttings using the typical twin-row design at a planting density of 16,667 plants ha−1. No fertilisers were applied in the establishment year, and two pre-emergence herbicides, pendimethalin (Stomp at 3.3 L/ha) and isoxaben (Flexidor at 1.0 L/ha), were applied within 10 days of planting. The genotypes were planted in unreplicated plots, each containing 80 plants (two twin rows of 40).
The winter-dormant above-ground biomass was harvested in January 2018. The stems were cut 5 cm above the ground, and were immediately chipped in a Petrol Shredder 100 mm (4"). The material was stored at −80 °C and then freeze-dried. A voucher specimen has been retained and is available on request.
The chips (60 g) were extracted at room temperature, by soaking with 400 mL water:ethanol (4:1) for 16 h. Aliquots were taken for initial metabolite profiling by 1 H-NMR and UHPLC-MS and for compound isolation by HPLC.