Dataset on the Effects of Different Pre-Harvest Factors on the Metabolomics Profile of Lettuce (Lactuca sativa L.) Leaves

The study of the relationship between cultivated plants and environmental factors can provide information ranging from a deeper understanding of the plant biological system to the development of more effective management strategies for improving yield, quality, and sustainability of the produce. In this article, we present a comprehensive metabolomics dataset of two phytochemically divergent lettuce (Lactuca sativa L.) butterhead varieties under different growing conditions. Plants were cultivated in hydroponics in a growth chamber with ambient control. The pre-harvest factors that were independently investigated were light intensity (two levels), the ionic strength of the nutrient solution (three levels), and the molar ratio of three macroelements (K, Mg, and Ca) in the nutrient solution (three levels). We used an untargeted, mass-spectrometry-based approach to characterize the metabolomics profiles of leaves harvested 19 days after transplant. The data reveal a broad impact of these factors on both primary and secondary metabolism, together with its range of variation. Moreover, our dataset is useful for uncovering the complex effects of the genotype, the environmental factor(s), and their interaction, which may deserve further investigation.

common types identified by breeders and retailers are romaine, butterhead, crisphead, and stem/stalk lettuce, along with leaf and Latin lettuce [2,4]. Even if significant differences exist among regional markets, the butterhead lettuce is arguably the most popular cultivated type [5]. Butterhead lettuces produce broad, wrinkled leaves that form a less compact and smaller head compared to crispheads. The butterhead group is also characterized by a strong divergence in the color of the outer leaves, which range from deep red/violet to light green.
Considering the market demand for year-round production of easy-to-clean heads, a short growing cycle, and very limited plant waste, the professional production of lettuce relies largely on soilless cultivation techniques, mainly hydroponics [6]. Specifically, the nutrient film technique is a very common system for lettuce, because of its versatility and the low water and nutrient consumption [7]. Soilless cultivation is valuable for leafy vegetables to achieve higher yield and quality, and it offers the advantage of more standardized cultural practices [8]. For instance, mineral availability can be tightly controlled by dissolving minerals in water to generate a nutrient solution (NS). More crucially, an important scientific benefit is that the combination of hydroponics with the use of growth chambers allows the control of several experimental factors while increasing repeatability [9].
Over the past decade, metabolomics has emerged as a useful technology for providing a detailed biochemical picture of a biological system [10]. The large-scale identification of small metabolites has various applications in plant science, and often, metabolomics has been used to identify compounds and related pathways that influence valuable phenotypes or biological functions [11,12]. A primary goal of agricultural sciences is to increase the accumulation of economically important and biologically functional phytochemicals, as well as to reduce undesirable compounds in the edible product. Under this perspective, metabolomics is widely employed to analyze the variations of compounds that are known to affect the quality (e.g., nutritional value, appearance, flavor, aroma, and palatability) of fresh vegetables under a variety of experimental factors [13,14]. Pre-harvest quality is a complex trait that depends on both qualitative and quantitative plant traits [15,16], and it is well-known that yield and product quality depend on genetic factors, the environment, and their interaction.
In this work, we present a dataset derived from the metabolomics analysis of two lettuce (L. sativa L. var. capitata) butterhead varieties, "Descartes" RZ (hereafter "Green Salanova") and "Klee" RZ (hereafter "Red Salanova"), which differ in leaf coloration (light green and full red, respectively) (www.salanova.com). The metabolomics data of the leaves describe the influence of three major pre-harvest factors, namely, two levels of light intensity, three levels of the ionic strength of the NS, and three ratios of some macrocations (K, Ca, and Mg). For each pre-harvest factor, experiments were carried out independently in the same open-gas-exchange growth chamber, using hydroponics (nutrient film technique) (Supplementary Figure S1). Further details on the experimental factors, as well as on the plant characteristics, can be found in the parent scientific papers [17-19].
Besides fostering new collaborations, the value of this dataset, in our opinion, is multifold. First, sharing the data will allow independent discoveries that, for instance, may stem from the meta-analysis of the different experimental factors, as well as from pre-testing assumptions on specific metabolic pathways or biochemical classes [20,21]. Moreover, researchers interested in evaluating the role of the genetic differences in lettuce can focus their analysis on the search for the heritable factors that underlie the metabolomics profiles of the two different genetic backgrounds [22,23]. In addition, the fluctuations in metabolites presented here can be exploited to develop hypotheses for a deeper understanding of the causal relationship between the environment and the leaf phytochemical profile [24]. Additionally, investigators can aim at understanding the quantitative dynamics of quality-related traits in relation to multifactorial environmental conditions, or distinct components of metabolic routes [25,26]. Finally, the metabolomic response to different perturbations provides information that, in the long term, will contribute to modeling the lettuce biological system [27,28]. Considering recent advances in lettuce genomics, we also hope that the reconstruction of the lettuce metabolic system will be able to exploit an accurate genome annotation and experimentally validated species-specific biochemical data [29]. All this knowledge is essential to model crop biochemical interaction with the environment, and to establish a genome-scale network that controls the most commercially important and nutritionally valuable features of the edible product [30].

Data Description
Data are organized in tabular form in a Microsoft Excel workbook (.xlsx), available as Supplementary Table S1. The workbook has three spreadsheets, each referring to one experimental factor, namely the ionic strength of the NS, the macrocation composition of the NS, and the light intensity [31,32]. Specifically, in each sheet, samples are presented in columns. For each column, the first two cells report the genotype and the experimental condition, using the abbreviation code indicated above. Moreover, individual samples under each experimental condition are distinguished by a unique code starting with an Arabic numeral, present in the third cell of a column/sample (Supplementary Figure S2). It is therefore possible to merge the three sheets by (common) row names without losing the sample identifier. The remaining cells of each column/sample report the peak intensity value (raw abundance of extracted ion current). According to the software used (Agilent Mass Profiler Professional B.12.06), missing/uninformative values are reported as 1.
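The merge by row names mentioned above can be sketched as follows. This is a minimal illustration, not the procedure used to build the workbook: the row names, sample codes, and intensities are invented, and filling compound/sample pairs that are absent from a sheet with the missing-value code is an assumption made here for simplicity.

```python
# Invented miniature of the three sheets: {row name: {sample code: intensity}}.
ionic = {"CPD-1": {"1_a": 5200.0, "2_a": 1.0}, "CPD-2": {"1_a": 880.0, "2_a": 910.0}}
cations = {"CPD-1": {"1_b": 4100.0}, "CPD-3": {"1_b": 300.0}}
light = {"CPD-2": {"1_c": 760.0}, "CPD-3": {"1_c": 1.0}}

MISSING = 1.0  # the workbook reports missing/uninformative values as 1


def merge_sheets(sheets):
    """Outer-join sheets on row name; columns become (sheet, sample) pairs."""
    columns = []
    for sheet_name, rows in sheets.items():
        seen = []  # sample codes of this sheet, in first-seen order
        for intensities in rows.values():
            for sample in intensities:
                if sample not in seen:
                    seen.append(sample)
        columns.extend((sheet_name, sample) for sample in seen)

    row_names = sorted({name for rows in sheets.values() for name in rows})
    merged = {}
    for name in row_names:
        # Assumption: a compound absent from a sheet is filled with MISSING.
        merged[name] = {
            (sheet_name, sample): sheets[sheet_name].get(name, {}).get(sample, MISSING)
            for sheet_name, sample in columns
        }
    return columns, merged


columns, merged = merge_sheets({"ionic": ionic, "cations": cations, "light": light})
```

Keeping the sheet name in each column key preserves the sample identifier even if two experiments happened to reuse the same code.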
The identified chemical compounds are in rows, and the CPD ID (PlantCyc annotation code, available at plantcyc.org) is reported in the first column. In addition, in the last two columns of each sheet, we provide a common chemical name for each compound and the composite spectrum (mass-abundance combinations).
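To make the layout concrete, the sketch below reads a small, invented sheet (exported as CSV text) into one record per compound and sample. The condition codes, sample codes, compound names, and the exact position of the column headers are assumptions for illustration; only the general arrangement (three header cells per sample, CPD ID first, name and composite spectrum last, 1 as the missing-value code) follows the description above.

```python
import csv
import io

# Invented miniature of one sheet. Assumed layout: column 0 = CPD ID,
# last two columns = common name and composite spectrum, samples in between.
SHEET_CSV = """\
,Green,Green,Red,Red,,
,LL,HL,LL,HL,,
CPD ID,1_x,2_x,3_x,4_x,Name,Composite spectrum
CPD-0001,5200,1,4800,5100,compound A,"(181.05, 5200)"
CPD-0002,1,1,950,1020,compound B,"(303.05, 950)"
"""


def read_sheet(text):
    """Return long-form records; intensities equal to 1 become None."""
    rows = list(csv.reader(io.StringIO(text)))
    genotypes, conditions, sample_ids = rows[0], rows[1], rows[2]
    sample_cols = range(1, len(sample_ids) - 2)  # skip ID and last two columns
    records = []
    for row in rows[3:]:
        for col in sample_cols:
            value = float(row[col])
            records.append({
                "cpd_id": row[0],
                "name": row[-2],
                "genotype": genotypes[col],
                "condition": conditions[col],
                "sample": sample_ids[col],
                "intensity": None if value == 1 else value,
            })
    return records


records = read_sheet(SHEET_CSV)
```

A long-form table of this kind is convenient for grouping by genotype and condition, at the cost of repeating the compound annotation on every record.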

Experimental Design and Sample Collection
Plants grew in the same 28 m² open-gas-exchange growth chamber equipped with high-pressure sodium lamps (Master SON-T PIA Plus 400W, Philips, Eindhoven, The Netherlands) at the Department of Agricultural Sciences, University of Naples Federico II (Portici, Italy). Air temperature was set at 24/18 °C (light/dark) with a 12 h photoperiod and a relative humidity maintained within 60-80% by a fog system. Seeds of two lettuce varieties, Descartes RZ (also known as 'Green Salanova') and Klee RZ (also known as 'Red Salanova'), both from Rijk Zwaan (De Lier, The Netherlands), were sown in vermiculite. Seedlings were transplanted 15 days later, at the two-true-leaf stage, into 7 cm side rockwool cubes (Delta, Grodan, Roermond, The Netherlands), and plants were cultivated using the nutrient film technique with closed polypropylene gullies. The main parameters were as follows: 1.5 L min⁻¹ nutrient flow, 1% inclination of the gullies, and 25 L maximum tank capacity. The number of replications, the detailed composition of the various nutrient solutions, and other technical parameters were as already reported [17-19,33,34]. Leaves were harvested 19 days after transplant.

Chemicals and Isolation from Leaves
Samples (1.0 g of leaf dry weight) were extracted in 20 mL of 0.1% formic acid in 80% aqueous methanol (LC-MS grade; VWR International, Milan, Italy) as previously described [35]. An Ultra-Turrax (Ika T-25, Staufen, Germany) was used for sample homogenization and extraction, and the extracts were centrifuged at 12,000 × g (Eppendorf 5810R, Hamburg, Germany) and then filtered through a 0.2 µm cellulose disposable syringe cartridge into amber glass vials for analysis.

Ultra-High-Performance Liquid Chromatography Mass Spectrometry (UHPLC-MS/MS)
Untargeted metabolomics was carried out by ultra-high-performance liquid chromatography (UHPLC) coupled to quadrupole time-of-flight (QTOF) mass spectrometry, employing a 1290 ultra-high-performance liquid chromatograph and a G6550 iFunnel QTOF mass spectrometer (Agilent Technologies, Santa Clara, CA, USA) as described [36]. An electrospray ionization source was used in positive polarity. Chromatography was achieved in reverse-phase mode using an Agilent PFP column (2.0 × 100 mm, 3 µm) and a mobile phase of acetonitrile in water (6% to 94%, in 33 min runtime with a flow rate of 200 µL min⁻¹). The mass spectrometer was operated in SCAN mode (100-1000 m/z), with a nominal resolution of 40,000 FWHM and in the extended dynamic range mode. The injection sequence was randomized, and quality control samples (QCs, made by pooling an aliquot of each extract) were injected throughout the sequence. Quality controls were separated under the same chromatographic conditions used for samples but analyzed in data-dependent MS/MS mode, using 12 precursors per cycle (1 Hz, 50-1000 m/z, positive polarity, active exclusion after two spectra) with collision energies of 10, 20, and 40 eV.

Feature Extraction and Data Pre-Processing
Raw spectra were processed using the "find-by-formula" algorithm (i.e., a combination of monoisotopic mass, isotope spacing, and isotope ratio) in the Agilent Profinder B.07 software (Agilent Technologies), with tolerances of 5 ppm for mass and 0.05 min for retention time alignment. Compounds were putatively annotated against the database PlantCyc 12.6 (Plant Metabolic Network, http://www.plantcyc.org; downloaded April 2018), according to Level 2, with reference to the COSMOS (Coordination of Standards in MetabOlomicS) metabolomics standards initiative [37]. Only the compounds annotated in at least 75% of replications within at least one treatment were retained for further analysis. Quality controls (QCs) obtained from pooled samples were used to achieve a higher degree of confidence in annotation using MS-DIAL 3.98, by comparing the experimental MS/MS spectra to the publicly available MS/MS libraries built into the software (e.g., MoNA) [38]. Post-annotation processing was finally carried out using Agilent Mass Profiler Professional B.12.06 (Agilent Technologies), as previously described [36].
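The retention rule (annotation in at least 75% of replications within at least one treatment) can be illustrated with a short sketch. It is not the vendor software's implementation: the treatment labels and intensities are invented, and undetected replicates are represented here as None by assumption.

```python
def detection_rate(values):
    """Fraction of replicates in which the compound was detected (not None)."""
    return sum(v is not None for v in values) / len(values)


def keep_compound(intensities_by_treatment, threshold=0.75):
    """Keep a compound if any single treatment reaches the detection threshold."""
    return any(
        detection_rate(values) >= threshold
        for values in intensities_by_treatment.values()
        if values  # ignore treatments without replicates
    )


# Invented examples with four replicates per hypothetical treatment.
kept = {"T1": [5200.0, None, 4800.0, 5100.0], "T2": [None, None, 900.0, None]}
dropped = {"T1": [None, None, 300.0, 310.0], "T2": [None, 200.0, None, 250.0]}

print(keep_compound(kept))     # True: 3 of 4 replicates detected in T1
print(keep_compound(dropped))  # False: no treatment reaches 75%
```

Because the rule is applied per treatment, a compound that is consistently present under only one condition is retained, which is why some retained compounds still carry missing values in other treatments.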

User Notes
The data can be easily exported to a comma-separated values (.csv) format for processing (for instance, in R) with applications designed for metabolomics or with other statistical software, as previously reported [33,34]. Depending on the destination software, it may be worth checking the delimiter (typically, a single comma or semicolon) and enabling or disabling the check for syntactically valid variable names. For the R environment, it is advisable to replace uninformative values with "NA", although in our experience this step is not strictly necessary, because these data are typically filtered out.
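As a hedged illustration of these two checks, the sketch below guesses whether an exported file uses a comma or a semicolon and rewrites intensity cells equal to 1 as "NA". It is written in Python using only the standard library; the file content, column names, and the number of header rows and label columns are invented and would need to be adapted to the actual export.

```python
import csv
import io


def guess_delimiter(text, candidates=(",", ";")):
    """Pick the candidate that splits the first line into the most fields."""
    first_line = text.splitlines()[0]
    return max(
        candidates,
        key=lambda d: len(next(csv.reader([first_line], delimiter=d))),
    )


def is_missing(cell):
    """True if the cell is numeric and equal to 1 (the missing-value code)."""
    try:
        return float(cell) == 1.0
    except ValueError:
        return False


def replace_missing(text, na="NA", n_header_rows=1, n_label_cols=1):
    """Return comma-delimited text with missing intensities rewritten as `na`."""
    rows = csv.reader(io.StringIO(text), delimiter=guess_delimiter(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, row in enumerate(rows):
        if i >= n_header_rows:  # leave header rows (e.g., sample codes) untouched
            labels, cells = row[:n_label_cols], row[n_label_cols:]
            row = labels + [na if is_missing(c) else c for c in cells]
        writer.writerow(row)
    return out.getvalue()


# Invented semicolon-delimited export with one header row.
exported = "CPD ID;s1;s2;Name\nCPD-1;5200;1;compound A\nCPD-2;1.0;880;compound B\n"
cleaned = replace_missing(exported)
```

For a sheet exported with its three header cells per sample, `n_header_rows=3` would keep the genotype, condition, and sample code rows unchanged.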