The Chloroplast Genome of Endive (Cichorium endivia L.): Cultivar Structural Variants and Transcriptome Responses to Stress Due to Rain Extreme Events

The chloroplast (cp) genome diversity has been used in phylogeny studies, breeding, and variety protection, and its expression has been shown to play a role in stress response. Smooth- and curly-leafed endives (Cichorium endivia var. latifolium and var. crispum) are of nutritional and economic importance and are the target of ever-changing breeding programmes. A reference cp genome sequence was assembled and annotated (cultivar ‘Confiance’), which was 152,809 base pairs long, organized into the angiosperm-typical quadripartite structure, harboring two inverted repeats separated by the large- and short- single copy regions. The annotation included 136 genes, 90 protein-coding genes, 38 transfer, and 8 ribosomal RNAs and the sequence generated a distinct phyletic group within Asteraceae with the well-separated C. endivia and intybus species. SSR variants within the reference genome were mostly of tri-nucleotide type, and the cytosine to uracil (C/U) RNA editing recurred. The cp genome was nearly fully transcribed, hence sequence polymorphism was investigated by RNA-Seq of seven cultivars, and the SNP number was higher in smooth- than curly-leafed ones. All cultivars maintained C/U changes in identical positions, suggesting that RNA editing patterns were conserved; most cultivars shared SNPs of moderate impact on protein changes in the ndhD, ndhA, and psbF genes, suggesting that their variability may have a potential role in adaptive response. The cp transcriptome expression was investigated in leaves of plants affected by pre-harvest rainfall and rainfall excess plus waterlogging events characterized by production loss, compared to those of a cycle not affected by extreme rainfall. Overall, the analyses evidenced stress- and cultivar-specific responses, and further revealed that genes of the Cytochrome b6/f, and PSI-PSII systems were commonly affected and likely to be among major targets of extreme rain-related stress.


Introduction
Endives are widely consumed in Europe as fresh or minimally processed and packaged salads with healthy properties [1].They belong to the species Cichorium endivia (fam.Asteraceae) with the botanical varieties crispum and latifolium, which provide the genetic pools for the breeding of curly-and smooth-leafed commercial cultivars [2].Endive is a low-input and cold-tolerant crop, although it is quite sensitive in terms of yield and quality to some stresses such as waterlogging or heavy rainfall [3].More than 90% of the Italian production takes place outdoors (2037 tons in 2021; http://dati.istat.it(accessed on 13 September 2023)) and commercial varieties have been selected to guarantee quality standards under variable environmental conditions and continuous production cycles that overcome seasonality.In the current context of climate change, the intensification of unpredictable rainfall excess events (together with inadequate drainage systems and the soil's intrinsic inability to store it) causes extreme damage to this crop even from shortterm waterlogging [4].Roots are the first anoxia-affected targets by soil anaerobiosis that trigger multi-level responses of the aerial organs resulting in stomatal closure, reduced leaf chlorophyll content and photosynthesis, and accelerated senescence [5].The wholeplant response that alters nutrient contents is complex and varies with stress intensity and genotypes, for instance, a cultivar-specific carbohydrate drop occurred in endive challenged by natural short-term waterlogging [6].Contextually, the chloroplast (cp) plays a central role in photosynthesis [7] and its genome responds to mechanical stresses/stimuli [8,9] such as water spray, precipitation hits and wounding of biotic or abiotic origin (e.g., pest bites or hail).Studies on the effects of heavy rainfall on plastid organelles are scarce and needed on a broad scale because of the expected increase in frequency and severity of extreme rainfall events [10].Angiosperm cp genomes maintain a quadripartite structure [11,12].However, the occurrence of sequence diversity [13] has allowed refined phylogenetic classification of the Asteraceae family into species, tribes, and subtribes [14,15].Moreover, SNP markers have been useful for intraspecific genotyping with the aim of genetic breeding [12,16,17].In this context, several resources (https://npgsweb.ars-grin.gov/gringlobal/search(accessed on 13 September 2023)) and markers are available for endive genetics [18], ranging from a reference genome sequence [19] and tissue-specific transcriptomes [6,20], although the cp genome sequence has not yet been published.Here, a chloroplast reference genome was assembled by DNA-Seq and structural variation in curly and smooth cultivars was further explored by RNA-Seq data.The cp transcriptome response was monitored in cultivars from two case studies of heavy pre-harvest rainfall (affecting production) compared to a standard cycle without extreme events.Gene expression analyses revealed a number of photosynthesis genes that showed similar expression changes, suggesting that they are common targets of extreme precipitation and may contribute to the resulting nutrient (sugar) loss.

Plant Material, Growth Conditions, Samplings and Studied Models
Enza Zaden Italy s.r.l.provided seeds and plants of curly-and smooth-leafed genotypes, which were patented cultivars and experimental lines, respectively identified by names and number codes.Curly-types: 'Domari' (D), 'Myrna' (M), 'Imari' (I), 'A32861' (A32); smooth-types: 'Confiance' (C), 'Flester' (F), 'E02S.0338'(E02S).The cultivation cycles spanned September-November 2011 and 2012; the dates of outdoor seedling transplant and harvest were 14/9/2011-17/11/2011 and 14/9/2012-15/11/2012.The growth occurred in the same parcels at Tarquinia (Lazio, Italy, 42 • 15' N 11 • 44' E, 31 m a.s.l.).Standard cultivation procedures, soil, water content variation, rainfall (R), relative humidity (RH), and temperature (T) were previously detailed [6,21,22].In 2013, D and F only were grown in Conversano fields (41 • 00' N 17 • 50' E, 100 m a.s.l.) in soils with similar characteristics as Tarquinia's [22] using standard interventions.Transplant and harvest dates were 15/9/2013-22/11/2013. Table S1 reports monthly average values of R, RH, and T of the entire cycle (September-November), while R daily values (Figure S1A) and T weekly values (Figure S1B) regard one month before harvest.The year 2011 was used as a reference because unexpected events of excess rain did not happen.Synoptically, R was much higher in 2012 than in 2011 (+200%) before harvest (November values in Table S1), while a low increase of T and RH was recorded (+5 and +7%).The extreme rain event of 11/11/2012 (Figure S1A) caused soil saturation (0 kPa) and root-zone waterlogging for 72 h, after which draining to field capacity (10 kPa) was restored [6].In the cultivation of 2013, R was +49%, RH −3%, T +5% (November values in Table S1), and the rainfalls of 12 and 19/11/2013 (Figure S1) did not cause waterlogging.The transcriptomic analyses were conducted using D, M, C, and F grown in 2012, and D and F grown in 2013 compared to respective cultivars grown in 2011.In all experiments, sampling included 9 heads per cultivar, accurately selected according to market standards.The leaves (n = 10) intended for consumption were cut from each head and pooled into a cultivar-specific bulk, which was subdivided into 3 replicate batches (RB) of 30 leaves each.RBs of equal weights were frost in liquid nitrogen and gently crunched, and aliquots were ground to powder and used for nucleic acid isolation.As for variant studies based on RNA-Seq, all genotypes were used.
For gene expression analyses, HISAT2 (ver 2.2.1, parameter "dta") and StringTie (v2.2.1) were respectively used to map the reads on the reference genome and assemble them into transcripts.Transcript-level count data were used to estimate gene-level abundances by the R-package Tximport (v.1.28.0).Analysis of differentially expressed genes (DEG) was achieved by DESeq2 package [27] and by selecting genes with false discovery rate (FDR) ≤ 0.05 and an absolute log2 fold change ≥1.

Phylogenetic Trees
A phylogenetic tree was constructed using 17 chloroplast genome sequences from species (listed below) of the Cichorieae tribe.MAFFT v.7.490 (https://mafft.cbrc.jp/alignment/software/ (accessed on 13 September 2023)) produced sequence alignment (Data S1) and MEGA11 (https://www.megasoftware.net/(accessed on 13 September 2023)) produced both the distance matrix and the Maximum Likelihood tree with 1000 bootstrap iterations.The phylogenetic tree of botanical varieties and cultivars of endive was built using SNP concatemers, which generated FASTA sequence files further processed by MEGA11.

Metabolite Profiling of Hydro-Soluble Compounds by NMR
The whole methodology to achieve water-soluble fraction starting from 25 mg of lyophilized ground leaves of endives and producing both NMR spectra (Bruker AVANCE 600 NMR; proton freq. of 600.13 MH) and metabolite concentrations were detailed previously [6].

Phylogenetic and Sequence Variant Analyses
The 'Confiance' chloroplast genome sequence was included in a phylogenetic tree construction that was restricted to species of Cichorioideae subfamily (Figure 2) to address whether chloroplast DNA polymorphisms could separate the subtribe groups.The genus Cichorium formed a phyletic group per se with the well-separated endivia and intybus species at the level of subtribes.Structurally, the 'Confiance' cp genome harboured SSR variants (Tables 2 and 3) and RNA editing sites (Table S2).Di-, tri-, and tetra-nucleotides were scored a maximum of six times (Table 2); briefly, the three-time repeated trinucleotides were the most frequent (65 times) and the AAT/ATT motifs prevailed (34.3%).Moreover, out of 48 SSRs, 43 fell in 24 in mRNAs, 3 in tRNAs, and 2 in rRNAs, and the ycf2 genes contained the highest SSR number, followed by the ndh group (Table 3).
Cytosine to uracil (C/U) changes are typical of Asteraceae, as variant calling on the DNA-Seq data of 'Confiance' did not reveal any variants (not shown), DNA/RNA-Seq comparison within the cultivar (Table S2) identified 20 C/U events (homo + het) out of 104 SNPs (Table 4) falling in 12 protein-coding genes, and all cultivars showed C/U homozygous switches in 13 identical positions (Table S2, grey shaded rows), suggesting that RNA editing was conserved.Nucleotide variation was further explored using RNA-Seq data of seven cultivars that showed 70 to 98% coverage vs. the 'Confiance' reference genome (Figure 3 and Table 4).
Focusing on cultivars with >90% coverage, the variant number was higher in smooth (C and F) than curly (M and D) ones, 141-152 vs. 113-129.As for all cultivars, the normalised frequency of variants (total variants/RNA-Seq covered region) ranged from 0.79 in M to 1.08 in A32 (Table 4).A phylogenetic tree was built by SNP concatemers using 7 cultivars; separation into the curly and smooth botanical varieties was observed though supported by modest bootstrap values (Figure S3).As for the SNP putative effects (Table 5), 10 were intergenic and 41 lay in coding regions; variants with high and moderate impacts were ca.55%, the high ones (5.9%) were of stop-lost/gained types, whilst the moderate ones (49%) were all missense.Private SNPs were assigned (Figure 4) in those regions that were fully covered by the reads of all cultivars and found as heterozygous type only in I (2), M (1), and E02S (5).The private SNPs of M, I, and E02S (1, 2, and 4, respectively) fell into protein-coding genes, and a high-impact variant occurred in rpl14 of Imari (Table 6).Among all cultivars, six genes (Table 7) contained Genes 2023, 14, 1829 9 of 15 homozygous SNPs of high or moderate impact that are predicted to cause protein changes; specifically, the ndhD, ndhA, psbF genes shared moderate events in 6 out of 7 cultivars.variant annotation in the format: "reference amino acid" "position" and "changed amino acid".*, SNP occurrence.The notation "Ter598Gluext*?"refers to the loss of the normal termination site, occurring at position 598, its substitution by a Glu-codon, and the addition of a tail of new amino acids of unknown length (position *?), since the shifted frame does not contain a new stop codon.

Cp Transcriptome Response to Heavy Rain-Related Stress Before Harvest
Two case studies related to cultivation that underwent rain excess before harvest (Figure S1) were used to compare leaf cp expression with that of a cycle unaffected by extreme events.As for the high vs. moderate rainy year (HRY vs. MRY), a 60% rain surplus occurred in November (Table S1, bold values), and loss of several nutrients was recorded in D and M cultivars by NMR survey (Table S3), with significant impact on carbohydrate levels.In the second case study, cultivars D, M, C, and F were affected by excessive rainfall and waterlogging as described above [6].
Looking at the cp transcriptome variation of HRY vs. MRY (Table 8, left columns), we scored 14 differentially expressed genes (DEGs) with modest fold change variation (min and max values were −1.27 and 1.51).The smooth F was more responsive than the curly D (13 vs. 8 DEGs); both cultivars shared the up-regulation of 2 genes belonging to cytochrome b 6 f (Cyt.b6/f) and 2 of the large subunit ribosomal protein groups (LS-RP), and the down-regulation of 3 genes of the photosystem I and II groups (PSI and II).Moreover, it was observed the cultivar F-specific variation of 6 genes, which were scattered in the groups of NADH-oxidoreductases (NADH-OR), PSI and II, RNA polymerases (RNA-POL), and small subunits ribosomal protein (SS-RP), while cultivar D-specific variation included 1 gene of the PSI group.As for the cp expression in HRY + WL vs. MRY (Table 8, right columns), a total of 25 genes among the four cultivars showed expression variation ranging from −2.14 to 2.08 fold changes; the smooth C and F were more responsive than curly D and M (18-22 vs. 13-14 DEGs).All cultivars shared the down-regulation of 5 genes within the PSI and PSII groups, and the up-regulation of 3 genes of the Cyt.b6/f and 1 (very modest) of the NADH-OR group; cultivar-specific expression also occurred (e.g., rpoC2 and rpl18 were unique DEG in 'Confiance').Overall, the altered expression pattern recurred in genes of the Cyt.b6/f,PSI, and PSII systems in the two case studies, suggesting that these may be among the major targets of extreme rain.

Discussion
The endive chloroplast genome here sequenced is conformed in the typical quadripartite structure [11,12] and its size falls in the range of species of the Cichorieae tribe [28].Narrowing comparisons between the C. intybus and endivia close species, we observed that the latter showed a 166 bp smaller genome [29] due to shorter LSC (−319 bp) and longer IR and SSC (21 and 132 bp), higher n. of genes (136 vs. 127), of protein coding ones (90 vs. 74), and of tRNAs (38 vs. 29), and lower n. of rRNAs (8 vs. 24).Moreover, pafI and clpP contained two introns in both species, which also showed similar GC content (37.7 vs. 37.3%).Chloroplast genome variation within an individual (heteroplasmy) can be due to variable orientation of SSC and IR zones (defining distinct haplotypes) or due to DNA sequence polymorphism [30,31].The endive plastome was constructed by short reads and did not allow the assembly of two haplotypes, which are known to occur in lettuce through long-read sequence technology [30].However, several polymorphic events were revealed within the same cultivar by multiple RNA seq runs.Hence, the SNP pools that existed after selecting out the RNA editing motifs known for Asteraceae [32] indirectly support the sequence heteroplasmy as observed in cultivars of other crops [33,34].
In most species of Asteraceae, the positions of rps19 and Ψ rps19 respectively mark the LSC/IRB and IRA/LSC junctions and those of ycf1 and Ψ ycf1 mark the IRB/SSC and SSC/IRA junctions [28].Here (Figures 1 and 3), the endive plastome sequence showed swapped positions for ycf1 and Ψ ycf1, as observed both in Asteraceae [35] and other species [36], indicating that the SSC orientation may belong to the haplotype A [30].After refining the annotation of both rps19 and Ψ rps19 from C. intybus (NC_043842.1),we observed that both species shared the same orientations.In coding regions, ycf2 showed a high SSR number in both IRs, consistently with other Asteraceae species [35,37] representing a region of variability useful to develop highly specific markers.Cp sequence variability allowed the separation of several species at the subtribe level so that Cichoriinae fell apart from Lactucinae, Crepidinae, and Hyoseridinae, which is consistent with the effective use of chloroplast sequence to place subtribes within genera [38].The 98.4% RNA-Seq coverage vs. cp genome in Confiance is consistent with the almost full transcription in plants [39,40] and that of the other cultivars ranged from 74 to 96%, scoring variants inclusive of private SNPs.The association between gene variants and cultivar leaf-type can be attributed to endive breeding programs based on crossing the same types rather than a functional impact on leaf shape.The trend of SNP number was higher in smooth-than curly-types though a much larger variety pool is needed to confirm it.Most cultivars commonly hosted SNPs of moderate effect (missense) in ndhA, D, and psbF genes.The ndhA and psbF of some species undergo RNA editing and transcription slippage that explain variability [41]; Table 6 of SNP impact excluded C/U editing, hence, the computed protein isoforms may account for diversified functions related to stress response/protection as shown for ndh genes [42].The ycf1 hosts frequent nonsynonymous SNPs, which were observed in endives and are often used in barcoding alternatively to the matK and rbcL genes [43].Similarly, the rpo genes are relatively fast-evolving sequences used for marker production [44]; however, SNPs in endive rpoC1 suggest a high functional impact that prompts further investigation.
Here, the affected leaf nutrient content confirmed that endive is a sensitive crop to excessive rainfall before harvest [6]; therefore, it is useful to gain insight into the cp transcriptome response of cultivars under this multifactorial stress.In both case studies, the smooth types showed a higher number of responding genes than the curly ones, and the range of expression variation was higher in all cultivars that underwent waterlogging.The former results suggest the occurrence of cultivar-dependent plasticity, which has also been observed in stressed lettuce [45] and deserves further study.The latter results may be due to the severity and/or complexity of the stress.The similar expression patterns observed in the two stress situations (one of which was exacerbated by waterlogging) could be the result of different transduction and control processes, considering the mutual interactions between chloroplast and nucleus [46].Consistently, waterlogging is known to affect chloroplast morphology in sensitive lines [47] accompanied by the involvement of gene expression re-arrangements.In this work, the up-regulation of RPL20 and RPL33 in D and M cultivars is consistent with the stress status of the chloroplast; the former gene is essential for ribosome assembly and the latter is highly sensitive to temperature stress [48].Moreover, the upregulation of Cyt.b6/f genes is accompanied by the down-regulation of some PSI and PSII ones in both case studies, supporting that these are common targets of the two stresses.Cytochrome b6/f controls the electron transfer from PSII to PSI and alleviates oxidative stress [49], so the altered and "orchestrated" expression would sustain that the cp machinery was undergoing photosynthetic impairment and oxidative stress.Consequently, it is proposed that such an alteration contributes to the loss of leaf carbohydrate content, although further studies targeting gene function are needed.

Figure 1 .
photosystem assembly/stability factors RubisCO large subunit NADH dehydrogenase ATP synthase cytochrome b/f complex photosystem II photosystem I

Figure 2 .
Figure 2. Phylogenetic analysis.The maximum likelihood tree of species belonging to the Cichorieae tribe was built using their complete chloroplast genomes (accession numbers are in the material and methods section).The chloroplast sequence of Cucumis sativus (fam.Cucurbitaceae) was used as an outgroup.The bootstrap values were inferred from 1000 replicates.The scale bar reports the number of changes per site.

Figure 2 .
Figure 2. Phylogenetic analysis.The maximum likelihood tree of species belonging to the Cichorieae tribe was built using their complete chloroplast genomes (accession numbers are in the material and methods section).The chloroplast sequence of Cucumis sativus (fam.Cucurbitaceae) was used as an outgroup.The bootstrap values were inferred from 1000 replicates.The scale bar reports the number of changes per site.

Figure 3 .
Figure 3. Coverage plots.Top: coverage of DNA-Seq (black line) and RNA-Seq (coloured lines) data over the reference cp genome measured as the logarithm of counts per million reads aligned over a given region.Bottom: linearised map of the cp genome.Genes marking region boundaries are bolded.*, Single intron gene; **, gene containing two introns; Ψ, pseudogene.

Figure 3 .
Figure 3. Coverage plots.Top: coverage of DNA-Seq (black line) and RNA-Seq (coloured lines) data over the reference cp genome measured as the logarithm of counts per million reads aligned over a given region.Bottom: linearised map of the cp genome.Genes marking region boundaries are bolded.*, Single intron gene; **, gene containing two introns; Ψ, pseudogene.

Figure 4 .
Figure 4. UpSet plot of private and shared SNPs among the seven cultivars.In the upper panel, the light and dark grey vertical bars correspond to the number of SNPs that are genotype-specific or shared by two or more genotypes, respectively.The number of SNPs in each set is given above the bars.Single and connected dots indicate the cultivar (names are on the left panel) that shares each gene group.Horizontal bars indicate the total number of SNP per cultivar.

Table 1 .
Summary of the complete genome features of C. endivia chloroplast.

Table 2 .
SSR in cp of Confiance.
a, Total variants account for the sum of heterozygous and homozygous for alternative allele genotypes.b, Normalised with respect to the length of the Confiance cp genome.

Table 3 .
Distribution of SSR in cp genes.
a, Total variants account for the sum of heterozygous and homozygous for alternative allele genotypes.b, Normalised with respect to the length of the Confiance cp genome.

Table 5 .
Overview of SNP effects.

Table 6 .
Private SNPs in genes.

Table 5 .
Overview of SNP effects.

Table 6 .
Private SNPs in genes.

Table 7 .
Homozygous SNPs and likely effect on deduced proteins.

Table 8 .
Cp gene expression variation in years affected by heavy rain.