The Investigation of Perennial Sunflower Species (Helianthus L.) Mitochondrial Genomes

The genus Helianthus is a diverse taxonomic group with approximately 50 species. Most sunflower genomic investigations are devoted to economically valuable species, e.g., H. annuus, while other Helianthus species, especially the perennial ones, remain largely a blind spot. In the current study, we assembled the complete mitogenomes of two perennial species: H. grosseserratus (273,543 bp) and H. strumosus (281,055 bp). We analyzed their sequences and gene profiles in comparison to the available complete mitogenomes of H. annuus. Except for sdh4 and trnA-UGC, both perennial sunflower species had the same gene content and almost identical protein-coding sequences when compared with each other and with the annual sunflower (H. annuus). Mitochondrial open reading frames (ORFs) common to sunflowers (orf117, orf139, and orf334) and ORFs unique to H. grosseserratus (orf633) and H. strumosus (orf126, orf184, orf207) were identified. The maintenance of plastid-derived coding sequences in the mitogenomes of both annual and perennial sunflowers and the low frequency of nonsynonymous mutations point to an extremely low variability of mitochondrial DNA (mtDNA) coding sequences in the genus Helianthus.


Introduction
From a genomic point of view, plants are unique organisms, since their cells have three genetic systems (nuclear, plastid, and mitochondrial), which function independently but are at the same time intimately interconnected. Nuclear genomes are of primary interest, while cytoplasmic genomes, especially mitochondrial ones, are often underestimated [1]. Mitochondrial DNA (mtDNA) in higher plants usually has complex multipartite structures, frequent insertions, and rapid rearrangements [2]. Mitochondrial genomes show significant variability not only between species but sometimes even within the same species [3]. Thus, even in species whose nuclear genomes have been investigated, the mitogenomic information is often lacking or incomplete [4].
The genus Helianthus is a diverse taxonomic group with approximately 50 species [5], divided into four sections: two annual sections, Helianthus and Agrestis, and two perennial sections, Ciliares and Divaricati [6]. Most sunflower genomic investigations are devoted to the economically valuable species, such as the annual species H. annuus and, among the perennial species, H. tuberosus, while the other Helianthus species, especially the perennial ones, remain largely a blind spot. To the best of our knowledge, very few studies of perennial sunflower mtDNA are available [7]. Moreover, to date, complete mitochondrial genomes of only one species, H. annuus, are deposited in GenBank. The mitochondrial DNA of cytoplasmic male sterility (CMS) sources obtained from wild progenitors may be considered an exception; for instance, the MAX1 CMS mitogenome [8] was obtained through hybridization with Helianthus maximiliani Schrad as the maternal form. Since CMS systems are essential for the hybrid seed production of crops [9], particularly of sunflowers [10], investigations of the mitochondrial genomes of related wild species are also of interest for agricultural purposes.
As with H. maximiliani, the two species H. grosseserratus and H. strumosus are perennial and belong to the section Divaricati. H. maximiliani and H. grosseserratus are diploid, whereas H. strumosus can be either tetraploid (2n = 4x = 68) or hexaploid (2n = 6x = 102) [6]. The crossability between diploid species in the genus Helianthus is relatively good. Crossing tetraploid or hexaploid species is also possible, but it poses considerable problems that require the application of biotechnological methods [11]. Many sunflower species, mostly diploid ones, have been used to develop new CMS systems for hybrid breeding in sunflowers, but very little is known about the general effects of different cytoplasms on sunflower performance. For instance, in the field trials of Jan et al. [12], annual and perennial sunflower species were used to obtain 20 cytoplasmic substitution lines, which were compared with the HA89 line. Although alloplasmic lines with cytoplasms of wild species showed predominantly neutral or negative effects on agronomic traits, such as an elevated level of lodging or reduced yield, beneficial effects were also reported [12]. The characteristics of the achene walls are important for the industrial processing of sunflower seeds. H. grosseserratus, H. maximiliani, and H. strumosus are characterized by a thin pericarp with a reduced sclerenchymatic layer, which represents an interesting trait in breeding for improved hullability [13].

Plant Material and Mitochondrial DNA Extraction
The perennial species of Helianthus, H. grosseserratus Martens and H. strumosus L. (Figure S1), were obtained from the collection of the N. I. Vavilov All-Russian Institute of Plant Genetic Resources. Plant leaves were used for mitochondrial DNA isolation. For H. strumosus, the previously described method was used [14]. For H. grosseserratus, we used a slightly different technique based on multi-step centrifugation. Leaves (5 g, without petiole and midrib) were homogenized with a mortar and pestle in 20 ml of STE buffer (0.4 M sucrose, 50 mM Tris pH 7.8, 4 mM EDTA-Na2, 0.2% bovine serum albumin, 0.2% 2-mercaptoethanol) and then centrifuged in several steps: (1) 500 g for 5 min, retaining the supernatant; (2) 4000 g for 5 min, retaining the supernatant; (3) 10,000 g for 15 min, discarding the supernatant. The pellet was treated with 10 units of DNAse (Syntol, Moscow, Russia) for 7 min and then used for DNA isolation. For both samples, the DNA extraction was performed with a PhytoSorb kit (Syntol, Moscow, Russia), according to the manufacturer's protocol.

Next-Generation Sequencing
Next-generation sequencing (NGS) libraries were prepared with a Nextera XT DNA Library Prep Kit (Illumina, San Diego, CA, USA), following the Illumina guidelines. The quality and quantity of the libraries were evaluated with the Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA) and with a Qubit 4 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). Sequencing was performed in separate runs on a NextSeq 500 (Illumina, San Diego, CA, USA) with the NextSeq 500/550 High Output Kit v2.5: 150 cycles for H. grosseserratus and 300 cycles for H. strumosus. In total, 1,349,630 75-bp paired reads and 3,014,579 150-bp paired reads were generated for H. grosseserratus and H. strumosus, respectively.

Results
The two mitochondrial fraction isolation techniques yielded markedly different proportions of mtDNA-derived sequences for the two perennial Helianthus species. In H. strumosus, only about 3.9% of the reads mapped to the mitogenome, whereas the NGS library of H. grosseserratus predominantly contained mitochondrial reads (67.3%). This difference significantly influenced the complexity of mitogenome assembly. In the case of H. grosseserratus, a relatively small number (1.3 million) of short (75 bp) reads was obtained, and the assembly with a k-mer value of 65 generated only 14 large high-coverage (depth > 50) contigs (N50 = 55,431), which allowed quick scaffolding. In the case of H. strumosus, about twice as many longer reads (3 million, 150 bp) were obtained, but many smaller contigs (N50 = 4,632), containing nuclear and chloroplast sequences, were present, which in turn increased the complexity of scaffolding. Additional filtering of mitochondrial contigs therefore had to be performed for the H. strumosus mitochondrial genome assembly. Thus, for mitogenome assembly, the proportion of mtDNA in a sample is more crucial than the number of reads.
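As a rough illustration of why the mtDNA proportion matters more than the raw read count, the sketch below estimates the mean mitochondrial read depth from the figures reported here and shows how an N50 value is computed. It is not the pipeline used in this study: the function names are hypothetical, the read counts are assumed to refer to individual reads (if they are read pairs, both depths double and their ratio is unchanged), every mapped base is assumed to contribute to depth, and the contig lengths passed to n50 are made-up examples.

```python
def estimated_depth(n_reads, mito_fraction, read_len, genome_len):
    """Approximate mean depth: mitochondrial bases sequenced / genome length."""
    return n_reads * mito_fraction * read_len / genome_len


def n50(lengths):
    """Length of the contig at which the running sum of contig lengths,
    taken from longest to shortest, first reaches half of the total."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Figures taken from the text; treating them as single reads is an assumption.
depth_grosseserratus = estimated_depth(1_349_630, 0.673, 75, 273_543)
depth_strumosus = estimated_depth(3_014_579, 0.039, 150, 281_055)

print(f"H. grosseserratus: ~{depth_grosseserratus:.0f}x")
print(f"H. strumosus:      ~{depth_strumosus:.0f}x")

# Hypothetical contig lengths, only to show how N50 behaves.
print(n50([10, 20, 30, 40]))
```

Under these assumptions, the H. grosseserratus library gives roughly 250-fold mitochondrial depth against roughly 60-fold for H. strumosus, although it contains less than half as many reads.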
For both of the studied sunflower species, master circles of the mitochondrial chromosomes were obtained (Figure 1), with lengths of 273,543 bp and 281,055 bp for H. grosseserratus and H. strumosus, respectively. Mitochondrial genomes often have a complex architecture, including sub-genomes or multichromosomal organization [23,24]. However, in the current study, we did not analyze the mtDNA stoichiometry and focused only on the master circle form of the mitogenome.
When the amino acid sequences of the mitochondrial proteins of perennial sunflowers were compared with those of the annual sunflower (NCBI accession MN171345.1), only a limited number of differences were detected (Table 1). The most severe changes concerned atp6 and nad6 in the mtDNA of H. grosseserratus. In atp6, two nucleotide changes in the 37th codon resulted in a termination codon, leading to N-terminal shortening of ATPase subunit 6, whereas in nad6, a frameshift led to a premature stop codon, resulting in C-terminal truncation of the encoded protein.
Table 1. Single nucleotide polymorphisms localized in protein-coding genes of perennial sunflower species.
The H. grosseserratus mitogenome includes 21 tRNA genes corresponding to 16 amino acids. The same tRNA genes are present in H. strumosus mtDNA, but an additional one (trnA-UGC) brings the total number of amino acids covered by the H. strumosus mitogenome to 17. The in silico prediction of new open reading frames (ORFs) yielded a large number of long (more than 300 bp) ORFs in both species. However, we annotated only those (Table 2) with significant similarity to proteins and ORFs (according to the protein BLAST search) in sunflowers or other Magnoliophyta.
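The length filter applied to predicted ORFs can be sketched as follows. This is a minimal illustration, not the prediction tool used in the study (which is not named here): it assumes ORFs start at ATG and end at one of the standard stop codons TAA, TAG, or TGA, scans all six reading frames of a linear sequence, and keeps ORFs longer than 300 bp including the stop codon. The function names and the toy sequences are hypothetical, and ORFs spanning the origin of a circular master circle would be missed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_len=300):
    """Return (strand, start, end, length) for every ATG...stop ORF longer
    than min_len bp. Coordinates are 0-based, end-exclusive, and given on
    the strand that was scanned (assumption: linear sequence)."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length > min_len:
                        orfs.append((strand, start, i + 3, length))
                    start = None  # look for the next ORF in this frame
    return orfs


# Toy sequences: a 306-bp ORF passes the filter, a 36-bp one does not.
long_orf = "ATG" + "GCT" * 100 + "TAA"
short_orf = "ATG" + "GCT" * 10 + "TAA"
print(find_orfs(long_orf))
print(find_orfs(short_orf))
```

A length filter of this kind only produces candidates; as described above, annotation was restricted to ORFs with significant similarity to known proteins or ORFs.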
Among the discovered ORFs, we observed several interesting patterns. For instance, we discovered an ORF encoding the chloroplast-like ribosomal protein S11 in both currently studied species and in all complete mitochondrial genomes of Helianthus available to date in GenBank (MN175741.1, MN171345.1, MH704580.1, MG770607.2, MG735191.1, NC_023337.1, CM007908.1). Different ORFs, namely orf284 (H. grosseserratus), orf334 (H. annuus), and orf365 (H. strumosus), encode psaA-like proteins, which have almost identical amino acid sequences at the N-terminus. Chimeric mitochondrial genes, including cox2 fragments, are often linked to male-sterility-inducing mtDNA of plants [28]. The orf188-encoded protein contains 58 amino acids identical to the N-terminus of cytochrome oxidase subunit 2, while orf316, present in H. annuus and H. strumosus, has 67 amino acids similar to the middle part of the COX2 sequence. The protein encoded by orf188 in H. grosseserratus shares almost the same structure as the cox2-chimeric protein (QFS00065.1), which we identified in ANN2, a CMS line of sunflower, in a previous study [29]. According to GenBank data, orf316 is an ORF common to sunflower mtDNA. Notably, an uncharacterized protein from the cupredoxin superfamily (XP_022040088.1) annotated in the nuclear genome of H. annuus has a sequence identical to that of the orf316-encoded protein. The largest ORF (orf633) was observed in the H. grosseserratus mitogenome. Most of the orf633-encoded protein (397 aa) is identical to the N-terminus of ATPase subunit 1.

Discussion
The mitochondrial genomes of perennial sunflower species are 7-9% smaller than annual sunflower mitogenomes. Meanwhile, except for the sdh4 gene, annual and perennial sunflowers share the same protein-coding gene content. The sdh3 and sdh4 genes are among the most "unstable" mitochondrial genes, with frequent cases of their transfer to the nucleus, and even related species may have different sets of succinate dehydrogenase genes [25]. The mitochondrial gene sequences showed slight differences between H. grosseserratus and H. strumosus, as well as between H. annuus and both perennial species. Only a few SNPs (1-3 per gene) leading to nonsynonymous substitutions occurred in rps4, cob, rpl16, matR, and atp6. Among the detected polymorphisms, only a single transversion (G to C) in the atp6 gene (Table 1) was common to both perennial species in comparison to H. annuus. Notably, the same SNP in atp6 was observed in the sunflower line with the MAX1 (MH704580.1) CMS type, but was absent in the PET1 (MG735191.1) and PET2 (MG770607.2) CMS lines. Since the CMS lines have cytoplasmic genetic information (plastid and mitochondrial) initially obtained from wild species [30], PET1 and PET2 may be considered as H. petiolaris mitogenomes, and MAX1 as an H. maximiliani mitogenome. This SNP can likely be tracked only in perennial sunflowers, or at least in those belonging to the Divaricati section. Sunflower species have unusually diverse karyotypes [31] and high rates of karyotypic evolution [32,33]. Within annual and perennial sunflower diploid species, there is a moderate variation in the sequence of plastid genes and quite a high variation in the nuclear genes [34]. In contrast to nuclear and plastid genes, an extremely low polymorphism level was found in sunflower mitochondrial genes. Since only a few species have been investigated so far, future research is needed to clarify this feature. Similar conservation of mitogenome sequences can be observed in other species.
Recently, a comparison of the domesticated lettuce (Lactuca sativa) mitogenome with two wild lettuce species (L. saligna and L. serriola) revealed identical sequences and rearrangements for L. sativa and L. serriola, but significant differences with the mitogenome of L. saligna [24]. Such data indicate that domestication had little influence on the mitochondrial genome as L. serriola is regarded as a wild ancestor of the domesticated lettuce [35].
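The distinction between synonymous and nonsynonymous substitutions used in this discussion can be illustrated with a short sketch that translates a reference and an altered codon with the standard genetic code and compares the results. The function name and the example codons are hypothetical and are not taken from Table 1. The sketch also works on the genomic sequence only, so it ignores the C-to-U RNA editing of plant mitochondrial transcripts, which can change the amino acid that is actually encoded.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a codon substitution by its effect on the encoded residue."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "nonsynonymous"


# Hypothetical examples, chosen only to show each outcome.
print(classify_codon_change("GCT", "GCC"))  # Ala -> Ala
print(classify_codon_change("GAT", "CAT"))  # Asp -> His, a G-to-C transversion
print(classify_codon_change("TAT", "TAA"))  # Tyr -> stop
```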
Many mitochondrial genomes abound with foreign DNA acquired by horizontal transfer, especially from plastid genomes [36]. The functional activity of such plastid-originated insertions is of particular interest. The mitochondrial rps11 gene is absent from the mtDNA of a vast number of eudicots [25,37]. The presence of orf139, encoding a protein similar to the chloroplast-like ribosomal protein S11, in the same position (close to the cox1 gene) in all sunflower mitogenomes available to date does not seem accidental. The non-coding sequences in plant mitogenomes vary greatly and may rearrange [38], while the coding sequences are conserved [39,40], being under strong purifying selection [41]. Since orf139 has an identical coding sequence in both perennial sunflowers and in all H. annuus mitogenomes available to date (including CMS lines), we can assume that orf139 plays a vital role in sunflower mitochondria. The functions of the ORFs similar to the psaA gene are more cryptic. Three ORFs (orf284, orf334, and orf365) share a large identical part and occupy the same position in the mitogenome, between atp1 and ccmFN (much closer to ccmFN). Thus, it is most likely that the chloroplast DNA (cpDNA) insertion entered the mitochondrial genome in a common sunflower ancestor, and a limited number of mutations then resulted in sequence divergence between these ORFs. Notably, cpDNA-derived sequences with partial homology to the psaA gene have been maintained over long periods in the mtDNA of different Brassica species [42].
Because of high mitochondrial DNA recombination rates, ORFs with partial mitochondrial gene sequences are common in mtDNA [40]. Such chimeric genes often cause the CMS phenotype in plants [28,43]. The orf188 that we annotated in the H. grosseserratus mitogenome is similar to the previously described cox2-chimeric ORF (QFS00065.1), which potentially plays a role in the formation of the ANN2 CMS type in sunflower [29]. The coding sequences of these two ORFs are not identical (about 76% similarity), but they have the same appearance in mtDNA. The other notable chimeric ORF in H. grosseserratus is orf633, which has no analogs among the other described sunflower mitogenomes. A large part of the orf633 sequence is identical to the atp1 gene, but its role is unclear. Studies have revealed an association of the CMS phenotype with mutations in atp1 or with a decrease in protein abundance [44]. However, to the best of our knowledge, there are no data about new atp1-chimeric ORFs involved in CMS. Since subunit α (F1 sector) of ATPase (the atp1-encoded protein) lacks transmembrane domains, the association between an additional rearranged atp1 copy (orf633) and the CMS phenotype is equivocal. Mitogenomes of higher plants are enriched with ORFs of unknown function [45,46], while the CMS phenotype is often associated with unusual ORFs present in the mtDNA [47]. Thus, the current study may help to determine whether ORFs discovered in future research are new or standard for sunflower mitogenomes.

Conclusions
The complete master circles of the mitogenomes were obtained for H. grosseserratus (273,543 bp) and H. strumosus (281,055 bp). Except for sdh4 and trnA-UGC, both perennial sunflower species had the same gene content and almost identical protein-coding sequences when compared with each other or with the annual sunflower (H. annuus). Mitochondrial ORFs common to sunflowers (orf117, orf139, orf334) and ORFs unique to H. grosseserratus (orf633) and H. strumosus (orf126, orf184, orf207) were determined. The observed maintenance of plastid-derived coding sequences in the mitogenomes of both annual and perennial sunflowers and the low frequency of nonsynonymous mutations point to an extremely low variability of mtDNA coding sequences in the genus Helianthus. The current investigation may be useful for future studies in mitochondrial genomics.