Organization Features of the Mitochondrial Genome of Sunflower (Helianthus annuus L.) with ANN2-Type Male-Sterile Cytoplasm

This study provides insights into the flexibility of the mitochondrial genome in sunflower (Helianthus annuus L.) as well as into the causes of ANN2-type cytoplasmic male sterility (CMS). De novo assembly of the mitochondrial genome of male-sterile HA89(ANN2) sunflower line was performed using high-throughput sequencing technologies. Analysis of CMS ANN2 mitochondrial DNA sequence revealed the following reorganization events: twelve rearrangements, seven insertions, and nine deletions. Comparisons of coding sequences from the male-sterile line with the male-fertile line identified a deletion of orf777 and seven new transcriptionally active open reading frames (ORFs): orf324, orf327, orf345, orf558, orf891, orf933, orf1197. Three of these ORFs represent chimeric genes involving atp6 (orf1197), cox2 (orf558), and nad6 (orf891). In addition, orf558, orf891, orf1197, as well as orf933, encode proteins containing membrane domain(s), making them the most likely candidate genes for CMS development in ANN2. Although the investigated CMS phenotype may be caused by simultaneous action of several candidate genes, we assume that orf1197 plays a major role in developing male sterility in ANN2. Comparative analysis of mitogenome organization in sunflower lines representing different CMS sources also allowed identification of reorganization hot spots in the mitochondrial genome of sunflower.


Introduction
A low substitution rate in genes, along with considerable variability in size and structure, is a distinctive feature of plant mitochondrial genomes (mitogenomes) [1,2]. Reorganization events in mitochondrial DNA (mtDNA) are primarily caused by disruption of a fragile equilibrium of [...]
We describe the first assembly of the mitochondrial genome of the CMS source ANN2, which arose spontaneously in Helianthus annuus L. The current study also provides insights into the flexibility of the sunflower mitochondrial genome by comparing the isonuclear male-sterile lines HA89 (ANN2, MAX1, PET1, PET2) with the male-fertile line HA89, allowing identification of hot spots for rearrangements in the sunflower mitochondrial genome. With regard to the CMS mechanism in ANN2, new transcriptionally active open reading frames were identified. The ANN2 CMS source may be of interest not only for oilseed hybrid breeding but also for horticultural purposes, as it is difficult to restore. Male sterility is a highly desirable trait in ornamental sunflowers, since pollen production is usually neither required nor desired, except where the pollen color enhances the contrast with the florets.

Rearrangements in the Mitochondrial Genome of the Male-Sterile Line HA89(ANN2)
We assembled the complete mitochondrial genome of the HA89 sunflower line with ANN2 cytoplasmic sterility type (NCBI accession MN175741.1). The master chromosome of HA89(ANN2) consists of 306,018 bp (Figure 1), and it is 5071 bp longer than the mitogenome of the male-fertile isonuclear line HA89 (NCBI accession MN171345.1).
The HA89(ANN2) mitochondrial genome shows a wide range of rearrangements compared to the male-fertile HA89 mitogenome. A summary of the whole-mitogenome alignment of the male-sterile and male-fertile lines is presented in Table 1. The mitochondrial genomes of male-sterile ANN2 and male-fertile HA89 share 14 complementary regions, but their locations and orientations differ. Figure 2 shows the location of these regions, with both genomes drawn in linear form. Because regions #1 and #14 form a single region in the circular molecule, we classified the remaining twelve regions (#2-#13) as rearrangements.
In most cases, rearrangements only resulted in a reversal of a gene's direction or a change in gene order. However, the 8584 bp (#7) and 21,433 bp (#8) rearrangements influenced the coding sequence of nad6, and the 6029 bp rearrangement (#12) impaired atp6. The largest part of the nad6 gene sequence (~88%) is in rearrangement #8, while the 3′ terminal part of nad6 lies in rearrangement #7. As a result of the convergence of the #8 and #10 rearrangements in the mitochondrial DNA of HA89(ANN2), a new nad6-chimeric open reading frame, orf891, was created. Analyses of the orf891 transcription pattern gave ambiguous results. Transcripts were detected for both nad6 (HA89) and orf891 (HA89(ANN2)) when using primers derived from the identical 5′ sequence of their mRNA (Supplementary Table S1). Nevertheless, using the same forward primer but different reverse primers (Supplementary Table S1), complementary to the 3′ sequences of nad6 and orf891, transcription was detected only for nad6 (the fertile line), but not for orf891 (CMS line). It is important to note that almost all the rearrangements found in the mtDNA of HA89(ANN2) are accompanied by other types of genome reorganization: deletions and insertions.

Deletions and Insertions in the Mitochondrial Genome of the Male-Sterile Line HA89(ANN2)
In comparison to the male-fertile analog, we identified nine deletions longer than 100 bp in the mtDNA of HA89(ANN2), which are shown in Table 2. Most deletions did not affect protein-coding sequences, except for two deletions of 316 bp and 1165 bp. The 1165 bp deletion resulted in the complete elimination of orf777, while the 316 bp deletion affected part of the atp6 gene. Interestingly, in previous studies, we also discovered the loss of orf777 from the mitochondrial genomes of two other CMS lines, HA89(PET2) [25] and HA89(MAX1) [26]. Seven insertions longer than 100 bp were detected in the mtDNA of the HA89(ANN2) CMS line (Table 3). As a result of these insertions, five new open reading frames have appeared in the mitochondrial DNA of HA89(ANN2): orf324, orf327, orf345, orf558, and orf933. All five ORFs are transcribed in ANN2, in contrast to the HA89 line.
A search for transmembrane domains (TDs) revealed that the protein encoded by orf558 contains a single TD. In the case of orf933, two TDs were detected. The protein encoded by orf933 did not show homology to other sunflower proteins in GenBank, and had only limited similarity (40-60 amino acids) to hypothetical mitochondrial proteins with unknown functions in Lactuca sativa (accession PLY70338.1), Salvia miltiorrhiza (accession YP_008992338.1), Beta vulgaris (accession CBJ23356.1), etc. Forty-six amino acids of the N-terminus of the protein encoded by orf558 matched the N-terminus of cytochrome c oxidase subunit 2 (cox2 gene). Moreover, most of the amino acids that form the transmembrane domain in the orf558 protein are identical to those in COX2. However, the sunflower cytochrome c oxidase subunit 2 has two TDs, whereas the protein encoded by orf558 has only one (Figure 3). Thus, orf558 represents a chimeric cox2 gene and could potentially play a role in the development of the ANN2 CMS phenotype.
The most complex of the ORFs discovered in the HA89(ANN2) mitogenome was orf1197, which arose from three simultaneous reorganization events: the 316 bp deletion, the 430 bp insertion, and the 6029 bp rearrangement. The orf1197 represents a chimeric atp6 gene, with transcriptional activity specific to the CMS line HA89(ANN2). In sunflower, the atp6 gene typically encodes a protein of 351 aa, whereas the predicted size of the orf1197 translation product is 399 aa. Both proteins share 251 identical amino acids at the C-terminus. Thus, the protein encoded by orf1197 carries all seven TDs present in the C-terminus of the ATP synthase Fo subunit 6 from mitochondria of male-fertile sunflower (Figure 4).
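The protein sizes quoted above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: it assumes that the numeric suffix of an ORF name is its length in nucleotides (the text does not state whether a stop codon is counted), and the helper name codons_from_orf_name is hypothetical.

```python
# Illustrative arithmetic only; all numbers are taken from the text above.
ATP6_AA = 351               # normal ATP synthase Fo subunit 6
ORF1197_AA = 399            # predicted orf1197 translation product
SHARED_C_TERMINAL_AA = 251  # identical C-terminal residues in both proteins


def codons_from_orf_name(orf_name: str) -> int:
    """Return the codon count implied by a name such as 'orf1197'.

    Assumption: the numeric suffix is the ORF length in nucleotides.
    """
    length_nt = int(orf_name[len("orf"):])
    if length_nt % 3 != 0:
        raise ValueError(f"{orf_name}: length is not a multiple of 3")
    return length_nt // 3


# N-terminal residues that lie outside the shared C-terminal block.
orf1197_n_terminal = ORF1197_AA - SHARED_C_TERMINAL_AA
atp6_n_terminal = ATP6_AA - SHARED_C_TERMINAL_AA

print(codons_from_orf_name("orf1197"))      # codons implied by the name
print(orf1197_n_terminal, atp6_n_terminal)  # non-shared N-terminal residues
print(ORF1197_AA - ATP6_AA)                 # net size difference in aa
```

Under this assumption, 1197 nt corresponds to 399 codons, which agrees with the reported 399 aa; the chimeric protein then has 148 N-terminal residues outside the shared block, compared with 100 in the normal ATP6.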

Discussion
Recently, we investigated the complete mitochondrial DNA sequences of three sunflower CMS sources derived from interspecific crosses: PET1, PET2, and MAX1 [25,26]. Comparison of the HA89(ANN2) mitogenome with mitochondrial genome assemblies of male-fertile lines [25,27] and the other HA89 male-sterile analogs provides insights into the mtDNA reorganizations associated with CMS phenotypes. The male-fertile lines (HA89, HA412) show only slight variations in mtDNA sequence [25], whereas the mitogenomes of the CMS sources (HA89(ANN2), HA89(PET1), HA89(PET2), HA89(MAX1)) differ significantly from their alloplasmic male-fertile analog. The complete mitochondrial genome of the male-fertile line HA89 is 300,947 bp long (NCBI accession MN171345.1), while that of HA89(PET1) is 305,217 bp (NCBI accession MG735191.1), HA89(PET2) 316,582 bp (NCBI accession MG770607.2), HA89(MAX1) 295,586 bp (NCBI accession MH704580.1), and HA89(ANN2) 306,018 bp (NCBI accession MN175741.1). The differences in genome size are due to several deletions and insertions. For instance, a similar deletion in the nad4L-orf777-atp8 region was observed in the mtDNA of all the investigated CMS sources except HA89(PET1): a 711 bp deletion in HA89(PET2), a 978 bp deletion in HA89(MAX1), and an 1195 bp deletion in HA89(ANN2). All these deletions removed orf777 from the mtDNA of the CMS lines. Another region enriched in deletions lies between cob and ccmFC, where three overlapping deletions were detected: a 451 bp deletion in HA89(PET1), one of 3780 bp in HA89(PET2), and another of 4204 bp in HA89(ANN2).
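The size comparison above can be restated as a short calculation. This is a minimal sketch that uses only the genome sizes quoted in the text; the dictionary and function names are illustrative and are not part of the original analysis.

```python
# Master chromosome sizes (bp) as quoted in the text.
SIZES_BP = {
    "HA89": 300_947,        # male-fertile reference
    "HA89(PET1)": 305_217,
    "HA89(PET2)": 316_582,
    "HA89(MAX1)": 295_586,
    "HA89(ANN2)": 306_018,
}


def size_differences(sizes: dict, reference: str) -> dict:
    """Return the size of each line minus the reference size, in bp."""
    ref_size = sizes[reference]
    return {
        line: size - ref_size
        for line, size in sizes.items()
        if line != reference
    }


differences = size_differences(SIZES_BP, "HA89")
for line, delta in differences.items():
    print(f"{line}: {delta:+d} bp relative to HA89")
```

The value obtained for HA89(ANN2), +5071 bp, matches the difference reported in the Results; HA89(MAX1) is the only CMS line whose master chromosome is shorter than that of the fertile line.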
The coincidence between the locations of these deletions is not accidental. There are three 265 bp repeats in the sunflower mitochondrial genome, with the following positions in the mtDNA of the male-fertile HA89 line: 36537-36801 (adjacent to atp8), 190074-190338 (next to cob), and 202902-202638 (between orf873 and atp1). These repeat regions are marked by red stars in Figure 5. Repeats represent common recombination points in mtDNA molecules [4,28]. The identification of small repeats involved in recombination is important because they influence the maintenance and evolution of mitochondrial genomes [28]. An imbalance in the nuclear-mitochondrial relationship, which may occur in distant hybridizations, impairs the recombination of mtDNA sub-genomic molecules and thereby leads to reorganizations in the mitochondrial master chromosome. For instance, in HA89(PET1) a deletion, an insertion, and an inversion were reported in the cob-atp1 region, directly between two repeats (Figure 5). In HA89(PET2), there were also several rearrangements in hot spots, resulting in the formation of a new gene cluster, cob-atp8-cox3, as well as in the translocation of the ccmFC-orf873-atp1 gene cluster into the nad4L-orf777 region, combined with a deletion and a large insertion (Figure 5). In the mtDNA of both HA89(MAX1) and HA89(ANN2), the specific atp1-atp8-cox3 gene order was created, while in the MAX1 CMS source ccmFC-orf873 was translocated into the nad4L-orf777 region (with a deletion and a large insertion in this region). In the case of ANN2, the ccmFC-orf873 region is located next to the cob gene (Figure 5). Thus, we have established that these three 265 bp repeats represent reorganization hot spots in the sunflower mitochondrial genome.
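The repeat coordinates listed above can be checked for consistency with the stated 265 bp length. The sketch below assumes 1-based inclusive coordinates and reads a start position greater than the end position as a repeat in the opposite orientation; both conventions are assumptions, since the text gives only the coordinate pairs.

```python
# Repeat positions in the male-fertile HA89 mtDNA, as quoted in the text.
REPEATS_HA89 = [
    ("adjacent to atp8", 36537, 36801),
    ("next to cob", 190074, 190338),
    ("between orf873 and atp1", 202902, 202638),
]


def repeat_span(start: int, end: int) -> tuple:
    """Return (length in bp, orientation) for inclusive coordinates.

    Assumption: start > end denotes the reverse orientation.
    """
    length = abs(end - start) + 1
    orientation = "+" if end >= start else "-"
    return length, orientation


for label, start, end in REPEATS_HA89:
    length, orientation = repeat_span(start, end)
    print(f"{label}: {length} bp, orientation {orientation}")
```

All three coordinate pairs give 265 bp, and only the third repeat is listed in the opposite orientation under this reading.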
Considering the insertions discovered in the HA89(ANN2) mitogenome, we also observed similarity between the insertions of different CMS sources in sunflower. For instance, about 85% of the 1027 bp insertion sequence (ANN2) is complementary to part of the 15,885 bp insertion (PET2).
Although there are similarities in deletions, insertions, and rearrangements between the mitochondrial genomes of HA89(ANN2) and the other CMS lines, the discovered ORFs were different. We summarized the data on all identified ORFs in the male-sterile cytoplasms in Table 4. ORFs encoding proteins with similarity to other mitochondrial proteins, and especially those with transmembrane domains, are of particular interest, since chimeric proteins with TDs most often cause CMS phenotypes [10,29]. In the mtDNA of HA89(ANN2), we detected three new transcriptionally active ORFs encoding proteins with TDs: orf558 (one TD), orf933 (two TDs), and orf1197 (seven TDs). The orf933 shows no homology to other sunflower genes, while orf558 represents a chimeric cox2 gene and orf1197 a chimeric atp6 gene. It is difficult to estimate the exact contribution of orf558, orf933, and orf1197 to the development of the male-sterile phenotype in ANN2. However, the possible involvement of more than one open reading frame might be one explanation for why ANN2 is so difficult to restore. On the other hand, the presence of the suppressor gene S1 discovered by Liu et al. [23] might be the reason for the low rates of fertility restoration of the ANN2 CMS source.
Previous studies indicate that chimeric atp6 genes, or new ORFs that are co-transcribed with atp6, most often cause CMS phenotypes in flowering plants [10,30-33]. Therefore, we suggest that orf1197 is the major CMS candidate gene for the ANN2 CMS source. Moreover, chimeric atp6 genes were also identified in the MAX1 [26] and CMS3/ANT1 [34] CMS types of sunflower. In CMS lines, chimeric atp6 genes encode proteins with an N-terminal extension compared to the normal ATP synthase subunit 6 (351 aa): 399 aa in ANN2, 429 aa in MAX1 (AYV91168.1), and 437 aa in CMS3/ANT1 (CAA57790.1).
Moreover, 397 of the 399 amino acids in the orf1197 protein are identical to the chimeric ATP6 of the CMS3/ANT1 line, so this protein represents a shorter version of that 437 aa protein. Such similarities support our hypothesis about the importance of orf1197 in shaping the CMS phenotype in ANN2.
The orf558 (as a chimeric cox2 gene) might also cause cytoplasmic male sterility in ANN2. In other plant species, modified cox2 sequences seem to be involved in male sterility. For instance, the CMS-specific pcf gene in petunia is composed of sequences from the 5′ portion of atp9, segments of cox2, and a large region of unknown origin, urfS [35]. In wild beets, the CMS-associated orf129 shows homology to the 5′ flanking and coding sequences of cox2 [36]. In the mitogenome of the inap CMS source of Brassica napus, which was created by somatic hybridization with Isatis indigotica, a novel cox2-2 gene was detected, which represents a recombination of the cox2 of woad and the cox2-2 of rapeseed [37].
Another unique feature of the HA89(ANN2) mitogenome is the formation of orf891. According to the ORF predictions, a 3′ elongation of the nad6 gene (orf891) may occur. However, the cDNA analyses did not agree with the genomic data. Perhaps, owing to nad6 mRNA editing, the shorter transcript is formed instead of a 3′-elongated one. Heteromorphism in nad6 transcript length was also observed in Mimulus guttatus × M. nasutus hybrids with the CMS phenotype [38]. Both male-fertile and male-sterile hybrids have a single copy of the nad6 gene, and the divergence in mRNA length was only observed in male-sterile plants [38].

Plant Material
The CMS line HA89(ANN2) of sunflower was obtained from the genetic collection of the N. I. Vavilov All-Russian Institute of Plant Genetic Resources (Saint Petersburg, Russia). The original source of the ANN2 male-sterile cytoplasm was obtained by Serieys in 1984 [18]. All sunflower lines were grown in regularly irrigated pots in a KBWF 720 growth chamber (Binder, Tuttlingen, Germany) under the following conditions: temperature 26 °C, humidity 70%, photoperiod 10/14 h (dark/light).

Mitochondrial DNA Extraction, NGS Library Preparation, and Sequencing
First, the organelle fraction was isolated from leaves of 14-day-old sunflower seedlings, as described by Makarenko et al. [39]. Such preparations significantly reduced the amount of nuclear DNA. DNA was then extracted with the PhytoSorb kit (Syntol, Moscow, Russia), according to the manufacturer's protocol. Equal amounts of DNA from seven plants were mixed, and 1 ng of the DNA pool was used for NGS library preparation. The library was prepared with the Nextera XT DNA Library Prep Kit (Illumina, Mountain View, CA, USA), following the Illumina guidelines. The quality of the library was evaluated using a Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA). The library was quantified with the Qubit fluorimeter (Invitrogen, Carlsbad, CA, USA) and by qPCR, then diluted to a concentration of 8 pM. Sequencing was performed on two different Illumina sequencing platforms: HiSeq 2000 using the TruSeq SBS Kit v3-HS (200 cycles) and MiSeq using the MiSeq Reagent Kit v2 (500 cycles) (Illumina, Mountain View, CA, USA). A total of 3,063,836 reads (100 bp, paired) and 3,305,268 reads (250 bp, paired) were generated.
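As a rough orientation, the read counts above can be converted into a theoretical upper bound on sequencing depth. The sketch below rests on strong assumptions that the text does not confirm: that the quoted counts refer to individual reads, that every read has the full nominal length, and that every read originates from the 306,018 bp master chromosome. The depth actually achieved after trimming and removal of non-mitochondrial reads would be lower.

```python
# Read sets as quoted in the text: (number of reads, nominal read length in bp).
READ_SETS = [
    (3_063_836, 100),  # HiSeq 2000
    (3_305_268, 250),  # MiSeq
]
MITOGENOME_BP = 306_018  # HA89(ANN2) master chromosome

# Total sequenced bases if every read had its full nominal length.
total_bases = sum(count * length for count, length in READ_SETS)

# Upper bound on mean depth, assuming all reads map to the mitogenome.
max_mean_depth = total_bases / MITOGENOME_BP

print(f"total bases: {total_bases:,}")
print(f"upper bound on mean depth: {max_mean_depth:.0f}x")
```

Under these assumptions the two runs yield about 1.13 Gb in total, that is, an upper bound of roughly 3700-fold mean depth.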

Expression Analyses
RNA was extracted from leaves of seven 28-day-old sunflower plants using a guanidinium thiocyanate-phenol-chloroform-based method with the ExtractRNA reagent kit (Evrogen, Moscow, Russia). The quality and concentration of the RNA were evaluated with the Qubit fluorimeter (Invitrogen, Carlsbad, CA, USA) and the NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). Total RNA (0.5 µg) was treated with DNase I (Thermo Fisher Scientific, Waltham, MA, USA), and cDNA was then synthesized using the MMLV RT kit (Evrogen, Moscow, Russia) with random primers. As a negative control for each DNase-treated mRNA sample, the same reverse transcription protocol was performed without the MMLV enzyme. Quantitative PCR was performed with an EvaGreen-based RT-PCR kit (Syntol, Moscow, Russia) on a Rotor-Gene 6000 (Corbett Research, Mortlake, NSW, Australia). A summary of all primer sequences is given in Supplementary Table S1. The primer annealing temperature (Tm) was 60 °C.

Conclusions
Assembly of CMS ANN2 mitochondrial genome (HA89(ANN2) line) revealed several rearrangements, insertions, and deletions, as well as seven new open reading frames: orf324, orf327, orf345, orf558, orf891, orf933, and orf1197. Transcripts were detected for all new open reading frames in CMS ANN2, but not in the fertile cytoplasm. Only orf558, orf891, orf933, and orf1197 encoded proteins that contained membrane domains, making them the most likely CMS candidate genes for the ANN2 source. Notably, orf1197 represents a chimeric atp6 gene and presumably plays a major role in the CMS phenotype development of ANN2. However, CMS ANN2 may be caused by simultaneous action of several candidate genes. Hot spots for rearrangements (265 bp repeats) were identified, and we propose that they influence the maintenance and evolution of the mitochondrial genome in sunflower.