Complete Plastid and Mitochondrial Genomes of Aeginetia indica Reveal Intracellular Gene Transfer (IGT), Horizontal Gene Transfer (HGT), and Cytoplasmic Male Sterility (CMS)

Orobanchaceae have become a model group for studies on the evolution of parasitic flowering plants, and Aeginetia indica, a holoparasitic plant, is a member of this family. In this study, we assembled the complete chloroplast and mitochondrial genomes of A. indica. The chloroplast and mitochondrial genomes were 56,381 bp and 401,628 bp long, respectively. The chloroplast genome of A. indica shows massive loss of plastid genes and the loss of one IR (inverted repeat). A comparison of the A. indica chloroplast genome sequence with that of a previous study demonstrated that the two chloroplast genomes encode a similar number of proteins (except atpH) but differ greatly in length. The A. indica mitochondrial genome has 53 genes, including 35 protein-coding genes (34 native mitochondrial genes and one chloroplast gene), 15 tRNA genes (11 native mitochondrial genes and four chloroplast genes), and three rRNA genes. Evidence for intracellular gene transfer (IGT) and horizontal gene transfer (HGT) was obtained for the plastid and mitochondrial genomes. ψndhB and ψcemA in the A. indica mitogenome were transferred from the plastid genome of A. indica. The atpH gene in the plastid of A. indica was transferred from the plastid of another angiosperm, and the atpI gene in the A. indica mitogenome was transferred from a host plant like Miscanthus sinensis. The open reading frame (ORF) orf43, which contains fragments of cox2, encodes a protein containing transmembrane domains, making it the most likely candidate gene for CMS development in A. indica.


Introduction
The structure and gene contents of plastid genomes are highly conserved in most flowering plants; plastomes range from 110 to 160 kb in length and contain 110 genes (~79 protein coding, 29 tRNA, and 4 rRNA genes) [1]. In contrast, angiosperm mitogenomes are remarkably divergent in size, structure, and mutation rate. Most angiosperm mt genomes contain 24 to 41 protein coding genes and two or three rRNA genes [2][3][4][5][6][7].
Orobanchaceae is a family of mostly parasitic plants of the order Lamiales and contains more than 2000 species in 90-115 genera. Members of this family include all types of parasitic plants, such as hemiparasites and holoparasites [17][18][19]. Holoparasites (obligate parasites) cannot live without a host, whereas hemiparasites (facultative parasites) can. The plastomes of parasitic Orobanchaceae species are remarkably variable with respect to genome size, genome structure, and gene contents. The majority of photosynthesis-related and plastid-encoded NAD(P)H-dehydrogenase (NDH) complex genes in the Orobanchaceae plastome have been lost or pseudogenized [20,21], and in several species, one IR (inverted repeat) copy has been completely lost. As a result, Orobanchaceae plastomes range from 45 kb (Conopholis americana) to 160 kb (Schwalbea americana) in length [22]. The complete mitochondrial genome of one Orobanchaceae species, Castilleja paramensis, has been reported [23], and Zavas et al. [24] reported the mitochondrial genes of two Lathraea species. Genes of the Orobanchaceae plastome and mitogenome, such as atp1 [25], rpoC2 [26], atp6 [27], and nad1 [28], show evidence of several transfers between Orobanchaceae and other angiosperms.
Aeginetia indica is a holoparasitic plant of the Orobanchaceae family and is parasitic on the roots of monocots like Miscanthus [18]. Previous phylogenetic studies have shown that A. indica groups with Striga, Buchnera, Radamaea, and Harveya [29,30]. The plastome of A. indica has been reported to be 86,212 bp in size and to have lost almost all photosynthesis-related genes [31].
The present study was undertaken to determine the plastome and mitogenome sequences of A. indica and to compare these with previously reported results [31], especially with respect to mitogenome size, gene and intron contents, and repeats, and to analyze the HGT, IGT, and CMS genes in the A. indica mitogenome.

Characteristics of the A. indica Plastid Genome
A previous study [31] showed that the plastid genome of A. indica is 86,212 bp in length with an LSC (Large Single Copy), SSC (Small Single Copy), and two IRs. However, we found that the complete plastid genome of A. indica (GenBank accession number: MW851293) is 56,381 bp in length and contains an LSC, SSC, and only one IR (Figure 1) together with 26 protein coding genes.
The coverage of the A. indica plastid genome in the present study was 6089X (Figure S1). In contrast, the plastid genome from the previous study of A. indica had gaps and low coverage values (Figures S1 and S2). Furthermore, the GC content of the plastid genome in this study (32.9%) was lower than in the previous study (34.4%), and the number of tRNAs (18) was smaller in the present study. In terms of protein coding gene contents, all atp, ndh, psa, psb, pet, and rpo gene groups have been lost from the plastid genome of A. indica, together with the cemA, ccsA, rbcL, ycf3, and ycf4 genes. However, the ndhB gene was retained as a pseudogene and the atpH gene remained intact (Table 1).
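The GC comparison above comes down to base counting. The following is a minimal Python sketch, not the authors' procedure: the source does not state how GC content was computed, so excluding ambiguous bases (such as N) from the denominator is an assumption, and `gc_content` is a hypothetical helper name.

```python
def gc_content(seq: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are ignored rather than
    counted in the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Toy sequence for illustration only (not A. indica data).
toy = "ATGCGCNNAT"
toy_gc = gc_content(toy)  # 4 of 8 unambiguous bases are G or C
```

Applied to a full plastome sequence read from a FASTA file, the same function would give a value comparable to the percentages reported above.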
The A. indica plastid genome is the second smallest among the Orobanchaceae, in which previously sequenced genomes ranged in size from 45,673 bp in Conopholis americana (NC_023131) to 160,910 bp in Schwalbea americana [22]. Wicke et al. [22] showed that 16 protein genes (matK, rpl2, rpl16, rpl20, rpl33, rpl36, rps11, rps2, rps4, rps7, rps12, rps14, rps18, rps8, ycf1, and ycf2), 15 tRNAs (trnD-GUC, trnE-UUC, trnfM-CAU, trnH-GUG, trnI-CAU, trnL-UAG, trnM-CAU, trnN-GUU, trnP-UGG, trnQ-UUG, trnS-GCU, trnS-UGA, trnW-CCA, and trnY-GUA), and four rRNAs (rrn16, rrn23, rrn4.5 and rrn5) are present in the chloroplast genomes of Orobanchaceae, and A. indica also contains these genes. The chloroplast genomes of most hemiparasites in Orobanchaceae are longer than those of holoparasites (Figure S3). Hemiparasitic species contain pseudogenes of photosynthesis-related genes and NADH dehydrogenase complex genes (ndh genes), and only a few genes have been lost. In contrast, many genes in holoparasitic species have been completely lost. In A. indica in particular, most photosynthesis-related genes and ndh genes have been completely lost, and only one IR region remains.
We conducted a phylogenetic analysis using a data matrix of 14 protein coding genes from 34 species (Table S1), comprising 13,499 bp of aligned nucleotides. Orobanchaceae species formed a monophyletic group with high bootstrap values, except for P. cheilanthifolia. The two A. indica accessions formed a highly supported clade (Figure S3). It is possible that the previous study [31] and the present study were performed on different species identified as A. indica and that reclassification of the genus Aeginetia is required.

Characteristics of the A. indica Mitogenome
The assembled A. indica mitogenome was 401,628 bp long (GenBank accession number: MW851294) with a GC content of 43.5% (Figure 1). The average coverage of the A. indica mitogenome was 1379.9X (Figure S4). We could not assemble a circular mitochondrial genome for A. indica and considered that the genome might be linear or a collection of sub-genomic molecules that arise via recombination of repeat regions [32]. Tandem repeats ranged in length from 37 to 419 bp with a total length of 8384 bp. We identified 12 chloroplast genome fragments in the mitochondrial genome that included genes and intergenic regions (Figures 1 and 2, Table S2). The fragments ranged from 57 to 154 bp. These plastid-derived sequences included a pseudogene of ndhB, a partial rps4 gene, six tRNAs, and two IGS (intergenic spacer) regions. The A. indica mitochondrial genome is typical in terms of genome size and repetitive sequence content. However, it has a smaller amount of plastid-derived sequence than other Lamiales (Figure 2). A total of 34 complete native mitochondrial protein coding genes and one complete chloroplast protein coding gene (atpI) were annotated in the mitogenome, together with 15 tRNAs (11 native mitochondrial tRNAs and four plastid-derived tRNAs) and three rRNAs (Figure 1, Tables S3-S5). The A. indica mitochondrial genome did not contain several ribosomal protein subunit genes (rps1, rps2, rps7, rps11, and rps19) or two respiratory genes (sdh3 and sdh4), which have also been lost in other angiosperms (Table S3) [14,23,33,34]. Previous studies have shown that ribosomal protein genes (rps10 and rps7) and sdh genes (sdh3 and sdh4) have been functionally transferred to the nuclear genome many times [34,35]. Similarly, the ribosomal protein genes and sdh genes of A. indica were presumably also transferred to the nuclear genome.

[Figure: genome size and amounts of plastid-like and repetitive DNA in seven Lamiales species (see Table S6). Blue and red branches indicate Lamiales and Orobanchaceae species, respectively.]

IGT and HGT of A. indica Organelle Genomes
Angiosperm genomes sometimes contain foreign genes acquired through IGT and/or HGT. In plants, IGT among cp, mt, and nuclear genomes is a common and well-known evolutionary phenomenon [36][37][38][39][40]. In Orobanchaceae species, most chloroplast genes and fragments have been transferred from chloroplasts to the nuclear or mitochondrial genomes [41].
To identify the genes transferred between the chloroplast and mitochondrial genomes of A. indica, we used BLAST analysis to identify sequences with significant homology in the two genomes. It was reported in the previous study [31] that most chloroplast genes of A. indica could not be detected in its transcriptomes, which suggested that they were non-functional. We detected two pseudogenes (ψndhB and ψcemA) of chloroplast genes in the A. indica mitochondrial genome (Figure 3A,B). Phylogenetic analyses of these two pseudogenes showed that both form monophyletic groups with Orobanchaceae species (Figure 3A,B). Thus, we suggest that the ψndhB and ψcemA genes in the A. indica mitochondrial genome were transferred from the A. indica chloroplast genome and are probably the result of IGT. Cusimano and Wicke [41] suggested that most of the photosynthesis-related genes lost from Orobanchaceae chloroplast genomes have been transferred to mitochondrial or nuclear genomes by IGT and subsequently fragmented.
In a previous study [31], it was reported that the atpH gene in the A. indica chloroplast genome had been lost. However, we found an intact atpH gene in the A. indica chloroplast genome, which is not typically found in Orobanchaceae species.
Phylogenetic analyses of atpH genes from Orobanchaceae species including A. indica and other angiosperm species (Table S6) showed that the A. indica atpH gene is not closely related to those of other Orobanchaceae species (Figure 3C). Park et al. [42] reported that three chloroplast genes (rps2, trnL-F, and rbcL) in the genus Phelipanche (Orobanchaceae) were acquired from another Orobanchaceae species by HGT between chloroplast genomes. Our results suggest that the atpH gene in the A. indica chloroplast genome was acquired from another angiosperm chloroplast genome.
The atpI gene in the A. indica mitogenome was also acquired from another angiosperm. This gene clustered closely with those of monocot species (Figure 3D), and monocots like Miscanthus sinensis are known A. indica hosts, which suggests that the atpI chloroplast gene was transferred from a host to A. indica. Most HGT events occur between the mitochondrial genomes or between the chloroplast genomes of different species [8,42,43]. However, the atpI gene in the A. indica mitochondrial genome was acquired from the chloroplast genome of another species. Gandini and Sanchez-Puerta [9] suggested that native plastid sequences are initially transferred by IGT from plastids to mitochondria and then transferred to mitochondria of related species by HGT. We consider that the atpI of the A. indica mitogenome was introduced in the same manner.

Cytoplasmic Male Sterility (CMS) Genes in the A. indica Mitogenome
Previous studies have shown that CMS, the failure to produce functional pollen, is associated with structural variations in mtDNA and is caused by the expression of chimeric open reading frames (ORFs) in the mitochondrial genome [12,15,44,45]. We identified 751 mitochondrial ORFs (≥150 bp in length) in A. indica and compared them with the annotated A. indica mitochondrial genes by BLAST searching. Seven ORFs contained fragments of mitochondrial genes (≥30 bp in length), that is, orf525, orf709, orf103, orf403, orf99, orf724, and orf43 (Table S7). Of these, orf43 contained fragments of the A. indica mitochondrial cox2 gene and was predicted to encode two transmembrane domains (Figure 4). Thus, orf43 might be responsible for CMS. Previous studies have shown that the wild beet (Beta vulgaris ssp. vulgaris) [46], sunflower (Helianthus annuus) [16], and Brassica [47] mitogenomes contain two copies of the cox2 gene associated with CMS. Accordingly, our study provides clues regarding the evolution of CMS in Orobanchaceae.

Plant Sampling and DNA Sequencing
Aeginetia indica (Orobanchaceae) was collected from Jeju Island (Korea), and vouchers (YNUH-JAI001) were preserved in the herbarium of Yeungnam University. Genomic DNA was extracted from fresh leaf tissue using a DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). Paired-end libraries with an average insert size of 550 bp were sequenced using an Illumina HiSeq 2500 (Illumina, San Diego, CA, USA). Approximately 20 Gb of paired-end (PE) reads were generated.

Plastome and Mitogenome Analyses
The complete plastome of A. indica determined in this study was compared with the previously reported A. indica plastome (MN529629) [31] using the mVISTA program [54]. The A. indica plastome determined in the present study was used as a reference.
Tandem repeats in the mitochondrial genome were identified using Tandem Repeats Finder v4.04 [55]. Plastid-derived regions and plastid-like sequences transferred to mitochondrial genomes were identified by searching against cp genomes with BLASTN 2.2.24+, using an e-value cutoff of 1 × 10⁻⁶ and at least 75% sequence identity.
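The two BLASTN thresholds described above can be applied as a post-filter on tabular output, after which overlapping hits are merged so that each plastid-like position is counted once. The sketch below is illustrative rather than the authors' pipeline. It assumes the standard 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that the mitogenome was the query; neither is stated in the source. The names `filter_hits` and `covered_length` are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive query coordinates


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-6,
                min_identity: float = 75.0) -> List[Interval]:
    """Return query intervals of tabular BLAST hits passing both cutoffs."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue <= max_evalue and pident >= min_identity:
            kept.append((min(qstart, qend), max(qstart, qend)))
    return kept


def covered_length(intervals: Iterable[Interval]) -> int:
    """Total number of positions covered, counting overlaps only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current merged block
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# Made-up hits for illustration only (not data from this study).
example = [
    "mito\tcp\t98.0\t100\t2\t0\t1001\t1100\t500\t599\t1e-40\t180",
    "mito\tcp\t80.0\t60\t12\t0\t1081\t1140\t700\t759\t1e-08\t60",
    "mito\tcp\t70.0\t90\t27\t0\t5000\t5089\t900\t989\t1e-10\t70",  # identity too low
    "mito\tcp\t95.0\t40\t2\t0\t9000\t9039\t100\t139\t1e-03\t30",   # e-value too high
]
plastid_like_bp = covered_length(filter_hits(example))
```

Merging before summing matters because several plastid genes can hit the same mitochondrial region, which would otherwise inflate the plastid-derived total.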

Phylogenetic Analysis of Plastid and Mitochondrial Genes
Phylogenetic trees were constructed for (1) 14 concatenated chloroplast genes of 34 species (Table S1) and (2) 19 concatenated mitochondrial genes of 20 species (Table S5). Both data sets were aligned using MAFFT v7.222 [56]. Phylogenetic analyses were performed in RAxML using the GTRGAMMA model with rapid bootstrapping [57].
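Concatenating genes into a single data matrix can be illustrated with a short Python sketch. It assumes each gene has already been aligned separately and is held as a mapping from taxon name to aligned sequence. Padding taxa that lack a gene with gap characters is a common convention, but the source does not say how missing genes were handled, so that behaviour and the name `concatenate` are assumptions.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def concatenate(alignments: List[Alignment], missing: str = "-") -> Alignment:
    """Concatenate per-gene alignments into one supermatrix.

    Taxa absent from a gene are padded with `missing` for that gene's width.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within an alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments for illustration only (hypothetical taxa and sequences).
gene1 = {"taxonA": "ATG-", "taxonB": "ATGC"}
gene2 = {"taxonA": "GG", "taxonC": "GA"}
supermatrix = concatenate([gene1, gene2])
```

Every row of the resulting matrix has the same length, which is what tree-building programs expect of a concatenated alignment.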

Analysis of Intracellular Gene Transfer (IGT), Horizontal Gene Transfer (HGT) and Cytoplasmic Male Sterility (CMS) Genes
IGT and HGT events in the A. indica mitogenome were identified by BLASTN searches (e-value cutoff of 1 × 10⁻⁶) of the genes and ORFs of the mitochondrial genome against Arabidopsis plastid-encoded genes. For phylogenetic analyses of IGT and HGT, sequenced mitochondrial and plastid genes from across angiosperms were selected (Table S7). The data sets of individual mitochondrial and plastid genes were aligned using MAFFT [56] in Geneious Prime. Phylogenetic trees were constructed using RAxML with the GTRGAMMA model and rapid bootstrapping (1000 replicates) [57].
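For readers who want to reproduce a comparable tree search, the sketch below builds one plausible RAxML command line for a GTRGAMMA analysis with 1000 rapid bootstrap replicates. The exact invocation used in this study is not given in the source, so the `-f a` mode (rapid bootstrapping plus a best-tree search in one run), the random seeds, the binary name `raxmlHPC`, and the file names are all assumptions.

```python
from typing import List


def raxml_rapid_bootstrap_cmd(alignment: str, run_name: str,
                              replicates: int = 1000, seed: int = 12345,
                              binary: str = "raxmlHPC") -> List[str]:
    """Build an argument list for a RAxML rapid-bootstrap analysis."""
    return [
        binary,
        "-f", "a",              # rapid bootstrap and best-scoring ML tree search
        "-m", "GTRGAMMA",       # substitution model named in the text
        "-p", str(seed),        # parsimony random seed (assumed value)
        "-x", str(seed),        # rapid bootstrap random seed (assumed value)
        "-#", str(replicates),  # number of bootstrap replicates
        "-s", alignment,        # input alignment file (hypothetical name)
        "-n", run_name,         # suffix for output files (hypothetical name)
    ]


cmd = raxml_rapid_bootstrap_cmd("atpH_aligned.phy", "atpH")
# With RAxML installed, the command could be run with:
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes the parameters easy to inspect and record alongside the results.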
ORFs of at least 150 bp were compared with the annotated A. indica mitochondrial genes using BLASTN with an e-value cutoff of 1 × 10⁻³, a minimum match length of 30 bp, and a sequence identity of at least 90%. Transmembrane domains in candidate ORFs were predicted using TMHMM v2.0 [58].
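The ORF length threshold above can be illustrated with a simple six-frame scan. This is a sketch under stated assumptions, not the tool used in this study: the source does not define how ORFs were called, so the ATG-to-stop definition, the standard stop codons, and the function names `reverse_complement` and `find_orfs` are hypothetical choices.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 150) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for ATG..stop ORFs of at least `min_len` nt.

    Coordinates are 0-based, half-open, and given on the forward strand.
    For each stop codon only the first in-frame ATG is used, so nested
    ORFs sharing the same stop are not reported separately.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon in the ORF length
                    if end - start >= min_len:
                        if strand == "+":
                            orfs.append((start, end, strand))
                        else:
                            orfs.append((n - end, n - start, strand))
                    start = None
    return orfs


# Toy sequence for illustration only: one 21-nt ORF on the forward strand.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
toy_orfs = find_orfs(toy, min_len=21)
```

ORFs found this way could then be searched against the annotated mitochondrial genes and filtered on identity, match length, and e-value as described above.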
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/ijms22116143/s1, Figure S1: Comparison of Aeginetia indica plastid genome length and coverage between this study (A) and the previous study (B). Figure S2: Comparison of the plastid genome sequences of this study and the previous study generated using the mVISTA program. Figure S3: Phylogenetic tree of 34 taxa (Table S1) based on 14 chloroplast genes. Blue indicates hemiparasitic plants and red indicates holoparasitic plants in Orobanchaceae. Figure S4: Coverage of the Aeginetia indica mitogenome. Table S1: Sources of plastid genomes included in this study. Table S2: BLAST results of plastid-derived DNA segments in the mitochondrial genome of Aeginetia indica. Table S3: Gene contents of angiosperm mitogenomes including Aeginetia indica. Table S4: The tRNA gene contents of angiosperm mitogenomes including Aeginetia indica. Table S5: The intron contents of angiosperm mitogenomes including Aeginetia indica. Table S6: Sources of mitochondrial genomes included in this study. Table S7: Taxon accession numbers for phylogenetic analysis of IGT and HGT.