Chloroplast Genes Are Involved in The Male-Sterility of K-Type CMS in Wheat

The utilization of crop heterosis can greatly improve crop yield. The sterile line is vital for the heterosis utilization of wheat (Triticum aestivum L.). The chloroplast genomes of two sterile lines and one maintainer were sequenced using second-generation high-throughput technology and assembled. The nonsynonymous mutated genes among the three varieties were identified, the expressed difference was further analyzed by qPCR, and finally, the function of the differentially expressed genes was analyzed by the barley stripe mosaic virus-induced gene silencing (BSMV-VIGS) method. A total of 16 genes containing 31 nonsynonymous mutations between K519A and 519B were identified. There were no base mutations in the protein-encoding genes between K519A and YS3038. The chloroplast genomes of 519B and K519A were closely related to the Triticum genus and Aegilops genus, respectively. The gene expression levels of the six selected genes with nonsynonymous mutation sites for K519A compared to 519B were mostly downregulated at the binucleate and trinucleate stages of pollen development. The seed setting rates of atpB-silenced or ndhH-silenced 519B plants by BSMV-VIGS method were significantly reduced. It can be concluded that atpB and the ndhH are likely to be involved in the reproductive transformation of 519B.


Introduction
Wheat (T. aestivum L.) is the world's most widely grown food crop, feeding nearly half of the world's population [1,2]. It is also an important raw material for industry. For example, wheat gluten can be used as a natural binder in the manufacture of paper [3]. Additionally, dry gluten extracted from wheat can be used as an additive for bread processing [4]. Improving yield per unit area and stress resistance of wheat are very effective strategies to alleviate the food problem. The utilization of heterosis is an effective way to improve wheat yield and quality, and it plays an essential role in the breeding of crops [5,6]. Wheat is a monoecious, self-pollinated crop. Therefore, male sterile lines are essential for the utilizing of heterosis. Male sterility in plants refers to the development of gynoecium as normal during the growth process, but fertilization and seed setting after receiving pollen are abnormal, which are caused by the abnormal development of stamens. The abnormal development of stamens means that the pollens are abortive or anther tissue structure is abnormal, which eventually leads to the decline or loss of pollen vitality. The causes of male sterility in plants are various. According to the origin of male sterile genes, male sterile lines can be divided into three types: genic male sterility (GMS), cytoplasmic male sterility (CMS), and nucleus-cytoplasmic male sterility. At first, Kihara [7] transferred the nucleus of common wheat into Aegilops caudata and cultivated the first nucleus-cytoplasmic male sterile line in 1951. Wilson and Ross [8] introduced the common wheat nucleus into copy region (SSC), and a long single copy region (LSC). The sequence of the IRs' regions is the same but in the opposite direction. The structures of the chloroplast genomes are relatively conservative and have fewer changes during the evolution of species. Therefore, the chloroplast genome can provide reliable and accurate information for biodiversity research and provide a lot of basic information for comparative evolution research [38,39].
Since the chloroplast genes can be transferred to mitochondria as mentioned above, and since many mitochondrial genes are involved in the process of male sterility in plants, chloroplast genes might affect male sterility by changing the gene function of mitochondria. Besides, some special structural compositions in chloroplasts were also considered related to the fertility of plants [40]. Li et al. [41] used SDS-PAGE and two-dimensional electrophoresis to compare the differences of peptides on chloroplast thylakoid membranes between CMS lines and their maintainers in beet, corn, and sorghum. There were significant differences in the size, number, and distribution of peptide plaques between the two lines, proving that the composition of chloroplast thylakoid polypeptides might affect cytoplasmic male sterility. Li et al. [42] studied the chloroplast ultrastructure of the CMS line and its maintainer in rape, the number of stacks of grana in the thylakoid of the sterile line was significantly reduced. Additionally, the lamellar structure between grana became thinner or even fractured so that the lamellar system was disordered. Chen et al. [43] found that after the digestion of chloroplast genome DNA of sorghum male sterile line and maintainer line with HindIII endonuclease, the male sterile line produced a 3.7 kb specific fragment, with a 165 bp lack compared to the maintainer line (3.8 kb). The deletion fragment was from the rpoC2 gene, which encodes the β subunit of the RNA polymerase. The amino acids of the deletion fragment will participate in the formation of an α helix. It was found that the identified 1501 differentially expressed transcripts in leaves and anthers at different developmental stages were most of these DETs being localized in plastid and mitochondrion of male sterile line in Brassica napus L. induced by the chemical hybridization agent compared to the wild Brassica napus L. [44].
K519A is a non-1B/1R K-CMS line, and 519B was the maintainer line of K519A with the same nuclear genotype. Their nuclear genomes are derived from the same paternal species, but their cytoplasm genomes are from different maternal species. Thus, the differences in fertility come from the differences between cytoplasmic genomes. YS3038 is also a non-1B/1R K-CMS line. Additionally, the nucleus of YS3038 is basically the same as that of K519A. However, the sterility of YS3038 is thermo-sensitive. To understand the sterile mechanism of K519A and YS3038, the chloroplast genomes of K519A, YS3038, and 519B were sequenced, assembled, annotated, and compared in this study. Then the nonsynonymous mutant sites of coding genes were verified. Besides, possible SSR, possible long repeat sequences, and phylogeny were analyzed. This study can provide an important reference value for the sterile mechanism studies of K519A and YS3038.
After the pollens matured, the opening angle of the glumes of the male sterile line become larger than 519B, the anthers stretch out from the glume (Figure 1(A1,B1)). Additionally, the anther is wizened, short, and crooked, and the pollen cannot spread out from the anther (Figure 1(A2)). After the pollens have matured, the anthers of 519B are plump and cracked, and the amount of fertile pollen spreads out from the anthers (Figure 1(B2)). The test materials were planted in the experimental field of the Northwest A&F University (108 • 4 E, 34 • 16 N) on 3 October 2018. The two materials were planted in 4 rows (25 cm between rows), respectively, with 15 seeds in 1 m. Planting was done with general field management measures. Young leaves were stored in a refrigerator at −80 • C in March 2019.

CTAB Extraction of DNA and Chloroplast Genome Sequencing
Genomic DNA from young leaves of K519A, YS3038, and 519B wheat was extracted by means of the CTAB method. The whole-genome DNA sequences were obtained using secondgeneration high-throughput technology, with the Illumina HiSeq XTM Ten sequencing platform and paired-end 150 bp sequencing strategy. Firstly, the original sequencing data for each sample was obtained, then low-quality reads and connector sequences were removed. Additionally, the clean data was used for further genome assembly of the chloroplast.

Genome Assembly and Annotation
The chloroplast assembly was carried out based on the assembly method of Hahn et al. [45]. The assembly steps are as follows: Firstly, the clean data is spliced using SPAdes software [46], setting default parameters except that the cut-off parameter is not selected. The clean data are spliced into scaffolds. Using the published wheat chloroplast DNA sequences (Chinese Spring, CS, T. aestivum) and protein-coding gene sequences as a reference (KJ614396.1). Then the sequences are aligned using Blastn and Exonerate, respectively. The e value threshold of DNA alignment is 1 × 10 −10 , and the protein similarity threshold is 70%, respectively. Scaffold with gene matching was selected and the coverage was ranked to remove fragments that were clearly not the target fragments, such as most of them were 1000X, but a few were 10X, and then the low-coverage fragments (10X) were removed. PRICE and MITObim was used to splice the collected fragmented sequences to extend the scaffolds. The number of iterations generally was 50. The original sequencing reads were aligned to scaffolds using bowtie2, then the matched paired reads were picked out and spliced by SPAdes. The path was studied to see if a distinct circle diagram could be formed. If so, the circle genome was extracted, otherwise, steps 3-5 above were repeated until a circular genome was formed.
The organelle genome annotation is mainly divided into three parts: protein-coding gene annotation, RNA annotation, and structural annotation. Three methods are commonly used for protein-coding gene prediction. The DNA sequences or protein-coding sequences directly align with NCBI to confirm the genes. Submitting sequences to the online service annotation tools, DOGMA (http://dogma.ccbb.utexas.edu/, accessed on 7 April 2020) and CpGAVAS [47], predicting the ORFs, and then annotating genes in the nr database. In combination with the three methods, the most accurate annotation results are chosen.
Chloroplast tRNA annotations were performed using the tRNAscan-SE online site (http://lowelab.ucsc.edu/tRNAscan-SE/, accessed on 15 May 2020). The rRNA annotations were performed using the RNAmmer 1.2 Server (http://www.cbs.dtu.dk/services/ RNAmmer/, accessed on 28 May 2020), in combination with homologous alignments, and boundary correcting. After the sequence annotation was completed, the structure note was edited by Sequin to generate a submission file that could be submitted to the GenBank database. The file was submitted to OGDRAW (https://chlorobox.mpimp-golm.mpg.de/ OGDraw.html, accessed on 17 June 2020) to draw the annotated map.

Codon Preference Analysis of Chloroplast Gene
A codon is a link between genetic information from DNA and protein, and every three adjacent codons encode an amino acid. In all organisms in nature, there are only 20 amino acids in total, however, each amino acid corresponds to at least one or up to six codons. Synonymous codons mean that different codons can encode the same amino acid [48]. The analysis of codon preference was performed using the CHIPS program in the EMBOSS software package [49].

Simple Sequence Repeat (SSR) and Long Repeat Sequence Analysis
The Microsatellite identification tool (MiSa) [50] was used for finding simple sequence repeats (SSR) in the genome. We used this software to detect SSR with eight times as a minimal repeat number of mononucleotides, five times for dinucleotides, and at least three times for three or more bases.
The polymorphism SSR markers at the same position in the same gene of K519A and 519B were selected. Primer Premier 6.0 software was used to design SSR primers to verify the polymorphism between K519A and 519B.
Using REPuter online software [51] to find long repeat sequences, all options: forward, reverse, complement, and palindromic were selected for "Match Direction". The parameter "Hamming distance" was set to 3, "maximum computed repeat" to 50, and "minimal repeat size" to 15.

Gene Alignment Analysis and Phylogenetic Analysis
We used MEGA4.0 software [52] to align the homology of protein-coding genes, then we selected all nonsynonymous mutant sites to verify the accuracy of second-generation sequencing results. According to the company's sequencing sequence, Primer Premier 6.0 software was used to design primers of nonsynonymous mutant sites, then PCR products were sent to the company for first-generation sequencing (FGS).
The genome-wide alignment of chloroplast genomes K519A, 519B, YS3038, and another 100 chloroplasts of near-source species were performed using HomBlocks (https: //github.com/fenghen360/HomBlocks, accessed on 22 July 2021) software with the trimAl trim method. The NJ method was used to build the evolutionary tree with bootstrap value 1000.

Quantitative Real-Time PCR (qPCR) Analysis
Premier 6.0 software was used for gene-specific primers. RevertAid First Strand cDNA Synthesis Kit (Thermo Scientifc, Wilmington, DE, USA) was used for cDNA synthesis with random primers according to instructions. TB GreenTM Premix Ex TaqTM II (Tli RNaseH Plus) Kit (Takara Biological Engineering, Tokyo, Japan) and Applied Biosystems 7300 Real-Time PCR System (Life Technologies, Carlsbad, CA, USA) were used for the qPCR analysis with three biological replicates and three technical replicates. The wheat Actin gene with AB181991.1 (Gene Bank) was used for normalization. The 2 -∆∆CT method was used to calculate relative expression levels of target genes. The approximate 200 bp gene fragment with the PacI (TTAATTAA) and NotI (GCGGC-CGC) enzyme locus were used for gene-silenced. Then the vector of γ-gene was obtained.

Linearization and in Vitro Transcription of γ-Gene Vector
MluI was used for the linearization of α and γ plasmids; SpeI was used for the linearization of β plasmid; and BssHII was used for the linearization of γ-PDS and γ-gene plasmids. Then, in vitro transcription was performed using Ribo m 7 G Cap Analog (Promega), RiboMAX Large Scale RNA Production System-T7 (Promega), and Ribolock RNase Inhibitor (Thermo).

Creation of Transfection Mixture and Virus Infection of Seedlings
The negative control plants used in vitro transcript mixture of α, β, and γ, while the positive control plants used in vitro transcript mixture of α, β, and γ-PDS, and the treatment plants used an in vitro transcript mixture of α, β, and γ-gene. The 247 µL mixture could be used to infect five individuals including 7 µL in vitro transcription product for each vector, 40 µL sterilized 1% DEPC water, and 200 µL GK-Pbufer.
The penultimate leaves and the fully unfolded flag leaves were smeared with abovementioned mixture. The plants were then placed in a 24-26 • C incubator for dark cultivation for 24 h.

Detection of Silencing Efficiency and Phenotypes
After virus infection, 4-5 spikelets per ear were collected at the binucleate stage for RNA extraction and qPCR analysis. When ears grew to the mature stage, the seed setting rate was calculated as follows: Seed setting rate per ear = number of grains per ear / (effective spikelet number × 2) × 100%.

Genome Sequencing and Chloroplast Assembly
Genomic sequencing was performed on K519A, 519B, and YS3038, and the number of paired-end reads of 150 bp was greater than or equal to 14,716,209 for each sample, and the number of reads was greater than or equal to 14,705,908 (4,406,664,764 clean data) and Q30 was greater than 92.35% for each sample after quality control. The length of the chloroplast genome was about 136 kb, indicating a sequencing coverage of 40×.

Genome Content and Characteristics
Like the chloroplast genomes of other higher plants, the two chloroplast genomes in this experiment contained two inverted repeats, IRA and IRB, which divided the entire genome into four parts, and the remaining regions were the large single-copy region (LSC) and the small single-copy region (SSC). The total length of the chloroplast genome of K519A was 136,996 bp, and the lengths of the four parts were 81,136 bp (LSC), 12,776 bp (SSC), and 21,542 bp (IRA and IRB) ( Figure 2). The total length of the YS3038 was 136,919 and the lengths of the four parts were 81,059 bp (LSC), 12,776 bp (SSC), and 21,542 bp (IRA and IRB) ( Figure 3). The total length of 519B was 136,235 bp, and the lengths of the four parts were 80,335 bp (LSC), 12,796 bp (SSC), and 21,552 bp (IRA and IRB) ( Figure 4). The GC contents of genomes were 38.27% (K519A), 38.28% (YS3038), and 38.28% (519B), respectively. The structures of the chloroplast genomes of the two male sterile lines and 519B were different, such as the expansion and contraction of the boundaries of the four parts, and the sequence insertion of some regions.
All chloroplast genomes include 77 protein-coding genes, 4 rRNA genes, and 30 tRNA genes. Based on the functions, chloroplast genes and RNA were mainly divided into three categories: photosynthesis (45), self-replication (58), unknown function (4), and others (4) ( Table 1). Of these, there were two copies of the protein-coding genes ndhB, ndhH, rps7, rps15, rps19, ycf1, ycf2, and rpl2. Additionally, there were three copies of rpl23. In the tRNA genes, trnA-UGC, trnH-GUG, trnI-GAU, trnN-GUU, trnR-ACG, and trnV-GAC contained two copies. Additionally, there were three copies of the trnfM-CAU. Besides, the trnL-CAA gene had two copies only in YS3038. The trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, trnV-UAC, atpF, ndhA, ndhB, petB, petD, rpl2, rpl16, and rps16 contained an intron, and ycf3 contained two introns. Rps12 is a trans-splicing gene.     The regions of exons and introns for each gene were predicted, and the codon preferences were analyzed based on the sequence of exons. The codon preferences of K519A and 519B were different. Additionally, the codon preferences of genes in K519 and 519B are shown in Tables 2 and 3, respectively. Among them, both Met and Trp were encoded by only one codon. Ile and stop codons were encoded by three codons. Ala, Gly, Pro, Thr, and Val were encoded by four codons. Arg, Leu, and Ser were encoded by six codons.

Discovery of Possible SSRs in K519A and 519B
The SSR markers of chloroplasts have been applied for studies of population genetics, evolution processes, and phylogeny [53,54]. In this study, we found 186 possible SSR repeat motifs in K519A (Table 4) and 188 in 519B (Table 5). In SSRs, the mononucleotide repeats were the most abundant compared to polynucleotide repeats, and the number of A/T repeat motifs was significantly higher than the number of C/G, which was consistent with the results of six legume plastid genomes [55]. In K519A, the proportion of single nucleotide repeat motif was 68.28% and was 67.02% in 519B. The SSR number composed of dinucleotide was 8, of which 6 were AT/TA repeat motifs and 2 were TC in K519A. The SSR number composed of dinucleotide was 9, of which 7 were AT/TA repeat motifs and 2 were TC in 519B. Among the trinucleotides, AAC/TTC most frequently appeared. Additionally, the SSR number of the remaining repeat motif types appeared only once.

Discovery of Possible Long Repeat Sequences in K519A and 519B
Besides SSR analysis, we also analyzed possible long repeat sequences of the chloroplast genome. Long repeat sequences are special DNA sequences that occur repeatedly in the genome and usually occupy a large proportion in the genome. In general, the proportion of repeat sequences is positively related to the size of the genome. For example, the Arabidopsis thaliana genome is 120 Mb, and the repeat sequences account for 10%. Most repeat sequences accounting for 81% of the wheat genomes (17,000 Mb) exist in the intergenic region, and a small number of repetitive sequences exist in the gene coding region [56]. Repetitive fragments have important molecular significance for plant evolution research [57]. In K519A, a total of 50 possible long repeats were found, and the sizes were 20-286 bp. Of the 50 possible repeats, the positive repeat, palindrome repeats, and inverted replicates were 37, 11, and 2, respectively. The longest repeat was the forward repeat with 286 bp in length. In 519B, a total of 52 possible long repeats were found, the sizes of which were also 20-286 bp, which was consistent with K519A, of which the positive repeat, palindrome repeat, and inverted repeat were 39, 10, and 3, respectively. Most of the long repeats between the two lines are consistent, only a few are inconsistent (Table 6).

Alignment for Protein-Coding Genes between CS, 519B, K519A, and YS3038
Firstly, the CS was used as a reference sequence to compare the protein-coding genes of 519B, 519A, and YS3038. Compared with CS, 519B has 4 genes with nonsynonymous mutations, 15 genes with synonymous mutations, a total of 28 base mutations, 4 amino acid mutations, 7 deletions, and 2 insertions. There were no base mutations in the proteinencoding genes between K519A and YS3038. Compared with CS, both K519A and 519B have 15 nonsynonymous mutated genes and 27 synonymous mutated genes, including 113 base mutations, 30 amino acid mutations, 8 deletions, and 3 insertions (Table 7). It was concluded that compared with the protein-coding gene of CS, 519B has the least synonymous mutation and nonsynonymous mutation genes, base mutation and amino acid mutation sites, insertions, and deletions, which also indicates that the cytoplasm of 519B belongs to the common wheat cytoplasm.
A comparative analysis of K519A and 519B revealed that 104 SNPs were identified in the protein-coding sequence in 42 genes (Table 7). Among them, 73 were synonymous mutations, 31 were nonsynonymous mutations from 16 genes (atpB, atpI, ccsA, matK, ndhA, ndhF, ndhH, ndhK, psbB, psbH, rbcL, rpl14, rpl32, rpoB, rpoC2, and rps16), and the proportion of nonsynonymous mutations was 29.8% of the total mutations. Of non-synonymous mutation genes, the NADH dehydrogenase genes are the most, and four genes occur with mutations. The protein-coding gene with the largest difference between K519A and 519B sequences was the rpoC2 gene, and there were 13 SNPs with 5 amino acid differences.     35C-12A means that the 35th base in this line is C, which corresponds to the 12th amino acid, A, and the following is the same.

The Verification of Nonsynonymous Mutant Sites between K519A and 519B by First-Generation Sequencing
To verify the accuracy of second-generation sequencing results, 31 nonsynonymous mutant sites from 16 genes, selected based on the results of the above alignment analysis, were identified via FGS technology. The corresponding primers were listed in Table 8, which were prefixed with FGS. It was concluded that the amino acids of the 16 genes were all consistent under the two methods. Some comparison results were listed in Figure 5. Table 8. Primers for first-generation sequencing.

Phylogenetic Analysis of K519A and 519B
The phylogenetic analysis of chloroplast genomes of K519A, 519B, and another 100 species that have near relatives was performed ( Figure 6). Bromus vulgaris was used as the outgroup. It was found that the species of the Aegilops genus and the Triticum genus were clustered in the largest number through phylogenetic analysis. Besides, four species of the Amblyopyrum genus were clustered into the Aegilops genus and a Taeniatherum genus cultivar was clustered into the Triticum genus. Then the Secale genus and the Elymus genus were clustered, but a species of the Thinopyrum genus and two species of the Dasypyrum genus were clustered in the Elymus genus. The relationships between the Hordeum genus and Aegilops genus were relatively distant. In the Eremopyrum genus, five other genus species were clustered. It was concluded that the chloroplasts of K519A and 519B were from two independent taxa, the chloroplast of 519B was closely related to the Triticum genus species. However, K519A was closely related to the Ae. kotschyi cultivar and other Aegilops species, which was consistent with the characteristic that K519A's cytoplasm was derived from Ae. kotschyi.

The qPCR of Nonsynonymous Mutation Genes
To further study the effect of nonsynonymous mutant genes in chloroplasts, we performed a qPCR analysis to assess the expression patterns of genes. According to the existing research results, we selected six nonsynonymous mutant genes (atpB, ccsA, matK, rbcL, rpoC2, and ndhH) for analysis of gene expression levels of K519A compared with 519B. The corresponding primers of qPCR are listed in Table 9. The efficiency of the qPCR primers calculated based on standard curve is 95.56-106.72% ( Figure 7A). Table 9. Primers for the qPCR.

Genes
Forward Primers Reverse Primers GGCTGGTGCTTCCCTTGTTG ATCGGTCCTGCACTTGTCCT It can be seen from the figures that the expression levels of all genes were downregulated at the binucleate and trinucleate stages, except that the rbcL gene was downregulated only at the trinucleate stage. Additionally, the rpoC2 gene was upregulated at the late uninucleate stage ( Figure 7B).

Function Analysis of Candidate Genes by BSMV-VIGS Method
To further study the relationship between chloroplast genes and fertility, several differentially expressed nonsynonymous mutant genes were selected for barley stripe mosaic virus-induced gene silencing (BSMV-VIGS) in 519B. First, the primers used for gene silenced were designed (Table 10). The recombinant vector was then successfully constructed. Albinism occurred in the leaves after PDS-silencing (positive control) ( Figure 8A). It was concluded that the seed setting rate of atpB-silenced plants (20.7%) was significantly reduced compared with the negative control (93.8%). Additionally, the seed setting rate of ndhH-silenced plants (24.5%) was significantly reduced compared with the negative control (93.8%) ( Figure 8B) (Table 11). This indicates that atpB and ndhH were likely to participate in the fertility transformation of 519B.   Several spikelets at the binucleate stage were saved for qPCR. The primers for qPCR were listed in Table 10. Compared with the negative control individuals, the expression of atpB in the gene-silenced individuals was significantly reduced with atpB-qPCR primer, and the expression level was 0.31. After analysis of the reproduction of atpB-silenced fragment with atpB-VIGS primer, it was found that the reproduction efficiency was above 15,914. Compared with the negative control individuals, the expression of ndhH in the gene-silenced individuals was significantly reduced with ndhH-qPCR primer, and the expression level was 0.40. After analysis of the reproduction of ndhH-silenced fragment with ndhH-VIGS primer, it was found that the reproduction efficiency was above 21,283 ( Figure 9).

Plant Chloroplast Genomics Research and Phylogenetic Analysis
The chloroplast genome sequencing technology has developed from the FGS to the current third-generation sequencing [58][59][60]. Currently, most of the chloroplast genome sequences on NCBI are performed by second-generation sequencing technology [61,62]. Due to the fast improvement of sequencing technology, this results in sequencing costs continually decreasing, thus chloroplast genome data is largely supplemented. In the past few decades, chloroplast studies have made a breakthrough. More than 2400 plant chloroplast genomes have been published in the NCBI database. Of which, the chloroplast genome of tobacco is the first genome to be sequenced in higher plants [59]. The chloroplast genome encodes 100-120 genes, including 70-88 protein structural genes, 30-32 tRNA genes, and 4 rRNA genes. According to the functional classification of chloroplast genes, they can be divided into three categories: the first are genes encoding photoreactive structural proteins, such as psaA, psbB, petD, etc. The second are the coding genes of energy metabolism-related enzymes, such as atpA, ndhB, rbcL, and other genes. The third are the encoding genes of the transcriptional translation system, mainly including the ribosome large subunit encoding RPL family, the ribosome small subunit encoding RPS family, the transport RNA encoding TRN family, and transport RNA polymerase encoding RPOA family, and so on. Besides, there are some genes whose functions have not yet been determined, such as the YCF family [63]. Researches have shown that many genes of plant chloroplasts are involved in some important biological processes. For instance, Zhong et al. [64] demonstrated that the chloroplast heat shock protein HSP21 is involved in plastid-encoded RNA polymerase (PEP)-dependent transcription in Arabidopsis. Yu et al. [65] found that the downregulated expression of ribosomal protein S1 (RPS1) in rps1 mutants negatively modulates the expression of heat-responsive gene HsfA2 and its target genes in Arabidopsis.
Our study completed the sequencing of the chloroplast genome of the K-type CMS K519A and maintainer 519B. This is similar to the chloroplast genome structure of most plants, which is a typical four-segment structure. The chloroplast genome of angiosperms belongs to maternal inheritance and is generally not affected by genetic recombination.
The chloroplast genome is much smaller than the nuclear genome and there are more copies in normal cells, so the chloroplast genome is easy to obtain. Additionally, the structure and coding genes of the chloroplast genome are relatively conserved [66,67].

SSR Molecular Marker
Molecular marker technology is to detect the polymorphism of DNA sequences based on PCR, which can directly compare the mutations of DNA sequences. With the continuous improvement of molecular biology technologies, it is becoming increasing mature and is applied in many research directions [68,69]. Among the molecular markers, the SSR marker has high variability, codominant inheritance, easy operation, and easy analysis [70]. It is widely used in genetic mapping construction, gene mapping, genetic diversity analysis, and variety identification of various crops. Stachel et al. [3] successfully analyzed the genetic diversity of 60 wheat cultivated varieties originating from three agro-ecological zones using microsatellite markers. Salameh et al. [71] used a molecular marker to transfer the resistant QTL (Fhb1) on the 5A chromosome into nine European winter wheat lines to develop new scab-resistant lines. Besides, Shim et al. [72] sequenced the whole chloroplast genome of millet and compared it with other reported chloroplast genomes, and obtained 125 SSR loci and 34 indel changes, which can provide effective information for the phylogeny. In our study, there were 186 possible SSR repeat motifs in K519A and 188 possible SSR repeat motifs in 519B. The accuracy of SSR markers and the probability of polymorphism among individuals in a population or among species can only be known when these SSR markers are used for genotyping. Therefore, we will collect some experimental materials for analysis.

Chloroplast Genes Related to Cytoplasmic Male Sterility
Some nuclear male sterility genes have been identified. For example, Xing et al. [21] cloned the TaAPT2 gene related to thermo-sensitive sterility in wheat, and the gene expression levels of the young anthers, which during the fertility conversion stage, were significantly reduced under the sterile condition by Northern analysis, indicating that the TaAPT2 gene may be related to fertility. Xia et al. [73] confirmed that the male sterile ms2 mutant in wheat was caused by an insertion of terminal-repeat retrotransposons in a miniature (TRIM) element in the promoter region of the Ms2 gene. In addition to this, some studies have shown that the deletion, duplication, or mutation of plant chloroplast DNA may cause the wrong transmission of genetic information between chloroplast, mitochondria and nucleus, leading to male sterility [74,75].
Some studies have found that some chloroplast genes in plants may affect the fertility of plants. For example, the matK gene is located in the intron of the chloroplast trnK gene. Jia et al. [76] extracted DNA of abortive flower buds from the radish male sterile line BT-18 and normal flower buds using DD-PCR technology to study the differential expression of RNA between abortive and normal buds in radish. Additionally, it was found that the matK gene appeared three times in different primers of PCR amplification, and the amplification bands were enhanced in abortive buds, which showed upregulated expression, indicating that the abortion of radish buds had a great relationship with matK gene. Ou et al. [77] studied the intron sequences of ORFs and rps16 genes in chloroplasts of different CMS lines in rice. Additionally, they found that in the gametophyte sterility line, the rps16 gene intron contains a GMTGAG sequence and a unique G at position 595, which can be used as a molecular marker to distinguish sporophytic sterility and gametophyte sterility. In this study, it was found that some genes had nonsynonymous mutant sites between K519A and 519B, and these genes might be related to male sterility.
The RNA polymerase is a key enzyme controlling chloroplast genomic transcription and gene expression. It is encoded by the copy genes rpoA, rpoB, rpoC1, and rpoC2 genes [78]. The α subunit and β subunit of RNA polymerase are encoded by rpoA and rpoB genes, respectively, and the β' and β" subunit by rpoC1 and rpoC2 genes. Chen et al. [79] found a new chloroplast DNA deletion in the sorghum CMS line, which occurs in the middle of the rpoC2 gene, which may be associated with cytoplasmic male sterility in sorghum. We infer that nonsynonymous mutations of rpoB and rpoC2 genes may affect the production of β and β" subunits of RNA polymerase.

Phylogenetic Relationship of Important Crops Based on Chloroplast Sequences
In recent years, with the continuous completion of plant chloroplast genome sequencing, phylogenetic researches based on genes or chloroplast genomes have been greatly promoted. Kallersjo et al. [80] utilized the rbcL chloroplast gene to analyze the phylogenetic relationship of 2538 species, covering the prokaryotic cyanobacteria to higher flowering plants, using the parsimony jackknife analysis method. Bruneau et al. [81] analyzed the evolutionary relationship of 223 legume materials by analyzing the introns of the chloroplast trnl gene. Kim [82] sequenced the chloroplast genome of Panax schinseng Nees and analyzed the evolutionary relationship of vascular plants. Liu [83] analyzed and reported the chloroplast genome of Morella rubra and the evolution patterns of Fagales, proving that Myricaceae is a sister to Juglandaceae. By comparing the chloroplast genomes of rice, wheat, and maize, it was found that some hot parts of the genes were susceptible to mutation, and the tandem tRNA region was species-specific. Additionally, gene deletion of the IR region and the boundary characteristics of the IR region indicated that the genetic relationship between rice and wheat is closer than that of maize [84]. Matsuoka et al. [85] studied 106 genes in the chloroplast genome of rice, wheat, and maize, and found that 86.8% of the genes showed the same evolution rate.
In this study, it was found that the chloroplasts of 101 materials from 13 genera were closely related. From the perspective of evolutionary relationship, Hordeum was the first to evolve, followed by Eremopyrum, Henrardia, Australopyrum, Agropyron, Elymus, Thinopyrum, Dasypyrum, Secale, Aegilops speltoides, Triticum, Taeniatherum, Aegilops, and Amblyopyrum. Middleton et al. [69] studied the phylogenetic relationships of the chloroplast genomes of 12 species and found that Hordeum vulgare divided into Secale cereale and wheat approximately 8-9 million years ago. It was consistent with this study. Saarela et al. [86] carried out phylogeny analysis on the chloroplast genomes of Triticinae and found that the differentiation relationship was in the order of Hordeum jubatum, H. vulgare subsp. Vulgare, Connorochloa tenuis, Secale cereal, Taeniatherum caput-medusae, Ae. speltoides var. speltoides, T. timopheevii, T. aestivum, Triticum turgidum, Triticum macha, Triticum monococcum, Triticum urartu, Aegilops tauschii, Aegilops cylindrica, Aegilops geniculate, Aegilops longissima, Aegilops sharonensis, Aegilops bicornis, Ae. kotschyi, and Aegilops searsii. This is almost consistent with the evolutionary relationships we found in the 13 genera. In addition, the Triticum contained hexaploid wheat and four origin species of hexaploid wheat in this study. Additionally, the order of differentiation is T. timopheevii, T. Turgidum, T. urartu and T. monococcum. In comparison with these four genera, the differentiation of T. aestivum is not successively obvious and almost simultaneous. The evolutionary sequence of T. urartu and T. monococcum was just opposite to the results of Saarela et al. [86], which may be due to the close relationship between the two species and the small differences in chloroplast genomes. In addition, previous studies found that the genome donors of hexaploid wheat diverged 2.1-2.9 million years ago [69]. It is not difficult to see from the successful application of nextgeneration sequencing technology in the acquisition of chloroplast genomic data in recent years that the emergence of this new technology will greatly promote the development of chloroplast phylogenetic analysis, and the phylogenetic research of plants will have a bright prospect [87,88].

Conclusions
In this study, we obtained and analyzed the three chloroplast genomes of two male sterile lines and one maintainer line in wheat. Based on a comparative analysis, a total of 104 mutations were found in 42 genes. There were 16 genes with nonsynonymous mutations between K519A and 519B. There were no base mutations in the protein encoding genes between K519A and YS3038. The atpB, ccsA, matK, rbcL, rpoC2, and ndhH genes were mostly downregulated at the binucleate and trinucleate stages. AtpB and ndhH were likely to be involved in the reproductive transformation of 519B. The chloroplast genomes of 519B and K519A were closely related to the Triticum genus and Aegilops genus, respectively.
Author Contributions: L.M. and Q.D. designed and supervised the study and wrote the manuscript; Y.H. and Y.G. planned and performed the experiments, analyzed the data, and wrote the manuscript; Y.L. and X.Z. performed the experiments and analyzed the data; H.Z. analyzed data. All authors have read and agreed to the published version of the manuscript. Data Availability Statement: The sequences of mitochondrial genomes have been deposited in the GenBank database at National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm. nih.gov/genbank/, accessed on 30 November 2021) and can be accessed by the accession numbers MN605257, MN605258, and OL678073 for K519A, 519B, and YS3038, respectively.
Acknowledgments: A preprint has previously been published (https://www.researchsquare.com/ article/rs-15231/v1, accessed on 25 November 2021), but the new version is partly newer and contains more content than the preprint version.

Conflicts of Interest:
The authors declare no conflict of interest.

BSMV-VIGS Barley stripe mosaic virus-induced gene silencing CMS
Cytoplasmic male sterility GMS Genic male sterility FGS First-generation sequencing