Characterization of Genomic Variation from Lotus ( Nelumbo Adans.) Mutants with Wide and Narrow Tepals

: Compared with rose, chrysanthemum, and water lily, the absence of short-wide and long-narrow tepals of ornamental lotus ( Nelumbo Adans.) limits the commercial value of ﬂowers. In this study, the genomes of two groups of lotus mutants with wide-short and narrow-long tepals were resequenced to uncover the genomic variation and candidate genes associated with tepal shape. In group NL (short for N. lutea , containing two mutants and one control of N. lutea ), 716,656 single nucleotide polymorphisms (SNPs) and 221,688 insertion-deletion mutations (Indels) were obtained, while 639,953 SNPs and 134,6118 Indels were obtained in group WSH (short for ‘Weishan Hong’, containing one mutant and two controls of N. nucifera ‘Weishan Hong’). Only a small proportion of these SNPs and Indels was mapped to exonic regions of genome: 1.92% and 0.47%, respectively, in the NL group, and 1.66% and 0.48%, respectively, in the WSH group. Gene Ontology (GO) analysis showed that out of 4890 (NL group) and 1272 (WSH group) annotated variant genes, 125 and 62 genes were enriched ( Q < 0.05), respectively. Additionally, in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, 104 genes (NL group) and 35 genes (WSH group) were selected ( p < 0.05). Finally, there were 306 candidate genes that were sieved to determine the development of tepal shape in lotus plants. It will be an essential reference for future identiﬁcation of tepal-shaped control genes in lotus plants. This is the ﬁrst comprehensive report of genomic variation controlling tepal shape in lotus, and the mutants in this study are promising materials for breeding novel lotus cultivars with special tepals.


Introduction
Lotus (Nelumbo Adans.) is one of the most economically aquatic plants in Southeast Asia, especially in China [1]. The genus Nelumbo has two species: N. nucifera Gaertn., commonly known as Asian lotus, and N. lutea Willd., commonly known as American lotus [2,3]. Asian lotus is one of the top 10 traditional flowers in China and the national flower of India [4]. Lotus, particularly Asian lotus, is usually divided into three types, depending on its agricultural application: ornamental lotus, rhizome lotus, and seed lotus [1,5]. Approximately 2000 commercial cultivars of ornamental lotus have been developed to date, accounting for 96% of the total number of the three types [6]. However, as an ornamental character of importance [7], the shape of petals (referred to as tepals in the lotus) exhibits little phenotypic variation in these cultivars. Few lotus germplasms exhibit short-wide tepals (similar to rose) and long-narrow tepals (as in chrysanthemum), which limits the ornamental value and market requirement of lotus [8]. Therefore, to develop lotus cultivars with unique tepal shapes, understanding the genetic basis of its determination is critical.
Genetic mapping of quantitative trait loci (QTL), genome-wide association studies (GWAS), and transcriptome analysis using modern high-throughput sequencing methods are powerful approaches that facilitate the identification of novel genes and gene regulatory Horticulturae 2021, 7, 593 2 of 11 networks [9,10]. The completion of de novo sequencing of the lotus genome [11,12] and the rapid development of high-throughput sequencing technology have made it possible to analyze nucleotide sequence variation among various lotus cultivars. Whole-genome resequencing is used to detect single nucleotide polymorphisms (SNPs) and insertiondeletion mutations (Indels) in the whole genome [13,14] and to mine structural variation and copy number variation by high-depth sequencing [15,16]. Using this technique in lotus, Hu et al. [17] identified a large number of nucleotide sequence variations and 103,656 simple sequence repeats (SSRs) between the genomes of 'Chiang Mai Wild' and 'Middle Lake Wild', which could be used for QTL mapping and molecular-assisted breeding. In another study, whole-genome resequencing of 19 accessions belonging to three subgroups of cultivated temperate lotus revealed that rhizome lotus showed the lowest genomic diversity, and the genes that found in the genomic region and related to temperate lotus and tropical lotus always exhibited divergent expression patterns [18].
Petal growth depends on the precise orchestration of the frequency of cell division, the orientation of cell division planes, and the extent of cell elongation [19][20][21]. For example, non-uniform cell expansion within a petal resulted in distorted petals in Eustoma grandiflorum [22]. Although the molecular basis of petal shape is largely unclear, it has been shown that several genes, such as LEUNIG [23,24], SEUSS and APETALA2 [25,26], JAGGED [27][28][29], CYCLOIDEA [30], INDOLE-3-BUTYRIC ACID-RESPONSE5 [31], and SPIKE1 [32], regulate petal shape by affecting cell proliferation and or cell expansion. However, the roles of these genes in petal shape regulation were mainly verified in model plants including Arabidopsis and Antirrhinum.
In this study, three lotus mutants with wide-short and narrow-long tepals were divided into two groups and analyzed by whole-genome resequencing. We explored the differences in the number and distribution of SNPs and Indels in each group and compared them between the two lotus groups. In the meantime, functional genes and metabolic pathways related to the development of broad and narrow tepals were screened. The results of this study provide important clues for the exploration of genes affecting tepal shape in lotus plants.

Plant Material
In 2015, two accessions of American lotus (N. lutea) with significant differences in tepal shape were obtained from the 'International Nelumbo Collection' (certified by International Waterlily and Water Gardening Society in 2016) of Shanghai Chenshan Botanical Garden, Shanghai, China. Both these accessions were germinated from wild lotus seeds collected from Florida, USA. Compared with the wild American lotus, one genotype (M512) showed wide and short tepals, while the other genotype (DP23) exhibited narrow and long tepals (Figure 1), and the tepal shapes of both accessions were stable after years of vegetative propagation via rhizomes. In 2016, 1024 seeds of 'Weishan Hong', a wild Asian lotus (N. nucifera), were irradiated with cobalt-60, which emits gamma rays, and a mutant (M652) with narrow and long tepals was obtained (Figure 1), and the tepal shape was stable after three years of clonal propagation.

Grouping of Lotus Accessions
The lotus genotypes investigated in this study were divided into two groups: NL (short for N. lutea), which comprised American lotus genotypes, M512, DP23, and their wild-type control (NL-CK), and WSH (short for 'Weishan Hong'), which comprised M652 and its two wild-type controls, WSH-CK1 and WSH-CK2. Each control contained three wild seedlings.

Grouping of Lotus Accessions
The lotus genotypes investigated in this study were divided into two groups: NL (short for N. lutea), which comprised American lotus genotypes, M512, DP23, and their wild-type control (NL-CK), and WSH (short for 'Weishan Hong'), which comprised M652 and its two wild-type controls, WSH-CK1 and WSH-CK2. Each control contained three wild seedlings.

DNA Extraction
Three young leaves of each mutant (M512, DP23, and M652) were harvested separately from three pots of their clones, and one young leaf of three seedlings of each control (NL-CK, WSH-CK1, and WSH-CK2) were collected. DNA was extracted separately from the three leaves of the same genotype using the modified cetyl triethylammonium bromide (CTAB) method [33], in which the sugar removal was performed three times, then they were pooled together at equimolar concentrations. The integrity of the pooled DNA samples was detected by electrophoresis on 1% agarose gels, and the DNA quality and concentration were measured by NanoDrop 2000c and Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). DNA samples with OD260/OD280 (optical density, OD) values of 1.8-2.0 and with contents greater than 1.5 μg were used for library construction.

Library Construction and Whole Genome Resequencing
The purified genomic DNA was randomly sheared into 350 bp fragments using a COVARIS M220 Focused-ultrasonicator TM (Covaris, Woburn, MA, USA). The genomic DNA library was constructed using the TruSeq Nano DNA HT Sample Preparation Kit (Illumina Inc., San Diego, CA, USA), and its quality was assessed on the Qubit 2.0 Fluorometer (Life Technologies, Grand Island, NY, USA). The size of inserts was determined using Agilent Technologies 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Finally, whole-genome resequencing was conducted by Beijing Novogene Bioinformatics Technology Co. Ltd. (Beijing, China) at a depth of 10× using the Illumina HiSeq 2500 platform. . The NL group accessions included M512 ((a,b), wide-short tepals), DP23 ((e,f), narrow-long tepals), and NL-CK ((c,d), wild American lotus; the control). The WSH group accessions included M652 ((g,h), derived from 'Weishan Hong' by gamma irradiation; narrow-long tepals) and WSH-CK ((i,j), 'Weishan Hong' belonging to wild Asian lotus; the control). Scale bars = 1 cm.

DNA Extraction
Three young leaves of each mutant (M512, DP23, and M652) were harvested separately from three pots of their clones, and one young leaf of three seedlings of each control (NL-CK, WSH-CK1, and WSH-CK2) were collected. DNA was extracted separately from the three leaves of the same genotype using the modified cetyl triethylammonium bromide (CTAB) method [33], in which the sugar removal was performed three times, then they were pooled together at equimolar concentrations. The integrity of the pooled DNA samples was detected by electrophoresis on 1% agarose gels, and the DNA quality and concentration were measured by NanoDrop 2000c and Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). DNA samples with OD260/OD280 (optical density, OD) values of 1.8-2.0 and with contents greater than 1.5 µg were used for library construction.

Library Construction and Whole Genome Resequencing
The purified genomic DNA was randomly sheared into 350 bp fragments using a COVARIS M220 Focused-ultrasonicator TM (Covaris, Woburn, MA, USA). The genomic DNA library was constructed using the TruSeq Nano DNA HT Sample Preparation Kit (Illumina Inc., San Diego, CA, USA), and its quality was assessed on the Qubit 2.0 Fluorometer (Life Technologies, Grand Island, NY, USA). The size of inserts was determined using Agilent Technologies 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Finally, whole-genome resequencing was conducted by Beijing Novogene Bioinformatics Technology Co. Ltd. (Beijing, China) at a depth of 10× using the Illumina HiSeq 2500 platform.

Data Filtering
The original image data generated by the sequencing machine were converted into sequence data via base calling (Illumina pipeline CASAVA v1.8.2). The sequence data were then subjected to the quality control (QC) procedure to remove unusable reads: (1) the reads with a Phred quality (Q) score < 20; (2) the reads contain the Illumina library construction adapters; (3) the reads contain more than 10% unknown bases (N bases); (4) one end of the read contains more than 50% of low-quality bases (sequencing quality value ≤ 5).

Read Mapping, Variant Detection, and Annotation
The clean reads were mapped to the reference genomes of 'China Antique' (accession number: GCA_003033685.1) using the Burrows-Wheeler Aligner (BWA) program [34] using the following parameters: 'mem-t4-k32-M'. Subsequent processing, including duplicate removal, was performed using SAM tools [35] with the parameter 'rmdup' and PICARD (http://broadinstitute.github.io/picard/, 11 December 2021). Raw SNP and Indel sets were called with SAM tools using the following parameters: 'mpileup-m2-F0.002-d1000'. Then, these sets were filtered using the following criteria: mapping quality >20; depth of the variate position >4. Additionally, the missing data of SNPs and InDels of six samples from two groups were filtered. ANNOVAR [36] was used for the functional annotation of variants.

Screening and Enrichment Analysis of Variant Genes
Genes showing SNPs and Indels between any two samples (except between WSH-CK1 and WSH-CK2) within each group were selected, that is, M512 vs. NL-CK, DP23 vs. NL-CK, M512 vs. DP23, M652 vs. WSH-CK1, and M652 vs. WSH-CK2. Then, these genes from pairwise comparisons were combined and removed duplicates in each group. Among these genes, those showing SNPs and Indels in the exonic regions were manually extracted and searched in the NCBI non-redundant protein (Nr) database for annotation and in the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases for enrichment analyses.

The Assessment of the Reliability of Candidate Genes
Based on the blasting in the GO and KEGG databases, genes from NL and WSH groups showing significant enrichment (Q < 0.05) were identified and considered as the candidate genes that may relate to the development of tepal shape in lotus. Among those genes, the genes common to both groups and the top 10% genes with the highest mutation rate (SNP and Indel density per kb in the coding region) were extracted and searched in NCBI and UniProtKB databases to clarify their biological information. Both types of genes were served as a rough ruler to measure the reliability of the candidate genes.

Whole-Genome Resequencing Data of the Six Genotypes
A total of six samples belonging to groups NL and WSH were subjected to the wholegenome resequencing, and 9.23-12.85 Gb clean data, with 'Q20 rate' > 97% and 'Q30 rate' > 92%, were generated (Table S1). Among the six samples, M652 produced the lowest amount of clean data (9.23 Gb), whereas NL-CK produced the highest amount (12.85 Gb). The GC content of all samples varied from 39.28% to 40.14%, and no significant difference was detected in GC content between and within the two groups (Table S1). The mapping rate of NL group genotypes (93.63-94.43%) was lower than that of WSH group accessions (98.22-98.43%). Additionally, the 1× and 4× coverage of NL group samples was also lower than those of WSH group accessions; the 4× coverage of the NL group ranged from 80.41% to 83.54%, whereas that of the WSH group varied from 94.65% to 98.67% (Table S1).

SNPs and Indels in the Six Genotypes
The sequence reads of all six accessions were mapped onto the reference genome sequence of lotus, and SNPs and Indels were called for each accession. The total number of SNPs identified in each sample of NL group (3,928,934; 4,197,575; 4,383,471) was about 7-10-fold higher than that of WSH group (444,148; 542,312; 569,431), and it was about 8-10-fold higher in Indel, with the number of 399,031, 430,348, and 450,433 in NL group and 43,181, 51,780, and 50,568 in the WSH group ( Figure S1). Of all the SNPs identified in the NL group, 6.15% (M512), 5.90% (DP23), and 5.73% (NL-CK) were mapped to exonic regions, which were approximately 3.5 times greater than the number of exonic SNPs in the WSH group. Of all the Indels identified in the NL group, 0.84% (M512), 0.81% (DP23), Horticulturae 2021, 7, 593 5 of 11 and 0.79% (NL-CK) were mapped to exonic regions, which was approximately 1.5 times those of the WSH group ( Figure S1).
Within each group, the controls contained more SNPs and Indels than their mutants ( Figure S1), which was probably associated with the increase in variant sites caused by the pooling of DNA extracted from three seedlings per control. Additionally, the ratio of nonsynonymous to synonymous SNPs (Nonsyn/Syn), transitions to transversions (Ts/Tv), and exonic sequence variation to total variation (exonic/total) varied slightly among the three accessions within each group ( Figure S1), to some extent, suggesting that the three genotypes within each group exhibit very low genetic diversity.

Analysis of SNPs and Indels between Mutant Accessions and Their Controls
Within each group, the SNPs and Indels were filtered between any two samples (except between WSH-CK1 and WSH-CK2). A total of 716,656 SNPs and 221,688 Indels were obtained from the NL group, while 639,953 SNPs and 134,6118 Indels were obtained from the WSH group (Figure 2). Compared with the controls, nucleotide polymorphisms in the mutant samples of the two groups occurred mainly in the intergenic regions: intergenic SNPs accounted for 61.3% and 72.6% of all SNPs identified in NL and WSH groups, respectively, and intergenic Indels accounted for 58.4% and 67.4% of all Indels identified in NL and WSH groups, respectively ( Figure 2). By contrast, the proportion of SNPs and Indels that mapped to exonic regions was very small in both groups (1.9% SNPs and 0.47% Indels in the NL group; 1.7% SNPs and 0.48% Indels in the WSH group) (Figure 2). This showed that the ratio of exonic SNPs to total SNPs was slightly higher in the NL group than in the WSH group.
of SNPs identified in each sample of NL group (3,928,934; 4,197,575; 4,383,471) was about 7-10-fold higher than that of WSH group (444,148; 542,312; 569,431), and it was about 8-10-fold higher in Indel, with the number of 399,031, 430,348, and 450,433 in NL group and 43,181, 51,780, and 50,568 in the WSH group ( Figure S1). Of all the SNPs identified in the NL group, 6.15% (M512), 5.90% (DP23), and 5.73% (NL-CK) were mapped to exonic regions, which were approximately 3.5 times greater than the number of exonic SNPs in the WSH group. Of all the Indels identified in the NL group, 0.84% (M512), 0.81% (DP23), and 0.79% (NL-CK) were mapped to exonic regions, which was approximately 1.5 times those of the WSH group ( Figure S1).
Within each group, the controls contained more SNPs and Indels than their mutants ( Figure S1), which was probably associated with the increase in variant sites caused by the pooling of DNA extracted from three seedlings per control. Additionally, the ratio of nonsynonymous to synonymous SNPs (Nonsyn/Syn), transitions to transversions (Ts/Tv), and exonic sequence variation to total variation (exonic/total) varied slightly among the three accessions within each group ( Figure S1), to some extent, suggesting that the three genotypes within each group exhibit very low genetic diversity.

Analysis of SNPs and Indels between Mutant Accessions and Their Controls
Within each group, the SNPs and Indels were filtered between any two samples (except between WSH-CK1 and WSH-CK2). A total of 716,656 SNPs and 221,688 Indels were obtained from the NL group, while 639,953 SNPs and 134,6118 Indels were obtained from the WSH group (Figure 2). Compared with the controls, nucleotide polymorphisms in the mutant samples of the two groups occurred mainly in the intergenic regions: intergenic SNPs accounted for 61.3% and 72.6% of all SNPs identified in NL and WSH groups, respectively, and intergenic Indels accounted for 58.4% and 67.4% of all Indels identified in NL and WSH groups, respectively ( Figure 2). By contrast, the proportion of SNPs and Indels that mapped to exonic regions was very small in both groups (1.9% SNPs and 0.47% Indels in the NL group; 1.7% SNPs and 0.48% Indels in the WSH group) (Figure 2). This showed that the ratio of exonic SNPs to total SNPs was slightly higher in the NL group than in the WSH group.

Annotation and Analysis of Genes Harboring SNPs and Indels in the Exonic Regions
The predicted amino acid sequences of the variant genes with exonic SNPs and Indels were blasted in the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases to annotate their functions. In the GO database, 4890 (72.8%) of the 6715 variant genes in the NL group were annotated and assigned to three functional categories for a total of 19,785 times ( Table 1). The most distributed times of these variant genes were assigned to the biological process category (99,311 times (50.2%)), followed by the cellular component and molecular function categories (Table 1). In the entire genetic background and variant genes, GO terms including 'ADP binding', 'pattern binding', and 'polysaccharide binding' were significantly (Q < 0.05) enriched in the molecular function category and contained 88, 37, and 37 candidate genes, respectively (Figure 3a). After excluding duplicates, a total of 125 candidate genes were selected from the above three GO items (Table S2). variant genes in the NL group were annotated and assigned to three functional categories for a total of 19,785 times ( Table 1). The most distributed times of these variant genes were assigned to the biological process category (99,311 times (50.2%)), followed by the cellular component and molecular function categories (Table 1). In the entire genetic background and variant genes, GO terms including 'ADP binding', 'pattern binding', and 'polysaccharide binding' were significantly (Q < 0.05) enriched in the molecular function category and contained 88, 37, and 37 candidate genes, respectively (Figure 3a). After excluding duplicates, a total of 125 candidate genes were selected from the above three GO items (Table  S2).  In the WSH group, 1272 genes were annotated in the GO database, accounting for 23.8% of the 5353 variant genes, which was significantly lower than the proportion of annotated genes in the NL group (Table 1). These annotated genes of the WSH group received a total of 52,425 assignments in the three functional categories (in the same order as that observed in the NL group): 25,824 times (49.3%) in the biological process category, followed by 20,751 times in the cellular component category, and 5850 times in the molecular function category (Table 1). Seven GO terms, such as 'ADP binding', 'chitin metabolic In the WSH group, 1272 genes were annotated in the GO database, accounting for 23.8% of the 5353 variant genes, which was significantly lower than the proportion of annotated genes in the NL group (Table 1). These annotated genes of the WSH group received a total of 52,425 assignments in the three functional categories (in the same order as that observed in the NL group): 25,824 times (49.3%) in the biological process category, followed by 20,751 times in the cellular component category, and 5850 times in the molecular function category (Table 1). Seven GO terms, such as 'ADP binding', 'chitin metabolic process', 'chitin catabolic process', showed significant differences (Q < 0.05). Among them, 'ADP binding' contained 55 genes, while each of the remaining six GO terms contained 12 identical genes (Figure 3b). Finally, A total of 62 candidate genes were identified in the 7 GO terms, after excluding gene duplicates (Table S2).
In the KEGG database, 2286 variant genes of the NL group were assigned into 122 pathways, of which the top three pathways were 'metabolic pathways' (430 genes), 'biosynthesis of secondary metabolites' (256 genes), and 'starch and sucrose metabolism' (67 genes) ( Table 1). By contrast, only 645 genes of the WSH group were annotated in 68 pathways, and the top three pathways with the most annotated genes and their order were the same as those in the NL group, each with 128, 84, and 39 genes, respectively (Table 1). However, no pathway that showed a significant overrepresentation, based on the Q-value (Q < 0.05), was found in the two groups. While four pathways (comprising 104 genes, Table S2) with Horticulturae 2021, 7, 593 7 of 11 significant p-value (p < 0.05) were detected in the NL group, including 'ribosome biogenesis', 'ABC transporters', 'propanoate metabolism', and 'phenylpropanoid biosynthesis'. Additionally, 5 pathways (comprising 35 genes, Table S2) were detected in the WSH group (p < 0.05), including 'ABC transporters', 'starch and sucrose metabolism', 'plant-pathogen interaction', 'ether lipid metabolism', and 'amino sugar and nucleotide sugar metabolism'.
After excluding gene duplicates, a total of 306 enriched genes were collected from the NL (125 GO + 104 KEGG) and WSH (62 GO + 35 KEGG) groups, and which comparison group each candidate gene came from was also displayed (Table S2). These are essential references for locating and screening tepal shape-regulating genes of lotus in the future.

Sketchy Sampling of Biological Information of Candidate Genes
According to the 306 genes enriched in the GO and KEGG databases, those shared by both groups and with the highest mutation rate (top 10%) in the exonic regions were selected to measure the 306 candidate genes. Thus, a total of 41 genes were sampled, in which 8 genes were shared by the 2 groups, and 24 and 9 genes were exclusive to the NL and WSH groups, respectively. Among the eight shared genes, three (LOC104590721, LOC104605275, and LOC104607443) were enriched in the GO but not the KEGG database, and the remaining five were enriched in both GO and KEGG databases (Table S3).
Of the 41 selected genes, 35 were annotated with a clear function, including 17 disease resistance-related genes, 6 receptor (like-) protein kinases, and 12 genes related to chitinase, transferase, synthase, and other functions. The lengths of the 41 genes varied from 891 to 17,163 bp, in which 2 candidate genes (LOC104607832 and LOC104609114) produced 4 transcript variants, and 10 genes produced 2-3 transcript variants (Table S3).
In terms of 'family and domain', the genes harboring nucleotide-binding adaptor shared by APAF-1, R proteins, and CED-4 (NB-ARC) and the genes containing leucine-rich repeat (LRR) domains were found the most (18 genes, 44%) and mainly related to the function of 'ADP binding' (Table S3). While the domains of the six receptor (like-) protein kinases were epidermal growth factor (EGF)-like, Pkinase, and wall-associated receptorkinase galacturonan-binding (GUB_WAK), which are associated with the morphological development of plants due to their functions such as 'ATP binding', 'calcium ion binding', 'polysaccharide binding', and 'protein serine/threonine kinase activity'. In the KEGG database, only 18 of the 41 candidate genes were assigned to specific pathways, while the remaining 23 genes were unknown (Table S3).

Genetic Relationship of the Individuals in Each Group
The mutants in both groups germinated from the seeds of wild populations. American lotus (N. lutea) grows in a wild state in North America, their populations showed low genetic diversity but a high level of differentiation with very low genetic flow [37]. The three accessions of the NL group were the seedlings from the Okeechobee Lake population of Florida, USA, which largely differentiated and separated from the population outside Florida according to our previous work [38]. For the accessions of the WSH group, they were obtained from the seeds of the Weishan Lake population, which has always been regarded as one of the lotus populations native to China [39,40]. Additionally, high uniformity and stability were observed in the phenotypic (especially floral) characteristics of the population and the 1024 seedlings, which were irradiated with the gamma rays of cobalt-60. Thus, the origins of the mutants and controls in each group implied a close genetic relationship among themselves, which was beneficial to rule out the noise, to some extent, in the genetic background that was irrelevant to the tepal-shape variation. In other words, the three mutants in this study are good materials for producing novel lotus cultivars with extreme-shaped tepals and for exploring the genetic regulation of tepal shape in plants.

Differences in Genome Mapping between the NL and WSH Groups
The two groups of samples investigated in this study were representative wild-type germplasms of the two species of Nelumbo. Our results by whole-genome resequencing revealed many differences at the DNA level. For example, the mapping rate, 1× coverage, and 4× coverage of the three lotus accessions in the NL group were lower than those in the WSH group (Table S1), more SNPs and Indels were detected in the NL group than in the WSH group (Table 1). It was probably due to 'China Antique' (source of the reference genome) exhibiting a closer genetic relationship with each accession of the WSH group (native to China) than with the geographically isolated American lotus accessions in the NL group. This fact was supported by the fact that 'China Antique' and the WSH accessions are the wild lotus genotypes belonging to N. nucifera, while the NL accessions are the wild lotus belonging to N. lutea, which are two allopatric species of Nelumbo that diverged for approximate 12 million years [41]. Therefore, there could be a substantial number of sequences and genes showing presence and absence between N. nucifera and N. lutea, for instance, between 'China Antique' and the accessions of the NL group in this study. In this sense, the N. nucifera reference genome ('China Antique') has its limitation to studying the N. lutea genetic variations.
Previously, we showed that many SNPs and Indels among M512, DP23, and 'China Antique' could be detected using expressed sequence tag-derived simple sequence repeat (EST-SSR) makers, and polymorphic DNA bands were produced by 26/217 EST-SSR makers between M512 and DP23, from which the nucleotide sequences or genes obtained were associated with organ development, hormone metabolism, signal transduction, etc. [42]. This indirectly supported the above inference that the reference genome exhibited lower sequence similarity with the American lotus genome than with the 'Weishan Hong' genome. In addition, the three accessions in the WSH group exhibited higher Nonsyn/Syn ratio and Ts/Tv ratio than those in the NL group ( Figure S1), which was consistent with the results of our previous study [42]. In other words, fewer non-synonymous mutations and transitions occurred in the genomes of NL group accessions, probably because of the lack of gene transfer between Asian lotus and American lotus during the long period of geographical isolation [18,43]. Additionally, genes with non-synonymous mutant SNPs are the focus of future research, and they will be screened and annotated in combination with a comparative analysis of transcriptomes.

The Reliability of the Candidate Genes That May Associate with Tepal Shape in Lotus
We picked 41 genes to roughly estimate the 306 candidate genes (NL 277 + WSH 89), which were assumed to associate with the development of tepal shape in lotus. Tepal shape and size are remarkably consistent within a given ecotype [44,45] but are influenced by the environment [46,47]. Among the 41 genes in this study, 17 genes encoded disease resistance proteins belonging to the NB-LRR family, of which 12 were exclusive to the NL group, 4 were common to both groups, and only 1 gene was unique to the WSH group (Table S3). The contrasting number of disease resistance genes between the two groups was consistent with, at least to some degree, the fact that M512 and DP23 were derived from natural mutation or selection of lotus seeds, whereas M652 was induced by artificial radiation.
The development of tepal shape is mainly regulated by cellular proliferation, expansion, and the direction of cell division [20,21]. In this study, 6 genes from the 41 sampled genes predicted to encode EGF-like or WAK domain-containing proteins (Table S3), required for cell expansion [48][49][50], may play essential roles in the morphogenesis of lotus tepals. ERECTA, which encodes an LRR receptor-like serine-threonine kinase (STK), was verified to have a major effect locus for shaping petals in Arabidopsis thaliana [51]. A BLAST search of this Arabidopsis gene sequence in the lotus genome identified its homolog, LOC104600563, which was also annotated as ERECTA. However, this gene was not one of the 306 genes screened in this study. Nonetheless, the LOC104604923 gene identified in the WSH group, as well as the previously reported WAK domain-containing genes LOC104590721, LOC104596880, LOC104608917, and LOC109115528, were all STK-type genes [48], and similar to ERECTA, they will be the focus of subsequent research.
To summarize this study, we believe that the genomic data and the 306 candidate genes in this study are valuable for future studies on the tepal shape of lotus plants, combined with other analyses such as transcriptomic data or epigenomic study.

Conclusions
The shape and size of petals are essential for the survival and reproduction of plants. Both these morphological features are important targets in ornamental crop breeding. Lotus germplasms with wide and narrow tepals generated by natural mutation and artificial irradiation, respectively, represent ideal materials for exploring the mechanism of tepal shape development. Compared with the mutant samples to their respective controls within each group, a large number of SNPs, Indels, and candidate genes can be found, which also exhibited differences between group NL and WSH. The 306 candidate genes enriched from the GO and KEGG databases, especially the EGF and WAK domain-containing genes related to cell expansion and division, must be the research focus in future studies for understanding tepal development in lotus plants. Overall, the results of this study provide a fruitful resource for future identification of petal-shaped control genes study and a valuable database that can be used combined with transcriptome technology and QTL mapping for understanding the mechanism of flower morphogenesis in Nelumbo.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/horticulturae7120593/s1, Figure S1: Distributions of SNPs and Indels in the genomes of six accessions, Table S1: The statistics of the data generated by whole-genome resequencing and genome mapping in the six samples, Table S2: The genes enriched in the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) database, Table S3: The statistics of candidate genes that may associate with tepal shape in lotus.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.