A High-Quality Genome Sequence of Model Legume Lotus japonicus (MG-20) Provides Insights into the Evolution of Root Nodule Symbiosis

Lotus japonicus is an important model legume for studying symbiotic nitrogen fixation as well as plant development. A genomic sequence of L. japonicus (MG20) has been available for more than ten years. However, the low quality of the genome limits its application in functional genomic studies. Therefore, it is necessary to assemble high-quality chromosome sequences of L. japonicus using new sequencing technology to facilitate the study of functional genomics. In this report, we used the third-generation sequencing combined with the Illumina HiSeq platform to sequence the genome of L. japonicus (MG20). We obtained 544 Mb of genomic sequence using third-generation assembly. Based on sequence analysis, 357 Mb of repeats, 28,251 genes, 626 tRNAs, 1409 rRNAs, and 1233 pseudogenes were predicted in the genome. A total of 27,991 genes were annotated into databases. Compared to the previously published data, the new genome database contains complete L. japonicus sequences in the proper order and orientation with a contig N50 2.81Mb and an excellent genome coverage, which provides more accurate genome information and more precise assembly for functional genomic study.


Introduction
Lotus japonicus (MG-20), an important model plant of the Leguminosae, is a wild early-flowering accession collected on Miyako Island, the southernmost point of the Japanese archipelago, which was later named as Miyakojima MG-20 [1]. The L. japonicus (MG-20) is a diploid (2n = 12) with six chromosomes and 472.1 Mbp genome size determined on the condensation patterns and the locations of rDNA loci [2]. L. japonicus is widely used by researchers worldwide to study the molecular mechanisms related to symbiotic interactions with rhizobia as well as plant development. Several essential components involved in the nodulation signaling pathway have been cloned and functionally characterized via forward genetics in L. japonicus, including Nod Factor Receptors 1 and 5 (NFR1 and NFR5), Symbiosis Receptor Kinase (SYMRK), NSP1, etc. Genome-wide research not only helps to reveal the relationship between genetics and evolution, but also to discover some new genes with unknown functions: for example, new genes related to phosphate accumulation [3], as well as genes related to the phenotypic changes in overwintering and flowering time due to regional differences [4].
A previously published genome assembly of L. japonicus (MG-20) based on clone-by-clone sequencing and shotgun sequencing technologies contains 394 Mbp with 44,464 sequence contigs [5].

Evaluation of the Assembly
The assembly results were evaluated using three methods: (1) the assembly was compared to the second-generation sequencing data; (2) the single-base error rate was calculated; and (3) the genome integrity was assessed. To compare the assembly to the second-generation sequencing data, clean reads were used to create clean read data files, with one unit per four rows and double-ended statistics; for example, read1 and read2 were recorded as two reads. Mapped reads (%), indicating the number of clean reads targeted to the reference genome divided by the percentage of all clean reads, were determined using the samtools flagstat command. Properly mapped (%) double-end sequences were mapped to the reference genome, and the distance was consistent with the length distribution of the sequencing fragments, as determined using the samtools flagstat command. To evaluate single-base error rate, the second-generation sequencing reads were compared to the assembled genomic sequences, and the single-base error rate of the genome was estimated by determining the ratio of the number of bases in the genome that were inconsistent with the sequencing reads.
Genome integrity was assessed using BUSCO software. The embryophyta_odb9 database in BUSCO v2 [16] contains 1440 conserved core genes in terrestrial plants. We used BUSCO v2.0 software to assess the integrity of the Lotus genome assembly. CEGMA v2.5 [17] contains 458 conserved core genes in eukaryotes. We also used CEGMA v2.5 to assess the integrity of the genome assembly. For Hi-C library quality assessment, we aligned the sequencing data with the sequences of the assembled genomes using BWA [18]. An excessively long insert implies that the fragment may be formed by random joins. Therefore, read pairs that were larger than the maximum insert length were filtered out. To evaluate the Hi-C data based on invalid interaction pairs and valid interaction pairs, the genome sequence was divided, sorted, and oriented using LACHESIS software, and the assembly results were evaluated. For genomic error correction of the assembly, interrupting contigs were identified based on a length of 300 kb. The contigs were reassembled with Hi-C, and the positions that could not be restored to the original assembly sequence were listed as the candidate error area. The position of the low Hi-C coverage depth in this area was identified as the error point, thus completing the correction of the preliminarily assembled genome. The error-corrected genome was assembled using LACHESIS software. To evaluate the Hi-C assembly results, for Hi-C data assembled into chromosome, the genome was divided into 100 kb of sequence per bin, and the number of Hi-C read pairs between any two bins was calculated based on the intensity signal of the interaction between the two bins. These values were used to construct a heatmap.
The Rfam database and the miRBase [28] database with Infenal 1.1 were used to predict rRNAs and microRNAs, and tRNAs were identified using tRNAscan-SE v1.3.1 [29]. The pseudogene has a sequence similar to a functional gene, but it has lost its original function due to mutations such as insertions and deletions. GenBlastA v1.0.4 [30] alignment was used to search for homologous gene sequences (possible genes) in the genome of the masked whole locus, and GeneWise v2.4.1 [31] was used to find premature stop codons and frameshift mutations to obtain pseudogenes. For repeat sequence annotation, we used LTR FINDER v1.05 [32], MITE-Hunter [33], RepeatScout v1.0.5 [34], and PILER-DF v2.4 [35] to build a database of repetitive Lotus japonicus genome sequences based on structural prediction and de novo prediction. PASTEClassifier [36] was used to classify the database, which was then merged with the Repbase [37] database as the final repetitive sequence database. RepeatMasker v4.0.6 [38] software was used to predict repeat sequences based on the newly generated database.

Phylogenetic Analysis
OrthoFinder (https://github.com/davidemms/OrthoFinder) [55] software was used for phylogenetic analysis of 10 Leguminosae species. OrthoFinder finds orthogroups and orthologs, generates rooted gene trees for all orthogroups, and identifies all gene duplication events in these gene trees. Orthofinder also generates a rooted species tree for the species being analyzed and maps the gene duplication events from the gene trees to branches in the species tree. Finally, OrthoFinder provides comprehensive statistics for comparative genomic analyses. Divergence times were estimated using r8s (http://hydrodictyon.eeb.uconn.edu/eebedia/index.php/Phylogenetics: _r8s_Lab#Using_r8s_to_estimate_divergence_times). The species tree was drawn using FigTree (http://tree.bio.ed.ac.uk/publications/491/).

Genome Sequencing and Assembly
Using third-generation sequencing technology, the original sequencing reads appeared as dumbbell-shaped structures containing junctions at both ends, as revealed by comparative physical mapping. The genomic DNA of L. japonicus was interrupted using g-TUBE followed by connecting with a dumbbell-shaped connector before construction of a library for sequencing. BluePippin was used to screen the target DNA fragments for the creation of sequencing libraries. Qubit2.0 and Agilent 2100 were used to detect the library concentration and insert size, respectively. The concentration of the library was further quantified using the Q-PCR method. After high-throughput sequencing (sequencing depth is about 60×), sequencing data are evaluated and filtered to obtain high-quality data (subreads) of 43.57 Gb with more than 97.43% of Q20 ratio and more than 93.81% of Q30 ratio, respectively. The mean read length was 9328 bp ( Table 1). The resulting set of sub-reads contained 4,671,067 reads, with an average length of >9.32 kb. All the reads were assembled using Canu software and corrected using Pilon software based on second-generation data. After assembly, 593 contigs were obtained, with a total length of 544,144,611 bp. The contig N50 was 2,811,151 bp, and the contig N90 was 709,636 bp. The longest contig was 11,738,399 bp. The GC content was~37.99%. We obtained 28.06 Gb of clean data using Hi-C (high-throughput chromatin conformation capture) sequencing. The sequencing coverage was 51× and the Q30 ratio reached 92.66%. After Hi-C assembly, LACHESIS software was used to divide, rank, and orient the genomic sequence into groups for evaluation of the assembly results. A total of 517.54 Mb of the genomic sequences were mapped to the six chromosomes of L. japonicus, accounting for 95.1% of the total. Among these sequences, the sequence length capable of determining order and direction is 499.06 Mb, accounting for 96.43% of the total length of the mapped chromosome. The corresponding number of sequences was 537, and the total sequences clustered with a percentage of 87.18%.

Assessment of the Genomic Assembly
We evaluated the assembly results using three methods: (1) we compared the assembly to the second-generation sequencing data, (2) we calculated the single-base error rate, and (3) we assessed genome integrity. When we compared the assembly to the second-generation sequencing data, we obtained 222,255,924 clean reads, indicating that 98.93% of the sequences could be mapped to the reference genome. The paired-end sequences were mapped to the reference genome. The distance between sequences is consistent with the length distribution of the sequenced fragments (~96.96%). Analysis of the single-base error rate revealed that 252 bp were inconsistent, comprising 0.000046% of the total length of contigs, indicating that the single-base error rate was approximately 0.000046%.
Hi-C is an extension of chromosome conformation capture (3C) technology. An evaluation of the Hi-C assembly results showed that the six chromosome groups could be clearly distinguished.
Within each group, the intensity of the interaction at the diagonal position ( Figure 1) is higher than that of the non-diagonal position, indicating a high intensity of interaction between adjacent sequences (diagonal position) and a weak interaction signal between non-adjacent sequences (non-diagonal positions), which is consistent with the principle of Hi-C genome assembly.
Genes 2020, 11, x FOR PEER REVIEW 6 of 15 Analysis of the single-base error rate revealed that 252 bp were inconsistent, comprising 0.000046% of the total length of contigs, indicating that the single-base error rate was approximately 0.000046%. Hi-C is an extension of chromosome conformation capture (3C) technology. An evaluation of the Hi-C assembly results showed that the six chromosome groups could be clearly distinguished. Within each group, the intensity of the interaction at the diagonal position ( Figure 1) is higher than that of the non-diagonal position, indicating a high intensity of interaction between adjacent sequences (diagonal position) and a weak interaction signal between non-adjacent sequences (nondiagonal positions), which is consistent with the principle of Hi-C genome assembly.

Genome Annotation
Based on the newly constructed repetitive sequence database, we predicted the number of repeat sequences in L. japonicus. There were approximately 357 Mb of repetitive sequences, accounting for 65.61% of the genome. We predicted the genetic structure of L. japonicus using de novo prediction, homologous species prediction, and unigene prediction. We then used EVM v1.1.1 software to integrate the prediction results and PASA v2.0.2. to modify the final gene models. We ultimately obtained 28,251 genes (Table S1). Non-coding RNA prediction and pseudo-gene annotation statistics are shown in Table S1. Motif annotation revealed 2788 motifs and 33,544 domains. Based on the predicted genes, a total of 27,991 genes could be annotated to databases such as NR (Table S2). In total, 14,794 predicted genes were annotated to the KOG database, accounting for 52.37% of the total

Genome Annotation
Based on the newly constructed repetitive sequence database, we predicted the number of repeat sequences in L. japonicus. There were approximately 357 Mb of repetitive sequences, accounting for 65.61% of the genome. We predicted the genetic structure of L. japonicus using de novo prediction, homologous species prediction, and unigene prediction. We then used EVM v1.1.1 software to integrate the prediction results and PASA v2.0.2. to modify the final gene models. We ultimately obtained 28,251 genes (Table S1). Non-coding RNA prediction and pseudo-gene annotation statistics are shown in Table S1. Motif annotation revealed 2788 motifs and 33,544 domains. Based on the predicted genes, a total of 27,991 genes could be annotated to databases such as NR (Table S2). In total, 14,794 predicted genes were annotated to the KOG database, accounting for 52.37% of the total ( Figure S3). Finally, 17,074 predicted genes were annotated to the GO database, accounting for 60.44% of the total (Figure 2).
( Figure S3). Finally, 17,074 predicted genes were annotated to the GO database, accounting for 60.44% of the total (Figure 2). The embedophyta_odb9 database in BUSCO v2 contains 1440 conserved core genes in the terrestrial plants. We used BUSCO to assess genome integrity. Among the genes used, 1335 complete BUSCO genes were identified, 1244 of which were single-copy genes. Only 75 genes were not found in the embryophyta_odb9 library. The BUSCO genome integrity score was 92.71%. The CEGMA assessment was further performed to identify 448 of the 458 core genes (97.82%) in the L. japonicus genome and 227 of the 248 highly conserved sequences (91.53%) in the L. japonicus genome. The quality of Hi-C libraries was confirmed by comparing efficiency, insert length, and effective Hi-C data volume. The sample reads and assembled genomes had an efficiency of 94.51%, and the ratio of unique mapped read pairs was 43.85%, indicating that they were suitable for subsequent analyses. Evaluation of insert length showed that the main peak sequence length of the inserted sample was ~350 bp. No deviation from the target region indicates that the insert size exhibits a normal distribution ( Figure S2). An evaluation of the effective Hi-C data volume showed that the library obtained a unique alignment of 41.09 M of sequence to the genome, 32.69 M of which were valid Hi-C data, accounting for 79.56% of the data for the genome.

Synteny and Species Evolutionary Analysis
We compared all predicted protein sequences with the published data to indirectly reveal the genome structure. A comparison of genomic structures revealed a high level of synteny between our newly generated LjPB_ver1.0 sequence and the previously published L. japonicus v3.0 (MG-20). The chromosome structures of both genomes were almost identical except for a few fragments. For 87% of the sequences, we found the corresponding block in LjPB_ver1.0, and for 2% of the sequences, we found two corresponding blocks in LjPB_ver1.0. For 9% of the L. japonicus v3.0 (MG-20) protein The embedophyta_odb9 database in BUSCO v2 contains 1440 conserved core genes in the terrestrial plants. We used BUSCO to assess genome integrity. Among the genes used, 1335 complete BUSCO genes were identified, 1244 of which were single-copy genes. Only 75 genes were not found in the embryophyta_odb9 library. The BUSCO genome integrity score was 92.71%. The CEGMA assessment was further performed to identify 448 of the 458 core genes (97.82%) in the L. japonicus genome and 227 of the 248 highly conserved sequences (91.53%) in the L. japonicus genome. The quality of Hi-C libraries was confirmed by comparing efficiency, insert length, and effective Hi-C data volume. The sample reads and assembled genomes had an efficiency of 94.51%, and the ratio of unique mapped read pairs was 43.85%, indicating that they were suitable for subsequent analyses. Evaluation of insert length showed that the main peak sequence length of the inserted sample was~350 bp. No deviation from the target region indicates that the insert size exhibits a normal distribution ( Figure S2). An evaluation of the effective Hi-C data volume showed that the library obtained a unique alignment of 41.09 M of sequence to the genome, 32.69 M of which were valid Hi-C data, accounting for 79.56% of the data for the genome.

Synteny and Species Evolutionary Analysis
We compared all predicted protein sequences with the published data to indirectly reveal the genome structure. A comparison of genomic structures revealed a high level of synteny between our newly generated LjPB_ver1.0 sequence and the previously published L. japonicus v3.0 (MG-20). The chromosome structures of both genomes were almost identical except for a few fragments. For 87% of the sequences, we found the corresponding block in LjPB_ver1.0, and for 2% of the sequences, we found two corresponding blocks in LjPB_ver1.0. For 9% of the L. japonicus v3.0 (MG-20) protein sequences, we could not find the corresponding block in LjPB_ver1.0. By contrast, for 20% of LjPB_ver1.0 sequences, we could not find the corresponding block in L. japonicus v3.0 (MG-20). For 76% of these sequences, we found the corresponding block in L. japonicus v3.0 (MG- 20), and for 2% of the sequences, we found two corresponding blocks in L. japonicus v3.0 (MG-20). Both genomes contain~2% of sequences with a lower degree of matching; we therefore did not include these sequences in our statistical analysis (Figure 3).  -20). For 76% of these sequences, we found the corresponding block in L. japonicus v3.0 (MG-20), and for 2% of the sequences, we found two corresponding blocks in L. japonicus v3.0 (MG-20). Both genomes contain ~2% of sequences with a lower degree of matching; we therefore did not include these sequences in our statistical analysis (Figure 3). We constructed a phylogenetic tree based on sequence alignment of single-copy gene families shared by Vitis vinifera, Lupinus angustifolius, L. japonicus (MG-20), Cicer arietinum, Medicago truncatula (A17), Trifolium pratense, Phaseolus vulgaris, Vigna radiata, Vigna angularis, and Glycine max. We calculated the divergence times of various species with r8s software (expressed in millions of years ago [MYA]) ( Figure 4). Based on an orthogroup analysis, we showed statistics concerning the orthologous genes among different species. The legend indicates the number of different orthologous genes, and the histogram represents the total statistics of the number of different orthologous genes of a single species. The other histogram shows the proportion of different orthologous genes in the genome of a species. The total proportion of orthologous genes in each species is not 100%, because, for some of the genes, we did not find orthologs in other species, perhaps because they are speciesspecific genes (Figure 4). We constructed a phylogenetic tree based on sequence alignment of single-copy gene families shared by Vitis vinifera, Lupinus angustifolius, L. japonicus (MG-20), Cicer arietinum, Medicago truncatula (A17), Trifolium pratense, Phaseolus vulgaris, Vigna radiata, Vigna angularis, and Glycine max. We calculated the divergence times of various species with r8s software (expressed in millions of years ago [MYA]) ( Figure 4). Based on an orthogroup analysis, we showed statistics concerning the orthologous genes among different species. The legend indicates the number of different orthologous genes, and the histogram represents the total statistics of the number of different orthologous genes of a single species. The other histogram shows the proportion of different orthologous genes in the genome of a species. The total proportion of orthologous genes in each species is not 100%, because, for some of the genes, we did not find orthologs in other species, perhaps because they are species-specific genes ( Figure 4). Synteny analysis of the L. japonicus genome with the genomes of other legume species revealed strong synteny between species. Although the L. japonicus genome only has six chromosomes, one of the chromosomes is longer than the others. Synteny analysis also indicated that the chromosomal fragments containing the most genes had a large number of corresponding regions in the genomes of other species. Perhaps during evolution, chromosome breaks or fusions occurred that led to the formation of different legume species ( Figure S4). Based on the results of phylogenetic analysis, a comparison of the synteny of adjacent species on a phylogenetic tree branch revealed a high degree of synteny. Information obtained from a comparison of synteny between neighboring species in the evolutionary tree at the genome level might be used to infer the general chromosomes of their ancestor species, which could shed light on the origin of nodulation in legumes ( Figure 5). We constructed a pattern diagram based on this idea ( Figure 6). Based on the results of phylogenetic analysis, we compared the synteny of adjacent species on a phylogenetic tree branch. We proposed a possible chromosomal rearrangement based on the linear correspondence of the chromosome fragments of the comparison results, which might have occurred in the ancestral species of the A and B species. Alternatively, perhaps the A and B species are closely related and do not share an intermediate ancestor species, but the A species might have arisen as a new species due to a chromosomal mutation in the B species. Based on our subjective analysis of phenotypes, species with better phenotypes are considered to be evolutionarily successful species, and species with weaker phenotypes are considered to be ancestral species. Synteny analysis of the L. japonicus genome with the genomes of other legume species revealed strong synteny between species. Although the L. japonicus genome only has six chromosomes, one of the chromosomes is longer than the others. Synteny analysis also indicated that the chromosomal fragments containing the most genes had a large number of corresponding regions in the genomes of other species. Perhaps during evolution, chromosome breaks or fusions occurred that led to the formation of different legume species ( Figure S4). Based on the results of phylogenetic analysis, a comparison of the synteny of adjacent species on a phylogenetic tree branch revealed a high degree of synteny. Information obtained from a comparison of synteny between neighboring species in the evolutionary tree at the genome level might be used to infer the general chromosomes of their ancestor species, which could shed light on the origin of nodulation in legumes ( Figure 5). We constructed a pattern diagram based on this idea ( Figure 6). Based on the results of phylogenetic analysis, we compared the synteny of adjacent species on a phylogenetic tree branch. We proposed a possible chromosomal rearrangement based on the linear correspondence of the chromosome fragments of the comparison results, which might have occurred in the ancestral species of the A and B species. Alternatively, perhaps the A and B species are closely related and do not share an intermediate ancestor species, but the A species might have arisen as a new species due to a chromosomal mutation in the B species. Based on our subjective analysis of phenotypes, species with better phenotypes are considered to be evolutionarily successful species, and species with weaker phenotypes are considered to be ancestral species.

Orthogroup Analysis
We performed orthogroup analysis of the ten plant species and identified 5436 orthogroups that were related to LjPB_ver1.0 (Table S3,S4). We focused on the largest orthogroup in LjPB_ver1.0, which contains 96 genes. Among the orthologous genes of other species, there are 82 orthologous genes in G. max, 11 in L. angustifolius, 11 in M. truncatula (A17), 86 in P. vulgaris, and 6 in T. pratense. No homologous genes are present in V. vinifera, C. arietinum, V. radiata or V. angularis (Figure 7). The total

Orthogroup Analysis
We performed orthogroup analysis of the ten plant species and identified 5436 orthogroups that were related to LjPB_ver1.0 (Table S3,

Orthogroup Analysis
We performed orthogroup analysis of the ten plant species and identified 5436 orthogroups that were related to LjPB_ver1.0 (Table S3, Table S4). We focused on the largest orthogroup in LjPB_ver1.0, which contains 96 genes. Among the orthologous genes of other species, there are 82 orthologous genes in G. max, 11 in L. angustifolius, 11 in M. truncatula (A17), 86 in P. vulgaris, and 6 in T. pratense. No homologous genes are present in V. vinifera, C. arietinum, V. radiata or V. angularis (Figure 7). The total number of genes in this orthogroup is 292. However, surprisingly, the functions of all of these genes are unknown, and the proteins have not been characterized (Table S5).
Genes 2020, 11, x FOR PEER REVIEW 11 of 15 number of genes in this orthogroup is 292. However, surprisingly, the functions of all of these genes are unknown, and the proteins have not been characterized (Table S5). We checked the expression data for these genes in the Noble Research Institute website (https://ljgea.noble.org/v2/index.php). Among these 96 genes, we found the expression levels of only 12 genes, some of which are highly expressed in roots, stems, and pods ( Figure 8). These findings suggest that these genes play important roles in plant growth and development in some legume plants. They also highlight the need to further study this huge orthogroup. We also checked the same information on the LotusBase website (https://lotus.au.dk/) [56][57][58][59]. Based on Blastp, we set "-max_target_seqs 1" to ensure that the LjPB_ver1.0 gene and Ljv3 expression data correspond to each other. However, as shown in Table S6, there are some differences between the two genomes. After removing the repeated IDs from all the gene IDs obtained, 37 genes were identified. Among the 37 genes, only five genes were identified to have expression data in the LotusBase ( Figure S1). The difference between the two genomes also indicates that the accuracy of sequencing and assembly is subject to verification by biological experiments. In fact, while it is difficult to verify the functions of such a large group of orthologous genes, further analysis of the orthogroup revealed that this orthogroup has only a few members in M. truncatula (A17), L. angustifolius, and T. pratense. The data suggest that we can study the functions of these genes in these three species to infer the possible functions of the largest group of orthologous genes in L. japonicus through functional annotation. We checked the expression data for these genes in the Noble Research Institute website (https://ljgea.noble.org/v2/index.php). Among these 96 genes, we found the expression levels of only 12 genes, some of which are highly expressed in roots, stems, and pods ( Figure 8). These findings suggest that these genes play important roles in plant growth and development in some legume plants. They also highlight the need to further study this huge orthogroup. We also checked the same information on the LotusBase website (https://lotus.au.dk/) [56][57][58][59]. Based on Blastp, we set "-max_target_seqs 1" to ensure that the LjPB_ver1.0 gene and Ljv3 expression data correspond to each other. However, as shown in Table S6, there are some differences between the two genomes. After removing the repeated IDs from all the gene IDs obtained, 37 genes were identified. Among the 37 genes, only five genes were identified to have expression data in the LotusBase ( Figure S1). The difference between the two genomes also indicates that the accuracy of sequencing and assembly is subject to verification by biological experiments. In fact, while it is difficult to verify the functions of such a large group of orthologous genes, further analysis of the orthogroup revealed that this orthogroup has only a few members in M. truncatula (A17), L. angustifolius, and T. pratense. The data suggest that we can study the functions of these genes in these three species to infer the possible functions of the largest group of orthologous genes in L. japonicus through functional annotation.

Discussion
In the current study, we generated a high-quality genome sequence for the model legume L. japonicus. This sequence contains a large amount sequence information, high sequencing depth, and long contigs. The high quality of our genome assembly makes it an excellent platform for molecular biology and genetic research. An analysis of orthologous genes in the whole genomes of several leguminous plants will provide insight into species evolution caused by the formation of gene families and gene clusters via repeats and mutation events. However, this analysis must be performed using two species that are particularly close to each other before they can be used to infer ancestral chromosome composition. Otherwise, there may be multiple mutations in chromosome compositions due to the large number of intermediate species. Such complexity could be confusing when trying to obtain information about gene synteny, preventing researchers from making accurate inferences. Therefore, studies on the evolution of species still require more genome-wide sequencing data.

Conclusion
Comparative genomics studies using bioinformatics techniques revealed similarities and differences between the re-sequenced and assembled genomes and previously published versions. Based on assembly evaluation and annotation of gene function evaluation, our data has higher accuracy, which will provide researchers with more accurate genomic information. We have submitted the data to Phytozome, which will soon be published on the website for use. The identification of all orthologous genes in 10 species will help provide critical evolutionary information when studying genes' functions.

Discussion
In the current study, we generated a high-quality genome sequence for the model legume L. japonicus. This sequence contains a large amount sequence information, high sequencing depth, and long contigs. The high quality of our genome assembly makes it an excellent platform for molecular biology and genetic research. An analysis of orthologous genes in the whole genomes of several leguminous plants will provide insight into species evolution caused by the formation of gene families and gene clusters via repeats and mutation events. However, this analysis must be performed using two species that are particularly close to each other before they can be used to infer ancestral chromosome composition. Otherwise, there may be multiple mutations in chromosome compositions due to the large number of intermediate species. Such complexity could be confusing when trying to obtain information about gene synteny, preventing researchers from making accurate inferences. Therefore, studies on the evolution of species still require more genome-wide sequencing data.

Conclusions
Comparative genomics studies using bioinformatics techniques revealed similarities and differences between the re-sequenced and assembled genomes and previously published versions. Based on assembly evaluation and annotation of gene function evaluation, our data has higher accuracy, which will provide researchers with more accurate genomic information. We have submitted the data to Phytozome, which will soon be published on the website for use. The identification of all orthologous genes in 10 species will help provide critical evolutionary information when studying genes' functions.