Article

Genomic Inference Unveils Population Bottlenecks and a North-to-South Migration Pattern of Wild Cordyceps militaris Across China

National Health Commission Science and Technology Innovation Platform for Nutrition and Safety of Microbial Food, Guangdong Provincial Key Laboratory of Microbial Safety and Health and State Key Laboratory of Applied Microbiology Southern China, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou 510070, China
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Agriculture 2025, 15(7), 686; https://doi.org/10.3390/agriculture15070686
Submission received: 16 February 2025 / Revised: 6 March 2025 / Accepted: 13 March 2025 / Published: 24 March 2025
(This article belongs to the Special Issue Genetics and Breeding of Edible Mushroom)

Abstract

The Ascomycete genus Cordyceps affects plant crops significantly, filling an important ecological niche. Cordyceps militaris (L.) Fr. presents many health benefits for humans, but its population history has not been reported. The objective of this research was to report the collection, population structure, demographic history, diversity, and cytosine deaminases of 43 wild strains of C. militaris in China through resequencing on an Illumina HiSeq™ platform. All strains were assigned to the warm temperate, subtropical, and middle temperate zone populations, as confirmed by ADMIXTURE-1.3.0, PCA, and phylogenetic analysis. Their population sizes declined historically, suggesting that this species suffered from bottlenecks in the wild. LD decay (r²) revealed a north-to-south migration pattern of wild C. militaris, consistent with the MSMC2-v2.1.4 analysis. Regions of high Pi aggregated on chromosomes CP023325.1 (51) and CP023323.1 (9) and may play a key role in adaptation, especially the sites in cytosine deaminase. Within the species, genetic differentiation was relatively high among the three populations (Fst = 0.083, 0.092, and even 0.109). According to the artificial intelligence-assisted (RoseTTAFold) predicted structures, the cytosine deaminases were classified into eight clades with unique, distinct, and structurally conserved domains, offering a potential suite of single- and double-stranded deaminases of great promise as tunable base editors for therapeutic and agricultural breeding applications. These results provide new insights for mining novel proteins from macrofungi, structurally and functionally.

1. Introduction

The Ascomycete genus Cordyceps [1,2,3], which affects plant crops significantly, consists of 500 plus species [4,5] and is associated with fighting injurious insects, arthropods, nonarthropod microinvertebrates, and pathogens and parasites on food crops by causing infectious diseases against them and then filling an important ecological niche in the ecosystem [6,7,8]. Generally, these fungi always penetrate the cuticles of their host when competing with serious and complex interactions with the host, and a successful infection is only built when all barriers and reactions of the host are overcome or defeated [9]. Given their outstanding performance in infecting injurious insects and protecting agricultural crops, these fungi are usually developed as biocontrol agents, which demonstrate superiorities of biosafety, non-pollution, and degradation; since they come from nature and are utilized in nature to cause significant pest population reductions, they are important in many agricultural systems [10].
Cordyceps militaris (L.) Fr. occurs throughout much of the Northern Hemisphere and fills an important ecological niche as a pathogenic fungus against lepidopteran insect pupae, controlling the outbreaks of many harmful lepidopterans that harm agricultural production [11,12], such as Pyrausta nubilalis Hubern and Dendrolimus punctatus Walker, the pupae of which are dormant in the soil and emerge as adult moths that eat various plants and reduce production. Moreover, C. militaris follows the classic life cycle of entomopathogenic fungi, secreting multiple extracellular enzymes to invade its host efficiently during the pathogen–host interaction [13], including proteases [13], chitinases [14], and lipases [15], which degrade the structural components of the host. Among these, lipase (EC 3.1.1.3), which plays a crucial role in infecting hosts, is a water-soluble enzyme for catalyzing the hydrolysis reaction of apolar and water-insoluble ester substrates, such as long-chain triacylglycerols, and initiating the conidial adhesion on the epicuticle surface of its host in the early stage by releasing free fatty acids and alkenes and enhancing the hydrophobic interactions between the conidial surface and epicuticle through destroying the hydrophobic barrier of the cuticles of the insects and providing nutrients for fungal colonization and growth, which consist of lipoproteins, acylglycerol, and long-chain alkene esters [15]. These functional elements and mechanisms represent next-generation biotechnologies that may benefit people by converting various biomasses into major food sources.
For centuries, C. militaris has been used as a traditional herbal medicine in China, Japan, Korea, and other Asian countries [16] due to the large range of bioactive compounds it produces and its various benefits for protecting health [17], such as anti-hyperuricemia properties. A series of pharmacologically active components have been isolated and characterized, including cordycepin, cordycepic acid, flavones, carotenes, macrolides, and polysaccharides, some of which serve as valuable chemical markers for quality control. Among them, cordycepin (3′-deoxyadenosine), first isolated from C. militaris in 1950, was the earliest and arguably the most significant natural product identified as the major bioactive component [18]. These components exhibit broad bioactivity as antimicrobial and antiviral agents and polyadenylation inhibitors, some of which are undergoing clinical trials. Their fungal biosynthesis was found to be coupled with the production of pentostatin, a clinically important anticancer drug [19]. Their biosynthesis involves HisG-type ATP phosphoribosyltransferase, oxidoreductase (PenB), and phosphoribosylaminoimidazolesuccinocarboxamide synthetase. Also, the characteristic component cordycepin has served as an important scaffold for designing and synthesizing antiviral drugs and pro-drugs since it has a structure similar to that of a nucleotide, which may disturb the DNA replication of viruses [20]. Overall, it inhibits tumors [21] and is an immune or metabolic modulator for preventing the deterioration of inner organs such as the lungs, liver [22], and kidneys. With these benefits and so many useful genes, C. militaris is cultivated widely in China and other countries for commercial purposes.
C. militaris features with its sexual fruiting bodies growing on the mycosed pupae, earning it the common name “pupa grass” in China [3]. Presently, C. militaris is cultivated in laboratories and industrial factories on worms or grain culture such as rice, mung beans, corn, and sunflower seeds, or by using other protein sources that can replace worms and insects because of their high market value. The cereal culture method is exploited most frequently in artificial culturing [23] since it results in fruiting body formation and the formula may be adjusted to fit the desire for high yield and high quantity of cordycepin and adenosine. Also, on another aspect, C. militaris cultivated with silkworm pupae has been reported to induce allergic events after consumption by some people due to the cross reactivity and incompatibility with silkworm pupae. However, certain aspects remain unclear about the nature and evolution of the interaction of the fungi with its hosts, the synthesis of its polysaccharides, and the evolution and diversity of its nucleases [24], proteases [13,25,26], cellulase [27,28] and deaminase [29], which may be keys for developing next-generation biotechnologies.
Previously, we built the Scientific Database of Edible and Medicinal Fungi at the Institute of Microbiology, Guangdong Academy of Sciences over a period of 10 years. Based on that database, we obtained a wild strain of C. militaris with the merits of high cordycepin content, reaching 3.72 mg/kg (dried weight), and high polysaccharide content, reaching 6.7 g/100 g (dried weight) in fruiting bodies [30]. Significantly, it was confirmed that pairing the opposite mating-type isolates of C. militaris is a prerequisite for the induction of sexual fruiting body formation, where sexual mating is key for chromatin and DNA recombination and the evolution of the species [31,32,33]. Importantly, our group discovered that an extract of C. militaris [34] and its component cordycepin [35,36] lower uric acid and thereby modulate the metabolic disorder hyperuricemia by down-regulating URAT1, which is a novel and significant target for hyperuricemia and gout [37,38]. Also, its geographical origins were investigated [39]. On the other hand, it was reported that the exploration of mitochondrial genomes presents novel opportunities for enhancing mushroom cultivation biotechnology and medicinal applications [40]. Overall, most studies on Cordyceps have concentrated on its pharmacologic effects and cultivation [41].
However, the population structure and demographic history have not been reported so far. Quantifying its population structure, demographic history, and diversity and then mining its functional genes based on these remain challenging. We hypothesized that combining resequencing and clustering of predicted protein structures may be an effective strategy [42]. In this paper, 43 wild strains of C. militaris were collected across China, and their genomes were re-sequenced for the first time using an Illumina HiSeq™ platform (Figure 1a) to quantify the population structure, demographic history, and diversity and then mine their functional genes. Their population structure was determined by ADMIXTURE, PCA, and phylogenetic tree analysis. Demographic analysis was conducted with MSMC2 to reveal the historical dynamics of their population sizes and their divergence history. Linkage disequilibrium analysis was conducted to reveal the evolution and distribution history of C. militaris strains. Hardy–Weinberg equilibrium analysis was performed to illustrate the proportion of sites that have undergone natural selection. The nucleotide diversity Pi was analyzed to unveil the genome features and the sites that played a key role in adaptation to climate and other changes. In addition, Tajima’s D was examined to reveal deviations from neutrality in order to identify evolutionary pressures and genomic variation. Genetic differentiation (Fst) analysis was carried out to reveal the genomic modules of divergence and differentiation between the three populations of C. militaris. Specifically, the cytosine deaminase of C. militaris was selected for a combination of AI-assisted protein structure prediction, structural alignment, and clustering since it may be a key for next-generation biotechnology. This work demonstrates how genomic variation leads to functional diversity.

2. Materials and Methods

2.1. Collection of C. militaris Strains

In total, 43 strains of C. militaris (Table 1, Figure 1b) were collected over the vast geographic distribution of China, including middle temperate, warm temperate, and subtropical temperate zones (Figure 1c), and then they were stored in our in-house database, the Scientific Database of Edible and Medicinal Fungi in Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou, China. The strain numbers, collection locations, latitude, longitude, and collection dates are listed in Table 1. In addition, the morphologic features are depicted in Figure 1b.

2.2. Trans-Inoculation of Strains and Their DNA Extraction

Strains were trans-inoculated on a PDA medium (20% potato, 2% glucose, 2% agar, 0.3% KH2PO4, 0.15% MgSO4, trace vitamin B1) and then grown at 25 °C for 7 days in a dark environment [30]. Then, mycelium was isolated and collected from the culture and ground into powder under liquid nitrogen to extract DNA.
Briefly, DNA extraction was performed with a QIAGEN® DNA extraction kit (Cat#13323, QIAGEN, Dusseldorf, Germany) according to the manufacturer’s instructions. The purity of the extracted DNA was detected using a NanoDrop™One UV-Vis Detector (Thermo Fisher Scientific, Waltham, MA, USA) with OD260/280 in the range of 1.8-2.0 and OD260/230 in 2.0-2.2.

2.3. Construction of a DNA Library

To obtain proper DNA [36], ultrasonic waves were utilized to slice DNA sequences into randomized fragments. Then, fragments were amended by adding an A at the 3′-end. After the ligation of the sequencing adapter, magnetic beads were used to enrich the DNA sequences, and sequences of about 400 bp were absorbed, then amplified with PCR to construct a library. The constructed library was inspected, and then the qualified library was sequenced using an Illumina HiSeqTM platform with the method of Illumina PE150 and reads of 300 bp (paired-end) to target coverage of 30× [36]. The detailed flow is shown in Figure 1a.

2.4. Whole-Genome Re-Sequencing and SNP Calling

Raw data were downloaded from the Illumina HiSeq™ platform and quality-checked [36]; low-quality data were then removed to obtain clean data. Short reads were cleaned using Trimmomatic v.0.38 with a strict filtering process and then aligned to the C. militaris reference genome (CmilitarisCM01_v01) [4] using BWA v.0.7.15, and the resulting alignments were stored as BAM files. BAM files were corrected following the GATK Best Practices, and SNPs and genotypes were called using HaplotypeCaller implemented in GATK v.4.1 [43]. SnpEff_v4_5 [44], Annovar version 0.8 [45], and the gene annotations of the reference genome were used to annotate the functions of SNPs. The obtained SNP markers were used for investigating the genetic diversity and evolutionary structure.

2.5. Population Structure Analyses

In order to elucidate the population structure, an admixture analysis was conducted with ADMIXTURE v.1.3.0 [46], and a principal component analysis (PCA) was performed with PCANGSD v.0.98 [47]. Also, the maximum likelihood algorithm was established for evolutionary trees with MEGAX. ADMIXTURE was run with predefined numbers of clusters (K) ranging from 1 to 19, each repeated 20 times. The K-value with the lowest cross-validation error was selected as the most likely number of putative genetic groups. PCANGSD was accomplished based on genotype likelihoods accounting for sequencing errors and uncertainty in genotype calls. Population structure analyses were carried out using common SNPs with missing rate < 0.1, minor allele frequency (MAF) > 5%, and correlation coefficient < 0.2 with any other SNPs in sliding windows of 50 SNPs.
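The site filters and the choice of K described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the genotype coding (alternate-allele counts 0, 1, 2, with `None` for a missing call), and the cross-validation values in the usage note are assumptions made for the example.

```python
def passes_filters(genotypes, max_missing=0.1, min_maf=0.05):
    """Apply the missing-rate and MAF filters to one SNP.

    genotypes: one entry per strain, coded as the alternate-allele
    count (0, 1, 2) or None for a missing call (assumed coding).
    """
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n == 0 or not called:
        return False
    missing_rate = 1.0 - len(called) / n
    alt_freq = sum(called) / (2.0 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return missing_rate < max_missing and maf > min_maf


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error.

    cv_errors: dict mapping K to its cross-validation error.
    """
    return min(cv_errors, key=cv_errors.get)
```

For example, `best_k({1: 0.62, 2: 0.41, 3: 0.47})` returns `2`; these error values are invented for illustration only.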

2.6. Demographic History

The multiple sequentially Markovian coalescent (MSMC) analysis was performed with MSMC v.2.00 to assess the cross-coalescence rate and to track variation in effective population size (Ne) over time using previously reported parameters [48]. The 2D joint-unfolded site frequency spectrum (SFS) was calculated with ANGSD v.0.935 based on intergenic sites, which are the least affected by selection. Furthermore, the timing of split events between pairs of populations was estimated with MSMC2.
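MSMC2 reports times and coalescence rates in units scaled by the mutation rate, so they have to be rescaled before they can be read as years and effective population sizes. The sketch below shows the standard conversions under stated assumptions: the function names are hypothetical, and the per-generation mutation rate `mu` and the generation time are placeholders that the text above does not specify.

```python
def scaled_time_to_years(scaled_time, mu, generation_years):
    """Convert a mutation-scaled time boundary to years.

    mu: per-site, per-generation mutation rate (assumed value).
    generation_years: generation time in years (assumed value).
    """
    return scaled_time / mu * generation_years


def coalescence_rate_to_ne(scaled_rate, mu):
    """Convert a scaled within-population coalescence rate to Ne."""
    return 1.0 / (2.0 * mu * scaled_rate)


def relative_cross_coalescence(lambda_00, lambda_01, lambda_11):
    """Relative cross-coalescence rate between two populations.

    Values near 1 indicate a well-mixed population; values near 0
    indicate that the two populations have fully separated.
    """
    return 2.0 * lambda_01 / (lambda_00 + lambda_11)
```

A split time is often read off as the point where the relative cross-coalescence rate drops to 0.5, although other conventions exist.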

2.7. Population Genetic Statistics

ANGSD [49] was used to compute summary statistics including Pi, Tajima’s D, and Fst based on the folded SFS. All summary statistics were computed in 10 kb nonoverlapping windows. Fst was estimated by averaging all pair-wise Fst values between populations. Gene ontology (GO) analysis was performed using GOWINDA [50] and Annovar [45]. LD between SNPs within a sliding window of 10 kb was computed using PLINK v.1.07 and then averaged over all pair-wise sites. Only SNPs with MAF > 5% were included, and only windows with ≥ 10 SNPs were retained for PLINK.
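To make the window statistics concrete, the sketch below computes per-site nucleotide diversity and Tajima's D for a single window from hard-called allele counts, using Tajima's (1989) formulas. It is a simplified stand-in rather than the ANGSD procedure, which works from genotype likelihoods and the SFS; the function name and input format are assumptions for the example.

```python
import math


def window_diversity(minor_counts, n, window_length):
    """Return (pi per site, Tajima's D) for one window.

    minor_counts: minor-allele count at each variable site.
    n: number of sampled sequences.
    window_length: window size in bp (10,000 for a 10 kb window).
    D is None when it is undefined (no segregating sites or n < 3).
    """
    seg = [k for k in minor_counts if 0 < k < n]
    s = len(seg)
    # Mean pairwise differences, summed over segregating sites.
    pi_sum = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    pi_per_site = pi_sum / window_length
    if s == 0:
        return pi_per_site, None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return pi_per_site, None
    d = (pi_sum - s / a1) / math.sqrt(variance)
    return pi_per_site, d
```

A window dominated by rare variants gives a negative D, whereas an excess of intermediate-frequency variants gives a positive D.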

2.8. Protein Clustering and Analysis of Cytosine Deaminase of C. militaris

According to the annotation results, the SNPs of the cytosine deaminase of C. militaris were selected and extracted with VCFtools [51]. The variant nucleotide sequences were obtained by applying the SNPs with BCFtools to the genome region of the cytosine deaminase domain annotated above, which served as the reference. Then, the protein sequences were obtained by translating the variant nucleotide sequences. After that, the structure of each protein sequence was predicted with the AI algorithm RoseTTAFold [52], and only high-confidence models with an average per-residue predicted local distance difference test (pLDDT) score > 70 were retained. Pairwise structure alignment was performed based on the TM-score method with Foldseek [53] to produce the overall structural similarity matrix, which was then clustered, and the representatives of each cluster were rendered with PyMOL 2.1.0.
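The filtering and clustering steps above can be illustrated with a small Python sketch. The clustering algorithm is not named in the text, so single-linkage grouping at a TM-score threshold is used here as a stand-in; the threshold of 0.5, the function names, and the matrix layout are all assumptions for the example.

```python
def mean_plddt(per_residue_plddt):
    """Average per-residue pLDDT of one predicted structure."""
    return sum(per_residue_plddt) / len(per_residue_plddt)


def cluster_by_tm_score(names, tm_matrix, threshold=0.5):
    """Group structures whose pairwise TM-score reaches the threshold.

    names: structure identifiers.
    tm_matrix: square matrix of pairwise TM-scores, in the same order.
    Single linkage is an assumed stand-in, implemented with union-find.
    """
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if tm_matrix[i][j] >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

Structures would first be kept only if `mean_plddt(...) > 70`, matching the confidence filter described above, and then passed to `cluster_by_tm_score`.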

3. Results

3.1. Population Structure and Demographic History of the Three Populations of C. militaris

Forty-three strains (Figure 1b and Table 1) of C. militaris were collected from locations throughout China in different temperate zones (Figure 1c). Each sample was resequenced on an Illumina HiSeq™ platform (paired-end 300 bp) to a target coverage of 30×. Genetic structure analysis of a population may provide clues to its origin and composition. High-quality SNPs were analyzed and clustered with ADMIXTURE to obtain the genetic structure. The cross-validation error was lowest at K = 2 when K was scanned over the range of 1 to 19 (Figure 1d), suggesting that all the strains collected may originate from two common ancestors. When K was set at 2, eight, seven, and eighteen individuals were assigned to the warm temperate, subtropical temperate, and middle temperate zone populations of C. militaris, respectively. One individual showed high admixture and was excluded from downstream analyses (Figure 1e). The remaining strains were discarded and not considered since they failed to grow fruiting bodies.
PCA linearly transforms a large set of variables into a small number of uncorrelated components. According to the PCA, all wild C. militaris strains were clustered into three populations, consisting of middle temperate, warm temperate, and subtropical zone populations, except for two strains from the subtropical temperate zone (Figure 2a). Phylogenetic trees are frequently used to elucidate and interpret the evolutionary distance and relationships between strains; strains are placed on the branches of a tree, reflecting evolutionary history and genetic relationships. The maximum likelihood algorithm was used to build the phylogenetic tree of the collected populations (Figure 2b). All samples of the subtropical temperate zone population were separated from the warm temperate zone population except for S27, which was collected from Yaoxiang, Shandong Province. In terms of population structure, the PCA and maximum likelihood analyses further confirmed the separation patterns and clustering of the genetic differentiation inferred by the ADMIXTURE algorithm.
Overall, the three populations were all collected from locations in relatively low temperate climates, underlining the shade-loving feature of C. militaris and implying that the spread of C. militaris may be from north to south in China. On the other hand, the three populations presented different population sizes, with a pattern of subtropical zone > middle temperate zone > warm temperate zone (Figure S1a). In particular, the subtropical and middle temperate zone populations showed much larger population sizes than the warm temperate zone population, further implying the shade-loving feature and north-to-south migration.
Furthermore, all three populations experienced several bottlenecks and booms of different magnitudes and durations (Figure 2c). The middle temperate zone population boomed for about 0.021 Myr between 0.025 Myr and 0.004 Myr ago, and then it bottlenecked for about 0.001 Myr between 0.003 Myr and 0.0005 Myr ago. Then it was reduced to nearly zero, which may have been caused by over-exploitation due to its high value as medicine. The warm temperate zone population presented a much smaller population size in comparison to the other two populations. It boomed at 0.1 Myr ago and bottlenecked at 0.01 Myr ago, lasting about 0.09 Myr. It then boomed immediately at 0.01 Myr ago, and this ended at 0.001 Myr ago, lasting for 0.009 Myr. Then, it boomed and then declined to nearly zero, which may be attributed to over-exploitation. In detail, the subtropical temperate zone population experienced a short bottleneck between 0.05 Myr and 0.04 Myr ago, lasting about 0.01 Myr, and then it boomed for about 0.04 Myr between 0.04 Myr and 0.0002 Myr ago. Interestingly, it declined evidently within the most recent 0.0002 Myr, which may have been caused by over-exploitation. It was evident that the population sizes of the three populations have been declining quickly in recent years, indicating that the wild resources of this species suffered a bottleneck. These findings are consistent with the rise of modern breeding, which was fostered by the over-exploitation of wild resources and the high value of this fungus as a traditional medicine in various Asian countries.
Demographic analyses using MSMC were conducted to reveal the divergence between the three populations of C. militaris (Figure 2d). In recent times, the split between the middle and subtropical temperate zone populations occurred about 0.0002 Myr ago, and the split between the middle and warm temperate zone populations occurred about 0.00035 Myr ago. These events happened simultaneously with the declines in the sizes of the three populations. The separation of the subtropical and warm temperate zone populations occurred at 0.4 Myr ago. Over the whole relative cross-coalescence rate analysis, the warm temperate population first started separating from the middle temperate population at 2.5 Myr and from the subtropical temperate population before 20 Myr, and it hybridized with the two in recent years and at about 7.5 Myr, experiencing several splits and hybridizations (Figure S1b). In the process, the population expansions between 0.1 Myr and 0.01 Myr ago were consistent with its relatively low hybridization with the subtropical temperate zone population, as reflected by the relative cross-coalescence rate analysis. Also, the recent splits between them were accompanied by steep population declines at almost the same time scales. Interestingly, the migration pattern of C. militaris may be from north to south, as suggested by the earlier splitting times of the middle versus warm temperate zone populations and the later splitting times of the middle versus subtropical temperate zone populations. Possibly, the divergence between the subtropical temperate population and middle temperate population occurred earlier than that between the warm temperate population and subtropical temperate population, demonstrating further that the spread of C. militaris may be from the low temperate zone to the warm temperate zone. It gained its tolerance to high temperatures gradually.

3.2. Linkage Disequilibrium Analysis

Linkage disequilibrium (LD) describes the situation in which alleles at two different loci are inherited together in a population more often than expected by chance. Generally, r² is used as the measure of linkage disequilibrium and is obtained by analyzing SNP pairs on the same chromosome across all samples. The linkage is strong when r² approaches 1. Fitting r² against the distance between SNPs shows that the closer two SNPs are, the higher the r². A slow rate of r² decay indicates a high probability of SNP linkage. Generally, a population with a quick decay of r² may be closer to the original population. In this research, r² values were calculated and are shown in Figure 3a. Strong LD (r² > 0.3) occurred for the three populations at short distances. The LD decays of the three populations were at r² values of 0.27, 0.26, and 0.23 for the middle, subtropical, and warm temperate zone populations, corresponding to distances of about 35 kb, 35 kb, and 10 kb. LD decay analysis suggested that the LD reaches very low values for distances greater than 150 kb (r² < 0.10). Linkage disequilibrium analysis revealed that C. militaris strains of the middle temperate zone resemble the common ancestors, supporting that C. militaris in China may have originated from the north and then spread to the south.
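As an illustration of how r² and an LD decay distance can be obtained, the sketch below computes r² for one SNP pair and scans binned averages for the first distance below a threshold. It assumes phased haplotypes coded 0/1, whereas PLINK estimates r² from genotype data; the function names and the binned input format are hypothetical.

```python
def r_squared(hap_a, hap_b):
    """r² between two biallelic SNPs from phased haplotypes (0/1).

    Returns None if either SNP is monomorphic, where r² is undefined.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom


def decay_distance(binned_r2, threshold):
    """First distance at which the mean r² falls below the threshold.

    binned_r2: list of (distance_kb, mean_r2), sorted by distance.
    Returns None if r² never falls below the threshold.
    """
    for distance_kb, mean_r2 in binned_r2:
        if mean_r2 < threshold:
            return distance_kb
    return None
```

Two SNPs that always co-occur on the same haplotypes give r² = 1, and two SNPs that are statistically independent give r² = 0.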

3.3. Population Diversity of C. militaris

SNP distributions at each chromosome were presented (Figure 3b). Testing for deviations from Hardy–Weinberg equilibrium (HWE) offers fundamental information about genetic variation and evolutionary processes in natural populations. Overall, 176,310 sites were analyzed to obtain Hardy–Weinberg equilibrium for their alleles. Of those, 169,775 deviated from Hardy–Weinberg equilibrium (Figure 3c), illustrating that a large proportion of sites had undergone natural selection and also that natural selection has played an important role in the evolution and species shaping of C. militaris.
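A per-site test for deviation from Hardy–Weinberg equilibrium can be sketched as a chi-square goodness-of-fit test with one degree of freedom. The text does not state which test was applied, so this is an assumed, simplified stand-in; with small samples an exact test is often preferred. The function name and genotype-count inputs are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) for deviation from HWE at one site.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2.0 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A site with genotype counts of 25, 50, and 25 matches the expected proportions exactly, while a site with no heterozygotes at all (50, 0, 50) deviates strongly.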
Overall, the observed homozygosities of the three populations were at 0.932, 0.959, and 0.896 for the middle, subtropical, and warm temperate zone populations (Table 2). On the other hand, the observed heterozygosities of the three populations were 0.068, 0.041, and 0.104. Also, the average Pi within each population was at 0.261, 0.275, and 0.229. The genetic variation was heterogeneous across the genome for the three populations. The overall level of genome-wide polymorphisms of the three populations was estimated with nucleotide diversity Pi, which was in the range of 0–0.041, with an average of 0.016 and a max of 0.041 (Figure 3d). Altogether, the genome of C. militaris presented relatively stable and conserved features. Interestingly, there were 60 regions with Pi above 0.39, aggregating at chromosomes CP023325.1 (51) and CP023323.1 (9). The regions of relatively high nucleotide diversity at the two chromosomes may have been subjected to distinct selection mechanisms and/or demographic histories and then played a key role in adaption to the climate alterations in immigration or in the history of climate change. Tajima’s D was calculated for the whole genome of the strains collected for the three populations (Figure 3e). The average Tajima’s D score for them was found to be −0.943 with a standard deviation of 0.809, revealing that most elements of their genomes did not deviate significantly from neutrality. However, 127 sites or genes were flagged as not evolving neutrally, including 88 with a Tajima’s D equal to or above 2 and 39 equal to or below −2.
Genetic diversity is the main component of biological diversity, through which species present their unique gene library, genetic organization, and then phenotype diversity. The genetic diversity of species and populations is formed during their long-term evolution and then serves in return as the prerequisite for their survival and evolutionary adaptation to climate change. To determine the pattern of sequence diversity between the three populations, we estimated relative sequence divergence, Fst, between the three populations by sequencing the individuals sampled from locations across China (Table 3). Fst approaches 1 when the differentiation between two populations is high and approaches 0 when it is low. Consistent with the clear population separation and structure, the average Fst was 0.109 for the warm versus the subtropical temperate zone populations over 3,448,082 sites of the genome, 0.092 for the warm versus the middle temperate zone populations, and 0.083 for the subtropical versus the middle temperate zone populations (Table 3), given that they were within the species C. militaris. At a glance, a relatively high Fst was observed for almost the whole genome except for a few regions with elevated Fst on seven chromosomes (Figure 3f–h). At a finer scale, three sharp peaks (Fst > 0.70) were found on chromosome CP023322.1, two on CP023323.1, four on CP023324.1, three on CP023325.1, four on CP023326.1, three on CP023327.1, and three on CP023328.1 for the comparison of the warm versus the subtropical temperate zone populations (Figure 3f). Overall, 22 Fst peaks were observed between the warm and the subtropical temperate zone populations.
Correspondingly, 21 Fst peaks were obtained in the comparison of the warm versus the middle temperate zone populations, including two peaks on CP023322.1, three on CP023323.1, three on CP023324.1, three on CP023325.1, four on CP023326.1, three on CP023327.1, and three on CP023328.1 (Figure 3g). On the other hand, 22 Fst peaks were extracted through the comparison of the subtropical versus the middle temperate zone populations, consisting of three peaks on CP023322.1, two on CP023323.1, three on CP023324.1, five on CP023325.1, three on CP023326.1, three on CP023327.1, and three on CP023328.1 (Figure 3h). Thus, the Fst peaks in the seven chromosomes represent genomic modules of divergences and differentiations between the three populations of C. militaris. Selective sweep was induced by the decline and sweep of the difference of adjacent nucleotides at selective loci under the stress of some strong positive natural selection. As a mutant is generated and then leads to an elevation of adaption to the environment, selective sweep may be produced. Natural selection benefits the survival of individuals with strong adaption, and as time goes on, the frequency of novel mutants for alleles increases gradually, and then linkages of neutral mutants and novel mutants may increase. Then, the domain of selective sweep in a genome forms a positive haplotype gradually, leading to a decrease. The Fst and Pi selected thresholds were at 0.95 and 0.05, Tajima’s D at 0.05 and 0.95, and these were correlated with each other to extract candidate regions and mutant loci in the regions. Selective sweep loci were identified (Figure 4a,b), and cytosine deaminase was selected as a candidate.
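One way to read the thresholds above is as quantile cut-offs over all windows: candidate sweep windows are those in the top 5% of Fst and the bottom 5% of Pi. That interpretation is an assumption, as are the function names and the window record format in the sketch below; a Tajima's D cut-off could be added in the same way.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation (0 <= q <= 1)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def candidate_windows(windows, fst_q=0.95, pi_q=0.05):
    """Select windows with high Fst and low Pi.

    windows: list of dicts with keys 'id', 'fst', and 'pi'
    (a hypothetical record format for this example).
    """
    fst_cut = quantile([w["fst"] for w in windows], fst_q)
    pi_cut = quantile([w["pi"] for w in windows], pi_q)
    return [w for w in windows if w["fst"] >= fst_cut and w["pi"] <= pi_cut]
```

Requiring both signals at once keeps only windows where strong differentiation coincides with reduced diversity, which is the pattern expected after a selective sweep.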

3.4. Representative Predicted Structures for Eight Deaminase Clades

Because structure determines function, comparing and clustering known or predicted protein structures can potentially classify the cytosine deaminases of C. militaris into functional clades and reveal novel functions. Thus, AI-assisted protein structure prediction, structure alignment, and clustering were combined to generate classification relationships among the cytosine deaminases. From the strains of C. militaris, several cytosine deaminase variants carrying non-synonymous substitutions were obtained by calling SNPs and reconstructing sequences with the cytosine deaminase domain in the genome of C. militaris as reference. The structures of all the obtained protein sequences were predicted with the AI algorithm RoseTTAFold. Also, multiple structural alignments (MSTAs) of the sequences were conducted, and then structural similarity matrices between these proteins were generated (Table S1), reflecting their overall structural correlations. According to the structural similarity matrices, clustering was carried out to classify them into eight unique structural clades (Figure 4c and Table S2), and the cytosine deaminases within each clade had distinct conserved protein structural domains, reflecting their diversity and evolution at the structural and functional levels.

4. Discussion

In this paper, we reported the collection and re-sequencing, population structure, demographic history, diversity, and cytosine deaminases of 43 wild strains of C. militaris collected from locations across China, offering a potential suite of single- and double-stranded deaminases of great promise as tunable base editors for therapeutic or agricultural breeding applications (Figure 4d). Genomic sequencing of fungi is rare, since fungi comprise numerous genera and species and are generally bisporous. Therefore, the dynamics of their population sizes have seldom been reported. The population size history of C. militaris is still poorly known, although it is essential for understanding the complex interactions and coevolution between this fungus and its hosts and for revealing the spread of its populations. It is likely to be difficult or impossible to obtain ancient DNA from C. militaris samples that would aid in dating its emergence. However, it would be particularly notable if present-day C. militaris genomic sequences could be used to robustly infer both the recent and ancient population size histories of this species. Population size changes that occurred hundreds of thousands of years ago affected the rates of coalescence and have thus left their signatures in the site frequency spectrum (SFS) of genomic sequences. In this research, the SFS is the distribution of allele frequencies in sequences randomly collected from the present-day population of C. militaris. Each SFS category contains the number of mutations of the same size, i.e., mutations carried by the same number of sampled sequences. All three populations experienced several bottlenecks of different magnitudes and durations, especially in recent years. To conserve this species and its genetic diversity, conservation parks should be established for this and other valuable species.
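The SFS described above can be made concrete with a short sketch. It assumes that each site is biallelic and that alleles are already polarized as ancestral (0) or derived (1); the function name `site_frequency_spectrum` is hypothetical and is not part of the software used in this study.

```python
from typing import List, Sequence


def site_frequency_spectrum(sites: Sequence[Sequence[int]]) -> List[int]:
    """Unfolded SFS: entry k-1 counts the sites whose derived allele occurs in k samples.

    sites[s][i] is 0 (ancestral) or 1 (derived) for site s in sample i.
    """
    if not sites:
        return []
    n = len(sites[0])
    sfs = [0] * (n - 1)  # categories for 1 .. n-1 derived copies
    for site in sites:
        if len(site) != n:
            raise ValueError("every site needs one allele per sample")
        k = sum(site)
        if 0 < k < n:  # skip monomorphic sites
            sfs[k - 1] += 1
    return sfs
```

A bottleneck or an expansion changes the relative heights of these categories, which is the signal that demographic inference methods exploit.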
LD is an important tool in association studies as well as in studies aiming to evaluate genetic diversity. LD has thus been used in several studies to determine diversity and history, signatures of selection, recombination rates, effective population size, and other population events. Moreover, for high-resolution association mapping, it is also necessary to identify haplotype-block structures and a minimal set of polymorphisms. In this research, the LD pattern of the genome of C. militaris was evaluated. LD revealed that the middle temperate zone population resembled the common ancestors, implying that C. militaris in China may have migrated from the north to the south.
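For readers unfamiliar with the r2 statistic used in LD decay analysis, the sketch below computes r2 for a pair of biallelic loci from phased 0/1 alleles and locates the distance at which r2 has fallen to half of its peak. It is an illustration under simplifying assumptions (phased data, pre-binned r2 values) and not the software used in this study; both function names are hypothetical.

```python
from typing import Optional, Sequence


def r_squared(locus_a: Sequence[int], locus_b: Sequence[int]) -> float:
    """r2 between two biallelic loci, given 0/1 alleles of the same haplotypes."""
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("loci must cover the same non-empty set of haplotypes")
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(1 for x, y in zip(locus_a, locus_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


def half_decay_distance(distances: Sequence[float],
                        r2_values: Sequence[float]) -> Optional[float]:
    """First distance, at or beyond the r2 peak, where r2 is at most half the peak."""
    pairs = sorted(zip(distances, r2_values))
    if not pairs:
        return None
    peak_idx = max(range(len(pairs)), key=lambda i: pairs[i][1])
    half = pairs[peak_idx][1] / 2.0
    for dist, r2 in pairs[peak_idx:]:
        if r2 <= half:
            return dist
    return None  # r2 never decays to half within the observed range
```

A shorter half-decay distance generally indicates more historical recombination or a larger long-term effective population size.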
Darwin’s evolutionary theory, presented in the book The Origin of Species, serves as a foundation of biology. However, how natural variation is preserved and evolves under selection within populations was still a puzzle at that time. About fifty years later, G.H. Hardy and W. Weinberg showed mathematically how variation is maintained in a population, demonstrating that random mating results in stationary allele and genotype frequencies over generations. The genotype frequencies are then simple functions of the allele frequencies and are called Hardy–Weinberg proportions (HWPs). In population genetics and evolutionary genetics, the proof of Hardy–Weinberg equilibrium (HWE) [54] is regarded as the Law of Inertia of biology and is thus an important landmark. Hence, testing HWE is a routine and important procedure to infer the genetic basis of population evolution and to identify evidence for genetic associations. For the collected C. militaris strains, a large proportion of sites have undergone natural selection, which has played an important role in the evolution of this species and in shaping it. Over decades, considerable interest has been focused on detecting natural selection through sequencing. Neutrality tests based on allele frequencies are popular, especially Tajima’s D test [55]. Numerous genes have been shown to have undergone natural selection, such as lactase, TRPV6, and the HLA immune complex. In this work, Tajima’s D was estimated for the whole genome of the collected strains, revealing that most elements of their genomes have not deviated significantly from neutrality. However, several sites or genes were flagged as not evolving neutrally.
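To make the neutrality test concrete, the sketch below computes Tajima’s D from a small set of aligned 0/1 haplotypes using the standard formula, which compares the mean number of pairwise differences with the number of segregating sites. It is a minimal illustration for complete, biallelic data and is not the windowed estimator applied in this study; `tajimas_d` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Sequence


def tajimas_d(haplotypes: Sequence[Sequence[int]]) -> float:
    """Tajima's D; haplotypes[i][s] is the allele (0/1) of sample i at site s."""
    n = len(haplotypes)
    if n < 3:
        raise ValueError("at least three sequences are needed")
    n_sites = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites
    seg = sum(1 for s in range(n_sites) if len({h[s] for h in haplotypes}) > 1)
    if seg == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of pairwise differences
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(x != y for x, y in zip(a, b))
             for a, b in combinations(haplotypes, 2)) / n_pairs
    # Constants of the standard variance approximation
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - seg / a1) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))
```

Negative values indicate an excess of rare variants (as after a sweep or an expansion), whereas positive values indicate an excess of intermediate-frequency variants (as under balancing selection or after a contraction).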
Fst is defined as (πb − πw)/(πb + πw), where πb (also known as dxy) and πw are the absolute pairwise divergence between and within populations [56], respectively. An increase in Fst can therefore arise from an increase in πb, a decline in πw, or a combination of the two. Genome scans of closely related populations or species have revealed “genome modules” as peaks of high relative sequence divergence (Fst) that stand out against a lower “sea” of divergence. Their causes remain unclear, but they have been suggested to contain key loci involved in local adaptation and/or reproductive isolation. However, their significance for speciation or differentiation with or without gene flow between populations is a matter of debate. One hypothesis argues that gene flow is unimpeded across most of the genome, reducing inter-population diversity, except for loci under divergent selection and loci in close physical linkage to selected loci. Another hypothesis is that genomic modules reflect selective sweeps, where specific alleles are driven to high frequency, thus reducing within-population diversity. These two hypotheses are typically presented as alternatives, although they are not mutually exclusive; both barriers to gene flow and selective sweeps may play a role. Here, we examined how these processes were involved in the differences and divergence between the subpopulations of C. militaris. Consistent with the clear population separation and structure, genetic differentiation was relatively high among the three populations, given that they belong to a single species, C. militaris. The Fst peaks on the seven chromosomes represent genomic modules [57] of divergence and differentiation between the three populations of C. militaris. This system has the advantage of being genetically tractable and having a hybrid zone that allows selection and gene flow to be analyzed in nature. However, these advantages may be limited.
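The definition of Fst given above can be illustrated with a short sketch that computes πb and πw from aligned haplotypes of two populations. It assumes complete 0/1 data and takes πw as the mean of the two within-population values, which is an assumption of this illustration rather than a detail taken from the analysis; all function names are hypothetical.

```python
from itertools import combinations, product
from typing import Sequence

Haplotype = Sequence[int]


def _diff(a: Haplotype, b: Haplotype) -> int:
    """Number of sites at which two aligned haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def pi_within(pop: Sequence[Haplotype]) -> float:
    """Mean pairwise difference among haplotypes of one population."""
    if len(pop) < 2:
        raise ValueError("at least two haplotypes are needed per population")
    pairs = list(combinations(pop, 2))
    return sum(_diff(a, b) for a, b in pairs) / len(pairs)


def pi_between(pop_a: Sequence[Haplotype], pop_b: Sequence[Haplotype]) -> float:
    """Mean pairwise difference between haplotypes of two populations (dxy)."""
    pairs = list(product(pop_a, pop_b))
    return sum(_diff(a, b) for a, b in pairs) / len(pairs)


def fst(pop_a: Sequence[Haplotype], pop_b: Sequence[Haplotype]) -> float:
    """Fst = (pi_b - pi_w) / (pi_b + pi_w), following the definition in the text."""
    pi_b = pi_between(pop_a, pop_b)
    pi_w = (pi_within(pop_a) + pi_within(pop_b)) / 2.0  # assumption: simple mean
    if pi_b + pi_w == 0:
        raise ValueError("no variation within or between the populations")
    return (pi_b - pi_w) / (pi_b + pi_w)
```

Because the same Fst value can result from a high πb or a low πw, reporting πb and πw alongside Fst helps to distinguish barriers to gene flow from selective sweeps.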
Elucidating the functions and mechanisms of genes and their translated proteins underlies modern biotechnology, which has exploited them through base editing [58], prime editing, epigenome editing, gene editing, and PROTACs [59], and has greatly propelled the life sciences forward, even defining an age of the life sciences. Protein mining, in turn, has focused extensively on enzyme refinement and antibody and vaccine design, but it has historically relied on sequences rather than structures. However, structures define docking and interactions with other molecules, as well as functions. In this work, the sequences of cytosine deaminase, an important enzyme that can function as a base editor in next-generation biotechnology, were obtained from the collected C. militaris strains, since they demonstrated importance in adaptation. Specifically, cytosine deaminase may exert enzymatic activity by modifying bases on DNA and may serve as an important tool in the next-generation biotechnology boom. Their structures were predicted using the AI algorithm RoseTTAFold [52,60] and subsequently clustered based on predicted structural similarities [61]. In the future, cytidine deaminases may be engineered into efficient cytosine base editors that can be packaged into a single adeno-associated virus. Some of them may also be profiled and confirmed by editing plant, animal, and human cells. These deaminases, discovered through AI-assisted structural predictions, may greatly expand the utility of base editors for therapeutic and agricultural applications. Accurate protein clustering can be generated on the basis of predicted structures and structural alignments, even without the use of contextual information such as conserved gene neighborhoods and domain architectures.
In structure-based clustering, different clades reflect unique structures, implying distinct catalytic functions and properties. Generally, structure-based clustering is more robust and effective at sorting functional similarities than traditional 1D amino acid sequence-based clustering. AI-predicted 3D protein structures provide reliable clustering results and require only an amino acid sequence, making them a convenient and effective basis for establishing protein relationships and discovering novel functions.

5. Conclusions

In conclusion, we reported the collection and re-sequencing of 43 wild strains of C. militaris from sites across China and examined their population structure, demographic history, diversity, and cytosine deaminases. All the collected strains may originate from two common ancestors. The individuals could be assigned to the warm, subtropical, and middle temperate zone populations, as confirmed by PCA and a maximum likelihood phylogenetic tree. The three populations presented a population size pattern of subtropical zone > middle temperate zone > warm temperate zone, implying a north-to-south migration pattern. Furthermore, all three populations experienced bottlenecks and booms, especially in recent years. Strong LD occurred in the three populations at short distances. C. militaris strains of the middle temperate zone resemble the common ancestors, implying that C. militaris in China may have originated in northern China and then spread southward, as supported by the population size analysis. A large proportion of sites have undergone natural selection, which has played an important role in the evolution of C. militaris and in shaping the species. Nucleotide diversity was relatively stable and conserved. Interestingly, the regions of high Pi were aggregated on chromosomes CP023325.1 and CP023323.1, playing a key role in adaptation to environmental changes, especially the regions at cytosine deaminase. Genetic differentiation was relatively high among the three populations, given that they belong to a single species. Relatively high Fst was observed across almost the whole genome, with a few regions of further elevated Fst on the seven chromosomes. Thus, the Fst peaks on the seven chromosomes represent genomic modules of divergence and differentiation between the three populations of C. militaris.
According to the structure similarity matrices, clustering was carried out to classify cytosine deaminases into eight unique structural clades. The cytosine deaminases within each clade have distinct conserved protein structural domains, reflecting their diversity and evolution at the structural and functional levels. This work demonstrates how genomic variation leads to functional diversity.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriculture15070686/s1, Figure S1: (a) Overlap of the historic effective population size (Ne) for the C. militaris of the three temperate zones estimated by multiple sequentially Markovian coalescent (MSMC2). Ne is given in scientific notation, e.g., 5e+03 equals 5 × 10³. (b) Divergence processes for each pair of the C. militaris populations of the three temperate zones inferred by MSMC2 over a long time period; Table S1: Query and target identifiers, TM-score, translation vector (3), and rotation matrix (3 × 3); Table S2: The protein structure clustering.

Author Contributions

Conceptualization, T.Y.; methodology, T.Y., Y.L. and M.C.; validation, H.H.; formal analysis, T.Y., Y.L. and M.C.; investigation, T.Y., Y.L., M.C., L.Z., X.W., H.G., H.H., Y.G., S.C., Y.X. and W.Z.; resources, Y.L.; writing—original draft preparation, T.Y.; writing—review and editing, T.Y.; visualization, T.Y.; supervision, H.H.; project administration, T.Y. and H.H.; funding acquisition, T.Y. and H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program for Food Nutrition and Safety (No. 2023YFF1104100, China), the Guangdong Provincial Science and Technology Fund Special Project (No. 210909154531306, China), the Key Research and Development Program of Guangdong Province (No. 2022B1111040002, China), the Science and Technology Program of Linzhi (No. LZZX-05, China), the Strategic Special Project for Rural Vitalization of Guangdong Province (No. 2022-WJS-00-001, China), the Natural Science Foundation of Guangdong (No. 2022A1515011066 and 2021A1515010960, China) and the Guangdong Province Sail Plan (No. 2017YT05S115, China).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

References

  1. Tran, M.H.; Nguyen, T.M.; Huynh, V.B.; Sridhar, K.; Deshmukh, S.K.; Fung, S.-Y.; Mahadevakumar, S. Diversity evaluation of Cordyceps spp. in Bidoup Nui Ba, Lam Dong province, Vietnam. In IOP Conference Series: Earth and Environmental Science, Proceedings of the 4th International Conference on Sustainable Agriculture and Environment, Ho Chi Minh City, Vietnam, 17–19 November 2022; IOP Publishing Ltd.: Ho Chi Minh City, Vietnam, 2023; Volume 1155, p. 012003. [Google Scholar]
  2. Tiwari, M.; Saraf, A.; Khelkar, T. Exploring the world of Cordyceps: Ecology, cultivation, biotechnology, and future horizons. In Futuristic Trends in Biotechnology; Iterative International Publishers (IIP), Selfypage Developers Pvt Ltd.: Novi, MI, USA, 2024; Volume 3, pp. 112–126. [Google Scholar]
  3. Chen, W.; Han, Y. Taxonomy, phylogeny, and genetics of Cordyceps. In Advances in Cordyceps Research, 1st ed.; Sridhar, K., Deshmukh, S.K., Fung, S.-Y., Mahadevakumar, S., Eds.; CRC Press: Boca Raton, FL, USA, 2024; pp. 1–24. [Google Scholar]
  4. Zheng, P.; Xia, Y.; Xiao, G.; Xiong, C.; Hu, X.; Zhang, S.; Zheng, H.; Huang, Y.; Zhou, Y.; Wang, S.; et al. Genome sequence of the insect pathogenic fungus Cordyceps militaris, a valued traditional Chinese medicine. Genome Biol. 2011, 12, R116. [Google Scholar] [CrossRef] [PubMed]
  5. Mahadevakumar, S.; Sridhar, K.R. An overview of the phylogeny of Cordyceps. In Advances in Cordyceps Research, 1st ed.; Sridhar, K., Deshmukh, S.K., Fung, S.-Y., Mahadevakumar, S., Eds.; CRC Press: Boca Raton, FL, USA, 2024; pp. 1–21. [Google Scholar]
  6. Roberson, R.W. Subcellular structure and behaviour in fungal hyphae. J. Microsc. 2020, 280, 75–85. [Google Scholar] [PubMed]
  7. Woolley, V.C.; Teakle, G.R.; Prince, G.; de Moor, C.H.; Chandler, D. Cordycepin, a metabolite of Cordyceps militaris, reduces immune-related gene expression in insects. J. Invertebr. Pathol. 2020, 177, 107480. [Google Scholar] [PubMed]
  8. Wang, Y.; Dong, Q.-Y.; Luo, R.; Fan, Q.; Duan, D.-E.; Dao, V.-M.; Wang, Y.-B.; Yu, H. Molecular phylogeny and morphology reveal cryptic species in the Cordyceps militaris complex from Vietnam. J. Fungi 2023, 9, 676. [Google Scholar] [CrossRef]
  9. Wang, J.B.; Leger, R.S.; Wang, C. Advances in genomics of entomopathogenic fungi. Adv. Genet. 2016, 94, 67–105. [Google Scholar]
  10. Lei, Y.; Hussain, A.; Guan, Z.; Wang, D.; Jaleel, W.; Lyu, L.; He, Y. Unraveling the mode of action of Cordyceps fumosorosea: Potential biocontrol agent against Plutella xylostella (Lepidoptera: Plutellidae). Insects 2021, 12, 179. [Google Scholar] [CrossRef]
  11. Zhang, J.; Wen, C.; Duan, Y.; Zhang, H.; Ma, H. Advance in Cordyceps militaris (Linn) Link polysaccharides: Isolation, structure, and bioactivities: A review. Int. J. Biol. Macromol. 2019, 132, 906–914. [Google Scholar]
  12. Avery, P.B.; Kumar, V.; Francis, A.; McKenzie, C.L.; Osborne, L.S. Compatibility of the predatory beetle, Delphastus catalinae, with an entomopathogenic fungus, Cordyceps fumosorosea, for biocontrol of invasive pepper whitefly, Aleurothrixus trachoides, in Florida. Insects 2020, 11, 590. [Google Scholar] [CrossRef]
  13. Kato, T.; Nishimura, K.; Misu, S.; Ikeo, K.; Park, E.Y. Changes of the gene expression in silkworm larvae and Cordyceps militaris at late stages of the pathogenesis. Arch. Insect Biochem. Physiol. 2022, 111, e21968. [Google Scholar] [CrossRef]
  14. Zhang, Z.J.; Yin, Y.Y.; Cui, Y.; Zhang, Y.X.; Liu, B.Y.; Ma, Y.C.; Liu, Y.N.; Liu, G.Q. Chitinase is involved in the fruiting body development of medicinal fungus Cordyceps militaris. Life 2023, 13, 764. [Google Scholar] [CrossRef]
  15. Lee, J.; Lee, H.; Lee, J.; Chang, P.S. Heterologous expression, purification, and characterization of a recombinant Cordyceps militaris lipase from Candida rugosa-like family in Pichia pastoris. Enzym. Microb. Technol. 2023, 168, 110254. [Google Scholar] [CrossRef] [PubMed]
  16. Paterson, R.R.M. Cordyceps—A traditional Chinese medicine and another fungal therapeutic biofactory? Phytochemistry 2008, 69, 1469–1495. [Google Scholar] [CrossRef]
  17. Yue, K.; Ye, M.; Zhou, Z.; Sun, W.; Lin, X. The genus Cordyceps: A chemical and pharmacological review. J. Pharm. Pharmacol. 2012, 65, 474–493. [Google Scholar] [CrossRef] [PubMed]
  18. Cunningham, K.G.; Manson, W.; Spring, F.S.; Hutchinson, S.A. Cordycepin, a metabolic product isolated from cultures of Cordyceps militaris (Linn.) Link. Nature 1950, 166, 949. [Google Scholar] [CrossRef] [PubMed]
  19. Xia, Y.; Luo, F.; Shang, Y.; Chen, P.; Lu, Y.; Wang, C. Fungal cordycepin biosynthesis is coupled with the production of the safeguard molecule pentostatin. Cell Chem. Biol. 2017, 24, 1479–1489.e1474. [Google Scholar] [CrossRef]
  20. He, J.; Liu, S.; Tan, Q.; Liu, Z.; Fu, J.; Li, T.; Wei, C.; Liu, X.; Mei, Z.; Cheng, J.; et al. Antiviral potential of small molecules cordycepin, thymoquinone, and N6, N6-dimethyladenosine targeting SARS-CoV-2 entry protein ADAM17. Molecules 2022, 27, 9044. [Google Scholar] [CrossRef]
  21. Liao, Y.; Ling, J.; Zhang, G.; Liu, F.; Tao, S.; Han, Z.; Chen, S.; Chen, Z.; Le, H. Cordycepin induces cell cycle arrest and apoptosis by inducing DNA damage and up-regulation of p53 in Leukemia cells. Cell Cycle 2015, 14, 761–771. [Google Scholar] [CrossRef]
  22. Lan, T.; Yu, Y.; Zhang, J.; Li, H.; Weng, Q.; Jiang, S.; Tian, S.; Xu, T.; Hu, S.; Yang, G.; et al. Cordycepin ameliorates nonalcoholic steatohepatitis by activation of the AMP-activated protein kinase signaling pathway. Hepatology 2021, 74, 686–703. [Google Scholar] [CrossRef]
  23. Sirithep, K.; Xiao, F.; Raethong, N.; Zhang, Y.; Laoteng, K.; Hu, G.; Vongsangnak, W. Probing carbon utilization of Cordyceps militaris by sugar transportome and protein structural analysis. Cells 2020, 9, 401. [Google Scholar] [CrossRef]
  24. Saito, M.; Xu, P.; Faure, G.; Maguire, S.; Kannan, S.; Altae-Tran, H.; Vo, S.; Desimone, A.; Macrae, R.K.; Zhang, F. Fanzor is a eukaryotic programmable RNA-guided endonuclease. Nature 2023, 620, 660–668. [Google Scholar] [CrossRef]
  25. Balakireva, A.V.; Kuznetsova, N.V.; Petushkova, A.I.; Savvateeva, L.V.; Zamyatnin, A.A., Jr. Trends and prospects of plant proteases in therapeutics. Curr. Med. Chem. 2019, 26, 465–486. [Google Scholar] [PubMed]
  26. Cui, Z.; Zeng, C.; Huang, F.; Yuan, F.; Yan, J.; Zhao, Y.; Zhou, Y.; Hankey, W.; Jin, V.X.; Huang, J.; et al. Cas13d knockdown of lung protease Ctsl prevents and treats SARS-CoV-2 infection. Nat. Chem. Biol. 2022, 18, 1056–1064. [Google Scholar] [CrossRef]
  27. You, C.; Chen, H.; Myung, S.; Sathitsuksanoh, N.; Ma, H.; Zhang, X.-Z.; Li, J.; Zhang, Y.H.P. Enzymatic transformation of nonfood biomass to starch. Proc. Natl. Acad. Sci. USA 2013, 110, 7182–7187. [Google Scholar] [CrossRef] [PubMed]
  28. Xu, X.; Zhang, W.; You, C.; Fan, C.; Ji, W.; Park, J.-T.; Kwak, J.; Chen, H.; Zhang, Y.-H.P.J.; Ma, Y. Biosynthesis of artificial starch and microbial protein from agricultural residue. Sci. Bull. 2023, 68, 214–223. [Google Scholar]
  29. Huang, J.; Lin, Q.; Fei, H.; He, Z.; Xu, H.; Li, Y.; Qu, K.; Han, P.; Gao, Q.; Li, B.; et al. Discovery of deaminase functions by structure-based protein clustering. Cell 2023, 186, 3182–3195.e3114. [Google Scholar] [PubMed]
  30. Feng, D.; Hu, H.; Yong, T.; Liu, Y.; Xiao, C.; Huang, L.; Xie, Y.; Wu, Q. Induction of sexual fruiting-body formation by pairing the opposite mating-type isolates of Cordyceps militaris. Mycosystema 2023, 42, 344–352. [Google Scholar]
  31. Zu, Z.; Wang, S.; Zhao, Y.; Fan, W.; Li, T. Integrated enzymes activity and transcriptome reveal the effect of exogenous melatonin on the strain degeneration of Cordyceps militaris. Front. Microbiol. 2023, 14, 1112035. [Google Scholar] [CrossRef]
  32. Wang, X.; Li, X.E.; Qiu, W.; Sa, F.; Feng, Y.; Ge, Y.; Yang, S.; Liu, Y.; Xie, J.; Zhang, W.; et al. Effects of mating-type ratio imbalance on the degeneration of Cordyceps militaris subculture and preventative measures. PeerJ 2024, 12, e17648. [Google Scholar] [CrossRef]
  33. Lou, H.; Lin, J.; Guo, L.; Wang, X.; Tian, S.; Liu, C.; Zhao, Y.; Zhao, R. Advances in research on Cordyceps militaris degeneration. Appl. Microbiol. Biotechnol. 2019, 103, 7835–7841. [Google Scholar]
  34. Yong, T.; Zhang, M.; Chen, D.; Shuai, O.; Chen, S.; Su, J.; Jiao, C.; Feng, D.; Xie, Y. Actions of water extract from Cordyceps militaris in hyperuricemic mice induced by potassium oxonate combined with hypoxanthine. J. Ethnopharmacol. 2016, 194, 403–411. [Google Scholar] [CrossRef]
  35. Yong, T.; Chen, S.; Xie, Y.; Chen, D.; Su, J.; Shuai, O.; Jiao, C.; Zuo, D. Cordycepin, a characteristic bioactive constituent in Cordyceps militaris, ameliorates hyperuricemia through URAT1 in hyperuricemic mice. Front. Microbiol. 2018, 9, 58. [Google Scholar] [CrossRef] [PubMed]
  36. Chai, L.; Li, J.; Guo, L.; Zhang, S.; Chen, F.; Zhu, W.; Li, Y. Genomic and transcriptome analysis reveals the biosynthesis network of cordycepin in Cordyceps militaris. Genes 2024, 15, 626. [Google Scholar] [CrossRef] [PubMed]
  37. Dalbeth, N.; Gosling, A.L.; Gaffo, A.; Abhishek, A. Gout. Lancet 2021, 397, 1843–1855. [Google Scholar] [CrossRef]
  38. Zhao, Z.; Luo, J.; Liao, H.; Zheng, F.; Chen, X.; Luo, J.; Chen, Y.; Zhao, K.; Zhang, S.; Tian, J.; et al. Pharmacological evaluation of a novel skeleton compound isobavachin (4′,7-dihydroxy-8-prenylflavanone) as a hypouricemic agent: Dual actions of URAT1/GLUT9 and xanthine oxidase inhibitory activity. Bioorg. Chem. 2023, 133, 106405. [Google Scholar] [CrossRef]
  39. Shi, Y.; Wei, F.; Wang, G.-L.; Ma, S.-C.; Lin, R.-C. Identification of geographical origins of Cordyceps based on data of amino acids with self-organizing map neural network. Zhongguo Zhong Yao Za Zhi 2021, 46, 4765–4773. [Google Scholar]
  40. Zeb, U.; Aziz, T.; Azizullah, A.; Zan, X.Y.; Khan, A.A.; Bacha, S.A.S.; Cui, F.J. Complete mitochondrial genomes of edible mushrooms: Features, evolution, and phylogeny. Physiol. Plant 2024, 176, e14363. [Google Scholar] [CrossRef]
  41. Chiu, C.-P.; Hwang, T.-L.; Chan, Y.; El-Shazly, M.; Wu, T.-Y.; Lo, I.-W.; Hsu, Y.-M.; Lai, K.-H.; Hou, M.-F.; Yuan, S.-S.; et al. Research and development of Cordyceps in Taiwan. Food Sci. Hum. Wellness 2016, 5, 177–185. [Google Scholar]
  42. Fu, X.; Wong, K.K.; Tseng, Y. Editorial: A new frontier for traditional medicine research-multi-omics approaches. Front. Pharmacol. 2023, 14, 1203097. [Google Scholar] [CrossRef]
  43. Van der Auwera, G.A.; Carneiro, M.; Hartl, C.; Poplin, R.; del Angel, G.; Levy-Moonshine, A.; Jordan, T.; Shakir, K.; Roazen, D.; Thibault, J.; et al. From fastq data to high-confidence variant calls: The genome analysis toolkit best practices pipeline. Curr. Protoc. Bioinform. 2013, 43, 11.10.1–11.10.33. [Google Scholar] [CrossRef]
  44. Cingolani, P.; Platts, A.; Wang, L.L.; Coon, M.; Nguyen, T.; Wang, L.; Land, S.J.; Lu, X.; Ruden, D.M. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 2012, 6, 80–92. [Google Scholar] [CrossRef]
  45. Wang, K.; Li, M.; Hakonarson, H. ANNOVAR: Functional annotation of genetic variants from next-generation sequencing data. Nucleic Acids Res. 2010, 38, e164. [Google Scholar] [CrossRef] [PubMed]
  46. Alexander, D.H.; Novembre, J.; Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009, 19, 1655–1664. [Google Scholar] [PubMed]
  47. Meisner, J.; Albrechtsen, A. Inferring population structure and admixture proportions in low-depth NGS data. Genetics 2018, 210, 719–731. [Google Scholar] [CrossRef]
  48. Wang, Y.; Obbard, D.J. Experimental estimates of germline mutation rate in eukaryotes: A phylogenetic meta-analysis. Evol. Lett. 2023, 7, 216–226. [Google Scholar]
  49. Korneliussen, T.S.; Albrechtsen, A.; Nielsen, R. ANGSD: Analysis of next generation sequencing data. BMC Bioinform. 2014, 15, 356. [Google Scholar]
  50. Kofler, R.; Schlötterer, C. Gowinda: Unbiased analysis of gene set enrichment for genome-wide association studies. Bioinformatics 2012, 28, 2084–2085. [Google Scholar]
  51. Danecek, P.; Auton, A.; Abecasis, G.; Albers, C.A.; Banks, E.; DePristo, M.A.; Handsaker, R.E.; Lunter, G.; Marth, G.T.; Sherry, S.T.; et al. The variant call format and VCFtools. Bioinformatics 2011, 27, 2156–2158. [Google Scholar] [CrossRef]
  52. Baek, M.; DiMaio, F.; Anishchenko, I.; Dauparas, J.; Ovchinnikov, S.; Lee, G.R.; Wang, J.; Cong, Q.; Kinch, L.N.; Schaeffer, R.D.; et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871–876. [Google Scholar] [CrossRef]
  53. Van Kempen, M.; Kim, S.S.; Tumescheit, C.; Mirdita, M.; Lee, J.; Gilchrist, C.L.M.; Söding, J.; Steinegger, M. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 2024, 42, 243–246. [Google Scholar]
  54. Wang, J.; Feng, L.; Mu, S.; Dong, A.; Gan, J.; Wen, Z.; Meng, J.; Li, M.; Wu, R.; Sun, L. Asymptotic tests for Hardy–Weinberg equilibrium in hexaploids. Hort. Res. 2022, 9, uhac104. [Google Scholar] [CrossRef]
  55. Korneliussen, T.S.; Moltke, I.; Albrechtsen, A.; Nielsen, R. Calculation of Tajima’s D and other neutrality test statistics from low depth next-generation sequencing data. BMC Bioinform. 2013, 14, 289. [Google Scholar]
  56. Tavares, H.; Whibley, A.; Field, D.L.; Bradley, D.; Couchman, M.; Copsey, L.; Elleouet, J.; Burrus, M.; Andalo, C.; Li, M.; et al. Selection and gene flow shape genomic islands that control floral guides. Proc. Natl. Acad. Sci. USA 2018, 115, 11006–11011. [Google Scholar] [CrossRef] [PubMed]
  57. Malinsky, M.; Challis, R.J.; Tyers, A.M.; Schiffels, S.; Terai, Y.; Ngatunga, B.P.; Miska, E.A.; Durbin, R.; Genner, M.J.; Turner, G.F. Genomic islands of speciation separate cichlid ecomorphs in an East African crater lake. Science 2015, 350, 1493–1498. [Google Scholar]
  58. Hu, J.; Sun, Y.; Li, B.; Liu, Z.; Wang, Z.; Gao, Q.; Guo, M.; Liu, G.; Zhao, K.T.; Gao, C. Strand-preferred base editing of organellar and nuclear genomes using CyDENT. Nat. Biotechnol. 2024, 42, 936–945. [Google Scholar]
  59. Marei, H.; Tsai, W.-T.K.; Kee, Y.-S.; Ruiz, K.; He, J.; Cox, C.; Sun, F.T.; Penikalapati, S.; Dwivedi, P.; Choi, M.; et al. Antibody targeting of E3 ubiquitin ligases for receptor degradation. Nature 2022, 610, 182–189. [Google Scholar]
  60. Wang, J.; Lisanza, S.; Juergens, D.; Tischer, D.; Watson, J.L.; Castro, K.M.; Ragotte, R.; Saragovi, A.; Milles, L.F.; Baek, M.; et al. Scaffolding protein functional sites using deep learning. Science 2022, 377, 387–394. [Google Scholar]
  61. Mosalaganti, S.; Obarska-Kosinska, A.; Siggel, M.; Taniguchi, R.; Turoňová, B.; Zimmerli, C.E.; Buczak, K.; Schmidt, F.H.; Margiotta, E.; Mackmull, M.-T.; et al. AI-based structure prediction empowers integrative structural analysis of human nuclear pores. Science 2022, 376, eabm9506. [Google Scholar]
Figure 1. Collected C. militaris strains, locations, temperature zones, and population structure. (a) Workflow of the resequencing of C. militaris and the clustering of the key protein cytosine deaminase based on AI-predicted structures. The structures of annotated and polymorphic domain sequences were predicted by AI and subsequently clustered based on structural similarities. (b) C. militaris strains were collected from various locations across China. These included the strains A160421, A150549, M150611, W141904, W180129, A200293, A200332, W180100, W180099, W180091, W180103, S180104, A200181, A200168, A200190, W141451, E141314, W141436, D160246, E141315, E160329, W141434, E151091, W141456, W141457, E151088, E141313, D160278, W141449, W141458, W141432, E141327, and W141433. (c) The temperature zones of China: A, tropical zone; B, subtropical zone; C, warm-temperate zone; D, mid-temperate zone; E, frigid zone; F, high plateau zone. The three populations of C. militaris strains collected for this research were mainly located in the middle temperate (D), warm temperate (C), and subtropical (B) zones. (d) Cross-validation error (CV-error) when K varies from 1 to 19 across the three populations. (e) Population structure with the number of ancestral populations (K) set to 2 for all strains, in which brown and blue represent the two ancestries.
Figure 2. Population structure and demographic history of the C. militaris strains. (a) Principal component analysis (PCA) based on genome-wide single nucleotide polymorphisms (SNPs) is shown with the first three components (PCs). (b) Maximum likelihood tree based on SNP data. (c) Changes in the historic effective population size (Ne) of the C. militaris strains from the three temperate zones, estimated by multiple sequentially Markovian coalescent (MSMC2). (d) Divergence processes for pairs of C. militaris populations from the three temperate zones, inferred by MSMC2.
Figure 3. LD decay, SNP distributions, and genome-wide differentiation within and among the three populations of C. militaris. (a) LD decay plot for each population. SNP distance is on the X axis and r² on the Y axis. The LD decay distance was taken as the SNP distance at which r² had decayed by half. (b) SNP distribution on each chromosome. (c) Hardy–Weinberg. (d) Pi. (e) Tajima’s D. (f) Population differentiation (Fst) for the warm versus subtropical temperate zone populations in 10 kb nonoverlapping windows. (g) Population differentiation (Fst) for the warm versus middle temperate zone populations. (h) Population differentiation (Fst) for the subtropical versus middle temperate zone populations. Different colors distinguish the chromosomes, and windows with Fst > 0.8 were designated Fst islands.
Figure 4. Correlations of diversity parameters and AI-assisted prediction and clustering of cytosine deaminases of the collected C. militaris strains. (a) Fst and Pi correlation. In the central scatter plot, the x-axis shows the Pi ratio and the y-axis shows Fst; blue points mark Pi above 0.95 or below 0.05, and the blue area marks the selected region. The right-hand plot shows the distribution of Fst, with the orange area above 0.95; the top plot shows the distribution of the Pi ratio, with the blue area below 0.05. (b) Pi and Tajima’s D correlation. In the central scatter plot, the x-axis shows Pi and the y-axis shows Tajima’s D; blue points mark Tajima’s D and Pi both below 0.05, green points mark Tajima’s D above 0.95 and Pi below 0.05, and the blue and green areas mark the selected regions. The right-hand plot shows the distribution of Tajima’s D, with the green area below 0.05 and the blue area above 0.95; the top plot shows the distribution of Pi, with the green area below 0.05. (c) Representative predicted structures for the eight deaminase clades. (d) Resequencing combined with AI-assisted structural prediction and alignment establishes a new method for protein classification and functional mining, providing a suite of cytosine deaminases with various activities and single- and double-stranded functions that show promise as customized base editors, as therapeutics, or as tools for breeding novel species.
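The 0.05 and 0.95 cutoffs in panels (a) and (b) read most naturally as empirical percentile ranks. The following is a minimal sketch under that assumption: it ranks per-window Fst and Pi ratio values and keeps windows in the top 5% of Fst whose Pi ratio lies in either 5% tail. The function names and the example values are hypothetical, and the exact selection rule used for the figure may differ.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """Return, for each value, the fraction of values less than or equal to it."""
    ordered = sorted(values)
    n = len(values)
    return [bisect_right(ordered, v) / n for v in values]


def select_outlier_windows(fst, pi_ratio, lower=0.05, upper=0.95):
    """Return indices of windows with high Fst rank and extreme Pi ratio rank."""
    if len(fst) != len(pi_ratio):
        raise ValueError("fst and pi_ratio must describe the same windows")
    fst_rank = percentile_ranks(fst)
    pi_rank = percentile_ranks(pi_ratio)
    return [
        i
        for i in range(len(fst))
        if fst_rank[i] > upper and (pi_rank[i] < lower or pi_rank[i] > upper)
    ]


# Hypothetical values for 40 windows, not data from the study.
example_fst = [i / 40 for i in range(40)]
example_pi_ratio = [1.0 + i for i in range(40)]
example_pi_ratio[39] = 0.1  # make the last window the lowest Pi ratio
example_selected = select_outlier_windows(example_fst, example_pi_ratio)
```

The same helper can be reused for panel (b) by passing Tajima’s D and Pi in place of Fst and the Pi ratio, with the tail conditions adjusted to match the caption.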
Table 1. Cordyceps militaris strains collected and their locations.
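| Group and Temperate Zone | Seq. No. | Strain No. | Collection Location | Latitude | Longitude | Collection Date |
| --- | --- | --- | --- | --- | --- | --- |
| Subtropical temperate zone | S1 | A160421 | Xingdou Mountain, Hubei | 30°01′29″ | 109°06′10″ | 23 September 2016 |
| Subtropical temperate zone | S10 | A150549 | Huangsang, Hunan | 26°28′21″ | 110°06′23″ | 13 May 2015 |
| Subtropical temperate zone | S17 | M150611 | Tiantangzhai, Anhui | 31°07′19″ | 115°54′58″ | 12 August 2015 |
| Subtropical temperate zone | S29 | W141904 | Hailuogou, Sichuan | 29°34′27″ | 101°59′57″ | 17 September 2014 |
| Subtropical temperate zone | S32 | W180129 | Qingliang Peak, Zhejiang | 30°06′39″ | 118°53′28″ | 07 November 2018 |
| Subtropical temperate zone | S58 | A200293 | Tianma, Anhui | 31°09′32″ | 115°45′57″ | 14 September 2020 |
| Subtropical temperate zone | S59 | A200332 | Tianma, Anhui | 31°17′32″ | 115°41′07″ | 16 September 2020 |
| Warm temperate zone | S6 | W180100 | Yaoxiang, Shandong | 36°19′43″ | 117°07′15″ | 15 August 2018 |
| Warm temperate zone | S15 | W180099 | Yaoxiang, Shandong | 36°19′43″ | 117°07′15″ | 15 August 2018 |
| Warm temperate zone | S24 | W180091 | Yaoxiang, Shandong | 36°19′43″ | 117°07′15″ | 15 August 2018 |
| Warm temperate zone | S27 | W180103 | Yaoxiang, Shandong | 36°19′43″ | 117°07′15″ | 15 August 2018 |
| Warm temperate zone | S28 | S180104 | Yaoxiang, Shandong | 36°19′35″ | 117°06′56″ | 15 August 2018 |
| Warm temperate zone | S48 | A200181 | Yaoxiang, Shandong | 36°19′38″ | 117°07′22″ | 18 August 2020 |
| Warm temperate zone | S50 | A200168 | Yaoxiang, Shandong | 36°19′45″ | 117°07′16″ | 18 August 2020 |
| Warm temperate zone | S54 | A200190 | Yaoxiang, Shandong | 36°19′49″ | 117°06′58″ | 18 August 2020 |
| Middle temperate zone | S3 | W141451 | Zuojia, Jilin | 44°04′56″ | 126°04′17″ | 12 August 2014 |
| Middle temperate zone | S4 | E141314 | Zuojia, Jilin | 44°04′43″ | 126°04′12″ | 12 August 2014 |
| Middle temperate zone | S5 | W141436 | Zuojia, Jilin | 44°04′44″ | 126°04′13″ | 12 August 2014 |
| Middle temperate zone | S8 | D160246 | Zuojia, Jilin | 44°04′45″ | 126°04′14″ | 09 August 2016 |
| Middle temperate zone | S11 | E141315 | Zuojia, Jilin | 44°04′43″ | 126°04′12″ | 12 August 2014 |
| Middle temperate zone | S12 | E160329 | Jingyue Lake, Jilin | 43°46′32″ | 125°27′50″ | 10 August 2016 |
| Middle temperate zone | S14 | W141434 | Zuojia, Jilin | 44°04′44″ | 126°04′13″ | 12 August 2014 |
| Middle temperate zone | S16 | E151091 | Zuojia, Jilin | 44°04′46″ | 126°04′30″ | 29 August 2015 |
| Middle temperate zone | S18 | W141456 | Zuojia, Jilin | 44°05′04″ | 126°04′14″ | 12 August 2014 |
| Middle temperate zone | S20 | W141457 | Zuojia, Jilin | 44°05′04″ | 126°04′14″ | 12 August 2014 |
| Middle temperate zone | S22 | E151088 | Zuojia, Jilin | 44°04′52″ | 126°04′35″ | 29 August 2015 |
| Middle temperate zone | S23 | E141313 | Zuojia, Jilin | 44°04′43″ | 126°04′12″ | 12 August 2014 |
| Middle temperate zone | S25 | D160278 | Jingyue Lake, Jilin | 43°46′41″ | 125°28′02″ | 10 August 2016 |
| Middle temperate zone | S26 | W141449 | Zuojia, Jilin | 44°04′53″ | 126°04′15″ | 12 August 2014 |
| Middle temperate zone | S30 | W141458 | Zuojia, Jilin | 44°05′04″ | 126°04′14″ | 12 August 2014 |
| Middle temperate zone | S33 | W141432 | Zuojia, Jilin | 44°04′44″ | 126°04′13″ | 12 August 2014 |
| Middle temperate zone | S40 | E141327 | Zuojia, Jilin | 44°04′56″ | 126°04′18″ | 12 August 2014 |
| Middle temperate zone | S43 | W141433 | Zuojia, Jilin | 44°04′44″ | 126°04′13″ | 12 August 2014 |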
Table 2. Parameters of population genetics of the collected C. militaris strains.
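| Pop ID a | Private b | Num Indv c | Obs Het d | Obs Hom e | Exp Het f | Exp Hom g | Pi h |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Middle | 21,786 | 18 | 0.068 | 0.932 | 0.253 | 0.747 | 0.261 |
| Subtropical | 12,924 | 8 | 0.041 | 0.959 | 0.258 | 0.742 | 0.275 |
| Warm | 4433 | 7 | 0.104 | 0.896 | 0.213 | 0.787 | 0.229 |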
a Pop ID, population identifier. b Private, number of SNPs private to the given population. c Num Indv, mean number of individuals genotyped per locus in the population. d Obs Het, observed heterozygosity. e Obs Hom, observed homozygosity. f Exp Het, expected heterozygosity. g Exp Hom, expected homozygosity. h Pi, nucleotide diversity.
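To make the columns of Table 2 concrete, here is a minimal sketch of how the per-locus quantities relate to each other. It assumes a single biallelic SNP with diploid genotype calls coded as the count of alternate alleles (0, 1, or 2, with `None` for a missing call), expected heterozygosity computed as 2pq, and Pi computed as the sample-size-corrected 2pq. The function name and genotypes are hypothetical, and the software used in the study may apply different corrections or averaging across loci.

```python
def locus_summary(genotypes):
    """Summarize one biallelic locus from diploid alt-allele counts (0, 1, 2).

    Missing calls are passed as None and excluded from every statistic.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")
    n = len(called)                      # individuals genotyped (Num Indv)
    obs_het = sum(1 for g in called if g == 1) / n
    p = sum(called) / (2 * n)            # alternate allele frequency
    exp_het = 2 * p * (1 - p)            # expected heterozygosity, 2pq
    chromosomes = 2 * n
    # Nucleotide diversity at one site: 2pq with a sample-size correction.
    pi = exp_het * chromosomes / (chromosomes - 1)
    return {
        "num_indv": n,
        "obs_het": obs_het,
        "obs_hom": 1 - obs_het,
        "exp_het": exp_het,
        "exp_hom": 1 - exp_het,
        "pi": pi,
    }


# Hypothetical genotype calls, not data from the study.
example_stats = locus_summary([0, 0, 1, 2, None, 1, 0, 2])
```

Observed and expected homozygosity are simply the complements of the corresponding heterozygosity values, which matches the way each pair of columns in Table 2 sums to 1.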
Table 3. Genetic differentiation between populations collected.
| Fst a | Warm Temperate Zone | Middle Temperate Zone |
| --- | --- | --- |
| Subtropical temperate zone | 0.109 | 0.083 |
| Warm temperate zone | | 0.092 |
a Genetic differentiation between populations is represented by Fst.
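As a rough illustration of what a pairwise Fst value measures, the sketch below computes Wright's Fst = (H_T - H_S) / H_T for one biallelic SNP from the allele frequencies of two populations, weighting the populations equally. This estimator is an assumption chosen for simplicity; the estimator behind Table 3 is not stated here and genome-wide values are usually obtained with sample-size-aware estimators averaged over many sites. The function name and frequencies are hypothetical.

```python
def wright_fst(p1, p2):
    """Wright's Fst for one biallelic SNP, given allele frequencies in two populations.

    Returns None when the site is monomorphic across both populations,
    because Fst is undefined there.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)                            # H_T
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # H_S
    if h_total == 0.0:
        return None
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, not data from the study.
example_fst_value = wright_fst(0.2, 0.6)
```

Identical frequencies give 0 and fixed differences give 1, so the values in Table 3 (0.083 to 0.109) sit near the low end of that range.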
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yong, T.; Liu, Y.; Cai, M.; Zhuo, L.; Wu, X.; Guo, H.; Hu, H.; Gao, Y.; Chen, S.; Xie, Y.; et al. Genomic Inference Unveils Population Bottlenecks and a North-to-South Migration Pattern of Wild Cordyceps militaris Across China. Agriculture 2025, 15, 686. https://doi.org/10.3390/agriculture15070686

AMA Style

Yong T, Liu Y, Cai M, Zhuo L, Wu X, Guo H, Hu H, Gao Y, Chen S, Xie Y, et al. Genomic Inference Unveils Population Bottlenecks and a North-to-South Migration Pattern of Wild Cordyceps militaris Across China. Agriculture. 2025; 15(7):686. https://doi.org/10.3390/agriculture15070686

Chicago/Turabian Style

Yong, Tianqiao, Yuanchao Liu, Manjun Cai, Lijun Zhuo, Xiaoxian Wu, Huiyang Guo, Huiping Hu, Yichuang Gao, Shaodan Chen, Yizhen Xie, et al. 2025. "Genomic Inference Unveils Population Bottlenecks and a North-to-South Migration Pattern of Wild Cordyceps militaris Across China" Agriculture 15, no. 7: 686. https://doi.org/10.3390/agriculture15070686

APA Style

Yong, T., Liu, Y., Cai, M., Zhuo, L., Wu, X., Guo, H., Hu, H., Gao, Y., Chen, S., Xie, Y., & Zhong, W. (2025). Genomic Inference Unveils Population Bottlenecks and a North-to-South Migration Pattern of Wild Cordyceps militaris Across China. Agriculture, 15(7), 686. https://doi.org/10.3390/agriculture15070686
