Population Genetic Structure of Chlorops oryzae (Diptera, Chloropidae) in China

Simple Summary Recently, Chlorops oryzae has become one of the major pests of rice in some regions, which has caused serious economic losses. To understand the genetic mechanisms of frequent local outbreaks and population expansion of C. oryzae, we analyzed the population genetic structure using two molecular markers, COI and ITS1 sequences. The results indicated that the C. oryzae populations experienced rapid expansion after a “Bottleneck effect” and the local outbreaks were probably caused by frequent gene flow among populations. Abstract Frequent outbreaks have made Chlorops oryzae one of the major pests of rice in some regions. In order to understand the ecological adaptation of C. oryzae at the molecular level, and provide a scientific basis for formulating management strategies, we used two molecular markers, COI and ITS1 sequences, to systematically analyze the genetic structure of 31 populations. The higher haplotype diversity and lower nucleotide diversity indicated that the C. oryzae populations experienced rapid expansion after a “Bottleneck effect”. The results of the mismatch distribution, neutrality test (Fu’s Fs < 0, p < 0.001), and haplotype network analysis suggested that the population has recently undergone an expansion. Although genetic differentiation among C. oryzae populations was found to have existed at low/medium levels (Fst: 0.183 for COI, 0.065 for ITS1), the frequent gene flow presented as well (Nm: 2.23 for COI, 3.60 for ITS1) was supposed to be responsible for frequent local outbreaks.


Introduction
Population genetic structure, the most basic genetic information of a species, represents the amount and distribution of genetic variation within and among populations. Moreover, it is the accumulation of the evolutionary history and the basis for the development of future evolutionary adaptations of a species [1,2]. Population genetic structure is usually indicated with genetic diversity, genetic differentiation, and genetic distance [3]. Analysis of population genetic structure is conducive to revealing the extent and pattern of gene flow and establishing phylogenetic relationships among populations, thereby contributing to understanding population dynamics, occurrence trends, and genetic relationships among populations [4,5].
Molecular markers are a direct reflection of genetic diversity, which is a very effective tool to study the genetic structure of species [6]. Presently, molecular markers, such as mitochondrial DNA (mtDNA) [7], ribosomal DNA (rDNA) [8], and microsatellites [9], have been widely used in the analysis of the genetic structure of insect populations. As one of the classical molecular markers, mtDNA supports a better understanding of the process of population dispersal and evolution because of its small size, high level of mutations, strict adherence to matrilineal inheritance, lack of introns, and recombination characteristics [10]. Particularly, the mtDNA cytochrome oxidase I (COI) gene has been widely used owing Insects 2022, 13, 327 2 of 14 to its moderate evolutionary rate and readily available advantages [11][12][13][14]. However, the genetic diversity and evolutionary history of mtDNA are not necessarily identical to, or representative of, the organism due to potential introgression, lower effective population size, and potential selection. In light of this, combining mtDNA and other markers can greatly improve the reliability of results [15]. For example, the population dispersal pathways and timing of Meromyza saltatrix were studied in connection with the COI gene and morphological features [16,17]. Moreover, nuclear genes (nDNA) which have genetic information that mtDNA lacks are able to play a complementary role to mtDNA. The rDNA internal transcribed spacer 1 (ITS1), a non-coding sequence located between 18S and 5.8S rDNA with high nucleotide polymorphism in most eukaryotes, has also been widely used as markers to study population genetic structure [18]. For example, the genetic structure of 13 geographic populations of Melipona subnitida in northeastern Brazil was analyzed using ITS1 sequences [19,20]. The analysis of genetic diversity of invasive species Halyomorpha halys in North America and Europe using ITS1 and COI markers suggested that the joint use of ITS1 and COI could improve the accuracy of detection of the source areas of an invasion [21].
Chlorops oryzae (Diptera, Chloropidae), an important rice pest, is widely distributed in Asia, such as China, Japan, and Korea [22]. The larvae of C. oryzae bore into the stem and move to the growing point, where they feed on the developing leaves and young panicles, inhibiting the effectiveness of chemical insecticide [23,24]. In recent years, with changes in agroecology, cultivation and farming system, climate, and control agents, the C. oryzae occurrence areas have been expanding, and the damage levels are geographically different [25]. The frequent outbreaks have made C. oryzae one of the major pests of rice in some areas [26], causing yield losses of 20-50% [27]. In China, C. oryzae occurs 2-5 generations per year in different regions, mainly dependent on diapause induction and duration [23]. Presently, studies on C. oryzae are still limited and have mainly focused on physiology and ecology [28,29]. Till recently, Zhou et al. [30] used COI and ISSR markers to analyze its genetic structure and speculated that frequent gene flow in C. oryzae populations was responsible for outbreaks. However, the genetic mechanisms underlying different occurrence levels and gradual geographic expansions are still unknown.
In this study, we used two molecular markers, COI and ITS1 sequences, to systematically analyze the genetic structure of 31 geographic populations collected from the main distribution areas of C. oryzae in China. The results of this study will contribute to understanding the ecological adaptation of C. oryzae at the molecular level, thereby providing a scientific basis for formulating management strategies.

Sample Collection and DNA Extraction
Samples of 26 C. oryzae populations were collected from Guizhou, Chongqing, and Sichuan Provinces, China, from May 2020-August 2021 ( Figure 1 and Table S1). All samples were soaked in 75% ethanol and stored at −20 • C until total DNA extraction. Genomic DNA was extracted according to the method of DeBarro and Driver with minor modification [31]. Briefly, samples were soaked in deionized water to remove alcohol prior to extraction. Individual samples were grounded thoroughly in centrifuge tubes with 30 µL of the lysis buffer (50 mM KCl, 10 mM Tris pH 8.4, 0.45% Tween 20, 0.2% gelatin, 0.45% NP40, 60 µg/mL proteinase K) to form a homogenate, which was incubated at 65 • C for 30 min and then boiled for 10 min to inactivate the proteinase K. Total DNA extraction were stored at −20 • C for subsequent analysis. Populations data for TY, ZZ, LH, XT, HX, and ZJ were obtained from Zhou et al. [30] and not marked in the figure.

Data Analysis
The sequencing data were edited using SnapGene v.4.2 [32]. All data processing, including basic statistics and calculation of inter-and intraspecific genetic distances (Kimura 2-parameter model for COI and Tamura 3-parameter model for ITS1) and transition/transversion (ts/tv) ratio, were performed using MEGA v.7.0 [33].

PCR Amplification and Sequencing
The genomic DNA extracted from C. oryzae was used as a template for COI and ITS1 amplification. The COI fragments were amplified with specific primers COI F (5 -CTA GGT GCT CCA GAT ATA GCA TTT C-3 ) and COI R (5 -GGC TAA AAC AAC TCC TGT TAA TCC-3 ) [30]. The ITS1 fragments were amplified with primers ITS1 F (5 -CGC ATT ATG TGT TAC GGA TGT T-3 ) and ITS1 R (5 -GGT TGC GAA TGT CTC TAA TTC-3 ). PCR was performed in 30 µL volumes comprised of 15 µL 2 × Taq PCR MasterMix (Biomed, Beijing, China), 1 µL of each primer (10 mmol/L), 1 µL of template DNA solution, and 12 µL double distilled water. Amplifications were conducted as follows: 34 cycles of denaturation at 94 • C for 30 s, annealing at 55 • C for 30 s, and extension at 72 • C for 1 min. All PCR products were checked by electrophoresis on a 1% agarose gel and bi-directionally sequenced by Sangon Biotech (Shanghai, China). Sequences were deposited in the GenBank under accession numbers OM490688-OM491162 for COI and OM540945-OM541304 for ITS1. COI sequence data of six populations (Hunan: TY, ZZ, LH, XT; Guizhou: HX; Zhejiang: ZJ) were kindly provided by Zhou et al. [30].

Data Analysis
The sequencing data were edited using SnapGene v.4.2 [32]. All data processing, including basic statistics and calculation of inter-and intraspecific genetic distances (Kimura 2-parameter model for COI and Tamura 3-parameter model for ITS1) and transition/ transversion (ts/tv) ratio, were performed using MEGA v.7.0 [33].
The number of haplotypes (h), haplotype diversity (Hd), average number of nucleotide differences (k), nucleotide diversity (Pi), and haplotype analysis were calculated by DnaSP v.5.10 [34]. Neutrality tests (Fu's Fs [35] and Tajima's D [36]), F-Statistics (Fst) (Bonferroni correction for significance), gene flow (Nm) (COI: Nm = (1 -Fst)/2Fst; ITS1: Nm = (1 -Fst)/4Fst), and analysis of molecular variance (AMOVA) were performed using Arlequin v.3.15 [37]. The distribution of pairwise differences between individual sequences was analyzed by mismatch distribution analysis using DnaSP. In addition, the statistics of the raggedness (rg) index of the observed distribution and the sum of square deviations (SSD) between the observed and the expected mismatch were also calculated using Arlequin based on the spatial expansion model. The statistical significance of variance components in Arlequin was tested with 1000 permutations.
Geographical distances among populations were calculated using MapInfo Professional v.8.5 (Table S2) [38]. Mantel test was conducted by NTsyspc v.2.1 for the natural logarithm of interspecific genetic distance (Fst/(1 -Fst)) and geographic distance [39]. The haplotype networks were constructed using the median-joining method in software Network v.10.2 [40]. Compared with traditional phylogenetic trees, haplotype network diagrams can better reveal the genealogical relationships between conspecifics.

Base Composition and Gene Mutation
A total of 598 COI sequences (475 obtained in this study and 123 from Zhou et al. [30]) representing 31 populations and 360 ITS1 sequences representing 26 populations were used for subsequent analysis (Table S1). The final aligned COI sequence fragments were 720 bp and all alignments were unambiguous, with no insertions or deletions. The average nucleotide composition was T = 36.4%, C = 17.4%, A = 29.6%, and G = 16.6%, showing an obvious AT bias (66.0%). In total, 57 polymorphic sites (7.92%) were detected in all COI sequences, including 34 singleton variable sites and 23 parsimony informative sites. There were 47 transitions and 10 transversions, and the overall ts/tv bias was 6.934. The ts/tv rate ratio was observed to be higher with purines (20.78) than pyrimidines (11.08).
The final aligned ITS1 sequence fragments were 607 bp. The average nucleotide composition was T = 31.2%, C = 14.8%, A = 36.5%, G = 17.5% and A + T = 67.7%. All sequences had a total of 20 polymorphic sites (3.29%), including 10 single variable sites and 10 parsimony informative sites. There were 8 transitions and 12 transversions, and the overall ts/tv bias was 0.639. The ts/tv rate ratio was observed to be higher with purines (2.402) than pyrimidines (0.116).

Genetic Diversity
For COI analysis, the 31 C. oryzae populations had a total of 55 haplotypes, with an overall Hd of 0.346, k of 0.854, and Pi of 0.0012. The Hd, k, and Pi of each population ranged from 0.000-0.638, 0.000-3.985, and 0.0000-0.0052, respectively. LZ and NC populations had higher levels of genetic diversity (Hd > 0.5, Pi > 0.005) ( Table 1). For ITS1 analysis, the 26 C. oryzae populations had a total of 26 haplotypes, with an overall Hd of 0.750, k of 1.551, and Pi of 0.0026. The Hd, k, and Pi of each population ranged from 0.400-0.956, 0.400-2.911, and 0.0007-0.0048, respectively. Except for the YW and SZ populations, all populations had higher levels of Hd (Hd < 0.5) and lower levels of Pi (Pi < 0.005) ( Table 2).

Population Demographic History
When all samples were taken as one population, the neutrality test and mismatch analysis of the C. oryzae population were performed based on COI and ITS1 sequences. The Tajima's D and Fu's Fs values of the total population were all negative, and all Fu's Fs values were significant (p < 0.01) (Tables 1 and 2). Besides, the mismatch distribution of both markers in the total populations showed a single-peaked form, indicating that the population experienced expansion events. In addition, both the statistical reference SSD and rg values did not reach significant level, supporting the spatial expansion model (COI: SSD = 0.0020, p = 0.7000, rg = 0.2316, p = 0.8000; ITS1: SSD = 0.0006, p = 0.8730, rg = 0.0248, p = 0.9050) (Figure 2).

Genetic Differentiation
Genetic distances within and between populations were estimated based on COI and ITS1 sequences. For COI analysis, the inter-and intra-population genetic distance ranged from 0.0001-0.0060 and 0.0000-0.0056, respectively ( Figure 3A), indicating that genetic distances between populations were higher than those within populations. For ITS1 analysis, the two values ranged from 0.0010-0.0047 and 0.0007-0.0048, respectively ( Figure  3B).

Genetic Differentiation
Genetic distances within and between populations were estimated based on COI and ITS1 sequences. For COI analysis, the inter-and intra-population genetic distance ranged from 0.0001-0.0060 and 0.0000-0.0056, respectively ( Figure 3A), indicating that genetic distances between populations were higher than those within populations. For ITS1 analysis, the two values ranged from 0.0010-0.0047 and 0.0007-0.0048, respectively ( Figure 3B).
The results of AMOVA suggested that the genetic variation in C. oryzae populations was mainly from within populations, while less from among populations (Va < Vb, p < 0.001). Moreover, there was some degree of genetic variation within the overall populations (Fst > 0.05, p < 0.001) ( Table 3). Genetic distances within and between populations were estimated based on COI and ITS1 sequences. For COI analysis, the inter-and intra-population genetic distance ranged from 0.0001-0.0060 and 0.0000-0.0056, respectively ( Figure 3A), indicating that genetic distances between populations were higher than those within populations. For ITS1 analysis, the two values ranged from 0.0010-0.0047 and 0.0007-0.0048, respectively ( Figure  3B). The results of AMOVA suggested that the genetic variation in C. oryzae populations was mainly from within populations, while less from among populations (Va < Vb, p < 0.001). Moreover, there was some degree of genetic variation within the overall populations (Fst > 0.05, p < 0.001) ( Table 3). The Fst and Nm values between pairwise populations were calculated based on COI and ITS1 sequences (Tables 4 and 5). The Fst values ranged from 0.000-0.536 and 0.000-0.307 for COI and ITS1 sequences, respectively. For COI analysis, the genetic differentiation was mainly attributed to individuals in the NC population (Fst > 0.25, Nm < 1). A three-level AMOVA analysis of the NC population with the rest of the populations based on COI markers showed the presence of significant genetic differentiation (Fst = 0.69, p < 0.001).
The Mantel tests based on both markers did not support a significant correlation between geographic distance and genetic distance, thus excluding the effect of distance segregation on genetic differentiation (Figure 4).

R PEER REVIEW 12 of 16
The Mantel tests based on both markers did not support a significant correlation between geographic distance and genetic distance, thus excluding the effect of distance segregation on genetic differentiation (Figure 4).

Haplotype Network Analysis
To understand the relationships of identified haplotypes, the median-joining haplotype network was constructed ( Figure 5). Among the 55 COI haplotypes, H1 occupied the center of the network and was shared by all populations. H1 was also the most common haplotype, accounting for 80.6% of all samples. The remaining 54 haplotypes were distributed around H1 in a star pattern. H21 and H44 were far away from H1 and mainly shared by NC, LZ, and FL populations ( Figure 5A). Among the 26 ITS1 haplotypes, H1 was the ancestral haplotype shared by all populations and occupied a central position in

Haplotype Network Analysis
To understand the relationships of identified haplotypes, the median-joining haplotype network was constructed ( Figure 5). Among the 55 COI haplotypes, H1 occupied the center of the network and was shared by all populations. H1 was also the most common haplotype, accounting for 80.6% of all samples. The remaining 54 haplotypes were distributed around H1 in a star pattern. H21 and H44 were far away from H1 and mainly shared by NC, LZ, and FL populations ( Figure 5A). Among the 26 ITS1 haplotypes, H1 was the ancestral haplotype shared by all populations and occupied a central position in the network. The four haplotypes, H3, H6, H9, and H10, were derived from H1 and collectively accounted for 37.5% of all samples ( Figure 5B).  (1 -Fst)) and logarithm geographic distance of C. oryzae populations based on COI sequence (A) and ITS1 sequence (B).

Haplotype Network Analysis
To understand the relationships of identified haplotypes, the median-joining haplotype network was constructed ( Figure 5). Among the 55 COI haplotypes, H1 occupied the center of the network and was shared by all populations. H1 was also the most common haplotype, accounting for 80.6% of all samples. The remaining 54 haplotypes were distributed around H1 in a star pattern. H21 and H44 were far away from H1 and mainly shared by NC, LZ, and FL populations ( Figure 5A). Among the 26 ITS1 haplotypes, H1 was the ancestral haplotype shared by all populations and occupied a central position in the network. The four haplotypes, H3, H6, H9, and H10, were derived from H1 and collectively accounted for 37.5% of all samples ( Figure 5B).

Figure 5.
Median-joining network of haplotypes for C. oryzae based on COI haplotypes (A) and ITS1 haplotypes (B). Each circle represents a haplotype, and the area of a circle is proportional to the number of individuals with that haplotype. Colors within nodes refer to C. oryzae sampling regions. Figure 5. Median-joining network of haplotypes for C. oryzae based on COI haplotypes (A) and ITS1 haplotypes (B). Each circle represents a haplotype, and the area of a circle is proportional to the number of individuals with that haplotype. Colors within nodes refer to C. oryzae sampling regions.

Discussion
Genetic diversity, which refers to the sum of genetic variation among populations within a species or individuals within a population [1], is a fundamental guarantee for maintaining species evolution [2]. Genetic diversity is caused by the variation of genetic material and is influenced by mutation rate, effective population size, gene flow, and other factors [41]. A population with rich genetic diversity often possesses a strong adaptability to the environment, thus facilitating population outbreaks and their large-scale spread.
Haplotype diversity and nucleotide diversity are the main indicators of genetic diversity. In this study, the ITS1 analysis presented that C. oryzae populations had higher haplotype diversity (Hd > 0.5) and lower nucleotide diversity (Pi < 0.005), suggesting that the population experienced a recent "Bottleneck effect" followed by a short period of rapid population expansion [42]. Meanwhile, the COI analysis revealed low haplotype diversity, probably due to different genetic patterns of molecular markers [43]. This is consistent with the finding that ITS1 possesses greater genetic diversity than COI in Halyomorpha halys [21].
Historical population demographics is one of the core elements of molecular phylogeography. The analysis of historical population demographics is in favor of understanding the effects of external environmental factors on population development and distribution, and also provides a reference for developing pest management strategies [44]. To this end, neutrality tests (Tajima's D and Fu's Fs values) and mismatch distributions are commonly used [45,46]. In this study, when all samples were calculated as a population, the Tajima's D and Fu's Fs values were significantly negative except for Tajima's D for ITS1 data, indicating the expansion of the population size, which is consistent with the previous speculation [25]. Additionally, the extremely small Fu's Fs values (−3.40 × 10 38 ) implied that the population expansion occurred not so long ago as the Fu's Fs values are more sensitive to recent population expansion [35]. Likewise, the mismatch distribution showed population expansion as well because of the single-peaked curve, insignificant SSD values, and small rg values [46]. Moreover, the haplotype network for COI with a star-shaped distribution further supported speculation of population expansion [47].
Genetic distance, AMOVA [48], Fst value [49], and Nm value [50], are important indicators of the genetic differentiation of populations. The genetic distance between populations was generally close except for NC and LZ for COI, and the genetic variation was mainly from within the population revealed by AMOVA. Similarly, the Fst and Nm values between paired populations indicated that the genetic differentiation between populations was stemmed from a few populations such as NC. In fact, the population density of NC is relatively lower than that of other populations. Whether the worse performance of the NC population is correlated with its genetic background remains unknown and necessitates further investigation. Factors such as geographic isolation and farming patterns often affect the adaptation of populations to the environment and initiate genetic differentiation [51]. The Mantel test showed that there was no significant correlation between genetic distances and geographical distances, indicating that genetic differentiation of these populations is not caused by geographical isolation, but other factors, such as tillage practices and farmland landscape patterns [52].
It has been demonstrated that frequent gene flow can improve the population's adaptability to the environment and cause outbreaks of pests [53]. In this study, gene flow was found to have existed among C. oryzae populations, which was in line with the previous study [30]. However, the degree of gene flow varied remarkably with geographic populations, and it was probably related to the occurrence level of these populations. For instance, QJ and YY populations that showed intensive gene flow performed much better than other populations, while NC and GY populations which showed restricted gene flow occurred lightly.
The genetic structure of C. oryzae has been analyzed previously by Zhou et al. [30]. Likewise, we analyzed the genetic structure of C. oryzae as well, but using more geographic populations collected from a larger area, within which the ecological environment is more diverse and the annual occurrence generation of C. oryzae changes accordingly [27]. Both studies revealed that C. oryzae populations have low or medium levels of genetic differentiation and experienced recent expansion events. However, disparities in the genetic diversity and experience of the "Bottleneck effect" were presented between two studies, probably due to the differences in sample size, molecular markers, and sampling locations.
The "Bottleneck effect" refers to the dramatic variation of genetic structure owing to the sharp reduction in population size caused by deteriorated environmental conditions, such as farmland ecology and pesticide application levels [54]. For the past several decades, C. oryzae has been subjected to highly toxic pesticides, such as furadan, oxamyl, and triazophos [25]. We, therefore, speculated that the "Bottleneck effect" of C. oryzae might be caused by the abundant use of these pesticides. In addition, the recent expansion of C. oryzae may be related to changes in factors such as agroecological environments, tillage and cultivation systems, winter temperatures, and control agents [27].
In the future, the investigations of the relationship between the population dynamics of C. oryzae and farmland environments and farming practices may elucidate the causes of the "Bottleneck effect", genetic differentiation, population expansion, and frequent outbreaks, thereby providing a theoretical basis for formulating management strategies.

Conclusions
This study showed that C. oryzae populations suffered from a recent "Bottleneck effect", followed by a rapid expansion. We also speculated that genetic differentiation and gene flow among populations are responsible for the geographical differences in the occurrence level.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/insects13040327/s1, Table S1: Information on the C. oryzae samples used in this study; Table S2. Geographical distance (km) among C. oryzae populations.
Author Contributions: Methodology, validation, formal analysis, writing-review and editing, X.L. and J.W.; software, data curation, visualization, writing-original draft preparation, X.L.; investigation, X.L., S.W. and Y.X.; conceptualization, resources, supervision, project administration, funding acquisition, J.W. and Y.L. All authors have read and agreed to the published version of the manuscript. Data Availability Statement: Sequences used in this study has been deposited in GenBank under accessions number OM490688-OM491162 for COI, and OM540945-OM541304 for ITS1.