Population Genetics for Inferring Introduction Sources of the Oriental Fruit Fly, Bactrocera dorsalis: A Test for Quarantine Use in Korea

Simple Summary The more the global agricultural product trade becomes active every year, the more foreign pests’ invasion probability increases. Accordingly, many notorious invasive pests are spreading worldwide, and the nations should try to block their introduction through quarantine systems. As an important quarantine pest, the oriental fruit fly, Bactrocera dorsalis (Hendel) (Diptera: Tephritidae), is one of the most destructive pest insects for fruit crops in tropical and subtropical areas. This species is a highly invasive and economically important pest with a broad host range. Here, we collected 40 geographically or temporally different collections from 12 Asian countries, including four from the Korean border quarantine detection, and performed haplotype analysis and population genetics analysis. Abstract To infer the introduction sources of the oriental fruit fly, Bactrocera dorsalis, we used a mitochondrial marker to reconstruct the haplotype network and 15 microsatellite loci to reveal genetic structure and relationships between the geographically or temporally different collections from Asia. We performed Approximate Bayesian computations to infer a global origin and a source of the quarantine collections found in Korea. As a result, the 40 populations were divided into three groups, of which genetic similarity is not related to the geographic vicinity. Korean samples had a similar genetic structure to Taiwan and Thailand ones. Our results suggest that the place of origin of the B. dorsalis specimens found in Korea’s border quarantine is likely to be Taiwan or Thailand. As the global origin of B. dorsalis, we estimated that Taiwan and Thailand were most likely the global origins of Southeast Asian populations by testing hypothetical scenarios by the approximate Bayesian computation analyses. Our results will allow easier identification of the source region of the forthcoming invasion of quarantined B. dorsalis specimens.


Introduction
In the 21st century, both transportation and communication developed rapidly around the world. The development of various means of exchange has facilitated the movement of human and material resources across national borders [1]. However, due to multiple forms of international exchanges, such as trade and overseas tourism transnational marriage, invasive pests from foreign countries are also increasing [2][3][4]. Moreover, climate change as a global problem has promoted invasive pests' settlement, and the threat and economic loss of indigenous ecosystems by alien creatures are increasing [5,6]. Moreover, it is expected In this study, sampled collections all have not been obtained in prohibited areas such as national parks where permission is requested, thus we make it clear that there is no content in relation to permission of collecting samples. We obtained 565 B. dorsalis individuals from 36 Table S1). In addition, 26 samples assigned to the quarantine-detected Groups 1, 2, 3 and 4 were collected from South Korea (Figure 1 and Supplementary Materials Table S1). The individuals of 26 Korean quarantine-detected samples were assigned to four groups (KR 001-004) by STRUCTURE analysis (K = 4) in Supplementary Materials Figure S1. Although these quarantine-detected groups as pooled non-natural ones were not like other naturally collected populations, we set them as four 'virtual' populations to infer their origins and invasion routes compared with foreign samples according to the STRUCTURE analysis (Supplementary Materials Figure S1). To capture B. dorsalis, we equipped a transparent cylindrical acrylic trap with methyl eugenol and insecticide-infused cotton, and we installed this apparatus near the host plant.
We identified the samples of Bactrocera dorsalis as the same biological species, B. dorsalis complex, consisting of Bactrocera dorsalis s. str., B. papayae, B. philippinensis and B. invadens [12,13]. Accordingly, 'Bactrocera dorsalis', which is mentioned after that, is meant to the valid species taxonomically revised by the two recent studies [12,13]. After we collected the samples, we identified them as belonging to the species complex and used them in the experiment.  Table S1.
We identified the samples of Bactrocera dorsalis as the same biological species, B. dorsalis complex, consisting of Bactrocera dorsalis s. str., B. papayae, B. philippinensis and B. invadens [12,13]. Accordingly, 'Bactrocera dorsalis', which is mentioned after that, is meant to the valid species taxonomically revised by the two recent studies [12,13]. After we collected the samples, we identified them as belonging to the species complex and used them in the experiment.
First, we morphologically identified the collected fruit fly samples based on the morphological characteristics, especially on veins of wing and body patterns [69,70]. Second, we experimented with molecular identification, using the barcode region of the cytochrome c oxidase subunit I (COI) [71] the samples we identified as B. dorsalis. After processing the morphological and molecular identifications, we performed the following haplotype and population analyses, using the exact identified B. dorsalis samples.
To be used in the DNA experiment, we cut one or two hind legs from each fly individual, considering the specimen's size and condition. We performed the DNA extraction according to the manufacturer's protocol of LaboPass™ Tissue Genomic DNA mini Kit (COSMOGENETECH, Seoul, Korea) or AccuPrep ® Genomic DNA Extraction Kit (BI-ONEER, Daejeon, Korea).

Haplotype Network Analysis
We performed a polymerase chain reaction (PCR), using the extracted DNA template. The site we amplified was COI barcode, of which the length of the amplification product was 658 bp [71]. The primer set of LepF1 (5′-ATTCAACCAATCATAAAGA-TATTGG-3′) and LepR1 (5′-TAAACTTCTGGATGTCCAAAAAATCA-3′) was used to amplify the COI barcode sequence [71]. We used AccuPower ® PCR PreMix K-2037 (BI-ONEER, Daejeon, Korea) for the PCR, we mixed 2 µ L of extracted DNA with 1 µ L of 10 pmol forward primer, 1 µ L reverse primer and 18 µ L nuclease-free water to amplification. We performed PCR, using a GS482 thermo-cycler (GENE TECHNOLOGIES, Essex, UK) according to the following procedure: initial denaturation at 95 °C for five min, followed by 38 cycles of 95 °C for 20 s; annealing at 45 °C for 30 s; extension at 72 °C for 40 s, and a final extension at 72 °C for 5 min.
We visualized the PCR products with electrophoresis on a 1.8% agarose gel with low range DNA ladder to check for positive amplifications. For electrophoresis, we prepared a mix of 1 g SeaKem ® LE Agarose (LONZA, Bend, OR, USA), 0.8 g Metaphor™ Agarose (LONZA, USA), 3 μL of RedSafe (LONZA, Bend, OR, USA) and 100 mL of 1×TAE buffer  Table S1.
First, we morphologically identified the collected fruit fly samples based on the morphological characteristics, especially on veins of wing and body patterns [69,70]. Second, we experimented with molecular identification, using the barcode region of the cytochrome c oxidase subunit I (COI) [71] the samples we identified as B. dorsalis. After processing the morphological and molecular identifications, we performed the following haplotype and population analyses, using the exact identified B. dorsalis samples.
To be used in the DNA experiment, we cut one or two hind legs from each fly individual, considering the specimen's size and condition. We performed the DNA extraction according to the manufacturer's protocol of LaboPass™ Tissue Genomic DNA mini Kit (COSMOGENETECH, Seoul, Korea) or AccuPrep ® Genomic DNA Extraction Kit (BIONEER, Daejeon, Korea).

Haplotype Network Analysis
We performed a polymerase chain reaction (PCR), using the extracted DNA template. The site we amplified was COI barcode, of which the length of the amplification product was 658 bp [71]. The primer set of LepF1 (5 -ATTCAACCAATCATAAAGATATTGG-3 ) and LepR1 (5 -TAAACTTCTGGATGTCCAAAAAATCA-3 ) was used to amplify the COI barcode sequence [71]. We used AccuPower ® PCR PreMix K-2037 (BIONEER, Daejeon, Korea) for the PCR, we mixed 2 µL of extracted DNA with 1 µL of 10 pmol forward primer, 1 µL reverse primer and 18 µL nuclease-free water to amplification. We performed PCR, using a GS482 thermo-cycler (GENE TECHNOLOGIES, Essex, UK) according to the following procedure: initial denaturation at 95 • C for five min, followed by 38 cycles of 95 • C for 20 s; annealing at 45 • C for 30 s; extension at 72 • C for 40 s, and a final extension at 72 • C for 5 min.
We visualized the PCR products with electrophoresis on a 1.8% agarose gel with low range DNA ladder to check for positive amplifications. For electrophoresis, we prepared a mix of 1 g SeaKem ® LE Agarose (LONZA, Bend, OR, USA), 0.8 g Metaphor™ Agarose (LONZA, USA), 3 µL of RedSafe (LONZA, Bend, OR, USA) and 100 mL of 1× TAE buffer (BIONEER, Daejeon, Korea) gel. We ran 3 µL of PCR product and 3 µL of 100 bp DNA ladder (BIONEER, Daejeon, Korea) for 25 min at 100 V, using Mupid ® -ex (TAKARA BIO, Kusatsu, Japan). We used a gel image analysis system WGD-20 (DAIHAN, Seoul, Korea) to visually confirm the amplification products. We sequenced successfully amplified We used the CHROMAS 2.4.4 (TECHNELYSIUM Pty Ltd., South Brisbane, QLD, Australia) and MEGA X [72] for sequence analysis and alignment. We saved the aligned sequence as a FASTA format file that we used in the MEGA X program. We performed haplotype analysis, using the 465 COI sequences by DNASP 6.0 [73]. We reconstructed median-joining (MJ) networks between the COI haplotypes, using NETWORK 5.0.1.1 [74] to infer the haplotypes evolutionary relationships.

Microsatellite Marker Screening and Design of Multiplex PCR Set
We used random DNA samples to investigate the amplification success and polymorphism of microsatellite markers and retrieved 43 candidate microsatellite loci (MSL) with 86 primers from the five previous studies [64][65][66][67][68] to find and apply MSL microsatellite loci (MSL) with high efficiency and resolution. In the preliminary experiment, we used 43 candidate MSL to four random populations for testing amplification. We conducted a total of 172 PCR tests, and then we estimated the amplification success and polymorphism of each marker. Finally, we selected 15 MSL (Supplementary Materials Table S2), which we subsequently screened for primer-dimer formation with Multiple Primer Analyzer (THERMO FISHER, Seoul, Korea). Based on the screening results, we designed four multiplex PCR sets that were made by a combination of fluorescent dyes, such as FAM, VIC, NED and PET (ABI, Waltham, MA, USA), at the 5 -end of the forward primer (Supplementary  Materials Table S2).

Multiplex PCR and Fragment Analysis
We performed multiplex PCR for each primer set, using the extracted DNA of B. dorsalis sample. For multiplex PCR, we used AccuPower ® PCR PreMix K-2037 (BIONEER, Daejeon, Korea). We mixed the 2 µL of template DNA and 2 µL of set mixture solution and 18 µL of nuclease-free water, and we performed the PCR at a total volume of 20 µL. We performed the PCR, using a GS482 thermo-cycler (GENE TECHNOLOGIES, Essex, UK) according to the following procedure: initial denaturation at 95 • C for 5 min, followed by 40 cycles of 95 • C for 20 s; annealing at 53 • C for 30 s; extension at 72 • C for 40 s, and a final extension at 72 • C for 5 min. We conducted a fragment analysis-standard run (500LIZ size standard), MACROGEN, Inc. (Seoul, Korea), using fluorescent-labeled (5 modification) forward primers, which we successfully amplified by checking the gel image.
We performed allele reading, using GENEMAPPER ver. 4.0 (ABI, Waltham, MA, USA). We converted raw allelic datasets to a text form, and then it was analyzed in GENALEX 6.503 [75] working in Microsoft Excel 2019 (Microsoft, Redmond, WA, USA).

Data Analysis
Using 565 B. dorsalis individual samples from 40 collections (Figure 1 and Supplementary Materials Table S1), we performed the data processing for calculating allele frequency and genetic distance, principal coordinates analysis (PCoA), heterozygosity, F-statistics, polymorphism, allelic patterns and Nei's genetic distance, using the GENALEX 6.503 [75,76]. We used the GENCLONE 2.0 [77] to identify multilocus genotypes (MLGs) among populations and to calculate the genotypic diversity (GD = [G/N]), where G is the number of different MLGs and N is the sample size [78]. Observed and expected heterozygosity (H o and H e ) values among loci were estimated by using GENEPOP 4.6 [79][80][81]. Using the Bonferroni correction sequential for all trials with multiple comparisons, Hardy-Weinberg equilibrium [82] and linkage disequilibrium [83] tests were performed on the adjusted levels of significance. The deviations from HWE was estimated for heterozygote excess or deficit. We used FSTAT 2.9.3.2 [84] to estimate the gene diversity (H s ), the mean number of alleles (N a ), the number of effective alleles (N e ) and allelic richness (R s ). We also calculated the pairwise genetic differentiation (F st ) values [85,86].
We used BOTTLENECK version 1.2.02 [87] to detect the effect of a recent bottleneck in the populations in our samples. We used the strict stepwise mutation model (SMM) and the two-phase model (TPM) considered as appropriate for microsatellite datasets. A model includes both 90% SMM and 10% TPM for 10,000 iterations. Significant deviations in observed heterozygosity among all loci were tested by using a nonparametric Wilcoxon signed rank test [88]. Due to small sample sizes, the bottleneck analysis excluded the several groups with a population of fewer than four samples, which were KR 004 (2 individuals), CN SHA (1), KH PNP (1) and NP POK (1). Therefore, we excluded five individuals in four groups, and we normally conducted the analysis.
We used STRUCTURE 2.3.4 to analyze the genetic structure of 565 B. dorsalis individuals from 40 populations, using a Bayesian methodology [89]. We set the number of assignments (K) from one to 15 and conducted five replicate runs for each K value. In each run, it was performed by Markov chain Monte Carlo (MCMC) of 500,000 repetitions with an admixture model after a burn-in period of 30,000 steps. We obtained the value of delta K (∆K), using the ad hoc quantity, on the basis of the rate of the second order of the likelihood change [90]. We calculated K by using the online resource STRUCTURE HARVESTER [91] that explained the data structure to perform this process correctly. We conducted STRUCTURE results visualization, using DISTRUCT 1.1 [92]. GENECLASS 2 was used to carry out the Bayesian tests with optioned to detect first-generation migrants [93]. For each population, the program estimated its likelihood of belonging to any other population or to the place where it was collected. The most likely source of the sample has the highest assignment probability on the assigned genotype. Bayesian method was used to calculate allelic frequencies of population [94] with Monte Carlo resampling for estimating the significance of assignments (simulated individuals = 10,000, alpha = 0.01).
We tested independent groupings based on biogeography with analysis of molecular variance (AMOVA) by ARLEQUIN 3.5.1.2 [95,96], with its significant value estimated with the nonparametric permutation method designed by precedent research [97]. We used POPULATIONS 1.2.30 to estimate the relationship between groups, using the analyzed allele values, and created a population-distances phylogenetic tree of intergroup genetic structures and explore migration areas [98]. For this analysis, we used a file in the format and the alleles coded with three digits of GENEPOP.
Approximate Bayesian computation (ABC) was carried out in DIYABC 1.0.4 [99] to estimate the relative likelihood of alternative introduction scenarios of the oriental fruit fly, using microsatellite datasets. The DIYABC test includes genetic admixture events in introduced populations, serial or independent introductions and the comparison of complex scenarios involving bottlenecks [100]. For modeling scenarios, the parameters consist of the rate of admixture, the duration of the bottleneck during colonization, the effective number of founders in introduced populations, the stable effective population size and the times of split or admixture event [101]. To estimate the posterior distribution of parameters and to select the most likely scenario, the program produces a simulation dataset [101]. A simulation dataset generated by the program is used to choose the most similar to the observed (selected) dataset (n δ ) and then used to calculate the posterior distribution of parameters [99]. We analyzed to test two different ABC analyses, using all or partial microsatellite data. We performed the DIYABC analysis to track the ancestral populations and the migration process among all experimental groups. Eventually, conducted the DIYABC 1.0.4 analysis to determine which of the groups we collected was the most ancestral group and which group was derived. Three virtual groups were first set up to proceed with the analysis. We referred to the STRUCTURE (K = 3) results and PCoA to set the virtual group (see Results). Above all, we removed the Korean quarantine samples from all groups, and then we regrouped the remaining samples into G1 (Subgroup 1), G2 (Subgroup 2) and G3 (Subgroup 3). The G1 consists of the samples of China, Vietnam, Laos and Malaysia; the G2 consists of those of Taiwan and Thailand; and the G3 consists of those of the Philippines, India, Myanmar, Ho Chi Minh and Vietnam. In the first analysis, we analyzed six scenarios with G1 G2 and G3 origins to find the most ancestral group and the topology between them (Supplementary Materials Figure S2). In the second analysis, we analyzed nine scenarios for tracing the Korean quarantine groups (Supplementary Materials Figure S3). A million simulation datasets were obtained for each scenario. A generalized stepwise model (GSM) as the mutational model for microsatellites was used, which assumes increases or decreases by single repeat units [99]. To identify the posterior probability of these three scenarios, the 1% simulation datasets are selected for the logistic regression, 0.01% ones for the direct approach as closest to the pseudo observed dataset [101]. From the simulated and observed dataset, the summary of statistics was calculated for each of the tested scenarios, such as genetic differentiation between pairwise groups (F ST ), classification index, the mean number of alleles per locus (A), mean genetic diversity within-and between-group, Goldstein distance and shared alleles distance (D as ).

Haplotype Network
A total of 465 B. dorsalis sequences on COI barcode were generated in this study (GenBank accession nos. MW322099-MW322563). We built a crown-like MJ network, and in total, 50 haplotypes (H1-H50) were obtained ( Figure 2). There were some high-frequency haplotypes (such as H1, H2, H3, H23, H32 and H31) located in the center, and other rare haplotypes were connected to them through several mutation steps. In most cases, it was connected radially from the highest frequency haplotype, H1, through several mutation steps. Several haplotypes were found as connected by mutations converging from two different haplotypes. For example, H5 was a converged mutation haplotype of H2 and H23, while H2 and H23 were haplotypes of divergence mutations that occur in different directions from the dominant haplotype, H1. Besides, H11 was a mutation haplotype converging from H3 and H25, and those were both connected with H1. H14 (CN) and H38 (PH) were further isolated with eight steps from H1.
The H1 as the highest frequency haplotype included 312 individuals comprised of the samples found in all countries, which means H1 commonly exists in all countries (Supplementary Materials Table S3). As the secondarily highest frequency haplotypes, both the H31 and H32 consisted of individuals only from the Philippine population. The Korean quarantine individuals, the samples included in pop KR 001-4, were found in haplotype H1, H2, H3 and H4; in particular, H4 consists of two Korean quarantine individuals. We used a total of 34 individuals from Taiwan in the analysis, and 11 haplotypes were shown. We used 117 individuals from the Philippines in the analysis, and we observed 19 different haplotypes. The Philippines showed a wide variety of haplotypes, but the haplotype diversity of Taiwan seemed to be higher than that of the Philippines in relation to the population size (Supplementary Materials Table S3).

Genetic Differentiation within and between Populations
In this study, we genotyped 565 B. dorsalis samples by using fifteen microsatellite loci, which all were discovered to be non-clonal MLGs, i.e., non-identical genotypes estimated by multiple loci (Table 1). Therefore, in all groups, the MLGs were the same as the number of individuals in each population. Besides, each individual had a MLG that was not shared with the other, so every GD value was 1.00. The observed heterozygosity (H o ) and heterozygosity (H e ) values from all collected East Asian populations ranged from 0.378 to 0.704 and from 0.200 to 0.751, respectively (Table 1 and Supplementary Materials Figure  S4). In HWE, there was no population deviation due to heterozygote excess or deficit (p < 0.0001). Gene diversity (H s ), the mean number of alleles (N a ) and the number of effective alleles (N e ) averaged 0.70, 6.30 and 3.70, respectively, and allelic richness (R s , mean ± SD) was 2.69 ± 0.23.
connected radially from the highest frequency haplotype, H1, through several mutation steps. Several haplotypes were found as connected by mutations converging from two different haplotypes. For example, H5 was a converged mutation haplotype of H2 and H23, while H2 and H23 were haplotypes of divergence mutations that occur in different directions from the dominant haplotype, H1. Besides, H11 was a mutation haplotype converging from H3 and H25, and those were both connected with H1. H14 (CN) and H38 (PH) were further isolated with eight steps from H1.  We estimated pairwise genetic differentiation (F ST ) between 37 different geographical populations (Supplementary Materials Table S4) As the results of BOTTLENECK [87], a significant observed heterozygosity excess (p < 0.05) was not detected from the Wilcoxon sign-rank tests (SMM and TPM), whereas the shifted mode was observed in the seven groups: KR 001, KR 003, TW JIA, TH BAN, TH CHA, IN DEL and PH BUS. This seems to show a bottleneck process occurred in the population, even although the bottleneck test should be cautiously regarded because the sample size for some populations was less than 30 individuals [88].

Genetic Structure and Assignment
We show the results of the PCoA in Figure 3. The x-axis is the value of coordinate 1 (−1.20 to 0.60), and the y-axis value is in the range of coordinate 2 (−0.60 to 1.20). Groups with similar values of the coordinates 1 and 2 are located close to the graph plane. Twelve countries are represented in the legend (Figure 3). The 39 groups were grouped into three masses we labeled as G1, G2 and G3. The KR, which means individuals found by quarantine in Korea, was included only in G1 and G2. The G1 comprises groups of CN, VN, LA and MY, and KR 003. The G2 consists of TW and TH and included two Korean quarantine groups, KR 001 and 002. Members of the G3 belonged to the first group of VN, IN, MM and PH groups but did not include the Korean quarantine group. The quarantine group KR 004 did not belong to any group, and KH and NP also did not belong to any group because they had very low values of coordinate 1. KR 004 did not belong to any group, and KH and NP also did not belong to any group because they had very low values of coordinate 1. In all STRUCTURE analyses, the most likely number of clusters was K = 2; using the ΔK calculation of the Evanno et al. [90] method result (Supplementary Materials Table S5 and Figure S5), we found that the best value was 317.32 on ΔK = 2, the second-best value 186.73 on ΔK = 3 and the third-best value 31.24 on ΔK = 6 (from K = 1 to K = 15). Although K = 2 was the best, the results of K = 3 and K = 6 were referred thereafter because those were effective to display the relationships between the populations when compared and combined with the other analyses such as PCoA (Figure 3). The STRUCTURE analysis for all samples resulted in two distinct clusters, the structure of the group was divided into LA similar types and TW similar types (Supplementary Materials Figure S6a  In all STRUCTURE analyses, the most likely number of clusters was K = 2; using the ∆K calculation of the Evanno et al. [90] method result (Supplementary Materials Table S5 and Figure S5), we found that the best value was 317.32 on ∆K = 2, the second-best value 186.73 on ∆K = 3 and the third-best value 31.24 on ∆K = 6 (from K = 1 to K = 15). Although K = 2 was the best, the results of K = 3 and K = 6 were referred thereafter because those were effective to display the relationships between the populations when compared and combined with the other analyses such as PCoA (Figure 3). The STRUCTURE analysis for all samples resulted in two distinct clusters, the structure of the group was divided into LA similar types and TW similar types (Supplementary Materials Figure S6a The assignment test using GENECLASS 2 ( Table 2) showed the average probability with which samples were destined to the most likely reference population. The selfassignment probability values (SAPV) averaged 0.418 ± 0.273 (mean ± SD) in all populations, and in the Korean quarantine group, SAPV averaged 0.468 ± 0.05. In the Taiwanese population, SAPV averaged 0.260 ± 0.112, whereas it averaged 0.480 ± 0.451 in China, 0.288 ± 0.142 in the Vietnamese population, 0.203 ± 0.048 in Laos, 0.522 ± 0.293 in the population from Thailand and 0.390 ± 0.184 for the populations in the Philippines. Five cases to confirm genetic variance between the preordained groups were analyzed by using AMOVA implemented in ARLEQUIN [95]. There is no significant apportionment of variance between the Korean vs. East Asian country, within populations in Case 1 (p = 0.2176). Genetic variance of about 4.57% among groups in Case 2 (Based on the STRUC-TURE (K = 2)) and 6.13% among groups in Case 3 (Based on the PCoA and STRUCTURE (K = 3)) suggests that there are relatively different regional structures between the compared regions. In Cases, 1, 4 and 5, there was no significant genetic variation between groups ( Table 2). The analysis of the Case 1 clearly shows that the KR quarantine group is not an indigenous species or an invasive species that have settled for a long time, showing 0.61% of among group percentage of variation. Through the Case 4, it was confirmed that the genetic variation according to the collection period was more significant than the regional genetic variation, but there was no seasonal effect with a value of 3.28%. Analysis of the Case 5 (based on the longitude 113 • E, west side vs. east side) showed 1.56% of the variation. Because of this, it was not possible to confirm the genetic variation for regional isolation.

Inferring an Introduction to Test Hypothetical Scenarios by ABC Analysis
Most of the forty populations were regrouped into three subgroups by referring to the results we obtained with the PCoA and structure (Figures 2 and 3 and Supplementary Materials Figure S6). Each of the three regrouped subgroups formed a more extensive group containing populations with similar genetic structure. Subgroup 1 has a Chinese allelic pattern and is composed of individuals from China and Vietnam (Cuc Phuong) as 'CH + VN'. Subgroup 2 has a Taiwan-like allelic pattern and consists of Taiwan and Thailand objects as 'TW + TH'. Subgroup 3 is a set of allelic patterns similar to the Philippines, composed of groups from the Philippines, Myanmar and India as 'PH + MY + IN'.
Analyses 'A' constructed and analyzed six candidate scenarios to infer the ancestor relationship of 'CH + VN', 'TW + TH' and 'PH + MY + IN' (Supplementary Materials Figure  S2). In Scenarios 1 and 2, 'CH + VN' was designed as an ancestor group (Supplementary Materials Figure S2a Figure S2e,f). We observed the highest logistic regression value in Scenario 4, and the next highest value we observed was in Scenario 3 (Table 3 and Supplementary Materials Figure S8). As a result, both Scenarios 3 and 4 implied that 'TW + TH' was an ancestor group (Supplementary Materials Figure S8).  We performed the analyses 'B' of DIYABC to trace B. dorsalis' invasion origin into the Korean Peninsula. For this reason, we performed this analysis by adding KR 001, KR 002 and KR 003 as the fourth subgroup into the cladistics relationships of analyses (A), in which 'TW + TH' set to the most basal group due to the results of their ancestry (Table 4 and Supplementary Materials Figure S3). In the closest logistic regression, Scenario 3 had the highest values in Analyses B-1 and B-2 (Table 4 and Supplementary Materials Figure S8). This means that the two Korean groups, KR 001 and KR 002, were originated from 'TW + TH'. In Analysis B-3, Scenario 1 showed the highest value of posterior probability. As a result of this analysis, KR 003 was originated from 'CH + VN'.  Figure S8B-1). Values in bold indicate the best scenario in each analysis.

The Genetic Structure and the Global Origin of Bactrocera dorsalis
Bactrocera dorsalis is inferred to be a species native to Southeast Asia, and it is spreading worldwide and occurring in many countries due to its strong invasiveness [11]. The main distribution areas in Asia are Bangladesh, Bhutan, Cambodia, South China, Hong Kong, India, Indonesia, Japan Ryukyu Islands, Laos, Malaysia, Myanmar, Nepal, Ogasawara Islands, Pakistan, Philippines, Sri Lanka, Taiwan, Thailand and Vietnam [14]. However, the species is not confined only to Asia, and it was recently reported to have been introduced to North America, specifically in Hawaii, California and Florida [103]. Especially in the United States, it is now found in all major Hawaiian Islands after a sudden invasion in 1944/1945 [104]. The case of fruit flies invading California shows that it is challenging to exterminate this species because invasion and eradication were frequently repeated between 1960 and 2007 [11,105].
Moreover, the oriental fruit fly also has a history of invasion and settlement in the Mariana and Tahiti Islands in the Pacific and the African Continent [106,107]. Several studies have argued that the B. dorsalis' genetic structure and isolation-by-distance tend to be weakly supported within Asia [28,54,57,108], even though several patterns are evident in our study.
Our PCoA and STRUCTURE analyses show that the fifteen microsatellite markers we used in this study were able to separate a total of 565 samples from 40 populations into three main Subgroups (Figures 3 and 4). There was a genetic variation between the three subgroups: Subgroup 1 was defined as the Chinese type, Subgroup 2 as the Taiwanese type and Subgroup 3 as the Philippines type ( Figure 3). According to other previous research papers, invasion processes from tropical Asia to non-Asia have been observed, so finding the Asian invasion origin can help researchers find the global spread source of B. dorsalis. Between nations, border division cannot be an isolation factor for each species' populations. For example, Vietnam's territory has a longitudinally elongated shape along the coastline. Because of this territory's shape, populations VN BC1 and VN BC2 collected in Northern Vietnam had genetic structures similar to Yunnan in China than VN HOC, belonging to the same country ( Figure 4). The VN HOC population from Southern Vietnam was more similar in genetic structure to KH PNP, a group in Phnom Penh in Cambodia, which is geographically adjacent. The case of the group IN DEL in Delhi, India, and the sample NP POK in Pokhara, Nepal, are also inferred from this phenomenon. On the other hand, the Laos populations, LA VI1, LA VI2 and LA VI3, were collected with a unique strategy. The three groups from Laos were regularly collected in the same place, using the methyl eugenol trap [109]. During the sampling period from 2016 to 2017, there was no introduction from other groups in the area. Therefore, we can infer that the three Laos populations do not compose a metapopulation but continuously maintained populations presumed to be isolated (Table 2 and Figure 4) [110].
belonging to the same country (Figure 4). The VN HOC population from Southern Vietnam was more similar in genetic structure to KH PNP, a group in Phnom Penh in Cambodia, which is geographically adjacent. The case of the group IN DEL in Delhi, India, and the sample NP POK in Pokhara, Nepal, are also inferred from this phenomenon. On the other hand, the Laos populations, LA VI1, LA VI2 and LA VI3, were collected with a unique strategy. The three groups from Laos were regularly collected in the same place, using the methyl eugenol trap [109]. During the sampling period from 2016 to 2017, there was no introduction from other groups in the area. Therefore, we can infer that the three Laos populations do not compose a metapopulation but continuously maintained populations presumed to be isolated (Table 2 and Figure 4) [110]. As mentioned in the results, between some groups, the genetic structure was similar according to the natural habitat range rather than the administrative division of humans (Figures 3 and S5) [47]. However, other groups deviated from this expectation. India, Nepal, Myanmar, Cambodia and Southern Vietnam had similar genetic structures to the Philippines, even though they were not located in adjacent regions (Figure 4). The Laos populations' genetic structure was identical to those of Malaysia, Fujian and Western China. Thailand's B. dorsalis populations' genetic structure is similar to those from Taiwan that are as much as 2400 km away across the sea. As this result suggests, one can assume that the migration, i.e., a transference of individuals affected by human activities, between groups with similar genetic structures occurred. Wu et al. [30] made the following arguments through mitochondrial NDI gene sequence analysis. Generally, ancestors show significantly more genetic diversity than derivative populations because of the founder effect As mentioned in the results, between some groups, the genetic structure was similar according to the natural habitat range rather than the administrative division of humans ( Figure 3 and Figure S5) [47]. However, other groups deviated from this expectation. India, Nepal, Myanmar, Cambodia and Southern Vietnam had similar genetic structures to the Philippines, even though they were not located in adjacent regions (Figure 4). The Laos populations' genetic structure was identical to those of Malaysia, Fujian and Western China. Thailand's B. dorsalis populations' genetic structure is similar to those from Taiwan that are as much as 2400 km away across the sea. As this result suggests, one can assume that the migration, i.e., a transference of individuals affected by human activities, between groups with similar genetic structures occurred. Wu et al. [30] made the following arguments through mitochondrial NDI gene sequence analysis. Generally, ancestors show significantly more genetic diversity than derivative populations because of the founder effect [111]. According to this principle, it was considered that the Chinese B. dorsalis' populations originated from Southeast Asia (Manila, Pattaya and Bangkok) [30]. Symmetric migration patterns were detected in Yunnan, Guangdong, Fujian and Taiwan, while the asymmetric migration of gene flow indicated multiple invasions and multiple origins. The detection of gene flow, using mitochondrial genes, strongly supports our results [30].
As the previous studies on the spread of oriental fruit flies argue, the first cause of B. dorsalis migration is the food transportation by humans and tourists' movement along the countries' borders [30,37]. It may be secondarily caused by typhoons' generation and its activity in the northern hemisphere occurring every year [112,113]. As the tropical cyclone between 180 • E and 100 • E in the northern hemisphere, typhoons often transport tropical pests to subtropical or temperate regions along their path [112]. This area is called the Northwestern Pacific Basin and is the most active tropical cyclone basin on Earth, where one-third of the annual typhoons are generated here [114]. Generally, typhoons created in the Northwestern Pacific Basin move northwest [115]. However, when they reach 20 • to 30 • north latitude, they change their routes northward or northeastward due to the westerlies' effect and crosses Northeast Asia [116]. Many studies revealed that insects living in tropical regions were transferred to subtropical or temperate regions because of typhoons. Otuka et al. [117] suggested a non-intentional B. dorsalis transmission by typhoons from the Philippines to Okinawa, Japan. Shoji [118] recognized that typhoons were an essential factor in moving butterflies from the Philippines to the Ryukyu Islands. Typhoons are suggested as a cause of unintentional introductions of other pests, as well. In our study, it was impossible to trace the exact reason for the international spread and migration of B. dorsalis. However, the weakly revealed B. dorsalis' genetic flow direction is similar to frequent typhoons' routes, such as from Taiwan to Jeju island.
There were several hypotheses on the global origin of B. dorsalis. Three sources of the Chinese B. dorsalis specimens were proposed by classical studies [35]. Since B. dorsalis was first recorded in Taiwan [31], Wang [31] hypothesized that this species was introduced from Taiwan about a century ago to Mainland China and spread to other Asian and Pacific countries in the next 90 years. Nevertheless, Wang [31] hypothesis was not supported by the microsatellite diversity data [32,33]. Interestingly, a previous study [33] that used nine microsatellite markers suggested that Guangdong, China or Southeast Asia (Cambodia, Laos and Thailand) were the mainland fruit flies' source based upon their two-way migration estimates, which is similar to our results. The study of Shi, Kerdelhue and Ye [34], based on the high levels of local genetic variability, proposed Yunnan as a possible source area for the B. dorsalis. If not, they argued that at least the Yunnan area would be of old colonization [119]. Chen, Ye and Mu [36] estimated that B. dorsalis on the island of Taiwan and Hainan might have migrated from Mainland China, crossing the Taiwan Strait and the Qiongzhou Strait. This migration was also due to the increase in the amount of trade in agricultural products such as fruits and vegetables in recent decades [35]. These previous studies suggest that Mainland China or Taiwan may be the origin of B. dorsalis. Some of them may have migrated to the west and spread to Southeast Asia [35,36]. Similarly, the previous result of Wu et al. [30] suggested that oriental fruit flies migrated from Eastern Asia, from places such as Taiwan and China, and spread to the western regions, such as Thailand or Laos. Accordingly, our study estimated that B. dorsalis spreads and migrates from the Southwestern Asia, near Thailand, to China, Taiwan and the Philippines, and from there, it repeatedly reinvades or migrates to Northeast Asia, from Taiwan and the Philippines (Table 2 and Figure 4). However, based on the scenario of B. dorsalis migration from west to east by Clarke et al. [10] and Qin et al. [37], it was suggested that India, Nepal and their neighboring countries are, most likely, the ancestry origin of B. dorsalis. In this study, there is a weak side to the verification of these hypotheses because sampling from India and Nepal was relatively insufficient. Therefore, further study is needed to clarify the debate of its origin to expand our genetic analysis with the addition of samples from India and Nepal.

Inferring Source Population for Korean Quarantine Samples
Although many similar studies have been performed before, it was challenging to trace the introduction source of B. dorsalis Asian population [33,120]. Researchers could deduce that Northern Southeast Asia or Southern China were likely to be the introduction source because the gene flow from Southeastern China or islands to inland was detected [28,30,33,35,54,108]. Wu et al. [30] attempted to reveal the genetic structure and origin of B. dorsalis by using the mitochondrial ND1 gene. Network analysis performed by mitochondria identified the migrations between China, Taiwan, Thailand, Laos and the Philippines. As a result, Wu et al. [30] detected the migration between Thailand and Taiwan. These results support why the microsatellite structures from Taiwan and Thailand specimens, despite being geographically far apart, are similar in our study.
Recently reported studies have suggested that the theory of origin in China and Taiwan, which has been supported, results from a lack of sampling in previous studies [121]. These studies suggest India and Bangladesh as new invasion origins [37,121]. However, as a result of our analysis, populations from India were subordinated to those from the Philippine genetic group, and both the Taiwanese and the Chinese populations constituted a separate pool of alleles (Supplementary Materials Table S7 and Figure 3). As a result of the DIYABC analysis, the Taiwanese group was confirmed and located in the ancestral position (Table 3 and Supplementary Materials Figure S2).
This study's most crucial aim was to answer the question about the invasion source of some B. dorsalis obtained at Korea's quarantine border. Among the samples we obtained, KR 001 and KR 002 had similar genetic structures to Taiwan or Thailand (Figures 3 and 4). On the other hand, KR 003 had identical populations and genetic structure in Fujian, China or Laos (Figures 3 and 4). Although K004 was also similarly clustered to Fujian, China or Laos (Figure 4), it could not be estimated from where it originated due to large difference genetically from other groups ( Figure 3). As a result of this, it is estimated that B. dorsalis' origin, which can be introduced through the Korean Peninsula's borders, may be various countries (multiple cases), not a single country. The genetic structure found at the highest frequency in the border quarantine sample was from Taiwan or Thailand, and 19 individuals belonged to it (Figure 3 and Supplementary Materials Figure S6). Following that, seven samples were of Chinese or Laos type (Figure 3 and Supplementary Materials Figure S7). Considering the fruit trade scale and annual tourist entry rate regardless of the geographical accessibility, Thailand and China is more likely to be the introduction source than Taiwan and Laos. Unfortunately, even if compressed into these four countries, it is difficult to pinpoint a specific country for the source of introduction in this study.

Applications for Future Quarantine
The tropical and subtropical fruit industries generate income and employment in many countries, became the primary means of foreign currency income and provided a balanced diet internationally regarding human health and nutrition [82]. Therefore, for developing countries in tropical and subtropical regions, the fruit industry must be a project that cannot be given up for the national economy's vitality [82,122]. It is a vibrant sector with progressive expansion in production, international trade and consumption [82]. With the development of food processing technology and trade, the tropical fruit industry shows an increase of 3.5% annually, and the significant fruits with significant growth rates include bananas, mangos, pineapples and papaya [123][124][125]. These fruits are the primary B. dorsalis' host plants [11], and Asia is the largest tropical fruit production region, with annual production accounting for 66% of global production [125].
In the 1990s, the Korean fruit trade was opened, and tropical-fruit imports have increased significantly since the 2000s, due to the Free Trade Agreement's conclusion. Most tropical fruit imports depend on Asian countries, and their imports increase each year (data from Korea International Trade Association). In addition, the number of foreign visitors or foreign residents increased from major Asian countries due to tourism, study abroad, employment, international marriage, etc. (Statistics Agency data). For this reason, the Korean government spurs the APQA to prevent the invasion of foreign pests and eradicate invaded ones.
The invasion risk of B. dorsalis is increasing through the channels of portable food, portable belongings and trade stuff. The temperature is rising each year worldwide and in the Korean Peninsula as well, changing from a temperate climate to a subtropical climate. With this phenomenon, if tropical pests invade the Korean Peninsula, it will be fatal to the fruit industry. The geographic range in which an insect or plant lives is naturally dynamic and varies over time. Human-induced climate change, civil engineering and trade are rapidly destroying and rebuilding the insects' and plants' habitat barriers [126][127][128]. According to its high score in the invasion risk assessment estimation and its serious hazard upon the fruit industry, the APQA, Korea, has defined B. dorsalis as a 'prohibited pest'. In this respect, our study will be applicable to elucidate B. dorsalis' source of introduction, providing support for future quarantine and management actions regarding B. dorsalis' invasions by both the Korean government and the Korean fruit industry.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/insects12100851/s1. Figure S1: The individual assignment result of 26 Korean samples from four quarantine-detected groups (KR 001-004) by STRUCTURE analysis (K = 4). According to Evanno et al. [90], the best delta K was estimated to 4. The ordering number of individuals are consistent to that in Table S1. Figure S2: The simulated DIYABC historical scenarios by analysis (A). Six analysis scenarios to identify ancestral groups and to determine the separation order of each group. Each branch's color represents a specific group, red: Pop 1 (CH + VN), green: Pop 2 (TW + TH), blue: Pop 3 (PH+MY+IN). Information and results of Scenarios 1-6 (see Table 3). Figure S3: The simulated DIYABC historical scenarios by Analysis B based on the former Analysis A, Pop 2 (TW + TH) was fixed as an ancestral group, and we performed an analysis to determine from which country did the KR 001-003 originate. KR 001 and KR 002 were approached for the Scenario 3 model (group KR 001 and 002 has a high probability of originating from Taiwan and Thailand). The KR 003 population has approached Scenario 1 (KR 003 has a high probability of originating in China and Vietnam) (see Table 4). Figure S4: Allelic patterns in the 15 microsatellite loci in 40 populations of Bactrocera dorsalis. Na, number of different alleles; Na (Freq. ≥ 5%), number of alleles with a frequency greater than 5%; Ne, number of effective alleles; I, Shannon's Information Index; No. Private Alleles, number of alleles unique to a single population; No. LComm Alleles (≤25%), number of locally common alleles occurring in 25% or less in the populations; No. LComm Alleles (≤50%), number of locally common alleles occurring in 50% or less in the populations; He, expected heterozygosity; Vertical bars represent the standard error. Figure Figure S6: Bayesian clustering by STRUCTURE. Individual assignment plots for K = 2, 3, 4, 5, 6. Mean log probability of the data for different K values (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14)(15), and plot of ∆K statistic of Evanno et al. [83] showing best K = 2. (a) In the result of K = 2, KR 001,002, TW and TH showed the same clustering, and the other groups were separated. (b) The result of K = 3 was the same as that of PCoA. KR 001 and 002 have the same clustering as TW + TH, whereas KR 003 has the same clustering as CH + VN (BC1,2) + LA + MY. Figure S7 Results of the DIYABC Analysis A. Direct estimate; left, Logistic regression; right. Scenarios 1 to 6 are color labeled (Table 3 and Supplementary Materials Figure S2). Figure S8: Results of the DIYABC Analysis B. The B-1 is the analysis result for the identification of the origin of KR 001. B-2 is the analysis result for the identification of the origin of KR 002. B-3 is the analysis result for the identification of the origin of KR 003. Each color reflects each evolutionary scenario in the legend on the right. The red legend on each graph (Scenario 1) is a scenario where CN + VN is the origin, the green legend (Scenario 2) means TW + TH is the origin, and the blue legend (Scenario 3) means PH + MY + IN is the origin (Table 4 and Supplementary Materials Figure S3). KR 001 and 002 have the highest logistic regression of scenarios originated from TW + TH, KR 003 has the highest logistic regression of the scenario originated from CN + VN. Table S1: Collection data for oriental fruit fly (Bactrocera dorsalis) analyzed in this study. † Abbreviations for collectors: APQA = Animal and plant quarantine agency of Korea; JJ = Jaeyong Jeon; HN = Hyeban Namgung; HK = Hyojoong Kim; DC = Deuk-Soo Choi; SL = Seong-Jin Lee; JL = Jong-Ho Lee. Table S2: Microsatellite loci and primer sequences and multiplex PCR primer set used in this study. Set; unit of multiplex PCR and capillary electrophoresis injection.