Development of Nuclear DNA Markers for Applications in Genetic Diversity Study of Oil Palm-Pollinating Weevil Populations

Simple Summary The main pollinator for the African oil palm, Elaeis guineensis, in Malaysia and Indonesia is Elaeidobius kamerunicus. The weevil species is not native to these countries, but it was introduced from Cameroon, West Africa, in 1981 to improve pollination efficiency, thus improving bunch formation and yield. Forty years after the weevil introduction, recent reductions in bunch yield reported in Malaysia caused by poor bunch formation could be associated with a decrease in pollination efficiency due to the founder effect. Several genetic diversity studies of weevil populations based on morphological and mitochondrial markers have been carried out; however, the studies did not provide sufficient evidence for explaining the genetic variation, particularly at the intra-species level. This study aims to develop a set of robust E. kamerunicus-specific nuclear DNA markers to directly assess the genetic diversity of weevil populations. The marker development lays the foundation for future applications by extending the survey into larger areas where the oil palm is cultivated to obtain a more comprehensive understanding of the genetic diversity and inbreeding occurrence status of E. kamerunicus in these introduced regions. This could facilitate sustainable genetic monitoring and conservation planning of E. kamerunicus, especially in non-native oil palm-growing countries. Abstract The oil palm-pollinating weevil (Elaeidobius kamerunicus Faust) was introduced from Cameroon, West Africa, to Malaysia in 1981, and subsequently, to other oil palm-growing countries as well. This study aims to develop a set of robust E. kamerunicus-specific nuclear DNA markers to directly assess the genetic diversity of the weevil populations. A total of 19,148 SNP and 223,200 SSR were discovered from 48 weevils representing three origins (Peninsular Malaysia, Sabah, and Riau) using RAD tag sequencing. Subsequent filtering steps further reduced these to 1000 SNP and 120 SSR. The selected 220 SNP exhibited a polymorphism information content (PIC) of 0.2387 (±0.1280), and 8 SSR had the PIC of 0.5084 (±0.1928). These markers were found to show sufficient polymorphism, making it possible to assign 180 weevils into three major clusters from Ghana, Cameroon, and Southeast Asia (mainly in Malaysia and Indonesia). These DNA markers successfully confirmed the Cameroon origin of the Southeast Asian cluster. However, the presence of null alleles in the SSR markers, due to limited flexibility of the probe design on the short RAD tags, led to an underestimation of heterozygosity within the populations. Hence, the developed SNP markers turned out to be more efficient than the SSR markers in the genetic diversity assessment of the E. kamerunicus populations. The genetic information provides useful insight into developing guidelines for the genetic monitoring and conservation planning of E. kamerunicus.


Introduction
Oil palm, Elaeis guineensis Jacq., originated from West Africa. The earliest record of its introduction to Southeast Asia is found regarding the planting of four dura palms in Bogor Botanical Garden, Indonesia, in 1848 [1]. Currently, descendants of these palms have become the major planting material of Southeast Asia and Oceania [2], supplying about 34.3% of the total palm oil trade [3]. The oil crop is monoecious, producing male and female inflorescences in an alternating cycle to promote cross pollination among palms for fruit bunch development, mainly assisted by weevils under genus Elaeidobius in West Africa [4]. However, the native pollinator was unavailable in Southeast Asia when the oil palm was first planted in the region. A lower than 60% fruit-to-bunch ratio indicates inefficient pollination, which can cause a significant yield loss of fresh fruit bunches (FFB) [5]. In the late 1970s, Thrips hawaiiensis was identified in Malaysia as an alternative pollinator to the native weevil, but the species was far less efficient and usually absent from young plantings [4]. Hence, the commercial plantings were mainly hand pollinated, which was extremely costly and laborious [6]. The oil palm industry was left with no other option but to consider introducing the native pollinator from Cameroon to Malaysia [7].
The species Elaeidobius kamerunicus Faust, 1898 (Coleoptera: Curculionidae) was chosen after the species was studied to be the most predominant oil palm pollinating weevil in Cameroon. The species is also well adapted to both wet and hot seasons, and has a high pollen carrying capacity among the other Elaeidobius species [7]. In addition, hostspecificity testing found that the species was unable to complete its life cycle on any plant species in the country other than the oil palm; thus, it was deemed safe to be introduced [8]. Elaeidobius kamerunicus is holometabolous, completing its development on post anthesis oil palm male inflorescence [9,10]. Adult female oviposit eggs on the male oil palm inflorescence, and the larva emerge and undergo perfect metamorphosis. The adult (weevil), attracted to the strong aniseed smell produced by male flowers at anthesis, visits the male flowers as a food source (nectar) and oviposits its eggs. Pollen grains are attached to the adult weevils' body, mostly on the elytra, but some on the ventral part of the thorax and abdomen and under the elytra [4]. The weevil visit the female flowers by deception, as they are attracted to the same aniseed smell given off by male flower at the receptive stage. This results in the transference of pollens from the male to the female flowers (pollination) [11].
In 1981, the E. kamerunicus species in the form of pupae was imported from Plantations Pamol du Cameroon, Lobe Estate, Cameroon, to Malaysia [12]. A total of 3000 adult weevils were reared and first released into two plantations in Malaysia in 1981; the introduction was a huge success. The rapid establishment of weevils throughout the plantations led to significant improvements in fruit set and oil yield [6,12,13]. Hand pollination was then terminated, with an estimated saving of MYR 50 million per year during the 1980s [6]. The release of weevil species extended to Indonesia, Papua New Guinea, Thailand, and Colombia revolutionized the entire palm oil industry [2,14].
A declining trend of FFB, however, has occasionally been reported in Malaysia since the 1990s. Low rainfall and over-pruning, causing a reduction in the formation of female inflorescences, could be contributing factors to this trend. In addition, the descendant population of introduced E. kamerunicus probably suffered a bottleneck effect that weakened their adaptability to local climate change, making them vulnerable to parasitic nematode infection [15][16][17][18]. This may directly reduce the pollination ability of the insect. Many studies on the genetic diversity of E. kamerunicus populations were conducted to determine the presence of a bottleneck effect, but multivariate analysis based on 15 morphological characteristics of E. kamerunicus mainly explained the variation between male and female insects, rather than between geographical regions [19]. Due to strong influences from the environment, some of these morphologic traits could mask the genetic variations [20]; therefore, the founder effect still remains uncertain.
Hence, geneticists started adopting molecular markers to directly measure the genetic diversity of this important weevil population. Mitochondrial marker COI, COII and a nuclear gene fragment of arginine kinase (AK) were used in estimating the phylogeography of the E. kamerunicus populations [21]. The South American and Asian populations were found to have originated from West and Central Tropical Africa [21]. However, mitochondrial markers alone may not be sufficient to detect genetic polymorphism for an intra-species population; hence, the use of a nuclear marker was suggested [22]. Nuclear markers, which include single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers, are able to detect allele heterozygosity information, hence more useful than mitochondrial markers in insects' genetic diversity studies [23]. Simple sequence repeats (SSR) have the advantage of being multi-allelic (having a higher marker polymorphism than SNP). In addition, SSR can be genotyped using a standard PCR-based method, thus possessing a simpler start-up requirement. On the other hand, SNP is bi-allelic, which means that it contains less genetic information per locus than SSR markers, as presented in [24]. Nevertheless, it has the capacity to be used in a high-throughput system to genotype a high number of SNP markers with a large sample size [25].
We hypothesize that the designed novel DNA markers will be effective to measure the inbreeding effect among the Malaysia populations, as well as to estimate the genetic diversity of the other introduced region (Indonesia), and to contrast with this effect with those of the origin countries (Cameroon and Ghana). Hence, the aim of this study was to develop two robust E. kamerunicus-specific nuclear marker systems, i.e., single nucleotide polymorphisms (SNP) and simple sequence repeats (SSR), using RAD (restriction site associated DNA) tag sequencing, and then to assess the effectiveness and applicability of the two marker sets to capture the expected genetic patterns in the population.

Sample Collection and Genomic DNA Extraction
A total of 48 individual weevils originating from Malaysia (40 individual weevils representing northern, central, southern Peninsular Malaysia, as well as 5 individual weevils from Sabah) and Indonesia (3 individual weevils from Riau) were collected for marker discovery and development. Tissues from the head and membranous part of the hind wings of each weevil were sampled for DNA extraction using Analytik Jena TM Tick DNA kit (Analytik Jena, Berlin, Germany), according to the manufacturer's protocol. This was to ensure good quality DNA for the RAD tag sequencing and to minimize the risk of cross-contamination from other organisms, particularly nematodes, which are usually present in the abdominal segment of weevils. Subsequently, the evaluation of marker applicability for genetic diversity studies was performed, based on a larger set of 180 weevils representing two native populations (from Cameroon and Ghana) and 10 introduced populations in Malaysia and Indonesia (Table 1); each population consists of 15 individuals. The genomic DNA was extracted using Sbeadex™ Mini kit (LGC Genomics, Hoddesdon, UK) in an oKtopure™ (LGC Genomics, Hoddesdon, UK). Standard protocols, with the modification of an additional washing step using 80% ethanol (150 µL) and 25 µL of Proteinase-K, were performed. The quality of the extracted DNA was quantified using a Microplate reader (ThermoFisher Scientific, Massachusetts, USA).

RAD Tag Sequencing and Sequence Alignment
The genomic DNA of each individual weevil was digested using the Msll restriction enzyme. Digested DNA fragments were subjected to 150 bases paired-end sequencing using Illumina NextSeq500 V2 platform (Illumina Inc., California, USA). Standard preprocessing steps were taken after the de-multiplexing of the library groups using Illumina bcl2fastq 2.17.1.14 software (Illumina Inc., California, USA). Raw sequences per sample were obtained by the de-multiplexing of library groups into samples, according to their inline barcodes (with a maximum of 2 mismatches or Ns) and verification of the restriction site. All reads were processed by an adaptor clipper, and reads with a final length shorter than 20 bases were discarded as well. The next step was quality trimming by removing reads containing Ns and at the 3 -end to get the average Phred quality score (Q > 20) within a window size of 10 bases. Processed sequence reads were pooled and clustered using the CD-HIT-EST program at 95% similarity cut-off [26]. The alignment of the subsampled quality-trimmed reads against the cluster sequences was performed using BWA version 0.7.12 [27]. Variant discovery and genotyping were then performed using Freebayes v1.0.2-16 [28] by following the parameters set to minimum base quality of 10, a minimum coverage of 5, a mismatch base quality threshold of 10, a minimum supporting allele qsum of 10, and a read mismatch limit of 3. These combined variants, scored in a variant call format (vcf) file, were re-filtered for the read count, ensuring that each locus showed at least 8 reads and a minor allele frequency >5%. Lastly, the genotypes must be observed in at least 32 out of 48 samples.

SNP Discovery and Genotyping
An in-house script was used to check the availability of minimal 75 bp upstream and downstream flanking sequences of each SNP. The fastacmd program [29] under BLAST [30] was used to extract the SNP markers with adequate flanking sequences. Remapping of these markers on the tag sequence pool was performed using BLASTN with an e-value threshold of 1e-05 and 95% identity over a minimum of 140 bp to ensure high specificity to the targeted SNP markers. In addition, unique markers were then mapped against the protein (NR) database and the bacteria complete genome from the NCBI database, using BLASTN with a maximum e-value of 1e-05. The list of bacteria can be found at ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/ assembly_summary.txt (accessed on 26 January 2018). This step was performed to remove possible contaminants from weevil-specific sequences. In addition, as E. kamerunicus belongs to the order Coleoptera, those identified markers that were mapped to genomes of beetle species (Aethina tumida, Agriotes lineatus, Anoplophora glabripennis, Biphyllus lunatus, Carabus granulatus, Dascillus cervinus, Dendroctonus ponderosae, Julodis onopordi, Leptinotarsa decemlineata, Onthophagus taurus, Scarabaeus laticollis, and Tribolium castaneum) in the NR database were selected for this study. Additionally, those unmapped markers with a >90% call rate calculated in the PLINK program [31] were selected to eventually make up the 1000 SNP markers. To validate these candidate SNP markers, 75 bp upstream and downstream from an SNP were probed to genotype a total of 180 weevil samples using SeqSNP TM Genotyping by Sequencing (LGC Genomics, Berlin, Germany. Only SNP with a >95% call rate were retained for subsequent analysis [32].

SSR Discovery and Genotyping
FullSSR suite v1.0 [33] was adopted to identify SSR. Unique contigs that contained the SSR motifs were confirmed by aligning all-against-all BLASTN at the e-value threshold of 1e-25 to the cluster sequences. Only those assured contigs harboring SSR motifs with > 8X coverage in BAM files were then imported into the Primer Premier 6 program (PREMIER Biosoft, US) for primer design. The design complied with a primer length ranging from 18 bases to 23 bases, target PCR products ranging from 100 bp to 280 bp, and a melting temperature (T m ) ranging from 54 • C to 60 • C; the maximum T m difference between the forward and the reverse primer was set at 0.5 • C. A total of 120 primer pairs with an M13-tailed forward primer for subsequent amplicon dye-labeling [34] were synthesized (IDT, Singapore).

Genetic Diversity and Clustering Analysis
Only SNP and SSR markers with a call rate >95% were included in the genetic diversity analyses [32]. At the marker level, genetic diversity parameters, including major allele frequency (MAF), mean number of alleles (N A ), expected heterozygosity (H e ), observed heterozygosity (H o ), polymorphism information content (PIC) [35], and inbreeding coefficient (F is ) [36], were generated in the PowerMarker v3.25 program [37] for SSR and SNP, respectively. Same parameters of each marker system were also estimated at the population level, i.e., the assayed 12 weevil populations (Table 1). In the PowerMarker v3.25 program, genetic relatedness among 12 assayed weevil populations was subsequently estimated based on pairwise Nei's genetic distance (D) [38] to construct a phenogram using the unweighted pair group method with arithmetic average (UPGMA) approach. These phenograms were visualized in the TreeView program [39]. In addition, principal component analyses (PCA) were performed on 180 individual weevils in R packages, namely adegenet [40], Hiefstat [41] and Pegas v 3.2.4 [42], following the references for Population Genetics in R [43]. The frequency of the null alleles (NAF) of SSR was further investigated in MICRO-CHECKER v2.2.3 [44] using the Brookfield 1 method for frequency estimation [45]. Allelic frequencies of SSR markers were corrected using the same program, followed by the re-estimation of SSR-based F is across the populations using the PowerMarker v3.25 program [37].

SNP Discovery and Marker Informativeness
The RAD tag sequencing produced a total of 190,055,454 reads, with an average of 1.98 million reads per weevil, out of which 1,911,233 sequences were clustered and used for marker discovery. The number of SNPs obtained in each filtering step is shown in Figure 1. The initial number of SNPs discovered in 48 weevil samples was 19,148 SNP. For probe design, the number was reduced to 8354 SNP with the length of the flanking sequences of at least 75 bp upstream and downstream. Only 4360 SNP were found to be unique, and the number subsequently dropped to 2746 SNP that had no hit to the protein database and bacterial genome from the NCBI database. From this pool, as well as an additional 46 SNPs with good hits to the NR database, a total of 1000 SNP were eventually shortlisted and deposited in the European Variation Archive (EVA) database. Nevertheless, out of these 1000 SNPs with a mean call rate of 0.53% (±0.4322), only 220 SNPs with a call rate > 95% (Table S1) were selected for genetic diversity analyses. The informativeness of the selected 220 SNPs was first evaluated using mean PIC per SNP = 0.2387 (±0.1280), ranging from 0.0000 to 0.3750. Furthermore, the mean H o per SNP was 0.2917 (±0.2221), which was close to the mean H e per SNP = 0.2982 (±0.1742). In this study, N A = 435 were detected, with the average of 1.9773 alleles per SNP (Table S1).
phenograms were visualized in the TreeView program [39]. In addition, principal component analyses (PCA) were performed on 180 individual weevils in R packages, namely adegenet [40], Hiefstat [41] and Pegas v 3.2.4 [42], following the references for Population Genetics in R [43]. The frequency of the null alleles (NAF) of SSR was further investigated in MICRO-CHECKER v2.2.3 [44] using the Brookfield 1 method for frequency estimation [45]. Allelic frequencies of SSR markers were corrected using the same program, followed by the re-estimation of SSR-based Fis across the populations using the PowerMarker v3.25 program [37].

SNP Discovery and Marker Informativeness
The RAD tag sequencing produced a total of 190,055,454 reads, with an average of 1.98 million reads per weevil, out of which 1,911,233 sequences were clustered and used for marker discovery. The number of SNPs obtained in each filtering step is shown in Figure 1. The initial number of SNPs discovered in 48 weevil samples was 19,148 SNP. For probe design, the number was reduced to 8354 SNP with the length of the flanking sequences of at least 75 bp upstream and downstream. Only 4360 SNP were found to be unique, and the number subsequently dropped to 2746 SNP that had no hit to the protein database and bacterial genome from the NCBI database. From this pool, as well as an additional 46 SNPs with good hits to the NR database, a total of 1000 SNP were eventually shortlisted and deposited in the European Variation Archive (EVA) database. Nevertheless, out of these 1000 SNPs with a mean call rate of 0.53% (±0.4322), only 220 SNPs with a call rate > 95% (Table S1) were selected for genetic diversity analyses. The informativeness of the selected 220 SNPs was first evaluated using mean PIC per SNP = 0.2387 (±0.1280), ranging from 0.0000 to 0.3750. Furthermore, the mean Ho per SNP was 0.2917 (±0.2221), which was close to the mean He per SNP = 0.2982 (±0.1742). In this study, NA = 435 were detected, with the average of 1.9773 alleles per SNP (Table S1).

SSR Discovery and Marker Informativeness
A total of 223,200 SSR motifs were identified from the contigs, with an average length of 121 bp within the range between 103 bp and 320 bp. Only 645 SSR motifs were flanked with a sufficient length of sequences for primer designs. The number was further reduced to 120 sequences due to a good predicted match of primer pairs. The 120 designed primers

SSR Discovery and Marker Informativeness
A total of 223,200 SSR motifs were identified from the contigs, with an average length of 121 bp within the range between 103 bp and 320 bp. Only 645 SSR motifs were flanked with a sufficient length of sequences for primer designs. The number was further reduced to 120 sequences due to a good predicted match of primer pairs. The 120 designed primers were tested in a subset of 8 samples representing 1 native and 7 introduced populations for amplifiability and polymorphism tests. The results show that 30 SSR consistently produced amplicons with the expected size, with only 8 SSR found to be polymorphic (Table 2), while the remaining 22 monomorphic SSR are shown in Table S2. In addition, these polymorphic SSR, which were successfully validated based on single pass sequencing results (Supplementary Figures S1-S8), were subjected to full genotyping, with 180 weevil samples representing 12 populations, as shown in Table 1. A total of 50 alleles were detected in the sample pool, and the mean N A per SSR was 6.25 (±3.2) alleles, ranging from 4 to 13 alleles (Table 2). Moreover, the mean H o per SSR = 0.2747 ±0.1614 was lower than the mean H e per SSR = 0.5550 ±0.1934. The PIC per SSR was between the range of 0.2540 and 0.8024, with the mean of 0.5084 (±0.1928). All eight SSR markers indicated a significant deviation from the Hardy-Weinberg equilibrium (HWE) at p = 0.05 threshold (Table S3), but no strong linkage disequilibrium (LD) was detected among them (r 2 < 0.5) (Table S4).

Genetic Diversity Analysis
The genetic diversity of 12 assayed weevil populations was analyzed using 220 SNP and 8 SSR, respectively, to compare the applicability of both marker systems. For SNP, MAF of the CAM population (0.8222) and Ghana (GHN) population (0.9199) were higher than 0.7974 (±0.0409), which is the MAF mean (Table 3). This also resulted in a lower  In Table 3, using SNP, all populations recorded a negative F is , ranging from −0.1146 (Central West Kalimantan, CWK) to −0.0244 (CAM), except for the GHN population, with 0.0330. The SSR markers, however, indicated the opposite trend of F is , in which all 12 populations showed positive values in the range between 0.2931 (South Sarawak, SS) to 0.5857 (North Peninsular Malaysia, NPM). As this was observed due to the presence of null alleles in some markers, allelic frequencies were then adjusted for re-estimation of SSR-based F is . After the adjustment, the SSR-based F is were reduced, ranging from −0.0246 (CAM) to 0.1173 (SS). Particularly, the SBH (−0.0024), CWK (−0.0030), and CAM (−0.0246) populations recorded negative F is values ( Table 3). As expected, all assayed populations carried more SSR alleles (mean N A ≥ 3.0000) than did the SNP, leading to a lower MAF mean (0.6370 ± 0.0665) ( Table 3)

Genetic Distance Estimate and Clustering Analysis
In general, the mean pairwise Nei's genetic distance (D) among the 12 assayed populations calculated using SNP and SSR was 0.0329 and 0.1330, respectively. The lowest genetic relatedness between the GHN population and the CAM population was revealed by SNP (D = 0.1452) and SSR (D = 0.3841). However, the closest related population pair detected in both marker systems varied: NPM-CPM pair (D = 0.011) in SNP and CPM-SR pair (D = 0.0367) in SSR. The genetic relatedness among populations was subsequently visualized in a UPGMA phenogram, based on D estimates (Figure 2). Both SNP and SSR mutually assigned the assayed 12 populations into 3 major divisions (2 lineages and 1 cluster): (i) GHN, (ii) CAM, and (iii) introduced populations in Malaysia and Indonesia (Figure 2A,C). Apparently, the introduced cluster was more genetically related to the native CAM than that of the native GHN. Besides similarity clustering, PCA plots ( Figure 2B,D) further explained the resolution of both marker systems. The assayed SNP panel indicated a much higher resolution by tightly isolating the three clusters ( Figure 2B), compared to assaying with the SSR panel ( Figure 2D). The assayed 220 SNP provided a higher resolution than did the 8 SSR markers, with a reduced within-group variation and an increased between-group distance. (Figure 2A,C). Apparently, the introduced cluster was more genetically related to the native CAM than that of the native GHN. Besides similarity clustering, PCA plots ( Figure  2B,D) further explained the resolution of both marker systems. The assayed SNP panel indicated a much higher resolution by tightly isolating the three clusters ( Figure 2B), compared to assaying with the SSR panel ( Figure 2D). The assayed 220 SNP provided a higher resolution than did the 8 SSR markers, with a reduced within-group variation and an increased between-group distance.

Nuclear DNA Marker Development and Informativeness
RAD tag sequencing is a cost-effective and rapid method for discovering DNA markers, especially for non-model species with no prior genome references [47][48][49][50][51]. The method enabled the identification of 19,148 SNP and 223,200 SSR motifs from 48 weevils.

Nuclear DNA Marker Development and Informativeness
RAD tag sequencing is a cost-effective and rapid method for discovering DNA markers, especially for non-model species with no prior genome references [47][48][49][50][51]. The method enabled the identification of 19,148 SNP and 223,200 SSR motifs from 48 weevils. Normally, SNPs were expected to be more abundant throughout the genome of most species compared to SSR [52,53], but this was not observed in this study. The identification of SNP is dependent on the presence of nucleotide variants across individuals; therefore, overrepresentation of the introduced populations in Malaysia and Indonesia might limit the genetic polymorphism. Hence, the inclusion of a wider collection of native populations is recommended. For example, in the SNP discovery to study the population structure of the Asian longhorned beetle, the longhorned beetle samples from a wide coverage area in China and Korea were used [54].
Although discovery of SSR is more feasible, even with a single reference genome, only 0.28% of the identified SSR in this study were suitable for primer design. This was due to short generated RAD tag sequences, which usually spanned between 150 bp and 300 bp [55], thereby limiting the availability of flanking sequence for primer designs. A similar experience was reported in a previous study: only 237 out of 94,851 reads with SSR motif (0.24%) of an endangered yew species, Taxus florinii, were suitable for primer design [56]. In that study, the generated RAD tag sequences were on average 141 bp. Besides, in the absence of the E. kamerunicus reference genome, the identification of weevilspecific SNPs and SSRs, based on cross-mapping with reference genomes of the order Coleoptera, was found to be promising. A similar SNP marker discovery using RAD tag sequencing for a coconut leaf beetle without a reference genome was reported in [57]. As an example in plants, , the conserved DNA markers of the oil palm has successfully improved the genome assemblies in two other economically important palm species: the coconut and date palms [58].
Despite the reduced number of DNA markers, 220 SNP and 8 SSR were found to be informative. Markers with PIC values ≥0.5 are usually considered as highly informative, 0.25-0.5 as moderate, and <0.25 as low [35]. For bi-allelic SNP, the maximum PIC value per marker can only achieve 0.345 [59]. This explains why multi-allelic SSRs generally exhibited a higher PIC than that of the assayed SNP. The constrained informativeness of SNP, nevertheless, can always be compensated by using a large number of SNPs with high genotyping throughput. In this study, the N A of SNP = 435 was observed to be 8.7 times higher than the N A of SSR at 50.

Applications of Weevil-Specific SNP and SSR in Genetic Diversity Assessment
Both SNP and SSR mutually assigned 12 assayed populations into three major divisions: 2 lineages namely (i) GHN, (ii) CAM, and a cluster of (iii) introduced populations in Malaysia and Indonesia. With a higher mean N A of multi-allelic SSR, the marker system had more advantages in explaining within-cluster variation than inter-clusters ( Figure 2C,D). Previous studies reported that SSR was far more informative than SNP for population structure inference, where a randomly chosen set of SSRs would have 4-12 times greater informativeness than a random chosen set of SNPs [60,61]. For example, a total of 20 SSR markers were reported as useful for measuring gene heterozygosity and elucidating the population genetic structure of the adzuki bean weevil [62]. However, as mentioned earlier, SNP has a higher genotyping throughput compared to SSR, to compensate for the shortcoming. This feature enabled the assayed 220 SNP to confer better PCA clustering among the three groups, with reduced within-group variation and increased between-group distance. Similarly, a total of 239 SNP of red mangrove species effectively provided greater resolution on the population structure compared to the 8 SSR markers [63].
From the UPGMA phenogram, the cluster of the introduced populations in Malaysia and Indonesia was more closely related to CAM than GHN, which corresponded to the introduction event of the weevil from Cameroon to Southeast Asia. Furthermore, separate genetic clustering between Ghana, Central Africa, and Cameroon, West Africa, is in agreement with that obtained in a previous study reporting the difference in mitochondrial genetic clustering of E. kamerunicus between Central and West Africa [21].
Interestingly, 11 weevil populations indicated excessive SNP-based heterozygosity (F is < 0), suggesting an absence of inbreeding in these populations. These findings might further support those of previous studies. Morphometric measurements of E. kamerunicus in Malaysia and Indonesia were not significantly different from those of the native Liberia weevils [64]. The population abundance of E. kamerunicus has also been reported to be stable [65]. It is worth noting that both SNP and SSR revealed unexpectedly poorer PIC and H o in native GHN and CAM populations compared to the introduced populations, leading to a homozygosity deficit (F is > 0) in the CAM population. One possible reason for this was non-random sampling, where these samples were collected from a limited number of palms, thereby not representing the heterozygosity of the real populations. Hence, random sampling from a wider range of the native population is recommended in future studies.
On the other hand, SSR-based F is showed the total opposite results, with all populations recording positive values, suggesting excess homozygosity which can be partly due to the presence of null alleles [66]. Unfortunately, null alleles with high NAF > 0.2 were present in half of the assayed SSR (Table 2). Despite the reduction in the adjusted SSR-based F is , most values remained positive across the populations (Table 3). A null allele event, which was first reported in 1993, can occur when alleles fail to amplify due to the mutated annealing sites of SSR primers, leading to incorrect scoring of two different alleles as one allele [67]. This means that the genotyping of the samples may appear homozygous rather than heterozygous for the null allele. Certainly, underestimation of the heterozygosity parameter, such as Wright's inbreeding estimates, can arise [68][69][70]. For instance, the estimated F is of the oriental fruit moth populations derived from 10 SSR with high NAF was reported to be two to three times higher than the estimation derived from another set of 10 SSR with low NAF [71]. In this study, it is suggested that the existence of null allele in SSR has led to overestimation of homozygosity of assayed weevil populations, hence contributing to the positive values of SSR-based F is . Consequently, the generated positive F is does not correctly represent inbreeding occurrence in both native and introduced E. kamerunicus populations.
The presence of null alleles predicates the need to redesign the primers [70], but that was not feasible in this study. The length of the flanking region of RAD tag-SSR was extremely limited. Therefore, RAD tag-derived SNP was clearly more advantageous over RAD tag-derived SSR markers. The sequencing of the weevil genome as a reference is recommended to minimize the presence of null alleles, as illustrated in the Asian honeybee [72]. The SSR markers developed in this study can be deployed across other oil palm-pollinating weevil, especially from the Elaeidobius genus, i.e., E. plagiatus, E. singularis, E. bilineatus, E. subvittatus, and E. spatilifer. Closely related species would be expected to share more conserved sequence sites [73,74]. If the sequence is conserved between the model and the species of interest, the primers that flank the conserve regions will have a higher probability of amplifying DNA fragments in the species of interest [73]. Many successful cross-taxa applications of DNA markers, such as planthopper Laodelphax striatellus, with its related taxa [75], Diabrotica beetles [76], and oil palm Elaeis guineensis, with its close relative E. oleifera [77], were reported.
These 220 informative SNP markers can be deployed as a PCR-based genotyping method, such as Seq-SNP (LGC Genomics, Berlin, Germany), Kompetitive Allele Specific PCR (KASP) (LGC Genomics, Hoddesdon, UK) [78] and TaqMan ® (Applied Biosystem, Massachusetts, USA) to obtain consistent and high call rates for more reliable results in future studies. In addition, PCR-based SNP markers can be cost-effective when genotyping a large number of samples. The Seq-SNP genotyping takes 3 days and costs USD 7 for each DNA sample genotyped with 163 SNPs in 261 cucumber varieties [25]. This feature is crucial for other researchers to extend similar surveys in different areas of interest and eventually, to form a comprehensive genetic diversity profile of weevil populations across Africa and Southeast Asia.

Conclusions
The discovery of nuclear DNA markers for E. kamerunicus using RAD tag sequencing was promising. The applicability of informative SNP in the genetic diversity assessment of weevil populations was observed to be superior to SSR. The main limitation of SSR is the null allele, which is common in RAD tag sequences. The presence of null allele in SSR markers from this study has led to an upward bias in inbreeding coefficient estimates. The inbreeding coefficient that was estimated using SNP-based data has recorded negative values in introduced E. kamerunicus populations indicating inbreeding is not being observed for now. For further confirmation, we propose expanding the survey to more populations across Malaysia, Indonesia, and Papua New Guinea using the developed specific nuclear DNA markers. Collaborative studies can be performed among industry players in oil palmgrowing countries. This would obtain a more comprehensive genetic diversity concerning weevil populations in these regions. The information will be used to create guidelines regarding conservation planning and sustainability monitoring of this important pollinator species, from the estate to the national level.