Characterization of Potential Molecular Markers in Lac Insect Kerria lacca (Kerr) Responsible for Lac Production

Simple Summary Lac insect Kerria lacca (Hemiptera, Kerriidae) is widely used to produce lac in tropical and subtropical areas. This species is genomically understudied and there is no information present on single-nucleotide polymorphisms and insertions and deletions of lac insects. This study involves distribution and characterization of molecular markers in K. lacca. Nucleotide substitution frequency and transition to transversion ratio were highest at the late adult stage. The maximum numbers of SNPs and InDels were distributed on chromosome 8 and 1, respectively. Most variants at low frequencies were detected in InDels compared to SNPs. Gene function analysis showed several gene variations, and candidate SNPs that were involved in lac synthesis. This work paves the way to determining further mechanisms involved in increased lac production. Abstract Kerria lacca (Kerr) is an important lac insect extensively used in industrial products in the form of resin, wax and dye. The scarce knowledge on molecular markers for K. lacca is a barrier in elucidating genetic information. Our study identified a total of 16,921 single-nucleotide polymorphisms (SNPs), and 6231 insertions and deletions (InDels)—of which, intergenic variation accounted for 41.22% and 56.30%, and exonic variation accounted for 39.10% and 17.46%, of SNPs and InDels, respectively. Observation of SNPs suggested that nucleotide substitution frequency and transition to transversion (Ts/Tv) ratio were highest at the late adult stage, 3.97, compared to at the other stages, with a genome-wide Ts/Tv ratio of 2.95. The maximum number of SNPs, 2853 (16.86%), was identified in chromosome 8, while the lowest, 1126 (6.65%), was identified in chromosome 7. The maximum and minimum numbers of InDels were located on chromosome 1 and 7, with 834 (13.38%) and 519 (8.33%), respectively. Annotation showed that highest numbers of exonic and intergenic SNPs were present at the late adult stage, whereas the maximum number of InDels was found at the larval stage. On the basis of gene function, 47 gene variations were screened and 23 candidate genes were identified in associations with lac production. Concluding work will enhance knowledge on molecular markers to facilitate an increase in lac production in K. lacca as well as other lac insects.


Introduction
Lac insects are known as solitary animals with significant ability to produce natural resin [1]. The natural product produced is called lac, which is composed of three main components-lac resin, wax, and dye [2]. These components have important applications in cosmetics, food, textiles, pharmaceutical industries, surface coating, and electrical industries [3]. Lac is widely used in several industries due to properties including moisture resistance, non-toxic, strong adhesiveness, tasteless, good insulation, and smooth film texture [4]. Lac resins are a combination of lactides and lactones made of polyterpene esters and hydroxyl fatty acids [5]. In cytoplasm, mevalonate (MVA) is a biosynthetic pathway in the formation of polyterpene esters [6,7]. The pathway involves conversion of acetyl-CoA into mevalonic acids through the action of hydroxylmethylglutaryl-CoA synthase and reductase. Further, mevalonic acid is converted into sesquiterpene acids through the action of terpene synthases (TPS) followed by conversion of sesquiterpene acids to sesquiterpene acid esters [8]. Hydroxyl fatty acids, the second components, are formed through fatty acid synthesis by passing through different steps catalyzed by fatty acid desaturases (FAD), elongases of very long chain fatty acids (ELO), and fatty acid synthetases (FAS). Another important constituent of lac resins is aleuritic acid (−9, 10, 16-erythro aleuritic acid), which is produced from acetyl CoA, and fatty acid synthases [9].
Lac insects are members of the genus Kerria (Hemiptera, Kerriidae) [10], with 29 species worldwide [11], of which K. lacca species is most widely cultivated for commercial lac production [12]. Kerria lacca is well distributed in China, India, Pakistan, Azerbaijan, Bangladesh, Georgia, Guyana, Malaysia, Myanmar, Nepal, Sri Lanka, and Thailand, and reported on more than one hundred host plants [13,14]. Females and males of K. lacca showed sexual dimorphism [15], and both produce and secrete remarkably different proportions of lac during their life cycle [4]. The life cycle of female lac insects is approximately 6-8 months and involves eggs, nymph instars and the adult stages ( Figure 1) [16]. Female larvae and males produce small amounts of lac, whereas adult females (in the early, mid and late adult stages) are capable of secreting huge amounts of lac [17].
Insects 2022, 13, x FOR PEER REVIEW 2 of 12 resistance, non-toxic, strong adhesiveness, tasteless, good insulation, and smooth film texture [4]. Lac resins are a combination of lactides and lactones made of polyterpene esters and hydroxyl fatty acids [5]. In cytoplasm, mevalonate (MVA) is a biosynthetic pathway in the formation of polyterpene esters [6,7]. The pathway involves conversion of acetyl-CoA into mevalonic acids through the action of hydroxylmethylglutaryl-CoA synthase and reductase. Further, mevalonic acid is converted into sesquiterpene acids through the action of terpene synthases (TPS) followed by conversion of sesquiterpene acids to sesquiterpene acid esters [8]. Hydroxyl fatty acids, the second components, are formed through fatty acid synthesis by passing through different steps catalyzed by fatty acid desaturases (FAD), elongases of very long chain fatty acids (ELO), and fatty acid synthetases (FAS). Another important constituent of lac resins is aleuritic acid (−9, 10, 16-erythro aleuritic acid), which is produced from acetyl CoA, and fatty acid synthases [9].
Lac insects are members of the genus Kerria (Hemiptera, Kerriidae) [10], with 29 species worldwide [11], of which K. lacca species is most widely cultivated for commercial lac production [12]. Kerria lacca is well distributed in China, India, Pakistan, Azerbaijan, Bangladesh, Georgia, Guyana, Malaysia, Myanmar, Nepal, Sri Lanka, and Thailand, and reported on more than one hundred host plants [13,14]. Females and males of K. lacca showed sexual dimorphism [15], and both produce and secrete remarkably different proportions of lac during their life cycle [4]. The life cycle of female lac insects is approximately 6-8 months and involves eggs, nymph instars and the adult stages ( Figure 1) [16]. Female larvae and males produce small amounts of lac, whereas adult females (in the early, mid and late adult stages) are capable of secreting huge amounts of lac [17]. Knowledge on molecular markers is always a primary need for genomic research [18]. In terms of DNA markers, single-nucleotide polymorphisms (SNPs) and insertions and deletions (InDels) are the most valuable markers [19]. These markers (SNPs and In-Dels) are associated with protein-coding genes and their pertinent translated regions [20], Knowledge on molecular markers is always a primary need for genomic research [18]. In terms of DNA markers, single-nucleotide polymorphisms (SNPs) and insertions and deletions (InDels) are the most valuable markers [19]. These markers (SNPs and InDels) are associated with protein-coding genes and their pertinent translated regions [20], and are expedited to understand their link to gene function [21]. Several studies were carried out on lac insects in the terms of ecology, biology, genetic diversity, microbial diversity, host plants, molecular phylogeny, biogeography, and transcriptome [1,4,14,22,23]. However, K. lacca is also genomically understudied insect, and there are no data available on SNPs and InDels of lac insects. Therefore, the present study involves the use of K. lacca for genome-wide identification of DNA variants through an Illumina NovaSeq. Potential variants (SNPs and InDels) were studied to develop and enlist lac synthesis genes to predict and observe its effect using genomic location. The linked variants will be further used to facilitate genotype studies and for marker-assisted selection for lac synthesis.

Insect Material
Kerria lacca was reared on host plant Schleichera oleosa in the Research Station of Yuanjiang, Institute of Highland Forest Science, Yunnan, China (102 • 00 46 E, 23 • 36 11 N). Female lac insects at the larvae (n = 100), early adult (n = 100), mid adult (n = 100) and late adult (n = 100) stages were collected. Immediately, all samples were put in liquid nitrogen, then transferred and stored at −80 • C. The samples were used for genomic DNA and variation studies as follows.

Sequencing and Variant Identification
Total genomic DNA was extracted from 1 g/sample (3 biological replication per stage) using an insect DNA Extraction Kit, Zymo Research (Irvine, CA, USA), following the manufacturer's instructions. DNA quality was assessed using a NanoDrop (ND2000) spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA), and extracted DNA was stored at −20 • C for future use and library preparation. The whole-genome library was prepared using a library preparation kit Nextera Flex Library Prep Kit (Illumina, San Diego, CA, USA) according to the manufacturer's instructions, with an average insert size of 150 bp, and was sequenced on the Illumina NovaSeq 6000 at Benagen Technology Co. Ltd. (Wuhan, China). The raw data were quality controlled using FastQC (version 0.20.1) [24] and clean reads were aligned to the reference genome of the K. lacca species with default parameters. SAMtools (version 1.11) was used to delete the duplicate reads to discover and filtrate variants [25] to sort, remove duplicated reads and merge the bam alignment results of each sample. GATK (version 4.2.0.0) [26] software was used to detect variation on data to obtain high-quality SNP and InDel information, with these parameters (SOR > 4.0; QD < 2.0; MQ < 40.0; FS > 60.0; DP < 10.0; ReadPosRankSum < −8.0).

Annotation of Single-Nucleotide Polymorphisms and Insertion/Deletion Polymorphism
SnpEff (version 4.3) was used to determine the putative effect of DNA variants [27]. The SnpEff tool was used to annotate a variant call format (VCF) file containing SNPs and InDels using default parameters, and variants were classified as genic and intergenic according to their genomic location. Variants in intergenic and intron regions are classified as modifier impact. Variants in coding genic regions can have two effects: a moderate to low impact or a high impact. Moderate-to low-impact variants (e.g., synonymous variants) are thought to be mainly innocuous or unlikely to affect protein behavior, but a non-disruptive variant that may affect protein efficacy is termed as moderate impact (e.g., inframe deletion and non-synonymous variant). High-impact variants (e.g., frameshift variant and stop gained) may cause protein truncation or loss of function [27,28].

Identification of Lac Production-Related Genes with Linked Variants
Based on the lac secretion-minimum (larvae) and lac secretion-active (early, mid, and late adult) stages, diverse variations (SNPs and InDels) were analyzed. We selected differentially expressed genes (DEGs) during the lac production-minimum and -maximum stages including fatty acid metabolism, fatty acid biosynthesis, fatty acid elongation, and terpenoid backbone biosynthesis. We listed lac-related genes as those identified with high expression during the lac production-maximum stage. The protein sequences of DEGs were used as queries to identify genes with related functions. We explored the association with SNPs and InDels variants from selected lac-related genes. The genes linked to variants were classified into four categories: fatty acid metabolism genes, fatty acid biosynthesis genes, fatty acid elongation-related genes, and terpenoid backbone biosynthesis genes. The expression profiles of selected SNPs were indicated as heatmaps through TBtools software (version 0.665) [29].

SNP Substitutions at Different Stages of Kerria Lacca
On the basis of nucleotide substitutions, SNPs were grouped as transitions (purine to purine and pyrimidine to pyrimidine) and transversions (purine to pyrimidine and pyrimidine to purine). We found 16,640 transitions and 5637 transversions, with a genomewide transition to transversion ratio (Ts/Tv) of 2.95. Study of SNPs showed that the frequency of nucleotide substitution and Ts/Tv ratio were greater at the late adult stage (3.97), compared to at the mid adult stage (2.74), the early adult stage (2.64) and the larval stage (2.49) ( Table 2 and Table S1). Most SNPs were of the A/G (38.05%) type followed by T/C (36.64%) ( Figure 2C). Transversion substitutions of SNPs consisted of T/A (10.22%) and A/C (5.68%), followed by G/T (5.82%) and G/C (3.58%) ( Figure 2C).

SNP and InDel Annotation
A total of 22,175 SNPs were collected at the larvae, early adult, mid adult and late adult stages of K. lacca. SNP annotation revealed that more SNPs were located at upstream sites at the late adult stage (243 SNPs), followed by at the early adult (225 SNPs), mid adult (146 SNPs), and larvae (145 SNPs) stages. The highest number of SNPs were present at exonic sites at the late adult stage (3250 SNPs), the lowest at the mid adult stage (1706 SNPs), and the larvae and early adult stages contained 1970 SNPs and 2914 SNPs, respectively. The early adult stage contained more SNPs at intron sites and the lowest number

SNP and InDel Annotation
A total of 22,175 SNPs were collected at the larvae, early adult, mid adult and late adult stages of K. lacca. SNP annotation revealed that more SNPs were located at upstream sites at the late adult stage (243 SNPs), followed by at the early adult (225 SNPs), mid adult (146 SNPs), and larvae (145 SNPs) stages. The highest number of SNPs were present at exonic sites at the late adult stage (3250 SNPs), the lowest at the mid adult stage (1706 SNPs), and the larvae and early adult stages contained 1970 SNPs and 2914 SNPs, respectively. The early adult stage contained more SNPs at intron sites and the lowest number of SNPs at the larval stage and the mid adult stage. More SNPs were located at downstream sites at the late adult stage (737 SNPs), followed by at the early adult (720 SNPs), larvae (536 SNPs), and mid adult stages (480 SNPs). The highest number of SNPs were present at intergenic sites at the early adult stage (1679 SNPs), and the lowest at the mid adult stage (1095 SNPs), while the larvae and late adult stages contained 1213 SNPs and 1658 SNPs, respectively ( Figure 3C and Table S2).
A total of 7404 InDels were collected at the larvae, early adult, mid adult and late adult stages of K. lacca. InDel annotation revealed that more InDels were located at upstream sites at the late adult stage (90 InDels), followed by at the mid adult (84 InDels), larvae (82 InDels), and early adult stages (77 InDels). The highest number of InDels were present at exonic sites at the larval stage (650 InDels), and the lowest at the late adult stage (590 InDels), while the early and mid adult stages contained 604 InDels and 597 InDels, respectively. The larval stage contained more InDels at intron sites and the lowest at the late adult stage. More InDels were located at downstream sites at the larval stage (307 InDels), followed by at the mid adult (295 InDels), early adult (290 InDels), and late adult stages (270 InDels). The highest number of InDels were present at intergenic sites at the larval stage (541 InDels), and the lowest at the late adult stage (426 InDels), while the early and mid adult stages contained 474 InDels and 448 InDels, respectively ( Figure 3D and Table S3).

Functional Classification Variants
According to genomic position, collectively, 20,092 effects were predicted based on 8498 DNA variants. The effect of one DNA variant can be considered as on multiple gene (e.g., a variant might be upstream one gene while downstream from another gene). SNPs and InDels caused a total of 14,736 and 5356 effects, respectively ( Figure 4A). The effects of the variants were classified into three categories: modifier (15,017), moderate to low (4471), and high or strong (604) impact. The SNP effects of the variants were 10,268, 4456, and 12 as modifier, moderate to low and high impact, respectively. The InDel effects of the variants were 4749, 15, and 592 as modifier, moderate to low and high impact, respectively ( Figure 4A). InDels), followed by at the mid adult (295 InDels), early adult (290 InDels), and late adult stages (270 InDels). The highest number of InDels were present at intergenic sites at the larval stage (541 InDels), and the lowest at the late adult stage (426 InDels), while the early and mid adult stages contained 474 InDels and 448 InDels, respectively ( Figure 3D and Table S3).

Functional Classification Variants
According to genomic position, collectively, 20,092 effects were predicted based on 8498 DNA variants. The effect of one DNA variant can be considered as on multiple gene (e.g., a variant might be upstream one gene while downstream from another gene). SNPs and InDels caused a total of 14,736 and 5356 effects, respectively ( Figure 4A). The effects of the variants were classified into three categories: modifier (15,017), moderate to low (4471), and high or strong (604) impact. The SNP effects of the variants were 10,268, 4456, and 12 as modifier, moderate to low and high impact, respectively. The InDel effects of the variants were 4749, 15, and 592 as modifier, moderate to low and high impact, respectively ( Figure 4A).  The variants with a high impact had a direct effect on gene function, and we observed a total of 12 SNPs and 592 InDels with this effect. This is mostly caused in SNPs by stop codon gain or loss ( Figure 4B) and may have important functional significance. However, in InDels, this is produced by a disturbance in translational reading frame and may lead to improper amino acid sequences and, ultimately, abnormal proteins. A total of 4456 and 15 moderate-to low-impact variants were identified for SNPs and InDels, respectively. In SNPs, moderate impact is due to non-synonymous substitution, which results in a change in one amino acid; and in InDels, this occurs in coding regions ( Figure 4C). Low-impact SNPs are largely contained synonymous substitutions without any change in amino acid ( Figure 4C). A total of 10,268 SNPs and 4749 InDels were observed with modifier-impact variants, and predicted in non-coding regions ( Figure 4D).

Identification of Lac Synthesis Genes
For identification of lac-related genes, we analyzed variations (SNPs and InDels) in the lac secretion-minimum and lac secretion-active stages of the insect. We selected 47 DEGs during the lac production-minimum and -maximum stages determined through gene expression in fatty acid metabolism and terpenoid backbone biosynthesis. From 47 selected lac-related genes, a total of 30 genes were associated with 114 variants (74 SNPs and 40 InDels) (Tables S4 and S5). Of these variants, 44 are located in intergenic regions and 19 variants down and upstream of the genes. Only 40 variants are located inside the genes, including 24 in introns and 16 in exons. Of 30 genes, 20 were associated with both SNPs and InDels, while 13 and 7 variants were specifically associated with SNPs and InDels only, respectively. The 23 lac-related genes with linked variants were categorized into four groups, of which 6 genes were related to terpenoid backbone biosynthesis, 5 to fatty acid metabolism, 6 to both fatty acid biosynthesis and metabolism, and 6 to both fatty acid elongation and metabolism ( Table 3). The expression of these SNP variation genes at different developmental stages of K. lacca is shown in Figure 5.

Discussion
A total of 16,921 single-nucleotide polymorphisms (SNPs) and 6231 insertions and deletions (InDels) were obtained in this study, of which the maximum numbers of variations were found at chromosomes number 8 (2853 SNPs) and 1 (834 InDels), whereas the lowest numbers of variations were located at chromosomes number 7 (1126 SNPs and 519 InDels). The frequency of SNPs and the Ts/Tv ratio were higher at the late adult stage, compared to at the larvae, early and mid adult stages, revealing a trend of genomic con-

Discussion
A total of 16,921 single-nucleotide polymorphisms (SNPs) and 6231 insertions and deletions (InDels) were obtained in this study, of which the maximum numbers of variations were found at chromosomes number 8 (2853 SNPs) and 1 (834 InDels), whereas the lowest numbers of variations were located at chromosomes number 7 (1126 SNPs and 519 InDels). The frequency of SNPs and the Ts/Tv ratio were higher at the late adult stage, compared to at the larvae, early and mid adult stages, revealing a trend of genomic conservation at different stages during evolution. Transversion substitution and frequency of occurrence of K. lacca showed similar trends to that of Asian hornet insect Vespa velutina [30] and Pygmy grasshopper Tetrix japonica [31]. Single-copy variants are called singleton alleles and two-copy variants are known as doubleton alleles [28]. Most variants in K. lacca at low frequencies are detected in InDels (2132 singleton) compared to SNPs (1810 singleton). The highest numbers of SNPs (86) were found in only one Kerria_0015610.1 (carbohydrate metabolism) unigene, which was involved in RGN gluconolactonase. The maximum numbers of InDels 11, 12, 12, and 12 were found in the genes Kerria_0038410.1 (inositoltriphosphate 3-kinase), Kerria_0031000.1 (hypothetical protein), Kerria_0016260.1, and Kerria_0018720.1 (DNA polymerase epsilon subunit 1), respectively.
Based on the lac secretion-maximum (early, mid and late adult) and -minimum (larvae) stages and gene expression level, we screened 23 candidate SNPs possibly involved in lac production. These SNPs were included in terpenoid backbone biosynthesis as decaprenyl-diphosphate synthase, farnesyl pyrophosphate synthase, farnesyl diphosphate synthase, acetyl-CoA acetyltransferase, and hydroxymethylglutaryl-CoA reductase. As sesquiterpene acid is the main component of lac [17], SNPs concerned in terpenoid synthesis may take part in the production of sesquiterpene acids. SNPs were involved in fatty acid metabolism: acyl-CoA dehydrogenase, acyl-coenzyme A oxidase, acyl-CoA delta (11) desaturase, and very long chain fatty acid-elongase; SNPs included in both fatty acid metabolism and fatty acid biosynthesis: fatty acid synthase, long-chain acyl-CoA synthetase, and S-malonyltransferase; SNPs involved in both fatty acid metabolism and fatty acid elongation: elongation-very long chain fatty acids-protein-4, 6, 7, and 3-ketoacyl-CoA thiolase. Hydroxyl fatty acid is also a part of lac [4]; hence, SNPs included in fatty acid synthesis may participate in the production of hydroxyl fatty acids. Wang et al. (2019) found 28 candidate genes that were involved in lac secretion; of 23 candidate SNPs, 9 have a similar function including decaprenyl-diphosphate synthase, acyl-CoA dehydrogenase, acyl-CoA synthetase, and fatty acid synthase.
In the current era, the use of molecular markers plays an important role in understanding and mapping of candidate genes, ecology and genetic diversity [32]. SNP and InDel markers displayed more potential due to high reproducibility, number of alleles per locus and polymorphic nature [32][33][34]. The limitation is the high cost in developing these markers followed by scarce information on genomes in some species including Kerria lacca. The presence of SNPs and InDels positively affected the lac-related genes of K. lacca to study associated diversity, mapping, and marker-assisted selection. Based on this information, the present study started with DNA variants to search associated lac-related genes with backbone biosynthesis and fatty acid metabolism.

Conclusions
In summary, we characterized the molecular markers (SNPs and InDels) of K. lacca and revealed that there are 3-fold more linked SNPs compared to InDels. Based on the lac secretion-maximum and -minimum stages and gene expression level, few SNPs were involved in lac synthesis. Overall, the presence of significant variations in K. lacca genes paves the way to determining further mechanisms involved in increased lac production.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/insects13060545/s1, Table S1: Summary of SNPs at different stages of Kerria lacca; Table S2: SNP annotation at different stages of Kerria lacca; Table S3: InDel annotation at different stages of Kerria lacca; Table S4: Genes associated with SNP variants; Table S5: Genes associated with InDel variants. Data Availability Statement: All raw sequencing related to our manuscript is deposited to the public database NCBI under accession number ON687507-ON687529.