Genome and Genetic Engineering of the House Cricket (Acheta domesticus): A Resource for Sustainable Agriculture

Abstract Background: The house cricket, Acheta domesticus, is one of the most farmed insects worldwide and the foundation of an emerging industry using insects as a sustainable food source. Edible insects present a promising alternative for protein production amid a plethora of reports on climate change and biodiversity loss largely driven by agriculture. As with other crops, genetic resources are needed to improve crickets for food and other applications. Methods: We present the first high quality annotated genome assembly of A. domesticus from long read data and scaffolded to chromosome level, providing information needed for genetic manipulation. Results: Gene groups related to immunity were annotated and will be useful for improving value to insect farmers. Metagenome scaffolds in the A. domesticus assembly, including Invertebrate Iridescent Virus 6 (IIV6), were submitted as host-associated sequences. We demonstrate both CRISPR/Cas9-mediated knock-in and knock-out of A. domesticus and discuss implications for the food, pharmaceutical, and other industries. RNAi was demonstrated to disrupt the function of the vermilion eye-color gene producing a useful white-eye biomarker phenotype. Conclusions: We are utilizing these data to develop technologies for downstream commercial applications, including more nutritious and disease-resistant crickets, as well as lines producing valuable bioproducts, such as vaccines and antibiotics.


Introduction
Multiple reports demonstrate that the destruction of natural habitats and pollution from human activity, largely a result of land clearing for agriculture and climate change, have led to substantial global biodiversity loss and mass extinction [1][2][3][4]. Threats to biodiversity are also threats to all life on earth, including humans [2][5][6][7]. With historic levels of biodiversity loss and an increasing human population, it is critical to reduce the consumption of natural resources from earth and its ecosphere. The UN expects the human population to grow to nearly 10 billion by 2050 [8], and food demand is projected to increase up to 62% [9]. Climate change, reduced productivity of agricultural lands, overfishing, dwindling freshwater, pollution from fertilizers and pesticides, and a host of other factors resulting from population increase will place a disproportionate burden on
Table 1. Metrics of the A. domesticus genome assembly. Data included the CANU draft assembly from PacBio long read data, and the scaffolded assembly from Chicago and Hi-C long range data.
An in silico genome annotation revealed 29,304 predicted genes in 1064 scaffolds, concentrated in the 11 large scaffolds (84%). We compared the length of the BUSCO reference genes from A. domesticus to those from a model coleopteran genome, Tribolium castaneum (Figure S2). We found that 84% of the reference genes in A. domesticus were longer than those in T. castaneum, some more than 80-fold longer. The longer length of A. domesticus genes was due mostly to longer introns.

There were 6246 scaffolds in the A. domesticus genome assembly that were identified as metagenome sequences using multiple automated and manual annotations. Metagenomic scaffolds were from 1013 to 512,394 bp, and one larger scaffold, Ado_MfTpo_scaffold_8829, with 3,776,750 bases and 218 genes was identified as Acinetobacter baumannii; these scaffolds were submitted to NCBI as A. domesticus-associated metagenome data. Approximately 83% of the metagenome scaffolds had similarity to Invertebrate Iridescent Virus 6 (IIV6, File S3, "Ado_metagenome_Kraken2"). Bacterial scaffolds were mostly from the classes Gammaproteobacteria, Flavobacteriia, Sphingobacteriia, Fusobacteriia, Betaproteobacteria, Alphaproteobacteria, Actinomycetia, and Bacilli. However, follow-up screening of unclassified reads using blastn [50] and NCBInr found an additional 913 potential metagenome scaffolds (File S3, "Blastn_unclassified"). The final scaffolds were identified as mostly IIV6 and bacteria, but also a smaller number of scaffolds were fungi, protozoa, and nematode (File S3, "Metagenome_summary"). Of note, there were 27 scaffolds from Wolbachia similar to those described from other insects. In our analysis, we did not find evidence of common foodborne pathogens, such as Salmonella spp., and the few scaffolds aligning to Listeria sp., Escherichia coli or Staphylococcus sp. were insufficient to confirm the presence of these pathogens.
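The per-taxon proportions reported above can be illustrated with a short tally over classifier output. The following is a minimal sketch, not the pipeline used in this study: it assumes Kraken2's default tab-delimited output (classification flag, sequence ID, taxonomy ID, length, k-mer mappings), and the function name, scaffold names, and taxonomy IDs are hypothetical placeholders.

```python
from collections import Counter


def tally_kraken2(lines):
    """Return {taxid: fraction of classified sequences}.

    Assumes Kraken2's default output columns:
    C/U flag, sequence ID, taxonomy ID, length, k-mer LCA mappings.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "C":
            continue  # skip unclassified ("U") or malformed rows
        counts[fields[2]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxid: n / total for taxid, n in counts.items()}


# Toy input only: scaffold names and taxonomy IDs are made up for illustration.
toy_lines = [
    "C\tscaffold_a\t111\t1500\t111:10",
    "C\tscaffold_b\t111\t2100\t111:22",
    "C\tscaffold_c\t222\t1800\t222:9",
    "U\tscaffold_d\t0\t1200\t0:5",
]
fractions = tally_kraken2(toy_lines)
for taxid, frac in sorted(fractions.items()):
    print(f"taxid {taxid}: {frac:.1%} of classified scaffolds")
```

Unclassified rows are excluded from the denominator here; a follow-up search such as the blastn screening described above would handle those separately.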
A closer examination of the predicted proteins from the virus scaffolds indicated that only seven were assigned as "high-quality" viral genomes, and 25 scaffolds were determined to be viral genomes of "medium-quality" as determined by CheckV [51] (File S4). Although many shorter scaffolds were not aligned to the IIV6 genome, many of the predicted genes in these scaffolds were annotated as viral. Therefore, these scaffolds are likely to be viral, and their quality scores may be affected by database integrity, sequence read quality, or assembly artifacts. These sequences also may represent new viruses. Other sequences were classified as low-quality matches or undetermined genomes.
Interestingly, one medium quality scaffold (Scaffold_4233) was similar to a circular virus (DTR_035638) from the metagenome in the reduced digestive system of an oligochaete, Olavius albidis, a marine gutless worm related to another Olavius spp. with a symbiont metaproteome that evidently contributes to nutrient metabolism [52]. This scaffold was longer (8979 bp) than that of the circular virus (5878 bp).
Regarding immunity genes, conserved sequences involved in insect immunity were used to identify orthologs in the A. domesticus genome assembly. Genes related to conserved cecropins and defensins were not found in the assembly and have not been identified in the literature from any orthopteran insect to date. However, there were two genes identified as peptidoglycan recognition protein (PGRP) (File S5). Three transcripts found in a previous transcriptome assembly [22] corresponded to these PGRP genes (Figure 2). Only one PGRP transcript appears to be expressed at low levels in the embryo. Expression of all PGRPs, especially ci_142522_1, ramps up during the first week of development and gradually decreases throughout the nymphal stages. PGRPs are expressed at higher levels in female than in male adult A. domesticus.
Figure 2. Expression profiles of three potential PGRP transcripts from A. domesticus developmental stages or male and female adults. The figure legend has the identification of transcripts from a previous transcriptome assembly [22].
There were four genes annotated as lysozymes in the A. domesticus assembly (File S5). Contigs cl_658063 (five isoforms) and cl_660380_1 (two isoforms), corresponding to ANN24139-RA and ANN24140-RA/ANN24139-RA, respectively, were the most highly expressed putative lysozyme transcripts in the transcriptome assembly [22]; ANN19980-RA transcripts were expressed at low levels (Figure 3B). Lysozyme transcripts were more highly expressed in 1- and 2-week nymphs and in male and female adults.
The A. domesticus genome assembly contained nine genes annotated as prophenoloxidase (PPO) (File S5). Contig cl_93919 (nine isoforms), corresponding to all nine genes, was more highly expressed throughout life stages (Figure 3C). However, cl_672515 (three isoforms), corresponding to ANN15897-RA, and cl_195343, corresponding to ANN15899-RA, were more highly expressed in newly hatched larvae.
Regarding repetitive sequences, we annotated repetitive elements of the newly assembled genome of A. domesticus along with those in previously sequenced orthopteran genomes. We compared the repetitive elements in the genome assemblies of A. domesticus to those in other crickets (Apteronemobius asahinai, Gryllus bimaculatus, Laupala kohalensis, Teleogryllus occipitalis, T. oceanicus) and a locust, Locusta migratoria. This approach identified repetitive elements ranging from 33.00% of the G. bimaculatus genome to 58.13% of the L. migratoria genome (Table 2). Among the cricket genomes, A. domesticus had the highest overall content of repetitive elements at 49.42%. Among the major classes of the repetitive elements, DNA transposons were the most abundant element in the A. domesticus genome, accounting for 8.26% (Figure 4A). The major classes of repetitive elements in the cricket genomes examined were not uniformly present but varied by species. For example, among the cricket genomes, LINEs were most abundant in L. kohalensis at 13.4%, while G. bimaculatus and T. oceanicus had only 3.34% and 4.04% LINEs, respectively. Because repetitive elements, in particular transposable elements (TEs), are well-known contributors to genome size expansion in a wide range of insect species [53,54], we further examined the relative contribution of repetitive elements to the genome sizes of the orthopteran species. There was a significant correlation between total contents of repetitive elements and genome sizes in Orthoptera species (p = 0.024 and R² = 0.61) (Figure S6a). Because the large genome of L. migratoria may dominate this analysis, we excluded the L. migratoria data and found no correlation (p = 0.166 and R² = 0.27) (Figure S6b). We then dissected the repetitive elements into the major classes of TEs (i.e., SINE, LINE, LTR, DNA, and rolling circle) and investigated their contributions to genome size. As a result, we found that LINE, LTR elements, and DNA transposons may contribute to the expansion of genome size in Orthoptera (LINE, p < 0.001 and R² = 0.63; LTR, p < 0.001 and R² = 0.58; DNA, p < 0.001 and R² = 0.94) (Figure 4B). When L. migratoria data were excluded, we found that the contents of LTR elements and DNA transposons were positively correlated with genome sizes (LTR, p < 0.001 and R² = 0.43; DNA, p < 0.001 and R² = 0.50) (Figure S6a).
[Figure 4 caption, partial] The Kimura substitution levels (x-axis) show sequence divergence, or "TE age", for the major classes. The classes that are skewed to the left (less sequence divergence) are estimated to have more recently diversified histories than the classes that are skewed to the right.
We next studied associations among intron length, genome size, and repeat content in Orthoptera, including A. domesticus, because they have been reported to be mutually correlated [55]. Median length of the A. domesticus gene model was 602 bp, which was about one-tenth of that of L. migratoria (Table 2). We found that total contents of repetitive elements and median length of introns in orthopteran species, including locust, were positively correlated (p = 0.044 and R² = 0.51), but not within the cricket species (p = 0.791 and R² < 0.01) (Figure S6b). Further, correlations between intron length and major TE contents were positive for LINE, LTR elements, and DNA transposons in Orthoptera (LINE, p < 0.001 and R² = 0.70; LTR, p < 0.001 and R² = 0.49; DNA, p < 0.001 and R² = 0.89) (Figure 4C), but not for SINE and rolling circle. After limiting the analysis to crickets only, these correlations no longer held except for DNA transposons (DNA, p = 0.034 and R² = 0.05) (Figure S6b). There also was a significant and strong positive correlation between the median length of introns and genome size in orthopteran species (p < 0.001 and R² = 0.97) (Figure S6c). However, when the analysis was limited to crickets, genome size did not correlate with intron length (p = 0.949 and R² < 0.01). Furthermore, a more detailed examination of the classes of TEs in Orthoptera revealed a positive correlation with intron length and genome size for LINE/CR1, LINE/L1, LINE/Penelope, LINE/RTE, DNA/hAT, DNA/Kolobok, and DNA/TcMar (Table S7). These results suggest that repetitive elements, especially several classes of TEs, such as LINE and DNA transposons, may contribute to variation in genome size and intron length in Orthoptera, and that while they are unlikely to be highly correlated within closely related species belonging to the same family, they are more likely highly correlated at the suborder level.
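The effect of a single large genome on these correlations can be illustrated with a small calculation. The sketch below is an illustration only and does not reproduce the analysis in this study: the function names are hypothetical, the input values are toy numbers that are not taken from Table 2, and only R² is computed (a p-value would additionally require a t-test on n - 2 degrees of freedom, for example via a statistics package).

```python
from math import sqrt


def r_squared(x, y):
    """Coefficient of determination for a simple linear fit of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("a series with zero variance has no defined correlation")
    r = sxy / sqrt(sxx * syy)
    return r * r


def r_squared_without(x, y, drop_index):
    """Same statistic after removing one observation (e.g., an outlying genome)."""
    keep = [i for i in range(len(x)) if i != drop_index]
    return r_squared([x[i] for i in keep], [y[i] for i in keep])


# Toy values only (NOT data from this study): genome size in Gb, repeat content in %.
# The last point plays the role of one much larger genome.
toy_genome_size = [1.0, 1.2, 1.1, 1.3, 6.0]
toy_repeat_pct = [34.0, 41.0, 33.0, 38.0, 58.0]

r2_all = r_squared(toy_genome_size, toy_repeat_pct)
r2_subset = r_squared_without(toy_genome_size, toy_repeat_pct, drop_index=4)
print(f"R2 with all points: {r2_all:.2f}; without the large genome: {r2_subset:.2f}")
```

With these toy values, the fit is much stronger when the single large genome is included, which mirrors why the analyses above were repeated with and without L. migratoria.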
We also examined whether the abundance pattern of TEs in orthopteran genomes is due to shared ancient proliferation events or recent/ongoing TE activity. We analyzed the distribution of sequence divergence of the annotated TEs to infer the timing of the change in TE composition. As a result, we found TE distribution patterns that were A. domesticus-specific or common to crickets (Figure 4D). In A. domesticus, distributions of LTR transposons, especially LTR/ERV1 and LTR/Gypsy, showed a high abundance of recently diverged TE sequences compared to the other species, indicating an ongoing proliferative activity (Figure S6d-f). Likewise, a similar pattern was seen in SINE of the T. occipitalis genome. On the other hand, the distribution pattern of DNA transposons in the orthopteran genomes examined here is relatively similar among the crickets, suggesting a possible shared activity of DNA transposons in ancient lineages of the crickets.
Regarding the promoters and genes annotated for genetic engineering, we identified promoters of specific genes of interest (GOI). First, we identified and annotated the vermilion gene from A. domesticus (Ad vermilion) (Figure 5a) as our target for CRISPR editing and the muscle actin gene from A. domesticus (Ad muscle actin) (Figure 5b) as a source of a promoter for GOI expression. The Ad vermilion mRNA was identified from our previously published A. domesticus transcriptome data [22] by tBlastn using the T. castaneum vermilion amino acid sequence [56]. An incomplete mRNA was identified as the Ad vermilion mRNA. We then used this mRNA sequence to annotate the Ad vermilion gene and map the intron-exon splicing sites to identify each exon. The Ad muscle actin gene and its promoter were identified the same way, initially using the T. castaneum muscle actin gene. In addition, we were able to identify two very similar actin genes in the A. domesticus genome. Since muscle actin genes in many insect species are single-exon genes, we selected as the likely muscle actin gene the one with the highest blast score and a single exon. Once the gene was identified, we searched the genomic sequence 1.5 kb upstream from the start codon for promoter regions (Figure 5b).
We targeted the Ad vermilion gene to also serve as a convenient visual marker (white/vermilion eye color phenotype) for preliminary identification of positive knockout/knock-in G0 crickets. The Ad muscle actin gene's promoter was utilized due to its high level of continuous expression in all life stages, as well as its characteristic muscle anatomy expression phenotype, which is easily identified. For our downstream goals of expressing GOI in farm raised insects for food, feed, and bioproduct manufacturing, it is usually critical to use promoters with high levels of GOI expression (e.g., obtain the most product of interest per kilogram of insect) as well as have visual biomarkers (e.g., white eye or EGFP phenotype maintained through breeding selection) for simple genotype maintenance in farms and laboratories.
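The upstream search window described above (1.5 kb upstream of the start codon) can be expressed as a simple coordinate operation. The following is a minimal sketch under stated assumptions, not the procedure used in this study: the function names, the 0-based coordinate convention, and the toy sequence are all hypothetical, and promoter prediction itself is not shown.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N; case preserved)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_region(scaffold, start_pos, strand="+", length=1500):
    """Return up to `length` bases immediately upstream of a start codon.

    Assumed convention (hypothetical): `start_pos` is the 0-based forward-strand
    index of the base corresponding to the first nucleotide of the start codon.
    For a minus-strand gene, the upstream region lies at higher forward
    coordinates and is returned reverse-complemented, so it reads 5'->3'
    toward the start codon.
    """
    if strand == "+":
        begin = max(0, start_pos - length)  # truncate at the scaffold start
        return scaffold[begin:start_pos]
    if strand == "-":
        end = min(len(scaffold), start_pos + 1 + length)  # truncate at the scaffold end
        return reverse_complement(scaffold[start_pos + 1:end])
    raise ValueError("strand must be '+' or '-'")


# Toy scaffold (not real A. domesticus sequence); ATG begins at index 12.
toy = "GGGTTTAAACCCATGAAATAG"
plus_upstream = upstream_region(toy, 12, "+", length=5)

# The same locus viewed on the opposite strand: index 12 maps to len - 1 - 12 = 8.
toy_rc = reverse_complement(toy)
minus_upstream = upstream_region(toy_rc, 8, "-", length=5)
print(plus_upstream, minus_upstream)
```

Returning the minus-strand window already reverse-complemented keeps downstream promoter scans strand-agnostic; a shorter window, such as the 455 bp fragment used for the knock-in construct, corresponds to a smaller `length`.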
Biomolecules 2023, 13, x FOR PEER REVIEW
CRISPR knock-in/out experiments. Three sgRNA sites were identified as targets for our CRISPR knock-in/knock-out constructs (Table S8). The region upstream of Ad muscle actin contains many predicted promoters (Figure 5b). To keep the knock-in DNA construct size small, we used 455 bp of the putative promoter sequences upstream from the start codon as the promoter for the marker gene.
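The text does not state how the three sgRNA sites were selected. As an illustration of a common first step, the sketch below enumerates candidate Cas9 target sites (a 20-nt protospacer followed by an NGG protospacer-adjacent motif) on both strands. It is an assumption-laden example rather than the design method of this study: the function names and toy sequence are hypothetical, and no off-target or efficiency scoring is attempted.

```python
import re


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_sgrna_sites(seq, protospacer_len=20):
    """List candidate sites as (strand, start, protospacer, pam).

    `start` is the 0-based forward-strand coordinate of the leftmost base of
    the protospacer + PAM window. Overlapping sites are reported because the
    pattern is wrapped in a lookahead.
    """
    seq = seq.upper()
    n = len(seq)
    window = protospacer_len + 3
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    sites = []
    for match in pattern.finditer(seq):
        sites.append(("+", match.start(), match.group(1), match.group(2)))
    for match in pattern.finditer(reverse_complement(seq)):
        # Convert the reverse-complement coordinate back to the forward strand.
        sites.append(("-", n - match.start() - window, match.group(1), match.group(2)))
    return sites


# Toy sequence (not from the Ad vermilion gene): one site on the plus strand.
toy_target = "TTT" + "ACGT" * 5 + "AGG" + "TT"
plus_sites = find_sgrna_sites(toy_target)
minus_sites = find_sgrna_sites(reverse_complement(toy_target))
print(plus_sites)
print(minus_sites)
```

In practice, candidates from such a scan would still need to be filtered for position within the target exon and for specificity against the whole genome assembly.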
For knock-outs, we targeted Ad vermilion using CRISPR/Cas9. Using this target sequence, we designed and used either (1) a combination of three different sgRNAs or (2) a single sgRNA (sgRNA#1) (Table 3). Of the resulting G0 hatchlings, 68% of those that received all three sgRNAs showed the eye color knock-out phenotype, compared with 32% of those receiving only sgRNA#1. This phenotype resulted in an eye color change from black (wild type) to vermilion or white (Figure 6A,B). Our results show that the knock-out color appears white in freshly hatched crickets and gradually changes to vermilion after a few hours. In each set of injection experiments, both partial and complete knock-out eye color phenotypes were found (Figure 6B). Partial eye color knock-out was mosaic. In comparison, the eye color in wild-type crickets was always black, even in freshly hatched individuals (Figure 6A).
To generate stable lines, groups of positive G0s were subsequently self-crossed. All 16 crosses from the three-sgRNA treatment and 9 out of 10 crosses from the single-sgRNA treatment produced vermilion eye color G1s, indicating that the CRISPR/Cas9 knock-out rate was particularly high in A. domesticus. To check the knock-out sequences, we tested five of the knock-out strains (G2) from the single sgRNA treatment. The sequencing results indicated deletions ranging from just a few base pairs to more than 240 bp among these five strains (Table 4).
* Mismatch sequence for 14 bp.
For knock-in experiments, sgRNA#1 and the knock-in construct containing the enhanced green fluorescent protein (EGFP) marker gene coding region (Figure 7) were co-injected with Cas9 protein into young A. domesticus eggs. We injected 939 eggs, and 255 crickets successfully hatched (hatch rate of 27%). From all hatched crickets, we obtained six G0 crickets showing EGFP expression (2%) (Figure 8). The knock-in positive phenotype was selected based on EGFP expression in muscle tissue, as expected from use of the Ad muscle actin promoter. EGFP expression varied from a small group of muscles to more than half of the muscle in the body (Figure 8D,E). The enlarged picture shows a more detailed view of the muscle structure with GFP expression in part of the cricket leg (Figure 8F). These results indicate that the promoter is functional and successfully expresses the marker gene in the appropriate tissue. From the G0 EGFP positives, we were able to set up three out-crosses (with wild-type crickets) and one G0 EGFP negative cross (using injected hatchlings that did not show the EGFP phenotype) to screen G1 offspring. From these experiments, we obtained EGFP positive G1s from two of the positive G0 out-crosses (#1 and #3) and were able to establish two muscle EGFP expression colonies (Figure 9). In Figure 9A,B, we can see that the EGFP expression patterns from the #1 and #3 colonies are different. Crickets from colony #1 showed much lower EGFP expression and were only easy to screen from hind leg muscle, but most muscles in the body of crickets from colony #3 were expressing EGFP. There were no EGFP positive G1 offspring from the negative G0 crosses.
RNAi experiments. In addition to demonstrating CRISPR-Cas9 knock-out and knock-in efficacy in A. domesticus based on our genomic data, we also demonstrated efficacy of RNAi in this species using the same data. Two different target genes were used: (1) the Ad vermilion gene (in wild-type crickets) and (2) the EGFP gene from CRISPR-Cas9 stable EGFP expressing lines. For each of the two strains, we injected three sets of crickets with each dsRNA (Table 5). Survival rates four weeks post injection were approximately 70% for RNAi constructs and 51% for controls. Based on phenotype screening, the average KD (knock-down) rate of either white eye or reduced EGFP expression was approximately 86% (e.g., 70 successful knock-down phenotypes divided by 81 total injected survivors) (Table 5). The observed phenotypes were visible at least two weeks post injection. The eye color knock-down phenotype was not as strong as the CRISPR knock-out but was sufficient for screening (Figure 10A). The GFP RNAi knock-down experiments were similar (Figure 10B). We continued screening both phenotypes for five weeks after injection and documented a continued reduction of both eye color pigmentation and GFP expression between two and four weeks post injection.
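The percentages reported for the knock-in and RNAi experiments follow directly from the counts given in the text. The short check below uses only those counts (939 eggs injected, 255 hatched, six EGFP-positive G0 crickets, and the example of 70 knock-down phenotypes among 81 survivors); the helper name is a hypothetical convenience, not part of any published analysis.

```python
def percent(numerator, denominator):
    """Return numerator / denominator as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


hatch_rate = percent(255, 939)      # hatched crickets / injected eggs
knock_in_rate = percent(6, 255)     # EGFP-positive G0 crickets / hatched crickets
knockdown_rate = percent(70, 81)    # knock-down phenotypes / injected survivors

print(f"hatch rate: {hatch_rate:.1f}%")
print(f"knock-in rate among hatchlings: {knock_in_rate:.1f}%")
print(f"RNAi knock-down rate: {knockdown_rate:.1f}%")
```

Note that the knock-in rate is expressed relative to hatched crickets; relative to injected eggs (6 of 939) it would be lower.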

Discussion
Insects are a dense source of dietary nutrients [57,58]. Research demonstrates other health benefits of consuming insects, including biomarkers for improved gut health in humans [59]. Zoonotic pathogens, such as viruses, are much less likely to jump from farm-raised insects to humans than from mammalian or avian livestock due to large genetic differences between humans and arthropods [23,35,60]. Additionally, studies show that loads of foodborne pathogens, such as Salmonella spp. and Listeria monocytogenes, are low and often absent in farmed insects [61,62]. The European Union recently approved the focus species of this paper, A. domesticus, as well as Tenebrio molitor (yellow mealworm), as safe novel foods [63].
Given the excellent nutritional value of insects and their ability to be grown efficiently in small and confined spaces with minimal resource inputs, they will be a critically important solution for sustainable food/protein production [23,24,32,64,65]. Entomophagy, the consumption of insects as food, is accepted and practiced by over 2 billion people worldwide [34,66,67]. Some attractive features offered by insects include low land, water, and natural resource utilization [64], reduced greenhouse gas emissions [65,68], high fecundity (1500-3000 eggs per female) [64,69,70], and short life cycles for rapid scale-up to support food security. Insects are amenable to modularized vertical farming as well as automated and urban farming. They also can be grown closer to cities and/or processing facilities. Due to their small size and growth efficiencies, insects are likely the only animal feasible for production in space exploration, such as at stations on the moon and Mars [71,72].
Diversification of our food supply is critical for food security. According to the FAO, 75% of food comes from only 12 plant and 5 animal species [10,73]. Class Insecta is the largest, most diverse group of organisms on Earth, with approximately 950,000 species described and 4-30 million estimated [16,74,75]. At least 2000 species have been identified as eaten around the world, and many can be farmed [23,24,33,34,67]. One of the food security benefits of the biodiversity offered by insects is the ability to rapidly switch species in response to crop loss from disease [23,28].
In addition to being a food source for animals and humans, crickets have also been utilized as a valuable model system for studying biological processes such as neurobiology, developmental biology, animal behavior, and others [76]. With the recent interest in industrial farming crickets for human consumption, even greater potential exists for studying these creatures via expanded access to all life stages from farms, as well as the potential impact of research on cricket biology and genetics.

Genome Assembly
The A. domesticus genome assembly represents a contiguous assembly for downstream applications. The assembly contains 11 large scaffolds, presumably representing most of the autosomal and X chromosomes, in 2.138 Gb, close to the flow-cytometry predicted genome size of 2.150 Gb for male A. domesticus. There were 29,304 predicted genes, mostly concentrated on the large chromosomes. These data will be important for the insect food industries, but also for orthopteran studies in biology and evolution.
Immediately after obtaining the genome assembly, we searched for gene promoters for use in our transgenic experiments but encountered genes that were distinctly longer than those in the coleopteran genomes we have studied. A quick comparison of the length of the BUSCO reference gene sequences estimated that the A. domesticus genes were on average 80% longer than those in T. castaneum. We also noticed that, in most cases, the longer genes were due to longer introns, which may result from TEs, as has been observed in other Orthoptera [53,54]. Therefore, we compared repetitive elements in the A. domesticus genome assembly with those available for other orthopterans and found that the A. domesticus genome assembly contains the highest content of TEs (almost half of the assembly) compared to other cricket genomes. Repetitive element content and genome size in Orthoptera were significantly correlated if we included the very large genome of L. migratoria (p = 0.024). When L. migratoria data were excluded, LTR elements and DNA transposons were positively correlated with the expansion of genome size in crickets (p < 0.001). As we suspected, repetitive element content and median intron length were positively correlated (p = 0.044), but only when L. migratoria was included, and especially for the elements LINE/CR1, LINE/L1, LINE/Penelope, LINE/RTE, DNA/hAT, DNA/Kolobok, and DNA/TcMar. Our data suggest that LINE and DNA TEs contribute to variation in genome size and corresponding intron length in Orthoptera, but this is likely found only at the suborder level. Furthermore, the distribution pattern of LTR transposons shows a recent or ongoing burst in the A. domesticus genome, which likely contributes to the high abundance of repetitive elements in the genome of this species.

A. domesticus Metagenome
Of the 16,290 original scaffolds in the A. domesticus assembly, 6246 scaffolds were removed and submitted to NCBI as metagenome data. Most of these metagenome scaffolds had similarity to IIV6, but we were unable to assemble these scaffolds into a complete IIV6 sequence. Closer examination of the predicted proteins from the virus scaffolds indicated that only 32 scaffolds were high to medium quality viral genome sequences. Therefore, scaffolds with similarity to virus sequences at the nucleotide level may be sequencing artifacts, or some may represent novel viruses that have not been described in current databases. Importantly, our analysis did not identify common foodborne pathogens, such as Salmonella sp., Listeria sp., E. coli, or Staphylococcus sp.
The question of crickets harboring covert virus infections was recently addressed [77]. In that study, Densovirus sequences were found in sick and healthy insects, and the authors speculated that difference in immunity may prevent active diseases in healthy colonies. Our insects came from an insect farm with no reports of insect disease. We did find scaffolds similar to Wolbachia symbionts in other insects. Wolbachia has been reported to have a protective effect against viruses in some insects [78]. Further, Wolbachia was reported in a previous study of microbial communities in A. domesticus [79]. More research is needed to understand the prevalence of iridoviruses in farmed insects, as well as the mechanisms that prevent active disease in infected but healthy individuals. Our goal is to engineer virus resistant crickets through CRISPR genetic engineering technology to improve the health of farmed insects. This is an important reason we seek to understand genes related to immunity in these insects.

Immunity-Related Genes and Antimicrobial Peptides
We annotated genes related to immunity in A. domesticus to better understand how these crickets respond to infection, which is important for insects reared in close confines for animal feed and human consumption. Insects depend almost exclusively on innate immunity, mediated by peptides and antimicrobial secondary metabolites, against infections/pathogens [16,80]. The current study did not identify analogs of canonical small insect antimicrobial peptides (AMPs), such as cecropins or defensins, in the A. domesticus genome. AMPs are found more in holometabolous than hemimetabolous species [81], perhaps due to sampling (i.e., they were originally identified in insect pupae/larvae). However, this may also be due to the poor mobility and frequent terrestrial lifestyle of insect larvae vs. nymphs (e.g., grubs/caterpillars vs. cricket/grasshopper nymphs), which exposes holometabolous insects to a wider variety of microbial pathogens for longer periods of their life [81]. Insect immune-related proteins have activity against microbes, including Plasmodium sp. [82]. However, as with much of Class Insecta, insufficient research is available to identify biological activities and potential practical applications for these proteins [16]. Given their abundance in insects and the growing scale of the insect production industry, insect antibacterial, antifungal, antiparasitic, and antiviral proteins also could be a valuable and low-cost bioresource or byproduct for future biomedical applications. Many studies have identified numerous AMPs from insects with a wide variety of efficacies against various pathogens and other potentially valuable biological activities [83,84]. Thus, these proteins represent an untapped resource for potential therapeutic applications.
Instead of AMPs, we found other conserved genes in the A. domesticus genome assembly that may protect it from pathogens, including PGRP, GNBP, lysozymes, and phenoloxidase. We returned to our previous transcriptome data [22] to understand how these genes are expressed in different life stages or in male vs. female adults. Two PGRP genes are increasingly expressed during early nymphal development and are more highly expressed in female than male A. domesticus. Eleven genes were annotated as GNBP, but only one gene, ANN16377-RA, was more highly expressed in nymphs and adults. Lysozymes and PPO are important in resistance to disease in crickets [85]. Overall, lysozyme genes were expressed more highly in 1- and 2-week-old nymphs and in male and female adults, and PPO genes were mostly expressed throughout the life stages. However, two PPO genes, ANN15897-RA and ANN15899-RA, were more highly expressed in newly hatched nymphs, a time when the cricket would be upregulating immune defenses. These data will be important in developing protocols for healthy farm-reared insects, both in diagnostics and prophylactics for disease control.
Additionally, when considering the impact of insect genetic engineering, such as work derived from this publication, insects may be an ideal host for expression and mass production of exogenous antimicrobial peptides from various species (e.g., crickets or mealworms may be useful as bioreactors to produce a particularly active/valuable peptide compared to other insects that may be more challenging or costly to mass produce). As such applications are identified, the large biological diversity of insects, together with an established insect mass production industry, offers a massive and low-cost resource for production of compounds for specific applications.

Genetic Engineering and Gene Editing via CRISPR
The primary purpose of this work was to sequence the A. domesticus genome to obtain the sequence data needed for genetic engineering to improve these animals as a sustainable food resource and for other forms of bioproduction. Thus, we identified and annotated two genes, Ad vermilion and Ad muscle actin, which were used to create knock-out and knock-in strains of A. domesticus as an initial proof of concept to develop the tools required for our downstream technologies. The analysis of Ad vermilion was our first indication that the genome of this species contains quite large introns. The exon sizes were similar to those of orthologous insect genes such as vermilion from T. castaneum, but due to the extended intron length, this gene was more than 100 times longer in A. domesticus. The T. castaneum vermilion gene is around 2 kb, whereas Ad vermilion is around 90 kb. On the other hand, the annotation of the Ad muscle actin gene was more straightforward. As with Tribolium sp. and Drosophila sp., Ad muscle actin has only one exon and, therefore, is similar in size to that of other insect species. Once these genes were annotated, we used the Ad vermilion gene to identify CRISPR target sites, and the promoter region from the Ad muscle actin gene was harnessed for high level expression of knock-in marker genes. The Ad vermilion gene, when disrupted as a target for knock-in experiments, also functions as a convenient biomarker, as reducing or eliminating function of this gene results in a white-eye (or vermilion color eye) phenotype with less eye pigmentation than wild type. Thus, for the CRISPR genetic engineering work, we showed that the CRISPR/Cas9 gene editing tool works in A. domesticus. With up to a 68% CRISPR knock-out rate in G0, the efficiency was sufficient to produce and establish the modified strains for research and future commercial applications. After multiple generations in the laboratory, the vermilion mutation in A. domesticus colonies does not demonstrate any obvious decrease in phenotype or fitness based on survival and growth rate (data not shown).
For our knock-in CRISPR experiments, we used the same sgRNA target site as for the knock-outs. As such, we expected the Cas9-sgRNA complex to create a DNA double-strand break in both the insect genome and the knock-in plasmid. Therefore, the EGFP marker gene was likely incorporated into the genome through the non-homologous end joining (NHEJ) DNA repair process. The Ad muscle actin promoter was successfully utilized to express exogenous EGFP in muscle tissue. The G0 somatic knock-in rate was around 2%, and two out of three outcrosses provided EGFP positive G1, indicating that the knock-in was efficient in those G0. As this was our first attempt, an optimized protocol will be developed in the future to further improve this knock-in rate. Interestingly, the EGFP colonies had different muscle EGFP expression patterns, possibly as a result of a promoter damaged during knock-in, since we relied on NHEJ to create the knock-in, or due to epigenetic factors. Based on these results, we showed that CRISPR gene editing can knock in functional genes in the house cricket, which creates possibilities for valuable applications going forward.

RNAi in A. domesticus
For the RNAi experiments, abdominal microinjections of young A. domesticus with target gene dsRNAs resulted in either a reduced eye color phenotype or whole body GFP knock-down, depending on the construct and strain injected. Therefore, A. domesticus has the necessary genes for systemic RNAi, and they are functional, as was previously described in this insect [86] and in the Allonemobius socius complex of crickets [87]. From experience with other insect species, we expected phenotypic changes around 48 h post injection, but since crickets regain their eye color every time they molt, we did not see the eye color change until the next molt after dsRNA microinjection. Depending on the timing, the eye color was sometimes more reduced after the next two to three molts. However, since RNAi can only reduce existing gene expression, it is likely that eye color will gradually be restored with each subsequent molt. The RNAi eye color phenotype was as obvious as that of the CRISPR knock-out. EGFP knock-down also showed a visible phenotype (reduced EGFP expression) after two weeks. We believe this slow response compared with other insects may be because the EGFP protein is more stable, so it takes longer for the pre-experiment protein to be degraded. For RNAi experiments, we observed a lower survival rate in some of the microinjection groups, likely due to non-optimal rearing conditions, such as observation containers that were not large enough for more than 2 molts/life stages; these conditions will be optimized in future RNAi experiments.

Conclusions
This work provides the foundation for genome editing applications in farm-raised A. domesticus, such as disease resistance and alteration of the nutrient profile, and for production of numerous bioproducts using the low-cost, low-tech farm-raised insect bioreactor system. Insects are already highly efficient and sustainable compared to other livestock and protein production systems. Gene editing technology will produce insects with a higher growth rate, lower mortality due to disease and other factors, and greater nutrient density, which will in turn make insects more efficient and sustainable.
The assembly also will be a resource for basic research, as crickets are a valuable model system for studying biological processes such as neurobiology, developmental biology, animal behavior, and others [76]. With the recent interest in industrial farming of crickets for human consumption, even greater potential exists for studying these creatures via expanded access to all life stages from farms, as well as the potential impact of research on cricket biology and genetics.
Many opportunities exist for insects as an untapped resource for applications like pharmaceuticals, including antimicrobials and antiviral substances, drug lead compounds, material for bioprospecting, enzymes, bioactive peptides, oils, biocompatible materials for wound healing and other medical applications, food waste mitigation, nutrient cycling, biodegradable plastics and packaging, and new materials from chitin with novel properties and durability [16,23]. However, little bioprospecting of insects has been done to date compared with other taxa.
Developing the tools for insect genetic engineering, including high quality reference genomes, provides an open-ended opportunity to use insects for purposes besides mere sources of food, protein, and dietary nutrients. Beyond the substances that insects contain naturally, genetic engineering provides additional potential for efficient, low-cost bioproduction of vaccines, enzymes, antibiotics, antimicrobial peptides, antibodies, color pigments and dyes, flavors, fragrances, functional ingredients, and many others. We believe this foundational research will play a critical role in reducing human environmental impact by utilizing the largest, most diverse, and yet almost entirely untapped biological resource on Earth: Class Insecta.

Materials and Methods
A. domesticus colony maintenance. A. domesticus were originally purchased from a US insect farm in 2018 and maintained as a lab colony for multiple generations. They were fed a diet of specially formulated cricket feed from LoneStar (TFP Nutrition) (Nacogdoches, TX, USA) (http://lonestarfeed.com), which was the same feed used by the insect farm. Feed and water were provided to each cricket cage twice weekly. Eggs were collected twice per week from wild type adults in a petri dish (150 mm diameter × 10 mm deep) filled with hydrated (deionized water saturated) polyacrylamide water crystals, as crickets prefer to lay eggs in moist areas. Each week, the old egg-lay dish was replaced with a new dish on Monday and Friday. Eggs were collected each Tuesday for colony maintenance. For egg collection, the water crystals containing eggs were washed from the petri dish into a glass beaker using deionized water. Additional water was added to the beaker, the contents were stirred a few times, and eggs were allowed to settle to the bottom for approximately 10-15 s. Most of the excess water and crystals were slowly removed to retain eggs. Washing was repeated 2-3 times, and a plastic Pasteur pipette was used to aspirate eggs from the bottom of the beaker and transfer them to a piece of moist paper towel in a petri dish lid. The lid was placed into an 8 oz Deli container (S-21216, Uline, Pleasant Prairie, WI, USA) with a lid until the eggs hatched (approximately 8-10 days). After hatching, crickets were transferred into cricket cages. Cricket cages consisted of 20 qt plastic storage containers (Gasket Box, Sterilite, Townsend, MA, USA) with four 100 mm circles cut in the side and two in the lid, all covered by screen mesh material.
Cytometric genome size estimation of A. domesticus. The 1C (haploid) genome size of males and females of A. domesticus was estimated as described in Johnston et al. (2019) [49]. In brief, a single A. domesticus head, a single female Drosophila virilis head (1C = 328 Mbp), and a portion of brain from a Periplaneta americana male (1C = 3300 Mbp) were placed together into 1 mL of ice-cold Galbraith buffer in a 2 mL Dounce tissue grinder. Nuclei were released by grinding with 10 strokes of an A (loose) pestle, then filtered through nylon mesh into a 1.5 mL microfuge tube. The DNA in the released nuclei was stained by adding 25 µL of propidium iodide (1 mg/mL) and incubating for 3 h in the dark at 4 °C. The mean red PI fluorescence of the stained DNA in the nuclei of the sample and the standards was quantified using a CytoFlex flow cytometer (Beckman Coulter). Haploid (1C) DNA quantity was calculated as (2C sample mean fluorescence/2C standard mean fluorescence) times 328 Mbp for the D. virilis standard and times 3300 Mbp for the P. americana standard. The estimates based on the two standards were averaged to produce a 1C estimate for each sample.
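The calculation described above can be sketched in Python. This is a minimal illustration and not the authors' analysis script: the function and variable names are hypothetical, and only the two standard genome sizes (328 Mbp for D. virilis, 3300 Mbp for P. americana) and the ratio formula come from the text.

```python
# Standard 1C genome sizes (Mbp), as stated in the Methods text.
STANDARDS_MBP = {"D. virilis": 328.0, "P. americana": 3300.0}


def estimate_1c_mbp(sample_2c_fluor, standard_2c_fluor, standard_1c_mbp):
    """1C estimate = (sample 2C mean fluorescence / standard 2C mean
    fluorescence) * 1C genome size of the standard."""
    if standard_2c_fluor <= 0:
        raise ValueError("standard fluorescence must be positive")
    return sample_2c_fluor / standard_2c_fluor * standard_1c_mbp


def averaged_1c_mbp(sample_2c_fluor, standard_fluor_by_name):
    """Average the estimates obtained from each co-prepared standard.

    standard_fluor_by_name maps a standard's name (a key of STANDARDS_MBP)
    to its measured 2C mean fluorescence (hypothetical input format).
    """
    estimates = [
        estimate_1c_mbp(sample_2c_fluor, fluor, STANDARDS_MBP[name])
        for name, fluor in standard_fluor_by_name.items()
    ]
    return sum(estimates) / len(estimates)
```

Because the sample and both standards are ground and stained in the same tube, each ratio is taken within a single run, and averaging the two estimates reduces dependence on either standard alone.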
Genome sequencing and assembly. A single adult male A. domesticus was shipped to Dovetail Genomics (now Cantata Bio, Scotts Valley, CA, USA). Genomic DNA was extracted and shipped to North Carolina State University and USDA ARS, Stoneville, MS for long read sequencing, and extracted nuclei were used for Chicago and Hi-C libraries.
A Chicago library was prepared as described previously [89]. Briefly, ~500 ng of HMW gDNA was reconstituted into chromatin in vitro and fixed with formaldehyde. Fixed chromatin was digested with DpnII, the 5′ overhangs were filled in with biotinylated nucleotides, and then free blunt ends were ligated. After ligation, crosslinks were reversed, and the DNA was purified from protein. Purified DNA was treated to remove biotin that was not internal to ligated fragments. The DNA was then sheared to ~350 bp mean fragment size, and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The libraries were sequenced on an Illumina HiSeq X to produce 208 million 2 × 150 bp paired end reads. A Dovetail HiC library was prepared in a similar manner as described previously [90]. Briefly, for each library, chromatin was fixed in place with formaldehyde in the nucleus and then extracted. Fixed chromatin was digested with DpnII, the 5′ overhangs were filled in with biotinylated nucleotides, and then free blunt ends were ligated. After ligation, crosslinks were reversed, and the DNA was purified from protein. Purified DNA was treated to remove biotin that was not internal to ligated fragments. The DNA was then sheared to ~350 bp mean fragment size, and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The libraries were sequenced on an Illumina HiSeq X to produce 143 million 2 × 150 bp paired end reads.
The de novo assembly, Chicago library reads, and Dovetail Hi-C library reads were used as input data for HiRise, a software pipeline designed specifically for using proximity ligation data to scaffold genome assemblies [89]. An iterative analysis was conducted. First, Chicago library sequences were aligned to the draft input assembly using a modified SNAP read mapper [91]. The separations of Chicago read pairs mapped within draft scaffolds were analyzed with HiRise to produce a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative misjoins, to score prospective joins, and to make joins above a threshold. After aligning and scaffolding Chicago data, Dovetail HiC library sequences were aligned and scaffolded following the same method. Scaffolds that were determined to be "contaminants" (mitochondrial, duplicate, vector, or microbial as described in the next section) were removed prior to submission, resulting in 9961 final scaffolds and a total length of 2,346,604,983 bp. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JAHLJT000000000. The version described in this paper is version JAHLJT010000000.
To determine which scaffolds were from the metagenome, we used multiple tools to screen the scaffolds, including BBSketch and NCBI Refseq (https://www.biostars.org/p/234837/) and Kraken2 [92] in OmicsBox (BioBam, Valencia, Spain, version 2.1.2, database RefSeq 2021.04, 2022.02). The scaffolds also were used in beta testing for GX, a screening tool for microbial contamination at NCBI (https://github.com/ncbi/fcs). We combined results from all three datasets (BBtools, Kraken2, GX) and manually annotated unclassified scaffolds with blastn [50], further using read depth to assign them as insect or microbial. After this process, 80 scaffolds were removed as a result of NCBI screening tools to identify vector or duplicates, and 6245 metagenomic scaffolds were submitted to NCBI MIMS as a metagenome/environmental, host-associated sample (accession number PRJNA908265).
Annotation and analysis. Annotation of the A. domesticus genome assembly was performed by Dovetail Genomics. Repeat families found in the genome assemblies of A. domesticus were identified de novo and classified using RepeatModeler (version 2.0.1) [93]. RepeatModeler depends on RECON (version 1.08) and RepeatScout (version 1.0.6) for the de novo identification of repeats within the genome. The custom repeat library obtained from RepeatModeler was used to discover, identify, and mask the repeats in the assembly file using RepeatMasker (version 4.1.0) [94]. Coding sequences from Locusta migratoria, Teleogryllus occipitalis, Laupala kohalensis, and Xenocatantops brachycerus were used to train the initial ab initio model for A. domesticus using AUGUSTUS (version 2.5.5) [95], with six rounds of prediction optimization. The same coding sequences also were used to train a separate ab initio model for A. domesticus using SNAP (version 2006-07-28) [89]. RNAseq reads were mapped to the genome using STAR aligner (version 2.7) [96], and intron hints were generated with the bam2hints tools within AUGUSTUS. MAKER and SNAP (intron-exon boundary hints provided from RNA-Seq) were then used to predict genes in the repeat-masked reference genome. Swiss-Prot peptide sequences from the UniProt database were downloaded and used in conjunction with the protein sequences from the same training species to generate peptide evidence in MAKER [97]. Only genes that were predicted by both SNAP and AUGUSTUS were retained in the final gene sets. AED scores were generated for each of the predicted genes as part of the MAKER pipeline to assess the quality of the gene prediction. Genes were further characterized for putative function by performing a BLAST [50] search of the peptide sequences against the UniProt database. tRNAs were predicted using the software tRNAscan-SE (version 2.05) [98]. 
Predicted genes were analyzed by BUSCO (v.2.0) [99], using the lineage dataset insecta_odb9 (creation date: 2016-10-21, number of species: 42, number of BUSCOs: 1658). Predicted genes and sequence information are in supplemental files (Files S9 and S10).
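The rule that only genes predicted by both SNAP and AUGUSTUS were retained can be illustrated with a short Python sketch. The text does not state how agreement between the two predictors was defined, so the criterion used here (any coordinate overlap on the same scaffold and strand) is an assumption, and the Gene record and function names are hypothetical rather than part of the MAKER pipeline.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (not a MAKER data structure)."""
    scaffold: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlaps(a: Gene, b: Gene) -> bool:
    """True if two genes share a scaffold and strand and their spans overlap."""
    return (
        a.scaffold == b.scaffold
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def supported_by_both(primary: List[Gene], secondary: List[Gene]) -> List[Gene]:
    """Keep genes from `primary` that overlap at least one gene in `secondary`.

    Assumed agreement criterion: any overlap on the same scaffold and strand.
    """
    return [g for g in primary if any(overlaps(g, h) for h in secondary)]
```

A stricter criterion, such as requiring matching exon boundaries, would retain fewer genes; the pairwise comparison here is quadratic and is meant only to show the logic.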
CRISPR target gene for knock-out/in. We chose the 5′ end of the Ad vermilion gene as our CRISPR target site for knock-in experiments. We identified the Ad vermilion gene in the A. domesticus genome assembly using a transcriptome mRNA sequence (incomplete, missing the 3′ end) with Blastation (TM Software, Inc., Arcadia, CA, USA). We were able to identify eight exons, including the first exon (Figure 5a). We then used the second exon (around 150 bp) to select three CRISPR target sites for designing sgRNAs (Table S8) using the GPP sgRNA Designer (https://portals.broadinstitute.org/gpp/public/analysis-tools/sgrna-design). The target sgRNA sequences were ordered from Synthego (Redwood City, CA, USA). To verify the knock-out sequence in A. domesticus, genomic DNA was extracted using a ZymoResearch Quick-DNA insect miniprep kit (Irvine, CA, USA) and amplified with forward primer "GAGCAGTAGGCGAGAAAG" and reverse primer "TCCAAACGCAGAAGAGACCA". After PCR, the product was sequenced with primer "TGTAGTGAGTGTTATCGCCA" at the Oklahoma Medical Research Foundation sequencing service facility (OMRF DNA sequencing, https://omrf.org/). Using these data, we compared the sequence at the gRNA target site to the wild type sequence to determine the indel(s) generated by our knock-out and knock-in experiments (Table 3).
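As an illustration of what target-site selection within an exon involves, the Python sketch below lists candidate sites on both strands. It is not the algorithm used by the sgRNA design tool cited above and performs no efficiency or off-target scoring; it assumes a 20-nt protospacer followed by an NGG PAM (the S. pyogenes Cas9 convention), and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_ngg_sites(seq: str, spacer_len: int = 20):
    """List candidate sites as (strand, start, protospacer, pam).

    Assumption: the protospacer is immediately followed by an NGG PAM.
    `start` is the 0-based offset on the scanned strand; for "-" it is
    measured on the reverse complement of the input.
    """
    sites = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        # The last valid start leaves room for the spacer plus a 3-nt PAM.
        for i in range(len(s) - spacer_len - 2):
            pam = s[i + spacer_len : i + spacer_len + 3]
            if pam[1:] == "GG":
                sites.append((strand, i, s[i : i + spacer_len], pam))
    return sites
```

In practice, candidates found this way would still need to be ranked for predicted on-target activity and checked against the genome assembly for off-target matches, which is the role of a dedicated design tool.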
Knock-in DNA construct design. For knock-in construct design, we used the Ad muscle actin gene promoter to express the marker gene, enhanced green fluorescent protein (EGFP), via CRISPR/Cas9. We used the muscle actin genes from T. castaneum and D. melanogaster to conduct a BLAST search of the A. domesticus transcriptome data and used the mRNA sequence from that search to identify the Ad muscle actin gene in the genome assembly. We searched for a promoter region 1.5 kb upstream from the start codon of the Ad muscle actin genomic sequence using the Neural Network Promoter Prediction tool (https://fruitfly.org/seq_tools/promoter.html).
To design the DNA construct, 455 bp of the Ad muscle actin promoter sequence with its 5′ UTR were placed upstream of the EGFP coding sequence, with an SV40 polyadenylation signal 3′ of EGFP, to form the marker gene. An 86 bp section of gDNA that included the CRISPR knock-in site(s) was placed on both sides of the marker gene to give the final knock-in DNA construct, Ad-actin-EGFP-KI. This DNA sequence was synthesized by GenScript (Piscataway, NJ, USA) in their pUC57 vector plasmid as the final product.
Microinjection solution preparation. The concentrations of the components of knock-out or knock-in microinjection solutions were: 100 ng/µL of DNA construct Ad-actin-EGFP-KI (only for knock-in treatment), 10 pmol/µL of sgRNA, 1 µg/µL of TrueCut Cas9 protein V2 (ThermoFisher, Waltham, MA, USA), and 20% phenol red buffer (Sigma Aldrich, St. Louis, MO, USA). The solution was mixed and incubated at room temperature for 5-10 min for Cas9-sgRNA binding, and the solution was kept on ice during microinjection. For knock-out experiments, we used either three sgRNAs or sgRNA#1 alone for microinjections (Table S8, sgRNA target sequences). For knock-in experiments, we only used sgRNA#1 in the microinjection solution.
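The final concentrations listed above can be converted into pipetting volumes with the usual dilution relation (final concentration × final volume = stock concentration × stock volume). The Python sketch below shows one way to do this; the stock concentrations are not given in the text, so any stock values passed in are assumptions, and the function name is hypothetical.

```python
def component_volumes(final_volume_ul, components):
    """Return the volume (µL) of each stock needed, plus make-up water.

    components maps a name to (final_conc, stock_conc); the two values must
    share units within each pair (e.g., ng/µL with ng/µL, % with %).
    """
    volumes = {}
    for name, (final_conc, stock_conc) in components.items():
        if stock_conc <= 0 or final_conc > stock_conc:
            raise ValueError(f"stock for {name} is too dilute")
        # C_final * V_final = C_stock * V_stock
        volumes[name] = final_volume_ul * final_conc / stock_conc
    water = final_volume_ul - sum(volumes.values())
    if water < -1e-9:
        raise ValueError("stocks are too dilute to reach the final volume")
    volumes["water"] = max(water, 0.0)
    return volumes
```

For example, with hypothetical stocks of 1000 ng/µL DNA construct, 100 pmol/µL sgRNA, 5 µg/µL Cas9, and undiluted (100%) phenol red buffer, a 10 µL knock-in mix would need 1 µL DNA, 1 µL sgRNA, 2 µL Cas9, 2 µL phenol red buffer, and 4 µL water.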
A. domesticus egg microinjection protocol for CRISPR. An egg lay dish with hydrated water crystals was placed into an adult wild type A. domesticus colony for 4 h to obtain freshly laid, undeveloped eggs. Eggs were collected and washed using the protocol described above in the "A. domesticus Colony Maintenance" section. After the eggs were washed from the water crystals, they were briefly rinsed in 70% ethanol (ethanol/deionized water) for 10-15 s and then quickly rinsed with deionized water at least 3 times. The eggs were placed on microinjection slides (Figure S12). To prepare the microinjection slide, a piece of black filter paper was cut to 50 × 10 mm and laid on a standard clean glass microscope slide (75 × 26 mm). Two additional strips of black filter paper, 50 × 2 mm, were cut, laid one on top of the other, and placed in the middle of the larger piece of black filter paper. Deionized water was added to the filter paper, and eggs were placed along the edges on both sides of the filter paper strips for microinjection. The wet filter paper prevented the eggs from drying out during microinjection and provided a backstop to keep the eggs from moving during the procedure. The black paper made visualizing the eggs easier compared to using white filter paper. The needle for microinjection was pulled from Standard Glass Capillaries (1B100-4, World Precision Instruments) using a P-2000 needle puller (Sutter Instruments). The following puller settings were used: Heat: 335, Fil: 4, Vel: 40, Del: 240, Pul: 120. For microinjection, 1 µL of microinjection solution was loaded into the back of the needle with Femtotips (Eppendorf, Hamburg, Germany), and all air bubbles were removed from the liquid in the needle, particularly the tip. The filled needle was then attached to a FemtoJet 5247 microinjector (Eppendorf), and an air pressure of 4-7 psi was used for holding the solution in place and 9-10 psi for injection. 
Before injection, forceps were used to gently break the needle tips to achieve a small opening. The injection site for eggs was 2/3 from the smaller end. After injection, the black filter paper containing the injected eggs was transferred from the glass slides to a piece of wet paper towel and kept in a 150 × 10 mm petri dish with lid. The dish was placed into a hatch chamber (Modular Incubator Chamber, MIC-101, Billups-Rothenberg, San Diego, CA, USA) with two wet paper towels for 10-12 d.
Screening for CRISPR knock-out/in and established colonies. To screen for either white eye (Ad vermilion knock-out) or EGFP (EGFP knock-in) expression phenotypes, freshly hatched G 0 A. domesticus were transferred from their egg incubation chamber (described above) into a standard petri dish with lid. The petri dish was then placed on ice to immobilize for 3 min. The crickets where then observed under a fluorescent dissecting microscope with digital camera (Leica M125, Leica Microsystems Inc., Deerfield, IL, USA) for either different eye coloration from wild type in knock-out treatments (using a standard white LED ring light) or using a blue florescent light and GFP filter to check for knockin EGFP expression phenotype. Photos were taken using a QImaging Retiga R6 digital camera. After screening, G 0 crickets with positive knock-out white/vermilion eye (full or partial) phenotype were separated and reared to adulthood in containers described in the house cricket colony maintenance section above. Crickets with wild type eye phenotype from knock-out experiments were not kept. G 0 crickets from the knock-in experiments with EGFP positive and negative phenotypes were reared in separate containers until adulthood. Once crickets became adults, self-cross experiments were set up (two males with four to six females) groups for positive G 0 s from the knock-out experiments. For knock-in experiments, small out-cross groups were set up using EGFP positive G 0 s with two G 0 s out-cross to 2-3 wildtype crickets, and one self-cross group from all EGFP negative G 0 s pooled together in the same cage. Cross groups were maintained, and eggs were collected as described in the house cricket colony maintenance section above. Once the G 1 generations began to hatch from eggs, we then screened and separated all white eye color phenotype knock-out G 1 s from self-cross groups, and EGFP positive G 1 s from out-cross groups to establish separate colonies from all crosses. 
No EGFP-positive G1 crickets were found in the colony derived from the pooled EGFP-negative injected G0 crickets.
RNAi target genes and dsRNA sequences. A portion of the Ad vermilion mRNA sequence (679 bp) was used to design a 450 bp dsRNA (dsAdV; positions 51-500) (Table S11) to target the Ad vermilion gene for knock-down via RNAi. An additional dsRNA (dsEGFP) was made from 450 bp of EGFP coding sequence (positions 96-545) and was used both to knock down EGFP in our knock-in cricket lines and as a negative control in the RNAi experiments targeting Ad vermilion for white eye color. Both dsRNAs were obtained from Genolution Inc. (Seoul, Republic of Korea).
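The coordinates quoted above are consistent with 1-based, inclusive positions, since 500 − 51 + 1 and 545 − 96 + 1 both equal 450. The following Python sketch illustrates that bookkeeping under this assumption. The function names `span_length` and `dsrna_target` and the placeholder sequence are hypothetical and used for illustration only; the actual dsRNA sequences are those given in the supplementary tables.

```python
def span_length(start: int, end: int) -> int:
    """Length of a region given 1-based, inclusive coordinates."""
    return end - start + 1


def dsrna_target(seq: str, start: int, end: int) -> str:
    """Return the subsequence from start to end (1-based, inclusive)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return seq[start - 1:end]


# Placeholder with the reported Ad vermilion mRNA length (679 bp).
# This is NOT the real sequence; it only stands in for its length.
placeholder_mrna = "ACGT" * 169 + "ACG"

ds_adv_region = dsrna_target(placeholder_mrna, 51, 500)

print(len(placeholder_mrna))   # 679
print(len(ds_adv_region))      # 450 (dsAdV region, positions 51-500)
print(span_length(96, 545))    # 450 (dsEGFP region, positions 96-545)
```

The bounds check makes an off-by-one or out-of-range coordinate fail loudly rather than silently returning a shorter target.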
A. domesticus microinjection and screening for RNAi. The cricket RNAi microinjection solution contained 2.5 µg/µL dsRNA (dsAdV or dsEGFP) with 20% phenol red buffer. Two separate experiments were used to demonstrate RNAi efficacy in A. domesticus: one using wild type crickets and the other using our genetically modified strain. The dsAdV was injected into wild type crickets to knock down eye color. The dsEGFP was injected into wild type crickets as a negative control and, in separate experiments, into our AdVELV1-3 EGFP expression strain to knock down EGFP expression. Two to three sets of microinjections were conducted for each RNAi experiment using young nymphs (following their first molt after hatching from eggs). Crickets were placed on ice for 5 min to immobilize them and were subsequently injected in the posterior abdomen (Figure S12). Around 25 to 30 crickets were injected using a total of 4 µL of injection solution (approximately 0.15 µL per injection). Following injection, crickets were maintained in a medium-sized cage (Kritter Keeper®, 4.80 × 7.40 × 5.60 in, Lee's Aquarium & Pet Products, San Marcos, CA, USA) with food and water. After microinjection, crickets were screened every week based on eye color or EGFP expression, and phenotype changes were documented.
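The per-injection volume and the implied dsRNA dose follow from the figures reported above. The Python sketch below works through that arithmetic, assuming the 4 µL was divided equally among the injected crickets with no loss in the needle, and that 2.5 µg/µL is the final dsRNA concentration in the injection solution. The function name `per_cricket` is hypothetical.

```python
DSRNA_CONC_UG_PER_UL = 2.5   # dsRNA concentration reported in the text
TOTAL_VOLUME_UL = 4.0        # total injection solution used per set


def per_cricket(n_crickets: int) -> tuple:
    """Return (volume in µL, dsRNA mass in µg) per cricket.

    Assumes equal volumes per injection and no loss of solution.
    """
    if n_crickets <= 0:
        raise ValueError("n_crickets must be positive")
    volume_ul = TOTAL_VOLUME_UL / n_crickets
    return volume_ul, volume_ul * DSRNA_CONC_UG_PER_UL


# The text reports roughly 25 to 30 crickets per 4 µL.
for n in (25, 30):
    volume_ul, mass_ug = per_cricket(n)
    print(f"{n} crickets: {volume_ul:.2f} µL and {mass_ug:.2f} µg dsRNA each")
```

Under these assumptions each nymph receives about 0.13 to 0.16 µL, in line with the reported approximate 0.15 µL per injection, which corresponds to roughly 0.33 to 0.40 µg of dsRNA.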
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biom13040589/s1, Table S1. Genome size of A. domesticus. 1C = the amount of DNA in a gamete (1C is an average of the gametes with and without the X in the male). Figure S2. Comparison of the lengths of BUSCO reference genes (bp) in Tribolium castaneum and Acheta domesticus. Figure S3. Metagenomic scaffolds from the A. domesticus genome assembly. The spreadsheet "Ado_metagenome_Kraken2" details the output from files tentatively identified as non-insect through the first screening; "Blastn_unclassified" is the output from blastn of the unclassified scaffolds to NCBInr; "Metagenome_summary" lists the scaffolds submitted to NCBI as A. domesticus-associated sequences. Figure S4. Screen of viral genome sequences through CheckV [51]. Data are sorted according to high, medium, or low quality, or not determined, based on matches to the CheckV database. Figure S5. Immune-related transcripts that are predicted from the annotation of the A. domesticus assembly. Predicted transcripts include: PGRP, peptidoglycan-recognition protein; GNBP, β-1,3-glucan-binding protein; lysozyme; PPO, prophenoloxidase. Figure S6. Supporting data for the comparison of intron length, genome size, and repeat content in orthopteran species (six crickets, including Acheta domesticus, Apteronemobius asahinai, Gryllus bimaculatus, Laupala kohalensis, Teleogryllus occipitalis, Teleogryllus oceanicus, and one locust, Locusta migratoria). (A) LINE, (B) DNA, and (C) LTR transposon subclasses and correlation with intron length and genome size. Table S7. Comparison of intron length, genome size, and repeat content in orthopteran species (six crickets, including Acheta domesticus, Apteronemobius asahinai, Gryllus bimaculatus, Laupala kohalensis, Teleogryllus occipitalis, Teleogryllus oceanicus, and one locust, Locusta migratoria).
(A) Correlation between total repeat content and genome size; (B) Correlation between total repeat content and median length of intron; (C) Correlation between genome size and median length of intron; and Comparison of (D) LTR, (E) LINE, and (F) DNA transposable elements. Table S8. CRISPR sgRNA target sequences for the Ad vermilion gene. Table S9. Double-stranded RNA sequences used for RNAi experiments. Figure S10. Name and annotation of predicted A. domesticus genes, with scaffold number and length. Figure S11. Transcript sequences of predicted A. domesticus genes. Figure S12.
Data Availability Statement: All data were deposited at NCBI, including the A. domesticus genome assembly (JAHLJT010000000), transcriptome assembly (GHUU00000000), and metagenome scaffolds (PRJNA908265).