Braconidae revisited: Bracon brevicornis genome showcases the potential of linked-read sequencing in identifying a putative complementary sex determiner gene

Bracon brevicornis is an ectoparasitoid of a wide range of larval-stage Lepidopterans, including several pests of important crops, such as the corn borer, Ostrinia nubilalis. It is also one of the earliest documented cases of complementary sex determination in Hymenoptera. Here, we present the linked-read genome of B. brevicornis, complete with an ab initio-derived annotation and protein comparisons with fellow braconids, Fopius arisanus and Diachasma alloem. We demonstrate the potential of linked-read assemblies in exploring regions of heterozygosity and search for structural and homology-derived evidence of the complementary sex determiner gene (csd).


11
Bracon brevicornis (Wesmael) is a gregarious ectoparasitoid of various Lepidoptera us the ability to view heterozygous regions, known as "phases" in diploid assemblies, 80 within a genome which allow us to investigate potential csd regions. 81 Here we report on the whole-genome sequencing of a pool of females from an 82 isolated B. brevicornis strain using 10X Genomics technology that relies on linked-83 read sequencing (10x Genomics Inc., Pleasanton, CA, USA). Due to their long history 84 of genetic isolation during laboratory rearing, the females in this strain are assumed 85 to have a high level of homozygosity, whereas a csd locus would retain its 86 heterozygosity. The 10X Genomics technology allows for generating phased data in 87 which allelic variants can be identified after assembly. High-molecular weight DNA 88 is partitioned into small droplets containing a unique barcode and adapter in such 89 a way that only a few DNA molecules are present within each droplet. Within each 90 droplet the DNA is broken into pieces and the barcode (Gel Bead-in-Emulsion, 91 "GEM") is ligated to each of the DNA fragments. This resulting library can then be 92 sequenced on an Illumina sequence platform. In the assembly step the reads 93 originating from the same fragment are organized by barcode and put together 94 into synthetic long-read fragments. Importantly, it is nearly impossible that two 95 fragments with opposing allelic-variances are together in the same droplet 96 (Weisenfeld et al. 2017). This technique therefore allowed us to identify potential csd 97 candidates in the female-derived B. brevicornis genome after sequencing by 98 studying the phased data containing the different haplotypes. Moreover, as B. 99 brevicornis is a potential biological control agent of several pests, the availability of 100 a full genome may provide effective ways to study and improve this species to grow 101 it into an established biological control agent for Lepidopteran pests. Immediately following emergence, 100 to 120 female wasps were flash frozen in 115 liquid nitrogen and ground with a mortar and pestle. Genomic DNA was extracted 116 using a protocol modified from Chang, Puryear, and Cairney (Chang et al. 1993).

117
Modifications include adding 300 μL BME to extraction buffer just before use. Instead 118 of 10M LiCl, 0.7 volume isopropanol (100%) was added to the initial supernatant, 119 after which it was divided into 1.5 mL Eppendorf tubes as 1 mL aliquots for 120 subsequent extractions. The initial centrifugation step occurred at a slower rate and with final assessments for DNA quality, amount, and fragment size confirmed via 126 BioAnalyzer 2100 (Agilent, Santa Clara, California, USA).

127
10X Genomics library preparation and sequencing: 128 As the genome of B. brevicornis is relatively small for the scale of the 10X platform, 129 there is a higher risk of overlapping fragments within single GEMs. In order to reduce  To filter sequence data from Heinz tomato (S. lycopersicon) carrier DNA sequences, 147 23bp (16bp GEM + 7 bp spacer) were removed from forward reads and all reads 148 were subsequently mapped to an in-house high quality reference assembly of the Heinz genome using BWA-MEM v0.7.17 (Li 2013). Using samtools v1.9 (Li et al. 2009), 150 all unaligned read pairs (-F=12) were extracted and labelled non-Heinz. The 151 assembly of the non-Heinz labelled read set was performed with 10X Supernova 152 assembler v2.1.0 (10X Genomics), using default settings including commands for 153 both pseudohap (--style=pseudohap) and pseudohap2 (--style=pseudohap2) 154 outputs (Weisenfeld et al. 2017). These commands determine the output from 155 Supernova, the first being the final scaffold output (pseudohap), while the second 156 is the so-called 'parallel pseudohaplotype' (pseudohap2) scaffolds that represent 157 areas of divergence or phases (Weisenfeld et al. 2017). Phasing is flattened in the 158 pseudohap output by selecting the region with higher mapping coverage, whereas 159 in the pseudohap2 output is differentiated by ".1" and ".2" at the end of each 160 scaffold name to denote phasing, though not all scaffolds are phased at this point 161 due to lack of divergence during assembly.

195
After prediction, the protein sequences were retrieved and compared to both F. annotation of exons (three cases) and removal of two predicted genes that were 203 more than 50% ambiguous nucleotides.

204
In silico identification of feminizer as a putative csd locus:

205
The pseudohap2 files were deduplicated using the dedupe tool within BBTools  (Kearse et al. 2012)). The protein of gene 218 "g7607" (locus tag = BBRV_LOCUS33129) was used in an NCBI BLASTp against the nr 219 database with default settings (Camacho et al. 2009;Acland et al. 2014). Next a 220 region stretching from ~10Kbp upstream and downstream of the first and last tBLASTn hit in scaffold 12, respectively, was annotated using HMM plus similar protein-  presumed to be heterozygous in females.

310
So far, a csd gene has been sequenced only in species of bees of the genus Apis, 311 and it is highly polymorphic, even within subspecies (Wang et al. 2012). It is located 312 adjacent to the more conserved feminizer (fem) (Hasselmann et al. 2008), and we 313 therefore started with localizing feminizer in the genome. As feminizer (or its ortholog 314 transformer, tra) was not identified in the ab-initio annotation, we used a local 315 tBLASTn search to find fem in the assembly. Four hits with E-value from 5.86e-04 to 316 8.59e-08 were found in scaffold 12. Searching the annotation using part of the tBLASTn result shows that it is annotated as "g7607" (locus tag = BBRV_LOCUS33129) 318 which gave a first hit with protein O-glucosyltransferase 2 (Diachasma alloeum) after 319 a BLASTp search, and no fem or tra hits were found. A closer inspection showed that 320 "g7607" is annotated as fusion protein with the N-terminal part resembling fem and Here, we present the genome of the braconid wasp Bracon brevicornis, a parasitoid 357 wasp that not only has biological control applications, but also offers potential as a 358 study system for future analyses into braconid phylogenetics and gene evolution.

359
With no previous genomes available for the subfamily Braconinae, the most 360 specious of the braconid wasps, the resources and investigations presented here fill 361 this gap. Our linked-read library, assisted by carrier DNA of S. lycopersicon, has 362 resulted in a highly contiguous, very complete assembly, comprised of just 353 363 scaffolds and 12,686 genes. This gene count is similar to related species, and in further protein length comparisons, the proteins are highly similar. This indicates that 365 the predicted genes are highly complete, a necessary feature for any future 366 phylogenetic comparisons between species or families. 367 We utilized the 10X Genomics linked-read approach to obtain pseudohaploid 368 information that would allow us to search for potential csd loci in silico. As a 369 substantial number of scaffolds were putatively heterozygous, we used the notion 370 that in A. mellifera, csd is located adjacent to fem (Hasselmann et al. 2008) to limit 371 our search for csd candidates. We manually annotated a putative B. brevicornis 372 fem and a partial Bbfem duplicate that is highly similar, and both genes encode all 373 known tra/fem protein domains ( Figure S1) (Verhulst et al. 2010   and purple text colour indicates hypervariable region in csd (Beye et al. 2003).