First Draft Genome Assembly of Tropical Bed Bug, Cimex hemipterus (F.)

Abstract: Cimex hemipterus, a blood-feeding ectoparasite commonly found in tropical regions, is a notorious household pest. This study presents the draft genome assembly of C. hemipterus, generated with SPAdes from Illumina short reads. The assembly has a total size of 388.66 Mb and a contig N50 of 3503 bp. BUSCO assessment indicated that 96.71% of the expected Insecta lineage genes were complete in the genome assembly. Annotation of the C. hemipterus genome assembly identified 2.88% of the assembly as repetitive sequences and predicted 17,254 protein-coding genes. Functional annotation showed that most gene families are involved in cellular processes and signaling. This first C. hemipterus genome will be helpful in further understanding bed bug genetics and evolution, while the annotated genome may also help in devising new strategies for bed bug management.

Dataset: The raw Illumina HiSeq genome sequencing data were deposited in NCBI in FASTQ format with BioSample accession number SAMN18780126 under BioProject PRJNA722579. The assembled data and its annotation files are available in the figshare repository (DOI: 10.6084/m9.figshare.16815364).


Summary
Cimex hemipterus is a wingless, red-brown insect in the family Cimicidae that feeds on human blood [1]. It is also known as the tropical bed bug because it is typically found in the tropical regions of Southeast Asia, Africa, and South America [2]. Nevertheless, this species may be able to expand beyond its tropical range, since it has also been reported in temperate regions, including Florida [3], Italy [4], and Paris [5].
Bed bugs develop through five nymphal instars before reaching adulthood, and each nymphal stage requires a blood meal to molt to the next [6]. Thus, bed bug infestations can occur almost anywhere, such as dormitories, residential houses, hotels, and airports, as long as the bugs have access to their hosts [5,7].
Although there is no evidence that bed bugs act as disease vectors, their bites cause skin reactions, such as itching, wheals, and lesions, in the majority of bitten individuals, and heavy infestations may even lead to anemia and iron deficiency [1,8,9,10]. Besides physiological responses, victims can also be affected psychologically, resulting in anxiety, sleep deprivation, and fatigue [11].
Bed bug infestation of human habitats has increased drastically over the last two decades due to the difficulty of controlling bed bugs and to human-mediated spreading [5,7,8,12]. This has created a need for renewed research on tropical bed bugs, particularly at the molecular level, to better understand crucial processes such as gene expression and transcription regulation and thereby find new strategies for pest control. However, this goal is impeded by the lack of a reference genome. This study therefore reports the first draft genome and annotation set of C. hemipterus.

Genome Assembly
The Illumina sequencing data of Cimex hemipterus were assembled, yielding an assembly with a total length of 388.66 Mb and a contig N50 of 3503 bp (Table 1). BUSCO [13] assessment of the genome indicated that 96.71% of the expected genes were complete (including single-copy and duplicated genes), while the rest were either fragmented (2.19%) or missing (1.10%). The assembly was therefore considered sufficiently comprehensive for downstream analyses.
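For readers unfamiliar with the contig N50 statistic reported above, the following minimal Python sketch shows how it is commonly computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly length. The function names `contig_lengths` and `n50` and the toy sequences are illustrative assumptions; the study itself obtained its length statistics from QUAST, not from this code.

```python
from typing import Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of every record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = 0
    seen_header = False
    for raw in fasta_lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seen_header:
                lengths.append(current)
            seen_header = True
            current = 0
        elif seen_header:
            current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length of the contig at which the cumulative sum reaches 50% of the total."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy example (hypothetical sequences, not data from this study).
toy_fasta = [">c1", "ACGTACGTAC", ">c2", "ACGTAC", ">c3", "ACGT"]
toy_lengths = contig_lengths(toy_fasta)
toy_n50 = n50(toy_lengths)
```

In the toy example the three contigs have lengths 10, 6, and 4 bp, so the longest contig alone already covers half of the 20 bp total and the N50 is 10 bp.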

Protein-Coding Gene Annotation
A total of 17,254 protein-coding genes were predicted from the repeat-masked assembly, and a total of 11,431 genes were annotated against the Eukaryotic Orthologous Groups (KOG) database. Most of the genes were associated with cellular processes and signaling (33.0%), followed by metabolism (25.0%) and information storage and processing (24.0%), while 18% of the genes were poorly characterized (Table 3).
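As a quick consistency check on the figures above, the short Python sketch below computes the share of predicted genes that received a KOG annotation and confirms that the four reported category percentages sum to 100%. The variable names are illustrative, and the percentages are the rounded values quoted in the text, not the per-category counts of Table 3.

```python
# Figures quoted in the text above.
predicted_genes = 17254   # protein-coding genes predicted from the masked assembly
kog_annotated = 11431     # genes annotated against the KOG database

# Rounded KOG category shares (percent) as reported in the text.
kog_share = {
    "cellular processes and signaling": 33.0,
    "metabolism": 25.0,
    "information storage and processing": 24.0,
    "poorly characterized": 18.0,
}

annotated_fraction = kog_annotated / predicted_genes
total_share = sum(kog_share.values())

print(f"KOG-annotated share of predicted genes: {annotated_fraction:.2%}")
print(f"Sum of category shares: {total_share:.1f}%")
```

Under these figures, roughly two thirds of the predicted genes (about 66.25%) carry a KOG annotation.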

DNA Extraction, Library Construction, and Sequencing
A lab-reared specimen was used in this study. The bed bug samples were first collected in 2014 from cushioned seats in the waiting area of Kuala Lumpur International Airport (KLIA), Malaysia, and subsequently bred in the laboratory [7]. Before DNA extraction, one adult male bed bug was surface sterilized with 70% ethanol and rinsed with sterile distilled water. The whole organism was then crushed into pieces using a micro-pestle, and the DNA was extracted using the HiYield Genomic DNA isolation kit (Real Biotech Corporation, Taiwan) according to the manufacturer's instructions. Illumina DNA paired-end (PE) libraries with a short insert size (150 bp) were constructed according to the standard protocol provided by Illumina (San Diego, CA, USA). Sequencing was then performed on the HiSeq 2000 platform (Illumina, San Diego, CA, USA), and a total of 9 Gb of Illumina reads were produced.

Genome Assembly and Data Analysis
The following workflow was performed on a 64-bit Ubuntu Linux system with 6 CPUs and 13,000 MB of base memory. Before downstream analyses, the quality of all the Illumina reads was estimated using FastQC [14]. The adapter-contaminated and low-quality reads (<Q30) were removed using Trimmomatic (ver0.36) [15], yielding a total of 8.66 Gb of clean Illumina reads. The paired-end reads were merged using FLASH (ver1.2.11) [16], producing 1.73 Gb of Illumina data. The C. hemipterus genome was assembled using the SPAdes (ver3.15.2) [17] genome assembler. After assembly, the length statistics of the genome assembly were assessed using QUAST (ver5.0.2) [18]. The quality and completeness of the assembly were assessed using BUSCO (ver5.1.2) [13] against a set of highly conserved insect single-copy orthologs. The repetitive elements were identified and masked using RepeatMasker (ver4.1.2) [19]. The number of protein-coding genes in the masked assembly was estimated using AUGUSTUS (ver3.4.0) [20]. The protein-coding genes were then annotated against the Eukaryotic Orthologous Groups (KOG) database using the WebMGA server [21].
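To put the read yields in context, the minimal Python sketch below derives two simple quantities from the numbers reported above: the fraction of sequenced bases retained after trimming, and an approximate sequencing depth. The depth figure rests on a loud assumption that is not made in the study: the 388.66 Mb assembly length is used as a stand-in for the true genome size, which was not measured independently here. The function names are illustrative.

```python
def retained_fraction(raw_gb: float, clean_gb: float) -> float:
    """Fraction of sequenced bases that survive read trimming."""
    return clean_gb / raw_gb


def approx_depth(bases_gb: float, assembly_mb: float) -> float:
    """Approximate depth: sequenced bases divided by assembly length.

    Assumption: assembly length is used as a proxy for genome size.
    """
    return bases_gb * 1000.0 / assembly_mb  # 1 Gb = 1000 Mb


# Values reported in the text.
RAW_GB = 9.0          # raw Illumina reads
CLEAN_GB = 8.66       # reads remaining after Trimmomatic
ASSEMBLY_MB = 388.66  # total assembly length

kept = retained_fraction(RAW_GB, CLEAN_GB)
depth = approx_depth(CLEAN_GB, ASSEMBLY_MB)

print(f"Bases retained after trimming: {kept:.1%}")
print(f"Approximate depth of clean reads: {depth:.1f}x")
```

Under this assumption, about 96% of the sequenced bases were retained after trimming, corresponding to an approximate depth of about 22x relative to the assembly length.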