The Genome Assembly and Annotation of the Southern Elephant Seal Mirounga leonina

The southern elephant seal Mirounga leonina is the largest phocid seal and one of the two species of elephant seals. It is listed as ‘least concern’ on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species 2015. Here, we assembled a reference genome for M. leonina using the 10× Genomics Chromium linked-read sequencing platform. The final M. leonina assembly was 2.42 Gb long, with a scaffold N50 length of 54.2 Mb and a maximum scaffold length of 111.6 Mb. The M. leonina genome contained 20,457 predicted protein-coding genes, and 41.51% of the assembly consisted of repeat sequences. Genome completeness was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO): the assembly was highly complete, with 95.6% of the vertebrate core gene set identified as complete. This high-quality genomic resource will be essential for further understanding adaptive metabolism during repeated breath-hold dives and for exploring the molecular mechanisms underlying the species' unique biochemical and physiological characteristics. The southern elephant seal genome project has been deposited at NCBI (National Center for Biotechnology Information) under BioProject number PRJNA587380.


Introduction
Elephant seals are an important animal model for studying unique behavior, physiology, population dynamics, and geographical distribution. Unlike fully aquatic marine mammals (e.g., whales, dugongs, manatees), elephant seals must emerge from the water to rest, molt, mate, and rear pups [1]. Seals undergo long fasting periods, during which energy from blubber fat allows them to stay on land for over a month without food or water. The northern elephant seal is known to lose up to 40% of its body mass during the prolonged fasting period [2]. Furthermore, seals cope with osmotic challenges by producing endogenous water through lipid oxidation and by concentrating their urine [3].
The elephant seal genus Mirounga contains two species, the southern elephant seal Mirounga leonina and the northern elephant seal M. angustirostris. The southern elephant seal is the largest species in the clade Pinnipedia, as adult males can weigh up to four tons and grow up to 5.8 m in length.
Four distinct populations have been described in the circumpolar regions and sub-Antarctic islands [4,5]. Southern elephant seals are adapted to deep, long foraging dives. While foraging, they spend most of their time underwater, preying on squid, mollusks, krill, and cephalopods, with dives lasting from 30–45 min to over 2 h and reaching depths of more than 2000 m [6]. Their enlarged eyes and sensitive vibrissae (whiskers) play a crucial role in hunting in the deep sea [6].
M. leonina was almost exterminated in the early 19th century by indiscriminate hunting to harvest oil from its blubber. Consequently, the species was listed as 'vulnerable' under the predecessor to the Environment Protection and Biodiversity Conservation Act 1999 (EPBC Act). Hunting was regulated by the Convention for the Conservation of Antarctic Seals and the Convention on the Conservation of Antarctic Marine Living Resources to protect M. leonina individuals and their populations, and the population has since recovered. The species is currently listed as 'least concern' on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species 2015 (e.T13583A45227247, http://www.iucnredlist.org, accessed on 12 December 2014). Nevertheless, the southern elephant seal faces several threats, including the depletion of prey stocks owing to intensive fishing in Antarctica, drastic habitat changes driven by climate change, and low genetic diversity [7]. Continued monitoring and strictly regulated conservation of M. leonina populations are therefore required.
Genomic information on M. leonina could help explain its unique physiological and metabolic adaptations, such as the ability to spend long periods at sea as a mammal, survive in a frigid environment, perform repeated breath-hold dives, engage in deep-diving and long-ranging predation, tolerate routine hypoxia, and fast during the nursing period. In this study, we sequenced and analyzed the genome of the southern elephant seal, the first such genome analysis in elephant seals. To minimize potential stress to the animal during sampling, the M. leonina genome was sequenced from a small blood sample using the 10× Genomics Chromium sequencing platform. By improving our understanding of the physiology and molecular mechanisms of this species, this study will contribute to its conservation.

Materials and Methods
Approximately 1 mL of blood was collected from a single southern elephant seal, Mirounga leonina (Figure 1), on King George Island, South Shetland Islands, Antarctica, and genomic DNA was extracted from the blood specimen as described in our previous study [8]. High molecular weight gDNA was obtained from 200 µL of blood using the QIAGEN MagAttract HMW DNA kit (QIAGEN, Germantown, MD, USA) according to the manufacturer's protocol. The quality and quantity of the gDNA were assessed using a 5400 Fragment Analyzer (Agilent Technologies, Santa Clara, CA, USA) and a Qubit 2.0 Fluorometer (Invitrogen, Life Technologies, Carlsbad, CA, USA).
DNA libraries for the southern elephant seal individual were generated using 10× Genomics Chromium technology according to the manufacturer's instructions. Gel bead-in-emulsions (GEMs) were created by combining a library of Genome Gel Beads with 1.5 ng of gDNA in a Master Mix and partitioning oil, using the 10× Genomics Chromium Controller instrument with a microfluidic Genome chip (PN-120257). The GEMs were then subjected to an isothermal incubation step. Barcoded DNA fragments were recovered and used for Illumina library construction, as detailed in the Chromium Genome Reagent Kits Version 2 User Guide (PN-120258). Library yield was measured with the Qubit dsDNA HS assay kit (Thermo Fisher Scientific, Waltham, MA, USA), and library fragment size and distribution were measured using an Agilent 2100 Bioanalyzer High Sensitivity DNA chip (Agilent, Santa Clara, CA, USA). The libraries were sequenced on an Illumina NovaSeq with 2 × 250 bp reads, generating ∼1.6 billion paired-end (PE) reads (Table 1).
De novo genome assembly was performed with the Supernova assembler v2.1.1 (RRID:SCR_016756; 10× Genomics, San Francisco, CA, USA) [9] using default parameters, with the paired-end reads from the partitioned library as input.
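The read count and read length reported above together determine the raw sequencing depth. The minimal Python sketch below illustrates that arithmetic. It assumes the 2.42 Gb assembly size (reported in the Results) as a proxy for genome size, and because "∼1.6 billion paired-end reads" could count either individual reads or read pairs, both interpretations are shown. The function name raw_coverage is hypothetical.

```python
def raw_coverage(n_reads: int, bases_per_read: int, genome_size_bp: int) -> float:
    """Raw sequencing depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * bases_per_read / genome_size_bp


if __name__ == "__main__":
    genome_size = 2_420_000_000  # assumption: assembly size used as a proxy for genome size
    n = 1_600_000_000  # "~1.6 billion paired-end reads" from the text

    # Interpretation A: the count refers to individual 250 bp reads.
    print(f"{raw_coverage(n, 250, genome_size):.0f}x")  # 165x
    # Interpretation B: the count refers to read pairs (2 x 250 bp per pair).
    print(f"{raw_coverage(n, 2 * 250, genome_size):.0f}x")  # 331x
```

Which interpretation applies should be checked against the read statistics in Table 1.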
This study, including sample collection and experimental research on this animal, was performed in accordance with the law on Antarctic activities and environmental protection approved by the Minister of Foreign Affairs and Trade of the Republic of Korea.

Results and Discussion
The final M. leonina genome assembly was 2.42 Gb long (GC content: 41.51%), with a scaffold N50 length of 54.233 Mb and a maximum scaffold length of 111.625 Mb; heterozygosity was estimated to be 0.301% (Table 2 and Supplementary Figure S1). The assembly comprised 1,115 scaffolds, of which 54 were over 10 Mb long and together accounted for 90.6% of the assembly (Supplementary Table S1).
Genome completeness was estimated with BUSCO v3.0 (RRID:SCR_015008) using the vertebrata_odb9 database [22]. Of the 2586 BUSCO groups searched, 2472 were identified as complete and 49 as fragmented, leading to a total of 98.1% of BUSCO genes being found in the M. leonina genome (Supplementary Table S2).
Repeat sequences made up 41.51% of the assembled M. leonina genome, and transposable elements (TEs) accounted for 33.28%, including LINEs (22.12%), SINEs (6.06%), LTRs (3.06%), and DNA transposons (1.81%) (Table 3). LINEs were therefore the most abundant TEs. To estimate the relative age and transposition history of TEs, Kimura distances (κ-values) were calculated for all copies of each TE [23,24]. The Kimura divergence distribution for all TE copies in M. leonina (Figure 2) is dominated by LINEs with low κ-values (≤5). Because very similar copies (low κ-values) indicate relatively recent activity, this pattern suggests that TEs, and LINEs in particular, have been recently active in the southern elephant seal genome.
A total of 20,457 protein-coding genes were predicted in the M. leonina genome by combining ab initio gene prediction, homology search, and transcript mapping.
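Two statistics used in this section, scaffold N50 and Kimura divergence, can be illustrated with a short Python sketch. The scaffold lengths and substitution counts below are hypothetical toy values, not M. leonina data. kimura_2p implements the textbook Kimura two-parameter distance, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transition and transversion differences; repeat-annotation tools such as RepeatMasker can instead report a CpG-adjusted variant, so this is an approximation rather than a reproduction of the values in Figure 2. All function names are hypothetical.

```python
import math
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Scaffold length at which the running total (longest first) first reaches half the assembly."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("assembly is empty")
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise RuntimeError("unreachable for a non-empty assembly")


def fraction_in_long_scaffolds(lengths: Sequence[int], min_len: int) -> float:
    """Fraction of assembled bases in scaffolds longer than min_len (e.g. 10 Mb in the text)."""
    return sum(l for l in lengths if l > min_len) / sum(lengths)


def kimura_2p(transitions: int, transversions: int, aligned_sites: int) -> float:
    """Kimura two-parameter distance between a TE copy and its consensus."""
    p = transitions / aligned_sites
    q = transversions / aligned_sites
    a, b = 1 - 2 * p - q, 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


if __name__ == "__main__":
    toy_lengths = [100, 80, 60, 40, 20]  # hypothetical scaffold lengths, arbitrary units
    print(n50(toy_lengths))  # 80
    print(fraction_in_long_scaffolds(toy_lengths, 50))  # 0.8
    # Hypothetical TE copy: 300 aligned sites, 9 transitions, 3 transversions.
    print(round(100 * kimura_2p(9, 3, 300), 2))  # 4.13 (vs. 4.0% raw mismatch)
```

The toy Kimura example shows why the corrected distance slightly exceeds the raw mismatch rate: the correction accounts for multiple substitutions at the same site, an effect that grows with the age of a TE copy.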
Exons spanned a total of 30.5 Mb, with an average of 8.2 exons per gene (Table 4). Functional classification into Gene Ontology (GO; RRID:SCR_002811) categories was performed using the BLAST2GO v5.25 pipeline (Blast2GO, RRID:SCR_005828) [25]. Kyoto Encyclopedia of Genes and Genomes (KEGG, RRID:SCR_001120) [26] pathway annotation was performed using the KEGG Automatic Annotation Server (KAAS), and eukaryotic orthologous groups (KOG) [27] were assigned by BLASTp v2.2.31 searches against the KOG database proteins with a maximum e-value of 1e-5. In total, 19,439 genes were annotated by at least one database (Table 4), and 3354, 15,354, and 11,501 genes were annotated by GO, KEGG, and KOG, respectively (Supplementary Figures S2–S4). The genome browser, local BLAST database, and assembly data are available at https://antagen.kopri.re.kr/project/project.php (accessed on 1 January 2020).
Synteny between the California sea lion (Zalophus californianus) reference chromosomes and the southern elephant seal assembly was analyzed using SyMAP v3.4 [28]. The southern elephant seal assembly was highly contiguous relative to the 18 chromosomes of the California sea lion (Figure 3 and Supplementary Table S3), and a total of 2.22 Gb of the assembly mapped to California sea lion chromosomes.
OrthoVenn2 [29] was used to identify orthologous and paralogous relationships among the annotated proteins of the southern elephant seal and other marine mammals. The predicted southern elephant seal proteins clustered into 13,473 orthologous groups (containing 14,335 southern elephant seal genes) across five Pinnipedia species, and 148 orthogroups (containing 353 genes) were unique to the southern elephant seal (Figure 4).
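The 1e-5 e-value cutoff and the "annotated by at least one database" count can be sketched as simple set operations in Python. This is a hypothetical illustration, not the BLAST2GO or KAAS logic: it assumes BLASTp results in the default 12-column tabular format (-outfmt 6: query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), and the gene and KOG identifiers are toy values. The function names are also hypothetical.

```python
from typing import Dict, Iterable, Set

EVALUE_CUTOFF = 1e-5  # maximal e-value stated in the text


def hits_passing_evalue(blast_tab_lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> Set[str]:
    """Query IDs with at least one hit at or below `cutoff` in BLAST -outfmt 6 output.

    Assumes the default 12-column layout, where the 11th column (index 10) is the e-value.
    """
    genes: Set[str] = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= cutoff:
            genes.add(fields[0])
    return genes


def annotation_summary(per_db: Dict[str, Set[str]]) -> Dict[str, int]:
    """Genes annotated per database, plus the number annotated by at least one database."""
    summary = {db: len(genes) for db, genes in per_db.items()}
    summary["any"] = len(set().union(*per_db.values()))
    return summary


if __name__ == "__main__":
    kog_hits = [  # hypothetical BLASTp tabular lines
        "gene1\tKOG0001\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-50\t500",
        "gene2\tKOG0002\t30.0\t100\t70\t2\t1\t100\t5\t104\t0.01\t40",  # fails the cutoff
    ]
    per_db = {
        "GO": {"gene1", "gene3"},  # hypothetical annotation sets
        "KEGG": {"gene1", "gene4"},
        "KOG": hits_passing_evalue(kog_hits),
    }
    print(annotation_summary(per_db))  # {'GO': 2, 'KEGG': 2, 'KOG': 1, 'any': 3}
```

Taking the union rather than summing the per-database counts avoids double-counting genes annotated by more than one database, which is why the overall total in the text is smaller than the sum of the GO, KEGG, and KOG counts.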
This southern elephant seal assembly shows that 10× linked-read sequencing can yield a high-quality reference genome, which will be essential for further understanding adaptive metabolism during repeated breath-hold dives and for exploring the molecular mechanisms underlying the species' unique biochemical and physiological characteristics. Additionally, our genomic data will provide a valuable genetic resource for evolutionary studies of pinniped divergence.

Supplementary Materials:
Table S1: Lengths of the southern elephant seal genome assembly scaffolds (over 1 Mb).
Table S2: Completeness of the southern elephant seal genome assembly evaluated with benchmarking universal single-copy orthologs (BUSCO).
Table S3: Mapping of the southern elephant seal genome assembly to California sea lion chromosomes (assembly zalCal2.2).
Figure S1: Graph of the k-mer distribution (K = 27) using GenomeScope.
Figure S2: Gene ontology (GO). The horizontal axis indicates classes of the second-level GO annotation, and the vertical axis indicates the number of genes in each class.
Figure S3: Statistics of Kyoto Encyclopedia of Genes and Genomes (KEGG) classifications.
Figure S4

Conflicts of Interest:
The authors declare no conflict of interest.