Nomenclature- and Database-Compatible Names for the Two Ebola Virus Variants that Emerged in Guinea and the Democratic Republic of the Congo in 2014

In 2014, Ebola virus (EBOV) was identified as the etiological agent of a large and still expanding outbreak of Ebola virus disease (EVD) in West Africa and a much more confined EVD outbreak in Middle Africa. Epidemiological and evolutionary analyses confirmed that all cases in each outbreak trace back to a single introduction of EBOV into the human population and that the two outbreaks are not directly connected. Coding-complete genomic sequence analyses of isolates revealed that the two outbreaks were caused by two novel EBOV variants, and initial clinical observations suggest that neither of them should be considered a strain. Here we present consensus decisions on naming for both variants (West Africa: “Makona”, Middle Africa: “Lomela”) and provide database-compatible full, shortened, and abbreviated names that are in line with the recently established filovirus sub-species nomenclature.


Introduction
On 10 March 2014, a viral hemorrhagic fever (VHF) outbreak was reported among humans in Guinea, West Africa [1]. Ebola virus (EBOV), the sole member of the species Zaire ebolavirus (genus Ebolavirus, family Filoviridae, order Mononegavirales [2]), was identified as the etiological agent. Consequently, the VHF was identified as Ebola virus disease (EVD) [1,3,4]. At the time of writing, this EVD outbreak has spread from Guinea into Liberia, Nigeria, Senegal, Sierra Leone, and Mali, with individual case exportations or transport of patients to France, Germany, Norway, Spain, UK, and US. At least 15351 human infections and 5459 deaths (proportion of fatal cases ≈36%) have been recorded as of 21 November 2014, making this outbreak the largest EVD outbreak in history [5]. Through conventional Sanger [1] and next-generation sequencing [6], 102 coding-complete EBOV genome sequences have been assembled (complete genome sequences with the exception of the ultimate 3' and 5' untranslated regions [7]) originating from three patients from Guinea and 78 patients from Sierra Leone [1,6]. Evolutionary analyses combined with epidemiological data demonstrate that all cases are directly epidemiologically linked, tracing back to a single introduction of EBOV into the human population [1,6] as has been found for most past EVD outbreaks [8].
On 24 August 2014, another EVD outbreak was reported from Boende District, Democratic Republic of the Congo, Middle Africa [9]. A total of 66 cases and 49 deaths (proportion of fatal cases ≈74%) have been recorded [5]. As in Guinea, epidemiological analyses point towards a single introduction of EBOV from its unknown natural reservoir into the human population, with subsequent spread among humans by direct person-to-person transmission [9]. Thus far, two partial L (RNA-dependent RNA polymerase) gene sequences have been deposited into GenBank, and the coding-complete sequence of one isolate has been determined [9]. Phylogenetic analysis demonstrated that the Guinea and Democratic Republic of the Congo EVD outbreaks were not related. The EBOV variants causing the two outbreaks were distinct from each other and from variants known from previous EVD outbreaks [1,3,4,6,9].
Next-generation sequencing techniques enable the determination of coding-complete EBOV genomes in dozens and theoretically hundreds of clinical samples in parallel in the absence of classical virus culture [6]. The rapid accumulation of sequence data challenges sequence database curators and end users when novel sequences are not uniquely named according to common standards. Here, we assign final designations to the two EBOV variants causing the 2014 Guinea and Democratic Republic of the Congo EVD outbreaks and update the current GenBank sequence entries accordingly.

Ebola Virus Strain, Variant, and Isolate Naming
In 2013, a consortium of filovirologists and sequence database experts working at the US National Center for Biotechnology Information (NCBI) established a consistent and prospective filovirus nomenclature below the species level [10]. This nomenclature, which has already been applied to all filovirus entries in NCBI's RefSeq database [11], is based on the template: <virus name>(/<strain>)/<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>. [Note: the "/" between the <virus name> and the <isolation host-suffix> field was missing in the full name template outlined in [10], but is necessary for computational purposes. It is therefore introduced here, and filovirus full names that have already been implemented [11] will be corrected retrospectively.] The <virus name> field should contain a filovirus name as outlined in [10]. Currently, the accepted filovirus names and abbreviations are Bundibugyo virus (BDBV), Ebola virus (EBOV), Lloviu virus (LLOV), Marburg virus (MARV), Ravn virus (RAVV), Reston virus (RESTV), Sudan virus (SUDV), and Taï Forest virus (TAFV) (Table 1) [2,12]. The <strain> field should contain a unique strain name if the virus in question fulfills the criteria for being a strain (see [10]). The <isolation host-suffix> field should be provided as one word in the format "first letter of host genus name.full name of species descriptor" (e.g., "H.sapiens"), followed by a suffix that denotes whether the sequence stems from an unpassaged sample ("-wt"), from virus isolated in tissue culture ("-tc"), or is a genomic fragment ("-frag"). The <country of sampling> and <year of sampling> fields should contain the three-letter ISO 3166-1 alpha-3 code for the country where the virus was isolated and the year in which it was isolated, respectively.
Finally, the <genetic variant designation> and <isolate designation> fields should contain a unique variant name (i.e., a name for the virus variant that was introduced into the human population and caused an outbreak) and a unique isolate name (i.e., the name for a particular representative of the variant), respectively [10]. To simplify manuscript writing, shortened and abbreviated virus designations are also defined [10]. For instance, the designations

full: Ebola virus/H.sapiens-tc/COD/1976/Yambuku-Ecran
shortened: EBOV/H.sap/COD/76/Yam-Ecr
abbreviated: EBOV/Yam-Ecr

specify an isolate "Ecran" of Ebola virus as a representative of the variant "Yambuku" (not a strain) that originated from a human in the Democratic Republic of the Congo in 1976 and was isolated/sequenced using tissue culture [16].
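The full-name template lends itself to mechanical composition and validation. The following Python sketch is illustrative only and is not part of the nomenclature in [10]: the class name FilovirusName, the function parse_full_name, and the regular expression are hypothetical choices made here, and the pattern assumes that the variant designation contains no hyphen and that no field contains a "/". The rules for deriving shortened and abbreviated names are defined in [10] and are deliberately not reproduced.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Suffixes named in the text: unpassaged sample, tissue culture, genomic fragment.
SUFFIXES = ("wt", "tc", "frag")


@dataclass(frozen=True)
class FilovirusName:
    """Hypothetical container for the fields of a filovirus full name."""
    virus: str                    # e.g. "Ebola virus"
    host: str                     # e.g. "H.sapiens"
    suffix: str                   # "wt", "tc", or "frag" (without the leading "-")
    country: str                  # ISO 3166-1 alpha-3 code, e.g. "COD"
    year: int                     # four-digit year of sampling
    variant: str                  # e.g. "Yambuku"
    isolate: str                  # e.g. "Ecran"
    strain: Optional[str] = None  # only set if the virus qualifies as a strain

    def __post_init__(self):
        if self.suffix not in SUFFIXES:
            raise ValueError(f"unknown suffix: {self.suffix!r}")
        if not re.fullmatch(r"[A-Z]{3}", self.country):
            raise ValueError(f"not a three-letter country code: {self.country!r}")

    def full(self) -> str:
        """Compose the full name following the template in the text."""
        parts = [self.virus]
        if self.strain:
            parts.append(self.strain)
        parts += [
            f"{self.host}-{self.suffix}",
            self.country,
            str(self.year),
            f"{self.variant}-{self.isolate}",
        ]
        return "/".join(parts)


# Assumed pattern: the optional strain field sits between virus name and host.
_FULL_NAME = re.compile(
    r"^(?P<virus>[^/]+)"
    r"(?:/(?P<strain>[^/]+))?"
    r"/(?P<host>[A-Z]\.[a-z]+)-(?P<suffix>wt|tc|frag)"
    r"/(?P<country>[A-Z]{3})"
    r"/(?P<year>\d{4})"
    r"/(?P<variant>[^/-]+)-(?P<isolate>[^/]+)$"
)


def parse_full_name(name: str) -> FilovirusName:
    """Split a full name into its fields; raise ValueError if it does not fit."""
    m = _FULL_NAME.match(name)
    if m is None:
        raise ValueError(f"not a valid full name: {name!r}")
    return FilovirusName(
        virus=m["virus"], strain=m["strain"], host=m["host"], suffix=m["suffix"],
        country=m["country"], year=int(m["year"]),
        variant=m["variant"], isolate=m["isolate"],
    )


example = parse_full_name("Ebola virus/H.sapiens-tc/COD/1976/Yambuku-Ecran")
print(example.variant, example.isolate)  # Yambuku Ecran
```

Parsing the example from the text and calling full() on the result returns the original string, which is a convenient round-trip check when curating many entries.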

The 2014 Ebola Virus Variant Originating in Guinea
At the end of 2013, EVD broke out around Guéckédou, Kissidougou, and Macenta, Guinea [1] and subsequently spread to at least five additional West African countries. Epidemiological and phylogenetic studies indicate that this large EVD outbreak was caused by a single introduction of one particular ebolavirus, Ebola virus (EBOV), into humans (Homo sapiens) from an unknown reservoir and therefore that all subsequent human cases (over 15,000 cases) are derived from one unnamed variant [1,6]. Preliminary clinical observations among EVD patients in West Africa do not contradict past descriptions of EVD [17][18][19][20][21][22], i.e., this novel unnamed EBOV variant is not a strain as defined in standardized filovirus nomenclature [10]. Here we propose the name "Makona" (IPA: [mɑ'kɔnə] or [məˈkoʊnə]; English phonetic notation: mah-kaw-nuh or muh-koh-nuh) after the Makona River close to the border between Liberia, Guinea, and Sierra Leone (Figure 1). At the time of writing, 102 coding-complete genomic sequences of EBOV/Mak have been deposited into GenBank, all of which were obtained directly from clinical samples ("p0") [1,6]. Following the rules laid out in standardized filovirus nomenclature [10], the names for these sequences should therefore contain the <suffix> "-wt". In addition, one fragment of the L gene of one isolate of EBOV/Mak was deposited. Based on the definitions in standardized nomenclature [10], the corresponding <suffix> field should therefore be filled with "-frag". All currently deposited sequences stem from Guinean, Sierra Leonean, or Nigerian samples. The 3-letter country codes to be used in the <country> field [10] for all countries that have thus far handled patients infected with EBOV/Mak are summarized in Table 2. The 102 coding-complete genomes (including those of C05, C07, and C15) and the one fragmented L gene sequence that have been deposited into GenBank have all already been assigned unique <isolate designation> descriptors.
Accordingly, in all currently deposited sequences of EBOV/Mak, the definition line will be adjusted to "Zaire ebolavirus isolate Ebola virus/H.sapiens-<suffix>/<country>/2014/Makona-<isolate designation>, [coding-]complete genome", with the <suffix>, <country>, and <isolate designation> fields filled according to their origin. The GenBank <strain> field will be cleared throughout; the GenBank <isolate> field will be filled with "Ebola virus/H.sapiens-<suffix>/<country>/2014/Makona-<isolate designation>", and the <organism> field will be corrected, if necessary, to "Zaire ebolavirus" (Table 3).
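As a minimal sketch of how these adjustments could be applied consistently across many records, the Python function below fills the four GenBank fields described above for one EBOV/Mak genome sequence. The function name makona_genbank_fields and the dictionary keys are hypothetical and do not correspond to any GenBank submission tool. The bracketed "[coding-]" in the definition line is read here as an optional prefix, which is an assumption, and the pairing of country code GIN with isolate C15 in the usage line is for illustration only. The sketch covers genome sequences, not the L gene fragment.

```python
def makona_genbank_fields(suffix, country, isolate, coding_complete=False):
    """Return the adjusted GenBank fields for one EBOV/Mak genome sequence.

    suffix: "wt" or "tc" (without the leading "-")
    country: three-letter country code for the <country> field
    isolate: the existing unique <isolate designation>
    coding_complete: assumed switch for the optional "[coding-]" prefix
    """
    if suffix not in ("wt", "tc"):
        raise ValueError(f"unexpected suffix for a genome sequence: {suffix!r}")
    if len(country) != 3 or not country.isalpha() or not country.isupper():
        raise ValueError(f"not a three-letter country code: {country!r}")

    name = f"Ebola virus/H.sapiens-{suffix}/{country}/2014/Makona-{isolate}"
    completeness = "coding-complete genome" if coding_complete else "complete genome"
    return {
        "definition": f"Zaire ebolavirus isolate {name}, {completeness}",
        "organism": "Zaire ebolavirus",
        "isolate": name,
        "strain": "",  # cleared throughout, as stated in the text
    }


# Illustrative values only.
fields = makona_genbank_fields("wt", "GIN", "C15")
print(fields["isolate"])  # Ebola virus/H.sapiens-wt/GIN/2014/Makona-C15
```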

The 2014 Ebola Virus Variant Originating in the Democratic Republic of the Congo
In late 2014, EVD broke out in Boende District, Democratic Republic of the Congo (3-letter country code: COD), Middle Africa. Epidemiological and phylogenetic studies indicate that this limited EVD outbreak was caused by a single introduction of one particular ebolavirus, Ebola virus (EBOV), into humans (Homo sapiens) from an unknown reservoir and therefore that all subsequent cases (almost 70) are derived from one unnamed variant [9]. Preliminary clinical observations among EVD patients in this outbreak do not contradict past descriptions of EVD [9], i.e., this novel unnamed EBOV variant is not a strain as defined in [10]. Here we propose the name "Lomela" (IPA: [lɔ'mɛlɑ] or [lɔ'mɛlə]; English phonetic notation: law-me-lah or law-me-luh) after the Lomela River that runs through COD's Boende District (see Figure 2). Accordingly, in all currently deposited sequences of EBOV/Lom, the definition line will be adjusted to "Zaire ebolavirus isolate Ebola virus/H.sapiens-<suffix>/COD/2014/Lomela-<isolate designation>", with the <suffix> and <isolate designation> fields filled according to their origin. The GenBank <strain> field will be cleared throughout; the GenBank <isolate> field will be filled with "Ebola virus/H.sapiens-<suffix>/COD/2014/Lomela-<isolate designation>", and the <organism> field will be corrected, if necessary, to "Zaire ebolavirus" (Table 4).