Global Discrepancies between Numbers of Available SARS-CoV-2 Genomes and Human Development Indexes at Country Scales

More than a year has passed since SARS-CoV-2 first emerged in China in December 2019, and the virus has spread rapidly around the world. Some variants are currently considered of great concern. We aimed to analyze the numbers of SARS-CoV-2 genome sequences obtained in different countries worldwide until January 2021. On 28 January 2021, we downloaded the origins of the deposited genome sequences from the GISAID database, and from the “Our world in data” website we downloaded the numbers of SARS-CoV-2-diagnosed cases, the numbers of SARS-CoV-2-associated deaths, population size, life expectancy, gross domestic product (GDP) per capita, and human development index per country. Files were merged and data were analyzed using Microsoft Excel software. A total of 450,968 SARS-CoV-2 genomes originating from 135 countries on five continents were available. Among the 19 countries for which the number of genomes per 100 deaths was >100, six were in Europe, eight were in Asia, three were in Oceania, and two were in Africa. Six (30%) of these countries rank beyond 75th for the human development index, and four (20%) rank beyond 80th for GDP per capita. Moreover, comparing the number of genomes sequenced per 100 deaths with the human development index by country shows that some Western European countries have released similar or lower numbers of genomes than many African or Asian countries with a lower human development index. These data highlight great discrepancies between the numbers of available SARS-CoV-2 genomes per 100 cases and deaths and the ranking of countries regarding wealth and development.


Introduction
The SARS-CoV-2 pandemic, which has been spreading for almost a year, has generated considerable global efforts in the sequencing, collection, and analysis of viral genomes. Sequence databases and various tools for storing, downloading, classifying, and analyzing these genomes have quickly become available [1,2]. In particular, the GISAID sequence database hosts a collection of SARS-CoV-2 genomic sequences obtained worldwide (https://www.gisaid.org/; accessed on 28 January 2021) [1]. Our team has produced a large number of genome sequences for SARS-CoV-2, in particular when the incidence of cases rose again considerably during the summer [3][4][5][6][7]. This enabled us to point out very early, during the summer of 2020, the existence of variants (strains that differ from all others by a set of several mutations and have reached a detectable population size) [4]; we named those identified in our institute Marseille-1 to Marseille-10. They have been responsible for successive or overlapping epidemics, before becoming established at our country's scale.

Materials and Methods
On 28 January 2021, we downloaded the origins of the deposited genome sequences from the GISAID database (https://www.gisaid.org/; accessed on 28 January 2021) [1]. On the same day, we also downloaded from the "Our world in data" website (https://ourworldindata.org/; accessed on 28 January 2021) the numbers of SARS-CoV-2-diagnosed cases and SARS-CoV-2-associated deaths per country as well as various epidemiological data, including population size, life expectancy, gross domestic product (GDP) per capita, and human development index (collected from URL: https://covid.ourworldindata.org/data/owid-covid-data.xlsx; accessed on 11 April 2021).
According to the United Nations Development Programme (http://hdr.undp.org/en/content/human-development-index-hdi; accessed on 11 April 2021), the human development index is the geometric mean of normalized indices for the health dimension (assessed by life expectancy at birth), the education dimension (assessed by mean years of schooling for adults ≥ 25 years of age, and expected years of schooling for children of school-entering age), and the standard of living dimension (assessed by gross national income per capita). This index was used as a measure of country development to determine whether development was related to the capacity and/or willingness to perform next-generation sequencing to assess SARS-CoV-2 genomic epidemiology. Files were merged and data were analyzed using Microsoft Excel software (https://www.microsoft.com; accessed on 11 April 2021). We standardized the numbers of genomes sequenced per 100 SARS-CoV-2-diagnosed cases and per 100 SARS-CoV-2-associated deaths. Data were plotted using Microsoft Excel and GraphPad Prism v.5 (https://www.graphpad.com; accessed on 11 April 2021) software. The numbers of genomes per country taken into account were those released by a given country regardless of whether sequencing was performed inside or outside this country, considering the origin of the clinical specimen. We also checked the numbers of SARS-CoV-2 genomes for some countries on other sequence databases including the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/; accessed on 11 April 2021), the European Bioinformatics Institute (EMBL-EBI; https://covid-19.ensembl.org/index.html; accessed on 11 April 2021), and the China National Center for Bioinformation (CNCB; https://bigd.big.ac.cn/ncov/; accessed on 11 April 2021).
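The merging and standardization steps described above amount to a join on country followed by two ratios. The following is a minimal Python sketch of that logic, not the Excel workflow used in the study. It assumes the genome counts and the epidemiological data have already been loaded into dictionaries keyed by country name; the function names, the field names `cases` and `deaths`, and all numbers in the usage lines are hypothetical.

```python
from typing import Dict, Optional


def genomes_per_100(genomes: int, events: int) -> Optional[float]:
    """Return genomes per 100 events (cases or deaths).

    Returns None when no events were reported, because the ratio is undefined.
    """
    if events <= 0:
        return None
    return 100.0 * genomes / events


def merge_and_standardize(
    genome_counts: Dict[str, int],
    epi: Dict[str, Dict[str, int]],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Join the two sources on country name and compute both ratios.

    Countries missing from the epidemiological data are skipped.
    """
    merged: Dict[str, Dict[str, Optional[float]]] = {}
    for country, n_genomes in genome_counts.items():
        if country not in epi:
            continue
        row = epi[country]
        merged[country] = {
            "genomes": n_genomes,
            "per_100_cases": genomes_per_100(n_genomes, row["cases"]),
            "per_100_deaths": genomes_per_100(n_genomes, row["deaths"]),
        }
    return merged


# Invented figures for illustration only (not taken from the study).
genome_counts = {"Country A": 500, "Country B": 30}
epi = {
    "Country A": {"cases": 10_000, "deaths": 200},
    "Country B": {"cases": 60_000, "deaths": 0},
}
table = merge_and_standardize(genome_counts, epi)
print(table["Country A"]["per_100_deaths"])  # 250.0
```

Returning `None` for a country without reported deaths is one possible design choice; it keeps such countries in the table without producing a division error or a misleading zero.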

Results
A total of 450,968 SARS-CoV-2 genomes were available from the GISAID database on 28 January 2021. They originated from five continents, from 135 countries and 8919 laboratories. The mean (±standard deviation) number of genomes per country was 3340 ± 18,498 (range, 556) and the median number was 129. The mean number of genomes per 100 SARS-CoV-2-associated deaths per country was 270 ± 1422 (0.06-14,397) and the median was 6.2. Finally, the mean number of genomes per 100 SARS-CoV-2 diagnosed cases per country was 2198 ± 9105 (0.001-70) and the median number was 0.120.
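In the values reported above, the mean and standard deviation sit far above the median, which is what a strongly right-skewed distribution produces when a few countries sequence far more than the rest. A short Python sketch of these summary statistics with the standard `statistics` module follows. The input list is invented, the function name `summarize` is hypothetical, and the use of the sample standard deviation (`stdev`) is an assumption, since the text does not state which estimator was used.

```python
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Mean, sample standard deviation, median, and range of a list of values."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator), an assumption
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


# Invented, deliberately skewed values: one country dominates the total.
per_country = [2, 4, 6, 8, 980]
summary = summarize(per_country)
print(summary["mean"], summary["median"])  # 200 6
```

With such data the median describes the typical country better than the mean, which is why both are worth reporting.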
The top 100 source laboratories accounted for 72% (n = 324,837) of available genomes (Supplementary Table S1). They were mostly located in the USA (62%; n = 24), in England (21), in Denmark (11), and in the Netherlands (6). When considering the 19 countries for which the number of genomes per 100 deaths was > 100, 6 were in Europe (Iceland (number of genomes per 100 deaths = 14,397), Denmark (1680), Luxembourg (405), Norway (229), UK (186), and Finland (174)), while 8 were in Asia (Singapore (5969), Taiwan (2143), Thailand (653), Vietnam (406), Mongolia (350), Japan (310), Brunei (167), and South Korea (117)), 3 were in Oceania (New Zealand (4380), Australia (1902), and Papua New Guinea (144)) and 2 were in Africa (Gambia (344) and Equatorial Guinea (110)) (Figures 1 and 2; Table 1). In addition, six (30%) of these countries have a human development index below the mean value for the 135 countries studied here (0.756): Thailand (human development index = 0.755), Vietnam (0.694), Mongolia (0.741), Gambia (0.460), Papua New Guinea (0.544), and Equatorial Guinea (0.591). Moreover, all six of these countries have a GDP per capita below the mean value for the 135 countries studied here (22,884 US dollars) (Table 1). Similarly, when considering the 24 countries for which the number of genomes per 100 diagnosed cases was ≥ 1, eight were in Asia (Taiwan, Vietnam, Japan, Thailand, Singapore, Brunei, South Korea, and China) and three were in Africa (Gambia, Equatorial Guinea, and Democratic Republic of Congo). In addition, seven (29%) of these countries have a human development index below the mean value for the 135 countries studied here (0.  (Figure 4) by country.
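The selection used above (countries exceeding a threshold of genomes per 100 deaths, then the subset whose human development index falls below the mean across all countries studied) can be sketched in Python as two filters. This is an illustration only, not the Excel workflow used in the study; the record layout and function names are hypothetical. The usage lines reuse three of the country values quoted above and the reported mean index of 0.756, plus two invented entries marked as such.

```python
from typing import List, NamedTuple


class CountryRecord(NamedTuple):
    name: str
    genomes_per_100_deaths: float
    hdi: float  # human development index


def high_sequencing(
    records: List[CountryRecord], threshold: float = 100.0
) -> List[CountryRecord]:
    """Keep countries with more than `threshold` genomes per 100 deaths."""
    return [r for r in records if r.genomes_per_100_deaths > threshold]


def below_mean_hdi(
    records: List[CountryRecord], mean_hdi: float
) -> List[CountryRecord]:
    """Keep countries whose human development index is below the given mean."""
    return [r for r in records if r.hdi < mean_hdi]


records = [
    CountryRecord("Thailand", 653, 0.755),   # values quoted in the text
    CountryRecord("Vietnam", 406, 0.694),    # values quoted in the text
    CountryRecord("Gambia", 344, 0.460),     # values quoted in the text
    CountryRecord("Country X", 50, 0.900),   # invented: below the threshold
    CountryRecord("Country Y", 150, 0.900),  # invented: above the mean index
]

selected = high_sequencing(records)
low_hdi = below_mean_hdi(selected, mean_hdi=0.756)
print([r.name for r in low_hdi])  # ['Thailand', 'Vietnam', 'Gambia']
```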
Finally, we verified for several countries that they had not submitted significant numbers of SARS-CoV-2 genome sequences to sequence databases other than GISAID. In particular, we found that the China National Center for Bioinformation (CNCB) sequence database, which compiles sequences from GISAID and GenBank, held a number of genomes similar to that in GISAID alone (Supplementary Table S2).
Figure legend (human development index): For a better legibility of the graph, only countries with more than 100 SARS-CoV-2 genomes are shown. Grey and yellow strips indicate countries with numbers of genomes per 100 deaths between 10 and 100, and between 1 and 10, respectively. Blue, green, and orange dots mark countries from Africa, America, and other regions, respectively, with a human development index below the mean value for all 135 countries studied here (0.756).
Figure legend (GDP per capita): For a better legibility of the graph, only countries with more than 100 SARS-CoV-2 genomes are shown. Grey and yellow strips indicate countries with numbers of genomes per 100 deaths between 10 and 100, and between 1 and 10, respectively. Blue, green, and orange dots mark countries from Africa, America, and other regions, respectively, with a GDP per capita below the mean value for all 135 countries studied here (22,884 US dollars). GDP is in US dollars.
Viruses 2021, 13, x FOR PEER REVIEW

Discussion
This analysis, conducted 10 months after WHO declared COVID-19 a pandemic (https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19, accessed on 11 March 2020), shows great disparities between countries in the numbers of SARS-CoV-2 genomes available per 100 cases and deaths, as well as substantial discrepancies between these numbers and the ranking of countries based on their wealth and development, although this was not a general pattern. Here, we considered SARS-CoV-2 genomes from a given country regardless of whether they were obtained inside or outside this country. Therefore, the present analysis shows that several developed countries had either a technological or organizational delay in terms of high-throughput sequencing, and/or insufficient determination to monitor SARS-CoV-2 genetic and protein diversity and variability. Thus, firstly, in some developed countries, the importance of detecting, characterizing, and surveying SARS-CoV-2 variants may have been initially overlooked. Secondly, the majority of laboratories may have been unable to produce a large number of SARS-CoV-2 genomic sequences because the necessary infrastructure was not in place at the start of the pandemic. In particular, these laboratories did not possess, or did not even have access to, next-generation sequencing instruments for clinical diagnosis, and only possessed sequencers using Sanger technology.
Another reason could have been a lack of organization in terms of human resources or pre-existing training that would have allowed a high capacity for high-throughput sequencing. Other obstacles could have been global supply chain issues for reagents and consumables. In contrast, several developing countries showed both the will and the capacity to sequence SARS-CoV-2 genomes and scaled up next-generation sequencing technologies [30][31][32]. This is another example of how the SARS-CoV-2 pandemic is reshuffling the cards globally.
A limitation of the present study is that it may not comprehensively take into account all SARS-CoV-2 genome sequences obtained in each country. First, not all sequenced genomes may have been submitted to a sequence database. Second, genomes may have been submitted to sequence databases other than GISAID; however, by screening four different sequence databases, we did not observe that our results were biased by disparities between countries in the proportions of sequences submitted to GISAID and to other major sequence databases. Moreover, the human development index and GDP per capita analyzed here do not necessarily reflect the strength of medical research and technology at a country scale.
The worldwide distribution of available SARS-CoV-2 genomes observed here is noteworthy because several issues related to SARS-CoV-2 genotypic features are of paramount importance and are currently at the forefront of the pandemic. SARS-CoV-2 variants cause successive or overlapping epidemics with various kinetics, durations, and levels of contribution to the total burden of SARS-CoV-2 infections [5][6][7]. In addition, they can be associated with differences in disease transmissibility and severity, and they have the potential to evade immune responses elicited by prior infection or vaccine immunization [5,7,16,17,26,27,28].
Overall, for a new disease caused by viruses with a high mutation rate, as has long been known for human immunodeficiency virus and hepatitis C virus, it is necessary to survey and monitor viral genome sequences to detect mutants and variants in order to identify possible differences in terms of transmissibility, clinical severity, resistance to treatments, and escape from vaccine immunity as well as natural immunity. SARS-CoV-2 genome-based surveillance should optimally be continuous, with weekly assessments, and should be capable of detecting the emergence of viral variants and monitoring the dynamics and outcomes of their epidemics. Considering these data, broad-scale SARS-CoV-2 genomic surveillance should have been a priority for all developed countries that had the means to perform it.
Author Contributions: Conceived and designed the experiments: D.R. and P.C. Contributed materials/analysis tools: P.C. Analyzed the data: D.R. and P.C. Wrote the paper: D.R. and P.C. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement: Data are available from the GISAID database (https://www.gisaid.org/; accessed on 28 January 2021), from the "Our world in data" website (https://ourworldindata.org/; accessed on 28 January 2021), or from the corresponding author upon reasonable request.

Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.