A Conceptual Model for Geo-Online Exploratory Data Visualization: The Case of the COVID-19 Pandemic

Abstract: Responding to the recent COVID-19 outbreak, several organizations and private citizens considered the opportunity to design and publish online explanatory data visualization tools for the communication of disease data supported by a spatial dimension. They responded to the need of receiving instant information arising from the broad research community, the public health authorities, and the general public. In addition, the growing maturity of information and mapping technologies, as well as of social networks, has greatly supported the diffusion of web-based dashboards and infographics, blending geographical, graphical, and statistical representation approaches. We propose a broad conceptualization of Web visualization tools for geo-spatial information, exceptionally employed to communicate the current pandemic; to this end, we study a significant number of publicly available platforms that track, visualize, and communicate indicators related to COVID-19. Our methodology is based on (i) a preliminary systematization of actors, data types, providers, and visualization tools, and on (ii) the creation of a rich collection of relevant sites clustered according to significant parameters. Ultimately, the contribution of this work includes a critical analysis of collected evidence and an extensive modeling effort of Geo-Online Exploratory Data Visualization (Geo-OEDV) tools, synthesized in terms of an Entity-Relationship schema. The COVID-19 pandemic outbreak has offered a significant case to study how and how much modern public communication needs spatially related data and effective implementation of tools whose inspection can impact decision-making at different levels. Our resulting model will allow several stakeholders (general users, policy-makers, and researchers/analysts) to gain awareness on the assets of structured online communication and resource owners to direct future development of these important tools.


Introduction
By addressing public institutions, from a global to a local scale, the World Health Organization (WHO) Guidelines [1] acknowledge that communication expertise has become as essential to outbreak control as epidemiological training and laboratory analysis practices. Indeed, failures in communication might significantly delay outbreak control, undermining public trust and compliance and leading to unnecessarily prolonged economic, social, and political turmoil. Ultimately, mistakes in outbreak communication might weaken the rapid containment of an epidemic (i.e., limitation of mortality rates and of related socio-economic/environmental impacts [2,3]). Among the communication tools that address WHO's guidelines, although not specifically cited there, geographical visualizations can be acknowledged as highly significant since they can solidly support an assessment of the spatial distribution and diffusion patterns spanning a variety of cultures, political systems, as well as levels of education/economic development [1].
During an emergency, communication that takes place through visualization tools responds to multiple perspectives: first, it answers the need for instant information from impact on their own lives or small businesses), large companies, research centers (with impact on sub-systems of society), decision-makers in the health system, and policy-makers (whose decisions impact the large part of the society).
The structure of this paper is as follows. We first provide a critical discussion of recent Geo-OEDV-based literature (Section 2). Section 3 presents a conceptualization of the actors, data types, data providers, and tools that are currently employed for the purposes of pandemic-related visualization. Section 4 grounds our proposed taxonomy on the real instance of COVID-19 dashboards, for which we collected around 120 cases and defined the dimensions to be analyzed. Building on this data collection, in Section 5 we propose our results: a critical analysis of collected evidence (addressing RQ1), a focus on expert Geo-OEDV genomic and clinical tools (addressing RQ2), and an extensive modeling effort of Geo-OEDV in terms of an Entity-Relationship schema (answering RQ3). In conclusion, we discuss our contributions and future directions in Section 6 (proposing our answers to RQ4).

A Brief Review of Geo-OEDV-Related Literature
Kamel Boulos and Geraghty [9] recall that the earliest visualization of the relationship between location and health aspects dates back to 1694, related to the plague containment in Italy. Disease studies have revealed strong spatiality aspects in terms of location and diffusion. The value of maps as a tool in epidemiology as well as in medical and health geography blossomed more than 200 years ago. Maps have been used to understand and track the spread of infectious diseases, such as yellow fever, cholera, and the 1918 influenza [9][10][11][12]. The digital turn that started in the 1960s significantly increased, through GISs, the possibilities for analyzing, visualizing, and detecting patterns of disease and their socio-economic and environmental impacts.
Lyseen et al. [13] found that, in health GIS-related literature, 28.7% of papers focused on infectious disease mapping. During the 2000s, GISs underwent a profound development, becoming web-based tools with increasingly more interactive and customizable possibilities, including the display of maps integrated within other structured information, acquiring the denomination of "geo-dashboard".
In [3], we have provided a preliminary analysis on the convergence of different information levels (spatial, statistical, genomic, and epidemiological) within a unique instrument of analysis, which uses design and mapping solutions to communicate the COVID-19 crisis. The first geo-dashboard for COVID-19 cases has been implemented at the Johns Hopkins University (Baltimore, USA) to analyze the data collected and hosted at the Center for Systems Science and Engineering (CSSE) [14]. This interactive tool has been widely adopted since very early during the first wave of the pandemic. It allows inspection of the numbers of infections, deaths, and recoveries within an interactive map; for each location, a graph details the progress of infections over time. The dashboard employs data from five authoritative sources: the World Health Organization, the European Centre for Disease Prevention and Control, the US Centers for Disease Control and Prevention, the National Health Commission of the People's Republic of China, and the DXY Chinese Web medical resource.
After this first instance, we witnessed the birth of a wealth of works attesting to the need for online communication of the COVID-19 pandemic, for the creation of dedicated tools, and for the use of GIS technology for mapping the intricacies of the disease spread [15]. Popular solutions published during the first wave of the pandemic were built by the WHO (see Figure 1), the World Bank (https://datanalytics.worldbank.org/covid-dashboard/), a network of academic institutions that go under the name of Open COVID-19 Data Curation Group [16], the Artificial Intelligence Policy Observatory of the Organisation for Economic Co-operation and Development, as well as by single countries and private citizens.
The proposed solutions certainly brought many benefits, enabling a quasi-real-time communication that was not possible during past pandemics. However, many concerns remain open. Some tools have become mainstream; in general, the audience tends to grow fond of specific resources that are not necessarily the most authoritative and updated ones, but possibly are better disseminated or show the most captivating features in terms of usability. There has been a general call to arms among academic researchers, inviting a responsible scientific approach that guarantees accurate, reliable, and representative information to affected communities worldwide [17].
In general, the community of Information Science and Geographical Information Systems is united in encouraging the use of GISs for the effective communication power of near real-time daily mapping of cases and fatalities [18]. However, the phenomenon of dashboards has also been analyzed from a very critical perspective, as it does not adequately capture the geographies of the present. In this respect, Everts [19] defines the "dashboard pandemic" as the phenomenon that gives birth to new pandemic governmentalities that miss more nuanced spatial, temporal, social, and epidemiological information. A related issue is communicating at a granularity level that is interesting for the audience. Collecting high-quality data at a fine spatial resolution is hard and encounters many barriers at the technological and privacy level [20]. Solutions include embracing data at a more general level: big data could help a more granular vision of the problem; many approaches based on big data analytics instruments are being tried to help fight pandemics and public health emergencies in general [21][22][23]. Another group of researchers requests commitment to sharing epidemiological data [16,24] and nucleotide sequences [25] in an open science manner. These are fundamental requirements for guaranteeing that everybody can contribute with their analysis (especially knowledgeable research groups in academia, as claimed by Wissel et al. [26]), towards the achievement of digital public health aiming to increase the effectiveness of tracing/isolating strategies for pandemic control [27].
In this preliminary review study, we devote particular attention to a significant case of tool evolution: the COVID-19 Situation Dashboard of the WHO [28]. In particular, a semiotic comparison [29] between the versions of the website is significant. From January 26th to April 6th, 2020, the WHO published online an ArcGIS Operation Dashboard (shown in Figure 1a) that contained at the center a world map showing, with simple WebGIS-based features, the geographical diffusion at a country-level detail, in addition to affected countries, laboratory-confirmed cases by date of report, and a cumulative curve with dates and scale in plain text. For China only, data were also provided by provinces, autonomous regions, and municipalities.
A completely different interface was deployed as of April (see Figure 1b). Aesthetics and user experience were substantially improved and an exploratory feature was included. However, the principle of having all significant data in one single screen was strongly affected, as were the axis descriptions. The type of offered communication moved from a scientific/technical one to a more emotional one. From a one-screen geo-dashboard visualization style, the new website shifted to a vertical layout, significantly reducing the amount of information readable at first sight. A user is now required to scroll and perform more actions to capture a broad view; the full ranking among all countries and numeric proportions are lost. Indeed, only the 12 most affected countries are reported on the home page; cases are given as absolute numbers, not related to the country's population. Graphs convey a more symbolic message (through improved aesthetics, dynamics, and emotional communicative choices), compared to the more statistics-based previous geo-dashboard. On a secondary note, an aggregate analysis by continent has been added.
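The remark on absolute versus population-related counts can be illustrated with a small worked example. The following is a minimal sketch, not part of the analyzed dashboard: the function name `cases_per_100k` and the two countries with their figures are hypothetical and chosen only to show how a ranking can change once counts are normalized by population.

```python
def cases_per_100k(cases: int, population: int) -> float:
    """Return confirmed cases per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Hypothetical figures (cases, population); not taken from any real dashboard.
toy = {
    "Country A": (50_000, 60_000_000),
    "Country B": (20_000, 5_000_000),
}

# Ranking by absolute number of cases.
ranked_abs = sorted(toy, key=lambda c: toy[c][0], reverse=True)
# Ranking by cases per 100,000 inhabitants.
ranked_rel = sorted(toy, key=lambda c: cases_per_100k(*toy[c]), reverse=True)

print(ranked_abs)
print(ranked_rel)
```

In this toy setting the country with fewer absolute cases ranks first once population is taken into account, which is the kind of proportion a ranking based only on absolute numbers cannot convey.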

Materials
In this section, we provide the framework of Geo-OEDV in terms of system components and main entities that will be used to later conceive our conceptual model. The analysis is based on our preparatory research, held during an epidemic-critical context, which allowed us to build a rich corpus of evidence for informing the following modeling process.
Four views have been selected to characterize the geo-visualization support for epidemics, taking the perspective of a Geo-OEDV developer: First, the relevant actors are categorized (Section 3.1). Second, Section 3.2 provides a taxonomy of different data types that are interesting for online communication during a world epidemic. Then, we propose a brief discussion on data sources and their reliability (Section 3.3). Finally, we discuss the definition of web-based instruments, i.e., dashboards, infographics, and explorers (Section 3.4).

Actors
The analysis of COVID-19 Geo-OEDV tools led to identifying the main actors involved in the process, following the perspective of designers/developers. Other actors are grouped in two subsets: those that use and benefit from information derived from Geo-OEDV (referred to as stakeholders), and those that own these tools, or more in general contribute to their governance.
Use-related actors (or stakeholders) are represented by the following categories:
• General users: private citizens who wish to be informed, to deal with daily decisions in their personal lives, satisfy their sense of curiosity and socialization of emotion, and alleviate the sense of fear and uncertainty brought by epidemics. This category also includes other stakeholders, such as small/medium-size entrepreneurs and managers, who need to be informed to tackle business and managerial choices.
• Policy-makers: their decisions reflect on a whole country or society (e.g., Civil Protection in Italy or similar national bodies), on actions such as writing laws, arbitrating large-scale supply and logistics, strategic alliances with other countries, or flight traffic. These stakeholders are greatly supported by the analytical and monitoring power of Geo-OEDV tools.
• Researchers/Analysts: members of private or public organizations who use Geo-OEDV to understand, monitor, and plan actions and policies.
Governance-related actors are of three kinds:
• Analysts: from bigger companies, hospitals, or research centers, developing resources to feed Geo-OEDV and related data analysis, to inform decision-making, leading to an impact on subsystems of society at a higher level (i.e., provincial/regional). They can be aided by pinpointing which technologies and dimensions are used to communicate the pandemic.
• Researchers: they produce scientific knowledge, i.e., resources that deserve further investigation, such as how different Geo-OEDV configurations may distinctly convey information or how these tools have contributed to public risk perception during different waves of the pandemic.
• Owner: a public or private organization that collects data and/or owns the Geo-OEDV.

Data Types
From an information science perspective, data can be unstructured (in various forms and supports, e.g., files, web pages, images) or structured (organized within databases). A multitude of different data can be available to track information related to a disease. Referring to the perspectives of epidemic geography, bioinformatics, and GIS science, we consider a number of categories. The following data type taxonomy resembles the one proposed in [20]:
• The most specific kind of data relates to genomic aspects of both the virus and the host organism, i.e., the patient, and it is typically produced in sequencing laboratories; these data are described by conceptual models such as the Genomic Conceptual Model [30], the Conceptual Schema of the Human Genome [31], and the Viral Conceptual Model [32]. Genomic data are produced and hosted at many consortia and initiatives' sites (see [33] for a complete review).
• Clinical (or medical) data are collected from medical institutions; they include admission symptoms, risk factors, exposure information, and hospitalization course, among other information. Imaging data represent a particular subset of clinical data. A dated conceptual model for this information was proposed in [34], but more recent efforts are arising in the Cancer Genomics practice [35] and in the COVID-19 research community [36], as shown in [37].
• Epidemiological data include all the heterogeneous categories that serve the unique purpose of modeling disease-diffusion waves and predicting transmission patterns; a comprehensive set of methods for these data is given in [38].
• Health administration data generally include the information regarding hospital capacities, quality of life, causes of death, and health conditions of the population; this kind of information is usually available at the level of institutions (see, e.g., https://healthdata.gov/ by the U.S. Department of Health and Human Services Office of the Chief Technology Officer).
• Socio-economic and environmental data include a very broad set of information (e.g., social media, mobility, and transportation, employment, financial, air quality, weather, etc.).
All such categories can be georeferenced and are clearly orthogonally related to a spatial component: genetic and clinical data are connected to the location of the infected organism or medical patient; epidemiological and health data only make sense when properly set in a defined geographic area; and socio-economic-environmental data always report a geographical and also temporal scale.

Data Providers and Their Reliability
Considering the data sources of Geo-OEDV, datasets may contain original primary data-that can become official statistics-or secondary elaborations. Data should always satisfy the minimum requirements of being authoritative, reliable, and updated. Within these constraints, three levels can be recognized: Big data private companies such as Google, Apple, and Twitter can release datasets on specific topics (e.g., mobility trends and tweets).

Data Visualization Categories
In this research, we focus on the use of visualization to communicate on the Web the data types discussed in Section 3.2 to a wide public, in the scenario of an epidemic situation. We analyze in a systematic manner a number of available OEDV tools along with their relationships with underlying geographical information systems and their online representation, i.e., WebGIS.
OEDV systems are a broad set of tools that, in order to be effective, need to lay on the paradigmatic "three-legged stool" conceptualized by Iliinsky and Steele [39]. The shared objective is to visualize and effectively communicate complex data and their analytics. The visualization, according to this model, is the result of the interactions and activities of three main elements: data, designers, and readers. The dominant relationship determines the visualization result type, i.e., informative, persuasive, or artistic.
Building on Iliinsky and Steele's model and encompassing a geographical and health perspective, we conceptualized an extended configuration (see Figure 2), that adds three aspects: (i) data can include geo-referenced data; (ii) the designer might be a developer with a mixed profile of content management, software development, and digital cartography skills; and (iii) the reader becomes a stakeholder with particular attention to understanding location implications.
In such a case, the output tends to be dominated by GIS technologies, rather than plain data reported in the form of tables, graphs, or artistic design. To further support this assumption, we later describe an analysis of a collection of 121 relevant websites representing Geo-OEDV instances (see Section 4). Primarily, tools that allow geo-visualization fall into three categories, discussed in the following. Readers can refer to Figure 3 for example instances of these types.

Geo-Dashboards.
Understanding the meaning of different levels of data, especially when interlinked with each other, is anything but simple, requiring in most cases a fairly solid background in quantitative analysis and computer programming. Instead of querying data directly, business and decision-makers need intermediate means to access information, translate it into knowledge, and consequently into action. Dashboards are a type of graphical data interface that visualizes key information immediately and in summarized ways, with the aid of graphs, parameters to be changed interactively, and possibly maps. Originally, dashboards were widely employed in business intelligence departments, where users need to quickly grasp the analytics that matter to their business or project (e.g., trends, occurrences, indicators). Dashboards are preferred to simpler visualizations as their interactive nature engages non-technical users in the discovery/communication process. Different kinds of information can be built in real time depending on which dataset and parameters are chosen. While analyzing the viral success of the map-based dashboard of the Johns Hopkins University's Center for Systems Science and Engineering, Kamel Boulos and Geraghty [9] note that anyone with internet access can, in a short time and in a few clicks, learn an increasing amount of information about COVID-19 outbreaks by, for example, reading a text, a table, a graph, a map, or directly a geo-dashboard. The latter integrates all the former, providing data visualization that includes map objects that allow any user, even one lacking significant previous knowledge, to understand the analytics and spatiality of a phenomenon framed by an intuitive single (even limited) screen, i.e., a Web browser tab or a single-page application.
Infographics. Simpler forms of dashboards are called infographics when no user interaction is allowed (i.e., they are static), but more focus is dedicated to aesthetics [40] and to comprehension and memorability of the employed visualization charts [41]. The reader's attention is captured by using principles of graphic design and by targeting large and diverse audiences [42]. We call "stories" particular infographic instances where maps are combined with narrative text, images, and multimedia content, using a consequential pace; this particular visualization type greatly promotes the power of maps and the geography of the pandemic to a non-technical public.
Explorers. Sophisticated geo-visualization tools including the possibility to perform statistical and geo-statistical analytics are called explorers. Usually, explorers also include mathematical models application, allowing users to set particular parameters to adjust predictive functions and scenarios. The target of these tools is typically different from the ones of infographics, as knowledge of statistical models is assumed.

Methodology
This section describes the practical method applied to collect and structure the data for our analysis, further setting the basis for the resulting conceptual model.
Our objective was to explore the possibilities of variables, dominant choices, spurious occurrences, and recurrent patterns so to pinpoint clusters of types and also exceptional cases that deserve to be mentioned as outliers.
With the aim of informing the model design discussed in the results, COVID-19 Geo-OEDV methodologies were investigated through the creation of a collection of 121 relevant websites; Table A1 reports the pages on COVID-19 holding significant data on the spatiality of the phenomena. The collection used in this work started on 20 February and ended on 3 May 2020, i.e., an average phase one period of the outbreak. Each link, representing a visualization case, was collected through active monitoring of national and international health institutions' resources and news, plus an ad hoc search of keywords in the main search engines. At the conclusion of the second European COVID-19 wave (December 2020), we rechecked all links to ensure their functioning and to update the related information. A list of 15 Geo-OEDV tools collections has also been considered (see Table A2 in Appendix A), even if these are more sporadic occurrences on the Web. Although the set of considered entries (121 relevant websites and 15 collections) cannot be considered the universe of possibilities, the learning process performed during the search phase consistently informed our effort of modeling the phenomenon; it also guided our preliminary analysis and deductions: we recognized a broad set of conceptual entities, including the primary technologies for visualization and mapping, active organizations, geographical coverage and granularity, and data richness. In particular, Table A1 in Appendix A shows a preliminary characterization of the sample. For each analyzed Geo-OEDV tool, we included eight kinds of information:

1. Data type: the basic type of information shown by the tool (see Section 3.2). In the majority of cases, this corresponds to epidemiological information such as the number of infected cases (of SARS-CoV-2, the virus responsible for COVID-19), paired with the number of deaths and of performed tests. Another important cluster of platforms is based on genomic information on the virus sequences. Other types include social media reactions to disease spread, predictive risk mapping using population travel data, and tracing and mapping super-spreader trajectories and contacts across space and time.
2. OEDV tool category: dashboard, infographic, or explorer (see Section 3.4). In the first case, data and queries are proposed mainly through a web-based cartographic representation, hence referring to what is generally reported as GIS technology. In the second case, data and queries are proposed mainly through a mixed statistical visualization, including maps. In the third case, maps and statistical information are significantly integrated and rich in complexity.
3. Dominant visualization technology: Geo-OEDV employs all kinds of libraries or frameworks to structure the front-end of a tool; this information is not always available.
4. Dominant mapping technology: the system used to represent maps and interactions on them. We also overview systems without maps, as long as they include an explicit and dominant knowledge of geographical areas (e.g., in filters or graphs).
5. Wideness of geographical coverage: the extension of the geographical area represented in the tool (e.g., worldwide, a specific country or city).
6. Depth of geographical coverage: the granularity of the provided information. Counts and other statistics may be given on a country, region, province, or city-level granularity.
7. Type of owner of the page: the organization behind the development and sponsorship of a Geo-OEDV tool may be public or private, from the research or institutional domain (see Section 3.1, Governance point).
8. Name of the owner of the page.
Ten other categories/parameters have been monitored: language, the openness of the platform, source of data, data download features and type of repository, used secondary technologies, first online release, frequency of updates and updating method, closing or abandonment date, and interactivity. However, this information could not be captured for all platforms; therefore, it was excluded from the dataset used in this paper, as consistent statistical analysis could not be performed with such incomplete parameters.
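The eight kinds of information listed above can be read as the fields of one record per analyzed tool. The following is a minimal sketch of such a record, under our own assumptions: the class and field names (`GeoOedvRecord`, `coverage_wideness`, and so on) are hypothetical labels introduced here for illustration and do not reproduce the column names of Table A1; the example values are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ToolCategory(Enum):
    """The three OEDV tool categories of Section 3.4."""
    DASHBOARD = "dashboard"
    INFOGRAPHIC = "infographic"
    EXPLORER = "explorer"


@dataclass(frozen=True)
class GeoOedvRecord:
    """One analyzed tool, described by the eight dimensions listed above."""
    data_type: str                           # 1. e.g., "epidemiological"
    tool_category: ToolCategory              # 2. dashboard, infographic, explorer
    visualization_technology: Optional[str]  # 3. not always available
    mapping_technology: Optional[str]        # 4. may be unknown or absent
    coverage_wideness: str                   # 5. e.g., "worldwide"
    coverage_depth: str                      # 6. e.g., "country", "province"
    owner_type: str                          # 7. e.g., "university"
    owner_name: str                          # 8.


# Placeholder values, for illustration only.
example = GeoOedvRecord(
    data_type="epidemiological",
    tool_category=ToolCategory.DASHBOARD,
    visualization_technology=None,
    mapping_technology="Esri",
    coverage_wideness="worldwide",
    coverage_depth="country",
    owner_type="university",
    owner_name="Example University",
)
print(example)
```

Keeping the two technology fields optional mirrors the fact that this information could not always be inferred from the published tool.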

Results
This section reports on the descriptive statistical analysis of our 121-record collection (Section 5.1) which extends and updates the one performed in [3]. Analyzed instances are mostly focused on representing the infected cases (what we called "epidemiological data" in Section 3.2); then the case of genomic geo-dashboards, referring to the "viral sequences" data type, is discussed in Section 5.2. Finally, we overview our proposed Entity-Relationship diagram to represent the space of Geo-OEDV tools used for pandemic events such as COVID-19 in a comprehensive and structured way (Section 5.3).

Statistical Data Analysis
In this section, we aim to observe the main trends in publicly available Geo-OEDV tools, by answering our initial research question RQ1. The analysis of the dataset, summarized in Table 1, shows that the great majority of analyzed tools are focused on the number of infected cases, followed by a small number of platforms dedicated to the viral sequences and mutations data (mainly proposed by universities and research centers), and by some experiments on forecasting of the epidemics. Two platforms analyze information spreading during the pandemic, whereas other single occurrences report on connections between infection cases (based on traveling data), mobility, supplies, or healthcare capacity.
The second dimension shows that the most widespread online explanatory data visualization method is the dashboard, based on WebGIS, with about 48% of occurrences, followed by infographics (∼41%) and explorers (∼11%).
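Shares of this kind are simple relative frequencies over the collection. The following is a minimal sketch of the computation, assuming each record carries a category label; the function name `category_shares` and the ten-item toy sample are hypothetical and do not reproduce the 121-record dataset.

```python
from collections import Counter


def category_shares(categories):
    """Return the percentage share of each category label, rounded to 0.1."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Toy sample for illustration only; not the collection analyzed in this paper.
toy_sample = ["dashboard"] * 5 + ["infographic"] * 4 + ["explorer"] * 1
shares = category_shares(toy_sample)
print(shares)
```

The same function can be applied to any of the other categorical dimensions (mapping technology, coverage, owner type) by passing the corresponding labels.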
Leading choices for the mapping libraries include a combination of MapBox, OpenStreetMap, and OpenMapTiles (∼26%); Esri and HERE technologies (∼18%); and Leaflet (∼4%), while Google Maps API, Bing, and Tableau each cover about 3% of the sample. Instead, ∼10% employ other single case solutions. In about 13% of the samples we were not able to infer the dominant mapping technology, as it probably corresponds to ad hoc solutions by the developers of the specific platforms, whereas in ∼13% of the dataset there is no explicit use of maps, but strong use of geo-spatial related attributes and labels.
The geographical coverage tends to include worldwide data with a country-based detail (∼26%); a high rate is represented by country coverage data with a provincial (17%) or regional (14%) breakdown. Many platforms focused on the whole USA with detail on counties (∼4%) or only on one US state with detail on its counties (∼3%). Only 10 records present data at the scale of a municipality or a city. It is also worth mentioning that only in very few tools (four cases in total) were the details of maps reported at the building level or at point granularity. Such a low number reflects the economic and technological complexity of reaching a fine detail and the differences among countries in data privacy rules for online publication.
Another analysis dimension concerns who has promoted the publication of COVID-19-related data. Table 1 highlights that geo-visualizations tend to be uniformly distributed among the public and private spheres. In particular, private persons, empowered by lightweight geo-visualization tools that allow anyone with programming skills to build engaging and interactive maps, are the most significant cluster (∼17%), followed by newspapers or news platforms (∼16%), private companies (∼14%, most of which are specialized in geographical information systems), national institutions (∼13%), research centers (∼13%), universities (∼9%), and multilateral organizations (7%). The remaining ∼10% is composed of sub-national public organizations, non-governmental organizations (NGOs), and volunteered geographical information, or has unknown ownership.
WebGIS-based dashboards tend to be used by national and multilateral institutions and research centers, while the private segment, with the exception of GIS-specialized firms, tends to use the infographic-based cartographic approach. This confirms our expectations, as WebGIS often requires more specialized skills and more expensive software investments; also, in general, its aesthetic design is of lower quality. On the other hand, WebGIS-based visualizations are richer in data, interactive features, and analytics, but more advanced statistical and cartographic knowledge might be needed to fully grasp such richness. Moreover, explorer features are characterized by more complex usage, lower aesthetic quality, and lower understandability.
Thanks to the large availability of libraries to create interesting visualizations (see a list on the Geo Awesomeness blog [43]), reaching high aesthetic standards in dashboards has become relatively simple. Even individuals with limited technical skills (i.e., nonprogrammers) are learning ready-to-use development platforms, where dashboards can be easily prepared and shared with the community. Notable examples are given by DataWrapper (https://app.datawrapper.de/) and Flourish (https://flourish.studio/), which include free plans, or Tableau, which gives the possibility to share one's dashboards on a public gallery (https://public.tableau.com/en-us/gallery).
Regarding data sources (defined in Section 3.3), the analysis shows that very few Geo-OEDV tools use neither institutional data nor the Johns Hopkins University CSSE data.
The most used sources are the competent regional and national authorities and the WHO. Even when the reported data are not epidemiological or clinical, the most common sources are the official ones; this is the case for health and other data, such as distributed material, location feeding level, social media sentiment, info-tracking, development, phylogenetic, fever, and traffic. This behavior shows the effectiveness of open data policies carried out by public institutions and research centers. Datasets are generally provided via simple downloads of CSV or similar formats, directly via GitHub or equivalent repositories. This clearly has a beneficial effect in reducing the so-called "infodemic" and the diffusion of fake news; however, the high concentration of data in a few sources reduces the possibilities of counter-checks.

The Case of Genomic and Clinical Geo-OEDV
This section targets the need for tools that are specific to pandemics and that communicate data also at the genomic and clinical level, thereby responding to the research question RQ2. While the majority of observed Geo-OEDV tools focus on reporting infected cases, we found a set of platforms with a focus on viral sequences, i.e., data describing the genomic characteristics of viral samples extracted through COVID-19 tests and sequenced in genetic laboratories throughout the world. Considering their high value for stakeholders (especially for researchers in the COVID-19 pandemic), we performed a restricted drill-down of our analysis to evaluate potential trends of this sub-type of tools.
Literature suggests that a number of experiments on real-time visualization have been performed in the past, long before the coronavirus epidemic, on other pathogens [44][45][46]. These systems were considered under the umbrella of "outbreak analytics" in [47]; some of them focused on phylogenetics [48][49][50], as they allowed observing the evolutionary history of a taxonomic group of organisms. This feature is critical during a pandemic caused by a virus that is continuously mutating, thus possibly impacting contingent studies on medications and vaccines.
Among the analyzed tools, NCBI Virus is a general portal for sharing sequences of any virus [51]. Its maintainers have created a specific resource for SARS-CoV-2 (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/sars-cov-2), the virus responsible for COVID-19, with a simple map-based visualization. A platform that provides much more data of this kind, as it integrates different data sources, is ViruSurf [52], which characterizes its sequences by using location information (e.g., continent, country, region, and municipality, when available). Other well-known players in virology bioinformatics have contributed geo-spatial analyses of important mutations, such as the D614G variant on the Spike protein [53], variant distributions in the world [54,55], and the COVID-19 virus mutation tracker (https://www.cbrc.kaust.edu.sa/covmt/index.php?p=maps). Others have built trackers of virus subtypes [56] and of haplotype networks [57]. Two important platforms, NextStrain [48] (see Figure 4, left) and Microreact, have produced recent ad hoc endpoints for the phylogeny of COVID-19 viral sequences: these are accessed thousands of times every day by researchers who analyze the phylogeny of the virus to monitor the spread of particular lineages (as happened with the Spanish variant in September 2020 [58] or the UK strain in December 2020 [59], which attracted much attention from the media).
We also found a restricted number of platforms that are based on the "clinical data type" [60], including, for example, tools to monitor healthcare system capacity [61,62], to forecast different potential outcomes of the COVID-19 epidemic based on several parameters [63] (see Figure 4, right), or to assess the risk level of attending events [64].
Overall, genomic and clinical tools represent significantly less frequent cases of fairly complex Geo-OEDV structures; these are typically less user-friendly, suggesting that their stakeholders are not the general public or non-specialized policy-makers, but analysts and researchers with advanced knowledge of scientific aspects of the pandemic.

Figure 4. On the left, the NextStrain platform [48] for SARS-CoV-2 phylogenetic and transmission analysis; available online at https://nextstrain.org/ncov/global; screenshot date: 23 January 2021. On the right, the COVID-19 Scenarios platform [63] for exploring and simulating population and epidemiological factors that impact the disease outbreaks; available online at https://covid19-scenarios.org/; screenshot date: 23 January 2021.

A Geo-OEDV Entity Relationship Model
The descriptive analysis we conducted convinced us that many of the characteristics that we had associated directly with Geo-OEDVs instead pertain conceptually to different entities, which play an important role in the conception, realization, and use of a dashboard/tool. Our most important research question, i.e., RQ3, concerned the possibility of creating a general model to convey information on the creation, structure, control, and dissemination of Geo-OEDV tools. To express the complexity of this system, we thus propose to employ a powerful modeling formalism, i.e., the Entity Relationship (ER) diagram, which allows building abstractions of real objects and phenomena, such as classification, aggregation, and generalization. ER models were first introduced in [65] and quickly became prominent as an industrial standard for the conceptual design of databases [66].
The described domain of knowledge is that of online visualization platforms that employ geo-spatial information to inform their public on particular health-related events, typically epidemics. Our modeling effort is inspired by the COVID-19 case study but applies to more general scenarios. Precisely defining the entities involved and the relationships that connect them (each characterized by a given cardinality) is important to reveal interesting facts, such as the presence of (i) a few data sources that feed many platforms; (ii) stakeholders with different profiles that access the same platform; and (iii) collections that host different platforms and, vice versa, platforms that are hosted on different endpoints (possibly managed by different players), just to mention a few examples.
In Figure 5, we represent the skeleton of the proposed ER model. For the readability of the figure, we omit the attributes here and list them comprehensively in Table 2, along with their descriptions. Following our systematization effort of Sections 3 and 4, we identify thirteen main entities; they are represented with rectangle-shaped boxes with a thick stroke. They are related to one another by relationships named as indicated in the diamond-shaped boxes, to be read from the central entity outwards. For example, the GEOOEDVTOOL has one OWNER, while one PAGE includes 1:N (i.e., one-to-many) MODULEs. The central entity is described by four views:
• its internal structure is composed of a set of PAGEs, which in turn include MODULEs that are made of single LAYOUTCOMPONENTs (see Figure 6 for the graphical representation of one possible dashboard layout);
• its technology includes a software part, based on one SOFTWAREREPOSITORY that contains a set of SOFTWARECOMPONENTs of different kinds, and a data part, relying on a DATAMART, which aggregates information from a single (or possibly a set of) DATABASEs, where the ORIGINALDATASOURCEs have been imported;
• its use comprises a set of use PROFILEs, belonging to given STAKEHOLDERs (these can be private or institutional ones);
• its governance is defined by the OWNER of the platform and by the dissemination strategy: the platform appears in several RESOURCEs.
Arrows are used to represent generalization relationships; these connect secondary entities that are specializations of the main entities. Specifically, the Geo-OEDV tool is a generalization of various categories, such as EXPLORER, WebGIS DASHBOARD, and INFOGRAPHIC (which could be of the STORY kind). These types are intrinsically different; they are used in different scopes and usually require different expertise to be fully exploited by stakeholders (see Section 3.4). Both a stakeholder and an owner of a GEOOEDVTOOL may be a PRIVATEACTOR or a PUBLICACTOR. A RESOURCE may be a CONTENTHOST (which in turn can be a COLLECTION, aggregating or only linking several tools) or a SCIENTIFICPUBL/PATENT. The latter is produced by one or more RESEARCHERs, who in turn can deliver one to several publications. A SOFTWARECOMPONENT may be of several kinds; here, we focus on MAPPINGSOFTWARE and STATISTICGRAPHICLIBRARY (which at times are merged or served by the same software provider), and on MATHEMATICALMODELs.
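As an informal illustration of the internal structure view, the governance view, and the generalization hierarchy described above, the following Python sketch mirrors a few of the entities as classes. It is only a minimal sketch under our own assumptions: the class and field names are adapted from the entity names, the `kind` field and the `layout_components` helper are hypothetical, and the technology and use views are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LayoutComponent:
    kind: str  # hypothetical label, e.g. "map" or "chart"


@dataclass
class Module:
    # One MODULE is made of many LAYOUTCOMPONENTs.
    components: List[LayoutComponent] = field(default_factory=list)


@dataclass
class Page:
    # One PAGE includes many MODULEs (1:N).
    modules: List[Module] = field(default_factory=list)


@dataclass
class Owner:
    name: str
    type: str


@dataclass
class GeoOEDVTool:
    owner: Owner  # each tool has one OWNER
    pages: List[Page] = field(default_factory=list)  # one tool, many PAGEs

    def layout_components(self) -> List[LayoutComponent]:
        """Flatten the tool -> page -> module -> component hierarchy."""
        return [c for p in self.pages for m in p.modules for c in m.components]


# Generalization: each category is a specialization of the central entity.
class Explorer(GeoOEDVTool):
    """EXPLORER category."""


class WebGISDashboard(GeoOEDVTool):
    """WebGIS DASHBOARD category."""


class Infographic(GeoOEDVTool):
    """INFOGRAPHIC category."""


class Story(Infographic):
    """INFOGRAPHIC of the STORY kind."""
```

A tool instance can then be assembled from the inside out, for example `WebGISDashboard(owner=Owner("Example Agency", "national institution"), pages=[Page([Module([LayoutComponent("map")])])])`, where the owner name is a placeholder rather than a platform from our collection.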

| Entity | Attribute | Attribute Description | Mult. |
| --- | --- | --- | --- |
| GEOOEDVTOOL | release_version | Version of the tool since its first release | |
| | release_date | First release date of the tool | |
| | geographical_coverage | Maximum geographical coverage represented in the tool (see Table 1) | |
| | geographical_granularity | Finest geo-spatial detail level represented in the tool (see Table 1) | |
| … | … | … | |
| … | … | The basic data information reported by the source (see Table 1) | |
| | geographic_region | Geographic regions represented in the source | |
| | schema | List of tables and schema of tables | × |
| | download_format | File or other format provided for download of data | |
| | data_update_frequency | Frequency of data update at the source | |
| | license | License under which the data is provided to the public | |
| | metadata_availability | List of metadata further characterizing the provided data | × |
| PROFILE | language | Language of communication (usually English, unless the tool is from national/regional institutions) | |
| | login_needed | If a (pay) login is required to access the tool | |
| | use_configuration | Privileges of the profile (granting access to specific layers of the data/analysis) | |
| | req_previous_knowledge | Assumed background of the stakeholder with this profile | |
| STAKEHOLDER | name | Name of the stakeholder/user | |
| | type | E.g., institution, organization, citizen, group of people... | |
| | expertise_level | Previous knowledge of geo-spatial data and statistics | |
| PRIVATEACTOR | type | E.g., person, organization, company, firm, ... | |
| PUBLICACTOR | level | E.g., national, regional, provincial, multilateral, ... | |
| OWNER | name | Name of the organization or individual that manages the page (see Table 1) | |
| | type | Type of organization that manages the page (see Table 1) | |
| RESOURCE | title | Title of the resource (e.g., newspaper article title, university COVID-19 analysis page...) | |
| | publication_date | When the resource has been officially published and shared with the public | |

We include a self-relationship on GEOOEDVTOOL, as each tool can be put in a relationship with a previous (or later) version of the same system. Each LAYOUTCOMPONENT in the internal structure view relies on a specific SOFTWARECOMPONENT in the technology view. The EXPLORER OEDV category uses one-to-several MATHEMATICALMODELs defined as software components. Cardinalities of relationships can be appreciated in detail in Figure 5:
• One-to-one relationships connect the tool with its software repository and the data mart; then, one-to-many relationships connect these elements respectively to the software components, and to the databases and original data sources.
• The internal structure view is characterized by one-to-many relationships outwards (from one single tool to many components).
• From the central entity, the only many-to-many relationship is the one between the tool and the resources that use it, as these may host many tools; for example, the Johns Hopkins University dashboard appears on many websites and collections [14], and, at the same time, many such resources are collections or aggregators of different tools.
• From the owner towards the tool, we draw a one-to-many relationship.
• A tool can have many profiles; these belong to one stakeholder. Similarly, one stakeholder may have many profiles; each of these corresponds to only one tool.
• Other N:N relationships are between an explorer and its mathematical models, and between a scientific publication and the researchers sharing its authorship.
Table 2 provides a list of all attributes related to each entity, accompanied by their description.
Data types are the obvious ones: attributes starting with "is_" are Boolean flags, versions are numbers (or possibly strings, if they contain sub-versions), dates of release and publication are of date type, and all others are strings. In the last column, "Mult.", we mark the attributes that represent a list of values. In a subsequent version of the model, these would probably be represented as an additional table, referenced by the main one. Seven attributes and three entities (dashboard, explorer, and infographic) are underlined, as they represent the eight information aspects that we captured in our previous analysis (see Section 4).
An example instance of our ER model is represented by the Italian Civil Protection Department dashboard (previously shown in the top left corner of Figure 3), which is inspected every day by millions of Italians. The Italian Civil Protection Department is a particular STAKEHOLDER of this advanced GEOOEDVTOOL, with a specific use PROFILE, i.e., the one employed in the control room of the department. This profile has high privileges, as it should monitor all relevant data that allow timely decisions to be taken for the whole nation. The dashboard for general stakeholders is available on a GIS by Esri (https://opendatadpc.maps.arcgis.com/apps/opsdashboard/index.html#/b0c68bce2cce478eaac82fe38d4138b1), while the data repository is available on GitHub (https://github.com/pcm-dpc/COVID-19) as an ORIGINALDATASOURCE that can also be employed by other tools.
Note that in the ER diagram we do not represent higher-level information such as the actions operated by (some) stakeholders on data containers, software, or tools. This would require another kind of formalism, e.g., Business Process Models.
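The attribute typing convention stated above can be written down as a small rule. The Python sketch below is only illustrative: matching on the "_version" and "_date" name suffixes is our assumption (it fits release_version, release_date, and publication_date in Table 2), versions are mapped to strings so that sub-versions are allowed, and `is_example_flag` is a hypothetical attribute name used solely to exercise the Boolean rule.

```python
from datetime import date


def attribute_type(name: str) -> type:
    """Return the Python type suggested by an attribute name."""
    if name.startswith("is_"):
        return bool  # Boolean flags
    if name.endswith("_version"):
        return str   # string, so that sub-versions such as "1.2" fit
    if name.endswith("_date"):
        return date  # dates of release and publication
    return str       # all other attributes are strings


# Attribute names taken from Table 2, plus one hypothetical flag.
for attr in ("release_version", "release_date", "license", "is_example_flag"):
    print(attr, "->", attribute_type(attr).__name__)
```

List-valued attributes (those marked in the "Mult." column) are not covered by this rule; as noted above, they would more naturally become a separate referenced table.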

Discussion and Conclusions
In this research, we argue that during emergency times, advanced communication tools that combine online visualization with interactive mapping features (named here Geo-Online Exploratory Data Visualization) are of utmost importance. Indeed, they are able to produce synthetic communication and to effectively convey geo-spatial information to a large set of stakeholders, including both individuals and institutions, at different scales, from municipal/regional to national/multilateral levels.
Our work tackles both the identification of the main entities involved in the use of georeferenced data and their relationships in the case of pandemics (such as COVID-19). We precisely modeled the domain of Geo-OEDV tools, specifically when these showcase geo-spatial data (either in the form of maps or as an explicit notion) for performing analysis and driving user interaction. First, we provided a systematization of actors, data types, providers, and visualization tools. We then employed this taxonomy to analyze a collection of about 120 sites presenting relevant geo-visualizations dedicated to the communication of the COVID-19 pandemic. Based on this collection, we were able to (i) carry out a critical statistical analysis of the collected evidence (according to eight high-level parameters), showing the main recurrences, choices, and typologies of platforms openly available on the Web, and (ii) propose a novel Entity Relationship model that overviews Geo-OEDV tools from four views, i.e., their internal structure, use, governance, and technology.
In this discussion, we motivate our research and answer our last research question, RQ4, which concerns the benefits of proposing a conceptual model for Geo-OEDV, trying to identify the most relevant stakeholders of these systems. Conceptual modeling has previously been used in GIS-related problems [67], also applied to specific fields (e.g., archaeology [68] or management of energy production [69]), but, to the best of our knowledge, it is a novelty for pandemic cases. With this formalism, we stress the importance of understanding in depth the tools that are currently driving the communication of the pandemic, in its diffusion and implications, to the broad public. We deem conceptual models to be of substantial relevance for developers/designers, resource providers, and data owners for producing high-quality Geo-OEDV for stakeholders. Additionally, our model can be used as a starting framework for deeper analyses of Geo-OEDV as next-generation tools for communicating pandemics in a more informed fashion, where stakeholders of different expertise levels can understand the phenomenon with diverse degrees of detail.
Our data collection may highlight gaps in the system. For instance, we observe that genomic and clinical data Geo-OEDV are less frequent and in general less user-friendly, although they offer deep and vast information content. Another clear trend is the rare combined use of cross-disciplinary data types, despite this being one of the main GIS features. For example, socio-economic data are hardly ever matched to epidemiological data, resulting in the impossibility of finding meaningful correlations among decisions made to control the epidemic.
Our conceptual model may be used to guide the logical and physical design of a relational database that serves as a repository of health-related dashboards. Such a data collection may be input to several existing frameworks to assess the usability of dashboards and measure their value (see, e.g., [70]) from several perspectives. Analyses of the use of, and interactions with, Geo-OEDV tools on COVID-19 may also inform studies on public risk perception of the COVID-19 emergency (see, e.g., [71]).
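To give a flavor of such a logical design, the following Python sketch uses the standard sqlite3 module to translate a small fragment of the ER model into tables: the one-to-many relationship from owner to tool becomes a foreign key, the self-relationship between tool versions becomes a nullable self-reference, and the many-to-many relationship between tools and resources becomes a bridge table. Table names, key columns, and all inserted rows are hypothetical placeholders of ours; this is one possible mapping, not a design prescribed by the model.

```python
import sqlite3

SCHEMA = """
CREATE TABLE owner (
    owner_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    type     TEXT
);
CREATE TABLE tool (
    tool_id             INTEGER PRIMARY KEY,
    release_version     TEXT,
    release_date        TEXT,
    owner_id            INTEGER NOT NULL REFERENCES owner(owner_id),
    previous_version_id INTEGER REFERENCES tool(tool_id)  -- self-relationship
);
CREATE TABLE resource (
    resource_id      INTEGER PRIMARY KEY,
    title            TEXT NOT NULL,
    publication_date TEXT
);
CREATE TABLE tool_resource (  -- bridge table for the N:N relationship
    tool_id     INTEGER NOT NULL REFERENCES tool(tool_id),
    resource_id INTEGER NOT NULL REFERENCES resource(resource_id),
    PRIMARY KEY (tool_id, resource_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the references
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO owner VALUES (1, 'Example University', 'university')")
    conn.executemany(
        "INSERT INTO tool VALUES (?, ?, ?, ?, ?)",
        [(1, "1.0", "2020-03-01", 1, None), (2, "2.0", "2020-09-01", 1, 1)],
    )
    conn.executemany(
        "INSERT INTO resource VALUES (?, ?, ?)",
        [(1, "Example news article", "2020-04-01"),
         (2, "Example dashboard collection", "2020-05-01")],
    )
    conn.executemany("INSERT INTO tool_resource VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 2)])
    conn.commit()
    return conn


def resources_per_tool(conn: sqlite3.Connection) -> dict:
    """Count in how many resources each tool appears."""
    rows = conn.execute(
        "SELECT tool_id, COUNT(*) FROM tool_resource "
        "GROUP BY tool_id ORDER BY tool_id"
    ).fetchall()
    return dict(rows)
```

Calling `resources_per_tool(build_demo_db())` counts, for each placeholder tool, the resources in which it appears; queries of this kind are what would make it possible to spot the few tools that are disseminated through many resources.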
This study has implications on several sides and provides support for different stakeholders: general users, with impact on their own lives or small businesses; analysts/developers, as representatives of bigger hospitals/companies/research centers that have resources to devote to data analysis, with impact on subsystems of society; and policy/decision-makers (who make laws and decide on supplies, economy, and alliances among countries), with impact on the entire society.
Considerable numbers of new Geo-OEDV tools are being realized in these hectic times; this tendency shows that the worldwide research community and the general public are asking for ever more effective forms of quantitative communication of the pandemic. However, while reviewing the dataset during the second wave of the pandemic, we found that, with respect to the first wave of the COVID-19 pandemic, there were no considerable changes or advancements in the proposed Geo-OEDV tools. This may be an interesting direction to investigate in future work; it also suggests that conceptual modeling could be used to track the evolution of tools while tackling gaps of communication and pinpointing interesting or alarming trends.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data sharing is not applicable to this article. Data is contained within the article or Appendix A.

Conflicts of Interest:
The authors declare no conflict of interest.