1. Introduction
Globalization, common access to the Internet and the rapid development of social media make fake information easy to spread. The phenomenon is so widespread that the English phrase “fake news” has made its way into other languages. “Fake news” is defined as: “fabricated information that imitates news media content in form but not in organizational process or intent, which overlaps with other information disorders, such as misinformation (false or misleading information) and disinformation (false information deliberately disseminated to deceive people)” [
1]. False information is a threat that impacts reality. Thus, it has also attracted the attention of researchers who for years now have been analyzing the role that fake news plays, how it spreads and ways to eliminate it [
2,
3,
4,
5,
6].
Fake news related to health has been spreading for quite some time. Most of it refers to anti-vaccination movements [
7,
8], teeth strengthening and whitening at home [
9] and oncology treatments [
10]. Fake news may provide patients and their families with false hope and reason for questioning existing medical knowledge or for exerting pressure on physicians [
10].
False information may be particularly harmful during pandemics such as the SARS-CoV-2 pandemic, which has infected millions of people and is still spreading [
11]. Searching for information about prevention, treatment and recovery during a pandemic is a natural reaction. What makes people fearful is the COVID-19 mortality rate. As of 27 April 2020, the death toll was in the hundreds of thousands worldwide, approximately 7% of all infections, with a Case Fatality Ratio (CFR) of 18.8% [
12,
13].
Given a rapidly developing pandemic and the lack of a vaccine or medication against COVID-19, news about prevention quickly reaches social media such as Facebook, Twitter, WhatsApp, TikTok or Instagram. Some of the information is false and may even be harmful [
6,
14]. The amount of fake news appearing during the study period was, however, limited by the actions of official entities (state and regional authorities, public health institutes, scientific societies, the World Health Organization (WHO), etc.) providing verified information. At the same time, these institutions provided guidance on how to check information about the pandemic emerging in the public space. Among the most frequently appearing fake news was information on various methods of preventing infection, such as consuming large amounts of vitamin D, vitamin C or garlic, or drinking water every 15 min [
15,
16]. Conspiracy theories also appeared, among them the theory that 5G technology was responsible for the spread of the virus, which contributed to the destruction of 5G masts, and the theory that vaccination against COVID-19 serves to subjugate citizens [
15,
16].
The first official cases of pneumonia of unknown etiology in the city of Wuhan were reported by Chinese authorities to the WHO on 31 December 2019. A new type of virus responsible for pneumonia was identified in January 2020 and classified as 2019-nCoV. Just several weeks later, similar cases had been reported in Thailand, Japan and South Korea [
17]. The first European cases were in Italy on 31 January 2020. The WHO Information Network for Epidemics (EPI-WIN), set up to provide reliable information, began operating on the same day [
18]. On 11 February, the WHO also introduced a new classification, renaming 2019-nCoV to SARS-CoV-2 and naming the disease caused by the virus COVID-19 [
19]. New ICD-10 and ICD-11 codes were implemented and COVID-19 became an official cause of death [
20,
21].
The WHO classified COVID-19 as a pandemic on 11 March 2020 [
22]. A month earlier, the WHO had used the term “infodemic” in one of its daily reports to describe the vast amount of both false and true information that was making it difficult for people to find reliable information when they needed it. The infodemic was spreading faster than COVID-19 itself [
23,
24]. Infodemiology is the science of the distribution and determinants of information in electronic media. Infodemiological data are usually collected and analyzed in almost real time. Examples of infodemiological data applications include analyzing search engine queries to predict disease outbreaks (e.g., influenza) and identifying and monitoring online publications relevant to public health, e.g., anti-vaccine websites. Infodemiology can provide valuable information on population health behaviors [
25,
26].
In this study, researchers investigated online information related to the SARS-CoV-2 coronavirus and the COVID-19 disease as returned by a Google keyword search. They chose countries on four continents: Europe (Poland, UK, Spain, Italy, France and Germany), North America (USA), Asia (Singapore) and Australia. COVID-19 cases increased in these countries during April 2020. These countries also differed in the percentage of deaths among those infected. Compared to the incidence on 1 April 2020, infections had increased 5–6-fold by the end of the month in countries such as Poland and the UK. In Italy, Spain, France and Germany, the increase was more than twofold. The largest difference was observed in Singapore (more than a 14-fold increase), although a significant increase in the incidence did not occur until 20 April. The increase in infections was the slowest in Australia (133%); usually it was a double-digit daily increase, with the highest recorded amounting to 266 new cases (2.04). The number of deaths varied from country to country. The highest percentages of deaths in relation to cases as of 1 April 2020 were recorded in Italy (11.9%), Spain (9.0%), Great Britain (8.7%) and France (7.1%). These values were higher than the world average (5.2%). In the other countries, this ratio was lower than the world average and amounted to: USA: 2.9%, Poland: 1.8%, Australia: 0.5% and Singapore: 0.3% [
11,
13].
The aim of the study was to characterize and analyze the pages of the first 30 search engine results (SERP) for each country publicly available during the pandemic, and to compare them between countries. Detailed objectives included:
- Characterization and analysis of the typology of websites returned by Google for the “COVID-19”, “Coronavirus”, “SARS-CoV-2” and “fake news” keywords;
- Characterization and analysis of online information returned by Google for the “COVID-19”, “Coronavirus” and “SARS-CoV-2” keywords;
- Calculation of the frequency of fake news for the “COVID-19”, “Coronavirus” and “SARS-CoV-2” keywords;
- Characterization and analysis of online information returned by Google for the “fake news” keyword;
- Analysis of associations between results and epidemiological data on COVID-19, such as: number of deaths, number of infections, number of SARS-CoV-2 tests performed;
- Characterization and analysis of online information returned by Google for the “COVID-19”, “Coronavirus”, “SARS-CoV-2” and “fake news” keywords about celebrities, religion and testimonials;
- Analysis of the Journal of the American Medical Association (JAMA) score.
The research question was whether there were differences between the countries in content available online about COVID-19.
2. Materials and Methods
2.1. Websites
The study was conducted in 2020 between March 30 and April 27 using the methodology employed by Arif N. et al. [
7] for analyzing content about immunization. Those authors used the “vaccines” and “autism” keywords in Google between June and September 2017 and categorized the returned websites. The typology included: Commercial (C), Government (G), Health portal (HP), News (N), Non-profit (NP), Professional (P), Scientific journals (SJ) and “others” (O). The JAMA score was used (for the presence of the following information: author, date, references, owner of website) and the webpages were annotated according to the following features: (1) the name of the vaccine mentioned; (2) the overall stance on vaccines (positive, negative, or neutral); (3) the chemicals or adjuvants mentioned; (4) whether the page mentioned complementary and alternative medicine (CAM); (5) whether religion was mentioned; (6) whether the page contained a testimonial; (7) whether a celebrity was mentioned [
7].
This study was carried out according to SRQR guidelines (standards for reporting qualitative research) [
27].
The research was carried out using both Google Chrome and Firefox browsers without any add-ons. All cookies and the browser history were deleted prior to the search process to avoid any bias in the results [
7]. The researchers were aware that Google may identify the user’s location using their IP address and so influence the results, so they did not use a VPN while searching. The following searches were conducted: Poland (google.pl), France (google.fr), Italy (google.it), USA (google.com), UK (google.co.uk), Singapore (google.com.sg), Germany (google.de), Spain (google.es), Australia (google.com.au). All SERPs for each country and one keyword were analyzed on the same day. This part of the methodology is important in order to calculate the return frequency of websites in relation to their type. If the content returned was too extensive to analyze in one day (one keyword for a country), the links to the first 30 SERPs were copied to Excel and analyzed over two days.
As the search process was performed in Poland, the research team deleted the browser history and all cookies, and reset Google to the selected country and language. Although the keywords were typed in English, Google was set to return results in the national languages.
2.2. Keywords
Four keywords were entered into the country-specific Google search engines: “COVID-19”, “Coronavirus”, “SARS-CoV-2” and “fake news”. All but one of the national languages in the study use the same spelling for “coronavirus”; the local word “koronawirus” was used for Poland. In Singapore, which has four official languages including English, the results returned for “coronavirus” were considered equal and comparable to those of other countries. It was decided to use only those keywords as we planned to obtain a sample of the websites returned independently for each expression. Thus, we did not use questions such as “mitigations measures?”, “auto tests?”, “how to treat COVID-19” or “how to protect against coronavirus” because the results would differ depending on the exact question. We decided to use the search terms “COVID-19”, “Coronavirus”, “SARS-CoV-2” and “fake news” as these best represent what the lay public would search for on the Internet. The researchers had basic to advanced skills in the languages of the analyzed countries. Unclear content was translated into English or discussed with translators.
The researchers deliberately used the keyword “fake news” because at the time the study was conducted, it was popular on the web to comment on fake news about the COVID-19 pandemic. However, a detailed analysis showed that some of the fake news about the COVID-19 pandemic was not debunked, but accepted as real information.
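The search design described in Sections 2.1 and 2.2 can be summarized as a small data structure. The following is a minimal sketch, not the authors' tooling (the searches were performed manually in a browser): the domain list and the Polish spelling come from the text, while the names `GOOGLE_DOMAINS`, `LOCAL_SPELLING` and `build_search_plan` are hypothetical.

```python
from itertools import product

# Country-specific Google domains as listed in Section 2.1.
GOOGLE_DOMAINS = {
    "Poland": "google.pl",
    "France": "google.fr",
    "Italy": "google.it",
    "USA": "google.com",
    "UK": "google.co.uk",
    "Singapore": "google.com.sg",
    "Germany": "google.de",
    "Spain": "google.es",
    "Australia": "google.com.au",
}

# The four keywords used in every country.
KEYWORDS = ["COVID-19", "Coronavirus", "SARS-CoV-2", "fake news"]

# Local spelling overrides; only Poland's "koronawirus" is stated in the text.
LOCAL_SPELLING = {("Poland", "Coronavirus"): "koronawirus"}


def build_search_plan():
    """Return one (country, domain, query) tuple per country-keyword pair."""
    plan = []
    for country, keyword in product(GOOGLE_DOMAINS, KEYWORDS):
        query = LOCAL_SPELLING.get((country, keyword), keyword)
        plan.append((country, GOOGLE_DOMAINS[country], query))
    return plan


plan = build_search_plan()
print(len(plan))  # 9 countries x 4 keywords = 36 searches
```

Listing the country-keyword pairs explicitly makes it easy to check that every pair was searched once and on which day.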
2.3. Content Analysis
Three researchers analyzed the first 30 SERPs for three different countries at the same time. There was no more information on the Internet about the topic at the time the study was carried out. By design, 30 SERPs were taken into account after the initial research. After compiling items for different countries in their national languages (English for Singapore), it turned out that obtaining 30 addresses for each keyword was impossible. In the end, the first 20 SERPs were taken into account, although even this reduced number could not be obtained for some countries (fake news: Spain, 16 pages; Poland, 13; see
Table 1). Each researcher had to analyze four keywords for each country, i.e., 120 SERPs per country and 360 SERPs in total. Each researcher analyzed one keyword a day for a country. Whenever the content was too extensive, the links to SERPs were copied to Excel and investigated the next day. The overall number of analyzed SERPs reached 685, with the following numbers for each of the countries: Poland: 73, France: 80, Italy: 80, UK: 61, Singapore: 81, Germany: 80, Spain: 76, Australia: 80, USA: 74.
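The per-country counts above can be checked against the reported total with a few lines of code. This is only an arithmetic sketch; the counts are copied from the text, and the variable names are hypothetical.

```python
# Number of analyzed SERPs per country, as reported in the text.
ANALYZED_SERPS = {
    "Poland": 73,
    "France": 80,
    "Italy": 80,
    "UK": 61,
    "Singapore": 81,
    "Germany": 80,
    "Spain": 76,
    "Australia": 80,
    "USA": 74,
}

total = sum(ANALYZED_SERPS.values())

# Share of each country in the overall sample, in percent.
shares = {country: round(100 * n / total, 1) for country, n in ANALYZED_SERPS.items()}

print(total)  # 685
for country, share in shares.items():
    print(f"{country}: {share}%")
```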
Using a 0–1 system, the researchers assessed whether each type of information appeared on the SERP, where 0 means no information, and 1 means it was present. Moreover, each SERP was rated according to webpage typology. The fact that the information on the website was “fake news” was also assessed using 0–1, where 0 means real information and 1 means fake news. The researchers, by consensus, judged whether the information was fake news or not, on the basis of information posted on websites that were generally considered reliable. Such websites include: the World Health Organization, WHO; the Centers for Disease Control and Prevention, CDC; the National Health Service, NHS; or national health authorities. Additionally, it was necessary for the team to agree on the comparability of the published information in different countries on the same topic. A structured study protocol (flowchart) is presented in
Table 1 and the
Supplementary Materials (Tables S1 and S2).
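The 0–1 coding scheme can be illustrated with a simple record type. The sketch below is an assumption about how such coding might be stored, not the authors' actual spreadsheet layout: the class `CodedPage`, its field names and the example pages are hypothetical, while the typology codes and the meaning of 0 and 1 follow the text.

```python
from dataclasses import dataclass, field

# Typology codes as listed in Section 2.1.
TYPOLOGY = {"C", "G", "HP", "N", "NP", "P", "SJ", "O"}


@dataclass
class CodedPage:
    country: str
    keyword: str
    typology: str        # one of TYPOLOGY
    is_fake_news: int    # 0 = real information, 1 = fake news
    content_flags: dict = field(default_factory=dict)  # category -> 0 or 1

    def __post_init__(self):
        if self.typology not in TYPOLOGY:
            raise ValueError(f"unknown typology: {self.typology}")
        values = [self.is_fake_news, *self.content_flags.values()]
        if any(v not in (0, 1) for v in values):
            raise ValueError("all codes must be 0 or 1")


def frequency(pages, flag):
    """Percentage of pages on which the given content category was coded 1."""
    if not pages:
        return 0.0
    hits = sum(page.content_flags.get(flag, 0) for page in pages)
    return 100.0 * hits / len(pages)


# Hypothetical example pages, for illustration only.
pages = [
    CodedPage("Poland", "COVID-19", "G", 0, {"symptoms": 1, "prevention": 1}),
    CodedPage("Poland", "COVID-19", "N", 0, {"prevention": 1}),
]
print(frequency(pages, "symptoms"))    # 50.0
print(frequency(pages, "prevention"))  # 100.0
```

Storing every judgment as 0 or 1 keeps the later frequency and correlation calculations straightforward, since each frequency is simply the mean of a column.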
The researchers jointly defined the scope of the analysis based on the information on COVID-19 publicly available on the Internet and on the methods used by Arif N. et al. [
7]. Each website was assessed according to the information it contained. The following content was analyzed: information on quarantines, disease/infection symptoms, disease/infection risk factors, disease/infection consequences; virus transmission routes; virus incubation period; virus carrier; disease treatment; prevention measures prior to infection; alternative/supplementary medicine–unconventional or “home” treatment (including attitudes towards alternative medicine: positive, negative, neutral); epidemiological data (number of cases, deaths etc.); whether the page contained a testimonial (e.g., a personal story); whether a celebrity was mentioned; whether a religion was mentioned; others such as: regulations, services, economy, information on online fake news. The “other” category also included information on disease etiology. The etiology was not initially considered a separate category; however, numerous websites included speculation on how the pandemic started. All such speculation was eventually classified as “other”. All the websites returned by Google using the “COVID-19”, “Coronavirus” and “SARS-CoV-2” keywords were judged as true or false according to the scientific knowledge provided by organizations such as the CDC; NHS; Chief Sanitary Inspectorate “Główny Inspektorat Sanitarny, GIS”; National Health Fund “Narodowy Fundusz Zdrowia, NFZ”; Ministry of Health “Ministerstwo Zdrowia, MZ”; WHO, etc. The results for “fake news” were checked for whether the false information on the website had been deemed false. Fake news was defined as information inconsistent with the knowledge and information provided by the WHO, CDC, NHS or national health authorities. The SERPs for the “COVID-19”, “Coronavirus” and “SARS-CoV-2” keywords did not contain any websites on alternative medicine; thus, it was not included in further analysis.
2.4. Websites Typology
Each website was classified according to the typology used in other studies [
7]:
- Health Portal (HP): websites with information on a variety of health topics, e.g., www.medscape.com (accessed on 30 March 2020);
- Non-Profit (NP): websites of non-profit organizations, e.g., https://choice.npr.org (accessed on 30 March 2020);
- Professional (P): websites created by health professional organizations (medical schools, clinics/hospitals, medical boards), e.g., https://sph.nus.edu.sg (accessed on 30 March 2020);
- Scientific journal (SJ): websites of academic journals, e.g., www.thelancet.com (accessed on 30 March 2020).
2.5. Selection of Countries
The countries were from four continents: Europe (Poland, UK, Spain, Italy, France and Germany), North America (USA), Asia (Singapore) and Australia. Out of these nine countries, six had the highest number of infections as of 27 April 2020: USA, Spain, Italy, France, Germany and UK [
11]. The study was initially intended to include four countries but was eventually extended to include the USA, Singapore and Australia to make it more representative and comparable across continents. The European countries had the highest number of COVID-19 cases. Poland was added as the authors’ home country.
2.6. Inclusion and Exclusion Criteria
The researchers originally planned to analyze the first 20 websites returned on each search engine result page (SERP). However, in some countries, keywords such as “fake news” returned only a few SERPs concerning the pandemic. Links to “Wikipedia”, “top stories”, “ads”, the WHO site, paid content and sites requiring registration were excluded. The links on the SERPs that referred to the keywords were copied to a standard Excel spreadsheet and described in detail. The SERPs returned by Google for the “fake news” keyword were included only if they referred to the COVID-19 pandemic.
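The inclusion and exclusion rules can be expressed as a single predicate. The following sketch assumes each search result is stored as a dictionary; the keys (`url`, `kind`, `paid`, `requires_registration`, `about_pandemic`), the host names standing in for Wikipedia and the WHO site, and the example URLs are all hypothetical, since the authors applied these criteria manually.

```python
from urllib.parse import urlparse

# Hosts standing in for "Wikipedia" and "the WHO site" (an assumption).
EXCLUDED_DOMAINS = ("wikipedia.org", "who.int")
# Hypothetical labels for the result blocks excluded in the text.
EXCLUDED_KINDS = {"top stories", "ad"}


def host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def is_included(result, keyword):
    """Apply the exclusion criteria to one search result."""
    host = urlparse(result["url"]).hostname or ""
    if any(host_matches(host, domain) for domain in EXCLUDED_DOMAINS):
        return False
    if result.get("kind") in EXCLUDED_KINDS:
        return False
    if result.get("paid") or result.get("requires_registration"):
        return False
    # "fake news" results count only when they concern the pandemic.
    if keyword == "fake news" and not result.get("about_pandemic", False):
        return False
    return True


# Hypothetical results, for illustration only.
results = [
    {"url": "https://en.wikipedia.org/wiki/Coronavirus"},
    {"url": "https://www.example.org/covid-myths", "about_pandemic": True},
    {"url": "https://www.example.org/election-hoaxes", "about_pandemic": False},
    {"url": "https://news.example.com/story", "kind": "top stories"},
]
kept = [r["url"] for r in results if is_included(r, "fake news")]
print(kept)  # ['https://www.example.org/covid-myths']
```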
2.7. JAMA Score
The JAMA score was based on information such as author (authorship), date (currency), financial ownership (disclosure) and references (attribution) [
7,
28]. The researchers evaluated each of these four aspects and either awarded a point or not. The JAMA score is the sum of the points awarded to a given website (for information relating to each of the four categories). The evaluated website could therefore receive between 0 and 4 points. In the JAMA evaluation, 1 point indicates insufficient information, 2–3 points partially sufficient information and 4 points completely sufficient information [
7,
28].
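The scoring rule amounts to counting which of the four criteria a website satisfies. The sketch below illustrates this under stated assumptions: the function names and the dictionary layout are hypothetical, and treating a score of 0 as "insufficient" is an assumption, since the text assigns that label explicitly only to 1 point.

```python
# The four JAMA benchmark criteria named in the text.
JAMA_CRITERIA = ("authorship", "currency", "disclosure", "attribution")


def jama_score(page_info):
    """Sum one point for each criterion present on the website (0 to 4)."""
    return sum(1 for criterion in JAMA_CRITERIA if page_info.get(criterion))


def jama_label(score):
    """Map a JAMA score to the verbal rating used in the text."""
    if not 0 <= score <= 4:
        raise ValueError("JAMA score must be between 0 and 4")
    if score == 4:
        return "completely sufficient"
    if score >= 2:
        return "partially sufficient"
    # The text names 1 point as insufficient; 0 is treated the same here (assumption).
    return "insufficient"


# Hypothetical website that names its author and date but nothing else.
example = {"authorship": True, "currency": True, "disclosure": False}
score = jama_score(example)
print(score, jama_label(score))  # 2 partially sufficient
```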
2.8. Epidemiological Data and Statistical Analysis
Search results were tabulated as numbers and percentages. The COVID-19 epidemiological data for each analyzed country were taken from the Worldometer website for the period 1–27 April 2020 and presented as numbers [
11]. The data covered: number of deaths, number of infections, number of SARS-CoV-2 tests performed. A Spearman rank correlation coefficient was used to determine associations between the search results and COVID-19 epidemiological data [
29,
30,
31]. Correlation coefficients were calculated between COVID-19 epidemiological data and the frequency of coronavirus articles on the websites, and between the epidemiological data and the frequency of information types on the websites. The correlation coefficient was considered statistically significant at
p < 0.05. The JAMA score was used to assess the reliability of each website, where the score considers the author, date, financial ownership and references at 1 point for each. The range of JAMA score values is then from 0 to 4 [
25]. The JAMA scores are presented as medians and interquartile ranges (IQR). Statistical analyses were performed with STATISTICA v 13.1 (Dell Inc., 2016, Tulsa, OK, USA).
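The two summary statistics named in this section can be sketched with the Python standard library. This is an illustration only and does not reproduce the STATISTICA analysis: the Spearman coefficient is computed here as the Pearson correlation of average ranks, the p-value is not calculated, and the quartile convention (`method="inclusive"`) is an assumption that may differ from the one used by the authors. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, median, quantiles


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


def median_iqr(scores):
    """Return (median, first quartile, third quartile) of the scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3
```

For example, `spearman_rho(tests_per_country, fake_news_frequency)` would take one value per country in each list (both names hypothetical), and `median_iqr` would be applied to the JAMA scores of one country's websites.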
5. Conclusions
The analyzed content about COVID-19 returned by Google during the pandemic showed that governments in the analyzed countries took effective measures in fighting fake news during the pandemic. Most of the published information was available on news or government sites and referred to prevention, epidemiological data or disease symptoms. There were differences between continents when it came to the types of information available online: Asia was dominated by epidemiological data, Western Europe and Australia by prevention, and North America and Central Europe by risk factors. The COVID-19 pandemic information, including fake news, rarely made reference to celebrities, religion or testimonials. The JAMA score was comparable in most of the analyzed countries, except for Singapore and the USA, where it was higher.
Although the first 20 SERPs for the COVID-19, coronavirus and SARS-CoV-2 keywords contained true information, a deeper analysis showed that false information inevitably made its way onto the Internet. Most commonly, SERPs for the “fake news” keyword described examples of fake news on the Internet and social media. In countries with the highest number of tests, a higher frequency of information on fake news referring to prevention was found. The higher the number of SARS-CoV-2 infections, the lower the amount of information on fake news referring to prevention against infection with the virus.