Review Reports - COVIDSensing: Social Sensing Strategy for the Management of the COVID-19 Crisis

Round 1

Reviewer 1 Report

The topic addressed in the paper is very current and the tool presented can be useful. I appreciate that the tool is publicly available on the web.
The article describes the methods used and the case study. It could explain in more detail how a potential user can use the application (e.g., how to set up sending an alert, how to administer it, etc.).
Regarding the web application, the article does not explain what the "hard and soft" sensors in the application are. Furthermore, it is not clear from the article how much the application depends on the language used (Spanish). Can it be used for other languages? If so, what needs to be done?

If possible, it would be good to edit some images, e.g., Figure 2 contains illegible text. Also, for Figures 6, 7 and 8: what exactly is displayed, what amount of data, and on what scale?

Author Response

The topic addressed in the paper is very current and the tool presented can be useful. I appreciate that the tool is publicly available on the web.

Reply: We appreciate your positive feedback. Thank you.

1) The article describes the methods used and the case study. It could explain in more detail how a potential user can use the application (e.g., how to set up sending an alert, how to administer it, etc.).

Reply: We thank the reviewer for the opportunity to explain how users interact with our tool. COVIDSensing has two types of users: administrator and viewer. The administrator sets up the topics of interest that the tool tracks on social networks; as has been seen since the beginning of the pandemic, society's interests and concerns change as the pandemic evolves. Within each topic, texts containing the keywords of that topic are collected, and each is assigned a Problem Perception Index (PPI), a value between 0 and 1 that assesses how likely the microtext is to represent a problem within that category. A PPI threshold and a minimum number of messages are then established, above which a potential problem within that topic is reported. For example, if 5 messages with a PPI of 0.75 are detected in the same topic, the administrator is alerted to a potential problem. Both the number of messages and the PPI threshold are configurable by the administrator. This is now better explained in the text.
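
The alert rule described in this reply can be sketched as follows. This is only an illustration, not the COVIDSensing implementation: the function name `should_alert`, its parameter names, and the choice to count messages whose PPI is at or above the threshold are assumptions. The default values (5 messages, PPI 0.75) are taken from the example in the reply.

```python
from typing import Iterable


def should_alert(ppi_values: Iterable[float],
                 ppi_threshold: float = 0.75,
                 min_messages: int = 5) -> bool:
    """Return True when enough messages in one topic reach the PPI threshold.

    Assumption: a message counts when its PPI is >= ppi_threshold.
    Both parameters stand in for the administrator-configurable settings.
    """
    high = sum(1 for ppi in ppi_values if ppi >= ppi_threshold)
    return high >= min_messages


# Hypothetical PPI values for messages collected under a single topic.
topic_ppis = [0.80, 0.76, 0.91, 0.75, 0.40, 0.78]
print(should_alert(topic_ppis))
```

Raising `min_messages` or `ppi_threshold` makes the alert more conservative, which is the trade-off the administrator controls.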

2) Regarding the web application, the article does not explain what the "hard and soft" sensors in the application are.

Reply: Hard and soft sensors are a classification of sensing strategies in the multisensor data fusion area. Initially, information from electronic devices (e.g. quantitative data) was referred to as hard sensors and information from human sensors as soft sensors. Strictly speaking, the hard sensors in COVIDSensing are the information that comes from the health department rather than from a particular sensor. We have clarified this in the text.

Hall, D. L., McNeese, M., Llinas, J., & Mullen, T. (2008, June). A framework for dynamic hard/soft fusion. In 2008 11th International Conference on Information Fusion (pp. 1-8). IEEE.

3) Furthermore, it is not clear from the article how much the application depends on the language used (Spanish). Can it be used for other languages? If so, what needs to be done?

Reply: The following paragraph has been inserted (end of section 3.3.1) to clarify this issue:

"It should be noted that, although this article focuses on the processing of micro-texts in Spanish, COVIDSensing has already been expanded to deal with English. Since the construction of the semantic spaces associated with topic categories is based on WordNet synsets, each one representing a set of synonymous words in a variety of languages, adapting this tool to other languages only requires a few further resources. In particular, COVIDSensing makes use of two types of language-dependent resources, i.e. text-processing resources (e.g. POS tagger) and lexical resources (e.g. polarity lexicon). In this regard, since the former are readily available for many European languages, the main effort should only be placed on the latter."

4) If possible, it would be good to edit some images, e.g., Figure 2 contains illegible text. Also, for Figures 6, 7 and 8: what exactly is displayed, what amount of data, and on what scale?

Reply: Figures 2, 3, 4, 5, 8 and 9 (now Figures 3, 4, 5, 6, 9 and 10) have been regenerated to increase the font size. In Figures 6, 7 and 8 (now Figures 7, 8 and 9), the “Message Influx” graphs represent the total number of tweets, RSS articles and Telegram messages by date. The date granularity depends on the previously selected filter (an hour, a day, a week, a month, etc.). These graphs are intended to give qualitative information on the amount of information from different sources. This clarification has been added in the text after the explanation of the new Figure 3 (old Figure 2).
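
The per-date counting behind a “Message Influx” graph can be illustrated with a short sketch. It is not taken from COVIDSensing: the function `message_influx`, the filter names in `BUCKET_FORMATS`, and the sample data are all hypothetical, and only some of the filters mentioned above (hour, day, month) are covered.

```python
from collections import Counter
from datetime import datetime

# strftime patterns for each filter; the filter names are hypothetical.
BUCKET_FORMATS = {
    "hour": "%Y-%m-%d %H:00",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
}


def message_influx(messages, granularity="day"):
    """Count messages per (time bucket, source).

    `messages` is an iterable of (timestamp, source) pairs, where source is
    e.g. "twitter", "rss" or "telegram".
    """
    fmt = BUCKET_FORMATS[granularity]
    return Counter((ts.strftime(fmt), source) for ts, source in messages)


# Made-up sample data, for illustration only.
sample = [
    (datetime(2021, 3, 1, 9, 15), "twitter"),
    (datetime(2021, 3, 1, 9, 40), "twitter"),
    (datetime(2021, 3, 1, 18, 5), "rss"),
    (datetime(2021, 3, 2, 7, 30), "telegram"),
]
for (bucket, source), count in sorted(message_influx(sample, "day").items()):
    print(bucket, source, count)
```

Changing the `granularity` argument regroups the same messages into coarser or finer buckets, which mirrors how the selected filter changes the date axis.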

Reviewer 2 Report

The authors consider an interesting tool for the prediction of COVID development and its management. It is very effective that the tool is based on data from social media. This paper presents an interesting result. But in the keywords the authors indicated the concepts “Natural Language Processing”, “Machine Learning”, “Data Analysis”. From my point of view, these concepts are considered enough and this article is still a more popular publication for the COVIDSensing tool. I would be very happy to see in this work a more detailed presentation of the methods and algorithms used in this tool. The authors mention the use of ARIMA. Could you compare this model with other tools? For example, with:

- intelligent tutoring system (Subirats, L.; Fort, S.; Atrio, S.; Sacha, G.-M. Artificial Intelligence to Counterweight the Effect of COVID-19 on Learning in a Sustainable Environment. Appl. Sci. 2021, 11, 9923.)
- neural networks (Kamis, A.; Ding, Y.; Qu, Z.; Zhang, C. Machine Learning Models of COVID-19 Cases in the United States: A Study of Initial Lockdown and Reopen Regimes. Appl. Sci. 2021, 11, 11227.)
- decision trees (Levashenko, V.; Rabcan, J.; Zaitseva, E. Reliability Evaluation of the Factors That Influenced COVID-19 Patients' Condition. Appl. Sci. 2021, 11, 2589.)

I suppose that adding the following paper to the discussion would be just as interesting: Henzel, J.; Tobiasz, J.; Kozielski, et al., Screening Support System Based on Patient Survey Data — Case Study on Classification of Initial, Locally Collected COVID-19 Data. Appl. Sci. 2021, 11, 10790.

Author Response

Reply: The authors thank you for your thorough review, as well as for your constructive comments.

1) I would be very happy to see in this work a more detailed presentation of the methods and algorithms used in this tool.

Reply: We have included more details of the COVIDSensing tool in Section 3. In particular, we detail the software architecture (see new Figure 2) and the geolocation and categorisation methodology (subsection 3.3.1). The methodology, which is based on ARIMA and LSTM, is now better explained in Section 5.

2) The authors mention the use of ARIMA. Could you compare this model with other tools? For example, with:

- intelligent tutoring system (Subirats, L.; Fort, S.; Atrio, S.; Sacha, G.-M. Artificial Intelligence to Counterweight the Effect of COVID-19 on Learning in a Sustainable Environment. Appl. Sci. 2021, 11, 9923.)
- neural networks (Kamis, A.; Ding, Y.; Qu, Z.; Zhang, C. Machine Learning Models of COVID-19 Cases in the United States: A Study of Initial Lockdown and Reopen Regimes. Appl. Sci. 2021, 11, 11227.)
- decision trees (Levashenko, V.; Rabcan, J.; Zaitseva, E. Reliability Evaluation of the Factors That Influenced COVID-19 Patients' Condition. Appl. Sci. 2021, 11, 2589.)

Reply: Interesting papers. We have included a discussion of these papers in the background section.

Reviewer 3 Report

The authors present timely, interesting and important research. The introduction / background sections contain an adequate number of references.

English requires moderate modifications. Most of the sentences are fine, although some are hard to read.

Regarding the language, it seems that the framework was only used in Spanish. Is it possible to use the framework in other languages? If so, does the framework require modifications? Are there any plans for this?

In line 213 the authors mention a lexicon that was used. However, there are several NLP lexicons. Which one did the authors use? This part of the research should be more detailed in the article.

A smaller remark: the link for the CovidSensing website is misspelled in the text of Figures 2 and 3.

The format of the references requires a little modification: when citing journal articles, abbreviated journal names should be used.

Author Response

The authors present timely, interesting and important research. The introduction / background sections contain an adequate number of references.

Reply: We appreciate your positive feedback. Thank you.

1) English requires moderate modifications. Most of the sentences are fine, although some are hard to read.

Reply: The manuscript has been thoroughly proofread for both style and English language.

2) Regarding the language, it seems that the framework was only used in Spanish. Is it possible to use the framework in other languages? If so, does the framework require modifications? Are there any plans for this?

Reply: The following paragraph has been inserted (end of section 3.3.1) to clarify your questions:

3) In line 213 the authors mention a lexicon that was used. However, there are several NLP lexicons. Which one did the authors use? This part of the research should be more detailed in the article.

Reply: The following paragraph has been inserted (beginning of section 3.3.1):

"Such categorisation is grounded on a knowledge-based approach. In this regard, apart from the Spanish WordNet [35], from which we leverage knowledge about semantic relations occurring between words, COVIDSensing employs several other resources that were constructed in [36], e.g. the polarity lexicon and the lexicon of valence shifters. On the one hand, the polarity lexicon includes positively- and negatively-marked words in terms of sentiment analysis. On the other hand, the lexicon of valence shifters includes words and phrases that can neutralize the values of the topic and sentiment attributes (e.g. no, sin [without]), or increase or decrease the value of the sentiment attribute (e.g. bastante [enough] or poco [little], respectively)."

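As a rough illustration of how a polarity lexicon and a lexicon of valence shifters can interact, consider the toy sketch below. It does not reproduce the resources constructed in [36]: the polarity entries, the numeric scores, the multipliers, and the rule that a shifter only affects the word immediately after it are all assumptions made for this example. Only the shifter words themselves (no, sin, bastante, poco) come from the quoted paragraph.

```python
# Toy lexicons: the shifter words come from the quoted paragraph, but the
# polarity entries, scores and multipliers are invented for illustration.
POLARITY = {"bueno": 1.0, "malo": -1.0}
NEUTRALIZERS = {"no", "sin"}
INTENSIFIERS = {"bastante": 1.5}
DIMINISHERS = {"poco": 0.5}


def sentiment_score(tokens):
    """Sum word polarities, letting the preceding token shift each value."""
    total = 0.0
    for i, token in enumerate(tokens):
        value = POLARITY.get(token)
        if value is None:
            continue  # not a polarity-bearing word
        previous = tokens[i - 1] if i > 0 else None
        if previous in NEUTRALIZERS:
            value = 0.0  # neutralize the sentiment value
        elif previous in INTENSIFIERS:
            value *= INTENSIFIERS[previous]  # increase the value
        elif previous in DIMINISHERS:
            value *= DIMINISHERS[previous]  # decrease the value
        total += value
    return total


print(sentiment_score(["bastante", "malo"]))
print(sentiment_score(["no", "bueno"]))
```
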
4) A smaller remark: the link for the CovidSensing website is misspelled in the text of Figures 2 and 3.

Reply: We have removed the URL from these figures to focus on the relevant information of the tool.

5) The format of the references requires a little modification: when citing journal articles, abbreviated journal names should be used.

Reply: Yes, you are right; in the published version the journal names are abbreviated. Nevertheless, according to the LaTeX instructions, the full names of the journals can be provided, as we do in the BibTeX file. It seems that these full names are converted into the abbreviated journal names during the proofreading process.

Reviewer 4 Report

In this manuscript, the authors present COVIDSensing.com - a real-time dashboard for a set of predefined socioeconomic and health categories and their combinations, measuring COVID-19 pandemic issues. The manuscript is well-written and well-structured. The proposed methodology for COVID19 data analysis is based on natural language processing and forecasting methods (ARIMA and LSTM). The new methodology is explained systematically.

My remarks are as follows:

In “3. COVIDSensing methodological tool” section, please add details about software implementation (development environment, programming language, libraries used).

In “5. Results and discussion” section, how is Figure 9 obtained? What is the accuracy of your prediction? A comparison with results from previous similar studies is missing. In this section, please also add some recommendations for prevention of the spread of COVID-19.

The “6. Conclusion and Future Work” section should be extended – the study’s limitations are missing.

Technical remarks:

l. 4-5, 44-45: “dispersed throughout the geography” – Please, edit this phrase. Geography is a science.

l. 20-23: “The COVID-19 pandemic that is ravaging the world is an unfortunate event about which we get a wealth of information through social media that can help us anticipate and advance decisions to improve the public health of citizens.” – The sentence is too long and should be edited.

l. 74-75: “evolve according to the evolution … evolves.” – Please, edit this fragment.

l. 75: “how gender violence evolves” – This social problem is not discussed in the remaining part of the manuscript.

l. 119: “COVIDSensing is a tool that intrinsically describes a methodology” – The statement is not true. The software tool is created according to the proposed methodology.

Figures 2, 3, 4, 5 and 9: The font size is too small.

l. 287: “Study Case” -> “Case Study”

On web address https://covidsensing.com/aplicaciones/5f297d70b9ace7002df4b6bb

in the pie charts of following data series:

‘Processed Twitter messages’

‘Processed RSS articles’

‘Processed Telegram messages’ and

‘Processed messages and articles’

the data labels could be in (long) integer instead of real values (5 decimal places are redundant).

https://covidsensing.com: Message “There was a problem related to the server”. – This system warning could be suppressed in case of no data available.

Author Response

Reply: We appreciate your positive feedback and thorough review. Thank you.

My remarks are as follows:

1) In “3. COVIDSensing methodological tool” section, please add details about software implementation (development environment, programming language, libraries used).

Reply: We thank the reviewer for this suggestion that clarifies our approach. A new figure and explanation about the software implementation of COVIDSensing is provided.

2) In “5. Results and discussion” section, how is Figure 9 obtained? What is the accuracy of your prediction? A comparison with results from previous similar studies is missing. In this section, please also add some recommendations for prevention of the spread of COVID-19.

Reply: We thank the reviewer for this question. The old Figure 9 (now Figure 10) shows the CI-14 days in real time and a prediction based on a consensus strategy previously described in García-Cremades et al. As explained in the paper, the main goal of COVIDSensing is to predict trends in the evolution of the COVID-19 disease based on social detection. To make a reliable prediction, it should be fused with other sources, which is why the tool also includes epidemiological figures and predictions. For a full description of the CI-14 day prediction procedure, we refer the reader to the following paper:

Santi García-Cremades, Juan Morales-García, Rocío Hernández-Sanjaime, Raquel Martínez-España, Andrés Bueno-Crespo, Enrique Hernández-Orallo, José J. López-Espín, José M. Cecilia. "Improving prediction of COVID-19 evolution by fusing epidemiological and mobility data". Nature Scientific Reports, 11, Article number: 15173 (2021)

3) The “6. Conclusion and Future Work” section should be extended – the study’s limitations are missing.

Reply: We have extended the conclusion section including a brief description of the current limitations and future work.

Technical remarks:

l. 4-5, 44-45: “dispersed throughout the geography” – Please, edit this phrase. Geography is a science.
l. 20-23: “The COVID-19 pandemic that is ravaging the world is an unfortunate event about which we get a wealth of information through social media that can help us anticipate and advance decisions to improve the public health of citizens.” – The sentence is too long and should be edited.
l. 74-75: “evolve according to the evolution … evolves.” – Please, edit this fragment.

Reply: Thank you for your remarks. The manuscript has been thoroughly proofread for both style and English language.

l. 75: “how gender violence evolves” – This social problem is not discussed in the remaining part of the manuscript.

Reply: You are right. We have removed this text.

l. 119: “COVIDSensing is a tool that intrinsically describes a methodology” – The statement is not true. The software tool is created according to the proposed methodology.

Reply: We have rewritten this sentence.

Figures 2, 3, 4, 5 and 9: The font size is too small.

Reply: We have increased the font size in these figures.

l. 287: “Study Case” -> “Case Study”

Reply: Corrected.

On web address https://covidsensing.com/aplicaciones/5f297d70b9ace7002df4b6bb

in the pie charts of following data series:

‘Processed Twitter messages’

‘Processed RSS articles’

‘Processed Telegram messages’ and

‘Processed messages and articles’

the data labels could be in (long) integer instead of real values (5 decimal places are redundant).

https://covidsensing.com: Message “There was a problem related to the server”. – This system warning could be suppressed in case of no data available.

Reply: We have revised the application and it should be working correctly now.