Article

Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model

1 Department of Informatics, University of Rijeka, 51000 Rijeka, Croatia
2 Center for Artificial Intelligence and Cybersecurity, University of Rijeka, 51000 Rijeka, Croatia
3 Faculty of Humanities and Social Sciences, 51000 Rijeka, Croatia
* Author to whom correspondence should be addressed.
Academic Editor: Valentino Santucci
Appl. Sci. 2021, 11(21), 10442; https://doi.org/10.3390/app112110442
Received: 4 October 2021 / Revised: 25 October 2021 / Accepted: 2 November 2021 / Published: 6 November 2021
(This article belongs to the Section Computing and Artificial Intelligence)
This study aims to provide insights into COVID-19-related communication on Twitter in the Republic of Croatia. For that purpose, we developed an NLP-based framework that enables automatic analysis of a large dataset of tweets in the Croatian language. We collected and analysed 206,196 tweets related to COVID-19 and constructed a dataset of 10,000 tweets, which we manually annotated with a sentiment label. We trained the Cro-CoV-cseBERT language model for the representation and clustering of tweets. Additionally, we compared the performance of four machine learning algorithms on the task of sentiment classification. After identifying the best-performing setup of NLP methods, we applied the proposed framework to the characterisation of COVID-19 tweets in Croatia. More precisely, we performed sentiment analysis and tracked the sentiment over time. Furthermore, we detected how tweets group into clusters with similar themes across three pandemic waves. Finally, we characterised the tweets by analysing the distribution of sentiment polarity (in each thematic cluster and over time) and the number of retweets (in each thematic cluster and sentiment class). These results could be useful for further research and interpretation in sociology, psychology and other sciences, as well as for the authorities, who could use them to address crisis communication problems.
Keywords: sentiment analysis; clustering; BERT model; natural language processing; COVID-19; Twitter data; social media
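
The characterisation step described in the abstract (the distribution of sentiment polarity in each thematic cluster and over time) can be illustrated with a minimal sketch. It assumes each tweet has already been assigned a sentiment label and a cluster id; the toy records, the label names, the function names and the choice of ISO weeks as the time unit are all hypothetical and are not taken from the article.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical toy records: (tweet date, cluster id, sentiment label).
# Dates, cluster ids and labels are invented for illustration only.
TWEETS = [
    (date(2020, 3, 2), 0, "negative"),
    (date(2020, 3, 4), 0, "neutral"),
    (date(2020, 3, 5), 1, "positive"),
    (date(2020, 3, 10), 1, "negative"),
    (date(2020, 3, 11), 0, "negative"),
]


def sentiment_distribution(records, key):
    """Group records by `key` and return each sentiment label's share per group."""
    counts = defaultdict(Counter)
    for record in records:
        counts[key(record)][record[2]] += 1
    return {
        group: {label: n / sum(c.values()) for label, n in c.items()}
        for group, c in counts.items()
    }


def iso_week(record):
    """Map a record to its (ISO year, ISO week) pair; an assumed time unit."""
    year, week, _ = record[0].isocalendar()
    return (year, week)


# Share of each sentiment label within every thematic cluster.
by_cluster = sentiment_distribution(TWEETS, key=lambda r: r[1])

# Share of each sentiment label within every ISO week.
by_week = sentiment_distribution(TWEETS, key=iso_week)

if __name__ == "__main__":
    print(by_cluster)
    print(by_week)
```

Passing the grouping rule as a function keeps the same aggregation reusable for clusters, weeks or pandemic waves; in a real pipeline the labels and cluster ids would come from the trained classifier and the clustering of tweet representations rather than from hand-written records.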
MDPI and ACS Style

Babić, K.; Petrović, M.; Beliga, S.; Martinčić-Ipšić, S.; Matešić, M.; Meštrović, A. Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model. Appl. Sci. 2021, 11, 10442. https://doi.org/10.3390/app112110442

AMA Style

Babić K, Petrović M, Beliga S, Martinčić-Ipšić S, Matešić M, Meštrović A. Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model. Applied Sciences. 2021; 11(21):10442. https://doi.org/10.3390/app112110442

Chicago/Turabian Style

Babić, Karlo, Milan Petrović, Slobodan Beliga, Sanda Martinčić-Ipšić, Mihaela Matešić, and Ana Meštrović. 2021. "Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model" Applied Sciences 11, no. 21: 10442. https://doi.org/10.3390/app112110442
