Data Descriptor

TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels

Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha 34110, Qatar
*
Author to whom correspondence should be addressed.
Received: 22 November 2021 / Revised: 6 January 2022 / Accepted: 7 January 2022 / Published: 10 January 2022
(This article belongs to the Section Information Systems and Data Management)
As the world struggles with several compounded challenges caused by the COVID-19 pandemic in the health, economic, and social domains, timely access to disaggregated national and sub-national data is important for understanding the emergent situation, but such data are difficult to obtain. The widespread usage of social networking sites, especially during mass convergence events such as health emergencies, provides instant access to citizen-generated data offering rich information about public opinions, sentiments, and situational updates that authorities can use to gain insights. We offer a large-scale social sensing dataset comprising two billion multilingual tweets posted from 218 countries by 87 million users in 67 languages. We used state-of-the-art machine learning models to enrich the data with sentiment labels and named entities. Additionally, we propose a gender identification approach to label users by gender. Furthermore, we devise a geolocalization approach to geotag tweets at country, state, county, and city granularities, enabling a myriad of data analysis tasks to understand real-world issues at national and sub-national levels. We believe that this multilingual dataset, with broader geographical and longer temporal coverage, will be a cornerstone for researchers to study the impacts of the ongoing global health catastrophe and to manage adverse consequences related to people’s health, livelihood, and social well-being.
Keywords: social sensing; COVID-19; sentiment analysis; trends analysis; geo-mapping; natural cities

MDPI and ACS Style

Imran, M.; Qazi, U.; Ofli, F. TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels. Data 2022, 7, 8. https://doi.org/10.3390/data7010008

