Open Access Article

The Story of Goldilocks and Three Twitter’s APIs: A Pilot Study on Twitter Data Sources and Disclosure

1 Social Data Collaboratory, Public Health, NORC at the University of Chicago, Chicago, IL 60603, USA
2 Biostatistics, School of Public Health, University of Illinois at Chicago, Chicago, IL 60612, USA
* Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2020, 17(3), 864; https://doi.org/10.3390/ijerph17030864
Received: 10 January 2020 / Revised: 28 January 2020 / Accepted: 29 January 2020 / Published: 30 January 2020
(This article belongs to the Section Health Behavior, Chronic Disease and Health Promotion)
Public health and social science increasingly use Twitter for behavioral and marketing surveillance. However, few studies provide enough detail about their Twitter data collection to allow either direct comparisons between studies or replication. The three primary application programming interfaces (APIs) for accessing Twitter data are Streaming, Search, and Firehose. To date, no clear guidance exists about the advantages and limitations of each API, or about how comparable the amount, content, and user accounts of the tweets retrieved from each API are. Such information is crucial to the validity, interpretation, and replicability of research findings. This study examines whether tweets collected with the same search filters over the same time period, but from different APIs, form comparable datasets. We collected tweets about anti-smoking, e-cigarettes, and tobacco using all three APIs. The retrieved tweets largely overlapped across the three APIs, but each API also retrieved unique tweets, and the extent of overlap varied over time and by topic, producing different trends and potentially supporting diverging inferences. Researchers need to understand how their choice of data source can influence the amount, content, and user accounts of the data they retrieve from social media, in order to assess the implications of that choice.
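The core comparison in the abstract, whether the same filters and period yield overlapping or distinct tweets across APIs, can be illustrated with simple set arithmetic on tweet IDs. The Python sketch below is a hypothetical illustration, not the authors' analysis code: it assumes each API's results have already been reduced to a set of tweet ID strings, uses made-up toy IDs, and adopts Jaccard similarity as one possible pairwise overlap measure. It reports how many tweets all sources share, how many each source alone retrieved, and the pairwise overlap.

```python
from itertools import combinations


def overlap_summary(sources: dict[str, set[str]]) -> dict:
    """Summarize how tweet-ID sets retrieved from several sources overlap.

    `sources` maps a source label (for example "Streaming") to the set of
    tweet IDs collected from that source with the same filters and period.
    """
    all_ids = set().union(*sources.values())
    # Tweets retrieved by every source (empty if no sources were given).
    in_all = set.intersection(*sources.values()) if sources else set()

    only_in = {}
    for name, ids in sources.items():
        # Union of every *other* source, to find tweets unique to this one.
        others = set().union(*(s for n, s in sources.items() if n != name))
        only_in[name] = len(ids - others)

    pairwise_jaccard = {}
    for (a, ids_a), (b, ids_b) in combinations(sources.items(), 2):
        union = ids_a | ids_b
        # Jaccard similarity: shared tweets / tweets seen by either source.
        pairwise_jaccard[(a, b)] = len(ids_a & ids_b) / len(union) if union else 1.0

    return {
        "total_unique_tweets": len(all_ids),
        "in_all_sources": len(in_all),
        "only_in": only_in,
        "pairwise_jaccard": pairwise_jaccard,
    }


if __name__ == "__main__":
    # Hypothetical toy tweet IDs for illustration; not data from the study.
    toy = {
        "Streaming": {"1", "2", "3", "7"},
        "Search": {"2", "3", "4", "8"},
        "Firehose": {"1", "2", "3", "4", "9"},
    }
    summary = overlap_summary(toy)
    print(summary["total_unique_tweets"])  # 7
    print(summary["in_all_sources"])       # 2
    print(summary["only_in"])              # {'Streaming': 1, 'Search': 1, 'Firehose': 1}
```

To look at whether overlap shifts over time or by topic, as the abstract reports, the same function could be applied separately to each day's or each topic's subset of IDs; how those subsets are defined is an assumption left to the reader.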
Keywords: Twitter; social media data source; point of access; data quality; e-cigarette

Figure 1

MDPI and ACS Style

Kim, Y.; Nordgren, R.; Emery, S. The Story of Goldilocks and Three Twitter’s APIs: A Pilot Study on Twitter Data Sources and Disclosure. Int. J. Environ. Res. Public Health 2020, 17, 864.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
