Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014
Abstract
1. Introduction
1.1. Part I
1.2. Part II
2. Data Description
3. Methods
3.1. Manual Coding
- Language, written in English or not;
- Reported location by:
  - (a) Country income level as described by the World Bank yearly revised gross national income (GNI) per capita classifications [24],
  - (b) United States or not, and
  - (c) By state or territory if in the United States;
- Specific keyword mention of “World AIDS Day” or “Red ribbon”;
- Mentions of:
  - (a) HIV/AIDS epidemiological content (e.g., incidence, prevalence),
  - (b) Sub-populations (e.g., age, race/ethnicity, gender, sexual orientation, drug use, other subgroup);
- Mentions of:
  - (a) HIV/AIDS prevention and behavior content (e.g., abstinence, faithfulness to partner, condom use),
  - (b) HIV/AIDS testing,
  - (c) HIV/AIDS disclosure,
  - (d) Stigma/discrimination awareness, and
  - (e) HIV/AIDS compassion and support.
3.2. Part I: Statistical Analysis
3.3. Part II: Support Vector Machine Model
3.4. Ethics Approval
4. Results
4.1. Part I: Statistical Analysis
4.2. Part II: SVM Results
5. Discussion
5.1. Limitations
5.2. Conclusions
Supplementary Materials
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A. Technical Details of Twitter Data Retrieval
A.1. Original and Public Tweet Only
A.2. Filter Settings
- Language filter: English was selected.
- Query filter: Two keywords were used for the search: AIDS and HIV. For technical reasons, the two words were searched separately; nevertheless, this was equivalent to a single search for “AIDS OR HIV”.
- Time frame filter: This study covered three days (2014-11-30, 2014-12-01, and 2014-12-02). The minimum unit of the filter is one day (not one hour). The searches were run day by day; for example, the range from 2014-11-30 to 2014-12-01 was selected in order to retrieve all tweets posted on 2014-11-30. The time zone used by Twitter search is UTC +00:00.
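
The day-by-day search described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, and the `lang:`, `since:`, and `until:` operators in the query string are an assumption about how the filters might be written as text, since the source only describes selecting them in the search form.

```python
from datetime import date, timedelta


def daily_windows(first_day, last_day):
    """Yield (start, end) pairs, one per day, where end is the following day."""
    day = first_day
    while day <= last_day:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def build_queries(keywords, first_day, last_day, lang="en"):
    """Build one query per day per keyword (operator syntax is an assumption)."""
    queries = []
    for start, end in daily_windows(first_day, last_day):
        for keyword in keywords:
            queries.append(
                f"{keyword} lang:{lang} "
                f"since:{start.isoformat()} until:{end.isoformat()}"
            )
    return queries


# Three study days, two keywords searched separately.
queries = build_queries(["AIDS", "HIV"], date(2014, 11, 30), date(2014, 12, 2))
for query in queries:
    print(query)
```

Searching each keyword separately and then pooling the results matches the "AIDS OR HIV" equivalence noted above, provided that tweets matching both keywords are de-duplicated afterwards.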
A.3. Searches
A.4. Python Script
- user_id: the unique id of the user who posted the tweet (please keep this confidential)
- user_name: the screen name of the user
- user_location: the self-reported location (this was used for geo-coding)
- retweet_count: the number of times the tweet had been retweeted at the time of data collection
- txt: the raw text of the tweet
- created_at: the time of posting
- mentions_ids: the unique ids of the users mentioned in txt, separated by commas
- mentions_sn: the screen names of the users mentioned in txt, separated by commas
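
One way to hold these fields in memory is sketched below. The `TweetRecord` class, the `from_row` helper, and the sample row are hypothetical and only illustrate the field list above; the source does not specify the storage format, so a dictionary of strings (as a CSV reader would produce) is assumed, and the values shown are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List


def split_commas(value):
    """Split a comma-separated field into a list, dropping empty entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


@dataclass
class TweetRecord:
    user_id: str          # kept as text; treat as confidential
    user_name: str
    user_location: str    # self-reported, used for geo-coding
    retweet_count: int
    txt: str
    created_at: str       # raw timestamp string; its format is not given in the source
    mentions_ids: List[str] = field(default_factory=list)
    mentions_sn: List[str] = field(default_factory=list)

    @classmethod
    def from_row(cls, row):
        """Build a record from a dict of strings keyed by the field names."""
        return cls(
            user_id=row["user_id"],
            user_name=row["user_name"],
            user_location=row["user_location"],
            retweet_count=int(row["retweet_count"]),
            txt=row["txt"],
            created_at=row["created_at"],
            mentions_ids=split_commas(row["mentions_ids"]),
            mentions_sn=split_commas(row["mentions_sn"]),
        )


# Invented placeholder row, not taken from the dataset.
sample_row = {
    "user_id": "12345",
    "user_name": "example_user",
    "user_location": "Atlanta, GA",
    "retweet_count": "3",
    "txt": "Example text mentioning @someone and @another",
    "created_at": "2014-12-01 00:00:00",
    "mentions_ids": "111,222",
    "mentions_sn": "someone,another",
}
record = TweetRecord.from_row(sample_row)
print(record.mentions_sn)
```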
Appendix B. Interrater Reliability
Question & Response | Cohen’s kappa |
---|---|
2a. Self-Reported Location: Country income level as defined by the World Bank? | 0.65 |
3. Did the tweet mention WAD (or red ribbon)? | 0.96 |
4. HIV/AIDS Epidemiological Content | |
4a HIV Epidemiology information mentioned? | 0.75 |
4b HIV sub-populations mentioned? | 0.5 |
5. Content – HIV behaviors & prevention | |
5a HIV/AIDS prevention information mentioned? | 0.14 |
5b HIV test(ing) mentioned? | 0.87 |
5c HIV disclosure mentioned? | 0.55 |
5d Stigma/discrimination awareness mentioned? | 0.67 |
5e HIV compassion and support mentioned? | 0.37 |
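
For readers unfamiliar with the statistic in the table above, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance from each rater's marginal category frequencies: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a generic implementation of that standard formula; the function name and the two short rating lists are illustrative and are not the study's coding data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Illustrative yes/no codes (1 = mentioned, 0 = not mentioned).
coder_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
coder_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

Because chance agreement is high when a category is rare, a rarely used code can yield a low kappa even when raw agreement is high.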
References
- UNAIDS. Fact Sheet 2016. Available online: http://www.unaids.org/sites/default/files/media_asset/20150901_FactSheet_2015_en.pdf (accessed on 5 June 2019).
- Centers for Disease Control and Prevention. HIV in the United States and Dependent Areas. Available online: http://www.cdc.gov/hiv/statistics/overview/ataglance.html (accessed on 5 June 2019).
- Office of Disease Prevention and Health Promotion. Healthy People 2020—HIV. Available online: https://www.healthypeople.gov/2020/topics-objectives/topic/hiv/ (accessed on 5 June 2019).
- Fung, I.C.-H.; Tse, Z.T.H.; Fu, K.W. The use of social media in public health surveillance. West. Pac. Surveill. Response J. 2015, 6, 3–6.
- Young, S.D.; Holloway, I.; Jaganath, D.; Rice, E.; Westmoreland, D.; Coates, T. Project HOPE: Online social network changes in an HIV prevention randomized controlled trial for African American and Latino men who have sex with men. Am. J. Public Health 2014, 104, 1707–1712.
- Fung, I.C.-H.; Cai, J.; Hao, Y.; Ying, Y.; Chan, B.S.B.; Tse, Z.T.H.; Fu, K.-W. Global Handwashing Day 2012: A qualitative content analysis of Chinese social media reaction to a health promotion event. West. Pac. Surveill. Response J. 2015, 6, 34–42.
- Blankenship, E.B.; Goff, M.E.; Yin, J.; Tse, Z.T.H.; Fu, K.W.; Liang, H.; Saroha, N.; Fung, I.C.-H. Sentiment, Contents, and Retweets: A Study of Two Vaccine-Related Twitter Datasets. Perm. J. 2018, 22, 17–138.
- Fung, I.C.-H.; Fu, K.W.; Chan, C.H.; Chan, B.S.; Cheung, C.N.; Abraham, T.; Tse, Z.T.H. Social Media’s Initial Reaction to Information and Misinformation on Ebola, August 2014: Facts and Rumors. Public Health Rep. 2016, 131, 461–473.
- Preotiuc-Pietro, D.; Volkova, S.; Lampos, V.; Bachrach, Y.; Aletras, N. Studying User Income through Language, Behaviour and Affect in Social Media. PLoS ONE 2015, 10, e0138717.
- Sinnenberg, L.; Buttenheim, A.M.; Padrez, K.; Mancheno, C.; Ungar, L.; Merchant, R.M. Twitter as a Tool for Health Research: A Systematic Review. Am. J. Public Health 2017, 107, e1–e8.
- Jordan, S.E.; Hovet, S.E.; Fung, I.C.-H.; Liang, H.; Fu, K.-W.; Tse, Z.T.H. Using Twitter for Public Health Surveillance from Monitoring and Prediction to Public Response. Data 2018, 4, 6.
- Tricco, A.C.; Zarin, W.; Lillie, E.; Jeblee, S.; Warren, R.; Khan, P.A.; Robson, R.; Pham, B.; Hirst, G.; Straus, S.E. Utility of social media and crowd-intelligence data for pharmacovigilance: A scoping review. BMC Med. Inform. Decis. Mak. 2018, 18, 38.
- Young, S.D.; Rivers, C.; Lewis, B. Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes. Prev. Med. 2014, 63, 112–115.
- Adnan, M.M.; Yin, J.; Jackson, A.M.; Tse, Z.T.H.; Liang, H.; Fu, K.W.; Saroha, N.; Althouse, B.M.; Fung, I.C.-H. World Pneumonia Day 2011–2016: Twitter contents and retweets. Int. Health 2018.
- Schaible, B.J.; Snook, K.R.; Yin, J.; Jackson, A.M.; Ahweyevu, J.O.; Chong, M.; Tse, Z.T.H.; Liang, H.; Fu, K.W.; Fung, I.C.-H. Twitter conversations and English news media reports on poliomyelitis in five different countries, January 2014 to April 2015. Perm. J. 2019, 23, 18–181.
- Fu, K.W.; Liang, H.; Saroha, N.; Tse, Z.T.H.; Ip, P.; Fung, I.C.-H. How people react to Zika virus outbreaks on Twitter? A computational content analysis. Am. J. Infect. Control 2016, 44, 1700–1702.
- Fung, I.C.-H.; Tse, Z.T.H.; Fu, K.W. Converting Big Data into public health. Science 2015, 347, 620.
- Fung, I.C.-H.; Jackson, A.M.; Mullican, L.A.; Blankenship, E.B.; Goff, M.E.; Guinn, A.J.; Saroha, N.; Tse, Z.T.H. Contents, Followers, and Retweets of the Centers for Disease Control and Prevention’s Office of Advanced Molecular Detection (@CDC_AMD) Twitter Profile: Cross-Sectional Study. JMIR Public Health Surveill. 2018, 4, e33.
- Jackson, A.M.; Mullican, L.A.; Yin, J.; Tse, Z.T.H.; Liang, H.; Fu, K.W.; Ahweyevu, J.O.; Jenkins III, J.J.; Saroha, N.; Fung, I.C.-H. #CDCGrandRounds and #VitalSigns: A Twitter Analysis. Ann. Glob. Health 2018, 84, 710–716.
- Fung, I.C.-H.; Jackson, A.M.; Ahweyevu, J.O.; Grizzle, J.H.; Yin, J.; Tse, Z.T.H.; Liang, H.; Sekandi, J.N.; Fu, K.W. #Globalhealth Twitter Conversations on #Malaria, #HIV, #TB, #NCDS, and #NTDS: A Cross-Sectional Analysis. Ann. Glob. Health 2017, 83, 682–690.
- Sasaki, Y. The Truth of the F-Measure. 2007. Available online: https://www.toyota-ti.ac.jp/Lab/Denshi/COIN/people/yutaka.sasaki/F-measure-YS-26Oct07.pdf (accessed on 5 June 2019).
- Twitter Advanced Search. Available online: https://twitter.com/search-advanced (accessed on 5 June 2019).
- Liang, H.; Shen, F.; Fu, K.-W. Privacy protection and self-disclosure across societies: A study of global Twitter users. New Media Soc. 2017, 19, 1476–1497.
- World Bank Country and Lending Groups. Available online: https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups (accessed on 5 June 2019).
- R: A Language and Environment for Statistical Computing. Available online: http://www.R-project.org (accessed on 5 June 2019).
Question & Response | Frequency (%) a |
---|---|
1. Language: Was the tweet written in English? | |
Yes | 1769 (97.0) |
2a. Self-Reported Location: Country income level as defined by the World Bank? | |
Low-income countries (reference category) | 42 (2.3) |
Lower-middle-income countries | 150 (8.2) |
Upper-middle-income countries | 98 (5.4) |
High-income countries—non-OECD | 15 (0.8) |
OECD countries | 677 (37.1) |
Others | 226 (12.4) |
No information reported | 616 (33.7) |
3. Did the tweet mention WAD (or red ribbon)? | |
Yes | 795 (43.6) |
4. HIV/AIDS Epidemiological Content | |
4a HIV Epidemiology information mentioned? | |
Yes (e.g., statistics provided, mentioned incidence/prevalence/epidemics, etc.) | 142 (7.8) |
4b HIV sub-populations mentioned? | |
Yes | 107 (5.9) |
5. Content—HIV behaviors & prevention | |
5a HIV/AIDS prevention information? | |
Yes | 30 (1.6) |
5b HIV test(ing) mentioned? | |
Yes | 106 (5.8) |
5c HIV disclosure mentioned? | |
Yes | 47 (2.6) |
5d Stigma/discrimination awareness mentioned? | |
Yes | 103 (5.6) |
5e HIV compassion and support mentioned? | |
Yes (e.g. remembering the positives, support family, etc.) | 542 (29.7) |
State/Territory | n (%) | State/Territory | n (%) |
---|---|---|---|
1. Alabama | 5 (1.1) | 31. New Mexico | 2 (0.4) |
2. Alaska | 1 (0.2) | 32. New York | 76 (16.8) |
3. Arizona | 4 (0.9) | 33. North Carolina | 10 (2.2) |
4. Arkansas | 1 (0.2) | 34. North Dakota | 0 (0) |
5. California | 55 (12.1) | 35. Ohio | 13 (2.9) |
6. Colorado | 1 (0.2) | 36. Oklahoma | 1 (0.2) |
7. Connecticut | 1 (0.2) | 37. Oregon | 3 (0.7) |
8. Delaware | 0 (0) | 38. Pennsylvania | 13 (2.9) |
9. Florida | 27 (6.0) | 39. Rhode Island | 2 (0.4) |
10. Georgia | 13 (2.9) | 40. South Carolina | 10 (2.2) |
11. Hawaii | 7 (1.5) | 41. South Dakota | 0 (0) |
12. Idaho | 2 (0.4) | 42. Tennessee | 3 (0.7) |
13. Illinois | 19 (4.2) | 43. Texas | 26 (5.7) |
14. Indiana | 5 (1.1) | 44. Utah | 2 (0.4) |
15. Iowa | 4 (0.9) | 45. Vermont | 1 (0.2) |
16. Kansas | 1 (0.2) | 46. Virginia | 3 (0.7) |
17. Kentucky | 5 (1.1) | 47. Washington | 4 (0.9) |
18. Louisiana | 6 (1.3) | 48. West Virginia | 1 (0.2) |
19. Maine | 1 (0.2) | 49. Wisconsin | 3 (0.7) |
20. Maryland | 10 (2.2) | 50. Wyoming | 0 (0) |
21. Massachusetts | 11 (2.4) | 51. Washington D.C. | 16 (3.5) |
22. Michigan | 8 (1.8) | 52. Puerto Rico | 1 (0.2) |
23. Minnesota | 5 (1.1) | 53. Guam | 0 (0) |
24. Mississippi | 2 (0.4) | 54. Other U.S. Territories | 0 (0) |
25. Missouri | 2 (0.4) | 55. U.S.A. Non-specific | 46 (10.2) |
26. Montana | 1 (0.2) | | |
27. Nebraska | 2 (0.4) | | |
28. Nevada | 6 (1.3) | | |
29. New Hampshire | 0 (0) | | |
30. New Jersey | 12 (2.6) | | |
| Question | Level of the Predictor Variable | Adjusted Odds Ratio | 95% CI | p-Value |
|---|---|---|---|---|
| Outcome variable: Mentions of HIV/AIDS Epidemiology information | | | | |
| Country income level of self-reported locations * | LI | reference | - | - |
| | LMI | 0.804 | 0.300, 2.157 | 0.665 |
| | UMI | 0.548 | 0.183, 1.642 | 0.283 |
| | HIC | 0.974 | 0.166, 5.702 | 0.977 |
| | OECD | 0.404 | 0.166, 0.981 | 0.045 |
| | Others | 0.800 | 0.315, 2.031 | 0.639 |
| Mentions of Sub-populations | Yes | 7.226 | 4.408, 11.845 | <0.001 |
| Outcome variable: Mentions of Sub-populations | | | | |
| Country income level of self-reported locations * | LI | reference | - | - |
| | LMI | 0.213 | 0.068, 0.664 | 0.008 |
| | UMI | 0.523 | 0.175, 1.565 | 0.246 |
| | HIC | 0.298 | 0.030, 2.944 | 0.300 |
| | OECD | 0.424 | 0.176, 1.022 | 0.056 |
| | Others | 0.441 | 0.169, 1.148 | 0.093 |
| Mentions of “World AIDS Day” or “Red Ribbon” | Yes | 0.631 | 0.398, 0.998 | 0.049 |
| Mentions of HIV/AIDS Epidemiology | Yes | 6.856 | 4.168, 11.280 | <0.001 |
| Outcome variable: Mentions of HIV/AIDS compassion and support (Yes/No) | | | | |
| Country income level of self-reported locations * | LI | reference | - | - |
| | LMI | 2.126 | 0.765, 5.903 | 0.148 |
| | UMI | 1.782 | 0.611, 5.200 | 0.290 |
| | HIC | 3.885 | 0.895, 16.873 | 0.070 |
| | OECD | 3.080 | 1.179, 8.047 | 0.021 |
| | Others | 1.617 | 0.591, 4.426 | 0.349 |
| Mentions of “World AIDS Day” or “Red Ribbon” | Yes | 1.838 | 1.407, 2.401 | <0.001 |
| Mentions of HIV/AIDS Epidemiology | Yes | 0.353 | 0.189, 0.658 | 0.001 |
| Mentions of HIV/AIDS Testing | Yes | 0.394 | 0.213, 0.731 | 0.003 |
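
As a reading aid for the table above, the sketch below shows how an adjusted odds ratio and its 95% confidence interval relate to a logistic regression coefficient and standard error. It assumes Wald-type intervals that are symmetric on the log-odds scale; the source does not state how the intervals were computed, so this is an assumption, and the function names are hypothetical.

```python
import math


def wald_odds_ratio(beta, se, z=1.96):
    """Return (odds ratio, lower, upper) from a log-odds coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_scale_from_ci(lower, upper, z=1.96):
    """Recover (beta, se) from a CI, assuming a symmetric Wald interval on the log scale."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported interval for "Mentions of Sub-populations" (first model): 4.408 to 11.845.
beta, se = log_scale_from_ci(4.408, 11.845)
odds_ratio, lower, upper = wald_odds_ratio(beta, se)
print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
```

Under this assumption the odds ratio is the geometric mean of the interval bounds, which gives a quick consistency check on any row of the table.
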
SVM Model | Sparse Term Threshold | Number of Terms | TP | TN | FP | FN | Specificity | Sensitivity | Positive Predictive Value | F1 Score |
---|---|---|---|---|---|---|---|---|---|---|
Training Set (n = 1278) | ||||||||||
A | 0 | 4963 | 681 | 0 | 0 | 597 | - | 0.53 | 1 | 0.70 |
B | (n−3)/n | 688 | 636 | 579 | 52 | 11 | 0.92 | 0.98 | 0.92 | 0.95 |
C | (n−5)/n | 546 | 614 | 587 | 62 | 15 | 0.90 | 0.98 | 0.91 | 0.94 |
D | (n−7)/n | 308 | 604 | 569 | 80 | 25 | 0.88 | 0.96 | 0.88 | 0.92 |
E | (n−10)/n | 230 | 585 | 566 | 96 | 31 | 0.85 | 0.95 | 0.86 | 0.90 |
Test Set (n = 548) | ||||||||||
A | 0 | 4963 | Not applied to the test set because of poor performance in the training set | |||||||
B | (n−3)/n | 688 | 216 | 176 | 63 | 93 | 0.74 | 0.70 | 0.77 | 0.73 |
C | (n−5)/n | 546 | 235 | 168 | 56 | 89 | 0.75 | 0.73 | 0.81 | 0.76 |
D | (n−7)/n | 308 | 222 | 174 | 61 | 91 | 0.74 | 0.71 | 0.78 | 0.74 |
E | (n−10)/n | 230 | 211 | 190 | 75 | 72 | 0.72 | 0.75 | 0.74 | 0.74 |
Held-out Dataset (n = 180) | ||||||||||
C | (n−5)/n | 546 | 68 | 60 | 32 | 20 | 0.65 | 0.77 | 0.68 | 0.72 |
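
The performance columns in the table above follow from the four confusion-matrix counts. The sketch below recomputes them using the standard definitions; the function name is hypothetical, and the example uses the held-out counts for Model C from the table.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def classification_metrics(tp, tn, fp, fn):
    """Compute specificity, sensitivity, PPV, and F1 from confusion-matrix counts."""
    return {
        "specificity": _ratio(tn, tn + fp),   # true negatives among actual negatives
        "sensitivity": _ratio(tp, tp + fn),   # true positives among actual positives
        "ppv": _ratio(tp, tp + fp),           # true positives among predicted positives
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


# Model C on the held-out dataset (n = 180).
held_out = classification_metrics(tp=68, tn=60, fp=32, fn=20)
for name, value in held_out.items():
    print(name, round(value, 2))

# Model A on the training set: no negatives were predicted, so specificity is undefined.
model_a = classification_metrics(tp=681, tn=0, fp=0, fn=597)
```

Returning `None` for an undefined ratio mirrors the "-" shown for Model A's specificity in the table.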
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Fung, I.C.-H.; Yin, J.; Pressley, K.D.; Duke, C.H.; Mo, C.; Liang, H.; Fu, K.-W.; Tse, Z.T.H.; Hou, S.-I. Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014. Data 2019, 4, 84. https://doi.org/10.3390/data4020084