A Longitudinal Analysis of Artificial Intelligence Coverage in Technology-Focused News Media Using Latent Dirichlet Allocation and Sentiment Analysis
Abstract
1. Introduction
- However, RQs 2 and 3 are more informative when analyzed in relation to each other. Therefore, a fourth RQ addresses the relationship between topics and emotion.
2. Materials and Methods
2.1. Dataset Curation and Cleaning
2.1.1. URL Extraction and Metadata Scraping
2.1.2. Data Validation and Final Cleaning
2.2. Sentiment Analysis
2.3. Topic Modeling: Latent Dirichlet Allocation
2.3.1. Text Preprocessing
2.3.2. Time Period Identification
2.3.3. Model Fine-Tuning
2.3.4. Training Models
2.3.5. Qualitative Analysis of Topics and Visualization
2.3.6. Similar Topics Across Different Periods
3. Results
3.1. General Trends in the Dataset
3.2. AI-Related Sentiment Trends
3.3. AI-Related Topics in the Media
3.4. Change in Topic Sentiment over Time
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
AI | Artificial Intelligence |
BoW | Bag of Words |
BERT | Bidirectional Encoder Representations from Transformers |
EU | European Union |
LDA | Latent Dirichlet Allocation |
LIWC | Linguistic Inquiry and Word Count |
RoBERTa | Robustly Optimized BERT Pretraining Approach |
URL | Uniform Resource Locator |
Appendix A
Appendix A.1. Websites That Passed Inclusion Criteria and Those That Failed Inclusion Criteria
1. TechCrunch [Passed inclusion criteria]
2. Wired [Passed inclusion criteria]
3. Engadget [Passed inclusion criteria]
4. The Verge [Passed inclusion criteria]
5. TechNewsWorld [Failed]
6. GeekWire [Failed]
7. CNET [Passed inclusion criteria]
8. Digital Trends [Failed]
9. Android Authority [Failed]
10. PCWorld [Failed]
11. The Next Web [Failed]
12. Silicon Valley Journals [Failed]
13. Tech Hubs Media [Failed]
14. Soft2Share [Failed]
15. Newskart [Failed]
16. Ars Technica [Failed]
17. Techmeme [Failed]
18. Gadgets 360 [Failed]
19. Techradar [Failed]
20. ZDNET [Failed]
21. TechSpot [Failed]
22. TechRepublic [Failed]
23. VentureBeat [Failed]
24. AppleInsider [Failed]
25. MacWorld [Failed]
26. Tech In Asia [Failed]
27. KnowTechie [Failed]
28. MIT Tech Review [Failed]
29. TechHive [Failed]
30. Mashable [Passed inclusion criteria]
31. Gizmodo [Passed inclusion criteria]
32. Cord Cutter News [Failed]
33. Life Hacker [Failed]
34. ComputerWorld [Passed inclusion criteria]
35. MakeUseOf [Failed]
36. HowToGeek [Failed]
37. Pymnts.com [Failed]
38. Product Hunt [Failed]
39. Pocket Lint [Failed]
40. Tom’s Guide [Failed]
41. Slash Gear [Failed]
42. The Information [Failed]
43. Term Sheet [Failed]
44. Ubergizmo [Failed]
45. 9to5 Mac [Failed]
46. Tech2 [Failed]
47. Recode [Failed]
48. IEEE Spectrum [Passed inclusion criteria]
49. O’Reilly [Failed]
50. Hackaday [Passed inclusion criteria]
Appendix A.2. Sample of CV Coherence Scores Testing for Hyperparameter Tuning
Alpha | Beta | CV Coherence |
---|---|---|
0.31 | symmetric | 0.3908774736 |
0.31 | 0.01 | 0.3922563959 |
0.31 | 0.31 | 0.4031903854 |
0.31 | 0.61 | 0.4028762874 |
0.31 | 0.91 | 0.4661256387 |
Alpha | Beta | CV Coherence |
---|---|---|
0.01 | symmetric | 0.5044535816 |
0.01 | 0.01 | 0.5028748581 |
0.01 | 0.31 | 0.5037658353 |
0.01 | 0.61 | 0.5419765267 |
0.01 | 0.91 | 0.552961287 |
Alpha | Beta | CV Coherence |
---|---|---|
0.61 | symmetric | 0.5215005753 |
0.61 | 0.01 | 0.5242769182 |
0.61 | 0.31 | 0.5246141571 |
0.61 | 0.61 | 0.5380900789 |
0.61 | 0.91 | 0.5463081086 |
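As a reading aid for the sample above, the following minimal Python sketch picks the highest-coherence row for each alpha value. The rows are copied from the three tables; the helper name `best_beta_per_alpha` is hypothetical, and the sketch only summarizes the reported scores. It does not reproduce the model training or the coherence computation, and it does not assume that scores are comparable across the three tables.

```python
# (alpha, beta, CV coherence) rows copied from the three sample tables above.
ROWS = [
    (0.31, "symmetric", 0.3908774736),
    (0.31, "0.01", 0.3922563959),
    (0.31, "0.31", 0.4031903854),
    (0.31, "0.61", 0.4028762874),
    (0.31, "0.91", 0.4661256387),
    (0.01, "symmetric", 0.5044535816),
    (0.01, "0.01", 0.5028748581),
    (0.01, "0.31", 0.5037658353),
    (0.01, "0.61", 0.5419765267),
    (0.01, "0.91", 0.552961287),
    (0.61, "symmetric", 0.5215005753),
    (0.61, "0.01", 0.5242769182),
    (0.61, "0.31", 0.5246141571),
    (0.61, "0.61", 0.5380900789),
    (0.61, "0.91", 0.5463081086),
]


def best_beta_per_alpha(rows):
    """Return {alpha: (beta, coherence)}, keeping the highest-coherence row per alpha."""
    best = {}
    for alpha, beta, coherence in rows:
        # Keep the first row seen for an alpha, then replace it only on a strictly higher score.
        if alpha not in best or coherence > best[alpha][1]:
            best[alpha] = (beta, coherence)
    return best


if __name__ == "__main__":
    for alpha, (beta, coherence) in sorted(best_beta_per_alpha(ROWS).items()):
        print(f"alpha={alpha}: best beta={beta} (CV coherence {coherence:.4f})")
```

In this sample, a beta of 0.91 gives the highest CV coherence at each of the three alpha values shown.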
Appendix A.3. Word Clouds, Word Lists, and Associated Topics for Three Topics—One per Period
- [‘computer’, ‘machine’, ‘system’, ‘google’, ‘learn’, ‘brain’, ‘ai’, ‘datum’, ‘he’, ‘learning’, ‘deep’, ‘researcher’, ‘network’, ‘way’, ‘algorithm’, ‘neural’, ‘artificial’, ‘image’, ‘these’, ‘build’, ‘company’, ‘language’, ‘program’, ‘research’, ‘facebook’, ‘world’, ‘understand’, ‘now’, ‘call’, ‘software’, ‘university’, ‘thing’, ‘go’, ‘technology’, ‘good’, ‘science’, ‘his’, ‘would’, ‘people’, ‘word’, ‘year’, ‘problem’, ‘even’, ‘many’, ‘model’, ‘who’, ‘much’, ‘know’, ‘then’, ‘help’]
- [‘startup’, ‘million’, ‘founder’, ‘market’, ‘investor’, ‘funding’, ‘round’, ‘raise’, ‘venture’, ‘investment’, ‘he’, ‘capital’, ‘tech’, ‘lead’, ‘build’, ‘business’, ‘product’, ‘partner’, ‘our’, ‘ceo’, ‘billion’, ‘focus’, ‘fund’, ‘team’, ‘co’, ‘industry’, ‘last’, ‘techcrunch’, ‘base’, ‘early’, ‘china’, ‘big’, ‘invest’, ‘firm’, ‘come’, ‘announce’, ‘grow’, ‘over’, ‘series’, ‘who’, ‘customer’, ‘plan’, ‘found’, ‘world’, ‘service’, ‘growth’, ‘month’, ‘first’, ‘opportunity’, ‘global’]
- [‘chatgpt’, ‘chatbot’, ‘openai’, ‘google’, ‘language’, ‘gpt’, ‘code’, ‘text’, ‘answer’, ‘datum’, ‘gemini’, ‘prompt’, ‘llm’, ‘question’, ‘information’, ‘response’, ‘large’, ‘bard’, ‘developer’, ‘write’, ‘source’, ‘generate’, ‘train’, ‘ask’, ‘generative’, ‘meta’, ‘release’, ‘open’, ‘system’, ‘give’, ‘bot’, ‘version’, ‘test’, ‘claude’, ‘access’, ‘word’, ‘available’, ‘example’, ‘call’, ‘anthropic’, ‘bing’, ‘research’, ‘task’, ‘provide’, ‘base’, ‘assistant’, ‘training’, ‘own’, ‘capability’, ‘launch’]
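One way to compare topic word lists such as those above is cosine similarity over bag-of-words vectors. The sketch below is an illustration under stated assumptions: it uses only the first 12 words of each list, treats every word as having weight 1, and the names `cosine_similarity`, `TOPIC_A`, `TOPIC_B`, and `TOPIC_C` are hypothetical. This excerpt does not show how the study matched topics across periods, so the sketch should not be read as the authors' procedure.

```python
import math
from collections import Counter

# First 12 words of each of the three word lists above (truncated for brevity).
TOPIC_A = ["computer", "machine", "system", "google", "learn", "brain",
           "ai", "datum", "he", "learning", "deep", "researcher"]
TOPIC_B = ["startup", "million", "founder", "market", "investor", "funding",
           "round", "raise", "venture", "investment", "he", "capital"]
TOPIC_C = ["chatgpt", "chatbot", "openai", "google", "language", "gpt",
           "code", "text", "answer", "datum", "gemini", "prompt"]


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two word lists treated as bag-of-words count vectors."""
    a, b = Counter(words_a), Counter(words_b)
    # Dot product over the shared vocabulary only; all other terms are zero.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty list has no direction, so report no similarity.
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(f"A vs B: {cosine_similarity(TOPIC_A, TOPIC_B):.4f}")
    print(f"A vs C: {cosine_similarity(TOPIC_A, TOPIC_C):.4f}")
    print(f"B vs C: {cosine_similarity(TOPIC_B, TOPIC_C):.4f}")
```

With unit weights, the score reduces to the number of shared words divided by the geometric mean of the list lengths. A weighted variant could use per-word topic probabilities instead, but those values are not available in this excerpt.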
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jain, A.; Ranganathan, S. A Longitudinal Analysis of Artificial Intelligence Coverage in Technology-Focused News Media Using Latent Dirichlet Allocation and Sentiment Analysis. Journal. Media 2025, 6, 176. https://doi.org/10.3390/journalmedia6040176