Web Scraping Chilean News Media: A Dataset for Analyzing Social Unrest Coverage (2019–2023)
by Ignacio Molina 1, José Morales 2 and Brian Keith 1,*
1 Department of Systems and Computing Engineering, Universidad Católica del Norte, Antofagasta 1270398, Chile
2 School of Journalism, Universidad Católica del Norte, Antofagasta 1270398, Chile
* Author to whom correspondence should be addressed.
Data 2025, 10(11), 174; https://doi.org/10.3390/data10110174 (registering DOI)
Submission received: 10 September 2025 / Revised: 28 October 2025 / Accepted: 30 October 2025 / Published: 31 October 2025
Abstract
This paper presents a dataset of Chilean news media coverage during the social unrest and constitutional processes from 2019 to 2023. Using Python-based web scraping with BeautifulSoup and Selenium, we collected articles from 15 Chilean news outlets between 15 November 2019 and 17 December 2023. The initial collection of 1254 articles was filtered to 931 usable data points after removing non-relevant content, duplicates, and articles unrelated to the Chilean social outburst. Each news outlet required specific extraction approaches due to varying HTML structures, with some outlets inaccessible due to paywalls or anti-scraping mechanisms. The dataset is structured in JSON format with standardized fields including title, content, date, author, and source metadata. This resource supports research on media coverage during political events and provides data for Spanish-language processing tasks. The dataset and extraction code are publicly available on GitHub.
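As an illustration of the standardization and duplicate-removal step described in the abstract, the following is a minimal sketch rather than the authors' extraction code. The JSON keys mirror the fields named in the abstract, but their exact spelling is an assumption, as are the helper names `normalize` and `clean_records`, the sample records, and the duplicate criterion (same source and same normalized title). The relevance filtering mentioned in the abstract is not reproduced here.

```python
import json
import unicodedata

# Field names follow the abstract; the exact JSON keys are an assumption.
FIELDS = ("title", "content", "date", "author", "source")


def normalize(text):
    """Lowercase, strip accents, and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def clean_records(raw_records):
    """Standardize fields, skip records missing a title or content, drop duplicates.

    Assumed duplicate criterion: same source and same normalized title.
    """
    seen = set()
    cleaned = []
    for raw in raw_records:
        # Missing or None values become empty strings so every record has all fields.
        record = {field: (raw.get(field) or "").strip() for field in FIELDS}
        if not record["title"] or not record["content"]:
            continue
        key = (record["source"], normalize(record["title"]))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Made-up sample records, for illustration only.
sample = [
    {"title": "Título de ejemplo", "content": "Texto.", "date": "2019-11-15",
     "author": "A", "source": "outlet-a"},
    {"title": "  titulo de  ejemplo ", "content": "Texto.", "date": "2019-11-15",
     "author": "A", "source": "outlet-a"},
    {"title": "Otra nota", "content": "", "date": "2020-01-01",
     "author": None, "source": "outlet-b"},
]

cleaned = clean_records(sample)
# ensure_ascii=False keeps Spanish accented characters readable in the output.
print(json.dumps(cleaned, ensure_ascii=False, indent=2))
```

In this sketch the second sample record is treated as a duplicate of the first because their titles match after accent and whitespace normalization, and the third is skipped because its content is empty.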