Search Results (127)

Search Parameters:
Keywords = web scraping

27 pages, 708 KB  
Project Report
Exploring the Power of Content and Visitor Sentiment: A Study of Web Traffic Dynamics in South Africa’s Residential Real Estate Landscape
by Kola Ijasan and Charles Chimedza
Real Estate 2026, 3(2), 7; https://doi.org/10.3390/realestate3020007 - 3 Jun 2026
Viewed by 46
Abstract
The real estate sector has increasingly shifted toward digital platforms, where content sentiment plays a crucial yet understudied role in driving user engagement. While sentiment analysis has been widely applied in retail and finance, its impact on real estate web traffic remains poorly understood, particularly in competitive digital marketplaces. This study examines the relationship between sentiment in web content and the traffic it attracts on residential real estate websites in South Africa. Specifically, it examines how different sentiments associated with the type of content (articles versus property listings) influence total monthly web traffic and user engagement. A quantitative analysis of six years (2017–2023) of scraped data from Property24, Remax, and Private Property employed R (rvest, sentimentr, and stats packages) for web scraping, sentiment analysis, and ANOVA testing to evaluate relationships between content sentiment, type (listings vs. articles), and web traffic metrics. The analysis revealed a significant impact of sentiment on web traffic, indicating that the sentiment of web content influences visitor numbers. Specifically, property listings generated a total of 16,780,623 monthly visitors, significantly surpassing the 13,407,521 visitors attracted by articles. This study contributes empirical evidence regarding the influence of content sentiment and content type on web traffic within the South African real estate market. It highlights the critical role of sentiment in shaping web traffic and potentially user engagement and provides actionable insights for real estate developers and marketers seeking to optimize their content strategies to improve user attraction and retention. Full article

29 pages, 1257 KB  
Article
Speed or Green? Strategic Trade-Offs in Online Delivery Options Across UK Retail and Logistics
by Thi Minh Tam Nguyen, Muhammad Azmat and Reem Hadeed
Logistics 2026, 10(6), 124; https://doi.org/10.3390/logistics10060124 - 2 Jun 2026
Viewed by 227
Abstract
Background: The rapid growth of e-commerce has intensified the tension between customer expectations for fast, convenient delivery and the need for more sustainable last-mile logistics. While existing studies have examined speed, price, sustainability, and convenience as separate delivery attributes, less attention has been given to how these dimensions are combined and presented in consumer-facing delivery options. Methods: This study adopts a mixed-methods approach, combining a systematic literature review with structured analysis of publicly available delivery offers on websites across the UK retail and logistics sectors. Results: The findings show that delivery design remains strongly shaped by speed, price visibility, and convenience, while sustainability signals are rarely embedded at the point of customer choice. Although the literature highlights growing interest in green logistics, observed delivery menus suggest a persistent gap between sustainability commitments and their implementation at checkout. Five delivery strategy archetypes are identified, illustrating how firms configure trade-offs among fast delivery, affordability, sustainability signalling, and convenience. Conclusions: The study contributes a four-pillar choice architecture framework for understanding online delivery design. It highlights the need for clearer sustainability communication, greener default options, and stronger alignment among firm strategy, consumer decision-making, and policy support in last-mile delivery. Full article
(This article belongs to the Section Last Mile, E-Commerce and Sales Logistics)

23 pages, 677 KB  
Article
Large Language Models for Energy Market Analytics: An Exploratory Feasibility Study Across Geopolitical Monitoring, Commodity Summarisation, and Renewable Forecasting
by Alex Krempasky, Erik Kajati and Peter Papcun
Big Data Cogn. Comput. 2026, 10(6), 166; https://doi.org/10.3390/bdcc10060166 - 22 May 2026
Viewed by 249
Abstract
Large Language Models (LLMs) offer opportunities for processing heterogeneous information streams relevant to energy-market decision-making, but their practical role in forecasting-oriented analytical workflows remains uncertain. This paper presents an exploratory feasibility study of LLM use across four energy-market tasks: geopolitical event monitoring for Dutch Title Transfer Facility (TTF) market context using Global Database of Events, Language, and Tone (GDELT)-based data, structured summarisation of commodity-intelligence articles, prompt-engineered solar-power and grid-load forecasting for Austria, and a short-horizon exploratory TTF price-estimation case. The study is positioned as a pilot investigation and hybrid workflow blueprint rather than as a statistically conclusive forecasting benchmark. A four-layer reference architecture was devised, including structured market data, semi-structured news intelligence, web-scraping concepts, and implemented Twitter/X and GDELT monitoring layers. The empirical cases indicate that LLMs are most useful for text-heavy reasoning, event-context integration, source triage, and structured interpretation. In the 20-article summarisation corpus, Gemini 1.5 Pro achieved higher commodity-direction accuracy than GPT-4, while GPT-4 showed stronger output-format stability. In selected solar case checks, OpenAI models produced plausible generation curves close to the Fraunhofer ISE Energy Charts reference, while Energy Charts remained more accurate for aggregate load estimation in the available benchmark comparison. The two-day TTF experiment illustrated that LLMs can incorporate qualitative geopolitical context into short-horizon reasoning, but it did not establish reliable price-forecasting capability. The Twitter/X monitoring layer is retained as a documented negative pathway, showing the limitations of informal social-media scraping for reproducible market intelligence. Full article
(This article belongs to the Special Issue Large Language Models and Their Limitations)

20 pages, 2618 KB  
Article
A Deep Hybrid Recommendation Method for Multimodal Information Integrating Content Generated by Large Language Models
by Chao Duan, Wenlong Zhang, Zhongtao Yu, Senyao Li, Xuelian Wan and Qionghao Huang
Information 2026, 17(3), 298; https://doi.org/10.3390/info17030298 - 18 Mar 2026
Viewed by 575
Abstract
Item description information plays a crucial role in helping users understand the basic situation of an item and is also vital auxiliary information in recommendation systems. Traditional methods obtain this data through platform backend data or web scraping techniques, but these data are often static, relatively fixed, and insufficiently descriptive. In recent years, large language models (LLMs) like generative pre-trained transformer (GPT) have become powerful tools in natural language processing, bringing new hope for LLM-based recommendations. However, does the text information generated by large language models help improve recommendation accuracy? How can the information produced by generative artificial intelligence be integrated with existing multi-source heterogeneous information? In this paper, we propose a novel deep hybrid recommendation method for multimodal information integrating content generated by large language models (DML). We first explore the use of large language models to generate detailed descriptive information about movies. Next, we perform a weighted fusion of the generated text information with existing movie category information and user demographic data, among other multi-source heterogeneous information. Finally, we use the fused information to predict movie ratings. The results indicate that the multimodal information deep hybrid recommendation method, which integrates content generated by large language models, provides substantial evidence of superior performance relative to existing baseline models. Full article
(This article belongs to the Special Issue Generative AI Transformations in Industrial and Societal Applications)

9 pages, 806 KB  
Data Descriptor
Tracking K-12 and Higher Education Job Postings Through Web-Scraped Longitudinal Data
by Mark A. Perkins and Bolaji Aderibigbe Akorede
Data 2026, 11(3), 52; https://doi.org/10.3390/data11030052 - 6 Mar 2026
Viewed by 965
Abstract
Teacher shortages and workforce trends in education are critical policy and research concerns. This study presents a robust data collection pipeline that systematically web-scrapes K-12 and higher education job postings across multiple sources. While the methodology could theoretically be adapted to other job categories, the pipeline is specifically implemented for educational job postings due to platform-specific structures and scraping constraints. Using R, we extract, clean, and archive job postings weekly, compiling them into a longitudinal master dataset that tracks trends in teacher openings over time. Our approach enables monthly trend analysis, providing insights into hiring patterns, subject-area demands, and geographic disparities. By making this dataset available, we contribute both a reproducible methodological pipeline for scraping, cleaning, and standardizing K-12 and higher education job postings, and a validated longitudinal dataset for research and workforce policy applications. This data descriptor details the methodology, data structure, and potential applications for researchers and policymakers monitoring education sector employment trends. Full article

23 pages, 6426 KB  
Article
An Improved Map Information Collection Tool Using 360° Panoramic Images for Indoor Navigation Systems
by Kadek Suarjuna Batubulan, Nobuo Funabiki, I Nyoman Darma Kotama, Komang Candra Brata and Anak Agung Surya Pradhana
Appl. Sci. 2026, 16(3), 1499; https://doi.org/10.3390/app16031499 - 2 Feb 2026
Viewed by 858
Abstract
At present, pedestrian navigation systems using smartphones have become common in daily activities. For their ubiquitous, accurate, and reliable services, map information collection is essential for constructing comprehensive spatial databases. Previously, we developed a map information collection tool to extract building information using Google Maps, optical character recognition (OCR), geolocation, and web scraping with smartphones. However, indoor navigation often suffers from inaccurate localization due to degraded GPS signals inside buildings and Simultaneous Localization and Mapping (SLAM) estimation errors, causing position errors and confusing augmented reality (AR) guidance. In this paper, we present an improved map information collection tool to address this problem. It captures 360° panoramic images to build 3D models, applies photogrammetry-based mesh reconstruction to correct geometry, and georeferences point clouds to refine latitude–longitude coordinates. For evaluations, experiments in various indoor scenarios were conducted. The results demonstrate that the proposed method effectively mitigates positional errors with an average drift correction of 3.15 m, calculated via the Haversine formula. Geometric validation using point cloud analysis showed high registration accuracy, which translated to a 100% task completion rate and an average navigation time of 124.5 s among participants. Furthermore, usability testing using the System Usability Scale (SUS) yielded an average score of 96.5, categorizing the user interface as ‘Best Imaginable’. These quantitative findings substantiate that the integration of 360° imaging and photogrammetric correction significantly enhances navigation reliability and user satisfaction compared with previous sensor fusion approaches. Full article
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

23 pages, 27544 KB  
Article
Application of the Dynamic Latent Space Model to Social Networks with Time-Varying Covariates
by Ziqian Xu and Zhiyong Zhang
Computation 2026, 14(2), 34; https://doi.org/10.3390/computation14020034 - 1 Feb 2026
Viewed by 745
Abstract
With the growing accessibility of tools such as online surveys and web scraping, longitudinal social network data are more commonly collected in social science research along with non-network survey data. Such data play a critical role in helping social scientists understand how relationships develop and evolve over time. Existing dynamic network models such as the Stochastic Actor-Oriented Model and the Temporal Exponential Random Graph Model provide frameworks to analyze traits of both the networks and the external non-network covariates. However, research on the dynamic latent space model (DLSM) has focused mainly on factors intrinsic to the networks themselves. Despite some discussion, the role of non-network data such as contextual or behavioral covariates remains a topic to be further explored in the context of DLSMs. In this study, one application of the DLSM to incorporate dynamic non-network covariates collected alongside friendship networks using autoregressive processes is presented. By analyzing two friendship network datasets with different time points and psychological covariates, it is shown how external factors can contribute to a deeper understanding of social interaction dynamics over time. Full article

21 pages, 1923 KB  
Article
Preparedness Without Pedagogy? An AI-Assisted Web Scraping Analysis of Informal Online Disaster Preparedness Resources for the Public
by Sophie Lacher and Matthias Rohs
Educ. Sci. 2026, 16(1), 146; https://doi.org/10.3390/educsci16010146 - 19 Jan 2026
Viewed by 750
Abstract
Informal learning increasingly occurs in digital environments, where citizens access, evaluate and apply knowledge outside of formal education. In the context of disaster preparedness, such informal learning is crucial for promoting individual and collective self-protection. This study examines how disaster preparedness knowledge is represented in German-language online resources, and how these materials can be categorised from an adult education perspective. An exploratory mixed-methods design combining expert-guided sampling, a qualitatively developed coding scheme, large-scale web scraping and AI-assisted classification was employed. A total of 7305 webpages were analysed in terms of actor type, topic, media format, and didactic design. The findings suggest that government and commercial organisations dominate the online preparedness landscape, with limited contributions from civil society and individuals. Thematically, most resources focus on general preventive measures and checklists, whereas scenario-specific and procedural content is underrepresented. Didactically rich and interactive formats are rare, with most materials relying on static, text-based communication. From an adult education perspective, these results suggest a gap between raising awareness and active learning. While online resources offer easy access to preparedness knowledge, they rarely facilitate deeper understanding, participation or collaborative learning. Methodologically, the study illustrates how AI-assisted analysis can combine qualitative interpretive depth with computational scalability in educational research. Full article
(This article belongs to the Special Issue Investigating Informal Learning in the Age of Technology)

23 pages, 2568 KB  
Article
Fusing Multi-Source Data with Machine Learning for Ship Emission Calculation in Inland Waterways
by Chao Wang, Hao Wu and Zhirui Ye
Atmosphere 2026, 17(1), 72; https://doi.org/10.3390/atmos17010072 - 9 Jan 2026
Viewed by 644
Abstract
Accurate estimation of ship emissions is essential for the effective enforcement of emission control policies in inland waterways. However, existing “bottom-up” models face significant challenges owing to severe data scarcity for inland ships, particularly regarding ship static parameters. This study proposes a novel data fusion and machine learning framework to address this issue. The methodology integrates real-time SO₂ and CO₂ pollutant concentrations on the Nanjing Dashengguan Yangtze River Bridge, Automatic Identification System (AIS) data, and meteorological information. To address the scarcity of design data for inland ships, web scraping was used to extract basic parameters, which were then used to train five machine learning models. Among them, the XGBoost model demonstrated superior performance in predicting the main engine rated power. A refined activity-based emission model combines these predicted parameters, ship operational profiles, and specific emission factors to calculate real-time emission source strengths. Furthermore, the model was validated against field measurements by comparing the calculated and measured emission source strengths from ships, demonstrating high predictive accuracy with R² values of 0.980 for SO₂ and 0.977 for CO₂, and MAPE below 13%. This framework provides a reliable and scalable approach for real-time emission monitoring and supports regulatory enforcement in inland waterways. Full article

38 pages, 4787 KB  
Article
Spatial Distribution Characteristics of Marine Economy Based on AI-Assisted Multi-Source Data Fusion and Random Forest Analysis
by Mingming Wen, Quan Chen and Zhaoheng Lv
Sustainability 2025, 17(24), 11090; https://doi.org/10.3390/su172411090 - 11 Dec 2025
Cited by 1 | Viewed by 878
Abstract
Understanding the spatial dynamics of China’s marine economic geography is essential for sustainable coastal development and marine spatial governance. This study examines the spatial distribution patterns and influencing factors of spatial differentiation in China’s marine economy from 2013 to 2023, utilizing AI techniques to facilitate multi-source data fusion and employing a Random Forest analytical method. The research was integrated with AI-based web-scraping, automated data-cleaning procedures, multi-source data preprocessing, Min–Max normalization, and Random Forest regression to accomplish multi-source data fusion and factor-importance analysis. Kernel density estimation, global Moran’s I, Getis-Ord Gi* statistics, and buffer zone analysis were employed to characterize spatial heterogeneity across coastal, island, and maritime economic zones, while Spearman’s correlation was used to quantify the relationships of influencing factors. Results indicate that China’s marine economy exhibits a pronounced “south–hot–north–cold and east–strong–west–weak” spatial gradient, with high-value clusters concentrated in the Bohai Rim, Yangtze River Delta, and Guangdong–Hong Kong–Macao Greater Bay Area. The coastal zone economy accounts for over 65% of the national marine GDP and acts as the dominant driver of spatial agglomeration. Policy implications suggest strengthening cross-regional industrial cooperation and optimizing spatial planning to enhance marine economic resilience and sustainability. Full article

24 pages, 2248 KB  
Article
Understanding Public Reactions Across Time: A Sentiment Analysis of Itaewon Halloween Crowd Crush
by Camille Velasco Lim and Han-Woo Park
Digital 2025, 5(4), 65; https://doi.org/10.3390/digital5040065 - 10 Dec 2025
Viewed by 1993
Abstract
Following the Itaewon Halloween Crowd Crush of 29 October 2022, this study examines how public sentiment evolved on Naver, South Korea’s most influential digital platform. While prior research has focused on mainstream media and global social networks, little is known about localized discourse on Naver. To address this gap, we analyzed 2107 user-generated posts collected via Python-based web scraping across three time periods: the immediate aftermath, first anniversary, and passage of the Itaewon Special Law. Semantic network analysis, sentiment classification, and logistic regression were applied to uncover patterns in discourse and emotional tone. Results reveal a shift from grief and outrage in 2022 to demands for political accountability, safety reform, and memorialization by 2024. High-frequency keywords reflected media and government narratives, while low-frequency terms exposed grassroots voices and emotional nuance. Regression analysis confirmed statistically significant associations between sentiment, title length, and year. These findings suggest that digital platforms not only mirror public sentiment but also shape the emotional and political framing of national tragedies. By tracing sentiment over time, this study contributes to understanding how echo chambers, narrative framing, and temporal context interact in shaping collective responses to crisis. Full article

30 pages, 4743 KB  
Article
A Lifestyle-Based Fuzzy-Enhanced ANN Model for Early Prediction of Type 2 Diabetes and Personalized Management in the North Indian Population
by Shahid Mohammad Ganie and Majid Bashir Malik
Diagnostics 2025, 15(24), 3139; https://doi.org/10.3390/diagnostics15243139 - 10 Dec 2025
Viewed by 837
Abstract
Background: Type 2 Diabetes Mellitus (T2DM) continues to rise rapidly in Indian communities, affecting millions and posing a major public health challenge. Early identification of risk and timely lifestyle intervention are crucial for prevention. This study aims to develop a lifestyle-driven, fuzzy-enhanced Artificial Neural Network (ANN) model for early T2DM prediction and to design a personalized recommendation framework tailored to the North Indian population. Methods: A comprehensive exploratory data analysis, including statistical significance testing and age-cohort assessment, was conducted to evaluate data quality and identify key lifestyle associations. The ANN model was trained on 1939 lifestyle profiles and classified individuals into four risk categories: low, moderate, high-risk, and diabetic. A monotonic spline-based calibration method was used to refine predicted probabilities. Additionally, a web-based system, the Personalized Care and Intelligence System for Early Diabetes Assessment (PCISEDA), was developed to deliver individualized diet and physical activity recommendations. Cost-effective lifestyle options were curated via a structured web-scraping pipeline. Results: The proposed fuzzy-enhanced ANN model achieved an accuracy of 93.64%, precision of 94.00%, recall of 93.50%, F1-score of 93.50%, and a multiclass ROC–AUC of 94.07%, demonstrating strong discriminative performance. Feature importance analysis revealed age, weight, urination frequency, and thirst as the most influential lifestyle predictors of T2DM risk. The PCISEDA system successfully generated personalized and economically feasible lifestyle recommendations for each risk category. Conclusions: This lifestyle-based AI framework demonstrates substantial potential for early T2DM risk stratification and tailored lifestyle management. The integration of fuzzy calibration and personalized recommendations offers an accurate, scalable, and cost-effective solution that may support diabetes prevention and management in resource-constrained healthcare settings. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

45 pages, 5846 KB  
Article
A Machine Learning Framework for Harvesting and Harmonizing Cultural and Touristic Data
by Kimon Deligiannis, Christos Tryfonopoulos, Paraskevi Raftopoulou, Costas Vassilakis, Vassilis Kaffes and Spiros Skiadopoulos
Information 2025, 16(12), 1038; https://doi.org/10.3390/info16121038 - 28 Nov 2025
Viewed by 2164
Abstract
Cultural and touristic information is increasingly available through a multitude of heterogeneous sources, including official repositories, community platforms, and open data initiatives. While prominent landmarks are typically covered across sources, less-known attractions are also documented with varying degrees of detail, resulting in fragmented, overlapping, or complementary content. To enable integrated access to this wealth of information, harvesting and consolidation mechanisms are required to collect, reconcile, and unify distributed content referring to the same entities. This paper presents a machine learning-driven framework for harvesting, homogenizing, and augmenting cultural and touristic data across multilingual sources. Our approach addresses entity resolution, duplication detection, and content harmonization, laying the foundation for enriched, unified representations of attractions and points of interest. The framework is designed to support scalable integration pipelines and can be deployed in applications aimed at tourism promotion, digital heritage, and smart travel services. Full article
(This article belongs to the Special Issue Editorial Board Members’ Collection Series: "Information Systems")

17 pages, 565 KB  
Article
From Headlines to Thumbnails: Comparative Analysis of Web Publications in Bulgarian Digital Media and YouTube
by Plamen Hristov Milev and Yavor Nikolov Tabov
Journal. Media 2025, 6(4), 202; https://doi.org/10.3390/journalmedia6040202 - 28 Nov 2025
Cited by 1 | Viewed by 1646
Abstract
The objective of this study is to determine if the thematic priorities of news organizations are consistent or platform-specific by investigating the cross-platform strategies of three leading Bulgarian news agencies. Methodologically, the study combines a quantitative TF-IDF text analysis of 315,103 headlines from their websites and 6961 titles from their official YouTube channels with a qualitative analysis of YouTube thumbnails to assess their strategic visual contribution. The findings reveal a significant strategic divergence: YouTube channels are primarily dedicated to high-impact domestic political news centered on key public figures, while their official websites feature a much broader thematic scope, covering international conflicts or extensive cultural events. The thumbnail analysis further shows they function as a critical visual layer, adding emotional context and explicit cues that are not present in text headlines. This research concludes that news agencies do not simply mirror content but strategically adapt it to leverage the unique characteristics and audience expectations of each platform, employing distinct models for their YouTube and web presences. Full article

19 pages, 1467 KB  
Article
AI-Driven Process Mining for ESG Risk Assessment in Sustainable Management
by Riccardo Censi, Paola Campana, Francesco Bellini, Fulvio Schettino and Chiara De Pucchio
Buildings 2025, 15(23), 4260; https://doi.org/10.3390/buildings15234260 - 25 Nov 2025
Cited by 4 | Viewed by 2114
Abstract
The construction sector faces growing challenges in integrating sustainability, risk management, and regulatory compliance, in line with initiatives such as the European Green Deal, the Corporate Sustainability Reporting Directive, and international building standards. However, the systematic adoption of ESG metrics in decision-making remains limited due to fragmented data, the lack of predictive tools, and reliance on static reporting. This study proposes and illustrates a digital framework, based on simulated data, that combines Artificial Intelligence, Process Mining, and Robotic Process Automation to enhance ESG risk assessment in sustainable construction management. The model, formalized through Business Process Model and Notation, integrates Machine Learning for risk weighting and classification, and leverages Web Scraping and Business Intelligence for dynamic data acquisition. A simulated case study involving 100 synthetic construction projects is used to demonstrate the internal logic and quantitative feasibility of the framework, showing how automated data integration and predictive modeling can improve the consistency of ESG risk identification and classification. While the results are illustrative rather than empirical, they confirm the analytical coherence and reproducibility of the proposed workflow. From a scientific perspective, it contributes an integrated methodology that bridges predictive analytics and process management for ESG evaluation. From a practical standpoint, it offers a structured and reproducible workflow to anticipate, classify, and mitigate ESG risks, supporting the construction sector’s transition toward data-driven and sustainability-first management practices. Full article
(This article belongs to the Special Issue Applying Artificial Intelligence in Construction Management)