The Use of Big Data in Public Health Research and Practice

Journal Name	Impact Factor	CiteScore	Launched Year	First Decision (median)	APC
Cancers	4.4	8.8	2009	19.1 Days	CHF 2900
International Journal of Environmental Research and Public Health	-	8.5	2004	29.5 Days	CHF 2500
ISPRS International Journal of Geo-Information	2.8	7.2	2012	33.1 Days	CHF 1900
Machine Learning and Knowledge Extraction	6.0	9.9	2019	27 Days	CHF 1800
Smart Cities	5.5	14.7	2018	25.2 Days	CHF 2000

19 pages, 1537 KB

Open Access Article

Impact of Spatial Aggregation Level on Environmental Epidemiology Analyses: A Case Study of Combined Heat and Ozone Effects on Cardiovascular Emergencies

by Lorenzo Gianquintieri, Amruta Umakant Mahakalkar and Enrico Gianluca Caiani

ISPRS Int. J. Geo-Inf. 2026, 15(3), 133; https://doi.org/10.3390/ijgi15030133 - 17 Mar 2026

Viewed by 320

Abstract

Background: Spatial granularity plays a central role in the analysis of environmental hazards, yet its influence on health impact assessment remains overlooked. This study explicitly treats spatial aggregation level as a methodological variable and examines how different spatial aggregation strategies affect the association between high temperature, ozone, and out-of-hospital cardiovascular emergencies recorded by emergency medical services. Methods: A distribution thresholding approach is applied to both the environmental hazard and the health outcome. The analysis is conducted at three spatial levels: a fully aggregated region-wide level, population-based districts, and a combined strategy that cumulates district level results. The model estimates the Odds Ratio for each configuration. Results: The combined district-based strategy provides the most robust association, with an Odds Ratio of 1.13 (95% confidence interval 1.10 to 1.17). The region-wide and single district approaches show weaker or inconsistent significance. The findings indicate that the spatial level of analysis heavily impacts both the significance and the interpretability of the statistical results. Conclusions: The study demonstrates that the spatial structure of data strongly influences the detection of short-term health effects linked to environmental stressors. This contributes to the geomatics field by explicitly isolating spatial aggregation as an analytical dimension, demonstrating how spatial aggregation choices and explicit consideration of the Modifiable Areal Unit Problem can enhance methodological accuracy, support clearer spatial reasoning, and guide the development of more reliable territorial health indicators. Full article
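To make the reported quantity concrete, the following is a minimal sketch of how an Odds Ratio and its 95% confidence interval can be computed from a 2x2 table of thresholded days. It is not the authors' implementation: the Wald interval, the idea of pooling districts by summing their tables, the function names, and all counts are assumptions introduced for illustration only.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: hazard days with high outcome      b: hazard days with normal outcome
    c: non-hazard days with high outcome  d: non-hazard days with normal outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

def pool_tables(tables):
    """Sum district-level 2x2 tables cell by cell (one assumed way to combine districts)."""
    return tuple(sum(cells) for cells in zip(*tables))

# Hypothetical district counts, not taken from the study.
district_tables = [(30, 70, 20, 80), (45, 55, 35, 65)]
pooled = pool_tables(district_tables)
or_pooled, lo, hi = odds_ratio_ci(*pooled)
print(f"pooled OR = {or_pooled:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Running the same function on each district table separately and on the pooled table is one simple way to see how the aggregation level changes both the point estimate and the width of the interval.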

(This article belongs to the Topic The Use of Big Data in Public Health Research and Practice)

32 pages, 684 KB

Open Access Article

Screening Smarter, Not Harder: Budget Allocation Strategies for Technology-Assisted Reviews (TARs) in Empirical Medicine

by Giorgio Maria Di Nunzio

Mach. Learn. Knowl. Extr. 2025, 7(3), 104; https://doi.org/10.3390/make7030104 - 20 Sep 2025

Cited by 1 | Viewed by 1573

Abstract

In the technology-assisted review (TAR) area, most research has focused on ranking effectiveness and active learning strategies within individual topics, often assuming unconstrained review effort. However, real-world applications such as legal discovery or medical systematic reviews are frequently subject to global screening budgets. In this paper, we revisit the CLEF eHealth TAR shared tasks (2017–2019) through the lens of budget-aware evaluation. We first reproduce and verify the official participant results, organizing them into a unified dataset for comparative analysis. Then, we introduce and assess four intuitive budget allocation strategies—even, proportional, inverse proportional, and threshold-capped greedy—to explore how review effort can be efficiently distributed across topics. To evaluate systems under resource constraints, we propose two cost-aware metrics: relevant found per cost unit (RFCU) and utility gain at budget (UG@B). These complement traditional recall by explicitly modeling efficiency and trade-offs between true and false positives. Our results show that different allocation strategies optimize different metrics: even and inverse proportional allocation favor recall, while proportional and capped strategies better maximize RFCU. UG@B remains relatively stable across strategies, reflecting its balanced formulation. A correlation analysis reveals that RFCU and UG@B offer distinct perspectives from recall, with varying alignment across years. Together, these findings underscore the importance of aligning evaluation metrics and allocation strategies with screening goals. We release all data and code to support reproducibility and future research on cost-sensitive TAR. Full article
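The allocation strategies named in the abstract can be illustrated with a short sketch. The abstract does not give formal definitions, so everything below is an assumption: budgets are split by topic size (number of candidate documents), each topic's share is capped at its size, and RFCU is read literally as relevant documents found divided by cost units spent. The threshold-capped greedy strategy is omitted because its rule is not described here. Function and topic names are hypothetical.

```python
def allocate_even(sizes, budget):
    """Give every topic the same share of the budget, capped at the topic size."""
    share = budget // len(sizes)
    return {topic: min(share, n) for topic, n in sizes.items()}

def allocate_proportional(sizes, budget):
    """Larger topics receive proportionally more screening effort."""
    total = sum(sizes.values())
    return {topic: min(n, budget * n // total) for topic, n in sizes.items()}

def allocate_inverse_proportional(sizes, budget):
    """Smaller topics receive proportionally more screening effort."""
    weights = {topic: 1.0 / n for topic, n in sizes.items()}
    weight_sum = sum(weights.values())
    return {
        topic: min(sizes[topic], int(budget * w / weight_sum))
        for topic, w in weights.items()
    }

def rfcu(relevant_found, cost_units):
    """Relevant documents found per cost unit (assumed reading of the metric name)."""
    return relevant_found / cost_units if cost_units else 0.0

# Hypothetical topic sizes and global budget.
sizes = {"topicA": 1000, "topicB": 3000, "topicC": 6000}
budget = 3000
even = allocate_even(sizes, budget)
proportional = allocate_proportional(sizes, budget)
inverse = allocate_inverse_proportional(sizes, budget)
print(even, proportional, inverse, sep="\n")
```

Because shares are rounded down and capped, some strategies leave part of the budget unspent; a fuller version would redistribute the remainder.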

(This article belongs to the Topic The Use of Big Data in Public Health Research and Practice)

20 pages, 4720 KB

Open Access Article

Dynamic Optimization of Emergency Infrastructure Layouts Based on Population Influx: A Macao Case Study

by Zhen Wang, Zheyu Wang, On Kei Yeung, Mengmeng Zheng, Yitao Zhong and Sanqing He

ISPRS Int. J. Geo-Inf. 2025, 14(9), 322; https://doi.org/10.3390/ijgi14090322 - 23 Aug 2025

Cited by 2 | Viewed by 1458

Abstract

This study investigates the spatiotemporal optimization of small-scale emergency infrastructure in high-density urban environments, using nucleic acid testing sites in Macao as a case study. The objective is to enhance emergency responsiveness during future public health crises by aligning infrastructure deployment with dynamic patterns of population influx. A behaviorally informed spatial decision-making framework is developed through the integration of kernel density estimation, point-of-interest (POI) distribution, and origin–destination (OD) path simulation based on an Ant Colony Optimization (ACO) algorithm. The results reveal pronounced temporal fluctuations in testing demand—most notably with crowd peaks occurring around 12:00 and 18:00—and highlight spatial mismatches between existing facility locations and key residential or functional clusters. The proposed approach illustrates the feasibility of coupling infrastructure layout with real-time mobility behavior and offers transferable insights for emergency planning in compact urban settings. Full article

(This article belongs to the Topic The Use of Big Data in Public Health Research and Practice)

34 pages, 3423 KB

Open Access Review

Early Warning of Infectious Disease Outbreaks Using Social Media and Digital Data: A Scoping Review

by Yamil Liscano, Luis A. Anillo Arrieta, John Fernando Montenegro, Diego Prieto-Alvarado and Jorge Ordoñez

Int. J. Environ. Res. Public Health 2025, 22(7), 1104; https://doi.org/10.3390/ijerph22071104 - 13 Jul 2025

Cited by 8 | Viewed by 6742

Abstract

Background and Aim: Digital surveillance, which utilizes data from social media, search engines, and other online platforms, has emerged as an innovative approach for the early detection of infectious disease outbreaks. This scoping review aimed to systematically map and characterize the methodologies, performance metrics, and limitations of digital surveillance tools compared to traditional epidemiological monitoring. Methods: A scoping review was conducted in accordance with the Joanna Briggs Institute and PRISMA-SCR guidelines. Scientific databases including PubMed, Scopus, and Web of Science were searched, incorporating both empirical studies and systematic reviews without language restrictions. Key elements analyzed included digital sources, analytical algorithms, accuracy metrics, and validation against official surveillance data. Results: The reviewed studies demonstrate that digital surveillance can provide significant lead times (from days to several weeks) compared to traditional systems. While performance varies by platform and disease, many models showed strong correlations (r > 0.8) with official case data and achieved low predictive errors, particularly for influenza and COVID-19. Google Trends and X (formerly Twitter) emerged as the most frequently used sources, often analyzed using supervised regression, Bayesian models, and ARIMA techniques. Conclusions: While digital surveillance shows strong predictive capabilities, it faces challenges related to data quality and representativeness. Key recommendations include the development of standardized reporting guidelines to improve comparability across studies, the use of statistical techniques like stratification and model weighting to mitigate demographic biases, and leveraging advanced artificial intelligence to differentiate genuine health signals from media-driven noise. These steps are crucial for enhancing the reliability and equity of digital epidemiological monitoring. Full article
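The two headline measures in this review, correlation with official case data and lead time, can be related through a lagged correlation. The sketch below is illustrative only and is not drawn from any reviewed study: it shifts a digital signal against an official series and reports the lead (in time steps) with the highest Pearson correlation. The series values and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def best_lead(digital, official, max_lead):
    """Return (lead, r) where digital[t] best matches official[t + lead]."""
    best = (0, pearson(digital, official))
    for lead in range(1, max_lead + 1):
        r = pearson(digital[:-lead], official[lead:])
        if r > best[1]:
            best = (lead, r)
    return best

# Hypothetical weekly series: the official counts trail the digital signal by two weeks.
digital = [1, 2, 4, 8, 5, 3, 2, 1, 1, 1]
official = [1, 1, 1, 2, 4, 8, 5, 3, 2, 1]
lead, r = best_lead(digital, official, max_lead=3)
print(f"best lead = {lead} weeks, r = {r:.2f}")
```

Real series are noisy and autocorrelated, so a high lagged correlation alone does not establish predictive value; the review's emphasis on validation against official surveillance data still applies.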

(This article belongs to the Topic The Use of Big Data in Public Health Research and Practice)

20 pages, 6355 KB

Open Access Article

How Did the Fever Visit Management Policy During the COVID-19 Epidemic Impact Fever Medical Care Accessibility?

by Zhiyuan Zhao, Youjun Tu and Yicheng Ding

ISPRS Int. J. Geo-Inf. 2025, 14(3), 117; https://doi.org/10.3390/ijgi14030117 - 6 Mar 2025

Viewed by 1759

Abstract

Fever visit management (FVM) played a critical role in reducing the risk of local outbreaks caused by positive cases during the coronavirus disease 2019 (COVID-19) pandemic under the dynamic zero-COVID-19 policy. Fever clinics were established to satisfy the healthcare needs of citizens with fever symptoms, including those with and without COVID-19. Learning how FVM affects fever medical care accessibility for citizens in different places can support decision making in establishing fever clinics more equitably. However, the dynamic nature of the population at different times has rarely been considered in evaluating healthcare facility accessibility. To fill this gap, we adjusted the Gaussian-based two-step floating catchment area method (G2SFCA) by considering the hourly dynamics of the population distribution derived from mobile phone location data. The results generated from Xining city, China, showed that (1) the accessibility of fever clinics explicitly exhibited spatial distribution patterns, being high in the center and low in surrounding areas; (2) the accessibility reduction in suburban areas caused by FVM was approximately 2.8 times greater than that in the central city for the 15 min drive conditions; and (3) the accessibility of fever clinics based on the nighttime anchor point was overestimated in central areas, but underestimated in suburban areas. Full article
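As a rough illustration of the method this abstract builds on, here is a minimal sketch of a Gaussian-based two-step floating catchment area calculation that can be re-run for each hour's population. It is not the authors' adjusted model: the particular Gaussian decay form (a commonly used one), the data layout, the 15-minute threshold reuse, and all names and numbers are assumptions made for the example.

```python
import math

def gaussian_decay(d, d0):
    """Gaussian weight that equals 1 at d = 0 and falls to 0 at the threshold d0."""
    if d > d0:
        return 0.0
    edge = math.exp(-0.5)
    return (math.exp(-0.5 * (d / d0) ** 2) - edge) / (1 - edge)

def g2sfca(supply, population, travel_time, d0):
    """Accessibility per zone.

    supply: {clinic: capacity}, population: {zone: people},
    travel_time: {(zone, clinic): minutes}, d0: catchment threshold in minutes.
    """
    # Step 1: supply-to-demand ratio of each clinic within its catchment.
    ratio = {}
    for clinic, capacity in supply.items():
        demand = sum(
            people * gaussian_decay(travel_time[(zone, clinic)], d0)
            for zone, people in population.items()
        )
        ratio[clinic] = capacity / demand if demand > 0 else 0.0
    # Step 2: each zone sums the decayed ratios of the clinics it can reach.
    return {
        zone: sum(ratio[c] * gaussian_decay(travel_time[(zone, c)], d0) for c in supply)
        for zone in population
    }

# Hypothetical inputs: two zones, one clinic, populations at two hours of the day.
supply = {"clinic_1": 10}
travel_time = {("center", "clinic_1"): 5.0, ("suburb", "clinic_1"): 12.0}
population_by_hour = {
    3: {"center": 2000, "suburb": 6000},   # nighttime distribution
    15: {"center": 6000, "suburb": 2000},  # daytime distribution
}
hourly_access = {
    hour: g2sfca(supply, pop, travel_time, d0=15.0)
    for hour, pop in population_by_hour.items()
}
print(hourly_access)
```

Comparing the hourly results against a single nighttime-based run is the kind of contrast the abstract describes in finding (3).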

(This article belongs to the Topic The Use of Big Data in Public Health Research and Practice)

27 pages, 23808 KB

Open Access Article

Impact of Shared Bicycle Spatial Patterns During Public Health Emergencies: A Case Study in the Core Area of Beijing

by Zheng Wen, Lujin Hu and Jing Hu

ISPRS Int. J. Geo-Inf. 2025, 14(2), 92; https://doi.org/10.3390/ijgi14020092 - 19 Feb 2025

Cited by 3 | Viewed by 1842

Abstract

During public health emergencies, studying the travel characteristics and influencing factors of shared bicycles during different time periods on weekdays can provide valuable insights for urban transportation planning and offer recommendations for bike-sharing systems (BSS) affected by such events. Utilizing bike-sharing data, this study initiated the analysis by scrutinizing the spatial flow patterns in the core area of Beijing, employing network indicators within the framework of complex network theory. Subsequently, influencing factors associated with bike-sharing trips were pinpointed using the exponential random graph model (ERGM). Using COVID-19 as an example, it examines the impact of public health emergencies on bike-sharing during multiple time periods. Supported by the network analysis method, our findings revealed that the majority of travel activities occurred between adjacent areas. Throughout weekdays, a consistent level of travel activity was observed, exhibiting distinct patterns during daytime and nighttime. The period from 4:00 to 8:00 emerged as the peak time, characterized by heightened traffic and temperature changes. Morning commuting extended until 8:00–12:00, followed by a transition period from 12:00–16:00. The most active travel time, encompassing various purposes, was identified as 16:00–20:00. Additionally, the presence of hospitals and train stations amplified travel within the pandemic-affected area. Finally, variants of ERGMs were employed to assess the influence of finance, shopping, dining, education, transportation, roads, and COVID-19 on bike-sharing activities. The road network emerged as the most critical factor, exhibiting a significant negative impact. Conversely, COVID-19 had the most pronounced positive influence, with transportation stops and educational institutions also contributing significantly in a positive manner. 
This research provides valuable transportation planning insights for addressing public health emergencies and promotes the effective utilization of bike-sharing systems. Full article

(This article belongs to the Topic The Use of Big Data in Public Health Research and Practice)

19 pages, 11085 KB

Open Access Article

Understanding Urban Park-Based Social Interaction in Shanghai During the COVID-19 Pandemic: Insights from Large-Scale Social Media Analysis

by Haotian Wang, Tianyu Su and Wanting Zhao

ISPRS Int. J. Geo-Inf. 2025, 14(2), 87; https://doi.org/10.3390/ijgi14020087 - 17 Feb 2025

Cited by 6 | Viewed by 3026

Abstract

The COVID-19 pandemic highlighted the role of urban parks as green spaces in mitigating social isolation and supporting public mental health. Research in this area is limited due to the lack of large-scale datasets. Moreover, timely studies are indeed necessary under pandemic conditions. This study employs quantitative methods to analyze the temporal and spatial changes in social interaction in 160 urban parks before, during, and after the COVID-19 pandemic, and assesses their correlation with the built environment. Social media data from the Dianping platform were collected for this purpose. A two-step analytical approach was employed: first, machine learning-based keyword analysis identified review data related to social interaction, leading to the construction of two indicators: social interaction intensity and social interaction recovery rate. Second, we applied regression models to explore the correlation between the two indicators in urban parks and 18 characteristics of the built environment. The built environment characteristics associated with social interaction intensity varied across different periods, with seven factors, including natural landscapes, perceptual experience, building density, and road intersections, showing significant correlations with the recovery of social interaction capabilities in the post-pandemic era. Based on these findings, it is recommended that urban planners consider integrating more flexible design elements, such as adding greenery and enriching the audio-visual experience for visitors. Furthermore, enhancing the quality and accessibility of park amenities can foster social interaction, thereby contributing to public health resilience in future crises. This research recommends that urban park design should not only support communities’ immediate needs but also prepare for unforeseen challenges. Full article

(This article belongs to the Topic The Use of Big Data in Public Health Research and Practice)

19 pages, 8912 KB

Open Access Article

Revealing Spatial Patterns and Environmental Influences on Jogging Volume and Speed: Insights from Crowd-Sourced GPS Trajectory Data and Random Forest

by Xiao Yang, Chengbo Zhang and Linzhen Yang

ISPRS Int. J. Geo-Inf. 2025, 14(2), 80; https://doi.org/10.3390/ijgi14020080 - 13 Feb 2025

Cited by 6 | Viewed by 2277

Abstract

Outdoor jogging plays a critical role in active mobility and transport-related physical activity (TPA), contributing to both urban health and sustainability. While existing studies have primarily focused on jogging participation volumes through survey data, they often overlook the real-time dynamics that shape jogging experiences. This study seeks to provide a data-driven analysis of both jogging volume and speed, exploring how environmental factors influence these behaviors. Utilizing a dataset of over 1000 crowd-sourced jogging trajectories in Shenzhen, we spatially linked these trajectories to road-section-level units to map the distribution of jogging volume and average speed. By depicting a bivariate map of both behavioral characteristics, we identified spatial patterns in jogging behavior, elucidating variations in the distribution of volume and speed. A random forest regression model was validated and employed to capture nonlinear relationships and assess the differential impacts of various environmental factors on jogging volume and speed. The results reveal distinct jogging patterns across the city, where jogging volume is shaped by the mixed interplay of natural, visual, and built environment factors, while jogging speed is primarily influenced by visual factors. Additionally, the analysis highlights nonlinear effects, particularly identifying a threshold beyond which incremental environmental improvements provide diminishing returns in jogging speed. These findings clarify the distinct roles of environmental factors in influencing jogging volume and speed, offering insights into the dynamics of active mobility. Ultimately, this study provides data-informed implications for urban planners seeking to create environments that support TPA and promote active lifestyles. Full article

(This article belongs to the Topic The Use of Big Data in Public Health Research and Practice)

23 pages, 4222 KB

Open Access Article

Intersecting Paths to Health: A Factor Analysis Approach to Socioeconomic and Environmental Determinants in Indiana

by Siavash Ghorbany, Ming Hu, Siyuan Yao, Chaoli Wang, Matthew Sisk, Quynh C. Nguyen and Kai Zhang

Int. J. Environ. Res. Public Health 2025, 22(2), 219; https://doi.org/10.3390/ijerph22020219 - 4 Feb 2025

Cited by 9 | Viewed by 3314

Abstract

Public health is the basis of society’s well-being and the nation’s development. Despite the importance of this factor and huge investments in the health sector in the United States, public health is facing enormous challenges due to the unknown nature of the influential variables in this sector. This research aims to investigate how variables from different sources, including demographic features, the built environment, socioeconomic variables, and environmental factors, affect 30 major health issues. To achieve this goal, this study applies exploratory factor analysis and multiple regression methods to data obtained from the state of Indiana. The results indicated that health issues and influential variables can be grouped into five main factors. This study identifies Health Burdens and Socioeconomic Disparities as a key factor, encompassing a wide range of health issues and socioeconomic variables, highlighting a significant association between socioeconomic disparities, poor health outcomes, and environmental exposures. The analysis underscores the intricate relationship between socioeconomic status, health behaviors, chronic diseases, and environmental factors, suggesting that effective interventions must address healthcare access, quality, and broader determinants of health to improve outcomes in affected communities. The results of this study can be helpful to public health policymakers, urban planners, and future public health researchers. Full article

(This article belongs to the Topic The Use of Big Data in Public Health Research and Practice)

14 pages, 2597 KB

Open Access Article

Potential and Observed Supply–Demand Characteristics of Medical Services: A Case Study of Nighttime Visits in Shenzhen

by Xiaojie Wu, Zhengdong Huang and Xi Yu

ISPRS Int. J. Geo-Inf. 2024, 13(11), 382; https://doi.org/10.3390/ijgi13110382 - 30 Oct 2024

Cited by 3 | Viewed by 2146

Abstract

Hospital selection patterns are essential for evaluating medical accessibility and optimizing resource management. In the absence of medical records, early studies primarily used accessibility functions to estimate potential selection probabilities (PSPs). With the advent of travel data, data-driven functions have enabled the calculation of observed selection probabilities (OSPs). Comparing PSP and OSP helps to leverage travel data to understand hospital selection preferences and improve medical service evaluation models. This study proposes a selection probability-based accessibility model for calculating PSP and OSP accessibility. A case study in Shenzhen employed nighttime navigation data to reduce interference from different travel modes. The distance decay function was validated, with exponential and Gaussian functions performing best. For hospitals, the PSP distribution closely aligned with OSP, except in areas with high hospital density. This discrepancy may result from the PSP function overestimating the selection probability for nearby hospitals, a limitation that could be addressed by fitting the distance decay function to actual data. PSP-based accessibility and Gini coefficients differ from those of OSP. However, when parameters are fitted to actual data, the PSP- and OSP-based functions produce nearly identical results. Fitting to actual data can notably improve the accuracy of PSP and the corresponding accessibility outcomes. These findings may provide valuable references for medical service evaluation methodologies and offer insights for planning and management. Full article
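Two ingredients mentioned in this abstract, a distance-decay-based selection probability and the Gini coefficient, can be sketched briefly. The abstract does not state the model's equations, so the Huff-style form below (capacity times decay, normalized over all hospitals), the exponential decay with parameter `beta`, and every name and number are assumptions used only to show the general idea.

```python
import math

def exponential_decay(distance, beta):
    """Exponential distance decay; beta is a hypothetical friction parameter."""
    return math.exp(-beta * distance)

def selection_probabilities(capacities, distances, beta):
    """Huff-style probability that a resident at one origin selects each hospital.

    capacities: {hospital: capacity}, distances: {hospital: distance from the origin}.
    """
    attraction = {
        h: capacities[h] * exponential_decay(distances[h], beta) for h in capacities
    }
    total = sum(attraction.values())
    return {h: value / total for h, value in attraction.items()}

def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    weighted = sum(rank * v for rank, v in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Hypothetical origin with two hospitals of equal capacity at different distances.
probs = selection_probabilities(
    capacities={"hospital_a": 500, "hospital_b": 500},
    distances={"hospital_a": 2.0, "hospital_b": 6.0},
    beta=0.3,
)
print(probs)
print(gini([0.2, 0.4, 0.4, 1.0]))  # inequality of hypothetical accessibility scores
```

In this framing, fitting `beta` to observed trips rather than choosing it a priori corresponds to the step the abstract reports as bringing potential and observed results into close agreement.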

(This article belongs to the Topic The Use of Big Data in Public Health Research and Practice)

16 pages, 2887 KB

Open Access Article

Global and Local Interpretable Machine Learning Allow Early Prediction of Unscheduled Hospital Readmission

by Rafael Ruiz de San Martín, Catalina Morales-Hernández, Carmen Barberá, Carlos Martínez-Cortés, Antonio Jesús Banegas-Luna, Francisco José Segura-Méndez, Horacio Pérez-Sánchez, Isabel Morales-Moreno and Juan José Hernández-Morante

Mach. Learn. Knowl. Extr. 2024, 6(3), 1653-1666; https://doi.org/10.3390/make6030080 - 17 Jul 2024

Cited by 2 | Viewed by 2875

Abstract

Nowadays, most of the health expenditure is due to chronic patients who are readmitted several times for their pathologies. Personalized prevention strategies could be developed to improve the management of these patients. The aim of the present work was to develop local predictive models using interpretable machine learning techniques for the early identification of individual unscheduled hospital readmissions. To do this, a retrospective, case-control study, based on information regarding patient readmission in 2018–2019, was conducted. After curation of the initial dataset (n = 76,210), the final number of participants was n = 29,026. A machine learning analysis was performed using several algorithms, with unscheduled hospital readmission as the dependent variable. Local model-agnostic interpretability methods were also applied. We observed a 13% rate of unscheduled hospital readmission cases. There were statistically significant differences regarding age and days of stay (p < 0.001 in both cases). A logistic regression model revealed chronic therapy (odds ratio: 3.75), diabetes mellitus history (odds ratio: 1.14), and days of stay (odds ratio: 1.02) as relevant factors. Machine learning algorithms yielded better results regarding sensitivity and other metrics. Following this procedure, days of stay and age were the most important factors to predict unscheduled hospital readmissions. Interestingly, other variables like allergies and adverse drug reaction antecedents were relevant. Individualized prediction models also revealed a high sensitivity. In conclusion, our study identified significant factors influencing unscheduled hospital readmissions, emphasizing the impact of age and length of stay. We introduced a personalized risk model for predicting hospital readmissions with notable accuracy. Future research should include more clinical variables to refine this model further. Full article
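A small odds ratio on a continuous predictor can look negligible until it is compounded. The sketch below shows the standard logistic-regression arithmetic, assuming (the abstract does not say so explicitly) that the reported odds ratio of 1.02 for days of stay is per additional day; the function name is hypothetical.

```python
import math

def scaled_odds_ratio(odds_ratio_per_unit, units):
    """Odds ratio for a change of `units` in a predictor of a logistic model."""
    coefficient = math.log(odds_ratio_per_unit)  # the model's log-odds coefficient
    return math.exp(coefficient * units)

# Assumption: 1.02 is the odds ratio per additional day of stay.
ten_days = scaled_odds_ratio(1.02, 10)
print(f"odds ratio for 10 additional days: {ten_days:.3f}")
```

Under that assumption, ten additional days multiply the odds of readmission by about 1.22, which is the same order as the reported odds ratio for diabetes mellitus history (1.14).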

(This article belongs to the Topic The Use of Big Data in Public Health Research and Practice)
