Topic Editors

National Institute of Nursing Research (NINR), Bethesda, MD 20892, USA
Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, MD 20742, USA

The Use of Big Data in Public Health Research and Practice

Abstract submission deadline: 31 October 2025
Manuscript submission deadline: 31 December 2025
Viewed by 4054

Topic Information

Dear Colleagues,

We are organizing a Topic on the use of big data to inform health research and practice. Timely data on the determinants of health and well-being are essential for informed decision-making. Big data are often operational or “organic” data generated for non-research purposes, such as social media posts, news feeds, Google Street View images, online reviews, blogs, electronic health records, pharmacy records, and billing records. This Topic focuses on innovative ways in which big data are leveraged for health research and practice. Some possible submission ideas are listed below; however, submissions addressing other related topics are also welcome:

  • Use of electronic health records, billing data, and pharmacy data to understand individualized risk factors and treatment success; 
  • Characterization of built environments with big data derived from various sources (e.g., Street View images and remote sensing imagery data) as well as their impact on health; 
  • Using various user-generated content (e.g., GPS data, accelerometer data, users’ review data, social media data, and web search data) to study individual behaviors and social/cultural environments as well as their impacts on people’s health; 
  • Development of new methods or tools (e.g., natural language processing, machine learning, database management, high-performance computing, data mining, cloud computing, computer vision, visualization, geographic information systems, and spatial analysis) for big-data-based health research; 
  • Use of big data in COVID-19-related research; 
  • Application or development of causal inference methods for big data;
  • Investigating and addressing data quality and uncertainty issues;
  • Blending and integration of big data from different sources.

Dr. Quynh C. Nguyen
Dr. Thu T. Nguyen
Topic Editors

Keywords

  • big data
  • artificial intelligence
  • machine learning
  • deep learning
  • data science
  • natural language processing
  • computer vision
  • ChatGPT

Participating Journals

  • Cancers (cancers): Impact Factor 4.5; CiteScore 8.0; launched 2009; median first decision 17.4 days; APC CHF 2900
  • International Journal of Environmental Research and Public Health (ijerph): Impact Factor not listed; CiteScore 7.3; launched 2004; median first decision 25.8 days; APC CHF 2500
  • ISPRS International Journal of Geo-Information (ijgi): Impact Factor 2.8; CiteScore 6.9; launched 2012; median first decision 35.8 days; APC CHF 1900
  • Machine Learning and Knowledge Extraction (make): Impact Factor 4.0; CiteScore 6.3; launched 2019; median first decision 20.8 days; APC CHF 1800
  • Smart Cities (smartcities): Impact Factor 7.0; CiteScore 11.2; launched 2018; median first decision 28.4 days; APC CHF 2000

Preprints.org is a multidisciplinary preprint platform dedicated to sharing research from its earliest stages and supporting authors throughout their research journey.

MDPI Topics cooperates with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to post a preprint at Preprints.org prior to publication in order to:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your ideas with a time-stamped preprint record;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (4 papers)

19 pages, 8912 KiB  
Article
Revealing Spatial Patterns and Environmental Influences on Jogging Volume and Speed: Insights from Crowd-Sourced GPS Trajectory Data and Random Forest
by Xiao Yang, Chengbo Zhang and Linzhen Yang
ISPRS Int. J. Geo-Inf. 2025, 14(2), 80; https://doi.org/10.3390/ijgi14020080 - 13 Feb 2025
Viewed by 280
Abstract
Outdoor jogging plays a critical role in active mobility and transport-related physical activity (TPA), contributing to both urban health and sustainability. While existing studies have primarily focused on jogging participation volumes through survey data, they often overlook the real-time dynamics that shape jogging experiences. This study seeks to provide a data-driven analysis of both jogging volume and speed, exploring how environmental factors influence these behaviors. Utilizing a dataset of over 1000 crowd-sourced jogging trajectories in Shenzhen, we spatially linked these trajectories to road-section-level units to map the distribution of jogging volume and average speed. By depicting a bivariate map of both behavioral characteristics, we identified spatial patterns in jogging behavior, elucidating variations in the distribution of volume and speed. A random forest regression model was validated and employed to capture nonlinear relationships and assess the differential impacts of various environmental factors on jogging volume and speed. The results reveal distinct jogging patterns across the city, where jogging volume is shaped by the mixed interplay of natural, visual, and built environment factors, while jogging speed is primarily influenced by visual factors. Additionally, the analysis highlights nonlinear effects, particularly identifying a threshold beyond which incremental environmental improvements provide diminishing returns in jogging speed. These findings clarify the distinct roles of environmental factors in influencing jogging volume and speed, offering insights into the dynamics of active mobility. Ultimately, this study provides data-informed implications for urban planners seeking to create environments that support TPA and promote active lifestyles.
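
The abstract does not state how the bivariate map was classified. One common scheme for a bivariate choropleth splits each variable into tertiles and crosses them into a 3-by-3 grid of classes. The Python sketch below assumes that scheme; the function names, the "V#-S#" labels, and the sample values are hypothetical and are not taken from the study.

```python
from statistics import quantiles


def tertile_breaks(values):
    """Two cut points splitting values into three roughly equal-count classes."""
    return quantiles(values, n=3)


def tertile_class(value, breaks):
    """Return 0 (low), 1 (medium), or 2 (high) for value given the tertile breaks."""
    low, high = breaks
    if value <= low:
        return 0
    if value <= high:
        return 1
    return 2


def bivariate_classes(volumes, speeds):
    """Cross volume and speed tertiles into a 3x3 label such as 'V2-S0' per road section."""
    volume_breaks = tertile_breaks(volumes)
    speed_breaks = tertile_breaks(speeds)
    return [
        f"V{tertile_class(v, volume_breaks)}-S{tertile_class(s, speed_breaks)}"
        for v, s in zip(volumes, speeds)
    ]


if __name__ == "__main__":
    # Hypothetical per-road-section values: trajectory counts and mean speed (km/h)
    volumes = [3, 12, 45, 7, 30, 18, 2, 60, 25]
    speeds = [9.5, 8.1, 10.2, 7.4, 9.0, 8.8, 6.9, 11.0, 9.7]
    for label in bivariate_classes(volumes, speeds):
        print(label)
```

Tertiles are only one choice; quantile counts other than three, natural breaks, or fixed thresholds would produce different maps from the same data.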

23 pages, 4222 KiB  
Article
Intersecting Paths to Health: A Factor Analysis Approach to Socioeconomic and Environmental Determinants in Indiana
by Siavash Ghorbany, Ming Hu, Siyuan Yao, Chaoli Wang, Matthew Sisk, Quynh C. Nguyen and Kai Zhang
Int. J. Environ. Res. Public Health 2025, 22(2), 219; https://doi.org/10.3390/ijerph22020219 - 4 Feb 2025
Viewed by 563
Abstract
Public health is the basis of society’s well-being and the nation’s development. Despite its importance and huge investments in the health sector in the United States, public health faces enormous challenges because the nature of the influential variables in this sector remains unknown. This research aims to investigate how variables from different sources, including demographic features, the built environment, socioeconomic variables, and environmental factors, affect 30 major health issues. To achieve this goal, this study applies exploratory factor analysis and multiple regression to data obtained from the state of Indiana. The results indicated that the health issues and influential factors can be grouped into five main factors. This study identifies Health Burdens and Socioeconomic Disparities as a key factor, encompassing a wide range of health issues and socioeconomic variables and highlighting a significant association between socioeconomic disparities, poor health outcomes, and environmental exposures. The analysis underscores the intricate relationship between socioeconomic status, health behaviors, chronic diseases, and environmental factors, suggesting that effective interventions must address healthcare access, quality, and broader determinants of health to improve outcomes in affected communities. The results of this study may help public health policymakers, urban planners, and future public health researchers.

14 pages, 2597 KiB  
Article
Potential and Observed Supply–Demand Characteristics of Medical Services: A Case Study of Nighttime Visits in Shenzhen
by Xiaojie Wu, Zhengdong Huang and Xi Yu
ISPRS Int. J. Geo-Inf. 2024, 13(11), 382; https://doi.org/10.3390/ijgi13110382 - 30 Oct 2024
Viewed by 734
Abstract
Hospital selection patterns are essential for evaluating medical accessibility and optimizing resource management. In the absence of medical records, early studies primarily used accessibility functions to estimate potential selection probabilities (PSPs). With the advent of travel data, data-driven functions have enabled the calculation of observed selection probabilities (OSPs). Comparing PSP and OSP helps to leverage travel data to understand hospital selection preferences and improve medical service evaluation models. This study proposes a selection probability-based accessibility model for calculating PSP and OSP accessibility. A case study in Shenzhen employed nighttime navigation data to reduce interference from different travel modes. The distance decay function was validated, with exponential and Gaussian functions performing best. For hospitals, the PSP distribution closely aligned with OSP, except in areas with high hospital density. This discrepancy may result from the PSP function overestimating the selection probability for nearby hospitals, a limitation that could be addressed by fitting the distance decay function to actual data. PSP-based accessibility and Gini coefficients differ from those of OSP. However, when parameters are fitted to actual data, the PSP- and OSP-based functions produce nearly identical results. Fitting to actual data can notably improve the accuracy of PSP and the corresponding accessibility outcomes. These findings may provide valuable references for medical service evaluation methodologies and offer insights for planning and management.
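
As a rough illustration of the PSP/OSP comparison, the sketch below computes Huff-style potential selection probabilities, P_j = S_j f(d_j) / Σ_k S_k f(d_k), with an exponential or Gaussian distance decay f, treats observed trip shares as OSPs, fits the decay parameter to those shares by a simple grid search, and measures inequality with a Gini coefficient. The attractiveness weights S_j, the Huff form, the grid search, and all function names are assumptions for illustration; the paper's own model and fitting procedure may differ.

```python
import math


def exponential_decay(d, beta):
    """Exponential distance decay f(d) = exp(-beta * d)."""
    return math.exp(-beta * d)


def gaussian_decay(d, sigma):
    """Gaussian distance decay f(d) = exp(-d^2 / (2 * sigma^2))."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def potential_selection_probabilities(distances, attractiveness, decay, param):
    """Huff-style PSP for one demand location: P_j = S_j f(d_j) / sum_k S_k f(d_k)."""
    weights = [s * decay(d, param) for d, s in zip(distances, attractiveness)]
    total = sum(weights)
    return [w / total for w in weights]


def observed_selection_probabilities(trip_counts):
    """OSP: each hospital's share of the observed trips from one demand location."""
    total = sum(trip_counts)
    return [c / total for c in trip_counts]


def fit_decay_parameter(distances, attractiveness, observed, decay, grid):
    """Return the grid value whose PSP is closest (least squares) to the observed shares."""
    def squared_error(param):
        psp = potential_selection_probabilities(distances, attractiveness, decay, param)
        return sum((p - o) ** 2 for p, o in zip(psp, observed))
    return min(grid, key=squared_error)


def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Sorted-data form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n
    ranked = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * ranked / (n * total) - (n + 1) / n


if __name__ == "__main__":
    distances = [1.0, 2.0, 4.0]        # km to three hypothetical hospitals
    attractiveness = [100, 200, 100]   # hypothetical supply weights (e.g. beds)
    trips = [40, 50, 10]               # hypothetical observed nighttime trips
    osp = observed_selection_probabilities(trips)
    grid = [k / 10 for k in range(1, 21)]
    beta = fit_decay_parameter(distances, attractiveness, osp, exponential_decay, grid)
    psp = potential_selection_probabilities(distances, attractiveness, exponential_decay, beta)
    print("fitted beta:", beta)
    print("PSP:", [round(p, 3) for p in psp], "OSP:", osp)
```

Fitting the decay parameter to observed shares, rather than fixing it in advance, is loosely analogous to the abstract's point that fitting the distance decay function to actual data brings PSP close to OSP.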

16 pages, 2887 KiB  
Article
Global and Local Interpretable Machine Learning Allow Early Prediction of Unscheduled Hospital Readmission
by Rafael Ruiz de San Martín, Catalina Morales-Hernández, Carmen Barberá, Carlos Martínez-Cortés, Antonio Jesús Banegas-Luna, Francisco José Segura-Méndez, Horacio Pérez-Sánchez, Isabel Morales-Moreno and Juan José Hernández-Morante
Mach. Learn. Knowl. Extr. 2024, 6(3), 1653-1666; https://doi.org/10.3390/make6030080 - 17 Jul 2024
Viewed by 1301
Abstract
Nowadays, most health expenditure is due to chronic patients who are readmitted several times for their pathologies. Personalized prevention strategies could be developed to improve the management of these patients. The aim of the present work was to develop local predictive models using interpretable machine learning techniques to identify individual unscheduled hospital readmissions early. To do this, a retrospective case-control study based on information regarding patient readmissions in 2018–2019 was conducted. After curation of the initial dataset (n = 76,210), the final number of participants was n = 29,026. A machine learning analysis was performed with several algorithms, using unscheduled hospital readmission as the dependent variable. Local model-agnostic interpretability methods were also applied. We observed a 13% rate of unscheduled hospital readmissions. There were statistically significant differences regarding age and days of stay (p < 0.001 in both cases). A logistic regression model revealed chronic therapy (odds ratio: 3.75), diabetes mellitus history (odds ratio: 1.14), and days of stay (odds ratio: 1.02) as relevant factors. Machine learning algorithms yielded better results regarding sensitivity and other metrics. Following this procedure, days of stay and age were the most important factors for predicting unscheduled hospital readmissions. Interestingly, other variables such as allergies and a history of adverse drug reactions were also relevant. Individualized prediction models also showed high sensitivity. In conclusion, our study identified significant factors influencing unscheduled hospital readmissions, emphasizing the impact of age and length of stay. We introduced a personalized risk model for predicting hospital readmissions with notable accuracy. Future research should include more clinical variables to refine this model further.
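
The odds ratios reported in this abstract are multiplicative effects on the odds of readmission, not on its probability. The sketch below shows the standard conversion, using the 13% overall readmission rate as a hypothetical baseline. It ignores the other covariates in the authors' model, so the numbers illustrate the arithmetic only and are not predictions from that model; the function names are hypothetical.

```python
import math


def prob_to_odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability: odds / (1 + odds)."""
    return odds / (1.0 + odds)


def coefficient_from_odds_ratio(odds_ratio):
    """Logistic-regression coefficient beta, since OR = exp(beta)."""
    return math.log(odds_ratio)


def apply_odds_ratio(baseline_p, odds_ratio, units=1.0):
    """Probability after multiplying the baseline odds by odds_ratio ** units.

    For a continuous predictor such as days of stay, the OR is per one-unit
    increase, so a k-unit increase multiplies the odds by OR ** k.
    """
    return odds_to_prob(prob_to_odds(baseline_p) * odds_ratio ** units)


if __name__ == "__main__":
    baseline = 0.13  # overall readmission rate from the abstract, used as a hypothetical baseline
    print("chronic therapy (OR 3.75):", round(apply_odds_ratio(baseline, 3.75), 3))
    print("10 extra days (OR 1.02/day):", round(apply_odds_ratio(baseline, 1.02, units=10), 3))
```

Under these simplifying assumptions, an odds ratio of 3.75 moves a 13% baseline to roughly 36%, while ten additional days of stay at an odds ratio of 1.02 per day multiply the odds by 1.02^10 ≈ 1.22.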
