Big Data in Public Health: Challenges and Opportunities

A special issue of International Journal of Environmental Research and Public Health (ISSN 1660-4601). This special issue belongs to the section "Health Care Sciences".

Deadline for manuscript submissions: closed (30 September 2024) | Viewed by 7892

Special Issue Editor


Dr. Juyoung Song
Guest Editor
Criminal Justice, Pennsylvania State University, Schuylkill, PA 17972, USA
Interests: big data analysis; machine learning; juvenile delinquency; bullying and cyberbullying

Special Issue Information

Dear Colleagues,

Big data has consistently helped public health achieve its goals by giving practitioners access to a variety of structured and unstructured data that were not previously available. It has enabled broader and more specific research, including studies and trials that stratify and segment populations at risk for a range of health problems. The challenges and opportunities for big data in public health are now more complex and diverse than in the past, and there are many ways to predict and analyze health phenomena. In the current era of the fourth industrial revolution, cutting-edge information and communications technologies, such as big data, artificial intelligence (AI), the Internet of Things, and mobile technology, are being incorporated into all areas of the economy and society. Researchers play a very important role in this process: they create or assemble high-quality data that can be used to train machine-learning systems, identify machine-learning algorithms suited to those data, and build models. This Special Issue will present the challenges and opportunities of big data in public health.

Dr. Juyoung Song
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts are available on the Instructions for Authors page. International Journal of Environmental Research and Public Health is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2500 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • public health
  • machine learning
  • social big data
  • artificial intelligence

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (4 papers)

Research

13 pages, 1098 KiB  
Article
Wrangling Real-World Data: Optimizing Clinical Research Through Factor Selection with LASSO Regression
by Kerry A. Howard, Wes Anderson, Jagdeep T. Podichetty, Ruth Gould, Danielle Boyce, Pam Dasher, Laura Evans, Cindy Kao, Vishakha K. Kumar, Chase Hamilton, Ewy Mathé, Philippe J. Guerin, Kenneth Dodd, Aneesh K. Mehta, Chris Ortman, Namrata Patil, Jeselyn Rhodes, Matthew Robinson, Heather Stone and Smith F. Heavner
Int. J. Environ. Res. Public Health 2025, 22(4), 464; https://doi.org/10.3390/ijerph22040464 - 21 Mar 2025
Viewed by 504
Abstract
Data-driven approaches to clinical research are necessary for understanding and effectively treating infectious diseases. However, challenges such as issues with data validity, lack of collaboration, and difficult-to-treat infectious diseases (e.g., those that are rare or newly emerging) hinder research. Prioritizing innovative methods to facilitate the continued use of data generated during routine clinical care for research, but in an organized, accelerated, and shared manner, is crucial. This study investigates the potential of CURE ID, an open-source platform to accelerate drug-repurposing research for difficult-to-treat diseases, with COVID-19 as a use case. Data from eight US health systems were analyzed using least absolute shrinkage and selection operator (LASSO) regression to identify key predictors of 28-day all-cause mortality in COVID-19 patients, including demographics, comorbidities, treatments, and laboratory measurements captured during the first two days of hospitalization. Key findings indicate that age, laboratory measures, severity of illness indicators, oxygen support administration, and comorbidities significantly influenced all-cause 28-day mortality, aligning with previous studies. This work underscores the value of collaborative repositories like CURE ID in providing robust datasets for prognostic research and the importance of factor selection in identifying key variables, helping to streamline future research and drug-repurposing efforts.
(This article belongs to the Special Issue Big Data in Public Health: Challenges and Opportunities)
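The abstract above uses LASSO for factor selection: the L1 penalty shrinks the coefficients of weak predictors to exactly zero, so the variables that keep nonzero coefficients form the selected set. As a minimal sketch of that mechanism only, the Python below fits a squared-error LASSO by cyclic coordinate descent with soft-thresholding on standardized predictors. It is not the study's code. The function names, the penalty `lam`, and the synthetic data are hypothetical, and a binary outcome such as 28-day mortality would more typically be modeled with an L1-penalized logistic regression whose penalty is tuned by cross-validation.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam): shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def standardize(columns):
    """Center each predictor column and scale it to unit (population) variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        out.append([(v - mean) / sd for v in col])
    return out


def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 over standardized columns of X.

    The intercept is handled by centering y; returns one coefficient per column.
    """
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual while every coefficient is zero
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Correlation of column j with the partial residual that excludes column j.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, resid)) / n
            # Columns have unit variance, so the update needs no extra division.
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                resid = [r - x * delta for x, r in zip(col, resid)]
                beta[j] = new_bj
    return beta


# Hypothetical synthetic example: only the first two of five predictors affect y.
random.seed(0)
n_obs = 200
raw = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(5)]
y = [3.0 * a - 2.0 * b + random.gauss(0, 0.5) for a, b in zip(raw[0], raw[1])]
beta = lasso_coordinate_descent(standardize(raw), y, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", [round(b, 3) for b in beta])
print("selected columns:", selected)
```

With these synthetic data, only the two columns that actually drive `y` should keep nonzero coefficients. Raising `lam` zeroes out more variables and lowering it lets more in, which is the trade-off a cross-validated penalty is meant to settle.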

14 pages, 2109 KiB  
Article
Feature Selection and Machine Learning Approaches in Prediction of Current E-Cigarette Use Among U.S. Adults in 2022
by Wei Fang, Ying Liu, Chun Xu, Xingguang Luo and Kesheng Wang
Int. J. Environ. Res. Public Health 2024, 21(11), 1474; https://doi.org/10.3390/ijerph21111474 - 6 Nov 2024
Viewed by 1238
Abstract
Feature selection is essentially the process of picking informative and relevant features from a larger collection of features. Few studies have focused on predictors of current e-cigarette use among U.S. adults using feature selection and machine learning (ML) approaches. This study aimed to perform feature selection and develop ML models to predict current e-cigarette use using the 2022 Health Information National Trends Survey (HINTS 6). The Boruta algorithm and the least absolute shrinkage and selection operator (LASSO) were used to perform feature selection on 71 variables. The random over-sampling examples (ROSE) method was used to handle imbalanced data. Five ML tools, namely support vector machines (SVMs), logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost), were applied to develop ML models. The overall prevalence of current e-cigarette use was 4.3%. Using the 15 overlapping variables selected by both Boruta and LASSO, the RF algorithm provided the best classifier, with an accuracy of 0.992, sensitivity of 0.985, F1 score of 0.991, and AUC of 0.999. Weighted logistic regression further confirmed that age, education level, smoking status, belief in the harm of e-cigarette use, binge drinking, belief that alcohol increases cancer risk, and the Patient Health Questionnaire-4 (PHQ4) score were associated with e-cigarette use. This study confirmed the strength of ML techniques in survey data, and the findings will guide inquiry into the behaviors and mentalities of substance users.
(This article belongs to the Special Issue Big Data in Public Health: Challenges and Opportunities)
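The abstract above reports accuracy, sensitivity, and F1 score, and builds its models on the variables that Boruta and LASSO both selected. The sketch below shows how those three metrics follow from a binary confusion matrix, and how an "overlapped" feature set can be taken as the intersection of two selectors' outputs. The class and function names, counts, and variable names are hypothetical illustrations, not the study's data or code. AUC is omitted because it needs predicted scores rather than a single confusion matrix.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary classifier evaluated on held-out data."""
    tp: int  # true positives (users predicted as users)
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:
        # Also called recall or true-positive rate.
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity.
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)


def overlapping_features(selected_a, selected_b):
    """Variables chosen by both selectors (e.g. Boruta and LASSO), sorted by name."""
    return sorted(set(selected_a) & set(selected_b))


# Hypothetical counts and variable names, for illustration only.
cm = Confusion(tp=90, fp=10, tn=880, fn=20)
print(cm.accuracy(), cm.sensitivity(), cm.f1())
boruta_vars = ["age", "education", "smoking_status", "binge_drinking"]
lasso_vars = ["age", "smoking_status", "phq4_score", "binge_drinking"]
print(overlapping_features(boruta_vars, lasso_vars))
```

With a prevalence of 4.3%, a classifier that labels every respondent a non-user would already reach an accuracy of about 0.957, so sensitivity and F1 are informative companions to accuracy on data this imbalanced.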

11 pages, 808 KiB  
Article
Exploring Future Signals of COVID-19 and Response to Information Diffusion Using Social Media Big Data
by Juyoung Song, Dal-Lae Jin, Tae Min Song and Sang Ho Lee
Int. J. Environ. Res. Public Health 2023, 20(9), 5753; https://doi.org/10.3390/ijerph20095753 - 8 May 2023
Cited by 2 | Viewed by 2049
Abstract
COVID-19 is a respiratory infectious disease that was first reported in Wuhan, China, in December 2019. As COVID-19 spread worldwide, the WHO declared it a pandemic on 11 March 2020. This study collected 1,746,347 tweets from the Korean-language version of Twitter between February and May 2020 to explore future signals of COVID-19 and present response strategies for information diffusion. To explore future signals, we analyzed the term frequency and document frequency of key factors occurring in the tweets, measuring their degree of visibility and degree of diffusion. Depression, digestive symptoms, inspection, diagnosis kits, and stay home obesity had high frequencies. The increase in the degree of visibility was higher than the median value, indicating that the signal became stronger with time. The degree of visibility of the mean word frequency was high for disinfectant, healthcare, and mask. However, the increase in the degree of visibility was lower than the median value, indicating that the signal grew weaker with time. Infodemic had a high mean word frequency for the degree of diffusion. However, its mean degree of diffusion increase rate was lower than the median value, indicating that the signal grew weaker over time. As the general flow of signal progression is latent signal → weak signal → strong signal → strong signal with lower increase rate, it is necessary to develop active response strategies for stay home, inspection, obesity, digestive symptoms, online shopping, and asymptomatic.
(This article belongs to the Special Issue Big Data in Public Health: Challenges and Opportunities)
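The abstract above ranks keywords by degree of visibility (based on term frequency) and degree of diffusion (based on document frequency), then compares each keyword's mean frequency and increase rate with the median to place it on the latent → weak → strong signal progression. The abstract gives no formulas, so the sketch below assumes the definitions commonly used in keyword-emergence-map studies: for period j of n, DoV_j = (TF_j / NN_j) × (1 − tw × (n − j)), where NN_j is the number of documents in period j and tw is a time weight (0.05 here, an assumption); DoD_j uses document frequency in place of term frequency. Taking the increase rate as the geometric-mean growth across periods is also an assumption, and all function names, keyword names, and counts are hypothetical.

```python
from statistics import median

TIME_WEIGHT = 0.05  # assumed time weight tw; the abstract does not state one


def degree(counts, docs_per_period, tw=TIME_WEIGHT):
    """Per-period DoV (pass term frequencies) or DoD (pass document frequencies).

    counts[j] / docs_per_period[j] is multiplied by 1 - tw * (n - j), so the most
    recent period (j = n) has full weight and earlier periods are discounted.
    """
    n = len(counts)
    return [
        (c / nn) * (1 - tw * (n - j))
        for j, (c, nn) in enumerate(zip(counts, docs_per_period), start=1)
    ]


def mean_increase_rate(values):
    """Geometric-mean growth rate between consecutive periods (assumed definition)."""
    return (values[-1] / values[0]) ** (1 / (len(values) - 1)) - 1


def classify_signals(keyword_counts, docs_per_period):
    """Label each keyword by comparing mean frequency and increase rate with their medians."""
    mean_freq = {k: sum(c) / len(c) for k, c in keyword_counts.items()}
    rate = {
        k: mean_increase_rate(degree(c, docs_per_period))
        for k, c in keyword_counts.items()
    }
    freq_median = median(mean_freq.values())
    rate_median = median(rate.values())
    labels = {}
    for k in keyword_counts:
        frequent = mean_freq[k] >= freq_median
        rising = rate[k] >= rate_median
        if frequent and rising:
            labels[k] = "strong signal"
        elif rising:
            labels[k] = "weak signal"
        elif frequent:
            labels[k] = "strong signal with lower increase rate"
        else:
            labels[k] = "latent signal"
    return labels


# Hypothetical counts over four periods, for illustration only.
docs = [1000, 1000, 1000, 1000]
counts = {
    "term_a": [400, 360, 330, 300],
    "term_b": [100, 200, 300, 400],
    "term_c": [5, 10, 20, 40],
    "term_d": [20, 18, 16, 12],
}
print(classify_signals(counts, docs))
```

The median split puts each keyword in one of four quadrants, which is why a frequent but slowing keyword ends up as a "strong signal with lower increase rate" rather than simply "strong".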

11 pages, 2344 KiB  
Article
Analysis of Caregiver Burden Expressed in Social Media Discussions
by Catherine C. Shoults, Michael W. Rutherford, Aaron S. Kemp, Merideth A. Addicott, Aliza Brown, Carolyn J. Greene, Corey J. Hayes, Jennifer M. Gan, Linda J. Larson-Prior and Jonathan P. Bona
Int. J. Environ. Res. Public Health 2023, 20(3), 1933; https://doi.org/10.3390/ijerph20031933 - 20 Jan 2023
Cited by 5 | Viewed by 3376
Abstract
Almost 40% of US adults provide informal caregiving, yet research gaps remain around what burdens affect informal caregivers. This study uses a novel data source, the social media site Reddit, to mine and better understand what online communities focus on as their caregiving burdens. These forums were accessed using an application programming interface, a machine learning classifier was developed to remove low-information posts, and topic modeling was applied to the corpus. An expert panel summarized the forums’ themes into ten categories. The largest theme extracted from Reddit’s forums concerned the personal emotional toll of being a caregiver. This was followed by logistical issues while caregiving and caring for parents who have cancer. Smaller themes included approaches to end-of-life care, physical equipment needs when caregiving, and the use of wearables or technology to help monitor care recipients. Discussions on the platform often concern caregiving for parents, which may reflect the age of Reddit’s users. This study confirms that caregivers use Reddit forums to discuss the burdens associated with their role and the types of stress that can result from informal caregiving.
(This article belongs to the Special Issue Big Data in Public Health: Challenges and Opportunities)