Open Access Article

Predicting Hepatitis B Virus Infection Based on Health Examination Data of Community Population

Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
Department of Epidemiology and Biostatistics, School of Public Health, University at Albany, State University of New York, Rensselaer, New York, NY 12144, USA
Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2019, 16(23), 4842
Received: 1 October 2019 / Revised: 26 November 2019 / Accepted: 27 November 2019 / Published: 2 December 2019
(This article belongs to the Section Health Behavior, Chronic Disease and Health Promotion)
Despite a decline in the prevalence of hepatitis B in China, the disease burden remains high. Many people who are unaware of their infection risk miss the ideal treatment window, resulting in poor prognosis. The purpose of this study was to develop and evaluate models that identify high-risk populations who should be tested for hepatitis B surface antigen (HBsAg). Data came from a large community-based health screening of 97,173 individuals with an average age of 54.94 years. A total of 33 indicators were collected as model predictors, including demographic characteristics, routine blood indicators, and liver function. The Borderline-Synthetic Minority Oversampling Technique (Borderline-SMOTE) was applied to preprocess the data, and four predictive models were then developed: extreme gradient boosting (XGBoost), random forest (RF), decision tree (DT), and logistic regression (LR). The positive rate of HBsAg was 8.27%. The areas under the receiver operating characteristic curve for the XGBoost, RF, DT, and LR models were 0.779, 0.752, 0.619, and 0.742, respectively. The combined Borderline-SMOTE XGBoost model outperformed the other models, correctly predicting 13,637/19,435 cases (sensitivity 70.8%, specificity 70.1%), and its variable importance plot indicated that age was of high importance. The prediction model can be used to accurately identify populations at high risk of hepatitis B infection who should receive timely and appropriate medical treatment.
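As a reading aid, the reported figures can be cross-checked against one another. The minimal sketch below compares the overall accuracy implied by 13,637/19,435 with the accuracy obtained by weighting sensitivity and specificity by prevalence. It assumes that the 8.27% overall HBsAg positive rate also holds in the 19,435 evaluated cases, which the abstract does not state; the function name `weighted_accuracy` is illustrative and not from the article.

```python
# Figures quoted in the abstract.
correct, total = 13_637, 19_435
sensitivity, specificity = 0.708, 0.701

# ASSUMPTION: the overall HBsAg positive rate (8.27%) also applies to the
# 19,435 evaluated cases; the abstract does not say so explicitly.
prevalence = 0.0827


def weighted_accuracy(sens: float, spec: float, prev: float) -> float:
    """Accuracy implied by sensitivity, specificity, and prevalence.

    accuracy = sens * P(positive) + spec * P(negative)
    """
    return sens * prev + spec * (1.0 - prev)


# Accuracy computed directly from the reported counts.
observed_accuracy = correct / total

# Accuracy reconstructed from the reported rates under the assumption above.
implied_accuracy = weighted_accuracy(sensitivity, specificity, prevalence)

print(f"observed accuracy: {observed_accuracy:.3f}")
print(f"implied accuracy:  {implied_accuracy:.3f}")
```

Under that assumption both values come out near 0.70, so the reported counts, sensitivity, and specificity are mutually consistent. Because positives make up under a tenth of the sample, overall accuracy is dominated by specificity.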
Keywords: hepatitis B virus; machine learning; prediction

MDPI and ACS Style

Wang, Y.; Du, Z.; Lawrence, W.R.; Huang, Y.; Deng, Y.; Hao, Y. Predicting Hepatitis B Virus Infection Based on Health Examination Data of Community Population. Int. J. Environ. Res. Public Health 2019, 16, 4842.
