Predicting High-Risk Prostate Cancer Using Machine Learning Methods
School of Computer Science, University of Sydney, 2006 Sydney, Australia
* Author to whom correspondence should be addressed.
Data 2019, 4(3), 129; https://doi.org/10.3390/data4030129
Received: 30 June 2019 / Revised: 15 August 2019 / Accepted: 19 August 2019 / Published: 2 September 2019
Prostate cancer can be low- or high-risk to the patient’s health. Current screening based on prostate-specific antigen (PSA) levels is prone to both false positives and false negatives, each of which has negative consequences. We obtained a dataset of 35,875 patients from the screening arm of the National Cancer Institute’s Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. We segmented the data into instances without prostate cancer, instances with low-risk prostate cancer, and instances with high-risk prostate cancer. We developed a pipeline to deal with imbalanced data and proposed algorithms to perform preprocessing on such datasets. We evaluated the accuracy of various machine learning algorithms in predicting high-risk prostate cancer. The proposed pipeline achieves an accuracy of 91.5% using standard scaling, the SVMSMOTE sampling method, and AdaBoost for classification. We then evaluated the contribution of the rate of change of PSA, age, BMI, and filtering by race to this model’s accuracy. Including the rate of change of PSA and age in our model increased its area under the curve (AUC) by 6.8%, whereas BMI and race had a minimal effect.
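The preprocessing idea named in the abstract (standard scaling followed by oversampling of the rare high-risk class) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the abstract does not say which libraries were used, the oversampler below is plain SMOTE-style interpolation rather than SVMSMOTE, the AdaBoost classification step is omitted, and the function names, feature columns, and toy values are all hypothetical. In practice one would more likely reach for `StandardScaler` and `AdaBoostClassifier` from scikit-learn and `SVMSMOTE` from imbalanced-learn.

```python
import random
from statistics import mean, pstdev


def standard_scale(rows):
    """Scale each column to zero mean and unit variance (population std)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def smote_like_oversample(X, y, minority=1, k=2, seed=0):
    """Add synthetic minority rows until a binary dataset is balanced.

    Each synthetic row lies on the segment between a minority row and one of
    its k nearest minority neighbours. This is plain SMOTE interpolation;
    SVMSMOTE additionally restricts the seed rows to those near an SVM
    decision boundary. Requires at least two minority rows.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    needed = (len(y) - len(minority_rows)) - len(minority_rows)
    X_out, y_out = [list(x) for x in X], list(y)
    for _ in range(max(needed, 0)):
        base = rng.choice(minority_rows)
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        X_out.append([a + t * (b - a) for a, b in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Toy, made-up rows (NOT trial data): [PSA in ng/mL, age in years].
# Label 1 = high-risk, label 0 = everything else.
X = [[1.0, 55], [1.5, 60], [2.0, 58], [0.8, 62], [1.2, 65], [2.5, 59],
     [9.0, 70], [12.0, 68]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_scaled = standard_scale(X)
X_bal, y_bal = smote_like_oversample(X_scaled, y)
print(y_bal.count(0), y_bal.count(1))  # prints "6 6"
```

The balanced `X_bal`, `y_bal` would then be passed to a classifier. When evaluating such a pipeline, oversampling is normally applied to the training split only, so that synthetic rows do not leak into the test set.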
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
MDPI and ACS Style
Barlow, H.; Mao, S.; Khushi, M. Predicting High-Risk Prostate Cancer Using Machine Learning Methods. Data 2019, 4, 129.
AMA Style
Barlow H, Mao S, Khushi M. Predicting High-Risk Prostate Cancer Using Machine Learning Methods. Data. 2019; 4(3):129.
Chicago/Turabian Style
Barlow, Henry; Mao, Shunqi; Khushi, Matloob. 2019. "Predicting High-Risk Prostate Cancer Using Machine Learning Methods" Data 4, no. 3: 129.