Article

Substantiation of Prostate Cancer Risk Calculator Based on Physical Activity, Lifestyle Habits, and Underlying Health Conditions: A Longitudinal Nationwide Cohort Study

College of Liberal Arts, Dankook University, Cheonan-si 31116, Republic of Korea
Appl. Sci. 2025, 15(14), 7845; https://doi.org/10.3390/app15147845
Submission received: 16 May 2025 / Revised: 26 June 2025 / Accepted: 7 July 2025 / Published: 14 July 2025

Abstract

Purpose: Despite the increasing incidence of prostate cancer among men, prostate cancer risk assessment continues to rely on invasive tests, such as prostate-specific antigen testing and biopsy-based Gleason scoring. This study aimed to develop a noninvasive, data-driven risk model that individuals can use to evaluate their own risk before deciding whether to visit a hospital. Materials and Methods: The model was trained on National Health Insurance Sharing Service cohort data comprising 347,575 individuals, including 1928 with malignant neoplasms of the prostate, 5 with malignant neoplasms of the penis, 18 with malignant neoplasms of the testis, and 14 with malignant neoplasms of the epididymis. The model used easily accessible inputs, such as treatment history for diseases including stroke, heart disease, and cancer; height; weight; exercise days per week; and smoking duration. For substantiation, an additional public dataset of 286,727 individuals was obtained from the National Health Insurance Sharing Service; it included 434 (0.15%) individuals with incident prostate cancer. Results: The risk calculator was built using Cox proportional hazards regression and achieved a concordance index of 0.573. I further validated the model by calibrating its predictions against observations to confirm its accuracy. In the substantiation dataset, the calculator identified the high-risk population with a sensitivity of 60.5%. Conclusions: The feasibility of evaluating prostate cancer risk without invasive tests was demonstrated using a public dataset. As a tool for individuals to use before hospital visits, this model could improve public health and reduce social expenditure on medical treatment.

1. Introduction

The incidence of prostate cancer (PCa), or malignant neoplasm of the prostate, is increasing, and PCa has become the most common cancer among men in the United States. Cancer Statistics, 2023 reported that PCa accounted for the largest share of estimated new cancer cases (29%) and the second largest share of estimated cancer deaths (11%) among men in the United States in 2023 [1]. In South Korea, the crude incidence rate of PCa was 73.1 per 100,000 in 2021 [2,3].
Advances in prostatectomy and radiotherapy have increased long-term survival in patients with PCa without significant declines in functional outcomes [4,5]. Disease prevention is the best way to maintain health. To this end, research that harnesses lifestyle and health data plays an important role in informing individuals whether to change their lifestyle habits or schedule hospital visits, depending on their risk factors.
Before deciding to visit a hospital for a PCa diagnosis, individuals would benefit from tools that let them assess their own health status. Several PCa risk assessment tools currently exist, such as the European Randomized Study of Screening for Prostate Cancer Risk Calculator, the Prostate Cancer Prevention Trial Risk Calculator, and the Korean Prostate Cancer Risk Calculator [6,7,8,9]. However, these tools mainly take clinical data as input, such as laboratory test and biopsy results, including the prostate-specific antigen (PSA) level and Gleason score [10,11]. Because they are designed to support treatment decisions, they are not suited for daily use by individuals seeking to better understand their risk profile. Previous studies have also sought to refine risk calculators for PCa. Ankerst et al. updated the Prostate Cancer Prevention Trial risk calculator [8], and Albright et al. used complete family histories to estimate relative risks that could inform PCa screening decisions [12].
This highlights the need for tools that predict health status and the risk of cancers such as PCa using information that healthy individuals can collect in daily life. Such tools can draw on life-history information, such as lifestyle and underlying diseases. For instance, a history of stroke or hypertension could serve as a calculator input. These health histories may help explain the risk of developing the disease. In addition, technological advances have made it easier to collect personal lifestyle information with wearable devices, and this information can be used to predict clinical status [13].
This study proposes a model that calculates PCa risk from lifestyle information available to the general public, using national cohort data. It also presents findings that may help individuals who wish to prioritize personal health management.

2. Materials and Methods

2.1. Data

Koreans receive health checkups once every 2 years based on their birth year through the National Health Insurance Service (NHIS). The NHIS releases health examination data for research purposes through the National Health Insurance Sharing Service (NHISS) Bigdata Platform [14]. Data from the NHISS were used to construct the target cohort (Figure 1).
The NHISS sample included 3,480,395 participants registered between 2008 and 2020, and the final dataset was filtered from this initial cohort. The year 2013 was selected as the baseline for the longitudinal data because the NHISS changed the survey format to include more detailed information in that year.
After filtering for individuals without past complications, 1,482,516 individuals remained. Next, 66,711 individuals diagnosed with cancer before 2013 were excluded. After the 2013 baseline, 22,004 individuals diagnosed with cancer between 2013 and 2014 were excluded to create a washout period. Of the remaining 1,393,801 individuals, 1,046,226 with missing values in any of the datasets were excluded. Ultimately, 347,575 individuals constituted the final cohort.
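The exclusion steps above can be checked with simple arithmetic. The following Python sketch recomputes the running cohort size from the counts reported in this section. The step labels are paraphrases. The size after the first exclusion (1,415,805) is derived here and is not stated in the text.

```python
# Cohort flow for the training data, using the counts reported in Section 2.1.
steps = [
    ("Without past complications", 1_482_516),
    ("Cancer diagnosed before 2013", -66_711),
    ("Cancer diagnosed in 2013-2014 washout", -22_004),
    ("Missing values in any dataset", -1_046_226),
]


def cohort_flow(steps):
    """Return the running cohort size after each exclusion step."""
    remaining = steps[0][1]
    sizes = [remaining]
    for _label, change in steps[1:]:
        remaining += change  # exclusions are stored as negative counts
        sizes.append(remaining)
    return sizes


sizes = cohort_flow(steps)
for (label, _), n in zip(steps, sizes):
    print(f"{label:40s} -> {n:,}")
```

The last printed size, 347,575, matches the final cohort reported above.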

2.2. Research Design

To train the risk model, the observation period covered 8 years (2013–2020) of the 13 years (2008–2020) of longitudinal samples. The 5 years before the 2013 baseline (2008–2012) served as the history-filtering period. Within the 8-year observation period, the first year (2013–2014) was a washout period for individuals diagnosed with cancer (Figure 2).
To validate the risk model, the public dataset used 2012 as the baseline year, with a 10-year history-filtering period (2002–2011). The tracing period from 2012 to 2016 was used to ascertain the occurrence of PCa (Figure 3).

2.3. Variables

A total of 13 variables were examined: age, body mass index (BMI), waist circumference (WSTC), number of days per week of moderate-intensity physical activity (PA_MD), number of days per week of vigorous-intensity physical activity (PA_VD), number of days per week of walking (PA_WALK), smoking duration (SMOKE_DRT), stroke (paralysis) diagnosis (STK), heart disease (myocardial infarction/angina) diagnosis (HTDZ), hypertension diagnosis (HTN), diabetes mellitus diagnosis (DM), hyperlipidemia diagnosis (DLD), and other disease conditions, including cancer (ETC). These variables were included in the results of the first questionnaire of the NHIS General Examination.
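The abstract lists height and weight among the inputs. BMI is conventionally derived from them as weight in kilograms divided by the square of height in meters. A minimal Python sketch follows. It assumes height is recorded in centimeters and weight in kilograms, and the function name is illustrative.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0  # assumed input unit: centimeters
    return weight_kg / (height_m ** 2)


print(round(bmi(70.0, 175.0), 1))  # → 22.9
```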

2.4. Definition of Diagnosis Codes

This study focused on PCa, defined as C61 according to the International Classification of Diseases, 10th revision (ICD-10). Malignant neoplasm of the penis was defined as C60, and malignant neoplasm of the testis as C62. Malignant neoplasms of other and unspecified male genital organs, including the epididymis, were defined as C63.
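The diagnosis-code definitions above might be applied to claim records roughly as follows. This Python sketch maps a recorded ICD-10 code to its three-character group. The function name is illustrative, and so is the handling of codes outside C60–C63, which are mapped to the "None" label used in Table 1. This is not the author's processing code.

```python
# ICD-10 three-character codes defining the diagnosis groups (Section 2.4).
ICD10_GROUPS = {
    "C60": "Malignant neoplasm of penis",
    "C61": "Malignant neoplasm of prostate (PCa)",
    "C62": "Malignant neoplasm of testis",
    "C63": "Malignant neoplasm of other and unspecified male genital organs",
}


def diagnosis_group(icd_code: str) -> str:
    """Map a recorded ICD-10 code (e.g. 'C61' or 'C63.0') to its group label.

    Subcodes are truncated to their first three characters. Codes outside
    C60-C63 map to the string 'None', mirroring the label used in Table 1.
    """
    stem = icd_code.strip().upper().replace(".", "")[:3]
    return stem if stem in ICD10_GROUPS else "None"


print(diagnosis_group("C61"), diagnosis_group("C63.0"), diagnosis_group("I63"))
```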

2.5. Statistical Analysis

For the demographic data (Table 1), the chi-square test (with continuity correction) was used for categorical variables, and one-way analysis of variance was used for continuous variables. AGE, WSTC, PA_WALK, SMOKE_DRT, HTDZ, HTN, and ETC differed significantly across diagnosis code groups, whereas BMI, PA_MD, PA_VD, STK, DM, and DLD did not.
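Yates' continuity correction applies to 2 × 2 tables, for example when two groups are compared on a binary characteristic such as STK. The following pure-Python sketch computes the corrected statistic and its p-value with 1 degree of freedom. It is an illustrative re-implementation, not the author's code. In R, `chisq.test()` applies this correction to 2 × 2 tables when `correct = TRUE`, its default, and does not apply it to larger tables.

```python
import math


def yates_chi_square(a, b, c, d):
    """Chi-square test with Yates' continuity correction for a 2x2 table.

    Table layout:
                  yes   no
        group 1    a     b
        group 2    c     d
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("all row and column totals must be positive")
    # Corrected statistic: n * (|ad - bc| - n/2)^2 / (product of margins),
    # with the correction capped so the difference never goes negative.
    diff = max(abs(a * d - b * c) - n / 2.0, 0.0)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p = yates_chi_square(10, 20, 20, 10)
print(round(stat, 3), round(p, 4))
```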
Cox proportional hazards (PH) regression for survival data [15] was used to create a risk calculator that estimates the probability of 5-year survival without PCa. The Cox PH analysis was performed with the R package survival (version 3.5.5). The Cox PH model is frequently used for multivariable survival analysis in the medical field. Survival analysis models the hazard, the instantaneous rate at which an event occurs, to predict the probability of the event occurring within a certain period. A simple parametric hazard model, such as the exponential model, assumes a constant baseline hazard, which represents the hazard when all covariates are 0. The Cox PH model instead leaves the baseline hazard function unspecified and estimates covariate effects as multiplicative hazard ratios. This makes it a robust, semiparametric method that does not require the form of the baseline hazard to be known [16]. The target outcome of the risk calculator was the survival time until PCa diagnosis. The model analyzed 1928 individuals with a C61 (PCa) SICK_CODE and 345,610 individuals without cancer, out of a total of 347,538 registered men.
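For reference, the Cox PH model can be written in its standard form [15]. Here h_0(t) is the unspecified baseline hazard, x_j are the covariates, β_j are the estimated coefficients, and S_0(t) is the baseline survival function at the reference covariate values. The second line shows how a 5-year PCa-free survival probability follows from the fitted model.

```latex
\begin{aligned}
h(t \mid \mathbf{x}) &= h_0(t)\,\exp\Big(\textstyle\sum_{j=1}^{p} \beta_j x_j\Big),
  \qquad \mathrm{HR}_j = e^{\beta_j},\\
S(t \mid \mathbf{x}) &= S_0(t)^{\exp\left(\sum_{j=1}^{p} \beta_j x_j\right)},
  \qquad \Pr(\text{PCa-free at 5 years} \mid \mathbf{x}) = S(5 \mid \mathbf{x}).
\end{aligned}
```

Because the β_j are estimated by partial likelihood, h_0(t) never needs a parametric form. It enters predictions only through the estimated S_0(t).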
Statistical analysis was conducted in R (version 4.0.4; R Foundation for Statistical Computing, Vienna, Austria). The concordance index (C-index) was used to evaluate the discriminative ability of the risk calculator. The C-index measures how well a multivariable survival model ranks individuals in a time-to-event dataset. It is the probability that, in a randomly selected comparable pair, the individual with the higher predicted risk experiences the event earlier [17]. A value of 0.5 corresponds to random ranking, and 1.0 corresponds to perfect ranking.
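To make the pairwise definition concrete, the following Python sketch computes Harrell's C-index by enumerating all pairs. Because this is O(n²), it suits only small illustrative data. It is not the routine the author used; in R, the survival package reports concordance for fitted `coxph` models. For simplicity, pairs with tied follow-up times are skipped, whereas standard implementations handle some tied cases.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times: follow-up time for each individual
    events: 1 if the event (e.g. PCa diagnosis) was observed, 0 if censored
    risk_scores: higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that i has the shorter follow-up time.
        if times[j] < times[i]:
            i, j = j, i
        # Comparable only if the earlier time is an observed event and the
        # times differ (tied times are skipped in this simplified sketch).
        if not events[i] or times[i] == times[j]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


c = harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
print(c)  # → 0.8
```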

2.6. Substantiation Methods

To substantiate the risk calculator, we used another NHISS dataset from the Korean Public Data Portal (https://www.data.go.kr; accessed on 16 May 2025). The health checkup and medical history datasets from the portal served as input data for the risk model.
The health checkup information consisted of basic information (provincial code, sex, age group, etc.) and checkup details (height, weight, blood pressure, blood sugar, total cholesterol, hemoglobin, etc.). It covered National Health Insurance subscribers who underwent health checkups in the relevant year. Each year, one million examinees were randomly selected from employees, dependents aged 20 years or older, household heads, and local subscribers aged 20 years or older with a history of general health checkups.
The medical history information was open data consisting of basic information (sex, age group, city/province code, etc.) and medical history (medical department code, main disease code, number of treatment days, total number of prescription days, etc.). It covered 1 million patients per year who were enrolled in the National Health Insurance Service from 2002 to 2015 and had received care at medical institutions (hospitals, clinics, etc.).
To select the validation dataset, a total of 2,116,266 individuals with conditional data between 2002 and 2022 were filtered. First, only men were selected, and 2012 was set as the baseline year because at least 10 years of history (2002–2011) were needed for input variables such as stroke, heart disease, and other diseases, including cancer. Individuals with a history of PCa before 2012 were excluded. After this first filtering step, 541,782 participants were selected. Second, a 1-year washout period was set. Because the baseline year was 2012, each washout period started between 1 January 2012 and 31 December 2012 and ended 1 year later, between 1 January 2013 and 31 December 2013, respectively. The washout periods therefore fell between 2012 and 2013. In this second filtering step, 2614 patients diagnosed with PCa during the washout period were removed, leaving 549,168. Finally, the cohort was restricted to participants aged over 45 years. Ultimately, 286,727 participants remained, including 434 patients with PCa (Figure 3).
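The filtering sequence above can be sketched as a per-record predicate. This Python sketch assumes one reading of the washout rule: each participant's 1-year washout starts at their own 2012 checkup date. The `Record` fields and function names are hypothetical and do not correspond to actual NHISS column names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    """Hypothetical flattened participant record (field names are illustrative)."""
    sex: str                  # "M" or "F"
    age: int                  # age at the 2012 baseline checkup
    checkup_date: date        # date of the 2012 health checkup
    pca_date: Optional[date]  # first C61 (PCa) diagnosis date, or None


def add_one_year(d: date) -> date:
    """Shift a date forward by one year; 29 February maps to 28 February."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def keep_for_validation(r: Record, min_age: int = 45) -> bool:
    """Return True if the record passes the sex, PCa-history, washout, and age filters."""
    if r.sex != "M":                       # step 1: men only
        return False
    if r.pca_date is not None:
        if r.pca_date < date(2012, 1, 1):  # step 1: PCa before the baseline year
            return False
        if r.pca_date <= add_one_year(r.checkup_date):  # step 2: PCa during washout
            return False
    return r.age > min_age                 # step 3: aged over 45 years


# Hypothetical example records.
records = [
    Record("M", 50, date(2012, 3, 1), None),               # kept
    Record("M", 50, date(2012, 3, 1), date(2012, 12, 1)),  # PCa in washout: dropped
    Record("M", 50, date(2012, 3, 1), date(2015, 6, 1)),   # incident case: kept
    Record("M", 40, date(2012, 3, 1), None),               # too young: dropped
]
kept = [r for r in records if keep_for_validation(r)]
print(len(kept))
```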
From the records of these 286,727 individuals, histories of stroke (STK), heart disease (HTDZ), and other diseases, including cancers other than PCa (ETC), were obtained. Smoking (SMOKE_DRT) was derived from health checkup questionnaire responses coded as non-smoker (1), former smoker (2), or current smoker (3), and the 10-year consensus weights for these responses were applied. The validation dataset did not contain exercise records, so the exercise variable (PA_VD) was set to 7, the largest possible input value for the risk calculator. Individuals were classified into the predicted-incidence and normal groups using a threshold equal to the mean of the logarithms of the predicted incidence probabilities.
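The threshold rule above can be sketched as follows. The sketch makes two assumptions: the input is each individual's predicted 5-year incidence probability (one minus the PCa-free survival), and individuals strictly above the threshold form the predicted-incidence group. The mean of the logs equals the log of the geometric mean, so the threshold is effectively the geometric mean of the predicted probabilities. For the same reason, the choice of logarithm base does not change the classification.

```python
import math


def classify_by_log_mean(probabilities):
    """Label individuals as predicted-incidence (1) or normal (0).

    The threshold is the mean of the natural logs of the predicted incidence
    probabilities, i.e. the log of their geometric mean.
    Returns (labels, threshold_on_log_scale).
    """
    if not probabilities or any(p <= 0 for p in probabilities):
        raise ValueError("probabilities must be non-empty and positive")
    logs = [math.log(p) for p in probabilities]
    threshold = sum(logs) / len(logs)
    # Assumption: strictly above the threshold counts as predicted incidence.
    labels = [1 if lp > threshold else 0 for lp in logs]
    return labels, threshold


labels, threshold = classify_by_log_mean([0.001, 0.002, 0.010, 0.020])
print(labels, math.exp(threshold))
```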

3. Results

3.1. Demographic and Health-Related Characteristics

Table 1 presents the demographic and health-related characteristics of the study population stratified by diagnosis code (SICK_CODE) groups (C60, C61, C62, C63, and None) comprising 347,575 individuals, including 1928 with malignant neoplasms of the prostate, 5 with malignant neoplasms of the penis, 18 with malignant neoplasms of the testis, and 14 with malignant neoplasms of the epididymis.
Significant differences were observed in the following variables:
- age (p < 0.001), with the highest and lowest mean ages in the C61 and None groups, respectively;
- WSTC (p = 0.013), PA_WALK (p = 0.017), and SMOKE_DRT (p < 0.001), each with the highest mean in the C60 group;
- HTDZ (p < 0.001), with the highest prevalence in the C61 group;
- HTN (p < 0.001), with the highest prevalence in the C60 group;
- ETC (p < 0.001), with the highest prevalence in the C61 group.

In contrast, no significant differences were observed in BMI (p = 0.393), moderate physical activity (PA_MD; p = 0.093), vigorous physical activity (PA_VD; p = 0.074), STK (p = 0.589), DM (p = 0.339), or DLD (p = 0.699). These findings show that health characteristics vary across SICK_CODE groups, suggesting a need for tailored health interventions.
Table 2 compares the demographic and clinical characteristics of the two groups in the substantiation dataset. Age was categorized into 5-year ranges. The chi-square test (with continuity correction) was used for categorical variables, and one-way analysis of variance was used for continuous variables. Most participants were aged between 45 and 64 years. BMI, STK, HTDZ, ETC, and SMOKE_DRT did not differ significantly between the two groups.

3.2. Risk Calculator to Measure the Probability of 5-Year PCa-Free Survival

In this study, a model was constructed to predict the probability of PCa-free survival. The Cox proportional hazards model yielded a hazard ratio (HR) and confidence interval (CI) for each variable. STK showed the highest HR (1.444), followed by ETC (1.041), PA_VD (1.028), AGE (1.026), HTDZ (1.013), BMI (1.009), HTN (1.007), and SMOKE_DRT (1.006) (Table 3). The risk calculator takes seven input variables, namely STK, HTDZ, ETC, AGE, PA_VD, BMI, and SMOKE_DRT (Table 4), and estimates the probability of PCa-free survival over 5 years (Figure 4).
STK, HTDZ, and ETC indicate past medication or treatment for stroke, heart disease, and other diseases, including cancer, respectively. AGE, PA_VD, BMI, and SMOKE_DRT were used as numerical variables. The other variables in Table 1 did not contribute to the risk calculator, so only these seven variables were used as prediction inputs. In the risk calculator output, the likelihood of remaining PCa-free over 5 years decreased as the total score increased.
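The following Python sketch illustrates how the seven inputs could combine into a 5-year PCa-free probability under the Cox PH model, using the hazard ratios reported in Table 3. Three elements are hypothetical: the baseline 5-year survival S0(5), the reference covariate profile, and the assumption that each HR applies per one-unit change in its input. The HRs are rounded and the baseline survival is not reported here, so the sketch will not reproduce Figure 4 exactly. In a conventional nomogram, the total points are a linear rescaling of the same linear predictor.

```python
import math

# Hazard ratios for the seven calculator inputs, as reported in Table 3.
HAZARD_RATIOS = {
    "STK": 1.444,
    "ETC": 1.041,
    "PA_VD": 1.028,
    "AGE": 1.026,
    "HTDZ": 1.013,
    "BMI": 1.009,
    "SMOKE_DRT": 1.006,
}


def five_year_pca_free(x, reference, baseline_surv_5y):
    """Probability of remaining PCa-free for 5 years under a Cox PH model.

    x, reference: dicts mapping each input name to its value; `reference` is
        the covariate profile at which the baseline survival applies (hypothetical).
    baseline_surv_5y: S0(5), the 5-year PCa-free survival at the reference
        profile (hypothetical; not reported in the text).
    """
    # Linear predictor relative to the reference: sum of log(HR) * (x - reference).
    lp = sum(math.log(hr) * (x[k] - reference[k]) for k, hr in HAZARD_RATIOS.items())
    # Proportional hazards: S(5 | x) = S0(5) ** exp(lp).
    return baseline_surv_5y ** math.exp(lp)


# Hypothetical reference profile and baseline survival, for illustration only.
reference = {"STK": 0, "ETC": 0, "PA_VD": 0, "AGE": 60, "HTDZ": 0, "BMI": 24.0, "SMOKE_DRT": 0}
with_stroke = dict(reference, STK=1)
print(five_year_pca_free(with_stroke, reference, baseline_surv_5y=0.99))
```

With a hypothetical S0(5) = 0.99, setting STK = 1 and keeping all other inputs at the reference multiplies the hazard by 1.444. This lowers the PCa-free probability to 0.99^1.444.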
The C-index of the risk calculator was 0.573. In survival analyses, the C-index typically ranges from 0.6 to 0.7. The model was built from a large public dataset, which makes predictive modeling difficult, so this level of prediction can be regarded as reasonable. However, because the result fell below 0.6, an additional validation method was needed to evaluate the risk calculator.
For additional validation, the risk calculator was calibrated by comparing predicted and observed outcomes (Figure 5). Calibration assesses how closely the predicted probabilities agree with the observed outcomes. The dotted reference line indicates ideal predictions. The calculator's predictions are shown as black dots, with vertical bars representing 95% confidence intervals. These predictions closely matched the ideal line.
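Calibration of the kind shown in Figure 5 can be sketched by grouping individuals by predicted probability and comparing each group's mean prediction with its observed event rate. This Python sketch assumes complete 5-year follow-up. With censored survival data, the observed proportion in each group would normally be replaced by a Kaplan–Meier estimate. The default of 10 groups is an illustrative choice, not the setting used for Figure 5.

```python
def calibration_table(predicted, observed, n_bins=10):
    """Group individuals by predicted probability and compare with observed rates.

    predicted: predicted probability of the event for each individual
    observed: 1 if the event occurred within the horizon, else 0
    Returns a list of (mean_predicted, observed_rate, n) per non-empty bin.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sort indices by predicted probability and split into equal-sized bins.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    size = len(order)
    table = []
    for b in range(n_bins):
        idx = order[b * size // n_bins:(b + 1) * size // n_bins]
        if not idx:
            continue
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        obs_rate = sum(observed[i] for i in idx) / len(idx)
        table.append((mean_pred, obs_rate, len(idx)))
    return table


for row in calibration_table([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2):
    print(row)
```

A well-calibrated model produces rows where the mean prediction and the observed rate are close, which corresponds to points near the ideal line.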

3.3. Substantiation with Public Big Data Sets

In the external dataset validation, 263 of the 434 individuals with PCa were identified (Table 5). The risk calculator therefore had a sensitivity of 60.5% for identifying PCa risk from the inputs of the noninvasive health-screening dataset. This suggests predictive value for PCa risk screening. The specificity was 0.434, the positive predictive value (PPV) was 0.263, and the negative predictive value (NPV) was 0.767 (Table 6). The area under the receiver operating characteristic curve (AUC) was 0.5201.
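The four metrics reported above follow from the 2 × 2 table of predicted versus observed PCa status. A minimal Python sketch of the definitions is shown below; the function name is illustrative, and the counts in the example are hypothetical. Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of PCa in the evaluated population, so they are best read alongside the 0.15% incidence in the substantiation dataset.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 confusion matrix."""
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # detected cases / all cases
        "specificity": ratio(tn, tn + fp),  # true negatives / all non-cases
        "ppv": ratio(tp, tp + fp),          # true cases / all predicted positive
        "npv": ratio(tn, tn + fn),          # true non-cases / all predicted negative
    }


# Hypothetical counts, for illustration only.
print(diagnostic_metrics(tp=40, fp=10, fn=20, tn=30))
```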

4. Discussion

Predicting the risk of diseases such as PCa is important for both patients and clinicians. With access to risk measures, patients can understand their health status and decide whether to visit a hospital, and risk calculators are ultimately expected to help them obtain additional information. Therefore, this study aimed to develop a risk calculator that helps individuals learn their PCa risk before the disease becomes serious. The calculator is based on their environmental conditions, daily lives, and personal histories.
Several models have been developed to measure PCa risk [8,18,19,20]. Roobol et al. developed risk calculators based on prostate volume and digital rectal examination [19]. Ankerst et al. developed a risk calculator to predict low- and high-grade PCa using biopsy data [8]. Many mobile applications function as PCa risk calculators [18,21,22,23,24,25,26], computing risk scores from invasive laboratory results such as PSA and Gleason scores. Nam et al. evaluated the performance of several risk calculators [27], and Patel et al. compared MRI-based risk calculators for PCa [28]. Lifestyle risk factors have also been studied. Kim et al. developed a PCa risk calculator with additional inputs, such as glucose levels, meat and alcohol consumption, and family history of cancer [29]. However, their model combined lifestyle data with a blood glucose test, which is invasive.
These previous studies predicted the risk of clinical outcomes and highlighted the importance of PCa risk prediction for patients and clinicians. However, their risk calculators relied on clinical laboratory results and imaging data from medical tests. Consequently, patients cannot obtain results from such calculators unless they have clinical test results to use as inputs.
Other studies have examined the effects of environmental variables on PCa [30,31,32,33,34,35,36]. Vigneswaran et al. studied the association between environmental quality and PCa stage at diagnosis [31]. Youogo et al. reported that ambient air pollutants were associated with PCa risk [32]. McDonald et al. found that exposure to arsenic and cadmium increased PCa mortality risk [30]. These studies indicate that environmental factors affect the risk of developing PCa. Whether including environmental factors improves the accuracy of risk calculator predictions remains to be established through novel approaches.
Despite these results, this study has some limitations. First, laboratory test results were not used to develop the risk calculator, which estimates the likelihood of 5-year PCa-free survival. However, individuals who find that their calculated risk is high can decide to visit a hospital and undergo further tests to make appropriate treatment decisions. Second, this study used only Korean data, which may limit its application and generalization to other countries. Because racial differences may exist [37,38,39,40,41], future studies should use data from diverse countries. Third, the risk calculator has a low C-index. Future studies could draw on the approach of Kim et al. [29] to improve the calculator's performance and could add family history data as an input.

5. Conclusions

Risk calculators can facilitate early diagnosis by encouraging individuals to visit a hospital earlier, so that appropriate treatments are administered in a timely manner. The risk calculator is available for extended use in mobile applications; however, it provides only suggestions, not diagnoses. The proposed risk calculator requires no additional clinical tests as input. It therefore gives individuals easy access to preclinical prediction results that may help reduce social costs and improve public health and productivity. The PCa risk calculator proposed in this study was substantiated through practical verification. To the author's knowledge, it is currently the only such model that relies solely on noninvasive data, which gives it significant practical value.

Funding

The present research was supported by the research fund of Dankook University in 2023.

Institutional Review Board Statement

This study used a public, non-individually identifiable dataset from the National Health Insurance Service, with approval from the Institutional Review Board of Dankook University (DKU2022-06-002; 15 June 2022).

Informed Consent Statement

Patient consent was waived because this research used a retrospective, deidentified dataset that is available only on the NHIS analysis servers.

Data Availability Statement

The datasets presented in this article are not readily available because the data used in this study are only available on the NHIS analysis servers. Therefore, the author cannot independently provide these data to researchers.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Siegel, R.L.; Miller, K.D.; Wagle, N.S.; Jemal, A. Cancer statistics, 2023. CA A Cancer J. Clin. 2023, 73, 17–48.
2. National Cancer Information Center. National Cancer Statistics 2021. Available online: https://www.cancer.go.kr/lay1/S1T639C641/contents.do (accessed on 16 May 2025).
3. Park, E.H.; Jung, K.-W.; Park, N.J.; Kang, M.J.; Yun, E.H.; Kim, H.-J.; Kim, J.-E.; Kong, H.-J.; Im, J.-S.; Seo, H.G. Cancer statistics in Korea: Incidence, mortality, survival, and prevalence in 2021. Cancer Res. Treat. Off. J. Korean Cancer Assoc. 2024, 56, 357–371.
4. Resnick, M.J.; Koyama, T.; Fan, K.-H.; Albertsen, P.C.; Goodman, M.; Hamilton, A.S.; Hoffman, R.M.; Potosky, A.L.; Stanford, J.L.; Stroup, A.M. Long-term functional outcomes after treatment for localized prostate cancer. N. Engl. J. Med. 2013, 368, 436–445.
5. Hung, C.-F.; Yang, C.-K.; Ou, Y.-C. Robotic assisted laparoscopic radical prostatectomy following transurethral resection of the prostate: Perioperative, oncologic and functional outcomes. Prostate Int. 2014, 2, 82–89.
6. Park, J.Y.; Yoon, S.; Park, M.S.; Cho, D.-Y.; Park, H.-S.; Moon, D.G.; Yoon, D.K. Initial biopsy outcome prediction in Korean patients-comparison of a noble web-based Korean prostate cancer risk calculator versus prostate-specific antigen testing. J. Korean Med. Sci. 2011, 26, 85.
7. Gómez-Gómez, E.; Carrasco-Valiente, J.; Blanca-Pedregosa, A.; Barco-Sánchez, B.; Fernandez-Rueda, J.L.; Molina-Abril, H.; Valero-Rosa, J.; Font-Ugalde, P.; Requena-Tapia, M.J. European randomized study of screening for prostate cancer risk calculator: External validation, variability, and clinical significance. Urology 2017, 102, 85–91.
8. Ankerst, D.P.; Hoefler, J.; Bock, S.; Goodman, P.J.; Vickers, A.; Hernandez, J.; Sokoll, L.J.; Sanda, M.G.; Wei, J.T.; Leach, R.J. Prostate Cancer Prevention Trial risk calculator 2.0 for the prediction of low-vs high-grade prostate cancer. Urology 2014, 83, 1362–1368.
9. Kinnaird, A.; Brisbane, W.; Kwan, L.; Priester, A.; Chuang, R.; Barsa, D.E.; Delfin, M.; Sisk, A.; Margolis, D.; Felker, E. A prostate cancer risk calculator: Use of clinical and magnetic resonance imaging data to predict biopsy outcome in North American men. Can. Urol. Assoc. J. 2021, 16, E161.
10. Hernandez, D.J.; Nielsen, M.E.; Han, M.; Partin, A.W. Contemporary evaluation of the D’Amico risk classification of prostate cancer. Urology 2007, 70, 931–935.
11. Birch, A.; Withington, J.; Kinsella, J.; Acher, P.; Challacombe, B. Use of the SWOP calculator to reduce unnecessary prostate biopsies in men with elevated PSA: 0524. Int. J. Surg. 2012, 10, S97.
12. Albright, F.; Stephenson, R.A.; Agarwal, N.; Teerlink, C.C.; Lowrance, W.T.; Farnham, J.M.; Albright, L.A.C. Prostate cancer risk prediction based on complete prostate cancer family history. Prostate 2015, 75, 390–398.
13. Mohr, D.C.; Zhang, M.; Schueller, S.M. Personal sensing: Understanding mental health using ubiquitous sensors and machine learning. Annu. Rev. Clin. Psychol. 2017, 13, 23–47.
14. National Health Insurance Sharing Service. National Health Insurance Sharing Service (NHISS) Bigdata Platform. 2017. Available online: https://nhiss.nhis.or.kr/ (accessed on 16 May 2025).
15. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 187–202.
16. Fox, J.; Weisberg, S. Cox proportional-hazards regression for survival data. In An R and S-PLUS Companion to Applied Regression; SAGE Publications, Inc.: Thousand Oaks, CA, USA, 2002; Volume 2002.
17. Steck, H.; Krishnapuram, B.; Dehing-Oberije, C.; Lambin, P.; Raykar, V.C. On ranking in survival analysis: Bounds on the concordance index. In Advances In Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2007; Volume 20.
18. Adam, A.; Hellig, J.C.; Perera, M.; Bolton, D.; Lawrentschuk, N. ‘Prostate Cancer Risk Calculator’ mobile applications (Apps): A systematic review and scoring using the validated user version of the Mobile Application Rating Scale (uMARS). World J. Urol. 2018, 36, 565–573.
19. Roobol, M.J.; van Vugt, H.A.; Loeb, S.; Zhu, X.; Bul, M.; Bangma, C.H.; van Leenders, A.G.; Steyerberg, E.W.; Schröder, F.H. Prediction of prostate cancer risk: The role of prostate volume and digital rectal examination in the ERSPC risk calculators. Eur. Urol. 2012, 61, 577–583.
20. Wang, A.; Tseng, C.-c.; Rose, H.; Cheng, I.; Wu, A.H.; Haiman, C.A. Ambient air pollution and risk of prostate cancer: The multiethnic cohort study. Cancer Res. 2022, 82 (Suppl. S12), 1437.
21. Pereira-Azevedo, N.; Osório, L.; Fraga, A.; Roobol, M.J. Rotterdam prostate cancer risk calculator: Development and usability testing of the mobile phone app. JMIR Cancer 2017, 3, e6750.
22. Jeong, C.W.; Lee, S.; Jung, J.-W.; Lee, B.K.; Jeong, S.J.; Hong, S.K.; Byun, S.-S.; Lee, S.E. Mobile application-based Seoul National University Prostate Cancer Risk Calculator: Development, validation, and comparative analysis with two Western risk calculators in Korean men. PLoS ONE 2014, 9, e94441.
23. Chen, I.-H.A.; Chu, C.-H.; Lin, J.-T.; Tsai, J.-Y.; Yu, C.-C.; Sridhar, A.N.; Sooriakumaran, P.; Loureiro, R.C.; Chand, M. Prostate Cancer Risk Calculator Apps in a Taiwanese Population Cohort: Validation Study. J. Med. Internet Res. 2020, 22, e16322.
24. De Nunzio, C.; Lombardo, R.; Tema, G.; Cancrini, F.; Russo, G.I.; Chacon, R.; Garcia-Cruz, E.; Ribal, M.J.; Morgia, G.; Alcaraz, A. Mobile phone apps for the prediction of prostate cancer: External validation of the Coral and Rotterdam apps. Eur. J. Surg. Oncol. 2019, 45, 471–476.
25. Røder, M.A.; Berg, K.D.; Loft, M.D.; Thomsen, F.B.; Ferrari, M.; Kurbegovic, S.; Rytgaard, H.C.; Gruschy, L.; Brasso, K.; Gerds, T.A. The CPC risk calculator: A new app to predict prostate-specific antigen recurrence during follow-up after radical prostatectomy. Eur. Urol. Focus 2018, 4, 360–368.
26. De Nunzio, C.; Lombardo, R.; Baldassarri, V.; Cindolo, L.; Bertolo, R.; Minervini, A.; Sessa, F.; Muto, G.; Bove, P.; Vittori, M. Rotterdam mobile phone app including MRI data for the prediction of prostate cancer: A multicenter external validation. Eur. J. Surg. Oncol. 2021, 47, 2640–2645.
27. Nam, R.K.; Kattan, M.W.; Chin, J.L.; Trachtenberg, J.; Singal, R.; Rendon, R.; Klotz, L.H.; Sugar, L.; Sherman, C.; Izawa, J. Prospective multi-institutional study evaluating the performance of prostate cancer risk calculators. J. Clin. Oncol. 2011, 29, 2959–2964.
28. Patel, H.D.; Remmers, S.; Ellis, J.L.; Li, E.V.; Roobol, M.J.; Fang, A.M.; Davik, P.; Rais-Bahrami, S.; Murphy, A.B.; Ross, A.E. Comparison of magnetic resonance imaging–based risk calculators to predict prostate cancer risk. JAMA Netw. Open 2024, 7, e241516.
29. Kim, S.H.; Kim, S.; Joung, J.Y.; Kwon, W.-A.; Seo, H.K.; Chung, J.; Nam, B.-H.; Lee, K.H. Lifestyle risk prediction model for prostate cancer in a Korean population. Cancer Res. Treat. Off. J. Korean Cancer Assoc. 2018, 50, 1194–1202.
30. McDonald, A.C.; Gernand, J.; Geyer, N.R.; Wu, H.; Yang, Y.; Wang, M. Ambient air exposures to arsenic and cadmium and overall and prostate cancer–specific survival among prostate cancer cases in Pennsylvania, 2004 to 2014. Cancer 2022, 128, 1832–1839.
31. Vigneswaran, H.T.; Jagai, J.S.; Greenwald, D.T.; Patel, A.P.; Kumar, M.; Dobbs, R.W.; Moreira, D.M.; Abern, M.R. Association between environmental quality and prostate cancer stage at diagnosis. Prostate Cancer Prostatic Dis. 2021, 24, 1129–1136.
32. Youogo, L.M.-A.K.; Parent, M.-E.; Hystad, P.; Villeneuve, P.J. Ambient air pollution and prostate cancer risk in a population-based Canadian case-control study. Environ. Epidemiol. 2022, 6, e219.
33. Tse, L.A.; Lee, P.M.Y.; Ho, W.M.; Lam, A.T.; Lee, M.K.; Ng, S.S.M.; He, Y.; Leung, K.-s.; Hartle, J.C.; Hu, H. Bisphenol A and other environmental risk factors for prostate cancer in Hong Kong. Environ. Int. 2017, 107, 1–7.
34. Ekman, P.; Grönberg, H.; Matsuyama, H.; Kivineva, M.; Bergerheim, U.S.; Li, C. Links between genetic and environmental factors and prostate cancer risk. Prostate 1999, 39, 262–268.
35. Ferrís-I-Tortajada, J.; Berbel-Tornero, O.; Garcia-i-Castell, J.; López-Andreu, J.; Sobrino-Najul, E.; Ortega-García, J. Non-dietary environmental risk factors in prostate cancer. Actas Urol. Esp. (Engl. Ed.) 2011, 35, 289–295.
36. Ekman, P. Genetic and environmental factors in prostate cancer genesis: Identifying high-risk cohorts. Eur. Urol. 1999, 35, 362–369.
37. He, B.-M.; Chen, R.; Sun, T.-Q.; Yang, Y.; Zhang, C.-L.; Ren, S.-C.; Gao, X.; Sun, Y.-H. Prostate cancer risk prediction models in Eastern Asian populations: Current status, racial difference, and future directions. Asian J. Androl. 2020, 22, 158–161.
38. Di Pietro, G.; Chornokur, G.; Kumar, N.B.; Davis, C.; Park, J.Y. Racial differences in the diagnosis and treatment of prostate cancer. Int. Neurourol. J. 2016, 20, S112.
39. Hinata, N.; Fujisawa, M. Racial differences in prostate cancer characteristics and cancer-specific mortality: An overview. World J. Men’s Health 2022, 40, 217.
40. Hoffman, R.M.; Gilliland, F.D.; Eley, J.W.; Harlan, L.C.; Stephenson, R.A.; Stanford, J.L.; Albertson, P.C.; Hamilton, A.S.; Hunt, W.C.; Potosky, A.L. Racial and ethnic differences in advanced-stage prostate cancer: The Prostate Cancer Outcomes Study. J. Natl. Cancer Inst. 2001, 93, 388–395.
41. Jones, B.A.; Liu, W.-L.; Araujo, A.B.; Kasl, S.V.; Silvera, S.N.; Soler-Vilá, H.; Curnen, M.G.; Dubrow, R. Explaining the race difference in prostate cancer stage at diagnosis. Cancer Epidemiol. Biomark. Prev. 2008, 17, 2825–2834.
Figure 1. Sampling dataset from the NHISS Bigdata platform.
Figure 2. Longitudinal research cohort design.
Figure 3. The selection of validation data from the National Health Insurance Sharing Service (NHISS) health checkup data.
Figure 4. Risk calculator based on Cox proportional hazards regression analysis: probability of 5-year overall survival (Prob of 5-year OS) without prostate cancer (PCa).
Figure 5. Calibration of the risk calculator model.
Table 1. National Health Insurance Sharing Service dataset (2013–2020), stratified by SICK_CODE records (C60, C61, C62, C63) * between 2015 and 2020.
| Diagnosis Code | C60 | C61 | C62 | C63 | None | p-Value * |
|---|---|---|---|---|---|---|
| N (total = 347,575) | 5 | 1928 | 18 | 14 | 345,610 | |
| SICK_CODE, n (%) | 5 (0.001) | 1928 (0.555) | 18 (0.005) | 14 (0.004) | 345,610 (99.435) | |
| AGE, mean (SD) | 58.60 (11.55) | 59.46 (7.78) | 53.56 (7.66) | 57.07 (6.57) | 52.76 (6.40) | <0.001 *** |
| BMI, mean (SD) | 25.94 (1.91) | 23.86 (2.96) | 23.71 (3.50) | 24.15 (3.07) | 23.95 (2.92) | 0.393 |
| WSTC, mean (SD) | 87.20 (5.93) | 84.11 (7.67) | 81.94 (8.54) | 82.50 (6.81) | 83.56 (7.51) | 0.013 * |
| PA_MD, mean (SD) | 0.20 (0.45) | 1.40 (1.84) | 0.61 (1.14) | 0.79 (1.05) | 1.40 (1.72) | 0.093 |
| PA_VD, mean (SD) | 0.00 (0.00) | 1.27 (1.72) | 0.61 (0.92) | 0.71 (0.91) | 1.30 (1.63) | 0.074 |
| PA_WALK, mean (SD) | 4.20 (3.03) | 2.71 (2.43) | 1.61 (2.12) | 1.71 (1.82) | 2.60 (2.30) | 0.017 * |
| SMOKE_DRT, mean (SD) | 30.40 (11.76) | 30.26 (11.95) | 24.72 (12.63) | 21.50 (13.39) | 25.36 (9.49) | <0.001 *** |
| STK = 1, n (%) | 0 (0.0) | 15 (0.8) | 0 (0.0) | 0 (0.0) | 1774 (0.5) | 0.589 |
| HTDZ = 1, n (%) | 0 (0.0) | 58 (3.0) | 0 (0.0) | 0 (0.0) | 5401 (1.6) | <0.001 *** |
| HTN = 1, n (%) | 3 (60.0) | 583 (30.2) | 3 (16.7) | 4 (28.6) | 64,065 (18.5) | <0.001 *** |
| DM = 1, n (%) | 0 (0.0) | 176 (9.1) | 1 (5.6) | 2 (14.3) | 27,714 (8.0) | 0.339 |
| DLD = 1, n (%) | 0 (0.0) | 62 (3.2) | 0 (0.0) | 1 (7.1) | 10,066 (2.9) | 0.699 |
| ETC = 1, n (%) | 0 (0.0) | 143 (7.4) | 0 (0.0) | 0 (0.0) | 11,382 (3.3) | <0.001 *** |
* p-values were obtained with chisq.test (with continuity correction) for categorical variables and oneway.test (assuming equal variances, i.e., a standard one-way ANOVA) for continuous variables; a two-group ANOVA is equivalent to a t-test. * p < 0.05; *** p < 0.001. * C60: malignant neoplasm of the penis; C61: malignant neoplasm of the prostate; C62: malignant neoplasm of the testis; C63: malignant neoplasm of the epididymis.
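The two tests named in the footnote can be illustrated with the published summaries. Below is a minimal sketch that assumes Python with NumPy and SciPy; the original analysis appears to have used R. It applies a chi-square test to the HTN counts and rebuilds a classic equal-variance one-way ANOVA for AGE from the group means, SDs, and sizes. The helper `anova_from_summary` is hypothetical. Because the inputs are rounded, the statistics it produces are approximations and should not be read as the author's own computation.

```python
import numpy as np
from scipy import stats

# Group order follows Table 1: C60, C61, C62, C63, None
group_n = np.array([5, 1928, 18, 14, 345_610])

# --- Chi-square test for a categorical variable (HTN = 1) ---
htn_yes = np.array([3, 583, 3, 4, 64_065])
contingency = np.vstack([htn_yes, group_n - htn_yes])  # 2 x 5 table: yes / no
# SciPy applies Yates' continuity correction only when dof == 1,
# so no correction is applied to this 2 x 5 table.
chi2, p_chi2, dof, expected = stats.chi2_contingency(contingency)


def anova_from_summary(means, sds, ns):
    """Hypothetical helper: classic one-way ANOVA F test from group summaries."""
    means, sds, ns = (np.asarray(a, dtype=float) for a in (means, sds, ns))
    k, n_total = len(ns), ns.sum()
    grand_mean = np.sum(ns * means) / n_total
    ss_between = np.sum(ns * (means - grand_mean) ** 2)  # between-group sum of squares
    ss_within = np.sum((ns - 1) * sds ** 2)               # pooled within-group sum of squares
    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    p_value = stats.f.sf(f_stat, k - 1, n_total - k)      # upper tail of the F distribution
    return f_stat, p_value


# --- One-way ANOVA for a continuous variable (AGE) ---
age_means = [58.60, 59.46, 53.56, 57.07, 52.76]
age_sds = [11.55, 7.78, 7.66, 6.57, 6.40]
f_age, p_age = anova_from_summary(age_means, age_sds, group_n)

print(f"HTN: chi2 = {chi2:.1f}, dof = {dof}, p = {p_chi2:.2e}")
print(f"AGE: F = {f_age:.1f}, p = {p_age:.2e}")
```

Both p-values fall far below 0.001, which agrees with Table 1. The C60, C62, and C63 groups are very small, so several expected counts are below 5, and chi-square p-values for those groups should be interpreted cautiously.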
Table 2. Validation dataset used for substantiation.
| | Normal Group | PCa Group | p-Value (t-Test) |
|---|---|---|---|
| Number of subjects | 286,293 | 434 | |
| AGE, n (%) | | | 0.006 |
| 45–49 | 60,688 (21.2) | 86 (19.8) | |
| 50–54 | 74,172 (25.9) | 110 (25.3) | |
| 55–59 | 50,187 (17.5) | 84 (19.4) | |
| 60–64 | 42,000 (14.7) | 40 (9.2) | |
| 65–69 | 22,884 (8.0) | 38 (8.8) | |
| 70–74 | 23,744 (8.3) | 46 (10.6) | |
| 75–79 | 8273 (2.9) | 19 (4.4) | |
| 80–84 | 3702 (1.3) | 8 (1.8) | |
| ≥85 | 643 (0.2) | 3 (0.7) | |
| BMI, mean (SD) | 24.00 (2.90) | 24.10 (2.83) | 0.494 |
| STK = 1, n (%) | 13,151 (4.6) | 25 (5.8) | 0.296 |
| HTDZ = 1, n (%) | 16,463 (5.8) | 24 (5.5) | 0.925 |
| ETC = 1, n (%) | 273,524 (95.5) | 416 (95.9) | 0.842 |
| SMOKE_DRT, mean (SD) | 12.47 (4.63) | 12.57 (4.82) | 0.642 |
Body mass index (BMI), waist circumference (WSTC), number of moderate walks per week (PA_MD), number of high-intensity walks per week (PA_VD), number of walks per week (PA_WALK), smoking duration (SMOKE_DRT), stroke (paralysis) diagnosis (STK), heart disease (myocardial infarction/angina) diagnosis (HTDZ), hypertension diagnosis (HTN), diabetes mellitus diagnosis (DM), hyperlipidemia diagnosis (DLD), and other disease conditions including cancer (ETC).
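The Table 2 p-values can be approximately recomputed from the published group summaries. The sketch below is a minimal Python/SciPy version built on two assumptions: continuous rows use a pooled-variance two-sample t-test, and binary rows use a 2 × 2 chi-square test with Yates' continuity correction, the same test described for Table 1. The helper `chisq_2x2` is hypothetical. Under these assumptions the STK and HTDZ p-values come out at about 0.296 and 0.925. The BMI and SMOKE_DRT values differ slightly from the table, because the published means are rounded to two decimals.

```python
import numpy as np
from scipy import stats

N_NORMAL, N_PCA = 286_293, 434  # group sizes from Table 2

# Pooled-variance two-sample t-tests from means and SDs (normal vs. PCa group)
t_bmi, p_bmi = stats.ttest_ind_from_stats(24.00, 2.90, N_NORMAL, 24.10, 2.83, N_PCA, equal_var=True)
t_smk, p_smk = stats.ttest_ind_from_stats(12.47, 4.63, N_NORMAL, 12.57, 4.82, N_PCA, equal_var=True)


def chisq_2x2(cases_normal, cases_pca):
    """Hypothetical helper: 2 x 2 chi-square test with Yates' correction."""
    table = np.array([
        [cases_normal, N_NORMAL - cases_normal],  # normal group: condition yes / no
        [cases_pca, N_PCA - cases_pca],           # PCa group: condition yes / no
    ])
    chi2, p, dof, _ = stats.chi2_contingency(table, correction=True)
    return chi2, p


_, p_stk = chisq_2x2(13_151, 25)
_, p_htdz = chisq_2x2(16_463, 24)

print(f"BMI p = {p_bmi:.3f}, SMOKE_DRT p = {p_smk:.3f}")
print(f"STK p = {p_stk:.3f}, HTDZ p = {p_htdz:.3f}")
```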
Table 3. Hazard ratios and 95% confidence intervals from Cox proportional hazards regression.
| Variable | Hazard Ratio (exp(coef)) | 95% CI, Lower | 95% CI, Upper |
|---|---|---|---|
| HTN | 1.007 | 0.8499 | 1.194 |
| STK | 1.444 | 0.7145 | 2.917 |
| HTDZ | 1.013 | 0.6643 | 1.544 |
| ETC | 1.041 | 0.7872 | 1.375 |
| AGE | 1.026 | 1.0146 | 1.038 |
| PA_VD | 1.028 | 0.983 | 1.075 |
| BMI | 1.009 | 0.9822 | 1.036 |
| SMOKE_DRT | 1.006 | 0.9992 | 1.014 |
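The column labels in Table 3 resemble the output of `summary()` for an R `survival::coxph` fit. In that output the interval is the Wald interval exp(coef ± 1.96·SE). Assuming Table 3 uses the same intervals, each row can be converted back into a log-hazard coefficient, a standard error, and a two-sided Wald p-value. The minimal Python sketch below does this; the helper `wald_from_ci` is hypothetical, and its results are approximate because the published values are rounded. Only AGE has an interval that excludes 1. Compounding its per-year hazard ratio over 10 years gives roughly 1.29 for a 10-year age difference.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution

# (hazard ratio, lower 95% CI, upper 95% CI) transcribed from Table 3
table3 = {
    "HTN": (1.007, 0.8499, 1.194),
    "STK": (1.444, 0.7145, 2.917),
    "HTDZ": (1.013, 0.6643, 1.544),
    "ETC": (1.041, 0.7872, 1.375),
    "AGE": (1.026, 1.0146, 1.038),
    "PA_VD": (1.028, 0.983, 1.075),
    "BMI": (1.009, 0.9822, 1.036),
    "SMOKE_DRT": (1.006, 0.9992, 1.014),
}


def wald_from_ci(hr, lower, upper):
    """Hypothetical helper: back out coef, SE, z and p from a Wald CI on the HR scale."""
    coef = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)  # CI width on the log scale
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return coef, se, z, p


for name, (hr, lo, hi) in table3.items():
    coef, se, z, p = wald_from_ci(hr, lo, hi)
    print(f"{name:10s} coef={coef:+.4f} SE={se:.4f} z={z:+.2f} p={p:.3g}")

# Variables whose 95% CI excludes 1 (significant at the 5% level)
significant = [name for name, (hr, lo, hi) in table3.items() if lo > 1 or hi < 1]

# Hazard ratios multiply: per-year HR raised to the 10th power for a 10-year difference
hr_age_10y = table3["AGE"][0] ** 10
```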
Table 4. Points assigned to each variable to build the nomogram.
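| Variable | Value (points) |
|---|---|
| STK | 0 (0); 1 (60) |
| HTDZ | 0 (0); 1 (20) |
| ETC | 0 (0); 1 (6) |
| AGE | 45 (0); 50 (11); 55 (22); 60 (33); 65 (44); 70 (56); 75 (67); 80 (78); 85 (89); 90 (100) |
| PA_VD | 0 (0); 1 (2); 2 (3); 3 (5); 4 (7); 5 (9); 6 (10); 7 (12) |
| BMI | 15 (0); 20 (7); 25 (13); 30 (20); 35 (26); 40 (33); 45 (40); 50 (46) |
| SMOKE_DRT | 0 (0); 5 (3); 10 (6); 15 (9); 20 (12); 25 (15); 30 (18); 35 (21); 40 (24); 45 (27); 50 (30); 55 (33); 60 (36) |

| Total Points | Prob of 5-year Overall Survival |
|---|---|
| 198 | 0.4 |
| 100 | 0.6 |
| 39 | 0.7 |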
Body mass index (BMI), waist circumference (WSTC), number of moderate walks per week (PA_MD), number of high-intensity walks per week (PA_VD), number of walks per week (PA_WALK), smoking duration (SMOKE_DRT), stroke (paralysis) diagnosis (STK), heart disease (myocardial infarction/angina) diagnosis (HTDZ), hypertension diagnosis (HTN), diabetes mellitus diagnosis (DM), hyperlipidemia diagnosis (DLD), and other disease conditions including cancer (ETC).
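A nomogram is read by summing the points for each variable. The minimal Python/NumPy sketch below assumes linear interpolation between the values tabulated in Table 4; `np.interp` also clamps inputs outside the tabulated range to the nearest end point. Only three Total Points/probability pairs are recoverable from Table 4. The probability step is therefore a rough linear read-off between those ticks, not the calibrated scale drawn in Figure 4. The `person` profile and both helper functions are hypothetical.

```python
import numpy as np

# Points scales transcribed from Table 4: variable -> (tabulated values, points)
POINTS = {
    "STK": ([0, 1], [0, 60]),
    "HTDZ": ([0, 1], [0, 20]),
    "ETC": ([0, 1], [0, 6]),
    "AGE": ([45, 50, 55, 60, 65, 70, 75, 80, 85, 90],
            [0, 11, 22, 33, 44, 56, 67, 78, 89, 100]),
    "PA_VD": ([0, 1, 2, 3, 4, 5, 6, 7], [0, 2, 3, 5, 7, 9, 10, 12]),
    "BMI": ([15, 20, 25, 30, 35, 40, 45, 50], [0, 7, 13, 20, 26, 33, 40, 46]),
    "SMOKE_DRT": ([0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60],
                  [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36]),
}

# The only (total points, probability) pairs recoverable from Table 4,
# sorted by increasing total points as np.interp requires.
TOTAL_POINTS = [39, 100, 198]
PROB_5Y_OS = [0.7, 0.6, 0.4]


def total_points(profile):
    """Hypothetical helper: sum per-variable points, interpolating between tabulated values."""
    return sum(float(np.interp(profile[name], xs, pts)) for name, (xs, pts) in POINTS.items())


def approx_prob_5y_os(points):
    """Hypothetical helper: rough linear read-off between the three recovered ticks."""
    return float(np.interp(points, TOTAL_POINTS, PROB_5Y_OS))


# Hypothetical individual, chosen for illustration only
person = {"STK": 0, "HTDZ": 1, "ETC": 0, "AGE": 62, "PA_VD": 2, "BMI": 27, "SMOKE_DRT": 20}
pts = total_points(person)
print(f"total points = {pts:.1f}, approx. 5-year OS probability = {approx_prob_5y_os(pts):.2f}")
```

For this profile the total is 88.2 points: 20 for HTDZ, 37.4 for AGE, 3 for PA_VD, 15.8 for BMI, and 12 for SMOKE_DRT. That total lies between the 39- and 100-point ticks.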
Table 5. Confusion matrix of validation.
| Prediction \ Reference | Normal group | Prostate cancer group |
|---|---|---|
| Normal group | 124,311 | 171 |
| Prostate cancer group | 161,982 | 263 |
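The summary statistics in Table 6 follow from the four counts in Table 5. The minimal Python sketch below uses only the standard library. It treats the prostate cancer group as the positive class, as Table 6 states; the names TN, FN, FP, and TP are labels introduced here. It recomputes most of the reported statistics, including Cohen's kappa and McNemar's test with continuity correction on the discordant cells. The results agree with Table 6 to within about 0.001.

```python
import math

# Counts from Table 5 (rows = prediction, columns = reference)
TN, FN = 124_311, 171   # predicted normal: truly normal / truly PCa
FP, TP = 161_982, 263   # predicted PCa:    truly normal / truly PCa
n = TN + FN + FP + TP

accuracy = (TP + TN) / n
sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
balanced_accuracy = (sensitivity + specificity) / 2
no_information_rate = max(TN + FP, TP + FN) / n  # accuracy of always predicting the majority class
detection_prevalence = (TP + FP) / n

# Cohen's kappa: observed agreement vs. agreement expected by chance from the margins
p_obs = accuracy
p_exp = ((TN + FN) * (TN + FP) + (FP + TP) * (FN + TP)) / n ** 2
kappa = (p_obs - p_exp) / (1 - p_exp)

# McNemar's test with continuity correction on the discordant cells (1 degree of freedom)
mcnemar_chi2 = (abs(FP - FN) - 1) ** 2 / (FP + FN)
p_mcnemar = math.erfc(math.sqrt(mcnemar_chi2 / 2))  # upper tail of chi-square with 1 df

print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"balanced={balanced_accuracy:.3f} NIR={no_information_rate:.3f} kappa={kappa:.1e}")
```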
Table 6. Statistics derived from the validation confusion matrix.
| Statistic | Value |
|---|---|
| Accuracy | 0.434 |
| 95% CI | (0.432, 0.436) |
| No Information Rate | 0.998 |
| p-Value (Accuracy > NIR) | 1 |
| Kappa | 2 × 10⁻⁴ |
| McNemar’s Test p-Value | <2 × 10⁻¹⁶ |
| Sensitivity | 0.605 |
| Specificity | 0.434 |
| Pos Pred Value | 0.263 |
| Neg Pred Value | 0.767 |
| Prevalence | 0.250 |
| Detection Rate | 0.000 |
| Detection Prevalence | 0.565 |
| Balanced Accuracy | 0.520 |
| ‘Positive’ Class | PCa |
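The predictive values in Table 6 depend on prevalence. The observed prevalence in the validation data is 434/286,727 ≈ 0.0015. The reported Pos Pred Value (0.263) and Neg Pred Value (0.767) are reproduced, to within about 0.001, only when prevalence is fixed at 0.250, the value listed in the table. This pattern is consistent with prevalence-adjusted predictive values, such as those R caret's `confusionMatrix` computes when a `prevalence` argument is supplied. The source does not say how these values were obtained, so treat this as an interpretation. The minimal Python sketch below applies Bayes' rule at both prevalences; the helper `ppv_npv` is hypothetical.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Hypothetical helper: predictive values via Bayes' rule at a given prevalence."""
    ppv = sensitivity * prevalence / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
    npv = specificity * (1 - prevalence) / (
        (1 - sensitivity) * prevalence + specificity * (1 - prevalence))
    return ppv, npv


# Counts from Table 5
TN, FN, FP, TP = 124_311, 171, 161_982, 263
sens = TP / (TP + FN)
spec = TN / (TN + FP)
observed_prev = (TP + FN) / (TN + FN + FP + TP)  # about 0.0015

ppv_obs, npv_obs = ppv_npv(sens, spec, observed_prev)  # about 0.0016 and 0.9986
ppv_25, npv_25 = ppv_npv(sens, spec, 0.25)             # about 0.263 and 0.768

print(f"observed prevalence: PPV={ppv_obs:.4f}, NPV={npv_obs:.4f}")
print(f"prevalence 0.25:     PPV={ppv_25:.3f}, NPV={npv_25:.3f}")
```

At the observed prevalence, Bayes' rule reduces to the raw ratios TP/(TP + FP) and TN/(TN + FN). This shows how strongly the reported PPV depends on the assumed prevalence.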
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Park, J. Substantiation of Prostate Cancer Risk Calculator Based on Physical Activity, Lifestyle Habits, and Underlying Health Conditions: A Longitudinal Nationwide Cohort Study. Appl. Sci. 2025, 15, 7845. https://doi.org/10.3390/app15147845

AMA Style

Park J. Substantiation of Prostate Cancer Risk Calculator Based on Physical Activity, Lifestyle Habits, and Underlying Health Conditions: A Longitudinal Nationwide Cohort Study. Applied Sciences. 2025; 15(14):7845. https://doi.org/10.3390/app15147845

Chicago/Turabian Style

Park, Jihwan. 2025. "Substantiation of Prostate Cancer Risk Calculator Based on Physical Activity, Lifestyle Habits, and Underlying Health Conditions: A Longitudinal Nationwide Cohort Study" Applied Sciences 15, no. 14: 7845. https://doi.org/10.3390/app15147845

APA Style

Park, J. (2025). Substantiation of Prostate Cancer Risk Calculator Based on Physical Activity, Lifestyle Habits, and Underlying Health Conditions: A Longitudinal Nationwide Cohort Study. Applied Sciences, 15(14), 7845. https://doi.org/10.3390/app15147845


