Penalized Spline Estimator for Semiparametric Binary Logistic Regression Model with Application to Coronary Heart Disease Risk Factors
Abstract
1. Introduction
2. Materials and Methods
2.1. Semiparametric Binary Logistic Regression Model
2.2. Truncated Spline Bases
2.3. Penalized Log-Likelihood Method
2.4. Generalized Approximate Cross-Validation (GACV)
3. Results and Discussions
3.1. Estimation of SBLR Model
- Define the initial value and tolerance () as a very small positive number, namely
- Iterate the scoring step for the adjusted dependent variable as follows:
- Update the parameter estimates by solving the Penalized Weighted Least Squares (PWLS) problem using the adjusted dependent variable and the weights obtained in the previous step:
- Using the new parameter estimates , calculate the new and the predicted probabilities .
- Calculate the Deviance of the model using the updated probabilities. Details of the Deviance will be discussed in Section 3.2.6. Check if the algorithm has converged by comparing the difference in Deviance between iterations against the tolerance , namely:
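The iteration above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single predictor with a truncated linear spline basis, a ridge-type penalty on the knot coefficients only, a zero starting vector, and a deviance-based stopping rule. The function names (`truncated_linear_basis`, `fit_penalized_logistic`), the knots, the smoothing value `lam` and its scaling, and the synthetic data are all hypothetical.

```python
import numpy as np

def truncated_linear_basis(x, knots):
    """Design matrix with columns 1, x, and (x - k)_+ for each knot k."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def binomial_deviance(y, p):
    """Bernoulli deviance: -2 times the log-likelihood."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_penalized_logistic(B, y, penalty, lam, tol=1e-8, max_iter=100):
    """Fisher scoring written as repeated penalized weighted least squares."""
    theta = np.zeros(B.shape[1])               # assumed starting value
    dev_old = np.inf
    for _ in range(max_iter):
        eta = B @ theta                        # current linear predictor
        p = 1.0 / (1.0 + np.exp(-eta))         # current probabilities
        w = np.clip(p * (1 - p), 1e-10, None)  # working weights
        z = eta + (y - p) / w                  # adjusted dependent variable
        # PWLS step: solve (B'WB + lam * D) theta = B'Wz
        theta = np.linalg.solve(B.T @ (w[:, None] * B) + lam * penalty,
                                B.T @ (w * z))
        p = 1.0 / (1.0 + np.exp(-(B @ theta)))
        dev = binomial_deviance(y, p)
        if abs(dev_old - dev) < tol:           # convergence check on the deviance
            break
        dev_old = dev
    return theta, dev

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-3 + 0.6 * x)))).astype(float)

B = truncated_linear_basis(x, knots=[3.0, 6.0])
penalty = np.diag([0.0, 0.0, 1.0, 1.0])        # penalize knot coefficients only
theta, dev = fit_penalized_logistic(B, y, penalty, lam=1.0)
print(theta, dev)
```

Stopping on the change in deviance rather than on the change in the coefficients mirrors the convergence check described in the last step.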
3.2. Application to Coronary Heart Disease (CHD) Risk Factors
3.2.1. Statistical Description
- (a)
- The most striking difference was in the Age variable. Patients in the CHD group had a markedly higher mean age (59.3 years) than those in the Non-CHD group (39.7 years). In addition, the variance in the Non-CHD group (380.25) was much larger than that in the CHD group (104.04), indicating that the Non-CHD group covered a much wider age range, whereas CHD cases were concentrated at older ages.
- (b)
- Regarding anthropometric measurements, the CHD group had a higher mean body weight (69.2 kg) than the Non-CHD group (63.8 kg). Similarly, mean height was slightly higher in the CHD group (164.4 cm) than in the Non-CHD group (160.7 cm).
- (c)
- An interesting pattern also emerged in dietary habits. The CHD group recorded a higher mean consumption of sugary foods (14.69) than the Non-CHD group (10.42). Conversely, mean consumption of fatty foods was unexpectedly lower in the CHD group (16.73) than in the Non-CHD group (21.37). This counterintuitive finding may reflect dietary adjustments or medical interventions adopted by CHD patients after their diagnosis, or age-related differences in food metabolism.
- (d)
- In terms of psychological factors, the Non-CHD group showed a slightly higher mean Stress Level score (5.27) than the CHD group (4.167). The variance of Stress Level was similar in the two groups (11.56 and 12.92, respectively), indicating a comparable spread of stress scores regardless of disease status.
3.2.2. Observed Logit Plot
3.2.3. Eta Correlation Test
3.2.4. Model Specification and Optimal Smoothing Parameters Selection
| Algorithm 1. Selecting Optimal Smoothing Parameters Based on GACV Criterion |
3.2.5. Estimation Results
| Algorithm 2. Penalized Spline Estimation for SBLR Model of CHD Data |
- (a)
- Holding the other variables constant, the estimated nonparametric penalized spline function for the first predictor, Body Weight, is as follows:
- (b)
- Holding the other variables constant, the estimated nonparametric penalized spline function for the second predictor, Body Height, is as follows:
- (c)
- Holding the other variables constant, the estimated nonparametric penalized spline function for the third predictor, Sugary Food Consumption, is as follows:
- (d)
- Holding the other variables constant, the estimated nonparametric penalized spline function for the fourth predictor, Fatty Food Consumption, is as follows:
- (e)
- Holding the other variables constant, the estimated nonparametric penalized spline function for the fifth predictor, Stress Level, is as follows:
- (a)
- For the Age (X) factor, the OR is 1.1457. Holding the other predictor variables constant, the odds of a patient developing coronary heart disease (CHD) are multiplied by 1.1457 for each one-year increase in age.
- (b)
- For the Body Weight factor:
- -
- Holding the other predictor variables constant, for body weight below 63 kg, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 0.8418 (a decrease) for each 1 kg increase in body weight.
- -
- Holding the other predictor variables constant, for body weight between 63 kg and 71 kg, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 1.8789 (an increase) for each 1 kg increase in body weight.
- -
- Holding the other predictor variables constant, for body weight above 71 kg, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 1.1319 (an increase) for each 1 kg increase in body weight.
- (c)
- For the Body Height factor:
- -
- Holding the other predictor variables constant, for height below 159 cm, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 1.0059 (an increase) for each 1 cm increase in height.
- -
- Holding the other predictor variables constant, for height between 159 cm and 162.5 cm, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 1.2201 (an increase) for each 1 cm increase in height.
- -
- Holding the other predictor variables constant, for height between 162.5 cm and 168 cm, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 0.6331 (a decrease) for each 1 cm increase in height.
- -
- Holding the other predictor variables constant, for height above 168 cm, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 1.1847 (an increase) for each 1 cm increase in height.
- (d)
- For the Sugary Food Consumption factor:
- -
- Holding the other predictor variables constant, for sugary food consumption below 4.34 g, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 0.6925 (a decrease) for each 1 g increase in sugary food consumption.
- -
- Holding the other predictor variables constant, for sugary food consumption between 4.34 g and 8.40 g, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 1.3766 (an increase) for each 1 g increase in sugary food consumption.
- -
- Holding the other predictor variables constant, for sugary food consumption between 8.40 g and 18.83 g, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 1.2306 (an increase) for each 1 g increase in sugary food consumption.
- -
- Holding the other predictor variables constant, for sugary food consumption above 18.83 g, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 1.3559 (an increase) for each 1 g increase in sugary food consumption.
- (e)
- For the Fatty Food Consumption factor:
- -
- Holding the other predictor variables constant, for fatty food consumption below 17.29 g, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 0.9731 (a decrease) for each 1 g increase in fatty food consumption.
- -
- Holding the other predictor variables constant, for fatty food consumption above 17.29 g, the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 1.1385 (an increase) for each 1 g increase in fatty food consumption.
- (f)
- For the Stress Level factor:
- -
- Holding the other predictor variables constant, for stress levels below 2 (i.e., the Normal category), the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 0.5085 (a decrease) for each one-point increase in stress level.
- -
- Holding the other predictor variables constant, for stress levels between 2 and 5 (i.e., the Mild Stress category), the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 0.5748 (a decrease) for each one-point increase in stress level.
- -
- Holding the other predictor variables constant, for stress levels between 5 and 7 (i.e., the Moderate Stress category), the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 3.0535 (an increase) for each one-point increase in stress level.
- -
- Holding the other predictor variables constant, for stress levels above 7 (i.e., the Severe Stress category), the OR shows that the odds of a patient developing coronary heart disease (CHD) are multiplied by 1.4853 (an increase) for each one-point increase in stress level.
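Each odds ratio quoted above is the exponential of the corresponding reported parameter estimate. The short Python sketch below checks this for a few of the reported estimates; the dictionary labels and the helper name `odds_ratio` are illustrative choices, and the five-year calculation is only an example of how a per-unit OR compounds over several units within one segment.

```python
import math

def odds_ratio(beta, units=1.0):
    """Multiplicative change in the odds for a `units` increase in the predictor."""
    return math.exp(beta * units)

# Parameter estimates as reported in the estimation results of this paper.
reported_estimates = {
    "Age (per year)": 0.1360,
    "Body Weight below 63 kg (per kg)": -0.1722,
    "Fatty Food Consumption below 17.29 g (per g)": -0.0273,
    "Stress Level between 5 and 7 (per point)": 1.1163,
}

for label, beta in reported_estimates.items():
    print(f"{label}: OR = {odds_ratio(beta):.4f}")

# A per-unit OR compounds multiplicatively: five extra years of age.
print(f"Age, five-year increase: OR = {odds_ratio(0.1360, units=5):.4f}")
```

Because the OR acts on the odds rather than on the probability, the change in the predicted probability also depends on the patient's baseline risk.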
- (a)
- For the predictor of the parametric component, Age, the value obtained from Equation (42) is:
- (b)
- For the first predictor of the nonparametric component, Body Weight: based on Equation (43), the observed value falls in one of the knot intervals, so the value of the estimated penalized spline function is:
- (c)
- For the second predictor of the nonparametric component, Body Height: based on Equation (44), the observed value falls in one of the knot intervals, so the value of the estimated penalized spline function is:
- (d)
- For the third predictor of the nonparametric component, Sugary Food Consumption: based on Equation (45), the observed value falls in one of the knot intervals, so the value of the estimated penalized spline function is:
- (e)
- For the fourth predictor of the nonparametric component, Fatty Food Consumption: based on Equation (46), the observed value falls in one of the knot intervals, so the value of the estimated penalized spline function is:
- (f)
- For the fifth predictor of the nonparametric component, Stress Level: based on Equation (47), the observed value falls in one of the knot intervals, so the value of the estimated penalized spline function is:
3.2.6. Model Evaluation and Comparison
- (a)
- Deviance Test (Goodness-of-Fit). We utilized the Deviance statistic to formally test the simultaneous significance of the parameters. The results confirm that the proposed SBLR model fits the data significantly better than the null model.
- (b)
- Press’s Q Statistic. To validate the predictive accuracy, we employed Press’s Q test. The calculated value for the SBLR model (33.0625) exceeds the critical value, confirming that the classification accuracy of the model is significantly better than chance.
- (c)
- McNemar’s Test. To complement the accuracy measures, we conducted McNemar’s test. The reported statistic for the proposed SBLR model is 0.4444 (p-value 0.5050), and the corresponding p-values for the BLR and NBLR models are 0.7728 and 0.2888. None falls below 0.05, so this test alone does not establish a significant difference; the advantage of the SBLR model over the standard parametric logistic regression rests on its lower Deviance and Brier Score.
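The two classification statistics in items (b) and (c) can be computed from simple counts, as in the following Python sketch. The formulas are the standard ones (Press's Q for K groups, McNemar's statistic with continuity correction, and a chi-square p-value with one degree of freedom), but the counts are hypothetical: the sample size, the number of correct classifications, and the discordant counts `b` and `c` are assumptions chosen for illustration, and which paired table the discordant counts come from is left open here.

```python
import math

def press_q(n_total, n_correct, n_groups=2):
    """Press's Q: (N - n*K)^2 / (N*(K - 1)), compared with chi-square (1 df)."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))

def mcnemar_statistic(b, c, continuity=True):
    """McNemar's chi-square from the two discordant counts of a paired 2x2 table."""
    diff = abs(b - c) - (1 if continuity else 0)
    return diff ** 2 / (b + c)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical counts, not taken from the paper's data.
q = press_q(n_total=64, n_correct=55, n_groups=2)
m = mcnemar_statistic(b=6, c=3)

print(f"Press's Q = {q:.4f}, p = {chi2_sf_1df(q):.4g}")
print(f"McNemar   = {m:.4f}, p = {chi2_sf_1df(m):.4f}")
```

A large Press's Q indicates classification better than chance, whereas a small McNemar statistic indicates that the two discordant counts are balanced.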
3.2.7. Limitations and Future Work
- (a)
- Computational Complexity. Selecting the optimal smoothing parameter (λ) for multiple predictors can be more computationally intensive than fitting a simple parametric model.
- (b)
- Data Constraints. The study relies on cross-sectional data, which limits causal inference compared to longitudinal data. For future work, we propose extending this estimator to Generalized Additive Models (GAM) for high-dimensional data and applying it to longitudinal datasets to observe risk factor progression over time.
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| PS-SBLR | Penalized Spline–Semiparametric Binary Logistic Regression |
| PS-NBLR | Penalized Spline–Nonparametric Binary Logistic Regression |
| BS | Brier Score |
| BSS | Brier Skill Score |
| GACV | Generalized Approximate Cross-Validation |
| BLR | Binary Logistic Regression |
| GLM | Generalized Linear Model |
| GAM | Generalized Additive Model |
| GAPLM | Generalized Additive Partial Linear Model |



| Variable | Status | Minimum | Maximum | Mean | Variance |
|---|---|---|---|---|---|
| Age (years) | Non-CHD | 18 | 78 | 39.7 | 380.25 |
| | CHD | 31 | 74 | 59.3 | 104.04 |
| Body Weight (kg) | Non-CHD | 36 | 97 | 63.8 | 176.89 |
| | CHD | 38 | 100 | 69.2 | 141.61 |
| Body Height (cm) | Non-CHD | 147 | 178 | 160.7 | 60.83 |
| | CHD | 155 | 178 | 164.4 | 30.58 |
| Sugary Food Consumption (g) | Non-CHD | 0.63 | 32.27 | 10.42 | 68.66 |
| | CHD | 1.57 | 40.29 | 14.69 | 125.02 |
| Fatty Food Consumption (g) | Non-CHD | 2.84 | 54.64 | 21.37 | 164.16 |
| | CHD | 2.52 | 42.93 | 16.73 | 96.78 |
| Stress Level (points) | Non-CHD | 0 | 14 | 5.27 | 11.56 |
| | CHD | 0 | 12 | 4.167 | 12.92 |
| Variable | Value of η | Value of η² | p-Value | Decision |
|---|---|---|---|---|
| Age | 0.541 | 0.293 | <0.001 | Reject H0 (Linear) |
| Body Weight | 0.191 | 0.036 | 0.131 | Do Not Reject H0 (Non-Linear) |
| Body Height | 0.246 | 0.060 | 0.051 | Do Not Reject H0 (Non-Linear) |
| Sugary Food Consumption | 0.179 | 0.032 | 0.155 | Do Not Reject H0 (Non-Linear) |
| Fatty Food Consumption | 0.193 | 0.037 | 0.126 | Do Not Reject H0 (Non-Linear) |
| Stress Level | 0.103 | 0.011 | 0.416 | Do Not Reject H0 (Non-Linear) |
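
The correlation ratio reported in the table above can be computed as the between-group share of the total sum of squares. The Python sketch below is a generic illustration, assuming that disease status is the grouping variable and the predictor is the measured variable; the function name `correlation_ratio` and the toy data are hypothetical.

```python
import numpy as np

def correlation_ratio(groups, values):
    """Return (eta, eta squared) for a measured variable split by a grouping variable."""
    groups = np.asarray(groups)
    values = np.asarray(values, dtype=float)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    ss_between = 0.0
    for g in np.unique(groups):
        member = values[groups == g]
        ss_between += member.size * (member.mean() - grand_mean) ** 2
    eta_sq = ss_between / ss_total
    return float(np.sqrt(eta_sq)), float(eta_sq)

# Toy data: status 0/1 and a measured predictor (illustration only).
status = [0, 0, 1, 1]
measurement = [1.0, 2.0, 3.0, 4.0]
eta, eta_sq = correlation_ratio(status, measurement)
print(eta, eta_sq)
```

The second value is the square of the first, which is the relationship linking the two value columns of the table.
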

| Predictor Variable | Number of Knots | Knot Points | Smoothing Parameter | Minimum GACV Value |
|---|---|---|---|---|
| Body Weight | 2 | 63; 71 | 0.045 | 0.6377 |
| Body Height | 3 | 159; 162.5; 168 | 0.06 | 0.6552 |
| Sugary Food Consumption | 3 | 4.34; 8.40; 18.83 | 0.078 | 0.6730 |
| Fatty Food Consumption | 1 | 17.29 | 2 | 0.6728 |
| Stress Level | 3 | 2; 5; 7 | 0.0011 | 0.6641 |

| Predictor Variable | Knot Points | Parameter Estimate | Odds Ratio (OR) | Risk Effect on CHD |
|---|---|---|---|---|
| Age (year) | - | 0.1360 | 1.1457 | Increase |
| Body Weight (kg) | | −0.1722 | 0.8418 | Decrease |
| | | 0.6307 | 1.8789 | Increase |
| | | 0.1239 | 1.1319 | Increase |
| Body Height (cm) | | 0.0059 | 1.0059 | Increase |
| | | 0.1990 | 1.2201 | Increase |
| | | −0.4571 | 0.6331 | Decrease |
| | | 0.1695 | 1.1847 | Increase |
| Sugary Food Consumption (gram) | | −0.3675 | 0.6925 | Decrease |
| | | 0.3196 | 1.3766 | Increase |
| | | 0.2075 | 1.2306 | Increase |
| | | 0.3045 | 1.3559 | Increase |
| Fatty Food Consumption (gram) | | −0.0273 | 0.9731 | Decrease |
| | | 0.1297 | 1.1385 | Increase |
| Stress Level (point) | | −0.6763 | 0.5085 | Decrease |
| | | −0.5538 | 0.5748 | Decrease |
| | | 1.1163 | 3.0535 | Increase |
| | | 0.3956 | 1.4853 | Increase |
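
The odds ratios in the table above appear to be the exponentials of the corresponding parameter estimates, OR = exp(β), with an estimate above zero read as an increase in CHD risk and one below zero as a decrease. The following is a minimal sketch of that arithmetic, not the authors' code; the function names `odds_ratio` and `risk_effect` are hypothetical, and only a few (estimate, reported OR) pairs from the table are included.

```python
import math

# (parameter estimate, odds ratio as reported) pairs copied from the table above.
reported_pairs = [
    (0.1360, 1.1457),   # Age
    (-0.1722, 0.8418),  # Body Weight, first row
    (1.1163, 3.0535),   # Stress Level, third row
]

def odds_ratio(beta):
    """Odds ratio implied by a logistic-regression coefficient."""
    return math.exp(beta)

def risk_effect(beta):
    """Direction label used in the table: positive coefficient means higher odds."""
    return "Increase" if beta > 0 else "Decrease"

for beta, reported in reported_pairs:
    # Print the recomputed odds ratio next to the tabulated one.
    print(f"beta={beta:+.4f}  OR={odds_ratio(beta):.4f}  "
          f"reported={reported:.4f}  effect={risk_effect(beta)}")
```

Recomputing the ratio this way is a quick consistency check when transcribing coefficients from a fitted model into a results table.
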

| Threshold | Sensitivity | Specificity | Youden’s J |
|---|---|---|---|
| 0.39–0.46 | 0.8929 | 0.8333 | 0.7262 |
| 0.38 | 0.8929 | 0.8056 | 0.6984 |
| 0.47 | 0.8214 | 0.8611 | 0.6825 |
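
Youden's J is defined as sensitivity + specificity − 1, and the threshold with the largest J is usually taken as the cut-off. The sketch below recomputes J for the three rows of the table above and picks the maximum; the variable and function names are illustrative assumptions, and the three rows are only those shown in the table, not a full threshold grid.

```python
# (threshold label, sensitivity, specificity) copied from the table above.
rows = [
    ("0.39–0.46", 0.8929, 0.8333),
    ("0.38", 0.8929, 0.8056),
    ("0.47", 0.8214, 0.8611),
]

def youden_j(sensitivity, specificity):
    """Youden's J statistic for one classification threshold."""
    return sensitivity + specificity - 1.0

# Score every candidate threshold and keep the one with the largest J.
scored = [(label, youden_j(se, sp)) for label, se, sp in rows]
best_label, best_j = max(scored, key=lambda item: item[1])

for label, j in scored:
    print(f"threshold {label}: J = {j:.4f}")
print(f"selected threshold range: {best_label}")
```
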

| Model Evaluation Criteria | BLR (Parametric Model) | NBLR (Nonparametric Model) | SBLR (Proposed Model) |
|---|---|---|---|
| Deviance | 55.3011 | 48.2016 | 41.4327 |
| p-value (Deviance) | 0.5391 | 0.5861 | 0.6281 |
| Press's Q | 18.0625 | 36 | 33.0625 |
| p-value (Press's Q) | 0.000 | 0.000 | 0.000 |
| McNemar Test | 0.0833 | 1.1250 | 0.4444 |
| p-value (McNemar Test) | 0.7728 | 0.2888 | 0.5050 |
| Brier Score | 0.1449 | 0.1191 | 0.1048 |
| Brier Skill Score | 0.4111 | 0.5161 | 0.5741 |
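
The Brier score is the mean squared difference between predicted probabilities and observed outcomes, and the Brier skill score rescales it against a reference forecast: BSS = 1 − BS / BS_ref. The sketch below assumes the reference is the constant base-rate forecast, which is a common choice but is not stated in the table; the function names and the toy data are hypothetical. It also back-solves the reference score implied by each (Brier score, skill score) pair reported above, which should be the same for all three models if they were evaluated on the same observations.

```python
def brier_score(y_true, p_pred):
    """Mean squared error between 0/1 outcomes and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def brier_skill_score(y_true, p_pred):
    """Skill relative to a constant base-rate forecast (an assumption here)."""
    base_rate = sum(y_true) / len(y_true)
    bs_ref = brier_score(y_true, [base_rate] * len(y_true))
    return 1.0 - brier_score(y_true, p_pred) / bs_ref

# Toy data for illustration only; not the CHD data set.
y_toy = [0, 0, 1, 1]
p_toy = [0.1, 0.2, 0.8, 0.9]
print(brier_score(y_toy, p_toy), brier_skill_score(y_toy, p_toy))

# (Brier score, Brier skill score) pairs copied from the table above.
reported = {
    "BLR": (0.1449, 0.4111),
    "NBLR": (0.1191, 0.5161),
    "SBLR": (0.1048, 0.5741),
}
# Rearranging BSS = 1 - BS / BS_ref gives BS_ref = BS / (1 - BSS).
implied_ref = {name: bs / (1.0 - bss) for name, (bs, bss) in reported.items()}
for name, ref in implied_ref.items():
    print(f"{name}: implied reference Brier score = {ref:.4f}")
```
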

| Methods | Training Sens. | Training Spec. | Training Acc. (%) | Training AUC | Testing Sens. | Testing Spec. | Testing Acc. (%) | Testing AUC | All Observations Sens. | All Observations Spec. | All Observations Acc. (%) | All Observations AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BLR (Parametric) | 0.75 | 0.86 | 81.25 | 0.876 | 0.50 | 0.75 | 62.50 | 0.80 | 0.69 | 0.84 | 77.5 | 0.861 |
| PS-NBLR (Nonparametric) | 0.93 | 0.83 | 87.50 | 0.915 | 0.50 | 0.625 | 56.25 | 0.7 | 0.88 | 0.81 | 85 | 0.892 |
| PS-SBLR (Semiparametric) | 0.83 | 0.89 | 85.94 | 0.892 | 0.75 | 0.75 | 75 | 0.81 | 0.82 | 0.86 | 83.75 | 0.899 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Chamidah, N.; Rifada, M.; Lestari, B.; Aydin, D.; Siregar, N.R.A.A. Penalized Spline Estimator for Semiparametric Binary Logistic Regression Model with Application to Coronary Heart Disease Risk Factors. Symmetry 2026, 18, 432. https://doi.org/10.3390/sym18030432

