Search Results (35)

Search Parameters:
Keywords = penalized regression techniques

49 pages, 4062 KB  
Article
Evaluation of a Non-Parametric Penalized Kaplan–Meier Estimator Under Interval-Censored Survival Data
by Kayakazi Chophela, Chioneso Show Marange and Akinwumi Sunday Odeyemi
Symmetry 2026, 18(3), 519; https://doi.org/10.3390/sym18030519 - 18 Mar 2026
Viewed by 392
Abstract
Interval-censored survival data arise frequently in biomedical and epidemiological studies where event times are observed only within observation intervals. Classical non-parametric estimators, such as the Kaplan–Meier (KM) estimator under imputation and the Turnbull estimator, often suffer from instability, irregular fluctuations, and overfitting when sample sizes are small or when the prevalence rate is low. Recent methodological developments, including smoothed and penalized approaches, have been proposed to improve stability and reduce estimation error in such settings. This study evaluates and benchmarks the finite-sample performance of a non-parametric penalized likelihood KM estimator under interval-censored data. The method is compared with the classical KM estimator using four imputation strategies: midpoint, regression, uniform, and multiple imputation. From a symmetry perspective, midpoint and uniform imputation preserve interval symmetry through deterministic and probabilistic mechanisms, respectively, whereas regression and multiple imputation intentionally introduce structural asymmetry to reflect data-driven risk heterogeneity and distributional uncertainty. To assess and benchmark the performance of the penalized KM estimator, an extensive Monte Carlo (MC) simulation study was conducted across varying sample sizes and prevalence rates using error-based metrics. The MC simulation results revealed that the non-parametric penalized KM estimator consistently outperforms the classical KM estimator in small samples across all prevalence rates. The gains are more pronounced under low prevalence rates, where the penalized KM estimator is superior for small to moderately sized samples (n = 40–100). Among the imputation techniques, regression and multiple imputation generally exhibited superior performance. A real data application further confirms these findings, demonstrating that the non-parametric penalized KM estimator yields more stable and accurate survival curves than the classical KM estimator in small samples.
(This article belongs to the Section Mathematics)
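As a concrete reference for the imputation baselines above, here is a minimal Python sketch, assuming simulated inspection intervals and the lifelines library: midpoint imputation (the symmetric, deterministic strategy) followed by a classical KM fit. The penalized estimator itself is not reproduced, and all data are illustrative.

```python
# Sketch: midpoint imputation for interval-censored data, then a classical
# Kaplan-Meier fit via lifelines. Interval bounds are simulated, not from
# the study; the penalized KM estimator is not implemented here.
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(42)
n = 60                                    # small sample, as in the study's focus
true_t = rng.exponential(scale=5.0, size=n)

# Each event is observed only within an inspection interval (L, R]; a missed
# final inspection yields right-censoring (R = inf).
left = np.floor(true_t)
right = left + 1.0
censored = rng.random(n) < 0.2
right[censored] = np.inf

# Midpoint imputation: a symmetric, deterministic choice within each interval.
durations = np.where(np.isinf(right), left, (left + right) / 2.0)
observed = ~np.isinf(right)

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=observed)
print(kmf.survival_function_.head())
```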

31 pages, 841 KB  
Article
Penalized Spline Estimator for Semiparametric Binary Logistic Regression Model with Application to Coronary Heart Disease Risk Factors
by Nur Chamidah, Marisa Rifada, Budi Lestari, Dursun Aydin and Naufal Ramadhan Al Akhwal Siregar
Symmetry 2026, 18(3), 432; https://doi.org/10.3390/sym18030432 - 28 Feb 2026
Viewed by 419
Abstract
In this study, we develop Semiparametric Binary Logistic Regression (SBLR), an extension of classical logistic regression that integrates both parametric and nonparametric components, allowing it to model linear and non-linear relationships simultaneously. To estimate the nonparametric component, a non-linear (sigmoid) curve, we use the penalized spline, a smoothing technique valued in the nonparametric approach for producing smooth, adaptive curves for fluctuating data. In this smoothing technique, selecting the optimal smoothing parameter plays an important role in fitting the model. This selection is commonly based on the minimum of the ordinary Cross-Validation (CV) or Generalized Cross-Validation (GCV) score. However, because the log-likelihood is non-quadratic, these criteria are not directly applicable: the CV and GCV curves can decline monotonically without ever attaining a minimum. We therefore use the Generalized Approximate Cross-Validation (GACV) criterion to address such cases, which distinguishes this work from previous studies that relied on CV or GCV. In an application to real data, we define an SBLR model of Coronary Heart Disease (CHD) risk factors that can be used for prediction and interpretation. The results demonstrate the efficacy of the proposed method in identifying critical non-linear thresholds for CHD risk factors; the model is statistically valid and highly effective for CHD risk prediction. These results could serve as the basis of an early warning system, specifically alerting individuals with moderate stress levels and dietary habits exceeding the identified thresholds to their heightened probability of developing CHD. In addition, this research aligns with Goal 3 of the Sustainable Development Goals (SDGs): reducing premature mortality from non-communicable diseases by 2030.
(This article belongs to the Section Mathematics)
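A rough sketch of the model's shape, assuming scikit-learn as a stand-in: a passthrough parametric column plus a spline-expanded nonparametric column, fit with ridge-penalized logistic regression in place of the paper's penalized spline with GACV selection. All variable names are illustrative, not the CHD study's.

```python
# Sketch of a semiparametric binary logistic fit: linear (parametric) part
# plus a spline basis (nonparametric) part, with an L2 penalty standing in
# for a penalized-spline roughness penalty. GACV is not implemented; the
# penalty strength is chosen by ordinary cross-validation instead.
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
n = 500
x_lin = rng.normal(size=(n, 1))             # parametric component
x_smooth = rng.uniform(0, 10, size=(n, 1))  # nonparametric component
logit = 0.8 * x_lin[:, 0] + np.sin(x_smooth[:, 0])   # non-linear truth
y = rng.random(n) < 1 / (1 + np.exp(-logit))

pre = ColumnTransformer([
    ("linear", "passthrough", [0]),
    ("spline", SplineTransformer(n_knots=8, degree=3), [1]),
])
model = make_pipeline(pre, LogisticRegressionCV(Cs=10, penalty="l2", cv=5))
model.fit(np.hstack([x_lin, x_smooth]), y)
print("training accuracy:", model.score(np.hstack([x_lin, x_smooth]), y))
```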

18 pages, 840 KB  
Article
Utilizing Machine Learning Techniques for Computer-Aided COVID-19 Screening Based on Clinical Data
by Honglun Xu, Andrews T. Anum, Michael Pokojovy, Sreenath Chalil Madathil, Yuxin Wen, Md Fashiar Rahman, Tzu-Liang (Bill) Tseng, Scott Moen and Eric Walser
COVID 2026, 6(1), 17; https://doi.org/10.3390/covid6010017 - 9 Jan 2026
Viewed by 546
Abstract
The COVID-19 pandemic has highlighted the importance of rapid clinical decision-making to facilitate the efficient usage of healthcare resources. Over the past decade, machine learning (ML) has caused a tectonic shift in healthcare, empowering data-driven prediction and decision-making. Recent research demonstrates how ML was used to respond to the COVID-19 pandemic. This paper puts forth new computer-aided COVID-19 screening techniques using six classes of ML algorithms (including penalized logistic regression, random forest, artificial neural networks, and support vector machines) and evaluates their performance when applied to a real-world clinical dataset containing patients’ demographic information and vital indices (such as sex, ethnicity, age, pulse, pulse oximetry, respirations, temperature, BP systolic, BP diastolic, and BMI), as well as ICD-10 codes of existing comorbidities, as attributes to predict a given patient’s risk of having COVID-19. Variable importance metrics computed using a random forest model were used to reduce the set of predictors to the thirteen most important. Using prediction accuracy, sensitivity, specificity, and AUC as performance metrics, the performance of the various ML methods was assessed, and the best model was selected. Our proposed model can be used in clinical settings as a rapid and accessible COVID-19 screening technique.
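The screening pipeline's overall shape can be sketched as follows, with synthetic data standing in for the (non-public) clinical dataset: random-forest importances select the top thirteen predictors, and candidate classifiers are compared by cross-validated AUC.

```python
# Sketch of the pipeline's shape: RF variable importance -> keep top 13
# predictors -> compare classifiers by cross-validated AUC. Synthetic data
# replaces the clinical dataset; hyperparameters are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=40, n_informative=15,
                           random_state=1)

rf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X, y)
top13 = np.argsort(rf.feature_importances_)[::-1][:13]
X_sel = X[:, top13]

models = {
    "penalized logistic": LogisticRegression(penalty="l2", max_iter=2000),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=1),
    "SVM": SVC(probability=True),
}
for name, m in models.items():
    auc = cross_val_score(m, X_sel, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC = {auc:.3f}")
```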

31 pages, 511 KB  
Article
Shrinkage Approaches for Ridge-Type Estimators Under Multicollinearity
by Marwan Al-Momani, Bahadır Yüzbaşı, Mohammad Saleh Bataineh, Rihab Abdallah and Athifa Moideenkutty
Mathematics 2025, 13(22), 3733; https://doi.org/10.3390/math13223733 - 20 Nov 2025
Cited by 1 | Viewed by 657
Abstract
Multicollinearity is a common issue in regression analyses that occurs when some predictor variables are highly correlated, leading to unstable least squares estimates of model parameters. Various estimation strategies have been proposed to address this problem. In this study, we enhanced a ridge-type estimator by incorporating pretest and shrinkage techniques. We conducted an analytical comparison to evaluate the performance of the proposed estimators in terms of their bias, quadratic risk, and numerical performance using both simulated and real data. Additionally, we assessed several penalization methods and three machine learning algorithms to facilitate a comprehensive comparison. Our results demonstrate that the proposed estimators outperformed the standard ridge-type estimator with respect to the mean squared error of the simulated data and the mean squared prediction error of two real data applications.
(This article belongs to the Special Issue Advances in Statistical Methods with Applications)
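The pretest/shrinkage idea can be illustrated with a toy numpy sketch; the submodel, ridge penalty, and Stein weight below are illustrative assumptions, not the paper's estimator or its risk analysis.

```python
# Rough sketch of shrinkage on top of ridge regression: pull the full-model
# ridge estimate toward a restricted (submodel) estimate via a positive-part
# Stein weight driven by a Wald-type statistic. All choices are illustrative.
import numpy as np

rng = np.random.default_rng(7)
n, p = 100, 8
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)      # induce multicollinearity
beta_true = np.array([2.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

lam = 1.0
def ridge(Xm, ym):
    return np.linalg.solve(Xm.T @ Xm + lam * np.eye(Xm.shape[1]), Xm.T @ ym)

active = [0, 2]                     # candidate submodel (assumed, not tested)
b_full = ridge(X, y)
b_restr = np.zeros(p)
b_restr[active] = ridge(X[:, active], y)

diff = b_full - b_restr             # gap between full and restricted fits
T = n * float(diff @ diff) / np.var(y - X @ b_restr)   # Wald-type statistic
k = p - len(active)                 # number of restricted coefficients
w = max(0.0, 1.0 - (k - 2) / T)     # positive-part Stein weight
b_shrink = b_restr + w * diff
print("shrinkage weight:", round(w, 3))
print("positive-shrinkage estimate:", np.round(b_shrink, 3))
```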

32 pages, 1288 KB  
Article
Random Forest Adaptation for High-Dimensional Count Regression
by Oyebayo Ridwan Olaniran, Saidat Fehintola Olaniran, Ali Rashash R. Alzahrani, Nada MohammedSaeed Alharbi and Asma Ahmad Alzahrani
Mathematics 2025, 13(18), 3041; https://doi.org/10.3390/math13183041 - 21 Sep 2025
Cited by 2 | Viewed by 1760
Abstract
The analysis of high-dimensional count data presents a unique set of challenges, including overdispersion, zero-inflation, and complex nonlinear relationships that traditional generalized linear models and standard machine learning approaches often fail to adequately address. This study introduces and validates a novel Random Forest framework specifically developed for high-dimensional Poisson and Negative Binomial regression, designed to overcome the limitations of existing methods. Through comprehensive simulations and a real-world genomic application to the Norwegian Mother and Child Cohort Study, we demonstrate that the proposed methods achieve superior predictive accuracy, quantified by lower root mean squared error and deviance, and, critically, produce exceptionally stable and interpretable feature selections. Our theoretical and empirical results show that these distribution-optimized ensembles significantly outperform both penalized-likelihood techniques and naive-transformation-based ensembles in balancing statistical robustness with biological interpretability. The study concludes that the proposed frameworks provide a crucial methodological advancement, offering a powerful and reliable tool for extracting meaningful insights from complex count data in fields ranging from genomics to public health.
(This article belongs to the Special Issue Statistics for High-Dimensional Data)
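A hedged stand-in for the distribution-optimized ensembles: scikit-learn's random forest with a Poisson split criterion versus a naive log-transform forest, compared by Poisson deviance on simulated counts. The authors' own Poisson/Negative Binomial framework is not implemented here.

```python
# Sketch: a count-aware random forest via scikit-learn's "poisson" split
# criterion, versus a naive log1p-transform forest, scored by Poisson
# deviance. Data and settings are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_poisson_deviance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n, p = 800, 50
X = rng.normal(size=(n, p))
mu = np.exp(0.6 * X[:, 0] - 0.4 * X[:, 1])
y = rng.poisson(mu)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

rf_pois = RandomForestRegressor(criterion="poisson", random_state=3).fit(X_tr, y_tr)
rf_naive = RandomForestRegressor(random_state=3).fit(X_tr, np.log1p(y_tr))

# Deviance needs strictly positive predictions, hence the clipping.
print("Poisson-criterion deviance:",
      mean_poisson_deviance(y_te, np.clip(rf_pois.predict(X_te), 1e-6, None)))
print("log-transform deviance:",
      mean_poisson_deviance(y_te, np.clip(np.expm1(rf_naive.predict(X_te)),
                                          1e-6, None)))
```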

14 pages, 845 KB  
Article
Assessment of Ultrasound-Controlled Diagnostic Methods for Thyroid Lesions and Their Associated Costs in a Tertiary University Hospital in Spain
by Lelia Ruiz-Hernández, Carmen Rosa Hernández-Socorro, Pedro Saavedra, María de la Vega-Pérez and Sergio Ruiz-Santana
J. Clin. Med. 2025, 14(15), 5551; https://doi.org/10.3390/jcm14155551 - 6 Aug 2025
Cited by 1 | Viewed by 2186
Abstract
Background/Objectives: Accurate diagnosis of thyroid cancer is critical but challenging due to overlapping ultrasound (US) features of benign and malignant nodules. This study aimed to evaluate the diagnostic performance of non-invasive and minimally invasive US techniques, including B-mode US, shear wave elastography (SWE), color Doppler, superb microvascular imaging (SMI), and TI-RADS, in patients with suspected thyroid lesions and to assess their reliability and cost-effectiveness compared with fine needle aspiration (FNA) biopsy. Methods: A prospective, single-center study (October 2023–February 2025) enrolled 300 patients with suspected thyroid cancer at a Spanish tertiary hospital. Of these, 296 patients with confirmed diagnoses underwent B-mode US, SWE, Doppler, SMI, and TI-RADS scoring, followed by US-guided FNA and Bethesda System cytopathology. Lasso-penalized logistic regression and a bootstrap analysis (1000 replicates) were used to develop diagnostic models. A utility function was used to balance diagnostic reliability and cost. Results: Thyroid cancer was diagnosed in 25 patients (8.3%). Elastography combined with SMI achieved the highest diagnostic performance (Youden index: 0.69; NPV: 97.4%; PPV: 69.1%), outperforming Doppler-only models. Intranodular vascularization was a significant risk factor, while peripheral vascularization was protective. The utility function showed that, when prioritizing cost, elastography plus SMI was cost-effective (α < 0.716) compared with FNA. Conclusions: Elastography plus SMI offers a reliable, cost-effective diagnostic rule for thyroid cancer. The utility function aids clinicians in balancing reliability and cost. SMI and the rule’s generalizability need to be validated in higher-prevalence settings.
(This article belongs to the Section Endocrinology & Metabolism)
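The modeling step can be sketched as follows, with synthetic features standing in for the ultrasound variables (SWE, SMI, etc.): a Lasso-penalized logistic fit whose Youden index is bootstrapped over 1000 replicates. Note this sketch bootstraps only the evaluation, not the model development.

```python
# Sketch: Lasso-penalized logistic regression plus a bootstrap estimate of
# the Youden index (sensitivity + specificity - 1). Data are synthetic and
# class-imbalanced to mimic a low-prevalence setting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=300, n_features=10, weights=[0.9, 0.1],
                           random_state=5)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)

rng = np.random.default_rng(5)
youden = []
for _ in range(1000):                       # 1000 bootstrap replicates
    idx = rng.integers(0, len(y), len(y))
    pred = clf.predict(X[idx])              # default 0.5 threshold, for brevity
    tn, fp, fn, tp = confusion_matrix(y[idx], pred, labels=[0, 1]).ravel()
    youden.append(tp / (tp + fn) + tn / (tn + fp) - 1)
print("bootstrap Youden index: %.2f (95%% CI %.2f-%.2f)"
      % (np.mean(youden), np.percentile(youden, 2.5),
         np.percentile(youden, 97.5)))
```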

20 pages, 774 KB  
Article
Robust Variable Selection via Bayesian LASSO-Composite Quantile Regression with Empirical Likelihood: A Hybrid Sampling Approach
by Ruisi Nan, Jingwei Wang, Hanfang Li and Youxi Luo
Mathematics 2025, 13(14), 2287; https://doi.org/10.3390/math13142287 - 16 Jul 2025
Viewed by 1047
Abstract
Since the advent of composite quantile regression (CQR), its inherent robustness has established it as a pivotal methodology for high-dimensional data analysis. High-dimensional outlier contamination refers to data scenarios where the number of observed dimensions (p) is large relative to the sample size (n) (e.g., p/n > 0.1) and extreme outliers occur in the response variables or covariates. Traditional penalized regression techniques, however, exhibit notable vulnerability to data outliers during high-dimensional variable selection, often leading to biased parameter estimates and compromised resilience. To address this critical limitation, we propose a novel empirical likelihood (EL)-based variable selection framework that integrates a Bayesian LASSO penalty within the composite quantile regression framework. By constructing a hybrid sampling mechanism that incorporates the Expectation–Maximization (EM) algorithm and Metropolis–Hastings (M-H) algorithm within the Gibbs sampling scheme, this approach effectively tackles variable selection in high-dimensional settings with outlier contamination. This design enables simultaneous optimization of regression coefficients and penalty parameters, circumventing the need for ad hoc selection of optimal penalty parameters, a long-standing challenge in conventional LASSO estimation. Moreover, the proposed method imposes no restrictive assumptions on the distribution of random errors in the model. Through Monte Carlo simulations under outlier interference and empirical analysis of two U.S. house price datasets, we demonstrate that the new approach significantly enhances variable selection accuracy, reduces estimation bias for key regression coefficients, and exhibits robust resistance to data outlier contamination.
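For orientation, the frequentist objective underlying the paper's Bayesian machinery can be written down and minimized directly on a toy problem; the sampler itself (EM plus M-H within Gibbs) is beyond a short sketch, and the quantile grid and penalty level below are illustrative.

```python
# Sketch of an L1-penalized composite quantile regression (CQR) objective:
# one intercept per quantile level, common slopes, check-loss summed across
# levels, minimized directly for a small problem (derivative-free Powell).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(11)
n, p = 120, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, 0.0, -2.0, 0.0]) + rng.standard_t(df=2, size=n)  # heavy tails

taus = np.arange(1, 10) / 10.0        # K = 9 composite quantile levels
lam = 1.0                             # illustrative penalty level

def cqr_objective(theta):
    b = theta[:len(taus)]             # per-level intercepts
    beta = theta[len(taus):]          # slopes shared across levels
    u = y[None, :] - b[:, None] - (X @ beta)[None, :]
    check = np.where(u >= 0, taus[:, None] * u, (taus[:, None] - 1) * u)
    return check.sum() + lam * np.abs(beta).sum()

res = minimize(cqr_objective, np.zeros(len(taus) + p), method="Powell")
print("estimated slopes:", np.round(res.x[len(taus):], 2))
```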

26 pages, 6617 KB  
Article
Penalty Strategies in Semiparametric Regression Models
by Ayuba Jack Alhassan, S. Ejaz Ahmed, Dursun Aydin and Ersin Yilmaz
Math. Comput. Appl. 2025, 30(3), 54; https://doi.org/10.3390/mca30030054 - 12 May 2025
Viewed by 2572
Abstract
This study provides a comprehensive evaluation of six penalty estimation strategies for partially linear regression models (PLRMs), focusing on their performance in the presence of multicollinearity and their ability to handle both parametric and nonparametric components. The methods under consideration include Ridge regression, Lasso, Adaptive Lasso (aLasso), smoothly clipped absolute deviation (SCAD), ElasticNet, and minimax concave penalty (MCP). In addition to these established methods, we also incorporate Stein-type shrinkage estimation techniques, namely the standard and positive shrinkage estimators, and assess their effectiveness in this context. To estimate the PLRMs, we consider a kernel smoothing technique grounded in penalized least squares. Our investigation involves a theoretical analysis of the estimators’ asymptotic properties and a detailed simulation study designed to compare their performance under a variety of conditions, including different sample sizes, numbers of predictors, and levels of multicollinearity. The simulation results reveal that aLasso and the shrinkage estimators, particularly the positive shrinkage estimator, consistently outperform the other methods in terms of Mean Squared Error (MSE) relative efficiency (RE), especially when the sample size is small and multicollinearity is high. Furthermore, we present a real data analysis using the Hitters dataset to demonstrate the applicability of these methods in a practical setting. The results of the real data analysis align with the simulation findings, highlighting the superior predictive accuracy of aLasso and the shrinkage estimators in the presence of multicollinearity. The findings of this study offer valuable insights into the strengths and limitations of these penalty and shrinkage strategies, guiding their application in future research and practice involving semiparametric regression.
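One common way to fit such a model, sketched below under simplifying assumptions, is Robinson/Speckman-style residualization: kernel-smooth y and X on the nonparametric covariate, then apply a penalty method to the residualized parametric part (Lasso here; ridge, SCAD, or MCP would slot in the same way). This is a stand-in for the paper's penalized-least-squares kernel machinery.

```python
# Sketch of a partially linear fit: Nadaraya-Watson smoothing removes the
# nonparametric trend in t from y and X, then Lasso estimates the parametric
# coefficients on the residuals. Bandwidth and alpha are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 300, 6
t = np.sort(rng.uniform(0, 1, n))
X = rng.normal(size=(n, p))
y = (X @ np.array([1.0, 0.0, -1.5, 0.0, 0.0, 0.5])
     + np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=n))

def nw_smooth(t, v, h=0.05):
    # Nadaraya-Watson smoother of v against t with a Gaussian kernel.
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
    denom = w.sum(axis=1)
    return (w @ v) / denom if v.ndim == 1 else (w @ v) / denom[:, None]

y_res = y - nw_smooth(t, y)           # residualize the response
X_res = X - nw_smooth(t, X)           # residualize each predictor

beta_hat = Lasso(alpha=0.05).fit(X_res, y_res).coef_
print("parametric estimates:", np.round(beta_hat, 2))
```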

33 pages, 7879 KB  
Article
Performance Evaluation of Machine Learning Models for Predicting Energy Consumption and Occupant Dissatisfaction in Buildings
by Haidar Hosamo and Silvia Mazzetto
Buildings 2025, 15(1), 39; https://doi.org/10.3390/buildings15010039 - 26 Dec 2024
Cited by 30 | Viewed by 5448
Abstract
This study evaluates the performance of 15 machine learning models for predicting energy consumption (30–100 kWh/m²·year) and occupant dissatisfaction (Percentage of Dissatisfied, PPD: 6–90%), key metrics for optimizing building performance. Ten evaluation metrics, including Mean Absolute Error (MAE, average prediction error), Root Mean Squared Error (RMSE, penalizing large errors), and the coefficient of determination (R², variance explained by the model), are used. XGBoost achieves the highest accuracy, with an energy MAE of 1.55 kWh/m²·year and a PPD MAE of 3.14%, alongside R² values of 0.99 and 0.97, respectively. While these metrics highlight XGBoost’s superiority, its margin of improvement over LightGBM (energy MAE: 2.35 kWh/m²·year, PPD MAE: 3.89%) is context-dependent, suggesting its application in high-precision scenarios. ANN excelled at PPD predictions, achieving the lowest MAE (1.55%) and Mean Absolute Percentage Error (MAPE: 4.97%), demonstrating its ability to model complex nonlinear relationships. This nonlinear modeling advantage contrasts with LightGBM’s balance of speed and accuracy, making it suitable for computationally constrained tasks. In contrast, traditional models like linear regression and KNN exhibit high errors (e.g., energy MAE: 17.56 kWh/m²·year, PPD MAE: 17.89%), underscoring their limitations with respect to capturing the complexities of building performance datasets. The results indicate that advanced methods like XGBoost and ANN are particularly effective owing to their ability to model intricate relationships and manage high-dimensional data. Future research should validate these findings with diverse real-world datasets, including those representing varying building types and climates. Hybrid models combining the interpretability of linear methods with the precision of ensemble or neural models should be explored. Additionally, integrating these machine learning techniques with digital twin platforms could address real-time optimization challenges, including dynamic occupant behavior and time-dependent energy consumption.
(This article belongs to the Section Building Energy, Physics, Environment, and Systems)
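The comparison protocol reduces to fitting each model on a common split and reporting MAE, RMSE, and R²; below is a minimal sketch with xgboost and a linear baseline on synthetic data standing in for the building-simulation dataset.

```python
# Sketch of the evaluation protocol: train/test split, then MAE, RMSE, and
# R^2 for an XGBoost model versus a linear baseline. Data are synthetic.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

for name, model in [("XGBoost", XGBRegressor(n_estimators=400, random_state=4)),
                    ("linear regression", LinearRegression())]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: MAE={mean_absolute_error(y_te, pred):.2f} "
          f"RMSE={mean_squared_error(y_te, pred) ** 0.5:.2f} "
          f"R2={r2_score(y_te, pred):.3f}")
```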

16 pages, 685 KB  
Article
Predicting Clinical Outcomes in COVID-19 and Pneumonia Patients: A Machine Learning Approach
by Kaida Cai, Zhengyan Wang, Xiaofang Yang, Wenzhi Fu and Xin Zhao
Viruses 2024, 16(10), 1624; https://doi.org/10.3390/v16101624 - 17 Oct 2024
Cited by 1 | Viewed by 2088
Abstract
In the clinical diagnosis of pneumonia, particularly during the COVID-19 pandemic, individuals who progress to a critical stage requiring mechanical ventilation are classified as mechanically ventilated critically ill patients. Accurately predicting the discharge outcomes for this specific cohort, especially those with COVID-19, is of paramount clinical importance. Missing data, a common issue in medical research, can significantly impact the validity of analyses. In this work, we address this challenge by employing two missing data imputation techniques: multiple imputation and missForest, to enhance data completeness. Additionally, we utilize the smoothly clipped absolute deviation (SCAD) penalized logistic regression method to select significant features. Our real data analysis compares the predictive performances of extreme learning machines, random forests, support vector machines, and XGBoost using 10-fold cross-validation. The results consistently show that XGBoost outperforms the other methods in predicting discharge outcomes, making it a reliable tool for clinical decision-making in the treatment of severe pneumonia, including COVID-19 cases. Within this context, the random forest imputation method generally enhances performance, underscoring its effectiveness in managing missing data compared to multiple imputation.
(This article belongs to the Section Coronaviruses)
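A sketch of the preprocessing chain, under the assumption that a random-forest-based IterativeImputer approximates missForest and with an L1 penalty standing in for SCAD (which scikit-learn does not provide); XGBoost would then be fit on the selected features.

```python
# Sketch: missForest-style imputation via IterativeImputer with a random
# forest estimator, then penalized logistic feature selection (L1 as a
# stand-in for SCAD). Missingness is injected at random for illustration.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=400, n_features=15, random_state=8)
mask = np.random.default_rng(8).random(X.shape) < 0.1   # 10% missing at random
X_miss = np.where(mask, np.nan, X)

imputer = IterativeImputer(estimator=RandomForestRegressor(n_estimators=50),
                           max_iter=5, random_state=8)
X_imp = imputer.fit_transform(X_miss)

sel = LogisticRegression(penalty="l1", solver="liblinear", C=0.3).fit(X_imp, y)
print("features kept:", np.flatnonzero(sel.coef_[0]))
```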

22 pages, 3573 KB  
Article
The Estimating Parameter and Number of Knots for Nonparametric Regression Methods in Modelling Time Series Data
by Autcha Araveeporn
Modelling 2024, 5(4), 1413-1434; https://doi.org/10.3390/modelling5040073 - 5 Oct 2024
Cited by 6 | Viewed by 2603
Abstract
This research aims to explore and compare several nonparametric regression techniques, including smoothing splines, natural cubic splines, B-splines, and penalized spline methods. The focus is on estimating parameters and determining the optimal number of knots to forecast cyclic and nonlinear patterns, applying these methods to simulated and real-world datasets, such as Thailand’s coal import data. Cross-validation techniques are used to control and specify the number of knots, ensuring the curve fits the data points accurately. The study applies nonparametric regression to forecast time series data with cyclic patterns and nonlinear forms in the dependent variable, treating the independent variable as sequential data. Simulated data featuring cyclical patterns resembling economic cycles and nonlinear data with complex equations to capture variable interactions are used for experimentation. These simulations include variations in standard deviations and sample sizes. The evaluation criterion for the simulated data is the minimum average mean square error (MSE), which indicates the most efficient parameter estimation. For the real data, monthly coal import data from Thailand is used to estimate the parameters of the nonparametric regression model, with the MSE as the evaluation metric. The performance of these techniques is also assessed in forecasting future values, where the mean absolute percentage error (MAPE) is calculated. Among the methods, the natural cubic spline consistently yields the lowest average mean square error across all standard deviations and sample sizes in the simulated data. While the natural cubic spline excels in parameter estimation, B-splines show strong performance in forecasting future values.
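Knot selection by cross-validation can be sketched with a B-spline basis and a grid of knot counts, with synthetic cyclic data standing in for the coal-import series; the grid and bandwidth of knot counts are illustrative.

```python
# Sketch: choose the number of B-spline knots by 5-fold CV mean squared
# error on a cyclic signal. Shuffled folds are used for simplicity, which
# ignores the temporal ordering a real forecasting study would respect.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(9)
t = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(2 * np.pi * t[:, 0] / 3) + 0.2 * rng.normal(size=200)  # cyclic signal

cv = KFold(n_splits=5, shuffle=True, random_state=9)
best = None
for n_knots in range(4, 21, 2):
    model = make_pipeline(SplineTransformer(n_knots=n_knots, degree=3),
                          LinearRegression())
    mse = -cross_val_score(model, t, y, cv=cv,
                           scoring="neg_mean_squared_error").mean()
    if best is None or mse < best[1]:
        best = (n_knots, mse)
print("selected number of knots: %d (CV MSE %.4f)" % best)
```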

16 pages, 3921 KB  
Article
Predicting Antidiabetic Peptide Activity: A Machine Learning Perspective on Type 1 and Type 2 Diabetes
by Kaida Cai, Zhe Zhang, Wenzhou Zhu, Xiangwei Liu, Tingqing Yu and Wang Liao
Int. J. Mol. Sci. 2024, 25(18), 10020; https://doi.org/10.3390/ijms251810020 - 18 Sep 2024
Cited by 3 | Viewed by 2750
Abstract
Diabetes mellitus (DM) presents a critical global health challenge, characterized by persistent hyperglycemia and associated with substantial economic and health-related burdens. This study employs advanced machine-learning techniques to improve the prediction and classification of antidiabetic peptides, with a particular focus on differentiating those effective against T1DM from those targeting T2DM. We integrate feature selection with analysis methods, including logistic regression, support vector machines (SVM), and adaptive boosting (AdaBoost), to classify antidiabetic peptides based on key features. Feature selection through the Lasso-penalized method identifies critical peptide characteristics that significantly influence antidiabetic activity, thereby establishing a robust foundation for future peptide design. A comprehensive evaluation of logistic regression, SVM, and AdaBoost shows that AdaBoost consistently outperforms the other methods, making it the most effective approach for classifying antidiabetic peptides. This research underscores the potential of machine learning in the systematic evaluation of bioactive peptides, contributing to the advancement of peptide-based therapies for diabetes management.
(This article belongs to the Special Issue Machine Learning in Disease Diagnosis and Treatment)
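The two-stage design maps naturally onto a scikit-learn pipeline; the features and hyperparameters below are illustrative stand-ins for the peptide descriptors.

```python
# Sketch of the two-stage design: Lasso-penalized feature selection feeding
# an AdaBoost classifier, evaluated by cross-validation. Data are synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, n_features=60, n_informative=10,
                           random_state=6)
pipe = make_pipeline(
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.2)),
    AdaBoostClassifier(n_estimators=200, random_state=6),
)
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean().round(3))
```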

23 pages, 426 KB  
Article
A Penalized Empirical Likelihood Approach for Estimating Population Sizes under the Negative Binomial Regression Model
by Yulu Ji and Yang Liu
Mathematics 2024, 12(17), 2674; https://doi.org/10.3390/math12172674 - 28 Aug 2024
Viewed by 1979
Abstract
In capture–recapture experiments, the presence of overdispersion and heterogeneity necessitates the use of the negative binomial regression model for inferring population sizes. However, within this model, existing methods based on likelihood and ratio regression for estimating the dispersion parameter often face boundary and nonidentifiability issues. These problems can result in nonsensically large point estimates and unbounded upper limits of confidence intervals for the population size. We present a penalized empirical likelihood technique for solving these two problems by imposing a half-normal prior on the population size. Based on the proposed approach, a maximum penalized empirical likelihood estimator with asymptotic normality and a penalized empirical likelihood ratio statistic with asymptotic chi-square distribution are derived. To improve numerical performance, we present an effective expectation-maximization (EM) algorithm. In the M-step, optimization for the model parameters could be achieved by fitting a standard negative binomial regression model via the R basic function glm.nb(). This approach ensures the convergence and reliability of the numerical algorithm. Using simulations, we analyze several synthetic datasets to illustrate three advantages of our methods in finite-sample cases: complete mitigation of the boundary problem, more efficient maximum penalized empirical likelihood estimates, and more precise penalized empirical likelihood ratio interval estimates compared to the estimates obtained without penalty. These advantages are further demonstrated in a case study estimating the abundance of black bears (Ursus americanus) at the U.S. Army’s Fort Drum Military Installation in northern New York.
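The penalty's role can be seen in a toy that is deliberately simpler than the paper's negative binomial regression model: a zero-truncated Poisson capture model whose log-likelihood in N is penalized by a half-normal log-prior and maximized over a grid. The prior scale is an arbitrary illustrative choice.

```python
# Toy: penalized likelihood for a population size N under Poisson captures.
# Only individuals captured at least once are observed; the half-normal
# log-prior -N^2 / (2 sigma^2) damps runaway estimates of N.
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(13)
N_true, lam_true = 400, 0.8
counts = rng.poisson(lam_true, N_true)
y = counts[counts > 0]                     # only captured individuals are seen
n = len(y)

def loglik(N, lam):
    # Binomial coefficient (up to a constant in N), uncaptured mass, and
    # Poisson terms for the captured counts.
    return (gammaln(N + 1) - gammaln(N - n + 1)
            - (N - n) * lam
            + np.sum(y * np.log(lam) - lam - gammaln(y + 1)))

sigma = 1000.0                             # half-normal prior scale (illustrative)
candidates = ((loglik(N, lam) - N ** 2 / (2 * sigma ** 2), N)
              for N in range(n, 3001, 5)
              for lam in np.linspace(0.3, 1.5, 25))
best = max(candidates)
print("n observed:", n, "| penalized estimate of N:", best[1])
```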

12 pages, 415 KB  
Article
Machine Learning-Based Risk Prediction of Discharge Status for Sepsis
by Kaida Cai, Yuqing Lou, Zhengyan Wang, Xiaofang Yang and Xin Zhao
Entropy 2024, 26(8), 625; https://doi.org/10.3390/e26080625 - 25 Jul 2024
Cited by 1 | Viewed by 2118
Abstract
As a severe inflammatory response syndrome, sepsis presents complex challenges in predicting patient outcomes due to its unclear pathogenesis and the unstable discharge status of affected individuals. In this study, we develop a machine learning-based method for predicting the discharge status of sepsis patients, aiming to improve treatment decisions. To enhance the robustness of our analysis against outliers, we incorporate robust statistical methods, specifically the minimum covariance determinant technique. We utilize the random forest imputation method to effectively manage and impute missing data. For feature selection, we employ Lasso penalized logistic regression, which efficiently identifies significant predictors and reduces model complexity, setting the stage for the application of more complex predictive methods. Our predictive analysis incorporates multiple machine learning methods, including random forest, support vector machine, and XGBoost. We compare the prediction performance of these methods with Lasso penalized logistic regression to identify the most effective approach. Each method’s performance is rigorously evaluated through ten iterations of 10-fold cross-validation to ensure robust and reliable results. Our comparative analysis reveals that XGBoost surpasses the other models, demonstrating its exceptional capability to navigate the complexities of sepsis data effectively.
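The three ingredients combine as sketched below, with synthetic data replacing the sepsis cohort: the minimum covariance determinant flags outlying rows, Lasso-penalized logistic regression selects features, and XGBoost is scored by 10-fold AUC. The 5% outlier cutoff is an illustrative assumption.

```python
# Sketch: MCD-based outlier flagging, Lasso feature selection, then XGBoost
# evaluated by 10-fold cross-validated AUC. All thresholds are illustrative.
import numpy as np
from sklearn.covariance import MinCovDet
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=10)

# Drop the 5% most outlying rows by robust (MCD) Mahalanobis distance.
mcd = MinCovDet(random_state=10).fit(X)
dist = mcd.mahalanobis(X)
keep = dist < np.quantile(dist, 0.95)
X_c, y_c = X[keep], y[keep]

sel = LogisticRegression(penalty="l1", solver="liblinear", C=0.3).fit(X_c, y_c)
cols = np.flatnonzero(sel.coef_[0])

auc = cross_val_score(XGBClassifier(eval_metric="logloss"), X_c[:, cols], y_c,
                      cv=10, scoring="roc_auc").mean()
print("selected features:", cols, "| 10-fold AUC: %.3f" % auc)
```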

20 pages, 2394 KB  
Article
Enhanced Model Predictions through Principal Components and Average Least Squares-Centered Penalized Regression
by Adewale F. Lukman, Emmanuel T. Adewuyi, Ohud A. Alqasem, Mohammad Arashi and Kayode Ayinde
Symmetry 2024, 16(4), 469; https://doi.org/10.3390/sym16040469 - 12 Apr 2024
Cited by 3 | Viewed by 1780
Abstract
We address the estimation of regression parameters for the ill-conditioned predictive linear model in this study. Traditional least squares methods often encounter challenges in yielding reliable results when there is multicollinearity. Therefore, we employ a better shrinkage method, average least squares-centered penalized regression (ALPR), as it offers a more efficient approach for handling multicollinearity than ridge regression. Additionally, we integrate ALPR with the principal component (PC) dimension reduction method for enhanced performance. We compared the proposed PCALPR estimation technique with existing ones for ill-conditioned problems through comprehensive simulations and real-life data analyses using the mean squared error. This integration results in superior model performance compared to other methods, highlighting the potential of combining dimensionality reduction techniques with penalized regression for enhanced model predictions.
(This article belongs to the Section Mathematics)
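Since ALPR's exact form is not given in this listing, the sketch below keeps only the architecture, a principal-component reduction followed by a penalized fit, with ordinary ridge regression standing in for ALPR.

```python
# Sketch of the PC-plus-penalty architecture: PCA reduces an ill-conditioned
# design, then a penalized regression (ridge, as a stand-in for ALPR) is fit
# on the component scores and scored by test MSE.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(12)
n, p = 200, 15
Z = rng.normal(size=(n, 3))
X = Z @ rng.normal(size=(3, p)) + 0.05 * rng.normal(size=(n, p))  # ill-conditioned
y = X @ rng.normal(size=p) + rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=12)
model = make_pipeline(PCA(n_components=0.99),   # keep 99% of variance
                      Ridge(alpha=1.0)).fit(X_tr, y_tr)
print("test MSE:", round(mean_squared_error(y_te, model.predict(X_te)), 3))
```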
