Advances in Machine Learning, High-Dimensional Inference, Shrinkage Estimation, and Model Validation

A special issue of Stats (ISSN 2571-905X). This special issue belongs to the section "Applied Statistics and Machine Learning Methods".

Deadline for manuscript submissions: 25 March 2026

Special Issue Editor


Prof. Dr. B. M. Golam Kibria
Guest Editor
Department of Mathematics and Statistics, Florida International University, Miami, FL 33199, USA
Interests: biostatistics; computational statistics; environmental statistics; distribution theory; pre-test and shrinkage estimation; predictive inference; ridge regression; statistical inference; simulation studies

Special Issue Information

Dear Colleagues,

The landscape of modern statistics is rapidly evolving due to the explosive growth of high-dimensional data and the increasing integration of machine learning (ML) methodologies into applied and theoretical statistical research. These developments have necessitated a rethinking of classical estimation techniques, model assessment strategies, and validation frameworks, particularly under complex, high-dimensional settings. This Special Issue aims to bring together high-quality research contributions that explore theoretical advances, methodological developments, and practical applications in the following intersecting areas:

  • Machine learning and statistical learning theory.
  • High-dimensional data analysis.
  • Shrinkage and regularization techniques (e.g., Lasso, Ridge, and Elastic Net).
  • Model selection and validation strategies.
  • Applications in genomics, finance, healthcare, and engineering.
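
As a quick orientation for readers, the ridge shrinkage idea behind techniques such as Lasso, Ridge, and Elastic Net can be sketched in a few lines of NumPy (a minimal, self-contained illustration on simulated data; the dimensions and penalty values are arbitrary and not tied to any paper in this issue):

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge estimate: solve (X'X + alpha*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n, p = 30, 100                       # "large p, small n": 100 predictors, 30 observations
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0                  # only five predictors are truly active
y = X @ beta_true + rng.standard_normal(n)

# Larger penalties shrink the coefficient vector toward zero
norms = [np.linalg.norm(ridge_fit(X, y, a)) for a in (0.1, 1.0, 10.0)]
```

Lasso and Elastic Net replace or mix the squared penalty with an absolute-value penalty, which additionally sets many coefficients exactly to zero.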

We intend to provide a platform for researchers to showcase innovative work that pushes the frontiers of statistical methodology while maintaining a strong link to real-world data and empirical validation.

This Special Issue will provide a focused venue for the dissemination of cutting-edge research at the intersection of statistical theory, machine learning, and high-dimensional inference. It will not only advance scholarly dialogue in these domains but also guide practitioners in applying robust, validated models to complex data challenges.

We hope this Special Issue will be a valuable contribution to Stats.

We look forward to receiving your contributions.

Prof. Dr. B. M. Golam Kibria
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Stats is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • high-dimensional data
  • machine learning (ML) methodologies
  • theoretical statistical research

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)


Research

19 pages, 1922 KB  
Article
Validated Transfer Learning Peters–Belson Methods for Survival Analysis: Ensemble Machine Learning Approaches with Overfitting Controls for Health Disparity Decomposition
by Menglu Liang and Yan Li
Stats 2025, 8(4), 114; https://doi.org/10.3390/stats8040114 - 10 Dec 2025
Abstract
Background: Health disparities research increasingly relies on complex survey data to understand survival differences between population subgroups. While Peters–Belson decomposition provides a principled framework for distinguishing disparities explained by measured covariates from unexplained residual differences, traditional approaches face challenges with complex data patterns and model validation for counterfactual estimation. Objective: To develop validated Peters–Belson decomposition methods for survival analysis that integrate ensemble machine learning with transfer learning while ensuring logical validity of counterfactual estimates through comprehensive model validation. Methods: We extend the traditional Peters–Belson framework through ensemble machine learning that combines Cox proportional hazards models, cross-validated random survival forests, and regularized gradient boosting approaches. Our framework incorporates a transfer learning component via principal component analysis (PCA) to discover shared latent factors between majority and minority groups. We note that this “transfer learning” differs from the standard machine learning definition (pre-trained models or domain adaptation); here, we use the term in its statistical sense to describe the transfer of covariate structure information from the pooled population to identify group-level latent factors. We develop a comprehensive validation framework that ensures Peters–Belson logical bounds compliance, preventing mathematical violations in counterfactual estimates. The approach is evaluated through simulation studies across five realistic health disparity scenarios using stratified complex survey designs. Results: Simulation studies demonstrate that validated ensemble methods achieve superior performance compared to individual models (proportion explained: 0.352 vs. 0.310 for individual Cox, 0.325 for individual random forests), with the validation framework reducing logical violations from 34.7% to 2.1% of cases. Transfer learning provides an additional 16.1% average improvement in explanation of unexplained disparity when significant unmeasured confounding exists, with a 90.1% overall validation success rate. The validation framework ensures explanation proportions remain within realistic bounds while maintaining computational efficiency, with 31% overhead for validation procedures. Conclusions: Validated ensemble machine learning provides substantial advantages for Peters–Belson decomposition when combined with proper model validation. Transfer learning offers conditional benefits for capturing unmeasured group-level factors while preventing mathematical violations common in standard approaches. The framework demonstrates that realistic health disparity patterns show 25–35% of differences explained by measured factors, providing actionable targets for reducing health inequities.
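
For readers unfamiliar with Peters–Belson decomposition, its core step — fitting an outcome model in the reference group and predicting counterfactual outcomes for the comparison group — can be sketched in a deliberately simplified linear form (simulated data; this illustrates only the general idea, not the authors' survey-weighted survival and ensemble framework):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Group A (reference) and group B differ in a covariate and in an
# unexplained baseline shift
xA = rng.normal(1.0, 1.0, n)
yA = 2.0 + 1.5 * xA + rng.normal(0.0, 1.0, n)
xB = rng.normal(0.0, 1.0, n)
yB = 1.0 + 1.5 * xB + rng.normal(0.0, 1.0, n)

# Step 1: fit the outcome model in the reference group only
slope, intercept = np.polyfit(xA, yA, 1)

# Step 2: predict group B's counterfactual outcomes under group A's model
counterfactual = intercept + slope * xB

# Step 3: decompose the total disparity
total = yA.mean() - yB.mean()
explained = yA.mean() - counterfactual.mean()    # attributable to covariates
unexplained = counterfactual.mean() - yB.mean()  # residual disparity
```

By construction, the explained and unexplained components sum exactly to the total disparity.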

22 pages, 1227 KB  
Article
Theoretically Based Dynamic Regression (TDR)—A New and Novel Regression Framework for Modeling Dynamic Behavior
by Derrick K. Rollins, Marit Nilsen-Hamilton, Kendra Kreienbrink, Spencer Wolfe, Dillon Hurd and Jacob Oyler
Stats 2025, 8(4), 89; https://doi.org/10.3390/stats8040089 - 28 Sep 2025
Abstract
The theoretical modeling of a dynamic system will have derivatives of the response (y) with respect to time (t). Two common physical attributes (i.e., parameters) of dynamic systems are dead-time (θ) and lag (τ). Theoretical dynamic modeling will contain physically interpretable parameters such as τ and θ with physical constraints. In addition, the number of unknown model-based parameters can be considerably smaller than empirically based (i.e., lagged-based) approaches. This work proposes a Theoretically based Dynamic Regression (TDR) modeling approach that overcomes critical lagged-based modeling limitations as demonstrated in three large, multiple input, highly dynamic, real data sets. Dynamic Regression (DR) is a lagged-based, empirical dynamic modeling approach that appears in the statistics literature. However, like all empirical approaches, the model structures do not contain first-principle interpretable parameters. Additionally, several time lags are typically needed for the output, y, and input, x, to capture significant dynamic behavior. TDR uses a simplistic theoretically based dynamic modeling approach to transform xt into its dynamic counterpart, vt, and then applies the methods and tools of static regression to vt. TDR is demonstrated on the following three modeling problems of freely existing (i.e., not experimentally designed) real data sets: 1. the weight variation in a person (y) with four measured nutrient inputs (xi); 2. the variation in the tray temperature (y) of a distillation column with nine inputs and eight test data sets over a three year period; and 3. eleven extremely large, highly dynamic, subject-specific models of sensor glucose (y) with 12 inputs (xi).
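
The abstract's transformation of an input x_t into its dynamic counterpart v_t can be illustrated with a first-order-plus-dead-time response, τ·dv/dt + v = x(t − θ), discretized as a recursive filter (the discretization and parameter values here are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

def dynamic_transform(x, tau, theta, dt=1.0):
    """Map a raw input series x_t to its dynamic counterpart v_t under
    tau * dv/dt + v = x(t - theta), using a first-order discrete filter."""
    a = np.exp(-dt / tau)                 # filter pole set by the lag tau
    d = int(round(theta / dt))            # dead-time in whole samples
    xd = np.concatenate([np.zeros(d), x[: len(x) - d]])  # delayed input
    v = np.zeros(len(x))
    for t in range(1, len(x)):
        v[t] = a * v[t - 1] + (1.0 - a) * xd[t]
    return v

# A unit step input: v_t stays at zero for theta samples, then rises
# toward 1 with time constant tau
x = np.ones(200)
v = dynamic_transform(x, tau=5.0, theta=3.0)
```

The static-regression stage of TDR would then regress y on v (and on the transformed versions of the other inputs) using ordinary regression tools.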

19 pages, 1013 KB  
Article
A Simulation-Based Comparative Analysis of Two-Parameter Robust Ridge M-Estimators for Linear Regression Models
by Bushra Haider, Syed Muhammad Asim, Danish Wasim and B. M. Golam Kibria
Stats 2025, 8(4), 84; https://doi.org/10.3390/stats8040084 - 24 Sep 2025
Cited by 1
Abstract
Traditional regression estimators like Ordinary Least Squares (OLS) and classical ridge regression often fail under multicollinearity and outlier contamination, respectively. Although recently developed two-parameter ridge regression (TPRR) estimators improve efficiency by introducing dual shrinkage parameters, they remain sensitive to extreme observations. This study develops a new class of Two-Parameter Robust Ridge M-Estimators (TPRRM) that integrate dual shrinkage with robust M-estimation to simultaneously address multicollinearity and outliers. A Monte Carlo simulation study, conducted under varying sample sizes, predictor dimensions, correlation levels, and contamination structures, compares the proposed estimators with OLS, ridge, and the most recent TPRR estimators. The results demonstrate that TPRRM consistently achieves the lowest Mean Squared Error (MSE), particularly in heavy-tailed and outlier-prone scenarios. Application to the Tobacco and Gasoline Consumption datasets further validates the superiority of the proposed methods in real-world conditions. The findings confirm that the proposed TPRRM fills a critical methodological gap by offering estimators that are not only efficient under multicollinearity, but also robust against departures from normality.
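
The paper's combination of ridge shrinkage with robust M-estimation can be sketched as Huber-weighted iteratively reweighted least squares with a single ridge penalty (a simplified one-parameter sketch on simulated data; the TPRRM estimators studied in the paper use two shrinkage parameters and specific tuning rules not reproduced here):

```python
import numpy as np

def robust_ridge(X, y, k=1.0, c=1.345, n_iter=50):
    """Huber M-estimation combined with a ridge penalty, solved by
    iteratively reweighted least squares (IRLS)."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)  # ridge start
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust residual scale via the median absolute deviation (MAD)
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.abs(r) / s
        w = np.where(u <= c, 1.0, c / u)   # Huber weights downweight outliers
        Xw = X * w[:, None]                # row-weighted design matrix
        beta = np.linalg.solve(Xw.T @ X + k * np.eye(p), Xw.T @ y)
    return beta

rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)
y[:5] += 25.0                              # gross outliers in 5% of cases
beta_hat = robust_ridge(X, y, k=0.1)
```

Because the Huber weights shrink the influence of the contaminated observations, the fit stays close to the true coefficients despite the outliers, whereas a plain ridge fit would be pulled toward them.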