Journal Description
Stats is an international, peer-reviewed, open access journal on statistical science published quarterly online by MDPI. The journal focuses on methodological and theoretical papers in statistics, probability, stochastic processes, and innovative applications of statistics in all scientific disciplines, including biological and biomedical sciences, medicine, business, economics and social sciences, physics, data science, and engineering.
- Open Access — free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within ESCI (Web of Science), Scopus, RePEc, and other databases.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18.2 days after submission; acceptance to publication is undertaken in 2.9 days (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 1.0 (2024); 5-Year Impact Factor: 1.1 (2024)
Latest Articles
A Mixture Integer GARCH Model with Application to Modeling and Forecasting COVID-19 Counts
Stats 2025, 8(3), 73; https://doi.org/10.3390/stats8030073 - 13 Aug 2025
Abstract
This article introduces a flexible time series regression model known as the Mixture of Integer-Valued Generalized Autoregressive Conditional Heteroscedasticity (MINGARCH). Mixture models provide versatile frameworks for capturing heterogeneity in count data, including features such as multiple peaks, seasonality, and intervention effects. The proposed model is applied to regional COVID-19 data from Malaysia. To account for geographical variability, five regions—Selangor, Kuala Lumpur, Penang, Johor, and Sarawak—were selected for analysis, covering a total of 86 weeks of data. Comparative analysis with existing time series regression models demonstrates that MINGARCH outperforms alternative approaches. Further investigation into forecasting reveals that MINGARCH yields superior performance in regions with high population density, and significant influencing factors have been identified. In low-density regions, confirmed cases peaked within three weeks, whereas high-density regions exhibited a monthly seasonal pattern. Forecasting metrics—including MAPE, MAE, and RMSE—are significantly lower for the MINGARCH model compared to other models. These results suggest that MINGARCH is well-suited for forecasting disease spread in urban and densely populated areas, offering valuable insights for policymaking.
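As a rough illustration of the building block behind MINGARCH, the sketch below simulates a single Poisson INGARCH(1,1) component; the mixture model in the paper draws each count from one of several such components. The parameter values and the 86-week length are illustrative assumptions, not estimates from the study.

```python
import numpy as np

def simulate_ingarch(omega, alpha, beta, n, rng=None):
    """Simulate a Poisson INGARCH(1,1) count series:
    lambda_t = omega + alpha * y_{t-1} + beta * lambda_{t-1}, y_t ~ Poisson(lambda_t).
    A mixture INGARCH draws each y_t from one of several such components
    according to mixing weights."""
    rng = rng or np.random.default_rng()
    lam = np.empty(n)
    y = np.empty(n, dtype=int)
    lam[0] = omega / (1 - alpha - beta)   # start at the stationary mean
    y[0] = rng.poisson(lam[0])
    for t in range(1, n):
        lam[t] = omega + alpha * y[t - 1] + beta * lam[t - 1]
        y[t] = rng.poisson(lam[t])
    return y, lam

# 86 weekly counts, mirroring the length of the study period (parameters are illustrative).
counts, intensity = simulate_ingarch(omega=2.0, alpha=0.3, beta=0.5, n=86)
print(counts[:10])
```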
Open Access Communication
On the Appropriateness of Fixed Correlation Assumptions in Repeated-Measures Meta-Analysis: A Monte Carlo Assessment
by Vasileios Papadopoulos
Stats 2025, 8(3), 72; https://doi.org/10.3390/stats8030072 - 13 Aug 2025
Abstract
In repeated-measures meta-analyses, raw data are often unavailable, preventing the calculation of the correlation coefficient r between pre- and post-intervention values. As a workaround, many researchers adopt a heuristic approximation of r = 0.7. However, this value lacks rigorous mathematical justification and may introduce bias into variance estimates of pre/post-differences. We employed Monte Carlo simulations (n = 500,000 per scenario) in Fisher z-space to examine the distribution of the standard deviation of pre-/post-differences (σD) under varying assumptions of r and its uncertainty (σr). Scenarios included r = 0.5, 0.6, 0.707, 0.75, and 0.8, each tested across three levels of variance (σr = 0.05, 0.1, and 0.15). The approximation of r = 0.75 resulted in a balanced estimate of σD, corresponding to a “midway” variance attenuation due to paired data. This value more accurately offsets the deficit caused by assuming a correlation, compared to the traditional value of 0.7. While the r = 0.7 heuristic remains widely used, our results support the use of r = 0.75 as a more mathematically neutral and empirically defensible alternative in repeated-measures meta-analyses lacking raw data.
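A minimal sketch of the kind of Monte Carlo check described above, assuming unit pre and post standard deviations and using the standard paired-difference identity sigma_D = sqrt(s1^2 + s2^2 - 2*r*s1*s2), with r drawn in Fisher z-space; it is not the author's simulation code.

```python
import numpy as np

rng = np.random.default_rng(42)
N_SIM = 500_000                  # draws per scenario, matching the abstract
S_PRE = S_POST = 1.0             # hypothetical unit SDs for pre and post scores

def mean_sigma_d(r_mean, sigma_r, n=N_SIM):
    """Draw r around atanh(r_mean) in Fisher z-space, then average
    sigma_D = sqrt(s1^2 + s2^2 - 2*r*s1*s2), the SD of the pre/post difference."""
    z = rng.normal(np.arctanh(r_mean), sigma_r, size=n)
    r = np.tanh(z)
    sigma_d = np.sqrt(S_PRE**2 + S_POST**2 - 2.0 * r * S_PRE * S_POST)
    return sigma_d.mean()

for r0 in (0.5, 0.6, 0.707, 0.75, 0.8):
    for s_r in (0.05, 0.10, 0.15):
        print(f"r = {r0:5.3f}, sigma_r = {s_r:4.2f} -> E[sigma_D] = {mean_sigma_d(r0, s_r):.3f}")
```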
Open Access Article
Individual Homogeneity Learning in Density Data Response Additive Models
by Zixuan Han, Tao Li, Jinhong You and Narayanaswamy Balakrishnan
Stats 2025, 8(3), 71; https://doi.org/10.3390/stats8030071 - 9 Aug 2025
Abstract
In many complex applications, both data heterogeneity and homogeneity are present simultaneously. Overlooking either aspect can lead to misleading statistical inferences. Moreover, the increasing prevalence of complex, non-Euclidean data calls for more sophisticated modeling techniques. To address these challenges, we propose a density data response additive model, where the response variable is represented by a distributional density function. In this framework, individual effect curves are assumed to be homogeneous within groups but heterogeneous across groups, while covariates that explain variation share common additive bivariate functions. We begin by applying a transformation to map density functions into a linear space. To estimate the unknown subject-specific functions and the additive bivariate components, we adopt a B-spline series approximation method. Latent group structures are uncovered using a hierarchical agglomerative clustering algorithm, which allows our method to recover the true underlying groupings with high probability. To further improve estimation efficiency, we develop refined spline-backfitted local linear estimators for both the grouped structures and the additive bivariate functions in the post-grouping model. We also establish the asymptotic properties of the proposed estimators, including their convergence rates, asymptotic distributions, and post-grouping oracle efficiency. The effectiveness of our method is demonstrated through extensive simulation studies and real-world data analysis, both of which show promising and robust performance.
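A small sketch of the grouping step only, assuming the subject-specific effect curves have already been summarized by coefficient vectors (here simulated as a hypothetical two-group configuration); hierarchical agglomerative clustering is then used to recover the latent groups, in the spirit of the approach described above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical B-spline coefficient vectors of estimated subject-specific curves
# (one row per subject): two latent groups of 30 subjects each.
rng = np.random.default_rng(4)
coefs = np.vstack([rng.normal(0.0, 0.1, size=(30, 8)),
                   rng.normal(1.0, 0.1, size=(30, 8))])

# Hierarchical agglomerative clustering to recover the latent group structure.
Z = linkage(coefs, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(np.bincount(labels)[1:])   # sizes of the recovered groups
```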
Open Access Article
Unraveling Similarities and Differences Between Non-Negative Garrote and Adaptive Lasso: A Simulation Study in Low- and High-Dimensional Data
by Edwin Kipruto and Willi Sauerbrei
Stats 2025, 8(3), 70; https://doi.org/10.3390/stats8030070 - 6 Aug 2025
Abstract
Penalized regression methods are widely used for variable selection. Non-negative garrote (NNG) was one of the earliest methods to combine variable selection with shrinkage of regression coefficients, followed by lasso. About a decade after the introduction of lasso, adaptive lasso (ALASSO) was proposed to address lasso's limitations. ALASSO has two tuning parameters (λ and γ), and its penalty resembles that of NNG when γ = 1, though NNG imposes additional constraints. Given ALASSO's greater flexibility, which may increase instability, this study investigates whether NNG provides any practical benefit or can be replaced by ALASSO. We conducted simulations in both low- and high-dimensional settings to compare selected variables, coefficient estimates, and prediction accuracy. Ordinary least squares and ridge estimates were used as initial estimates. NNG and ALASSO (γ = 1) showed similar performance in low-dimensional settings with low correlation, large samples, and moderate to high R². However, under high correlation, small samples, and low R², their selected variables and estimates differed, though prediction accuracy remained comparable. When γ differed from 1, the differences between NNG and ALASSO became more pronounced, with ALASSO generally performing better. Assuming linear relationships between predictors and the outcome, the results suggest that NNG may offer no practical advantage over ALASSO. The parameter γ in ALASSO allows for adaptability to model complexity, making ALASSO a more flexible and practical alternative to NNG.
(This article belongs to the Section Statistical Methods)
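As a sketch of the ALASSO estimator discussed above (not the authors' simulation code), adaptive lasso can be solved as a plain lasso on rescaled predictors; with OLS initial estimates and γ = 1 the penalty mirrors NNG's, apart from NNG's sign constraints. The data and settings below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 1.0] + [0.0] * (p - 3))
y = X @ beta_true + rng.normal(size=n)

# Step 1: initial estimates (OLS here; ridge is the usual choice when p > n).
beta_init = LinearRegression().fit(X, y).coef_

# Step 2: adaptive weights w_j = 1 / |beta_init_j|^gamma; gamma = 1 mirrors NNG's penalty.
gamma = 1.0
w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)

# Step 3: ALASSO = plain lasso on rescaled columns X_j / w_j, coefficients rescaled back.
fit = LassoCV(cv=5).fit(X / w, y)
beta_alasso = fit.coef_ / w
print(np.round(beta_alasso, 3))
```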
Open Access Review
Archimedean Copulas: A Useful Approach in Biomedical Data—A Review with an Application in Pediatrics
by Giulia Risca, Stefania Galimberti, Paola Rebora, Alessandro Cattoni, Maria Grazia Valsecchi and Giulia Capitoli
Stats 2025, 8(3), 69; https://doi.org/10.3390/stats8030069 - 1 Aug 2025
Abstract
Many applications in health research involve the analysis of multivariate distributions of random variables. In this paper, we review the basic theory of copulas to illustrate their advantages in deriving a joint distribution from given marginal distributions, with a specific focus on bivariate cases. Particular attention is given to the Archimedean family of copulas, which includes widely used functions such as Clayton and Gumbel–Hougaard, characterized by a single association parameter and a relatively simple structure. This work differs from previous reviews by providing a focused overview of applied studies in biomedical research that have employed Archimedean copulas, due to their flexibility in modeling a wide range of dependence structures. Their ease of use and ability to accommodate rotated forms make them suitable for various biomedical applications, including those involving survival data. We briefly present the most commonly used methods for estimation and model selection of copula functions, with the purpose of introducing these tools within the broader framework. Several recent examples in the health literature, and an original example of a pediatric study, demonstrate the applicability of Archimedean copulas and suggest that this approach, although still not widely adopted, can be useful in many biomedical research settings.
(This article belongs to the Section Statistical Methods)
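For readers unfamiliar with the family, the sketch below samples from a Clayton copula (one of the Archimedean copulas reviewed) by conditional inversion and checks the known relation tau = theta / (theta + 2) for Kendall's tau; it is only an illustration, not code from the review.

```python
import numpy as np
from scipy.stats import kendalltau

def sample_clayton(theta, n, rng=None):
    """Draw n pairs (u, v) from a Clayton copula (theta > 0) by conditional inversion:
    v = [u^(-theta) * (w^(-theta/(theta+1)) - 1) + 1]^(-1/theta), with u, w ~ U(0, 1)."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u ** (-theta) * (w ** (-theta / (theta + 1.0)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v

# Kendall's tau for a Clayton copula is theta / (theta + 2); check it empirically.
theta = 2.0
u, v = sample_clayton(theta, n=50_000)
print("theoretical tau:", theta / (theta + 2), "empirical tau:", round(kendalltau(u, v)[0], 3))
```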
Open Access Article
Automated Classification of Crime Narratives Using Machine Learning and Language Models in Official Statistics
by Klaus Lehmann, Elio Villaseñor, Alejandro Pimentel, Javiera Preuss, Nicolás Berhó, Oswaldo Diaz and Ignacio Agloni
Stats 2025, 8(3), 68; https://doi.org/10.3390/stats8030068 - 30 Jul 2025
Abstract
This paper presents the implementation of a language model–based strategy for the automatic codification of crime narratives for the production of official statistics. To address the high workload and inconsistencies associated with manual coding, we developed and evaluated three models: an XGBoost classifier with bag-of-words and word-embedding features, an LSTM network using pretrained Spanish word embeddings as a language model, and a fine-tuned BERT language model (BETO). Deep learning models outperformed the traditional baseline, with BETO achieving the highest accuracy. The new ENUSC (Encuesta Nacional Urbana de Seguridad Ciudadana) workflow integrates the selected model into an API for automated classification, incorporating a certainty threshold to distinguish between cases suitable for automation and those requiring expert review. This hybrid strategy led to a 68.4% reduction in manual review workload while preserving high-quality standards. This study represents the first documented application of deep learning for the automated classification of victimization narratives in official statistics, demonstrating its feasibility and impact in a real-world production environment. Our results demonstrate that deep learning can significantly improve the efficiency and consistency of crime statistics coding, offering a scalable solution for other national statistical offices.
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
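A minimal sketch of the certainty-threshold idea described above: cases whose top predicted probability clears a threshold are auto-coded, the rest are routed to expert review. The 0.9 threshold and the toy probabilities are assumptions for illustration, not the ENUSC production values.

```python
import numpy as np

def route_narratives(class_probs, threshold=0.9):
    """Split predictions into auto-coded cases and cases sent to expert review.
    class_probs: (n_cases, n_codes) array of predicted class probabilities."""
    class_probs = np.asarray(class_probs)
    confidence = class_probs.max(axis=1)      # certainty of the top predicted code
    predicted = class_probs.argmax(axis=1)
    auto = confidence >= threshold
    return predicted[auto], np.flatnonzero(~auto)

probs = np.array([[0.97, 0.02, 0.01],   # confident -> auto-coded
                  [0.55, 0.30, 0.15]])  # uncertain -> manual review
auto_codes, review_idx = route_narratives(probs, threshold=0.9)
print(auto_codes, review_idx)
```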
Open Access Article
Requiem for Olympic Ethics and Sports’ Independence
by Fabio Zagonari
Stats 2025, 8(3), 67; https://doi.org/10.3390/stats8030067 - 28 Jul 2025
Abstract
This paper suggests a theoretical framework to summarise the empirical literature on the relationships between sports and both religious and secular ethics, and it suggests two interrelated theoretical models to empirically evaluate the extent to which religious and secular ethics, as well as sports policies, affect achievements in sports. I identified two national ethics (national pride/efficiency) and two social ethics (social cohesion/ethics) by measuring achievements in terms of alternative indexes based on Olympic medals. I referred to three empirical models and applied three estimation methods (panel Poisson, Data Envelopment, and Stochastic Frontier Analyses). I introduced two sports policies (a quantitative policy aimed at social cohesion and a qualitative policy aimed at national pride), by distinguishing sports in terms of four possibly different ethics to be used for the eight summer and eight winter Olympic Games from 1994 to 2024. I applied income level, health status, and income inequality, to depict alternative social contexts. I used five main religions and three educational levels to depict alternative ethical contexts. I applied country dummies to depict alternative institutional contexts. Empirical results support the absence of Olympic ethics, the potential substitution of sport and secular ethics in providing social cohesion, and the dependence of sports on politics, while alternative social contexts have different impacts on alternative sport achievements.
(This article belongs to the Special Issue Ethicametrics)
Open Access Article
Proximal Causal Inference for Censored Data with an Application to Right Heart Catheterization Data
by Yue Hu, Yuanshan Gao and Minhao Qi
Stats 2025, 8(3), 66; https://doi.org/10.3390/stats8030066 - 22 Jul 2025
Abstract
In observational causal inference studies, unmeasured confounding remains a critical threat to the validity of effect estimates. While proximal causal inference (PCI) has emerged as a powerful framework for mitigating such bias through proxy variables, existing PCI methods cannot directly handle censored data. This article develops a unified proximal causal inference framework that simultaneously addresses unmeasured confounding and right-censoring challenges, extending the proximal causal inference literature. Our key contributions are twofold: (i) We propose novel identification strategies and develop two distinct estimators for the censored-outcome bridge function and treatment confounding bridge function, resolving the fundamental challenge of unobserved outcomes; (ii) To improve robustness against model misspecification, we construct a robust proximal estimator and establish uniform consistency for all proposed estimators under mild regularity conditions. Through comprehensive simulations, we demonstrate the finite-sample performance of our methods, followed by an empirical application evaluating right heart catheterization effectiveness in critically ill ICU patients.
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
Open Access Article
Local Stochastic Correlation Models for Derivative Pricing
by Marcos Escobar-Anel
Stats 2025, 8(3), 65; https://doi.org/10.3390/stats8030065 - 18 Jul 2025
Abstract
This paper reveals a simple methodology to create local-correlation models suitable for the closed-form pricing of two-asset financial derivatives. The multivariate models are built to ensure two conditions. First, marginals follow desirable processes, e.g., we choose the Geometric Brownian Motion (GBM), popular for stock prices. Second, the payoff of the derivative should follow a desired one-dimensional process. These conditions lead to a specific choice of the dependence structure in the form of a local-correlation model. Two popular multi-asset options are entertained: a spread option and a basket option.
(This article belongs to the Section Applied Stochastic Models)
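As context for the pricing targets, the sketch below Monte Carlo-prices a spread option on two GBM marginals under a constant correlation; the paper's contribution is precisely to replace this constant correlation with a local-correlation structure that yields closed-form prices, which is not reproduced here. All parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
s1_0, s2_0, r, T = 100.0, 95.0, 0.02, 1.0      # spots, risk-free rate, maturity
sig1, sig2, rho, K = 0.25, 0.30, 0.5, 5.0      # volatilities, constant correlation, strike
n_paths = 200_000

# Terminal values of two correlated GBM marginals under the risk-neutral measure.
z1 = rng.standard_normal(n_paths)
z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
s1_T = s1_0 * np.exp((r - 0.5 * sig1**2) * T + sig1 * np.sqrt(T) * z1)
s2_T = s2_0 * np.exp((r - 0.5 * sig2**2) * T + sig2 * np.sqrt(T) * z2)

# Discounted spread-option payoff max(S1 - S2 - K, 0).
price = np.exp(-r * T) * np.maximum(s1_T - s2_T - K, 0.0).mean()
print(round(price, 4))
```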
Open Access Article
Machine Learning Ensemble Algorithms for Classification of Thyroid Nodules Through Proteomics: Extending the Method of Shapley Values from Binary to Multi-Class Tasks
by Giulia Capitoli, Simone Magnaghi, Andrea D'Amicis, Camilla Vittoria Di Martino, Isabella Piga, Vincenzo L'Imperio, Marco Salvatore Nobile, Stefania Galimberti and Davide Paolo Bernasconi
Stats 2025, 8(3), 64; https://doi.org/10.3390/stats8030064 - 16 Jul 2025
Abstract
The need to improve medical diagnosis is of utmost importance in medical research, consisting of the optimization of accurate classification models able to assist clinical decisions. To minimize the errors that can be caused by using a single classifier, the voting ensemble technique can be used, combining the classification results of different classifiers to improve the final classification performance. This paper aims to compare the existing voting ensemble techniques with a new game-theory-derived approach based on Shapley values. We extended this method, originally developed for binary tasks, to the multi-class setting in order to capture complementary information provided by different classifiers. In heterogeneous clinical scenarios such as thyroid nodule diagnosis, where distinct models may be better suited to identify specific subtypes (e.g., benign, malignant, or inflammatory lesions), ensemble strategies capable of leveraging these strengths are particularly valuable. The motivating application focuses on the classification of thyroid cancer nodules whose cytopathological clinical diagnosis is typically characterized by a high number of false positive cases that may result in unnecessary thyroidectomy. We apply and compare the performance of seven individual classifiers, along with four ensemble voting techniques (including Shapley values), in a real-world study focused on classifying thyroid cancer nodules using proteomic features obtained through mass spectrometry. Our results indicate a slight improvement in the classification accuracy for ensemble systems compared to the performance of single classifiers. Although the Shapley value-based voting method remains comparable to the other voting methods, we envision this new ensemble approach could be effective in improving the performance of single classifiers in further applications, especially when complementary algorithms are considered in the ensemble. The application of these techniques can lead to the development of new tools to assist clinicians in diagnosing thyroid cancer using proteomic features derived from mass spectrometry.
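A compact sketch of one way to attribute ensemble accuracy to individual classifiers with Shapley values, using majority voting as the coalition value; the paper's specific Shapley-based voting rule may differ, and the zero value assigned to the empty coalition is a convention assumed here.

```python
import numpy as np
from itertools import combinations
from math import factorial

def coalition_accuracy(preds, y, members):
    """Accuracy of the majority vote of the classifiers in `members`."""
    if not members:
        return 0.0                                   # value of the empty coalition (a convention)
    votes = preds[list(members)]                     # (|S|, n_cases) integer class labels
    majority = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
    return float(np.mean(majority == y))

def classifier_shapley_values(preds, y):
    """Shapley value of each classifier's contribution to ensemble accuracy."""
    m = preds.shape[0]
    phi = np.zeros(m)
    for i in range(m):
        rest = [j for j in range(m) if j != i]
        for k in range(m):
            for S in combinations(rest, k):
                w = factorial(k) * factorial(m - k - 1) / factorial(m)
                phi[i] += w * (coalition_accuracy(preds, y, set(S) | {i})
                               - coalition_accuracy(preds, y, set(S)))
    return phi

# Toy example: 3 classifiers, 3 classes, 200 cases, classifiers of decreasing quality.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, 200)
preds = np.stack([np.where(rng.random(200) < acc, y, rng.integers(0, 3, 200))
                  for acc in (0.9, 0.7, 0.5)])
print(np.round(classifier_shapley_values(preds, y), 3))
```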
Open Access Communication
Beyond Expectations: Anomalies in Financial Statements and Their Application in Modelling
by Roman Blazek and Lucia Duricova
Stats 2025, 8(3), 63; https://doi.org/10.3390/stats8030063 - 15 Jul 2025
Cited by 1
Abstract
The increasing complexity of financial reporting has enabled the implementation of innovative accounting practices that often obscure a company's actual performance. This project seeks to uncover manipulative behaviours by constructing an anomaly detection model that utilises unsupervised machine learning techniques. We examined a dataset of 149,566 Slovak firms from 2016 to 2023, which included 12 financial parameters. Utilising TwoStep and K-means clustering in IBM SPSS, we discerned patterns of normative financial activity and computed an abnormality index for each firm. Entities with the most significant deviation from cluster centroids were identified as suspicious. The model attained a silhouette score of 1.0, signifying outstanding clustering quality. We discovered a total of 231 anomalous firms, predominantly concentrated in sectors C (32.47%), G (13.42%), and L (7.36%). Our research indicates that anomaly-based models can markedly enhance the precision of fraud detection, especially in scenarios with scarce labelled data. The model integrates intricate data processing and delivers an exhaustive study of the regional and sectoral distribution of anomalies, thereby increasing its relevance in practical applications.
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
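A minimal sketch of the distance-to-centroid abnormality index on simulated data (the real study uses 12 financial parameters for 149,566 firms and IBM SPSS; the cluster count, quantile cut-off, and data here are assumptions for illustration).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical matrix of 12 financial ratios per firm (rows = firms).
rng = np.random.default_rng(7)
X = rng.normal(size=(1_000, 12))

X_std = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_std)

# Abnormality index: distance of each firm from its own cluster centroid.
dist = np.linalg.norm(X_std - km.cluster_centers_[km.labels_], axis=1)
suspicious = np.flatnonzero(dist > np.quantile(dist, 0.995))   # flag the most extreme firms
print(len(suspicious), "firms flagged")
```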
Open Access Article
The Extended Kumaraswamy Model: Properties, Risk Indicators, Risk Analysis, Regression Model, and Applications
by Morad Alizadeh, Gauss M. Cordeiro, Gabriela M. Rodrigues, Edwin M. M. Ortega and Haitham M. Yousof
Stats 2025, 8(3), 62; https://doi.org/10.3390/stats8030062 - 14 Jul 2025
Abstract
We propose a new unit distribution, study its properties, and provide an important application in the field of geology through a set of risk indicators. We test its practicality through two applications to real data, make comparisons with the well-known beta and Kumaraswamy distributions, and estimate the parameters of the new distribution in different ways. We provide a new regression model and apply it in statistical prediction operations for residence times data.
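For reference, the baseline two-parameter Kumaraswamy distribution on (0, 1), which the proposed model extends and against which it is compared (the extended model itself is defined in the article), has cdf and pdf

```latex
F(x) = 1 - \left(1 - x^{a}\right)^{b}, \qquad
f(x) = a\,b\,x^{a-1}\left(1 - x^{a}\right)^{b-1}, \qquad 0 < x < 1,\; a, b > 0.
```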
Open Access Article
Continuity Correction and Standard Error Calculation for Testing in Proportional Hazards Models
by Daniel Baumgartner and John E. Kolassa
Stats 2025, 8(3), 61; https://doi.org/10.3390/stats8030061 - 14 Jul 2025
Abstract
Standard asymptotic inference for proportional hazards models is conventionally performed by calculating a standard error for the estimate and comparing the estimate divided by the standard error to a standard normal distribution. In this paper, we compare various standard error estimates, including those based on the inverse observed information, the inverse expected information, and the jackknife. Furthermore, correction for continuity is compared to omitting this correction. We find that correction for continuity represents an important improvement in the quality of approximation, and furthermore note that the usual naive standard error yields a distribution closer to normality, as measured by skewness and kurtosis, than any of the other standard errors investigated.
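A small sketch of one of the compared standard errors, the jackknife, for a single binary covariate in a Cox model (assuming the lifelines package is available; the simulated data, censoring scheme, and sample size are illustrative and not from the paper).

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 100
x = rng.binomial(1, 0.5, n)
t = rng.exponential(scale=np.exp(-0.7 * x))          # true log hazard ratio 0.7
c = np.quantile(t, 0.8)                              # administrative censoring time
df = pd.DataFrame({"T": np.minimum(t, c), "E": (t <= c).astype(int), "x": x})

def cox_coef(d):
    return CoxPHFitter().fit(d, duration_col="T", event_col="E").params_["x"]

beta_hat = cox_coef(df)
# Jackknife standard error: refit the model with each subject deleted in turn.
jack = np.array([cox_coef(df.drop(index=i)) for i in df.index])
se_jack = np.sqrt((n - 1) / n * np.sum((jack - jack.mean()) ** 2))
print(round(beta_hat, 3), round(se_jack, 3))
```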
Open Access Article
Some Useful Techniques for High-Dimensional Statistics
by David J. Olive
Stats 2025, 8(3), 60; https://doi.org/10.3390/stats8030060 - 13 Jul 2025
Abstract
High-dimensional statistics are used when n/p is not large, where n is the sample size and p is the number of predictors. Useful techniques include (a) use of a sparse fitted model, (b) use of principal component analysis for dimension reduction, (c) use of alternative multivariate dispersion estimators instead of the sample covariance matrix, (d) elimination of weak predictors, and (e) stacking of low-dimensional estimators into a vector. Some variants and theory for these techniques will be given or reviewed.
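Two of the listed techniques, (b) principal-component dimension reduction and (d) eliminating weak predictors by marginal screening, sketched in a toy p > n setting; all values are illustrative assumptions, not from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Hypothetical p > n setting.
rng = np.random.default_rng(5)
n, p = 80, 300
X = rng.normal(size=(n, p))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

# (b) principal component regression: reduce X to a few components, then fit OLS.
pca = PCA(n_components=5).fit(X)
pcr = LinearRegression().fit(pca.transform(X), y)

# (d) eliminate weak predictors by marginal correlation screening before modelling.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
keep = np.argsort(corr)[-10:]                      # retain the 10 strongest predictors
ols_screened = LinearRegression().fit(X[:, keep], y)
print(round(pcr.score(pca.transform(X), y), 3), round(ols_screened.score(X[:, keep], y), 3))
```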
Open Access Article
The Detection Method of the Tobit Model in a Dataset
by El ouali Rahmani and Mohammed Benmoumen
Stats 2025, 8(3), 59; https://doi.org/10.3390/stats8030059 - 12 Jul 2025
Abstract
This article proposes an extension of detection methods for the Tobit model by generalizing existing approaches from cases with known parameters to more realistic scenarios where the parameters are unknown. The main objective is to develop detection procedures that account for parameter uncertainty and to analyze how this uncertainty affects the estimation process and the overall accuracy of the model. The methodology relies on maximum likelihood estimation, applied to datasets generated under different configurations of the Tobit model. A series of Monte Carlo simulations is conducted to evaluate the performance of the proposed methods. The results provide insights into the robustness of the detection procedures under varying assumptions. The study concludes with practical recommendations for improving the application of the Tobit model in fields such as econometrics, health economics, and environmental studies.
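A minimal sketch of the maximum-likelihood machinery the detection procedures build on: the log-likelihood of a Tobit model left-censored at zero, maximized on simulated data (the detection step itself, and the paper's simulation settings, are not reproduced here).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(params, X, y):
    """Negative log-likelihood of a Tobit model left-censored at zero.
    params = (beta_0, ..., beta_{p-1}, log_sigma)."""
    beta, sigma = params[:-1], np.exp(params[-1])
    xb = X @ beta
    cens = y <= 0
    ll = np.sum(norm.logpdf((y[~cens] - xb[~cens]) / sigma) - np.log(sigma))
    ll += np.sum(norm.logcdf(-xb[cens] / sigma))
    return -ll

# Simulate a censored outcome and recover the parameters by maximum likelihood.
rng = np.random.default_rng(3)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.maximum(X @ np.array([0.5, 1.0]) + rng.normal(size=n), 0.0)
res = minimize(tobit_negloglik, x0=np.zeros(X.shape[1] + 1), args=(X, y), method="BFGS")
print(np.round(res.x[:-1], 3), round(float(np.exp(res.x[-1])), 3))
```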
Open Access Article
Quantile Estimation Based on the Log-Skew-t Linear Regression Model: Statistical Aspects, Simulations, and Applications
by Raúl Alejandro Morán-Vásquez, Anlly Daniela Giraldo-Melo and Mauricio A. Mazo-Lopera
Stats 2025, 8(3), 58; https://doi.org/10.3390/stats8030058 - 11 Jul 2025
Abstract
We propose a robust linear regression model assuming a log-skew-t distribution for the response variable, with the aim of exploring the association between the covariates and the quantiles of a continuous and positive response variable under skewness and heavy tails. This model includes the log-skew-normal and log-t linear regression models as special cases. Our simulation studies indicate good performance of the quantile estimation approach and its outperformance relative to the classical quantile regression model. The practical applicability of our methodology is demonstrated through an analysis of two real datasets.
(This article belongs to the Special Issue Robust Statistics in Action II)
Open Access Article
Well Begun Is Half Done: The Impact of Pre-Processing in MALDI Mass Spectrometry Imaging Analysis Applied to a Case Study of Thyroid Nodules
by Giulia Capitoli, Kirsten C. J. van Abeelen, Isabella Piga, Vincenzo L'Imperio, Marco S. Nobile, Daniela Besozzi and Stefania Galimberti
Stats 2025, 8(3), 57; https://doi.org/10.3390/stats8030057 - 10 Jul 2025
Cited by 1
Abstract
The discovery of proteomic biomarkers in cancer research can be effectively performed in situ by exploiting Matrix-Assisted Laser Desorption Ionization (MALDI) Mass Spectrometry Imaging (MSI). However, due to experimental limitations, the spectra extracted by MALDI-MSI can be noisy, so pre-processing steps are generally needed to reduce the instrumental and analytical variability. Thus far, the importance and the effect of standard pre-processing methods, as well as their combinations and parameter settings, have not been extensively investigated in proteomics applications. In this work, we present a systematic study of 15 combinations of pre-processing steps—including baseline correction, smoothing, normalization, and peak alignment—for a real-data classification task on MALDI-MSI data measured from fine-needle aspiration biopsies of thyroid nodules. The influence of each combination was assessed by analyzing the feature extraction, pixel-by-pixel classification probabilities, and LASSO classification performance. Our results highlight the necessity of fine-tuning a pre-processing pipeline, especially for the reliable transfer of molecular diagnostic signatures in clinical practice. We outline some recommendations on the selection of pre-processing steps, together with filter levels and alignment methods, according to the mass-to-charge range and heterogeneity of data.
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
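One illustrative pre-processing chain of the kind compared in the paper (smoothing, a crude baseline subtraction, and total-ion-current normalization for a single spectrum); the specific filters, window sizes, and the alignment step are assumptions here, not the combinations evaluated in the study.

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess_spectrum(intensities, smooth_window=11, smooth_order=3, base_window=50):
    """Illustrative chain for one spectrum: smoothing, crude baseline subtraction,
    then total-ion-current (TIC) normalization. Peak alignment across spectra is omitted."""
    smoothed = savgol_filter(intensities, smooth_window, smooth_order)
    baseline = np.array([smoothed[max(0, i - base_window):i + base_window].min()
                         for i in range(len(smoothed))])
    corrected = np.clip(smoothed - baseline, 0.0, None)
    tic = corrected.sum()
    return corrected / tic if tic > 0 else corrected

spectrum = np.abs(np.random.default_rng(0).normal(size=2_000)) + 5.0
print(preprocess_spectrum(spectrum)[:5])
```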
Open Access Article
On the Decision-Theoretic Foundations and the Asymptotic Bayes Risk of the Region of Practical Equivalence for Testing Interval Hypotheses
by Riko Kelter
Stats 2025, 8(3), 56; https://doi.org/10.3390/stats8030056 - 8 Jul 2025
Abstract
Testing interval hypotheses is of huge relevance in the biomedical and cognitive sciences; for example, in clinical trials. Frequentist approaches include the proposal of equivalence tests, which have been used to study if there is a predetermined meaningful treatment effect. In the Bayesian paradigm, two popular approaches exist: The first is the region of practical equivalence (ROPE), which has become increasingly popular in the cognitive sciences. The second is the Bayes factor for interval null hypotheses, which was proposed by Morey et al. One advantage of the ROPE procedure is that, in contrast to the Bayes factor, it is quite robust to the prior specification. However, while the ROPE is conceptually appealing, it lacks a clear decision-theoretic foundation like the Bayes factor. In this paper, a decision-theoretic justification for the ROPE procedure is derived for the first time, which shows that the Bayes risk of a decision rule based on the highest-posterior density interval (HPD) and the ROPE is asymptotically minimized for increasing sample size. To show this, a specific loss function is introduced. This result provides an important decision-theoretic justification for testing the interval hypothesis in the Bayesian approach based on the ROPE and HPD, in particular, when sample size is large.
(This article belongs to the Section Bayesian Methods)
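A minimal sketch of the ROPE-plus-HPD decision rule the paper studies, applied to posterior draws; the ROPE limits, credibility mass, and posterior below are illustrative assumptions.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior draws."""
    x = np.sort(np.asarray(samples))
    k = int(np.floor(mass * len(x)))
    widths = x[k:] - x[:len(x) - k]
    i = int(np.argmin(widths))
    return x[i], x[i + k]

def rope_decision(samples, rope=(-0.1, 0.1), mass=0.95):
    """Accept the interval null if the HPD lies inside the ROPE, reject it if they
    do not overlap, and remain undecided otherwise."""
    lo, hi = hpd_interval(samples, mass)
    if rope[0] <= lo and hi <= rope[1]:
        return "accept interval null (practical equivalence)"
    if hi < rope[0] or lo > rope[1]:
        return "reject interval null"
    return "undecided"

posterior = np.random.default_rng(0).normal(loc=0.03, scale=0.02, size=10_000)
print(rope_decision(posterior))
```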
Open Access Article
Estimating the Ratio of Means in a Zero-Inflated Poisson Mixture Model
by Michael Pearce and Michael D. Perlman
Stats 2025, 8(3), 55; https://doi.org/10.3390/stats8030055 - 5 Jul 2025
Abstract
The problem of estimating the ratio of the means of a two-component Poisson mixture model is considered, when each component is subject to zero-inflation, i.e., excess zero counts. The resulting zero-inflated Poisson mixture (ZIPM) model can be viewed as a three-component Poisson mixture model with one degenerate component. The EM algorithm is applied to obtain frequentist estimators and their standard errors, the latter determined via an explicit expression for the observed information matrix. As an intermediate step, we derive an explicit expression for standard errors in the two-component Poisson mixture model (without zero-inflation), a new result. The ZIPM model is applied to simulated data and real ecological count data of frigatebirds on the Coral Sea Islands off the coast of Northeast Australia.
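A small sketch of the EM iteration for the two-component Poisson mixture that underlies the ZIPM (without the zero-inflation component and without the standard-error computation derived in the paper); the data and starting values are illustrative.

```python
import numpy as np
from scipy.stats import poisson

def em_poisson_mixture(y, n_iter=200):
    """EM for a two-component Poisson mixture; the ZIPM adds a third component
    degenerate at zero on top of this."""
    y = np.asarray(y)
    lam = np.array([0.5 * y.mean(), 1.5 * y.mean()])   # crude starting values
    pi = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from component 1.
        d1 = pi * poisson.pmf(y, lam[0])
        d2 = (1.0 - pi) * poisson.pmf(y, lam[1])
        w = d1 / (d1 + d2)
        # M-step: update the mixing weight and the component means.
        pi = w.mean()
        lam = np.array([np.sum(w * y) / np.sum(w),
                        np.sum((1.0 - w) * y) / np.sum(1.0 - w)])
    return pi, lam

rng = np.random.default_rng(11)
y = np.concatenate([rng.poisson(2.0, 600), rng.poisson(8.0, 400)])
pi_hat, lam_hat = em_poisson_mixture(y)
print(round(pi_hat, 3), np.round(lam_hat, 3), "ratio of means:", round(lam_hat[0] / lam_hat[1], 3))
```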
Open Access Article
A Data-Driven Approach of DRG-Based Medical Insurance Payment Policy Formulation in China Based on an Optimization Algorithm
by Kun Ba and Biqing Huang
Stats 2025, 8(3), 54; https://doi.org/10.3390/stats8030054 - 30 Jun 2025
Abstract
The diagnosis-related group (DRG) system classifies patients into different groups in order to facilitate decisions regarding medical insurance payments. Currently, more than 600 standard DRGs exist in China. Payment details represented by DRG weights must be adjusted during decision-making. After modeling the DRG weight-determining process as a parameter-searching and optimization-solving problem, we propose a stochastic gradient tracking algorithm (SGT) and compare it with a genetic algorithm and sequential quadratic programming. We describe diagnosis-related groups in China using several statistics based on sample data from one city. We explored the influence of the SGT hyperparameters through numerous experiments and demonstrated the robustness of the best SGT hyperparameter combination. Our stochastic gradient tracking algorithm finished the parameter search in only 3.56 min when the insurance payment rate was set at 95%, which is acceptable and desirable. As the main medical insurance payment scheme in China, DRGs require quantitative evidence for policymaking. The optimization algorithm proposed in this study shows a possible scientific decision-making method for use in the DRG system, particularly with regard to DRG weights.
Topics
Topic in
JPM, Mathematics, Applied Sciences, Stats, Healthcare
Application of Biostatistics in Medical Sciences and Global Health
Topic Editors: Bogdan Oancea, Adrian Pană, Cǎtǎlina Liliana Andrei
Deadline: 31 October 2026

Special Issues
Special Issue in
Stats
Benford's Law(s) and Applications (Second Edition)
Guest Editors: Marcel Ausloos, Roy Cerqueti, Claudio Lupi
Deadline: 31 October 2025
Special Issue in
Stats
Nonparametric Inference: Methods and Applications
Guest Editor: Stefano Bonnini
Deadline: 28 November 2025
Special Issue in
Stats
Robust Statistics in Action II
Guest Editor: Marco Riani
Deadline: 31 December 2025