Stats, Volume 8, Issue 4 (December 2025) – 36 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open them.
29 pages, 718 KB  
Article
Robust Kibria Estimators for Mitigating Multicollinearity and Outliers in a Linear Regression Model
by Hina Naz, Ismail Shah, Danish Wasim and Sajid Ali
Stats 2025, 8(4), 119; https://doi.org/10.3390/stats8040119 - 17 Dec 2025
Abstract
In the presence of multicollinearity, the ordinary least squares (OLS) estimator, although it remains the best linear unbiased estimator (BLUE), loses efficiency and suffers from inflated variance. In addition, it is highly sensitive to outliers in the response direction. To overcome these limitations, robust estimation techniques are often integrated with shrinkage methods. This study proposes a new class of Kibria Ridge M-estimators specifically developed to simultaneously address multicollinearity and outlier contamination. A comprehensive Monte Carlo simulation study is conducted to evaluate the performance of the proposed and existing estimators. Based on the mean squared error criterion, the proposed Kibria Ridge M-estimators consistently outperform traditional ridge-type estimators under varying parameter settings. Furthermore, the practical applicability and superiority of the proposed estimators are validated using the Tobacco and Anthropometric datasets. Overall, the newly proposed estimators demonstrate good performance, offering robust and efficient alternatives for regression modeling in the presence of multicollinearity and outliers. Full article
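The general recipe the abstract describes, combining a robust loss with a ridge-type penalty, can be sketched with standard tools. The snippet below is a minimal illustration using a Huber loss with an L2 penalty (scikit-learn's HuberRegressor), not the proposed Kibria Ridge M-estimators; the simulated collinearity level, contamination, and penalty value are assumptions for illustration.

```python
# Illustrative sketch only: a generic ridge M-estimator (Huber loss + L2 penalty),
# not the Kibria Ridge M-estimators proposed in the article.
import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor

rng = np.random.default_rng(1)
n, p, rho = 100, 4, 0.95                              # illustrative size, predictors, collinearity
cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)   # highly correlated predictors
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
beta = np.ones(p)
y = X @ beta + rng.normal(0, 1, n)
y[:5] += 20                                           # outliers in the response direction

ols = LinearRegression().fit(X, y)
ridge_m = HuberRegressor(alpha=1.0).fit(X, y)         # alpha is the squared-L2 (ridge) penalty

mse = lambda b: np.mean((b - beta) ** 2)              # MSE of coefficient estimates
print("OLS coefficient MSE:        ", mse(ols.coef_))
print("Ridge-Huber coefficient MSE:", mse(ridge_m.coef_))
```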
27 pages, 2551 KB  
Article
Korovkin-Type Approximation Theorems for Statistical Gauge Integrable Functions of Two Variables
by Hari Mohan Srivastava, Bidu Bhusan Jena, Susanta Kumar Paikray and Umakanta Misra
Stats 2025, 8(4), 118; https://doi.org/10.3390/stats8040118 - 15 Dec 2025
Viewed by 56
Abstract
In this work, we develop and investigate statistical extensions of gauge integrability and gauge summability for double sequences of functions of two real variables, formulated within the framework of deferred weighted means. We begin by establishing several fundamental limit theorems that serve to connect these generalized notions and provide a rigorous theoretical foundation. Based on these results, we establish Korovkin-type approximation theorems using the classical test function set $\{1, s, t, s^2 + t^2\}$ in the Banach space $C([0,1]^2)$. To demonstrate the applicability of the proposed framework, we further present an example involving families of positive linear operators associated with the Meyer-König and Zeller (MKZ) operators. These findings not only extend classical Korovkin-type theorems to the setting of statistical deferred gauge integrability and summability but also underscore their robustness in addressing double sequences and the approximation of two-variable functions. Full article
(This article belongs to the Section Statistical Methods)
28 pages, 2131 KB  
Article
Still No Free Lunch: Failure of Stability in Regulated Systems of Interacting Cognitive Modules
by Rodrick Wallace
Stats 2025, 8(4), 117; https://doi.org/10.3390/stats8040117 - 15 Dec 2025
Viewed by 61
Abstract
The asymptotic limit theorems of information and control theories, instantiated as the Rate Distortion Control Theory of bounded rationality, enable examination of stability across models of cognition based on a variety of fundamental, underlying probability distributions likely to characterize different forms of embodied ‘intelligent’ systems. Embodied cognition is inherently unstable, requiring the pairing of cognition with regulation at and across the various and varied scales and levels of organization. Like contemporary Large Language Model ‘hallucination,’ de facto ‘psychopathology’—the failure of regulation in systems of cognitive modules—is not a bug but an inherent feature of embodied cognition. What particularly emerges from this analysis, then, is the ubiquity of failure-under-stress even for ‘intelligent’ embodied cognition, where cognitive and regulatory modules are closely paired. There is still No Free Lunch, much in the classic sense of Wolpert and Macready. With some further effort, the probability models developed here can be transformed into robust statistical tools for the analysis of observational and experimental data regarding regulated and other cognitive phenomena. Full article
31 pages, 1071 KB  
Review
Mapping Research on the Birnbaum–Saunders Statistical Distribution: Patterns, Trends, and Scientometric Perspective
by Víctor Leiva
Stats 2025, 8(4), 116; https://doi.org/10.3390/stats8040116 - 13 Dec 2025
Viewed by 101
Abstract
This article provides a critical assessment of the Birnbaum–Saunders (BS) distribution, a pivotal statistical model for lifetime data analysis and reliability estimation, particularly in fatigue contexts. The model has been successfully applied across diverse fields, including biological mortality, environmental sciences, medicine, and risk models. Moving beyond a basic scientometric review, this study synthesizes findings from 353 peer-reviewed articles, selected using PRISMA 2020 protocols, to specifically trace the evolution of estimation techniques, regression methods, and model extensions. Key findings reveal robust theoretical advances, such as Bayesian methods and bivariate/spatial adaptations, alongside practical progress in influence diagnostics and software development. The analysis highlights key research gaps, including the critical need for scalable, auditable software and structured reviews, and notes a peak in scholarly activity around 2019, driven largely by the Brazil–Chile research alliance. This work offers a consolidated view of current BS model implementations and outlines clear future directions for enhancing their theoretical robustness and practical utility. Full article
11 pages, 2187 KB  
Article
Entropy and Minimax Risk Diversification: An Empirical and Simulation Study of Portfolio Optimization
by Hongyu Yang and Zijian Luo
Stats 2025, 8(4), 115; https://doi.org/10.3390/stats8040115 - 11 Dec 2025
Viewed by 189
Abstract
The optimal allocation of funds within a portfolio is a central research focus in finance. Conventional mean-variance models often concentrate a significant portion of funds in a limited number of high-risk assets. To promote diversification, Shannon Entropy is widely applied. This paper develops a portfolio optimization model that incorporates Shannon Entropy alongside a risk diversification principle aimed at minimizing the maximum individual asset risk. The study combines empirical analysis with numerical simulations. First, empirical data are used to assess the theoretical model’s effectiveness and practicality. Second, numerical simulations are conducted to analyze portfolio performance under extreme market scenarios. Specifically, the numerical results indicate that for fixed values of the risk balance coefficient and minimum expected return, the optimal portfolios and their return distributions are similar when the risk is measured by standard deviation, absolute deviation, or standard lower semi-deviation. This suggests that the model exhibits robustness to variations in the risk function, providing a relatively stable investment strategy. Full article
(This article belongs to the Special Issue Robust Statistics in Action II)
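A stylized sketch of the idea of pairing Shannon entropy with a minimax-type risk limit is given below: it maximizes portfolio entropy subject to a minimum expected return and a cap on each asset's stand-alone risk. The expected returns, risk figures, and bounds are invented for illustration, and the formulation is not the paper's exact model.

```python
# Stylized sketch: entropy-regularized allocation with a cap on each asset's
# stand-alone risk (a minimax-type diversification constraint). The numbers
# below are invented for illustration and do not reproduce the paper's model.
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.08, 0.10, 0.12, 0.07])      # assumed expected returns
sigma = np.array([0.12, 0.18, 0.25, 0.10])   # assumed stand-alone risks (std. dev.)
r_min, risk_cap = 0.09, 0.08                 # assumed return floor and per-asset risk cap

def neg_entropy(w):
    w = np.clip(w, 1e-12, None)
    return np.sum(w * np.log(w))             # minimize negative Shannon entropy

cons = [
    {"type": "eq",   "fun": lambda w: np.sum(w) - 1.0},       # fully invested
    {"type": "ineq", "fun": lambda w: w @ mu - r_min},        # return floor
    {"type": "ineq", "fun": lambda w: risk_cap - w * sigma},  # cap each w_i * sigma_i
]
w0 = np.full(4, 0.25)
res = minimize(neg_entropy, w0, bounds=[(0, 1)] * 4, constraints=cons, method="SLSQP")
print("weights:", np.round(res.x, 3), "expected return:", round(res.x @ mu, 4))
```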
19 pages, 1922 KB  
Article
Validated Transfer Learning Peters–Belson Methods for Survival Analysis: Ensemble Machine Learning Approaches with Overfitting Controls for Health Disparity Decomposition
by Menglu Liang and Yan Li
Stats 2025, 8(4), 114; https://doi.org/10.3390/stats8040114 - 10 Dec 2025
Viewed by 223
Abstract
Background: Health disparities research increasingly relies on complex survey data to understand survival differences between population subgroups. While Peters–Belson decomposition provides a principled framework for distinguishing disparities explained by measured covariates from unexplained residual differences, traditional approaches face challenges with complex data patterns and model validation for counterfactual estimation. Objective: To develop validated Peters–Belson decomposition methods for survival analysis that integrate ensemble machine learning with transfer learning while ensuring logical validity of counterfactual estimates through comprehensive model validation. Methods: We extend the traditional Peters–Belson framework through ensemble machine learning that combines Cox proportional hazards models, cross-validated random survival forests, and regularized gradient boosting approaches. Our framework incorporates a transfer learning component via principal component analysis (PCA) to discover shared latent factors between majority and minority groups. We note that this “transfer learning” differs from the standard machine learning definition (pre-trained models or domain adaptation); here, we use the term in its statistical sense to describe the transfer of covariate structure information from the pooled population to identify group-level latent factors. We develop a comprehensive validation framework that ensures Peters–Belson logical bounds compliance, preventing mathematical violations in counterfactual estimates. The approach is evaluated through simulation studies across five realistic health disparity scenarios using stratified complex survey designs. Results: Simulation studies demonstrate that validated ensemble methods achieve superior performance compared to individual models (proportion explained: 0.352 vs. 0.310 for individual Cox, 0.325 for individual random forests), with the validation framework reducing logical violations from 34.7% to 2.1% of cases. Transfer learning provides an additional 16.1% average improvement in the explanation of unexplained disparity when significant unmeasured confounding exists, with a 90.1% overall validation success rate. The validation framework ensures explanation proportions remain within realistic bounds while maintaining computational efficiency with a 31% overhead for validation procedures. Conclusions: Validated ensemble machine learning provides substantial advantages for Peters–Belson decomposition when combined with proper model validation. Transfer learning offers conditional benefits for capturing unmeasured group-level factors while preventing mathematical violations common in standard approaches. The framework demonstrates that realistic health disparity patterns show 25–35% of differences explained by measured factors, providing actionable targets for reducing health inequities. Full article
19 pages, 1295 KB  
Communication
Goodness of Chi-Square for Linearly Parameterized Fitting
by George Livadiotis
Stats 2025, 8(4), 113; https://doi.org/10.3390/stats8040113 - 1 Dec 2025
Viewed by 187
Abstract
The paper presents an alternative perspective on the reduced chi-square as a measure of the goodness of fitting methods. The reduced chi-square is given by the ratio of the fitting error to the propagation error, a universal relationship that holds for any linearly parameterized fitting model but not for a nonlinearly parameterized one. We begin by providing the proof for the traditional examples of the one-parametric fitting of a constant and the bi-parametric fitting of a linear model, and then for the general case of any linearly multi-parameterized model. We also show that this characterization is not generally true for nonlinearly parameterized fitting. Finally, we demonstrate these theoretical developments with an application to real data from plasma protons in the heliosphere. Full article
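For the one-parametric constant fit mentioned above, the claimed ratio can be checked numerically: with weights 1/σ_i², the scatter-based (fitting) error and the propagation error of the weighted mean satisfy χ²_red = (σ_fit/σ_prop)². The sketch below is a minimal check under these standard definitions, not the paper's derivation.

```python
# Minimal numeric check for the one-parameter (constant) fit: the reduced
# chi-square equals the squared ratio of the scatter-based fitting error to
# the error-propagation error of the weighted mean. Standard definitions are
# assumed; this is not code from the article.
import numpy as np

rng = np.random.default_rng(0)
n = 50
sigma = rng.uniform(0.5, 2.0, n)            # assumed measurement errors
y = 3.0 + rng.normal(0, sigma)              # data scattered around a true constant

w = 1.0 / sigma**2
y_hat = np.sum(w * y) / np.sum(w)           # weighted-mean (chi-square) fit of a constant
chi2_red = np.sum(w * (y - y_hat) ** 2) / (n - 1)

sigma_prop = np.sqrt(1.0 / np.sum(w))                                      # propagation error
sigma_fit = np.sqrt(np.sum(w * (y - y_hat) ** 2) / ((n - 1) * np.sum(w)))  # scatter-based error

print("reduced chi-square:        ", chi2_red)
print("(sigma_fit / sigma_prop)^2:", (sigma_fit / sigma_prop) ** 2)  # matches by construction
```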
26 pages, 3726 KB  
Article
Factor Analysis Biplots for Continuous, Binary and Ordinal Data
by Marina Valdés-Rodríguez, Laura Vicente-González and José L. Vicente-Villardón
Stats 2025, 8(4), 112; https://doi.org/10.3390/stats8040112 - 25 Nov 2025
Viewed by 204
Abstract
This article presents biplots derived from factor analysis of correlation matrices for both continuous and ordinal data. It introduces biplots specifically designed for factor analysis, detailing the geometric interpretation for each data type and providing an algorithm to compute biplot coordinates from the factorization of correlation matrices. The theoretical developments are illustrated using a real dataset that explores the relationship between volunteering, political ideology, and civic engagement in Spain. Full article
(This article belongs to the Section Multivariate Analysis)
16 pages, 994 KB  
Article
A Copula-Based Model for Analyzing Bivariate Offense Data
by Dimuthu Fernando and Wimarsha Jayanetti
Stats 2025, 8(4), 111; https://doi.org/10.3390/stats8040111 - 19 Nov 2025
Viewed by 318
Abstract
We developed a class of bivariate integer-valued time series models using copula theory. Each count time series is modeled as a Markov chain, with serial dependence characterized through copula-based transition probabilities for Poisson and Negative Binomial marginals. Cross-sectional dependence is modeled via a bivariate Gaussian copula, allowing for both positive and negative correlations and providing a flexible dependence structure. Model parameters are estimated using likelihood-based inference, where the bivariate Gaussian copula integral is evaluated through standard randomized Monte Carlo methods. The proposed approach is illustrated through an application to offense data from New South Wales, Australia, demonstrating its effectiveness in capturing complex dependence patterns. Full article
(This article belongs to the Section Time Series Analysis)
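The cross-sectional dependence mechanism, a bivariate Gaussian copula linking count marginals, can be illustrated in a few lines: correlated normal draws are mapped to uniforms and then to Poisson counts via the inverse CDF. The sketch below shows only this copula step with assumed means and correlation; the article's full model additionally specifies copula-based Markov transition probabilities over time.

```python
# Sketch of the Gaussian-copula step only: correlated normals are mapped to
# uniforms and then to Poisson counts via the inverse CDF. The correlation and
# marginal means are assumed for illustration; the article's full model adds
# copula-based Markov transition probabilities over time.
import numpy as np
from scipy.stats import norm, poisson

rng = np.random.default_rng(42)
n, rho = 5000, -0.5                          # assumed sample size and copula correlation
lam1, lam2 = 3.0, 7.0                        # assumed Poisson means for the two series

cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
u = norm.cdf(z)                              # probability integral transform to uniforms
x1 = poisson.ppf(u[:, 0], lam1).astype(int)  # dependent Poisson marginals
x2 = poisson.ppf(u[:, 1], lam2).astype(int)

print("sample means:", x1.mean(), x2.mean())
print("sample correlation:", np.corrcoef(x1, x2)[0, 1])  # negative, inherited from the copula
```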
25 pages, 362 KB  
Article
Prediction Inferences for Finite Population Totals Using Longitudinal Survey Data
by Asokan M. Variyath and Brajendra C. Sutradhar
Stats 2025, 8(4), 110; https://doi.org/10.3390/stats8040110 - 18 Nov 2025
Viewed by 201
Abstract
In an infinite-/super-population (SP) setup, regression analysis of longitudinal data, which involves repeated responses and covariates collected from a sample of independent individuals or correlated individuals belonging to a cluster such as a household/family, has been intensively studied in the statistics literature over the last three decades. In general, a longitudinal correlation structure, such as an auto-correlation structure for repeated responses from an individual or a two-way cluster–longitudinal correlation structure for repeated responses from individuals belonging to a cluster/household, is exploited to obtain consistent and efficient regression estimates. However, as opposed to the SP setup, a similar regression analysis for finite population (FP)-based longitudinal or clustered longitudinal data, using a survey sample (SS) taken from the FP based on a suitable sampling design, becomes complex: it requires first defining the FP regression and correlation (both longitudinal and/or clustered) parameters and then estimating them using appropriate sampling weighted-design unbiased (SWDU) estimating equations. The finite sampling inferences, such as predictions of longitudinal changes in FP totals, become much more complex still, as it is necessary to predict the non-sampled totals after accommodating the longitudinal and/or clustered longitudinal correlation structures. Our objective in this paper is to deal with this complex FP prediction inference by developing a design cum model (DCM)-based estimation approach. Two competitive FP total predictors, namely the design-assisted model-based (DAMB) and design cum model-based (DCMB) predictors, are compared using an intensive simulation study. The regression and correlation parameters involved in these prediction functions are optimally estimated using the proposed DCM-based approach. Full article
31 pages, 3426 KB  
Article
Maximum Likelihood and Calibrating Prior Prediction Reliability Bias Reference Charts
by Stephen Jewson
Stats 2025, 8(4), 109; https://doi.org/10.3390/stats8040109 - 6 Nov 2025
Viewed by 628
Abstract
There are many studies in the scientific literature that present predictions from parametric statistical models based on maximum likelihood estimates of the unknown parameters. However, generating predictions from maximum likelihood parameter estimates ignores the uncertainty around the parameter estimates. As a result, predictive probability distributions based on maximum likelihood are typically too narrow, and simulation testing has shown that tail probabilities are underestimated compared to the relative frequencies of out-of-sample events. We refer to this underestimation as a reliability bias. Previous authors have shown that objective Bayesian methods can eliminate or reduce this bias if the prior is chosen appropriately. Such methods have been given the name calibrating prior prediction. We investigate maximum likelihood reliability bias in more detail. We then present reference charts that quantify the reliability bias for 18 commonly used statistical models, for both maximum likelihood prediction and calibrating prior prediction. The charts give results for a large number of combinations of sample size and nominal probability and contain orders of magnitude more information about the reliability biases in predictions from these methods than has previously been published. These charts serve two purposes. First, they can be used to evaluate the extent to which maximum likelihood predictions given in the scientific literature are affected by reliability bias. If the reliability bias is large, the predictions may need to be revised. Second, the charts can be used in the design of future studies to assess whether it is appropriate to use maximum likelihood prediction, whether it would be more appropriate to reduce the reliability bias by using calibrating prior prediction, or whether neither maximum likelihood prediction nor calibrating prior prediction gives an adequately low reliability bias. Full article
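The reliability bias discussed here is easy to reproduce for a simple case: plug maximum likelihood estimates of a normal model into the predictive distribution and compare the nominal tail probability with the observed out-of-sample exceedance rate. The sketch below does this for one illustrative sample size and level; it is not drawn from the paper's 18-model reference charts.

```python
# Illustration of reliability bias for ML ("plug-in") prediction with a normal
# model: nominal 1% tail events occur more often than 1% out of sample because
# parameter uncertainty is ignored. Sample size and level are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n, nominal, reps = 10, 0.01, 200_000           # small sample, 1% tail, many replications

exceed = 0
for _ in range(reps):
    x = rng.normal(0.0, 1.0, n)
    mu_hat, sigma_hat = x.mean(), x.std()      # ML estimates (ddof=0, as in ML)
    q = norm.ppf(1 - nominal, mu_hat, sigma_hat)   # plug-in 99% quantile
    exceed += rng.normal(0.0, 1.0) > q         # new out-of-sample observation

print("nominal tail probability:", nominal)
print("observed exceedance rate:", exceed / reps)  # noticeably larger than nominal
```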
12 pages, 1242 KB  
Article
Analysis of the Truncated XLindley Distribution Using Bayesian Robustness
by Meriem Keddali, Hamida Talhi, Ali Slimani and Mohammed Amine Meraou
Stats 2025, 8(4), 108; https://doi.org/10.3390/stats8040108 - 5 Nov 2025
Viewed by 321
Abstract
In this work, we present a robust examination of the Bayesian estimators utilizing the two-parameter upper-truncated XLindley model, a unique Lindley model variant, and the oscillation of posterior risks. We provide the model in a censored scheme along with its likelihood function. The topic of sensitivity and robustness analysis of Bayesian estimators has been covered by only a small number of authors, and as a result, very few applications have been developed in this field. The oscillation of the posterior risks of the Bayesian estimator is used to illustrate the method. By using a Monte Carlo simulation study, we show that, with the correct generalized loss function, a robust Bayesian estimator of the parameters corresponding to the smallest oscillation of the posterior risks may be obtained; robust estimators can be obtained when the parameter space is low-dimensional. The robustness and precision of Bayesian parameter estimation can be enhanced in regimes where the parameters of interest are of small magnitude. Full article
(This article belongs to the Special Issue Robust Statistics in Action II)
20 pages, 386 KB  
Article
A High Dimensional Omnibus Regression Test
by Ahlam M. Abid, Paul A. Quaye and David J. Olive
Stats 2025, 8(4), 107; https://doi.org/10.3390/stats8040107 - 5 Nov 2025
Cited by 1 | Viewed by 351
Abstract
Consider regression models where the response variable $Y$ only depends on the $p \times 1$ vector of predictors $x = (x_1, \dots, x_p)^T$ through the sufficient predictor $SP = \alpha + x^T \beta$. Let the covariance vector $\mathrm{Cov}(x, Y) = \Sigma_{xY}$. Assume the cases $(x_i^T, Y_i)^T$ are independent and identically distributed random vectors for $i = 1, \dots, n$. Then for many such regression models, $\beta = 0$ if and only if $\Sigma_{xY} = 0$, where $0$ is the $p \times 1$ vector of zeroes. The test of $H_0: \Sigma_{xY} = 0$ versus $H_1: \Sigma_{xY} \neq 0$ is equivalent to the high dimensional one sample test $H_0: \mu = 0$ versus $H_A: \mu \neq 0$ applied to $w_1, \dots, w_n$, where $w_i = (x_i - \mu_x)(Y_i - \mu_Y)$ and the expected values $E(x) = \mu_x$ and $E(Y) = \mu_Y$. Since $\mu_x$ and $\mu_Y$ are unknown, the test of $H_0: \beta = 0$ versus $H_1: \beta \neq 0$ is implemented by applying the one sample test to $v_i = (x_i - \bar{x})(Y_i - \bar{Y})$ for $i = 1, \dots, n$. This test has milder regularity conditions than its few competitors. For the multiple linear regression one component partial least squares and marginal maximum likelihood estimators, the test can be adapted to test $H_0: (\beta_{i_1}, \dots, \beta_{i_k})^T = 0$ versus $H_1: (\beta_{i_1}, \dots, \beta_{i_k})^T \neq 0$, where $1 \leq k \leq p$. Full article
(This article belongs to the Section Regression Models)
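The inputs to the test are fully specified in the abstract: center the predictors and the response at their sample means and form $v_i = (x_i - \bar{x})(Y_i - \bar{Y})$. The sketch below constructs these vectors for simulated data; the high-dimensional one-sample mean test that the article then applies is left as a placeholder.

```python
# Forming the vectors v_i = (x_i - xbar)(Y_i - Ybar) described in the abstract.
# The subsequent high-dimensional one-sample test of H0: mu = 0 is the article's
# contribution and is not reproduced here; only the input construction is shown.
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 200                               # illustrative high-dimensional setting (p > n)
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)                       # under H0, Y is unrelated to X

V = (X - X.mean(axis=0)) * (Y - Y.mean())[:, None]   # row i is v_i, a p-vector

# A high-dimensional one-sample mean test would now be applied to the rows of V
# to test H0: beta = 0.
print(V.shape)                               # (n, p)
```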
14 pages, 395 KB  
Article
A Multi-State Model for Lung Cancer Mortality in Survival Progression
by Vinoth Raman, Sandra S. Ferreira, Dário Ferreira and Ayman Alzaatreh
Stats 2025, 8(4), 106; https://doi.org/10.3390/stats8040106 - 5 Nov 2025
Viewed by 634
Abstract
Lung cancer remains one of the leading causes of death worldwide due to its high rates of illness and mortality. In this study, we applied a continuous-time multi-state Markov model to examine how lung cancer progresses through six clinically defined stages, using retrospective data from 576 patients. The model describes movements between disease stages and the final stage (death), providing estimates of how long patients typically remain in each stage and how quickly they move to the next. It also considers important demographic and clinical factors such as age, smoking history, hypertension, asthma, and gender, which influence survival outcomes. Our findings show slower changes at the beginning of the disease but faster decline in later stages, with clear differences across patient groups. This approach highlights the dynamic course of the illness and can help guide tailored follow-up, personalized treatment, and health policy decisions. The study is based on a secondary analysis of publicly available data and therefore did not require clinical trial registration. Full article
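For a continuous-time multi-state Markov model of this kind, the two quantities highlighted in the abstract, mean sojourn times and transition probabilities over an interval, follow directly from the generator matrix Q: the mean time in state i is −1/q_ii and P(t) = exp(Qt). The sketch below uses an invented three-state generator, not the fitted six-stage model.

```python
# Mechanics of a continuous-time Markov model: mean sojourn times come from the
# diagonal of the generator Q, and interval transition probabilities from the
# matrix exponential P(t) = expm(Q t). The 3-state generator below is invented
# for illustration; the article fits a 6-state model to patient data.
import numpy as np
from scipy.linalg import expm

Q = np.array([
    [-0.20,  0.15,  0.05],    # early stage -> later stage or death
    [ 0.00, -0.60,  0.60],    # later stage -> death (faster decline)
    [ 0.00,  0.00,  0.00],    # death is absorbing
])

sojourn = -1.0 / np.diag(Q)[:2]   # mean time spent in each transient state
P_1 = expm(Q * 1.0)               # transition probability matrix over one time unit

print("mean sojourn times (transient states):", sojourn)
print("P(t=1):\n", np.round(P_1, 3))
```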
52 pages, 10804 KB  
Article
Silhouette-Based Evaluation of PCA, Isomap, and t-SNE on Linear and Nonlinear Data Structures
by Mostafa Zahed and Maryam Skafyan
Stats 2025, 8(4), 105; https://doi.org/10.3390/stats8040105 - 3 Nov 2025
Viewed by 681
Abstract
Dimensionality reduction is fundamental for analyzing high-dimensional data, supporting visualization, denoising, and structure discovery. We present a systematic, large-scale benchmark of three widely used methods—Principal Component Analysis (PCA), Isometric Mapping (Isomap), and t-Distributed Stochastic Neighbor Embedding (t-SNE)—evaluated by average silhouette scores to quantify cluster preservation after embedding. Our full factorial simulation varies sample size $n \in \{100, 200, 300, 400, 500\}$, noise variance $\sigma^2 \in \{0.25, 0.5, 0.75, 1, 1.5, 2\}$, and feature count $p \in \{20, 50, 100, 200, 300, 400\}$ under four generative regimes: (1) a linear Gaussian mixture, (2) a linear Student-t mixture with heavy tails, (3) a nonlinear Swiss-roll manifold, and (4) a nonlinear concentric-spheres manifold, each replicated 1000 times per condition. Beyond empirical comparisons, we provide mathematical results that explain the observed rankings: under standard separation and sampling assumptions, PCA maximizes silhouettes for linear, low-rank structure, whereas Isomap dominates on smooth curved manifolds; t-SNE prioritizes local neighborhoods, yielding strong local separation but less reliable global geometry. Empirically, PCA consistently achieves the highest silhouettes for linear structure (Isomap second, t-SNE third); on manifolds the ordering reverses (Isomap > t-SNE > PCA). Increasing $\sigma^2$ and adding uninformative dimensions (larger $p$) degrade all methods, while larger $n$ improves levels and stability. To our knowledge, this is the first integrated study combining a comprehensive factorial simulation across linear and nonlinear regimes with distribution-based summaries (density and violin plots) and supporting theory that predicts method orderings. The results offer clear, practice-oriented guidance: prefer PCA when structure is approximately linear; favor manifold learning—especially Isomap—when curvature is present; and use t-SNE for the exploratory visualization of local neighborhoods. Complete tables and replication materials are provided to facilitate method selection and reproducibility. Full article
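A minimal version of the silhouette comparison described above can be run with scikit-learn: embed a noisy Gaussian mixture (a linear regime) with PCA, Isomap, and t-SNE and score each embedding with the average silhouette on the true cluster labels. The data dimensions, noise level, and method settings below are illustrative, not the paper's full factorial design.

```python
# Minimal version of the comparison described above: embed a noisy Gaussian
# mixture with PCA, Isomap, and t-SNE and compare average silhouette scores
# computed on the true cluster labels. Settings are illustrative only.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, TSNE
from sklearn.metrics import silhouette_score

X, labels = make_blobs(n_samples=300, centers=4, n_features=20,
                       cluster_std=2.0, random_state=0)
X = np.hstack([X, np.random.default_rng(0).normal(size=(300, 80))])  # add noise dimensions

embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "Isomap": Isomap(n_components=2, n_neighbors=10).fit_transform(X),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X),
}
for name, Z in embeddings.items():
    print(f"{name:7s} silhouette: {silhouette_score(Z, labels):.3f}")
```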
21 pages, 1895 KB  
Article
Computational Testing Procedure for the Overall Lifetime Performance Index of Multi-Component Exponentially Distributed Products
by Shu-Fei Wu and Chia-Chi Hsu
Stats 2025, 8(4), 104; https://doi.org/10.3390/stats8040104 - 2 Nov 2025
Viewed by 285
Abstract
In addition to products with a single component, this study examines products composed of multiple components whose lifetimes follow a one-parameter exponential distribution. An overall lifetime performance index is developed to assess products under the progressive type I interval censoring scheme. This study establishes the relationship between the overall and individual lifetime performance indices and derives the corresponding maximum likelihood estimators along with their asymptotic distributions. Based on the asymptotic distributions, the lower confidence bounds for all indices are also established. Furthermore, a hypothesis testing procedure is formulated to evaluate whether the overall lifetime performance index achieves the specified target level, utilizing the maximum likelihood estimator as the test statistic under a progressive type I interval censored sample. Moreover, a power analysis is carried out, and two numerical examples are presented to demonstrate the practical implementation for the overall lifetime performance index. This research can be applied to the fields of life testing and reliability analysis. Full article
16 pages, 1461 KB  
Article
A Nonparametric Monitoring Framework Based on Order Statistics and Multiple Scans: Advances and Applications in Ocean Engineering
by Ioannis S. Triantafyllou
Stats 2025, 8(4), 103; https://doi.org/10.3390/stats8040103 - 1 Nov 2025
Viewed by 274
Abstract
In this work, we introduce a statistical framework for monitoring the performance of a breakwater structure in reducing wave impact. The proposed methodology aims to achieve diligent tracking of the underlying process and the swift detection of any potential malfunctions. The implementation of the new framework requires the construction of appropriate nonparametric Shewhart-type control charts, which rely on order statistics and scan-type decision criteria. The variance of the run length distribution of the proposed scheme is investigated, while the corresponding mean value is determined. For illustration purposes, we consider a real-life application, which aims at evaluating the effectiveness of a breakwater structure based on wave height reduction and wave energy dissipation. Full article
21 pages, 1332 KB  
Article
The Ridge-Hurdle Negative Binomial Regression Model: A Novel Solution for Zero-Inflated Counts in the Presence of Multicollinearity
by HM Nayem and B. M. Golam Kibria
Stats 2025, 8(4), 102; https://doi.org/10.3390/stats8040102 - 1 Nov 2025
Viewed by 837
Abstract
Datasets with many zero outcomes are common in real-world studies and often exhibit overdispersion and strong correlations among predictors, creating challenges for standard count models. Traditional approaches such as the Zero-Inflated Poisson (ZIP), Zero-Inflated Negative Binomial (ZINB), and Hurdle models can handle extra zeros and overdispersion but struggle when multicollinearity is present. This study introduces the Ridge-Hurdle Negative Binomial model, which incorporates L2 regularization into the truncated count component of the hurdle framework to jointly address zero inflation, overdispersion, and multicollinearity. Monte Carlo simulations under varying sample sizes, predictor correlations, and levels of overdispersion and zero inflation show that Ridge-Hurdle NB consistently achieves the lowest mean squared error (MSE) compared to ZIP, ZINB, Hurdle Poisson, Hurdle Negative Binomial, Ridge ZIP, and Ridge ZINB models. Applications to the Wildlife Fish and Medical Care datasets further confirm its superior predictive performance, highlighting RHNB as a robust and efficient solution for complex count data modeling. Full article
(This article belongs to the Section Statistical Methods)
15 pages, 1977 KB  
Article
Robustness of the Trinormal ROC Surface Model: Formal Assessment via Goodness-of-Fit Testing
by Christos Nakas
Stats 2025, 8(4), 101; https://doi.org/10.3390/stats8040101 - 17 Oct 2025
Viewed by 621
Abstract
Receiver operating characteristic (ROC) surfaces provide a natural extension of ROC curves to three-class diagnostic problems. A key summary index is the volume under the surface (VUS), representing the probability that a randomly chosen observation from each of the three ordered groups is correctly classified. A parametric estimation of VUS typically assumes trinormality of the class distributions. However, a formal method for the verification of this composite assumption has not appeared in the literature. Our approach generalizes the two-class AUC-based GOF test of Zou et al. to the three-class setting by exploiting the parallel structure between empirical and trinormal VUS estimators. We propose a global goodness-of-fit (GOF) test for trinormal ROC models based on the difference between empirical and trinormal parametric estimates of the VUS. To improve stability, a probit transformation is applied and a bootstrap procedure is used to estimate the variance of the difference. The resulting test provides a formal diagnostic for assessing the adequacy of trinormal ROC modeling. Simulation studies illustrate the robustness of the assumption via the empirical size and power of the test under various distributional settings, including skewed and multimodal alternatives. An application to COVID-19 antibody level data demonstrates the method's practical utility. Our findings suggest that the proposed GOF test is simple to implement, computationally feasible for moderate sample sizes, and a useful complement to existing ROC surface methodology. Full article
(This article belongs to the Section Biostatistics)
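The empirical VUS referred to above is simply the proportion of triples, one observation from each ordered group, that are correctly ordered. A small sketch with simulated data follows; the group means and sizes are assumptions for illustration.

```python
# Empirical VUS: the proportion of triples (one observation from each ordered
# group) that satisfy x < y < z. The simulated data below are illustrative only.
import numpy as np

def empirical_vus(x, y, z):
    """Proportion of triples with x_i < y_j < z_k (ties ignored)."""
    x, y, z = np.asarray(x), np.asarray(y), np.asarray(z)
    below = (x[:, None] < y[None, :]).sum(axis=0)   # for each y_j: how many x_i lie below
    above = (y[:, None] < z[None, :]).sum(axis=1)   # for each y_j: how many z_k lie above
    return (below * above).sum() / (x.size * y.size * z.size)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 60)     # healthy
y = rng.normal(1.0, 1.0, 50)     # intermediate
z = rng.normal(2.0, 1.0, 40)     # diseased
print("empirical VUS:", round(empirical_vus(x, y, z), 3))   # well above the chance level 1/6
```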
16 pages, 1699 KB  
Technical Note
Synthetic Hydrograph Estimation for Ungauged Basins: Exploring the Role of Statistical Distributions
by Dan Ianculescu and Cristian Gabriel Anghel
Stats 2025, 8(4), 100; https://doi.org/10.3390/stats8040100 - 17 Oct 2025
Viewed by 1006
Abstract
The use of probability distribution functions in deriving synthetic hydrographs has become a robust method for modeling the response of watersheds to precipitation events. This approach leverages statistical distributions to capture the temporal structure of runoff processes, providing a flexible framework for estimating peak discharge, time to peak, and hydrograph shape. The present study explores the application of various probability distributions in constructing synthetic hydrographs. The research evaluates parameter estimation techniques, analyzing their influence on hydrograph accuracy. The results highlight the strengths and limitations of each distribution in capturing key hydrological characteristics, offering insights into the suitability of certain probability distribution functions under varying watershed conditions. The study concludes that the approach based on the Cadariu rational function enhances the adaptability and precision of synthetic hydrograph models, thereby supporting flood forecasting and watershed management. Full article
(This article belongs to the Special Issue Robust Statistics in Action II)
21 pages, 425 KB  
Article
Model-Free Feature Screening Based on Data Aggregation for Ultra-High-Dimensional Longitudinal Data
by Junfeng Chen, Xiaoguang Yang, Jing Dai and Yunming Li
Stats 2025, 8(4), 99; https://doi.org/10.3390/stats8040099 - 16 Oct 2025
Viewed by 499
Abstract
Ultra-high dimensional longitudinal data feature screening procedures are widely studied, but most require model assumptions. The screening performance of these methods may not be excellent if we specify an incorrect model. To resolve the above problem, a new model-free method is introduced where feature screening is performed by sample splitting and data aggregation. Distance correlation is used to measure the association at each time point separately, while longitudinal correlation is modeled by a specific cumulative distribution function to achieve efficiency. In addition, we extend this new method to handle situations where the predictors are correlated. Both methods possess excellent asymptotic properties and are capable of handling longitudinal data with unequal numbers of repeated measurements and unequal intervals between repeated measurement time points. Compared to other model-free methods, the two new methods are relatively insensitive to within-subject correlation, and they can help reduce the computational burden when applied to longitudinal data. Finally, we use some simulated and empirical examples to show that both new methods have better screening performance. Full article
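Distance correlation, the marginal dependence measure used at each time point, can be computed directly from double-centered pairwise distance matrices. The sketch below implements the biased sample version for univariate variables; it shows only the measure itself, not the sample-splitting and aggregation screening procedure proposed in the article.

```python
# Biased sample distance correlation between two univariate samples, the
# marginal dependence measure referred to above. Only the measure is shown;
# the article's sample-splitting/aggregation screening procedure is not.
import numpy as np

def distance_correlation(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])          # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()   # double centering
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(1)
x = rng.normal(size=500)
print("independent:        ", round(distance_correlation(x, rng.normal(size=500)), 3))  # near 0
print("nonlinear (y = x^2):", round(distance_correlation(x, x**2), 3))                  # clearly > 0
```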
25 pages, 514 KB  
Article
Expansions for the Conditional Density and Distribution of a Standard Estimate
by Christopher S. Withers
Stats 2025, 8(4), 98; https://doi.org/10.3390/stats8040098 - 14 Oct 2025
Viewed by 310
Abstract
Conditioning is a very useful way of using correlated information to reduce the variability of an estimate. Conditioning an estimate on a correlated estimate reduces its covariance, and so provides more precise inference than using an unconditioned estimate. Here we give expansions in powers of $n^{-1/2}$ for the conditional density and distribution of any multivariate standard estimate based on a sample of size $n$. Standard estimates include most estimates of interest, including smooth functions of sample means and other empirical estimates. We also show that a conditional estimate is not a standard estimate, so that Edgeworth–Cornish–Fisher expansions cannot be applied directly. Full article
15 pages, 301 KB  
Article
Goodness-of-Fit Tests via Entropy-Based Density Estimation Techniques
by Luai Al-Labadi, Ruodie Yu and Kairui Bao
Stats 2025, 8(4), 97; https://doi.org/10.3390/stats8040097 - 14 Oct 2025
Viewed by 475
Abstract
Goodness-of-fit testing remains a fundamental problem in statistical inference with broad practical importance. In this paper, we introduce two new goodness-of-fit tests grounded in entropy-based density estimation techniques. The first is a boundary-corrected empirical likelihood ratio test, which refines the classic approach by addressing bias near the support boundaries, though, in practice, it yields results very similar to the uncorrected version. The second is a novel test built on Correa’s local linear entropy estimator, leveraging quantile regression to improve density estimation accuracy. We establish the theoretical properties of both test statistics and demonstrate their practical effectiveness through extensive simulation studies and real-data applications. The results show that the proposed methods deliver strong power and flexibility in assessing model adequacy in a wide range of settings. Full article
14 pages, 426 KB  
Article
Robust Parameter Designs Constructed from Hadamard Matrices
by Yingfu Li and Kalanka P. Jayalath
Stats 2025, 8(4), 96; https://doi.org/10.3390/stats8040096 - 11 Oct 2025
Viewed by 553
Abstract
The primary objective of robust parameter design (RPD) is to determine the optimal settings of control factors in a system to minimize response variance while achieving a desirable mean response. This article investigates fractional factorial designs constructed from Hadamard matrices of orders 12, 16, and 20 to meet RPD requirements with minimal runs. For various combinations of control and noise factors, rather than recommending a single “best” design, up to ten good candidate designs are identified. All listed designs permit the estimation of all control-by-noise interactions and the main effects of both control and noise factors. Additionally, some nonregular RPDs allow for the estimation of one or two control-by-control interactions, which may be critical for achieving optimal mean response. These results provide practical options for efficient, resource-constrained experiments with economical run sizes. Full article
11 pages, 272 KB  
Article
Bayesian Bell Regression Model for Fitting of Overdispersed Count Data with Application
by Ameer Musa Imran Alhseeni and Hossein Bevrani
Stats 2025, 8(4), 95; https://doi.org/10.3390/stats8040095 - 10 Oct 2025
Viewed by 595
Abstract
The Bell regression model (BRM) is a statistical model that is often used in the analysis of count data that exhibits overdispersion. In this study, we propose a Bayesian analysis of the BRM and offer a new perspective on its application. Specifically, we introduce a G-prior distribution for Bayesian inference in BRM, in addition to a flat-normal prior distribution. To compare the performance of the proposed prior distributions, we conduct a simulation study and demonstrate that the G-prior distribution provides superior estimation results for the BRM. Furthermore, we apply the methodology to real data and compare the BRM to the Poisson and negative binomial regression model using various model selection criteria. Our results provide valuable insights into the use of Bayesian methods for estimation and inference of the BRM and highlight the importance of considering the choice of prior distribution in the analysis of count data. Full article
(This article belongs to the Section Computational Statistics)
15 pages, 721 KB  
Article
Rank-Based Control Charts Under Non-Overlapping Counting with Practical Applications in Logistics and Services
by Ioannis S. Triantafyllou
Stats 2025, 8(4), 94; https://doi.org/10.3390/stats8040094 - 9 Oct 2025
Viewed by 410
Abstract
In this article, we establish a constructive nonparametric scheme for monitoring the quality of services provided by a transportation company. The proposed methodology aims at achieving the diligent tracking of the underlying process and the swift detection of any potential malfunctions. The implementation of the new framework requires the construction of appropriate schemes, which follow the set-up of a Shewhart chart and are connected to ranks and multiple run decision criteria. The dispersion and the mean value of the run length distribution for the suggested distribution-free scheme are investigated for the special case k=2. For illustration purposes, a real-data logistics environment is discussed, whereas the proposed approach is applied for improving the quality of the provided services. Full article
19 pages, 339 KB  
Article
Improper Priors via Expectation Measures
by Peter Harremoës
Stats 2025, 8(4), 93; https://doi.org/10.3390/stats8040093 - 9 Oct 2025
Viewed by 512
Abstract
In Bayesian statistics, the prior distributions play a key role in the inference, and there are procedures for finding prior distributions. An important problem is that these procedures often lead to improper prior distributions that cannot be normalized to probability measures. Such improper prior distributions lead to technical problems, in that certain calculations are only fully justified in the literature for probability measures or perhaps for finite measures. Recently, expectation measures were introduced as an alternative to probability measures as a foundation for a theory of uncertainty. Using expectation theory and point processes, it is possible to give a probabilistic interpretation of an improper prior distribution. This will provide us with a rigid formalism for calculating posterior distributions in cases where the prior distributions are not proper without relying on approximation arguments. Full article
(This article belongs to the Section Bayesian Methods)
9 pages, 590 KB  
Article
Predictions of War Duration
by Glenn McRae
Stats 2025, 8(4), 92; https://doi.org/10.3390/stats8040092 - 9 Oct 2025
Viewed by 1128
Abstract
The durations of wars fought between 1480 and 1941 A.D. were found to be well represented by random numbers chosen from a single-event Poisson distribution with a half-life of (1.25 ± 0.1) years. This result complements the work of L.F. Richardson who found that the frequency of outbreaks of wars can be described as a Poisson process. This result suggests that a quick return on investment requires a distillation of the many stressors of the day, each one of which has a small probability of being included in a convincing well-orchestrated simple call-to-arms. The half-life is a measure of how this call wanes with time. Full article
10 pages, 697 KB  
Article
Benford Behavior in Stick Fragmentation Problems
by Bruce Fang, Ava Irons, Ella Lippelman and Steven J. Miller
Stats 2025, 8(4), 91; https://doi.org/10.3390/stats8040091 - 8 Oct 2025
Viewed by 1198
Abstract
Benford’s law states that in many real-world datasets, the probability that the leading digit is $d$ equals $\log_{10}((d+1)/d)$ for all $1 \leq d \leq 9$. We call this weak Benford behavior. A dataset is said to follow strong Benford behavior if the probability that its significand (i.e., the significant digits in scientific notation) is at most $s$ equals $\log_{10}(s)$ for all $s \in [1, 10)$. We investigate Benford behavior in a multi-proportion stick fragmentation model, where a stick is split into $m$ substicks according to fixed proportions at each stage. This generalizes previous work on the single proportion stick fragmentation model, where each stick is split into two substicks using one fixed proportion. We provide a necessary and sufficient condition under which the lengths of the stick fragments converge to strong Benford behavior in the multi-proportion model. Full article
(This article belongs to the Special Issue Benford's Law(s) and Applications (Second Edition))
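A toy version of the fragmentation process makes the weak (leading-digit) form of Benford behavior easy to see: split a unit stick repeatedly into three pieces with fixed proportions and compare the leading-digit frequencies of the fragment lengths with $\log_{10}((d+1)/d)$. The proportions and the number of stages below are assumptions for illustration.

```python
# Toy check of weak Benford behavior for a multi-proportion stick fragmentation:
# each stick is repeatedly split into m = 3 pieces with fixed proportions, and
# the leading-digit frequencies of the fragment lengths are compared with
# log10((d+1)/d). The proportions and depth are assumptions for illustration.
import numpy as np

props = np.array([0.17, 0.31, 0.52])         # fixed split proportions (sum to 1), assumed
levels = 9                                    # number of fragmentation stages

sticks = np.array([1.0])
for _ in range(levels):
    sticks = (sticks[:, None] * props[None, :]).ravel()   # split every stick into 3 pieces

leading = np.array([int(f"{s:e}"[0]) for s in sticks])    # leading digit of each length
observed = np.array([(leading == d).mean() for d in range(1, 10)])
benford = np.log10(1 + 1 / np.arange(1, 10))

for d in range(1, 10):
    print(d, f"observed={observed[d-1]:.3f}", f"Benford={benford[d-1]:.3f}")
```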
12 pages, 683 KB  
Review
The Use of Double Poisson Regression for Count Data in Health and Life Science—A Narrative Review
by Sebastian Appelbaum, Julia Stronski, Uwe Konerding and Thomas Ostermann
Stats 2025, 8(4), 90; https://doi.org/10.3390/stats8040090 - 1 Oct 2025
Viewed by 1594
Abstract
Count data are present in many areas of everyday life. Unfortunately, such data are often characterized by over- and under-dispersion. In 1986, Efron introduced the Double Poisson distribution to account for this problem. The aim of this work is to examine the application of this distribution in regression analyses performed in the health-related literature by means of a narrative review. The databases Science Direct, PBSC, PubMed, PsycInfo, PsycArticles, CINAHL and Google Scholar were searched for applications. Two independent reviewers extracted data on Double Poisson Regression Models and their applications in the health and life sciences. From a total of 1644 hits, 84 articles were pre-selected, and after full-text screening, 13 articles remained. All these articles were published after 2011, and most of them targeted epidemiological research. Both over- and under-dispersion were present, and most of the papers used the generalized additive models for location, scale, and shape (GAMLSS) framework. In summary, this narrative review shows that the first steps in applying Efron’s idea of double exponential families to empirical count data have already been successfully taken in a variety of fields in the health and life sciences. Approaches to ease their application in clinical research should be encouraged. Full article