Stats, Volume 8, Issue 4 (December 2025) – 22 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open it.
52 pages, 10801 KB  
Article
Silhouette-Based Evaluation of PCA, Isomap, and t-SNE on Linear and Nonlinear Data Structures
by Mostafa Zahed and Maryam Skafyan
Stats 2025, 8(4), 105; https://doi.org/10.3390/stats8040105 - 3 Nov 2025
Abstract
Dimensionality reduction is fundamental for analyzing high-dimensional data, supporting visualization, denoising, and structure discovery. We present a systematic, large-scale benchmark of three widely used methods—Principal Component Analysis (PCA), Isometric Mapping (Isomap), and t-Distributed Stochastic Neighbor Embedding (t-SNE)—evaluated by average silhouette scores to quantify cluster preservation after embedding. Our full factorial simulation varies sample size n ∈ {100, 200, 300, 400, 500}, noise variance σ² ∈ {0.25, 0.5, 0.75, 1, 1.5, 2}, and feature count p ∈ {20, 50, 100, 200, 300, 400} under four generative regimes: (1) a linear Gaussian mixture, (2) a linear Student-t mixture with heavy tails, (3) a nonlinear Swiss-roll manifold, and (4) a nonlinear concentric-spheres manifold, each replicated 1000 times per condition. Beyond empirical comparisons, we provide mathematical results that explain the observed rankings: under standard separation and sampling assumptions, PCA maximizes silhouettes for linear, low-rank structure, whereas Isomap dominates on smooth curved manifolds; t-SNE prioritizes local neighborhoods, yielding strong local separation but less reliable global geometry. Empirically, PCA consistently achieves the highest silhouettes for linear structure (Isomap second, t-SNE third); on manifolds the ordering reverses (Isomap > t-SNE > PCA). Increasing σ² and adding uninformative dimensions (larger p) degrade all methods, while larger n improves levels and stability. To our knowledge, this is the first integrated study combining a comprehensive factorial simulation across linear and nonlinear regimes with distribution-based summaries (density and violin plots) and supporting theory that predicts method orderings. The results offer clear, practice-oriented guidance: prefer PCA when structure is approximately linear; favor manifold learning—especially Isomap—when curvature is present; and use t-SNE for the exploratory visualization of local neighborhoods. Complete tables and replication materials are provided to facilitate method selection and reproducibility. Full article
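A minimal sketch of one cell of such a benchmark, assuming scikit-learn's implementations; the cluster layout, parameter values, and hyperparameters below are illustrative stand-ins for the paper's factorial design:

```python
# Embed a 3-cluster Gaussian mixture with PCA, Isomap, and t-SNE, then
# score cluster preservation by the average silhouette width.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, TSNE
from sklearn.metrics import silhouette_score

n, p, sigma2 = 300, 50, 0.5                      # one (n, p, sigma^2) cell
X, labels = make_blobs(n_samples=n, n_features=p, centers=3,
                       cluster_std=np.sqrt(sigma2), random_state=0)

embedders = {
    "PCA": PCA(n_components=2),
    "Isomap": Isomap(n_components=2, n_neighbors=10),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0),
}
for name, emb in embedders.items():
    Z = emb.fit_transform(X)
    print(f"{name}: silhouette = {silhouette_score(Z, labels):.3f}")
```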
21 pages, 1895 KB  
Article
Computational Testing Procedure for the Overall Lifetime Performance Index of Multi-Component Exponentially Distributed Products
by Shu-Fei Wu and Chia-Chi Hsu
Stats 2025, 8(4), 104; https://doi.org/10.3390/stats8040104 - 2 Nov 2025
Abstract
In addition to products with a single component, this study examines products composed of multiple components whose lifetimes follow a one-parameter exponential distribution. An overall lifetime performance index is developed to assess products under the progressive type I interval censoring scheme. This study establishes the relationship between the overall and individual lifetime performance indices and derives the corresponding maximum likelihood estimators along with their asymptotic distributions. Based on the asymptotic distributions, the lower confidence bounds for all indices are also established. Furthermore, a hypothesis testing procedure is formulated to evaluate whether the overall lifetime performance index achieves the specified target level, utilizing the maximum likelihood estimator as the test statistic under a progressive type I interval censored sample. Moreover, a power analysis is carried out, and two numerical examples are presented to demonstrate the practical implementation of the testing procedure for the overall lifetime performance index. This research can be applied to the fields of life testing and reliability analysis. Full article
16 pages, 1461 KB  
Article
A Nonparametric Monitoring Framework Based on Order Statistics and Multiple Scans: Advances and Applications in Ocean Engineering
by Ioannis S. Triantafyllou
Stats 2025, 8(4), 103; https://doi.org/10.3390/stats8040103 - 1 Nov 2025
Viewed by 52
Abstract
In this work, we introduce a statistical framework for monitoring the performance of a breakwater structure in reducing wave impact. The proposed methodology aims to achieve diligent tracking of the underlying process and the swift detection of any potential malfunctions. The implementation of the new framework requires the construction of appropriate nonparametric Shewhart-type control charts, which rely on order statistics and scan-type decision criteria. The variance of the run length distribution of the proposed scheme is investigated, while the corresponding mean value is determined. For illustration purposes, we consider a real-life application, which aims at evaluating the effectiveness of a breakwater structure based on wave height reduction and wave energy dissipation. Full article
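The run-length behavior described here can be approximated by simulation. The sketch below is a simplified stand-in, assuming a basic nonparametric Shewhart-type chart that signals when a test sample's median leaves the band formed by reference order statistics; the paper's scan-type decision criteria are omitted:

```python
# Monte Carlo estimate of in-control and out-of-control average run length
# (ARL) for a chart with limits at the r-th smallest and r-th largest
# order statistics of an in-control reference sample.
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 100, 5, 5                       # reference size, test size, depth

def run_length(shift=0.0):
    ref = np.sort(rng.normal(size=m))     # in-control reference sample
    lcl, ucl = ref[r - 1], ref[m - r]     # r-th smallest / r-th largest
    t = 0
    while True:
        t += 1
        med = np.median(rng.normal(loc=shift, size=n))
        if med < lcl or med > ucl:
            return t

arl0 = np.mean([run_length(0.0) for _ in range(2000)])
arl1 = np.mean([run_length(1.0) for _ in range(2000)])
print(f"in-control ARL ~ {arl0:.0f}, out-of-control ARL ~ {arl1:.1f}")
```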
21 pages, 1332 KB  
Article
The Ridge-Hurdle Negative Binomial Regression Model: A Novel Solution for Zero-Inflated Counts in the Presence of Multicollinearity
by HM Nayem and B. M. Golam Kibria
Stats 2025, 8(4), 102; https://doi.org/10.3390/stats8040102 - 1 Nov 2025
Viewed by 116
Abstract
Datasets with many zero outcomes are common in real-world studies and often exhibit overdispersion and strong correlations among predictors, creating challenges for standard count models. Traditional approaches such as the Zero-Inflated Poisson (ZIP), Zero-Inflated Negative Binomial (ZINB), and Hurdle models can handle extra zeros and overdispersion but struggle when multicollinearity is present. This study introduces the Ridge-Hurdle Negative Binomial (RHNB) model, which incorporates L2 regularization into the truncated count component of the hurdle framework to jointly address zero inflation, overdispersion, and multicollinearity. Monte Carlo simulations under varying sample sizes, predictor correlations, and levels of overdispersion and zero inflation show that RHNB consistently achieves the lowest mean squared error (MSE) compared to ZIP, ZINB, Hurdle Poisson, Hurdle Negative Binomial, Ridge ZIP, and Ridge ZINB models. Applications to the Wildlife Fish and Medical Care datasets further confirm its superior predictive performance, highlighting RHNB as a robust and efficient solution for complex count data modeling. Full article
(This article belongs to the Section Statistical Methods)
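A rough sketch of the estimator's core idea under stated assumptions (NB2 parameterization, fixed dispersion alpha and ridge constant k, simulated placeholder data); the paper's estimation and tuning details may differ:

```python
# Hurdle model: logistic part for zero vs. positive, plus a zero-truncated
# negative binomial count part whose slope coefficients carry an L2 penalty.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import nbinom
from sklearn.linear_model import LogisticRegression

def ridge_truncated_nb(X, y, alpha=1.0, k=1.0):
    """Fit the positive-count part: min -loglik + k * ||slopes||^2."""
    Xp, yp = X[y > 0], y[y > 0]
    Xd = np.column_stack([np.ones(len(yp)), Xp])      # add intercept

    def nll(beta):
        mu = np.exp(Xd @ beta)
        n_, p_ = 1.0 / alpha, 1.0 / (1.0 + alpha * mu)
        ll = nbinom.logpmf(yp, n_, p_) - np.log1p(-p_ ** n_)  # truncation
        return -ll.sum() + k * np.sum(beta[1:] ** 2)  # don't penalize intercept

    return minimize(nll, x0=np.zeros(Xd.shape[1]), method="BFGS").x

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)); X[:, 2] = X[:, 1] + 0.05 * rng.normal(size=500)
y = rng.poisson(np.exp(0.5 + 0.3 * X[:, 0])) * rng.binomial(1, 0.6, size=500)

zero_part = LogisticRegression().fit(X, (y > 0).astype(int))   # hurdle part
print("count-part coefficients:", np.round(ridge_truncated_nb(X, y), 3))
```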
15 pages, 1977 KB  
Article
Robustness of the Trinormal ROC Surface Model: Formal Assessment via Goodness-of-Fit Testing
by Christos Nakas
Stats 2025, 8(4), 101; https://doi.org/10.3390/stats8040101 - 17 Oct 2025
Viewed by 358
Abstract
Receiver operating characteristic (ROC) surfaces provide a natural extension of ROC curves to three-class diagnostic problems. A key summary index is the volume under the surface (VUS), representing the probability that a randomly chosen observation from each of the three ordered groups is correctly classified. Parametric estimation of the VUS typically assumes trinormality of the class distributions. However, a formal method for verifying this composite assumption has not appeared in the literature. Our approach generalizes the two-class AUC-based GOF test of Zou et al. to the three-class setting by exploiting the parallel structure between empirical and trinormal VUS estimators. We propose a global goodness-of-fit (GOF) test for trinormal ROC models based on the difference between empirical and trinormal parametric estimates of the VUS. To improve stability, a probit transformation is applied and a bootstrap procedure is used to estimate the variance of the difference. The resulting test provides a formal diagnostic for assessing the adequacy of trinormal ROC modeling. Simulation studies illustrate the robustness of the assumption via the empirical size and power of the test under various distributional settings, including skewed and multimodal alternatives. Application of the method to COVID-19 antibody level data demonstrates its practical utility. Our findings suggest that the proposed GOF test is simple to implement, computationally feasible for moderate sample sizes, and a useful complement to existing ROC surface methodology. Full article
(This article belongs to the Section Biostatistics)
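A sketch of the test's main ingredients (empirical VUS, trinormal VUS via one-dimensional integration, and a bootstrap variance of the probit-scale difference), using simulated placeholder data; the paper's refinements are omitted:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def vus_empirical(x, y, z):
    # P(X < Y < Z) over all triples, via empirical cdfs evaluated at each y
    return np.mean([(x < yi).mean() * (z > yi).mean() for yi in y])

def vus_trinormal(x, y, z):
    m = [v.mean() for v in (x, y, z)]; s = [v.std(ddof=1) for v in (x, y, z)]
    f = lambda t: (norm.cdf((t - m[0]) / s[0]) * norm.sf((t - m[2]) / s[2])
                   * norm.pdf(t, m[1], s[1]))
    return quad(f, -np.inf, np.inf)[0]

rng = np.random.default_rng(0)
x, y, z = rng.normal(0, 1, 60), rng.normal(1, 1, 60), rng.normal(2, 1, 60)

d_hat = norm.ppf(vus_empirical(x, y, z)) - norm.ppf(vus_trinormal(x, y, z))
boot = []
for _ in range(500):                       # bootstrap the probit difference
    xb, yb, zb = (rng.choice(v, size=v.size, replace=True) for v in (x, y, z))
    boot.append(norm.ppf(vus_empirical(xb, yb, zb))
                - norm.ppf(vus_trinormal(xb, yb, zb)))
z_stat = d_hat / np.std(boot, ddof=1)      # compare to N(0, 1)
print(f"probit-scale difference {d_hat:.3f}, z-statistic {z_stat:.2f}")
```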
16 pages, 1699 KB  
Technical Note
Synthetic Hydrograph Estimation for Ungauged Basins: Exploring the Role of Statistical Distributions
by Dan Ianculescu and Cristian Gabriel Anghel
Stats 2025, 8(4), 100; https://doi.org/10.3390/stats8040100 - 17 Oct 2025
Viewed by 625
Abstract
The use of probability distribution functions in deriving synthetic hydrographs has become a robust method for modeling the response of watersheds to precipitation events. This approach leverages statistical distributions to capture the temporal structure of runoff processes, providing a flexible framework for estimating peak discharge, time to peak, and hydrograph shape. The present study explores the application of various probability distributions in constructing synthetic hydrographs. The research evaluates parameter estimation techniques, analyzing their influence on hydrograph accuracy. The results highlight the strengths and limitations of each distribution in capturing key hydrological characteristics, offering insights into the suitability of certain probability distribution functions under varying watershed conditions. The study concludes that the approach based on the Cadariu rational function enhances the adaptability and precision of synthetic hydrograph models, thereby supporting flood forecasting and watershed management. Full article
(This article belongs to the Special Issue Robust Statistics in Action II)
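As a generic illustration of the recipe, a probability density can serve as the dimensionless shape of a synthetic hydrograph, rescaled to a design peak discharge; the gamma (Nash-type) shape and the numbers below are assumptions, not the article's fitted distributions:

```python
# Scale a gamma pdf so its peak matches a design discharge Qp at the
# time-to-peak tp; other distributions would be swapped in the same way.
import numpy as np
from scipy.stats import gamma

Qp, tp, shape = 120.0, 6.0, 3.0           # illustrative design values
scale = tp / (shape - 1)                   # gamma pdf peaks at (shape-1)*scale
t = np.linspace(0.0, 48.0, 481)            # hours
q = gamma.pdf(t, a=shape, scale=scale)
Q = Qp * q / q.max()                       # rescale: peak equals Qp at t = tp
print(f"peak {Q.max():.1f} m^3/s at t = {t[np.argmax(Q)]:.1f} h")
```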
21 pages, 425 KB  
Article
Model-Free Feature Screening Based on Data Aggregation for Ultra-High-Dimensional Longitudinal Data
by Junfeng Chen, Xiaoguang Yang, Jing Dai and Yunming Li
Stats 2025, 8(4), 99; https://doi.org/10.3390/stats8040099 - 16 Oct 2025
Viewed by 278
Abstract
Feature screening procedures for ultra-high-dimensional longitudinal data are widely studied, but most require model assumptions, and their screening performance may deteriorate if the model is misspecified. To resolve this problem, a new model-free method is introduced in which feature screening is performed by sample splitting and data aggregation. Distance correlation is used to measure the association at each time point separately, while longitudinal correlation is modeled by a specific cumulative distribution function to achieve efficiency. In addition, we extend this new method to handle situations where the predictors are correlated. Both methods possess excellent asymptotic properties and are capable of handling longitudinal data with unequal numbers of repeated measurements and unequal intervals between repeated measurement time points. Compared to other model-free methods, the two new methods are relatively insensitive to within-subject correlation, and they can help reduce the computational burden when applied to longitudinal data. Finally, we use simulated and empirical examples to show that both new methods have better screening performance. Full article
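A minimal sketch of the core screening step, computing distance correlation per time point and averaging across points; the paper's sample-splitting and cumulative-distribution weighting are omitted, and the data are simulated:

```python
import numpy as np

def dcor(x, y):
    """Sample distance correlation of two 1-d arrays (double centering)."""
    def centered(v):
        d = np.abs(v[:, None] - v[None, :])
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
n, p, T = 100, 200, 4                       # subjects, predictors, time points
X = rng.normal(size=(n, p))
Y = np.stack([X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(size=n)
              for _ in range(T)], axis=1)   # only predictors 0, 1 are active

scores = np.array([np.mean([dcor(X[:, j], Y[:, t]) for t in range(T)])
                   for j in range(p)])
print("top-ranked predictors:", np.argsort(scores)[::-1][:10])  # expect 0, 1
```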
25 pages, 514 KB  
Article
Expansions for the Conditional Density and Distribution of a Standard Estimate
by Christopher S. Withers
Stats 2025, 8(4), 98; https://doi.org/10.3390/stats8040098 - 14 Oct 2025
Viewed by 190
Abstract
Conditioning is a very useful way of using correlated information to reduce the variability of an estimate. Conditioning an estimate on a correlated estimate reduces its covariance, and so provides more precise inference than using an unconditioned estimate. Here we give expansions in powers of n^(−1/2) for the conditional density and distribution of any multivariate standard estimate based on a sample of size n. Standard estimates include most estimates of interest, including smooth functions of sample means and other empirical estimates. We also show that a conditional estimate is not a standard estimate, so that Edgeworth-Cornish-Fisher expansions cannot be applied directly. Full article
15 pages, 301 KB  
Article
Goodness-of-Fit Tests via Entropy-Based Density Estimation Techniques
by Luai Al-Labadi, Ruodie Yu and Kairui Bao
Stats 2025, 8(4), 97; https://doi.org/10.3390/stats8040097 - 14 Oct 2025
Viewed by 241
Abstract
Goodness-of-fit testing remains a fundamental problem in statistical inference with broad practical importance. In this paper, we introduce two new goodness-of-fit tests grounded in entropy-based density estimation techniques. The first is a boundary-corrected empirical likelihood ratio test, which refines the classic approach by addressing bias near the support boundaries, though, in practice, it yields results very similar to the uncorrected version. The second is a novel test built on Correa’s local linear entropy estimator, leveraging quantile regression to improve density estimation accuracy. We establish the theoretical properties of both test statistics and demonstrate their practical effectiveness through extensive simulation studies and real-data applications. The results show that the proposed methods deliver strong power and flexibility in assessing model adequacy in a wide range of settings. Full article
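For orientation, the classical spacing-based entropy test that both proposals refine can be sketched as follows, using Vasicek's estimator and a Monte Carlo critical value; the boundary-corrected and Correa-type estimators of the paper would replace the H computed below:

```python
# Entropy test of normality: since the normal maximizes entropy for a
# given variance, reject H0 when exp(H)/s falls below a simulated
# critical value, with H the Vasicek spacing estimator.
import numpy as np

def vasicek_entropy(x, m=None):
    x = np.sort(x); n = len(x)
    m = m or max(1, int(round(np.sqrt(n))))
    upper = x[np.minimum(np.arange(n) + m, n - 1)]   # X_(i+m), clipped
    lower = x[np.maximum(np.arange(n) - m, 0)]       # X_(i-m), clipped
    return np.mean(np.log(n / (2.0 * m) * (upper - lower)))

def normality_stat(x):
    return np.exp(vasicek_entropy(x)) / np.std(x, ddof=1)

rng = np.random.default_rng(0)
n, B = 50, 5000
null = np.sort([normality_stat(rng.normal(size=n)) for _ in range(B)])
crit = null[int(0.05 * B)]                           # 5% critical value

sample = rng.exponential(size=n)                     # a skewed alternative
print(f"statistic {normality_stat(sample):.3f}, "
      f"reject normality: {normality_stat(sample) < crit}")
```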
14 pages, 426 KB  
Article
Robust Parameter Designs Constructed from Hadamard Matrices
by Yingfu Li and Kalanka P. Jayalath
Stats 2025, 8(4), 96; https://doi.org/10.3390/stats8040096 - 11 Oct 2025
Viewed by 300
Abstract
The primary objective of robust parameter design (RPD) is to determine the optimal settings of control factors in a system to minimize response variance while achieving a desirable mean response. This article investigates fractional factorial designs constructed from Hadamard matrices of orders 12, 16, and 20 to meet RPD requirements with minimal runs. For various combinations of control and noise factors, rather than recommending a single “best” design, up to ten good candidate designs are identified. All listed designs permit the estimation of all control-by-noise interactions and the main effects of both control and noise factors. Additionally, some nonregular RPDs allow for the estimation of one or two control-by-control interactions, which may be critical for achieving an optimal mean response. These results provide practical options for efficient, resource-constrained experiments with economical run sizes. Full article
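The estimability property can be checked numerically: build the model matrix (intercept, main effects, all control-by-noise interactions) from chosen Hadamard columns and verify full column rank. The sketch below uses order 16, since scipy only constructs power-of-two orders, and an illustrative column assignment:

```python
import numpy as np
from scipy.linalg import hadamard

H = hadamard(16)                          # +/-1 matrix; column 0 is constant
control_cols, noise_cols = [1, 2, 4], [8, 15]         # illustrative choice
C, N = H[:, control_cols], H[:, noise_cols]

terms = [np.ones(16)]
terms += [C[:, i] for i in range(C.shape[1])]         # control main effects
terms += [N[:, j] for j in range(N.shape[1])]         # noise main effects
terms += [C[:, i] * N[:, j] for i in range(C.shape[1])
          for j in range(N.shape[1])]                 # control-by-noise
M = np.column_stack(terms)
print(f"{M.shape[1]} model terms, rank {np.linalg.matrix_rank(M)}")
# full column rank means all listed effects are jointly estimable
```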
11 pages, 272 KB  
Article
Bayesian Bell Regression Model for Fitting of Overdispersed Count Data with Application
by Ameer Musa Imran Alhseeni and Hossein Bevrani
Stats 2025, 8(4), 95; https://doi.org/10.3390/stats8040095 - 10 Oct 2025
Viewed by 320
Abstract
The Bell regression model (BRM) is a statistical model that is often used in the analysis of count data that exhibit overdispersion. In this study, we propose a Bayesian analysis of the BRM and offer a new perspective on its application. Specifically, we introduce a G-prior distribution for Bayesian inference in the BRM, in addition to a flat-normal prior distribution. To compare the performance of the proposed prior distributions, we conduct a simulation study and demonstrate that the G-prior distribution provides superior estimation results for the BRM. Furthermore, we apply the methodology to real data and compare the BRM to the Poisson and negative binomial regression models using various model selection criteria. Our results provide valuable insights into the use of Bayesian methods for estimation and inference in the BRM and highlight the importance of considering the choice of prior distribution in the analysis of count data. Full article
(This article belongs to the Section Computational Statistics)
15 pages, 721 KB  
Article
Rank-Based Control Charts Under Non-Overlapping Counting with Practical Applications in Logistics and Services
by Ioannis S. Triantafyllou
Stats 2025, 8(4), 94; https://doi.org/10.3390/stats8040094 - 9 Oct 2025
Viewed by 255
Abstract
In this article, we establish a constructive nonparametric scheme for monitoring the quality of services provided by a transportation company. The proposed methodology aims at achieving the diligent tracking of the underlying process and the swift detection of any potential malfunctions. The implementation of the new framework requires the construction of appropriate schemes, which follow the set-up of a Shewhart chart and are connected to ranks and multiple run decision criteria. The dispersion and the mean value of the run length distribution for the suggested distribution-free scheme are investigated for the special case k = 2. For illustration purposes, a real-data logistics environment is discussed, and the proposed approach is applied to improve the quality of the provided services. Full article
19 pages, 339 KB  
Article
Improper Priors via Expectation Measures
by Peter Harremoës
Stats 2025, 8(4), 93; https://doi.org/10.3390/stats8040093 - 9 Oct 2025
Viewed by 313
Abstract
In Bayesian statistics, the prior distributions play a key role in the inference, and there are procedures for finding prior distributions. An important problem is that these procedures often lead to improper prior distributions that cannot be normalized to probability measures. Such improper prior distributions lead to technical problems, in that certain calculations are only fully justified in the literature for probability measures or perhaps for finite measures. Recently, expectation measures were introduced as an alternative to probability measures as a foundation for a theory of uncertainty. Using expectation theory and point processes, it is possible to give a probabilistic interpretation of an improper prior distribution. This provides a rigorous formalism for calculating posterior distributions in cases where the prior distributions are not proper, without relying on approximation arguments. Full article
(This article belongs to the Section Bayesian Methods)
9 pages, 590 KB  
Article
Predictions of War Duration
by Glenn McRae
Stats 2025, 8(4), 92; https://doi.org/10.3390/stats8040092 - 9 Oct 2025
Viewed by 622
Abstract
The durations of wars fought between 1480 and 1941 A.D. were found to be well represented by random numbers chosen from a single-event Poisson distribution with a half-life of (1.25 ± 0.1) years. This result complements the work of L.F. Richardson who found that the frequency of outbreaks of wars can be described as a Poisson process. This result suggests that a quick return on investment requires a distillation of the many stressors of the day, each one of which has a small probability of being included in a convincing well-orchestrated simple call-to-arms. The half-life is a measure of how this call wanes with time. Full article
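The estimation step is elementary: for exponential durations (a single-event Poisson process in time), the rate MLE is the reciprocal mean and the half-life is ln(2)/rate. A sketch with synthetic placeholder durations, not the historical data:

```python
import numpy as np

rng = np.random.default_rng(0)
half_life_true = 1.25
durations = rng.exponential(scale=half_life_true / np.log(2), size=200)

rate_hat = 1.0 / durations.mean()                  # exponential MLE
half_life = np.log(2) / rate_hat
se = half_life / np.sqrt(len(durations))           # delta-method std. error
print(f"half-life = {half_life:.2f} +/- {se:.2f} years")
```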
10 pages, 697 KB  
Article
Benford Behavior in Stick Fragmentation Problems
by Bruce Fang, Ava Irons, Ella Lippelman and Steven J. Miller
Stats 2025, 8(4), 91; https://doi.org/10.3390/stats8040091 - 8 Oct 2025
Viewed by 925
Abstract
Benford’s law states that in many real-world datasets, the probability that the leading digit is d equals log₁₀((d+1)/d) for all 1 ≤ d ≤ 9. We call this weak Benford behavior. A dataset is said to follow strong Benford behavior if the probability that its significand (i.e., the significant digits in scientific notation) is at most s equals log₁₀(s) for all s ∈ [1, 10). We investigate Benford behavior in a multi-proportion stick fragmentation model, where a stick is split into m substicks according to fixed proportions at each stage. This generalizes previous work on the single proportion stick fragmentation model, where each stick is split into two substicks using one fixed proportion. We provide a necessary and sufficient condition under which the lengths of the stick fragments converge to strong Benford behavior in the multi-proportion model. Full article
(This article belongs to the Special Issue Benford's Law(s) and Applications (Second Edition))
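The multi-proportion model is easy to simulate; the sketch below splits every stick by fixed (illustrative) proportions for several stages and compares the significand distribution of fragment lengths with log₁₀(s):

```python
import numpy as np

props = np.array([0.17, 0.31, 0.52])       # fixed split proportions (sum to 1)
sticks = np.array([1.0])
for _ in range(12):                        # 3**12 fragments after 12 stages
    sticks = (sticks[:, None] * props[None, :]).ravel()

significands = sticks / 10.0 ** np.floor(np.log10(sticks))   # values in [1, 10)
for s in (2, 3, 5):                        # empirical vs strong Benford law
    emp = np.mean(significands <= s)
    print(f"P(significand <= {s}): empirical {emp:.3f}, Benford {np.log10(s):.3f}")
```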
12 pages, 683 KB  
Review
The Use of Double Poisson Regression for Count Data in Health and Life Science—A Narrative Review
by Sebastian Appelbaum, Julia Stronski, Uwe Konerding and Thomas Ostermann
Stats 2025, 8(4), 90; https://doi.org/10.3390/stats8040090 - 1 Oct 2025
Viewed by 716
Abstract
Count data are present in many areas of everyday life. Unfortunately, such data are often characterized by over- and under-dispersion. In 1986, Efron introduced the Double Poisson distribution to account for this problem. The aim of this work is to examine the application of this distribution in regression analyses performed in the health-related literature by means of a narrative review. The databases Science Direct, PBSC, PubMed, PsycInfo, PsycArticles, CINAHL, and Google Scholar were searched for applications. Two independent reviewers extracted data on Double Poisson regression models and their applications in the health and life sciences. From a total of 1644 hits, 84 articles were pre-selected, and after full-text screening, 13 articles remained. All these articles were published after 2011, and most of them targeted epidemiological research. Both over- and under-dispersion were present, and most of the papers used the generalized additive models for location, scale, and shape (GAMLSS) framework. In summary, this narrative review shows that the first steps in applying Efron’s idea of double exponential families to empirical count data have already been successfully taken in a variety of fields in the health and life sciences. Approaches to ease their application in clinical research should be encouraged. Full article
22 pages, 1227 KB  
Article
Theoretically Based Dynamic Regression (TDR)—A New and Novel Regression Framework for Modeling Dynamic Behavior
by Derrick K. Rollins, Marit Nilsen-Hamilton, Kendra Kreienbrink, Spencer Wolfe, Dillon Hurd and Jacob Oyler
Stats 2025, 8(4), 89; https://doi.org/10.3390/stats8040089 - 28 Sep 2025
Viewed by 431
Abstract
The theoretical modeling of a dynamic system will have derivatives of the response (y) with respect to time (t). Two common physical attributes (i.e., parameters) of dynamic systems are dead-time (θ) and lag (τ). Theoretical dynamic modeling will contain physically interpretable parameters such as τ and θ with physical constraints. In addition, the number of unknown model-based parameters can be considerably smaller than in empirically based (i.e., lagged-based) approaches. This work proposes a Theoretically based Dynamic Regression (TDR) modeling approach that overcomes critical lagged-based modeling limitations, as demonstrated on three large, multiple-input, highly dynamic, real data sets. Dynamic Regression (DR) is a lagged-based, empirical dynamic modeling approach that appears in the statistics literature. However, like all empirical approaches, its model structures do not contain first-principle interpretable parameters. Additionally, several time lags are typically needed for the output, y, and input, x, to capture significant dynamic behavior. TDR uses a simplistic theoretically based dynamic modeling approach to transform xₜ into its dynamic counterpart, vₜ, and then applies the methods and tools of static regression to vₜ. TDR is demonstrated on the following three modeling problems involving freely existing (i.e., not experimentally designed) real data sets: 1. the weight variation in a person (y) with four measured nutrient inputs (xᵢ); 2. the variation in the tray temperature (y) of a distillation column with nine inputs and eight test data sets over a three-year period; and 3. eleven extremely large, highly dynamic, subject-specific models of sensor glucose (y) with 12 inputs (xᵢ). Full article
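A sketch of the TDR idea under assumed first-order-plus-dead-time dynamics: each input is delayed by θ steps, passed through an exponential lag filter with time constant τ, and the resulting vₜ enters an ordinary static regression; the paper's model structures and its estimation of (τ, θ) may differ:

```python
import numpy as np

def tdr_transform(x, tau, d, dt=1.0):
    """First-order lag (time constant tau) applied to x delayed by d steps."""
    a = np.exp(-dt / tau)
    x_delayed = np.concatenate([np.full(d, x[0]), x[:-d]]) if d else x
    v = np.empty_like(x_delayed)
    v[0] = x_delayed[0]
    for t in range(1, len(v)):
        v[t] = a * v[t - 1] + (1.0 - a) * x_delayed[t]
    return v

rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T).cumsum() * 0.1                  # slowly varying input
y = 2.0 + 1.5 * tdr_transform(x, tau=8.0, d=3) + 0.1 * rng.normal(size=T)

v = tdr_transform(x, tau=8.0, d=3)                     # assumed (tau, theta)
beta = np.polyfit(v, y, deg=1)                         # static regression on v
print(f"slope {beta[0]:.2f}, intercept {beta[1]:.2f}")  # recovers ~1.5, ~2.0
```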
2 pages, 162 KB  
Correction
Correction: Chen et al. Scoring Individual Moral Inclination for the CNI Test. Stats 2024, 7, 894–905
by Yi Chen, Benjamin Lugu, Wenchao Ma and Hyemin Han
Stats 2025, 8(4), 88; https://doi.org/10.3390/stats8040088 - 28 Sep 2025
Viewed by 225
Abstract
Error in Table [...] Full article
14 pages, 434 KB  
Article
Energy Statistic-Based Goodness-of-Fit Test for the Lindley Distribution with Application to Lifetime Data
by Joseph Njuki and Ryan Avallone
Stats 2025, 8(4), 87; https://doi.org/10.3390/stats8040087 - 26 Sep 2025
Viewed by 547
Abstract
In this article, we propose a goodness-of-fit test for the one-parameter Lindley distribution based on energy statistics. The Lindley distribution has been widely used in reliability studies and survival analysis, especially in the applied sciences. The proposed test procedure is simple and powerful against general alternatives. Under different settings, Monte Carlo simulations show that the proposed test maintains any given nominal level. In terms of power, the proposed test outperforms other existing similar methods in different settings. We then apply the proposed test to real-life datasets to demonstrate its competitiveness and usefulness. Full article
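A sketch of the idea under stated assumptions: the fitted Lindley law (closed-form MLE) is represented by a large Monte Carlo sample, the two-sample energy statistic measures the discrepancy, and a parametric bootstrap supplies the p-value; the paper works with the one-sample energy statistic directly, so this is only an approximation of the same construction:

```python
import numpy as np

def lindley_mle(x):
    xb = x.mean()     # closed-form MLE (Ghitany et al. 2008)
    return (-(xb - 1.0) + np.sqrt((xb - 1.0) ** 2 + 8.0 * xb)) / (2.0 * xb)

def lindley_sample(theta, size, rng):
    expo = rng.binomial(1, theta / (theta + 1.0), size=size)   # mixture flag
    return np.where(expo, rng.exponential(1.0 / theta, size),
                    rng.gamma(2.0, 1.0 / theta, size))

def energy_stat(x, y):
    dxy = np.abs(x[:, None] - y[None, :]).mean()
    dxx = np.abs(x[:, None] - x[None, :]).mean()
    dyy = np.abs(y[:, None] - y[None, :]).mean()
    return 2 * dxy - dxx - dyy

rng = np.random.default_rng(0)
x = rng.weibull(2.0, size=100)                     # data from an alternative
theta_hat, m = lindley_mle(x), 500
t_obs = energy_stat(x, lindley_sample(theta_hat, m, rng))

boot = []
for _ in range(200):                               # parametric bootstrap
    xb = lindley_sample(theta_hat, x.size, rng)
    boot.append(energy_stat(xb, lindley_sample(lindley_mle(xb), m, rng)))
print(f"p-value ~ {np.mean(np.array(boot) >= t_obs):.3f}")
```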
32 pages, 1136 KB  
Article
Enhancing Diversity and Improving Prediction Performance of Subsampling-Based Ensemble Methods
by Maria Ordal and Qing Wang
Stats 2025, 8(4), 86; https://doi.org/10.3390/stats8040086 - 26 Sep 2025
Viewed by 345
Abstract
This paper investigates how diversity among training samples impacts the predictive performance of a subsampling-based ensemble. It is well known that diverse training samples improve ensemble predictions, and smaller subsampling rates naturally lead to enhanced diversity. However, this approach of achieving a higher degree of diversity often comes with the cost of a reduced training sample size, which is undesirable. This paper introduces two novel subsampling strategies—partition and shift subsampling—as alternative schemes designed to improve diversity without sacrificing the training sample size in subsampling-based ensemble methods. From a probabilistic perspective, we investigate their impact on subsample diversity when utilized with tree-based sub-ensemble learners in comparison to the benchmark random subsampling. Through extensive simulations and eight real-world examples in both regression and classification contexts, we found a significant improvement in the predictive performance of the developed methods. Notably, this gain is particularly pronounced on challenging datasets or when higher subsampling rates are employed. Full article
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
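A sketch contrasting partition subsampling with the random-subsampling baseline at the same rate, using trees on a standard synthetic regression problem; the shift variant and the paper's tuning are omitted:

```python
# Both ensembles train one tree per subsample at rate 1/k; the partition
# scheme draws disjoint subsamples so every training point is used exactly
# once per pass, which increases subsample diversity.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def ensemble_mse(index_sets, Xtr, ytr, Xte, yte):
    preds = [DecisionTreeRegressor(random_state=0).fit(Xtr[idx], ytr[idx])
             .predict(Xte) for idx in index_sets]
    return np.mean((np.mean(preds, axis=0) - yte) ** 2)

rng = np.random.default_rng(0)
X, y = make_friedman1(n_samples=600, noise=1.0, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.33, random_state=0)

n, k, B = len(ytr), 4, 40                        # rate 1/k, B trees
random_sets = [rng.choice(n, size=n // k, replace=False) for _ in range(B)]
partition_sets = []
for _ in range(B // k):                          # each pass: k disjoint folds
    partition_sets += np.array_split(rng.permutation(n), k)

print(f"random subsampling MSE:    {ensemble_mse(random_sets, Xtr, ytr, Xte, yte):.3f}")
print(f"partition subsampling MSE: {ensemble_mse(partition_sets, Xtr, ytr, Xte, yte):.3f}")
```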
13 pages, 357 KB  
Review
An Overview of Economics and Econometrics Related R Packages
by Despina Michelaki, Michail Tsagris and Christos Adam
Stats 2025, 8(4), 85; https://doi.org/10.3390/stats8040085 - 26 Sep 2025
Viewed by 1031
Abstract
This study provides a systematic overview of 207 econometrics-related R packages identified through CRAN and the Econometrics Task View. Using descriptive and inferential statistics and text mining to compute the word frequency and association among words (n-grams and correlations), we evaluate the development patterns, documentation practices, publication outcomes, and methodological scope. The findings reveal that most packages are created by small-to-mid-sized teams in Europe and North America, with mid-sized collaborations and packages including vignettes being significantly more likely to achieve journal publication. While reverse dependencies indicate strong ecosystem integration, they do not predict publication, and Bayesian or dataset-only packages remain underrepresented. Growth has accelerated since 2010, but newer packages exhibit fewer updates, raising concerns about sustainability. These findings highlight both the central role of R in contemporary econometrics and the need for broader participation, methodological diversity, and long-term maintenance. Full article
19 pages, 1013 KB  
Article
A Simulation-Based Comparative Analysis of Two-Parameter Robust Ridge M-Estimators for Linear Regression Models
by Bushra Haider, Syed Muhammad Asim, Danish Wasim and B. M. Golam Kibria
Stats 2025, 8(4), 84; https://doi.org/10.3390/stats8040084 - 24 Sep 2025
Viewed by 545
Abstract
Traditional regression estimators like Ordinary Least Squares (OLS) and classical ridge regression often fail under multicollinearity and outlier contamination, respectively. Although recently developed two-parameter ridge regression (TPRR) estimators improve efficiency by introducing dual shrinkage parameters, they remain sensitive to extreme observations. This study develops a new class of Two-Parameter Robust Ridge M-Estimators (TPRRM) that integrate dual shrinkage with robust M-estimation to simultaneously address multicollinearity and outliers. A Monte Carlo simulation study, conducted under varying sample sizes, predictor dimensions, correlation levels, and contamination structures, compares the proposed estimators with OLS, ridge, and the most recent TPRR estimators. The results demonstrate that TPRRM consistently achieves the lowest Mean Squared Error (MSE), particularly in heavy-tailed and outlier-prone scenarios. Application to the Tobacco and Gasoline Consumption datasets further validates the superiority of the proposed methods in real-world conditions. The findings confirm that the proposed TPRRM fills a critical methodological gap by offering estimators that are not only efficient under multicollinearity but also robust against departures from normality. Full article
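The family's core can be sketched as iteratively reweighted least squares combining Huber weights (robustness to outliers) with a ridge penalty (stability under multicollinearity); a single shrinkage constant k stands in for the paper's two parameters, and the data are simulated:

```python
import numpy as np

def huber_ridge(X, y, k=1.0, c=1.345, n_iter=50):
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)   # ridge start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12              # robust scale
        u = r / s
        w = np.where(np.abs(u) <= c, 1.0, c / np.abs(u))       # Huber weights
        XW = X * w[:, None]
        beta = np.linalg.solve(X.T @ XW + k * np.eye(p), XW.T @ y)
    return beta

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=n)])      # collinear
y = X @ np.array([2.0, 1.0]) + rng.normal(size=n)
y[:10] += 15.0                                                 # outliers
print("huber-ridge estimate:", np.round(huber_ridge(X, y), 2))
```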