Journal Description
Stats is an international, peer-reviewed, open access journal on statistical science published quarterly online by MDPI. The journal focuses on methodological and theoretical papers in statistics, probability, stochastic processes, and innovative applications of statistics in all scientific disciplines, including biological and biomedical sciences, medicine, business, economics and social sciences, physics, data science, and engineering.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within ESCI (Web of Science), Scopus, RePEc, and other databases.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18.2 days after submission; acceptance to publication takes 2.9 days (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 1.0 (2024); 5-Year Impact Factor: 1.1 (2024)
Latest Articles
The Use of Double Poisson Regression for Count Data in Health and Life Science—A Narrative Review
Stats 2025, 8(4), 90; https://doi.org/10.3390/stats8040090 - 1 Oct 2025
Abstract
Count data are present in many areas of everyday life. Unfortunately, such data are often characterized by over- and under-dispersion. In 1986, Efron introduced the Double Poisson distribution to account for this problem. The aim of this work is to examine the application of this distribution in regression analyses performed in the health-related literature by means of a narrative review. The databases Science Direct, PBSC, PubMed, PsycInfo, PsycArticles, CINAHL and Google Scholar were searched for applications. Two independent reviewers extracted data on Double Poisson regression models and their applications in the health and life sciences. From a total of 1644 hits, 84 articles were pre-selected, and after full-text screening, 13 articles remained. All these articles were published after 2011, and most of them targeted epidemiological research. Both over- and under-dispersion were present, and most of the papers used the generalized additive models for location, scale, and shape (GAMLSS) framework. In summary, this narrative review shows that the first steps in applying Efron’s idea of double exponential families to empirical count data have already been successfully taken in a variety of fields in the health and life sciences. Approaches to ease their application in clinical research should be encouraged.
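As a rough sketch of the distribution under review, the snippet below implements Efron’s (1986) Double Poisson pmf with exact numerical renormalization in place of Efron’s approximate constant. The function name and truncation point are assumptions for illustration; the GAMLSS framework mentioned in the abstract provides a comparable double Poisson family (DPO) in R.

```python
import numpy as np
from scipy.special import gammaln

def double_poisson_pmf(y, mu, theta, y_max=500):
    """Efron's (1986) Double Poisson pmf, normalized numerically over
    0..y_max. Mean ~ mu and variance ~ mu/theta, so theta < 1 captures
    over-dispersion and theta > 1 under-dispersion."""
    grid = np.arange(y_max + 1)
    safe = np.where(grid > 0, grid, 1)  # avoid log(0); the y = 0 terms vanish
    logf = (0.5 * np.log(theta) - theta * mu
            - grid + grid * np.log(safe) - gammaln(grid + 1)
            + theta * grid * (1.0 + np.log(mu) - np.log(safe)))
    pmf = np.exp(logf)
    pmf /= pmf.sum()  # exact renormalization instead of Efron's approximation
    return pmf[np.asarray(y)]

p = double_poisson_pmf(np.arange(20), mu=4.0, theta=0.5)
print(p.sum(), (np.arange(20) * p).sum())  # ~1 and roughly mu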
Full article
(This article belongs to the Topic Application of Biostatistics in Medical Sciences and Global Health)
Open Access Article
Theoretically Based Dynamic Regression (TDR)—A New and Novel Regression Framework for Modeling Dynamic Behavior
by Derrick K. Rollins, Marit Nilsen-Hamilton, Kendra Kreienbrink, Spencer Wolfe, Dillon Hurd and Jacob Oyler
Stats 2025, 8(4), 89; https://doi.org/10.3390/stats8040089 - 28 Sep 2025
Abstract
The theoretical modeling of a dynamic system will have derivatives of the response (y) with respect to time (t). Two common physical attributes (i.e., parameters) of dynamic systems are dead-time (θ) and lag (τ). Theoretical dynamic modeling will contain physically interpretable parameters such as τ and θ with physical constraints. In addition, the number of unknown model-based parameters can be considerably smaller than in empirically based (i.e., lag-based) approaches. This work proposes a Theoretically based Dynamic Regression (TDR) modeling approach that overcomes critical limitations of lag-based modeling, as demonstrated on three large, multiple-input, highly dynamic, real data sets. Dynamic Regression (DR) is a lag-based, empirical dynamic modeling approach that appears in the statistics literature. However, like all empirical approaches, its model structures do not contain first-principles interpretable parameters. Additionally, several time lags are typically needed for the output, y, and input, x, to capture significant dynamic behavior. TDR uses a simple theoretically based dynamic modeling approach to transform x_t into its dynamic counterpart, v_t, and then applies the methods and tools of static regression to v_t. TDR is demonstrated on the following three modeling problems with freely existing (i.e., not experimentally designed) real data sets: 1. the weight variation of a person (y) with four measured nutrient inputs (x_i); 2. the variation in the tray temperature (y) of a distillation column with nine inputs and eight test data sets over a three-year period; and 3. eleven extremely large, highly dynamic, subject-specific models of sensor glucose (y) with 12 inputs (x_i).
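The x_t to v_t transformation is the core of the approach. Below is a minimal sketch of one plausible realization, a first-order-plus-dead-time filter with lag τ and dead-time θ as named in the abstract; the paper’s actual TDR construction may differ in detail, and the function name is an assumption.

```python
import numpy as np

def fopdt_transform(x, tau, theta, dt=1.0):
    """Map an input series x_t to a dynamic counterpart v_t via a
    first-order-plus-dead-time response, tau * dv/dt + v = x(t - theta),
    discretized with step dt. v_t then enters ordinary static regression."""
    a = np.exp(-dt / tau)              # first-order lag coefficient
    d = int(round(theta / dt))         # dead-time in whole samples
    x = np.asarray(x, dtype=float)
    x_delayed = np.concatenate([np.full(d, x[0]), x[:len(x) - d]])
    v = np.empty_like(x)
    v[0] = x_delayed[0]
    for t in range(1, len(x)):
        v[t] = a * v[t - 1] + (1.0 - a) * x_delayed[t]
    return v

# e.g. transform each input x_i, then fit y ~ v_1 + ... + v_k by least squares
```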
Full article
(This article belongs to the Special Issue Advances in Machine Learning, High-Dimensional Inference, Shrinkage Estimation, and Model Validation)

Open Access Correction
Correction: Chen et al. Scoring Individual Moral Inclination for the CNI Test. Stats 2024, 7, 894–905
by Yi Chen, Benjamin Lugu, Wenchao Ma and Hyemin Han
Stats 2025, 8(4), 88; https://doi.org/10.3390/stats8040088 - 28 Sep 2025
Abstract
Error in Table [...]
Full article
Open Access Article
Energy Statistic-Based Goodness-of-Fit Test for the Lindley Distribution with Application to Lifetime Data
by Joseph Njuki and Ryan Avallone
Stats 2025, 8(4), 87; https://doi.org/10.3390/stats8040087 - 26 Sep 2025
Abstract
In this article, we propose a goodness-of-fit test for the one-parameter Lindley distribution based on energy statistics. The Lindley distribution has been widely used in reliability studies and survival analysis, especially in the applied sciences. The proposed test procedure is simple and powerful against general alternatives. Under different settings, Monte Carlo simulations show that the proposed test maintains any given nominal level. In terms of power, the proposed test outperforms existing similar methods in different settings. We then apply the proposed test to real-life datasets to demonstrate its competitiveness and usefulness.
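A hedged sketch of the general recipe, not the authors’ exact statistic: fit the Lindley parameter by maximum likelihood (the MLE has a closed form in the sample mean), compute an energy distance between the data and the fitted model, and calibrate by parametric bootstrap. The two-sample Monte Carlo variant below stands in for the one-sample energy statistic; all function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def lindley_mle(x):
    # closed-form MLE of the Lindley parameter from the sample mean
    m = x.mean()
    return (-(m - 1) + np.sqrt((m - 1) ** 2 + 8 * m)) / (2 * m)

def lindley_rvs(theta, size):
    # Lindley = mixture: Exp(theta) w.p. theta/(1+theta), else Gamma(2, rate=theta)
    out = rng.gamma(2.0, 1.0 / theta, size)
    is_exp = rng.random(size) < theta / (1.0 + theta)
    out[is_exp] = rng.exponential(1.0 / theta, is_exp.sum())
    return out

def energy_stat(x, y):
    # two-sample energy distance statistic
    n, m = len(x), len(y)
    xy = np.abs(x[:, None] - y[None, :]).mean()
    xx = np.abs(x[:, None] - x[None, :]).mean()
    yy = np.abs(y[:, None] - y[None, :]).mean()
    return n * m / (n + m) * (2.0 * xy - xx - yy)

def lindley_gof_pvalue(x, n_boot=200, m=500):
    # parametric-bootstrap calibration under the fitted Lindley model
    t_obs = energy_stat(x, lindley_rvs(lindley_mle(x), m))
    t_ref = [energy_stat(xb := lindley_rvs(lindley_mle(x), len(x)),
                         lindley_rvs(lindley_mle(xb), m)) for _ in range(n_boot)]
    return float(np.mean(np.array(t_ref) >= t_obs))
```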
Full article

Open Access Article
Enhancing Diversity and Improving Prediction Performance of Subsampling-Based Ensemble Methods
by Maria Ordal and Qing Wang
Stats 2025, 8(4), 86; https://doi.org/10.3390/stats8040086 - 26 Sep 2025
Abstract
This paper investigates how diversity among training samples impacts the predictive performance of a subsampling-based ensemble. It is well known that diverse training samples improve ensemble predictions, and smaller subsampling rates naturally lead to enhanced diversity. However, this approach of achieving a higher degree of diversity often comes with the cost of a reduced training sample size, which is undesirable. This paper introduces two novel subsampling strategies—partition and shift subsampling—as alternative schemes designed to improve diversity without sacrificing the training sample size in subsampling-based ensemble methods. From a probabilistic perspective, we investigate their impact on subsample diversity when utilized with tree-based sub-ensemble learners in comparison to the benchmark random subsampling. Through extensive simulations and eight real-world examples in both regression and classification contexts, we found a significant improvement in the predictive performance of the developed methods. Notably, this gain is particularly pronounced on challenging datasets or when higher subsampling rates are employed.
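The abstract does not spell out the two schemes, so the following toy sketch only illustrates the general contrast: disjoint blocks (one reading of “partition”) and overlapping shifted windows (one reading of “shift”) versus independent random subsamples. Names and details are assumptions, not the authors’ algorithms.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_subsamples(n, b, rate):
    """Benchmark: b independent random subsamples of size rate * n."""
    k = int(rate * n)
    return [rng.choice(n, size=k, replace=False) for _ in range(b)]

def partition_subsamples(n, b, rate):
    """Toy 'partition' reading: shuffle, then cut the index set into
    disjoint blocks, so subsamples within a round never overlap."""
    k = int(rate * n)
    blocks = []
    while len(blocks) < b:
        perm = rng.permutation(n)
        blocks += [perm[i:i + k] for i in range(0, n - k + 1, k)]
    return blocks[:b]

def shift_subsamples(n, b, rate):
    """Toy 'shift' reading: slide a window of size rate * n over one fixed
    permutation with a constant stride, wrapping around the end."""
    k = int(rate * n)
    perm = rng.permutation(n)
    stride = max(1, (n - k) // max(1, b - 1))
    return [perm[np.arange(i * stride, i * stride + k) % n] for i in range(b)]
```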
Full article
(This article belongs to the Section Applied Statistics and Machine Learning Methods)

Open Access Review
An Overview of Economics and Econometrics Related R Packages
by Despina Michelaki, Michail Tsagris and Christos Adam
Stats 2025, 8(4), 85; https://doi.org/10.3390/stats8040085 - 26 Sep 2025
Abstract
This study provides a systematic overview of 207 econometrics-related R packages identified through CRAN and the Econometrics Task View. Using descriptive and inferential statistics and text mining to compute the word frequency and association among words (n-grams and correlations), we evaluate the development patterns, documentation practices, publication outcomes, and methodological scope. The findings reveal that most packages are created by small-to-mid-sized teams in Europe and North America, with mid-sized collaborations and packages including vignettes being significantly more likely to achieve journal publication. While reverse dependencies indicate strong ecosystem integration, they do not predict publication, and Bayesian or dataset-only packages remain underrepresented. Growth has accelerated since 2010, but newer packages exhibit fewer updates, raising concerns about sustainability. These findings highlight both the central role of R in contemporary econometrics and the need for broader participation, methodological diversity, and long-term maintenance.
Full article

Open Access Article
A Simulation-Based Comparative Analysis of Two-Parameter Robust Ridge M-Estimators for Linear Regression Models
by Bushra Haider, Syed Muhammad Asim, Danish Wasim and B. M. Golam Kibria
Stats 2025, 8(4), 84; https://doi.org/10.3390/stats8040084 - 24 Sep 2025
Abstract
Traditional regression estimators like Ordinary Least Squares (OLS) and classical ridge regression often fail under multicollinearity and outlier contamination, respectively. Although recently developed two-parameter ridge regression (TPRR) estimators improve efficiency by introducing dual shrinkage parameters, they remain sensitive to extreme observations. This study develops a new class of Two-Parameter Robust Ridge M-Estimators (TPRRM) that integrate dual shrinkage with robust M-estimation to simultaneously address multicollinearity and outliers. A Monte Carlo simulation study, conducted under varying sample sizes, predictor dimensions, correlation levels, and contamination structures, compares the proposed estimators with OLS, ridge, and the most recent TPRR estimators. The results demonstrate that TPRRM consistently achieves the lowest Mean Squared Error (MSE), particularly in heavy-tailed and outlier-prone scenarios. Application to the Tobacco and Gasoline Consumption datasets further validates the superiority of the proposed methods in real-world conditions. The findings confirm that the proposed TPRRM fills a critical methodological gap by offering estimators that are not only efficient under multicollinearity but also robust against departures from normality.
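For orientation, here is a generic robust ridge sketch combining an L2 penalty with Huber M-estimation via iteratively reweighted least squares. It is not the authors’ TPRRM (which uses two shrinkage parameters), but it shows the mechanics of pairing shrinkage with robust weights; the function name and defaults are assumptions.

```python
import numpy as np

def huber_ridge(X, y, k=1.0, c=1.345, n_iter=50, tol=1e-8):
    """Ridge-penalized Huber M-estimation by IRLS: solve
    (X' W X + k I) beta = X' W y with Huber weights W updated each pass."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)  # ridge start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745  # robust scale (MAD)
        s = s if s > 0 else 1.0
        w = np.clip(c * s / np.maximum(np.abs(r), 1e-12), None, 1.0)  # Huber weights
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw + k * np.eye(p), Xw.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```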
Full article
(This article belongs to the Special Issue Advances in Machine Learning, High-Dimensional Inference, Shrinkage Estimation, and Model Validation)

Open Access Article
Confidence Intervals of Risk Ratios for the Augmented Logistic Regression with Pseudo-Observations
by Hiroyuki Shiiba and Hisashi Noma
Stats 2025, 8(3), 83; https://doi.org/10.3390/stats8030083 - 18 Sep 2025
Abstract
The augmented logistic regression proposed by Diaz-Quijano directly provides risk ratios from an augmented dataset containing pseudo-observations. However, the standard errors of the regression coefficients cannot be accurately estimated using either the ordinary model variance estimator or the robust variance estimator, as neither method appropriately accounts for the pseudo-observations. In this study, we propose two resampling strategies based on the bootstrap and jackknife methods to construct improved variance estimators for the augmented logistic regression. Both procedures can reflect the overall uncertainty of the augmented dataset involving the pseudo-observations and require only standard software, making them feasible for a wide range of clinical and epidemiological researchers. We validated the proposed methods through comprehensive simulation studies, which demonstrated that both the bootstrap- and jackknife-based variance estimators provided smaller standard error estimates and correspondingly narrower 95% confidence intervals, whereas the robust variance estimator remained biased. Additionally, we applied the proposed methods to real-world binary data, confirming their practical utility.
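The key point is that resampling must happen at the level of the original subjects, so each replicate re-runs the augmentation before refitting. A minimal sketch follows, with the augmentation-plus-fit pipeline abstracted as a user-supplied function since its details are in the paper; names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_se(data, fit, n_boot=1000):
    """Bootstrap SEs for the whole pipeline: resample original subjects
    (rows of a NumPy array), re-run augmentation + model fit on each
    resample via `fit`, and take the SD of the coefficient vectors."""
    n = len(data)
    coefs = np.array([fit(data[rng.integers(0, n, n)]) for _ in range(n_boot)])
    return coefs.std(axis=0, ddof=1)

def jackknife_se(data, fit):
    """Leave-one-subject-out jackknife SEs over the same pipeline."""
    n = len(data)
    coefs = np.array([fit(np.delete(data, i, axis=0)) for i in range(n)])
    return np.sqrt((n - 1) / n * ((coefs - coefs.mean(axis=0)) ** 2).sum(axis=0))
```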
Full article
(This article belongs to the Section Biostatistics)

Open Access Article
A Mixture Model for Survival Data with Both Latent and Non-Latent Cure Fractions
by Eduardo Yoshio Nakano, Frederico Machado Almeida and Marcílio Ramos Pereira Cardial
Stats 2025, 8(3), 82; https://doi.org/10.3390/stats8030082 - 13 Sep 2025
Abstract
One of the most popular cure rate models in the literature is the Berkson and Gage mixture model. A characteristic of this model is that it considers the cure to be a latent event. However, there are situations in which the cure status is known, and this information must be considered in the analysis. In this context, this paper proposes a mixture model that accommodates both latent and non-latent cure fractions. More specifically, the proposal extends the Berkson and Gage mixture model to incorporate knowledge of the cure. A simulation study was conducted to investigate the asymptotic properties of the maximum likelihood estimators. Finally, the proposed model is illustrated through an application to credit risk modeling.
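For reference, a sketch of the classical Berkson and Gage likelihood that the paper extends, with an exponential latency distribution chosen here purely for illustration; known (non-latent) cures would enter as an additional, fully observed contribution.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_loglik(params, t, event):
    """Berkson and Gage mixture cure model with exponential latency:
    S_pop(t) = pi + (1 - pi) * exp(-lam * t), cure latent among the censored.
    event = 1 for observed failures, 0 for right-censored subjects."""
    pi = expit(params[0])      # cured fraction on the logit scale
    lam = np.exp(params[1])    # exponential hazard on the log scale
    log_f = np.log1p(-pi) + np.log(lam) - lam * t        # failure density
    log_s = np.log(pi + (1.0 - pi) * np.exp(-lam * t))   # population survival
    return -np.sum(np.where(event == 1, log_f, log_s))

# fit = minimize(neg_loglik, x0=[0.0, 0.0], args=(t, event), method="BFGS")
```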
Full article
(This article belongs to the Section Survival Analysis)

Open Access Article
The Unit-Modified Weibull Distribution: Theory, Estimation, and Real-World Applications
by Ammar M. Sarhan, Thamer Manshi and M. E. Sobh
Stats 2025, 8(3), 81; https://doi.org/10.3390/stats8030081 - 12 Sep 2025
Abstract
This paper introduces the Unit-Modified Weibull (UMW) distribution, a novel probability model defined on the unit interval. We derive its key statistical properties and estimate its parameters using the maximum likelihood method. The performance of the estimators is assessed via a simulation study based on mean squared error, coverage probability, and average confidence interval length. To evaluate the practical utility of the model, we analyze three real-world data sets. Both parametric and nonparametric goodness-of-fit techniques are employed to compare the UMW distribution with several well-established competing models. In addition, nonparametric diagnostic tools such as total time on test transform plots and violin plots are used to explore the data’s behavior and assess the adequacy of the proposed model. Results indicate that the UMW distribution offers a competitive and flexible alternative for modeling bounded data.
Full article

Open Access Review
Statistical Tools Application for Literature Review: A Case on Maintenance Management Decision-Making in the Steel Industry
by Nuno Miguel de Matos Torre, Valerio Antonio Pamplona Salomon and Luis Ernesto Quezada
Stats 2025, 8(3), 80; https://doi.org/10.3390/stats8030080 - 12 Sep 2025
Abstract
Literature review plays a crucial role in research. This paper explores bibliometrics, which utilizes statistical tools to evaluate researchers’ scientific contributions. Its intent is to map frequently cited articles and authors, identify top sources, track publication years, explore keywords and their co-occurrences, and show article distribution by thematic area and country. Additionally, it provides a thematic map of relevance and progress, with special attention to interdisciplinary work. Finally, it applies these findings to maintenance management decision-making, where the literature provides valuable insights into the impact of the Analytic Hierarchy Process (AHP) method. Despite advancements in maintenance management, gaps persist in comprehensively addressing core themes, evolutionary trends, and future research directions. This research aims to bridge this gap by providing a detailed examination of the application of bibliometric analysis, employing statistical tools to measure researchers’ scientific contributions, concerning AHP method applications in maintenance management within the steel industry. The study confirmed that tools like VOSviewer and the Bibliometrix package in R can extract relevant information regarding bibliometric laws, helping us understand research patterns. These findings support strategic decision-making and the evaluation of scientific policies for researchers and institutions.
Full article

Open Access Article
Bootstrap Methods for Correcting Bias in WLS Estimators of the First-Order Bifurcating Autoregressive Model
by Tamer Elbayoumi, Mutiyat Usman, Sayed Mostafa, Mohammad Zayed and Ahmad Aboalkhair
Stats 2025, 8(3), 79; https://doi.org/10.3390/stats8030079 - 5 Sep 2025
Abstract
In this study, we examine the presence of bias in weighted least squares (WLS) estimation within the context of first-order bifurcating autoregressive (BAR(1)) models. These models are widely used in the analysis of binary tree-structured data, particularly in cell lineage research. Our findings suggest that WLS estimators may exhibit significant and problematic biases, especially in finite samples. The magnitude and direction of this bias are influenced by both the autoregressive parameter and the correlation structure of the model errors. To address this issue, we propose two bootstrap-based methods for bias correction of the WLS estimator. The paper further introduces shrinkage-based versions of both single and fast double bootstrap bias correction techniques, designed to mitigate the over-correction and under-correction issues that may arise with traditional bootstrap methods, particularly in larger samples. Comprehensive simulation studies were conducted to evaluate the performance of the proposed bias-corrected estimators. The results show that the proposed corrections substantially reduce bias, with the most notable improvements observed at extreme values of the autoregressive parameter. Moreover, the study provides practical guidance for practitioners on method selection under varying conditions.
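The single-bootstrap correction itself is compact; a sketch follows, with a toy shrinkage variant. The paper’s shrinkage rule is more elaborate, and its resampling respects the bifurcating tree structure of BAR(1) data, which this generic sketch does not attempt to reproduce.

```python
import numpy as np

def bootstrap_bias_correct(theta_hat, theta_boot):
    """Single-bootstrap correction: estimated bias is mean(theta*) - theta_hat,
    so the corrected estimate is 2 * theta_hat - mean(theta*)."""
    return 2.0 * theta_hat - np.mean(theta_boot)

def shrunk_bias_correct(theta_hat, theta_boot, w):
    """Toy shrinkage variant: remove only a fraction w of the estimated bias,
    tempering the over-/under-correction the abstract mentions."""
    return theta_hat - w * (np.mean(theta_boot) - theta_hat)
```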
Full article

Open Access Article
On Synthetic Interval Data with Predetermined Subject Partitioning and Partial Control of the Variables’ Marginal Correlation Structure
by Michail Papathomas
Stats 2025, 8(3), 78; https://doi.org/10.3390/stats8030078 - 27 Aug 2025
Abstract
A standard approach for assessing the performance of partition models is to create synthetic datasets with a prespecified clustering structure and assess how well the model reveals this structure. A common format involves subjects being assigned to different clusters, with observations simulated so that subjects within the same cluster have similar profiles, allowing for some variability. In this manuscript, we consider observations from interval variables. Interval data are commonly observed in cohort and Genome-Wide Association studies, and our focus is on Single-Nucleotide Polymorphisms. Theoretical and empirical results are utilized to explore the dependence structure between the variables in relation to the clustering structure for the subjects. A novel algorithm is proposed that allows control over the marginal stratified correlation structure of the variables, specifying exact correlation values within groups of variables. Practical examples are shown, and a synthetic dataset is compared to a real one, to demonstrate similarities and differences.
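A toy version of the “common format” described above, clusters of subjects sharing genotype-probability profiles over {0, 1, 2}, is sketched below; the paper’s actual contribution, exact control of the marginal stratified correlations between variables, is not attempted here, and all names and parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def clustered_snp_data(cluster_sizes, n_vars, concentration=20.0):
    """Subjects within a cluster share genotype-probability profiles over
    {0, 1, 2} for each variable; Dirichlet noise around the profile gives
    within-cluster variability. Larger `concentration` = tighter clusters."""
    rows, labels = [], []
    for c, size in enumerate(cluster_sizes):
        profile = rng.dirichlet(np.ones(3), size=n_vars)  # one probability row per variable
        for _ in range(size):
            probs = np.array([rng.dirichlet(concentration * p) for p in profile])
            rows.append([rng.choice(3, p=pr) for pr in probs])
            labels.append(c)
    return np.array(rows), np.array(labels)

X, z = clustered_snp_data([30, 30, 40], n_vars=50)
```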
Full article

Open Access Article
A Markov Chain Monte Carlo Procedure for Efficient Bayesian Inference on the Phase-Type Aging Model
by Cong Nie, Xiaoming Liu, Serge Provost and Jiandong Ren
Stats 2025, 8(3), 77; https://doi.org/10.3390/stats8030077 - 27 Aug 2025
Abstract
The phase-type aging model (PTAM) belongs to a class of Coxian-type Markovian models that can provide a quantitative description of well-known aging characteristics that are part of a genetically determined, progressive, and irreversible process. Due to its unique parameter structure, estimation via the MLE method presents a considerable estimability issue, whereby profile likelihood functions are flat and analytically intractable. In this study, a Markov chain Monte Carlo (MCMC)-based Bayesian methodology is proposed and applied to the PTAM, with a view to improving parameter estimability. The proposed method provides two methodological extensions based on an existing MCMC inference method. First, we propose a two-level MCMC sampling scheme that makes the method applicable to situations where the posterior distributions do not assume simple forms after data augmentation. Secondly, an existing data augmentation technique for Bayesian inference on continuous phase-type distributions is further developed in order to incorporate left-truncated data. While numerical results indicate that the proposed methodology improves parameter estimability via sound prior distributions, this approach may also be utilized as a stand-alone statistical model-fitting technique.
Full article

Open Access Article
Pattern Classification for Mixed Feature-Type Symbolic Data Using Supervised Hierarchical Conceptual Clustering
by Manabu Ichino and Hiroyuki Yaguchi
Stats 2025, 8(3), 76; https://doi.org/10.3390/stats8030076 - 25 Aug 2025
Abstract
This paper describes a region-oriented method of pattern classification based on the Cartesian system model (CSM), a mathematical model that allows the manipulation of mixed feature-type symbolic data. We use supervised hierarchical conceptual clustering to generate class regions for each pattern class based on evaluating the generality of the regions and their separability from the other classes at each clustering step. We can easily find the robustly informative features that describe each pattern class against the other pattern classes. Some examples show the effectiveness of the proposed method.
Full article

Open Access Communication
Who Comes First and Who Gets Cited? A 25-Year Multi-Model Analysis of First-Author Gender Effects in Web of Science Economics
by Daniela-Emanuela Dănăcică
Stats 2025, 8(3), 75; https://doi.org/10.3390/stats8030075 - 24 Aug 2025
Abstract
The aim of this research is to provide a 25-year multi-model analysis of gender dynamics in economics articles that include at least one Romanian-affiliated author, published in Web of Science journals between 2000 and 2025 (2025 records current as of 15 May 2025). Drawing on 4030 papers, we map the bibliometric gender gap by examining first-author status, collaboration patterns, research topics, and citation impact. The results show that the female-to-male first-author ratio for Romanian-affiliated publications is close to parity, in sharp contrast to the pronounced under-representation of women among foreign-affiliated first authors. Combining negative binomial, journal fixed-effects Poisson, and quantile regressions with a text-based topic analysis, we find no systematic or robust gender penalty in citations once structural and topical factors are controlled for. The initial gender gap largely reflects men’s over-representation in higher-impact journals rather than an intrinsic bias against women’s work. Team size consistently emerges as the strongest predictor of citations and, by extension, scientific visibility. Our findings offer valuable insights into gender dynamics in a semi-peripheral scientific system, highlighting the nuanced interplay between institutional context, research practices, legislation, and academic recognition.
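One of the named specifications, a negative binomial citation model, can be sketched with statsmodels on a hypothetical toy frame; the real study uses 4030 Web of Science records and richer controls, so the variables below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 300

# Hypothetical toy frame mirroring the abstract's covariates
df = pd.DataFrame({
    "citations": rng.negative_binomial(2, 0.3, n),
    "female_first": rng.integers(0, 2, n),
    "team_size": rng.integers(1, 8, n),
    "year": rng.integers(2000, 2025, n),
})

# Negative binomial citation model, one of the specifications named above
nb = smf.negativebinomial("citations ~ female_first + team_size + year", df).fit()
print(nb.summary())
```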
Full article

Open Access Article
A Bayesian Non-Linear Mixed-Effects Model for Accurate Detection of the Onset of Cognitive Decline in Longitudinal Aging Studies
by Franklin Fernando Massa, Marco Scavino and Graciela Muniz-Terrera
Stats 2025, 8(3), 74; https://doi.org/10.3390/stats8030074 - 18 Aug 2025
Abstract
Change-point models are frequently considered when modeling phenomena where a regime shift occurs at an unknown time. In aging research, these models are commonly adopted to estimate the onset of cognitive decline. Yet these models present several limitations. Here, we present a Bayesian non-linear mixed-effects model based on a differential equation designed for longitudinal studies to overcome some limitations of classical change-point models used in aging research. We demonstrate the ability of the proposed model to avoid biases in estimates of the onset of cognitive impairment in a simulation study. Finally, the methodology presented in this work is illustrated by analyzing results from memory tests of older adults who participated in the English Longitudinal Study of Ageing.
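For context, the classical broken-stick change-point trajectory that such models typically posit is sketched below; the paper replaces this non-smooth form with an ODE-based non-linear mixed-effects formulation, whose details are not reproduced here.

```python
import numpy as np

def changepoint_mean(t, b0, b1, b2, tau):
    """Classical random change-point trajectory: a memory score that follows
    slope b1 until the onset time tau, then adds an extra slope b2."""
    return b0 + b1 * t + b2 * np.maximum(t - tau, 0.0)

# e.g. a subject flat at 25 points until tau = 8 years, then declining
scores = changepoint_mean(np.arange(0, 15), 25.0, 0.0, -1.2, 8.0)
```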
Full article

Open Access Article
A Mixture Integer GARCH Model with Application to Modeling and Forecasting COVID-19 Counts
by Wooi Chen Khoo, Seng Huat Ong, Victor Jian Ming Low and Hari M. Srivastava
Stats 2025, 8(3), 73; https://doi.org/10.3390/stats8030073 - 13 Aug 2025
Abstract
This article introduces a flexible time series regression model known as the Mixture of Integer-Valued Generalized Autoregressive Conditional Heteroscedasticity (MINGARCH). Mixture models provide versatile frameworks for capturing heterogeneity in count data, including features such as multiple peaks, seasonality, and intervention effects. The proposed model is applied to regional COVID-19 data from Malaysia. To account for geographical variability, five regions—Selangor, Kuala Lumpur, Penang, Johor, and Sarawak—were selected for analysis, covering a total of 86 weeks of data. Comparative analysis with existing time series regression models demonstrates that MINGARCH outperforms alternative approaches. Further investigation into forecasting reveals that MINGARCH yields superior performance in regions with high population density, and significant influencing factors have been identified. In low-density regions, confirmed cases peaked within three weeks, whereas high-density regions exhibited a monthly seasonal pattern. Forecasting metrics—including MAPE, MAE, and RMSE—are significantly lower for the MINGARCH model compared to other models. These results suggest that MINGARCH is well-suited for forecasting disease spread in urban and densely populated areas, offering valuable insights for policymaking.
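A generic reading of the mixture INGARCH mechanism, sketched as a simulator; the paper’s regression version also carries covariate and intervention effects, which are omitted here, and the parameter values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)

def simulate_mingarch(n, weights, omega, alpha, beta):
    """Simulate a K-component mixture INGARCH(1,1) count series: at each t a
    component k is drawn with probability weights[k] and y_t ~ Poisson(lam_k,t),
    where lam_k,t = omega_k + alpha_k * y_{t-1} + beta_k * lam_k,t-1."""
    weights = np.asarray(weights)
    omega, alpha, beta = map(np.asarray, (omega, alpha, beta))
    lam = omega.astype(float).copy()
    y = np.zeros(n, dtype=int)
    y[0] = rng.poisson(lam[rng.choice(len(weights), p=weights)])
    for t in range(1, n):
        lam = omega + alpha * y[t - 1] + beta * lam
        y[t] = rng.poisson(lam[rng.choice(len(weights), p=weights)])
    return y

weekly_counts = simulate_mingarch(86, [0.6, 0.4], [1.0, 5.0], [0.3, 0.5], [0.4, 0.2])
```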
Full article

Open Access Communication
On the Appropriateness of Fixed Correlation Assumptions in Repeated-Measures Meta-Analysis: A Monte Carlo Assessment
by Vasileios Papadopoulos
Stats 2025, 8(3), 72; https://doi.org/10.3390/stats8030072 - 13 Aug 2025
Abstract
In repeated-measures meta-analyses, raw data are often unavailable, preventing the calculation of the correlation coefficient r between pre- and post-intervention values. As a workaround, many researchers adopt a heuristic approximation of r = 0.7. However, this value lacks rigorous mathematical justification and may introduce bias into variance estimates of pre/post-differences. We employed Monte Carlo simulations (n = 500,000 per scenario) in Fisher z-space to examine the distribution of the standard deviation of pre-/post-differences (σ_D) under varying assumptions about r and its uncertainty (σ_r). Scenarios included r = 0.5, 0.6, 0.707, 0.75, and 0.8, each tested across three levels of variance (σ_r = 0.05, 0.1, and 0.15). The approximation r = 0.75 resulted in a balanced estimate of σ_D, corresponding to a “midway” variance attenuation due to paired data. This value more accurately offsets the deficit caused by assuming a correlation, compared to the traditional value of 0.7. While the r = 0.7 heuristic remains widely used, our results support the use of r = 0.75 as a more mathematically neutral and empirically defensible alternative in repeated-measures meta-analyses lacking raw data.
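The simulation design is easy to reproduce in outline. A minimal sketch, assuming unit pre/post standard deviations so that σ_D = sqrt(2 - 2r), with r drawn in Fisher z-space as the abstract describes; the paper’s exact setup may differ.

```python
import numpy as np

rng = np.random.default_rng(7)

def sim_sigma_d(r0, sigma_r, n=500_000):
    """Draw r in Fisher z-space, z ~ N(atanh(r0), sigma_r), r = tanh(z),
    and return the implied SD of paired differences for unit variances."""
    r = np.tanh(rng.normal(np.arctanh(r0), sigma_r, size=n))
    return np.sqrt(2.0 - 2.0 * r)

for r0 in (0.5, 0.6, 0.707, 0.75, 0.8):
    d = sim_sigma_d(r0, sigma_r=0.10)
    print(f"r0 = {r0:.3f}: mean sigma_D = {d.mean():.4f}, plug-in = {np.sqrt(2 - 2 * r0):.4f}")
```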
Full article

Open Access Article
Individual Homogeneity Learning in Density Data Response Additive Models
by Zixuan Han, Tao Li, Jinhong You and Narayanaswamy Balakrishnan
Stats 2025, 8(3), 71; https://doi.org/10.3390/stats8030071 - 9 Aug 2025
Abstract
In many complex applications, both data heterogeneity and homogeneity are present simultaneously. Overlooking either aspect can lead to misleading statistical inferences. Moreover, the increasing prevalence of complex, non-Euclidean data calls for more sophisticated modeling techniques. To address these challenges, we propose a density data response additive model, where the response variable is represented by a distributional density function. In this framework, individual effect curves are assumed to be homogeneous within groups but heterogeneous across groups, while covariates that explain variation share common additive bivariate functions. We begin by applying a transformation to map density functions into a linear space. To estimate the unknown subject-specific functions and the additive bivariate components, we adopt a B-spline series approximation method. Latent group structures are uncovered using a hierarchical agglomerative clustering algorithm, which allows our method to recover the true underlying groupings with high probability. To further improve estimation efficiency, we develop refined spline-backfitted local linear estimators for both the grouped structures and the additive bivariate functions in the post-grouping model. We also establish the asymptotic properties of the proposed estimators, including their convergence rates, asymptotic distributions, and post-grouping oracle efficiency. The effectiveness of our method is demonstrated through extensive simulation studies and real-world data analysis, both of which show promising and robust performance.
Full article

Topics
Topic in JPM, Mathematics, Applied Sciences, Stats, Healthcare
Application of Biostatistics in Medical Sciences and Global Health
Topic Editors: Bogdan Oancea, Adrian Pană, Cǎtǎlina Liliana Andrei
Deadline: 31 October 2026

Special Issues
Special Issue in Stats
Benford’s Law(s) and Applications (Second Edition)
Guest Editors: Marcel Ausloos, Roy Cerqueti, Claudio Lupi
Deadline: 31 October 2025
Special Issue in Stats
Nonparametric Inference: Methods and Applications
Guest Editor: Stefano Bonnini
Deadline: 28 November 2025
Special Issue in Stats
Robust Statistics in Action II
Guest Editor: Marco Riani
Deadline: 31 December 2025