Journal Description
Stats is an international, peer-reviewed, open access journal on statistical science published bimonthly online by MDPI. The journal focuses on methodological and theoretical papers in statistics, probability, stochastic processes, and innovative applications of statistics in all scientific disciplines, including biological and biomedical sciences, medicine, business, economics and social sciences, physics, data science, and engineering.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within ESCI (Web of Science), Scopus, RePEc, and other databases.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 22.3 days after submission; the time from acceptance to publication is 3.8 days (median values for papers published in this journal in the second half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
- Journal Cluster of Artificial Intelligence: AI, AI in Medicine, Algorithms, BDCC, MAKE, MTI, Stats, Virtual Worlds and Computers.
Impact Factor: 1.0 (2024); 5-Year Impact Factor: 1.1 (2024)
Latest Articles
Unified Numerical Method for Stochastic Differential Equations with Poisson and Gaussian White Noises
Stats 2026, 9(3), 47; https://doi.org/10.3390/stats9030047 - 24 Apr 2026
Abstract
A method is developed for integrating stochastic differential equations (SDEs) with Poisson (PWN) and Gaussian (GWN) white noises interpreted as the formal derivatives of the compound Poisson and Brownian motion processes. In contrast to the current integration schemes, which solve discrete time versions of the posed SDEs, the proposed method solves the posed SDEs for finite dimensional (FD) models of the compound Poisson and Brownian motion processes, i.e., finite sums of deterministic functions of time weighted by random coefficients. Paths of the resulting solutions, referred to as FD solutions, can be generated by standard ordinary differential equation (ODE) solvers since the paths of the FD input models are smooth. We also establish conditions under which the distributions of extremes and other continuous functionals of the solutions of the posed SDEs can be approximated by those of their FD solutions. This is essential in applications since the distributions of functionals of FD solutions can be estimated while those of actual solutions are rarely available analytically and cannot be obtained numerically.
Full article
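To make the construction concrete, here is a minimal Python sketch of the idea under simplifying assumptions: replace Brownian motion by a smooth finite-dimensional model (a truncated Karhunen–Loève expansion, one common choice) and integrate the resulting pathwise ODE with a standard solver. The drift, diffusion, and truncation level are illustrative, not the paper's; note also that smooth-noise approximations converge to Stratonovich solutions (Wong–Zakai).

```python
# Minimal sketch: a finite-dimensional (FD) model of Brownian motion via a
# truncated Karhunen-Loeve expansion on [0, 1], driving an SDE path through a
# standard ODE solver. Illustrative only; the paper's FD construction and its
# convergence conditions are more general.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
K = 200                            # truncation level of the FD model
Z = rng.standard_normal(K)         # random coefficients of the FD model
k = (np.arange(1, K + 1) - 0.5) * np.pi

def white_noise_fd(t):
    # formal derivative of the truncated KL expansion of Brownian motion
    return np.sqrt(2.0) * np.sum(Z * np.cos(k * t))

a = lambda x: -x                   # drift (example)
s = lambda x: 0.3                  # additive diffusion (example)

def rhs(t, x):
    # pathwise ODE: x' = a(x) + s(x) * W_K'(t); smooth, so ODE solvers apply
    return a(x) + s(x) * white_noise_fd(t)

sol = solve_ivp(rhs, (0.0, 1.0), [1.0], max_step=1e-3)
print(sol.y[0, -1])                # one FD-solution path value at t = 1
```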
Open Access Article
A Practical Framework for Incorporating Complex Survey Design in Bayesian Kernel Machine Regression
by
Doreen Jehu-Appiah and Emmanuel Obeng-Gyasi
Stats 2026, 9(3), 46; https://doi.org/10.3390/stats9030046 - 23 Apr 2026
Abstract
Large-scale population datasets are rarely generated via simple random sampling; instead, they reflect complex designs involving stratification, clustering, and unequal inclusion probabilities. While survey weights are provided to recover population-representative estimates, standard Bayesian Kernel Machine Regression (BKMR), a flexible nonlinear model for high-dimensional exposure mixtures, does not explicitly accommodate these design features. We present a simulation-based framework that evaluates performance under complex sampling by comparing two analytic strategies applied to identical survey-like data: (i) a naïve, unweighted BKMR implementation and (ii) a design-aware workflow that can be executed using existing software without modifying the BKMR algorithm itself. Finite populations are generated with correlated exposures and a known nonlinear data-generating function. Stratified two-stage cluster samples are then drawn under both non-informative and exposure-dependent (informative) selection mechanisms, with controlled intra-class correlation (ICC). The design-aware approach incorporates sampling weights through resampling of the dataset while preserving primary sampling unit structure, followed by standard BKMR fitting. Methods are evaluated using bias, interval width, and empirical 95% coverage relative to the known truth. Across simulation scenarios, naïve BKMR exhibits bias and systematic under-coverage under informative sampling, with empirical 95% coverage often dropping to approximately 0–40%, whereas the design-aware workflow improves coverage to approximately 40–60%, moving results closer to nominal levels. These findings provide a practical, implementation-ready strategy for integrating survey design considerations into BKMR analyses and delineate conditions under which accounting for sampling design affects inference. While the proposed approach improves inferential performance relative to naïve BKMR, it does not fully achieve nominal coverage, indicating that further methodological development is required for fully valid uncertainty quantification under complex survey designs.
Full article
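The design-aware workflow's resampling step can be sketched as follows: draw primary sampling units (PSUs) with probability proportional to their total survey weight, keeping each PSU intact, then fit standard BKMR to the resampled data. Column names (`psu`, `weight`) are hypothetical, and the BKMR fit itself (e.g., via the R package bkmr) is not shown.

```python
# Sketch of the design-aware resampling step: PSUs are drawn whole, with
# probability proportional to their total survey weight, preserving the
# within-PSU structure. BKMR fitting then proceeds on the resampled data.
import numpy as np
import pandas as pd

def design_aware_resample(df, psu_col="psu", w_col="weight", seed=0):
    rng = np.random.default_rng(seed)
    psu_w = df.groupby(psu_col)[w_col].sum()      # PSU-level weight totals
    probs = psu_w / psu_w.sum()
    drawn = rng.choice(psu_w.index, size=len(psu_w), replace=True,
                       p=probs.values)
    # concatenate the drawn PSUs whole, preserving within-PSU structure
    return pd.concat([df[df[psu_col] == u] for u in drawn], ignore_index=True)
```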

Figure 1
Open Access Article
Coverage and Precision of Net Promoter Score Confidence Intervals Across Sampling Distributions
by
Philip Turk, Jordan Cinderich and Emma McNeill
Stats 2026, 9(2), 45; https://doi.org/10.3390/stats9020045 - 21 Apr 2026
Abstract
The Net Promoter Score (NPS) is a widely used metric for customer loyalty in business. However, current theoretical gaps in the literature suggest practical refinements for real-world applications. In this simulation study, we use an unbiased estimator of the variance of the sample NPS to examine coverage and width for three different confidence interval methods (Wald, bootstrap t, and adjusted Wald with weights) across four underlying population distribution shapes: extreme (E), left-skewed (LS), triangular (T), and uniform (U). As the sample size increased, all methods approached the nominal 95% coverage rate, with the exception of the extreme population; the adjusted Wald method with triangular and uniform weights is particularly robust among the representative population shapes examined. All adjusted Wald methods performed comparably in width, especially at larger n. The confidence interval width depended on the population shape. Overall, the Wald and bootstrap t methods should be avoided at small sample sizes and are not recommended. Our methods raise awareness of the sampling distribution of the NPS statistic, provide a theoretical basis for an unbiased estimator of the variance, and assess reliable confidence interval construction. These results support informed application of the NPS and lay the foundation for future methodological development.
Full article
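For reference, one standard multinomial form of the quantities named above (the paper's exact parameterization may differ): with promoter and detractor sample proportions $\hat{p}_P$ and $\hat{p}_D$ from $n$ responses,

```latex
\widehat{\mathrm{NPS}} = \hat{p}_P - \hat{p}_D, \qquad
\widehat{\operatorname{Var}}\!\left(\widehat{\mathrm{NPS}}\right)
  = \frac{\hat{p}_P + \hat{p}_D - \left(\hat{p}_P - \hat{p}_D\right)^{2}}{n-1},
\qquad
\text{Wald CI: } \widehat{\mathrm{NPS}} \pm z_{0.975}
  \sqrt{\widehat{\operatorname{Var}}\!\left(\widehat{\mathrm{NPS}}\right)}.
```

Dividing by $n-1$ rather than $n$ is what makes this variance estimator unbiased under multinomial sampling.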

Figure 1
Open Access Article
Assessing the Accuracy of Bootstrap-Based Standard Errors in Regression Models with Unobserved Heterogeneity
by
Yingjuan Zhang and Jochen Einbeck
Stats 2026, 9(2), 44; https://doi.org/10.3390/stats9020044 - 18 Apr 2026
Abstract
When the data at hand are suspected to stem from several latent subpopulations, statisticians commonly speak of “unobserved heterogeneity”. While the presence and importance of this phenomenon are commonly acknowledged, there is relatively little guidance on how to carry out correct inference under unobserved heterogeneity, even in relatively simple scenarios such as the linear regression model. In this work, bootstrap algorithms for the computation of standard errors are investigated in the context of a mixture-based regression approach which accounts for the clustered nature of the data. Of interest are both the accuracy of the standard errors (evidenced by confidence interval coverage rates) and the relative reduction in standard errors achieved in comparison to a naïve linear model fit. Simulations and a real data example are provided.
Full article
(This article belongs to the Section Regression Models)
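The bootstrap machinery itself is generic; a minimal case-resampling sketch for coefficient standard errors follows, with plain least squares as a stand-in for the paper's mixture-based regression fit.

```python
# Generic case-resampling bootstrap for regression standard errors. The plain
# least-squares fit below is a stand-in; the paper's mixture-based regression
# estimator would take its place inside the loop.
import numpy as np

def bootstrap_se(X, y, fit, B=999, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    est = np.array([fit(X[idx], y[idx])
                    for idx in rng.integers(0, n, size=(B, n))])
    return est.std(axis=0, ddof=1)   # bootstrap SEs, one per coefficient

ols = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
```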
Open Access Article
Scalable Likelihood Inference for Student-t Copula Count Time Series
by
Quynh Nhu Nguyen and Victor De Oliveira
Stats 2026, 9(2), 43; https://doi.org/10.3390/stats9020043 - 17 Apr 2026
Abstract
Count time series often exhibit extremal dependence that may not be adequately captured by Gaussian copula models. We develop a likelihood-based framework for count-valued time series using Student-t copulas with latent ARMA dependence. The latent process is constructed through a scale-mixture representation of a Gaussian ARMA process, preserving the second-order dependence structure while introducing tail dependence and greater persistence of extreme events. Likelihood inference requires evaluating high-dimensional truncated multivariate t probabilities, which is computationally demanding under heavy tails and strong serial dependence. To address this challenge, we develop scalable likelihood approximations tailored to the time series structure. In particular, we formulate a time series version of minimax exponential tilting for multivariate t probabilities, termed Time Series Minimax Exponential Tilting (TMET), which exploits the exact conditional representation of the latent ARMA process. The resulting algorithm reduces computational complexity from cubic to near-linear in the series length while retaining the high accuracy of minimax exponential tilting. For comparison, we also extend two widely used Gaussian copula approximations—the continuous extension (CE) method and the Geweke–Hajivassiliou–Keane (GHK) simulator—to the Student-t copula setting. Simulation studies show that TMET outperforms CE and GHK, particularly under strong dependence, heavy tails, and low-count regimes. The framework also supports predictive inference and residual diagnostics. An application to weekly rotavirus counts illustrates how the Student-t copula provides a flexible extension of the Gaussian copula while retaining stable inference even when tail dependence is weak or absent.
Full article
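A hedged sketch of the latent construction: scaling a unit-variance Gaussian autoregressive path by a single chi-square mixing draw preserves the second-order structure while producing jointly Student-t margins; counts then follow by pushing t-CDF values through a Poisson quantile function. The AR(1) latent process and Poisson marginal below are illustrative simplifications of the paper's ARMA setup.

```python
# Latent scale-mixture construction: Gaussian AR(1) path, scaled by one
# chi-square mixing variable, gives a Student-t process; counts follow by
# mapping t-CDF values through a Poisson quantile function.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu, lam, n, phi = 5.0, 3.0, 500, 0.7
z = np.empty(n)
z[0] = rng.standard_normal()
for i in range(1, n):                    # stationary AR(1), unit variance
    z[i] = phi * z[i - 1] + np.sqrt(1 - phi ** 2) * rng.standard_normal()
w = np.sqrt(nu / stats.chi2.rvs(nu, random_state=rng))  # one mixing draw
u = stats.t.cdf(w * z, df=nu)            # Student-t copula uniforms
counts = stats.poisson.ppf(u, mu=lam).astype(int)       # count series
```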
Open Access Article
MAI-GAN: An Inferentially Calibrated Generative Framework for Multilevel Longitudinal Data with Applications to Educational Intersectionality
by
Benjamin Hechtman, Ross H. Nehm and Wei Zhu
Stats 2026, 9(2), 42; https://doi.org/10.3390/stats9020042 - 9 Apr 2026
Abstract
Synthetic datasets are increasingly used in education research for methodological validation, privacy-preserving data sharing, and reproducible equity analysis; however, most generative approaches prioritize marginal distributional similarity without ensuring preservation of multilevel inferential properties. This limitation is consequential for repeated-measures data analyzed using intersectionality-focused hierarchical models, where conclusions depend on variance partitioning, partial pooling, and stratum-level heterogeneity. We introduce MAI-GAN, a hybrid generative framework that implements a structure–residual decomposition approach combining Bayesian longitudinal MAIHDA with conditional GAN-based residual generation. Inferential fidelity is operationalized with respect to multilevel intersectional models by explicitly targeting the preservation of fixed effects, variance components, and variance partitioning coefficients, while baseline composition is maintained via stratified bootstrap resampling. Applied to a six-semester undergraduate biology dataset (N = 2669 students), MAI-GAN was evaluated across multiple independent random seeds and consistently reproduced baseline-dependent residual structure and key inferential quantities. These results demonstrate that model-aligned generative strategies can produce synthetic longitudinal datasets that remain coherent under intersectionality-focused multilevel analysis, offering a principled foundation for equity-oriented synthetic data generation.
Full article
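The structure–residual decomposition can be skeletonized as below: compute residuals against the fitted multilevel structure, generate synthetic residuals conditionally on stratum (the paper uses a conditional GAN; simple within-stratum resampling stands in here), and recombine. Column names are hypothetical.

```python
# Skeleton of the structure-residual decomposition: residuals given the
# fitted multilevel structure are regenerated conditionally on stratum and
# recombined with the model-implied structure. The within-stratum resampling
# is a stand-in for the paper's conditional GAN.
import numpy as np
import pandas as pd

def synthesize(df, fitted, stratum_col="stratum", y_col="score", seed=0):
    rng = np.random.default_rng(seed)
    resid = df[y_col] - fitted            # residuals given structure
    synth = df.copy()
    synth[y_col] = fitted + (
        resid.groupby(df[stratum_col])
             .transform(lambda r: rng.choice(r.values, size=len(r)))
    )                                     # stratum-conditional residual draws
    return synth
```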
Open Access Article
A Novel Exponentiated Pareto Exponential Distribution with Applications in Environmental and Financial Datasets
by
Ibrahim Sule and Mogiveny Rajkoomar
Stats 2026, 9(2), 41; https://doi.org/10.3390/stats9020041 - 9 Apr 2026
Abstract
Environmental and financial datasets often display complex distributional characteristics, including heavy tails, high skewness and the presence of extreme observations. Traditional probability models such as the exponential, gamma or log-normal distributions may not adequately capture these behaviours, particularly when modelling extreme events such as rainfall, pollution levels, stock returns or loss severities. By integrating the characteristics of the Pareto and exponential distributions into an exponentiated framework suited to datasets arising from the environmental and financial fields, this study presents a novel three-parameter exponentiated Pareto exponential (EPE) distribution, built from the exponentiated Pareto family of distributions with the classical exponential distribution as the baseline model. The new model extends the classical exponential distribution with additional shape parameters which simultaneously regulate the centre and tail behaviours. The statistical and mathematical characteristics of the proposed distribution are derived and studied. Maximum likelihood estimation is used in a simulation exercise, and the efficiency of the estimators is evaluated from the results. The practical applicability of the model is illustrated with four real-life datasets using model adequacy and goodness-of-fit measures such as the log-likelihood, Akaike information criterion and Bayesian information criterion. The results reveal that the proposed model gives a better fit than the comparator models, making the EPE distribution useful and robust in environmental and financial fields of study.
Full article
(This article belongs to the Special Issue Advances in Machine Learning, High-Dimensional Inference, Shrinkage Estimation, and Model Validation)
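The model-comparison workflow (MLE fit, then log-likelihood/AIC/BIC) is standard; a generic Python sketch follows, with scipy's built-in distributions as stand-in candidates since the EPE density is not reproduced here.

```python
# Generic version of the model-adequacy workflow: fit each candidate by
# maximum likelihood and compare log-likelihood, AIC, and BIC. Scipy's
# built-in distributions serve as stand-in candidates.
import numpy as np
from scipy import stats

def compare(data, candidates=("expon", "gamma", "lognorm")):
    n = len(data)
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)                 # MLE
        ll = dist.logpdf(data, *params).sum()
        k = len(params)
        print(f"{name:8s} logL={ll:9.2f} AIC={2*k - 2*ll:9.2f} "
              f"BIC={k*np.log(n) - 2*ll:9.2f}")
```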
Open Access Article
A New Partially Linear Regression with an Application to the Price of Coffee Before and After the Pandemic
by
Edwin M. M. Ortega, Gabriela M. Rodrigues, Kwan Sung Jang and Gauss M. Cordeiro
Stats 2026, 9(2), 40; https://doi.org/10.3390/stats9020040 - 8 Apr 2026
Abstract
We propose a partially linear regression model to explain coffee prices before and after the COVID-19 pandemic. This new regression model accommodates both linear and nonlinear relationships between the response and the explanatory variables. We consider the penalized quasi-likelihood method for parameter estimation and present residual analysis for the new regression model. A simulation study examines the penalized quasi-likelihood estimators and the empirical distribution of the quantile residuals. Furthermore, the article aims to identify variables that influence changes in coffee prices, such as the price of the Arabica and Robusta varieties, supply (expressed in millions of bags of production), global consumption, exchange rates, inflation, and the pandemic.
Full article
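A partially linear fit y = Xβ + f(t) + ε can be sketched with a penalized spline basis for f; the ridge-penalized least squares below is a simple analogue of (not a substitute for) the paper's penalized quasi-likelihood estimation.

```python
# Sketch of a partially linear fit y = X*beta + f(t) + eps: represent f with
# a cubic B-spline basis and solve a ridge-penalized least-squares problem,
# penalizing only the spline coefficients.
import numpy as np
from scipy.interpolate import BSpline

def fit_plm(X, t, y, df=10, lam=1.0):
    knots = np.quantile(t, np.linspace(0, 1, df - 2))
    knots = np.concatenate(([t.min()] * 3, knots, [t.max()] * 3))
    B = BSpline.design_matrix(t, knots, k=3).toarray()  # basis for f(t)
    Z = np.hstack([X, B])
    P = np.diag([0.0] * X.shape[1] + [lam] * B.shape[1])  # penalize f only
    coef = np.linalg.solve(Z.T @ Z + P, Z.T @ y)
    return coef[:X.shape[1]], coef[X.shape[1]:]         # (beta, spline coefs)
```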
Open Access Article
A New Depth-Based Test for Multivariate Two-Sample Problems
by
My Luu, Yuejiao Fu, Augustine Wong and Xiaoping Shi
Stats 2026, 9(2), 39; https://doi.org/10.3390/stats9020039 - 3 Apr 2026
Abstract
Statistical depth provides a center–outward ordering of multivariate observations and is widely used in nonparametric inference. We study depth-based tests for multivariate two-sample problems and examine the behaviour of different depth notions using the DD plot (data-depth plot) across a variety of distributional settings. The DD plot illustrates that depth functions differ in their sensitivity to distributional differences, emphasizing the importance of depth selection in two-sample testing. We propose a new two-sample test statistic, log DDR, constructed from ratios of numerical depth values rather than depth-induced ranks. Simulation studies under multiple scenarios and for three representative depth functions indicate that log DDR achieves improved power relative to several competing depth-based nonparametric tests. The results further demonstrate that the performance of log DDR and existing methods depends strongly on the chosen depth function, consistent with insights from the DD plot. These findings support a two-stage testing approach in which the DD plot is used to guide the choice of depth notion before applying log DDR for homogeneity testing.
Full article
(This article belongs to the Section Data Science)
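For orientation, the DD plot pairs each observation's depth with respect to the two samples; under homogeneity the points concentrate near the diagonal. The sketch below uses Mahalanobis depth; the log DDR statistic, built from ratios of such depth values, is defined in the paper and not reproduced.

```python
# Mahalanobis depth and DD-plot coordinates: each point gets its depth with
# respect to sample X and to sample Y; under homogeneity the DD plot hugs
# the diagonal.
import numpy as np

def mahal_depth(pts, sample):
    mu = sample.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(sample.T))
    d = pts - mu
    return 1.0 / (1.0 + np.einsum("ij,jk,ik->i", d, Sinv, d))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))
Z = np.vstack([X, Y])
dd = np.c_[mahal_depth(Z, X), mahal_depth(Z, Y)]  # DD-plot coordinates
```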
Open Access Article
Multiple Imputation of a Continuous Outcome with Fully Observed Predictors Using TabPFN
by
Jerome Sepin
Stats 2026, 9(2), 38; https://doi.org/10.3390/stats9020038 - 1 Apr 2026
Abstract
Handling missing data is a central challenge in quantitative research, particularly when datasets exhibit complex dependency structures, such as nonlinear relationships and interactions. Multiple imputation (MI) via fully conditional specification (FCS), as implemented in the MICE R package, is widely used but relies on user-specified models that may fail to capture complex dependency structures, especially in high-dimensional settings, or on more sophisticated algorithms that are considered data-hungry. This paper investigates the performance of TabPFN, a transformer-based, pretrained foundation model developed for tabular prediction tasks, for MI. TabPFN is pretrained on millions of synthetic datasets and approximates posterior predictive distributions without dataset-specific retraining, offering a compelling solution for imputing complex missing data in small to moderately sized samples. We conduct a simulation study focusing on univariate missingness in a continuous outcome with complete predictors, comparing TabPFN with standard MI methods. Performance is evaluated using bias, standard error, and coverage of the marginal mean estimand across a range of data-generating and missingness mechanisms. Our results show that TabPFN yields competitive or superior performance relative to Classification and Regression Trees and Predictive Mean Matching. These findings highlight TabPFN as a promising tool for missing data imputation, with particular relevance to health research.
Full article
(This article belongs to the Special Issue Statistical Methods for Hypothesis Testing)
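The FCS-style imputation loop is model-agnostic; the sketch below uses a scikit-learn regressor as a stand-in for TabPFN (to avoid guessing that package's API) and a crude residual-resampling predictive draw.

```python
# Generic multiple-imputation loop for one continuous variable with complete
# predictors: fit a predictive model on observed rows, impute with a draw
# from an approximate predictive distribution, repeat M times. A scikit-learn
# regressor stands in for TabPFN here.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

def impute_M(X, y, M=20, seed=0):
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    sets = []
    for m in range(M):
        model = HistGradientBoostingRegressor(random_state=m)
        model.fit(X[obs], y[obs])
        resid = y[obs] - model.predict(X[obs])
        y_m = y.copy()
        # crude predictive draw: point prediction plus a resampled residual
        y_m[~obs] = model.predict(X[~obs]) + rng.choice(resid, size=(~obs).sum())
        sets.append(y_m)
    return sets      # pool estimates across sets with Rubin's rules
```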
Open Access Article
On the Classification–Causal Tradeoff in Neural Network Propensity Score Estimation
by
Seungman Kim, Jaehoon Lee and Kwanghee Jung
Stats 2026, 9(2), 37; https://doi.org/10.3390/stats9020037 - 31 Mar 2026
Abstract
Observational studies serve as a vital alternative to randomized experiments but are highly susceptible to selection bias. Propensity score (PS) methods address this by balancing covariates between groups. Although including all relevant covariates is theoretically ideal, high dimensionality often destabilizes traditional estimation models. This study evaluates the efficacy of deep neural networks (DNN) and convolutional neural networks (CNN) for PS estimation compared to traditional logistic regression (LR), leveraging their capacity to handle complex nonlinear relationships and interactions. Using a Monte Carlo simulation across 36 conditions, model performance was evaluated based on bias and imbalance reduction. Results indicate that DNNs and CNNs significantly outperform LR. Specifically, while LR increased outcome bias by 17% and reduced covariate imbalance by only 5%, DNNs and CNNs reduced outcome bias by 13% and 16%, respectively, while decreasing covariate imbalance by 18% and 21%. We conclude that despite requiring specialized computational resources, neural networks offer substantial advantages for high-dimensional PS estimation. However, their reliable application necessitates stability-aware training and proper error rate thresholds to prevent probability degeneracy.
Full article
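Two practical points from the conclusion, probability clipping against degeneracy and covariate balance checking, can be sketched directly; the network architecture and clipping level below are illustrative.

```python
# Neural-network propensity scores with two safeguards: clip fitted
# probabilities away from 0/1 (probability degeneracy) and check covariate
# balance via weighted standardized mean differences.
import numpy as np
from sklearn.neural_network import MLPClassifier

def ps_weights(X, treat, clip=0.01, seed=0):
    ps = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=2000,
                       random_state=seed).fit(X, treat).predict_proba(X)[:, 1]
    ps = np.clip(ps, clip, 1 - clip)                   # guard degeneracy
    return np.where(treat == 1, 1 / ps, 1 / (1 - ps))  # IPW weights

def smd(x, treat, w):
    m1 = np.average(x[treat == 1], weights=w[treat == 1])
    m0 = np.average(x[treat == 0], weights=w[treat == 0])
    s = np.sqrt((x[treat == 1].var() + x[treat == 0].var()) / 2)
    return (m1 - m0) / s        # weighted standardized mean difference
```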
Open Access Article
On Dimension-Free Stochastic Surrogates and Estimators of Cross-Partial Derivatives and the Hessian Matrix
by
Matieyendou Lamboni
Stats 2026, 9(2), 36; https://doi.org/10.3390/stats9020036 - 29 Mar 2026
Abstract
This study introduces stochastic surrogates of all the cross-partial derivatives of a function using L evaluations of the function at randomized points. Such randomized points are constructed using the class of ℓp-spherical distributions or equivalent distributions. For the cross-partial derivatives of a given order, the proposed surrogates and the corresponding estimators enjoy the parametric rate of convergence and dimension-free mean squared errors under suitable conditions, breaking the curse of dimensionality; under weaker conditions, the curse is broken only for cross-partial derivatives of certain orders. The L-point-based Hessian surrogate and estimator are also proposed, including a convergence analysis, and a particular choice of p achieves dimension-free mean squared errors. Analytical examples and simulations are provided to show the efficiency of these surrogates and estimators.
Full article
(This article belongs to the Section Computational Statistics)
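Not the paper's construction, but the same spirit in miniature: a classical Gaussian-smoothing surrogate estimates a gradient from L randomized function evaluations, with no finite-difference grid in the dimension.

```python
# Classical randomized-smoothing gradient surrogate (not the paper's
# estimator): E[(f(x + hZ) - f(x)) Z] / h approximates the gradient for
# Z ~ N(0, I), using only L randomized function evaluations.
import numpy as np

def grad_surrogate(f, x, L=2000, h=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((L, x.size))
    fx = f(x)
    vals = np.array([f(x + h * z) for z in Z])
    return ((vals - fx)[:, None] * Z).mean(axis=0) / h

f = lambda v: np.sum(v ** 2)
print(grad_surrogate(f, np.ones(10)))   # approx gradient (2, ..., 2)
```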
Open Access Communication
Analyzing Complex Non-Linear Fascia-Muscle Interactions Using Cross-Recurrence Quantification Analysis
by
Andreas Brandl, Marcus Müller and Robert Schleip
Stats 2026, 9(2), 35; https://doi.org/10.3390/stats9020035 - 25 Mar 2026
Abstract
Biophysical, neurophysiological, psychological and social processes along with their interactions are complex, often non-linear and inherently time-dependent. However, time series analysis of such measurements usually requires extensive data processing and is therefore potentially associated with structural biases. This exploratory secondary analysis introduces cross-recurrence quantification analysis (CRQA), which is explicitly suited to time series with complicated non-stationary properties. We illustrate and validate CRQA using a previous study that investigated the dynamic relationship between thoracolumbar fascia deformation and back extensor muscle activity in patients with low back pain. CRQA revealed significant differences in the relationships between fascia and muscles in low back pain patients compared to healthy individuals. The analysis revealed more specific aspects of fascia-muscle coupling than traditional analytical approaches, suggesting that CRQA is a useful additional tool for investigating time-dependent interactions with dynamic complex nonlinear patterns.
Full article
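The core CRQA computation is compact: delay-embed both series, threshold the cross-distance matrix into a cross-recurrence matrix, and summarize it. Embedding parameters and the recurrence-rate summary below are illustrative.

```python
# Core of cross-recurrence quantification: embed both series, build the
# cross-recurrence matrix R[i, j] = 1{ ||x_i - y_j|| < eps }, and summarize
# it (here: recurrence rate).
import numpy as np

def embed(x, dim=3, tau=2):
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def cross_recurrence(x, y, dim=3, tau=2, eps=0.5):
    X, Y = embed(x, dim, tau), embed(y, dim, tau)
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    R = (D < eps).astype(int)      # cross-recurrence matrix
    return R, R.mean()             # matrix and recurrence rate
```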
Open Access Article
Multidimensional Correlates of Childhood Stunting in India: A Spatial Machine Learning and Explainable AI Approach
by
Bhagyajyothi Rao, Md Gulzarull Hasan, Bandhavya Putturaya, Asha Kamath, Mohammad Aatif and Yousif M. Elmosaad
Stats 2026, 9(2), 34; https://doi.org/10.3390/stats9020034 - 24 Mar 2026
Abstract
Childhood stunting remains a major public health challenge in India and is influenced by multiple socioeconomic and environmental factors. This ecological study examined district-level correlates of childhood stunting, including Crimes Against Women (CAW), the Multidimensional Poverty Index (MPI), and drought severity, using data from NFHS-5, the National Crime Records Bureau, NITI Aayog's MPI reports, and the Drought Atlas of India. Spatial autocorrelation analysis and spatial regression models were applied alongside machine learning approaches and SHAP-based Explainable AI (XAI) interpretation. Childhood stunting exhibited significant spatial clustering (Moran's I = 0.520, p < 0.001), with hotspots in northern, central, and eastern India. Higher stunting was associated with higher birth order, low maternal BMI, child anaemia, and MPI, and negatively associated with iodised salt usage, electricity access, and timely postnatal care. A significant spatial lag parameter (ρ = 0.348) indicated substantial spillover effects. Machine learning models consistently identified MPI, drought severity, and CAW as key predictors. The integrated spatial and machine learning framework identifies key correlates and spatial dependencies of childhood stunting, highlighting the need for region-specific, multisectoral interventions.
Full article
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
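The reported Moran's I (0.520) is the standard global spatial autocorrelation statistic, computable directly:

```python
# Global Moran's I for a district-level variable x under a spatial weight
# matrix W: I = (n / sum(W)) * (z' W z) / (z' z), with z the centered values.
import numpy as np

def morans_i(x, W):
    n = len(x)
    z = x - x.mean()
    return (n / W.sum()) * (z @ W @ z) / (z @ z)
```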
Open Access Article
An Adaptive Method to Identify Outliers in Skewed Observations: Application to Assess NAACCR Cancer Registry Data Usage
by
Xiaowen Yang, Amjila Bam, Nubaira Rizvi, Xiao-Cheng Wu, Donald Mercante and Qingzhao Yu
Stats 2026, 9(2), 33; https://doi.org/10.3390/stats9020033 - 23 Mar 2026
Abstract
Outlier detection is a fundamental component of data preprocessing and quality monitoring across diverse scientific domains, including engineering, biomedical sciences, and finance. While many variables in controlled environments approximate a normal distribution, real-world data, particularly biological, environmental, and epidemiological measures, are frequently characterized by pronounced right-skewness. To address the shortcomings of conventional methods, this study introduces the Dynamic Threshold for Outlier Detection (DTOD), which reframes outlier detection as a concrete operational workflow. The DTOD framework dynamically adjusts detection thresholds based on a functional relationship between skewness and tail morphology. Validation through large-scale simulation experiments across light-, middle-, and high-skewness levels confirms the method’s versatility. The DTOD proves particularly effective at two ends of the spectrum: enhancing sensitivity for detecting subtle anomalies in light-skewed data while serving as a conservative, high-confidence screening tool that controls false positives in high-skewness environments. In real-world application to North American Association of Central Cancer Registries (NAACCR) data, the method successfully identified outliers with abnormally high unknown tumor size rates in colorectal cancer and maintained a low misclassification rate in highly skewed lung cancer data. Ultimately, the DTOD provides a promising, interpretable solution for improving data quality in skewed scenarios.
Full article
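The published DTOD threshold function is not reproduced here; the sketch below only conveys the shape of the idea, widening the upper boxplot fence by a placeholder function g of the sample skewness.

```python
# Shape of the dynamic-threshold idea, not the paper's published formula:
# widen the upper outlier fence as a function g of sample skewness, so highly
# skewed data trigger fewer false positives. g is a placeholder.
import numpy as np
from scipy import stats

def dynamic_upper_fence(x, g=lambda s: 1.5 + max(s, 0.0)):
    q1, q3 = np.percentile(x, [25, 75])
    skew = stats.skew(x)
    return q3 + g(skew) * (q3 - q1)     # skewness-adjusted upper fence

x = stats.lognorm.rvs(1.0, size=1000, random_state=0)
print((x > dynamic_upper_fence(x)).sum(), "flagged outliers")
```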
Open Access Article
Estimator Statistics from Simulation-Free Dirichlet Block-Bootstrap Resampling
by
Tillmann Rosenow
Stats 2026, 9(2), 32; https://doi.org/10.3390/stats9020032 - 20 Mar 2026
Abstract
Since the initiation of two variants of the bootstrap method by Efron and Rubin in the late 1970s, a variety of advancements has emerged in the literature. The subsampling of blocks enabled the estimation of the actual variance of the sample mean. The equivalence of data-level and estimator-level resampling is easily established for the sample mean and similar estimators. For Rubin's variant of the bootstrap we apply an algorithm by Diniz et al. which allows for the numerically stable computation of the sample-based cumulative distribution function of the estimator under investigation. No actual Monte-Carlo resampling is necessary in this setting, and we demonstrate how this gives access to the very small probabilities in the tails and, moreover, to confidence intervals. We do this using the example of a well-known test model that exhibits geometrically decaying spatial correlations. The analysis applies naturally to temporally correlated systems and to the correlations occurring in Markov chains as well.
Full article
(This article belongs to the Section Time Series Analysis)
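For contrast with the simulation-free computation (the Diniz et al. algorithm is not reproduced here), the Monte-Carlo form of Rubin's bootstrap draws Dirichlet(1, ..., 1) weights and reweights the data:

```python
# Monte-Carlo form of Rubin's Bayesian bootstrap: estimator statistics come
# from Dirichlet(1, ..., 1) reweightings of the observed data.
import numpy as np

def bayesian_bootstrap_mean(x, B=10000, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.dirichlet(np.ones(len(x)), size=B)   # posterior weights
    return W @ x                                 # B draws of the weighted mean

draws = bayesian_bootstrap_mean(np.random.default_rng(1).normal(size=50))
print(draws.std())                               # bootstrap SE of the mean
```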
Open Access Article
A Bayesian Approach for Clustering Constant-Wise Change-Point Data
by
Ana Carolina da Cruz and Camila P. E. de Souza
Stats 2026, 9(2), 31; https://doi.org/10.3390/stats9020031 - 17 Mar 2026
Abstract
Change-point models deal with ordered data sequences. Their primary goal is to infer the locations where an aspect of the data sequence changes. In this paper, we propose and implement a nonparametric Bayesian model for clustering observations based on their constant-wise change-point profiles via a Gibbs sampler. Our model incorporates a Dirichlet process on the constant-wise change-point structures to cluster observations while simultaneously performing multiple change-point estimation. Additionally, our approach controls the number of clusters in the model, not requiring specification of the number of clusters a priori. Satisfactory clustering and estimation results were obtained when evaluating our method under various simulated scenarios and on a real dataset from single-cell genomic sequencing. Our proposed methodology is implemented as an R package called BayesCPclust and is available from the Comprehensive R Archive Network.
Full article
(This article belongs to the Section Bayesian Methods)
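The data model being clustered, constant-wise (piecewise-constant) profiles sharing change-point structure within a cluster, is easy to simulate; values below are illustrative.

```python
# Simulate constant-wise change-point profiles: two clusters, each defined by
# shared change-point locations and segment levels, observed with noise.
import numpy as np

rng = np.random.default_rng(0)
grid = np.arange(100)

def profile(cps, levels, sd=0.3):
    y = np.empty(grid.size)
    for lo, hi, mu in zip([0] + cps, cps + [grid.size], levels):
        y[lo:hi] = mu + sd * rng.standard_normal(hi - lo)
    return y

cluster1 = [profile([30, 70], [0.0, 2.0, 1.0]) for _ in range(10)]
cluster2 = [profile([50], [1.0, -1.0]) for _ in range(10)]
```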
Open Access Correction
Correction: Risca et al. Archimedean Copulas: A Useful Approach in Biomedical Data—A Review with an Application in Pediatrics. Stats 2025, 8, 69
by
Giulia Risca, Stefania Galimberti, Paola Rebora, Alessandro Cattoni, Maria Grazia Valsecchi and Giulia Capitoli
Stats 2026, 9(2), 30; https://doi.org/10.3390/stats9020030 - 17 Mar 2026
Abstract
In the original publication [...]
Full article
Open Access Communication
Comparison of Minimal Circular Balanced RMDs Constructed Through Rule I and II of Cyclic Shifts Method
by
Muhammad Ejaz Malik, Muhammad Ameeq, Muhammad Riaz and Rashid Ahmed
Stats 2026, 9(2), 29; https://doi.org/10.3390/stats9020029 - 13 Mar 2026
Abstract
The repeated measurement design (RMD) is a cost-effective research design commonly used in various fields. RMDs have several advantages; however, the carryover effect is a fundamental issue. Carryover effects typically serve as the primary source of bias in the evaluation of treatment efficacy. To reduce this bias, minimal circular balanced RMDs (MCBRMDs) are utilized. Rule I of the cyclic shifts method produces MCBRMDs only for odd v (the number of treatments to be compared); Rule II produces these designs for both odd and even v. This article contributes to the literature by providing a systematic comparison of the two cyclic shift rules for constructing MCBRMDs for odd v. The study offers useful guidance to experimenters in choosing effective designs under practical experimental restrictions by comparing these designs on the efficiency of carryover effects and on separability.
Full article
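A sketch of the machinery shared by both rules, assuming the usual cyclic development: expand an initial treatment sequence mod v and count ordered carryover pairs around the circle (circular balance means every ordered pair of distinct treatments occurs equally often). The specific shift sets of Rules I and II are given in the paper.

```python
# Cyclic development of an initial treatment sequence mod v, plus a counter
# of ordered (previous, current) carryover pairs around the circle; circular
# balance means all ordered pairs of distinct treatments occur equally often.
from collections import Counter

def develop(seq, v):
    return [[(s + j) % v for s in seq] for j in range(v)]  # v cyclic blocks

def carryover_counts(design):
    c = Counter()
    for block in design:
        for i, trt in enumerate(block):
            c[(block[i - 1], trt)] += 1   # circular: wraps at i = 0
    return c

design = develop([0, 1, 3], 7)            # illustrative initial sequence
print(carryover_counts(design))           # inspect carryover balance
```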
Open Access Article
Combinatorial Game Theory and Reinforcement Learning in Cumulative Tic-Tac-Toe via Evaluation Functions
by
Kai Li and Wei Zhu
Stats 2026, 9(2), 28; https://doi.org/10.3390/stats9020028 - 10 Mar 2026
Abstract
We introduce cumulative tic-tac-toe, a novel variant of the classic tic-tac-toe game in which play continues until the board is completely filled. Each player’s final score is determined by the total number of three-in-a-row sequences they form. Using combinatorial game theory (CGT), we establish that under optimal play, the game is a draw, and we characterize its theoretical properties. To empirically validate and optimize practical play, we develop a reinforcement learning (RL) framework based on temporal-difference (TD) learning, which is enhanced with a domain-informed evaluation function to accelerate convergence. The experimental results show that our triplet-coverage difference (TCD) evaluation function reduces the average number of training episodes by approximately 23.1% compared with a random-initialization baseline, a statistically significant improvement at the 5% significance level. These results demonstrate the efficiency of our CGT–RL approach for cumulative tic-tac-toe and suggest that similar methods may be useful for analyzing related combinatorial games. We also discuss potential analogies in domains such as competitive resource allocation and coalition formation, illustrating how cumulative-scoring games connect abstract game-theoretic ideas to practical sequential decision problems.
Full article
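The scoring rule is easy to state in code: on the completely filled board, each player's score is the number of three-in-a-row lines they own. The line-count difference below is a simplified stand-in for the paper's triplet-coverage difference (TCD) evaluation function.

```python
# Scoring a full cumulative tic-tac-toe board: count three-in-a-row lines
# (rows, columns, diagonals) owned by each player.
import numpy as np

LINES = [[(r, c) for c in range(3)] for r in range(3)] + \
        [[(r, c) for r in range(3)] for c in range(3)] + \
        [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]

def scores(board):
    # board: 3x3 array of +1 (player X) and -1 (player O), fully filled
    s = {1: 0, -1: 0}
    for line in LINES:
        vals = [board[r][c] for r, c in line]
        if len(set(vals)) == 1:
            s[vals[0]] += 1
    return s

board = np.array([[1, 1, 1], [-1, -1, 1], [-1, 1, -1]])
print(scores(board))    # three-in-a-row counts per player
```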
Topics
Topic in
JPM, Mathematics, Applied Sciences, Stats, Healthcare
Application of Biostatistics in Medical Sciences and Global Health
Topic Editors: Bogdan Oancea, Adrian Pană, Cǎtǎlina Liliana Andrei. Deadline: 31 October 2026
Topic in
AppliedMath, Entropy, Mathematics, Stats, Sustainability, Symmetry, Algorithms, BDCC
Statistics and Data Science
Topic Editors: Jin-Ting Zhang, Tianming Zhu. Deadline: 31 July 2027
Special Issues
Special Issue in
Stats
Benford's Law(s) and Applications (Second Edition)
Guest Editors: Roy Cerqueti, Claudio Lupi. Deadline: 30 June 2026
Special Issue in
Stats
Extreme Weather Modeling and Forecasting
Guest Editor: Wei Zhu. Deadline: 30 July 2026
Special Issue in
Stats
Statistical Methods for Hypothesis Testing
Guest Editors: Daniel Rodriguez, Jamil Lane. Deadline: 20 August 2026
Special Issue in
Stats
Robust Statistics in Action II
Guest Editor: Marco Riani. Deadline: 30 September 2026