Journal Description
Stats
is an international, peer-reviewed, open access journal on statistical science published quarterly online by MDPI. The journal focuses on methodological and theoretical papers in statistics, probability, stochastic processes and innovative applications of statistics in all scientific disciplines including biological and biomedical sciences, medicine, business, economics and social sciences, physics, data science and engineering.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within ESCI (Web of Science), Scopus, RePEc, and other databases.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 19.7 days after submission; acceptance to publication takes 3.9 days (median values for papers published in this journal in the second half of 2024).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 0.9 (2023); 5-Year Impact Factor: 1.0 (2023)
Latest Articles
A Note on the Robust Modification of the Ordered-Heterogeneity Test
Stats 2025, 8(2), 47; https://doi.org/10.3390/stats8020047 - 5 Jun 2025
Abstract
An ordered heterogeneity (OH) test is a test for a trend that combines a nondirectional heterogeneity test with the rank-order information specified under the alternative. A modified OH test introduced in 2006 can detect all possible patterns under the alternative with a relatively high power. Here, it is proposed to apply the modified OH test as a permutation test, which has the advantage that it requires only the exchangeability of the observations in the combined sample under the null hypothesis. No additional assumptions or simulations of critical values are necessary. A simulation study indicates that the permutation OH test controls the significance level even for small sample sizes and has a good power, comparable to competing tests. Moreover, the main advantage of the OH tests is their very broad applicability; they can always be applied when a heterogeneity test exists.
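The permutation step described above follows a generic recipe: compute a trend statistic on the observed group structure, then recompute it on random permutations of the pooled sample. The Python sketch below illustrates that recipe, using the Spearman correlation between the hypothesized group order and the group means as an illustrative trend statistic; it is not the exact modified OH statistic of the paper, and the function name and arguments are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def permutation_trend_test(groups, n_perm=5000, seed=0):
    """Generic permutation test for an ordered (trend) alternative.

    groups: list of 1-D arrays, ordered as hypothesized under the alternative.
    Returns the observed statistic and a one-sided permutation p-value.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate(groups)
    cuts = np.cumsum([len(g) for g in groups])[:-1]
    order = np.arange(1, len(groups) + 1)

    def trend_stat(x):
        # Spearman correlation between the hypothesized order and group means
        means = [chunk.mean() for chunk in np.split(x, cuts)]
        return spearmanr(order, means)[0]

    observed = trend_stat(pooled)
    perm = np.array([trend_stat(rng.permutation(pooled)) for _ in range(n_perm)])
    return observed, (1 + np.sum(perm >= observed)) / (n_perm + 1)
```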
Full article
Open Access Article
New Methods for Multivariate Normal Moments
by
Christopher Stroude Withers
Stats 2025, 8(2), 46; https://doi.org/10.3390/stats8020046 - 5 Jun 2025
Abstract
Multivariate normal moments are foundational for statistical methods. The derivation and simplification of these moments are critical for the accuracy of various statistical estimates and analyses. Normal moments are the building blocks of the Hermite polynomials, which in turn are the building blocks of the Edgeworth expansions for the distribution of parameter estimates. Isserlis (1918) gave the bivariate normal moments and two special cases of trivariate moments. Beyond that, convenient expressions for multivariate normal moments are still not available. We compare three methods for obtaining them, the most powerful being the differential method. We give simpler formulas for the bivariate moments than those of Isserlis, and explicit expressions for the general moments of dimensions 3 and 4.
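For reference, the classical Isserlis (1918) identities for a zero-mean multivariate normal vector, which the paper generalizes, can be stated as follows (standard results, not the paper's new formulas):

$$
\mathbb{E}[X_iX_j]=\sigma_{ij},\qquad
\mathbb{E}[X_iX_jX_k]=0,\qquad
\mathbb{E}[X_1X_2X_3X_4]=\sigma_{12}\sigma_{34}+\sigma_{13}\sigma_{24}+\sigma_{14}\sigma_{23}.
$$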
Full article
(This article belongs to the Section Multivariate Analysis)
Open Access Article
Evaluating Estimator Performance Under Multicollinearity: A Trade-Off Between MSE and Accuracy in Logistic, Lasso, Elastic Net, and Ridge Regression with Varying Penalty Parameters
by
H. M. Nayem, Sinha Aziz and B. M. Golam Kibria
Stats 2025, 8(2), 45; https://doi.org/10.3390/stats8020045 - 31 May 2025
Abstract
Multicollinearity in logistic regression models can result in inflated variances and yield unreliable estimates of parameters. Ridge regression, a regularized estimation technique, is frequently employed to address this issue. This study conducts a comparative evaluation of the performance of 23 established ridge regression estimators alongside Logistic Regression, Elastic-Net, Lasso, and Generalized Ridge Regression (GRR), considering various levels of multicollinearity within the context of logistic regression settings. Simulated datasets with high correlations (0.80, 0.90, 0.95, and 0.99) and real-world data (municipal and cancer remission) were analyzed. Both sets of results show that several of the ridge estimators exhibit strong performance in terms of Mean Squared Error (MSE) and accuracy, particularly in smaller samples, while GRR demonstrates superior performance in large samples. Real-world data further confirm that GRR achieves the lowest MSE in highly collinear municipal data, while ridge estimators and GRR help prevent overfitting in small-sample cancer remission data. The results underscore the efficacy of ridge estimators and GRR in handling multicollinearity, offering reliable alternatives to traditional regression techniques, especially for datasets with high correlations and varying sample sizes.
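As a concrete illustration of the trade-off being evaluated, the hypothetical Python simulation below (using scikit-learn, which the paper does not reference) generates highly correlated predictors and compares an essentially unpenalized logistic fit with ridge-type (L2-penalized) fits in terms of coefficient mean squared error. The penalty values and data settings are illustrative only and do not reproduce the 23 estimators studied.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p, rho = 100, 4, 0.95
# Equicorrelated predictors induce severe multicollinearity
cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
beta = np.array([1.0, -1.0, 0.5, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))

# Very large C approximates plain maximum likelihood; smaller C adds ridge shrinkage
for C in (1e6, 1.0, 0.1):
    fit = LogisticRegression(penalty="l2", C=C, max_iter=5000).fit(X, y)
    mse = np.mean((fit.coef_.ravel() - beta) ** 2)
    print(f"C = {C:g}: coefficient MSE = {mse:.3f}")
```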
Full article

Open Access Article
mbX: An R Package for Streamlined Microbiome Analysis
by
Utsav Lamichhane and Jeferson Lourenco
Stats 2025, 8(2), 44; https://doi.org/10.3390/stats8020044 - 29 May 2025
Abstract
Here, we introduce the mbX package: an R-based tool designed to streamline 16S rRNA gene microbiome data analysis following taxonomic classification. It automates key post-sequencing steps, including taxonomic data cleaning and visualization, addressing the need for reproducible and user-friendly microbiome workflows. mbX’s core functions, ezclean and ezviz, take raw taxonomic output (such as those from QIIME 2) and sample metadata to produce a cleaned relative abundance dataset and high-quality stacked bar plots with minimal manual intervention. We validated mbX on 14 real microbiome datasets, demonstrating significant improvements in efficiency and consistency of post-processing of DNA sequence data. The results show that mbX ensures uniform taxonomic formatting, eliminates common manual errors, and quickly generates publication-ready figures, greatly facilitating downstream analysis. For a dataset with 20 samples, both functions of mbX ran in less than 1 s and used less than 1 GB of memory. For a dataset with more than 1170 samples, the functions ran within 125 s and used less than 4.5 GB of memory. By integrating seamlessly with existing pipelines and emphasizing automation, mbX fills a critical gap between sequence classification and statistical analysis. An upcoming version will have an added function which will further extend mbX to automated statistical comparisons, aiming for an end-to-end microbiome analysis solution by integrating mbX with currently available pipelines. This article presents the design of mbX, its workflow and features, and a comparative discussion positioning mbX relative to other microbiome bioinformatics tools. The contributions of mbX highlight its significance in accelerating microbiome research through reproducible and streamlined data analysis.
Full article
(This article belongs to the Section Statistical Software)
Open Access Article
D-plots: Visualizations for Analysis of Bivariate Dependence Between Continuous Random Variables
by
Arturo Erdely and Manuel Rubio-Sánchez
Stats 2025, 8(2), 43; https://doi.org/10.3390/stats8020043 - 24 May 2025
Abstract
Scatter plots are widely recognized as fundamental tools for illustrating the relationship between two numerical variables. Despite this, there are solid theoretical grounds why scatter plots generated from pairs of continuous random variables may not serve as reliable tools for assessing dependence. Sklar’s theorem implies that scatter plots created from ranked data are preferable for such analysis, as they exclusively convey information pertinent to dependence. This is in stark contrast to conventional scatter plots, which also encapsulate information about the variables’ marginal distributions. Such additional information is extraneous to dependence analysis and can obscure the visual interpretation of the variables’ relationship. In this article, we delve into the theoretical underpinnings of these ranked data scatter plots, hereafter referred to as rank plots. We offer insights into interpreting the information they reveal and examine their connections with various association measures, including Pearson’s and Spearman’s correlation coefficients, as well as Schweizer–Wolff’s measure of dependence. Furthermore, we introduce a novel visualization ensemble, termed a d-plot, which integrates rank plots, empirical copula diagnostics, and traditional summaries to provide a comprehensive visual assessment of dependence between continuous variables. This ensemble facilitates the detection of subtle dependence structures, including non-quadrant dependencies, that might be overlooked by traditional visual tools.
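A minimal illustration of the rank-plot idea (a sketch in Python with SciPy and matplotlib, not code from the paper): heavy-tailed margins dominate the raw scatter plot, while the plot of normalized ranks (pseudo-observations) displays only the dependence structure.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import rankdata

rng = np.random.default_rng(0)
# Heavy-tailed margins obscure a simple monotone dependence in the raw plot
x = rng.standard_cauchy(500)
y = x + rng.standard_cauchy(500)

u = rankdata(x) / (len(x) + 1)  # pseudo-observations (normalized ranks)
v = rankdata(y) / (len(y) + 1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.scatter(x, y, s=5)
ax1.set_title("raw scatter plot")
ax2.scatter(u, v, s=5)
ax2.set_title("rank plot (pseudo-observations)")
plt.tight_layout()
plt.show()
```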
Full article

Open Access Article
Modeling Uncertainty in Ordinal Regression: The Uncertainty Rating Scale Model
by
Gerhard Tutz
Stats 2025, 8(2), 42; https://doi.org/10.3390/stats8020042 - 23 May 2025
Abstract
In questionnaires, respondents sometimes feel uncertain about which category to choose and may respond randomly. Including uncertainty in the modeling of response behavior aims to obtain more accurate estimates of the impact of explanatory variables on actual preferences and to avoid bias. Additionally, variables that have an impact on uncertainty can be identified. A model is proposed that explicitly considers this uncertainty but also allows stronger certainty, depending on covariates. The developed uncertainty rating scale model is an extended version of the adjacent category model. It differs from finite mixture models, an approach that has gained popularity in recent years for modeling uncertainty. The properties of the model are investigated and compared to finite mixture models and other ordinal response models using illustrative datasets.
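For context, the standard adjacent categories logit model that the uncertainty rating scale model extends can be written in the usual textbook form (not the paper's extended specification):

$$
\log\frac{P(Y=r\mid x)}{P(Y=r-1\mid x)}=\beta_{0r}+x^{\top}\beta,\qquad r=2,\dots,k.
$$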
Full article

Open Access Article
Revisiting the Replication Crisis and the Untrustworthiness of Empirical Evidence
by
Aris Spanos
Stats 2025, 8(2), 41; https://doi.org/10.3390/stats8020041 - 20 May 2025
Abstract
The current replication crisis relating to the non-replicability and the untrustworthiness of published empirical evidence is often viewed through the lens of the Positive Predictive Value (PPV) in the context of the Medical Diagnostic Screening (MDS) model. The PPV is misconstrued as a measure that evaluates ‘the probability of rejecting when false’, after being metamorphosed by replacing its false positive/negative probabilities with the type I/II error probabilities. This perspective gave rise to a widely accepted diagnosis that the untrustworthiness of published empirical evidence stems primarily from abuses of frequentist testing, including p-hacking, data-dredging, and cherry-picking. It is argued that the metamorphosed PPV misrepresents frequentist testing and misdiagnoses the replication crisis, promoting ill-chosen reforms. The primary source of untrustworthiness is statistical misspecification: invalid probabilistic assumptions imposed on one’s data. This is symptomatic of the much broader problem of the uninformed and recipe-like implementation of frequentist statistics without proper understanding of (a) the invoked probabilistic assumptions and their validity for the data used, (b) the reasoned implementation and interpretation of the inference procedures and their error probabilities, and (c) warranted evidential interpretations of inference results. A case is made that Fisher’s model-based statistics offers a more pertinent and incisive diagnosis of the replication crisis, and provides a well-grounded framework for addressing the issues (a)–(c), which would unriddle the non-replicability/untrustworthiness problems.
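For readers unfamiliar with the debate, the metamorphosed PPV referred to above is usually written as

$$
\mathrm{PPV}=\frac{(1-\beta)\,\pi}{(1-\beta)\,\pi+\alpha\,(1-\pi)},
$$

where π is the prior probability that the tested effect is real, α the type I error probability, and 1 − β the power; this is the substitution of error probabilities for false positive/negative rates that the paper criticizes.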
Full article

Open Access Article
An Analysis of Vectorised Automatic Differentiation for Statistical Applications
by
Chun Fung Kwok, Dan Zhu and Liana Jacobi
Stats 2025, 8(2), 40; https://doi.org/10.3390/stats8020040 - 19 May 2025
Abstract
Automatic differentiation (AD) is a general method for computing exact derivatives in complex sensitivity analyses and optimisation tasks, particularly when closed-form solutions are unavailable and traditional analytical or numerical methods fall short. This paper introduces a vectorised formulation of AD grounded in matrix calculus. It aligns naturally with the matrix-oriented style prevalent in statistics, supports convenient implementations, and takes advantage of sparse matrix representation and other high-level optimisation techniques that are not available in the scalar counterpart. Our formulation is well-suited to high-dimensional statistical applications, where finite differences (FD) scale poorly due to the need to repeat computations for each input dimension, resulting in significant overhead, and is advantageous in simulation-intensive settings—such as Markov Chain Monte Carlo (MCMC)-based inference—where FD requires repeated sampling and multiple function evaluations, while AD can compute exact derivatives in a single pass, substantially reducing computational cost. Numerical studies are presented to demonstrate the efficacy and speed of the proposed AD method compared with FD schemes.
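To illustrate the exact-derivative-in-one-pass property that separates AD from finite differences, here is a toy scalar forward-mode AD implementation using dual numbers; it is a pedagogical Python sketch, not the paper's vectorised matrix-calculus formulation.

```python
class Dual:
    """Minimal forward-mode AD: carry a value and its derivative together."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule applied alongside the value computation
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    __rmul__ = __mul__


def partial(f, x, i):
    """Exact partial derivative of f at x in coordinate i, in one forward pass."""
    args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(x)]
    return f(*args).dot


f = lambda a, b: a * a * b + 3.0 * b
print(partial(f, [2.0, 5.0], 0))  # df/da = 2ab = 20.0
print(partial(f, [2.0, 5.0], 1))  # df/db = a^2 + 3 = 7.0
```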
Full article
(This article belongs to the Section Computational Statistics)
Open Access Article
Theoretical Advancements in Small Area Modeling: A Case Study with the CHILD Cohort
by
Charanpal Singh and Mahmoud Torabi
Stats 2025, 8(2), 39; https://doi.org/10.3390/stats8020039 - 16 May 2025
Abstract
Developing accurate predictive models in statistical analysis presents significant challenges, especially in domains with limited routine assessments. This study aims to advance the theoretical underpinnings of longitudinal logistic and zero-inflated Poisson (ZIP) models in the context of small area estimation (SAE). Utilizing data from the Canadian Healthy Infant Longitudinal Development (CHILD) study as a case study, we explore the use of individual- and area-level random effects to enhance model precision and reliability. The study evaluates the impact of various covariates (such as whether the mother has asthma, wheezed, or smoked) on model performance in predicting a child’s wheezing, emphasizing the role of location within Manitoba. Our main findings contribute to the literature by providing insights into the development and refinement of small area models, emphasizing the significance of advancing theoretical frameworks in statistical modeling.
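For reference, a zero-inflated Poisson response with small-area random effects can be written in the following generic mixed-model form (a standard formulation; the paper's exact specification may differ):

$$
P(Y_{ij}=0)=\pi_{ij}+(1-\pi_{ij})e^{-\lambda_{ij}},\qquad
P(Y_{ij}=y)=(1-\pi_{ij})\frac{e^{-\lambda_{ij}}\lambda_{ij}^{y}}{y!},\quad y=1,2,\dots,
$$

$$
\operatorname{logit}(\pi_{ij})=x_{ij}^{\top}\gamma+u_i,\qquad
\log\lambda_{ij}=x_{ij}^{\top}\beta+v_i,
$$

where u_i and v_i denote area-level random effects.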
Full article

Open Access Article
Determinants of Blank and Null Votes in the Brazilian Presidential Elections
by
Renata Rojas Guerra, Kerolene De Souza Moraes, Fernando De Jesus Moreira Junior, Fernando A. Peña-Ramírez and Ryan Novaes Pereira
Stats 2025, 8(2), 38; https://doi.org/10.3390/stats8020038 - 13 May 2025
Abstract
This study analyzes the factors influencing the proportions of blank and null votes in Brazilian municipalities during the 2018 presidential elections. The behavior of the variable of interest is examined using unit regression models within the Generalized Additive Models for Location, Scale, and Shape (GAMLSS) framework. Specifically, five different unit regression models are explored, beta, simplex, Kumaraswamy, unit Weibull, and reflected unit Burr XII regressions, each incorporating submodels for both indexed distribution parameters. The beta regression model emerges as the best fit through rigorous model selection and diagnostic procedures. The findings reveal that the disaggregated municipal human development index (MHDI), particularly its income, longevity, and education dimensions, along with the municipality’s geographic region, significantly affect voting behavior. Notably, higher income and longevity values are linked to greater proportions of blank and null votes, whereas the educational level exhibits a negative relationship with the variable of interest. Additionally, municipalities in the Southeast region tend to have higher average proportions of blank and null votes. In terms of variability, the ability of a municipality’s population to acquire goods and services is shown to negatively influence the dispersion of vote proportions, while municipalities in the Northeast, North, and Southeast regions exhibit distinct patterns of variation compared to other regions. These results provide valuable insights into electoral participation’s socioeconomic and regional determinants, contributing to broader discussions on political engagement and democratic representation in Brazil.
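In the mean-precision parameterization typically used for GAMLSS-type beta regression, a model with submodels for both distribution parameters takes the generic form (not necessarily the authors' exact covariate specification):

$$
y_i\sim\mathrm{Beta}\big(\mu_i\phi_i,\,(1-\mu_i)\phi_i\big),\qquad
\operatorname{logit}(\mu_i)=x_i^{\top}\beta,\qquad
\log(\phi_i)=z_i^{\top}\gamma .
$$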
Full article
(This article belongs to the Section Regression Models)
Open Access Article
A Cox Proportional Hazards Model with Latent Covariates Reflecting Students’ Preparation, Motives, and Expectations for the Analysis of Time to Degree
by
Dimitrios Kalamaras, Laura Maska and Fani Nasika
Stats 2025, 8(2), 37; https://doi.org/10.3390/stats8020037 - 13 May 2025
Abstract
Issues related to the duration of university studies have attracted the interest of many researchers from different scientific fields, as far back as the middle of the 20th century. In this study, a survival analysis methodology, and more specifically a Cox proportional hazards model, is proposed to evaluate a theoretical framework that relates the risk of a student either graduating on time or graduating late to a number of observed and latent factors that have been proposed in the literature as the main determinants of time to degree completion. The major findings of the analysis suggest that the factors contributing to reducing the duration of studies include high academic achievements at early stages, positive motivation, expectations, attitudes, and beliefs regarding studies. On the contrary, external situations, negative academic experiences, and some individual characteristics of the students contribute to an extended duration of studies.
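The underlying Cox proportional hazards model, with latent factor scores entering as covariates alongside the observed ones, has the standard form

$$
h(t\mid z)=h_0(t)\exp\!\big(z^{\top}\beta\big),
$$

where h_0(t) is the baseline hazard and z collects the observed and latent covariates (generic notation, not the paper's).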
Full article
(This article belongs to the Topic Interfacing Statistics, Machine Learning and Data Science from a Probabilistic Modelling Viewpoint)
Open Access Communication
Unraveling Meteorological Dynamics: A Two-Level Clustering Algorithm for Time Series Pattern Recognition with Missing Data Handling
by
Ekaterini Skamnia, Eleni S. Bekri and Polychronis Economou
Stats 2025, 8(2), 36; https://doi.org/10.3390/stats8020036 - 9 May 2025
Abstract
Identifying regions with similar meteorological features is of both socioeconomic and ecological importance. Towards that direction, useful information can be drawn from meteorological stations spread over a broader area. In this work, a time series clustering procedure composed of two levels is proposed, focusing on clustering spatial units (meteorological stations) based on their temporal patterns, rather than clustering time periods. It is capable of handling univariate or multivariate time series, with missing data or different lengths but with a common seasonal time period. The first level involves the clustering of the dominant features of the time series (e.g., similar seasonal patterns) by employing K-means, while the second one produces clusters based on secondary features. Hierarchical clustering with Dynamic Time Warping for the univariate case and multivariate Dynamic Time Warping for the multivariate scenario are employed for the second level. Principal component analysis or Classic Multidimensional Scaling is applied before the first level, while an imputation technique is applied to the raw data in the second level to address missing values in the dataset. This step is particularly important given that missing data are a frequent issue in measurements obtained from meteorological stations. The method is subsequently applied to the available precipitation time series and then also to a time series of mean temperature obtained from the automated weather station network in Greece. Further, both characteristics are employed to cover the multivariate scenario.
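The second-level dissimilarity is Dynamic Time Warping (DTW); a compact dynamic-programming implementation for univariate series with an absolute-difference local cost is sketched below in Python. It illustrates the distance only and does not reproduce the paper's multivariate variant or the surrounding two-level pipeline.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW distance between two 1-D series via dynamic programming."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A phase shift yields a small DTW distance despite large pointwise differences
t = np.linspace(0, 2 * np.pi, 50)
print(dtw_distance(np.sin(t), np.sin(t - 0.5)))
```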
Full article

Open Access Article
Reliability Assessment via Combining Data from Similar Systems
by
Jianping Hao and Mochao Pei
Stats 2025, 8(2), 35; https://doi.org/10.3390/stats8020035 - 8 May 2025
Abstract
In operational testing contexts, testers face dual challenges of constrained timeframes and limited resources, both of which impede the generation of reliability test data. To address this issue, integrating data from similar systems with test data can effectively expand data sources. This study proposes a systematic approach wherein the mission of the system under test (SUT) is decomposed to identify candidate subsystems for data combination. A phylogenetic tree representation is constructed for subsystem analysis and subsequently mapped to a mixed-integer programming (MIP) model, enabling efficient computation of similarity factors. A reliability assessment model that combines data from similar subsystems is established. The similarity factor is regarded as a covariate, and the regression relationship between it and the subsystem failure-time distribution is established. The joint posterior distribution of regression coefficients is derived using Bayesian theory, which are then sampled via the No-U-Turn Sampler (NUTS) algorithm to obtain reliability estimates. Numerical case studies demonstrate that the proposed method outperforms existing approaches, yielding more robust similarity factors and higher accuracy in reliability assessments.
Full article

Open Access Article
Estimation of Weighted Extropy Under the α-Mixing Dependence Condition
by
Radhakumari Maya, Archana Krishnakumar, Muhammed Rasheed Irshad and Christophe Chesneau
Stats 2025, 8(2), 34; https://doi.org/10.3390/stats8020034 - 1 May 2025
Abstract
Introduced as a complementary concept to Shannon entropy, extropy provides an alternative perspective for measuring uncertainty. While useful in areas such as reliability theory and scoring rules, extropy in its original form treats all outcomes equally, which can limit its applicability in real-world settings where different outcomes have varying degrees of importance. To address this, the weighted extropy measure incorporates a weight function that reflects the relative significance of outcomes, thereby increasing the flexibility and sensitivity of uncertainty quantification. In this paper, we propose a novel recursive non-parametric kernel estimator for weighted extropy based on α-mixing dependent observations, a common setting in time series and stochastic processes. The recursive formulation allows for efficient updating with sequential data, making it particularly suitable for real-time analysis. We establish several theoretical properties of the estimator, including its recursive structure, consistency, and asymptotic behavior under mild regularity conditions. A comprehensive simulation study and data application demonstrate the practical performance of the estimator and validate its superiority over the non-recursive kernel estimator in terms of accuracy and computational efficiency. The results confirm the relevance of the method for dynamic, dependent, and weighted systems.
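For reference, extropy and its weighted counterpart for a density f are commonly defined as follows (standard definitions from the literature, stated here with weight function w(x) = x; the paper's estimator targets the weighted quantity):

$$
J(X)=-\tfrac{1}{2}\int f^{2}(x)\,dx,\qquad
J^{w}(X)=-\tfrac{1}{2}\int x\,f^{2}(x)\,dx .
$$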
Full article

Open Access Article
A Smoothed Three-Part Redescending M-Estimator
by
Alistair J. Martin and Brenton R. Clarke
Stats 2025, 8(2), 33; https://doi.org/10.3390/stats8020033 - 30 Apr 2025
Abstract
A smoothed M-estimator is derived from Hampel’s three-part redescending estimator for location and scale. The estimator is shown to be weakly continuous and Fréchet differentiable in the neighbourhood of the normal distribution. Asymptotic assessment is conducted at asymmetric contaminating distributions, where smoothing is shown to improve variance and change-of-variance sensitivity. Other robust metrics compared are largely unchanged, and therefore, the smoothed functions represent an improvement for asymmetric contamination near the rejection point with little downside.
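Hampel's original (non-smoothed) three-part redescending ψ-function, which the smoothed estimator modifies, is

$$
\psi_{a,b,c}(x)=
\begin{cases}
x, & |x|\le a,\\
a\,\operatorname{sign}(x), & a<|x|\le b,\\
a\,\dfrac{c-|x|}{c-b}\,\operatorname{sign}(x), & b<|x|\le c,\\
0, & |x|>c,
\end{cases}
$$

with tuning constants 0 < a ≤ b < c and rejection point c.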
Full article
(This article belongs to the Section Statistical Methods)
Open Access Article
Longitudinal Survival Analysis Using First Hitting Time Threshold Regression: With Applications to Wiener Processes
by
Ya-Shan Cheng, Yiming Chen and Mei-Ling Ting Lee
Stats 2025, 8(2), 32; https://doi.org/10.3390/stats8020032 - 28 Apr 2025
Abstract
First-hitting time threshold regression (TR) is well-known for analyzing event time data without the proportional hazards assumption. To date, most applications and software are developed for cross-sectional data. In this paper, using the Markov property of processes with stationary independent increments, we present methods and procedures for conducting longitudinal threshold regression (LTR) for event time data with or without covariates. We demonstrate the usage of LTR in two case scenarios, namely, analyzing laser reliability data without covariates, and cardiovascular health data with time-dependent covariates. Moreover, we provide a simple-to-use R function for LTR estimation for applications using Wiener processes.
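In the Wiener-process case, the first hitting time of the zero boundary from an initial level y0 > 0, with drift μ and variance σ², has the inverse Gaussian density (a standard result underlying threshold regression, stated in generic notation; for μ > 0 the hitting time is defective because the boundary may never be reached):

$$
f(t\mid y_0,\mu,\sigma^{2})=\frac{y_0}{\sqrt{2\pi\sigma^{2}t^{3}}}
\exp\!\left(-\frac{(y_0+\mu t)^{2}}{2\sigma^{2}t}\right),\qquad t>0 .
$$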
Full article

Open Access Article
Adaptive Clinical Trials and Sample Size Determination in the Presence of Measurement Error and Heterogeneity
by
Hassan Farooq, Sajid Ali, Ismail Shah, Ibrahim A. Nafisah and Mohammed M. A. Almazah
Stats 2025, 8(2), 31; https://doi.org/10.3390/stats8020031 - 25 Apr 2025
Abstract
Adaptive clinical trials offer a flexible approach for refining sample sizes during ongoing research to enhance their efficiency. This study delves into improving sample size recalculation through resampling techniques, employing measurement error and mixed distribution models. The research employs diverse sample size-recalculation strategies: standard simulation and the R1 and R2 approaches, where R1 considers the mean and R2 employs both the mean and standard deviation as summary locations. These strategies are tested against observed conditional power (OCP), restricted observed conditional power (ROCP), promising zone (PZ) and group sequential design (GSD). The key findings indicate that the R1 approach, capitalizing on the mean as a summary location, outperforms standard recalculation without resampling, as it mitigates variability in recalculated sample sizes across effect sizes. The OCP exhibits superior performance within the R1 approach compared to ROCP, PZ and GSD due to enhanced conditional power. However, a tendency to inflate the initial stage’s sample size is observed in the R1 approach, prompting the development of the R2 approach that considers both mean and standard deviation. The ROCP in the R2 approach demonstrates robust performance across most effect sizes, although GSD retains superiority within the R2 approach due to its sample size boundary. Notably, sample size-recalculation designs perform worse than R1 for specific effect sizes, attributed to inefficiencies in approaching target sample sizes. The resampling-based approaches, particularly R1 and R2, offer improved sample size recalculation over conventional methods. The R1 approach excels in minimizing recalculated sample size variability, while the R2 approach presents a refined alternative.
Full article
Open Access Communication
An Integrated Hybrid-Stochastic Framework for Agro-Meteorological Prediction Under Environmental Uncertainty
by
Mohsen Pourmohammad Shahvar, Davide Valenti, Alfonso Collura, Salvatore Micciche, Vittorio Farina and Giovanni Marsella
Stats 2025, 8(2), 30; https://doi.org/10.3390/stats8020030 - 25 Apr 2025
Abstract
This study presents a comprehensive framework for agro-meteorological prediction, combining stochastic modeling, machine learning techniques, and environmental feature engineering to address challenges in yield prediction and wind behavior modeling. Focused on mango cultivation in the Mediterranean region, the workflow integrates diverse datasets, including satellite-derived variables such as NDVI, soil moisture, and land surface temperature (LST), along with meteorological features like wind speed and direction. Stochastic modeling was employed to capture environmental variability, while a proxy yield was defined using key environmental factors in the absence of direct field yield measurements. Machine learning models, including random forest and multi-layer perceptron (MLP), were hybridized to improve the prediction accuracy for both proxy yield and wind components (U and V that represent the east–west and north–south wind movement). The hybrid model achieved mean squared error (MSE) values of 0.333 for U and 0.181 for V, with corresponding R2 values of 0.8939 and 0.9339, respectively, outperforming the individual models and demonstrating reliable generalization in the 2022 test set. Additionally, although NDVI is traditionally important in crop monitoring, its low temporal variability across the observation period resulted in minimal contribution to the final prediction, as confirmed by feature importance analysis. Furthermore, the analysis revealed the significant influence of environmental factors such as LST, precipitable water, and soil moisture on yield dynamics, while wind visualization over digital elevation models (DEMs) highlighted the impact of terrain features on the wind patterns. The results demonstrate the effectiveness of combining stochastic and machine learning approaches in agricultural modeling, offering valuable insights for crop management and climate adaptation strategies.
Full article
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
Open Access Article
Bias-Corrected Fixed Item Parameter Calibration, with an Application to PISA Data
by
Alexander Robitzsch
Stats 2025, 8(2), 29; https://doi.org/10.3390/stats8020029 - 24 Apr 2025
Abstract
Fixed item parameter calibration (FIPC) is commonly used to compare groups or countries using an item response theory model with a common set of fixed item parameters. However, FIPC has been shown to produce biased estimates of group means and standard deviations in the presence of random differential item functioning (DIF). To address this, a bias-corrected variant of FIPC, called BCFIPC, is introduced in this article. BCFIPC eliminated the bias of FIPC with only minor efficiency losses in certain simulation conditions, but substantial precision gains in many others, particularly for estimating group standard deviations. Finally, a comparison of both methods using the PISA 2006 dataset revealed relatively large differences in country means and standard deviations.
Full article

Open Access Feature Paper Article
Inferences About Two-Parameter Multicollinear Gaussian Linear Regression Models: An Empirical Type I Error and Power Comparison
by
Md Ariful Hoque, Zoran Bursac and B. M. Golam Kibria
Stats 2025, 8(2), 28; https://doi.org/10.3390/stats8020028 - 23 Apr 2025
Abstract
In linear regression analysis, the independence assumption is crucial, and the ordinary least squares (OLS) estimator, generally regarded as the Best Linear Unbiased Estimator (BLUE), is applied. However, multicollinearity can complicate the estimation of the effects of individual variables, leading to potentially inaccurate statistical inferences. Because of this issue, different types of two-parameter estimators have been explored. This paper compares t-tests for assessing the significance of regression coefficients, including tests based on several two-parameter estimators. We conduct a Monte Carlo study to evaluate these methods by examining their empirical type I error and power characteristics, based on established protocols. The simulation results indicate that some two-parameter estimators achieve better power gains while preserving the nominal size at 5%. Real-life data are analyzed to illustrate the findings of this paper.
Full article
(This article belongs to the Section Statistical Methods)
Topics
Topic in JPM, Mathematics, Applied Sciences, Stats, Healthcare
Application of Biostatistics in Medical Sciences and Global Health
Topic Editors: Bogdan Oancea, Adrian Pană, Cătălina Liliana Andrei
Deadline: 31 October 2026

Special Issues
Special Issue in Stats
Machine Learning and Natural Language Processing (ML & NLP)
Guest Editor: Stéphane Mussard
Deadline: 30 June 2025
Special Issue in Stats
Benford's Law(s) and Applications (Second Edition)
Guest Editors: Marcel Ausloos, Roy Cerqueti, Claudio Lupi
Deadline: 31 October 2025
Special Issue in Stats
Nonparametric Inference: Methods and Applications
Guest Editor: Stefano Bonnini
Deadline: 28 November 2025