Journal Description
Stats is an international, peer-reviewed, open access journal on statistical science, published quarterly online by MDPI. The journal focuses on methodological and theoretical papers in statistics, probability, and stochastic processes, as well as innovative applications of statistics in all scientific disciplines, including the biological and biomedical sciences, medicine, business, economics and the social sciences, physics, data science, and engineering.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within ESCI (Web of Science), Scopus, RePEc, and other databases.
- Rapid Publication: manuscripts are peer-reviewed, and a first decision is provided to authors approximately 18.2 days after submission; the time from acceptance to publication is 2.9 days (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 1.0 (2024); 5-Year Impact Factor: 1.1 (2024)
Latest Articles
A Data-Driven Approach of DRG-Based Medical Insurance Payment Policy Formulation in China Based on an Optimization Algorithm
Stats 2025, 8(3), 54; https://doi.org/10.3390/stats8030054 - 30 Jun 2025
Abstract
The diagnosis-related group (DRG) system classifies patients into different groups in order to facilitate decisions regarding medical insurance payments. Currently, more than 600 standard DRGs exist in China. Payment details represented by DRG weights must be adjusted during decision-making. After modeling the DRG weight-determining process as a parameter-searching and optimization-solving problem, we propose a stochastic gradient tracking (SGT) algorithm and compare it with a genetic algorithm and sequential quadratic programming. We describe diagnosis-related groups in China using several statistics based on sample data from one city. We explore the influence of the SGT hyperparameters through numerous experiments and demonstrate the robustness of the best SGT hyperparameter combination. Our stochastic gradient tracking algorithm finished the parameter search in only 3.56 min when the insurance payment rate was set at 95%, which is acceptable in practice. As the main medical insurance payment scheme in China, DRGs require quantitative evidence for policymaking. The optimization algorithm proposed in this study shows a possible scientific decision-making method for use in the DRG system, particularly with regard to DRG weights.
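The weight search described above can be illustrated in miniature. The sketch below treats DRG weights as free parameters and drives a hypothetical payment-rate loss toward the 95% target with finite-difference gradient descent; the data, the loss, and the step size are all invented stand-ins, and plain gradient descent stands in for the authors' SGT algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented sample: per-case costs for a few DRGs (gamma-distributed).
n_groups = 5
costs = [rng.gamma(shape=3.0, scale=1000.0, size=200) for _ in range(n_groups)]
target_rate = 0.95  # desired insurance payment rate

def payment_rate(weights):
    """Ratio of total payments (one weight per case in each DRG) to total costs."""
    paid = sum(w * len(c) for w, c in zip(weights, costs))
    return paid / sum(c.sum() for c in costs)

def loss(weights):
    return (payment_rate(weights) - target_rate) ** 2

w = np.full(n_groups, 2500.0)      # initial DRG weights
lr, eps = 1e7, 1.0                 # step size tuned to this toy problem
for _ in range(500):
    grad = np.zeros_like(w)
    for i in range(n_groups):      # central finite-difference gradient
        d = np.zeros_like(w); d[i] = eps
        grad[i] = (loss(w + d) - loss(w - d)) / (2 * eps)
    w -= lr * grad

print(f"payment rate after search: {payment_rate(w):.4f}")  # ~0.9500
```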
Open Access Article
Distance-Based Relevance Function for Imbalanced Regression
by Daniel Daeyoung In and Hyunjoong Kim
Stats 2025, 8(3), 53; https://doi.org/10.3390/stats8030053 - 28 Jun 2025
Abstract
Imbalanced regression poses a significant challenge in real-world prediction tasks, where rare target values are prone to overfitting during model training. To address this, prior research has employed relevance functions to quantify the rarity of target instances. However, existing functions often struggle to capture the rarity across diverse target distributions. In this study, we introduce a novel Distance-based Relevance Function (DRF) that quantifies the rarity based on the distance between target values, enabling a more accurate and distribution-agnostic assessment of rare data. This general approach allows imbalanced regression techniques to be effectively applied to a broader range of distributions, including bimodal cases. We evaluate the proposed DRF using Mean Squared Error (MSE), relevance-weighted Mean Absolute Error, and Symmetric Mean Absolute Percentage Error (SMAPE). Empirical studies on synthetic datasets and 18 real-world datasets demonstrate that DRF tends to improve performance across various machine learning models, including support vector regression, neural networks, XGBoost, and random forests. These findings suggest that DRF offers a promising direction for rare target detection and broadens the applicability of imbalanced regression methods.
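The distance idea behind a relevance function can be imitated in a few lines: score each target value by its mean distance to its k nearest neighbours in the target sample, so observations in sparse regions of any distribution (tails, or the gap of a bimodal density) receive relevance near 1. This is only an illustration of the principle; the paper's exact DRF definition may differ.

```python
import numpy as np

def distance_relevance(y, k=10):
    """Toy distance-based relevance: targets far from their k nearest
    neighbours (sparse regions) score near 1, dense regions near 0."""
    y = np.asarray(y, dtype=float)
    dists = np.abs(y[:, None] - y[None, :])       # pairwise distances
    knn = np.sort(dists, axis=1)[:, 1:k + 1]      # drop the self-distance
    score = knn.mean(axis=1)
    return (score - score.min()) / (score.max() - score.min())

rng = np.random.default_rng(1)
# Bimodal, imbalanced target: 950 common values, 50 rare ones.
y = np.concatenate([rng.normal(0, 1, 950), rng.normal(8, 0.5, 50)])
rel = distance_relevance(y)
print("mean relevance, common mode:", rel[:950].mean().round(3))
print("mean relevance, rare mode:  ", rel[950:].mean().round(3))
```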
Open Access Article
New Effects and Methods in Brownian Transport
by Dmitri Martila and Stefan Groote
Stats 2025, 8(3), 52; https://doi.org/10.3390/stats8030052 - 26 Jun 2025
Abstract
We consider the noise-induced transport of overdamped Brownian particles in a ratchet system driven by nonequilibrium symmetric three-level Markovian noise and additive white noise. In addition to a detailed analysis of this system, we consider a simple example that can be solved exactly, showing both the increase in the number of current reversals and hypersensitivity. The simplicity of the exact solution and the model itself is beneficial for comparison with experiments.
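Systems of this type are commonly explored by Euler–Maruyama integration of the overdamped Langevin equation dx = (−V′(x) + ξ(t)) dt + √(2D) dW, with ξ a symmetric three-level Markov noise. The sketch below uses an illustrative ratchet potential and made-up parameter values; it is not the paper's exactly solvable model.

```python
import numpy as np

rng = np.random.default_rng(2)

def dV(x):
    """Derivative of an asymmetric (ratchet) periodic potential."""
    return np.cos(2 * np.pi * x) + 0.25 * np.cos(4 * np.pi * x)

a, switch_rate = 1.5, 0.5          # noise amplitude and level-switching rate
D, dt, n_steps, n_part = 0.1, 1e-3, 100_000, 300

x = np.zeros(n_part)
xi = a * rng.choice([-1, 0, 1], size=n_part)   # three-level Markov noise
for _ in range(n_steps):
    flip = rng.random(n_part) < switch_rate * dt   # exponential switching times
    xi[flip] = a * rng.choice([-1, 0, 1], size=flip.sum())
    x += (-dV(x) + xi) * dt + np.sqrt(2 * D * dt) * rng.normal(size=n_part)

print("mean drift velocity:", (x.mean() / (n_steps * dt)).round(4))
```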
Open Access Article
Elicitation of Priors for the Weibull Distribution
by Purvi Prajapati, James D. Stamey, David Kahle, John W. Seaman, Jr., Zachary M. Thomas and Michael Sonksen
Stats 2025, 8(3), 51; https://doi.org/10.3390/stats8030051 - 23 Jun 2025
Abstract
Bayesian methods have attracted increasing interest in the design and analysis of clinical trials. Many of these clinical trials investigate time-to-event endpoints. The Weibull distribution is often used in survival and reliability analysis to model time-to-event data. We propose a process to elicit information about the parameters of the Weibull distribution for pharmaceutical applications. Our method is based on an expert’s answers to questions about the median and upper quartile of the distribution. Using the elicited information, a joint prior is constructed for the median and upper quartile of the Weibull distribution, which induces a joint prior distribution on the shape and rate parameters of the Weibull. To illustrate, we apply our elicitation methodology to a pediatric clinical trial, where information is elicited from a subject-matter expert for the control arm.
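The two elicited quantiles determine the two Weibull parameters in closed form: with F(t) = 1 − exp(−(λt)^k), an elicited median m and upper quartile q give (λm)^k = ln 2 and (λq)^k = ln 4, hence k = ln 2 / ln(q/m) and λ = (ln 2)^{1/k}/m. A sketch of this deterministic core follows; the paper goes further and builds a joint prior around the elicited quantities, and the numbers below are invented.

```python
import numpy as np

def weibull_from_quantiles(median, upper_quartile):
    """Map an elicited median and upper quartile to Weibull (shape, rate),
    using the parameterization F(t) = 1 - exp(-(rate * t)**shape)."""
    k = np.log(2) / np.log(upper_quartile / median)
    lam = np.log(2) ** (1 / k) / median
    return k, lam

# Example: an expert states median survival 12 months, upper quartile 20 months.
k, lam = weibull_from_quantiles(12.0, 20.0)
print(f"shape k = {k:.3f}, rate = {lam:.5f}")

# Sanity check: the fitted CDF reproduces the elicited quantiles.
F = lambda t: 1 - np.exp(-(lam * t) ** k)
print(f"F(12) = {F(12.0):.3f}, F(20) = {F(20.0):.3f}")   # ~0.50 and ~0.75
```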
Open Access Article
Ethicametrics: A New Interdisciplinary Science
by Fabio Zagonari
Stats 2025, 8(3), 50; https://doi.org/10.3390/stats8030050 - 22 Jun 2025
Abstract
This paper characterises Ethicametrics (EM) as a new interdisciplinary scientific research area focusing on metrics of ethics (MOE) and ethics of metrics (EOM), by providing a comprehensive methodological framework. EM is scientific: it is based on behavioural mathematical modelling to be statistically validated and tested, with additional sensitivity analyses to favour immediate interpretations. EM is interdisciplinary: it spans from less to more traditional fields, with essential mutual improvements. EM is new: valid and invalid examples of EM (articles referring to an explicit and an implicit behavioural model, respectively) are scarce, recent, time-stable and discipline-focused, with 1 and 37 scientists, respectively. Thus, the core of EM (multi-level statistical analyses applied to behavioural mathematical models) is crucial to avoid biased MOE and EOM. Conversely, articles inside EM should study quantitatively any metrics or ethics, in any alternative context, at any analytical level, by using panel/longitudinal data. Behavioural models should be ethically explicit, possibly by evaluating ethics in terms of the consequences of actions. Ethical measures should be scientifically grounded by evaluating metrics in terms of ethical criteria coming from the relevant theological/philosophical literature. Note that behavioural models applied to science metrics can be used to deduce social consequences to be ethically evaluated.
Open Access Article
Mission Reliability Assessment for the Multi-Phase Data in Operational Testing
by Jianping Hao and Mochao Pei
Stats 2025, 8(3), 49; https://doi.org/10.3390/stats8030049 - 20 Jun 2025
Abstract
Traditional methods for mission reliability assessment under operational testing conditions exhibit some limitations. They include coarse modeling granularity, significant parameter estimation biases, and inadequate adaptability for handling heterogeneous test data. To address these challenges, this study establishes an assessment framework using a vehicular missile launching system (VMLS) as a case study. The framework constructs phase-specific reliability block diagrams based on mission profiles and establishes mappings between data types and evaluation models. The framework integrates the maximum entropy criterion with reliability monotonic decreasing constraints, develops a covariate-embedded Bayesian data fusion model, and proposes a multi-path weight adjustment assessment method. Simulation and physical testing demonstrate that, compared with conventional methods, the proposed approach offers superior accuracy and precision in parameter estimation. It enables mission reliability assessment under practical operational testing constraints and provides methodological support for moving beyond a traditional assessment paradigm that overemphasizes performance verification while neglecting operational capability development.
Open Access Article
Component Analysis When Testing for Fixed Effects in Unbalanced ANOVAs
by J. C. W. Rayner and G. C. Livingston, Jr.
Stats 2025, 8(2), 48; https://doi.org/10.3390/stats8020048 - 16 Jun 2025
Abstract
In possibly unbalanced fixed-effects ANOVAs, we examine both parametric and nonparametric tests for main and two-way interaction effects when the levels of each factor may be ordered or unordered. For main effects, we decompose the factor sum of squares into one-degree-of-freedom components involving contrasts, albeit not necessarily orthogonal contrasts. For interactions, we develop what we call coefficients. These are an extension of part of the interaction sum of squares in potentially unbalanced designs. They may be used to test nonparametrically for focused interaction effects. The tests developed here provide focused and objective assessments of main and interaction effects and augment traditional methods.
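The one-degree-of-freedom decomposition rests on a standard identity: for a contrast c applied to the level means ȳ_i of a factor with cell counts n_i, the component sum of squares is SS_c = (Σ_i c_i ȳ_i)² / Σ_i (c_i²/n_i). A sketch on an invented unbalanced one-way layout with trend-type contrasts (the paper's contrasts need not be orthogonal under imbalance, as it notes):

```python
import numpy as np

rng = np.random.default_rng(3)
# Unbalanced one-factor data: three ordered levels, unequal cell sizes.
groups = [rng.normal(mu, 1.0, n) for mu, n in [(0.0, 8), (0.6, 12), (1.4, 7)]]
means = np.array([g.mean() for g in groups])
ns = np.array([len(g) for g in groups])

def contrast_ss(c, means, ns):
    """One-df component sum of squares for contrast c on the level means."""
    c = np.asarray(c, dtype=float)
    return (c @ means) ** 2 / np.sum(c ** 2 / ns)

linear = np.array([-1.0, 0.0, 1.0])    # trend contrast for ordered levels
quad = np.array([1.0, -2.0, 1.0])
print("linear component SS:   ", contrast_ss(linear, means, ns).round(3))
print("quadratic component SS:", contrast_ss(quad, means, ns).round(3))
```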
Open Access Article
A Note on the Robust Modification of the Ordered-Heterogeneity Test
by Markus Neuhäuser and Sabrina Schmitt
Stats 2025, 8(2), 47; https://doi.org/10.3390/stats8020047 - 5 Jun 2025
Abstract
An ordered heterogeneity (OH) test is a test for a trend that combines a nondirectional heterogeneity test with the rank-order information specified under the alternative. A modified OH test introduced in 2006 can detect all possible patterns under the alternative with a relatively high power. Here, it is proposed to apply the modified OH test as a permutation test, which has the advantage that it requires only the exchangeability of the observations in the combined sample under the null hypothesis. No additional assumptions or simulations of critical values are necessary. A simulation study indicates that the permutation OH test controls the significance level even for small sample sizes and has a good power, comparable to competing tests. Moreover, the main advantage of the OH tests is their very broad applicability; they can always be applied when a heterogeneity test exists.
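The permutation machinery itself is generic: recompute the test statistic over random relabelings of the combined sample, which is valid under exchangeability alone. A skeleton follows, with Spearman correlation between ordered group labels and responses as a stand-in trend statistic; the paper's modified OH statistic would slot into `stat`.

```python
import numpy as np
from scipy.stats import spearmanr

def permutation_p(samples, stat, n_perm=9999, seed=0):
    """One-sided permutation p-value: relabel the combined sample at random,
    valid under exchangeability of the observations under H0."""
    rng = np.random.default_rng(seed)
    labels = np.concatenate([np.full(len(s), i) for i, s in enumerate(samples)])
    values = np.concatenate(samples)
    observed = stat(labels, values)
    count = sum(stat(labels, rng.permutation(values)) >= observed
                for _ in range(n_perm))
    return (count + 1) / (n_perm + 1)

trend = lambda g, y: spearmanr(g, y)[0]   # stand-in ordered-trend statistic

rng = np.random.default_rng(4)
samples = [rng.normal(m, 1.0, 10) for m in (0.0, 0.4, 0.8)]  # increasing trend
print("permutation p-value:", permutation_p(samples, trend))
```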
Open Access Article
New Methods for Multivariate Normal Moments
by Christopher Stroude Withers
Stats 2025, 8(2), 46; https://doi.org/10.3390/stats8020046 - 5 Jun 2025
Abstract
Multivariate normal moments are foundational for statistical methods. The derivation and simplification of these moments are critical for the accuracy of various statistical estimates and analyses. Normal moments are the building blocks of the Hermite polynomials, which in turn are the building blocks of the Edgeworth expansions for the distribution of parameter estimates. Isserlis (1918) gave the bivariate normal moments and two special cases of trivariate moments. Beyond that, convenient expressions for multivariate normal moments are still not available. We compare three methods for obtaining them, the most powerful being the differential method. We give simpler formulas for the bivariate moments than those of Isserlis, and explicit expressions for the general moments of dimensions 3 and 4.
(This article belongs to the Section Multivariate Analysis)
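Isserlis (1918) is the natural reference point here: for a centered multivariate normal, E[X_{i1}⋯X_{ik}] is zero for odd k and, for even k, the sum over all pairings of the indices of products of covariances. A brute-force implementation of that theorem is handy for numerically checking closed-form moment expressions in dimensions 3 and 4:

```python
from itertools import count  # stdlib only; count unused, kept minimal below
import numpy as np

def pairings(idx):
    """All ways to split the index tuple into unordered pairs (Wick pairings)."""
    if not idx:
        yield []
        return
    first, rest = idx[0], idx[1:]
    for j, partner in enumerate(rest):
        for tail in pairings(rest[:j] + rest[j + 1:]):
            yield [(first, partner)] + tail

def normal_moment(indices, cov):
    """E[X_{i1} ... X_{ik}] for a centered multivariate normal with covariance cov."""
    if len(indices) % 2:
        return 0.0  # odd moments vanish
    return sum(np.prod([cov[a][b] for a, b in p])
               for p in pairings(tuple(indices)))

cov = np.array([[1.0, 0.5], [0.5, 2.0]])
print(normal_moment((0, 0, 0, 0), cov))   # 3 * var^2 = 3.0
print(normal_moment((0, 0, 1, 1), cov))   # s00*s11 + 2*s01^2 = 2.5
```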
Open Access Article
Evaluating Estimator Performance Under Multicollinearity: A Trade-Off Between MSE and Accuracy in Logistic, Lasso, Elastic Net, and Ridge Regression with Varying Penalty Parameters
by H. M. Nayem, Sinha Aziz and B. M. Golam Kibria
Stats 2025, 8(2), 45; https://doi.org/10.3390/stats8020045 - 31 May 2025
Abstract
Multicollinearity in logistic regression models can result in inflated variances and yield unreliable estimates of parameters. Ridge regression, a regularized estimation technique, is frequently employed to address this issue. This study conducts a comparative evaluation of the performance of 23 established ridge regression estimators alongside Logistic Regression, Elastic-Net, Lasso, and Generalized Ridge Regression (GRR), considering various levels of multicollinearity within the context of logistic regression settings. Simulated datasets with high correlations (0.80, 0.90, 0.95, and 0.99) and real-world data (municipal and cancer remission) were analyzed. Both sets of results show that several of the ridge estimators exhibit strong performance in terms of Mean Squared Error (MSE) and accuracy, particularly in smaller samples, while GRR demonstrates superior performance in large samples. Real-world data further confirm that GRR achieves the lowest MSE in highly collinear municipal data, while ridge estimators and GRR help prevent overfitting in small-sample cancer remission data. The results underscore the efficacy of ridge estimators and GRR in handling multicollinearity, offering reliable alternatives to traditional regression techniques, especially for datasets with high correlations and varying sample sizes.
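The trade-off in the title can be reproduced in miniature: simulate a collinear design, fit L2-penalized logistic regression over a grid of penalty values, and track coefficient MSE against the known truth alongside classification accuracy. A scikit-learn sketch (the 23 estimators in the study each choose the penalty k differently; a fixed grid stands in for them):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n, beta = 200, np.array([1.0, -1.0, 0.5])

# Collinear design: a shared factor gives pairwise correlation around 0.95.
z = rng.normal(size=(n, 1))
X = np.sqrt(0.95) * z + np.sqrt(0.05) * rng.normal(size=(n, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))

for k in (0.01, 0.1, 1.0, 10.0):          # ridge penalty k (sklearn uses C = 1/k)
    fit = LogisticRegression(penalty="l2", C=1.0 / k, max_iter=1000).fit(X, y)
    mse = np.mean((fit.coef_.ravel() - beta) ** 2)
    print(f"k = {k:5.2f}   coef MSE = {mse:.3f}   accuracy = {fit.score(X, y):.3f}")
```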
Open Access Article
mbX: An R Package for Streamlined Microbiome Analysis
by Utsav Lamichhane and Jeferson Lourenco
Stats 2025, 8(2), 44; https://doi.org/10.3390/stats8020044 - 29 May 2025
Abstract
Here, we introduce the mbX package: an R-based tool designed to streamline 16S rRNA gene microbiome data analysis following taxonomic classification. It automates key post-sequencing steps, including taxonomic data cleaning and visualization, addressing the need for reproducible and user-friendly microbiome workflows. mbX’s core functions, ezclean and ezviz, take raw taxonomic output (such as that from QIIME 2) and sample metadata to produce a cleaned relative abundance dataset and high-quality stacked bar plots with minimal manual intervention. We validated mbX on 14 real microbiome datasets, demonstrating significant improvements in efficiency and consistency of post-processing of DNA sequence data. The results show that mbX ensures uniform taxonomic formatting, eliminates common manual errors, and quickly generates publication-ready figures, greatly facilitating downstream analysis. For a dataset with 20 samples, both functions of mbX ran in less than 1 s and used less than 1 GB of memory. For a dataset with more than 1170 samples, the functions ran within 125 s and used less than 4.5 GB of memory. By integrating seamlessly with existing pipelines and emphasizing automation, mbX fills a critical gap between sequence classification and statistical analysis. An upcoming version will add automated statistical comparisons, aiming for an end-to-end microbiome analysis solution by integrating mbX with currently available pipelines. This article presents the design of mbX, its workflow and features, and a comparative discussion positioning mbX relative to other microbiome bioinformatics tools. The contributions of mbX highlight its significance in accelerating microbiome research through reproducible and streamlined data analysis.
(This article belongs to the Section Statistical Software)
Open Access Article
D-plots: Visualizations for Analysis of Bivariate Dependence Between Continuous Random Variables
by Arturo Erdely and Manuel Rubio-Sánchez
Stats 2025, 8(2), 43; https://doi.org/10.3390/stats8020043 - 24 May 2025
Abstract
Scatter plots are widely recognized as fundamental tools for illustrating the relationship between two numerical variables. Despite this, based on solid theoretical foundations, scatter plots generated from pairs of continuous random variables may not serve as reliable tools for assessing dependence. Sklar’s theorem implies that scatter plots created from ranked data are preferable for such analysis, as they exclusively convey information pertinent to dependence. This is in stark contrast to conventional scatter plots, which also encapsulate information about the variables’ marginal distributions. Such additional information is extraneous to dependence analysis and can obscure the visual interpretation of the variables’ relationship. In this article, we delve into the theoretical underpinnings of these ranked data scatter plots, hereafter referred to as rank plots. We offer insights into interpreting the information they reveal and examine their connections with various association measures, including Pearson’s and Spearman’s correlation coefficients, as well as Schweizer–Wolff’s measure of dependence. Furthermore, we introduce a novel visualization ensemble, termed a d-plot, which integrates rank plots, empirical copula diagnostics, and traditional summaries to provide a comprehensive visual assessment of dependence between continuous variables. This ensemble facilitates the detection of subtle dependence structures, including non-quadrant dependencies, that might be overlooked by traditional visual tools.
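The rank plot at the heart of the d-plot is easy to construct: replace each observation by its normalized rank (pseudo-observation) u_i = rank(x_i)/(n+1), v_i = rank(y_i)/(n+1), so the display reflects only the copula. A minimal sketch of this single component (the full d-plot ensemble adds further panels), using heavy-tailed marginals that would swamp a raw scatter plot:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import rankdata

rng = np.random.default_rng(6)
# Heavy-tailed marginals hide the dependence in a raw scatter plot.
x = rng.standard_cauchy(500)
y = x + rng.standard_cauchy(500)

u = rankdata(x) / (len(x) + 1)      # pseudo-observations in (0, 1)
v = rankdata(y) / (len(y) + 1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.scatter(x, y, s=5); ax1.set_title("raw scatter plot")
ax2.scatter(u, v, s=5); ax2.set_title("rank plot")
plt.tight_layout(); plt.show()
```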
Open Access Article
Modeling Uncertainty in Ordinal Regression: The Uncertainty Rating Scale Model
by Gerhard Tutz
Stats 2025, 8(2), 42; https://doi.org/10.3390/stats8020042 - 23 May 2025
Abstract
In questionnaires, respondents sometimes feel uncertain about which category to choose and may respond randomly. Including uncertainty in the modeling of response behavior aims to obtain more accurate estimates of the impact of explanatory variables on actual preferences and to avoid bias. Additionally, variables that have an impact on uncertainty can be identified. A model is proposed that explicitly considers this uncertainty but also allows stronger certainty, depending on covariates. The developed uncertainty rating scale model is an extended version of the adjacent category model. It differs from finite mixture models, an approach that has gained popularity in recent years for modeling uncertainty. The properties of the model are investigated and compared to finite mixture models and other ordinal response models using illustrative datasets.
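For orientation, the adjacent-category model that the proposal extends links neighbouring response categories by log(P(Y = r+1)/P(Y = r)) = η_r, so category probabilities follow by accumulating the η's and normalizing. A minimal sketch of that baseline (the paper's uncertainty extension then lets covariates govern how close the response process is to purely random choice):

```python
import numpy as np

def adjacent_category_probs(eta):
    """Category probabilities under the adjacent-category logit model:
    log(P(Y = r+1) / P(Y = r)) = eta[r] for categories r = 0..K-1."""
    log_unnorm = np.concatenate([[0.0], np.cumsum(eta)])
    p = np.exp(log_unnorm - log_unnorm.max())     # stabilized softmax
    return p / p.sum()

# Five categories; this eta sequence pushes mass toward the middle category.
print(adjacent_category_probs(np.array([1.0, 0.5, -0.5, -1.0])).round(3))
```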
Open Access Article
Revisiting the Replication Crisis and the Untrustworthiness of Empirical Evidence
by Aris Spanos
Stats 2025, 8(2), 41; https://doi.org/10.3390/stats8020041 - 20 May 2025
Abstract
The current replication crisis relating to the non-replicability and the untrustworthiness of published empirical evidence is often viewed through the lens of the Positive Predictive Value (PPV) in the context of the Medical Diagnostic Screening (MDS) model. The PPV is misconstrued as a measure that evaluates ‘the probability of rejecting when false’, after being metamorphosed by replacing its false positive/negative probabilities with the type I/II error probabilities. This perspective gave rise to a widely accepted diagnosis that the untrustworthiness of published empirical evidence stems primarily from abuses of frequentist testing, including p-hacking, data-dredging, and cherry-picking. It is argued that the metamorphosed PPV misrepresents frequentist testing and misdiagnoses the replication crisis, promoting ill-chosen reforms. The primary source of untrustworthiness is statistical misspecification: invalid probabilistic assumptions imposed on one’s data. This is symptomatic of the much broader problem of the uninformed and recipe-like implementation of frequentist statistics without proper understanding of (a) the invoked probabilistic assumptions and their validity for the data used, (b) the reasoned implementation and interpretation of the inference procedures and their error probabilities, and (c) warranted evidential interpretations of inference results. A case is made that Fisher’s model-based statistics offers a more pertinent and incisive diagnosis of the replication crisis, and provides a well-grounded framework for addressing the issues (a)–(c), which would unriddle the non-replicability/untrustworthiness problems.
Open Access Article
An Analysis of Vectorised Automatic Differentiation for Statistical Applications
by Chun Fung Kwok, Dan Zhu and Liana Jacobi
Stats 2025, 8(2), 40; https://doi.org/10.3390/stats8020040 - 19 May 2025
Abstract
Automatic differentiation (AD) is a general method for computing exact derivatives in complex sensitivity analyses and optimisation tasks, particularly when closed-form solutions are unavailable and traditional analytical or numerical methods fall short. This paper introduces a vectorised formulation of AD grounded in matrix calculus. It aligns naturally with the matrix-oriented style prevalent in statistics, supports convenient implementations, and takes advantage of sparse matrix representation and other high-level optimisation techniques that are not available in the scalar counterpart. Our formulation is well-suited to high-dimensional statistical applications, where finite differences (FD) scale poorly due to the need to repeat computations for each input dimension, resulting in significant overhead, and is advantageous in simulation-intensive settings—such as Markov Chain Monte Carlo (MCMC)-based inference—where FD requires repeated sampling and multiple function evaluations, while AD can compute exact derivatives in a single pass, substantially reducing computational cost. Numerical studies are presented to demonstrate the efficacy and speed of the proposed AD method compared with FD schemes.
(This article belongs to the Section Computational Statistics)
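The AD-versus-FD contrast in the abstract is easy to demonstrate at toy scale: forward-mode AD propagates an exact derivative through every operation in a single pass, while FD needs extra function evaluations per input dimension and carries truncation error. A scalar dual-number sketch, far simpler than the paper's vectorised matrix-calculus formulation:

```python
import math

class Dual:
    """Dual number a + b*eps with eps**2 = 0; .dot carries an exact derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def sin(x):
    """sin lifted to dual numbers (chain rule applied exactly, once)."""
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)
    return math.sin(x)

f = lambda x: x * x * sin(x)               # f(x) = x^2 sin(x)

x0, h = 1.3, 1e-5
ad = f(Dual(x0, 1.0)).dot                  # AD: one pass, exact derivative
fd = (f(x0 + h) - f(x0 - h)) / (2 * h)     # FD: extra evaluations, O(h^2) error
print(f"AD: {ad:.12f}   FD: {fd:.12f}")
```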
Open Access Article
Theoretical Advancements in Small Area Modeling: A Case Study with the CHILD Cohort
by Charanpal Singh and Mahmoud Torabi
Stats 2025, 8(2), 39; https://doi.org/10.3390/stats8020039 - 16 May 2025
Abstract
Developing accurate predictive models in statistical analysis presents significant challenges, especially in domains with limited routine assessments. This study aims to advance the theoretical underpinnings of longitudinal logistic and zero-inflated Poisson (ZIP) models in the context of small area estimation (SAE). Utilizing data from the Canadian Healthy Infant Longitudinal Development (CHILD) study as a case study, we explore the use of individual- and area-level random effects to enhance model precision and reliability. The study evaluates the impact of various covariates (such as maternal asthma, wheezing, and smoking) on model performance in predicting a child's wheezing, emphasizing the role of location within Manitoba. Our main findings contribute to the literature by providing insights into the development and refinement of small area models, emphasizing the significance of advancing theoretical frameworks in statistical modeling.
Open Access Article
Determinants of Blank and Null Votes in the Brazilian Presidential Elections
by Renata Rojas Guerra, Kerolene De Souza Moraes, Fernando De Jesus Moreira Junior, Fernando A. Peña-Ramírez and Ryan Novaes Pereira
Stats 2025, 8(2), 38; https://doi.org/10.3390/stats8020038 - 13 May 2025
Abstract
This study analyzes the factors influencing the proportions of blank and null votes in Brazilian municipalities during the 2018 presidential elections. The behavior of the variable of interest is examined using unit regression models within the Generalized Additive Models for Location, Scale, and Shape (GAMLSS) framework. Specifically, five different unit regression models are explored: beta, simplex, Kumaraswamy, unit Weibull, and reflected unit Burr XII regressions, each incorporating submodels for both indexed distribution parameters. The beta regression model emerges as the best fit through rigorous model selection and diagnostic procedures. The findings reveal that the disaggregated municipal human development index (MHDI), particularly its income, longevity, and education dimensions, along with the municipality’s geographic region, significantly affect voting behavior. Notably, higher income and longevity values are linked to greater proportions of blank and null votes, whereas the educational level exhibits a negative relationship with the variable of interest. Additionally, municipalities in the Southeast region tend to have higher average proportions of blank and null votes. In terms of variability, the ability of a municipality’s population to acquire goods and services is shown to negatively influence the dispersion of vote proportions, while municipalities in the Northeast, North, and Southeast regions exhibit distinct patterns of variation compared to other regions. These results provide valuable insights into electoral participation’s socioeconomic and regional determinants, contributing to broader discussions on political engagement and democratic representation in Brazil.
(This article belongs to the Section Regression Models)
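The selected model's structure, in GAMLSS style, is a beta likelihood with two submodels: a logit link for the mean μ and a log link for the precision φ, with y ~ Beta(μφ, (1−μ)φ). A compact maximum-likelihood sketch on simulated data (the study itself works within the GAMLSS framework in R; the coefficients below are invented):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, betaln

rng = np.random.default_rng(7)
n = 500
x, z = rng.normal(size=n), rng.normal(size=n)
mu = expit(-1.0 + 0.8 * x)                  # mean submodel (logit link)
phi = np.exp(2.0 + 0.5 * z)                 # precision submodel (log link)
y = rng.beta(mu * phi, (1 - mu) * phi)

def negloglik(theta):
    b0, b1, g0, g1 = theta
    m = expit(b0 + b1 * x)
    p = np.exp(g0 + g1 * z)
    a, b = m * p, (1 - m) * p
    # Beta log-density: (a-1) log y + (b-1) log(1-y) - log B(a, b)
    return -np.sum((a - 1) * np.log(y) + (b - 1) * np.log1p(-y) - betaln(a, b))

fit = minimize(negloglik, x0=np.zeros(4), method="BFGS")
print("estimates (b0, b1, g0, g1):", fit.x.round(2))   # ~(-1.0, 0.8, 2.0, 0.5)
```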
Open Access Article
A Cox Proportional Hazards Model with Latent Covariates Reflecting Students’ Preparation, Motives, and Expectations for the Analysis of Time to Degree
by Dimitrios Kalamaras, Laura Maska and Fani Nasika
Stats 2025, 8(2), 37; https://doi.org/10.3390/stats8020037 - 13 May 2025
Abstract
Issues related to the duration of university studies have attracted the interest of researchers from different scientific fields since the middle of the 20th century. In this study, a Survival Analysis methodology, specifically a Cox proportional hazards model, is proposed to evaluate a theoretical framework that relates a student's risk of either graduating on time or graduating late to a number of observed and latent factors proposed in the literature as the main determinants of time to degree completion. The major findings of the analysis suggest that the factors contributing to a reduced duration of studies include high academic achievement at early stages and positive motivation, expectations, attitudes, and beliefs regarding studies. On the contrary, external circumstances, negative academic experiences, and some individual characteristics of the students contribute to an extended duration of studies.
(This article belongs to the Topic Interfacing Statistics, Machine Learning and Data Science from a Probabilistic Modelling Viewpoint)
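The modeling pattern is: score the latent constructs (preparation, motives, expectations) first, then enter them as covariates in a proportional hazards fit for time to degree. A sketch with the lifelines library, using simulated stand-ins for the factor scores rather than the paper's questionnaire-derived ones:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(8)
n = 400
# Stand-ins for latent factor scores (e.g., preparation, motivation).
prep, motiv = rng.normal(size=n), rng.normal(size=n)

# Simulate time-to-degree: better preparation/motivation -> earlier graduation.
hazard = np.exp(0.5 * prep + 0.3 * motiv)
time = rng.exponential(1 / hazard)
graduated = rng.random(n) < 0.9          # some students are right-censored

df = pd.DataFrame({"time": time, "graduated": graduated.astype(int),
                   "prep": prep, "motiv": motiv})
cph = CoxPHFitter().fit(df, duration_col="time", event_col="graduated")
print(cph.summary[["coef", "p"]])
```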
Open Access Communication
Unraveling Meteorological Dynamics: A Two-Level Clustering Algorithm for Time Series Pattern Recognition with Missing Data Handling
by Ekaterini Skamnia, Eleni S. Bekri and Polychronis Economou
Stats 2025, 8(2), 36; https://doi.org/10.3390/stats8020036 - 9 May 2025
Abstract
Identifying regions with similar meteorological features is of both socioeconomic and ecological importance. Towards that direction, useful information can be drawn from meteorological stations spread across a broader area. In this work, a two-level time series clustering procedure is proposed, focusing on clustering spatial units (meteorological stations) based on their temporal patterns rather than clustering time periods. It is capable of handling univariate or multivariate time series with missing data or different lengths, provided they share a common seasonal time period. The first level clusters the dominant features of the time series (e.g., similar seasonal patterns) by employing K-means, while the second produces clusters based on secondary features. Hierarchical clustering with Dynamic Time Warping is employed for the second level, in its univariate form for the univariate case and its multivariate form for the multivariate scenario. Principal component analysis or Classic Multidimensional Scaling is applied before the first level, while an imputation technique is applied to the raw data in the second level to address missing values; this step is particularly important given that missing data are a frequent issue in measurements obtained from meteorological stations. The method is applied first to the available precipitation time series and then to a time series of mean temperature obtained from the automated weather station network in Greece. Finally, both characteristics are employed together to cover the multivariate scenario.
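A condensed version of the two-level idea on univariate series: level one clusters the dominant seasonal profiles (PCA features into K-means), and level two refines each level-one cluster by hierarchical clustering on a Dynamic Time Warping distance matrix. Parameters, cluster counts, and the plain DTW implementation below are illustrative, not the authors' exact pipeline:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw(a, b):
    """Plain O(len(a)*len(b)) Dynamic Time Warping distance."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            D[i, j] = abs(a[i-1] - b[j-1]) + min(D[i-1, j], D[i, j-1], D[i-1, j-1])
    return D[-1, -1]

rng = np.random.default_rng(9)
t = np.arange(48)   # four years of monthly observations
series = np.array([np.sin(2 * np.pi * t / 12 + rng.choice([0, np.pi]))
                   + 0.3 * rng.normal(size=48) for _ in range(30)])

# Level 1: cluster the dominant seasonal patterns.
feats = PCA(n_components=3).fit_transform(series)
level1 = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)

# Level 2: within each level-1 cluster, refine via DTW + hierarchical linkage.
for c in np.unique(level1):
    members = series[level1 == c]
    dmat = np.array([[dtw(a, b) for b in members] for a in members])
    sub = fcluster(linkage(squareform(dmat, checks=False), "average"),
                   t=2, criterion="maxclust")
    print(f"level-1 cluster {c}: level-2 labels {sub}")
```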
Open Access Article
Reliability Assessment via Combining Data from Similar Systems
by Jianping Hao and Mochao Pei
Stats 2025, 8(2), 35; https://doi.org/10.3390/stats8020035 - 8 May 2025
Abstract
In operational testing contexts, testers face dual challenges of constrained timeframes and limited resources, both of which impede the generation of reliability test data. To address this issue, integrating data from similar systems with test data can effectively expand data sources. This study proposes a systematic approach wherein the mission of the system under test (SUT) is decomposed to identify candidate subsystems for data combination. A phylogenetic tree representation is constructed for subsystem analysis and subsequently mapped to a mixed-integer programming (MIP) model, enabling efficient computation of similarity factors. A reliability assessment model that combines data from similar subsystems is established. The similarity factor is regarded as a covariate, and the regression relationship between it and the subsystem failure-time distribution is established. The joint posterior distribution of the regression coefficients is derived using Bayesian theory and then sampled via the No-U-Turn Sampler (NUTS) algorithm to obtain reliability estimates. Numerical case studies demonstrate that the proposed method outperforms existing approaches, yielding more robust similarity factors and higher accuracy in reliability assessments.
Topics
Topic in JPM, Mathematics, Applied Sciences, Stats, Healthcare
Application of Biostatistics in Medical Sciences and Global Health
Topic Editors: Bogdan Oancea, Adrian Pană, Cǎtǎlina Liliana Andrei
Deadline: 31 October 2026

Special Issues
Special Issue in Stats
Benford's Law(s) and Applications (Second Edition)
Guest Editors: Marcel Ausloos, Roy Cerqueti, Claudio Lupi
Deadline: 31 October 2025
Special Issue in Stats
Nonparametric Inference: Methods and Applications
Guest Editor: Stefano Bonnini
Deadline: 28 November 2025
Special Issue in Stats
Robust Statistics in Action II
Guest Editor: Marco Riani
Deadline: 31 December 2025