Stats, Volume 9, Issue 1 (February 2026) – 21 articles

Cover Story: The single-cell spatial transcriptomics (ST) data produced by the recent CosMx biotechnology contain a vast amount of information about cancer tissue samples and thus have great potential for cancer research via the detection of ST-Communities, where an ST-Community is a collection of cells with a distinct cell-type composition and similar neighboring patterns based on nearby cell percentages. This article provides a novel and more informative disk compositional data (DCD) method to process single-cell ST data, and develops an innovative TMHC computation method to detect ST-Communities from the processed DCD data. Extensive simulation studies and analysis of CosMx breast cancer ST data show that the DCD-TMHC method detects ST-Communities with superior and interpretable results, especially in terms of assessment for different cancer categories.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open it.
29 pages, 1411 KB  
Article
Performance Evaluation of the Robust Stein Estimator in the Presence of Multicollinearity and Outliers
by Lwando Dlembula, Chioneso Show Marange and Lwando Orbet Kondlo
Stats 2026, 9(1), 21; https://doi.org/10.3390/stats9010021 - 22 Feb 2026
Abstract
Multicollinearity and outliers are common challenges in multiple linear regression, often adversely affecting the properties of least squares estimators. To address these issues, several robust estimators have been developed to handle multicollinearity and outliers individually or simultaneously. More recently, the robust Stein estimator (RSE) was introduced, which integrates shrinkage and robustness to mitigate the impact of both multicollinearity and outliers. Despite its theoretical advantages, the finite-sample performance of this approach under multicollinearity and outliers remains underexplored. First, earlier research on the RSE has focused mainly on outliers in the y direction, overlooking the substantial impact that leverage points can have on regression results; this study addresses that gap by considering both y-direction outliers and leverage points, providing a more thorough assessment of the RSE's robustness. Second, to extend the limited benchmarking in the current literature, we compare the RSE against a wide range of robust and classical estimators. Several Monte Carlo (MC) simulations were conducted under both normal and heavy-tailed error distributions, with sample sizes, multicollinearity levels, and outlier proportions varied. Performance was evaluated using bootstrap estimates of root mean squared error (RMSE) and bias. The MC simulation results indicated that the RSE outperformed the other estimators in several scenarios where both multicollinearity and outliers are present. Finally, real data studies confirm the MC simulation results.
(This article belongs to the Special Issue Robust Statistics in Action II)
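
To make the kind of Monte Carlo comparison described above concrete, here is a minimal Python sketch. It is an illustration only, not the authors' RSE implementation: ordinary ridge regression stands in for the shrinkage estimator, and the settings (AR(1)-type predictor correlation, 10% y-direction outliers, k = 1) are assumptions.

import numpy as np

rng = np.random.default_rng(1)
n, p, rho, k, reps = 50, 4, 0.95, 1.0, 500        # assumed simulation settings
beta = np.ones(p)
# AR(1)-type covariance induces strong multicollinearity among predictors
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

sq_err = {"OLS": [], "ridge": []}
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    eps = rng.standard_normal(n)
    out = rng.random(n) < 0.10                    # 10% y-direction outliers
    eps[out] += 10 * rng.standard_normal(out.sum())
    y = X @ beta + eps
    sq_err["OLS"].append(np.sum((ols(X, y) - beta) ** 2))
    sq_err["ridge"].append(np.sum((ridge(X, y, k) - beta) ** 2))

for name, e in sq_err.items():
    print(f"{name}: RMSE = {np.sqrt(np.mean(e)):.3f}")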
13 pages, 274 KB  
Article
Penalized Likelihood Estimation of Continuation Ratio Models for Ordinal Response and Its Application in CGSS Data
by Huihui Sun and Yemin Cui
Stats 2026, 9(1), 20; https://doi.org/10.3390/stats9010020 - 19 Feb 2026
Abstract
The continuation ratio model is a crucial tool for analyzing ordinal response data. However, its explanatory power diminishes under high-dimensional settings where the number of covariates p is large. To address this, we introduce, for the first time, the smoothly clipped absolute deviation (SCAD) penalty into the forward continuation ratio model framework. We propose a corresponding penalized likelihood estimation method that performs simultaneous variable selection and parameter estimation, and we provide an efficient algorithm for its implementation. Numerical simulations demonstrate the favorable properties of the SCAD penalty: it precisely identifies significant variables while more aggressively shrinking the coefficients of irrelevant ones to zero, outperforming alternative penalties such as the Lasso and elastic net in selection accuracy. Finally, we illustrate the practical utility of our method through an empirical application using data from the Chinese General Social Survey (CGSS).
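
As background on the penalty named here, the SCAD penalty of Fan and Li is usually stated through its derivative (with the conventional tuning constant a = 3.7):

\[
p_\lambda'(\theta) = \lambda \left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_+}{(a-1)\lambda}\, I(\theta > \lambda) \right\}, \qquad \theta > 0,\ a > 2.
\]

The penalty is Lasso-like near zero, tapers off, and is flat for \theta > a\lambda, which is what lets SCAD shrink irrelevant coefficients to zero while leaving large coefficients nearly unbiased.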
13 pages, 527 KB  
Article
Sample Size Calculation and Power Analysis for the General Mediation Analysis Method
by Nubaira Rizvi, Amjila Bam, Wentao Cao and Qingzhao Yu
Stats 2026, 9(1), 19; https://doi.org/10.3390/stats9010019 - 14 Feb 2026
Abstract
Mediation analysis is a widely used statistical technique for identifying the mechanisms underlying the relationship between an exposure and an outcome. However, accurate power analysis and sample size determination for mediation models that involve non-normal distributions or mixtures of continuous and binary variables are challenging. We propose a computationally efficient simulation-based approach for general mediation analysis. By applying monotone smoothing splines to estimate empirical critical values derived from extensive simulations, our method enables accurate power calculations without the need for real-time simulation. We validated the method across varying scenarios, including continuous and binary variables and time-to-event outcomes, with strict Type I error control. Large effects (0.35) yielded >80% power at minimal sample sizes (n = 25–50) across all settings, while small effects (0.02) required larger samples. Continuous models achieved 80% power for small effects at n = 410, whereas fully binary models required n > 500. For medium effects (0.15), the power was >0.80 at n = 75 with binary mediators. This study presents a robust framework that combines the flexibility of simulation-based inference with the speed of analytical approximations. We provide an accompanying R package to facilitate efficient sample size planning for mediation models.
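
A minimal sketch of the simulation-based power idea follows. It is not the authors' method or R package: the linear mediation model, the Sobel z test, and the effect sizes are illustrative assumptions.

import numpy as np
from scipy.stats import norm

def slope_and_se(x, y):
    """OLS slope of y on x and its standard error (simple regression)."""
    xc = x - x.mean()
    b = (xc @ y) / (xc @ xc)
    resid = y - y.mean() - b * xc
    se = np.sqrt((resid @ resid) / ((len(x) - 2) * (xc @ xc)))
    return b, se

def power_indirect(n, a=0.35, b=0.35, reps=2000, alpha=0.05, seed=0):
    """Monte Carlo power of the Sobel z test for the indirect effect a*b."""
    rng = np.random.default_rng(seed)
    zcrit = norm.ppf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        m = a * x + rng.standard_normal(n)        # mediator model
        y = b * m + rng.standard_normal(n)        # outcome model (no direct effect)
        ah, sa = slope_and_se(x, m)
        bh, sb = slope_and_se(m, y)
        z = ah * bh / np.sqrt(bh**2 * sa**2 + ah**2 * sb**2)
        hits += abs(z) > zcrit
    return hits / reps

print(power_indirect(n=50))   # e.g., power for a "large" effect at n = 50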
14 pages, 298 KB  
Article
The Bivariate Poisson–X–Exponential Distribution: Theory, Inference, and Multidomain Applications
by Wafa Treidi and Halim Zeghdoudi
Stats 2026, 9(1), 18; https://doi.org/10.3390/stats9010018 - 14 Feb 2026
Abstract
We propose the Bivariate Poisson–X–Exponential Distribution (BPXED), a flexible bivariate count model obtained by compounding Poisson variables with a shared X–Exponential latent mixing distribution. The model extends the Poisson–X–Exponential (PXED) distribution and includes several bivariate Poisson-type models as special or limiting cases. Closed-form expressions are derived for the joint probability mass function, probability generating function, moments, and covariance structure, showing that dependence arises from shared latent heterogeneity and is restricted to positive correlation. Parameter estimation is developed using maximum likelihood, regression-based, and Bayesian approaches, and a Monte Carlo simulation study demonstrates good finite-sample performance. Applications to soccer scores, reliability failures, and correlated photon counts illustrate improved goodness-of-fit over classical and recent competing models. Overall, the BPXED provides an analytically tractable and interpretable framework for modeling positively dependent and overdispersed bivariate count data.
(This article belongs to the Section Multivariate Analysis)
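
The compounding construction can be sketched generically (the X–Exponential mixing density itself is the paper's contribution and is not reproduced here; the rate parameterization below is an assumption). With the counts conditionally independent Poissons given a shared latent variable \Lambda with density f,

\[
P(X_1 = x_1, X_2 = x_2) = \int_0^\infty \prod_{j=1}^{2} \frac{e^{-\theta_j \lambda} (\theta_j \lambda)^{x_j}}{x_j!}\, f(\lambda)\, d\lambda,
\]

and conditioning on \Lambda gives \mathrm{Cov}(X_1, X_2) = \theta_1 \theta_2 \mathrm{Var}(\Lambda) \ge 0, which is exactly why dependence induced by shared latent heterogeneity is restricted to positive correlation.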
12 pages, 272 KB  
Communication
Estimating the Parameter of Direct Effects in Crossover Designs: The Case of 6 Periods and 2 Treatments
by Miltiadis S. Chalikias
Stats 2026, 9(1), 17; https://doi.org/10.3390/stats9010017 - 12 Feb 2026
Abstract
The present study investigates the derivation of optimal repeated measurement designs for two treatments, six periods, and n experimental units, focusing exclusively on the direct effects of the treatments. The optimal designs are determined for cases where n ≡ 0 or 1, 2, 3, 4 (mod 4). The adopted optimality criterion aims at minimizing the variance of the estimator of the direct effects, thereby ensuring maximum precision in parameter estimation and increased design efficiency. The results presented extend and complement earlier studies on optimal two-treatment repeated-measurement designs for a smaller number of periods, and are closely related to more recent work focusing on optimality with respect to direct effects. Overall, this work contributes to the theoretical framework of optimal design methodology by providing new insights into the structure and efficiency of repeated measurement designs, and lays the groundwork for future extensions incorporating treatment–period interactions.
(This article belongs to the Section Statistical Methods)
7 pages, 227 KB  
Communication
Using Vector Representations of Characteristic Functions and Vector Logarithms When Proving Asymptotic Statements
by Wolf-Dieter Richter
Stats 2026, 9(1), 16; https://doi.org/10.3390/stats9010016 - 11 Feb 2026
Abstract
In this methodological–technical note, in addition to the well-known concepts of logarithms of positive real numbers and of operators, we open a path for the mathematical treatment of the concept of the logarithm of a vector. We establish the most basic arithmetic rules for this new logarithm concept and demonstrate how it applies to characteristic functions and limit theorems of probability theory. As a side result, we revise a formula for i^i that is known from the literature.
(This article belongs to the Section Applied Stochastic Models)
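
For reference, the classical principal-value computation that the revised formula concerns: writing i = e^{i\pi/2},

\[
i^i = \left( e^{i\pi/2} \right)^{i} = e^{i^2 \pi/2} = e^{-\pi/2} \approx 0.20788,
\]

with the full multivalued set being \{ e^{-\pi/2 + 2k\pi} : k \in \mathbb{Z} \}.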
33 pages, 1336 KB  
Article
New Two-Parameter Ridge Estimators for Addressing Multicollinearity in Linear Regression: Theory, Simulation, and Applications
by Md Ariful Hoque, B. M. Golam Kibria and Zoran Bursac
Stats 2026, 9(1), 15; https://doi.org/10.3390/stats9010015 - 10 Feb 2026
Abstract
Multicollinearity among explanatory variables often undermines the reliability of the ordinary least squares (OLS) estimator used in linear regression modeling. To overcome this limitation, a variety of two-parameter estimation strategies have been developed in prior research. We revisit these existing methods and present a new two-parameter ridge estimator to improve the accuracy of regression coefficients in multicollinearity settings. A theoretical evaluation under the mean squared error (MSE) framework compares the estimators' efficiency. Furthermore, a comprehensive simulation study examines the empirical properties of all these estimators across different configurations, followed by a real-life dataset used to examine their performance.
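
For orientation, the classical one-parameter building blocks (not the new estimator proposed in the paper): with \hat{\beta} = (X'X)^{-1} X'y the OLS estimator,

\[
\hat{\beta}(k) = (X'X + kI)^{-1} X'y \quad \text{(ridge)}, \qquad
\hat{\beta}(d) = (X'X + I)^{-1}(X'X + dI)\,\hat{\beta} \quad \text{(Liu)},
\]

and two-parameter estimators combine a ridge-type k with a Liu-type d. Comparisons of this kind use the scalar mean squared error \mathrm{MSE}(\tilde{\beta}) = \operatorname{tr}\operatorname{Var}(\tilde{\beta}) + \|\operatorname{Bias}(\tilde{\beta})\|^2, trading a small bias for a large variance reduction.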
5 pages, 449 KB  
Communication
eduSTAT—Automated Workflows for the Analysis of Small- to Medium-Sized Datasets
by Rudolf Golubich
Stats 2026, 9(1), 14; https://doi.org/10.3390/stats9010014 - 4 Feb 2026
Abstract
This communication provides a citable methodological reference for eduSTAT (v1), an automated, rule-based workflow for the statistical analysis of small- to medium-sized datasets (N = 30–3000). The web application is initially available in German and will be offered in English once it is established in German-speaking regions. It was developed with the aim of supporting early training in the scientific method and reducing the risk of spurious or inappropriate statistical analyses. The paper establishes the foundation for subsequent meta-analyses based on citation tracking of studies that apply eduSTAT, enabling iterative, data-driven improvement of the software.
(This article belongs to the Section Statistical Software)
25 pages, 1321 KB  
Article
The Stingray Copula for Negative Dependence
by Alecos Papadopoulos
Stats 2026, 9(1), 13; https://doi.org/10.3390/stats9010013 - 4 Feb 2026
Abstract
We present a new single-parameter bivariate copula, called the Stingray, that is dedicated to representing negative dependence and nests the Independence copula. The Stingray copula is generated in a relatively novel way; it has a simple form and is always defined over the full support, unlike many copulas that model negative dependence. We provide visualizations of the copula, derive several dependence properties, and compute basic concordance measures. We compare it with other copulas and joint distributions with respect to the extent of dependence it can capture, and we find that the Stingray copula outperforms most of them while remaining competitive with well-known, widely used copulas such as the Gaussian and Frank copulas. Moreover, we show through simulation that the dependence structure it represents cannot be fully captured by these copulas, as it is asymmetric. We also show how the non-parametric Spearman's rho measure of concordance can be used to formally test the hypothesis of statistical independence. As an illustration, we apply it to a financial data sample from the building construction sector in order to model the negative relationship between the level of capital employed and its gross rate of return.
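
The Spearman-based independence test mentioned in the abstract can be carried out with standard tools; a minimal Python sketch with simulated (not the paper's) data:

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
u = rng.random(200)
x = u
y = 1 - u + 0.3 * rng.standard_normal(200)   # negatively dependent pair

rho, pval = spearmanr(x, y)                  # H0: independence
print(f"Spearman rho = {rho:.3f}, p-value = {pval:.2e}")
# A small p-value rejects independence; rho < 0 indicates negative concordance.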
24 pages, 2292 KB  
Article
Tuning for Precision Forecasting of Green Market Volatility Time Series
by Sonia Benghiat and Salim Lahmiri
Stats 2026, 9(1), 12; https://doi.org/10.3390/stats9010012 - 29 Jan 2026
Abstract
In recent years, the green financial market has exhibited heightened daily volatility, largely due to policy changes and economic shifts. To explore the broader potential of predictive modeling for short-term volatility time series, this study analyzes how fine-tuning hyperparameters in predictive models is essential for improving short-term forecasts of market volatility, particularly within the rapidly evolving domain of green financial markets. While traditional econometric models have long been employed to model market volatility, their application to green markets remains limited, especially when contrasted with the emerging potential of machine-learning and deep-learning approaches for capturing complex dynamics in this context. This study evaluates several data-driven forecasting models: the machine-learning models regression tree (RT) and support vector regression (SVR), and the deep-learning models long short-term memory (LSTM), convolutional neural network (CNN), and gated recurrent unit (GRU), applied to over a decade of daily estimated volatility data from three distinct green markets. Predictive accuracy is compared both with and without hyperparameter optimization. In addition, this study introduces the quantile loss metric, alongside two widely used evaluation metrics, to better capture the skewness and heavy tails inherent in these financial series. This comparative analysis yields significant numerical and graphical insights, enhancing the understanding of short-term volatility predictability in green markets and advancing a relatively underexplored research domain. The study demonstrates that the deep-learning predictors outperform the machine-learning ones, and that adding a hyperparameter tuning algorithm yields consistent improvements across all deep-learning models and all volatility time series.
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
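
The quantile (pinball) loss used here as an evaluation metric has a simple closed form; a generic Python sketch (the quantile level and data are assumptions):

import numpy as np

def quantile_loss(y_true, y_pred, q=0.9):
    """Pinball loss: penalizes under-prediction by q and over-prediction
    by (1 - q), so minimizing it targets the q-th conditional quantile."""
    e = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.maximum(q * e, (q - 1) * e)))

# Toy example: scoring a volatility forecast at the 0.9 quantile
print(quantile_loss([0.8, 1.2, 3.5, 0.9], [1.0, 1.0, 2.0, 1.0]))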
45 pages, 1107 KB  
Article
Improving Confidence Interval Estimation in Logistic Regression with Multicollinear Predictors: A Comparative Study of Shrinkage Estimators and Application to Prostate Cancer Data
by Sultana Mubarika Rahman Chowdhury, Zoran Bursac and B. M. Golam Kibria
Stats 2026, 9(1), 11; https://doi.org/10.3390/stats9010011 - 29 Jan 2026
Abstract
In logistic regression with finite binary samples and multicollinear predictors, the maximum likelihood estimator often results in overfitting and high mean squared error (MSE). Shrinkage methods like ridge, Liu, and Kibria–Lukman offer improved MSE performance but are typically evaluated only on this criterion, which overlooks their inferential capability. This study shifts the focus toward confidence interval coverage, using simulations to assess the coverage probability, interval width, and MSE of several shrinkage estimators under varying conditions. The results show that, while shrinkage methods generally reduce interval width and MSE, many fail to maintain adequate coverage. However, certain ridge and Kibria–Lukman estimators achieve a favorable balance between narrow interval width and consistent coverage, making them preferable. The findings are further validated using a prostate cancer dataset, contributing to more reliable inference in logistic regression under multicollinearity. Overall, the results demonstrate that well-chosen shrinkage estimators can serve as effective alternatives to the MLE in biostatistical modeling, improving the stability and interpretability of regression analyses in studies pertaining to public health and medicine.
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
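
A minimal sketch of a coverage experiment of this kind (an illustration, not the authors' setup: sklearn's L2-penalized logistic regression stands in for the ridge-type estimator, and percentile bootstrap intervals are an assumption):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, rho, beta = 100, 0.9, np.array([1.0, 1.0])
Sigma = np.array([[1.0, rho], [rho, 1.0]])    # strongly correlated predictors

def boot_ci(X, y, B=200):
    """Percentile bootstrap CI for the first L2-penalized logistic coefficient."""
    coefs = []
    for _ in range(B):
        idx = rng.integers(0, len(y), len(y))
        if len(np.unique(y[idx])) < 2:        # skip degenerate resamples
            continue
        fit = LogisticRegression(penalty="l2", C=1.0).fit(X[idx], y[idx])
        coefs.append(fit.coef_[0, 0])
    return np.percentile(coefs, [2.5, 97.5])

covered, widths = 0, []
for _ in range(100):                          # MC replications (small, for speed)
    X = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(int)
    lo, hi = boot_ci(X, y)
    covered += lo <= beta[0] <= hi
    widths.append(hi - lo)
print(f"coverage = {covered / 100:.2f}, mean width = {np.mean(widths):.2f}")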
21 pages, 894 KB  
Article
Effect Structures in Ordinal Regression: The Adjacent Categories Approach
by Gerhard Tutz
Stats 2026, 9(1), 10; https://doi.org/10.3390/stats9010010 - 27 Jan 2026
Abstract
The potential of the adjacent categories approach for capturing the influence of explanatory variables on ordinal responses is investigated. Several models with increasing complexity in their linear predictors are considered, and their relationships are discussed, including the basic adjacent categories model, the stereotype model, models with category-specific effects, and dispersion models. For the adjacent categories framework, regularization methods for effect selection are introduced with the aim of distinguishing between no effect, global effects, and category-specific effects. Particular attention is given to the adjacent dispersion model, which provides a parsimonious parameterization while substantially improving model fit compared to the basic model. Effect selection for both the location and dispersion effects in the adjacent dispersion model is introduced. The proposed approaches are illustrated using several real data sets.
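
For reference, the basic adjacent categories model links covariates to the odds of adjacent response categories,

\[
\log \frac{P(Y = r + 1 \mid x)}{P(Y = r \mid x)} = \beta_{0r} + x^\top \beta, \qquad r = 1, \dots, k - 1,
\]

and the choice between one global \beta, as above, and category-specific effects \beta_r is precisely the distinction the proposed regularization methods are designed to select.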
15 pages, 318 KB  
Article
A Utility-Driven Bayesian Design: A New Framework for Extracting Optimal Experiments from Observational Reliability Data
by Rossella Berni, Nedka Dechkova Nikiforova and Federico Mattia Stefanini
Stats 2026, 9(1), 9; https://doi.org/10.3390/stats9010009 - 21 Jan 2026
Abstract
In this study, a procedure to build Bayesian optimal designs using utility functions and exploiting existing data is proposed. The procedure is illustrated through a case study in the field of reliability, by applying a hierarchical Bayesian model and performing Markov Chain Monte Carlo simulations. Two innovative contributions are introduced: (i) the definition of specific utility functions that involve several key issues and (ii) the use of observational data. The use of observational data makes it possible to build the optimal design without additional costs for the company, while the definition of the utility functions accounts for the specific characteristics of the reliability study. Features like model residuals, i.e., discrepancies between observed and predicted response values, and the costs of the electronic component are addressed. Costs are also weighted considering the environmental impact. Satisfactory results are obtained and subsequently validated through an in-depth sensitivity analysis.
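
The general principle behind utility-driven Bayesian design is to choose the design d that maximizes the expected utility

\[
U(d) = \int \int u(d, \theta, y)\, p(y \mid \theta, d)\, p(\theta)\, d\theta\, dy,
\]

where, per the abstract, the paper's specific utilities u fold in model residuals and component costs weighted by environmental impact.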
20 pages, 401 KB  
Article
Preliminary and Shrinkage-Type Estimation for the Parameters of the Birnbaum–Saunders Distribution Based on Modified Moments
by Syed Ejaz Ahmed, Muhammad Kashif Ali Shah, Waqas Makhdoom and Nighat Zahra
Stats 2026, 9(1), 8; https://doi.org/10.3390/stats9010008 - 16 Jan 2026
Abstract
The two-parameter Birnbaum–Saunders (B-S) distribution is widely applied across various fields due to its favorable statistical properties. This study aims to enhance the efficiency of modified moment estimators for the B-S distribution by systematically incorporating auxiliary non-sample information. To this end, we developed and analyzed a suite of estimation strategies, including restricted estimators, preliminary test estimators, and Stein-type shrinkage estimators. A pretest procedure was formulated to guide the decision on whether to integrate the non-sample information. The relative performance of these estimators was rigorously evaluated through an asymptotic distributional analysis, comparing their asymptotic distributional bias and risk under a sequence of local alternatives. The finite-sample properties were assessed via Monte Carlo simulation studies. The practical utility of the proposed methods is demonstrated through applications to two real-world datasets: failure times for mechanical valves and bone mineral density measurements. Both numerical results and theoretical analysis confirm that the proposed shrinkage-based techniques deliver substantial efficiency gains over conventional estimators.
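
For context, the two-parameter Birnbaum–Saunders distribution with shape \alpha and scale \beta has CDF

\[
F(t) = \Phi\!\left[ \frac{1}{\alpha} \left( \sqrt{t/\beta} - \sqrt{\beta/t} \right) \right], \qquad t > 0,
\]

and the modified moment estimators commonly used for it (the standard form from the literature, e.g. Ng, Kundu, and Balakrishnan; the paper should be consulted for its exact variant) are \hat{\beta} = \sqrt{sr} and \hat{\alpha} = \sqrt{2(\sqrt{s/r} - 1)}, where s and r are the sample arithmetic and harmonic means.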
22 pages, 694 KB  
Article
Performance Forecasting for Multi-Server Retrial Queue with Possibility of Processing Repetition and Server Reservation for Repeating Users
by Alexander N. Dudin, Sergei A. Dudin and Olga S. Dudina
Stats 2026, 9(1), 7; https://doi.org/10.3390/stats9010007 - 9 Jan 2026
Abstract
This study focuses on forecasting and optimizing the performance of a real-world object modelled by a multi-server queueing system that processes two types of users: primary (new) users and repeating users. The repeating users are those who succeeded in entering processing upon arrival and then decided to repeat it. These users are privileged and can enter processing whenever at least one device is idle. A primary user is admitted to the system only if the number of occupied devices is less than some threshold value and the number of repeating users residing in the system does not exceed certain thresholds. Repeating users are impatient and non-persistent. Arrivals of primary users are described by the Markovian arrival process. Processing times of primary and repeating users have distinct phase-type distributions. Utilizing the concept of generalized phase-type distributions, the dynamics of this queueing system are formally characterized by a multidimensional Markov chain, which is examined in this paper. The ergodicity condition is derived. The relation between the key performance characteristics of the system and the thresholds defining the primary-user admission policy is numerically highlighted. Optimal threshold selection is demonstrated numerically.
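
As a reminder of the building block used throughout: a phase-type distribution with initial probability vector \alpha and sub-generator matrix S, describing the time to absorption of a finite-state Markov chain, has

\[
F(t) = 1 - \alpha\, e^{S t}\, \mathbf{1}, \qquad t \ge 0,
\]

and the generalized variant invoked in the paper builds on this framework.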
15 pages, 1297 KB  
Article
Two-Stage Wiener-Physically-Informed-Neural-Network (W-PINN) AI Methodology for Highly Dynamic and Highly Complex Static Processes
by Dillon G. Hurd, Yuderka T. González, Jacob Oyler, Spencer Wolfe, Monica H. Lamm and Derrick K. Rollins
Stats 2026, 9(1), 6; https://doi.org/10.3390/stats9010006 - 1 Jan 2026
Abstract
Our new Theoretically Dynamic Regression (TDR) modeling methodology was recently applied in three types of real data modeling cases using physically based dynamic model structures with low-order linear regression static functions. Two of the modeling cases achieved the validation set modeling goal of r_fit,val ≥ 0.9. However, the third case, consisting of eleven (11) type 1 sensor glucose data sets, and thus eleven individual models, fell considerably short of this modeling goal, with an average r_fit,val of 0.68. For this case, the dynamic forms are highly complex 60 min forecast, second-order-plus-dead-time-plus-lead (SOPDTPL) structures, and the static form is a twelve (12) input first-order linear regression structure. Using these dynamic structure results, the objective is to significantly increase r_fit for each of the eleven (11) modeling cases using the recently developed Wiener-Physically-Informed-Neural-Network (W-PINN) approach as the static modeling structure. Two W-PINN stage-two static structures are evaluated: one developed using the JMP® Pro Version 16 Artificial Neural Network (ANN) toolbox, and the other developed using a novel ANN methodology coded in Python version 3.12.3. The JMP average r_fit,val is 0.74 with a maximum of 0.84. The Python average r_fit,val is 0.82 with a maximum of 0.93. Incorporating bias correction using current and past SGC residuals, the Python estimator improved the average r_fit,val from 0.82 to 0.87, with the maximum still 0.93.
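
In standard transfer-function notation, a second-order-plus-dead-time-plus-lead structure of the kind named here takes the generic form (symbols as conventionally used; the paper's parameterization may differ):

\[
G(s) = \frac{K (\tau_a s + 1)\, e^{-\theta s}}{(\tau_1 s + 1)(\tau_2 s + 1)},
\]

with gain K, lead time constant \tau_a, lag time constants \tau_1 and \tau_2, and dead time \theta.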
29 pages, 2805 KB  
Article
Probabilistic Links Between Quantum Classification of Patterns of Boolean Functions and Hamming Distance
by Theodore Andronikos, Constantinos Bitsakos, Konstantinos Nikas, Georgios I. Goumas and Nectarios Koziris
Stats 2026, 9(1), 5; https://doi.org/10.3390/stats9010005 - 1 Jan 2026
Abstract
This article investigates the probabilistic relationship between the quantum classification of Boolean functions and their Hamming distance. By integrating concepts from quantum computing, information theory, and combinatorics, we explore how Hamming distance serves as a metric for analyzing deviations in function classification. Our extensive experimental results confirm that the Hamming distance is a pivotal metric for validating nearest neighbors in the process of classifying random functions. One of the significant conclusions we arrived at is that the successful classification probability decreases monotonically with the Hamming distance. However, key exceptions were found in specific classes, revealing intra-class heterogeneity. We have established that these deviations are not random but are systemic and predictable. Furthermore, we were able to quantify these irregularities, turning potential errors into manageable phenomena. The most important novelty of this work is the demarcation, for the first time to the best of our knowledge, of precise Hamming distance intervals for the classification probability. These intervals bound the possible values the probability can assume and provide a new foundational tool for probabilistic assessment in quantum classification. Practitioners can now endorse classification results with high certainty or dismiss them with confidence. This framework can significantly enhance any quantum classification algorithm's reliability and decision-making capability.
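
The Hamming distance between two Boolean functions is simply the number of inputs on which their truth tables disagree; a minimal Python sketch:

import numpy as np

def hamming(f, g):
    """Hamming distance between Boolean functions given as truth-table arrays."""
    f, g = np.asarray(f, dtype=bool), np.asarray(g, dtype=bool)
    return int(np.sum(f != g))

# Two 3-input functions as length-8 truth tables (inputs in lexicographic order)
parity = np.array([0, 1, 1, 0, 1, 0, 0, 1])
other  = np.array([0, 1, 1, 0, 1, 0, 1, 1])
print(hamming(parity, other))   # -> 1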
23 pages, 4673 KB  
Article
ST-Community Detection Methods for Spatial Transcriptomics Data Analysis
by Charles Zhao and Jian-Jian Ren
Stats 2026, 9(1), 4; https://doi.org/10.3390/stats9010004 - 1 Jan 2026
Abstract
The single-cell spatial transcriptomics (ST) data with cell type and spatial location, i.e., (C, x, y) with C as the cell type and (x, y) as its spatial location, produced by recent biotechnologies such as CosMx and Xenium, contain a huge amount of information about cancer tissue samples and thus have great potential for cancer research via the detection of ST-Communities, where an ST-Community is defined as a collection of cells with a distinct cell-type composition and similar neighboring patterns based on nearby cell percentages. But for huge CosMx single-cell ST data, the existing clustering methods do not work well for ST-Community detection, and the commonly used kNN compositional data method lacks informative neighboring cell patterns. In this article, we propose a novel and more informative disk compositional data (DCD) method for single-cell ST data, which identifies the neighboring patterns of each cell by taking into account the ST data features of recent new technologies. After initially processing the single-cell ST data into the DCD matrix, an innovative DCD-TMHC method for ST-Community detection is proposed. Extensive simulation studies and the analysis of CosMx breast cancer data, an example of a single-cell ST dataset, clearly show that our proposed DCD-TMHC method is superior to other existing methods. Based on the ST-Communities detected for the CosMx breast cancer data, logistic regression analysis results demonstrate that the proposed DCD-TMHC method produces better interpretable and superior outcomes, especially in terms of assessment for different cancer categories. This suggests that the proposed novel and informative DCD-TMHC method will be helpful for and have an impact on future cancer research based on single-cell ST data, which can improve cancer diagnosis and the monitoring of cancer treatment progress.
(This article belongs to the Section Computational Statistics)
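
A rough Python sketch of the disk idea, offered as a guess at the general construction for illustration only (the radius, the exclusion of the focal cell, and the normalization are all assumptions; the paper defines the actual DCD processing):

import numpy as np
from scipy.spatial import cKDTree

def disk_compositions(xy, cell_types, radius):
    """For each cell, the proportions of each cell type among the cells lying
    within a disk of the given radius (the focal cell itself excluded)."""
    types, codes = np.unique(cell_types, return_inverse=True)
    tree = cKDTree(xy)
    comp = np.zeros((len(xy), len(types)))
    for i, nbrs in enumerate(tree.query_ball_point(xy, r=radius)):
        nbrs = [j for j in nbrs if j != i]
        if nbrs:
            counts = np.bincount(codes[nbrs], minlength=len(types))
            comp[i] = counts / counts.sum()
    return comp                               # one composition row per cell

# Toy usage with made-up coordinates and cell-type labels
rng = np.random.default_rng(0)
xy = rng.random((500, 2))
labels = rng.integers(0, 4, size=500)
D = disk_compositions(xy, labels, radius=0.1)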
12 pages, 260 KB  
Article
Repeated Measurement Designs of Five Periods: Estimating the Parameter of Carryover Effects
by Miltiadis S. Chalikias
Stats 2026, 9(1), 3; https://doi.org/10.3390/stats9010003 - 29 Dec 2025
Abstract
This study investigates the derivation of optimal repeated measurement designs of two treatments, five periods, and n experimental units for carryover effects. The optimal designs are determined for cases where n ≡ 0, 1 (mod 2). The adopted optimality criterion focuses on minimizing the variance of the estimated carryover effect, thereby ensuring maximum precision in parameter estimation and design efficiency. The results presented here extend and complement earlier research of Chalikias and Kounias on optimal two-treatment repeated measurement designs for a smaller number of periods, and are closely related to the more recent findings on optimal designs for direct effects. Overall, the present work contributes to the theoretical framework of optimal design methodology by providing new insights into the structure and efficiency of repeated measurement designs, particularly in the presence of carryover effects, and sets the ground for future extensions incorporating treatment–period interactions.
14 pages, 400 KB  
Article
Stochastic Complexity of Rayleigh and Rician Data with Normalized Maximum Likelihood
by Aaron Lanterman
Stats 2026, 9(1), 2; https://doi.org/10.3390/stats9010002 - 25 Dec 2025
Abstract
The Rician distribution, which arises in radar, communications, and magnetic resonance imaging, is characterized by a noncentrality parameter and a scale parameter. The Rayleigh distribution is a special case of the Rician distribution with a noncentrality parameter of zero. This paper considers generalized hypothesis testing for Rayleigh and Rician distributions using Rissanen's stochastic complexity, particularly his approximation employing Fisher information matrices. The Rayleigh distribution is a member of the exponential family, so its normalized maximum likelihood density is readily computed and is shown to asymptotically match the Fisher information approximation. Since the Rician distribution is not a member of the exponential family, its normalizing term is difficult to compute directly, so the Fisher information approximation is employed. Because the square root of the determinant of the Fisher information matrix is not integrable, we restrict the integral to a subset of its range and separately encode the choice of subset.
(This article belongs to the Section Statistical Methods)
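
For reference, the Rician density with noncentrality \nu and scale \sigma is

\[
f(x \mid \nu, \sigma) = \frac{x}{\sigma^2}\, \exp\!\left( -\frac{x^2 + \nu^2}{2\sigma^2} \right) I_0\!\left( \frac{x\nu}{\sigma^2} \right), \qquad x \ge 0,
\]

where I_0 is the modified Bessel function of the first kind of order zero; setting \nu = 0 (so that I_0(0) = 1) recovers the Rayleigh density (x/\sigma^2)\, e^{-x^2/(2\sigma^2)}.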
15 pages, 322 KB  
Article
A Proportional Hazards Mixture Cure Model for Subgroup Analysis: Inferential Method and an Application to Colon Cancer Data
by Kai Liu, Yingwei Peng and Narayanaswamy Balakrishnan
Stats 2026, 9(1), 1; https://doi.org/10.3390/stats9010001 - 24 Dec 2025
Abstract
When determining subgroups with heterogeneous treatment effects in cancer clinical trials, the threshold of a variable that defines subgroups is often pre-determined by physicians based on their experience, and the optimality of the threshold is not well studied, particularly when the mixture cure rate model is considered. We propose a mixture cure model that allows optimal subgroups to be estimated for both the time to event for uncured subjects and the cure status. We develop a smoothed maximum likelihood method for the estimation of model parameters. An extensive simulation study shows that the proposed smoothed maximum likelihood method provides accurate estimates. Finally, the proposed mixture cure model is applied to a colon cancer study to evaluate the potential differences in the treatment effect of levamisole plus fluorouracil therapy versus levamisole alone therapy between younger and older patients. The model suggests that the difference in the treatment effect on the time to cancer recurrence for uncured patients is significant between patients younger than 67 and patients older than 67, and the younger patient group benefits more from the combined therapy than the older patient group.
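
For context, a standard proportional hazards mixture cure model (the general form; the paper's variant additionally estimates subgroup thresholds in both components) is

\[
S(t \mid x, z) = 1 - \pi(z) + \pi(z)\, S_u(t \mid x), \qquad S_u(t \mid x) = S_0(t)^{\exp(x^\top \beta)},
\]

where \pi(z) is the probability of being uncured, typically modeled by logistic regression, and S_u is the survival function of the uncured subjects.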