Stats, Volume 6, Issue 1 (March 2023) – 29 articles

Cover Story: To accelerate the results of a clinical trial, investigators often rely on intermediate or “surrogate” endpoints that can be obtained earlier than the ultimate or “true” endpoint of interest. For example, tumor growth may be observed well in advance of death in a cancer setting. There are settings in which the proposed surrogate endpoint is positively correlated with the true endpoint, but the treatment has opposite effects on the surrogate and true endpoints, a phenomenon labeled the “surrogate paradox”. Covariate information may be useful in predicting an individual’s risk of surrogate paradox. In this work, we extend methods to estimate the risk of this paradox as a function of different baseline covariates (for example, males versus females).
14 pages, 553 KiB  
Article
Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data
by Zheng Xu, Song Yan, Shuai Yuan, Cong Wu, Sixia Chen, Zifang Guo and Yun Li
Stats 2023, 6(1), 468-481; https://doi.org/10.3390/stats6010029 - 19 Mar 2023
Cited by 1 | Viewed by 1744
Abstract
Sequencing-based genetic association analysis is typically performed by first generating genotype calls from sequence data and then performing association tests on the called genotypes. Standard approaches require accurate genotype calling (GC), which can be achieved either with high sequencing depth (typically available in a small number of individuals) or via computationally intensive multi-sample linkage disequilibrium (LD)-aware methods. We propose a computationally efficient two-stage combination approach for association analysis, in which single-nucleotide polymorphisms (SNPs) are screened in the first stage via a rapid maximum likelihood (ML)-based method on sequence data directly (without first calling genotypes), and then the selected SNPs are evaluated in the second stage by performing association tests on genotypes from multi-sample LD-aware calling. Extensive simulation- and real data-based studies show that the proposed two-stage approaches can save 80% of the computational costs and still obtain more than 90% of the power of the classical method of genotyping all markers at various depths d ≥ 2. Full article
(This article belongs to the Section Bioinformatics)

18 pages, 571 KiB  
Article
A Phylogenetic Regression Model for Studying Trait Evolution on Network
by Dwueng-Chwuan Jhwueng
Stats 2023, 6(1), 450-467; https://doi.org/10.3390/stats6010028 - 18 Mar 2023
Viewed by 1749
Abstract
We propose a phylogenetic regression model for studying trait evolution that incorporates network structure and thereby allows for reticulation events. Parameters are estimated by maximum likelihood, using an algorithm that takes a phylogenetic network in eNewick format as input and builds the variance–covariance matrix. The model is applied to the common sunflower, Helianthus annuus, to investigate the traits it uses to respond to drought conditions. The results show that the model provides acceptable parameter estimates and that most of the traits analyzed are significantly correlated with drought tolerance. Full article
(This article belongs to the Section Regression Models)

12 pages, 3114 KiB  
Article
Consecutive-k1 and k2-out-of-n: F Structures with a Single Change Point
by Ioannis S. Triantafyllou and Miltiadis Chalikias
Stats 2023, 6(1), 438-449; https://doi.org/10.3390/stats6010027 - 16 Mar 2023
Viewed by 1303
Abstract
In the present paper, we establish a new consecutive-type reliability model with a single change point. The proposed structure has two common failure criteria and consists of two different types of components. We introduce the general framework for constructing the so-called consecutive-k1 and k2-out-of-n: F system with a single change point. In addition, the number of path sets of the proposed structure is determined with the aid of a combinatorial approach. Moreover, two crucial performance characteristics of the proposed model are studied. The numerical investigation carried out reveals that the new structure outperforms its competitors. Full article
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)

7 pages, 302 KiB  
Brief Report
Analytic Error Function and Numeric Inverse Obtained by Geometric Means
by Dmitri Martila and Stefan Groote
Stats 2023, 6(1), 431-437; https://doi.org/10.3390/stats6010026 - 15 Mar 2023
Viewed by 1256
Abstract
Using geometric considerations, we provided a clear derivation of the integral representation for the error function, known as the Craig formula. We calculated the corresponding power series expansion and proved the convergence. The same geometric means finally assisted in systematically deriving useful formulas that approximated the inverse error function. Our approach could be used for applications in high-speed Monte Carlo simulations, where this function is used extensively. Full article
(This article belongs to the Section Statistical Methods)
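As a brief numerical illustration of the Craig formula mentioned in the abstract above, the sketch below evaluates the integral representation erfc(x) = (2/π) ∫ exp(−x²/sin²θ) dθ over θ from 0 to π/2 (valid for x ≥ 0) with a simple midpoint rule and compares it with Python's built-in `math.erfc`. The function name `erfc_craig` and the number of subintervals are illustrative choices and are not taken from the paper, whose own derivation and approximations are not reproduced here.

```python
import math

def erfc_craig(x, n=2000):
    """Approximate erfc(x) for x >= 0 from Craig's integral form (midpoint rule)."""
    if x < 0:
        raise ValueError("this form of Craig's formula requires x >= 0")
    h = (math.pi / 2) / n          # width of each subinterval of [0, pi/2]
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h      # midpoint, so theta is never exactly 0
        total += math.exp(-x * x / math.sin(theta) ** 2)
    return (2.0 / math.pi) * h * total

# Compare against the standard library at a few points.
for x in (0.0, 0.5, 1.0, 2.0):
    print(x, erfc_craig(x), math.erfc(x))
```

The midpoint rule is used only because it avoids evaluating the integrand at θ = 0; any standard quadrature would serve equally well for this check.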

20 pages, 3816 KiB  
Article
Renaissance of Creative Accounting Due to the Pandemic: New Patterns Explored by Correspondence Analysis
by Roman Blazek, Pavol Durana and Jakub Michulek
Stats 2023, 6(1), 411-430; https://doi.org/10.3390/stats6010025 - 3 Mar 2023
Cited by 5 | Viewed by 2511
Abstract
The COVID-19 outbreak has rapidly affected global economies and the parties involved. There was a need to ensure the sustainability of corporate finance and avoid bankruptcy. The reactions of individuals were not routine, but covered a wide range of approaches to surviving the crisis. A creative way of accounting was also adopted. This study is primarily concerned with the behavior of businesses in the Visegrad Four (V4) countries between 2019 and 2021. The pandemic era was the driving force behind the renaissance of manipulation. Thus, the purpose of the article is to explore how the behavior of enterprises changed during the ongoing pandemic. The Beneish model was applied to reveal creative manipulation in the analyzed samples. Its M-score was calculated for 6113 Slovak, 153 Czech, 585 Polish, and 155 Hungarian enterprises. An increasing number of manipulating enterprises was confirmed in the V4 region. The dependency between the size of the enterprise and the occurrence of creative accounting was also proven. However, the structure of manipulators has been changing. Correspondence analysis specifically showed behavioral changes over time. Correspondence maps demonstrate which enterprises already used creative accounting before the pandemic in 2019. Then, it was noted that enterprises were influenced to modify their patterns in 2020 and 2021. The coronavirus pandemic had a significant effect on the use of creative accounting, not only for individual units, but for businesses of all sizes. In addition, the methodology may be applied to the investigation of individual sectors post-COVID. Full article
(This article belongs to the Section Financial Statistics)
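The Beneish M-score referred to in the abstract above is a linear combination of financial-ratio indices. The following is a minimal sketch that assumes the commonly cited eight-variable form and a cutoff of -1.78; the abstract does not state which variant or threshold the authors used, and the names `BENEISH_COEFFS`, `beneish_m_score`, and `is_flagged` are hypothetical.

```python
# Coefficients of the commonly cited eight-variable Beneish model (an assumption;
# the paper's exact specification is not given in the abstract).
BENEISH_INTERCEPT = -4.84
BENEISH_COEFFS = {
    "DSRI": 0.920,   # days' sales in receivables index
    "GMI": 0.528,    # gross margin index
    "AQI": 0.404,    # asset quality index
    "SGI": 0.892,    # sales growth index
    "DEPI": 0.115,   # depreciation index
    "SGAI": -0.172,  # SG&A expenses index
    "TATA": 4.679,   # total accruals to total assets
    "LVGI": -0.327,  # leverage index
}

def beneish_m_score(ratios):
    """Return the M-score for a dict holding the eight index values."""
    return BENEISH_INTERCEPT + sum(
        coeff * ratios[name] for name, coeff in BENEISH_COEFFS.items()
    )

def is_flagged(m_score, threshold=-1.78):
    """Scores above the (assumed) threshold suggest possible manipulation."""
    return m_score > threshold

# A "neutral" firm: every index equal to 1 and no accruals.
neutral = {name: 1.0 for name in BENEISH_COEFFS}
neutral["TATA"] = 0.0
print(beneish_m_score(neutral), is_flagged(beneish_m_score(neutral)))
```

Scoring each enterprise this way for each year would give the kind of flagged/not-flagged counts that a correspondence analysis could then cross-tabulate against year and enterprise size.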

30 pages, 4253 KiB  
Article
The Linear Skew-t Distribution and Its Properties
by C. J. Adcock
Stats 2023, 6(1), 381-410; https://doi.org/10.3390/stats6010024 - 23 Feb 2023
Viewed by 1961
Abstract
The aim of this expository paper is to present the properties of the linear skew-t distribution, which is a specific example of a symmetry-modulated distribution. The skewing function remains the distribution function of Student’s t, but its argument is simpler than that used for the standard skew-t. The linear skew-t offers different insights, for example, different moments and tail behavior, and can be simpler to use for empirical work. It is shown that the distribution may be expressed as a hidden truncation model. The paper describes an extended version of the distribution that is analogous to the extended skew-t. For certain parameter values, the distribution is bimodal. The paper presents expressions for the moments of the distribution and shows that numerical integration methods are required. A multivariate version of the distribution is described. The bivariate version of the distribution may also be bimodal. The distribution is not closed under marginalization, and stochastic ordering is not satisfied. The properties of the distribution are illustrated with numerous examples of the density functions and with tables of moments and critical values. The results in this paper suggest that the linear skew-t may be useful for some applications, but that it should be used with care for methodological work. Full article

16 pages, 350 KiB  
Article
On Weak Convergence of the Bootstrap Copula Empirical Process with Random Resample Size
by Salim Bouzebda
Stats 2023, 6(1), 365-380; https://doi.org/10.3390/stats6010023 - 22 Feb 2023
Viewed by 1288
Abstract
The purpose of this note is to provide a description of the weak convergence of the random resample size bootstrap empirical process. The principal results are used to estimate the sample rank correlation coefficients using Spearman’s and Kendall’s respective methods. In addition to this, we discuss how our findings can be applied to statistical testing. Full article
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)
11 pages, 1626 KiB  
Brief Report
Assessing Area under the Curve as an Alternative to Latent Growth Curve Modeling for Repeated Measures Zero-Inflated Poisson Data: A Simulation Study
by Daniel Rodriguez
Stats 2023, 6(1), 354-364; https://doi.org/10.3390/stats6010022 - 19 Feb 2023
Cited by 4 | Viewed by 2002
Abstract
Researchers interested in the assessment of substance use trajectories, and predictors of change, have several data analysis options. These include, among others, generalized estimating equations and latent growth curve modeling. One difficulty in the assessment of substance use, however, is the nature of the variables studied. Although counting instances of use (e.g., the number of cigarettes smoked per day) would seem to be the best option, such data present difficulties in that the distribution of these variables is not likely normal. Count variables often follow a Poisson distribution, and when dealing with substance use in the general population, there is a preponderance of zeros (representing not using). As such, substance use counts may approximate a zero-inflated Poisson distribution. Unfortunately, analyses with zero-inflated Poisson random variables are not easily accommodated in many types of software and may be beyond access to most researchers. As such, an easier method would benefit researchers interested in assessing substance use change. The purpose of this study is to assess the area under the curve as an option when dealing with repeated measures data and contrast it to one popular method of longitudinal data analysis, latent growth curve modeling. Using a Monte Carlo simulation study with varying sample sizes, we found that the area under the curve performed well with different sample sizes and compared favorably to the performance of latent growth curve modeling, particularly when dealing with smaller sample sizes. The area under the curve may be a simpler alternative for researchers, especially when dealing with smaller sample sizes. Full article
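To make the area-under-the-curve idea in the abstract above concrete, the sketch below simulates one participant's repeated zero-inflated Poisson counts and summarizes them with a trapezoidal area under the curve. It is an illustration only: the rate, the zero-inflation probability, the measurement times, and all function names are assumptions, and the paper's actual simulation design is not described in the abstract.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_zip(lam, pi_zero, rng):
    """Zero-inflated Poisson: a structural zero with probability pi_zero."""
    return 0 if rng.random() < pi_zero else sample_poisson(lam, rng)

def auc_trapezoid(times, values):
    """Trapezoidal area under the piecewise-linear curve through (times, values)."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return area

rng = random.Random(0)                 # fixed seed for reproducibility
times = [0, 1, 2, 3]                   # hypothetical measurement waves
counts = [sample_zip(3.0, 0.4, rng) for _ in times]
print(counts, auc_trapezoid(times, counts))
```

Each participant's trajectory collapses to a single number, which can then be used as the outcome in an ordinary regression on the predictors of interest.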

9 pages, 257 KiB  
Communication
Quantum-like Data Modeling in Applied Sciences: Review
by Stan Lipovetsky
Stats 2023, 6(1), 345-353; https://doi.org/10.3390/stats6010021 - 17 Feb 2023
Cited by 3 | Viewed by 2056
Abstract
This work presents a brief review of modern approaches to data modeling based on methods developed in quantum physics during the last one hundred years. Quantum computers and computations have already been widely investigated theoretically and attempted in some practical implementations, but methods of quantum data modeling are not yet sufficiently established. A vast range of concepts and methods of quantum mechanics have been tried in many fields of the information and behavioral sciences, including communications and artificial intelligence, cognition and decision making, sociology and psychology, biology and economics, and financial and political studies. The application of quantum methods in areas other than physics is called the quantum-like paradigm, meaning that such approaches may not be related to physical processes but rather correspond to data modeling by methods designed for operating under conditions of uncertainty. This review aims to draw attention to the possibilities of these methods of data modeling, which can enrich theoretical consideration and be useful for practical purposes in various sciences and applications. Full article
23 pages, 541 KiB  
Article
Incorporating Covariates into Measures of Surrogate Paradox Risk
by Fatema Shafie Khorassani, Jeremy M. G. Taylor, Niko Kaciroti and Michael R. Elliott
Stats 2023, 6(1), 322-344; https://doi.org/10.3390/stats6010020 - 17 Feb 2023
Cited by 3 | Viewed by 1640
Abstract
Clinical trials often collect intermediate or surrogate endpoints other than their true endpoint of interest. It is important that the treatment effect on the surrogate endpoint accurately predicts the treatment effect on the true endpoint. There are settings in which the proposed surrogate endpoint is positively correlated with the true endpoint, but the treatment has opposite effects on the surrogate and true endpoints, a phenomenon labeled “surrogate paradox”. Covariate information may be useful in predicting an individual’s risk of surrogate paradox. In this work, we propose methods for incorporating covariates into measures of assessing the risk of surrogate paradox using the meta-analytic causal association framework. The measures calculate the probability that a treatment will have opposite effects on the surrogate and true endpoints and determine the size of a positive treatment effect on the surrogate endpoint that would reduce the risk of a negative treatment effect on the true endpoint as a function of covariates, allowing the effects of covariates on the surrogate and true endpoint to vary across trials. Full article

10 pages, 281 KiB  
Article
Panel Data Models for School Evaluation: The Case of High Schools’ Results in University Entrance Examinations
by Manuel Salas-Velasco
Stats 2023, 6(1), 312-321; https://doi.org/10.3390/stats6010019 - 13 Feb 2023
Viewed by 1864
Abstract
To what extent do high school students’ course grades align with their scores on standardized college admission tests? People sometimes make the argument that grades are “inflated”, but many school districts only use outcome-based descriptive methods for school evaluation. To answer that question, this paper proposes econometric models for panel data, which are less well known in educational evaluation. In particular, fixed-effects and random-effects models are proposed for assessing student performance in university entrance examinations. School-level panel data analysis makes it possible to determine whether results in college admission tests vary more between high schools than within a high school across academic years. Another advantage of using panel data is the ability to control for school-specific unobserved heterogeneity. For empirical implementation, official transcript data and university entrance test scores of Spanish secondary schools are used. Full article
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)
19 pages, 1538 KiB  
Article
A New Soft-Clipping Discrete Beta GARCH Model and Its Application on Measles Infection
by Huaping Chen
Stats 2023, 6(1), 293-311; https://doi.org/10.3390/stats6010018 - 9 Feb 2023
Cited by 1 | Viewed by 1425
Abstract
In this paper, we develop a novel soft-clipping discrete beta GARCH (ScDBGARCH) model for bounded time series with under-dispersion, equi-dispersion or over-dispersion. The new model allows not only positive dependence but also negative dependence. The stochastic properties of the model are established, and these results are, in turn, used in the analysis of the asymptotic properties of the conditional maximum likelihood (CML) estimator of the new model. In addition, we apply the new model to measles infection data to show its improved performance. Full article

14 pages, 494 KiB  
Article
A Class of Enhanced Nonparametric Control Schemes Based on Order Statistics and Runs
by Nikolaos I. Panayiotou and Ioannis S. Triantafyllou
Stats 2023, 6(1), 279-292; https://doi.org/10.3390/stats6010017 - 8 Feb 2023
Cited by 2 | Viewed by 1320
Abstract
In this article, we establish a new class of nonparametric Shewhart-type control charts based on order statistics with signaling runs-type rules. The proposed charts allow the practitioner to get as close as possible to a pre-specified level of performance by choosing their design parameters appropriately. Special monitoring schemes already established in the literature are shown to be members of the proposed class. In addition, several new nonparametric control charts that belong to the family are introduced and studied in some detail. Exact formulae for the variance of the run length distribution and the average run length (ARL) for the proposed monitoring schemes are also derived. A numerical investigation demonstrates that the proposed schemes achieve competitive performance in detecting a shift in the underlying distribution. Although the large number of design parameters is quite hard to handle, the numerical results presented throughout the manuscript provide practical guidance for the implementation of the proposed charts. Full article
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)

11 pages, 3767 KiB  
Article
Point Cloud Registration via Heuristic Reward Reinforcement Learning
by Bingren Chen
Stats 2023, 6(1), 268-278; https://doi.org/10.3390/stats6010016 - 6 Feb 2023
Cited by 1 | Viewed by 1761
Abstract
This paper proposes a heuristic reward reinforcement learning framework for point cloud registration. As an essential step of many 3D computer vision tasks such as object recognition and 3D reconstruction, point cloud registration has been well studied in the existing literature. This paper contributes to the literature by addressing the limitations of embedding and reward functions in existing methods. An improved state-embedding module and a stochastic reward function are proposed. While the embedding module enriches the captured characteristics of states, the newly designed reward function follows a time-dependent searching strategy, which allows aggressive attempts at the beginning and tends to be conservative in the end. We assess our method based on two public datasets (ModelNet40 and ScanObjectNN) and real-world data. The results confirm the strength of the new method in reducing errors in object rotation and translation, leading to more precise point cloud registration. Full article
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)

15 pages, 920 KiB  
Article
Farlie–Gumbel–Morgenstern Bivariate Moment Exponential Distribution and Its Inferences Based on Concomitants of Order Statistics
by Sasikumar Padmini Arun, Christophe Chesneau, Radhakumari Maya and Muhammed Rasheed Irshad
Stats 2023, 6(1), 253-267; https://doi.org/10.3390/stats6010015 - 3 Feb 2023
Cited by 3 | Viewed by 1597
Abstract
In this research, we design the Farlie–Gumbel–Morgenstern bivariate moment exponential distribution, a bivariate analogue of the moment exponential distribution, using the Farlie–Gumbel–Morgenstern approach. With the analysis of real-life data, the competitiveness of the Farlie–Gumbel–Morgenstern bivariate moment exponential distribution in comparison with the other Farlie–Gumbel–Morgenstern distributions is discussed. Based on the Farlie–Gumbel–Morgenstern bivariate moment exponential distribution, we develop the distribution theory of concomitants of order statistics and derive the best linear unbiased estimator of the parameter associated with the variable of primary interest (study variable). Evaluations are also conducted regarding the efficiency comparison of the best linear unbiased estimator relative to the respective unbiased estimator. Additionally, empirical illustrations of the best linear unbiased estimator with respect to the unbiased estimator are performed. Full article
(This article belongs to the Special Issue Novel Semiparametric Methods)

21 pages, 5159 KiB  
Article
A New Class of Alternative Bivariate Kumaraswamy-Type Models: Properties and Applications
by Indranil Ghosh
Stats 2023, 6(1), 232-252; https://doi.org/10.3390/stats6010014 - 30 Jan 2023
Cited by 1 | Viewed by 1394
Abstract
In this article, we introduce two new bivariate Kumaraswamy (KW)-type distributions with univariate Kumaraswamy marginals (under certain parametric restrictions) that are less restrictive in nature compared with several other existing bivariate beta and beta-type distributions. Mathematical expressions for the joint and marginal density functions are presented, and properties such as the marginal and conditional distributions, product moments and conditional moments are obtained. Additionally, we show that both proposed bivariate probability models are positive likelihood ratio dependent, which makes them potential models for fitting positively dependent data in the bivariate domain. The method of maximum likelihood and the method of moments are used to derive the associated estimation procedure. An acceptance and rejection sampling plan to draw random samples from one of the proposed models along with a simulation study are also provided. For illustrative purposes, two real data sets from different domains are reanalyzed to exhibit the applicability of the proposed models in comparison with several other bivariate probability distributions, which are defined on [0,1]×[0,1]. Full article
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)

23 pages, 489 KiB  
Article
Bayesian Logistic Regression Model for Sub-Areas
by Lu Chen and Balgobin Nandram
Stats 2023, 6(1), 209-231; https://doi.org/10.3390/stats6010013 - 29 Jan 2023
Viewed by 1553
Abstract
Many population-based surveys have binary responses from a large number of individuals in each household within small areas. One example is the Nepal Living Standards Survey (NLSS II), in which health status binary data (good versus poor) for each individual from sampled households (sub-areas) are available in the sampled wards (small areas). To make an inference for the finite population proportion of individuals in each household, we use the sub-area logistic regression model with reliable auxiliary information. The contribution of this model is twofold. First, we extend an area-level model to a sub-area level model. Second, because there are numerous sub-areas, standard Markov chain Monte Carlo (MCMC) methods to find the joint posterior density are very time-consuming. Therefore, we provide a sampling-based method, the integrated nested normal approximation (INNA), which permits fast computation. Our main goal is to describe this hierarchical Bayesian logistic regression model and to show that the computation is much faster than the exact MCMC method and also reasonably accurate. The performance of our method is studied by using NLSS II data. Our model can borrow strength from both areas and sub-areas to obtain more efficient and precise estimates. The hierarchical structure of our model captures the variation in the binary data reasonably well. Full article

17 pages, 478 KiB  
Article
Comparing Robust Linking and Regularized Estimation for Linking Two Groups in the 1PL and 2PL Models in the Presence of Sparse Uniform Differential Item Functioning
by Alexander Robitzsch
Stats 2023, 6(1), 192-208; https://doi.org/10.3390/stats6010012 - 25 Jan 2023
Cited by 5 | Viewed by 1790
Abstract
In the social sciences, the performance of two groups is frequently compared based on a cognitive test involving binary items. Item response models are often utilized for comparing the two groups. However, the presence of differential item functioning (DIF) can impact group comparisons. In order to avoid biased estimates of group differences, appropriate statistical methods for handling differential item functioning are required. This article compares the performance of regularized estimation and several robust linking approaches in three simulation studies that address the one-parameter logistic (1PL) and two-parameter logistic (2PL) models. The results show that robust linking approaches are at least as effective as the regularized estimation approach in most of the conditions in the simulation studies. Full article
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)

23 pages, 424 KiB  
Article
Informative g-Priors for Mixed Models
by Yu-Fang Chien, Haiming Zhou, Timothy Hanson and Theodore Lystig
Stats 2023, 6(1), 169-191; https://doi.org/10.3390/stats6010011 - 16 Jan 2023
Cited by 2 | Viewed by 1876
Abstract
Zellner’s objective g-prior has been widely used in linear regression models due to its simple interpretation and computational tractability in evaluating marginal likelihoods. However, the g-prior further allows partitioning the prior variability explained by the linear predictor versus that of pure noise. In this paper, we propose a novel yet remarkably simple g-prior specification when a subject matter expert has information on the marginal distribution of the response y_i. The approach is extended for use in mixed models with some surprising but intuitive results. Simulation studies are conducted to compare the model fitting under the proposed g-prior with that under other existing priors. Full article
(This article belongs to the Special Issue Bayes and Empirical Bayes Inference)
19 pages, 445 KiB  
Article
A Novel Flexible Class of Intervened Poisson Distribution by Lagrangian Approach
by Muhammed Rasheed Irshad, Mohanan Monisha, Christophe Chesneau, Radhakumari Maya and Damodaran Santhamani Shibu
Stats 2023, 6(1), 150-168; https://doi.org/10.3390/stats6010010 - 15 Jan 2023
Cited by 3 | Viewed by 1496
Abstract
The zero-truncated Poisson distribution (ZTPD) generates a statistical model that could be appropriate when observations begin once at least one event occurs. The intervened Poisson distribution (IPD) is a substitute for the ZTPD, in which some intervention processes may change the mean of the rare events. These two zero-truncated distributions exhibit underdispersion (i.e., their variance is less than their mean). In this research, we offer an alternative solution for dealing with intervention problems by proposing a generalization of the IPD by a Lagrangian approach called the Lagrangian intervened Poisson distribution (LIPD), which in fact generalizes both the ZTPD and the IPD. As a notable feature, it has the ability to analyze both overdispersed and underdispersed datasets. In addition, the LIPD has a closed-form expression of all of its statistical characteristics, as well as an increasing, decreasing, bathtub-shaped, and upside-down bathtub-shaped hazard rate function. A substantial part is devoted to its statistical application. The maximum likelihood estimation method is considered, and the effectiveness of the estimates is demonstrated through a simulation study. To evaluate the significance of the new parameter in the LIPD, a generalized likelihood ratio test is performed. Subsequently, we present a new count regression model that is suitable for both overdispersed and underdispersed datasets using the mean-parametrized form of the LIPD. Additionally, the LIPD’s relevance and application are shown using real-world datasets. Full article
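The underdispersion of the ZTPD noted in this abstract can be checked from the standard textbook formulas. The following is only an illustrative sketch; the symbols λ (rate of the untruncated Poisson) and μ (mean of the ZTPD) are notation chosen here and are not taken from the article.

```latex
\[
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!\,\bigl(1 - e^{-\lambda}\bigr)}, \qquad k = 1, 2, \ldots
\]
\[
\mu = \mathrm{E}[X] = \frac{\lambda}{1 - e^{-\lambda}}, \qquad
\mathrm{E}[X^{2}] = \frac{\lambda + \lambda^{2}}{1 - e^{-\lambda}} = \mu\,(1 + \lambda),
\]
\[
\operatorname{Var}(X) = \mu\,(1 + \lambda) - \mu^{2} = \mu\,(1 + \lambda - \mu) < \mu
\quad \text{because } \mu > \lambda .
\]
```

Since the factor 1 + λ − μ is strictly below 1 for every λ > 0, the variance is always smaller than the mean, which is the underdispersion the abstract refers to.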
2 pages, 182 KiB  
Editorial
Acknowledgment to the Reviewers of Stats in 2022
by Stats Editorial Office
Stats 2023, 6(1), 148-149; https://doi.org/10.3390/stats6010009 - 12 Jan 2023
Viewed by 1051
Abstract
High-quality academic publishing is built on rigorous peer review [...] Full article
17 pages, 1811 KiB  
Article
Statistical Prediction of Future Sports Records Based on Record Values
by Christina Empacher, Udo Kamps and Grigoriy Volovskiy
Stats 2023, 6(1), 131-147; https://doi.org/10.3390/stats6010008 - 11 Jan 2023
Cited by 3 | Viewed by 2582
Abstract
Point prediction of future record values based on sequences of previous lower or upper records is considered by means of the method of maximum product of spacings, where the underlying distribution is assumed to be a power function distribution and a Pareto distribution, respectively. Moreover, exact and approximate prediction intervals are discussed and compared with regard to their expected lengths and their percentages of coverage. The focus is on deriving explicit expressions in the point and interval prediction procedures. Predictions and forecasts are of interest, e.g., in sports analytics, which is gaining more and more attention in several sports disciplines. Previous works on forecasting athletic records have mainly been based on extreme value theory. The presented statistical prediction methods are exemplarily applied to data from various disciplines of athletics as well as to data from American football based on fantasy football points according to the points per reception scoring scheme. The results are discussed along with basic assumptions and the choice of underlying distributions. Full article
(This article belongs to the Section Statistical Methods)
18 pages, 3946 KiB  
Article
Change Point Detection by State Space Modeling of Long-Term Air Temperature Series in Europe
by Magda Monteiro and Marco Costa
Stats 2023, 6(1), 113-130; https://doi.org/10.3390/stats6010007 - 4 Jan 2023
Cited by 2 | Viewed by 1936
Abstract
This work presents a statistical analysis of monthly average temperature time series in several European cities using a state space approach, which considers models with a deterministic seasonal component and a stochastic trend. Temperature rise rates in Europe seem to have increased in the last decades when compared with longer periods. Therefore, change point detection methods, both parametric and non-parametric, were applied to the standardized residuals of the state space models (or some other related component) in order to identify these possible changes in the monthly temperature rise rates. All of the methods used identified at least one change point in each of the temperature time series, particularly in the late 1980s or early 1990s. The differences in the average temperature trend are more evident in Eastern European cities than in Western Europe. The smoother-based t-test framework proposed in this work showed an advantage over the other methods, precisely because it considers the time correlation present in the time series. Moreover, this framework focuses the change point detection on the stochastic trend component. Full article
(This article belongs to the Special Issue Advances in State-Space Modeling of Time Series)
14 pages, 384 KiB  
Article
An ϵ-Greedy Multiarmed Bandit Approach to Markov Decision Processes
by Isa Muqattash and Jiaqiao Hu
Stats 2023, 6(1), 99-112; https://doi.org/10.3390/stats6010006 - 1 Jan 2023
Cited by 1 | Viewed by 1508
Abstract
We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ-greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimation of the “reward-to-go” value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2^10 states. The results show that REGA and RASA achieved similar performance. Moreover, REGA and RASA empirically outperformed an implementation of the algorithm that uses the “original” ϵ-greedy algorithm that commonly appears in the literature. Full article
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)
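For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of the plain ϵ-greedy multiarmed bandit rule. It is not REGA and not the variant used in the article; the function names, the Bernoulli arms, and the parameter values are hypothetical choices made only for illustration.

```python
import random


def epsilon_greedy(reward_fns, n_rounds, epsilon=0.1, seed=0):
    """Run plain epsilon-greedy on a stochastic bandit.

    reward_fns: list of callables, each taking a random.Random and
    returning a numeric reward (hypothetical interface for this sketch).
    Returns (estimates, counts) per arm.
    """
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: uniformly random arm
        else:
            # exploit: arm with the highest current estimate
            arm = max(range(k), key=lambda a: estimates[a])
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        # incremental update of the sample mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


def bernoulli(p):
    """Hypothetical arm: reward 1.0 with probability p, else 0.0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


arms = [bernoulli(0.2), bernoulli(0.5), bernoulli(0.8)]
estimates, counts = epsilon_greedy(arms, n_rounds=5000, epsilon=0.1, seed=0)
print(counts, [round(e, 2) for e in estimates])
```

With a fixed exploration rate, the rule keeps sampling every arm occasionally while spending most pulls on the arm that currently looks best. According to the abstract, REGA applies a variant of this idea recursively at each stage of the MDP.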
32 pages, 5018 KiB  
Article
Applying the Multilevel Approach in Estimation of Income Population Differences
by Venera Timiryanova, Dina Krasnoselskaya and Natalia Kuzminykh
Stats 2023, 6(1), 67-98; https://doi.org/10.3390/stats6010005 - 29 Dec 2022
Viewed by 1711
Abstract
Income inequality remains one of the most burning issues discussed in the world. The difficulty of the problem arises from its multiple manifestations at regional and local levels and unique patterns within countries. This paper employs a multilevel approach to identify factors that influence income and wage inequalities at regional and municipal scales in Russia. We carried out the study on data from 2017 municipalities of 75 Russian regions from 2015 to 2019. A Hierarchical Linear Model with Cross-Classified Random Effects (HLMHCM) allowed us to establish that most of the total variance in population income and average wages is attributable to the regional scale. Our analysis revealed different variances of income per capita and average wage; we disclosed the reasons for these disparities. We also found a mixed relationship between income inequality and social transfers. These variables influence income growth but change the relationship between income and labour productivity. Our study underlined that the impacts of shares of employees in agriculture and manufacturing should be considered together with labour productivity in these industries. Full article
17 pages, 745 KiB  
Article
Do Deep Reinforcement Learning Agents Model Intentions?
by Tambet Matiisen, Aqeel Labash, Daniel Majoral, Jaan Aru and Raul Vicente
Stats 2023, 6(1), 50-66; https://doi.org/10.3390/stats6010004 - 28 Dec 2022
Cited by 1 | Viewed by 1943
Abstract
Inferring other agents’ mental states, such as their knowledge, beliefs and intentions, is thought to be essential for effective interactions with other agents. Recently, multi-agent systems trained via deep reinforcement learning have been shown to succeed in solving various tasks. Still, how each agent models or represents other agents in their environment remains unclear. In this work, we test whether deep reinforcement learning agents trained with the multi-agent deep deterministic policy gradient (MADDPG) algorithm explicitly represent other agents’ intentions (their specific aims or plans) during a task in which the agents have to coordinate the covering of different spots in a 2D environment. In particular, we tracked over time the performance of a linear decoder trained to predict the final targets of all agents from the hidden-layer activations of each agent’s neural network controller. We observed that the hidden layers of agents represented explicit information about other agents’ intentions, i.e., the target landmark the other agent ended up covering. We also performed a series of experiments in which some agents were replaced by others with fixed targets to test the levels of generalization of the trained agents. We noticed that during the training phase, the agents developed a preference for each landmark, which hindered generalization. To alleviate the above problem, we evaluated simple changes to the MADDPG training algorithm which lead to better generalization against unseen agents. Our method for confirming intention modeling in deep learning agents is simple to implement and can be used to improve the generalization of multi-agent systems in fields such as robotics, autonomous vehicles and smart cities. Full article
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)
20 pages, 373 KiB  
Article
Estimating Smoothness and Optimal Bandwidth for Probability Density Functions
by Dimitris N. Politis, Peter F. Tarassenko and Vyacheslav A. Vasiliev
Stats 2023, 6(1), 30-49; https://doi.org/10.3390/stats6010003 - 27 Dec 2022
Viewed by 1518
Abstract
The properties of non-parametric kernel estimators for probability density functions from two special classes are investigated. Each class is parametrized with a distribution smoothness parameter. One of the classes was introduced by Rosenblatt; the other is introduced in this paper. For the case of a known smoothness parameter, the rates of mean square convergence of optimal (with respect to the bandwidth) density estimators are found. For the case of an unknown smoothness parameter, an estimation procedure for the parameter is developed and its almost sure convergence is proved. The convergence rates in the almost sure sense of these estimators are obtained. Adaptive estimators of densities from the given class on the basis of the constructed smoothness parameter estimators are presented. It is shown in examples how parameters of the adaptive density estimation procedures can be chosen. Non-asymptotic and asymptotic properties of these estimators are investigated. Specifically, the upper bounds for the mean square error of the adaptive density estimators for a fixed sample size are found and their strong consistency is proved. The convergence of these estimators in the almost sure sense is established. Simulation results illustrate the realization of the asymptotic behavior when the sample size grows large. Full article
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)
13 pages, 327 KiB  
Communication
Data Cloning Estimation and Identification of a Medium-Scale DSGE Model
by Pedro Chaim and Márcio Poletti Laurini
Stats 2023, 6(1), 17-29; https://doi.org/10.3390/stats6010002 - 24 Dec 2022
Cited by 1 | Viewed by 1444
Abstract
We apply the data cloning method to estimate a medium-scale dynamic stochastic general equilibrium (DSGE) model. The data cloning algorithm is a numerical method that employs replicas of the original sample to approximate the maximum likelihood estimator as the limit of Bayesian simulation-based estimators. We also analyze the identification properties of the model. We measure the individual identification strength of each parameter by observing the posterior volatility of data cloning estimates and assess the identification problem globally through the maximum eigenvalue of the posterior data cloning covariance matrix. Our results corroborate existing evidence suggesting that the DSGE model of Smets and Wouters is only poorly identified. The model displays weak global identification properties, and many of its parameters seem locally ill-identified. Full article
(This article belongs to the Section Econometric Modelling)
16 pages, 410 KiB  
Article
A Semiparametric Tilt Optimality Model
by Chathurangi H. Pathiravasan and Bhaskar Bhattacharya
Stats 2023, 6(1), 1-16; https://doi.org/10.3390/stats6010001 - 22 Dec 2022
Viewed by 1279
Abstract
Practitioners often face the situation of comparing any set of k distributions, which may follow neither normality nor equality of variances. We propose a semiparametric model to compare those distributions using an exponential tilt method. This extends the classical analysis of variance models when all distributions are unknown by relaxing its assumptions. The proposed model is optimal when one of the distributions is known. Large-sample estimates of the model parameters are derived, and the hypotheses for the equality of the distributions are tested for one-at-a-time and simultaneous comparison cases. Real data examples from NASA meteorology experiments and social credit card limits are analyzed to illustrate our approach. The proposed approach is shown to be preferable in a simulated power comparison with existing parametric and nonparametric methods. Full article
(This article belongs to the Section Statistical Methods)