Search Results (202)

Search Parameters:
Journal = Stats

Article
A Novel Generalization of Zero-Truncated Binomial Distribution by Lagrangian Approach with Applications for the COVID-19 Pandemic
Stats 2022, 5(4), 1004-1028; https://doi.org/10.3390/stats5040060 - 30 Oct 2022
Abstract
The importance of Lagrangian distributions and their applicability in real-world events have been highlighted in several studies. In light of this, we create a new zero-truncated Lagrangian distribution. It is presented as a generalization of the zero-truncated binomial distribution (ZTBD) and hence named the Lagrangian zero-truncated binomial distribution (LZTBD). The moments, probability generating function, factorial moments, as well as skewness and kurtosis measures of the LZTBD are discussed. We also show that the new model’s finite mixture is identifiable. The unknown parameters of the LZTBD are estimated using the maximum likelihood method. A broad simulation study is carried out to evaluate the performance of the maximum likelihood estimates. The likelihood ratio test is used to assess the significance of the third parameter in the new model. Six COVID-19 datasets are used to demonstrate the LZTBD’s applicability, and we conclude that the LZTBD is highly competitive in terms of goodness of fit.
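Since the LZTBD’s pmf is not reproduced in this listing, the following minimal Python sketch illustrates only the baseline the paper generalizes: maximum likelihood estimation for the ordinary zero-truncated binomial distribution (ZTBD). All parameter values are illustrative assumptions.

```python
# A minimal sketch of ML estimation for the zero-truncated binomial
# distribution (ZTBD), the special case that the LZTBD generalizes.
# The LZTBD pmf itself is not given in the abstract, so only the ZTBD
# baseline is illustrated here; n and p_true are illustrative.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import binom

rng = np.random.default_rng(1)
n = 10                                   # known number of trials
p_true = 0.3

# Simulate ZTBD data by rejection: draw binomials, discard zeros.
draws = rng.binomial(n, p_true, size=5000)
x = draws[draws > 0]

def neg_loglik(p):
    # ZTBD log pmf: binomial log pmf minus log(1 - (1-p)^n)
    return -(binom.logpmf(x, n, p) - np.log1p(-(1 - p) ** n)).sum()

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(f"true p = {p_true}, ML estimate = {res.x:.4f}")
```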
Article
Comparison of Positivity in Two Epidemic Waves of COVID-19 in Colombia with FDA
Stats 2022, 5(4), 993-1003; https://doi.org/10.3390/stats5040059 - 28 Oct 2022
Abstract
We use the functional data methodology to examine whether there are significant differences between two waves of contagion by COVID-19 in Colombia between 7 July 2020 and 20 July 2021. A pointwise functional t-test is used first; then, an alternative statistical test for paired samples is proposed, which has a theoretical distribution and performs well in small samples. As an advantage, our statistical test generates a single scalar p-value that provides a global assessment of the significance of the differences between the positivity curves, complementing the existing pointwise tests.
(This article belongs to the Section Applied Stochastic Models)
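As a rough illustration of the pointwise approach the abstract mentions (not the authors’ implementation), the sketch below runs a paired t-test at every point of a common time grid for two samples of synthetic curves.

```python
# A minimal sketch of a pointwise paired t-test on functional data:
# two samples of curves observed on a shared grid, tested point by point.
# The curves are synthetic, not the Colombian positivity data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
t_grid = np.linspace(0, 1, 100)          # common evaluation grid
n_curves = 15

# Synthetic paired curves: wave 2 is shifted upward in the middle of the grid.
wave1 = np.sin(2 * np.pi * t_grid) + rng.normal(0, 0.3, (n_curves, t_grid.size))
wave2 = wave1 + 0.5 * np.exp(-((t_grid - 0.5) ** 2) / 0.02) \
        + rng.normal(0, 0.3, (n_curves, t_grid.size))

# A paired t-test at every grid point yields a curve of p-values.
t_stat, p_val = stats.ttest_rel(wave1, wave2, axis=0)
print("fraction of grid points with p < 0.05:", np.mean(p_val < 0.05))
```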
Article
Snooker Statistics and Zipf’s Law
Stats 2022, 5(4), 985-992; https://doi.org/10.3390/stats5040058 - 21 Oct 2022
Abstract
Zipf’s law is well known in linguistics: the frequency of a word is inversely proportional to its rank. This is a special case of a more general power law, a common phenomenon in many kinds of real-world statistical data. Here, it is shown that snooker statistics also follow such a mathematical pattern, but with varying parameter values. Two types of rankings (prize money earned and centuries scored) and three different time frames (all-time, decade, and year) are considered. The results indicate that the power law parameter values depend on the type of ranking used as well as the time frame considered. Furthermore, in some cases, the resulting parameter values vary significantly over time, for which a plausible explanation is provided. Finally, it is shown that individual rankings can be described somewhat more accurately using a log-normal distribution, but that the overall conclusions derived from the power law analysis remain valid.
(This article belongs to the Section Data Science)
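For readers who want to try this on ranking data of their own, here is a minimal sketch of the standard power-law fit: a least-squares regression of log(value) on log(rank), so that value ≈ C / rank^α and α = 1 recovers Zipf’s law. The data below are synthetic, not the snooker rankings.

```python
# A minimal sketch of a power-law fit in log-log space:
# log(value) = log(C) - alpha * log(rank); alpha = 1 is Zipf's law.
import numpy as np

rng = np.random.default_rng(42)
ranks = np.arange(1, 101)
alpha_true = 1.2
values = 1e6 * ranks ** (-alpha_true) * rng.lognormal(0, 0.1, ranks.size)

# Least-squares fit of the exponent on the log-log scale.
slope, intercept = np.polyfit(np.log(ranks), np.log(values), 1)
print(f"estimated alpha = {-slope:.3f} (true {alpha_true})")
```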
Article
Extreme Tail Ratios and Overrepresentation among Subpopulations with Normal Distributions
Stats 2022, 5(4), 977-984; https://doi.org/10.3390/stats5040057 - 20 Oct 2022
Abstract
Given several different populations, the relative proportions of each in the high (or low) end of the distribution of a given characteristic are often more important than the overall average values or standard deviations. In the case of two different normally distributed random variables, as is shown here, one of the (right) tail ratios will not only exceed 1 from some point on but will even become infinitely large. More generally, in every finite mixture of different normal distributions, there is always exactly one distribution that is not only overrepresented in the right tail of the mixture but even completely overwhelms all other subpopulations in the rightmost tails. This property (and the analogous result for the left tails), although not unique to normal distributions, is not shared by other common continuous centrally symmetric unimodal distributions, such as the Laplace distribution, nor even by other bell-shaped distributions, such as the Cauchy (Lorentz) distribution.
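The headline result is easy to verify numerically. The sketch below (illustrative parameters only) computes the ratio of right-tail probabilities for two normal distributions with slightly different scales; the ratio exceeds 1 and keeps growing as the cutoff increases.

```python
# A minimal numeric illustration: for two normals with different standard
# deviations, the ratio of right-tail probabilities grows without bound.
from scipy.stats import norm

pop_a = norm(loc=0.0, scale=1.1)   # slightly larger spread
pop_b = norm(loc=0.0, scale=1.0)

for x in [2, 4, 6, 8, 10]:
    ratio = pop_a.sf(x) / pop_b.sf(x)   # sf = survival function = P(X > x)
    print(f"x = {x:2d}: tail ratio = {ratio:.3e}")
```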
Communication
Ordinal Cochran-Mantel-Haenszel Testing and Nonparametric Analysis of Variance: Competing Methodologies
Stats 2022, 5(4), 970-976; https://doi.org/10.3390/stats5040056 - 17 Oct 2022
Abstract
The Cochran-Mantel-Haenszel (CMH) and nonparametric analysis of variance (NP ANOVA) methodologies are both sets of tests for categorical response data. The latter are competitor tests for the ordinal CMH tests in which the response variable is necessarily ordinal; the treatment variable may be either ordinal or nominal. The CMH mean score test seeks to detect mean treatment differences, while the CMH correlation test assesses ordinary or (1, 1) generalized correlation. Since the corresponding nonparametric ANOVA tests assess arbitrary univariate and bivariate moments, the ordinal CMH tests have been extended to enable a fuller comparison. The CMH tests are conditional tests, assuming that certain marginal totals in the data table are known. They have been extended to have unconditional analogues. The NP ANOVA tests are unconditional. Here, we give a brief overview of both methodologies to address the question “which methodology is preferable?”.
(This article belongs to the Section Statistical Methods)
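For the single-stratum case, the ordinary (1, 1) CMH correlation statistic has the well-known closed form M² = (N − 1)r², asymptotically χ² with 1 degree of freedom, where r is the Pearson correlation between the treatment and response scores. A minimal sketch with illustrative counts and scores:

```python
# A minimal sketch of the single-stratum CMH correlation statistic:
# M^2 = (N - 1) * r^2, asymptotically chi-squared with 1 df, where r is
# the Pearson correlation of treatment and response scores.
# The table and scores below are illustrative, not from the paper.
import numpy as np
from scipy.stats import chi2, pearsonr

# Rows: ordinal treatments; columns: ordinal responses; cells: counts.
table = np.array([[20, 15,  5],
                  [12, 18, 10],
                  [ 5, 14, 21]])
row_scores = np.array([1, 2, 3])
col_scores = np.array([1, 2, 3])

# Expand the counts into per-observation score pairs to compute r.
rows, cols = np.indices(table.shape)
x = np.repeat(row_scores[rows.ravel()], table.ravel())
y = np.repeat(col_scores[cols.ravel()], table.ravel())

n = table.sum()
r, _ = pearsonr(x, y)
m2 = (n - 1) * r ** 2
print(f"M^2 = {m2:.3f}, p = {chi2.sf(m2, df=1):.4f}")
```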
Article
On the Bivariate Composite Gumbel–Pareto Distribution
Stats 2022, 5(4), 948-969; https://doi.org/10.3390/stats5040055 - 16 Oct 2022
Abstract
In this paper, we propose a bivariate extension of univariate composite (two-spliced) distributions defined by a bivariate Pareto distribution for values larger than some thresholds and by a bivariate Gumbel distribution on the complementary domain. The purpose of this distribution is to capture the behavior of bivariate data consisting of mainly small and medium values but also of some extreme values. Some properties of the proposed distribution are presented. Further, two estimation procedures are discussed and illustrated on simulated data and on a real data set consisting of a bivariate sample of claims from an auto insurance portfolio. In addition, the risk of loss in this insurance portfolio is estimated by Monte Carlo simulation.
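The bivariate construction is not reproduced in this listing, but the univariate two-spliced idea is easy to sketch: sample from a Gumbel body truncated at a threshold with probability w, and from a Pareto tail above the threshold otherwise. All parameter values below are illustrative assumptions, not values from the paper.

```python
# A minimal univariate sketch of a composite (two-spliced) distribution:
# Gumbel body below a threshold theta, Pareto tail above it.
# theta, w, and all distribution parameters are illustrative choices.
import numpy as np
from scipy.stats import gumbel_r, pareto

rng = np.random.default_rng(7)
theta, w = 5.0, 0.9        # splicing threshold and body weight
size = 10_000

body = gumbel_r(loc=2.0, scale=1.0)     # body distribution
tail = pareto(b=2.5, scale=theta)       # Pareto tail with support x >= theta

u = rng.uniform(size=size)
samples = np.where(
    u < w,
    # Gumbel truncated to (-inf, theta] via the inverse CDF of the truncated law
    body.ppf(rng.uniform(size=size) * body.cdf(theta)),
    tail.rvs(size=size, random_state=rng),
)
print("share of samples above theta:", np.mean(samples > theta))
```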
Article
Benford Networks
Stats 2022, 5(4), 934-947; https://doi.org/10.3390/stats5040054 - 30 Sep 2022
Abstract
Benford’s law applied within complex networks is an interesting area of research. This paper proposes a new algorithm for the generation of a Benford network based on priority rank and further specifies the formal definition. The condition to be taken into account is the probability density of the node degree. In addition to this first algorithm, an iterative algorithm based on rewiring is proposed. Its development requires the introduction of an ad hoc measure for understanding how far an arbitrary network is from a Benford network. The measure is a semi-distance rather than a distance in the mathematical sense, serving instead to identify Benford networks as a class. The semi-distance is a function of the network; it is computationally less expensive than the degree of conformity and is used to set a descent condition for the rewiring. The algorithm stops when either the network is Benford or the maximum number of iterations is reached. The second condition is needed because only a limited set of degree densities allow for a Benford network. Another important topic is assortativity and the extremes that can be achieved by constraining the network topology; for this reason, we ran simulations on artificial networks and explored further theoretical settings as preliminary work on models of preferential attachment. Based on our extensive analysis, the first proposed algorithm remains the best one from a computational point of view.
(This article belongs to the Special Issue Benford's Law(s) and Applications)
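The paper’s semi-distance is not spelled out in the abstract; as a plausible stand-in, the sketch below measures the L1 discrepancy between the first-digit distribution of a network’s node degrees and the Benford probabilities log10(1 + 1/d).

```python
# A minimal sketch of a Benford discrepancy for a network's degree sequence.
# This L1-type measure is only a plausible stand-in; the paper's own
# semi-distance is not reproduced in the abstract.
import numpy as np
import networkx as nx

def first_digit(values):
    # First significant (decimal) digit of each positive integer.
    return np.array([int(str(v)[0]) for v in values if v > 0])

def benford_discrepancy(graph):
    digits = first_digit([d for _, d in graph.degree()])
    emp = np.bincount(digits, minlength=10)[1:10] / digits.size
    benford = np.log10(1 + 1 / np.arange(1, 10))
    return np.abs(emp - benford).sum()

g = nx.barabasi_albert_graph(5000, 2, seed=3)   # heavy-tailed degree sequence
print("discrepancy from Benford:", round(benford_discrepancy(g), 4))
```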
Article
Robust Permutation Tests for Penalized Splines
Stats 2022, 5(3), 916-933; https://doi.org/10.3390/stats5030053 - 16 Sep 2022
Abstract
Penalized splines are frequently used in applied research for understanding functional relationships between variables. In most applications, statistical inference for penalized splines is conducted using the random effects or Bayesian interpretation of a smoothing spline. These interpretations can be used to assess the uncertainty of the fitted values and the estimated component functions. However, statistical tests about the nature of the function are more difficult, because such tests often involve testing a null hypothesis that a variance component is equal to zero. Furthermore, valid statistical inference using the random effects or Bayesian interpretation depends on the validity of the utilized parametric assumptions. To overcome these limitations, I propose a flexible and robust permutation testing framework for inference with penalized splines. The proposed approach can be used to test omnibus hypotheses about functional relationships, as well as more flexible hypotheses about conditional relationships. I establish the conditions under which the methods will produce exact results, as well as the asymptotic behavior of the various permutation tests. Additionally, I present extensive simulation results to demonstrate the robustness and superiority of the proposed approach compared to commonly used methods.
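A minimal sketch of the omnibus case, not the paper’s specific framework: under the null of no functional relationship, the pairing of x and y is exchangeable, so permuting y and refitting the spline yields a null distribution for a fit statistic such as R². The smoothing level below is a fixed, illustrative choice.

```python
# A minimal sketch of an omnibus permutation test with a smoothing spline:
# permute y, refit, and compare the observed R^2 with its null distribution.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(11)
n = 100
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(3 * x) + rng.normal(0, 0.4, n)

def r_squared(xv, yv):
    fit = UnivariateSpline(xv, yv, s=n * 0.2)   # fixed smoothing for the sketch
    resid = yv - fit(xv)
    return 1 - resid.var() / yv.var()

observed = r_squared(x, y)
perm = np.array([r_squared(x, rng.permutation(y)) for _ in range(999)])
p_value = (1 + np.sum(perm >= observed)) / (1 + perm.size)
print(f"observed R^2 = {observed:.3f}, permutation p = {p_value:.4f}")
```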
Article
Smoothing County-Level Sampling Variances to Improve Small Area Models’ Outputs
Stats 2022, 5(3), 898-915; https://doi.org/10.3390/stats5030052 - 11 Sep 2022
Abstract
The use of hierarchical Bayesian small area models, which take survey estimates along with auxiliary data as input to produce official statistics, has increased in recent years. Survey estimates for small domains are usually unreliable due to small sample sizes, and the corresponding sampling variances can also be imprecise and unreliable. This affects the performance of the model (i.e., the model will not produce an estimate or will produce a low-quality modeled estimate), which results in a reduced number of official statistics published by a government agency. To mitigate the unreliable sampling variances, these survey-estimated variances are typically modeled against the direct estimates wherever a relationship between the two is present. However, this is not always the case. This paper explores different alternatives to mitigate the unreliable (beyond some threshold) sampling variances. A Bayesian approach under the area-level model set-up and a distribution-free technique based on bootstrap sampling are proposed to update the survey data. An application to the county-level corn yield data from the County Agricultural Production Survey of the United States Department of Agriculture’s (USDA’s) National Agricultural Statistics Service (NASS) is used to illustrate the proposed approaches. The final county-level model-based estimates for small area domains, produced based on updated survey data from each method, are compared with county-level model-based estimates produced based on the original survey data and the official statistics published in 2016.
(This article belongs to the Special Issue Small Area Estimation: Theories, Methods and Applications)
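The conventional practice the abstract refers to, modeling sampling variances against the direct estimates, can be sketched as a simple log-log regression (a generalized variance function). The data are synthetic, and the paper’s Bayesian and bootstrap alternatives are not reproduced here.

```python
# A minimal sketch of variance smoothing via a generalized variance function:
# fit log(variance) = a + b * log(direct estimate), then predict smoothed
# variances. Synthetic data; illustrative parameter values throughout.
import numpy as np

rng = np.random.default_rng(5)
n_domains = 200
direct = rng.lognormal(mean=8, sigma=1, size=n_domains)     # direct estimates
true_var = 0.5 * direct ** 1.4                              # underlying relation
est_var = true_var * rng.chisquare(10, n_domains) / 10      # noisy variances

b, a = np.polyfit(np.log(direct), np.log(est_var), 1)
smoothed = np.exp(a + b * np.log(direct))
print(f"fitted power b = {b:.3f} (true 1.4)")
```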
Project Report
Using Small Area Estimation to Produce Official Statistics
Stats 2022, 5(3), 881-897; https://doi.org/10.3390/stats5030051 - 8 Sep 2022
Abstract
The USDA National Agricultural Statistics Service (NASS) and other federal statistical agencies have used probability-based surveys as the foundation for official statistics for over half a century. Non-survey data that can be used to improve the accuracy and precision of estimates, such as administrative, remotely sensed, and retail data, have become increasingly available. Both frequentist and Bayesian models are used to combine survey and non-survey data in a principled manner. NASS has recently adopted Bayesian subarea models for three of its national programs: farm labor, crop county estimates, and cash rent county estimates. Each program provides valuable estimates at multiple scales of geography. For each program, technical challenges had to be met and a rigorous review completed before models could be adopted as the foundation for official statistics. Moving models out of the research phase into production required major changes in the production process and a cultural shift. With the implemented models, NASS now has measures of uncertainty, transparency, and reproducibility for its official statistics.
(This article belongs to the Special Issue Small Area Estimation: Theories, Methods and Applications)
Article
Modeling Realized Variance with Realized Quarticity
Stats 2022, 5(3), 856-880; https://doi.org/10.3390/stats5030050 - 7 Sep 2022
Abstract
This paper proposes a model for realized variance that exploits information in realized quarticity. The realized variance and quarticity measures are both highly persistent and highly correlated with each other. The proposed model incorporates information from the observed realized quarticity process via autoregressive conditional variance dynamics. It exploits conditional dependence in higher-order (fourth) moments, in analogy to how the class of GARCH models exploits conditional dependence in second moments.
(This article belongs to the Special Issue Modern Time Series Analysis)
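The two building blocks are standard: for a day with M intraday returns r_i, realized variance is RV = Σ r_i² and realized quarticity is RQ = (M/3) Σ r_i⁴. A minimal sketch on simulated returns:

```python
# A minimal sketch of daily realized variance and realized quarticity
# computed from M intraday returns per day. The returns are simulated,
# not market data; all parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n_days, M = 250, 78                      # e.g., 5-minute returns in a trading day
daily_vol = 0.01 * np.exp(rng.normal(0, 0.3, n_days))    # varying vol levels
r = rng.normal(0, daily_vol[:, None] / np.sqrt(M), (n_days, M))

rv = (r ** 2).sum(axis=1)                # realized variance, one value per day
rq = (M / 3) * (r ** 4).sum(axis=1)      # realized quarticity

print("corr(RV, RQ) =", round(np.corrcoef(rv, rq)[0, 1], 3))
```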
Article
A New Benford Test for Clustered Data with Applications to American Elections
Stats 2022, 5(3), 841-855; https://doi.org/10.3390/stats5030049 - 31 Aug 2022
Abstract
A frequent problem with classic first digit applications of Benford’s law is the law’s inapplicability to clustered data, which becomes especially problematic for analyzing election data. This study offers a novel adaptation of Benford’s law by performing a first digit analysis after converting vote counts from election data to base 3 (referred to throughout the paper as 1-BL 3), spreading out the data and thus rendering the law significantly more useful. We test the efficacy of our approach on synthetic election data generated using discrete Weibull modeling, finding that, in many cases, election data conform to 1-BL 3. Lastly, we apply 1-BL 3 analysis to selected states from the 2004 US Presidential election to detect potential statistical anomalies.
(This article belongs to the Special Issue Benford's Law(s) and Applications)
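A minimal sketch of the 1-BL 3 idea with synthetic counts (the continuous Weibull draws below only loosely mimic the paper’s discrete Weibull modeling): write each count in base 3, take the leading digit (1 or 2), and compare the digit frequencies with the base-3 Benford probabilities log₃(1 + 1/d).

```python
# A minimal sketch of a base-3 first-digit Benford test. In base 3 the
# leading digit is 1 or 2, with Benford probabilities log_3(1 + 1/d).
# Counts are synthetic, not real election data.
import math
import numpy as np
from scipy.stats import chisquare

def first_digit_base3(n):
    # Leading digit of positive integer n written in base 3.
    while n >= 3:
        n //= 3
    return n

rng = np.random.default_rng(13)
votes = rng.weibull(1.5, size=2000) * 800    # Weibull-style synthetic counts
counts = np.array([int(v) + 1 for v in votes])

digits = np.array([first_digit_base3(c) for c in counts])
observed = np.array([(digits == 1).sum(), (digits == 2).sum()])
benford3 = np.array([math.log(1 + 1 / d, 3) for d in (1, 2)])

stat, p = chisquare(observed, f_exp=benford3 * digits.size)
print(f"observed freq = {observed / digits.size}, chi2 p = {p:.3f}")
```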
Article
A New Bivariate INAR(1) Model with Time-Dependent Innovation Vectors
Stats 2022, 5(3), 819-840; https://doi.org/10.3390/stats5030048 - 19 Aug 2022
Abstract
Recently, there has been a growing interest in integer-valued time series models, especially multivariate ones. Motivated by the diversity of infinite-patch metapopulation models, we propose an extension to the popular bivariate INAR(1) model, whose innovation vector is assumed to be time-dependent in the sense that the mean of the innovation vector increases linearly with the previous population size. We discuss the stationarity and ergodicity of the observed process and its subprocesses. We consider the conditional maximum likelihood estimates of the parameters of interest and establish their large-sample properties. The finite sample performance of the estimators is assessed via simulations. Applications to crime data illustrate the model.
(This article belongs to the Section Time Series Analysis)
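A minimal simulation sketch of the model class as the abstract describes it: componentwise binomial thinning plus Poisson innovations whose mean increases linearly with the previous population size. All parameter values are illustrative assumptions.

```python
# A minimal simulation of a bivariate INAR(1) process with binomial
# thinning and time-dependent Poisson innovations whose mean grows
# linearly with the previous population size.
import numpy as np

rng = np.random.default_rng(21)
T = 500
alpha = np.array([0.4, 0.3])        # survival (thinning) probabilities
beta0 = np.array([2.0, 1.5])        # baseline innovation means
beta1 = np.array([0.05, 0.03])      # dependence on previous population size

x = np.zeros((T, 2), dtype=int)
x[0] = [5, 5]
for t in range(1, T):
    survivors = rng.binomial(x[t - 1], alpha)     # binomial thinning
    lam = beta0 + beta1 * x[t - 1].sum()          # time-dependent innovation mean
    x[t] = survivors + rng.poisson(lam)           # add new arrivals

print("sample means:", x.mean(axis=0))
print("sample cross-correlation:", np.corrcoef(x[:, 0], x[:, 1])[0, 1])
```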
Article
Deriving the Optimal Strategy for the Two Dice Pig Game via Reinforcement Learning
Stats 2022, 5(3), 805-818; https://doi.org/10.3390/stats5030047 - 17 Aug 2022
Abstract
Games of chance have historically played a critical role in the development and teaching of probability theory and game theory and, in the modern age, computer programming and reinforcement learning. In this paper, we derive the optimal strategy for playing the two-dice game Pig, both the standard version and its variant with doubles, coined “Double-Trouble”, using certain fundamental concepts of reinforcement learning, especially the Markov decision process and dynamic programming. We further compare the newly derived optimal strategy to other popular play strategies in terms of the winning chances and the order of play. In particular, we compare it to the popular “hold at n” strategy, which is considered to be close to the optimal strategy, especially for the best n, for each type of Pig game. For the standard two-player, two-dice, sequential Pig game examined here, we found that “hold at 23” is the best choice, with an average winning chance of 0.4747 against the optimal strategy. For the “Double-Trouble” version, we found that “hold at 18” is the best choice, with an average winning chance of 0.4733 against the optimal strategy. Furthermore, the number of turns needed to play each type of game is also examined for practical purposes. For optimal vs. optimal play, or optimal vs. the best “hold at n” strategy, we found that the average number of turns is 19, 23, and 24 for one-die Pig, standard two-dice Pig, and the “Double-Trouble” two-dice Pig, respectively. We hope our work will inspire students of all ages to invest in the field of reinforcement learning, which is crucial for the development of artificial intelligence and robotics and, subsequently, for the future of humanity.
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)
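The “hold at n” heuristic is easy to simulate. The sketch below assumes one common rule set for two-dice Pig (a single 1 forfeits the turn total, double 1s forfeit the entire score, first to 100 wins); it plays heuristic strategies head to head and is not the paper’s dynamic-programming derivation of the optimal policy.

```python
# A minimal Monte Carlo sketch of "hold at n" play in two-dice Pig,
# under one common rule set (assumed here, not taken from the paper).
import random

def turn(score, hold_at, target=100):
    """Play one turn with a hold-at-n policy; return the new score."""
    total = 0
    while score + total < target and total < hold_at:
        d1, d2 = random.randint(1, 6), random.randint(1, 6)
        if d1 == 1 and d2 == 1:
            return 0                 # double 1s: lose the whole score
        if d1 == 1 or d2 == 1:
            return score             # single 1: lose only the turn total
        total += d1 + d2
    return score + total

def play(hold_a, hold_b, target=100):
    """Return True if player A (moving first) wins."""
    a = b = 0
    while True:
        a = turn(a, hold_a, target)
        if a >= target:
            return True
        b = turn(b, hold_b, target)
        if b >= target:
            return False

games = 20_000
wins = sum(play(23, 20) for _ in range(games))
print(f"hold-at-23 (first mover) beats hold-at-20 in {wins / games:.3f} of games")
```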
Article
Autoregressive Models with Time-Dependent Coefficients—A Comparison between Several Approaches
Stats 2022, 5(3), 784-804; https://doi.org/10.3390/stats5030046 - 12 Aug 2022
Abstract
Autoregressive-moving average (ARMA) models with time-dependent (td) coefficients and marginally heteroscedastic innovations provide a natural alternative to stationary ARMA models. Several theories have been developed over the last 25 years for parametric estimation in that context. In this paper, we focus on time-dependent autoregressive (tdAR) models and consider one of the estimation theories in that case. We also provide an alternative theory for tdAR processes that relies on a ρ-mixing property. We compare the Dahlhaus theory for locally stationary processes and the Bibi and Francq theory, developed essentially for cyclically time-dependent models, with our own theory. Regarding the existing theories, there are differences in the basic assumptions (e.g., on differentiability with respect to time or with respect to parameters) that are best seen in specific cases such as the tdAR(1) process. There are also differences in terms of asymptotics, as shown by an example. Our opinion is that the field of application can play a role in choosing one of the theories. This paper is completed by simulation results showing that the asymptotic theory can be used even for short series (fewer than 50 observations).
(This article belongs to the Section Time Series Analysis)
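A minimal sketch of the simplest member of the model class, a tdAR(1) process whose coefficient is linear in rescaled time, X_t = (a0 + a1·t/n)X_{t-1} + e_t, fitted by linear least squares. The paper’s estimation theories and asymptotics are not reproduced here; all parameter values are illustrative.

```python
# A minimal sketch: simulate a tdAR(1) process with a coefficient that is
# linear in rescaled time, then recover (a0, a1) by least squares.
import numpy as np

rng = np.random.default_rng(4)
n = 500
a0, a1 = 0.3, 0.4               # coefficient moves from 0.3 toward 0.7
x = np.zeros(n)
for t in range(1, n):
    x[t] = (a0 + a1 * t / n) * x[t - 1] + rng.normal()

# Regress x_t on x_{t-1} and (t/n) * x_{t-1}.
u = np.arange(1, n) / n
X = np.column_stack([x[:-1], u * x[:-1]])
coef, *_ = np.linalg.lstsq(X, x[1:], rcond=None)
print(f"estimated a0 = {coef[0]:.3f}, a1 = {coef[1]:.3f} (true {a0}, {a1})")
```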