Search Results (202)

Search Parameters: Journal = Stats

Article
One-Parameter Weibull-Type Distribution, Its Relative Entropy with Respect to Weibull and a Fractional Two-Parameter Exponential Distribution
Stats 2019, 2(1), 34-54; https://doi.org/10.3390/stats2010004 - 21 Jan 2019
Cited by 3 | Viewed by 5180
Abstract
A new one-parameter distribution is presented with similar mathematical characteristics to the two-parameter conventional Weibull. It has an estimator that depends only on the sample mean. The relative entropy with respect to the Weibull distribution is derived in order to examine the level of similarity between them. The performance of the new distribution is compared to that of the Weibull and, in some cases, the Gamma distribution using real data. In addition, the Exponential distribution is modified to include an extra parameter via a simple transformation using fractional mathematics. It is shown that the modified version also exhibits Weibull characteristics for particular values of the second parameter.
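
The "relative entropy" referred to here is the Kullback–Leibler divergence. As a point of reference only — the paper's one-parameter density is not reproduced in the abstract — the general definition and the two-parameter Weibull reference density are:

```latex
% Kullback-Leibler divergence (relative entropy) of a density f
% with respect to a reference density g on (0, infinity):
D(f \,\|\, g) = \int_{0}^{\infty} f(x)\,\ln\frac{f(x)}{g(x)}\,dx

% Two-parameter Weibull density serving as the reference g:
g(x; k, \lambda) = \frac{k}{\lambda}\Big(\frac{x}{\lambda}\Big)^{k-1} e^{-(x/\lambda)^{k}},
\qquad x > 0,\; k, \lambda > 0
```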

Article
The Prediction of Batting Averages in Major League Baseball
Stats 2020, 3(2), 84-93; https://doi.org/10.3390/stats3020008 - 03 Apr 2020
Cited by 4 | Viewed by 3226
Abstract
The prediction of yearly batting averages in Major League Baseball is a notoriously difficult problem, where standard errors using the well-known PECOTA (Player Empirical Comparison and Optimization Test Algorithm) system are roughly 20 points. This paper considers the use of ball-by-ball data provided by the Statcast system in an attempt to predict batting averages. The publicly available Statcast data and resultant predictions supplement proprietary PECOTA forecasts. With detailed Statcast data, we attempt to account for a luck component in batting averages, which is not expected to be repeated in future seasons. The two predictions (Statcast and PECOTA) are combined via simple linear regression to provide improved forecasts of batting averages.
(This article belongs to the Section Data Science)
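
A minimal sketch of the combination step described above: fit a simple linear regression of the outcome on the two forecasts. The data here are simulated stand-ins, since the actual PECOTA and Statcast-based predictions are the paper's own inputs.

```python
# Combine two batting-average forecasts via simple linear regression.
# All data below are simulated; nothing here reproduces the paper's inputs.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
true_avg = rng.normal(0.260, 0.030, size=200)          # hypothetical realized averages
pecota = true_avg + rng.normal(0, 0.020, size=200)     # PECOTA-style forecast + noise
statcast = true_avg + rng.normal(0, 0.020, size=200)   # Statcast-style forecast + noise

X = np.column_stack([pecota, statcast])
combined = LinearRegression().fit(X, true_avg).predict(X)

rmse = lambda pred: np.sqrt(np.mean((pred - true_avg) ** 2))
print(rmse(pecota), rmse(statcast), rmse(combined))    # combined is lowest in-sample
```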

Article
Assessing the Impact of School Rules and Regulations on Students’ Perception Toward Promoting Good Behavior: Sabian Secondary School, Dire Dawa, Ethiopia
Stats 2019, 2(2), 202-211; https://doi.org/10.3390/stats2020015 - 04 Apr 2019
Viewed by 2869
Abstract
Discipline is an important component of human behavior, and one could assert that without it, an organization cannot function well toward the achievement of its goals. The aim of this study was to assess the impact of school rules and regulations on students’ perception toward promoting good behavior. The data were obtained from 438 respondents through a mailed questionnaire. The data were tabulated, and Pearson’s chi-square test was applied for inferential analysis. Around 33.1% of the students had a negative perception of school rules and regulations with respect to promoting good behavior, whereas 66.9% had a positive perception. A p-value of 0.015 (below the 5% significance level) indicated a significant association between students’ awareness of school rules and regulations and their perception toward promoting good behavior. Students’ attitudes toward school rules and regulations and their perception toward promoting good behavior were statistically associated at a p-value of 0.012. Parents’ educational levels had a significant effect on students’ perception toward promoting good behavior. Generally, students’ awareness of school rules and regulations, parents’ education levels, civics and ethical education scores, and students’ attitudes were found to have significant effects on perception toward promoting good behavior.
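
A minimal sketch of the inferential step reported above. The 2 × 2 counts below are hypothetical, chosen only so the marginals mirror the reported 66.9%/33.1% split of 438 respondents; `scipy.stats.chi2_contingency` performs Pearson’s chi-square test of association.

```python
# Pearson's chi-square test of association on a 2x2 contingency table.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: aware / not aware of school rules; columns: positive / negative perception.
# Counts are hypothetical; only the marginals mirror the reported percentages.
table = np.array([[210, 80],
                  [83, 65]])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```
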
Article
On the Mistaken Use of the Chi-Square Test in Benford’s Law
Stats 2021, 4(2), 419-453; https://doi.org/10.3390/stats4020027 - 28 May 2021
Cited by 5 | Viewed by 2072
Abstract
Benford’s Law predicts that the first significant digit on the leftmost side of numbers in real-life data is distributed among the possible digits 1 to 9 approximately as LOG(1 + 1/digit), so that low digits occur much more frequently than high digits in the first place. Typically, researchers, data analysts, and statisticians rush to apply the chi-square test in order to verify compliance with or deviation from this statistical law. In almost all cases of real-life data this approach is mistaken and without mathematical-statistical basis, yet it has become a dogma, or rather an impulsive ritual, in the field of Benford’s Law to apply the chi-square test to whatever data set the researcher is considering, regardless of its true applicability. The mistaken use of the chi-square test has led to much confusion and many errors, and has done much to undermine trust and confidence in the whole discipline of Benford’s Law. This article is an attempt to correct course and bring rationality and order to a field that has demonstrated harmony and consistency in all of its results, manifestations, and explanations. The first research question of this article demonstrates that real-life data sets typically do not arise from random and independent selections of data points from some larger universe of parental data, as the chi-square approach supposes; this conclusion is arrived at by examining how several real-life data sets are formed and obtained. The second research question demonstrates that the chi-square approach is actually about the reasonableness of the random selection process and the Benford status of that parental universe of data, not solely about the Benford status of the data set under consideration, since the focus of the chi-square test is exclusively on whether the entire process of data selection was probable or too rare. In addition, a comparison of the chi-square statistic with the Sum of Squared Deviations (SSD) measure of distance from Benford is explored in this article, pitting one measure against the other and concluding with a strong preference for the SSD measure.
(This article belongs to the Special Issue Benford's Law(s) and Applications)
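
A small numerical contrast of the two measures discussed above, assuming SSD is the sum of squared deviations between observed and Benford first-digit percentages (the definition used in this literature). The digit counts are hypothetical. Note that the chi-square statistic scales with sample size while SSD does not, which is central to the article's argument.

```python
# Chi-square statistic vs. SSD distance from Benford's first-digit law.
import numpy as np

digits = np.arange(1, 10)
benford = np.log10(1 + 1 / digits)          # expected proportions, LOG(1 + 1/digit)
counts = np.array([310, 170, 120, 95, 80, 68, 57, 52, 48])  # hypothetical data
n = counts.sum()
observed = counts / n

chi_square = n * np.sum((observed - benford) ** 2 / benford)  # grows with n
ssd = np.sum((100 * observed - 100 * benford) ** 2)           # independent of n
print(f"chi-square = {chi_square:.2f}, SSD = {ssd:.2f}")
```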

Article
A Nonparametric Statistical Approach to Content Analysis of Items
Stats 2018, 1(1), 1-13; https://doi.org/10.3390/stats1010001 - 01 Feb 2018
Cited by 1 | Viewed by 2059
Abstract
To use psychometric instruments to assess a multidimensional construct, we may decompose it into dimensions and develop a set of items for each dimension, so that the construct as a whole can be assessed through its dimensions. In this scenario, content analysis of items aims to verify whether the developed items assess the dimension they are supposed to, by requesting the judgement of specialists in the studied construct about the dimension that each developed item assesses. This paper develops a nonparametric statistical approach based on Cochran’s Q test to analyse the content of items, presenting a practical method to assess the consistency of the content analysis process; this is achieved through a statistical test that seeks to determine whether all the specialists have the same capability to judge the items. A simulation study is conducted to check the consistency of the test, and it is applied to a real validation process.
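
For reference, Cochran’s Q test, on which the proposed approach is built, can be computed directly. The binary judgement matrix below (rows = items, columns = specialists; 1 = the specialist assigned the item to its intended dimension) is hypothetical.

```python
# Cochran's Q test for equality of proportions across matched binary judgements.
import numpy as np
from scipy.stats import chi2

x = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [1, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])            # hypothetical judgements

k = x.shape[1]                       # number of specialists
col = x.sum(axis=0)                  # per-specialist totals
row = x.sum(axis=1)                  # per-item totals
n = x.sum()                          # grand total of positive judgements

q = (k - 1) * (k * np.sum(col**2) - n**2) / (k * n - np.sum(row**2))
p = chi2.sf(q, df=k - 1)             # Q ~ chi-square with k-1 df under H0
print(f"Q = {q:.3f}, p = {p:.4f}")
```
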
Article
Cronbach’s Alpha under Insufficient Effort Responding: An Analytic Approach
Stats 2019, 2(1), 1-14; https://doi.org/10.3390/stats2010001 - 20 Dec 2018
Cited by 11 | Viewed by 2021
Abstract
Surveys commonly suffer from insufficient effort responding (IER). If not accounted for, IER can cause biases and lead to false conclusions. In particular, Cronbach’s alpha has been empirically observed to either deflate or inflate due to IER. This paper will elucidate how IER impacts Cronbach’s alpha in a variety of situations. Previous results concerning internal consistency under mixture models are extended to obtain a characterization of Cronbach’s alpha in terms of item validities, average variances, and average covariances. The characterization is then applied to contaminating distributions representing various types of IER. The discussion will provide commentary on previous simulation-based investigations, confirming some previous hypotheses for the common types of IER, but also revealing possibilities from newly considered responding patterns. Specifically, it is possible that the bias can change from negative to positive (and vice versa) as the proportion of contamination increases.
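
For concreteness, Cronbach's alpha is α = k/(k−1) · (1 − Σ item variances / variance of total scores), and the contamination effect described above can be reproduced on simulated data. Everything below is illustrative.

```python
# Cronbach's alpha, with and without insufficient-effort (random) responders.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 1))
careful = latent + rng.normal(scale=0.5, size=(100, 6))  # consistent responders
ier = rng.normal(size=(20, 6))                           # random (IER) responders

print(cronbach_alpha(careful))                       # alpha on clean data
print(cronbach_alpha(np.vstack([careful, ier])))     # alpha under contamination
```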

Article
Local Processing of Massive Databases with R: A National Analysis of a Brazilian Social Programme
Stats 2020, 3(4), 444-464; https://doi.org/10.3390/stats3040028 - 19 Oct 2020
Cited by 2 | Viewed by 1912
Abstract
The analysis of massive databases is a key issue for most applications today, and the use of parallel computing techniques is one suitable approach. Apache Spark is a widely employed tool in this context, aimed at processing large amounts of data in a distributed way. For the Statistics community, R is one of the preferred tools. Despite its growth in recent years, it still has limitations for processing large volumes of data on single local machines. In general, the data analysis community has difficulty handling massive amounts of data on local machines, often requiring high-performance computing servers. One way to perform statistical analyses over massive databases is to combine both tools (Spark and R) via the sparklyr package, which allows an R application to use Spark. This paper presents an analysis of Brazilian public data from the Bolsa Família Programme (BFP, a conditional cash transfer programme), comprising a large data set with 1.26 billion observations. Our goal was to understand how this social programme acts in different cities, as well as to identify potentially important variables reflecting its utilization rate. Statistical modeling was performed using random forests to predict the utilization rate of the BFP. Variable selection was performed through a recent method based on the importance and interpretation of variables in the random forest model. Among the 89 variables initially considered, the final model presented high predictive performance with 17 selected variables and indicated the high importance of variables related to income, education, job informality, and inactive youth for the observed utilization rate, namely: family income, education, occupation, and density of people in the homes. In this work, using a local machine, we highlighted the potential of combining Spark and R for the analysis of a large database of 111.6 GB. This can serve as a proof of concept or reference for other similar works within the Statistics community, and our case study can provide important evidence for further analysis of this important social support programme.
(This article belongs to the Section Data Science)
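
The paper itself pairs R with Spark through the sparklyr package; since the sketches in this listing are in Python, the analogous local-machine pattern in PySpark is shown below. The path, column names, and memory setting are hypothetical.

```python
# Local out-of-core processing with Spark, analogous to the paper's sparklyr setup.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("bfp-local")
         .master("local[*]")                      # use all local cores
         .config("spark.driver.memory", "8g")     # must be set before the JVM starts
         .getOrCreate())

# Read lazily; Spark streams partitions through memory instead of loading it all.
bfp = spark.read.csv("bfp_payments/*.csv", header=True, inferSchema=True)

summary = (bfp.groupBy("municipality")            # hypothetical column
              .agg(F.count("*").alias("n_payments"),
                   F.avg("amount").alias("mean_amount")))
summary.write.parquet("bfp_summary.parquet")
```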

Article
Improving Access to Justice with Legal Chatbots
Stats 2020, 3(3), 356-375; https://doi.org/10.3390/stats3030023 - 04 Sep 2020
Cited by 3 | Viewed by 1897
Abstract
On average, one in three Canadians will be affected by a legal problem over a three-year period. Unfortunately, whether for legal representation or legal advice, the very high cost of these services excludes disadvantaged and the most vulnerable people, forcing them to represent themselves. For these people, accessing legal information is therefore critical. In this work, we attempt to tackle this problem by embedding legal data in a conversational interface. We introduce two dialog systems (chatbots) created to provide legal information. The first one, based on data from the Government of Canada, deals with immigration issues, while the second one informs bank employees about legal issues related to their job tasks. Both chatbots rely on various representations and classification algorithms, from mature techniques to novel advances in the field. The chatbot dedicated to immigration issues is shared with the research community as an open resource project.
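
The abstract does not detail the chatbots' internals, but a core pattern such systems rely on — classifying a user utterance into a known intent — can be sketched with TF-IDF features and logistic regression. The intents and training sentences below are invented for illustration.

```python
# Intent classification: map an utterance to the closest known intent.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "how do I apply for a work permit",
    "what documents do I need for a visa",
    "how long does permanent residence take",
    "can I extend my study permit",
]
train_intents = ["work_permit", "visa_docs", "permanent_residence", "study_permit"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_intents)

print(clf.predict(["which papers are required for my visa application"]))
```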

Article
Generalized Mutual Information
Stats 2020, 3(2), 158-165; https://doi.org/10.3390/stats3020013 - 10 Jun 2020
Cited by 2 | Viewed by 1889
Abstract
Mutual information is one of the essential building blocks of information theory. It is, however, only finitely defined for distributions in a subclass of the general class of all distributions on a joint alphabet. The unboundedness of mutual information prevents its potential utility from being extended to the general class. This is, in fact, a void in the foundation of information theory that needs to be filled. This article proposes a family of generalized mutual information measures whose members are indexed by a positive integer n, with the nth member being the mutual information of nth order. The mutual information of the first order coincides with Shannon’s, which may or may not be finite. It is, however, established (a) that each mutual information of an order greater than 1 is finitely defined for all distributions of two random elements on a joint countable alphabet, and (b) that each and every member of the family enjoys all the utilities of a finite Shannon’s mutual information.
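
For reference, the first-order member of the proposed family coincides with Shannon's mutual information, shown below; on a countable joint alphabet this sum may diverge, which is precisely the gap the article addresses. The higher-order members are defined in the article itself.

```latex
% Shannon's mutual information for random elements X, Y
% with joint distribution p on a countable alphabet:
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
```
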
Article
INARMA Modeling of Count Time Series
Stats 2019, 2(2), 284-320; https://doi.org/10.3390/stats2020022 - 03 Jun 2019
Cited by 6 | Viewed by 1867
Abstract
While most of the literature about INARMA models (integer-valued autoregressive moving-average) concentrates on the purely autoregressive INAR models, we consider INARMA models that also include a moving-average part. We study moment properties and show how to efficiently implement maximum likelihood estimation. We analyze the estimation performance and consider the topic of model selection. We also analyze the consequences of choosing an inadequate model for the given count process. Two real-data examples are presented for illustration.
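
A minimal simulation of the INAR(1) building block that the INARMA family extends: binomial thinning plus integer-valued innovations, X_t = α ∘ X_{t−1} + ε_t with Poisson ε_t. Parameter values are illustrative.

```python
# Simulate a Poisson INAR(1) process via binomial thinning.
import numpy as np

rng = np.random.default_rng(42)
alpha, lam, T = 0.5, 2.0, 500

x = np.zeros(T, dtype=int)
x[0] = rng.poisson(lam / (1 - alpha))        # start near the stationary mean
for t in range(1, T):
    thinned = rng.binomial(x[t - 1], alpha)  # alpha o X: each count survives w.p. alpha
    x[t] = thinned + rng.poisson(lam)        # add integer-valued innovations

print(x.mean(), lam / (1 - alpha))           # sample mean vs. theoretical mean
```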

Article
Application of the Modified Shepard’s Method (MSM): A Case Study with the Interpolation of Neogene Reservoir Variables in Northern Croatia
Stats 2020, 3(1), 68-83; https://doi.org/10.3390/stats3010007 - 23 Mar 2020
Cited by 8 | Viewed by 1816
Abstract
Interpolation is a procedure that depends on the spatial and/or statistical properties of the analysed variable(s). It is a particularly challenging task for small datasets, such as those with fewer than 20 data points. This problem is common in subsurface geological mapping, i.e., in cases where the data are taken solely from wells. Successful solutions to such mapping problems depend on interpolation methods designed primarily for small datasets and on the datasets themselves. Here, we compare two methods, Inverse Distance Weighting and the Modified Shepard’s Method, and apply them to three variables (porosity, permeability, and thickness) measured in Neogene sandstone hydrocarbon reservoirs (northern Croatia). The results show that cross-validation by itself will not provide appropriate map selection but, in combination with geometrical features, can help experts eliminate solutions with low-probability structures/shapes. The licensed Golden Software program Surfer 15 was used for the interpolations in this study.
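
Of the two compared methods, Inverse Distance Weighting is simple enough to sketch directly (the paper's maps were produced in Surfer 15; the Modified Shepard's Method refines the same idea with locally fitted nodal functions). The well coordinates and porosity values below are hypothetical.

```python
# Inverse Distance Weighting: estimate values at query points from sparse wells.
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0):
    """Distance-weighted mean of known values; closer points weigh more."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)          # avoid division by zero at data points
    w = 1.0 / d**power
    return (w * z_known).sum(axis=1) / w.sum(axis=1)

wells = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.8]])
porosity = np.array([0.18, 0.21, 0.16, 0.22, 0.19])   # hypothetical values
grid = np.array([[0.5, 0.5], [0.25, 0.75]])
print(idw(wells, porosity, grid))
```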

Article
Computing Happiness from Textual Data
Stats 2019, 2(3), 347-370; https://doi.org/10.3390/stats2030025 - 03 Jul 2019
Cited by 1 | Viewed by 1761
Abstract
In this paper, we use a corpus of about 100,000 happy moments written by people of different genders, marital statuses, parenthood statuses, and ages to explore the following questions: Are there differences between men and women, married and unmarried individuals, parents and non-parents, and people of different age groups in terms of their causes of happiness and how they express happiness? Can gender, marital status, parenthood status, and/or age be predicted from textual data expressing happiness? The first question is tackled in two steps: first, we transform the happy moments into a set of topics, lemmas, part-of-speech sequences, and dependency relations; then, we use each set as predictors in multi-variable binary and multinomial logistic regressions to rank these predictors in terms of their influence on each outcome variable (gender, marital status, parenthood status, and age). For the prediction task, we use character, lexical, grammatical, semantic, and syntactic features in a machine learning document classification approach. The classification algorithms used include logistic regression, gradient boosting, and fastText. Our results show that textual data expressing moments of happiness can be quite beneficial in understanding the “causes of happiness” for different social groups, and that social characteristics like gender, marital status, parenthood status, and, to some extent, age can be successfully predicted from such textual data. This research aims to bring together elements from philosophy and psychology to be examined by computational corpus linguistics methods in a way that promotes the use of Natural Language Processing for the Humanities.
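
A minimal sketch of the ranking idea in the first step above: use text-derived features as predictors in a (here binary) logistic regression and rank them by coefficient. The tiny corpus and labels are invented; the paper works with topics, lemmas, part-of-speech sequences, and dependency relations at scale.

```python
# Rank word-level predictors of a social attribute by logistic regression coefficients.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

moments = [
    "played catch with my son in the park",
    "finished a great book with my coffee",
    "my daughter took her first steps today",
    "went hiking alone and watched the sunrise",
]
parenthood = ["parent", "non-parent", "parent", "non-parent"]

vec = CountVectorizer()
X = vec.fit_transform(moments)
clf = LogisticRegression(max_iter=1000).fit(X, parenthood)

order = np.argsort(clf.coef_[0])               # positive coefs push toward "parent"
terms = vec.get_feature_names_out()
print(terms[order[-3:]])                       # most "parent"-leaning terms (toy data)
```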

Article
On Some Test Statistics for Testing the Regression Coefficients in Presence of Multicollinearity: A Simulation Study
Stats 2020, 3(1), 40-55; https://doi.org/10.3390/stats3010005 - 10 Mar 2020
Cited by 7 | Viewed by 1722
Abstract
Ridge regression is a popular method to solve the multicollinearity problem for both linear and non-linear regression models. This paper studied forty different ridge regression t-type tests of the individual coefficients of a linear regression model. A simulation study was conducted to evaluate the performance of the proposed tests with respect to their empirical sizes and powers under different settings. Our simulation results demonstrated that many of the proposed tests have type I error rates close to the 5% nominal level and, among those, all tests except one show considerable gains in power over the standard ordinary least squares (OLS) t-type test. It was observed from our simulation results that seven tests based on some ridge estimators performed better than the rest in terms of achieving higher power gains while maintaining a 5% nominal size.
(This article belongs to the Section Computational Statistics)
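
A generic member of the family being compared has the form t_j = β̂_j(k) / se(β̂_j(k)), where β̂(k) is the ridge estimator with ridge constant k. The sketch below uses illustrative data and one conventional standard-error choice; the paper's forty variants differ mainly in how k and the standard error are chosen.

```python
# One generic ridge t-type statistic under induced multicollinearity.
import numpy as np

rng = np.random.default_rng(7)
n, p, k = 100, 4, 0.5
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)   # make columns 0 and 1 collinear
y = X @ np.array([1.0, 0.0, 0.5, -0.5]) + rng.normal(size=n)

A = X.T @ X + k * np.eye(p)                      # ridge-augmented Gram matrix
A_inv = np.linalg.inv(A)
beta = A_inv @ X.T @ y                           # ridge estimate
resid = y - X @ beta
sigma2 = resid @ resid / (n - p)                 # residual variance estimate
cov = sigma2 * A_inv @ X.T @ X @ A_inv           # covariance of the ridge estimator
t_stats = beta / np.sqrt(np.diag(cov))           # one t-type statistic per coefficient
print(t_stats)
```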

Article
Causality between Oil Prices and Tourist Arrivals
Stats 2018, 1(1), 134-154; https://doi.org/10.3390/stats1010010 - 20 Oct 2018
Cited by 3 | Viewed by 1647
Abstract
This paper investigates the causal relationship between oil prices and tourist arrivals to further explain the impact of oil price volatility on tourism-related economic activities. The analysis considers the time domain, frequency domain, and information theory domain perspectives. Data relating to the US and nine European countries are analysed with causality tests spanning the time domain, the frequency domain, and Convergent Cross Mapping (CCM). The CCM approach is nonparametric and therefore not restricted by parametric assumptions. We contribute to existing research through the successful and introductory application of an advanced method and by uncovering significant causal links from oil prices to tourist arrivals.
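
The time-domain leg of such an analysis is commonly a Granger causality test; a sketch with simulated stand-in series follows, using `statsmodels`. The frequency-domain tests and CCM are not reproduced here.

```python
# Granger causality from oil prices to tourist arrivals (simulated stand-ins).
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(3)
T = 200
oil = np.cumsum(rng.normal(size=T))              # random-walk "oil price"
arrivals = np.roll(oil, 2) + rng.normal(size=T)  # "arrivals" follow oil with lag 2
data = pd.DataFrame({"arrivals": arrivals, "oil": oil}).iloc[5:]  # drop roll wrap-around

# Tests whether the second column helps predict the first, lag by lag.
results = grangercausalitytests(data[["arrivals", "oil"]], maxlag=4)
f_stat, p_value = results[2][0]["ssr_ftest"][:2]
print(f"lag 2: F = {f_stat:.2f}, p = {p_value:.4f}")
```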

Article
A Noncentral Lindley Construction Illustrated in an INAR(1) Environment
Stats 2022, 5(1), 70-88; https://doi.org/10.3390/stats5010005 - 10 Jan 2022
Viewed by 1644
Abstract
This paper proposes a previously unconsidered generalization of the Lindley distribution by allowing for a measure of noncentrality. Essential structural characteristics are investigated and derived in explicit and tractable forms, and the estimability of the model is illustrated via the fit of this developed model to real data. Subsequently, this model is used as a candidate for the parameter of a Poisson model, which allows for departure from the usual equidispersion restriction that the Poisson imposes when modelling count data. This Poisson-noncentral Lindley model is also systematically investigated and its characteristics are derived. The value of this count model is illustrated by implementing it as the count error distribution in an integer autoregressive environment and juxtaposing it against other popular models. The effect of the systematically induced noncentrality parameter is illustrated and paves the way for future flexible modelling, not only as a standalone contender in continuous Lindley-type scenarios but also in discrete and discrete time series scenarios where the often-encountered equidispersion assumption does not hold in practical data environments.
(This article belongs to the Special Issue Modern Time Series Analysis)
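
For context, the baseline one-parameter Lindley density that the paper generalizes is shown below; the noncentral extension and the resulting Poisson mixture are constructed in the article itself.

```latex
% One-parameter Lindley density:
f(x;\theta) = \frac{\theta^{2}}{\theta + 1}\,(1 + x)\,e^{-\theta x},
\qquad x > 0,\; \theta > 0
```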
