Statistics, Analytics, and Inferences for Discrete Data

A special issue of Stats (ISSN 2571-905X).

Deadline for manuscript submissions: closed (30 November 2024) | Viewed by 24552

Special Issue Editor


Dr. Dungang Liu
Guest Editor
Department of Operations, Business Analytics, and Information Systems, Lindner College of Business, University of Cincinnati, Cincinnati, OH 45221, USA
Interests: meta-analysis and discrete data analysis

Special Issue Information

Dear Colleagues,

Discrete data analysis concerns statistics, analytics, and inferences specifically designed for discretely measured data such as binary data, ordinal data, multinomial data, count data, and grouped continuous data. Discrete data are present in almost all research areas, and they are particularly common in the biomedical sciences, social sciences, and business fields. The development of statistical tools for discrete data has played an instrumental role in drawing meaningful conclusions in clinical trials, oncology research, psychology studies, marketing, and e-commerce strategies. Due to the discrete nature of the data, however, fundamental statistical questions remain unresolved or only partially addressed. The lack of appropriate statistical tools is amplified by advances in information technology, social media, precision-X, and other emerging areas of practice and research.

This Special Issue calls for research papers devoted to the development of theories, methods, and applications for discrete data analysis. In particular, authors are encouraged to contribute the following:

  • New analytic tools including measures, statistics, and visualization methods;
  • New descriptive and predictive models as well as their assessment and diagnostics;
  • Novel applications of existing methods to emerging problems in subject domains and industries;
  • Reviews of current statistical practice in subject domains and industries with in-depth discussions on its limitations and pitfalls.

Dr. Dungang Liu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Stats is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on MDPI's website.

Published Papers (9 papers)


Research

16 pages, 476 KiB  
Article
A Comparison of Limited Information Estimation Methods for the Two-Parameter Normal-Ogive Model with Locally Dependent Items
by Alexander Robitzsch
Stats 2024, 7(3), 576-591; https://doi.org/10.3390/stats7030035 - 21 Jun 2024
Cited by 1 | Viewed by 1168
Abstract
The two-parameter normal-ogive (2PNO) model is one of the most popular item response theory (IRT) models for analyzing dichotomous items. Consistent parameter estimation of the 2PNO model using marginal maximum likelihood estimation relies on the local independence assumption. However, the assumption of local independence might be violated in practice. Likelihood-based estimation of the local dependence structure is often computationally demanding. Moreover, many IRT models that model local dependence do not have a marginal interpretation of item parameters. In this article, limited information estimation methods are reviewed that allow the convenient and straightforward handling of local dependence in estimating the 2PNO model. Specifically, pairwise likelihood, weighted least squares, and normal-ogive harmonic analysis robust method (NOHARM) estimation are compared with marginal maximum likelihood estimation that ignores local dependence. A simulation study revealed that item parameters can be consistently estimated with limited information methods, whereas marginal maximum likelihood estimation resulted in biased item parameter estimates in the presence of local dependence. From a practical perspective, there were only minor differences in the statistical quality of item parameter estimates across the estimation methods. The estimation methods are also compared for two empirical datasets.
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
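The 2PNO response function at the heart of this abstract is standard and can be sketched in a few lines. This is an illustrative Python snippet, not the article's code; the function and variable names are our own.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF Phi, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def irf_2pno(theta: float, a: float, b: float) -> float:
    """2PNO item response function: P(correct | ability theta) = Phi(a * (theta - b)),
    where a is the item discrimination and b the item difficulty."""
    return normal_cdf(a * (theta - b))
```

At theta = b the probability of a correct response is exactly 0.5, and it increases in theta whenever a > 0.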

15 pages, 380 KiB  
Article
On Underdispersed Count Kernels for Smoothing Probability Mass Functions
by Célestin C. Kokonendji, Sobom M. Somé, Youssef Esstafa and Marcelo Bourguignon
Stats 2023, 6(4), 1226-1240; https://doi.org/10.3390/stats6040076 - 4 Nov 2023
Cited by 2 | Viewed by 1884
Abstract
Only a few count smoothers are available for the widespread use of discrete associated kernel estimators, and their constructions lack systematic approaches. This paper proposes the mean dispersion technique for building count kernels. It is only applicable to count distributions that exhibit the underdispersion property, which ensures the convergence of the corresponding estimators. In addition to the well-known binomial and recent CoM-Poisson kernels, we introduce two new ones: the double Poisson and gamma-count kernels. Despite the challenging problem of obtaining explicit expressions, these kernels effectively smooth densities. Their good performance is demonstrated through both numerical and comparative analyses, particularly for small and moderate sample sizes. The optimal tuning parameter is investigated via integrated squared errors, and the faster computation times are an added advantage. Overall, the accuracy of the two newly suggested kernels falls between that of the two established ones. Finally, an application including tail probability estimation on real count data and some concluding remarks are given.
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
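To illustrate the discrete associated kernel estimator the abstract refers to, here is a minimal Python sketch using the well-known binomial count kernel (the double Poisson and gamma-count kernels proposed in the paper are not reproduced). The Binomial(x+1, (x+h)/(x+1)) parameterization is an assumption based on the standard discrete associated kernel literature; names are illustrative.

```python
import math

def binom_pmf(y: int, n: int, p: float) -> float:
    """Binomial(n, p) probability mass at y."""
    if y < 0 or y > n:
        return 0.0
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)

def binomial_kernel(x: int, y: int, h: float) -> float:
    """Binomial count kernel at target x with bandwidth h in (0, 1]:
    the Binomial(x + 1, (x + h) / (x + 1)) pmf evaluated at y."""
    return binom_pmf(y, x + 1, (x + h) / (x + 1))

def kernel_pmf_estimate(x: int, data: list, h: float) -> float:
    """Discrete associated kernel estimate of the pmf at x:
    average of the kernel at x evaluated at each observation."""
    return sum(binomial_kernel(x, xi, h) for xi in data) / len(data)
```

Each kernel is itself a proper pmf on {0, ..., x+1}, with mean x + h, so small h concentrates the smoothing near the target point.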
Show Figures

Figure 1

13 pages, 743 KiB  
Article
Comparison between Two Algorithms for Computing the Weighted Generalized Affinity Coefficient in the Case of Interval Data
by Áurea Sousa, Osvaldo Silva, Leonor Bacelar-Nicolau, João Cabral and Helena Bacelar-Nicolau
Stats 2023, 6(4), 1082-1094; https://doi.org/10.3390/stats6040068 - 13 Oct 2023
Cited by 1 | Viewed by 1536
Abstract
From the affinity coefficient between two discrete probability distributions proposed by Matusita, Bacelar-Nicolau introduced the affinity coefficient into a cluster analysis context and extended it to different types of data, including complex and heterogeneous data within the scope of symbolic data analysis (SDA). In this study, we refer to the most significant partitions obtained using hierarchical cluster analysis (h.c.a.) of two well-known datasets taken from the literature on complex (symbolic) data analysis. The h.c.a. is based on the weighted generalized affinity coefficient for the case of interval data and on probabilistic aggregation criteria from a VL parametric family. To calculate the values of this coefficient, two alternative algorithms were used and compared. Both algorithms were able to detect clusters of macrodata (data aggregated into groups of interest) that were consistent and consonant with those reported in the literature, but one performed better than the other in some specific cases. Moreover, both approaches allow for the treatment of large microdatabases (non-aggregated data) after transforming the huge microdata into macrodata.
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
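The starting point of this paper, Matusita's affinity between two discrete distributions, has a one-line definition. Below is a minimal Python sketch of that basic coefficient only; the weighted generalized version for interval data developed in the paper is not reproduced, and the function name is our own.

```python
import math

def affinity(p: list, q: list) -> float:
    """Matusita affinity between two discrete distributions on a common
    support: sum over i of sqrt(p_i * q_i). Equals 1 when p == q and 0
    when the distributions have disjoint support."""
    assert len(p) == len(q), "distributions must share a support"
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
```

Because the affinity lies in [0, 1], it works directly as a similarity measure for aggregation criteria in cluster analysis.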
Show Figures

Figure 1

17 pages, 478 KiB  
Article
Comparing Robust Linking and Regularized Estimation for Linking Two Groups in the 1PL and 2PL Models in the Presence of Sparse Uniform Differential Item Functioning
by Alexander Robitzsch
Stats 2023, 6(1), 192-208; https://doi.org/10.3390/stats6010012 - 25 Jan 2023
Cited by 7 | Viewed by 2463
Abstract
In the social sciences, the performance of two groups is frequently compared based on a cognitive test involving binary items. Item response models are often utilized for comparing the two groups. However, the presence of differential item functioning (DIF) can impact group comparisons. In order to avoid biased estimation of group differences, appropriate statistical methods for handling differential item functioning are required. This article compares the performance of regularized estimation and several robust linking approaches in three simulation studies that address the one-parameter logistic (1PL) and two-parameter logistic (2PL) models. It turned out that the robust linking approaches were at least as effective as the regularized estimation approach in most of the conditions in the simulation studies.
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
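The core idea of robust linking under sparse DIF can be shown with a toy example: when only a few items are shifted, a robust location estimate of the item-difficulty differences recovers the true group shift, while the mean does not. This is an illustrative sketch with made-up numbers, not the article's simulation design; the median is used as one simple robust alternative among those the article studies.

```python
# Item difficulties for two groups; the last item carries uniform DIF.
b_group1 = [-1.0, -0.5, 0.0, 0.5, 1.0]
b_group2 = [-0.7, -0.2, 0.3, 0.8, 2.3]
diffs = [b2 - b1 for b1, b2 in zip(b_group1, b_group2)]

def mean_mean_shift(d: list) -> float:
    """Non-robust mean-mean linking: average difficulty difference."""
    return sum(d) / len(d)

def median_shift(d: list) -> float:
    """Robust linking via the median difficulty difference."""
    s = sorted(d)
    m = len(s) // 2
    return s[m] if len(s) % 2 else 0.5 * (s[m - 1] + s[m])
```

Here the true shift is 0.3; the median recovers it, while the mean is pulled to 0.5 by the single DIF item.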
Show Figures

Figure 1

14 pages, 540 KiB  
Article
Goodness-of-Fit and Generalized Estimating Equation Methods for Ordinal Responses Based on the Stereotype Model
by Daniel Fernández, Louise McMillan, Richard Arnold, Martin Spiess and Ivy Liu
Stats 2022, 5(2), 507-520; https://doi.org/10.3390/stats5020030 - 1 Jun 2022
Cited by 3 | Viewed by 3002
Abstract
Background: Data with ordinal categories occur in many diverse areas, but methodologies for modeling ordinal data lag severely behind equivalent methodologies for continuous data. There are advantages to using a model specifically developed for ordinal data, such as making fewer assumptions and having greater power for inference. Methods: The ordered stereotype model (OSM) is an ordinal regression model that is more flexible than the popular proportional odds ordinal model. The primary benefit of the OSM is that it uses numeric encoding of the ordinal response categories without assuming the categories are equally spaced. Results: This article summarizes two recent advances in the OSM: (1) three novel tests to assess goodness-of-fit; (2) a new Generalized Estimating Equations approach to estimate the model for longitudinal studies. These methods use the new spacing of the ordinal categories indicated by the estimated score parameters of the OSM. Conclusions: The recent advances presented can be applied to several fields. We illustrate their use with the well-known arthritis clinical trial dataset. These advances fill a gap in methodologies available for ordinal responses and may be useful for practitioners in many applied fields.
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
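The ordered stereotype model's category probabilities can be sketched directly: each category k gets a linear score alpha_k + phi_k * beta * x, with monotone score parameters phi that encode the data-driven spacing of the categories the abstract mentions. This Python snippet is an illustration under that standard parameterization, not the article's code, and the parameter values in the test are made up.

```python
import math

def osm_probs(x: float, alphas: list, phis: list, beta: float) -> list:
    """Ordered stereotype model: P(Y = k | x) proportional to
    exp(alpha_k + phi_k * beta * x), with monotone scores
    0 = phi_1 <= ... <= phi_K = 1 estimated from the data."""
    scores = [a + ph * beta * x for a, ph in zip(alphas, phis)]
    mx = max(scores)  # subtract the max for a numerically stable softmax
    w = [math.exp(s - mx) for s in scores]
    total = sum(w)
    return [wi / total for wi in w]
```

With beta > 0 and monotone phis, increasing x shifts probability mass toward the higher ordinal categories, which is what makes the model a genuinely ordinal one despite its multinomial-logit form.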
Show Figures

Figure 1

17 pages, 356 KiB  
Article
Bayesian Semiparametric Regression Analysis of Multivariate Panel Count Data
by Chunling Wang and Xiaoyan Lin
Stats 2022, 5(2), 477-493; https://doi.org/10.3390/stats5020028 - 10 May 2022
Viewed by 2379
Abstract
Panel count data often occur in a long-term recurrent event study, where the exact occurrence time of the recurrent events is unknown, but only the occurrence count between any two adjacent observation time points is recorded. Most traditional methods only handle panel count data for a single type of event. In this paper, we propose a Bayesian semiparametric approach to analyze panel count data for multiple types of events. For each type of recurrent event, the proportional mean model is adopted to model the mean count of the event, where its baseline mean function is approximated by monotone I-splines. The correlation between multiple types of events is modeled by common frailty terms and scale parameters. Unlike many frequentist estimating equation methods, our approach is based on the observed likelihood and makes no assumption on the relationship between the recurrent process and the observation process. Under the Poisson counting process assumption, we develop an efficient Gibbs sampler based on novel data augmentation for the Markov chain Monte Carlo sampling. Simulation studies show good estimation performance of the baseline mean functions and the regression coefficients; meanwhile, the importance of including the scale parameter to flexibly accommodate the correlation between events is also demonstrated. Finally, a skin cancer data example is fully analyzed to illustrate the proposed methods.
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
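Under the Poisson counting-process assumption the abstract invokes, the counts between adjacent observation times are independent Poisson variables whose means are increments of the cumulative mean function. The sketch below shows that single-subject, single-event-type log-likelihood with one covariate; the I-spline baseline, frailty terms, and Gibbs sampler of the paper are not reproduced, and all names are illustrative.

```python
import math

def poisson_logpmf(k: int, mu: float) -> float:
    """Log pmf of Poisson(mu) at k."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def panel_count_loglik(times, counts, beta, x, baseline) -> float:
    """Log-likelihood of one subject's panel counts under a Poisson process
    with proportional mean mu(t) = exp(x * beta) * baseline(t): the count in
    (t_{j-1}, t_j] is Poisson with mean mu(t_j) - mu(t_{j-1})."""
    ll, prev = 0.0, 0.0
    for t, k in zip(times, counts):
        mu_t = math.exp(x * beta) * baseline(t)
        ll += poisson_logpmf(k, mu_t - prev)
        prev = mu_t
    return ll
```

Because only the increments enter the likelihood, the exact event times never need to be observed, which is precisely the panel count setting.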
Show Figures

Figure A1

14 pages, 399 KiB  
Article
ordinalbayes: Fitting Ordinal Bayesian Regression Models to High-Dimensional Data Using R
by Kellie J. Archer, Anna Eames Seffernick, Shuai Sun and Yiran Zhang
Stats 2022, 5(2), 371-384; https://doi.org/10.3390/stats5020021 - 15 Apr 2022
Cited by 2 | Viewed by 3045
Abstract
The stage of cancer is a discrete ordinal response that indicates the aggressiveness of disease and is often used by physicians to determine the type and intensity of treatment to be administered. For example, the FIGO stage in cervical cancer is based on the size and depth of the tumor as well as the level of spread. It may be of clinical relevance to identify molecular features from high-throughput genomic assays that are associated with the stage of cervical cancer to elucidate pathways related to tumor aggressiveness, identify improved molecular features that may be useful for staging, and identify therapeutic targets. High-throughput RNA-Seq data and corresponding clinical data (including stage) for cervical cancer patients have been made available through The Cancer Genome Atlas Project (TCGA). We recently described penalized Bayesian ordinal response models that can be used for variable selection for over-parameterized datasets, such as the TCGA-CESC dataset. Herein, we describe our ordinalbayes R package, available from the Comprehensive R Archive Network (CRAN), which enhances the runjags R package by enabling users to easily fit cumulative logit models when the outcome is ordinal and the number of predictors exceeds the sample size, P>N, such as for TCGA and other high-throughput genomic data. We demonstrate the use of this package by applying it to the TCGA cervical cancer dataset. Our ordinalbayes package can be used to fit models to high-dimensional datasets, and it effectively performs variable selection.
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
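ordinalbayes is an R package, but the cumulative logit link it fits is easy to illustrate in a few lines of Python: category probabilities are successive differences of logistic cumulative probabilities P(Y <= k) = expit(alpha_k - beta * x). This is a sketch of the link only, not the package's penalized Bayesian machinery; the single-covariate setup and names are our own.

```python
import math

def expit(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def cumulative_logit_probs(x: float, alphas: list, beta: float) -> list:
    """Category probabilities P(Y = k) under a cumulative logit model with
    ordered cutpoints alphas (length K - 1) and one covariate effect beta."""
    cum = [expit(a - beta * x) for a in alphas] + [1.0]
    probs, prev = [], 0.0
    for c in cum:
        probs.append(c - prev)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        prev = c
    return probs
```

Ordered cutpoints guarantee nonnegative category probabilities, which is why the cutpoints rather than the slopes carry the ordering constraint in this model family.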
18 pages, 552 KiB  
Article
A Flexible Mixed Model for Clustered Count Data
by Darcy Steeg Morris and Kimberly F. Sellers
Stats 2022, 5(1), 52-69; https://doi.org/10.3390/stats5010004 - 7 Jan 2022
Cited by 3 | Viewed by 4449
Abstract
Clustered count data are commonly modeled using Poisson regression with random effects to account for the correlation induced by clustering. The Poisson mixed model allows for overdispersion via the nature of the within-cluster correlation; however, departures from equi-dispersion may also exist due to the underlying count process mechanism. We study the cross-sectional COM-Poisson regression model—a generalized regression model for count data in light of data dispersion—together with random effects for analysis of clustered count data. We demonstrate model flexibility of the COM-Poisson random intercept model, including choice of the random effect distribution, via simulated and real data examples. We find that COM-Poisson mixed models provide comparable model fit to well-known mixed models for associated special cases of clustered discrete data, and result in improved model fit for data with intermediate levels of over- or underdispersion in the count mechanism. Accordingly, the proposed models are useful for capturing dispersion not consistent with commonly used statistical models, and also serve as a practical diagnostic tool.
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
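The COM-Poisson distribution underlying this model has pmf proportional to lambda^y / (y!)^nu, with nu = 1 recovering the Poisson, nu > 1 giving underdispersion, and nu < 1 overdispersion. The Python sketch below evaluates the pmf with a truncated normalizing constant; it illustrates the distribution only, not the paper's mixed model, and the truncation length is an assumption adequate for moderate lambda.

```python
import math

def com_poisson_pmf(y: int, lam: float, nu: float, max_terms: int = 200) -> float:
    """COM-Poisson pmf P(Y = y) = lam^y / ((y!)^nu * Z), with the normalizing
    constant Z = sum_j lam^j / (j!)^nu truncated at max_terms. Computed on
    the log scale for numerical stability."""
    def log_weight(j: int) -> float:
        return j * math.log(lam) - nu * math.lgamma(j + 1)

    log_terms = [log_weight(j) for j in range(max_terms)]
    mx = max(log_terms)
    z = sum(math.exp(t - mx) for t in log_terms)
    return math.exp(log_weight(y) - mx) / z
```

The single extra parameter nu is what lets the mixed model in the paper absorb dispersion in the count mechanism itself, separately from the dispersion induced by the random effect.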
Show Figures

Figure 1

18 pages, 3927 KiB  
Article
Smoothing in Ordinal Regression: An Application to Sensory Data
by Ejike R. Ugba, Daniel Mörlein and Jan Gertheiss
Stats 2021, 4(3), 616-633; https://doi.org/10.3390/stats4030037 - 21 Jul 2021
Cited by 4 | Viewed by 3338
Abstract
The so-called proportional odds assumption is popular in cumulative, ordinal regression. In practice, however, such an assumption is sometimes too restrictive. For instance, when modeling the perception of boar taint on an individual level, it turns out that, at least for some subjects, the effects of predictors (androstenone and skatole) vary between response categories. For more flexible modeling, we consider the use of a ‘smooth-effects-on-response penalty’ (SERP) as a connecting link between proportional and fully non-proportional odds models, assuming that parameters of the latter vary smoothly over response categories. The usefulness of SERP is further demonstrated through a simulation study. Besides flexible and accurate modeling, SERP also enables fitting of parameters in cases where the pure, unpenalized non-proportional odds model fails to converge.
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
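The "connecting link" idea can be made concrete with a difference penalty on a predictor's category-specific coefficients: shrinking adjacent coefficients toward each other interpolates between the non-proportional model (no penalty) and the proportional odds model (infinite penalty). The squared first-order difference form below is an illustrative assumption; the exact penalty used in the article may differ in order and weighting.

```python
def serp_penalty(beta_by_category: list) -> float:
    """Sum of squared first-order differences of one predictor's
    category-specific coefficients beta_1, ..., beta_K. Added to the
    negative log-likelihood with weight tau: tau -> infinity forces all
    betas equal (proportional odds); tau = 0 leaves the non-proportional
    odds model unpenalized."""
    return sum(
        (b_next - b) ** 2
        for b, b_next in zip(beta_by_category, beta_by_category[1:])
    )
```

The penalty vanishes exactly when the coefficients are constant across response categories, so the proportional odds fit is always in the penalized model's path.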
