Article

A Bayes Inference for Ordinal Response with Latent Variable Approach

Department of Mathematical Sciences, University of Texas at El Paso, El Paso, TX 79968, USA
* Author to whom correspondence should be addressed.
Current address: 500 W University Ave., El Paso, TX 79968, USA.
Stats 2019, 2(2), 321-331; https://doi.org/10.3390/stats2020023
Submission received: 18 May 2019 / Revised: 8 June 2019 / Accepted: 14 June 2019 / Published: 16 June 2019

Abstract

In this paper, we propose a Bayesian model for the analysis of categorical data with ordered outcomes. The method takes a latent variable approach with an informative prior, transformed from a Dirichlet distribution, for the boundary parameters. A simulation study is carried out to assess the performance of the method under various data structures. Our method yields higher predictive accuracy than conventional classification procedures. Real data are analyzed to demonstrate the efficiency of the proposed method.

1. Introduction

Over the past few decades, modeling and prediction of ordinal outcomes have become essential in various fields, especially in the social and economic sciences, where naturally ordered data commonly appear. For example, socioeconomic status is typically broken into three levels (high, middle, and low) to describe the three places into which a family or an individual may fall. The level of education consists of high school, bachelor's, master's, and doctoral degrees. These levels or classes can be viewed as ordinal variables, but with no scale or magnitude available between categories. Numerous methods for analyzing such ordinal data have been introduced and discussed by many researchers, [1,2,3] to name a few. One well-known method is polytomous ordinal logistic regression (POLR), or the cumulative logit model, initially proposed by [4] and later called the proportional odds model [2], since the same proportionality constant applies to all cumulative logits. For a multinomial response variable $Z$ with $J$ possible ordered categorical outcomes and an associated $p$-dimensional vector of covariates $\mathbf{x}$, the cumulative probability of $Z$ given $\mathbf{x}$ is:
$$P(Z \le j \mid \mathbf{x}) = \frac{\exp(\alpha_j + \mathbf{x}'\boldsymbol{\beta})}{1 + \exp(\alpha_j + \mathbf{x}'\boldsymbol{\beta})}, \quad j = 1, 2, \ldots, J-1, \tag{1}$$
or the cumulative logit form as:
$$\log \frac{P(Z \le j \mid \mathbf{x})}{P(Z > j \mid \mathbf{x})} = \alpha_j + \mathbf{x}'\boldsymbol{\beta}, \quad j = 1, 2, \ldots, J-1, \tag{2}$$
where $\alpha_j$ is an unknown intercept parameter associated with the $j$th category and $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_p)'$ is the vector of effect coefficients common across categories. The POLR models the cumulative probabilities $P(Z \le j)$ rather than the category-specific probabilities $P(Z = j)$ as in nominal logistic regression. Applying a nominal analysis to ordinal data leads to underestimation of the variation and a considerable loss of information [1]. Other relevant approaches for ordinal outcomes include the adjacent-category logit and sequential logit models. A number of researchers have proposed inference procedures for logistic regression and related methods, from classical approaches [2,5] to Bayesian perspectives [6,7]. A comprehensive review of the analysis of ordered categorical data can be found in [1]. Another commonly-adopted way of modeling ordinal data is via an underlying continuous latent variable: the observed ordinal response is regarded as a crude measurement of a continuous variable falling into an interval on the real line. The work in [3] applied a Bayesian latent variable model to investigate the effect of a binary treatment on an ordinal outcome of interest. A fully-Bayesian method for modeling polychotomous ordinal categories was developed in [6] using the data augmentation approach. Some issues arise with these approaches, one of which is the estimation of the cutpoint parameters defining the interval boundaries. Incorporating a vague prior on these parameters, the work in [6] proposed a probit model in a Bayesian framework, which converges slowly for large sample sizes because of inefficient sampling of the cutoff point parameters. The work in [7] suggested a hybrid Gibbs/Metropolis–Hastings (MH) sampling scheme that updates these parameters jointly with the other parameters. Although this approach reduces the high autocorrelations in the sampling, computing the acceptance probability in the MH sampler may require joint cumulative probabilities of multivariate distributions, which is generally intractable. To avoid these difficulties, the work in [8] proposed a probit model whose latent variables follow a mixture model, which can characterize the ordinality of the data without estimating these parameters.
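For concreteness, the following minimal sketch (in Python, with illustrative parameter values of our own; nothing here is taken from the paper) evaluates the cumulative and category-specific probabilities defined by the proportional odds model in (1) and (2):

```python
import numpy as np

def cumulative_probs(x, alphas, beta):
    """P(Z <= j | x) for j = 1, ..., J-1 under the proportional odds model."""
    eta = np.asarray(alphas) + x @ beta          # alpha_j + x'beta for each j
    return 1.0 / (1.0 + np.exp(-eta))            # inverse logit

def category_probs(x, alphas, beta):
    """P(Z = j | x): differences of consecutive cumulative probabilities."""
    cum = np.concatenate(([0.0], cumulative_probs(x, alphas, beta), [1.0]))
    return np.diff(cum)

# illustrative values: J = 3 categories, p = 2 covariates
alphas = np.array([-1.0, 1.0])                   # alpha_1 < alpha_2
beta = np.array([0.8, -0.5])
print(category_probs(np.array([1.0, 2.0]), alphas, beta))  # sums to 1
```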
In this paper, we propose informative priors for the parameters of the probit model, in particular a prior for the boundaries associated with the category responses based on a Dirichlet distribution via a parameter transformation. With appropriate hyperparameter values, the resulting posterior distributions yield a sampling algorithm with fast convergence and efficient estimation. The rest of the article is arranged as follows. Section 2 presents our Bayesian inference procedure for the probit model with the latent variable approach. Subsequently, we carry out simulation studies to investigate the performance of the proposed method in Section 3. For illustrative purposes, two real datasets are analyzed in Section 4, followed by some concluding remarks in Section 5.

2. Bayesian Method for the Probit Model

We first briefly introduce the probit model with a latent variable and then present a Bayesian procedure for the analysis. Let $(\mathbf{Z}, \mathbf{X})$ denote the observed data, where the $n$-vector $\mathbf{Z} = (Z_1, Z_2, \ldots, Z_n)'$ contains the $n$ ordered categorical outcomes and $\mathbf{X} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)'$ is the $n \times p$ matrix of covariates, with $\mathbf{x}_i$ being the $i$th covariate vector, whose first element is 1. Each response $Z_i$ takes one of the $J$ values $1, \ldots, J$ and is associated with the covariate vector $\mathbf{x}_i$ through a latent continuous variable $y_i$ in the following linear regression model:
$$y_i = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, \ldots, n, \tag{3}$$
where $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_p)'$ is a $p \times 1$ vector of regression coefficients and $\sigma^2$ is set to 1 to make the model identifiable. The correspondence between the observed outcome $z_i$ and the latent variable $y_i$ is defined by:
$$z_i = j \quad \text{if} \quad \delta_{j-1} < y_i \le \delta_j, \quad j = 1, \ldots, J, \tag{4}$$
where the boundaries $\delta_j$ are unknown and $-\infty = \delta_0 < \delta_1 < \cdots < \delta_{J-1} < \delta_J = \infty$.
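To make the generative mechanism concrete, here is a short sketch (with assumed values for $\boldsymbol{\beta}$ and $\boldsymbol{\delta}$; these are not the paper's experimental settings) that simulates ordinal outcomes from this latent variable model:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, J = 90, 4, 3
beta = np.array([0.5, 1.0, -1.0, 0.8])           # assumed coefficients
delta = np.array([-np.inf, -0.5, 0.7, np.inf])   # -inf = delta_0 < ... < delta_J = inf

X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # first column is 1
y = X @ beta + rng.normal(size=n)                # latent variable, sigma^2 = 1
z = np.searchsorted(delta[1:-1], y) + 1          # z_i = j iff delta_{j-1} < y_i <= delta_j
```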

2.1. Prior Specification

We chose independent priors for the parameters $\boldsymbol{\beta}$ and $\boldsymbol{\delta}$. First, we specify a conjugate, multivariate normal, prior for the regression coefficients:
$$\boldsymbol{\beta} \sim N(\boldsymbol{\beta}_0, \Sigma_0), \tag{5}$$
with the $p$-vector mean $\boldsymbol{\beta}_0$ and $p \times p$ variance-covariance matrix $\Sigma_0$ being hyperparameters. To set a prior for the boundaries $\boldsymbol{\delta} = (\delta_1, \delta_2, \ldots, \delta_{J-1})$, first suppose that there is a continuous distribution function $F(\cdot)$ whose domain lies in $(-\infty, \infty)$, for example a normal distribution function, such that the coverage probability of each interval is $p_j = P(\delta_{j-1} < \Delta \le \delta_j) = F(\delta_j) - F(\delta_{j-1})$, $j = 1, 2, \ldots, J$, so that $\sum_{j=1}^{J} p_j = 1$. It follows that:
$$\begin{aligned} p_1 &= F(\delta_1) & \delta_1 &= F^{-1}(p_1) \\ p_2 &= F(\delta_2) - F(\delta_1) & \delta_2 &= F^{-1}(p_1 + p_2) \\ &\;\;\vdots & &\;\;\vdots \\ p_{J-1} &= F(\delta_{J-1}) - F(\delta_{J-2}) \qquad & \delta_{J-1} &= F^{-1}(p_1 + p_2 + \cdots + p_{J-1}). \end{aligned} \tag{6}$$
Second, a Dirichlet prior distribution is placed on $(p_1, p_2, \ldots, p_J)$, that is, the prior density is $\pi(p_1, p_2, \ldots, p_J \mid \boldsymbol{\gamma}) = \frac{1}{B(\boldsymbol{\gamma})} \prod_{j=1}^{J} p_j^{\gamma_j - 1}$, with positive hyperparameters $\boldsymbol{\gamma} = (\gamma_1, \gamma_2, \ldots, \gamma_J)$ and the Beta coefficient $B(\boldsymbol{\gamma}) = \prod_{j=1}^{J} \Gamma(\gamma_j) / \Gamma(\sum_{j=1}^{J} \gamma_j)$, where $\Gamma(\cdot)$ is the gamma function. Thus, by the transformation in (6), the prior of $\boldsymbol{\delta} = (\delta_1, \delta_2, \ldots, \delta_{J-1})$ becomes:
$$\pi(\delta_1, \delta_2, \ldots, \delta_{J-1} \mid F, \boldsymbol{\gamma}) = \frac{1}{B(\boldsymbol{\gamma})} \prod_{j=1}^{J} \left[ F(\delta_j) - F(\delta_{j-1}) \right]^{\gamma_j - 1} \prod_{j=1}^{J-1} f(\delta_j), \tag{7}$$
where $f(\cdot)$ is the density function of $F(\cdot)$. Clearly, expression (7) can be regarded as the joint density of the order statistics $(\delta_1, \delta_2, \ldots, \delta_{J-1})$, with rank indices determined by $(\gamma_1, \gamma_2, \ldots, \gamma_J)$, drawn from the distribution function $F(\cdot)$ (see, for example, [9]).
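The transformation is straightforward to implement. The sketch below (hyperparameter values assumed, anticipating the normal choice of $F$ described in Section 2.3) draws coverage probabilities from a Dirichlet distribution and maps their cumulative sums through $F^{-1}$ to obtain an ordered set of boundaries:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
gamma = np.array([30.0, 30.0, 30.0])   # hyperparameters, e.g., category counts
sigma0 = 50.0                          # scale of F = N(0, sigma0^2), see Section 2.3

p = rng.dirichlet(gamma)               # (p_1, ..., p_J), sums to 1
delta = norm.ppf(np.cumsum(p)[:-1], scale=sigma0)   # delta_j = F^{-1}(p_1 + ... + p_j)
# cumulative sums are increasing, so delta_1 < ... < delta_{J-1} automatically
```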

2.2. Posterior Inference

The prior beliefs are then updated with information from the data to lead to the following joint posterior distribution:
$$\pi(\boldsymbol{\beta}, \boldsymbol{\delta} \mid \mathbf{X}, \mathbf{Y}, \mathbf{Z}) \propto L(\boldsymbol{\beta}, \boldsymbol{\delta} \mid \mathbf{X}, \mathbf{Y}, \mathbf{Z})\, \pi(\boldsymbol{\beta})\, \pi(\boldsymbol{\delta}), \tag{8}$$
where $\mathbf{Y} = (y_1, y_2, \ldots, y_n)'$ is the vector of latent variables and the likelihood function is $L(\boldsymbol{\beta}, \boldsymbol{\delta} \mid \mathbf{X}, \mathbf{Y}, \mathbf{Z}) = \prod_{i=1}^{n} f_Y(y_i \mid \mathbf{x}_i, \boldsymbol{\beta})\, I(\delta_{j-1} < y_i \le \delta_j, z_i = j)$, with $f_Y(\cdot)$ being the density function of $N(\mathbf{x}_i'\boldsymbol{\beta}, 1)$ and $I(\cdot)$ the indicator function. Then, the conditional posteriors are:
$$\pi(\boldsymbol{\beta} \mid \mathbf{X}, \mathbf{Y}, \mathbf{Z}, \boldsymbol{\delta}) \propto L(\boldsymbol{\beta}, \boldsymbol{\delta} \mid \mathbf{X}, \mathbf{Y}, \mathbf{Z})\, \pi(\boldsymbol{\beta}), \ \text{resulting in}\ \boldsymbol{\beta} \mid (\mathbf{X}, \mathbf{Y}, \mathbf{Z}, \boldsymbol{\delta}) \sim N(\tilde{\boldsymbol{\beta}}, \tilde{\Sigma}), \tag{9}$$
$$\pi(\boldsymbol{\delta} \mid \mathbf{X}, \mathbf{Y}, \mathbf{Z}, \boldsymbol{\beta}) \propto \pi(\boldsymbol{\delta}) \times \prod_{i=1}^{n} I(\delta_{j-1} < y_i \le \delta_j, z_i = j), \tag{10}$$
with posterior mean vector $\tilde{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X} + \Sigma_0^{-1})^{-1}(\mathbf{X}'\mathbf{Y} + \Sigma_0^{-1}\boldsymbol{\beta}_0)$ and variance-covariance matrix $\tilde{\Sigma} = (\mathbf{X}'\mathbf{X} + \Sigma_0^{-1})^{-1}$. The conditional posterior of $\boldsymbol{\delta}$ in (10) is a truncated joint density of the order statistics $(\delta_1, \delta_2, \ldots, \delta_{J-1})$. To draw its posterior samples conveniently, we explore the conditional posterior distribution of each component $\delta_j$. Let $\boldsymbol{\delta}_{(j)}$ be the vector $\boldsymbol{\delta}$ without the $j$th element, that is, $\boldsymbol{\delta}_{(j)} = (\delta_1, \ldots, \delta_{j-1}, \delta_{j+1}, \ldots, \delta_{J-1})$; it follows that the conditional posterior of $\delta_j$ is:
$$\pi(\delta_j \mid \mathbf{X}, \mathbf{Y}, \mathbf{Z}, \boldsymbol{\delta}_{(j)}) \propto \left[ F(\delta_j) - F(\delta_{j-1}) \right]^{\gamma_j - 1} \left[ F(\delta_{j+1}) - F(\delta_j) \right]^{\gamma_{j+1} - 1} f(\delta_j) \times I(c_{j,1} < \delta_j < c_{j,2}), \quad j = 1, 2, \ldots, J-1, \tag{11}$$
where $c_{j,1} = \max\{ y_i,\, i = 1, 2, \ldots, n : z_i = j \}$, $c_{j,2} = \min\{ y_i,\, i = 1, 2, \ldots, n : z_i = j+1 \}$, $j = 1, 2, \ldots, J-1$, and $F(\delta_0) = 0$, $F(\delta_J) = 1$. Therefore, conditionally, $\delta_j$ is a random variable whose transform $F(\delta_j)$ follows a Beta distribution $\text{Beta}(\gamma_j, \gamma_{j+1})$, shifted by $F(\delta_{j-1})$ and scaled by $[F(\delta_{j+1}) - F(\delta_{j-1})]$, truncated to the interval $[F(c_{j,1}), F(c_{j,2})]$, or equivalently:
$$\frac{F(\delta_j) - F(\delta_{j-1})}{F(\delta_{j+1}) - F(\delta_{j-1})} \,\Big|\, (\delta_{j-1}, \delta_{j+1}) \sim \text{Beta}(\gamma_j, \gamma_{j+1}) \ \text{truncated at} \ \left[ \frac{F(c_{j,1}) - F(\delta_{j-1})}{F(\delta_{j+1}) - F(\delta_{j-1})},\ \frac{F(c_{j,2}) - F(\delta_{j-1})}{F(\delta_{j+1}) - F(\delta_{j-1})} \right]. \tag{12}$$
The method that we propose here is closely related to the approach presented in [10] for multinomial probit models. In this context, however, the correspondence between $Z_i$ and $Y_i$ uses different boundaries that account for the natural ordering of the outcome. We performed posterior inference using Markov chain Monte Carlo (MCMC) techniques. Specifically, a Gibbs sampler based on the above conditional posteriors was adopted: starting from a set of initial parameter values, the following steps were repeated $M$ times, where, given the values of $\mathbf{Y}^{(k)}$, $\boldsymbol{\beta}^{(k)}$, and $\boldsymbol{\delta}^{(k)}$ at the $k$th iteration, the $(k+1)$th iteration proceeds as follows (a code sketch of one full sweep is given after this list):
(1)
Update the latent vector $\mathbf{Y}$ from its posterior distribution given $(\boldsymbol{\beta}, \boldsymbol{\delta}, \mathbf{X}, \mathbf{Z})$, each element of which is a truncated normal under the constraints defined in Equation (4):
$$Y_i^{(k+1)} \mid (\boldsymbol{\beta}^{(k)}, \boldsymbol{\delta}^{(k)}, \mathbf{X}, \mathbf{Z}) \sim N(\mathbf{x}_i'\boldsymbol{\beta}^{(k)}, 1), \quad \delta_{j-1}^{(k)} < Y_i^{(k+1)} \le \delta_j^{(k)} \ \text{if} \ Z_i = j, \quad j = 1, \ldots, J, \ i = 1, 2, \ldots, n. \tag{13}$$
(2)
Update the regression coefficients $\boldsymbol{\beta}$ from their posterior distribution in Equation (9) under the updated $\mathbf{Y}^{(k+1)}$.
(3)
Update the boundary parameters $\delta_j$ from their posterior densities given in Equation (12), with $c_{j,1}$ and $c_{j,2}$ evaluated at $\mathbf{Y}^{(k+1)}$. Specifically, first draw $d_j^{(k+1)}$ from the truncated Beta in Equation (12), and then set $\delta_j^{(k+1)} = F^{-1}(u_j^{(k+1)})$, where $u_j^{(k+1)} = F(\delta_{j-1}^{(k+1)}) + [F(\delta_{j+1}^{(k)}) - F(\delta_{j-1}^{(k+1)})]\, d_j^{(k+1)}$, $j = 1, 2, \ldots, J-1$.
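The following is a minimal sketch of one full Gibbs sweep, assuming the hyperparameter choices of Section 2.3 ($\boldsymbol{\beta}_0 = \mathbf{0}$, $\Sigma_0 = c\mathbf{I}$, $F$ a $N(0, \sigma_0^2)$ distribution function); the function and variable names are ours. The cutpoint update uses inverse-cdf sampling from the truncated Beta in Equation (12).

```python
import numpy as np
from scipy.stats import truncnorm, norm
from scipy.stats import beta as beta_dist

def gibbs_sweep(b, delta, X, z, gamma, c=10.0, sigma0=50.0, rng=None):
    """One Gibbs sweep; delta has length J+1 with -inf/+inf endpoints.
    Assumes every category 1..J appears at least once in z."""
    rng = rng if rng is not None else np.random.default_rng()
    n, p = X.shape
    J = len(delta) - 1
    F = lambda d: norm.cdf(d, scale=sigma0)          # transformed distribution F
    Finv = lambda u: norm.ppf(u, scale=sigma0)

    # Step 1: latent Y_i ~ N(x_i'b, 1) truncated to (delta_{z_i-1}, delta_{z_i}]
    mu = X @ b
    y = mu + truncnorm.rvs(delta[z - 1] - mu, delta[z] - mu, random_state=rng)

    # Step 2: beta ~ N(beta_tilde, Sigma_tilde) with beta_0 = 0, Sigma_0 = c I
    Sig = np.linalg.inv(X.T @ X + np.eye(p) / c)
    b = rng.multivariate_normal(Sig @ (X.T @ y), Sig)

    # Step 3: each cutpoint via the truncated Beta of Equation (12)
    for j in range(1, J):
        c1, c2 = y[z == j].max(), y[z == j + 1].min()
        a0, a1 = F(delta[j - 1]), F(delta[j + 1])
        B = beta_dist(gamma[j - 1], gamma[j])        # Beta(gamma_j, gamma_{j+1})
        lo, hi = B.cdf((F(c1) - a0) / (a1 - a0)), B.cdf((F(c2) - a0) / (a1 - a0))
        d = B.ppf(rng.uniform(lo, hi))               # inverse-cdf truncated draw
        delta[j] = Finv(a0 + (a1 - a0) * d)
    return y, b, delta
```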

2.3. Hyperparameter Settings

The prior on $\boldsymbol{\beta}$ depends on the mean $\boldsymbol{\beta}_0$ and covariance matrix $\Sigma_0$; the work in [11] discussed the relative merits and drawbacks of different specifications. Here, we set $\boldsymbol{\beta}_0 = \mathbf{0}$ and $\Sigma_0 = c\mathbf{I}$, which is easier to calibrate. The parameter $c$ regulates the amount of shrinkage in the model. In general, we want to avoid very small values of $c$, which cause too much regularization, and large values, which can induce nonlinear shrinkage as a result of Lindley's paradox [12]. In the context of probit models for classification into nominal groups, the work in [10] provided guidelines on how to choose this hyperparameter value, and we used similar guidelines here. In practice, values of $c$ that provide good mixing of the MCMC sampler, with 25–50% distinct visited models, are appropriate [13]. An informative prior for $\boldsymbol{\delta}$ in (7) can be specified by setting each component $\gamma_j$ of $\boldsymbol{\gamma}$ to the observed count of the $j$th category among the $z_i$'s. Alternatively, a diffuse prior is obtained by setting all $\gamma_j = 1$, expressing no prior belief about $\delta_j$, namely a uniform distribution. For the transformed distribution function $F(\cdot)$, we chose a normal distribution with zero mean and a large scale $\sigma_0$ (for example, $\sigma_0 = 50$) to cover a fairly wide range of $(-\infty, \infty)$, so that $F(\delta) = \Phi(\delta / \sigma_0)$, with $\Phi(\cdot)$ the cumulative distribution function of the standard normal.
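In code, these choices amount to a few lines; the sketch below (with illustrative data) sets the informative $\boldsymbol{\gamma}$ from category counts and defines $F$:

```python
import numpy as np
from scipy.stats import norm

z = np.repeat([1, 2, 3], 30)          # illustrative ordinal responses
gamma = np.bincount(z)[1:]            # informative prior: category counts -> [30, 30, 30]
# gamma = np.ones(3)                  # diffuse alternative: uniform over boundaries

sigma0 = 50.0
F = lambda d: norm.cdf(d / sigma0)    # F(delta) = Phi(delta / sigma0)
```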

2.4. Posterior Prediction

The MCMC procedure results in a list of sampled $\mathbf{Y}$, $\boldsymbol{\beta}$, and $\boldsymbol{\delta}$ vectors. In order to draw posterior inference, we first need to impute the latent vector $\mathbf{Y}$, which can be viewed as missing data. Let $\hat{\mathbf{Y}}$, $\hat{\boldsymbol{\beta}}$, and $\hat{\boldsymbol{\delta}}$ be the estimates obtained by averaging over the sampled $\mathbf{Y}$, $\boldsymbol{\beta}$, and $\boldsymbol{\delta}$ vectors, respectively.
Inference on class prediction can be done in various ways. If a future vector of covariates $\mathbf{x}_f$ is available for validation, the least squares prediction based on a single value of $\boldsymbol{\beta}$ can be computed as:
$$\hat{y}_f = \mathbf{x}_f'\hat{\boldsymbol{\beta}} \quad \text{or} \quad \hat{y}_f = \mathbf{x}_f'\hat{\tilde{\boldsymbol{\beta}}}, \tag{14}$$
where the posterior mean is $\hat{\tilde{\boldsymbol{\beta}}} = (\mathbf{X}'\mathbf{X} + \Sigma_0^{-1})^{-1} \mathbf{X}'\hat{\mathbf{Y}}$. Alternatively, we can use Bayesian model averaging over the posterior samples $\boldsymbol{\beta}^{(k)}$ of a posteriori likely models to estimate $y_f$ as:
$$\hat{y}_f = \sum_{k=1}^{M} \mathbf{x}_f'\boldsymbol{\beta}^{(k)}\, \pi(\boldsymbol{\beta}^{(k)} \mid \mathbf{X}, \hat{\mathbf{Y}}, \mathbf{Z}, \hat{\boldsymbol{\delta}}). \tag{15}$$
The ordered categorical outcomes can then be predicted using the boundary correspondence:
$$\hat{z}_f = j \quad \text{if} \quad \hat{\delta}_{j-1} < \hat{y}_f \le \hat{\delta}_j, \quad j = 1, 2, \ldots, J. \tag{16}$$
In addition, since $y_f \sim N(\mathbf{x}_f'\boldsymbol{\beta}, 1)$, the prediction probability that it falls in each class can be computed through model averaging over the posterior samples $\boldsymbol{\beta}^{(k)}$:
$$P(Z_f = j) \approx \sum_{k=1}^{M} P(\hat{\delta}_{j-1} < Y_f \le \hat{\delta}_j \mid \boldsymbol{\beta}^{(k)})\, \pi(\boldsymbol{\beta}^{(k)} \mid \mathbf{X}, \hat{\mathbf{Y}}, \mathbf{Z}, \hat{\boldsymbol{\delta}}) = \sum_{k=1}^{M} \left[ \Phi\big(\hat{\delta}_j - \mathbf{x}_f'\boldsymbol{\beta}^{(k)}\big) - \Phi\big(\hat{\delta}_{j-1} - \mathbf{x}_f'\boldsymbol{\beta}^{(k)}\big) \right] \pi(\boldsymbol{\beta}^{(k)} \mid \mathbf{X}, \hat{\mathbf{Y}}, \mathbf{Z}, \hat{\boldsymbol{\delta}}), \tag{17}$$
where Φ ( · ) is the distribution function of the standard normal distribution. The class membership can then be predicted by the mode of the predictive distribution:
$$\hat{z}_f = \operatorname*{argmax}_{1 \le j \le J} P(Z_f = j). \tag{18}$$
Furthermore, a less variable estimate can be obtained as the nearest integer to the membership average over the predictive probabilities,
$$\hat{z}_f = [\mu_Z], \quad \text{where} \ \mu_Z = \sum_{j=1}^{J} j\, P(Z_f = j) \tag{19}$$
and $[\cdot]$ denotes rounding to the nearest integer.
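A sketch of the three prediction rules, given posterior draws (the weights `pi_k` stand in for the posterior probabilities $\pi(\boldsymbol{\beta}^{(k)} \mid \cdot)$ used in the model averaging; all names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def predict_class(x_f, beta_draws, pi_k, delta_hat):
    """Predictions by Eqs. (16), (18), (19); delta_hat has -inf/+inf endpoints."""
    eta = beta_draws @ x_f                               # x_f' beta^(k), one per draw

    # Eq. (16): boundary rule applied to the model-averaged latent prediction
    y_f = np.sum(pi_k * eta)
    z_boundary = np.searchsorted(delta_hat[1:-1], y_f) + 1

    # Eq. (17): model-averaged predictive class probabilities (J-vector)
    probs = (norm.cdf(delta_hat[1:, None] - eta) -
             norm.cdf(delta_hat[:-1, None] - eta)) @ pi_k

    z_mode = np.argmax(probs) + 1                        # Eq. (18): modal class
    z_round = int(np.rint(np.arange(1, probs.size + 1) @ probs))  # Eq. (19)
    return z_boundary, z_mode, z_round
```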

3. Simulation Study

We conducted a simulation study to assess the performance of the proposed Bayesian method. The simulated datasets were drawn from three four-dimensional multivariate normal distributions with means $\boldsymbol{\mu}_1 = [3, 2, 4, 1]'$, $\boldsymbol{\mu}_2 = [3, 2, 4, 1]'$, $\boldsymbol{\mu}_3 = [3, 2, 4, 1]'$ and equicorrelated variance-covariance matrices $\Sigma = \sigma^2[(1 - \rho)\mathbf{I} + \rho\mathbf{1}\mathbf{1}']$, where $\sigma = 2$ and the increasingly-ordered correlations $\rho = 0.1, 0.5, 0.9$ for the three structures correspond to the ordered responses $z = 1, 2, 3$, respectively; $\mathbf{I}$ is the identity matrix and $\mathbf{1}$ is the vector of ones. We simulated 30 observations $\mathbf{x} = (x_1, x_2, x_3, x_4)'$ from each multivariate normal, for a total sample size of 90. The structure of the data may be an essential aspect of efficient classification. In order not to draw erroneous conclusions from the predicted error rates, we critically examined the nature of the data to explore the variation existing among groups. In a model for predicting ordinal outcomes, groups with small within-group variation and well-separated locations are naturally easier to classify, that is, to predict correctly into which category a particular observation falls. In this vein, we visualized the data structure using the tool for displaying data concentration in [14], which proposed the $p$-dimensional ellipsoid $E_d$ of size ("radius") $d$, defined as the set of all points $\mathbf{X}$ in a contour:
$$E_d(\bar{\mathbf{X}}, \mathbf{S}) = \{ \mathbf{X} : (\mathbf{X} - \bar{\mathbf{X}})'\mathbf{S}^{-1}(\mathbf{X} - \bar{\mathbf{X}}) \le d^2 \}. \tag{20}$$
Clearly, $E_d$ corresponds to the set of points whose Mahalanobis distances $D^2 = (\mathbf{X} - \bar{\mathbf{X}})'\mathbf{S}^{-1}(\mathbf{X} - \bar{\mathbf{X}})$, with sample covariance matrix $\mathbf{S}$ and sample centroid $\bar{\mathbf{X}} = (\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_p)'$, are less than or equal to $d^2$. For multivariate normal variables, the data ellipsoid approximates a contour of constant density of their joint distribution, and $D^2$ asymptotically follows the chi-squared distribution with $p$ degrees of freedom, $\chi^2_p$. The work in [15] elaborated further on the properties of data ellipsoids and their use in a wide variety of problems and applications. Taking $d^2 = \chi^2_{0.05, 2} = 5.99$, the two-dimensional pairwise ellipses are shown in Figure 1, where each ellipse encloses approximately 95% of the data points under normal theory. The plot indicates some overlap between Groups 1 and 2, while Group 3 was relatively well-separated from the other two. Thus, we expected a somewhat high classification error rate, with most misclassified cases probably occurring between Groups 1 and 2.
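For reference, a short sketch of the simulation design and the Mahalanobis-distance computation behind the ellipses (group means transcribed as printed above; the random seed and coordinate pair are our choices):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, rhos = 2.0, [0.1, 0.5, 0.9]
mu = np.array([3.0, 2.0, 4.0, 1.0])
groups = []
for rho in rhos:                                   # one covariance per ordered class
    Sigma = sigma**2 * ((1 - rho) * np.eye(4) + rho * np.ones((4, 4)))
    groups.append(rng.multivariate_normal(mu, Sigma, size=30))

# fraction of one group inside the 95% ellipse, d^2 = chi^2_{0.05,2} = 5.99
P = groups[0][:, :2]                               # one pair of coordinates
dev = P - P.mean(axis=0)
d2 = np.einsum('ij,jk,ik->i', dev, np.linalg.inv(np.cov(P, rowvar=False)), dev)
print((d2 <= 5.99).mean())                         # roughly 0.95 in large samples
```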
Next, we performed the analysis with our method. To obtain accurate results and avoid over-fitting, we applied the Bayesian model to the simulated data (training data) and generated a test set with 300 observations for each category for validation. We ran four MCMC chains with widely different starting values for 10,000 iterations each and discarded the first 2000 as burn-in to eliminate dependence on the starting points. We also considered several hyperparameter values for the covariance of the regression coefficients, $\Sigma_0 = c\mathbf{I}$, with $c$ ranging between 5 and 20; the effect on the overall results was minimal, and here we report the results for $c = 10$. An informative prior for the boundary parameters was specified by setting all components of $\boldsymbol{\gamma}$ to $\gamma_j = 30$, the count of the $j$th category among the ordinal responses. We note that, despite the widely different starting values, there was good agreement between the results. The misclassification rates on the test data for the three prediction approaches, Equation (16) (with $y_f$ estimated by Equation (14)), Equation (18), and Equation (19), are tabulated in Table 1. The classification results showed that about 270 subjects in total were misclassified, among which 90 of the 300 observations in Group 1 were misclassified into Group 2, while 100 observations in Group 2 were incorrectly assigned to Group 1. The outcome validated our earlier judgment from the pairwise ellipses in Figure 1, where Groups 1 and 2 shared quite a bit of common area. As shown in Table 1, the three types of predictions yielded approximately the same error rates, while the polytomous ordinal logistic regression (POLR) model resulted in a higher classification error rate.
Finally, for comparison purposes, we analyzed the data using common classification methods, such as linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), k-nearest neighbors (KNN), and support vector machines (SVM), which build multi-class classifiers without taking the natural ordering of the response into account. For KNN, we considered neighborhood sizes k from 2 to 8 and report the results for k = 3, which gave the lowest overall misclassification rate. All these approaches produced higher error rates, as summarized in Table 1.

4. Real Data Analysis

We applied our model for discriminant analysis on two real datasets. The first was the well-known Iris flower data, and the second comprised measurements of male Egyptian skulls. The purpose of adopting these two datasets was to examine how the model predictions were affected by different data structures and measurement variations among groups.

4.1. Iris Data

We used this dataset as a benchmark for the analysis. It was introduced by the British statistician Ronald Fisher [16]. It is also sometimes called Anderson's Iris data [17], as it was collected to quantify the morphological variation of Iris flowers of the three related species pictured in Figure 2. The dataset consists of 50 samples from each of three species (Iris setosa, Iris versicolor, and Iris virginica), with four measured features for each sample: the length and width of the sepals and petals, in centimeters. These are generally viewed as nominal category data and have been analyzed in a vast literature, such as [18,19] and many others. Here, we instead treat the three species as ordinal outcomes, ranked by the magnitude of measurement variations. The total variances of the four morphological measurements for each species were 0.3292 (setosa), 0.6248 (versicolor), and 0.8883 (virginica). These numbers, to some extent, represent the size and spread of the flowers, which is consistent with the images in Figure 2, where setosa has the smallest sepals and petals, versicolor larger, and virginica the largest.
These findings can also be observed from the visual display of the ellipses for all pairs of measurements shown in Figure 3, where one may further notice that, except for a little overlap between the versicolor and virginica groups, the three species were overall well-separated; in particular, Iris setosa stood far apart from the other two species. To obtain an accurate classification error rate, we applied a cross-validation approach, partitioning the whole dataset into a training set containing 120 observations (40 per category) and a test set containing 30 observations (10 per category). The partitioning was repeated five times until all the samples from each species were exhausted. The small classification error rates produced by our method with cross-validated prediction, along with POLR regression, are summarized in Table 2: all the setosa cases were classified correctly by both methods, two (respectively three) versicolor cases were misclassified as virginica by the Bayesian (respectively POLR) method, and one virginica case was incorrectly assigned to versicolor by the POLR model. Other common classification methods, which treat the data as nominal, yielded slightly larger error rates, as shown in Table 2. To gauge the reduction in error rate achieved by each prediction approach, we also computed the error rate of the null model, in which no covariates are used, that is, $1 - \max_j(RF_j)$, where $RF_j$ is the relative frequency of the $j$th category in the dataset, as listed in Table 2.
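The five-fold, stratified partition described above can be sketched as follows (the index arithmetic assumes the usual ordering of the Iris data, with 50 consecutive rows per species):

```python
import numpy as np

rng = np.random.default_rng(4)
# shuffle the 50 indices of each species separately
by_species = [rng.permutation(np.arange(s * 50, (s + 1) * 50)) for s in range(3)]
# fold f holds out 10 samples per species (30 total); train on the other 120
folds = [np.concatenate([idx[f * 10:(f + 1) * 10] for idx in by_species])
         for f in range(5)]
```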

4.2. Skull Data

The dataset was obtained from the R package "HSAUR". The data consist of four physical measurements, in millimeters, of 30 male Egyptian skulls from each of five epochs (periods) [20]: Period 1 (4000 BC), Period 2 (3300 BC), Period 3 (1850 BC), Period 4 (200 BC), and Period 5 (150 AD). The measurements were the maximal breadth (mb), basibregmatic height (bh), basialveolar length (bl), and nasal height (nh) of each skull. Figure 4 gives a labeled image illustrating these four measurements on a typical skull. The original researchers attributed changes in skull measurements to the passage of time: systematic changes over time could indicate interbreeding among migrant populations (or the influence of other factors) [21]. Our interest in this analysis, however, lies in how well the period can be predicted from these measurements.
Figure 5 displays the ellipses for all pairs of measurements. It clearly shows that the five groups of data overlap substantially, so that one can hardly distinguish one class from the others, and a high classification error rate is to be expected. For our Bayesian method, an MCMC chain with 10,000 iterations was run, and the first half was discarded to eliminate dependence on the starting points. Several hyperparameter values were considered for the covariance of the regression coefficients, with $c$ ranging between 5 and 15; different values resulted in similar classifications. The error rates of cross-validated prediction are listed in Table 2, where, not surprisingly, the error rates produced by our method were high (83, 82, and 80 skulls were misclassified by the three prediction approaches, respectively, while 92 skulls were incorrectly assigned by POLR), validating our earlier judgment. In contrast to the Iris data, the skull dataset provided an example where the measurements tend to cluster around a common centroid and overlap one another, with little evidence of separation in location. The classification results for these two structurally different datasets attest to the reliability of our model for exploration and prediction with ordinal outcome data.
We also note that all the other common procedures yielded even higher error rates than our method, as summarized in Table 2. From all these classification outcomes, we may reasonably infer that the measurements of these male Egyptian skulls did not change significantly across the given time periods.

5. Concluding Remarks

We have proposed a Bayesian approach for predicting ordinal outcomes. By introducing latent variables underlying the ordinal outcomes, the problem reduces to a linear model setting. While MCMC techniques are generally computationally intensive, with an informative prior placed on the boundary parameters, our MCMC algorithm is practical to implement and converges quickly. The simulation study demonstrated the efficient and impressive performance of the proposed method, and the applications to two real datasets illustrated that our approach provides an efficient, reliable, and precise analysis for ordinal categorical response data.

Author Contributions

N.S.: proposal and development of the methodology, theoretical derivation, paper organization and writing. B.O.D.: code writing and running, paper writing.

Funding

This research was funded by NSF CMMI-0654417 and NIMHD-2G12MD007592.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Agresti, A. Analysis of Ordinal Categorical Data, 2nd ed.; Wiley: Hoboken, NJ, USA, 2010. [Google Scholar]
  2. McCullagh, P. Regression models for ordinal data. J. R. Stat. Soc. Ser. B 1980, 42, 109–142. [Google Scholar] [CrossRef]
  3. Sirisrisakulchai, J.; Sriboonchitta, S. Causal effect for ordinal outcomes from observational data: Bayesian approach. Thai J. Math. 2016, (Special Issue on Applied Mathematics: Bayesian Econometrics), 63–70. [Google Scholar]
  4. Walker, S.H.; Duncan, D.B. Estimation of the probability of an event as a function of several independent variables. Biometrika 1967, 54, 167–179. [Google Scholar] [CrossRef] [PubMed]
  5. Harrell, F.E. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis, 2nd ed.; Springer: New York, NY, USA, 2001. [Google Scholar]
  6. Albert, J.H.; Chib, S. Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 1993, 88, 669–679. [Google Scholar] [CrossRef]
  7. Cowles, M.; Carlin, B.; Connet, J. Bayesian tobit modeling of longitudinal ordinal clinical trial compliance data with nonignorable missingness. J. Am. Stat. Assoc. 1996, 91, 86–98. [Google Scholar] [CrossRef]
  8. Zhou, X. Bayesian Inference for Ordinal Data. Ph.D. Thesis, Rice University, Houston, TX, USA, 2006. [Google Scholar]
  9. David, H.A.; Nagaraja, H.N. Order Statistics, 3rd ed.; Wiley: Hoboken, NJ, USA, 2003. [Google Scholar]
  10. Sha, N.; Vannucci, M.; Tadesse, M.G.; Brown, P.J.; Dragoni, I.; Davies, N.; Roberts, T.; Contestabile, A.; Salmon, M.; Buckley, C.; et al. Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage. Biometrics 2004, 60, 812–819. [Google Scholar] [CrossRef] [PubMed]
  11. Brown, P.J.; Vannucci, M.; Fearn, T. Bayes model averaging with selection of regressors. J. R. Stat. Soc. Ser. B 2002, 64, 519–536. [Google Scholar] [CrossRef]
  12. Lindley, D.V. A statistical paradox. Biometrika 1957, 44, 187–192. [Google Scholar] [CrossRef]
  13. Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. Bayesian Data Analysis, 2nd ed.; Chapman & Hall: London, UK, 2004. [Google Scholar]
  14. Dempster, A.P. Elements of Continuous Multivariate Analysis; Addison-Wesley Publisher Co.: Reading, MA, USA, 1969. [Google Scholar]
  15. Friendly, M.; Monette, G.; Fox, J. Elliptical insights: Understanding statistical methods through elliptical geometry. Stat. Sci. 2013, 28, 1–39. [Google Scholar] [CrossRef]
  16. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
  17. Anderson, E. The irises of the Gaspé Peninsula. Bull. Am. Iris Soc. 1935, 59, 2–5. [Google Scholar]
  18. Dy, J.G.; Brodley, C.E. Feature selection for unsupervised learning. J. Mach. Learn. Res. 2004, 5, 845. [Google Scholar]
  19. Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis, 6th ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2007. [Google Scholar]
  20. Thomson, A.; Randall-MacIver, D. The Ancient Races of the Thebaid: Being an Anthropometrical Study of the Inhabitants of Upper Egypt from the Earliest Prehistoric Times to the Mohammedan Conquest, Based upon the Examination of Over 1500 Crania; Clarendon Press: Oxford, UK, 1905. [Google Scholar]
  21. Hand, D.J.; Lunn, A.D.; McConway, K.J.; Ostrowski, E. A Handbook of Small Datasets; Chapman and Hall/CRC: London, UK, 1994. [Google Scholar]
Figure 1. Simulated data: visualization of data variation for the three groups.
Figure 2. Three species of Iris flower.
Figure 3. Iris data: ellipse structure of three species.
Figure 4. A labeled male Egyptian skull.
Figure 5. Skull data: ellipse structure of five groups.
Table 1. Simulated data: test data prediction misclassification rates. POLR, polytomous ordinal logistic regression; QDA, quadratic discriminant analysis.

  Method                  Error rate
  Bayesian (Boundary)     0.308
  Bayesian (Probability)  0.307
  Bayesian (Average)      0.294
  POLR                    0.348
  LDA                     0.362
  QDA                     0.357
  KNN                     0.356
  SVM                     0.371
Table 2. Real data: cross-validated prediction misclassification rates.

  Method                  Iris    Skull
  Bayesian (Boundary)     0.013   0.553
  Bayesian (Probability)  0.013   0.546
  Bayesian (Average)      0.013   0.533
  POLR                    0.026   0.607
  LDA                     0.036   0.646
  QDA                     0.034   0.633
  KNN                     0.035   0.653
  SVM                     0.033   0.653
  Null model              0.667   0.800
