Estimating Ideal Points from Roll-Call Data: Explore Principal Components Analysis, Especially for More Than One Dimension?

For two or more dimensions, the two main approaches to estimating legislators’ ideal points from roll-call data entail arbitrary, yet consequential, identification and modeling assumptions that bring about both indeterminateness and undue constraints for the ideal points. This paper presents a simple and fast approach to estimating ideal points in multiple dimensions that is not marred by those issues. The leading approach at present is that of Poole and Rosenthal. Also prominent currently is one that uses Bayesian techniques. However, in more than one dimension, they both have several problems, of which nonidentifiability of ideal points is the most precarious. The approach that we offer uses a particular mode of principal components analysis to estimate ideal points. It applies logistic regression to estimate roll-call parameters. It has a special feature that provides some guidance for deciding how many dimensions to use. Although its relative simplicity makes it useful even in just one dimension, its main advantages are for more than one.


Introduction
The use of roll-call data to place the positions of legislators within a political or ideological spectrum or space is now common, not only in academic research but also, to some degree, in mainstream media and in relation to political campaigns. One may alternatively refer to these spatial positions as locations, or scores, or ideal points. In the bulk of applications, the space is one-dimensional, with the single dimension interpreted in terms of a spectrum of "left"-"right" or "liberal"-"conservative" political ideology. However, space with two or more dimensions is of potential importance beyond the limited attention it has yet received, as will be covered in Section 2 below.
Ideal-point estimates can have different origins and different applications. In academic work, use of results from the Poole-Rosenthal approach has predominated, though certain Bayesian methodology has become available more recently. On the nonacademic side, election forecasting by Nate Silver (2014) has used (unidimensional) Poole-Rosenthal scores as part of its input. Other nonacademic pursuits will be cited in Sections 2 and 5.2.
The two main existing avenues for estimation of ideal points from roll-call data are the Poole-Rosenthal approach and a Bayesian approach. We examine both of them critically, particularly for more than one dimension, before turning to detailed study of principal components analysis, a technique that has rarely seen use for ideal-point estimation but offers much promise.
As a tool for appraisal of legislators' spatial locations, Poole-Rosenthal NOMINATE scores (McCarty et al. 1997; Poole 2005; Poole and Rosenthal 1985, 1991, 1997, 2001, 2007) have received wide attention and use. We will confine our analysis mainly to their W-NOMINATE (hereafter P-R), though DW-NOMINATE, a version that is much the same except that it can span more than one legislative term, is also often applied. We also will not deal with somewhat related approaches that are less often used, such as the Optimal Classification (OC) method of Poole (2000), although in some circumstances (see, e.g., Rosenthal and Voeten 2004) OC may produce better results than P-R.
The Bayesian approach that we will consider (hereafter CJR) is the one associated with Jackman (2000, 2001) and Clinton et al. (2004a). A very new Bayesian approach is that of Imai et al. (2016). It is computationally fast but also not simple. In addition, its applications are shown only for ideal points in one dimension, whereas our prime focus here is on two or more dimensions.
This paper embraces two main themes, both with primary relevance for an issue space that is beyond one dimension. First, we highlight defects of P-R and CJR. The most serious problem for both of them is nonidentifiability of ideal points. Also raising questions are other issues concerning indeterminateness and arbitrary model assumptions. We note a greater number of difficulties for P-R than for CJR.
Second, we present a particular way of using principal components analysis (hereafter PCA) to estimate ideal points. Although our main emphasis is on contending that PCA has a pronounced edge over P-R or CJR for two or more dimensions, PCA has some advantage even in one dimension and may be a sound choice even for that case.
In regard to PCA, our main goal is to enable ideal-point estimates that have evident validity and avoid indeterminateness and arbitrary modeling and identification assumptions. Such estimates from PCA will then compare favorably with those from P-R and CJR. Estimates of ideal points can be (and have been) used in many types of applications, including those related to prediction of legislators' votes, measurement of ideology, dimensionality, historical evolution of the US Congress, and polarization. Poole and Rosenthal (2007, chp. 11 and elsewhere) cite or describe numerous specific applications. See also Section 2 below for relevant information. Our PCA ideal-point estimates could be used for any of these types of applications, with the intent of obtaining more justifiable analyses, especially with more than one dimension.
Section 2 presents numerous examples of issue spaces that may have more than one dimension. It thus demonstrates the meaningfulness of such issue spaces and of our work here.
Section 3 covers the shortcomings of P-R and CJR. Mathematical support comes from Appendix A. The shortcomings and their impacts have been largely unrecognized.
Section 4 presents our PCA methodology. Among other things, it covers extraction of the ideal-point estimates; estimation of the roll-call parameters, which is via logistic regression; a special example, for two dimensions, small enough to show the main features of our PCA approach in a single table; and treatment of missing votes. Estimates of roll-call parameters, sometimes used in analysis of individual roll calls, are produced jointly with the ideal-point estimates in both P-R and CJR, but not under PCA, where we produce them separately through logistic regression.
After discussion in Section 5 of certain techniques or applications that bear some relation to our PCA approach or provide alternatives to it, Section 6 presents four empirical examples of PCA applications from the 105th and 106th US Senates (1997-1998 and 1999-2000), with varying numbers of roll calls and with both one- and two-dimensional issue spaces. A benefit of using examples from those two Senates is the availability of comparable published data, especially on model fit, for P-R and CJR. Comparisons are satisfactory, thus attesting to face validity for PCA.
How does one decide how many issue dimensions are best for given data? For judging the number of dimensions to use, PCA provides guidance of a type not available with P-R or CJR. Section 7 gives details.
Why has principal components analysis been used only rarely for estimation of ideal points? We explore this matter in Section 8, before summarizing further in Section 9.

Application of Ideal Points in More Than One Dimension
Largely unidimensional voting spaces in recent US Congresses may have curtailed recent attention to issue spaces with more than one dimension. Historically, though, even the US has seen dimensions beyond the traditional "left"-"right" or "liberal"-"conservative" one: for example, slavery before the Civil War, bimetallism in the late 19th century, and civil rights in the mid-20th century (Poole and Rosenthal 2007).
Moreover, a new second dimension, such as one pertaining to undue police force, foreign military involvement, surveillance by government, drug laws, or some combination thereof (cf. Hook 2014), could become prominent at some point. In a somewhat similar vein, the political preferences of US citizens (as opposed to elites or legislators) were found by Carmines et al. (2012) to fall onto not just one dimension, but rather two: one involving economic and social-welfare issues and the other related to social and cultural matters.
Varied dimensions may arise in different times and places. Social or cultural issues may be largely separable from economic ones in some situations. Other issues that may supply extra dimensions include transnational integration, as in some European countries; secession, as in Scotland and Catalonia; language, as in Belgium and Canada; Peronism (Argentina); ethnicity; religion; immigration; corruption and reform; and nationalism. Dimensions in Israel, in addition to economic ones, have included religious-secular and hawk-dove. Bornschier (2010) examined manifestations of two dimensions of political space in six countries in Western Europe. Hix et al. (2006) found two dimensions in the European Parliament.
Some recent work has challenged the notion that the US political space is largely one-dimensional. Perhaps paradoxically, extra dimensions for the US may become more evident with fewer roll calls rather than more, and with fewer legislators rather than more. Upon examining separate subsets of votes in different subject areas, Crespin and Rohde (2010) found more multidimensionality than in the totality of roll calls. Aldrich et al. (2014) concluded that greater dimensionality is evinced within each party than in the two parties taken together. Thus, analysis of smaller data sets may uncover new and extra dimensions that are muffled in a comprehensive data set. Along a different line, Dougherty et al. (2014) found that agenda setting that prevents certain votes from taking place can suppress revelation of multidimensionality.
Perhaps the most extensive representation of two-dimensional political space is in a video shown on Voteview Blog (2014). It presents graphs of two-dimensional Poole-Rosenthal ideal points, separately for the US Senate and House, for each two-year period from 1789 to 2013.
A chart with 10 two-dimensional graphs of Poole-Rosenthal scores from Voteview, for the US Senate and House for each of the terms of 1889-90, 1963-64, 1983-84, 1997-98, and 2011-12, appeared in an article in The Economist (Anonymous 2014b) to illustrate increasing polarization in American politics in recent years. The first dimension was labeled as "Liberal" versus "Conservative", reflecting "differences in political ideology, based on votes on government intervention in the economy" (an oversimplification, it would seem). The second dimension, labeled as "Southern values" versus "Northern values", was represented as pertaining to "values traditionally identified as Northern and Southern, based on votes on race-related issues" (a description that would seem to be less fitting for the later years).
All in all, either for the US or elsewhere there is fertile ground for studying more than one dimension in roll-call analysis, a pursuit for which PCA can offer some unique advantages.

Flaws and Drawbacks of P-R and CJR
In this section, we scrutinize both P-R and CJR. In many respects they do not compare favorably with PCA.
In two or more dimensions (there is no problem in one dimension), the gravest flaw of both P-R and CJR is their nonidentifiability of ideal points, a nonidentifiability that goes beyond the simple indefiniteness of location, scale, and orientation. Unlimited numbers of transformations (ones that do not just change location or scale) produce shifts to new sets of ideal points that are substantively different but leave the maximized log likelihood unchanged. Troubles escalate as the number of dimensions increases. Identifiability of ideal points in more than one dimension should be viewed as an essential property for any approach to roll-call analysis. Nonidentifiability and associated nonestimability (cf., e.g., Basu 1983; Schmidt 1983) mean essentially that the ideal points in more than one dimension are not even defined, which is a critical pitfall. Nonidentifiability is hardly a trifling transgression.
For CJR, the nonidentifiability is acknowledged explicitly (e.g., see Jackman 2001, especially pp. 233, 235). Recognizing the nonidentifiability of CJR in two dimensions, Jackman (2001) sought to circumvent it by assigning certain priors to two roll calls chosen to serve as second-dimension anchors. However, this patch seems arbitrary and subjective. Any singling out of a pair of roll calls is open to question, since different anchor pairs yield different results. With more dimensions, the arbitrary identification assumptions create worsening troubles. Two-dimensional CJR applications other than Jackman (2001) seem to be isolated. We could find only one, noted briefly in Jessee (2009, footnote 11) with no indication of what anchors he chose or how otherwise he dealt with the nonidentifiability. The nonidentifiability issue seems to cripple the use of CJR for more than one dimension.
For P-R, one may not suspect nonidentifiability, as one does find P-R applications in more than one dimension. However, Part A.6 of Appendix A below provides formal mathematical proof of the P-R nonidentifiability. This result for P-R may seem almost obvious, though: For two or more dimensions, P-R has far more parameters than CJR (thus entailing worse identifiability troubles), yet even CJR itself lacks identifiability.
Also pertaining to indeterminateness of ideal-point estimates are two other characteristics of both P-R and CJR. First, both of them lack the orthogonality property, under which the ideal-point vectors for any pair of dimensions are orthogonal. Any flexibility gained from forgoing orthogonality might be deemed an advantage until one realizes that it comes at the price of nonidentifiability (or, at least, no suitable way to get identifiability without orthogonality is evident). Second, neither P-R nor CJR preserves lower-dimension ideal-point estimates when calculating those for a higher dimension.
Although not very lucid about the matter, the mathematical details for both CJR (Clinton et al. 2004a, p. 367) and P-R (Poole 2005, pp. 107-10) convey that the respective procedures yield first coordinates of their two-dimensional ideal points that will differ from the single coordinates of their one-dimensional ideal points. Thus, where computations are done for both one and two dimensions, there are two different sets of ideal points for the first dimension. With more dimensions, multiplicity increases. At least some users will object to such indefiniteness.
As will be seen shortly, PCA does not have any of the characteristics that have just been described. It does not suffer from nonidentifiability, it enjoys the orthogonality property, and its ideal-point estimates for a given dimension do not change when the number of dimensions changes.
Several further modeling issues, in the form of arbitrary parameter constraints or identification assumptions, cause trouble for P-R. None of them apply to either CJR or PCA.
P-R requires legislators' ideal points to lie within a circle or sphere of unit radius for two or three dimensions, respectively, and similarly for higher dimensions. Some ideal points may thus be forced unnaturally to lie exactly on the boundary (surface) of the unit circle (sphere) in two (three) dimensions. In two dimensions under P-R, a legislator with a location of −0.8 or +0.8 on the first dimension could not have a second-dimension position below −0.6 or above +0.6 (because $0.8^2 + 0.6^2 = 1$). A legislator who is at an extreme (−1 or +1) on one dimension cannot be at an extreme (in fact, can be nowhere except at 0) on the other.[1] This limitation of P-R is of more than theoretical concern. For instance, out of the 102 members who served in the 106th US Senate, 23 have two-dimensional ideal points that lie exactly on the boundary of the unit circle.[2] The most extreme case is Barbara Boxer of California, who has a position of −0.988 on the first dimension and −0.156 on the second. Given the value of −0.988, the second-dimension P-R location has to lie in the short interval from −0.156 to +0.156. For more on the ideal-point constraints, see Part A.4 of Appendix A.

[1] Note that using a larger circle would not help, because a legislator would still be unable to have extreme scores on both dimensions. Shape matters: A square, a circle, and (e.g.) a four-pointed star will all work differently.
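The unit-circle constraint described above can be illustrated with a few lines of Python (the paper's own computations use SAS). This is only a sketch of the geometry, not part of P-R itself; the function name `second_dim_bound` is hypothetical.

```python
import math

def second_dim_bound(x1: float) -> float:
    """Largest |x2| permitted by the constraint x1**2 + x2**2 <= 1."""
    if abs(x1) > 1.0:
        raise ValueError("first-dimension score must lie in [-1, 1]")
    return math.sqrt(1.0 - x1 * x1)

# A legislator at +/-0.8 on the first dimension is confined to roughly
# [-0.6, +0.6] on the second; one at +/-1 is pinned to 0.
for x1 in (0.0, 0.8, 1.0):
    print(x1, round(second_dim_bound(x1), 3))
```

The closer the first coordinate is to an extreme, the narrower the interval left for the second coordinate, which is the restriction the text objects to.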
Roll-call parameters bear similar restrictions. P-R imposes constraints on its estimates of its roll-call midpoints like those on its ideal-point estimates (limitation to the unit sphere in two dimensions, e.g.). Again, some (midpoint) estimates may be unnaturally confined. See Part A.5 of Appendix A for more discussion.
In two or more dimensions, P-R is severely overparameterized. This trait relates to, and aggravates, nonidentifiability. For any dimensionality, CJR and PCA use exactly the same number of ideal-point and roll-call parameters as each other. Although P-R uses the same number of ideal-point parameters as CJR and PCA, with two or more dimensions it uses far more roll-call parameters than the other two. The excess P-R roll-call parameters may have an appealing rationale but exact a steep price in terms of overparameterization and nonidentifiability. Part A.3 of Appendix A explains further.
P-R also exhibits an arcane inflexibility for a (hypothetical) roll call whose votes have no relation at all to the legislator locations. Full details are in Part A.2 of Appendix A.
One finds indications that the use of P-R may be questionable if the number of roll calls and/or legislators is small (e.g., Crespin and Rohde 2010, pp. 980-81; Lewis 2001; Peress 2009; Poole et al. 2011, p. 5). It may not be clear how well CJR behaves with small data sets. As for PCA, Example 1 below applies it to data with only eight roll calls and 14 legislators, and textbooks abound with examples of principal components analyses for small data sets.
PCA yields a unique solution for ideal points.[3] P-R and CJR, by contrast, both involve iterative procedures, and may thus produce ideal-point estimates that vary depending on different choices of such elements as starting values for the ideal points, number of iterations, and stopping rules. With CJR, choice of Bayesian priors can also affect the solution.[4,5]

Specifics for the Use of PCA, with an Example
As will be seen shortly, our PCA technique avoids the difficulties of P-R and CJR just described; it is simple, fast, and powerful; and its results are acceptable. We now devote detailed attention to it.
The statistical theory and application of principal components has a lengthy history and is thoroughly covered in various textbooks (e.g., Morrison 1976, chp. 8; Jackson 1991; Jolliffe 2004) that provide details of the technique. For estimating ideal points from roll-call data, however, the use of the methodology has been scant. This section covers the general method and related matters, deals with estimation of roll-call parameters, and provides a detailed small example.

How PCA Obtains Ideal Points
With $I$ legislators and $J$ roll calls, we suppose that we have a vote matrix $Y_0$ ($I \times J$) with general element $y_{0ij}$ ($i = 1, \ldots, I$; $j = 1, \ldots, J$) equal to 1 or 0 if legislator $i$ votes, respectively, yea or nay on roll call $j$. For the moment we assume no missing data (no missed votes), but later, in Section 4.5, we show how to skirt this assumption. Unanimous votes provide no usable information and are excluded from $Y_0$. All of our calculations use SAS®.
Our approach treats the elements of $Y_0$ as $I$ observations from a $J$-variate distribution and then calculates principal components and their associated scores in the standard fashion. Let $u_I$ denote an ($I \times 1$) vector with all 1's. Then the vote matrix adjusted for the roll-call means is

$$Y = Y_0 - u_I u_I' Y_0 / I,$$

and the sample covariance matrix ($J \times J$) of the votes is $S = Y'Y/(I - 1)$. PCA is based on the eigenvalues and eigenvectors of $S$. Let $L_k$ denote the $k$-th largest eigenvalue ($k = 1, 2, \ldots$) and $g_k$ ($J \times 1$) the corresponding eigenvector. (Here we consider only nonzero eigenvalues, and also disregard the highly unlikely case where two nonzero sample eigenvalues are equal.) Then, for any $g$ ($J \times 1$) such that $g'g = 1$, the maximum value of $g'Sg$ (which is the maximum variance of any linear function of the $J$ votes using such a $g$) is $L_1$ and is attained at $g = g_1$. Furthermore, for any $g$ subject to $g'g_1 = 0$ (or $g'Sg_1 = 0$) and $g'g = 1$, the maximum value of $g'Sg$ is $L_2$ and is attained at $g = g_2$. Analogous results hold for $k > 2$.

[3] It is unique except for possible different choices for how to impute for any missing votes.

[4] The same elements whose choices can cause differing CJR results can also lead to differences under the approach of Imai et al. (2016).

[5] For an extensive and informative comparison of CJR and P-R, though only for the case of one dimension, see Carroll et al. (2009); see also Clinton and Jackman (2009). For other works that provide differing views about P-R or how it compares with CJR, see (e.g.) Krehbiel and Peskowitz (2015); Caughey and Schickler (2016); Bateman and Lapinski (2016); and McCarty (2016).

Soc. Sci. 2018, 7, 12
We use the formula $x_1$ ($I \times 1$) $= (3L_1)^{-0.5} Y g_1$ for the legislators' first-dimension scores, or ideal points. The mean score, $u_I' x_1 / I$, is 0. The variance of the scores is 1/3, equal to that of the rectangular (uniform) distribution from −1 to +1. Similarly, the formula $x_2$ ($I \times 1$) $= (3L_2)^{-0.5} Y g_2$ gives the second-dimension ideal points. Both $x_1' x_2$ and $g_1' g_2$ are 0; that is, the two ideal-point vectors are orthogonal, as are the two eigenvectors. The scores for general $k$ for the $I$ legislators are found from $x_k$ ($I \times 1$) $= (3L_k)^{-0.5} Y g_k$. For all dimensions $k$, $x_k$ has mean 0 and variance 1/3. For all $k \neq k^*$, $x_k' x_{k^*}$ and $g_k' g_{k^*}$ are both 0.
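The score formula can be sketched in plain Python (the paper itself relies on SAS). The sketch assumes the covariance matrix uses the divisor $I - 1$, substitutes a simple Jacobi-rotation eigensolver for whatever routine the statistical package uses internally, and runs on a small made-up vote matrix; the names `jacobi_eigh`, `pca_ideal_points`, and `toy_votes` are hypothetical.

```python
import math

def jacobi_eigh(S, sweeps=100, tol=1e-22):
    """Eigenpairs of a symmetric matrix via cyclic Jacobi rotations,
    returned as (eigenvalue, eigenvector) tuples, largest eigenvalue first."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                diff = A[q][q] - A[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for M in (A, V):              # rotate columns p and q of A and V
                    for r in range(n):
                        mp, mq = M[r][p], M[r][q]
                        M[r][p], M[r][q] = c * mp - s * mq, s * mp + c * mq
                for r in range(n):            # rotate rows p and q of A
                    ap, aq = A[p][r], A[q][r]
                    A[p][r], A[q][r] = c * ap - s * aq, s * ap + c * aq
    pairs = [(A[j][j], [V[r][j] for r in range(n)]) for j in range(n)]
    return sorted(pairs, key=lambda t: -t[0])

def pca_ideal_points(Y0, dims=2):
    """Scores x_k = (3 L_k)^(-1/2) Y g_k for the first `dims` components."""
    n, m = len(Y0), len(Y0[0])
    means = [sum(row[j] for row in Y0) / n for j in range(m)]
    Y = [[row[j] - means[j] for j in range(m)] for row in Y0]   # centered votes
    S = [[sum(Y[i][a] * Y[i][b] for i in range(n)) / (n - 1)    # assumed divisor I - 1
          for b in range(m)] for a in range(m)]
    scores = []
    for L, g in jacobi_eigh(S)[:dims]:
        scale = 1.0 / math.sqrt(3.0 * L)
        scores.append([scale * sum(Y[i][j] * g[j] for j in range(m)) for i in range(n)])
    return scores

# Made-up votes: 6 legislators (rows) by 4 roll calls (columns), 1 = yea, 0 = nay.
toy_votes = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0],
             [0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
x1, x2 = pca_ideal_points(toy_votes)
```

Each returned score vector has mean 0 and variance 1/3, and the two vectors are orthogonal. The sign of each eigenvector is arbitrary, so the orientation of a dimension may need to be flipped by hand.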
Although in most applications of principal components analysis the number of observations exceeds the number of variables, the reverse condition can also occur (e.g., Jackson 1991, pp. 32, 73, 190). Some of our examples have more legislators (observations) than roll calls (variables), that is, $I > J$, whereas others have $J > I$.
The simplicity of the main element of the PCA approach becomes evident upon noting that the single line of SAS® code

    proc princomp data=d1 cov out=d2 outstat=d3 noprint;

produces (in the output data set d2, for all dimensions) the legislators' scores before multiplication by $(3L_k)^{-0.5}$ (i.e., the values $Yg_k$) as well as (in the output data set d3) the eigenvalues, $L_k$, and the eigenvectors, $g_k$. The input data set d1 is the vote matrix $Y_0$ (after resolving any missing votes). For some limited timing results for PCA, see the end of Section 6.1 below.

Roll-Call Parameters
Though generally less important than the legislator scores, estimates of roll-call parameters are often desired. However, with our routine (unlike others), they need not be calculated at all if one is interested only in the scores. On the other hand, they are required if one wants to evaluate model fit (see Sections 4.3 and 6.2 below). They are also needed if one wants to study individual roll calls by (e.g.) examining their (two-dimensional) cutting lines as in numerous examples in Chapters 5-7 of Poole and Rosenthal (2007). In one dimension, they are used to calculate, for a given roll call, the cut point that separates predicted nay-voters from predicted yea-voters based on the first-dimension scores.
Once the scores $x_k$ have been obtained, our PCA approach estimates the roll-call parameters through logistic regression. Our technique for these parameter estimates bears limited resemblance to the method of joint maximum-likelihood estimation used in item-response theory in educational testing. Let $x_{ik}$ denote the $i$-th element of $x_k$, that is, the estimated ideal point of legislator $i$ on dimension $k$. With $p_{ij}$ denoting the probability that legislator $i$ votes yea on roll call $j$, our logistic-regression model takes the form

$$\log\left[\frac{p_{ij}}{1 - p_{ij}}\right] = a_j + b_j x_{i1} \qquad (1)$$

with one dimension and

$$\log\left[\frac{p_{ij}}{1 - p_{ij}}\right] = b_{j0} + b_{j1} x_{i1} + b_{j2} x_{i2} \qquad (2)$$

for two dimensions, with obvious extensions for more than two dimensions. Each roll call $j$ has two parameters, denoted by $(a_j, b_j)$, in (1), and three parameters, $(b_{j0}, b_{j1}, b_{j2})$, in (2).
Because $p_{ij} = 1/2$ if the right side of (1) is equal to 0, a cut point (or midpoint) for roll call $j$ under (1) may be defined by $m_j = -a_j/b_j$. Thus, any legislator with ideal point $x_{i1}$ to the right (left) of $m_j$ votes yea (nay) with probability greater than 1/2 if $b_j > 0$, and vice versa if $b_j < 0$. Under (2), the cutting line $b_{j0} + b_{j1} x_{i1} + b_{j2} x_{i2} = 0$ separates the legislators according to whether their probabilities of voting yea on roll call $j$ are greater or less than 1/2.
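As a small numerical illustration of the cut point and cutting line, the following Python sketch evaluates the logistic form for made-up parameter values. The function names and the numbers are hypothetical and are not taken from Table 1.

```python
import math

def yea_probability(eta: float) -> float:
    """Inverse logit: probability of a yea vote given the linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def cut_point(a: float, b: float) -> float:
    """One-dimensional cut point m_j = -a_j / b_j (requires b_j != 0)."""
    if b == 0:
        raise ValueError("cut point is undefined when b_j = 0")
    return -a / b

def predicted_vote_2d(b0, b1, b2, x1, x2) -> str:
    """Side of the cutting line b0 + b1*x1 + b2*x2 = 0 on which a legislator falls."""
    return "yea" if b0 + b1 * x1 + b2 * x2 > 0 else "nay"

a_j, b_j = -0.5, 2.0                 # made-up roll-call parameters
m_j = cut_point(a_j, b_j)            # 0.25
p_at_cut = yea_probability(a_j + b_j * m_j)   # exactly one half at the cut point
```

With $b_j > 0$, legislators to the right of $m_j$ have a yea probability above 1/2; reversing the sign of $b_j$ reverses the direction.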
For each roll call ($j$) separately, for either (1) or (2), we estimate the roll-call parameters through logistic regression with the $y_{0ij}$'s (equal to 1 for yea, 0 for nay) as the response variable and the $x_{i1}$'s [in (1)] or $x_{i1}$'s and $x_{i2}$'s [in (2)] as the independent variable(s). Although the principal-components computation of Section 4.1 requires imputed values for missing votes (obtained as described in Section 4.5), the logistic-regression calculations are run for each roll call individually and thus do not need the imputed values. If we define $n_{ij}$ to be 1 if legislator $i$ provides a vote on roll call $j$ and 0 if the vote is missing, then the logistic-regression computation for a roll call $j$ is based only on those legislators $i$ for whom $n_{ij} = 1$.
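A per-roll-call fit of model (1) might look like the following Python sketch. It uses a bare Newton-Raphson iteration in place of the SAS logistic-regression procedure used in the paper, skips missing votes (coded here as `None`), and assumes the data are not completely separated. The function name and the data are hypothetical.

```python
import math

def fit_roll_call(x1, votes_j, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of log(p/(1-p)) = a + b*x for a single roll call.
    votes_j[i] is 1 (yea), 0 (nay), or None (missing, i.e. n_ij = 0)."""
    data = [(x, y) for x, y in zip(x1, votes_j) if y is not None]
    a = b = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for a
            g1 += (y - p) * x      # score for b
            h00 += w               # information-matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Made-up first-dimension scores and votes; the fifth legislator did not vote.
scores = [-1.0, -0.5, -0.2, 0.0, 0.3, 0.5, 1.0]
votes = [0, 0, 1, 1, None, 0, 1]
a_hat, b_hat = fit_roll_call(scores, votes)
```

Because the fit is run separately for each roll call, a legislator's missing vote on one roll call simply drops that legislator from that one regression.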
Because logistic regression can encounter complete separation of points (see, e.g., Albert and Anderson 1984), we can employ special steps to detect this condition and bypass the logistic-regression calculation on any roll call where it occurs. With either (1) or (2), complete separation of points on roll call $j$ entails perfect fit for that roll call.
With (1), the separation prevents convergence of the logistic-regression procedure that estimates $(a_j, b_j)$. It occurs if the $x_{i1}$'s of the legislators who vote yea on roll call $j$ are either all above or all below the $x_{i1}$'s of all legislators who vote nay. If that condition exists on roll call $j$, one proceeds as follows. Let $m_j$ be the point halfway between the highest $x_{i1}$ of a nay (yea) voter and the lowest $x_{i1}$ of a yea (nay) voter if all yea voters have $x_{i1}$'s above (below) those of all nay voters. Then set $a_j = -Hm_j$ and $b_j = H$ ($a_j = Hm_j$ and $b_j = -H$) if the yea voters have the higher (lower) $x_{i1}$'s. $H$ is a large positive number (we use $H = 100$). With the $(a_j, b_j)$ pair thus specified, the line represented by the right side of (1) is almost vertical and cuts the horizontal axis at $m_j = -a_j/b_j$.
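The one-dimensional separation rule just described can be written out as a short Python sketch. The function name is hypothetical, missing votes are coded as `None`, and fractional votes (such as a "Present" coded as 1/2) are simply ignored here, which is an assumption of the sketch rather than something the text specifies.

```python
def separation_parameters(x1, votes_j, H=100.0):
    """Return (a_j, b_j, m_j) when yea and nay voters are completely separated
    on the first dimension, or None when they overlap."""
    yea = [x for x, y in zip(x1, votes_j) if y == 1]
    nay = [x for x, y in zip(x1, votes_j) if y == 0]
    if not yea or not nay:
        raise ValueError("unanimous roll calls are excluded from the analysis")
    if min(yea) > max(nay):                  # all yea voters lie to the right
        m = (max(nay) + min(yea)) / 2.0
        return -H * m, H, m
    if max(yea) < min(nay):                  # all yea voters lie to the left
        m = (max(yea) + min(nay)) / 2.0
        return H * m, -H, m
    return None                              # overlap: run logistic regression instead

# Made-up scores: the two rightmost legislators vote yea, the others nay.
result = separation_parameters([-0.9, -0.3, 0.1, 0.7], [0, 0, 1, 1])
```

In either separated case the returned pair satisfies $m_j = -a_j/b_j$, so the cut point sits halfway between the two groups.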
For (2), we first set $b_{j0} = a_j$, $b_{j1} = b_j$, and $b_{j2} = 0$ for any roll call $j$ that has complete separation of points under model (1) [that is, we duplicate the two parameters from (1) and set $b_{j2} = 0$]. Beyond that, complete separation of points is harder to identify in two dimensions than in one, but can be detected through linear programming. Appendix B gives the details.

Measuring Model Fit
Geometric mean probability or GMP (e.g., Poole and Rosenthal 2007, pp. 37-38) provides a useful means for evaluating model fit. Under our PCA approach, the log-likelihood function for legislator $i$ and roll call $j$ can be taken as

$$V_{ij1} = y_{0ij}(a_j + b_j x_{i1}) - \log\left(1 + e^{a_j + b_j x_{i1}}\right)$$

under the one-dimensional model (1) and as

$$V_{ij2} = y_{0ij}(b_{j0} + b_{j1} x_{i1} + b_{j2} x_{i2}) - \log\left(1 + e^{b_{j0} + b_{j1} x_{i1} + b_{j2} x_{i2}}\right)$$

under the two-dimensional model (2), with obvious generalization to $k$ dimensions for a function $V_{ijk}$. If roll call $j$ has perfect model fit (complete separation of points) in $k$ dimensions ($k \geq 1$), though, then $V_{ijk}$ is set to 0 for all legislators $i$ for that roll call. Excluding those $(i, j)$ combinations for which $n_{ij} = 0$, let $V_{i.k}$, $V_{.jk}$, and $V_{..k}$ denote the sum of $V_{ijk}$ over $j$, over $i$, and over both $i$ and $j$, respectively (where the $V_{ijk}$'s are evaluated using the estimates of the ideal points and roll-call parameters). Similarly, let $n_{i.}$, $n_{.j}$, and $n_{..}$ denote the sum of $n_{ij}$ over $j$, over $i$, and over both $i$ and $j$, respectively. Then for dimension $k$ the GMP can be written as $G_{i.k} = e^{V_{i.k}/n_{i.}}$ for legislator $i$, $G_{.jk} = e^{V_{.jk}/n_{.j}}$ for roll call $j$, and $G_{..k} = e^{V_{..k}/n_{..}}$ overall. As a $V$-value becomes less negative, the corresponding $G$-value increases, indicating better fit; $G$ approaches 1 as $V$ approaches 0.
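The overall GMP for the one-dimensional model can be sketched in Python as follows. The log-likelihood contribution is written as $y\eta - \log(1 + e^{\eta})$ with $\eta = a_j + b_j x_{i1}$, which is algebraically the usual Bernoulli log likelihood under model (1). The function names, the `perfect` argument, and the data are hypothetical.

```python
import math

def log_lik(y: float, eta: float) -> float:
    """V = y*eta - log(1 + e^eta), evaluated in a numerically stable way."""
    return y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))

def overall_gmp(votes, x1, params, perfect=()):
    """Overall geometric mean probability for the one-dimensional model.
    votes[i][j] is 1, 0, or None (missing); params[j] = (a_j, b_j);
    `perfect` lists roll calls with complete separation, whose V is set to 0."""
    total, n_votes = 0.0, 0
    for i, row in enumerate(votes):
        for j, y in enumerate(row):
            if y is None:
                continue              # n_ij = 0: excluded from both sums
            n_votes += 1
            if j in perfect:
                continue              # perfect fit contributes V = 0
            a, b = params[j]
            total += log_lik(y, a + b * x1[i])
    return math.exp(total / n_votes)

# With a_j = b_j = 0 every vote has probability 1/2, so the GMP is 1/2.
gmp_flat = overall_gmp([[1, 0], [0, None]], [0.3, -0.3], [(0.0, 0.0), (0.0, 0.0)])
```

Restricting the sums to one row or one column of the vote matrix gives the per-legislator and per-roll-call versions in the same way.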

A "Toy" Example
Perhaps the best way to illustrate the workings of our PCA methodology described in Sections 4.1-4.3 is through a small example. Example 1, all of whose calculated values are shown in Table 1, has $I = 14$ legislators, who are US House members, and $J = 8$ roll calls, taken from the 2006 (second) session of the 109th US Congress. The eight roll calls were chosen from the 12 "key votes" selected by Congressional Quarterly (2007) for that session. Although the choices of the 14 representatives and of the eight key votes from the 12 were made with an eye toward trying (successfully) to obtain an example with a rather strong second dimension, the choices were done without any trial-and-error explorations. In any case, the basic aim of the example is to illustrate the PCA features so as to provide understanding. Any substantive results, even if reasonable, are secondary and incidental. For much larger examples, see Section 6 below. The table shows the vote of each House member on each roll call. To obtain the 14 × 8 matrix $Y_0$, in Table 1 one changes each Y (yea) to 1 and each N (nay) to 0. For purposes of this example, since Congressman Jones voted "Present" on roll call #288 and was "Announced for" (shown as "+" in Table 1) on #511, his value was set to 1/2 for #288 and to 1 for #511.
Under "Legislator results" Table 1 shows, for each House member, the scores for the first and second dimensions, $x_{i1}$ and $x_{i2}$, along with the rank of $x_{i2}$. The $x_{i1}$'s and $x_{i2}$'s each have mean 0 and variance 1/3. The members are listed in the order of their first-dimension scores (low to high, or traditional "left" to "right"), with Levin and Cantor at the two ends. Not surprisingly, there is almost complete separation between Democrats and Republicans on the first dimension, the only exception being that Matheson (D) has a higher $x_{i1}$ than Kirk (R).
The second-dimension scores, $x_{i2}$, bear little or no relation to party or to $x_{i1}$, as would be expected in view of the orthogonality of $x_{i1}$ and $x_{i2}$. Paul and Spratt are at opposite ends on $x_{i2}$.
In the lower part of Table 1, the first two lines under "Roll-call results" show the first two eigenvectors, $g_1$ and $g_2$, whose elements are denoted by $g_{j1}$ and $g_{j2}$. Both $g_1' g_1$ and $g_2' g_2$ are equal to 1. Note that $|g_{j1}|$ is highest for vote #135 (tax cuts) and lowest for #288 (Iraq war) whereas $|g_{j2}|$ is highest for #288 and lowest for #135, thus suggesting that the first dimension is heavily influenced by the tax-cut vote and the second dimension by positions on the Iraq war. Other votes that contribute strongly to the second dimension are #511 (easier challenges to eminent domain) and #502 (warrantless surveillance). The second largest $|g_{j1}|$ is for #388 (stem cell research).
The first two eigenvalues (not shown in Table 1) are $L_1 = 0.741$ and $L_2 = 0.555$. Since the sum of the eight eigenvalues is 2.029, the first and second dimensions account, respectively, for 0.741/2.029 = 36.5% and 0.555/2.029 = 27.4% of total variability. The ratio of $L_2$ to $L_1$, 0.555/0.741 = 0.75, is unusually high (as is evident upon comparison with examples in Section 6.1) and thus suggests a strong second dimension. As a result, Example 1 provides a good illustration of how PCA works when more than one dimension is important, though the prominence of the second dimension seems attributable in large part to the particular choices of House members and roll calls (e.g., these members include most of the few Republicans who voted nay or present on vote #288).
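The shares of variability quoted above follow directly from the eigenvalues, as this short Python check shows. The variable names are hypothetical; the three input numbers are those reported in the text.

```python
L1, L2 = 0.741, 0.555        # first two eigenvalues reported for Example 1
total = 2.029                # reported sum of all eight eigenvalues

share_1 = 100.0 * L1 / total     # percent of total variability, dimension 1
share_2 = 100.0 * L2 / total     # percent of total variability, dimension 2
ratio = L2 / L1                  # strength of dimension 2 relative to dimension 1

print(round(share_1, 1), round(share_2, 1), round(ratio, 2))
```

Rounded, these reproduce the 36.5%, 27.4%, and 0.75 given in the paragraph.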
For each roll call, Table 1 shows the estimates of the first-dimension roll-call parameters, $a_j$ and $b_j$, and the associated cut point, $m_j = -a_j/b_j$. The sign of $b_j$ is generally positive (negative) for a roll call that attracts its yea votes largely from the Republicans (Democrats). Cut points are close to 0, the mean of $x_{i1}$, for five of the eight roll calls.
Improvement in legislator GMP upon adding the second dimension (as measured by $G_{i.2}/G_{i.1}$ or its logarithm) is greatest for Paul, second greatest for Kaptur, and least for Cantor and Levin. Cantor has the largest GMP in both one and two dimensions; Paul has the lowest $G_{i.1}$, but Kirk and Matheson have the lowest $G_{i.2}$ values. Overall GMP's, shown in the lower right corner of Table 1, are $G_{..1} = 0.658$ for one dimension and $G_{..2} = 0.879$ for two.

Handling of Missing Votes
Before the calculations described in Section 4.1 can run, there must be values for all IJ elements of Y 0 ; Section 4.1 assumed no missing data. In practice, though, one must generally deal with missing votes (although how they are handled may be unimportant if their percentage is small). To get PCA started, one has to assign values for the holes in the data.
Our basic concept is that, if y 0ij is missing, we set it equal to a value between 0 and 1 that represents the estimated probability that the (i, j) vote is yea. One might get these values through different means, but our routine is as follows:
1. Obtain a preliminary Y 0 by setting y 0ij equal to the party mean on roll call j if the (i, j) vote is missing. This party mean is the proportion of yea votes to total (yea plus nay) votes on roll call j among those legislators in the same party as legislator i. (In the absence of party data, one could use the proportion of yea votes to total votes on roll call j among all legislators who voted.)
2. Use the preliminary Y 0 to run a principal-components computation in the same manner as indicated in Section 4.1. The first-dimension scores that result will constitute a preliminary x 1 .
3. Separately for each roll call j, feed this preliminary x 1 into a logistic regression based on model (1) to obtain preliminary (a j , b j ) values, using the same procedure as in Section 4.2.
4. Separately for each legislator i, feed these preliminary (a j , b j ) pairs into another logistic regression, also based on model (1) but with x i1 to be solved for and the (a j , b j ) values supplied rather than the reverse. The resulting values of x i1 form a second (and more refined) preliminary x 1 . The calculation for a legislator i is based only on those roll calls j for which n ij = 1, that is, for which y 0ij is not missing. Because a j is given, there is no intercept term to be solved for, and so a j is treated as an offset variable (for which PROC LOGISTIC of SAS® makes provision). Any legislator i who, except for missed roll calls, votes yea (nay) on every roll call with b j > 0 and nay (yea) on every one with b j < 0 is given the value x i1 = H 0 (x i1 = −H 0 ) and is excluded from the logistic-regression calculation upon being detected beforehand. (We use H 0 = 3.) The exclusion for such an "extreme" legislator is necessary because otherwise x i1 would be unbounded.
5. For any (i, j) for which n ij = 0, use the preliminary (a j , b j ) values together with x i1 from the second preliminary x 1 to obtain y 0ij = 1/[1 + exp(−(a j + b j x i1 ))], which estimates the p ij of (1). Use these results to produce a full Y 0 , with no empty cells.
Starting with this new Y 0 , one can run the PCA calculations of Section 4.1 (and then the ones in Sections 4.2 and 4.3).
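Parts of the routine above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the implementation used for the paper: the function names are hypothetical, missing votes are assumed to be coded as NaN, steps 2 and 3 (the PCA and the per-roll-call logistic regressions) are taken as already done, and a hand-written Newton-Raphson iteration stands in for the SAS PROC LOGISTIC offset fit of step 4.

```python
import numpy as np

H0 = 3.0  # bound assigned to "extreme" legislators (the value used in the text)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def party_mean_fill(votes, party):
    """Step 1: fill each missing vote (NaN) with the yea share of the
    legislator's own party on that roll call."""
    filled = votes.copy()
    for p in np.unique(party):
        rows = np.flatnonzero(party == p)
        block = votes[rows]
        cast = np.sum(~np.isnan(block), axis=0)      # yea + nay votes per roll call
        yeas = np.nansum(block, axis=0)
        share = np.full(votes.shape[1], np.nan)      # stays NaN if nobody voted
        np.divide(yeas, cast, out=share, where=cast > 0)
        filled[rows] = np.where(np.isnan(block), share, block)
    return filled


def legislator_score(y, a, b, h0=H0, n_iter=100, tol=1e-10):
    """Step 4 for one legislator: solve for x_i1 with (a_j, b_j) held fixed,
    so that a_j acts as an offset and no intercept is estimated."""
    seen = ~np.isnan(y) & (b != 0)                   # votes cast on informative roll calls
    y, a, b = y[seen], a[seen], b[seen]
    if np.all(y == (b > 0)):                         # yea whenever b_j > 0, nay otherwise
        return h0
    if np.all(y == (b < 0)):                         # the mirror-image pattern
        return -h0
    x = 0.0
    for _ in range(n_iter):                          # Newton-Raphson on the log-likelihood
        p = sigmoid(a + b * x)
        step = np.sum(b * (y - p)) / np.sum(b ** 2 * p * (1.0 - p))
        x += step
        if abs(step) < tol:
            break
    return x


def fill_missing(votes, a, b, x):
    """Step 5: replace each missing vote by its estimated yea probability."""
    prob = sigmoid(a[None, :] + np.outer(x, b))
    return np.where(np.isnan(votes), prob, votes)
```

With `a` and `b` holding the preliminary roll-call estimates, one would call `legislator_score` once per row of the vote matrix and pass the resulting vector to `fill_missing`. A party with no recorded votes on a roll call leaves NaN in step 1; the all-legislator fallback mentioned in the text would have to be added for that case.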
The imputation procedure just described is obviously not the only way to deal with missing votes. However, it seems relatively sophisticated in that it allows missing votes to have values on a continuum from 0 to 1 (not just 0 and 1 themselves), in addition to using the full non-missing data in determining those values. P-R (whose iterations calculate for one legislator at a time, as well as for one roll call at a time) and CJR [which imputes in each iteration (Clinton et al. 2004a, p. 367)] each have their own ways of taking care of missing votes. Rosas and Shomer (2008) express general concern (but with specific mention of P-R and CJR) that techniques for handling missing votes are open to question because the missingness may not be ignorable. Although any method for attacking missingness will be imperfect, ours appears to be relatively satisfactory. Generally, one would expect that adverse impact of missing votes will be less overall if they are fewer, and will be less for a legislator who has fewer of them. In addition, our scheme for imputation of missing votes may work better for legislatures with high intra-party vote cohesion (as is typical for a parliamentary system) than for those where such cohesion is less.

Bridging Across Sessions
P-R, or, more exactly, D- or DW-NOMINATE, can span multiple legislative terms that entail changing memberships, and can thus aim to estimate legislators' ideal points over time in a common space (Poole 2005; Poole and Rosenthal 1991, 2001, 2007). With PCA (or for any approach, including P-R), much the same objective can be achieved through estimation based on a standard unbalanced two-way design. Thus, under single-dimension PCA (e.g.), let x i1t be the ideal point that was found for legislator i for time or term t (for those t's in which legislator i did serve). Then the estimate of the "treatment" effect w i calculated under the linear model
x i1t = w i + v t + (error term) (3)
using the restriction that the "block" (term) effects v t sum to 0 (∑ t v t = 0), will serve as an estimate of the ideal point of legislator i over time in a common space. The model (3) is "constant" in that it allows for no time variation beyond that provided by the v t 's. To allow for linear trends for individual legislators (perhaps a questionable move, since a more complex model applies), just augment (3) by replacing w i with (w i0 + w i1 t), and add a second restriction. Questions can be raised, though, as to how meaningful it is to try to place legislators in a common space over time. Concerns may be minor if the time covers just a few terms. However, if it covers a number of decades, to say nothing of two centuries (e.g., Poole and Rosenthal 1991), then serious doubts may be expressed. For further comments on this issue, see (e.g.) Bailey (2007), Bateman and Lapinski (2016), Cillizza (2014), and Sides (2011).
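A minimal sketch of the two-way fit may help make the bridging idea concrete. It assumes the additive form x i1t = w i + v t + error with the v t 's summing to zero, and fits it by ordinary least squares; the function name and the (i, t, x) input layout are hypothetical choices for illustration, and the legislator-specific trend extension is not covered.

```python
import numpy as np


def common_space_scores(obs, n_leg, n_term):
    """Least-squares fit of x_{i1t} = w_i + v_t + error with sum_t v_t = 0.

    obs: iterable of (i, t, x) with 0-based legislator index i, 0-based term
    index t, and the single-term ideal point x found for that pair.
    Returns (w, v): legislator effects and term effects.
    """
    obs = list(obs)
    X = np.zeros((len(obs), n_leg + n_term - 1))
    y = np.array([x for _, _, x in obs], dtype=float)
    for r, (i, t, _) in enumerate(obs):
        X[r, i] = 1.0
        if t < n_term - 1:
            X[r, n_leg + t] = 1.0
        else:
            X[r, n_leg:] = -1.0   # last term effect = -(sum of the others)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    w = coef[:n_leg]
    v = np.append(coef[n_leg:], -coef[n_leg:].sum())
    return w, v
```

The design is unbalanced whenever some legislators are absent from some terms; the fit remains well defined as long as the terms are linked through overlapping memberships.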

Developments Related to Our PCA Approach
Sections 5.1-5.5 deal, respectively, with other work involving principal components in estimating ideal points; ratings by National Journal; factor analysis; a method of Heckman and Snyder (1997); and miscellaneous pursuits, including Bayesian approaches other than CJR.

Other Use of Principal Components in Ideal-Point Estimation
The possibility of making use of principal components in the estimation of ideal points has not been altogether ignored in the past. One finds some applications whose descriptions are less than totally clear as to what was done. In footnote 4 of Clinton et al. (2004a, p. 359), however, is a detailed description of one technique. The CJR method uses the principal-components estimator only for start values of the ideal points, though (Clinton et al. 2004a, p. 368).
Aside from this confinement to the initialization, there are several differences between the CJR approach of footnote 4 and our PCA approach. First, the vote matrix is double-centered under CJR, whereas with Y our PCA follows the usual practice in most applied work with principal components, by adjusting only for the variable (roll-call) means and not for the observation (legislator) means. Second, CJR uses pairwise deletion of missing data in computing its correlation matrix (which could result in negative eigenvalues), whereas we impute for missing votes (with values between 0 and 1, as described above in Section 4.5). Third, rather than a correlation matrix as used by CJR, our PCA uses the covariance matrix, S. That is largely because inference results are mostly unavailable with a correlation matrix (e.g., Jackson 1991, sct. 4.7) but can be obtained (see Section 7 below), though with some assumptions, with a covariance matrix. Although a correlation matrix rather than a covariance matrix generally has to be used if variables have different units of measurement, such a consideration hardly applies to a vote matrix with all its values in the interval [0, 1]. Fourth, CJR uses roll calls as observations and legislators as variables, rather than the reverse as in our PCA. Thus, CJR obtains its ideal points from the (I × 1) eigenvectors of its (I × I) correlation matrix, whereas our ideal points are the scores x k (I × 1) that are derived using Y and the eigenvectors of our (J × J) covariance matrix.
One might think that there should be no relation at all between the eigenvectors of the (I × I) matrix and the score vectors that are based on the eigenvectors of the (J × J) matrix. By virtue of singular value decomposition (Jolliffe 2004, pp. 44-45), though, the two can be the same except for a multiplicative constant, but generally just under certain restrictive conditions. Specifically, both matrices have to be covariance (not correlation) matrices, and, in essence, both matrices have to originate from a double-centered vote matrix. Of course, the two (I × 1) vectors could be similar in some cases even if these conditions do not both hold.⁶
⁶ With the roll calls as variables and the legislators as observations as in our PCA approach, one could ask whether the list of variables might be augmented to include, besides the roll calls, some covariates that would be legislator attributes (e.g., party). No attempt has been made to study that possibility or how it might be used, but it could lead to some interesting applications. For a covariance (rather than correlation) matrix to be used, though, an attribute might need to meet certain conditions, such as confinement to the interval [0, 1].
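The singular-value-decomposition relation described above can be checked numerically. The sketch below uses a small random 0/1 matrix as a stand-in for a vote matrix (the data and variable names are purely illustrative), double-centers it, and compares the first-dimension scores from the (J × J) covariance matrix with the leading eigenvector of the (I × I) covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
Y0 = rng.integers(0, 2, size=(12, 7)).astype(float)   # 12 "legislators", 7 "roll calls"

# Double-center: remove roll-call (column) means and legislator (row) means.
Y = Y0 - Y0.mean(axis=0) - Y0.mean(axis=1, keepdims=True) + Y0.mean()

S_rollcalls = Y.T @ Y / (Y.shape[0] - 1)   # J x J covariance, legislators as observations
S_legislators = Y @ Y.T / (Y.shape[1] - 1) # I x I covariance, roll calls as observations

# numpy.linalg.eigh returns eigenvalues in ascending order, so the last
# column of the eigenvector matrix belongs to the largest eigenvalue.
g1 = np.linalg.eigh(S_rollcalls)[1][:, -1]
u1 = np.linalg.eigh(S_legislators)[1][:, -1]

scores = Y @ g1                             # first-dimension scores (I x 1)

# |cosine| of the angle between the two I x 1 vectors; 1 means proportional.
cosine = abs(scores @ u1) / (np.linalg.norm(scores) * np.linalg.norm(u1))
```

Here `cosine` equals 1 up to rounding error, so the two vectors agree except for a multiplicative constant (possibly negative). Centering only the columns, or switching to correlation matrices, generally breaks the exact proportionality.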

The Ratings from National Journal
From selected US Congressional roll-call votes from each year starting with 1981 (and continuing at least through votes from 2013), National Journal calculated ideological ratings for all members of the Senate and House. The methodology was explained each year (e.g., Anonymous 2014a) and ever since the beginning was described (briefly) as using "principal-components analysis". However, mathematical details of that analysis are absent; because one can do principal components analyses in differing ways, the ratings methodology is hard to appraise.
The National Journal ratings have found their way into political campaigns. They placed presidential candidate John Kerry (during the 2004 campaign) as the most liberal of all senators and Barack Obama (during the 2008 campaign) also as the most liberal senator, thereby sparking attacks from political opponents (see, e.g., Harris 2004; Montopoli 2008). Clinton et al. (2004b) and Clinton and Jackman (2009, p. 603) found the claims based on the ratings to be overstepping but said nothing about the mathematical basis for the principal components analyses and did not (or could not) evaluate it. More recently, North Carolina Senator Kay Hagan in her 2014 losing reelection campaign advertised herself as "the most moderate senator" based on the National Journal ratings (Christensen 2014).

Relation to Factor Analysis
Unfortunately, in the broad literature there has been much disagreement as to what constitutes factor analysis as well as great confusion between it and principal components analysis, with (e.g.) many authors saying that they are using the former when they are really using the latter (Jackson 1991, scts. 17.1, 17.10; Jolliffe 2004, p. 150). In addition, there are myriad varieties of (true) factor analysis, with different techniques for both parameter estimation and factor rotation, as well as for determination of scores. Published reports may give unclear or inadequate details of methods, or use ambiguous language.
For ideal-point estimation as well as more generally, the question may arise as to how much similarity there is between results from (true) factor analysis and those from (true) principal components analysis. A proposition proved by Bentler and Kano (1990) states that, under mild conditions, if a J-variate vector follows the factor-analysis model with just a single factor, then the squared correlation coefficient between that factor and the first-dimension score from principal components analysis will approach 1 as J increases. This result suggests that, for a large number (J) of roll calls, first-dimension ideal points estimated by principal components analysis and factor analysis may be quite close if the issue space is strongly one-dimensional. However, the result is limited since it just applies when there is only one factor.
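A small simulation can illustrate the kind of behavior that the Bentler-Kano proposition describes. The sketch below is an assumption-laden illustration rather than a reproduction of their setting: it generates continuous (normal) data from a one-factor model with equal loadings, not binary votes, and the function name, loading value, and sample size are arbitrary choices.

```python
import numpy as np


def squared_corr_factor_vs_pc1(n_vars, n_obs=2000, loading=0.7, seed=0):
    """Squared correlation between a simulated single factor and the
    first principal-component score, for n_vars observed variables."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal(n_obs)                       # the single factor
    noise = rng.standard_normal((n_obs, n_vars)) * np.sqrt(1.0 - loading ** 2)
    Y = loading * f[:, None] + noise                     # one-factor model
    Y -= Y.mean(axis=0)                                  # center each variable
    S = Y.T @ Y / (n_obs - 1)                            # covariance matrix
    g1 = np.linalg.eigh(S)[1][:, -1]                     # leading eigenvector
    score = Y @ g1                                       # first-dimension score
    return np.corrcoef(f, score)[0, 1] ** 2


r2_few = squared_corr_factor_vs_pc1(5)
r2_many = squared_corr_factor_vs_pc1(200)
```

Under these settings `r2_many` is much closer to 1 than `r2_few`, in line with the proposition; how quickly the approach to 1 happens depends on the loadings assumed.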
Thus, in more than one dimension, principal components analysis and factor analysis will generally give results that are different.That is not to say, of course, that the former produces better outcomes than the latter (or vice versa).

The Heckman-Snyder Method
Heckman and Snyder (1997) estimate ideal points through a factor-analytic approach. The method is often mentioned but has not seen extensive use. It resembles PCA more closely than either P-R or CJR does. Its properties include the following:
1. It uses an I × I matrix (rather than J × J, as in PCA).
2. It apparently uses some sort of randomization to handle missing votes (Heckman and Snyder 1997, p. S160, footnote 13), a practice that seems questionable.
3. It pays little regard to estimation of roll-call parameters (which are sometimes desired).
4. It uses an unusual distributional assumption in relation to its utility function.
5. It provides no standard errors.
For further comments on this method, especially some critical ones regarding the fourth property above, see Poole and Rosenthal (2001) and Clinton et al. (2004a). P-R, CJR, and Heckman-Snyder each use a particular utility function of their own (and have never considered any approach that refrains from using one). By contrast, PCA makes no use of a utility function and does not need to do so.

Further Endeavors, Bayesian and Other
Estimation of ideal points through Bayesian approaches other than CJR has been more specialized than CJR. Bailey (2001) focused on situations with a tiny number of votes, utilized covariates, and provided an example based on five US Senate roll calls dealing with international trade. Martin and Quinn (2002) and Bafumi et al. (2005) each examined US Supreme Court decisions over a span of more than 45 years and estimated ideal points for 29 justices, with allowance for temporal change in the case of the former paper. All three of these Bayesian works dealt only with a one-dimensional issue space.
For various possible extensions of principal components analysis, see Jolliffe (2004).

Large Empirical Examples Using PCA
Because Example 1 in Section 4.4 was small, it could illustrate various PCA details. Examples 2-5 for PCA, which we now present, are large but less comprehensive. They illustrate various aspects of application of our PCA technique. There are comparisons with P-R, and also some with CJR. Section 6.1 deals with general results, and Section 6.2 with model fit. The comparisons mostly show that PCA differs little from P-R or CJR, thus suggesting close equivalence of PCA with the other two from the standpoint of their results.
For details about data, see Appendix C.⁷

General Results for Examples 2-5
All four examples use US Senate roll calls, from the 105th Congress (1997-1998) for Examples 2 and 4 and from the 106th Congress (1999-2000) for Examples 3 and 5. To be consistent with previous P-R and CJR published results and thus allow proper comparisons, Examples 2 and 3 are based on all roll calls except those with a vote more extreme than 97.5% to 2.5%: 486 such roll calls for the former example, 540 for the latter. Examples 4 and 5 are not nearly as large, and use, respectively, the 23 and 25 "key votes" selected by Congressional Quarterly (1998, 1999 and 2000, 2001) from among the 486 and 540.
In Example 2, the PCA x i1 's of all Democrats are less than (to the left of) those of all Republicans. The same is true for the P-R x i1 's in Example 2, as well as for the PCA x i1 's in Examples 3 and 5. It is true also for the P-R x i1 's in Example 3 except that Lincoln Chafee (R) of Rhode Island is to the left of Miller (D) of Georgia, and for the PCA x i1 's in Example 4 except that John Chafee (R) of Rhode Island is left of Hollings (D) of South Carolina and Breaux (D) of Louisiana.
The PCA eigenvalue ratios L 2 /L 1 are 0.07, 0.05, 0.19, and 0.09 for Examples 2-5, respectively. All are far below the 0.75 ratio for Example 1 (Section 4.4) and thus show smaller roles for the second dimension. In Examples 2-5 the respective percentages of PCA total variability accounted for by the first dimension are 54.8, 63.6, 46.3, and 66.5, and by the second dimension are 4.0, 3.1, 8.9, and 5.7.
For 65 of the 100 senators in Example 2, the PCA two-dimension GMP, G i.2 , exceeds the corresponding value for P-R. In Example 3, these GMP's are higher for PCA than for P-R for 74 of the 102 senators. Although the 65 senators in Example 2 are disproportionately Republicans, the 74 in Example 3 are disproportionately Democrats.
For PCA in Example 2, the highest G i.2 value, 0.869, is for Sessions of Alabama, and the lowest, 0.613, is for Byrd of West Virginia. P-R GMP's are likewise largest for Sessions and least for Byrd.

⁷ Besides Examples 2-5, we also have one more example, Example 6. It is for the 90th U.S. Senate (1967-1968) and is used only in Section 7 below.
Table 2 shows correlation coefficients of location scores for both the first dimension (x i1 's, top half of table) and second dimension (x i2 's, bottom half) and for both the 105th Senate (left half of table) and the 106th (right half). For the 105th Senate, the correlations are among the senators' scores from PCA using just the 23 key votes (PCA/23); from PCA using all 486 non-lopsided votes (PCA/486); and from P-R (P-R/486, also using these 486 votes). For the 106th Senate, PCA/25, PCA/540, and P-R/540 are analogous, respectively, to PCA/23, PCA/486, and P-R/486. In each 3 × 3 square in the table, Pearson (product-moment) and Spearman (rank-order) correlation coefficients are, respectively, below and above the main diagonal.
Note to Table 2: The x i1 's, x i2 's are the legislator scores for dimension 1, 2; PCA/23, PCA/25 refer to the 23, 25 "key votes" in the 105th, 106th Senates; PCA/486 and P-R/486, PCA/540 and P-R/540 refer to the 486, 540 non-lopsided votes in the 105th, 106th Senates.
In the comparisons of P-R x i1 scores versus those of PCA based on all (486 or 540) votes, all (four) of the correlation coefficients, both Pearson and Spearman and for both Senates, are greater than 0.99. Table 2 also shows that even the correlations involving the PCA x i1 scores derived from the key votes, though lower, are still high (all greater than 0.9), despite the small numbers of key votes that provide the basis for the corresponding scores.
The second-dimension correlations of PCA/540 versus P-R/540 (106th Senate) are high. However, otherwise the correlations for x i2 that appear in Table 2, though well above zero, are not very large, perhaps an indirect result of weakness of the second dimension.
The first dimension in each of Examples 2-5 is obviously related to party and to traditional "left"-"right" factors. The second dimension, though, is rather elusive and not easy to pin down. However, the PCA g 2 eigenvectors can shed at least some amount of light in Examples 2 and 3.
For Example 2 (105th Senate), 19 roll calls have |g j2 | > 0.1. (The value 0.1 was picked arbitrarily.) All but two of those roll calls are budget or appropriations votes. On all 19, the Democrats are at least 80% united and the Republicans are less than 80% united; in fact, on all but three the GOP senators are no more than 2/3 united. Broadly speaking, Republicans who tend to vote with the Democrats on these roll calls are at one end of the x i2 scale, whereas those who tend not to do so are at the other end. The senators with x i2 ranks of 78 through 100 are, except for Lieberman (D) of Connecticut, all Republicans in the former group, whereas those with ranks 1 through 12 are, except for Feingold (D) of Wisconsin, all GOP senators in the latter group. Senators with extreme x i2 scores tend also to have high G i.2 /G i.1 ratios.
For Example 3 (106th Senate), the picture for the second dimension is far different. All 29 of the roll calls with |g j2 | > 0.1 deal with some aspect of foreign trade (CQ votes #54, 178, 213, 344, 346, 348-350, and 352-353 in 1999; and #97-98, 231, 234-236, 238-246, and 248-251 in 2000). On the x i2 scale, senators on one end favor free trade whereas those on the other end are protectionist. The difference in the complexion of the second dimension in the two Senates stems from the fact that the 106th but not the 105th has a large number of votes related to foreign trade. (Generally, of course, the results of any method of roll-call analysis will be fundamentally affected by the nature of the votes in the data set.) Section 7 below will provide some suggestion that a third dimension could play a role in the 106th Senate (though not in the 105th). In line with this, in the 16 roll calls in Example 3 for which |g j3 | > 0.1, Democrats are at least 93% united in all but one, whereas Republicans are less than 2/3 united in all but three. The pattern is much the same as for the second dimension in Example 2, except that the 16 votes cover a medley of issues rather than mainly budgets and appropriations.
For one dimension, results for cut points (analogous to P-R midpoints) may be of interest.
Of the 486 roll calls in Example 2, 106 have cut points (values of m j ) that are outside the range of senators' x i1 's (below −0.764 or above +0.673 for this case). The corresponding figures for Examples 3-5 are, respectively, 130 out of 540 roll calls, six out of 23, and four out of 25. For such roll calls with cut points outside the range of the x i1 's, the senators' one-dimension p ij 's are either all greater than 1/2 or all less than 1/2. These roll calls generally have poor single-dimension model fit and/or a vote that is not at all close.
Time to run PCA is minimal. We did no analyses with US House roll-call data, which involves a much larger matrix than US Senate data. However, for a dummy vote matrix with I = 450 legislators and J = 1250 roll calls (comparable in size to a roll-call matrix for one two-year period for the US House), computer time (clock time) to run the single line of code at the end of Section 4.1 above was less than 14 seconds. With the code changed to extract just one dimension rather than all 449, the time was under five seconds. These times are for an Intel dual-core processor running at 2.4 gigahertz with 2.00 gigabytes of random-access memory.
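The statement about cut points outside the score range can be checked with a few lines of Python. This is a sketch only: the two endpoint scores are the range limits quoted for Example 2, while the interior scores, the (a, b) pairs, and the function names are hypothetical.

```python
import math


def yea_probabilities(a, b, scores):
    """One-dimension p_ij = 1 / (1 + exp(-(a + b * x))) for each score x."""
    return [1.0 / (1.0 + math.exp(-(a + b * x))) for x in scores]


def cut_point_outside_range(a, b, scores):
    """True if the cut point m = -a / b lies below or above every score."""
    m = -a / b
    return m < min(scores) or m > max(scores)


scores = [-0.764, -0.2, 0.1, 0.673]          # endpoints from Example 2; interior values made up

outside = yea_probabilities(2.0, 1.5, scores)    # cut point -1.333, left of all scores
inside = yea_probabilities(0.3, 1.5, scores)     # cut point -0.2, within the range
```

For the first roll call every probability in `outside` exceeds 1/2, so the model predicts a unanimous yea; for the second, the probabilities in `inside` fall on both sides of 1/2.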

Measures of Model Fit
Section 4.3 above defined GMP. Other possible measures of model fit are also available. They include percentage of correct classifications or %CC (Poole and Rosenthal 2007, p. 33) and aggregate proportional reduction in error or APRE (Poole and Rosenthal 2007, pp. 36-37). All the measures are applicable to each of the techniques of ideal-point estimation that we consider here and can be used to compare them.⁸
GMP can be seen as more sensitive than the other two measures. That is, the other two differentiate less than GMP. For example, consider a roll call whose yeas and nays in the order of the spatial locations of the legislators (in a one-dimensional issue space) are [display of the 13 yea/nay votes not recovered]. No matter what the estimation technique, APRE (calculated for this roll call by itself) could not have a value greater than zero, because no placement of the cut point separating predicted yeas from predicted nays could do better than predicting nays for all 13 legislators. The measure %CC is similarly constrained. However, GMP can yield varied results for different estimation techniques.
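The contrast between GMP and the classification-based measures can be illustrated with a made-up roll call. In the sketch below, the 13-vote pattern is hypothetical (it is not the pattern from the original display), GMP is taken to be the geometric mean of the probabilities assigned to the observed votes, and the two probability profiles are arbitrary stand-ins for the output of two estimation techniques.

```python
import math

# Hypothetical roll call: 13 legislators in spatial order, 1 = yea, 0 = nay.
votes = [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]


def min_classification_errors(votes):
    """Fewest errors over all cut points and both polarities."""
    best = len(votes)
    for cut in range(len(votes) + 1):
        left, right = votes[:cut], votes[cut:]
        yea_on_left = left.count(0) + right.count(1)    # predict yea left of the cut
        yea_on_right = left.count(1) + right.count(0)   # predict yea right of the cut
        best = min(best, yea_on_left, yea_on_right)
    return best


def gmp(probs, votes):
    """Geometric mean of the probabilities given to the observed votes."""
    logs = [math.log(p if y == 1 else 1.0 - p) for p, y in zip(probs, votes)]
    return math.exp(sum(logs) / len(logs))


flat = [3.0 / 13.0] * 13                        # same yea probability everywhere
sloped = [0.4 - 0.025 * i for i in range(13)]   # declines from 0.4 to 0.1

errors_best = min_classification_errors(votes)  # equals the number of yeas here
gmp_flat, gmp_sloped = gmp(flat, votes), gmp(sloped, votes)
```

Both probability profiles stay below 1/2 and therefore predict nay for everyone, giving identical %CC and an APRE of zero, while `gmp_flat` and `gmp_sloped` differ.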
Table 3 presents measures of model fit (GMP, %CC, and APRE) that compare PCA, CJR, and P-R, for both one and two dimensions and for both the 105th and 106th Senates (see end of Appendix C for some details). Differences among the results from the three methods are small. On its four available comparisons, CJR is better than both PCA and P-R on the two for GMP, and better than PCA but about the same as P-R on the two for %CC. PCA is better than P-R on all four comparisons for GMP, and worse than P-R or about the same on the eight comparisons for %CC and APRE. The various differences are so narrow, though, that they seem to be inconsequential.
⁸ The last two measures can be illustrated for Example 1 (Section 4.4). Their values are, for one and two dimensions respectively, 1 − 27/111 = 0.757 and 1 − 6/111 = 0.946 for %CC and 1 − 27/45 = 0.400 and 1 − 6/45 = 0.867 for APRE. Here, 111 is the total number of votes cast, of which 45 were on the losing side; and 27 (k = 1) and 6 (k = 2) are the numbers of incorrect predictions (classifications), based on the sign of the estimate of (1) or (2) disagreeing with the vote.
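The arithmetic in footnote 8 can be reproduced directly. The helper names below are hypothetical; the counts are the ones quoted for Example 1.

```python
def pct_correct(errors, total_votes):
    """%CC expressed as a proportion: share of votes classified correctly."""
    return 1.0 - errors / total_votes


def apre(errors, losing_side_votes):
    """Aggregate proportional reduction in error relative to always
    predicting the winning side."""
    return 1.0 - errors / losing_side_votes


TOTAL_VOTES = 111       # votes cast in Example 1
LOSING_SIDE = 45        # of which this many were on the losing side

cc_1d = pct_correct(27, TOTAL_VOTES)     # one dimension, 27 errors
cc_2d = pct_correct(6, TOTAL_VOTES)      # two dimensions, 6 errors
apre_1d = apre(27, LOSING_SIDE)
apre_2d = apre(6, LOSING_SIDE)
```

Rounded to three decimals, these give 0.757 and 0.946 for %CC and 0.400 and 0.867 for APRE, matching the footnote.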

Number of Dimensions
Our PCA treatment throughout Sections 4 and 6 provides only descriptive results and thereby avoids any need for distributional assumptions. In this section, though, our PCA methodology entails a model based on continuous variables, whereas votes (if not missing) are, of course, binary. Specifically, the model assumes that each legislator's set of votes (i.e., each row of the matrix Y 0 ) is drawn independently from a multivariate normal distribution. In addition, the theoretic results rely on large-sample (asymptotic) distributions of the relevant statistics. Our methods here in Section 7 are suitable to the extent that assumption violations (e.g., votes being binary) do not have serious effects. Our results below, though, suggest that the methodology does work well. (One can speculate that the binariness is less of an issue with larger sample sizes.)
There can be controversy over what number of dimensions to use in estimating ideal points from roll-call data. In particular, for the US Congress Heckman and Snyder (1997, pp. S165, S184) contended that the number should be much higher than the one or two favored by Poole and Rosenthal (1991) and others. Questions involving number of dimensions are not easy to judge. PCA, however, can at least furnish some clues.
A result for principal components (e.g., Anderson 1963, pp. 130-33; Morrison 1976, p. 294; Jackson 1991, pp. 86-87) provides a test of the null hypothesis that the k 0 -th through k 1 -th population eigenvalues (inclusive) are equal to one another. The test uses only the sample eigenvalues, L k . One refers the resulting test statistic, (4), to the chi-square distribution with (k 1 − k 0 + 3)(k 1 − k 0 )/2 degrees of freedom. The statistic (4) will suggest that the k 0 -th through k 1 -th population eigenvalues are not all alike if (4) is significantly large, or are all about the same if (4) is nonsignificant. Because the rough equality of eigenvalues for some dimensions k ≥ k 0 would generally signal that dimensions k ≥ k 0 are of little benefit and should not be kept, the use of (4) may help to decide how many dimensions are appropriate to use. The general concept is that, if the list of successively lower sample eigenvalues reaches a point where no further eigenvalues differ much, then the dimensions starting at that point probably have little meaning and can be dropped. Using (4) is thus similar to using the classical scree graph (e.g., Jolliffe 2004, p. 115 ff.). However, the latter may involve extra subjective judgment whereas the former provides a more formal statistical criterion for judging how many dimensions to retain. For PCA or any other approach, one could use model fit to try to assess dimensionality as in (e.g.) Poole and Rosenthal (2007, pp. 63-64), but that is also more subjective than using (4).
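A sketch of such an equal-eigenvalue test is given below. It is hedged in two ways: it uses the standard textbook form of the statistic for equality of a run of covariance-matrix eigenvalues, which may differ in detail from the paper's statistic (4), and the multiplier `nu` is assumed to be the degrees of freedom of the covariance matrix (typically I − 1). The function name and the example eigenvalues are hypothetical.

```python
import math


def equal_eigenvalue_statistic(eigenvalues, k0, k1, nu):
    """Chi-square statistic and degrees of freedom for testing equality of
    the k0-th through k1-th (1-based, inclusive) population eigenvalues.

    eigenvalues: sample eigenvalues L_1 >= L_2 >= ...
    nu: assumed multiplier, the covariance matrix's degrees of freedom.
    """
    subset = eigenvalues[k0 - 1:k1]
    r = len(subset)
    mean = sum(subset) / r
    # nu * [ r * ln(arithmetic mean) - sum of ln(L_k) ]; nonnegative, and
    # zero only when the eigenvalues in the run are all equal.
    stat = nu * (r * math.log(mean) - sum(math.log(l) for l in subset))
    df = (k1 - k0 + 3) * (k1 - k0) // 2
    return stat, df


# Hypothetical eigenvalues: a dominant first dimension, then a flat tail.
eigs = [5.0, 0.40, 0.38, 0.37, 0.36]
stat_from_1, df_from_1 = equal_eigenvalue_statistic(eigs, 1, 5, nu=99)
stat_from_2, df_from_2 = equal_eigenvalue_statistic(eigs, 2, 5, nu=99)
```

The statistic starting at k 0 = 1 is far larger than the one starting at k 0 = 2, reflecting a first eigenvalue that stands apart from a nearly flat remainder. A tail probability can then be obtained from any chi-square routine, for instance `scipy.stats.chi2.sf(stat, df)` if SciPy is available.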
For selected (k 0 , k 1 ) and the associated degrees of freedom, Table 4 shows the value of (4), along with its chi-square probability, for Examples 2-6 from the US Senate. As indicated before, Examples 2 and 3 involve all the non-lopsided votes, and Examples 4 and 5 each use a small set of "key" votes, from the 1997-1998 and 1999-2000 terms. That period of time, unlike that of Example 6, is generally thought to have strongly unidimensional voting patterns.
Note to Table 4: The value of χ 2 tests for the equality of the k 0 -th through k 1 -th population eigenvalues (inclusive); d.f. = Degrees of freedom = (k 1 − k 0 + 3)(k 1 − k 0 )/2; Prob. = Probability of finding a value of χ 2 greater than the one shown.
Although some caution is needed because of multiple testing, one can draw several conclusions of varying tenability. First, consider Examples 2-5. Not only for the full-vote data (Examples 2 and 3) but also for the key votes (Examples 4 and 5), the probabilities for the basic statistic (4) with k 0 = 1 show decisively that the first population eigenvalue differs from the others, thus (not surprisingly) indicating a strong first dimension. For Example 2, the low probabilities for k 0 = 2 and the nonsignificant ones for k 0 = 3 suggest acknowledging a second (albeit weak) dimension but not a third. For Example 3, though, the high probability for (k 0 , k 1 ) = (2, 3) combined with the low remaining probabilities for k 0 = 2 and k 0 = 3 suggests that the second and third population eigenvalues differ from those for higher dimensions but perhaps not greatly from each other, a condition consistent with recognizing both a second and a third dimension (see Section 6.1 above for related discussion). The results for k 0 = 2 and 3 for Examples 4 and 5 are largely nonsignificant, though their patterns slightly resemble the ones in Examples 2 and 3, respectively. The results for k 0 = 4 in the different examples provide no evidence for a fourth dimension. All of the results for Examples 2-5 are credible.
Example 6 pertains to the 90th Senate (1967-1968). This Senate was chosen for inclusion in Table 4 because it was cited by Lewis and Poole (2004, p. 106) as one with "two dominant dimensions". In conformity with that description, the probabilities for k 0 = 2 are far lower than in Examples 2-5. Thus, the findings for the 90th Senate reinforce other conclusions from Table 4 regarding usefulness of (4) for judging dimensionality.⁹

Discussion
In comparison with alternatives, our PCA approach to estimating ideal points from roll-call data is simple in both concept and implementation, and its computation is fast. Together with its simplicity come less programming and easier understanding. It evidently has face validity, based on the results in Sections 6 and 7. It also avoids the difficulties (noted in Section 3) that, especially for more than one dimension, affect both P-R and CJR, with the former facing more issues than the latter. Why, then, is principal components analysis seldom used for ideal points, either in one dimension, where applications are more frequent, or in more than one, where its advantages are greater? Outside of plain inertia and adherence to the status quo, two considerations may be playing a role. We do not find either one to be generally compelling.
First, PCA provides no means to assess uncertainty. CJR uses Bayesian methodology to estimate uncertainty for estimates of ideal points and other parameters. Lewis and Poole (2004) proposed parametric-bootstrap standard errors to handle uncertainty assessment for P-R. Both the P-R and CJR techniques, however, rely on an ungrounded assumption of (mutual) conditional independence of a legislator's votes given the ideal points and roll-call parameters. Although for PCA one may try to find complex standard-error formulas that steer clear of that assumption, it appears that, whether for P-R, CJR, or PCA, any effort to assess uncertainty is fraught with impediments. In addition, standard errors may see infrequent use in applications anyway.
Second, unlike P-R and CJR, PCA does not use a utility function in deriving its model. However, in its theory it is still based on a spatial voting model, just as P-R and CJR are. Parts A.1 and A.3 of Appendix A show a close mathematical similarity between P-R, which uses a utility function, and the model (1)-(2) above, which does not. Table 2 indicates that PCA and P-R yield first-dimension ideal points that are barely distinguishable. This close parallel between PCA and P-R, both theoretic and pragmatic, suggests that the lack of a utility function for PCA need not be a general concern. Principal-components methods have, of course, seen use in various fields of application.

Summary
For analysis of roll-call data, this paper builds a case for considering our PCA approach as an alternative to P-R and CJR, two well-established methods.P-R has been used for years and is deeply entrenched.CJR has made recent inroads.
For unidimensional applications anyway, many users may thus hesitate to lightly eschew P-R (or CJR) and embrace PCA instead, despite the strong points of PCA even in one dimension. However, for two or more dimensions, whose study may be fruitful for varied situations (e.g., for certain locales and time periods or certain subsets of votes or legislators), the relative benefits of PCA are especially striking and should suffice to make PCA a preeminent contender.

9 Also consonant with a relatively strong second dimension in the 90th Senate are its eigenvalue ratio $L_2/L_1$, equal to 0.29, and its GMP gain from the second dimension, $G_{\cdot\cdot 2} - G_{\cdot\cdot 1} = 0.711 - 0.665 = 0.046$, both much larger than the respective values, for the 105th and 106th Senates (Examples 2 and 3), of 0.07 and 0.05 for $L_2/L_1$, and of $0.775 - 0.758 = 0.017$ and $0.813 - 0.791 = 0.022$ for $G_{\cdot\cdot 2} - G_{\cdot\cdot 1}$.

respectively, where the index $k$ refers to the dimension (cf. McCarty et al. 1997, Appendix A, e.g.). The parameter $\omega_1$ may be set to 1 (McCarty et al. 1997, p. 53) and thus effectively dropped.
For either (A1) or (A2), the total number of parameters to be estimated (excluding $\omega_1$) is $(2I + 4J + 2)$ after expansion to two dimensions, compared with $(I + 2J + 1)$ for one dimension. The number of parameters for PCA (and also for CJR) is $(2I + 3J)$ for two dimensions and $(I + 2J)$ for one. Thus, not counting $\omega_1$, (A1) and (A2) after expansion to two dimensions each have $(J + 2)$ more parameters than (2), but for one dimension they have only one more parameter than (1).
If (A1) is replaced by substituting quadratic for exponential utility and by allowing for two dimensions instead of one through substitution of (A10), then (A3) is unchanged except for the substitution of (A10), given that $\beta$ and both $\omega_k^2$ terms are dropped from the two-dimensional version of (A1). Thus, (A4) now becomes (A11), where the $m_{jk}$'s and $b_{jk}$'s are defined analogously to (A5) and (A6), respectively. In order for (A11) to be the same as (2), it is necessary to make the substitution (A12) in (A11), thereby reducing the number of parameters for each roll call $j$ from 4 to 3. In fact, if the substitution (A12) is not made, then the $m_{jk}$ parameters in (A11) encounter exacerbated difficulties with identifiability. The necessary decrease in number of parameters is just what one would expect given the results stated in the preceding paragraph.
For the general case of $K$ dimensions, (A1) and (A2) generalize in obvious fashion through substitution of (A10) again but with 2 replaced by $K$ as the upper limit of the summations in (A10). For general $K$, the number of parameters (with $\omega_1$ excluded) is then $K(I + 2J + 1)$ for the expanded version of either (A1) or (A2). The model (2) as generalized has $(KI + KJ + J)$ parameters for general $K$ (as does CJR; see Clinton et al. 2004a, p. 357, formula for what they call $p$). Thus, P-R has $(KJ - J + K)$ more parameters than (2). The use by P-R of the yea and nay roll-call outcome parameters $z_{jk1}$ and $z_{jk0}$ may appear reasonable until one becomes aware of the resulting overparameterization and its consequences.
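The parameter counts above are easy to check numerically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the sizes used in the example (100 senators, 486 roll calls) are simply the 105th Senate figures quoted in the data notes, taken here for illustration.

```python
def pr_param_count(I, J, K):
    """Expanded P-R model (A1)/(A2), omega_1 excluded: K(I + 2J + 1)."""
    return K * (I + 2 * J + 1)


def pca_param_count(I, J, K):
    """Model (2) generalized to K dimensions (same count as CJR): KI + KJ + J."""
    return K * I + K * J + J


def excess_params(I, J, K):
    """How many more parameters P-R has than model (2); should equal KJ - J + K."""
    return pr_param_count(I, J, K) - pca_param_count(I, J, K)


if __name__ == "__main__":
    I, J = 100, 486  # illustrative sizes: legislators and roll calls
    for K in (1, 2, 3):
        print(K, pr_param_count(I, J, K), pca_param_count(I, J, K),
              excess_params(I, J, K))
```

For one dimension the excess is a single parameter, whereas for two dimensions it grows to $J + 2$, which is the overparameterization discussed above.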
If the solution for neither problem yields a positive value of $C$, then for roll call $j$ there exists no line $X_2 = A + BX_1$ that completely separates the two-dimensional locations of the yea voters from those of the nay voters, and so the standard logistic-regression calculation for the roll-call parameters can proceed. ($X_2$ and $X_1$ relate to $x_{i2}$ and $x_{i1}$, respectively.) If the first problem yields a positive value for $C$, then the line $X_2 = A + BX_1$ has all yea voters above it and all nay voters below it. One then uses the solution values of $A$ and $B$ and sets $b_{j0} = -HA$, $b_{j1} = -HB$, and $b_{j2} = H$ (as in Section 4.2, we use $H = 100$). If the second problem yields a positive value for $C$, then the line $X_2 = A + BX_1$ has all nay voters above it and all yea voters below it. One then sets $b_{j0} = HA$, $b_{j1} = HB$, and $b_{j2} = -H$. The routine of this paragraph easily generalizes for more than two dimensions. If desired, one could do all logistic regressions at the start and then, just for those roll calls for which convergence fails, do the linear programming. Or, for those roll calls, one could even accept the non-convergent $(b_{j0}, b_{j1}, b_{j2})$ estimates (forgoing any linear programming) and declare perfect model fit; such values differ from the ones above but may entail no other consequences.

Appendix C. Data Details for Examples 2-6

Senators' vote data for PCA came from ftp://voteview.com/dtaord/sen105kh.ord for Examples 2 and 4 (105th Senate) and from ftp://voteview.com/dtaord/sen106kh.ord for Examples 3 and 5 (106th Senate). The vote data for Example 6 (90th Senate), used only for Table 4, came from ftp://voteview.com/dtaord/sen90kh.ord. P-R first- and second-dimension scores, used for calculations for Table 2, came from ftp://voteview.com/junkord/massproduction/s105_bs_1000_2.dat for Example 2 and from ftp://voteview.com/junkord/massproduction/s106_bs_1000_2.dat for Example 3. The P-R two-dimension GMP's ($G_{i\cdot 2}$), used for some comparisons with PCA in Section 6.1, came from ftp://voteview.com/junkord/massproduction/s105_nom31_1000.dat for Example 2 and from ftp://voteview.com/junkord/massproduction/s106_nom31_1000.dat for Example 3.

In Table 2 the results for the 106th Senate exclude Miller of Georgia, who was in office for just a short time, and are based only on the other 101 senators. If Miller is included, the four values for PCA/540 versus P-R/540 change from (0.996, 0.990, 0.906, 0.930) to (0.994, 0.990, 0.896, 0.923).

In Table 3, the P-R measures came from Poole (2005, p. 165) for the 105th Senate, and from http://www.voteview.com/c106/fits.htm for the 106th Senate. The PCA measures are, of course, those for Examples 2 (105th Senate) and 3 (106th Senate), and are calculated as indicated in Sections 4.3 and 6.2. Measures for CJR, though, are available only for the 105th Senate and only for GMP and %CC. Their source is Table 1 of Jackman (2001). Although that table does not include GMP's, it does show log-likelihood values, which are −12,965.14 for one dimension and −11,844.19 for two. Thus, the respective GMP's can be calculated as $e^{-12{,}965.14/47{,}739} = 0.762$ and $e^{-11{,}844.19/47{,}739} = 0.780$ (where 47,739 is the total of the yeas and nays across the 486 roll calls, and thus reflects 861 missed votes out of a potential 48,600 for the 100 senators who served in the 105th Senate).

fit is perfect in one dimension for roll call #135 (as indicated by the values $G_{\cdot j1} = 1$ for roll-call GMP and 100 for $|b_j|$) and in two dimensions for four additional roll calls [for which $G_{\cdot j2} = 1$ (and $|b_{j2}| = 100$; see Appendix B)]. Roll call #288 has by far the highest value of the ratio $|b_{j2}/b_{j1}|$. Not surprisingly, the three roll calls with the highest $|b_{j2}/b_{j1}|$ ratios are the same three whose $|g_{j2}|$ values are greatest, thus suggesting further that these three contribute heavily to the second dimension.
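The coefficient assignment that Appendix B describes for a perfectly separated roll call can be sketched as follows. This is an illustration only, under stated assumptions: the separating line $(A, B)$ is taken as given (the two linear programs that would produce it are not reproduced here), the function names are hypothetical, and a positive linear predictor $b_{j0} + b_{j1}x_{i1} + b_{j2}x_{i2}$ is assumed to correspond to a yea vote.

```python
H = 100.0  # large constant for perfectly separated roll calls (H = 100 in the text)


def separated_coefficients(A, B, yeas_above, H=H):
    """Return (b_j0, b_j1, b_j2) for a roll call separated by X2 = A + B*X1.

    yeas_above=True  -> all yea voters lie above the line (first problem).
    yeas_above=False -> all nay voters lie above the line (second problem).
    """
    if yeas_above:
        return (-H * A, -H * B, H)
    return (H * A, H * B, -H)


def linear_predictor(coef, x1, x2):
    """Linear predictor b_j0 + b_j1*x1 + b_j2*x2 (assumed positive for yea)."""
    b0, b1, b2 = coef
    return b0 + b1 * x1 + b2 * x2


def perfectly_fits(coef, points, votes):
    """True if the sign of the predictor matches every vote (1 = yea, 0 = nay)."""
    return all((linear_predictor(coef, x1, x2) > 0) == bool(v)
               for (x1, x2), v in zip(points, votes))


if __name__ == "__main__":
    # Hypothetical two-dimensional locations, separated by X2 = 0.25 + 0.5*X1.
    points = [(0.0, 1.0), (1.0, 2.0), (0.0, -1.0), (1.0, 0.0)]
    votes = [1, 1, 0, 0]  # yeas lie above the line, nays below
    coef = separated_coefficients(0.25, 0.5, yeas_above=True)
    print(coef, perfectly_fits(coef, points, votes))
```

With yeas above the line, the predictor equals $H(x_{i2} - A - Bx_{i1})$, so its sign reproduces every vote; the second case simply reverses all three signs.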

Table 1. Data and results for Example 1 (14 US House members, eight of the 12 CQ key votes for 2006).

Table 2. Pearson and Spearman correlation coefficients (lower left and upper right triangle, respectively, in each 3 × 3 square) for senator location estimates, among "key vote" PCA, full PCA, and Poole-Rosenthal W-NOMINATE, for one and two dimensions and 105th and 106th Senates.

Table 3. Measures of model fit for PCA, CJR, and P-R, for results in one and two dimensions, 105th and 106th Senates.

Table 4. Chi-square values from (4) along with their upper-tail probabilities, for Examples 2-6, for certain $(k_0, k_1)$ and corresponding degrees of freedom.