2. Mathematics for General Result
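As the result builds on Waller’s work [16], we adopt his notation wherever possible. Kuder and Richardson [17] defined a measure of internal reliability for binary choices (commonly known as KR-20), which was generalized by Hoyt [18] and Guttman [19] and popularized by Cronbach [10] in the form bearing his name. Let $V = (V_1, \ldots, V_k)$ represent responses from a multivariate probability distribution to $k$ items on an instrument. The notation $V$ is used to represent the valid distribution. Cronbach’s alpha is defined as
$$\alpha_V = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_{V_i}}{\sigma^2_{T}}\right), \tag{1}$$
where $\sigma^2_{V_i}$ is the variance of item $V_i$ and $\sigma^2_{T}$ is the variance of the total score $T = \sum_{i=1}^{k} V_i$.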
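An alternate formulation in terms of average variances and covariances will be convenient. Define $\overline{\sigma^2_V}$ to be the average variance of components of $V$, and $\overline{\sigma_{V_{ij}}}$ to be the average covariance between distinct components of $V$. Specifically,
$$\overline{\sigma^2_V} = \frac{1}{k}\sum_{i=1}^{k}\sigma^2_{V_i}$$
and
$$\overline{\sigma_{V_{ij}}} = \frac{1}{k(k-1)}\sum_{i \neq j}\sigma_{V_{ij}},$$
where $\sigma^2_{V_i}$ is the variance of $V_i$ and $\sigma_{V_{ij}}$ is the covariance between $V_i$ and $V_j$. Then Cronbach’s alpha may be expressed in the form [16,20]
$$\alpha_V = \frac{k\,\overline{\sigma_{V_{ij}}}}{\overline{\sigma^2_V} + (k-1)\,\overline{\sigma_{V_{ij}}}}. \tag{2}$$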
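As a quick numerical check that the two formulations agree, the following sketch estimates alpha from a small data matrix both ways. The function names and the toy data are illustrative assumptions, not part of the original analysis, and sample variances and covariances (divisor n − 1) stand in for the population quantities.

```python
from statistics import variance


def alpha_total_score(rows):
    """Alpha from item variances and the total-score variance (the form of Equation (1))."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_average_form(rows):
    """Alpha from the average variance and average covariance (the form of Equation (2))."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(k)]

    def cov(i, j):
        # Sample covariance with divisor n - 1, matching statistics.variance.
        return sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)

    avg_var = sum(cov(i, i) for i in range(k)) / k
    avg_cov = sum(cov(i, j) for i in range(k) for j in range(k) if i != j) / (k * (k - 1))
    return k * avg_cov / (avg_var + (k - 1) * avg_cov)


# Hypothetical responses: five respondents, three items on a 1-5 scale.
toy = [[1, 2, 3], [2, 3, 3], [4, 4, 5], [5, 5, 4], [3, 2, 2]]
print(alpha_total_score(toy), alpha_average_form(toy))
```

The two values agree up to floating-point error because the total-score variance equals the sum of all item variances plus the sum of all covariances between distinct items.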
Now consider an instrument with $k$ items given to a population with two distinct subgroups. The first subgroup has a response distribution denoted by $V$, and the second subgroup has a response distribution denoted by $C$. The notation $C$ is used to represent the contaminating IER.
Let $W$ be a Bernoulli random variable with parameter $p$, where $p$ is a value between zero and one representing the probability of observing a response from the contaminating class. That is, $W$ is a random variable which takes value one with probability $p$ and zero with probability $1-p$. The responses actually recorded on the instrument are described by the multivariate distribution $M$, defined by
$$M = (1-W)\,V + W\,C. \tag{3}$$
The notation $M$ is used to emphasize that it is a mixture of the valid and contaminating responses. Because $W$ is either zero or one, each individual gives responses from one of the two response distributions. With probability $p$ an individual will give contaminating responses, and with probability $1-p$ an individual will give valid responses.
By adopting this model, it is assumed that a respondent will either respond attentively to all items, or respond in an invalid manner to all items. We acknowledge that this assumption does not perfectly model real-life data; responses may be partially invalid [21] and are more likely to be invalid at the end of a survey [22]. However, we believe (and there is precedent in the literature [16]) that this assumption represents a reasonable trade-off between the realism of the assumptions and the complexity of the model. Furthermore, the usual data cleaning methods used by a practitioner to remove suspected IER operate at the respondent level rather than the item level.
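The goal is to find when $\alpha_M > \alpha_V$; that is, when contamination inflates Cronbach’s alpha. First, notation and two results that will aid in the comparison are introduced.
Let $\mu_{V_i}$ and $\mu_{C_i}$ denote the respective means of responses to item $i$ from the valid and contaminating distributions. The differences in these means are called “item validities” in the taxometrics literature and are denoted by $\delta_i = \mu_{C_i} - \mu_{V_i}$ [16] (only products of item validities are used, so the sign convention does not matter). As with the variances and covariances, only averages are needed. Specifically, we need the average product of item validities for distinct items and the average of squared item validities:
$$\overline{\delta_i\delta_j} = \frac{1}{k(k-1)}\sum_{i \neq j}\delta_i\delta_j, \qquad \overline{\delta_i^2} = \frac{1}{k}\sum_{i=1}^{k}\delta_i^2.$$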
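The first of the two needed results is known as the general covariance mixture theorem [23,24]. Here, it will be expressed in terms of averages.
Lemma 1. Let $M$ be defined as a mixture of $V$ and $C$ as in Equation (3), where $p$ is the probability of observing a response from $C$. Assume that the random quantities $V$, $C$, and $W$ are independent. Then
- 1. $\overline{\sigma^2_M} = (1-p)\,\overline{\sigma^2_V} + p\,\overline{\sigma^2_C} + p(1-p)\,\overline{\delta_i^2}$,
- 2. $\overline{\sigma_{M_{ij}}} = (1-p)\,\overline{\sigma_{V_{ij}}} + p\,\overline{\sigma_{C_{ij}}} + p(1-p)\,\overline{\delta_i\delta_j}$,

where overlines denote averages over items (variances, squared item validities) or over pairs of distinct items (covariances, products of item validities).
A proof is in Appendix A of Meehl [23].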
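The averaged form of the theorem can be checked exactly on small finite distributions. The sketch below assumes the lemma takes the usual form (mixture average variance = weighted average of the class variances plus p(1 − p) times the average squared item validity, and likewise for covariances with the average product of item validities). The outcome sets, the value of p, and all helper names are hypothetical.

```python
def moments(outcomes, weights):
    """Exact mean vector and covariance matrix of a finite discrete distribution."""
    k = len(outcomes[0])
    mean = [sum(w * x[i] for x, w in zip(outcomes, weights)) for i in range(k)]
    cov = [[sum(w * (x[i] - mean[i]) * (x[j] - mean[j]) for x, w in zip(outcomes, weights))
            for j in range(k)] for i in range(k)]
    return mean, cov


def averages(cov):
    """Average variance and average covariance between distinct components."""
    k = len(cov)
    avg_var = sum(cov[i][i] for i in range(k)) / k
    avg_cov = sum(cov[i][j] for i in range(k) for j in range(k) if i != j) / (k * (k - 1))
    return avg_var, avg_cov


def uniform(outcomes):
    return [1 / len(outcomes)] * len(outcomes)


# Hypothetical equally likely response vectors for three items.
V = [(1, 1, 2), (2, 3, 3), (3, 2, 1), (1, 2, 2)]
C = [(5, 5, 5), (4, 4, 4), (3, 5, 4)]
p, k = 0.3, 3

mean_v, cov_v = moments(V, uniform(V))
mean_c, cov_c = moments(C, uniform(C))

# The mixture draws from C with probability p and from V otherwise.
mix_weights = [(1 - p) * w for w in uniform(V)] + [p * w for w in uniform(C)]
_, cov_m = moments(V + C, mix_weights)

delta = [mc - mv for mc, mv in zip(mean_c, mean_v)]  # item validities
avg_d2 = sum(d * d for d in delta) / k
avg_dd = sum(delta[i] * delta[j] for i in range(k) for j in range(k) if i != j) / (k * (k - 1))

var_v, cv_v = averages(cov_v)
var_c, cv_c = averages(cov_c)
direct = averages(cov_m)
lemma = ((1 - p) * var_v + p * var_c + p * (1 - p) * avg_d2,
         (1 - p) * cv_v + p * cv_c + p * (1 - p) * avg_dd)
print(direct, lemma)
```

The direct computation and the lemma-based computation coincide because the covariance of a two-class mixture decomposes into the within-class covariances plus a between-class term driven by the mean differences.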
The second result is an inequality between average variances and covariances of items within a distribution, and will be used to investigate special cases during the discussion.
Lemma 2. The average of covariances between distinct components of a multivariate distribution $V$ is less than or equal to the average variance. Symbolically, $\overline{\sigma_{V_{ij}}} \le \overline{\sigma^2_V}$.
The proof is in the appendix. The main result can now be stated and proved.
Theorem 1. Let $V$ and $C$ be multivariate distributions with $k$ components representing potential responses to an instrument. Let $W$ be a Bernoulli random variable with parameter $p$ between zero and one. Define $M = (1-W)V + WC$ as a mixture of $V$ and $C$. Assume that $V$, $C$, and $W$ are independent. The behavior of Cronbach’s alpha under the mixture falls into one of five categories.
- 1. Cronbach’s alpha does not change for any mixing proportion. $\alpha_M = \alpha_V$ for all $p$.
- 2. Cronbach’s alpha inflates for any mixing proportion. $\alpha_M > \alpha_V$ for all $p$.
- 3. Cronbach’s alpha deflates for any mixing proportion. $\alpha_M < \alpha_V$ for all $p$.
- 4. Cronbach’s alpha inflates for small mixing proportions, but deflates for large mixing proportions. There is a value $p^*$ in the interval $(0,1)$ such that $\alpha_M > \alpha_V$ for $p < p^*$, but $\alpha_M < \alpha_V$ for $p > p^*$.
- 5. Cronbach’s alpha deflates for small mixing proportions, but inflates for large mixing proportions. There is a value $p^*$ in the interval $(0,1)$ such that $\alpha_M < \alpha_V$ for $p < p^*$, but $\alpha_M > \alpha_V$ for $p > p^*$.

Furthermore, there exist distributions that will yield each of the above cases, including when the item scale is continuous, discrete, or binary.
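Proof. The general strategy is to show that the bias in Cronbach’s alpha has the same sign as a quadratic function of the mixing proportion $p$, and then invoke elementary properties of quadratic functions. Begin by finding conditions under which Cronbach’s alpha inflates, that is, when $\alpha_M - \alpha_V > 0$. Apply Equation (2), the alternate form of Cronbach’s alpha.
$$\alpha_M - \alpha_V = \frac{k\,\overline{\sigma_{M_{ij}}}}{\overline{\sigma^2_M} + (k-1)\,\overline{\sigma_{M_{ij}}}} - \frac{k\,\overline{\sigma_{V_{ij}}}}{\overline{\sigma^2_V} + (k-1)\,\overline{\sigma_{V_{ij}}}}$$
Combine into a single fraction with a common denominator.
$$\alpha_M - \alpha_V = \frac{k\,\overline{\sigma_{M_{ij}}}\left[\overline{\sigma^2_V} + (k-1)\,\overline{\sigma_{V_{ij}}}\right] - k\,\overline{\sigma_{V_{ij}}}\left[\overline{\sigma^2_M} + (k-1)\,\overline{\sigma_{M_{ij}}}\right]}{\left[\overline{\sigma^2_M} + (k-1)\,\overline{\sigma_{M_{ij}}}\right]\left[\overline{\sigma^2_V} + (k-1)\,\overline{\sigma_{V_{ij}}}\right]}$$
Expand the numerator. The term $k(k-1)\,\overline{\sigma_{M_{ij}}}\,\overline{\sigma_{V_{ij}}}$ will cancel.
$$\alpha_M - \alpha_V = \frac{k\left[\overline{\sigma^2_V}\,\overline{\sigma_{M_{ij}}} - \overline{\sigma_{V_{ij}}}\,\overline{\sigma^2_M}\right]}{\left[\overline{\sigma^2_M} + (k-1)\,\overline{\sigma_{M_{ij}}}\right]\left[\overline{\sigma^2_V} + (k-1)\,\overline{\sigma_{V_{ij}}}\right]} \tag{4}$$
The denominator of (4) is a scaled product of variances and must be positive, so it will not affect the sign of the bias. Thus, to determine whether Cronbach’s alpha has inflated, it suffices to find when
$$\overline{\sigma^2_V}\,\overline{\sigma_{M_{ij}}} - \overline{\sigma_{V_{ij}}}\,\overline{\sigma^2_M} > 0.$$
Apply the general covariance mixture theorem to the average variances and covariances of $M$.
$$\overline{\sigma^2_V}\left[(1-p)\,\overline{\sigma_{V_{ij}}} + p\,\overline{\sigma_{C_{ij}}} + p(1-p)\,\overline{\delta_i\delta_j}\right] - \overline{\sigma_{V_{ij}}}\left[(1-p)\,\overline{\sigma^2_V} + p\,\overline{\sigma^2_C} + p(1-p)\,\overline{\delta_i^2}\right] > 0$$
Expand and group terms that include $p^2$ and $p$. The term $\overline{\sigma^2_V}\,\overline{\sigma_{V_{ij}}}$ will cancel. The result is
$$f(p) = a p^2 + b p > 0,$$
where
$$a = \overline{\sigma_{V_{ij}}}\,\overline{\delta_i^2} - \overline{\sigma^2_V}\,\overline{\delta_i\delta_j}, \tag{5}$$
$$b = \overline{\sigma^2_V}\,\overline{\sigma_{C_{ij}}} - \overline{\sigma_{V_{ij}}}\,\overline{\sigma^2_C} + \overline{\sigma^2_V}\,\overline{\delta_i\delta_j} - \overline{\sigma_{V_{ij}}}\,\overline{\delta_i^2}. \tag{6}$$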
It is now easy to see that Cronbach’s alpha inflates when $f(p) > 0$. $f(p)$ is a quadratic with no constant term. This is a simple family of functions, though the coefficients are not simple. Momentarily ignore how $a$ and $b$ are defined (we will return to this soon), and consider the possible behaviors of functions of the form $f(p) = ap^2 + bp$. In particular, we are interested in the sign of $f(p)$ for values of $p$ in the interval $(0,1)$, as this will determine when Cronbach’s alpha inflates or deflates. This behavior can be characterized in terms of the concavity and roots of $f$. The roots are at $0$ and $-b/a$ (as long as $a \neq 0$). Consider each case in turn.
Case one is a trivial possibility when $a = b = 0$. $f(p) = 0$ for all $p$ in $(0,1)$.
Case two is $f(p) > 0$ for all $p$ in $(0,1)$. This has two subcases: if $a \ge 0$ and $b \ge 0$ (but not both $= 0$), or if $a < 0$ and $-b/a \ge 1$.
Case three is $f(p) < 0$ for all $p$ in $(0,1)$. This has two subcases: if $a \le 0$ and $b \le 0$ (but not both $= 0$), or if $a > 0$ and $-b/a \ge 1$.
Case four occurs when $a < 0$ and $0 < -b/a < 1$. The non-zero root lies in the interval $(0,1)$, meaning $f$ changes sign at $p = -b/a$. Because $a < 0$, the function is concave down, so the bias changes from positive to negative.
Case five occurs when $a > 0$ and $0 < -b/a < 1$. The non-zero root is in the interval $(0,1)$, except now $a > 0$ and the function is concave up. The bias changes from negative to positive.
Examples of each non-trivial case, including the subcases for two and four, are included in Figure 1.
Because the sign of the bias is derived to be the same as the sign of a quadratic function with a root at zero, these scenarios are exhaustive. For example, a scenario in which Cronbach’s alpha first deflates, then inflates, and deflates again as p increases would require two sign changes inside the interval (0,1) in addition to the root at zero, for three roots in total. This is not possible for a quadratic and is logically excluded.
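The case analysis can be condensed into a few lines of code. The sketch below assumes the sign function has the form f(p) = a·p² + b·p, so that on (0,1) its sign equals the sign of the linear factor a·p + b, which is determined by its limiting values b (as p approaches 0) and a + b (as p approaches 1). The function name `classify_case` is hypothetical.

```python
def classify_case(a, b):
    """Return the case number (1-5) for f(p) = a*p**2 + b*p on the interval (0, 1)."""
    if a == 0 and b == 0:
        return 1  # f is identically zero: alpha is unchanged
    # f(p) = p * (a*p + b); the linear factor runs from b (p -> 0) to a + b (p -> 1).
    near_zero, near_one = b, a + b
    if near_zero >= 0 and near_one >= 0:
        return 2  # positive throughout (0, 1): alpha inflates
    if near_zero <= 0 and near_one <= 0:
        return 3  # negative throughout (0, 1): alpha deflates
    # Otherwise the linear factor changes sign once inside (0, 1), at p = -b/a.
    return 4 if near_zero > 0 else 5


for a, b in [(0, 0), (-4, 4), (0, -1), (-2, 1), (2, -1)]:
    print((a, b), "-> case", classify_case(a, b))
```

Working with the endpoint values of the linear factor avoids dividing by a, so the same test covers both the concave and convex subcases of cases two and three.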
To complete the proof, remember that the values a and b are not arbitrary, but defined in Equations (5) and (6) in terms of summaries of two multivariate probability distributions which represent responses to an instrument. Item validities, representing differences in means, are bounded by the item scale. Variances and covariances are also limited by the range of the scale and inequalities such as Cauchy-Schwarz [25]. Furthermore, if the scale has a small number of discrete options or is binary, then means, variances, and covariances are not independent parameters. A natural question is: are all five cases actually possible? The answer is yes, and the following two paragraphs describe how to obtain each.
Consider an instrument with 20 items on a five-point scale from one to five. No questions use negative keying. Discrete data are produced by first generating multivariate normal observations with a given mean vector and covariance matrix, and then rounding. The covariance matrix is constructed by using the average variance for the diagonal entries and the average covariance for the off-diagonal entries. The multivariate normal observations are rounded to the nearest integer in the scale. Forcing discrete responses by rounding will, of course, change the means, variances, and covariances, but in the particular cases considered, the change is not enough to alter the characterization of Cronbach’s alpha. Two data sets are produced, representing valid responses and mixed responses. Cronbach’s alpha is calculated for each, and the bias is recovered as the difference. This simulation is repeated for values of $p$, the mixing proportion, in increments of 0.025 between zero and one. Table 1 describes the means, variances, and covariances of the multivariate normal values which were rounded to obtain $V$ and $C$ in order to reproduce each case. Figure 2 shows, for each case, the bias of the simulation as a solid black line, the exact bias before discretization as a dotted blue line, and the function $f(p)$ as a dot-dash red line. To aid in seeing when the sign changes, there is a dashed line for the horizontal axis. The simulations used 10,000 respondents at each value of $p$.
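The simulation procedure described above can be sketched with the standard library alone. This is not the authors’ code: the function names are hypothetical, the equal-variance, equal-covariance normal vector is built from one shared factor plus item-specific noise (which requires a non-negative average covariance), and values outside the scale are clipped to its endpoints. The parameters in the usage lines are those of the independent-items example with class means of two and four, not the entries of Table 1.

```python
import random
from statistics import variance


def cronbach_alpha(rows):
    """Sample Cronbach's alpha from a list of respondents (each a list of item scores)."""
    k = len(rows[0])
    item_var_sum = sum(variance([r[i] for r in rows]) for i in range(k))
    return k / (k - 1) * (1 - item_var_sum / variance([sum(r) for r in rows]))


def draw_response(rng, k, mean, var, cov, low=1, high=5):
    """One respondent: normal scores with common variance and covariance, rounded to the scale."""
    shared = rng.gauss(0, 1) * cov ** 0.5      # common factor induces covariance `cov`
    unique_sd = (var - cov) ** 0.5             # item-specific noise tops variance up to `var`
    return [min(high, max(low, round(mean + shared + unique_sd * rng.gauss(0, 1))))
            for _ in range(k)]


def simulated_bias(p, n, k, valid, contam, seed=0):
    """Alpha of a mixed sample minus alpha of a purely valid sample, each of size n."""
    rng = random.Random(seed)
    valid_rows = [draw_response(rng, k, **valid) for _ in range(n)]
    mixed_rows = [draw_response(rng, k, **(contam if rng.random() < p else valid))
                  for _ in range(n)]
    return cronbach_alpha(mixed_rows) - cronbach_alpha(valid_rows)


# Assumed parameters: independent items, unit variance, class means of 2 and 4.
valid = {"mean": 2, "var": 1, "cov": 0}
contam = {"mean": 4, "var": 1, "cov": 0}
for p in (0.1, 0.5, 0.9):
    print(p, round(simulated_bias(p, n=2000, k=5, valid=valid, contam=contam), 3))
```

Smaller values of n and k than in the study keep the pure-Python run short; increasing them reduces sampling noise in the estimated bias.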
Reproducing all cases when all items are binary options is trickier, but still possible. The data were simulated in the same manner as before, except responses were rounded to the nearest of zero or one. Table 2 describes the summaries of the multivariate normal values which were rounded to obtain V and C in order to reproduce each case. A larger number of simulations was necessary to reduce the sampling variability and clearly see the bias in Cronbach’s alpha. We found 20,000 to be sufficient for all except case five, which used 100,000 simulations. The graphs for the binary case are not significantly different from the five-option case, and are omitted. This completes the proof. □
We illustrate one of the non-intuitive possibilities, case 2, through an example with simulated data.
Table 3 contains data from ten respondents for an instrument with five items. The data were generated such that there was a
chance of a contaminating response. In this sample, the result was six valid responses and four contaminating responses. Each class has a variance of one for all items and a covariance of zero (due to independence) between any pair of items. Because any pair of items has independent responses, the exact value of Cronbach’s alpha within each class is zero; the sample estimates are 0.13 for the valid class and 0.063 for the contaminating class. However, because the valid class has a mean of two and the contaminating class has a mean of four, responses appear to be consistent when the two classes are combined into a single data set. The resulting estimate of Cronbach’s alpha for the entire sample is 0.87, a value generally seen as desirable, yet we see it is only due to contamination in the sample.
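Now let us relate this example to Theorem 1. The valid and contaminating classes have summaries $\mu_V = 2$, $\mu_C = 4$, $\overline{\sigma^2_V} = 1$, $\overline{\sigma^2_C} = 1$, $\overline{\sigma_{V_{ij}}} = 0$, and $\overline{\sigma_{C_{ij}}} = 0$. Every item validity has magnitude $2$ and the same sign, so $\overline{\delta_i\delta_j} = \overline{\delta_i^2} = 4$. Applying Equations (5) and (6), we see:
$$a = (0)(4) - (1)(4) = -4, \qquad b = (1)(0) - (0)(1) + (1)(4) - (0)(4) = 4.$$
As $a < 0$ and $-b/a = 1$, this is case 2 of Theorem 1, so Cronbach’s alpha will inflate for any proportion $p$ of contamination.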
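The arithmetic of this example can be reproduced in a few lines. The sketch assumes the coefficients take the form a = (avg. cov. of V)(avg. squared validity) − (avg. var. of V)(avg. product of validities) and b = (avg. var. of V)(avg. cov. of C) − (avg. cov. of V)(avg. var. of C) − a, and that the mixture moments follow the general covariance mixture theorem. Function and argument names are hypothetical.

```python
def quadratic_coefficients(var_v, cov_v, var_c, cov_c, avg_dd, avg_d2):
    """Coefficients of the sign function f(p) = a*p**2 + b*p (assumed form)."""
    a = cov_v * avg_d2 - var_v * avg_dd
    b = var_v * cov_c - cov_v * var_c + var_v * avg_dd - cov_v * avg_d2
    return a, b


def mixture_alpha(p, k, var_v, cov_v, var_c, cov_c, avg_dd, avg_d2):
    """Population alpha of the mixture, using averaged mixture moments."""
    var_m = (1 - p) * var_v + p * var_c + p * (1 - p) * avg_d2
    cov_m = (1 - p) * cov_v + p * cov_c + p * (1 - p) * avg_dd
    return k * cov_m / (var_m + (k - 1) * cov_m)


# Example values: unit variances, zero covariances, class means 2 and 4 (validity 2).
example = dict(var_v=1, cov_v=0, var_c=1, cov_c=0, avg_dd=4, avg_d2=4)
a, b = quadratic_coefficients(**example)
print("a =", a, "b =", b)
for p in (0.1, 0.4, 0.9):
    print(p, round(mixture_alpha(p, k=5, **example), 3))
```

For five items and a contamination proportion of 0.4 (four of ten respondents), the population alpha of the mixture is about 0.83, while each class on its own has an alpha of zero.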
For this example, we estimated Cronbach’s alpha using Equation (1) with sample estimates of variance, but most software solutions have built-in functions with helpful features. For example, the alpha() function in the psych package [26] in R [27] produces a confidence interval for alpha and an analysis of how alpha will change if items are removed from this instrument.
At this point, no claim is made as to how likely each case is, only that all are possible. In the discussion, we will demonstrate that each of these cases could potentially be arrived at through specific types of IER.
Because of the removal of the positive denominator in Equation (4), $f(p)$ does not give the magnitude of the bias, but is a function with the same sign as the bias. The removed term includes variances and covariances of $M$ which implicitly depend on $p$, so the magnitude of the bias is a ratio of polynomials in $p$ and is more difficult to analyze. Figure 2 makes it clear that $f(p)$ and the exact bias share the same roots and sign but potentially very different magnitudes. This is why the cases in Table 1, Table 2, and Figure 2 do not differentiate between $f(p)$ being concave or convex; that property is not necessarily shared with the exact bias.
The simulation code that produced Figure 2 is available through a Shiny R [28] web app found at https://alphaier.shinyapps.io/cronbachs_alpha_under_ier/. The app allows users to investigate how Cronbach’s alpha will behave under a mixture model consisting of discretized multivariate normal distributions with any mean vector, average variance, and average covariance (as long as the resulting covariance matrix is valid). We see two potential uses for this tool. The first is educational, as it allows the user to visualize the effects of contamination on Cronbach’s alpha and test the effect when the valid and contaminating distributions are altered. Second, the forthcoming discussion relates Theorem 1 to the two main types of IER, but IER could potentially manifest through myriad possible response distributions. For any other hypothesized pattern of IER, if the means, variances, and covariances can be specified, this tool can immediately determine the effect of that contamination on Cronbach’s alpha.
The reader is reminded that the preceding is true for any mixture of two distributions, whether the interpretation of “valid” and “contaminating” holds or not.
3. Discussion and Special Cases
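This section refers repeatedly to the quadratic coefficients $a$ and $b$, which were defined as
$$a = \overline{\sigma_{V_{ij}}}\,\overline{\delta_i^2} - \overline{\sigma^2_V}\,\overline{\delta_i\delta_j}, \qquad b = \overline{\sigma^2_V}\,\overline{\sigma_{C_{ij}}} - \overline{\sigma_{V_{ij}}}\,\overline{\sigma^2_C} + \overline{\sigma^2_V}\,\overline{\delta_i\delta_j} - \overline{\sigma_{V_{ij}}}\,\overline{\delta_i^2}.$$
With effort, these complicated coefficients yield much information about the behavior of Cronbach’s alpha under mixture models.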
Now we relate the behavior of Cronbach’s alpha to IER. Studying IER in any generality is difficult because IER can take so many forms. IER is defined more by what the responses are not, rather than by what they are. For this reason, past investigations [5,11] have typically considered two extreme forms through which IER may manifest:
Random responding: Item responses are uniformly and independently chosen from those available.
Straight-lining: The respondent chooses the same option for all items, either in an attempt to complete the instrument as quickly as possible or operating on the belief that all questions are sufficiently similar to the first. Different respondents may choose different options, but each respondent repeats their choice without deviation.
A strength of the current approach is that any form of IER can be investigated as long as the means, average variance, and average covariance can be determined. Some of the following observations will refer specifically to random responding or straight-lining, and the fifth observation will deal with a potential form of IER we have not seen studied in the literature.
Observation 1: If all components of V have a common mean, and all components of C have a common mean, then the quadratic coefficient a cannot be positive. Cases one through four are still possible, but case five is not.
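Suppose that all means are equal within a response distribution, that is, $\mu_{V_i} = \mu_{V_j}$ and $\mu_{C_i} = \mu_{C_j}$ for all items $i$ and $j$. This implies $\overline{\delta_i\delta_j} = \overline{\delta_i^2}$, and so Equation (5) can be simplified as
$$a = \overline{\delta_i^2}\left(\overline{\sigma_{V_{ij}}} - \overline{\sigma^2_V}\right). \tag{7}$$
$\overline{\delta_i^2}$ is an average of squares and is clearly non-negative, while Lemma 2 implies the term in parentheses is non-positive. Thus, $a$ is non-positive, precluding the possibility of case five.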
Actually, the assumption of common means within a distribution is stronger than necessary, as it is sufficient for all item validities to be equal. However, the case of common means within a distribution is an important special case for many of the following observations, so that is the form in which the observation is stated.
Observation 2: The forces pressuring Cronbach’s alpha to inflate are:
Increasing the differences between means of V and C when item validities have the same sign,
Increasing the ratio of average covariance to variance for the contaminating distribution,
Decreasing the ratio of average covariance to variance for the valid distribution.
Likewise, forces in the opposite direction will pressure Cronbach’s alpha to deflate.
The first part of this observation is easiest to see when response distributions have common means, so consider $a$ as expressed in Equation (7). As the difference in means grows, $a$ becomes more negative and $b$ becomes more positive. This moves in the direction of cases two and four, so alpha tends to inflate. This is pertinent to situations in which the content of the survey will lead the mean of attentive responses to be close to an extreme bound. Consider a survey attempting to detect a rare trait like psychopathy. The mean of attentive responses is expected to be low, but if careless responses are chosen randomly and have a mean close to the midpoint of the scale, it is possible that alpha will inflate due to IER. This suggests that measures of extreme psychopathology may report inflated values of Cronbach’s alpha.
The second part of this observation is intuitively plausible, as it corresponds to contaminating with a highly internally consistent distribution, such as straight-lining. As the average covariance of C increases, b becomes more positive, moving away from case three towards case two, possibly moving through cases four and five.
The third part of this observation corresponds to a valid distribution with low internal consistency, leaving ample opportunity for inflation. As the average covariance of $V$ decreases, $a$ becomes more negative, which by itself would pressure alpha towards deflation, but $b$ includes $-a$ in the sum, so $b$ is becoming more positive. Also, the second term in $b$ includes a negative average covariance of $V$. Thus, $b$ is increasing faster than $a$ is decreasing, moving in the direction of case two and possibly case four.
Observation 3: If means are equal across items and distributions and contamination consists purely of random responses, then Cronbach’s alpha must deflate (except for the unusual case that $\alpha_V \le 0$). However, if the distributions have different means, either inflation or deflation is possible.
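The key characteristic of random responding is the independence between responses, so $\overline{\sigma_{C_{ij}}} = 0$. All means being equal implies that all item validities are zero, thus $\overline{\delta_i\delta_j} = \overline{\delta_i^2} = 0$. Combining these observations, $a = 0$ and $b = -\overline{\sigma_{V_{ij}}}\,\overline{\sigma^2_C}$. The assumption that $\alpha_V > 0$ implies $\overline{\sigma_{V_{ij}}} > 0$, so $b$ is negative. This is case three, in which Cronbach’s alpha deflates. The example of case three in Table 1 illustrates this exact situation.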
However, if the means of $V$ and $C$ are not equal, then deflation is not guaranteed. Consider case two in Table 1 and Figure 2, in which the contaminating distribution has independent responses, yet alpha always inflates due to the difference in means. Case four also uses a contaminating distribution with independent observations, but whether alpha inflates or deflates depends on the exact mixing proportion $p$. Case two is noteworthy because both of the distributions contributing to the mixture have average covariances of zero (thus $\alpha_V = \alpha_C = 0$), yet the mixture has a positive value of Cronbach’s alpha. This scenario is discussed by Waller [16] as admittedly contrived and non-realistic, but useful as an example of the non-intuitive nature of reliability measures under mixtures.
The scenario of contamination with random responses is included in the simulation study of DeSimone et al. [5]. Prior to the study, the authors state the expectation that random responding will reduce alpha (p. 312). Figure 2 of the same article confirms that for their particular situation, random responses did indeed result in a strict decrease in Cronbach’s alpha. However, the results of the present article show that this is not the only possibility, and that random responses can increase Cronbach’s alpha if the means of the valid and contaminating distributions are sufficiently different.
Observation 4: Assume the valid distribution has a common mean and no questions use reverse keying. If contamination consists purely of straight-lining, then alpha is guaranteed to inflate.
The key characteristic of straight-lining respondents is that covariance equals the variance, which can be seen from applying $C_i = C_j$ to the definition of covariance. Furthermore, straight-lining forces a common mean. Thus, the item validities are all identical, and from Observation 1 $a$ is non-positive. Combining with the fact $\overline{\sigma_{C_{ij}}} = \overline{\sigma^2_C}$ and Lemma 2, $b$ can be simplified as
$$b = \overline{\sigma^2_C}\left(\overline{\sigma^2_V} - \overline{\sigma_{V_{ij}}}\right) - a \ge -a \ge 0.$$
This is case two, so Cronbach’s alpha can only inflate. This confirms generally the expectation by DeSimone et al. [5] that pure straight-lining will inflate alpha.
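The identity between covariance and variance under straight-lining can be written out in one line. Assuming a straight-lining respondent gives the same answer to every item, so that $C_i = C_j$ for all pairs of items:

```latex
\operatorname{Cov}(C_i, C_j)
  = E[C_i C_j] - E[C_i]\,E[C_j]
  = E[C_i^2] - \left(E[C_i]\right)^2
  = \operatorname{Var}(C_i)
```

Averaging over all pairs of distinct items then gives an average covariance equal to the average variance for the straight-lining class.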
Observation 5: If contamination is of a form that alternates between extremes, then case five is a possibility. Cronbach’s alpha deflates for small p, but inflates for larger p.
Consider a mischievous responder who deliberately alternates between the first and last option in a scale for the entirety of the instrument. This form of IER can produce case five. We are not aware of (nor would we expect) any studies of this contrived style of response (though anecdotally, one of the authors observed a classmate exhibit this behavior on a standardized test in secondary school). This is case five in Table 1 and Figure 2. This could also be produced by straight-lining respondents when the survey alternates between regular and reverse keying.
Observation 6: To investigate the effects of multiple types of IER occurring simultaneously, mixture models and the general covariance mixture theorem can be applied iteratively.
In reality, IER rarely consists exclusively of purely random or straight-lining responses. It is more likely that non-valid responses from C are themselves a mixture of random, straight-lining, and perhaps other kinds of IER. Therefore the general covariance mixture theorem (Lemma 1 in the present paper) can be applied repeatedly to obtain the parameters of C, at which point Theorem 1 can be applied to determine whether Cronbach’s alpha will deflate or inflate.
Consider the following illustrative example. An instrument has five questions, with each having five options. Of the respondents, 75% will answer in a valid manner, 20% will answer randomly, and 5% will straight-line. The valid responses follow a discrete uniform with
, corresponding to a common mean $\mu_V = 3$, an average variance of $\overline{\sigma^2_V} = 2$,
and an average covariance of
. Careless responses come in two forms. The 20% of respondents who respond randomly (denote corresponding quantities with the subscript $R$) have a common mean $\mu_R = 3$, an average variance of $\overline{\sigma^2_R} = 2$, and an average covariance of $\overline{\sigma_{R_{ij}}} = 0$. The 5% of respondents who straight-line (denote corresponding quantities with the subscript $S$) choose the first item uniformly, so these responses have a common mean $\mu_S = 3$, an average variance of $\overline{\sigma^2_S} = 2$, and an average covariance of $\overline{\sigma_{S_{ij}}} = 2$. All means are identical, so item validities are zero. Within the contaminating class, 80% come from random responses and 20% are straight-line responses, so an application of Lemma 1 yields $\overline{\sigma^2_C} = (0.8)(2) + (0.2)(2) = 2$, and $\overline{\sigma_{C_{ij}}} = (0.8)(0) + (0.2)(2) = 0.4$. Next, applying Equations (5) and (6) yields $a = 0$ and $b < 0$, so this is case three, where Cronbach’s alpha will always deflate. It is interesting that for this example, only the mixing proportion
within the contaminating class is important. The more numerous random responders have a larger effect than the few straight-lining responders, so changing the mixture proportion between the valid and contaminating class may change the magnitude of the bias, but will not alter the fact that Cronbach’s alpha will deflate due to contamination.
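The iterative use of the mixture theorem in this example can be sketched as follows. The helper names are hypothetical, each class is assumed to have a common mean across items (so the average product and average square of item validities coincide), and the average covariance of the valid class is set to an assumed value of 1.0 purely for illustration, since the sketch needs some number there.

```python
def mix(p, first, second):
    """Mixture moments when `second` occurs with probability p (common means within class)."""
    d = second["mean"] - first["mean"]          # common item validity between the classes
    return {
        "mean": first["mean"] + p * d,
        "var": (1 - p) * first["var"] + p * second["var"] + p * (1 - p) * d * d,
        "cov": (1 - p) * first["cov"] + p * second["cov"] + p * (1 - p) * d * d,
    }


def quadratic_coefficients(valid, contam):
    """Coefficients of f(p) = a*p**2 + b*p (assumed form) for common-mean classes."""
    d2 = (contam["mean"] - valid["mean"]) ** 2
    a = valid["cov"] * d2 - valid["var"] * d2
    b = valid["var"] * contam["cov"] - valid["cov"] * contam["var"] - a
    return a, b


random_resp = {"mean": 3, "var": 2, "cov": 0}   # uniform, independent items
straight = {"mean": 3, "var": 2, "cov": 2}      # same answer for every item
valid = {"mean": 3, "var": 2, "cov": 1.0}       # cov of 1.0 is an assumed value

# Step 1: within the contaminating class, 20% straight-line and 80% respond randomly.
contam = mix(0.2, random_resp, straight)
# Step 2: feed the combined contaminating class into the coefficient formulas.
a, b = quadratic_coefficients(valid, contam)
print(contam, a, b)
```

Under these assumptions the contaminating class has an average variance of 2 and an average covariance of 0.4, and the coefficients come out as a = 0 and b = −1.2, which corresponds to case three.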
In the previous example, the random responding followed a uniform distribution (a sort of pure randomness). There are infinite other potential response distributions, but fortunately, a full specification of the exact distribution is not necessary. The behavior of Cronbach’s alpha depends on the valid and contaminating distributions only through the variances, covariances, and differences in means. So, in a partial sense, the realm of possibilities is reduced. Any multivariate distribution of responses with identical means, variances, and covariances will bias Cronbach’s alpha in the same manner.
How do real-life manifestations of careless responses tend to affect Cronbach’s alpha? For an answer, we defer to studies with real data where alpha is calculated with and without suspected IER. Huang et al. [2] compared alpha for 30 facets and found that alpha generally decreases as a result of careless responses, with a notable exception: one section with eight positively keyed and only two negatively keyed items showed an increase in alpha, which the authors attribute to straight-lining respondents. This is consistent with the present analysis. Wertheimer [14] conducted a similar analysis for multiple data sets and classified respondents as conscientious, random, or patterned. In summary, removing random respondents tended to increase alpha, removing patterned respondents tended to decrease alpha, and removing both tended to increase alpha, but to a lesser degree. This agrees with the results of the present paper, lending evidence that the mathematical assumptions are not too unrealistic.