Stephen Jay Gould’s Analysis of the Army Beta Test in The Mismeasure of Man: Distortions and Misconceptions Regarding a Pioneering Mental Test
My original reasons for writing The Mismeasure of Man mixed the personal with the professional. I confess, first of all, to strong feelings on this particular issue. I grew up in a family with a tradition of participation in campaigns for social justice, and I was active, as a student, in the civil rights movement at a time of great excitement and success in the early 1960s… Some readers may regard this confessional as a sure sign of too much feeling to write a proper work in nonfiction. But I am willing to bet that passion must be the central ingredient needed to lift such books above the ordinary, and that most works of nonfiction regarded by our culture as classical or enduring are centered in their author’s deep beliefs.
2. Background of the Army Beta
3. Gould’s Criticisms of the Army Beta
3.1. Criticisms of Test Content
3.2. Criticisms of Test Administration Conditions
3.2.1. Army Beta Instructions
3.2.2. Test Administration Facilities
3.2.3. Screening for Literacy
3.3. Criticisms of Time Limits
3.4. Criticism of the Belief that the Army Beta Measures Intelligence
- Scores on the total Army Beta test would be as high as or higher than (α = 0.05) Gould’s students’ scores: 58% of students rated as A, 30% of students rated as B, and 11% of students rated as C. We believed this outcome was plausible for two reasons. First, Gould’s students were enrolled in a course on “biology as a social weapon” and may have had a preconceived bias against the utility and/or validity of the Army Beta; Gould’s treatment of Morton’s data and his negative view of the Army Beta’s creators may have also influenced the way he administered the Army Beta. Second, we thought it was plausible that the Flynn Effect would boost the scores of a modern sample of college students.
- Completion rates of the new sample would be similar to completion rates in Gould’s sample, as determined by a chi-squared test (α = 0.05). We created this hypothesis in order to test whether Gould’s results were consistent with an administration of the Army Beta according to the instructions (, pp. 162–165).
- All variables in the correlation matrix would be positively correlated, and those correlations would be statistically significant (one-tailed test, α = 0.05), with the possible exception of Test 1’s score (which we believed would have low variance because it appeared to be an easy test and only allowed a maximum of 5 points out of 118). If this hypothesis were disproved, it would support Gould’s claim that the Army Beta could not measure intelligence (e.g., , p. 210). If Army Beta subscores all positively correlate, it would demonstrate that the Army Beta functions much like any other intelligence test.
- Confirmatory factor analysis would show good fit for a one-factor congeneric model. This one-factor model would fit the data better than a two-factor model consisting of two correlated factors, where one factor consisted of subtests that required written numbers (Cube Analysis, Digit Symbol, and Number Checking) and a second factor consisted of all other subtests (Maze, X-O Series, Picture Completion, and Geometric Construction). This two-factor model is consistent with Gould’s claim that the use of written numbers could interfere with the test’s ability to measure intelligence, whereas a one-factor model would be consistent with the Army Beta’s creators’ view that the test measured a general cognitive ability, and that it was rational to combine the subtest scores into a global test score. Thus, Hypothesis 4 pits our interpretation of the Army Beta and Gould’s interpretation against one another to determine which is better supported by the data.
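Hypothesis 3 amounts to a one-tailed test that each correlation is greater than zero. The following is a minimal sketch of such a test using the Fisher z approximation, which is not necessarily the procedure the authors ran. The function name and the example r values are hypothetical; n = 205 is the replication sample size given in the model-fit table.

```python
import math


def one_tailed_p_positive(r: float, n: int) -> float:
    """Approximate one-tailed p-value for H1: rho > 0 (Fisher z method).

    This is a large-sample normal approximation, used here only because it
    needs nothing beyond the standard library.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    # Fisher transform of r, scaled by its approximate standard error.
    z = math.atanh(r) * math.sqrt(n - 3)
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical correlations, tested against alpha = 0.05 with n = 205.
for r in (0.30, 0.05):
    p = one_tailed_p_positive(r, 205)
    print(f"r = {r:.2f}: one-tailed p = {p:.4f}, significant = {p < 0.05}")
```

With a sample of this size, a modest correlation such as 0.30 clears the one-tailed threshold easily, while a correlation near zero does not.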
5. Overall Discussion
5.1. Gould’s Judgments of the Army Beta
5.2. Other Thoughts
Conflicts of Interest
- Gould, S.J. The Mismeasure of Man; W. W. Norton: New York, NY, USA, 1981.
- York, R.; Clark, B. Debunking as positive science: Reflections in honor of the twenty-fifth anniversary of Stephen Jay Gould’s The mismeasure of man. Mon. Rev. 2006, 57, 3–15.
- Carroll, J.B. Reflections on Stephen Jay Gould’s The mismeasure of man (1981): A retrospective review. Intelligence 1995, 21, 121–134.
- Jensen, A.R. The debunking of scientific fossils and straw persons. Contemp. Educ. Rev. 1982, 1, 121–135.
- Snyderman, M.; Herrnstein, R.J. Intelligence tests and the Immigration Act of 1924. Am. Psychol. 1983, 38, 986–995.
- Rindermann, H. Cognitive Capitalism: Human Capital and the Wellbeing of Nations; Cambridge University Press: New York, NY, USA, 2018.
- Gould, S.J. The Mismeasure of Man: Revised and Expanded; W. W. Norton: New York, NY, USA, 1996.
- Mataré, H.F. The controversial teachings of Stephen Jay Gould. Mank. Q. 2003, 43, 321–333.
- Rushton, J.P. Race, intelligence, and the brain: The errors and omissions of the ‘revised’ edition of S. J. Gould’s The Mismeasure of Man (1996). Personal. Individ. Differ. 1997, 23, 169–180.
- Sanders, J.T. Marxist criticisms of IQ: A defense of Jensen. Can. J. Educ. 1985, 10, 402–414.
- Lewis, J.E.; DeGusta, D.; Meyer, M.R.; Monge, J.M.; Mann, A.E.; Holloway, R.L. The mismeasure of science: Stephen Jay Gould versus Samuel George Morton on skulls and bias. PLoS Biol. 2011, 9, e1001071.
- Weisberg, M. Remeasuring man. Evol. Dev. 2014, 16, 166–178.
- Kaplan, J.M.; Pigliucci, M.; Banta, J.A. Gould on Morton, redux: What can the debate reveal about the limits of data? Stud. Hist. Philos. Biol. Biomed. Sci. 2015, 52, 22–31.
- Mitchell, P.W. The fault in his seeds: Lost notes to the case of bias in Samuel George Morton’s cranial race science. PLoS Biol. 2018, 16, e2007008.
- Yerkes, R.M. Psychological Examining in the United States Army; Government Printing Office: Washington, DC, USA, 1921.
- Yoakum, C.S.; Yerkes, R.M. Army Mental Tests; Henry Holt and Company: New York, NY, USA, 1920.
- Cobb, M.V.; Yerkes, R.M. Intellectual and educational status of the medical profession as represented in the United States army. Bull. Natl. Res. Counc. 1921, 1, 458–532.
- Yerkes, R.M. Eugenic bearing of measurements of intelligence in the United States army. Eugen. Rev. 1923, 14, 225–245.
- Yerkes, R.M. Testing the human mind. Atlant. Mon. 1923, 131, 358–370.
- Yerkes, R.M. Robert Mearns Yerkes. In History of Psychology in Autobiography; Murchison, C., Ed.; Clark University Press: Worcester, MA, USA, 1932; Volume 2, pp. 381–407.
- Carson, J. Army Alpha, army brass, and the search for army intelligence. ISIS 1993, 84, 278–309.
- Zenderland, L. Measuring Minds: Henry Herbert Goddard and the Origins of American Intelligence Testing; Cambridge University Press: New York, NY, USA, 1998.
- Kevles, D.J. Testing the army’s intelligence: Psychologists and the military in World War I. J. Am. Hist. 1968, 55, 565–581.
- Kane, M.T. Validation. In Educational Measurement, 4th ed.; Brennan, R.L., Ed.; Praeger Publishers: Westport, CT, USA, 2006; pp. 17–64.
- Bureau of the Census. Thirteenth Census of the United States Taken in the Year 1910, Volume I: Population 1910: General Report and Analysis; Government Printing Office: Washington, DC, USA, 1913.
- Clarizio, H.F. In defense of the IQ test. Sch. Psychol. Dig. 1979, 8, 79–88.
- Jensen, A.R. Bias in Mental Testing; The Free Press: New York, NY, USA, 1980.
- Terman, L.M. The Measurement of Intelligence: An Explanation of and a Complete Guide for the Use of the Stanford Revision and Extension of the Binet-Simon Intelligence Scale; Houghton Mifflin: New York, NY, USA, 1916.
- Terman, L.M.; Lyman, G.; Ordahl, G.; Ordahl, L.E.; Galbreath, N.; Talbert, W. The Stanford Revision and Extension of the Binet-Simon Scale for Measuring Intelligence; Warwick & York: Baltimore, MD, USA, 1917.
- Spearman, C. The Abilities of Man: Their Nature and Measurement; The Macmillan Company: New York, NY, USA, 1927.
- Warne, R.T.; Astle, M.C.; Hill, J.C. What do undergraduates learn about human intelligence? An analysis of introductory psychology textbooks. Arch. Sci. Psychol. 2018, 6, 35–50.
- Shuey, A.M. The Testing of Negro Intelligence, 2nd ed.; Social Science Press: New York, NY, USA, 1966.
- Garrett, H.E. Comparison of Negro and white recruits on the army tests given in 1917–1918. Am. J. Psychol. 1945, 58, 480–495.
- Peak, H.; Boring, E.G. The factor of speed in intelligence. J. Exp. Psychol. 1926, 9, 71–94.
- Deary, I.J.; Ritchie, S.J. Processing speed differences between 70- and 83-year-olds matched on childhood IQ. Intelligence 2016, 55, 28–33.
- Hunt, E. Human Intelligence; Cambridge University Press: New York, NY, USA, 2011.
- Alper, T.G.; Boring, E.G. Intelligence test scores of Northern and Southern white and Negro recruits in 1918. J. Abnorm. Soc. Psychol. 1944, 39, 471–474.
- Warne, R.T. An evaluation (and vindication?) of Lewis Terman: What the father of gifted education can teach the 21st century. Gifted Child Q. 2019, 63, 3–21.
- Terman, L.M.; Childs, H.G. A tentative revision and extension of the Binet-Simon measuring scale of intelligence: Part I. J. Educ. Psychol. 1912, 3, 61–74.
- Ayres, L.P. The Binet-Simon measuring scale for intelligence: Some criticisms and suggestions. Psychol. Clin. 1911, 5, 187–196.
- Davidson, P.E. The social significance of the army intelligence findings. Sci. Mon. 1923, 16, 184–193.
- Freeman, F.N. A referendum of psychologists. Century 1923, 107, 237–245.
- Young, K. The history of mental testing. Pedagog. Semin. 1924, 31, 1–48.
- Gottfredson, L.S. Mainstream science on intelligence: An editorial with 52 signatories, history, and bibliography. Intelligence 1997, 24, 13–23.
- Neisser, U.; Boodoo, G.; Bouchard, T.J.; Boykin, A.W.; Brody, N.; Ceci, S.J.; Halpern, D.F.; Loehlin, J.C.; Perloff, R.; Sternberg, R.J.; et al. Intelligence: Knowns and unknowns. Am. Psychol. 1996, 51, 77–101.
- Cronbach, L.J.; Meehl, P.E. Construct validity in psychological tests. Psychol. Bull. 1955, 52, 281–302.
- Kane, M.T. Validating the interpretations and uses of test scores. J. Educ. Meas. 2013, 50, 1–73.
- Koenig, K.A.; Frey, M.C.; Detterman, D.K. ACT and general cognitive ability. Intelligence 2008, 36, 153–160.
- Lee, M. Report Discloses SATs, Admit Rate. Harvard Crimson. 7 May 1993. Available online: https://www.thecrimson.com/article/1993/5/7/report-discloses-sats-admit-rate-pa/?page=single (accessed on 19 February 2019).
- Davis, B.D. Neo-Lysenkoism, IQ, and the press. Natl. Aff. 1983, 73, 41–59.
- Porteus, S.D. Mental tests for feeble-minded: A new series. J. Psycho-Asthenics 1915, 19, 200–213.
- Pyle, W.H. The Examination of School Children: A Manual for Directions and Norms; Macmillan: New York, NY, USA, 1913.
- Binet, A.; Simon, T. The Development of Intelligence in the Child; Kite, E.S., Translator; Original Work Published 1908; Williams & Wilkins: Baltimore, MD, USA, 1916.
- Porteus, S.D. Porteus Maze Test: Fifty Years’ Application; Pacific Books: Palo Alto, CA, USA, 1965.
- Alcock, J. Unpunctured equilibrium in the Natural History essays of Stephen Jay Gould. Evol. Hum. Behav. 1998, 19, 321–336.
- Woodley of Menie, M.A.; Dutton, E.; Figueredo, A.-J.; Carl, N.; Debes, F.; Hertler, S.; Irwing, P.; Kura, K.; Lynn, R.; Madison, G.; et al. Communicating intelligence research: Media misrepresentation, the Gould Effect, and unexpected forces. Intelligence 2018, 70, 84–87.
The omitted portion of the quotation is three pages of history describing Gould’s and his family’s social advocacy. It is clear in this passage that he saw The Mismeasure of Man as being a continuation of this tradition of political work.
The text of this section in the revised version of The Mismeasure of Man is unchanged from the original edition of the book, so all references will be to the 1981 edition.
The phonograph item was the third most difficult item in the Picture Completion subtest for our replication sample. Only 32.7% of examinees who attempted it answered the question correctly, though Items 8 (a picture of an envelope) and 10 (a picture of a pocketknife) were more difficult for our examinees.
About 8% of Army Beta examinees were born in the UK or Canada (, p. 696, Table 213), and it is reasonable to believe that most of these spoke English as a native language. It is unclear, though, whether immigrants from these countries were disproportionately more likely to be recent immigrants.
Gould  never reported the percentage of zero scores on each subtest, only that these were excessive. The only table stating exact zero percentages for each subtest (, p. 741) showed that between 2.1% and 26.9% of White examinees and 4.5% to 36.0% of Black examinees earned a zero on each Army Beta subtest. These numbers are not representative of the entire examinee population, though, because they are from a single location, Camp Dix, New Jersey. In a comparison of fifteen military training camps, Camp Dix had the second-lowest mean Army Beta score (based on data from , p. 669). It is not clear why Camp Dix scored lower than others; Shuey (, pp. 314–315) believed it was because almost half of the examinees at Camp Dix were Black or foreign-born Whites, many of whom would be more likely to be poorly educated than literate, native-born White men. Garrett (, p. 484) also identified different procedures for selecting examinees for Army Beta that could have caused the lower scores at Camp Dix.
The army psychologists’ attempts to make scores on both tests interchangeable may be the earliest example of test score equating in the scientific literature.
The “high average” rating is for all examinees across the Army Alpha and Army Beta. The mean score on Beta for a sample of 26,012 examinees (, p. 669, Table 189), was about 40.45 (SD = 21.50), which indicated that our sample scored 2.24 SD above the mean for illiterate soldiers in World War I. A total Army Beta score of 40.45 warranted a rating of D, or “inferior”.
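The standardized difference in this note can be unpacked with a little arithmetic. The sketch below back-computes the replication sample mean implied by the figures above (reference mean 40.45, SD 21.50, difference 2.24 SD). The implied mean is a derived value, not one reported in this note, and because 2.24 is rounded it is only approximate. The function names are hypothetical.

```python
def standardized_difference(sample_mean: float, ref_mean: float, ref_sd: float) -> float:
    """Difference between two means, expressed in reference-group SD units."""
    return (sample_mean - ref_mean) / ref_sd


def implied_sample_mean(ref_mean: float, ref_sd: float, z: float) -> float:
    """Invert the standardized difference to recover the sample mean."""
    return ref_mean + z * ref_sd


REF_MEAN = 40.45  # World War I Army Beta mean, from the note above
REF_SD = 21.50    # World War I Army Beta SD, from the note above
Z = 2.24          # reported standardized difference (rounded)

mean_estimate = implied_sample_mean(REF_MEAN, REF_SD, Z)
print(f"Implied replication sample mean: about {mean_estimate:.1f} points")
```

Under these figures the implied mean is roughly 88.6 of the 118 possible points.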
When correcting for low reliability of the total Army Beta test score and the restriction of range for our examinees, the correlations increased to r = 0.812 for self-reported ACT scores and r = 0.411 for self-reported college GPA. It is interesting that the former correlation is almost exactly equal to the r = 0.811 correlation between Army Beta scores and total Army Alpha scores that Yerkes (, p. 634) reported. However, because these corrections were not pre-registered we are relegating them to this footnote instead of the main text of the article. Readers should put more stock in our pre-registered analyses.
|Gould’s Sample||Replication Sample|
|Subtest #||Subtest Name||Completed||Not Completed||Completed||Not Completed||χ2 (p)||Odds Ratio a|
|Rating||Gould’s Sample||Replication Sample|
|A||31 (58.5%)||29 (14.1%)|
|B||16 (30.2%)||65 (31.7%)|
|C||6 (11.3%)||111 a (54.1%)|
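As an illustration only, the rating counts in the table above can be compared with a chi-squared test of homogeneity. This is not the test the authors specified for Hypothesis 1, and the meaning of the "a" marker on the replication sample's C count is not given in this excerpt, so the sketch treats the three rows as plain categories. The function names are hypothetical, and the p-value shortcut holds only for 2 degrees of freedom.

```python
import math
from typing import List, Tuple


def chi_square_statistic(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat: float) -> float:
    """Upper-tail p-value of a chi-squared variate with exactly 2 df."""
    return math.exp(-stat / 2.0)


# Rows: Gould's sample, replication sample. Columns: ratings A, B, C.
ratings = [[31, 16, 6], [29, 65, 111]]
stat, df = chi_square_statistic(ratings)
assert df == 2  # the shortcut below is valid only for 2 df
print(f"chi-squared = {stat:.2f}, df = {df}, p = {p_value_df2(stat):.2e}")
```

The statistic is large because Gould's sample is concentrated in the A rating while the replication sample is concentrated in the C rating.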
|Maze||Cube Analysis||X-O Series||Digit Symbol||Number Checking||Picture Completion||Geometric Construction||Total Score||ACT Composite a||College GPA b|
|ACT Composite Score a||0.114||0.280||0.185||0.060||0.270||0.315||0.294||0.379||1.000|
|College GPA b||0.061||0.135||0.193||0.094||0.084||−0.015||0.001||0.143||0.104||1.000|
|Sample||Model||χ2 (df, p)||Δ χ2 (df, p)||RMSEA [90% CI]||CFI||TLI||SRMR|
|Replication (n = 205)||1 factor||13.139 (df = 14, p = 0.516)||—||0.000 [0.000, 0.064]||1.000||1.011||0.034|
|Replication (n = 205)||2 factors||11.099 (df = 13, p = 0.603)||2.040 (df = 1, p = 0.153)||0.000 [0.000, 0.060]||1.000||1.026||0.032|
|Yerkes (, p. 390), n = 693||1 factor||108.716 (df = 14, p < 0.001)||—||0.099 [0.082, 0.118]||0.965||0.948||0.032|
|Yerkes (, p. 390), n = 693||2 factors||106.581 (df = 13, p < 0.001)||2.135 (df = 1, p = 0.144)||0.102 [0.085, 0.120]||0.966||0.945||0.033|
|Yerkes (, p. 634), n = 1102||1 factor||226.542 (df = 14, p < 0.001)||—||0.117 [0.104, 0.131]||0.950||0.925||0.038|
|Yerkes (, p. 634), n = 1102||2 factors||221.567 (df = 13, p < 0.001)||4.975 (df = 1, p = 0.026)||0.121 [0.107, 0.135]||0.951||0.921||0.039|
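Two columns of the fit table can be recomputed from the χ2 values alone. The sketch below assumes the common RMSEA point estimate, sqrt(max(χ2 − df, 0) / (df · (n − 1))), and a chi-squared difference test with 1 degree of freedom; the authors' software may use slightly different conventions. The function names are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate from a model chi-squared, its df, and sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def chi2_difference_p_df1(delta_chi2: float) -> float:
    """Upper-tail p-value for a chi-squared difference with exactly 1 df."""
    return math.erfc(math.sqrt(delta_chi2 / 2.0))


# Yerkes (p. 634) correlation matrix, n = 1102: one- versus two-factor model.
one_factor, two_factor = 226.542, 221.567
delta = one_factor - two_factor
print(f"RMSEA, 1 factor:  {rmsea(one_factor, 14, 1102):.3f}")
print(f"RMSEA, 2 factors: {rmsea(two_factor, 13, 1102):.3f}")
print(f"Delta chi2 = {delta:.3f}, p = {chi2_difference_p_df1(delta):.3f}")

# Replication sample, n = 205: chi2 is below df, so RMSEA is truncated at zero.
print(f"RMSEA, replication 1 factor: {rmsea(13.139, 14, 205):.3f}")
```

Because RMSEA divides by the degrees of freedom, the two-factor model can show a slightly higher RMSEA than the one-factor model even though its χ2 is lower.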
|Subtest Name||Replication Sample||Yerkes ( p. 390)||Yerkes (, p. 634)|
|1 Factor||2 Factors||1 Factor||2 Factors||1 Factor||2 Factors|
|Factor Correlation||—||r = 0.842||—||r = 0.975||—||r = 0.975|
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Warne, R.T.; Burton, J.Z.; Gibbons, A.; Melendez, D.A. Stephen Jay Gould’s Analysis of the Army Beta Test in The Mismeasure of Man: Distortions and Misconceptions Regarding a Pioneering Mental Test. J. Intell. 2019, 7, 6. https://doi.org/10.3390/jintelligence7010006