Why Creativity Isn’t in IQ Tests, Why it Matters, and Why it

Creativity is a part of most theories of intelligence, sometimes a small part and sometimes a large part. Yet even IQ tests that assess aspects of intelligence that supposedly reflect creative abilities do not actually measure creativity. Recent work has argued that intelligence and creativity are more conceptually related than we have thought. In addition, creativity offers a potential way to counter issues of test bias from several different angles. That said, inherent difficulties in measuring creativity and inherent sluggishness in the test industry mean the odds are small that creativity will find its way into IQ tests as currently defined. However, there remain other possibilities in related fields.


Introduction
There are many challenges facing IQ tests, and how the field responds will likely determine the extent of their continued use in the future. A small sampling of issues includes how tests can incorporate technological advances (indeed, most intelligence tests are oddly stuck in the hard-copy era), how test developers can respond to such challenges as the RTI (Response to Intervention) movement, and how trainers can encourage intelligent testing [1]. There are broader concerns, however, which are rooted in some of the same basic questions that have been asked for the last century. Do IQ tests actually measure intelligence? How well do they predict real-world success? Do IQ tests reflect current beliefs about the vast expanse of intellectual abilities?

In some ways, it is unfair to expect one test (whether we mean an IQ test, an academic achievement test, or an admissions test such as the SATs or GREs) to account for everything. We wouldn't expect one blood test to yield a diagnosis for all possible diseases. Yet in common perception and often in practice, we allow a handful of scores to determine virtually everything about a person.
In an ideal world, every child being assessed would be given a complete battery of cognitive and non-cognitive tests, just as any college applicant would be evaluated on a portfolio of past work, scores, recommendations, interviews, and essays. Both IQ tests and college admission tests are often defended by saying they are only intended to be used as part of a complete work-up; I've called this argument the Lucky Charms paradox [2]. In commercials, a child is portrayed eating Lucky Charms as part of a nutritious breakfast, complete with orange juice, scrambled eggs, yogurt, and toast. When seen this way, Lucky Charms is a good thing. But in real life, mornings are frenetic and filled with oversleeping kids, hurried showers, and a quick bowl or two of Lucky Charms gulped down before the school bus. Despite the best intentions, Lucky Charms becomes the only source of breakfast nutrition.
Moving from breakfast cereal to tests, the SATs are often the only numbers given weight for most college admissions. Low GRE scores can keep students with otherwise exceptional records out of top graduate schools. And IQ tests, all by themselves, hold a tremendous amount of power. It is common for students to be initially recommended for gifted programs but then denied if their IQ score is one point lower than the cut-off [3,4].
IQ can take on a literally life-or-death role given that an IQ of 70 is the cutoff for whether a prisoner can be executed [5]. Even though technically a classification of Intellectual Disability (Mental Retardation) requires both an IQ and a measure of adaptive behavior (such as the Vineland-II) to be below 70, in reality it is the IQ test that bears the burden of determining a criminal's fate [6]. Further, sometimes a live-or-die verdict rests on less than one IQ point. IQ test norms get out of date at the rate of 0.3 points per year, and many researchers and clinicians argue that a person's IQ must be adjusted by 0.3 points for each year the norms are out of date (Flynn, 2009), even if that adjustment produces an IQ of 69.4 or 70.7. As Cecil Reynolds and his colleagues argue: "The importance of understanding and assessing mental retardation in criminal defendants has become critical, indeed a true matter of life and death, in capital felony cases. No one's life should depend on when an IQ test was normed" [7]. Even more pertinent, in this high-tech age brimming with sophisticated theories of intelligence, creativity, and non-cognitive variables, how can key decisions like gifted placements or capital punishment verdicts rest almost exclusively on g, and be overturned by a point or two?
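The arithmetic behind such an adjustment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the example defendant are hypothetical, the 0.3 points per year is the rate quoted above, and the adjustment is taken to be downward (the usual convention, because outdated norms inflate scores).

```python
def flynn_adjusted_iq(observed_iq, years_out_of_date, points_per_year=0.3):
    """Return the observed IQ lowered by the assumed norm-obsolescence drift.

    points_per_year defaults to the 0.3 rate quoted in the text; the
    downward direction of the correction is an assumption of this sketch.
    """
    if years_out_of_date < 0:
        raise ValueError("years_out_of_date must be non-negative")
    return observed_iq - points_per_year * years_out_of_date


# Hypothetical defendant: observed IQ of 73 on a test normed 12 years earlier.
adjusted = flynn_adjusted_iq(73, 12)
print(round(adjusted, 1))       # 73 - 0.3 * 12
print(round(adjusted, 1) < 70)  # which side of the cutoff of 70 the score lands on
```

In this hypothetical case, a score that sits comfortably above the cutoff before adjustment falls just under it afterward, which is exactly the kind of fraction-of-a-point swing at issue.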
With power comes responsibility. What aspects of intelligence are not measured by intelligence tests? Here there is natural divergence between IQ test developers and intelligence theorists, yet at least one construct is present in all theories but still omitted from any IQ test: Creativity.

Creativity in IQ-Related Theories and IQ Tests
Most current IQ tests use the Cattell-Horn-Carroll (CHC) model [1] either explicitly or implicitly within their theoretical framework [8]. Given that CHC is the theory most commonly used in IQ tests and is also the non-g perspective most frequently used in creativity-intelligence studies, I will primarily focus on this theory.
The CHC model is an outgrowth of the Cattell-Horn Gf-Gc theory of fluid and crystallized intelligence [9] and Carroll's (1993) Three-Stratum Theory [10]. Much of current CHC theory is rooted most deeply in Horn's expanded model (e.g., [11]). Although Gf, with its emphasis on novel problem-solving, would seem to be a natural match with creative abilities, creativity is placed as part of Glr (Long-term storage and retrieval) in the current CHC model [12,13].
Someone high in Glr is able to store information in long-term memory and then retrieve it when needed [14]. Glr is used when you learn the name of your daughter's fifth-grade teacher and then again when you remember the name of your own fifth-grade teacher. You are still using Glr when you help your daughter with her homework and can recall your own long-ago knowledge of the state capitals and then link this memory of Sacramento being the capital of California with last night's news story about the lottery winner living in Sacramento.
Glr has two distinct components: learning efficiency (how well you can both learn and retain new information) and fluency (the ability to rapidly recall many things). There are many "narrow" abilities underneath these two components; one such ability is Idea Production, which can include five types of fluency (associational, expressional, ideational, word, and figural), figural flexibility, and originality [15]. Fluency, which in this usage refers to the ability to generate many different ideas; flexibility, or the ability to generate many different categories of ideas; and originality, which is generating particularly rare and unusual ideas, are also key components in creativity. They represent a large portion of Guilford's Divergent Production [16] within his Structure of Intellect model.
Although Glr is measured in many current IQ tests, the focus is rarely on fluency (and never on flexibility or originality). The Woodcock-Johnson IV Tests of Oral Language [17] include Retrieval Fluency. The Woodcock-Johnson IV Tests of Achievement (WJ-IV; [18]) include three Fluency subtests (Sentence Reading, Math Facts, and Sentence Writing) in the core battery and Word Reading Fluency in the extended battery. The Kaufman Test of Educational Achievement-3 (KTEA-3; [19]) includes nine Fluency subtests: Word Recognition, Silent Reading, Decoding, Math, Writing, Associational, Reading, Oral, and Academic. The NEPSY-II [20] includes Design Fluency and Word Generation (which was originally called Verbal Fluency). Yet most of these subtests are measuring the broader type of fluency, which is less related to creativity. As J.C. Kaufman et al. [15] discuss, the only four subtests that measure fluency in a creativity-relevant way are the WJ-IV's Retrieval Fluency, the KTEA-3's Associational Fluency, and the two NEPSY-II subtests. Although recent studies have found that Glr does indeed predict creativity [21,22], the construct as measured by IQ tests is far removed from conceptions of creativity.
The CHC model is certainly not the only (or necessarily best) theory of intelligence. The PASS model (Planning, Attention, Simultaneous, and Successive) [23], a cognitive processing approach rooted in the neuropsychological work of Luria [24], is also often used in IQ tests. Creativity likely lies in the Planning component [25]. Tests that use the PASS model do measure Planning, but such assessments have not included anything explicitly related to creativity. In addition, there are several theories that are not (yet) represented in IQ tests that have much to offer.

Creativity in Other Theories of Intelligence
Guilford's [16] Structure of Intellect model is not as popular as it once was, but it represented a pioneering step forward. The model introduced the concepts of both divergent and convergent thinking, which are parts of many modern theories of creativity (e.g., [26]).
A modern theory of intelligence that emphasizes creativity is Sternberg's [27][28][29] theory of successful intelligence. An earlier version, the Triarchic Theory of Intelligence [30], proposed three intelligences: Analytic, Practical, and Creative. The revised theory comprises three subtheories. Most related to creativity is the experiential subtheory, which focuses on how people adjust to novelty and automatize information processing. Although many measures of Sternberg's theories have been created and used for college admissions [31], they have not yet been incorporated into IQ tests (see, e.g., [32]).
Gardner's well-known theory of multiple intelligences [33,34] (encompassing interpersonal, intrapersonal, spatial, naturalistic, language, logical-mathematical, bodily-kinesthetic, and musical) includes a wide range of abilities that would require differing amounts of creativity. Further, Gardner's Creating Minds [35] contains case studies of eminent creative individuals who embody his intelligences. Although Gardner's work has had extensive influence in the schools, his ideas have also not yet been incorporated into IQ tests.

The Relationship between Creativity and Intelligence
There have been hundreds of papers devoted to this topic, and given that my emphasis is on creativity as part of intelligence (as opposed to creativity representing a different construct from intelligence), I will only briefly review this literature. Most studies on the topic tend to assume a generalist perspective on both intelligence (typically using group-based tests of g) and creativity (typically using divergent thinking tests). Creativity and intelligence, under these circumstances, tend to correlate at a small but significant level [36,37].
Many argue that the two constructs are more closely related than such studies would indicate. Silvia [38,39] suggests the relationship is underestimated because the studies are limited by only looking at observable scores (i.e., performance on an intelligence test). Jung [40] notes that intelligence can be seen as problem solving at an everyday level (e.g., [41]), whereas creativity may represent problem solving for less common issues (e.g., [42]). Others argue that creativity and intelligence are both cognitive functions [43] or that divergent thinking is simply an executive cognitive function [44].
Much of the more recent work on creativity and intelligence has focused on Gf rather than g or Gc. Benedek et al. [45] studied underlying executive functions behind both divergent thinking and Gf. They found that the ability to notice small changes (called updating; [46]) predicted both Gf and divergent thinking; the ability to stifle a natural response (or inhibition) also predicted divergent thinking. Other studies have used metaphor creation as a creativity measure instead of divergent thinking and a spectrum of CHC components instead of just g and have found much higher relationships between creativity and intelligence than past studies (i.e., [47,48]).
A pair of studies by Silvia and colleagues indicated that the relationship between Gf and creativity is mediated by other cognitive mechanisms. Nusbaum and Silvia [49] found that Gf predicted creativity but also predicted how well people could use a more efficient strategy to improve their scores. Beaty and Silvia [50] studied divergent thinking over time. Participants in the sample with higher Gf produced creative initial ideas but slowed over time; in comparison, those participants with lower Gf showed much more improvement in their idea generation when given more time. Looking broadly at these studies suggests that the basic approach to seeing how creativity and intelligence relate to each other may not be enough. Creativity and intelligence show an intricate relationship with many cognitive and situational mediators. It may be instinctually appealing to consider them as completely different concepts, but some have argued that this separation can hurt children who are being assessed [51].

Why is Creativity Important? The Issue of Bias
Even if one readily accepts that creativity is a part of intelligence and that it is not satisfactorily measured on IQ tests, there is the larger question of whether creativity's absence is cause for concern. It is easy to list the many positive outcomes connected with creativity, but most components of intelligence would have their own list of beneficial associations. As I have argued elsewhere [52,53], creativity has particular implications for fairness and non-biased assessment.
IQ tests have been criticized in two different ways for being biased. A common layperson approach is to note the significant differences that occur between males and females (although such differences are much more present in standardized achievement tests) and among ethnic groups. Some researchers argue that these measures reflect actual differences [54]. Others point to the discrepancy between socioeconomic status and opportunities across ethnicities [27], whereas still others argue that current ability measures do not incorporate enough aspects of intelligence to truly reflect a person's "global" ability [29].
Another proposed explanation is stereotype threat. A multitude of studies (see, e.g., [55]) have found that individuals feel stress when placed in a situation where they run the risk of confirming a negative stereotype about their group (e.g., ethnicity). This stress can then cause poor performance [56,57]. People can be aware of stereotypes about intelligence when they do not endorse such views; indeed, even people who are the targets of such stereotypes are cognizant of these views. An example of a manifestation might be if an African American test-taker is aware of these stereotypes and is worried about confirming such negative views and, as a result, feels added stress. Such added stress may increase cognitive load and reduce working memory [58], thereby leading to lowered test performance.
Beyond simply focusing on group differences, there is a psychometric approach to bias that offers a more nuanced view. In this method, the question is whether a test might be measuring different things in different groups. For example, a test may measure vocabulary in Caucasians, but measure something else (such as exposure to American culture) in a Hispanic American population [59,60].
If a test measures something different from its intended purpose for specific groups, then it can be considered to be biased against such groups. A measure can be considered to be fair to the extent any resultant score only reflects (a) variables associated with the construct being measured, and (b) random variance from error. If any systematic error occurs because of a person's group membership and, thus, that person receives a lower score, then the test is biased [61].
Creativity may be a reason why some groups systematically get items incorrect. Several scholars [62][63][64] have suggested that Caucasians approach some tasks (such as hearing a story and then recounting specific details) in the way that the test creators intended. In contrast, African Americans may emphasize the narrative in the story and answer more creatively. Such answers would then be marked incorrect.
Creativity also may help with the issue of test bias in other ways. If stereotype threat is a reason why some groups receive lower IQs, it is worth noting that creativity tests and rated work tend to show no differences across ethnicity [65][66][67]. However, African Americans see themselves as more creative than Caucasians in several domains [68]. In another study, African Americans and Caucasians were asked to provide self-ratings on both intelligence and creativity [69]. There was an interaction with education; among less-educated individuals, ethnic differences were notably higher for intelligence than creativity. Among the higher-educated individuals, African Americans rated themselves higher for creativity than did Caucasians; the opposite effect was found for intelligence. The result of these findings may be that given the higher self-concept that African Americans have for creativity, an IQ test that includes creativity may reduce stereotype threat. Further, there is a strong chance that the ethnicity gap noted in IQ tests would not be found in items emphasizing creativity.

Creativity and IQ Tests: Possibilities, Realities, and Ironies
Although the field of creativity has been around for more than 65 years and has been especially fertile over the last two decades, measurement is still a tricky subject. The most common assessment remains divergent thinking, much as it was in the 1950s. Researchers often use either the Torrance Tests of Creative Thinking (TTCT; [70]) or a comparable measure based on Guilford's [16] ideas [71]. Such tests are traditionally scored for fluency, flexibility, and originality (all abilities that would theoretically comprise Glr).
Could a subtest that measures divergent thinking ("Creative Fluency") be added to an IQ test? Instead of asking test-takers to name many different examples from a category (such as types of flowers), could they be asked more open-ended questions such as uses for a pencil or what might happen if people could fly? Certainly, it would be possible to score such responses for fluency; just as IQ test examiners are highly trained as to what constitutes a correct answer, so too are TTCT scorers trained to reliably determine if a response is relevant (and thus counts for fluency) or irrelevant [72].
Even scoring for originality would be possible. The TTCT has a matrix of responses and points are assigned based on the rarity of response. In research studies, all responses are pooled and originality is scored based on the frequency a particular answer is given. Such matrices would be easy to compile during test standardization; one such matrix was applied to determine the originality of responses on the Verbal Fluency subtest of the McCarthy Scales of Children's Abilities [73]. Just as current IQ tests have responses that carry different weights based on "correctness", so too could a scoring system be implemented that assigns increased points for more rare responses. Such an assessment could also be translated on-line without notable changes to the process [74].
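To make the idea of a rarity matrix concrete, the following Python sketch pools responses from a standardization sample, records how often each response occurs, and awards more points for rarer answers. It is a minimal illustration, not the TTCT's actual scoring rules: the function names, the point values, the cut-off proportions, and the sample responses are all hypothetical, and the sketch assumes a trained scorer has already removed irrelevant responses.

```python
from collections import Counter


def build_rarity_table(pooled_responses):
    """Map each response to the share of test-takers who gave it.

    pooled_responses: one collection of (already normalized) responses
    per person in a hypothetical standardization sample.
    """
    counts = Counter(r for person in pooled_responses for r in set(person))
    n_people = len(pooled_responses)
    return {response: count / n_people for response, count in counts.items()}


def originality_points(share, rare=0.05, uncommon=0.20):
    """Hypothetical weights: 2 points if rare, 1 if uncommon, 0 otherwise."""
    if share < rare:
        return 2
    if share < uncommon:
        return 1
    return 0


def score_person(responses, rarity_table, rare=0.05, uncommon=0.20):
    """Return (fluency, originality) for one person's relevant responses."""
    unique = set(responses)  # repeated answers count once
    fluency = len(unique)
    # A response never seen in the sample has share 0.0 and scores as rare.
    originality = sum(
        originality_points(rarity_table.get(r, 0.0), rare, uncommon)
        for r in unique
    )
    return fluency, originality


# Toy "uses for a pencil" sample of four people.
sample = [
    {"write", "draw"},
    {"write", "stir coffee"},
    {"write", "draw", "hair pin"},
    {"write"},
]
table = build_rarity_table(sample)
# Looser cut-offs than the defaults, because the toy sample is tiny.
print(score_person(["write", "draw", "hair pin"], table, rare=0.3, uncommon=0.6))
```

With a real standardization sample, the cut-off proportions would be set empirically; the point of the sketch is only that fluency and originality can both be derived from a single pooled frequency table.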
There are criticisms of divergent thinking, of course. Even its supporters agree that it is only one aspect of creativity [75]. The broad verbal-figural categories are not dissimilar to the early atheoretical Verbal-Performance structure of past Wechsler scales (e.g., [76]). The validity of such measures has been both challenged [77] and defended [78]. Adding divergent thinking to IQ tests would probably be the easiest possibility, but not necessarily the optimal choice. Indeed, consider that one common criticism of IQ tests is that they have largely remained stagnant over the last century [32]. The selection of a creativity measure primarily rooted in work more than six decades old may not be a step forward.
Another popular measure of creativity is the Consensual Assessment Technique [79,80], in which actual creative products are rated by content experts. Such experts tend to agree at strikingly high rates, even if they represent different areas of expertise [81,82] or a different amount of expertise [83,84]. Only pure novices do not show agreement with experts [85,86].
This method would be impractical on a large scale for reasons of time and cost, but an important take-away is that an expert's opinion carries a certain amount of weight. Just as an examiner need not be an expert in motivation or personality to make valuable notes about a child's behavior during the test, particular care can be taken to note responses that are notably original, aesthetically pleasing, humorous, or clever. Such behavioral observations are consistent with a teacher using slight variations from a typical lesson plan to allow student creativity to develop [87].
Even the Consensual Assessment Technique could eventually be utilized on a larger scale. Computers can be trained to grade essays, not to mention to play chess and develop unique recipes [72]. There is no reason to assume that given enough programming, computers could not be trained to mimic human raters in evaluating creative work (at least creative writing). How might computers be trained? Consider the many textual analysis programs (e.g., WordNet, Linguistic Inquiry and Word Count, and Latent Semantic Analysis) that use advanced computational modeling techniques to assess semantic patterns and associations. Many of these models are based on self-learning algorithms, meaning that there is minimal training time and that the ability to detect whatever construct is being examined improves with experience. Some of the programs (such as Latent Semantic Analysis) are already being used to automatically score essays based on human feedback; this same basic principle can be applied to creative writing. Just as current systems have trained computer programs to mimic experts in grading essays for writing quality, grammar, and sentence structure, so too can programs be trained to detect creativity. Countless facets of a writing sample can be detected (across many possible dimensions, from adjective placement to language choice) and then associated with multiple ratings by experts of the creative aspects of the product. The program could then learn which features are associated with creativity, and to what extent. Certainly, the ability to calculate distances between words or concepts should allow for sophisticated originality scores in divergent thinking-type measures.
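As a rough illustration of how distances between words could feed an originality score, the Python sketch below computes the cosine distance between a prompt word and each response. Everything specific here is an assumption made for illustration: the three-dimensional vectors are invented toy values (a real system would take vectors from a trained semantic model such as Latent Semantic Analysis), and the function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; larger values mean less related words."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Invented toy vectors; a real system would load these from a semantic model.
TOY_VECTORS = {
    "pencil": (1.0, 0.0, 0.0),
    "write": (0.9, 0.1, 0.0),
    "drumstick": (0.2, 0.9, 0.1),
}


def semantic_originality(prompt, responses, vectors):
    """Score each response by its semantic distance from the prompt word."""
    return {r: cosine_distance(vectors[prompt], vectors[r]) for r in responses}


scores = semantic_originality("pencil", ["write", "drumstick"], TOY_VECTORS)
for response, distance in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{response}: {distance:.3f}")
```

In this toy space, "drumstick" sits much farther from "pencil" than "write" does, so it would receive the higher originality score. Whether such distances track expert ratings of creativity is an empirical question that the sketch does not address.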
I have used creative writing as a main example because I think we are the closest to being able to measure that domain on a large scale. However, the Consensual Assessment Technique has been used in many other domains, from taking photographs [88] to music compositions [89] to deriving mathematical equations [90] to dramatic performance [91] to answering science questions [92] to cooking [93] to everyday problem-solving [94]. In addition, ratings of different creative products within the same domain tend to be consistent by person [80,95].
Ideally, a multitude of possible domains could be measured to provide a broad perspective on a person's creativity. Although it would be simpler to use a domain-general measure of creativity, there has been a strong movement toward domain-specificity [85,96,97].
Unfortunately, these types of initiatives take money and the willingness to take risks. Test publishers may have the former, but most have shown no indication of the latter. On some level, this is understandable; IQ tests are a business, like making dental floss, and the people making the decisions (even those with backgrounds in psychology or education) do not seem to have the burning passion to advance the field. It is easy to criticize other people for not spending money, taking chances, or investing in the future. Yet, ironically, what is preventing creativity from being included in IQ tests is a lack of creativity on the part of publishers.

The Future of Creativity (and IQ) Assessment?
No matter how much some groups want to keep the status quo, progress moves forward regardless. Video games continue to grow in popularity, both as an activity and a business. There has been a growing interest in the relationship between video games and creativity [98]. Some studies show a link between playing video games and being creative [99,100]. But these are only the start.
Valerie Shute and her colleagues [101][102][103] have begun what they call "stealth assessment" in video games. They have created a game called Physics Playground, in which the player draws objects on the screen. The drawings then become animated and interact with other objects in the game; as the game progresses, there is a series of puzzles and problems that can be solved by drawing different objects. Shute and her colleagues can quietly test how well the players learn and understand physics, as well as their creativity. Levels can be solved in many different ways, and a player can attempt multiple solutions. It is thus possible to generate scores for fluency, flexibility, and originality (as well as related constructs, such as humor or aesthetics; [101]).
Shute and her colleagues' work is still at a starting point. There are years of work in continued development, validation, and expansion. The possibilities discussed earlier (especially the use of computerized scoring to make a large-scale Consensual Assessment Technique feasible) are also likely far away from being a reality. Unfortunately, we are many years removed from a time when researchers such as David Wechsler (or my parents, Alan and Nadeen Kaufman) drove the field. The publishers are the ones with all of the power, and the dominating company, Pearson, is only now slowly translating its existing instruments into computerized format with Q-interactive (see, e.g., [104]).
It used to be that the costs of creating and standardizing a test were so prohibitive that a huge publisher was a necessity. Things can change. There are some aspects of testing that need a face-to-face component [1], but on-line assessment will nonetheless be suitable for many purposes. Collaboration with programmers and using MTurk to obtain a large standardization sample may enable academics or independent companies to tackle the type of new approaches (creative approaches, and ones that also measure creativity) that current industry leaders are unwilling to consider.