The C-Test: An Integrative Measure of Crystallized Intelligence

Crystallized intelligence is a pivotal broad ability factor in the major theories of intelligence, including the Cattell-Horn-Carroll (CHC) model, the three-stratum model, and the extended Gf-Gc (fluid intelligence-crystallized intelligence) model, and is usually measured by means of vocabulary tests and other verbal tasks. In this paper, the C-Test, a text completion test originally proposed as a test of general proficiency in a foreign language, is introduced as an integrative measure of crystallized intelligence. Based on existing evidence in the literature, it is argued that the construct underlying the C-Test closely matches the abilities underlying the language component of crystallized intelligence, as defined in well-established theories of intelligence. It is also suggested that, by carefully selecting texts from pertinent knowledge domains, the C-Test could also measure the factual knowledge component of crystallized intelligence.

methods" [4] (p. 127) and represents abilities that result from education and experience. Gf and Gc are correlated because crystallized intelligence is partly based on fluid intelligence: Gf is at work in activating information from experience [4]. This is referred to as the investment hypothesis of fluid and crystallized intelligence, i.e., Gc is acquired through the "investment" of other abilities during formal and informal education and life experiences [5,6]. It is commonly accepted that fluid abilities decline with age, whereas crystallized abilities remain stable or even increase [7]; this view, however, has been challenged [8]. The distinction parallels the conceptions of Intelligence A and Intelligence B, the former hypothesized to be biologically determined and the latter to be the product of education and experience [9]. Gc, Gf, and visual-spatial reasoning ability (Gv) are generally referred to as the trio of intellectual abilities [4].
More specifically, according to CHC, Gc is composed of a multitude of factors including: "knowledge of the culture … breadth and depth of acquired knowledge of language, information and concepts of a specific culture… a store of verbal or language-based declarative (knowing what) and procedural (knowing how) knowledge acquired through the investment of other abilities during formal and informal educational and general life experiences" [10] (p. 5).
Gc is usually measured by means of vocabulary tests and other verbal tasks [11,12]. The purpose of this research is to introduce and propose an integrative verbal task, namely the C-Test, as a valid and economical measure of the verbal component of Gc.
Although Gc is a recognized component of intelligence, it is not well understood, and there is no agreement on its nature and content [12]. Whereas Cattell's original definition of the construct [13,14] was very broad and covered knowledge and skills in many domains, Carroll's [2] model included only language abilities.
Crystallized intelligence is essentially considered to comprise language and general world knowledge and is typically measured through vocabulary knowledge and reading comprehension. The narrow abilities under crystallized intelligence are reading comprehension, cloze ability, reading decoding, reading speed, spelling ability, writing ability, foreign language proficiency, foreign language aptitude, language development, lexical knowledge, listening ability, phonetic coding, communication ability, oral production and fluency, grammatical sensitivity, and verbal language comprehension [15]. Carroll [2] states that whether Gc is called crystallized intelligence or verbal intelligence is a matter of choice. Major intelligence batteries define and test Gc differently but mostly focus on verbal ability measures [12].
In the extended Gf-Gc theory, however, Gc is composed of verbal comprehension, syllogistic reasoning, verbal closure (comprehension of sentences when parts are missing), mechanical knowledge, general information, seeing problems, and behavioral relations [1].
It should be noted that an aspect of Gc that is mostly ignored in intelligence tests is cultural, world, and factual knowledge. Schipolowski et al. [12] argue that Gc includes factual knowledge and should not be tested only with verbal indicators. They show empirically that, although verbal ability and factual knowledge are highly correlated, they are factorially distinct and form separate components of Gc.

C-Test
The C-Test is a variant of the cloze test, an integrative test proposed as an overall measure of general language proficiency [16]. In a cloze test, every nth word of a passage is deleted, and examinees are required to restore the deleted words. The ability to complete the mutilated text is taken as an indication of general first and second language ability. In the late 1970s and early 1980s, cloze came under criticism on several grounds. Critics argued, among other things, that the deletion rate and the point at which deletions begin affect the reliability and validity of cloze tests, that cloze tests are usually based on one long passage, which can introduce bias, and that cloze difficulty depends on the proportion of content and structure words deleted [17][18][19]. Klein-Braley and Raatz [20] proposed the C-Test as a new testing procedure based on the tenets of the cloze test but purportedly free of its deficiencies. By fixing the deletion rate at every second word and increasing the number of passages to four to six, they tried to correct the faults of cloze. Deleting only the second half of each affected word also made scoring more objective than in cloze, where several words may correctly fill a gap.
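To make the cloze procedure and the critics' point about deletion rate and onset concrete, here is a minimal Python sketch of fixed-ratio cloze construction. It assumes whitespace tokenization and whole-word gaps; the function name `make_cloze`, the default rate `n = 7`, and the `first` (onset) parameter are illustrative choices, not part of any published procedure.

```python
import re
from typing import Optional

# Assumption: a deletable token is letters/apostrophes optionally followed by
# trailing punctuation, which is kept outside the gap.
TOKEN = re.compile(r"([A-Za-z']+)(\W*)")


def make_cloze(text: str, n: int = 7, first: Optional[int] = None) -> tuple[str, list[str]]:
    """Fixed-ratio cloze: replace every n-th whitespace-separated word with a gap.

    `first` is the 0-based index of the first deleted word (the point of onset);
    it defaults to n - 1, i.e. the n-th word. Returns the text and answer key.
    """
    tokens = text.split()
    start = n - 1 if first is None else first
    answers: list[str] = []
    for i in range(start, len(tokens), n):
        m = TOKEN.fullmatch(tokens[i])
        if m is None:  # e.g. numbers or tokens with leading quotes stay intact
            continue
        answers.append(m.group(1))
        tokens[i] = "______" + m.group(2)
    return " ".join(tokens), answers
```

Calling the function twice on the same passage with different `first` values deletes a different set of words, and therefore yields a test of potentially different difficulty, which is the rate-and-onset sensitivity the cloze critics pointed to.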
A C-Test battery is composed of four to six short passages in which, starting from the second word of the second sentence, the second half of every second word is deleted. There are usually 20 to 25 broken words in each passage. To provide enough context for respondents, the first and last sentences of each text are left intact. Test takers have to restore the broken words, and each correct restoration earns one mark [20]. The following is an example of a C-Test passage:
If you were to ask most people who Charles Darwin was, many of them would reply that he was the man who said that we were descended from monkeys. They wo___ be wr___. Darwin d___ no mo___ than sug___ the possi___. What h___ said, a___ proved b___ thousands o___ examples, w___ that ov___ millions o___ years ani___ and pla___ have cha___. This he called evolution.
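The construction rule (often called the "rule of two") and the one-mark-per-item scoring can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the passage is supplied already split into sentences, a word is a run of ASCII letters, one-letter words are skipped, and for words with an odd number of letters the larger half is deleted (as in the example, where "did" becomes "d___"). The helper names `make_ctest`, `damage`, and `score` are hypothetical, scoring is by exact match, and other conventions (e.g., for proper names or numbers) are not implemented.

```python
import re

# Assumption: a "word" is a maximal run of ASCII letters; one-letter words are
# neither counted nor damaged (a convention assumed here, not stated in the text).
WORD = re.compile(r"[A-Za-z]+")


def damage(word: str) -> str:
    """Keep the first half of a word (the smaller half when its length is odd)."""
    return word[: len(word) // 2] + "___"


def make_ctest(sentences: list[str]) -> tuple[str, list[str]]:
    """Rule of two: the first and last sentences stay intact; starting with the
    second word of the second sentence, every second word loses its second half.
    Returns the mutilated passage and the answer key (the original words)."""
    if len(sentences) < 3:
        raise ValueError("need at least three sentences")
    answers: list[str] = []
    position = 0  # running word count over the mutilated sentences

    def replace(match: re.Match) -> str:
        nonlocal position
        word = match.group(0)
        if len(word) < 2:
            return word
        position += 1
        if position % 2 == 0:  # every second word is damaged
            answers.append(word)
            return damage(word)
        return word

    middle = [WORD.sub(replace, s) for s in sentences[1:-1]]
    return " ".join([sentences[0], *middle, sentences[-1]]), answers


def score(responses: list[str], answers: list[str]) -> int:
    """One mark per exactly restored word (exact, case-sensitive match)."""
    return sum(r == a for r, a in zip(responses, answers))


DARWIN = [
    "If you were to ask most people who Charles Darwin was, many of them would "
    "reply that he was the man who said that we were descended from monkeys.",
    "They would be wrong.",
    "Darwin did no more than suggest the possibility.",
    "What he said, and proved by thousands of examples, was that over millions "
    "of years animals and plants have changed.",
    "This he called evolution.",
]

if __name__ == "__main__":
    passage, key = make_ctest(DARWIN)
    print(passage)
    print(len(key), "items; first answers:", key[:3])
```

Run on the Darwin passage above, the sketch reproduces the mutilated text shown in the example and returns a 16-word answer key.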
The C-Test was first introduced as a measure of general second language proficiency, and over the past three and a half decades many empirical studies have been conducted in several languages to establish its validity as a measure of language competence in both first and second languages. The latest C-Test bibliography contains 503 entries [21].
The reduced redundancy principle [22] and Gestalt psychology [16] have been advanced as the underlying theories for the C-Test and a number of other tests, including the cloze test, dictation, and the noise test (a dictation test on which background hissing noise is imposed). The reduced redundancy principle rests on the assumption that natural languages are redundant, i.e., messages contain more elements than are strictly necessary to convey their meaning. Gaies [23] claims that a telephone conversation contains twice as much information as a native speaker needs to understand it. A communication system without redundancy would be extremely sensitive to noise and impractical for real-life communication. Therefore, a competent language user must be able to understand messages that are distorted or masked by noise. The reduced redundancy principle thus served as the underlying and unifying theory for tests such as the cloze test, the C-Test, and the noise test.
Gestalt psychology is another theory that accounts for tests in which the message is distorted in one way or another. Oller [16] (p. 342) states that "the perceivers' ability to fill in gaps in imperfect patterns … may be related to the ability to construct the same patterns".
Qualitative and quantitative investigations have demonstrated that the construct underlying C-Test performance is language competence in its entirety. Numerous think-aloud studies of the mental processes involved in taking the C-Test, together with analyses of the errors test takers commit, demonstrate that successful C-Test performance requires the integration of "contextual, semantic, syntactic, morphological, lexical, and orthographic information and knowledge pertaining to a particular written language" [24] (p. 66). Hastings further notes that C-Test processing has much in common with natural language processing and resembles it in both complexity and length.
Klein-Braley [25] examined responses to C-Test items by nonnative speakers of English and identified several reasons for incorrect responses or for a failure to respond at all. These include "early closure" (finishing a clause or sentence before it ends and failing to process any further), "narrow focus" (using only the immediate context around the blank), incorrect spelling, errors in singular/plural concordance, the right word class but the wrong word, near misses, and nonsense words.
In a similar study, Babaii and Fatahi-Majd [26] identified the sources of failed responses on C-Tests given to nonnative students of English. Their verbal protocol analysis suggested that the major sources of wrong restorations in C-Test performance are overreliance on background knowledge (also a reason for failure in reading comprehension [27]), overlooking delicate points of grammar, automatic restoration of high-frequency lexical items, poor retrieval or non-retrieval of lexical items, and overlooking major points of grammar. They also showed that gaps within long embedded sentences are harder to restore, as such constructions are harder to understand and tax working memory capacity.
The C-Test has also been shown to be sensitive to age-related differences in first language abilities [28][29][30] and to first language attrition in adults who have lived abroad for a long time [31]. Furthermore, Linnemann and Wilbert [32,33] demonstrated that the C-Test could be used to measure language comprehension in children with learning disorders.

C-Test and Crystallized Intelligence
In this section, theoretical and empirical evidence is put forward to support our major argument that the C-Test can measure the verbal component of Gc. Our exposition starts with the historical background of sentence completion tests as measures of intelligence. Drawing on previous construct validation studies of the C-Test in the language testing field, we then argue that the constructs underlying C-Test performance match, to a considerable degree, the narrow ability factors hypothesized for Gc. Finally, we report the few existing pieces of empirical evidence on the relationship of passage completion tests, and particularly C-Tests, with measures of Gc.
The history of the C-Test and its predecessor, the cloze test, as intelligence tests goes back to the late 19th century, when the school board of Breslau commissioned the German psychologist Hermann Ebbinghaus [34] to develop mental tests for determining the best time of study for schoolchildren [35]. He developed a kind of text completion test that he named Combinationsgabe, which has been translated into English as the 'combination test' [36]. According to Max Meyer [37], the term "combination test" does not convey what the German word implies. He stated that in German, if a person "possesses 'Combinationsgabe', this means that he has a talent for drawing conclusions from premises which do not very readily present themselves to a man's consciousness as items of a unitary logical thought…" (p. 688), and he suggested "conjectural method" as an appropriate English equivalent for 'Combinationsgabe'. The test was similar to a modern cloze test or C-Test and was used to study the effect of fatigue on children's school performance. Ebbinghaus found strong relations between children's scores on the completion test and their groupings based on class standing and teachers' ratings of brightness. Correlational methods for computing the exact coefficients and the strength of these associations were not yet available to him [2]. According to Terman [38], Ebbinghaus considered his Combinationsgabe a reliable measure of intellectual ability; Terman disagreed, believing that the test tapped "mechanical activities like memory and association… verbal memory… fluency in language" (p. 347).
Nevertheless, Ebbinghaus is credited as one of the first to investigate human intelligence and to devise a group intelligence test [2]. The completion test has been reported to be an excellent test of general intelligence, with a corrected loading of 0.97 on the g factor [39]. Alfred Binet read a report on Ebbinghaus's work, prepared in French, was influenced by his experiments on schoolchildren, and included the completion test in the original Binet-Simon test [35]. Ebbinghaus's combination method was probably the best test of intelligence at the time [40], and it continues to be used in many modern-day intelligence and achievement tests.
We argue that the modern C-Test is a very close variant of Ebbinghaus's method and can be used as a measure of crystallized intelligence, if not general intelligence. Because word mutilations are very frequent in a C-Test and remove more than 25% of the text content, processing the text entails reading comprehension, activation of lexico-grammatical and background knowledge, and the use of redundancies. Hofstätter [41] defines intelligence as the ability to recognize redundancies. Table 1 lists the narrow ability factors under Gc and the reading/writing factor (Grw) in major theories of intelligence [10], with the abilities deemed to be tapped by the C-Test highlighted. As Table 1 shows, several Gc and Grw narrow abilities closely match abilities measured by the C-Test. Note that Grw is an independent broad ability factor in the extended Gf-Gc theory and CHC theory but is subsumed under Gc in Carroll's three-stratum model [10]. Whether Grw is considered a broad ability factor independent of Gc or a component of Gc, we believe that C-Test processing captures a number of its components, too.
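A back-of-the-envelope check of the "more than 25%" figure, assuming the deletion convention visible in the example passage (for words with an odd number of letters the larger half is removed, so "did" becomes "d___") and assuming that damaged and intact words have about the same average length:

```latex
% W words in the mutilated sentences, with lengths n_1, ..., n_W;
% D = the set of damaged words (every second word, so |D| is about W/2).
% A damaged word of length n_i loses n_i - \lfloor n_i/2 \rfloor = \lceil n_i/2 \rceil letters.
\[
p \;=\; \frac{\sum_{i \in D} \lceil n_i/2 \rceil}{\sum_{i=1}^{W} n_i}
\;\ge\; \frac{1}{2}\cdot\frac{\sum_{i \in D} n_i}{\sum_{i=1}^{W} n_i}
\;\approx\; \frac{1}{2}\cdot\frac{1}{2} \;=\; \frac{1}{4}.
\]
```

Rounding odd lengths up pushes the proportion above one quarter. In the Darwin example, the three mutilated sentences contain 149 letters, of which 41 are deleted, i.e., p ≈ 0.275. The intact first and last sentences lower the proportion for the passage as a whole.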
Reading comprehension is an extremely complex phenomenon that taxes several cognitive capacities and knowledge sources, including reasoning ability, world knowledge, lexico-grammatical knowledge, and working memory. Working memory (WM) is the cognitive capacity to hold information in memory while processing incoming information [42]. WM is considered essential to learning and literacy, and reading researchers have emphasized the role it plays in reading comprehension [43,44]. The ability to retain and process textual information and to build a coherent propositional and situational representation of the text is essential to reading comprehension. Since reading comprehension involves several processes that unfold over time, WM is considered essential for coordinating them (e.g., [43][44][45]). To build a coherent representation of a text, readers have to hold information in memory while processing other parts of the text and then integrate the elements of the text. When we read, we need to process letters, associate them with their sounds, retain all these pieces of information and integrate them to work out the meaning of a word, keep the meaning of each word in memory, and combine these meanings to understand a sentence. Reading and processing mutilated texts should therefore tax working memory more than reading intact texts does, and the frequent mutilations in a C-Test make reading comprehension even more demanding.
Table 1. Gc (crystallized intelligence) and Grw (reading/writing) narrow abilities according to the three major theories of intelligence and the related C-Test constructs (highlighted).

[Table 1: Gc Narrow Abilities | Grw Narrow Abilities, by theory: CHC [3,10]; Extended Gf-Gc; ...]
Restoring the broken words in a C-Test taps several narrow and broad ability factors subsumed under CHC. Vocabulary is an important component of language ability and is present in many intelligence batteries as a measure of Gc. Since each item of a C-Test is a broken word, the examinees' lexical repertoire is challenged; the C-Test is therefore known to be an appropriate measure of vocabulary [46]. Successful completion of the gaps requires active access to vocabulary resources. Moreover, vocabulary is closely associated with knowledge [2], which Carroll suggests might be due to their similar acquisition processes (both are acquired through reading). Vocabulary tests are very commonly used to measure Gc because they relate both to verbal abilities and to factual knowledge: "Knowledge of word meanings is declarative knowledge and tests of lexical knowledge are typically not limited to specific content domains, but vocabulary is also indispensable for any form of verbal communication and an important indicator of language development" [12] (p. 159).
To supply words with the correct endings, examinees draw on grammatical knowledge and spelling ability, both of which are narrow ability factors subsumed under crystallized intelligence. Since each text is a self-contained, complete unit, successful completion of the gaps requires understanding the flow of discourse, i.e., comprehending the text. Among the strategies respondents report using to arrive at answers (the missing letters of a broken word), a notable one is reading the sentence in which the unsolved item is located [47]. Employing think-aloud verbal protocol analysis, the study in [47] found that the longer the portion of the text test takers read and the more reading cycles they engage in, the higher they score on C-Tests. Since language is a rule-based system and each text is a complete unit of meaning, reasoning at the level of the word, the sentence, and the entire text seems to be an important factor in supplying the missing letters and restoring the integrity and meaningfulness of the text. Lapses in the flow of information in a mutilated text act as background noise that impedes normal text processing. Successful text completion requires comprehending the text and supplying the correct words with the correct grammar; text-based reasoning is essential to it.
Furthermore, successful word reconstruction entails both top-down and bottom-up processing. Think-aloud verbal protocol studies of C-Test processing indicate that local, word-level processing suffices to restore some words, whereas restoring more difficult words requires reading and processing longer portions of the context and integrating information from various parts of the text [47]. Considering these findings, one can conclude that C-Test taking requires both inductive and deductive reasoning. Resnick [48] states that higher-order processes are complex and therefore difficult to measure. Nevertheless, there is no such thing as reasoning that is independent of prior learning [3]. Given this, an efficient and practical way "to measure reasoning involves looking for the products of past reasoning, rather than trying to capture reasoning at the moment of testing" [49] (p. 25). Even if one holds that C-Test taking does not require any reasoning, successful reading, text processing, and word restoration require extensive prior reasoning with texts. The roles of working memory and reasoning in reading comprehension are well established. In the extended Gf-Gc model, syllogistic reasoning and verbal closure (comprehension of sentences when parts are missing) are parts of Gc. If these abilities are accepted as components of Gc, the C-Test should be an optimal measure of Gc.

Empirical Evidence
Some empirical evidence supports our main argument that the C-Test is a measure of Gc. Raatz [50] found a moderate correlation between the C-Test and nonverbal intelligence. In a factor analytic study of three different samples, Raatz [51] administered a German C-Test together with several verbal and nonverbal intelligence tests to German schoolchildren and college students. For adults, the C-Test loaded heavily on the verbal intelligence factor (0.84). In a sample of children, the C-Test had a smaller loading on the verbal intelligence factor (0.55) and loaded moderately on the reasoning factor (0.42). Along the same lines, Wockenfuß and Raatz [52] found substantial correlations between the C-Test and the verbal subtests of the LPS (Leistungsprüfsystem) intelligence test [53].
In one study, Ackerman, Beier, and Bowen [39] examined the relationships of a cloze test, a completion test, and a knowledge test with Gc. The completion test was similar to the cloze test except that the intact passage was read aloud to the test taker in its entirety before they attempted it, a procedure devised by Terman [38] "in order to rob the test [Ebbinghaus's combination tests] of its puzzle nature" (p. 344). The Gc measures were vocabulary, information, word beginnings, comprehension, and synonyms. The manifest correlations of Gc with the cloze and completion tests were 0.766 and 0.807, respectively; the cloze and completion tests had manifest correlations of 0.676 and 0.734, respectively, with the knowledge test. A t-test for the difference between dependent correlations showed that the completion test was significantly more strongly associated with both knowledge and Gc than the cloze test was. The higher correlations of the completion test were attributed to the additional listening and recall component it taps. A structural equation model with four correlated factors (Gc, cloze, completion, and knowledge) fitted the data well. The latent correlations of the cloze and completion tests with Gc were 0.83 and 0.88, respectively; cloze and completion correlated at 0.75 and 0.82, respectively, with knowledge, while Gc and knowledge had a latent correlation of 0.97. Note that this study used not C-Tests but close variants of them, i.e., cloze and completion tests. Its findings should therefore be interpreted cautiously. However, given the scarcity of empirical studies on the C-Test as a measure of Gc, the results of a study with close variants of the C-Test can be very informative.
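The section does not say which test for dependent correlations was used. One standard option for two correlations that share a variable (here, Gc correlated with the completion test and with the cloze test) is Williams' T2 statistic; the Python sketch below implements it on that assumption. The function name `williams_t2` is hypothetical, and because the cloze-completion correlation and the sample size are not reported here, the sketch is not applied to the study's figures.

```python
import math


def williams_t2(r_jk: float, r_jh: float, r_kh: float, n: int) -> tuple[float, int]:
    """Williams' T2 test of H0: rho_jk == rho_jh for two correlations that share
    variable j (e.g. j = Gc, k = completion test, h = cloze test).

    r_kh is the correlation between k and h, n the sample size.
    Returns the t statistic and its degrees of freedom (n - 3).
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    # Determinant of the 3 x 3 correlation matrix of j, k and h.
    det_r = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2  # mean of the two correlations being compared
    denom = 2 * (n - 1) / (n - 3) * det_r + r_bar**2 * (1 - r_kh) ** 3
    t = (r_jk - r_jh) * math.sqrt((n - 1) * (1 + r_kh) / denom)
    return t, n - 3
```

A two-sided p-value can then be read from a t distribution with n - 3 degrees of freedom, for instance with `2 * scipy.stats.t.sf(abs(t), df)`. Note that the statistic grows with the correlation between the two tests being compared: the more the cloze and completion tests overlap, the smaller a difference in their correlations with Gc needs to be to reach significance.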
Perhaps the strongest evidence for the validity of the C-Test as a measure of Gc is provided by Schipolowski, Wilhelm, and Schroeders [12]. In an extensive study of the nature of Gc, they administered measures of verbal ability (Va), declarative knowledge (Kn), and fluid intelligence (Gf) to a large sample of respondents (n = 6701). The verbal ability measures included reading, listening, writing, orthography, language usage, and the C-Test. A correlated factor model with verbal ability, knowledge, and Gf as interrelated but distinct factors fitted well. The most notable finding of this study is that the C-Test had the highest loading (0.85) on the verbal ability factor, higher than any other verbal measure used in the study. Furthermore, the three latent factors correlated at ρ(Va, Kn) = 0.91, ρ(Va, Gf) = 0.89, and ρ(Kn, Gf) = 0.85.
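To put the 0.85 loading in perspective: assuming a standardized solution in which the C-Test loads only on the verbal ability factor, the squared loading is the share of C-Test variance that Va accounts for.

```latex
\[
\lambda_{\text{C-Test}}^{2} = 0.85^{2} = 0.7225 \approx 72\%,
\qquad
1 - \lambda_{\text{C-Test}}^{2} \approx 28\% \ \text{(specific variance plus error)}.
\]
```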

Conclusions
In this note, based on existing empirical and theoretical evidence, an attempt was made to demonstrate that the C-Test is a global measure of crystallized intelligence. Mapping the findings of thirty-five years of research on the C-Test as a test of general language proficiency in first and second language onto the definition of Gc shows that the construct underlying the C-Test coincides to a considerable degree with the verbal ability component of Gc as defined in major intelligence theories.
Research shows that the C-Test taps reading comprehension, working memory capacity, grammatical ability, vocabulary knowledge, spelling, background knowledge, and the ability to use redundancies in language. Gc is essentially verbal ability, on which the subfactors of verbal comprehension, reading comprehension, and vocabulary knowledge load highly [2]. The C-Test seems to capture the complex of abilities generally referred to as Gc.
An aspect commonly ignored in the measurement of Gc is factual knowledge. A shortcoming of most modern intelligence tests is that they measure Gc mainly with vocabulary measures, or more generally language knowledge, and ignore cultural and factual knowledge [4,12]. Baghaei and Grotjahn [54,55] demonstrate that many abilities of interest can be tapped by manipulating the type and content of the texts on which C-Tests are based. Baghaei [56] also shows that the rhetorical organization of texts affects the C-Test construct. It might be possible to exploit this flexibility of the C-Test to measure the knowledge aspect of Gc, which is not usually included in intelligence measures: by constructing C-Tests from texts that contain the desired cultural and factual information, the knowledge component of Gc might be assessed. Further research is required to test this hypothesis.
The speeded C-Test, i.e., a C-Test administered under time constraints [57,58], is another variant of the C-Test, which reportedly yields increased difficulty, reliability, and discrimination with very advanced language learners. Grotjahn et al. [58] claim that the speeded C-Test measures declarative knowledge, procedural knowledge, and the automatization of information processing. Their study was conducted in the context of second language learning. The technique deserves empirical research in the field of intelligence to examine its relationship with, and loadings on, g and Gc compared with the canonical power C-Test. This is particularly important because reading and writing speed are included in Gc and Grw.
Ackerman et al. [39] found it puzzling that the completion test was discontinued as a single test, even though it produced higher correlations with intelligence than any other test, and treated this as a question for historical research. Spearman [59] considered it the best single test of intelligence, and Ackerman et al. conclude: "…we believe that the completion test clearly deserves greater use in the context of assessment of a broad Gc ability, and in the prediction of learning and achievement" [39] (p. 119). We suggest that the C-Test, as a modern version of the completion test with a strong theoretical and empirical background, is a more suitable measure of Gc than other completion-type tests and merits application and research efforts in the field of intelligence.

Author Contributions
Purya Baghaei conceived the idea for the research, accumulated evidence, and wrote most of the manuscript. Mona Tabatabaee revised and reworked the manuscript and suggested additional supporting evidence for the argument presented.