1. Introduction
Graduate school in scientific psychology and related disciplines (e.g., human development, educational psychology, brain science, etc.) trains its students for two main tasks—scientific research and teaching. Traditionally, there has probably been more emphasis on research in graduate training, but for many if not most graduates, the large majority of their time will be spent fulfilling teaching responsibilities. Although many students have their sights set on tenure-track jobs in major research universities, the large majority of students who actually get academic jobs will land in teaching-intensive liberal-arts colleges, state-university campuses, or community colleges. Even among those who do make it to major research universities, many will be adjuncts whose paid responsibilities exclusively involve teaching assignments (see [
1,
2], for more details on these issues). Good teaching is critical not only for future scientists, but for the lives of the students that the scientists will impact through their instruction [
3,
4]. Scholarship in psychology and related disciplines should advance not only the research agenda of the university, but also the university’s teaching agenda [
5].
Students who do not choose an academic career—or who are forced into a non-academic career through the dearth of academic jobs—may believe that they have no need to learn to teach. Chances are, they are wrong. In most of the jobs they go into—consulting, government research, nonprofits, amongst others—they will find themselves needing to make presentations to others [
6]. Sometimes, the stakes for such presentations are even higher than for teaching a single class, such as a decision whether to award a consulting contract or the decision whether to begin the manufacture of a new product. So even for those students, oral-presentation skills can be critically important.
A previous set of investigations [7] explored the relationship between scientific-reasoning skills, such as formulating scientific hypotheses, designing scientific experiments, evaluating scientific conclusions, and reviewing research articles, and scores on conventional tests of cognitive abilities (e.g., letter sets and number series, which are tests of inductive-reasoning skills; and the SAT and ACT, which are tests of knowledge and analytical reasoning). It was found that the scientific-reasoning tests tended to correlate with each other, as did the conventional tests of cognitive abilities; however, the scientific-reasoning tests generally did not correlate significantly with the tests of cognitive abilities, and in some cases, correlated negatively. Those investigations suggested that tests of scientific reasoning might provide useful supplements to conventional tests of knowledge and analytical reasoning in graduate admissions, at least in psychology and related disciplines. Work on undergraduate admissions also suggested that conventional tests could benefit from supplementation with other kinds of measures, such as measures of creative and practical thinking [8,9].
The work on scientific reasoning only assessed research skills, not teaching or teaching-analysis skills. Yet as noted earlier, much of a typical scientific career is spent in teaching, preparing for teaching, or analyzing teaching. Analyzing teaching is particularly important, because it is through the analysis of others’ and one’s own teaching skills that one can become an excellent or at least good teacher [
10]. Analyzing the quality of teaching is important not only for developing and improving one’s own teaching, but also for other tasks confronting faculty members. An analysis of teaching skills is typically involved, and should always be involved, in hiring new faculty members, promoting faculty members, tenuring faculty members, evaluating faculty members for raises, awarding teaching prizes, and other related activities. If one cannot analyze teaching (just as if one cannot analyze research), it is difficult to have a successful career as a faculty member.
Of course, being able to distinguish good and not so good teaching in oneself and others is not the only element in becoming a good teacher. One must also translate what one has learned into actual active teaching skills. Much of whether this translation will ever take place, however, will depend not only on a set of abilities, but also on the incentives of the institution and one’s motivation to become a good teacher. If one is teaching in a small liberal-arts college, the motivation to excel as a teacher will probably be very high; but if one is teaching in a large “Research 1” state university, the incentives for excellent teaching might be much lower. Thus, extrinsic and intrinsic motivational factors will probably be as important as analytical skills in determining who ultimately becomes a good teacher.
Because of the importance of being able to distinguish good from not so good teaching, it would seem reasonable to consider assessing such skills in the graduate-admissions process. From this point of view, if faculty members will spend a large proportion of their time teaching, do graduate programs really want to admit students who lack the skills necessary for successful teaching? Of course, students often start evaluating teaching as college students filling out course questionnaires, but research suggests that such evaluations are often very superficial and even wrong-headed [11,12,13,14,15,16].
Our study was designed to assess people’s ability to evaluate, in an experimental situation, problematic behaviors in the teaching of psychology and related disciplines. To do this, we had professors with extensive teaching experience teach lessons in ways that were purposefully designed to embody particular flaws. The question then became whether the subjects viewing the teachers in the process of teaching would recognize the flaws in the teachers’ behavior. The flaws were not pointed out to the subjects. Rather, subjects had to reason about the teaching they were observing in real time.
“Did you do any teaching in the past year (courses, seminars, laboratories)? Would you like additional opportunities to teach? How will you find these teaching opportunities?
What sorts of feedback, formal or informal, have you received on your course content, syllabi, pedagogy, consideration of diverse learners and overall teaching abilities? In which areas do you need to improve? How will you improve your teaching and what resources are available?”
This assessment is a self-assessment, however, and it is not clear what its statistical properties are. Nor is it clear whether self-assessments are valid indicators of a person’s teaching skills or reasoning about teaching skills. We have been unable to locate specific formal instruments that have been used for graduate admissions that assess teaching skills for psychology. There are, of course, teacher-competency examinations, such as the Praxis (
https://www.ets.org/praxis). There are three Praxis examinations. The Praxis Core Academic Skills for Educators (Core) tests measure reading, writing, and mathematics skills. The Praxis Subject Assessments measure subject-specific content knowledge, as well as general and subject-specific teaching skills, that individuals need for teaching. Moreover, the Praxis Content Knowledge for Teaching Assessments (CKT) measure subject-specific content knowledge, with an emphasis on content knowledge for K-12 teaching. The closest to what we are trying to accomplish is the Praxis Subject Assessment for teaching psychology (5391) in grades K-12. But these tests are of declarative knowledge, whereas our assessment is oriented toward procedural knowledge. Our assessment asks subjects to evaluate actual teaching as it is in progress, rather than to assess the teaching of psychology in the abstract.
We do not attempt here to fully review the extensive literature on assessments for graduate admissions. Partial reviews can be found elsewhere (see, e.g., [7,19,20]); we likewise present only a partial review.
The theoretical basis of our work is Sternberg’s [21,22,23,24,25,26,27,28,29,30,31] theory of successful intelligence. The theory holds that intelligence can be understood in terms of information-processing components that combine to produce intelligent behavior (see also [32,33]). When these components are applied insightfully, as in reasoning about scientific research and presumably about the teaching of science, they can be used to solve problems that go beyond the usual kinds of routine problems that one confronts in everyday life [34,35]. The theory of successful intelligence is one of a class of theories that seek to understand intelligence in somewhat broader terms than conventional theories (see [36,37]; see also [38,39] for a theory arguing that conventional intelligence does not sufficiently take into account rational thinking). The theory of successful intelligence has been previously applied to undergraduate admissions [8], but only recently has been applied to graduate admissions [7] (see also [40]).
According to the theory [30], all analytical, creative, and practical abilities comprise the same components, namely, metacomponents, performance components, and knowledge-acquisition components. What differs among the three is how the components are applied. Analytical abilities are involved in the application of the components to somewhat familiar but relatively abstract kinds of problems and situations. Creative abilities are involved in the application of these components to relatively novel tasks and situations. Practical abilities (as for evaluating teaching or research) are involved in the application of the components to those everyday concrete problems and situations with which one is confronted in one’s work and personal life. This means that analytical, creative, and practical abilities should show some correlation, because they depend on the same information-processing components, but the extent of the correlation will vary depending upon the extent to which the contexts of measurement overlap. There is no one expected level of correlation; rather, it depends on the particular tasks, situations, and their contextual overlap. It is difficult to specify in advance exactly what this overlap will be. For example, the practical abilities of a physicist may overlap more with the analytical abilities of a physicist than would be the case for an artist or a musician. In the theory of successful intelligence, there are no absolutely “pure tasks”, because so many tasks require a combination of analytical, creative, and practical skills.
Individuals can be strong in general abstract analytical skills but not necessarily strong in applying those skills to any one particular domain of practice. For example, someone might be adept at solving number or letter series, or at solving general mathematical problems, but not be adept when applying the same inductive reasoning skills to a domain of practice such as legal, medical, or scientific problem solving [
29,
40], or to reasoning about teaching, for that matter. The basic argument as it applies here is that the cognitive skills needed to succeed in teaching are in part different from the abstract analytical skills measured by tests such as the GRE (Graduate Record Examination). Abstract-reasoning skills certainly matter, but they are not the whole story. Thus, the goal here is not to supplant tests such as the GRE, but to examine whether future assessors would perhaps wish to look beyond such measures to measures that assess a person’s skill in evaluating the quality of teaching.
The basic question in our study is where the reasoning-about-teaching measure will place in terms of its factorial structure in relation to the other measures. There are various hypotheses about how it might load.
One hypothesis would be that reasoning about teaching is relatively different from reasoning about research, but it is inductive and draws on knowledge, so that the new reasoning-about-teaching measure should cluster with the tests of cognitive (abstract reasoning) and educational skills.
A second hypothesis would be that reasoning about teaching is basically similar to reasoning about research, in which case, factorially, the new reasoning-about-teaching measure would cluster with the reasoning-about-research measures.
A third hypothesis is that reasoning about teaching is similar neither to reasoning about research nor to traditional cognitive educational skills. In this case, the measure would cluster with neither of the other two groupings and instead would form its own factor. It might represent a set of skills entirely different from the skills measured by conventional tests or by measures of scientific reasoning.
The theory of successful intelligence would be consistent with either the second or the third hypothesis. Reasoning about teaching is a contextualized skill, so we would not expect it to be closely related to the kinds of reasoning on conventional tests of cognitive and educational skills. But whether this kind of reasoning is similar to reasoning about research was not clear to us in advance, and so we made no prior prediction regarding the second and third hypotheses. It might or might not be similar to reasoning about research.
There is one aspect of the experimental design, described below, that might tend to yield a separate reasoning-about-teaching factor. Whereas our reasoning-about-research measures are printed scenarios that can be read at a subject’s leisure and analyzed at whatever rate is comfortable for the subject, the reasoning-about-teaching measure is a video presentation in real time. Although subjects can take as long as they wish to provide responses, the videos themselves are rather quickly paced, and subjects have to watch them and then respond. If it turned out instead that, even with this methodological difference, the second hypothesis was nevertheless correct, it would suggest that reasoning about research and reasoning about teaching truly are rather closely bound.
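The three hypotheses can be read as a decision rule over the factor loadings of the reasoning-about-teaching measure. The following is a minimal sketch of that rule and is not the analysis used in this study. The function name, the factor labels, the example loadings, and the 0.40 cutoff for a salient loading are all illustrative assumptions.

```python
def classify_loading_pattern(loadings, cognitive_factor, research_factors,
                             threshold=0.40):
    """Map the teaching measure's factor loadings onto the three hypotheses.

    loadings: dict of factor label -> loading of the teaching measure
              (hypothetical values, not results from the study).
    threshold: assumed cutoff for treating a loading as salient.
    """
    # Factors on which the teaching measure loads saliently.
    salient = {name for name, value in loadings.items()
               if abs(value) >= threshold}
    if not salient:
        return "no salient loading"

    on_cognitive = cognitive_factor in salient
    on_research = bool(salient & set(research_factors))

    if on_cognitive and not on_research:
        return "hypothesis 1"  # clusters with cognitive/educational tests
    if on_research and not on_cognitive:
        return "hypothesis 2"  # clusters with reasoning-about-research
    if not on_cognitive and not on_research:
        return "hypothesis 3"  # forms its own factor
    return "mixed"             # salient on both groupings


# Hypothetical loadings for illustration only.
example = {"cognitive": 0.10, "research": 0.62, "teaching_specific": 0.05}
print(classify_loading_pattern(example, "cognitive", ["research"]))
```

A rule of this kind only labels a pattern; it does not replace inspection of the full loading matrix, and the label can change with the cutoff chosen.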
4. Discussion
Our goal in this study was to explore the feasibility of adding a measure of evaluation of scientific teaching to the graduate-admissions process. In particular, we were interested in whether a new measure of reasoning about teaching, namely, evaluation of flaws in teaching, would cluster factorially with traditional tests of cognitive and educational skills (the first hypothesis), would cluster with our reasoning-about-research measures (the second hypothesis), or would form its own factor (the third hypothesis). Our data best fit the second hypothesis. Factorially, our measure of reasoning about teaching flaws clustered with hypothesis generation, drawing conclusions, and reviewing research. In this study, the reasoning-about-research measures clustered into two different factors (in contrast to Sternberg & Sternberg [7], where they formed a unified factor). Our measure of reasoning about teaching did not correlate with any of the research-evaluation measures at a level anywhere close to the reliabilities of the respective measures, suggesting that it possessed unique variance of its own that was not shared with the research-evaluation measures. Thus, it is a useful supplement to those measures, rather than being redundant with them.
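The comparison between an observed correlation and the reliabilities of the two measures can be made concrete with the classical correction for attenuation, under which the correlation between two measures is bounded by the square root of the product of their reliabilities. The sketch below is illustrative only: the function names are our own, and the reliability and correlation values are hypothetical, not data from this study.

```python
import math


def max_observable_correlation(rel_x, rel_y):
    """Upper bound on the observed correlation given two reliabilities."""
    return math.sqrt(rel_x * rel_y)


def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values for illustration only.
rel_teaching = 0.81   # assumed reliability of the teaching measure
rel_research = 0.64   # assumed reliability of a research measure
r_observed = 0.36     # assumed observed correlation between the two

ceiling = max_observable_correlation(rel_teaching, rel_research)
corrected = disattenuated_correlation(r_observed, rel_teaching, rel_research)

print(f"ceiling on observed r: {ceiling:.2f}")
print(f"disattenuated r:       {corrected:.2f}")
```

When the corrected correlation remains well below 1, the two measures share only part of their reliable variance, which is the sense in which a measure can be said to carry unique variance.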
The study does not look at the modifiability of reasoning about flaws in teaching or of reasoning about scientific research. Almost certainly these skills can be taught [
46,
47], and books on experimental methods (e.g., [
48,
49]) try to teach these skills, as do books teaching critical thinking [
50,
51].
Our study, of course, had various limitations.
First, our subjects were all undergraduates at Cornell. Although Cornell is more diverse than any of the other Ivy League schools (containing separate colleges for disciplines as diverse as arts and sciences, business, hotel administration, agriculture, human ecology, engineering, and industrial and labor relations), there certainly would be some restriction of range in abilities beyond that of students applying to attend graduate school in psychology and related disciplines.
Second, our instruction was limited to four topics in developmental psychology. Although the topics themselves were somewhat diverse, they did not include many of the areas of psychology, for example, brain science or humanistic psychology.
Third, our sample was only slightly more than 100 subjects, and thus not particularly large. Future research would need a larger sample of subjects.
Fourth, although all of our measures have been used before (Sternberg & Sternberg, [
7]), except for the reasoning-about-teaching measure, their use has not been extensive. We do not have extensive prior data on these measures.
Fifth, we were measuring reasoning about teaching, not actual teaching. In a subsequent study currently being run, we actually have students give a brief lecture. We believe this follow-up is important because there is no guarantee that someone who is good at spotting flaws in lectures is also good at lecturing (and vice versa). However, most students have had very limited experience in lecturing, and so it is not clear how predictive we can expect their own teaching to be when they are so inexperienced in teaching. Such a task may prove to be beyond the students’ zone of proximal development (Vygotsky [52]). Moreover, such a task probably heavily involves extrinsic and intrinsic motivational factors that may have little to do with subjects’ actual teaching, learning, or reasoning abilities [53,54].
Finally, and perhaps most importantly, our subjects were not ones specifically targeting graduate school in psychology and related disciplines, and we cannot say how they would have performed in graduate school. Even if we targeted applicants to graduate schools in psychology and related disciplines, it would have been difficult to follow up, because the students then would have gone to very diverse schools with widely varying systems of evaluation, and it is not clear how we truly could have kept track of them.
What our study did suggest, however, is that it appears to be possible to measure students’ reasoning about teaching flaws, an important skill students need to succeed not only in academia, but in the world of science more generally. Our new measure is certainly not a finalized instrument, but it suggests that a more refined instrument could be developed based on students watching lectures (or other classroom situations) and commenting on the flaws (and perhaps in future assessment, the strengths) in specific situations. Teaching is critically important to success in many jobs in psychological and related sciences, and a measure that assesses skills relevant to success in teaching certainly deserves consideration as a supplementary measure for admission to graduate education in the psychological sciences and related disciplines. Most importantly, the current research suggests the importance of scholarship that advances the teaching as well as the research agenda of the university (Boyer [
5]). If the goal of a university is to develop the active concerned citizens and ethical leaders of the next generation (Sternberg [
9]), then this can only be done if the university seeks to admit and then educate the best teachers that our society can possibly find. Such teachers will excel not only in the analytical skills measured by conventional standardized tests, but also in the practical skills of the intellect [
55,
56,
57,
58,
59] that are so important in teaching and other professions as well as in the creative skills involved in teaching and being a serious scientist [
60].