1. Introduction
Course enrollment often dictates the format of examinations in business and economics courses. For large lecture sections, multiple-choice exams are often preferred by instructors and students. This study asserts that student performance on these types of exams can be viewed as the result of a process of eliminating incorrect answers, rather than selecting the correct answer. Viewed in this way, the elimination by a student of all of the incorrect answers to a particular exam question results in choosing the correct answer. However, if no wrong answers are eliminated, the response to a particular exam question is a fully uninformed guess, which, with four answer choices, has a 0.25 probability of being correct. Thus, the more wrong answers one eliminates, the higher the probability that a student selects the correct answer to a given exam question.
In this study, we assert that how students respond on a multiple-choice test can be broken down into the fractions of questions where no wrong answers can be eliminated (i.e., random guessing), one wrong answer can be eliminated, two wrong answers can be eliminated, and all wrong answers can be eliminated. Using performance data from a final exam in principles of microeconomics and an empirical model representing a mixture of binomials in which the probability of a correct choice depends on the number of incorrect choices eliminated, we find that the responses to about 26 percent of the questions on the exam can be characterized as random guessing, while none of the questions on the exam is completed after eliminating all of the incorrect choices. Before delving further into our results, we first describe our mixture-model approach to performance on multiple-choice exams.
2. Mining for Correct Answers on Multiple-Choice Exams
Given the widespread use of large lecture sections in introductory courses in economics, management, and marketing, multiple-choice questions are the basis of a significant portion of assessment by college and university instructors [1,2,3]. The prevalence of multiple-choice testing has led to studies examining the impact of exam structure on student performance. One branch of the literature examines the possibility that the chronological ordering of exam questions (i.e., exam questions are presented in the same chronological order as the course content was delivered) has some bearing on exam performance in economics [4,5,6,7,8]. A second branch of the literature extends research in cognitive psychology to disfluency, defined as the subjective experience of difficulty associated with cognitive operations [9,10,11], by testing whether font disfluency improves exam performance in economics principles [12].
This paper examines student performance on multiple-choice exams by focusing on what the educational measurement literature describes as a popular “test-wiseness” strategy, whereby students reach the correct answer by eliminating some distractors, depending on their partial knowledge of the test content [13,14,15,16]. In the absence of such partial knowledge, students tend to use blind guessing, which has a probability equal to one divided by the number of alternatives of choosing the correct answer [13,17]. This paper assumes that economics principles students adopt this approach to multiple-choice exams, where each multiple-choice exam consists of n questions, each with k alternatives. The process of scoring well on the exam can be viewed as the elimination of incorrect answers, rather than the selection of the correct answer. For example, suppose that each multiple-choice question has four answers, that is, k = 4. Thus, the exam consists of 4n possible answers, of which n are correct and 3n are incorrect. (In this simple case, we are assuming that there is no question for which “all of the above” or “none of the above” is the correct answer.) A perfect score on the exam involves the elimination of 3n, or, more generally, (k − 1)n incorrect answers. Eliminating anything less than 3n incorrect answers involves some guessing, with the guesses being better, on average, when a larger number of incorrect answers can be eliminated for each choice. For any individual exam question i, m_i is the number of incorrect choices eliminated out of the k total choices, and

0 ≤ m_i ≤ k − 1. (1)

That is, if each multiple-choice question has four (five) choices, the number of incorrect choices eliminated must fall between zero and three (four).
To illustrate, suppose there are four possible answers for each question. In this case, the elimination of three incorrect answers results in choosing the correct answer. However, if no wrong answers are eliminated, the response is a fully uninformed guess, which has a 0.25 probability of being correct. The more wrong answers one eliminates, the higher the probability that the correct answer is selected. If one wrong answer can be eliminated with certainty, the probability of answering correctly rises to 0.33 or, more generally, to

p_i = 1/(k − m_i). (2)

Thus, on an exam with k = 4 choices for each question, the probabilities of a correct answer, p_i, given the number of incorrect answers eliminated, m_i = 0, 1, 2, 3, are, respectively,

p_i = 0.25, 0.33, 0.50, 1.00. (3)
On any given exam for any student, there are questions where the correct answer is known with certainty, other questions where the response is a random guess, and all cases in between.
In this framework, we have assumed that incorrect answers are equally plausible and not correlated or interconnected. That is, we do not consider the case where some incorrect answers are clearly implausible, as that would increase the probability of choosing the correct answer. Neither do we consider the situation where several questions refer to the same table or graph and an incorrect answer on the first question in the group increases the probability of an incorrect answer on subsequent questions in the group.
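To make the elimination probabilities concrete, here is a minimal Python sketch of the relationship p_i = 1/(k − m_i); the function name `p_correct` is ours, introduced only for illustration.

```python
def p_correct(k: int, m: int) -> float:
    """Probability of answering correctly when m of the k - 1
    incorrect choices have been eliminated: p = 1 / (k - m)."""
    if not 0 <= m <= k - 1:
        raise ValueError("m must satisfy 0 <= m <= k - 1")
    return 1.0 / (k - m)

# With k = 4 choices per question, as in the exam studied here:
probs = [p_correct(4, m) for m in range(4)]
print([round(p, 2) for p in probs])  # [0.25, 0.33, 0.5, 1.0]
```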
In this paper, we wish to investigate how students respond on a multiple-choice test, which can be broken down into the fractions of responses where no wrong answers can be eliminated (i.e., random guessing), one wrong answer can be eliminated, two wrong answers can be eliminated, and all wrong answers can be eliminated. In order to investigate this issue, we examine student responses on a final exam consisting of 100 questions (n = 100) with four choices each (k = 4). Thus, a perfect score involves the elimination of 300 incorrect answers.
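The elimination framework also lends itself to simulation. The sketch below is our own illustration, not part of the original analysis: for each question it draws the number of eliminated wrong answers from a set of purely hypothetical mixing weights and then answers correctly with probability 1/(k − m).

```python
import random

def simulate_score(weights, n=100, k=4, rng=random):
    """Simulate one student's exam score under the elimination model:
    for each of n questions, draw the number m of eliminated wrong
    answers with the given probabilities, then answer correctly
    with probability 1 / (k - m)."""
    score = 0
    for _ in range(n):
        m = rng.choices(range(k), weights=weights)[0]
        if rng.random() < 1.0 / (k - m):
            score += 1
    return score

# Hypothetical mixing weights, for illustration only:
print(simulate_score([0.25, 0.25, 0.25, 0.25]))
```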
3. Data, Empirical Strategy and Evidence, and Future Research
The data for our study come from student performance on a final exam in a principles of microeconomics course and consist of 94 exam grades, each based on the same 100 questions. The final exam consists of multiple-choice questions, with four answer choices offered for each question. Our empirical model is a mixture of binomials in which the probability of a correct choice depends on the number of incorrect choices eliminated. Thus, for each of the 100 questions on the exam, students can eliminate zero, one, two, or three incorrect answers. Eliminating zero incorrect answers is random guessing, and eliminating three incorrect answers results in a correct response. Between these two extremes, we have what we call “informed guessing”, where some incorrect answers have been eliminated.
Let y_j represent a binomial random variable indicating the number of successful choices for student j in n trials. Here, we have 100 trials, or questions. The probabilities change with the number of incorrect choices eliminated. Thus, the probability function for each observation is based on a mixture of binomials given by:

f(y_j) = Σ_{i=1}^{4} λ_i B(y_j; n, p_i), (4)

where B(y_j; n, p_i) is the binomial probability of y_j successes in n trials with success probability p_i, and the resulting log-likelihood function is given by:

ln L = Σ_{j=1}^{94} ln f(y_j). (5)

Given that the mixing weights must sum to one, one of the mixing weights is not identified. We set λ_4 = 1 − λ_1 − λ_2 − λ_3. Formulated this way, the mixing weights indicate the fraction of exam responses consistent with eliminating zero, one, two, or three incorrect answers. Thus, λ_1 is the fraction of responses associated with random guessing, for which the probability of a correct answer is 0.25; λ_2 is the fraction of responses associated with the elimination of one incorrect answer; λ_3 is the fraction of responses associated with the elimination of two incorrect answers; and λ_4 is the fraction associated with the elimination of three incorrect answers (that is, the correct answer is chosen).
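The mixture log-likelihood can be coded directly. The following Python/NumPy sketch is our own illustration of the estimation setup, not the authors' original code; it assumes `scores` is an array holding each student's number of correct answers.

```python
import numpy as np
from scipy.stats import binom

# Success probabilities implied by eliminating 0, 1, 2, or 3
# incorrect answers on a 4-choice question: p = 1 / (k - m).
P = np.array([1 / 4, 1 / 3, 1 / 2, 1.0])
N = 100  # questions per exam

def neg_log_likelihood(free_weights, scores):
    """Mixture-of-binomials negative log-likelihood.
    free_weights holds (lam1, lam2, lam3); lam4 = 1 - their sum."""
    lam = np.append(free_weights, 1.0 - np.sum(free_weights))
    if np.any(lam < 0):
        return np.inf  # infeasible mixing weights
    # f(y_j) = sum_i lam_i * B(y_j; N, p_i), summed in logs over students
    mix = binom.pmf(scores[:, None], N, P) @ lam
    return -np.sum(np.log(mix))
```

This function can then be passed to a general-purpose optimizer such as `scipy.optimize.minimize` to recover the mixing weights.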
As the success probabilities are known, our estimated model parameters are the mixing weights, λ_i, i = 1 to 4, although, due to the adding-up restriction, only three are identified. Our fitted model, with the estimated weights rounded to two decimal places, is thus:

f̂(y_j) ≈ 0.26 B(y_j; 100, 0.25) + 0.12 B(y_j; 100, 0.33) + 0.61 B(y_j; 100, 0.50) + 0.00 B(y_j; 100, 1.00). (6)
The maximized value of the likelihood function is −791.390. This value must necessarily fall between the likelihood value associated with pure guessing on every question by every student (i.e., −2166.71) and the likelihood value associated with correct answers provided by every student to every question (i.e., 0.000). The value of the likelihood function associated with pure guessing (i.e., −2166.71) comes from estimating the model in (4) above subject to the constraint that λ_1 = 1 (and, hence, λ_2 = λ_3 = λ_4 = 0). Thus, we use a likelihood ratio test to evaluate the null hypothesis of pure guessing against the alternative hypothesis that students did not guess all the time. The test statistic is 2750.64 (i.e., 2(−791.390 + 2166.71)), with three degrees of freedom. This leads to the rejection of the null hypothesis of pure guessing at any of the usual levels of significance.
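The likelihood ratio test reported above is simple to reproduce. A short Python check, using the two log-likelihood values from the text:

```python
from scipy.stats import chi2

ll_fitted = -791.390   # maximized log-likelihood, from the text
ll_guess = -2166.71    # log-likelihood under pure guessing (lam1 = 1)

lr_stat = 2 * (ll_fitted - ll_guess)  # likelihood ratio statistic
p_value = chi2.sf(lr_stat, df=3)      # three restricted mixing weights
print(round(lr_stat, 2))  # 2750.64
print(p_value)            # effectively zero: reject pure guessing
```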
These estimation results are revealing. The first mixing weight indicates that the responses to more than 26 percent of the questions on the exam can be characterized as random guessing. The fourth mixing weight is 0.000, indicating that no questions on the exam were answered correctly 100 percent of the time, which is really not a great surprise. The two middle cases of informed guessing differ considerably: more than 12 percent of the questions are answered as if one incorrect answer has been eliminated, while more than 61 percent are answered as if two incorrect answers have been eliminated.
Recall that scoring 100 percent on the exam requires the elimination of (k − 1)n, or, in this case, 300 incorrect answers. Our results indicate that the estimated number of incorrect answers eliminated is:

n(0·λ̂_1 + 1·λ̂_2 + 2·λ̂_3 + 3·λ̂_4) = 134.8. (7)

That is, this results in a metric of 134.8 out of 300, which is obviously less than half of the incorrect answers that would need to be eliminated for a perfect score on the exam. As a result, the exam average is unsurprisingly low. The responses to over 26 percent of the questions were pure guesses and, at best, students were able to whittle the choices down to two for most questions.
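The eliminated-answers metric is simple arithmetic over the mixing weights. A quick Python check with the weights rounded to two decimals (the unrounded estimates in the text yield 134.8):

```python
# Estimated eliminations: n * sum_i lam_i * m_i, where m_i = i - 1
# is the number of wrong answers eliminated in mixture component i.
n = 100
m = [0, 1, 2, 3]
lam = [0.26, 0.12, 0.61, 0.00]  # mixing weights rounded from the text
eliminated = n * sum(l * mi for l, mi in zip(lam, m))
print(round(eliminated, 1))  # 134.0 with rounded weights; the
                             # unrounded estimates give 134.8 of 300
```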
There are a number of future research opportunities related to pedagogical choices that would extend or broaden the approach taken in this study. In terms of economics, prior studies have examined whether or not classroom experiments improve student learning. The seminal study in this genre [18] provides mixed results. Perhaps an extension of [18] and subsequent studies [19,20,21] that focuses on the in-class use of experimental economics to reduce student guessing on exams would shed additional light on the relationship between pedagogical choices and student performance. Next, prior research has focused on both the relationship between instructor attractiveness and pedagogical choices [22] and instructor appearance and academic performance [23,24]. Rejoining this line of research, an examination of the relationships between instructor appearance and academic performance that concentrates on a reduction in guessing by students would perhaps provide a new angle on the study of the beauty premium in academia. There are other avenues for future research to explore efforts to improve student performance on multiple-choice exams. Some of these are related to prior research that is discussed above in this study. For example, a re-examination of the possibility that the chronological ordering of exam questions matters that focuses on student guessing may provide a useful addition to the prior literature on the ordering of test questions [7,8]. Finally, another possibility is additional exploration related to the behavioral economics implications of disfluency in exam preparation [25,26]. In this regard, one might revisit the [11] study of student performance by employing separate exam review handouts formatted in either an easy-to-read or a difficult-to-read font in order to investigate how font disfluency relates to the mixing weights discussed above in this study.
4. Concluding Comments
The use of large lecture halls for instruction in business and economics courses often results in multiple-choice assessments of learning. This study asserts that student performance on these types of assessment instruments can be viewed as the result of a process of eliminating incorrect answers, rather than selecting the correct answer. Viewed in this way, how students respond on a multiple-choice test can be broken down into the fractions of responses where no wrong answers can be eliminated, one wrong answer can be eliminated, two wrong answers can be eliminated, and all wrong answers can be eliminated. The first three of these categories represent some form of guessing by students. The first indicates “random guessing”, while the second and third constitute varying degrees of “informed guessing”.
Using data on student performance on a final exam in principles of microeconomics, we estimate an empirical model consisting of a mixture of binomials in which the probability of a correct choice depends on the number of incorrect choices eliminated. The results indicate that some form of guessing accounts for performance on all exam questions, with performance on about 74 percent of all exam questions depending upon some degree of informed guessing. In all, purely random guessing accounts for student performance on about 26 percent of all exam questions. Given that scoring 100 percent on the exam requires the elimination of 300 incorrect answers under our empirical approach, our results indicate that the estimated number of incorrect answers eliminated is only 134.8, obviously less than half of the needed eliminations of incorrect choices. Thus, encounters with relatively low performance statistics in assessments of learning in principles of economics are not surprising.
Useful and different information about exam performance is available at a glance from the mixing weights in our procedure. Instructors would likely prefer that all students mark the correct answer for every question on an exam. As this is likely not the case, instructors would prefer less guessing and the elimination of more incorrect answers on exams. These behaviors can easily be gleaned from the mixing weights in our model. That is, instructors would prefer to observe the mixing weights increasing, such that λ_1 < λ_2 < λ_3 < λ_4. This pattern indicates a trend away from guessing and in the direction of the elimination of more incorrect answers, thus leading to the provision of more correct answers. On the other hand, mixing weights that are skewed in the other direction, such that λ_1 > λ_2 > λ_3 > λ_4, indicate an abundance of guessing and the elimination of few incorrect answers. Of course, this is not a desirable educational outcome.