In this section, we discuss the standard SYGR method and examine its sensitivity to cohort size. We then introduce the absorbing Markov chain and assess its use for estimating graduation rates of small cohorts.
2.1. Standard Six-Year Graduation Rate
Suppose we are interested in estimating a population’s six-year graduation rate, $\theta$, given some observed data, $D$. Since only two final six-year outcomes are possible, that is, graduating or not graduating (according to Federal guidelines), each student’s outcome can be modeled as a Bernoulli trial. Treating these trials as independent with a common success probability, the number of students who graduate follows a Binomial distribution with parameter $\theta$. Therefore, the probability of $k$ students graduating out of $N$, given $\theta$ (the probability of graduating in six years for each student), is:
$$P(k \mid N, \theta) = \binom{N}{k}\,\theta^{k}(1-\theta)^{N-k}. \tag{2}$$
The standard six-year graduation rate method corresponds to estimating the graduation rate by maximizing the likelihood function in Equation (
2). Based on the maximum likelihood estimation (MLE), the estimated graduation rate follows the frequentist approach [
10], whereby
the sample proportion $\hat{\theta} = k/N$ is an unbiased estimator of the graduation rate.
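As a brief sketch of why the sample proportion is the maximum likelihood estimate and why it is unbiased, write $\theta$ for the per-student graduation probability, $k$ for the number of graduates, and $N$ for the cohort size, and assume independent student outcomes. The log-likelihood $\ell(\theta)$ introduced here is notation added for this illustration only.

```latex
\begin{align*}
\ell(\theta) &= \log\binom{N}{k} + k\log\theta + (N-k)\log(1-\theta),\\
\frac{d\ell}{d\theta} &= \frac{k}{\theta} - \frac{N-k}{1-\theta} = 0
  \;\Longrightarrow\; \hat{\theta} = \frac{k}{N},\\
\mathbb{E}[\hat{\theta}] &= \frac{\mathbb{E}[k]}{N} = \frac{N\theta}{N} = \theta,
\qquad
\operatorname{Var}(\hat{\theta}) = \frac{\theta(1-\theta)}{N}.
\end{align*}
```

The variance term shows directly why small cohorts are noisy: for $\theta = 0.712$ and $N = 50$, the standard deviation $\sqrt{\theta(1-\theta)/N}$ is approximately 6.4%.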
In order to demonstrate the performance of the MLE approach, we use data collected from the University of Central Florida (UCF). Based on historical data, the six-year graduation rate at UCF for first-time-in-college (FTIC) students starting in 2008 is 71.2%, with the remaining 28.8% of students graduating in more than six years or halting enrollment. Taking 71.2% as the true value and using it to parameterize a binomial distribution for the number of students graduating within six years (i.e., $\theta = 0.712$), we randomly simulate 10,000 student cohorts for each of several sample sizes,
N, with the resulting SYGR calculated using Equation (
1). Here, each one of the 10,000 cohort samples represents an incoming Freshman class, or perhaps a specific cohort (e.g., women in STEM fields). The simulated numbers of graduates in the cohorts with different sample sizes, alongside the corresponding average graduation rate and standard deviation across the cohort samples, are summarized in
Table 1. The corresponding probability density function (pdf) of the six-year graduation rate for each set of cohorts with a fixed size, representing an estimate, is shown in
Figure 1. As indicated in
Figure 1 and
Table 1, the distribution of graduation rate estimates for the cohorts with small sample sizes can vary significantly from the asserted true value of 71.20% (see the case for
where
). That said, the sample mean of the graduation rates over the 10,000 simulated cohorts is between 71.19% and 71.21%, which closely matches the asserted true graduation rate of 71.20%, again indicating that the maximum likelihood estimator is an unbiased estimator. Both the figure and table illustrate how the sample standard deviation for cohorts with small sample sizes (
$N = 50$) is significantly larger when compared with larger sample sizes ($N = 5000$
), i.e., 6.4% versus 0.6%. Accordingly, when
and
, the probability that the SYGR is reported to be lower than 66% or higher than 76% is 0.53 and 0.03, respectively (corresponding to a graduation rate reporting error of more than roughly 5 percentage points). These differences can be considered significant in the context of college rankings and of evaluations by government or accreditation boards. While the numerical values above are specific to UCF, similar sensitivity results are expected at other American universities. Accordingly, natural statistical variation arguably has the potential for an outsized impact on the ranking or perception of small departments or colleges when compared with larger ones.
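The simulation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used to produce Table 1: it assumes independent Bernoulli outcomes with $\theta = 0.712$, uses fewer cohorts (2000) and only two cohort sizes to keep the run time short, and the function name `simulate_sygr` and the random seed are arbitrary choices made for this example.

```python
import math
import random
import statistics
from typing import List

THETA = 0.712      # graduation probability taken as the true value in the text
N_COHORTS = 2000   # assumption: fewer than the 10,000 cohorts used in the text


def simulate_sygr(n_students: int, n_cohorts: int, theta: float,
                  rng: random.Random) -> List[float]:
    """Return one simulated six-year graduation rate (k / N) per cohort."""
    rates = []
    for _ in range(n_cohorts):
        # Each student is an independent Bernoulli(theta) trial.
        graduates = sum(rng.random() < theta for _ in range(n_students))
        rates.append(graduates / n_students)
    return rates


rng = random.Random(2008)  # arbitrary seed, for reproducibility only
summary = {}
for n in (50, 500):
    rates = simulate_sygr(n, N_COHORTS, THETA, rng)
    summary[n] = {
        "mean": statistics.mean(rates),
        "std": statistics.stdev(rates),
        # Standard deviation implied by the binomial model.
        "analytic_std": math.sqrt(THETA * (1 - THETA) / n),
    }

for n, stats in summary.items():
    print(n, {name: round(value, 4) for name, value in stats.items()})
```

The sample mean should stay close to 0.712 for both cohort sizes, while the sample standard deviation should track the analytic value $\sqrt{\theta(1-\theta)/N}$, which is about 0.064 for $N = 50$.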
2.2. Absorbing Markov Chain
Different approaches are used to evaluate students’ performance and persistence in higher education systems, among which machine learning algorithms and stochastic models are the most common [11,12,13,14,15,16,17]. Markov models have been used in many educational studies to analyze students’ academic progress and behaviors [3,18,19,20,21,22]. For example, Nicholls [3] analyzed the progress of graduate students through a degree program in Australia. Using Markov modeling techniques, the study assessed students’ performance according to measures like expected time to graduation and graduation rate. The resulting Markov model includes two absorbing states representing withdrawing from the program and graduating. Additional transient states represent the students’ status at the end of each year based on their academic performance. A similar modeling procedure is provided by Bairagi and Kakaty [23], who focused on a university system in northern India. Despite a significantly different academic structure (e.g., number of courses per semester, semester exams for each course), their Markov models were also able to track and model student progression.
For the prior works cited above, the authors made use of a specific class of Markov chains referred to as absorbing Markov chains (AMCs). Absorbing Markov chains have two classes of states: transient states and absorbing states. In the case of applying AMCs to track student progress through a degree program, the total number of states (both absorbing and transient) for an AMC is typically finite. For modeling American four-year universities with an AMC, absorbing states could correspond to graduating or halting. In contrast, transient states could correspond to academic level (e.g., Freshman, Sophomore, Junior, Senior). An example application of an AMC for an American four-year university is provided in
Figure 2. In an AMC, once the system transitions from a transient state to one of the absorbing states, it cannot leave that state. Again, transitioning to an absorbing state corresponds to a student halting their education or graduating; practically speaking, however, a student could always earn another degree or re-enroll years later. In addition to a list of absorbing and transient states, each AMC, like any other Markov model, is defined by a transition matrix $P = [p_{ij}]$, with $p_{ij}$ representing the probability of moving from state $i$ to state $j$ [20,24]. The canonical form of an absorbing Markov chain with $r$ absorbing states and $t$ transient states is based on the matrix $P$ in (3):
$$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix}. \tag{3}$$
In the matrix, $R$ is a $t \times r$ matrix containing the transition probabilities from the transient states to the absorbing states, $Q$ is a $t \times t$ matrix that represents transition probabilities within the transient states, $I$ is an $r \times r$ identity matrix, and $0$ is an $r \times t$ zero matrix that allows for the AMC to model the trapped dynamics when entering an absorbing state [25,26]. The matrix
$P$ is used as part of the dynamic equation $x_{y+1} = x_y P$, where $x_y$ is a row vector containing the probability distribution over the states at time-step $y$. In our case, the dynamic equation probabilistically describes how a student advances through their academic career. While matrix $P$ provides the one-step transition probabilities between states, the matrix power $P^n$ represents the $n$-step probabilities of transitions between states. In other words, $(P^n)_{ij}$ is the probability that a system that is initially in state $i$ will be in state $j$ after exactly $n$ steps [27,28,29]. Each absorbing Markov chain has two useful calculated properties: the expected time until absorption ($U$) and the probabilities of absorption ($B$) into the absorbing states [30]. These characteristics are computed with Equations (4) and (5). In our case, their values correspond to the expected time until a student graduates or halts and to the probability of each of these two outcomes.
$$U = (I_t - Q)^{-1}\mathbf{1}, \tag{4}$$
$$B = (I_t - Q)^{-1}R, \tag{5}$$
where $I_t$ is the $t \times t$ identity matrix, $\mathbf{1}$ is a column vector of $t$ ones, and $(I_t - Q)^{-1}$ is the fundamental matrix of the chain.
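The two absorption properties can be illustrated with a small, self-contained Python sketch based on the standard fundamental-matrix results for absorbing chains: $B = (I - Q)^{-1}R$ and $U = (I - Q)^{-1}\mathbf{1}$. The transition probabilities below are not the UCF values from Figure 2: only the sophomore row reuses numbers quoted in the text (10%, 75%, 8%, 7%), and every other entry is a hypothetical placeholder. The helper `solve` is likewise written for this example only.

```python
from typing import List

Matrix = List[List[float]]

# Transient states index the rows of Q and R; absorbing states the columns of R.
TRANSIENT = ["start", "freshman", "sophomore", "junior", "senior"]
ABSORBING = ["graduate", "halt"]

# Q: transient -> transient. Only the sophomore row uses values quoted in the
# text; every other row is a made-up placeholder.
Q: Matrix = [
    [0.00, 0.80, 0.15, 0.05, 0.00],  # start (hypothetical)
    [0.00, 0.15, 0.70, 0.05, 0.00],  # freshman (hypothetical)
    [0.00, 0.00, 0.10, 0.75, 0.08],  # sophomore (from the text)
    [0.00, 0.00, 0.00, 0.10, 0.83],  # junior (hypothetical)
    [0.00, 0.00, 0.00, 0.00, 0.35],  # senior (hypothetical)
]
# R: transient -> absorbing, columns ordered as (graduate, halt).
R: Matrix = [
    [0.00, 0.00],  # start (hypothetical)
    [0.00, 0.10],  # freshman (hypothetical)
    [0.00, 0.07],  # sophomore (from the text)
    [0.02, 0.05],  # junior (hypothetical)
    [0.60, 0.05],  # senior (hypothetical)
]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for row in range(n):
            factor = aug[row][col]
            if row != col and factor != 0.0:
                aug[row] = [vr - factor * vc
                            for vr, vc in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


t = len(Q)
I_minus_Q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(t)]
             for i in range(t)]

# Absorption probabilities (t x r) and expected number of steps to absorption.
B = solve(I_minus_Q, R)
U = [row[0] for row in solve(I_minus_Q, [[1.0] for _ in range(t)])]

for name, probs, steps in zip(TRANSIENT, B, U):
    print(f"{name:10s} P(graduate)={probs[0]:.3f} "
          f"P(halt)={probs[1]:.3f} E[steps]={steps:.2f}")
```

Solving the linear systems directly, rather than forming the inverse explicitly, is a design choice of this sketch. Each row of `B` sums to one because every student is eventually absorbed.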
In order to demonstrate how absorbing Markov chains are used to estimate graduation rates, we consider students’ academic levels (Freshman, Sophomore, Junior, Senior) as the transient states and students’ final educational outcomes (graduate or halt) as the absorbing states. All students start from a dummy state (the start state); then, based on their incoming academic credits (e.g., Advanced Placement credits), they are assigned to one of the other states. After this initial assignment, students advance through the transient states based on their accumulation and successful completion of academic credits. Ultimately, students are absorbed into one of the absorbing states. For our purposes, when processing historical data, we declare students to have halted their education if they do not enroll for three consecutive semesters. The possible transitions and transition probabilities for students who started their education in Fall 2008 at UCF are shown in Figure 2. As is typically done, the transition probabilities between states are extracted from historical data using the maximum likelihood estimator, resulting in an unbiased estimation of the individual transition probabilities. Each state in the AMC corresponds to the student’s academic standing at the end of each academic year. For example, at the end of one year, 10% of sophomore students remain at the sophomore level, 75% and 8% advance to junior and senior academic standing, respectively, and the remaining 7% halt their education. In order to find the fraction of students who graduated within six years, we need to calculate $P^{7}$ (the additional step accounts for students beginning in the start state) and observe the entry that contains the transition probability from the start state to the graduate state. Assuming that the AMC and its transition probabilities (illustrated in Figure 2) are an accurate representation, the $n$-step transition matrix is given by the matrix in Equation (6) below, with the bolded value ($0.686$) corresponding to the estimated graduation rate.
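Reading a graduation rate off an $n$-step transition matrix can be sketched as follows. The matrix `P` below is a hypothetical stand-in for Figure 2 (only the sophomore row reuses values quoted in the text), and the choice of seven steps assumes six academic years plus one step for the initial assignment out of the start state. The row-vector convention $x_{y+1} = x_y P$ used for the cross-check is also an assumption of this sketch.

```python
from typing import List

Matrix = List[List[float]]

STATES = ["start", "freshman", "sophomore", "junior", "senior",
          "graduate", "halt"]

# Full transition matrix in canonical order (transient states first).
P: Matrix = [
    [0.00, 0.80, 0.15, 0.05, 0.00, 0.00, 0.00],  # start (hypothetical)
    [0.00, 0.15, 0.70, 0.05, 0.00, 0.00, 0.10],  # freshman (hypothetical)
    [0.00, 0.00, 0.10, 0.75, 0.08, 0.00, 0.07],  # sophomore (from the text)
    [0.00, 0.00, 0.00, 0.10, 0.83, 0.02, 0.05],  # junior (hypothetical)
    [0.00, 0.00, 0.00, 0.00, 0.35, 0.60, 0.05],  # senior (hypothetical)
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00],  # graduate (absorbing)
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00],  # halt (absorbing)
]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matrix_power(p: Matrix, n: int) -> Matrix:
    """Return p raised to the n-th power (n >= 0) by repeated multiplication."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = matmul(result, p)
    return result


N_STEPS = 7  # assumption: six academic years plus the initial assignment step
P_n = matrix_power(P, N_STEPS)
grad_rate = P_n[STATES.index("start")][STATES.index("graduate")]

# Cross-check with the dynamic equation x_{y+1} = x_y P, starting with all
# probability mass on the start state.
x = [1.0] + [0.0] * (len(STATES) - 1)
for _ in range(N_STEPS):
    x = [sum(x[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

print(f"P(start -> graduate within {N_STEPS} steps) = {grad_rate:.3f}")
```

Both routes give the same number: the start-to-graduate entry of the matrix power equals the graduate component of the state distribution after the same number of steps.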
In order to evaluate the performance of the AMC method (i.e., using $P^{7}$) to estimate graduation rates, 10,000 cohorts for each sample size ($N$ = 50, 500, 5000, 10,000) are generated based on the UCF transition matrix parameters depicted in Figure 2; again, this assumes that the values of the transition matrix are the true values. The academic trajectory of each student is sampled directly from the Markov model. Examples of sampled student trajectories are provided in Table 2. For each of the 10,000 sets of $N$ generated student trajectories, a unique transition matrix, $\hat{P}$, is estimated from the sampled data. So, for $N = 50$, there are 10,000 induced transition matrices $\hat{P}$ whereby the transition probabilities are calculated based on the academic trajectories of only 50 students. The estimated graduation rate for a sample of 50 students is then extracted from the estimated $\hat{P}^{7}$; this process is repeated 10,000 times, with the results summarized in Table 3.
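The resampling procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind Table 3: the matrix `P_TRUE` is hypothetical apart from its sophomore row, seven steps are assumed (six years plus the initial assignment), only 200 cohorts of 50 students are drawn, and states that a cohort never visits are given a self-loop so that the estimated matrix stays row-stochastic.

```python
import random
import statistics
from typing import List

Matrix = List[List[float]]

START, GRADUATE = 0, 5   # indices of the start and graduate states
N_STEPS = 7              # assumption: six years plus the initial assignment

# Hypothetical "true" chain; only the sophomore row comes from the text.
P_TRUE: Matrix = [
    [0.00, 0.80, 0.15, 0.05, 0.00, 0.00, 0.00],  # start
    [0.00, 0.15, 0.70, 0.05, 0.00, 0.00, 0.10],  # freshman
    [0.00, 0.00, 0.10, 0.75, 0.08, 0.00, 0.07],  # sophomore (from the text)
    [0.00, 0.00, 0.00, 0.10, 0.83, 0.02, 0.05],  # junior
    [0.00, 0.00, 0.00, 0.00, 0.35, 0.60, 0.05],  # senior
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00],  # graduate (absorbing)
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00],  # halt (absorbing)
]


def matrix_power(p: Matrix, n: int) -> Matrix:
    """Return p raised to the n-th power by repeated multiplication."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = [[sum(result[i][k] * p[k][j] for k in range(size))
                   for j in range(size)] for i in range(size)]
    return result


def sample_trajectory(p: Matrix, n_steps: int, rng: random.Random) -> List[int]:
    """Sample one student's sequence of states, beginning in the start state."""
    state = START
    path = [state]
    for _ in range(n_steps):
        state = rng.choices(range(len(p)), weights=p[state])[0]
        path.append(state)
    return path


def estimate_transition_matrix(paths: List[List[int]], n_states: int) -> Matrix:
    """Maximum likelihood estimate: normalized transition counts per row."""
    counts = [[0] * n_states for _ in range(n_states)]
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    p_hat = []
    for s, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a state never visited in this cohort keeps a self-loop.
            p_hat.append([1.0 if j == s else 0.0 for j in range(n_states)])
        else:
            p_hat.append([c / total for c in row])
    return p_hat


def estimate_graduation_rate(n_students: int, rng: random.Random) -> float:
    """Simulate one cohort, re-estimate the chain, and read off the rate."""
    cohort = [sample_trajectory(P_TRUE, N_STEPS, rng) for _ in range(n_students)]
    p_hat = estimate_transition_matrix(cohort, len(P_TRUE))
    return matrix_power(p_hat, N_STEPS)[START][GRADUATE]


rng = random.Random(2008)  # arbitrary seed, for reproducibility only
true_rate = matrix_power(P_TRUE, N_STEPS)[START][GRADUATE]
estimates = [estimate_graduation_rate(50, rng) for _ in range(200)]

print(f"rate implied by P_TRUE: {true_rate:.3f}")
print(f"mean of estimates:      {statistics.mean(estimates):.3f}")
print(f"std of estimates:       {statistics.stdev(estimates):.3f}")
```

Because the trajectories here are drawn from the chain itself, this sketch illustrates sampling variability only; it does not reproduce the gap between the AMC estimate and the observed six-year graduation rate.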
The sampled probability distribution functions (pdfs) of the estimated graduation rates when using the AMC method are shown in
Figure 3 for a variety of cohorts with different sample sizes. As the figure illustrates and
Table 3 reports, the sample standard deviation of the estimated graduation rate for cohorts with small sample sizes is high (e.g., for
, the sample standard deviation is
). In addition to its large sample standard deviation, the AMC estimate is biased relative to the true graduation rate, as established by the actual six-year graduation rate. The estimated graduation rate based on the AMC method is 68.6% (the bold number in Equation (6)), while the true six-year graduation rate for the cohort from which the transition probabilities were derived is 71.20%; the same bias is present for all sample sizes. In other words, the AMC model underestimates the six-year graduation rate.
Figure 4 compares the 5–95% inter-percentile interval as a measure of performance (in terms of estimation variation and bias) for both the absorbing Markov chain and six-year graduation rate methods. Based on the results, we see that the sample standard deviation of the estimated graduation rates in both the SYGR and AMC methods is higher for cohorts with small sample sizes. Furthermore, the AMC has a bias of 2.6 percentage points (71.2% − 68.6%) in estimating the true graduation rate, unlike the SYGR method, which does not have an estimation bias.
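The two summary measures used in this comparison, a 5–95% interval and the bias of the mean estimate, could be computed from a list of simulated estimates along the following lines. The helper names `interval_5_95` and `bias` and the demonstration data are hypothetical, and the interpolation rule is simply the default of Python's `statistics.quantiles`, which may differ from the rule used for Figure 4.

```python
import statistics
from typing import List, Tuple


def interval_5_95(estimates: List[float]) -> Tuple[float, float]:
    """Return the 5th and 95th percentiles of a list of estimates."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(estimates, n=20)
    return cuts[0], cuts[-1]


def bias(estimates: List[float], true_value: float) -> float:
    """Mean of the estimates minus the value taken as true."""
    return statistics.mean(estimates) - true_value


# Hypothetical estimates (0.01, 0.02, ..., 1.00), purely to exercise the helpers.
demo = [i / 100 for i in range(1, 101)]
low, high = interval_5_95(demo)
print(f"5-95% interval: [{low:.3f}, {high:.3f}], "
      f"bias vs 0.505: {bias(demo, 0.505):.3f}")
```

A narrow interval centered on the true value indicates an estimator with both low variation and low bias, which is the property the comparison above is assessing.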
To overcome the challenges presented above (i.e., bias and large sample standard deviation for small sample sizes), we propose a regularly updating multi-level absorbing Markov chain (RUML-AMC) method. We then provide a sensitivity analysis to demonstrate the benefit of this method in improving the accuracy of graduation rate estimates. The details of the proposed methodology are explained in the next section.