Analysis of Student Progression Through Curricular Networks: A Case Study in an Illinois Public Institution

Bonan Yang; Mahdi Gharebhaygloo; Hannah Rachel Rondi; Syeda Zunehra Banu; Xiaolan Huang; Gunes Ercal

doi:10.3390/electronics14153016

¹ Computer Science Department, Southern Illinois University Edwardsville, 1 Hairpin Dr, Edwardsville, IL 62026, USA

² Mathematics and Statistics Department, Missouri University of Science and Technology, 400 W 12th St, Rolla, MO 65409, USA

³ Computer Science Department, Southern Illinois University, 1230 Lincoln Dr, Carbondale, IL 62901, USA

* Author to whom correspondence should be addressed.

Electronics 2025, 14(15), 3016; https://doi.org/10.3390/electronics14153016

This article belongs to the Special Issue Data Retrieval and Data Mining

Abstract

Improving curriculum structure is critical for enhancing student success and on-time graduation, yet few methods exist to evaluate how prerequisite paths shape student progression and graduation outcomes. This study proposes a data-driven, graph-based framework that integrates course prerequisite networks with student performance data to systematically analyze curricular structure and student outcomes. We identify high-risk courses by jointly modeling their structural importance and pass rates, and quantify the time and survivability of different prerequisite paths using probabilistic models. Additionally, we introduce grade transition patterns to capture more nuanced shifts in student performance and pinpoint bottlenecks along prerequisite paths. Applying the framework to four science and engineering majors at a public institution, we not only identify high-risk courses often missed in conventional analyses but also reveal path-level disparities and structural bottlenecks that affect student progression and time to graduation. For example, in the Computer Science major, we found that the architecture and operating systems pathway is more challenging than the software engineering pathway. A closer examination of the course pairs along this trajectory revealed that the difficulty stems from a significant drop in student performance across prerequisite–successor course pairs. This type of analysis fills a gap in conventional curriculum studies, which often overlook path-level dynamics, and offers actionable insights for educators to identify high-risk curricular components.

Keywords:

course prerequisite network; data-driven modeling; graph theory; probabilistic modeling; higher education

1. Introduction

College is often seen as a formative period that shapes students’ knowledge and skills. During this time, they complete a series of courses, gradually build their own knowledge base, and ultimately progress toward graduation. To support this journey, modern institutions typically design well-structured prerequisite systems to help students clarify their learning paths.

Despite the provided prerequisite structure, students often still encounter confusion when selecting courses. They may struggle to assess the importance of specific courses for future learning or feel uncertain when presented with multiple progression routes. Choosing an inefficient or misaligned path can delay graduation, which is costly both academically and financially. For instance, the average annual tuition at public universities in the United States exceeds USD 11,600 [], making timely graduation a critical concern for students and families. From an institutional perspective, student course-taking behavior offers valuable insights into the effectiveness of curriculum design. Are students following the intended learning paths? Which courses act as critical hubs or bottlenecks in a degree program? Do some pathways lead to more efficient graduation timelines than others? Addressing these issues not only supports students in making informed decisions but also helps universities improve curricular coherence and optimize educational outcomes.

Graph modeling is a direct and natural approach to the systematic study of curriculum progression and prerequisite structures []. A graph is a mathematical structure used to model pairwise relationships between objects. Formally, a graph consists of a set of nodes, which represent entities, and a set of edges, which represent relationships or connections between these entities [,]. In the context of prerequisite analysis, courses are modeled as nodes and prerequisites are modeled as directed edges, forming the course prerequisite network (CPN). Prior studies [] commonly rely on centrality metrics, path lengths, or simulated flows to identify influential courses. However, these approaches often neglect real student outcomes such as grades and completion timelines.

Most prior studies focus on individual courses or the overall curriculum structure, overlooking the path-level progression of students. Few have modeled how students traverse prerequisite sequences, their difficulty, and associated success rates. To fill this gap, we propose a data-driven framework that evaluates each prerequisite path from multiple perspectives. Specifically, we introduce three probabilistic models to estimate the expected number of semesters required to complete a path and the overall success probability.

Another contribution of this study is a joint analysis of course centrality and pass rate, which helps identify structurally important courses that are difficult to pass in practice. Such courses often serve as critical bottlenecks in student progression and may warrant targeted academic support or curricular redesign.

Our final novelty is our use of grade transitions between connected course pairs to evaluate performance continuity. This allows us to quantify how well students’ achievement in a prerequisite translates into success in its successor, offering insights into the alignment between course content and instruction.

2. Literature Review

Course Prerequisite Network (CPN) modeling has been widely applied to analyze curriculum structures across diverse institutional contexts. For instance, ref. [] examined CPNs at a leading STEM (science, technology, engineering, and mathematics)-oriented institution, demonstrating their potential to clarify academic pathways and support curriculum planning. Others have viewed curricula as complex systems and employed CPN-based analyses to uncover systemic patterns within specific majors [,,]. More recently, ref. [] conducted a comparative study of CPNs from five public universities, highlighting structural variations that reflect each institution’s disciplinary emphasis, such as engineering versus biomedical sciences. Ref. [] extended the traditional CPN framework by incorporating semantic information like learning outcomes and assessment linkages, enabling more comprehensive visualizations and analyses.

A substantial body of work focuses on identifying structurally important courses using graph-based metrics. Betweenness and degree centrality have been commonly used to locate pivotal nodes, while the ‘Reach’ metric proposed by [,] quantifies the downstream influence of a course. Ref. [] introduced a ‘course cruciality’ metric to assess how delays or failures in specific courses may affect graduation timelines. While these methods yield valuable structural insights, they are typically isolated from actual student performance and rely solely on static network topology.

To address this, a growing number of studies have started to consider student trajectories and behavior. Ref. [] noted that path length in a CPN provides a theoretical lower bound on time to completion, assuming one course per term. Building on this idea, refs. [,] proposed the Longest Path Induced Graph (LPIG) to identify prerequisite chains that may pose completion risks. Other works have applied frequent subgraph mining to uncover recurring course taking patterns [], developed simulation-based frameworks to model student flow [], and visualized curriculum navigation across majors and levels []. Several studies have also begun incorporating performance-based metrics, such as ‘curricular efficiency’ [] and grade-informed predictive modeling [].

While these efforts represent a shift toward more realistic modeling, many still rely on either simulated data or aggregate statistics. Few analyses have examined prerequisite paths using empirical data. To bridge this gap, our study adopts a path-level modeling perspective grounded in student transcript data. By integrating probabilistic modeling, structural importance measures, and performance continuity analysis, we aim to provide a more comprehensive understanding of curricular effectiveness and progression risk.

It is worth noting that recent advances in machine learning, though beyond the scope of this study, offer valuable directions for future work. These studies particularly focused on integrating student behavioral data with curriculum models. For instance, [] developed a graph neural network that jointly models student interaction behaviors and static attributes for academic performance prediction. The authors in [] employed a relational Graph Convolutional Network to capture student–course dependencies in heterogeneous graphs, enabling fine-grained grade prediction. Beyond these graph neural network approaches, another emerging strand of research leverages large language models (LLMs) to advance educational analytics [,,]. The rich content embedded in course materials can be used to construct knowledge graphs that reveal deeper conceptual and thematic connections among courses. By extracting these semantic relationships, researchers can foster content-driven approaches to inform and enhance curriculum design. Although these approaches are not directly incorporated into our work, they highlight promising future directions for integrating student behavioral patterns with curriculum structures in modeling. In addition, issues of data decentralization and privacy preserving computation are becoming increasingly important in educational analytics, especially when cross-institutional data is involved. Techniques such as federated learning and collaborative adaptation have shown promise in related domains. For example, refs. [,] provided meaningful prospects.

Finally, we want to clarify that all existing CPN-related analyses, including our current work, rest on the assumption that the universities under study provide a prerequisite course structure. Our work does not apply to prerequisite-free subjects. Such work warrants an independent branch of curriculum research; ref. [] can be seen as an important recent work on prerequisite-free curricula.

3. Methods

In this work, we introduce a graph-theoretical approach combined with statistical analysis to examine the curriculum system, beginning with the formation of the curriculum network. The graph construction pipeline is illustrated in Figure 1. The data sources include the online course catalog and student transcript records. Specifically, we parsed prerequisite relationships from course descriptions and used them to construct the CPN, where each node represents a course and directed edges indicate prerequisite links.

Figure 1. Graph construction pipeline.

In parallel, we preprocess all students’ registration records and segment them into individual academic trajectories grouped by major. These trajectories are used to construct Student Course Path Graphs (SCPGs), each of which reflects a student’s actual course-taking pathway along with the corresponding grades. These graphs enable us to observe how students navigate the curriculum in practice.

Next, we enrich the CPN by annotating its edges with student-level data derived from the SCPGs, including enrollment frequencies and performance metrics. Based on this enriched network, we derive two specialized graph variants for each major, namely MIG_R, which captures the major-specific structural backbone of the curriculum, and MIG_C, which traces converging prerequisite pathways leading toward the capstone course, revealing the core knowledge flow necessary for program completion.

3.1. Network Formation

3.1.1. Course Prerequisite Network (CPN)

As mentioned above, all the analyses in this study are based on the Course Prerequisite Network (CPN). The CPN is a directed graph where each node corresponds to a course, and a directed edge from node X to node Y, denoted $(X, Y)$, indicates that course X is a prerequisite of course Y. The CPN is constructed by parsing the university’s official course catalog, which explicitly lists prerequisite relationships between courses.

Since students must complete the prerequisites before enrolling in the successor courses, the CPN is inherently encoded as a directed acyclic graph (DAG). There are no cycles, and one cannot return to a course once it has been passed in the prerequisite chain. Such a directed acyclic structure naturally reflects the hierarchical organization of a curriculum, ensuring that courses progress from foundational to advanced levels. It also preserves the temporal order of course taking, as no course can appear earlier than its prerequisites in any valid path.

While schools aim to offer students flexibility in course selection, the course dependencies often involve complexities such as AND/OR logic, co-requisites, and cross-listed or equivalent courses. Following our prior work [], we simplified the logic by treating all listed prerequisites as mandatory. For instance, when a course lists “A or B” as prerequisites, both A and B are included in the CPN as prerequisite nodes. This transformation ensures that all potentially valid prerequisite paths are preserved and that the graph remains acyclic.

For co-requisites, we consider two common cases. Firstly, if a course X is listed as a “prerequisite or co-requisite” for course Y but not vice versa, it typically means that students may take X before or alongside Y. In such cases, we model the dependency as a one-way edge from X to Y. Secondly, if a pair of courses are explicitly listed as mutual co-requisites, indicating that they must be taken simultaneously, we do not add any edge between them in the CPN since neither precedes the other.

It is worth noting that, in this work, each node in the CPN represents an individual course, and each directed edge encodes a prerequisite relationship between two courses. Beyond structural dependencies, we enrich the edge-level information with student performance data extracted from the Student Course Path Graphs (SCPGs). Details are provided in Section 3.1.4.
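The construction rules above can be illustrated with a minimal sketch. It assumes the catalog has already been parsed into a mapping from each course to its listed prerequisites, with AND/OR logic flattened; the mapping, the `LEC_X`/`LAB_X` course names, the specific prerequisite links, and the `build_cpn` helper are all hypothetical and not part of the study's code.

```python
import networkx as nx

# Hypothetical parsed catalog: course -> listed prerequisites.
# AND/OR logic is already flattened, so "A or B" contributes both A and B.
prereqs = {
    "MATH150": ["MATH125"],
    "CS150": ["MATH125"],
    "CS234": ["CS150", "MATH150"],
    "LEC_X": ["LAB_X"],  # LEC_X and LAB_X list each other as co-requisites
    "LAB_X": ["LEC_X"],
}
mutual_coreqs = {frozenset({"LEC_X", "LAB_X"})}


def build_cpn(prereqs, mutual_coreqs=frozenset()):
    """Build the CPN as a DiGraph with one edge per (prerequisite, course) pair."""
    cpn = nx.DiGraph()
    for course, listed in prereqs.items():
        cpn.add_node(course)
        for pre in listed:
            # Mutual co-requisites get no edge: neither course precedes the other.
            if frozenset({pre, course}) in mutual_coreqs:
                cpn.add_node(pre)
                continue
            cpn.add_edge(pre, course)
    if not nx.is_directed_acyclic_graph(cpn):
        raise ValueError("prerequisite data contains a cycle")
    return cpn


cpn = build_cpn(prereqs, mutual_coreqs)
```

In this sketch, dropping the mutual co-requisite pair is what keeps the toy graph acyclic; a one-way "prerequisite or co-requisite" would simply be entered as an ordinary edge.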

3.1.2. Major Induced Graph (MIG)

To extract the portion of the curriculum relevant to each major, we isolate the subgraphs of the CPN, termed the Major Induced Graph (MIG). The concept of the MIG extends our previous work on the Longest Path Induced Graph (LPIG) []. In the original formulation, $LPIG_{t}$ was defined as the induced subgraph of a DAG that contains all nodes lying on paths of length greater than t. In terms of CPN analysis, the threshold t can be interpreted as a structural depth criterion, such that the resulting LPIG captures course sequences that are sufficiently long to reflect meaningful curricular constraints.

In contrast to our prior work, where LPIG was defined based on a path length threshold, in the current work, we adopt MIG to extract the major-relevant subgraphs of the CPN. Specifically, we aim to capture those courses that are connected to a given major through prerequisite paths. The key motivation for using MIG is that a major’s curriculum cannot be captured simply by filtering courses based on their major prefix. For example, selecting only courses labeled as “ECE” would omit the foundational “MATH” or “PHYS” courses, which are commonly required along its pathways. In this work, we introduce two variants of MIG for major-level analysis.

MIG_R (Related): This variant includes all courses that appear on any prerequisite path that leads to a course with the target major’s prefix (e.g., “ECE”). Formally, for a given major M, a course X is included in MIG_R if there exists a path in the CPN such that $X ⇝ Y$ , where Y is labeled with prefix M. This subgraph thus captures not only the major’s own courses but also foundational or supporting courses (e.g., MATH, PHYS) that serve as prerequisites along those paths. In summary, MIG_R encompasses all courses that are associated with major M through any prerequisite chains.
MIG_C (Capstone): This variant restricts to prerequisite paths that culminate in the capstone course (e.g., Senior Design or academic internship) of the major. A course X is included in MIG_C if it lies on any path that ends at the capstone course $C \in C_{M}$ for major M. This variant focuses on the structural backbone that supports capstone completion, offering a more stringent subgraph for dependency analysis. In summary, MIG_C is a subgraph of MIG_R that includes only those courses lying on prerequisite paths leading to the designated capstone courses of major M.
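Both variants can be sketched as ancestor queries on the CPN. The following is a minimal illustration rather than the study's implementation: the helper names `mig_r` and `mig_c`, the toy course codes, and the toy edges are hypothetical, and matching a major by string prefix is an assumption about how course codes are labeled.

```python
import networkx as nx


def mig_r(cpn, prefix):
    """Induced subgraph on courses with the major prefix plus all their ancestors."""
    targets = [c for c in cpn.nodes if c.startswith(prefix)]
    keep = set(targets)
    for y in targets:
        keep |= nx.ancestors(cpn, y)  # every X with a directed path X ~> y
    return cpn.subgraph(keep).copy()


def mig_c(cpn, capstones):
    """Induced subgraph on the capstone course(s) plus all their ancestors."""
    keep = set(capstones)
    for c in capstones:
        keep |= nx.ancestors(cpn, c)
    return cpn.subgraph(keep).copy()


# Toy CPN with hypothetical course codes and edges.
cpn = nx.DiGraph([
    ("MATH150", "ECE210"), ("ECE210", "ECE404"), ("ECE210", "ECE330"),
    ("PHYS151", "ECE330"), ("BIOL150", "BIOL220"),
])
ece_r = mig_r(cpn, "ECE")
ece_c = mig_c(cpn, ["ECE404"])
```

Here `ece_r` keeps the MATH and PHYS courses that feed the ECE courses, while `ece_c` keeps only the chain that ends at the hypothetical capstone `ECE404`.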

3.1.3. Student Course Path Graphs (SCPGs)

In addition to the CPN and its major specific subgraphs, we also constructed the individualized Student Course Path Graphs (SCPGs) for each student based on their academic records. Each SCPG is a DAG, depicting the sequences of courses taken by an individual student throughout their university career.

The node set of an SCPG consists of all courses attempted by the student, and the node attributes include (1) First attempted semester, (2) Finished semester, (3) Retaken count, and (4) Grade. The Retaken count is initialized as zero. If a student passes the course on the first attempt, the First attempted semester and Finished semester are identical. Otherwise, the Finished semester records the term in which the course was successfully completed, while the Retaken count reflects the number of failed attempts prior to passing. An edge $(X, Y)$ between two nodes is added if (1) the CPN contains this edge, (2) the Finished semester of course X is earlier than the First attempted semester of course Y, and (3) the Grade of course X is one of $\{A, B, C\}$, indicating a passing grade. These conditions ensure that the edges reflect valid prerequisite relationships that were respected in the student’s actual course-taking sequence. If a student attempts a course multiple times but never achieves a passing grade (i.e., earns only grades such as D or F), the node is still included in the SCPG, and the Finished semester is set to the last attempted semester. Since the course was never passed, this node is not eligible to serve as a valid prerequisite for any course and therefore has no outgoing edges.

Each SCPG reflects one student’s course taking trajectory and is used to enrich the CPN with statistical attributes. Specifically, they are used to compute edge-level metrics in CPN, such as conditional pass rates and grade correlations between course pairs.
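A minimal sketch of the SCPG construction for a single student is given below. It assumes records arrive as `(course, term, grade)` tuples with integer term indices; the `build_scpg` helper, the attribute names, and the toy prerequisite edges are hypothetical, and ignoring attempts made after a passing grade is an assumption of this sketch.

```python
import networkx as nx

PASSING = {"A", "B", "C"}


def build_scpg(records, cpn):
    """records: iterable of (course, term, grade) for one student; term is an int."""
    scpg = nx.DiGraph()
    for course, term, grade in sorted(records, key=lambda r: r[1]):
        if course not in scpg:
            scpg.add_node(course, first=term, finished=term, retaken=0, grade=grade)
        elif scpg.nodes[course]["grade"] not in PASSING:
            # A further attempt after a failure: update the completion fields.
            attrs = scpg.nodes[course]
            attrs["retaken"] += 1
            attrs["finished"] = term
            attrs["grade"] = grade
        # Attempts after a passing grade are ignored (an assumption of this sketch).
    for x, y in cpn.edges:
        if x in scpg and y in scpg:
            passed_x = scpg.nodes[x]["grade"] in PASSING
            in_order = scpg.nodes[x]["finished"] < scpg.nodes[y]["first"]
            if passed_x and in_order:
                scpg.add_edge(x, y)
    return scpg


# Toy CPN and transcript; the prerequisite edges here are hypothetical.
cpn = nx.DiGraph([("MATH150", "MATH152"), ("MATH150", "CS234")])
records = [
    ("MATH150", 1, "F"), ("MATH150", 2, "B"),
    ("MATH152", 3, "A"), ("CS234", 2, "C"),
]
scpg = build_scpg(records, cpn)
```

In the toy transcript, the edge to `CS234` is not added because that course was first attempted in the same term in which `MATH150` was finished, so condition (2) fails.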

3.1.4. Graph Postprocessing

For each directed edge $(X, Y)$ in the CPN, we associate a set of edge attributes based on actual student course-taking behavior, extracted from the SCPGs. These attributes include the following:

d: Total number of students who attempted course Y after completing course X.
n_f: Number of students who passed course X, then passed course Y on first attempt.
n_a: The number of students who passed course X, then passed course Y regardless of how many attempts were needed.
Repeat: Total number of repeats among students between course X and Y.
Rep_stu: Total number of students who retook course Y after completion of course X.
Matrix_first: A $3 \times 4$ matrix recording the number of students with each grade combination for courses X and Y, restricted to students who passed X and attempted Y exactly once. The rows correspond to grades in X, $\{A, B, C\}$, and the columns correspond to grades in Y, $\{A, B, C, F\}$.
Matrix_all: This matrix shares the same structure and interpretation as Matrix_first, with rows corresponding to grades in X, $\{A, B, C\}$, and columns to grades in Y, $\{A, B, C, F\}$. The only difference is that Matrix_all includes all students who passed X and eventually completed Y, regardless of whether they repeated the course. It therefore reflects the overall transition pattern without excluding students who retook Y.

These statistics provide a foundation for computing conditional pass rates, modeling grade dependencies, and estimating the expected time needed to traverse specific paths in the curriculum. At this stage, the graph construction part is complete.
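As an illustration of how these attributes might be tallied for one edge $(X, Y)$, consider the sketch below. The input layout, the `edge_attributes` helper, and the use of nested dictionaries in place of a $3 \times 4$ matrix are hypothetical. Two further assumptions are labeled in the code: Matrix_all is filled with each student's final grade in Y, and the conditional pass rates are estimated as n_f / d and n_a / d.

```python
PASSING = ("A", "B", "C")
Y_GRADES = ("A", "B", "C", "F")


def edge_attributes(students):
    """students: list of (grade_x, y_attempts) for students who passed X and
    then attempted Y; y_attempts lists their grades in Y in attempt order."""
    def empty():
        return {gx: {gy: 0 for gy in Y_GRADES} for gx in PASSING}

    attrs = {"d": len(students), "n_f": 0, "n_a": 0, "Repeat": 0, "Rep_stu": 0,
             "Matrix_first": empty(), "Matrix_all": empty()}
    for grade_x, y_attempts in students:
        repeats = len(y_attempts) - 1
        attrs["n_f"] += y_attempts[0] in PASSING
        attrs["n_a"] += any(g in PASSING for g in y_attempts)
        attrs["Repeat"] += repeats
        attrs["Rep_stu"] += repeats > 0
        if repeats == 0:
            attrs["Matrix_first"][grade_x][y_attempts[0]] += 1
        # Assumption: Matrix_all records each student's final grade in Y.
        attrs["Matrix_all"][grade_x][y_attempts[-1]] += 1
    return attrs


# Hypothetical students on one edge (X, Y).
students = [("A", ["A"]), ("A", ["F", "B"]), ("B", ["C"]),
            ("C", ["F"]), ("C", ["F", "F"])]
attrs = edge_attributes(students)
p_first = attrs["n_f"] / attrs["d"]  # assumed first-attempt conditional pass rate
p_all = attrs["n_a"] / attrs["d"]    # assumed eventual conditional pass rate
```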

3.2. Analysis

3.2.1. Centrality Analysis

To characterize the structural properties of the curriculum, we computed a series of graph-based centrality measures on the CPN. In particular, node centralities identify courses that play structurally important roles within the prerequisite network. We first considered outdegree centrality, which counts the number of immediate successors for each course, reflecting its direct influence on access to the subsequent courses.

Given a directed graph $G = (V, E)$ and a node $i \in V$, we use $k_{out}(i)$ to denote the outdegree of node i. In terms of the adjacency matrix A, $k_{out}(i)$ is given by Equation (1), where $n = |V|$.

$k_{out}(i) = \sum_{j = 1}^{n} A_{ij}$

(1)

We also computed the betweenness centrality, which identifies courses that frequently lie on the shortest paths between other courses and serve as bottlenecks in the curriculum. We use $β(i)$ to denote the betweenness centrality of node i, defined as in Equation (2). Here, $σ(s, t)$ denotes the total number of shortest paths from node s to node t, and $σ(s, t \mid i)$ denotes the number of those paths that pass through node i.

$β(i) = \sum_{s \neq i, t \neq i} \frac{σ(s, t \mid i)}{σ(s, t)}$

(2)

In addition, we introduced Reach as a centrality measure tailored for the curricular context. For each course, Reach is defined as the number of downstream courses accessible via any directed path. Unlike betweenness or degree, Reach captures the total span of a course’s influence. As highlighted in our previous work [], high-reach courses often serve as critical gateways, and disruptions to them can affect large portions of the curriculum. The Reach metric is computed using breadth-first search (BFS) from each course node. Formally, Reach is defined as

$\mathrm{Reach}(i) = |\mathrm{BFS}(i)|$

(3)

where $\mathrm{BFS}(i)$ is the BFS tree rooted at i.

Among these three centrality measures, Reach is especially useful for capturing how broadly a course influences the downstream curriculum. In the next section, we combine it with pass rates to identify courses that are both structurally central and difficult to pass.
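The three measures can be computed with NetworkX as in the following sketch on a toy graph with hypothetical course labels. It assumes Reach counts the nodes reachable from a course and excludes the course itself, in line with the "number of downstream courses" description; `nx.descendants` is used here as a convenient stand-in for an explicit BFS.

```python
import networkx as nx

# Toy CPN with hypothetical course labels.
cpn = nx.DiGraph([("A", "B"), ("B", "C"), ("B", "D"), ("A", "E")])

# Equation (1): number of immediate successors of each course.
out_degree = dict(cpn.out_degree())

# Equation (2): normalized=False keeps the raw sum over (s, t) pairs.
betweenness = nx.betweenness_centrality(cpn, normalized=False)

# Equation (3), read as the number of courses reachable from each course,
# with the course itself excluded (an assumption of this sketch).
reach = {course: len(nx.descendants(cpn, course)) for course in cpn}
```

On this toy graph, course B lies on the only shortest paths from A to C and from A to D, so it is the only node with nonzero betweenness, while A has the largest Reach.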

3.2.2. Reach–Pass Rate Joint Analysis

Having identified the structural roles of courses within the curriculum through the aforementioned measures, we now incorporate the student data to further understand their functional impact. In this section, we perform a joint analysis that combines Reach with course pass rates derived from student transcripts. The pass rate is defined as the proportion of students who passed the course out of all students who attempted it. While Reach captures a course’s structural centrality within the curriculum, the pass rate reflects its real-world difficulty as experienced by students. Together, they provide a more holistic view of curricular impact. This analysis is based on MIG_R, which emphasizes how far a course reaches through the curriculum in terms of prerequisite influence.

By plotting courses in the two-dimensional space of Reach and pass rate, we were able to identify key curricular bottlenecks, specifically courses that are both structurally central and difficult to pass. These high-risk courses may require additional instructional support or curricular adjustments, as they have a disproportionate impact on student progression.
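One simple way to operationalize this joint view is to flag courses whose Reach is high and whose pass rate is low. The sketch below is only an illustration: the `flag_high_risk` helper, the cutoff values, the course labels, and the counts are all hypothetical, and the text does not state that fixed thresholds were used.

```python
def flag_high_risk(reach, pass_rate, min_reach, max_pass_rate):
    """Return courses with Reach >= min_reach and pass rate <= max_pass_rate."""
    return sorted(
        c for c in reach
        if c in pass_rate and reach[c] >= min_reach and pass_rate[c] <= max_pass_rate
    )


# Hypothetical values for illustration only.
reach = {"COURSE_A": 40, "COURSE_B": 12, "COURSE_C": 3}
attempts = {"COURSE_A": (620, 1000), "COURSE_B": (450, 500), "COURSE_C": (60, 100)}

# Pass rate = students who passed / students who attempted.
pass_rate = {c: passed / total for c, (passed, total) in attempts.items()}

high_risk = flag_high_risk(reach, pass_rate, min_reach=10, max_pass_rate=0.7)
```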

3.2.3. Path Completion Modeling

The above analyses provide valuable insights into the structural roles and difficulty of individual courses, but they are still conducted at the individual course level (node-level). From the students’ perspective, what truly matters is not just whether a single course is hard, but whether they can successfully complete an entire course chain that leads to graduation, and how long that might take. From an institutional perspective, it is equally important to evaluate whether a program’s structure creates unintended barriers or lowers overall completion feasibility. In this section, we focus on analyzing the complete chains extracted from the MIG_C. Since all prerequisites in an attempted chain must be passed, assuming one course per term, the length of a path also reflects the minimum number of semesters required to complete it []. To evaluate the feasibility of completing these course paths, we introduce three statistical models that estimate the expected time or probability of path completion under varying assumptions about student performance. These models, referred to as M1, M2, and M3, progressively incorporate increasing levels of empirical information. Assume $ρ = \{c_{1}, c_{2}, \dots, c_{n}\}$ represents a course path consisting of n sequential courses connected by prerequisite relationships.

M1 (Independent Passing Probability Model): Assume that the passing rates $P(c_{i})$ for each course $c_{i}$ on $ρ$ are independent. The expected number of semesters to complete $ρ$ is then calculated as the sum of the inverses of these pass rates (Equation (4)).

$E_{ρ}^{M1} = \sum_{i = 1}^{n} \frac{1}{P(c_{i})}$

(4)
M2 (Conditional Passing Probability Model): This model incorporates empirical pass rates by estimating the conditional probability of passing a course given that the prerequisite was passed. Specifically, for each course pair $(c_{i - 1} \to c_{i})$, we estimate $P(c_{i} \mid c_{i - 1})$ using the edge information of the CPN, considering only students who passed course $c_{i}$ on their first attempt. The expected number of semesters to complete $ρ$ is calculated as the sum of the inverses of these conditional probabilities. Under M2 (Equation (5)), each course’s success likelihood depends on the prerequisite success.

$E_{ρ}^{M2} = \frac{1}{P(c_{1})} + \sum_{i = 2}^{n} \frac{1}{P(c_{i} \mid c_{i - 1})}$

(5)
M3 (Path Completion Probability Model): This model estimates the overall probability of successfully completing a path. It incorporates conditional pass rates $P(c_{i} \mid c_{i - 1})$, including all students who eventually passed course $c_{i}$ after passing $c_{i - 1}$, regardless of the number of attempts. Unlike M1 and M2, which focus on expected time, M3 measures the structural survivability of a path by computing the cumulative success probability across all transitions. The completion probability for $ρ$ is calculated as Equation (6):

$\mathrm{CompletionProb}_{ρ}^{M3} = P(c_{1}) \times \prod_{i = 2}^{n} P(c_{i} \mid c_{i - 1})$

(6)

Together, these three models offer complementary perspectives on curricular progression: M1 provides a baseline estimate under independent course-level assumptions, M2 incorporates realistic dependencies between courses without considering repetition, and M3 further captures the long-term completion likelihood by accounting for eventual success through course retakes. By comparing results across these models, we can assess both the temporal demands and structural fragility of different degree pathways.
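Equations (4)–(6) translate directly into a few lines of code. One way to read the inverse terms is that, if each attempt at a course succeeds independently with probability p, the number of attempts needed is geometrically distributed with mean 1/p. The sketch below uses hypothetical function names and hypothetical pass rates for a three-course path.

```python
from math import prod


def expected_semesters_m1(pass_rates):
    """Equation (4): sum of 1 / P(c_i) over the courses on the path."""
    return sum(1.0 / p for p in pass_rates)


def expected_semesters_m2(p_c1, cond_first):
    """Equation (5): 1 / P(c_1) plus the sum of 1 / P(c_i | c_{i-1}),
    with first-attempt conditional pass rates."""
    return 1.0 / p_c1 + sum(1.0 / p for p in cond_first)


def completion_prob_m3(p_c1, cond_all):
    """Equation (6): P(c_1) times the product of P(c_i | c_{i-1}),
    with eventual (any-attempt) conditional pass rates."""
    return p_c1 * prod(cond_all)


# Hypothetical three-course path.
m1 = expected_semesters_m1([0.8, 0.5, 0.9])
m2 = expected_semesters_m2(0.8, [0.5, 0.8])
m3 = completion_prob_m3(0.8, [0.9, 0.95])
```

With these toy inputs, M1 and M2 both exceed the three-semester minimum implied by the path length, and M3 shows how moderately high per-transition pass rates still compound into a noticeably lower path-level completion probability.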

3.2.4. Grade Transition Pattern

The analyses in the previous section focused on course passing rates, treating outcomes as binary events (i.e., a course is either passed or failed). A deeper examination of fine-grained grade relationships between courses may reveal how students’ performance in a prerequisite course influences their performance in subsequent courses. Intuitively, for a given prerequisite pair $(X, Y)$, if a student earns an “A” in course X, one would expect a high likelihood of also achieving a good grade in course Y. Conversely, if a student performs well in X but poorly in Y, it may suggest that either the content of Y diverges significantly from X, or that the preparation offered by X does not equip students for the challenge presented in Y. To capture such dynamics, we propose a grade-based transition index, denoted as $ψ$, that captures the directional shift in student performance from one course to its successor. This index is computed based on three key grade transition patterns: Jump, Fall, and Stability.

Let $N_{x, y}(p, q)$ denote the number of students whose grade transitions from p in course x to q in course y, and $N_{x}(p)$ denote the total number of students who received p in course x.

Jump: The proportion of students who move from a lower grade in the prerequisite course to a higher grade in the subsequent course, reflecting upward academic momentum. The Jump factor is given by

$\mathrm{Jump}_{x, y} = \frac{N_{x, y}(C, A) + N_{x, y}(C, B) + N_{x, y}(B, A)}{N_{x}(C) + N_{x}(B)}$

(7)
Fall: The proportion of students who earn a lower grade in the successor than in the prerequisite, suggesting a mismatch or gap in curricular alignment. The Fall factor is given by

$\mathrm{Fall}_{x, y} = \frac{N_{x, y}(A, C) + N_{x, y}(A, B) + N_{x, y}(A, F) + N_{x, y}(B, C) + N_{x, y}(B, F) + N_{x, y}(C, F)}{N_{x}(A) + N_{x}(B) + N_{x}(C)}$

(8)
Stability: The proportion of students who maintain the same performance level across both courses ($A \to A$, $B \to B$, $C \to C$), representing curricular consistency. The Stability factor is given by

$\mathrm{Stability}_{x, y} = \frac{N_{x, y}(A, A) + N_{x, y}(B, B) + N_{x, y}(C, C)}{N_{x}(A) + N_{x}(B) + N_{x}(C)}$

(9)

In our experiments, we focus primarily on whether students experience a performance drop when progressing to the successor course. From this perspective, we only focus on the component Fall. This simplified formulation emphasizes curricular coherence by penalizing transitions where students’ performance drops after passing the prerequisite. A lower Fall value indicates that most students maintain or improve their grades when moving to the successor course, suggesting better continuity and instructional alignment between the two courses. Conversely, a higher value implies that a larger proportion of students perform worse in the successor course, potentially revealing gaps in curricular coherence or prerequisite preparation.
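Equations (7)–(9) can be evaluated directly from a grade transition matrix such as Matrix_first. The sketch below assumes the matrix is stored as a nested dictionary and that $N_{x}(p)$ is taken as the row total for grade p; the `transition_indices` helper and the counts are hypothetical.

```python
def transition_indices(m):
    """m[p][q]: number of students with grade p in X and grade q in Y."""
    n = {p: sum(m[p].values()) for p in ("A", "B", "C")}  # N_x(p) as row totals
    total = n["A"] + n["B"] + n["C"]
    jump = (m["C"]["A"] + m["C"]["B"] + m["B"]["A"]) / (n["C"] + n["B"])
    fall = (m["A"]["B"] + m["A"]["C"] + m["A"]["F"]
            + m["B"]["C"] + m["B"]["F"] + m["C"]["F"]) / total
    stability = (m["A"]["A"] + m["B"]["B"] + m["C"]["C"]) / total
    return jump, fall, stability


# Hypothetical counts for one prerequisite pair (X, Y).
matrix = {
    "A": {"A": 10, "B": 5, "C": 3, "F": 2},
    "B": {"A": 4, "B": 8, "C": 5, "F": 3},
    "C": {"A": 1, "B": 2, "C": 4, "F": 3},
}
jump, fall, stability = transition_indices(matrix)
```

Note that Jump is normalized by the students who earned B or C in X (those who can improve), whereas Fall and Stability are normalized by all students who passed X, so the three values do not sum to one.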

4. Experiments and Results

4.1. Dataset and Experiment Overview

The dataset used in this study comprises both course information and student academic records from Southern Illinois University Edwardsville. Course information was obtained by crawling the university’s official website [] using the Python requests (2.32.3) and bs4 (4.13) libraries. The student records consist of transcript data from four majors, spanning the years 2010 to 2024. Personally identifiable information was removed prior to analysis to ensure compliance with privacy standards. Each entry includes the unique student ID, the course code, the attempted term, and the corresponding grade. The transcript data preprocessing involved two main aspects. First, since our analysis focuses on course-taking pathways and students are only allowed to proceed to subsequent courses upon earning a grade of ‘C’ or higher ($\{A, B, C\}$) in the prerequisite course, all grades below ‘C’ were treated as Fail (‘F’) in our model. Second, some records showed students enrolled in the same course more than once within a single semester with invalid grades; such redundant records were removed during preprocessing to ensure data consistency. Samples of both the prerequisite and transcript data are shown in Figure 2.

To ensure meaningful analysis, the selected majors needed to have a sufficient number of students and to exhibit complete prerequisite chains in the CPN. Based on these criteria, we selected four representative STEM majors: Computer Science (CS), Biology (BIOL), Electrical and Computer Engineering (ECE), and Mechanical Engineering (ME). Since student major information is not present in the transcript data, students were assigned to the major in which they completed the greatest number of courses. An overview of the course offerings, the capstone courses, and student counts is shown in Table 1 (note that the course counts in Table 1 include only those with the specified major prefix). In addition, graph construction and analysis were primarily conducted using the NetworkX (3.5) library []. Part of the path analysis and result plotting was carried out using the graph visualization tools Gephi (0.10) [] and yEd Graph Editor (3.25).
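The grade normalization and the major assignment rule can be sketched as follows. The helper names are hypothetical, and several details are assumptions not stated in the text: the major is inferred from the alphabetic prefix of the course code, only passed courses count as completed, any value outside {A, B, C} is mapped to F, and ties are broken arbitrarily.

```python
import re
from collections import Counter

PASSING = {"A", "B", "C"}
MAJORS = {"CS", "BIOL", "ECE", "ME"}


def normalize_grade(grade):
    """Treat every grade below C as a fail (any value outside A/B/C becomes F)."""
    return grade if grade in PASSING else "F"


def assign_major(records):
    """records: iterable of (course_code, grade) for one student.
    Returns the major prefix with the most passed courses, or None."""
    counts = Counter()
    for course, grade in records:
        match = re.match(r"[A-Za-z]+", course)
        if match is None:
            continue  # skip codes without an alphabetic prefix
        prefix = match.group(0).upper()
        if prefix in MAJORS and normalize_grade(grade) != "F":
            counts[prefix] += 1
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical transcript for one student.
records = [("CS150", "A"), ("CS234", "D"), ("MATH150", "B"),
           ("ECE210", "C"), ("CS330", "B")]
major = assign_major(records)
```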

Figure 2. Examples of input data.

Table 1. Dataset overview.

All course codes and their corresponding course names mentioned in this paper are listed in Table A1, and the major induced graphs are provided in Figure A1, Figure A2, Figure A3 and Figure A4. The data and code used in this research are available at [].

4.2. Centrality Results

Table 2 and Table 3 summarize the structural characteristics of the CPN and the corresponding subgraphs. Table 2 shows the number of STEM courses and prerequisite links that constitute the CPN, along with the courses exhibiting the highest centrality under different metrics. The abbreviation “Lcc” refers to the Largest Connected Component, which captures the largest subset of courses connected via prerequisite relationships. We found that BIOL220 (Genetics) exhibits both the highest betweenness centrality and the highest out-degree. This prominence can be partially attributed to the relatively large number of biology-related courses represented in the network, which increases the likelihood of BIOL220 serving as a key connector or prerequisite across multiple paths. This observation is consistent with the subgraph statistics shown in Table 3. In addition, we found that the fundamental math courses MATH120 (College Algebra) → MATH125 (Precalculus Mathematics) → MATH150 (Calculus I) all appear at the top of the Reach ranking. This is primarily because MATH courses are required across nearly all STEM majors, making them common entry points in many students’ academic pathways. Table 3 lists the size of MIG_R and MIG_C for each major. We observed that BIOL contains 99 related courses, but only 15 of them are included in its core paths toward the capstone. In comparison, CS offers 33 related courses, 17 of which are included in the core paths.

Table 2. CPN overview.

Table 3. Subgraph Overview.

Table 4, Table 5, Table 6 and Table 7 list the courses with high betweenness centrality and high out-degree for each major. Figure A1, Figure A2, Figure A3 and Figure A4 illustrate the corresponding MIG_R graphs, where green nodes indicate the subset of courses included in MIG_C, and red nodes represent the capstone courses. By examining Table 4 and Table 5, we observe a strong overlap in the courses with high betweenness centrality between MIG_R and MIG_C across all majors. Key courses such as CHEM121A, BIOL220, CS150, ECE351, MATH150, MATH152, and CE240 consistently appear in both networks, indicating that they are not only structurally central within the curriculum graph but also serve as essential connectors along the core paths toward the capstone. We also noted a few differences, such as CS234 and ECE282, which appear exclusively in MIG_C, suggesting that they are prominent in the core paths despite having relatively lower betweenness centrality in MIG_R.

Table 4. Betweenness centrality in MIG_R.

Table 5. Betweenness centrality in MIG_C.

Table 6. Out-degree in MIG_R.

Table 7. Out-degree in MIG_C.

Table 6 and Table 7 list the high out-degree courses in MIG_R and MIG_C, respectively. Courses with out-degree less than 2 are omitted and marked with a “*”, and the number in parentheses indicates the out-degree of the course. By comparing these two tables, we find that several courses such as CHEM121A, MATH150, CS150, ECE351, ECE282, and ME354 consistently exhibit high out-degree in both subgraphs, indicating their role as central prerequisites that directly lead to multiple downstream courses, including those in the capstone-oriented paths. Most courses, such as BIOL220 and CS340, appear only in MIG_R, which underscores their structural centrality but also shows that they are less relevant to the core graduation paths. As shown in Figure A1, although BIOL220 has 22 successors in MIG_R, only 1 of them (BIOL492) contributes to the capstone path. In contrast, some courses such as CS140 show high out-degree exclusively in MIG_C, which means that although they are not dominant in the major curriculum graph, they serve as critical connectors along the core academic paths.
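The contrast between out-degree in MIG_R and in MIG_C can be reproduced on a toy graph. The sketch below assumes that MIG_C is the subgraph induced by the capstone and all of its direct and transitive prerequisites; the formal definition is given in the methods section and may differ in detail. The edge list is illustrative only: several pairs also appear in Table A3, while the links into CS499 and the CS382 edge are hypothetical.

```python
from collections import defaultdict

def ancestors(edges, target):
    """All courses from which `target` is reachable along prerequisite edges."""
    preds = defaultdict(set)
    for pre, succ in edges:
        preds[succ].add(pre)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for p in preds.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def capstone_subgraph(edges, capstone):
    """Edges among the capstone and its (transitive) prerequisites."""
    keep = ancestors(edges, capstone) | {capstone}
    return [(u, v) for u, v in edges if u in keep and v in keep]

def out_degree(edges):
    """Number of direct successors of each course that has at least one."""
    deg = defaultdict(int)
    for pre, _succ in edges:
        deg[pre] += 1
    return dict(deg)

# Toy stand-in for a major-induced graph (not the real CS curriculum).
mig_r = [
    ("CS140", "CS150"),
    ("CS150", "CS234"), ("CS234", "CS325"),
    ("CS150", "CS286"), ("CS286", "CS314"),
    ("CS325", "CS499"), ("CS314", "CS499"),  # hypothetical capstone links
    ("CS150", "CS382"),                      # hypothetical edge off the core paths
]
mig_c = capstone_subgraph(mig_r, "CS499")
deg_r = out_degree(mig_r)
deg_c = out_degree(mig_c)
```

Here CS150 has three successors in the toy MIG_R but only two in the toy MIG_C, a small-scale analogue of the BIOL220 observation above.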

Across all four majors, we observe that the courses with the highest Reach in both MIG_R and MIG_C consistently originate from the foundational mathematics course chain beginning with MATH120. This sequence typically includes MATH120, MATH125, MATH150, and often extends further to MATH152 (Calculus II), MATH250 (Calculus III), or MATH305 (Differential Equations) depending on the major. This pattern reflects the universal curricular importance of mathematics in STEM education, and highlights how early math preparation is not only structurally central but also functionally critical for student progression toward degree completion.
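To make the Reach observation concrete, the sketch below counts, for each course, the number of downstream courses that depend on it directly or transitively. This raw count is an assumption about how Reach is scored (the paper's definition may normalize it). With NetworkX, the same count for one course is `len(nx.descendants(G, course))` on a `DiGraph`. The toy edges use the math chain named above plus the MATH150 → CS140 pair from Table A3.

```python
from collections import defaultdict

def reach(edges):
    """Map each course to the number of courses reachable from it."""
    succs = defaultdict(set)
    nodes = set()
    for pre, succ in edges:
        succs[pre].add(succ)
        nodes.update((pre, succ))
    result = {}
    for start in nodes:
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in succs.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result[start] = len(seen)
    return result

toy_edges = [
    ("MATH120", "MATH125"), ("MATH125", "MATH150"), ("MATH150", "MATH152"),
    ("MATH152", "MATH250"), ("MATH250", "MATH305"),
    ("MATH150", "CS140"),
]
scores = reach(toy_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because every later course in a chain is downstream of the first one, the head of the math chain necessarily ranks first in this toy example.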

4.3. Reach-Pass Rate Joint Analysis

Figure 3 presents the analysis results for each major. We observe that MATH150 (the orange node) stands out in all four majors as a course with high Reach but low pass rate, highlighting its role as a foundational but challenging bottleneck in students’ academic progression. By comparison, MATH120 (the yellow node) always occupies the high Reach and high pass rate position, serving as an accessible starting point that supports a wide range of downstream paths. Several courses such as BIOL444A, CS447, CS314, and CS286 (the green nodes) fall into the low Reach and low pass rate quadrant. These courses may not be central to the curriculum pathways but present high failure risks. Conversely, courses with high pass rate but low Reach are often upper-level (300–400 level) electives or specialized courses. These tend to play a less pivotal role within the curriculum and are generally less challenging. The above patterns help identify critical bottlenecks and outlier courses that may warrant further curricular attention.
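The quadrant reading of Figure 3 can be expressed as a simple classification. The sketch below is illustrative: the cut-offs are assumptions (the median Reach of the courses considered, and a 0.70 pass rate borrowed from the 70% figure mentioned in the Discussion), and the course labels and numbers are placeholders rather than values from the dataset.

```python
from statistics import median

def classify(courses, pass_cut=0.70, reach_cut=None):
    """courses: {name: (reach, pass_rate)} -> {name: quadrant label}."""
    if reach_cut is None:
        # Assumption: split Reach at the median of the courses considered.
        reach_cut = median(r for r, _ in courses.values())
    labels = {}
    for name, (r, p) in courses.items():
        side_r = "high" if r >= reach_cut else "low"
        side_p = "high" if p >= pass_cut else "low"
        labels[name] = f"{side_r} Reach / {side_p} pass rate"
    return labels

# Placeholder values, made up for illustration.
toy = {
    "course_a": (12, 0.55),
    "course_b": (14, 0.90),
    "course_c": (1, 0.60),
    "course_d": (0, 0.95),
}
labels = classify(toy)
bottlenecks = [c for c, q in labels.items() if q == "high Reach / low pass rate"]
```

Courses landing in the "high Reach / low pass rate" quadrant correspond to the bottleneck role described for MATH150.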

Figure 3. Reach-Pass Rate.

4.4. Path Completion Results

Figure 4, Figure 5, Figure 6 and Figure 7 show all the prerequisite chains leading to the capstone course of each major. Paths beginning with the head node MP_x denote the math prerequisite chains, where x indicates the length of the math chain. Specifically, the math chains refer to the segments of the math path that begin with MATH120 and typically progress through MATH125 → MATH150 → MATH152 → MATH250 → MATH305. For each path, the path length, the statistical results, and the actual number of semesters students took are presented. Notably, the ground truth is calculated by averaging the semesters taken by all students who completed the corresponding path. The estimates derived from M1 and M2 are evaluated against this baseline, with the closer estimate highlighted in the results.
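The ground-truth baseline can be sketched as follows. The sketch assumes that terms are encoded as consecutive integers and that a student's duration on a path runs from the first attempt of the first course to the term in which the last course was passed; both are assumptions, and the records are made up, with "X" and "Y" as placeholder course names.

```python
from collections import defaultdict

def observed_semesters(records, path):
    """Average semesters taken by students who passed every course on `path`.

    records: (student_id, course, term_index, outcome) with outcome 'P' or 'F'.
    Returns None when no student completed the whole path.
    """
    first_try = {}               # student -> earliest term for path[0]
    passed = defaultdict(dict)   # student -> course -> earliest passing term
    for sid, course, term, outcome in records:
        if course == path[0]:
            first_try[sid] = min(term, first_try.get(sid, term))
        if outcome == "P" and course in path:
            passed[sid][course] = min(term, passed[sid].get(course, term))
    spans = []
    for sid, by_course in passed.items():
        if sid in first_try and all(c in by_course for c in path):
            spans.append(by_course[path[-1]] - first_try[sid] + 1)
    return sum(spans) / len(spans) if spans else None

toy_records = [
    ("s1", "X", 1, "P"), ("s1", "Y", 2, "P"),
    ("s2", "X", 1, "F"), ("s2", "X", 2, "P"), ("s2", "Y", 3, "P"),
    ("s3", "X", 1, "P"), ("s3", "Y", 2, "F"),   # never completes the path
]
baseline = observed_semesters(toy_records, ["X", "Y"])
```

Only students s1 and s2 complete the toy path, taking two and three semesters, so the baseline averages over those two students.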

Figure 4. BIOL paths.

Figure 5. CS paths.

Figure 6. ECE paths.

Figure 7. ME paths.

Several noteworthy patterns emerge from the comparison of the paths in each major. Although BIOL offers the largest number of courses, it has only six paths leading to the capstone course. In contrast, despite having the fewest courses, CS exhibits a relatively large number of capstone paths. The path length also varies across majors: BIOL paths typically span seven to eight courses, CS paths range from three to nine, ECE paths from five to eleven, while all ME paths contain ten courses. Furthermore, the math requirements differ significantly among the majors: BIOL students generally need at most two math courses and CS students three, whereas ECE and ME demand a more intensive sequence of six math courses.
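Counting capstone paths and their lengths amounts to enumerating every chain from a course without prerequisites to the capstone. The sketch below shows one way to do this on a small acyclic edge list; the edges are a toy example (the links into CS499 are hypothetical), not the actual CS graph.

```python
from collections import defaultdict

def chains_to(edges, capstone):
    """Every prerequisite chain that starts at a course with no prerequisite
    and ends at `capstone`. Assumes the edge list forms a DAG."""
    preds = defaultdict(list)
    for pre, succ in edges:
        preds[succ].append(pre)

    def walk(node):
        if not preds.get(node):          # no prerequisites: a chain starts here
            return [[node]]
        return [chain + [node] for p in preds[node] for chain in walk(p)]

    return walk(capstone)

toy_edges = [
    ("CS140", "CS150"),
    ("CS150", "CS234"), ("CS234", "CS325"),
    ("CS150", "CS286"), ("CS286", "CS314"),
    ("CS325", "CS499"), ("CS314", "CS499"),   # hypothetical capstone links
]
chains = chains_to(toy_edges, "CS499")
lengths = sorted(len(c) for c in chains)
```

The number of chains and the spread of `lengths` are the two quantities compared across majors in the paragraph above.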

From the estimation perspective, we observe the advantage of the conditional probability model (M2) over the raw probability model (M1), as M2 yields estimates that are closer to the actual completion times for most of the paths. In the results, we circled the items that are closer to the ground truth.
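The difference between the two estimators can be illustrated with a simplified reading. The exact M1 and M2 formulas are defined in the methods section; the sketch below only assumes that a course with pass probability p takes 1/p attempts on average, at one course per term, and that M2 replaces the raw pass rate of each course with the pass rate among students who passed the preceding course. Course names "X" and "Y" and all records are placeholders.

```python
def pass_rate(records, course, eligible=None):
    """Share of attempts at `course` that were passed, optionally restricted
    to students in `eligible`. records: (student, course, term, 'P' or 'F')."""
    attempts = [o for sid, c, _t, o in records
                if c == course and (eligible is None or sid in eligible)]
    if not attempts:
        raise ValueError(f"no attempts recorded for {course}")
    return sum(o == "P" for o in attempts) / len(attempts)

def passed_students(records, course):
    return {sid for sid, c, _t, o in records if c == course and o == "P"}

def expected_semesters(probs):
    """Sum of expected attempts (1/p) per course; every p must be positive."""
    return sum(1.0 / p for p in probs)

def m1_estimate(records, path):
    """Raw (independent) pass rates for every course on the path."""
    return expected_semesters([pass_rate(records, c) for c in path])

def m2_estimate(records, path):
    """Pass rate of each course conditioned on having passed the previous one."""
    probs = [pass_rate(records, path[0])]
    for prev, cur in zip(path, path[1:]):
        probs.append(pass_rate(records, cur, passed_students(records, prev)))
    return expected_semesters(probs)

toy_records = [
    ("s1", "X", 1, "P"), ("s1", "Y", 2, "P"),
    ("s2", "X", 1, "F"), ("s2", "X", 2, "P"), ("s2", "Y", 3, "P"),
    ("s3", "X", 1, "P"), ("s3", "Y", 2, "F"), ("s3", "Y", 3, "P"),
    ("s4", "X", 1, "F"),
    ("s5", "Y", 1, "F"),   # took Y without a recorded attempt at X
]
m1 = m1_estimate(toy_records, ["X", "Y"])
m2 = m2_estimate(toy_records, ["X", "Y"])
```

In this toy data the raw pass rate of Y is 3/5, but among students who passed X it is 3/4, so the conditional estimate is shorter than the raw one.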

M3 reflects the path completion rates by incorporating course repetition. A high M3 score indicates that a large proportion of students are eventually able to complete the path, even if it involves course repetitions. This suggests that the path has a degree of resilience, and students may struggle along the way, but institutional structures or course policies may allow them to recover and proceed. In most cases, when comparing paths with the same length, those with higher M3 scores tend to exhibit shorter completion times, while lower M3 scores are often associated with longer time. However, a high M3 does not necessarily imply efficiency, as repeated attempts may lead to longer actual completion times. In contrast, a low M3 score implies that failures along the path are more consequential, and students who encounter difficulties are less likely to recover and complete the path. This may reflect stricter prerequisites, bottleneck courses, or limited opportunities for retaking. Paths with low M3 score but short actual durations might reflect a subset of high-performing students progressing smoothly through more demanding trajectories. For example, path-5 and path-6 of CS (Figure 5) have a passing probability of 0.58 and a completion duration of 8.50, while path-7 has a slightly higher passing probability of 0.62 but a longer duration of 8.81. Another example can be seen in path-11 of ECE (Figure 6), which has a passing probability of 0.58 and a completion duration of 9.95, while path-10 has a slightly higher passing probability of 0.62 but a longer duration of 11.00.
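A survivability score of this kind can be sketched by ignoring how many attempts a student needed. The sketch below assumes, purely for illustration, that the path score is the product over courses of the share of students who eventually passed among those who ever attempted the course; the paper's M3 definition in the methods section may differ. Records and course names are placeholders.

```python
def eventual_pass_rate(records, course):
    """Share of students who ever passed `course` among those who tried it.
    records: (student, course, term, 'P' or 'F')."""
    tried = {sid for sid, c, _t, _o in records if c == course}
    passed = {sid for sid, c, _t, o in records if c == course and o == "P"}
    if not tried:
        raise ValueError(f"no attempts recorded for {course}")
    return len(passed) / len(tried)

def path_survival(records, path):
    """Product of eventual pass rates along the path (repeats are allowed)."""
    prob = 1.0
    for course in path:
        prob *= eventual_pass_rate(records, course)
    return prob

toy_records = [
    ("s1", "X", 1, "P"), ("s1", "Y", 2, "P"),
    ("s2", "X", 1, "F"), ("s2", "X", 2, "P"), ("s2", "Y", 3, "P"),
    ("s3", "X", 1, "P"), ("s3", "Y", 2, "F"), ("s3", "Y", 3, "P"),
    ("s4", "X", 1, "F"),
    ("s5", "Y", 1, "F"),
]
survival = path_survival(toy_records, ["X", "Y"])
```

Students s2 and s3 both fail once and recover, so they raise the survivability score even though they lengthen the observed completion time, which is the distinction drawn in the paragraph above.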

We also found some major-specific patterns:

BIOL: Paths that start with CHEM prerequisites have a lower completion rate than those that start with MATH.
CS:
– The computer architecture and system design pathway (CS286 (Introduction to Computer Organization and Architecture) → CS314 (Operating Systems)) is more challenging than the software development and application pathway (CS234 (Database and Web System Development) → CS325 (Software Engineering)).
– The pathway from MATH224 (Discrete Mathematics) → CS340 (Algorithms and Data Structures) is more challenging than the pathway from CS150 (Introduction to Computing II) → CS340 (Algorithms and Data Structures), as it builds on abstract reasoning and mathematical rigor, which increases the overall cognitive load.
ECE: As shown in Figure 6, path-15 has the lowest passing probability of 0.55, yet it is relatively short, indicating that it represents one of the most challenging trajectories within the ECE curriculum.
ME: Almost all ME paths incorporate Civil Engineering (CE) courses such as CE240 (Statics) and CE242 (Mechanics of Solids), which are foundational but challenging. These courses are common prerequisites across the ME curriculum.

When comparing majors, BIOL and ME have a similar number of paths leading toward their respective capstone courses. However, the paths in ME are more difficult. The average passing probability for ME is around 56%, while BIOL has a higher average passing probability of approximately 64%, highlighting a greater overall challenge in the ME curriculum. Additionally, the overall completion rates for CS and ECE are comparable.

The previous analysis was based on students who completed the entire prerequisite path, meaning that all courses along the path were completed. However, students often have varying mathematical backgrounds and may not be required to complete every foundational math course. For instance, some students may be exempted from precalculus courses such as MATH120 or MATH125 through college placement exams (such as AP exams), allowing them to start directly with a higher-level math course and enter major-specific courses earlier. This enables them to access upper-level content sooner and potentially graduate in a shorter time.

To account for this variation, we also conducted a focused analysis on the influence of each individual math course as a starting point, aiming to understand how different entry levels in the math sequence affect curricular coverage and progression across majors. Figure 8 shows the math prerequisite paths from MP_1 to MP_6, and Table 8 presents the average graduation duration associated with each entry point. Theoretically, adjacent starting points should correspond to a one-semester difference in overall duration. However, as highlighted in Table 8, we noticed cases where this difference exceeds one semester, suggesting that placing out of certain math courses may lead to faster graduation outcomes. This feature is particularly notable in the ECE major.
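The entry-point comparison can be sketched as a group-by over students. The sketch assumes integer term indices and a precomputed number of semesters to graduation per student; the records and durations are made up and are not the values of Table 8.

```python
from collections import defaultdict

MATH_CHAIN = ["MATH120", "MATH125", "MATH150", "MATH152", "MATH250", "MATH305"]

def entry_points(records):
    """Earliest math-chain course each student attempted.
    records: (student_id, course, term_index, outcome)."""
    first = {}
    for sid, course, term, _outcome in records:
        if course in MATH_CHAIN:
            # Earliest term wins; within a term, the lower course in the chain.
            key = (term, MATH_CHAIN.index(course))
            if sid not in first or key < first[sid][0]:
                first[sid] = (key, course)
    return {sid: course for sid, (_key, course) in first.items()}

def mean_duration_by_entry(entries, durations):
    """Average semesters to graduation for each entry course."""
    buckets = defaultdict(list)
    for sid, course in entries.items():
        if sid in durations:
            buckets[course].append(durations[sid])
    return {c: sum(v) / len(v) for c, v in buckets.items()}

def adjacent_gaps(means):
    """Difference in mean duration between neighbouring entry points."""
    gaps = {}
    for a, b in zip(MATH_CHAIN, MATH_CHAIN[1:]):
        if a in means and b in means:
            gaps[(a, b)] = means[a] - means[b]
    return gaps

toy_records = [
    ("s1", "MATH120", 1, "P"), ("s1", "MATH125", 2, "P"),
    ("s2", "MATH120", 1, "P"),
    ("s3", "MATH125", 1, "P"),
    ("s4", "MATH150", 1, "P"),
]
toy_durations = {"s1": 10, "s2": 12, "s3": 9, "s4": 8}   # semesters, made up
means = mean_duration_by_entry(entry_points(toy_records), toy_durations)
gaps = adjacent_gaps(means)
```

A gap larger than 1.0 between neighbouring entry points is the kind of case flagged in the paragraph above as exceeding the expected one-semester difference.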

Figure 8. MATH prerequisite paths.

Table 8. Influence of math prerequisites.

4.5. Grade Transition Results

To supplement the analysis of critical paths, we focused on Fall(X, Y) values (as described in Section 3.2.4) to further home in on which specific pairs of prerequisite X and course Y are important candidates for further examination by the relevant departments. While we understand that the Fall values are not normalized according to each individual course’s grade distribution, we nonetheless find these values to be a useful indicator of the likelihood that a student who succeeded in a prerequisite X might exhibit worse performance in the subsequent course Y. In Section 4.4, we analyzed the patterns along paths across different majors. The grade transition results in this section to some extent echo and support those observed patterns. Table A2, Table A3, Table A4, Table A5 and Table A6 show Fall values for course pairs along critical paths, separated by major and sorted from maximum to minimum Fall(X, Y) value.
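As an illustration of the kind of quantity Fall(X, Y) describes, the sketch below computes the share of students who passed X and then earned a strictly lower grade in Y. The precise definition is the one in Section 3.2.4; the choices made here (first passing grade in X, first attempt at Y, a 4-point grade scale, integer term indices) are assumptions, and the records are made up.

```python
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
PASS_POINTS = 2   # 'C' or better

def fall(records, x, y):
    """Share of students who passed X and then scored lower in Y.

    records: (student_id, course, term_index, letter_grade).
    Returns None when no student has both a pass in X and an attempt at Y.
    """
    grade_x = {}   # student -> points of the first passing attempt at X
    grade_y = {}   # student -> points of the first attempt at Y
    for sid, course, _term, grade in sorted(records, key=lambda r: r[2]):
        pts = GRADE_POINTS.get(grade)
        if pts is None:
            continue                      # skip unrecognized grades
        if course == x and pts >= PASS_POINTS and sid not in grade_x:
            grade_x[sid] = pts
        elif course == y and sid not in grade_y:
            grade_y[sid] = pts
    pairs = [(grade_x[s], grade_y[s]) for s in grade_x if s in grade_y]
    if not pairs:
        return None
    return sum(gy < gx for gx, gy in pairs) / len(pairs)

toy_records = [
    ("s1", "X", 1, "A"), ("s1", "Y", 2, "B"),   # drops
    ("s2", "X", 1, "B"), ("s2", "Y", 2, "B"),   # holds
    ("s3", "X", 1, "C"), ("s3", "Y", 2, "A"),   # improves
    ("s4", "X", 1, "A"), ("s4", "Y", 2, "F"),   # drops
    ("s5", "X", 1, "F"),                        # never passes X
]
score = fall(toy_records, "X", "Y")
```

Two of the four students who passed X score lower in Y, giving a toy score of 0.5; like the paper's values, this is not normalized by either course's overall grade distribution.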

In the BIOL major, we found that paths starting from MATH prerequisites are less demanding than paths starting from CHEM113. In terms of the Fall score, the transition from CHEM113 to CHEM121A yields the highest value (0.619), considerably higher than the transitions starting from MATH courses (as shown in Figure 9).

Figure 9. BIOL example.

Similarly, in the CS major, the grade transition results corroborate the pathway patterns. First, the sub-path from CS140 to CS340 via MATH224 is more challenging than the one through CS150. As highlighted in Figure 10, the bottleneck lies between CS140 and MATH224. Second, we identified that the Software Engineering sub-path is easier than the Architecture and Operating Systems sub-path. As shown in Figure 11, there is a corresponding performance drop between CS150 and CS286.

Figure 10. CS example 1.

Figure 11. CS example 2.

For the ECE major, we noticed that, for paths of the same length (Path-15 and Path-16 in Figure 6), the subpath CS145 → ECE282 → ECE381 is significantly more challenging than the subpath ECE210 → ECE211 → ECE351. The barrier occurs between CS145 and ECE282, which yields the second-highest Fall score (0.505) among all ECE course pairs (Figure 12).

Figure 12. ECE example.

As for the ME major, although the relationship between paths and student performance is not as evident as in the other majors, we still observed some noteworthy patterns. In particular, as shown in Table A5, ME354 appears to serve as a major bottleneck, as its two transitions ME354 → ME380 and ME354 → ME350 exhibit the highest Fall scores (0.571 and 0.565, respectively) among all course pairs in the ME curriculum.

5. Discussion

In our effort to model curriculum structure in conjunction with student performance data, the analysis unfolded in a progressive manner. We started by examining the curriculum structure, decomposing the CPN to extract two types of major-level subgraphs, MIG_R and MIG_C, and performing centrality analyses on each of them to identify structurally important courses, namely those that function as hubs or bottlenecks. These courses occupy central positions within the prerequisite structure and thus warrant special consideration in curriculum planning. Interestingly, we observed that some courses, while not centrally located in the overall MIG_R graph, serve as critical transition points on the pathways to capstone courses in MIG_C. In our case, CS234 and ECE282 play pivotal roles in enabling capstone completion, despite their relatively peripheral positions in the broader curriculum. This discrepancy originates from the curriculum structure itself, yet is often overlooked in conventional curriculum analysis, which typically emphasizes global course importance rather than path-dependent structural roles.

Although centrality measures provide valuable insights into the overall organization of the curriculum, they do not fully capture the practical challenges students face in completing certain courses. For example, hubs primarily reflect how many direct prerequisite relationships a course has, indicating its local connectedness. Betweenness centrality, on the other hand, measures how often a course lies on the shortest paths between other courses, reflecting its bridging role in the curriculum. By contrast, the Reach metric provides a more cumulative view, quantifying how extensively a course impacts the curriculum by measuring the number of downstream courses that depend on it. When combined with pass rate, Reach enables a more nuanced understanding of both structural importance and real-world difficulty, highlighting courses that are not only influential but also difficult to complete. From our observations in Section 4.3, we found that MATH150 consistently occupies a highly influential position but has a very low pass rate, making it a persistent high-risk course. In contrast, MATH120 also exhibits high Reach but maintains a relatively high pass rate, indicating that it functions as a stable and foundational course. In addition, although the vast majority of courses have pass rates above 70%, we still identified several exceptions, such as BIOL444A, CS286, CS447, and CS314, that fall below this threshold. These exceptions indicate areas where instructional support or curriculum adjustment may be needed.

While the Reach–Pass Rate joint analysis addresses part of the concern regarding course impact, it leaves open the question of how challenges accumulate across entire prerequisite paths, which shifts the attention from individual courses to prerequisite paths. We focused particularly on the paths leading to capstone courses, which ultimately determine whether students can graduate on time. Motivated by this, we proposed both independent (M1) and conditional (M2) passing probability metrics to estimate the number of semesters required to complete a given path. By comparing these estimates with actual student data, we found that the conditional probability model more accurately reflects real-world outcomes. This is to be expected, as the conditional probability captures how performance in earlier courses influences success in later ones. In practice, students who struggle with earlier courses are more likely to face difficulties in downstream courses, while those who perform well early on tend to maintain higher success rates throughout the path. By progressively adjusting the passing probabilities along the sequence, the M2 model better reflects this cumulative and dependent nature of course progression. Here, we emphasize that the M1 and M2 estimates are not meant to replicate actual course schedules, but to provide a standardized abstraction that assumes students take one course per term, in line with prerequisite logic.

Besides the completion time estimation, we also introduced the path completion probability metric (M3), which focuses on path survivability. M3 captures the overall likelihood that a student eventually completes a given path, regardless of the number of attempts. Beyond measuring resilience, the M3 score also serves as a useful indicator for comparing the relative difficulty of different paths. Paths with lower M3 scores are generally more difficult due to potential bottlenecks. We captured several major-specific patterns, such as in the BIOL major, paths beginning with CHEM prerequisites tend to have lower completion rates than those starting from MATH; in CS, the computer architecture and systems path (CS286 → CS314) is consistently more challenging than software development path (CS234 → CS325), and so on. These patterns warrant the attention of curriculum designers, as they highlight structural disparities that may affect student progression.

Furthermore, we observed that students who start with more advanced math courses tend to graduate in less time. While theoretically skipping a course should reduce the timeline by just one semester, in CS, ECE, and ME majors, the average time saving exceeds that. Since bypassing foundational math courses typically requires placement exams, these students likely already possess strong mathematical skills. Given that foundational math courses serve as root prerequisites for nearly all STEM majors, this finding further reinforces the hypothesis that stronger math preparation leads to more efficient degree completion.

Finally, the grade transition analysis offers a new perspective on students’ learning continuity across prerequisite course links. Unlike pass/fail indicators, the Fall metric captures more subtle shifts in performance. High Fall values indicate that a significant number of students experience grade drops after completing a prerequisite course, often signaling issues such as poor conceptual alignment or curricular disconnects. Notably, these high-Fall course pairs often align with the path patterns identified earlier. For example, CHEM113 → CHEM121A in BIOL, CS140 → MATH224 and CS150 → CS286 in CS, and CS145 → ECE282 in ECE emerge as critical bottlenecks. These course pairs consistently perform poorly in both path flow and performance continuity, suggesting they are key points where students are likely to fall behind. As such, the Fall metric serves as a valuable diagnostic tool to pinpoint targets for pedagogical intervention, such as refining instructional content or adjusting prerequisites, to enhance student performance in subsequent courses and improve overall path completion.

6. Conclusions

In this work, we introduced a set of nuanced approaches for assessing curriculum structure, learning pathway feasibility, and grade transition patterns by integrating transcript data with the course prerequisite network. The results reveal that some courses are critical not only because of their structural position as hubs, but also because of their low pass rates at these key junctures. The introduction of the grade transition pattern offers a novel view to measure the continuity of academic performance between prerequisite and subsequent courses, with the Fall factor effectively identifying points where performance tends to decline.

Despite these contributions, our study still has several limitations. The analysis is based on historical records from four majors at a single public university, and the prerequisite structures themselves are relatively simple. As a result, the number of distinct paths and structural patterns available for comparison is inherently small. In our future work, we plan to incorporate data from more institutions to enable comparative studies across different curricular designs, which will help uncover more generalized and transferable patterns.

Author Contributions

B.Y. led the implementation of all methods, data acquisition, and application of the framework, and contributed significantly to manuscript preparation. M.G. was deeply involved in writing and revising the manuscript. H.R.R. and S.Z.B. contributed to data collection, execution of network science experiments, and result reporting. X.H. provided early-stage guidance on administrative structures and degree categorizations within her institution. G.E. served as the principal architect of the study, formulating the original curricular questions in graph-theoretic terms, designing the methodology, and interpreting the results. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study titled “Analysis of STEM Student Flows through Curricular Networks” (IRB Protocol #2748, PI: Gunes Ercal) was reviewed and determined to be exempt from full IRB review by the Southern Illinois University Edwardsville Institutional Review Board on 3 October 2024, in accordance with applicable federal regulations.

Data Availability Statement

The de-identified datasets, codes, and results are available at https://github.com/BonanYang/MDPI.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. BIOL subgraph.

Figure A2. CS subgraph.

Figure A3. ECE subgraph.

Figure A4. ME subgraph.

Appendix B

Table A1. Course code-subject mapping.

Codes	Courses
BIOL150	Introduction to Biological Sciences I
BIOL151	Introduction to Biological Sciences II
BIOL220	Genetics
BIOL319	Cell and Molecular Biology
BIOL330	Environmental Health
BIOL365	Ecology
BIOL492	Biological Sciences Colloquium I
BIOL497	Senior Assignment
CE240	Statics
CE242	Mechanics of Solids
CHEM121A	General Chemistry I
CHEM121B	General Chemistry II
CHEM241A	Organic Chemistry
CS140	Introduction to computing I
CS150	Introduction to computing II
CS234	Database and Web System Development
CS286	Introduction to Computer Organization and Architecture
CS314	Operating Systems
CS325	Software Engineering
CS340	Algorithms and Data Structures
CS382	Game Design, Development and Technology
CS438	Artificial Intelligence
CS447	Networks and Data Communications
CS499	Senior Project: software implementation
ECE211	Circuit Analysis II
ECE282	Digital Systems Design
ECE326	Electronic Circuits I
ECE340	Electromagnetics
ECE341	Princ w/Electro Mec Enrg Conv
ECE351	Signals and Systems
ECE444	Power Electronics
ECE446	Power System Analysis
ECE405	Electrical and Computer Engineering Design Laboratory
MATH120	College Algebra
MATH125	Precalculus Mathematics with Trigonometry
MATH150	Calculus I
MATH152	Calculus II
MATH224	Discrete Mathematics
MATH250	Calculus III
MATH305	Differential Equation I
ME262	Dynamics
ME310	Thermodynamics I
ME315	Fluid Mechanics
ME354	Numerical Simulation
ME356	Dynamical Systems Modeling
ME370	Materials Engineering
ME380	Design of Machine Element
ME410	Heat Transfer
ME442	Microelectromechanical Systems
ME484	Mechanical Engineering Design II
PHYS141	Physics I for Engineers
PHYS151	University Physics I

Table A2. The Fall score for BIOL.

BIOL	Fall
CHEM113 → CHEM121A	0.619
CHEM121A → BIOL150	0.529
CHEM121B → CHEM241A	0.401
MATH120 → CHEM121A	0.357
MATH125 → CHEM121A	0.352
BIOL150 → BIOL151	0.346
CHEM121A → CHEM121B	0.316
CHEM241A → BIOL220	0.259
BIOL151 → BIOL220	0.210

Table A3. The Fall score for CS.

CS	Fall
CS140 → MATH224	0.608
CS150 → CS286	0.424
MATH125 → CS140	0.356
CS111 → CS234	0.354
MATH120 → CS140	0.351
CS286 → CS314	0.338
CS140 → CS150	0.335
CS234 → CS325	0.325
MATH150 → CS145	0.303
CS150 → CS340	0.300
CS150 → CS234	0.289
CS145 → CS150	0.257
MATH150 → CS140	0.239
MATH150 → CS340	0.210
MATH224 → CS340	0.194

Table A4. The Fall score for ECE.

ECE	Fall
CS140 → ECE282	0.528
CS145 → ECE282	0.505
ECE351 → ECE352	0.375
MATH125 → CS140	0.356
MATH120 → CS140	0.351
MATH150 → ECE210	0.342
ECE210 → ECE211	0.314
MATH150 → CS145	0.302
ECE211 → ECE351	0.296
MATH305 → ECE211	0.293
MATH305 → ECE351	0.282
ECE352 → ECE375	0.273
ECE351 → ECE375	0.260
MATH150 → CS140	0.239
MATH250 → ECE211	0.239
ECE282 → ECE381	0.179

Table A5. The Fall score for ME.

ME	Fall
ME354 → ME380	0.571
ME354 → ME350	0.565
PHYS140 → PHYS141	0.472
MATH152 → PHYS141	0.400
CE240 → CE242	0.383
CE242 → ME380	0.347
ME262 → ME350	0.329
CE240 → ME262	0.328
MATH250 → PHYS141	0.304
MATH152 → PHYS151	0.301
CE242 → ME370	0.228
PHYS151 → CE240	0.214
MATH150 → PHYS140	0.176
MATH305 → ME354	0.139
PHYS141 → CE240	0.120

Table A6. The Fall score for Math Prerequisites.

Math Prerequisites	Fall
MATH125 → MATH150	0.471
MATH120 → MATH125	0.404
MATH150 → MATH152	0.371
MATH152 → MATH250	0.252
MATH250 → MATH305	0.240

References

College Board. Trends in College Pricing and Student Aid 2024; College Board: New York, NY, USA, 2024; Available online: https://research.collegeboard.org/media/pdf/Trends-in-College-Pricing-and-Student-Aid-2024-ADA.pdf (accessed on 20 June 2025).
Bai, Y.; Liu, Z.; Guo, T.; Hou, M.; Xiao, K. Prerequisite Relation Learning: A Survey and Outlook. ACM Comput. Surv. 2025, 57, 1–28. [Google Scholar] [CrossRef]
Matta, J.; Obafemi-Ajayi, T.; Borwey, J.; Wunsch, D.; Ercal, G. Robust graph-theoretic clustering approaches using node-based resilience measures. In Proceedings of the 2016 IEEE 16th International Conference On Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; pp. 320–329. [Google Scholar]
Yang, B.; Kommuri, D.; Sarva, S.; Ercal, G.; Onal, S. Clustering Time Series Data with Applications to Gait Analysis. In Proceedings of the IISE Annual Conference & Expo, Seattle, WA, USA, 21–24 May 2022; pp. 1–6. [Google Scholar]
Stavrinides, P.; Zuev, K. Course-prerequisite networks for analyzing and understanding academic curricula. Appl. Netw. Sci. 2023, 8, 19. [Google Scholar] [CrossRef]
Aldrich, P. The curriculum prerequisite network: Modeling the curriculum as a complex system. Biochem. Mol. Biol. Educ. 2015, 43, 168–180. [Google Scholar] [CrossRef] [PubMed]
Slim, A.; Kozlick, J.; Heileman, G.; Abdallah, C. The complexity of university curricula according to course cruciality. In Proceedings of the 2014 Eighth International Conference On Complex, Intelligent And Software Intensive Systems, Birmingham, UK, 2–4 July 2014; pp. 242–248. [Google Scholar]
Wigdahl, J.; Heileman, G.; Slim, A.; Abdallah, C. Curricular efficiency: What role does it play in student success? Presented at the American Society For Engineering Education Annual Conference & Exposition, Indianapolis, IN, USA, 15–18 June 2014; Available online: https://peer.Asee.Org/20235 (accessed on 20 June 2025).
Yang, B.; Gharebhaygloo, M.; Rondi, H.; Hortis, E.; Lostalo, E.; Huang, X.; Ercal, G. Comparative analysis of course prerequisite networks for five Midwestern public institutions. Appl. Netw. Sci. 2024, 9, 25. [Google Scholar] [CrossRef]
Willcox, K.; Huang, L. Network models for mapping educational data. Des. Sci. 2017, 3, e18.
Rondi, H. Analyzing Course Prerequisite Networks: A Graph-Theoretic Approach to Student Progression. Master’s Thesis, Southern Illinois University at Edwardsville, Edwardsville, IL, USA, 2025.
Costa, J.; Bernardini, F.; Artigas, D.; Viterbo, J. Mining direct acyclic graphs to find frequent substructures—An experimental analysis on educational data. Inf. Sci. 2019, 482, 266–278.
Molontay, R.; Horváth, N.; Bergmann, J.; Szekrényes, D.; Szabó, M. Characterizing curriculum prerequisite networks by a student flow approach. IEEE Trans. Learn. Technol. 2020, 13, 491–501.
Raji, M.; Duggan, J.; DeCotes, B.; Huang, J.; Vander Zanden, B. Modeling and visualizing student flow. IEEE Trans. Big Data 2018, 7, 510–523.
Slim, A.; Heileman, G.; Kozlick, J.; Abdallah, C. Predicting student success based on prior performance. In Proceedings of the 2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Orlando, FL, USA, 9–12 December 2014; pp. 410–415.
Huang, Q.; Zeng, Y. Improving academic performance predictions with dual graph neural networks. Complex Intell. Syst. 2024, 10, 3557–3575.
Karimi, H.; Derr, T.; Huang, J.; Tang, J. Online academic course performance prediction using relational graph convolutional neural network. In Proceedings of the 13th International Educational Data Mining Society, Online, 10–13 July 2020.
Witsken, G.; Crk, I.; Gultepe, E. LLMs in the Classroom: Outcomes and Perceptions of Questions Written with the Aid of AI. Proc. AAAI Conf. Artif. Intell. 2025, 39, 27698–27705.
Onal, S.; Kulavuz-Onal, D. A cross-disciplinary examination of the instructional uses of ChatGPT in higher education. J. Educ. Technol. Syst. 2024, 52, 301–324.
Onal, S.; Kulavuz-Onal, D.; Childers, M. Patterns of ChatGPT Usage and Perceived Benefits on Academic Performance Across Disciplines: Insights from a Survey of Higher Education Students in the United States. J. Educ. Technol. Syst. 2025, 00472395251341214.
Yang, B.; Lei, Y.; Li, X.; Li, N.; Si, X.; Chen, C. A dynamic barycenter bridging network for federated transfer fault diagnosis in machine groups. Mech. Syst. Signal Process. 2025, 230, 112605.
Yang, B.; Lei, Y.; Li, N.; Li, X.; Si, X.; Chen, C. Balance recovery and collaborative adaptation approach for federated fault diagnosis of inconsistent machine groups. Knowl.-Based Syst. 2025, 317, 113480.
Baucks, F.; Wiskott, L. Simulating Policy Changes in Prerequisite-Free Curricula: A Supervised Data-Driven Approach. In Proceedings of the 15th International Conference on Educational Data Mining (EDM), Durham, UK, 24–27 July 2022.
Southern Illinois University Edwardsville Undergraduate Courses. Available online: https://www.siue.edu/academics/undergraduate/courses/ (accessed on 20 June 2025).
Hagberg, A.; Swart, P.; Schult, D. Exploring Network Structure, Dynamics, and Function Using NetworkX; Los Alamos National Laboratory (LANL): Los Alamos, NM, USA, 2008.
Bastian, M.; Heymann, S.; Jacomy, M. Gephi: An open source software for exploring and manipulating networks. Proc. Int. AAAI Conf. Web Soc. Media 2009, 3, 361–362.
Code, Data & Results. Available online: https://github.com/BonanYang/MDPI (accessed on 20 June 2025).

Figure 1. Graph construction pipeline.

Figure 2. Examples of input data.

Figure 3. Reach-Pass Rate.

Figure 4. BIOL paths.

Figure 5. CS paths.

Figure 6. ECE paths.

Figure 7. ME paths.

Figure 8. MATH prerequisite paths.

Figure 9. BIOL example.

Figure 10. CS example 1.

Figure 11. CS example 2.

Figure 12. ECE example.

Table 1. Dataset overview.

Major	Number of Courses	Capstone	Student Count (All)	Student Count (with Degree)
BIOL	89	BIOL497	268	501
CS	35	CS499	649	1106
ECE	47	ECE405	427	468
ME	49	ME484	732	842

Table 2. CPN overview.

Number of nodes in CPN	619
Number of nodes in CPN (Lcc)	457
Number of edges in CPN	807
Number of edges in CPN (Lcc)	766
Highest Betweenness Centrality Course	CHEM121A, BIOL220
Highest Out-degree Course	BIOL220, MATH150
Highest Reachability Course	MATH120, MATH125, MATH150

Lcc: Largest connected component.
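
As a rough illustration of how the quantities in Table 2 can be read, the sketch below computes node and edge counts, the largest connected component, out-degree, and reachability on a small hypothetical prerequisite graph. The edge list is invented for illustration and is not the institutional data. Two definitions are assumptions rather than statements from the article: the largest connected component ignores edge direction, and reachability is the number of courses that can be reached downstream of a course.

```python
from collections import defaultdict, deque

# Hypothetical toy edges (prerequisite -> successor); NOT the study's data.
edges = [
    ("MATH120", "MATH125"),
    ("MATH125", "MATH150"),
    ("MATH150", "MATH152"),
    ("MATH150", "CS150"),
    ("CS140", "CS150"),
    ("CS150", "CS234"),
    ("CHEM121A", "CHEM121B"),
]

succ = defaultdict(set)        # directed adjacency: prerequisite -> successors
undirected = defaultdict(set)  # same edges with direction ignored
nodes = set()
for pre, post in edges:
    succ[pre].add(post)
    undirected[pre].add(post)
    undirected[post].add(pre)
    nodes.update((pre, post))


def reachable(start, adjacency):
    """Breadth-first search: all nodes reachable from start, excluding start."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Assumption: the Lcc is the largest component when direction is ignored.
lcc = max((reachable(n, undirected) | {n} for n in nodes), key=len)
lcc_edges = [(a, b) for a, b in edges if a in lcc and b in lcc]

# Out-degree: number of courses that list this course as a direct prerequisite.
out_degree = {n: len(succ.get(n, ())) for n in nodes}
# Assumption: reachability counts all downstream courses, direct or indirect.
reachability = {n: len(reachable(n, succ)) for n in nodes}


def top(scores):
    """Return the course(s) with the highest score, sorted by name."""
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s == best)


print(len(nodes), len(edges), len(lcc), len(lcc_edges))
print(top(out_degree), top(reachability))
```

NetworkX, which appears in the reference list, offers equivalents such as `nx.weakly_connected_components`, `nx.descendants`, and `nx.betweenness_centrality`; the plain-Python version here only makes the definitions explicit.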

Table 3. Subgraph Overview.

Major	Number of Nodes in MIG_R	Number of Nodes in MIG_C
BIOL	99	15
CS	33	17
ECE	59	19
ME	57	18

Table 4. Betweenness centrality in MIG_R.

BIOL	CS	ECE	ME
BIOL220	CS150	ECE211	CE240
CHEM121A	CS140	ECE351	CE242
BIOL151	CS340	MATH150	MATH150
CHEM121B	CS286	ECE340	MATH152

Table 5. Betweenness centrality in MIG_C.

BIOL	CS	ECE	ME
CHEM121A	CS150	ECE351	CE240
BIOL220	CS140	ECE211	MATH152
CHEM121B	CS340	MATH150	MATH150
BIOL151	CS234	ECE282	PHYS141

Table 6. Out-degree in MIG_R.

BIOL	CS	ECE	ME
BIOL220 (26)	MATH150, CS340 (5)	ECE351 (10)	ME315, CE242 (6)
BIOL319 (10)	CS150, MATH125 (4)	ECE282, MATH305 (6)	MATH150, ME310, ME262, ME356, ME410, MATH305, ME380, ME370 (4)
BIOL151, BIOL365 (7)	CS286, CS234, MATH224, MATH152, MATH120 (3)	MATH150, MATH152 (5)	MATH152, MATH250, ME354 (3)
CHEM121A, CHEM241A, BIOL150 (5)	*	PHYS151, PHYS141, ECE326 (4)	*

*: Courses with out-degree less than 2

Table 7. Out-degree in MIG_C.

BIOL	CS	ECE	ME
CHEM121A, MATH125 (3)	CS150, MATH150, MATH152 (3)	ECE351, MATH150 (4)	MATH152 (3)
CHEM121B, BIOL220 (2)	CS140, MATH120 (2)	MATH125 (3)	CE240, CE242, MATH150, MATH250, ME354 (2)
*	*	ECE282, MATH120, MATH250, MATH305 (2)	*

*: Courses with out-degree less than 2

Table 8. Influence of math prerequisites.

	BIOL	CS	ECE	ME
From MATH120	8.84	11.35	12.01	11.82
From MATH125	8.28	10.22	11.29	10.50
From MATH150	7.92	9.34	9.66	9.47
From MATH152	N/A	N/A	9.44	8.77
From MATH250	N/A	N/A	8	N/A
From MATH305	N/A	N/A	7	7

Cases where the gap is greater than one semester are highlighted.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
