Backpack Process Model (BPPM): A Process Mining Approach for Curricular Analytics

Curricular analytics is the area of learning analytics that looks for insights and evidence on the relationship between curricular elements and the degree of achievement of curricular outcomes. For higher education institutions, curricular analytics can be useful for identifying the strengths and weaknesses of the curricula and for justifying changes in learning pathways for students. This work presents the study of curricular trajectories as processes (i.e., sequences of events) using process mining techniques. Specifically, the Backpack Process Model (BPPM) is defined as a novel model to unveil student trajectories, not by the courses that they take, but according to the courses that they have failed and have yet to pass. The usefulness of the proposed model is validated through the analysis of the curricular trajectories of N = 4466 engineering students considering the first courses in their program. We found differences between backpack trajectories that resulted in retention or in dropout; specific courses in the backpack and a larger initial backpack size were associated with a higher proportion of dropout. BPPM can contribute to understanding how students handle failed courses they must retake, providing information that could contribute to designing and implementing timely interventions in higher education institutions.


Introduction
In the last decade, different techniques have progressively emerged for the analysis of data recorded by information systems, with the purpose of supporting informed decision-making in Higher Education Institutions (HEIs) [1]. In this context, Learning Analytics (LA) is the "measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs" [1]. Many HEIs have high hopes that LA can play an important role in supporting institutional processes, where data analysis can improve teaching [2], curricula [3] and learning outcomes [2] and reduce dropouts [2]. Curricular Analytics (CA) emerged as an area of LA that focuses on analyzing curricula and improving them through continuous improvement processes [4]. For this purpose, analytical tools are used to collect different types of educational data, e.g., data on curricular structure and/or data on course grading [5]. This allows the HEIs to analyze the strengths

The Backpack Metaphor
Recent research has highlighted the important role that student effort plays in academic success [19]. High school readiness [20], economic disadvantage [20], classroom climate [21] and curriculum design [22] had previously been identified as factors that explain dropout. However, self-efficacy also plays a key role in academic success.
Self-efficacy is defined as a person's belief in their ability to succeed in a specific situation [23] and has recently been linked to the effort that students are willing to make [19]. In a qualitative case study, Meyer and Marx [24] found that loss of confidence due to poor performance contributed to engineering attrition. Those students who believe that intelligence is fixed and cannot be developed tend to be less tolerant of failure [19]. On the contrary, those who believe that it can be developed, when faced with an adverse situation, try harder, develop active learning strategies and, even when their curricular progress is not optimal, persist [19].
Based on those previous works about the relevance of self-efficacy, this paper defines the backpack metaphor. A metaphor is "a mechanism of analogy in which we conceive a concept that belongs to a certain conceptual domain in terms of another conceptual domain, and in which correspondences between the attributes of both domains are established" [25]. Metaphors are commonly used to communicate ideas in technical disciplines, facilitating interdisciplinary work. The backpack metaphor is defined as follows: The list of failed courses that a student must retake can be represented as stones that the student puts in a backpack. Each time a student fails a course, a new stone is placed in their backpack, which remains there until the course is passed. Carrying many stones in the backpack could awaken in the student the need to empty it as soon as possible, making risky decisions from the curricular point of view [26]. On the other hand, never being able to empty the backpack (even if the failed courses changed), could affect self-efficacy, have serious consequences in the medium term, affect the students' goals or even result in program dropout.
Traditionally, the analysis of curricular trajectories has been based on the progress of each student in the curriculum [8,27]. Even though this strategy has the advantage that the model is obtained directly from the data and is easy to understand, it does not represent the psychological burden students perceive when they have failed courses that later on they must retake. Understanding how students manage their course backpack could help to better understand their decisions regarding course enrollment, persistence and dropout.

Related Work in Process Mining
Process Mining (PM) is a relatively new research discipline that acts as a bridge between data science and process science [28]. It aims to extract knowledge from the event logs obtained from information systems, in order to discover process models, verify conformance, analyze bottlenecks, compare variants of the same process and suggest improvements [28]. This discipline has been applied in multiple domains, with particular success achieved in fields in which processes are insufficiently structured, such as healthcare [29] and education [16].
Different techniques have been used to analyze the diverse perspectives of the processes, such as control flow, performance and organizational [30]. For the control flow, which is the focus of this work, Petri nets [30,31], causal nets [31] and process trees [32], among other notations, have been used to represent process models, and algorithms have been proposed that can generate them (e.g., alpha for Petri nets [28] and Inductive Miner for process trees [28]). Directly Follows Graphs (DFGs), which can be obtained by DFG-based algorithms such as heuristic miner [28], are also widely used because they are one of the easiest notations for non-expert users of process mining to interpret [31]. While DFGs have a disadvantage compared with the other formalisms mentioned, because they do not manage concurrency properly (i.e., when several events occur simultaneously), they are especially recommended when the representation of concurrency is not necessary.
To guide the application of PM and analyze the results [33], different methodologies have been developed. Well-known generic methodologies are the L* life cycle [28] and PM² [33]. Both methodologies have a broad scope, covering the entire process management cycle [34]. Different authors have proposed domain-specific methodologies that take into account the particularities of each domain. For education, Maldonado-Mahauad et al. [35] adapted PM², narrowing its scope from data extraction to model analysis. Johnson et al. [36] extended PM² to include domain-specific requirements in healthcare, in terms of ethics and participation of domain experts, among other things. Martin et al. [29] established that a process mining methodology in healthcare should highlight usability, the building of domain-specific event logs and the management of unstructured data. In manufacturing, Lorenz et al. [37] proposed a methodology with a scope that included the improvement of the processes.
In recent years, research has been conducted to visualize event logs through domain models or even theoretical frameworks that can be used to represent and analyze the data [38]. The inclusion of shift work operation to model the organizational perspective of processes [30], the construction of a model representing the added value in service processes [39], the modeling of user behavior in MOOCs to identify self-regulated learning strategies [35] and the analysis of dropout behavior through the investment model [40], are examples of that. In this work, we propose to use the backpack metaphor to conceptualize students' curricular trajectories as a novel approach to understand how they manage their failed courses.
While any general-domain process mining methodology could have been used for this work, the one proposed by Maldonado-Mahauad et al. [35] was chosen because it is based on PM², which is widely used, and its scope goes only from data extraction to model analysis. The main contribution of this work is not the sequence of stages in the methodology, but the approach used to understand the curricular trajectories through the backpack metaphor.

The Backpack Process Model (BPPM) Approach
This section proposes the Backpack Process Model (BPPM) approach, in order to systematize the analysis of curricular trajectories using PM techniques, based on the backpack metaphor. This model represents the curricular trajectories of the students as a sequence of failed courses that they must retake; that is to say, as a sequence of backpacks. This sequence of backpacks is represented as a Directly Follows Graph (DFG), one of the most popular and widespread process modeling notations [31]. A DFG is a graph with nodes and transitions (directed edges) that correspond to directly follows relationships (see Figure 1) [31]. In the BPPM, each node represents the group of failed courses that the student must retake and each edge represents the transition between a given backpack and the next one. Table 1 shows the backpack trajectories for two students, namely 23 and 24. In this example, student 23 failed algebra (A) and chemistry (Q) in the first semester, beginning the following semester with both courses in their backpack. This situation is labeled "AQ" in Table 1. In the second semester, this student passed chemistry (Q) but failed algebra (A) again, keeping it in their backpack. Finally, this student passed algebra (A) and continued studying with an empty backpack. This situation is tagged with "RETENTION" in the event log. In contrast, student 24 failed chemistry (Q), kept it in the backpack for one semester and dropped out the next semester. This situation is tagged with "DROPOUT" in the event log. Figure 1a represents the DFG for the backpack trajectories illustrated in Table 1. Additionally, our analysis considers a derivation of this model, called BPPM-S (BPPM, grouped by size), where curricular trajectories are represented as a sequence of backpack sizes for each academic period. Figure 1b represents the DFG for the backpack trajectories included in Table 1, according to the BPPM-S.
In Figure 1b, BP-1 and BP-2 represent the backpack size at the end of a semester; that is, the number of courses students failed and have not yet passed. BP-1 indicates a backpack size of 1, while BP-2 indicates a backpack size of 2. More details on the meaning of the nodes and transitions shown in Figure 1 are given later, in the event log generation subsection.
In order to develop the previously proposed analysis models from the curricular records of an HEI, it is necessary to apply a PM methodology. An appropriate methodology, such as the one presented below, makes it possible to apply PM techniques and to understand the results [33]. This methodology, as can be seen in Figure 2, defines the following four stages: data extraction, event log generation, discovery and analysis.

Appl. Sci. 2021, 11, x FOR PEER REVIEW 5 of 20

Data Extraction
In the first stage of the methodology, the minimum necessary data are extracted and used for an exploratory analysis. Since the final objective is the curricular analysis, the BPPM and BPPM-S models define a table, called BPPMt, that contains a record of each course taken by each student, as well as their final grade (see Table 2). In Table 2, each row of the BPPMt table is represented by the tuple (s, p, c, g, d), where:
- s indicates the ID of the student who took the course;
- p the academic period when the course was taken;
- c the identifier of the course taken;
- g the final grade obtained;
- d the end date of the academic period.
For example, (23, 2013-3, Algebra, 6.5, 1 February 2014) indicates that the student with ID 23 took the algebra course in the period 2013-3, which concluded on 1 February 2014, obtaining a grade of 6.5 out of 7. The data necessary for the creation of the BPPMt table are available in the ERP systems commonly used by HEIs. It must be noted that, although the BPPMt table specifies the minimum fields as a common standardization for the next stage of the methodology, the table can be extended with additional data, such as the student's weighted GPA or current academic status.
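As a purely illustrative aid, one BPPMt row could be held in a small record type. The sketch below is written in Python rather than the R tooling used in this work; the class name BPPMtRow and the field types are assumptions, and only the tuple (s, p, c, g, d) and the example values come from the text above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BPPMtRow:
    """One BPPMt row (s, p, c, g, d); class name and types are hypothetical."""
    s: int    # ID of the student who took the course
    p: str    # academic period when the course was taken
    c: str    # identifier of the course taken
    g: float  # final grade obtained
    d: date   # end date of the academic period


# The example row given in the text.
example = BPPMtRow(s=23, p="2013-3", c="Algebra", g=6.5, d=date(2014, 2, 1))
print(example)
```

Additional columns, such as a weighted GPA, could be added as further fields without affecting the five minimum ones.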

Event Log Generation
In this stage, the data must be modeled in the form of an event log [28] (i.e., a record of the events that have happened in a process). Formally, an event log is defined as a set of cases (executions of the process), where each case is an ordered sequence of events (the actions that occurred in that execution) [28]. Therefore, in order to define an event log, it is necessary to define (1) how to identify a case and (2) how to specify a sequence of events.
A classic first event log option for curricular data consists of defining each case as a student and each event as a course taken by that student, in the order in which he/she has taken it. For example, <A, Q, C, D, A, Q, A> would define the trajectory of the student with s = 23, according to Table 2.
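A minimal sketch of this classic course-based event log is given below, in Python. The function name course_traces, the row layout (s, p, c, d) and the periods and dates in the sample data are assumptions chosen only so that the resulting trace has the same shape as the one quoted above; they are not taken from Table 2.

```python
from collections import defaultdict
from datetime import date


def course_traces(rows):
    """Group (s, p, c, d) rows into one ordered list of courses per student.

    Rows are ordered by the end date d of the academic period; Python's sort
    is stable, so courses within the same period keep their input order.
    """
    traces = defaultdict(list)
    for s, _p, c, _d in sorted(rows, key=lambda r: r[3]):
        traces[s].append(c)
    return dict(traces)


# Hypothetical rows for one student (periods and dates are made up).
rows = [
    (23, "P1", "A", date(2013, 7, 31)),
    (23, "P1", "Q", date(2013, 7, 31)),
    (23, "P1", "C", date(2013, 7, 31)),
    (23, "P1", "D", date(2013, 7, 31)),
    (23, "P2", "A", date(2014, 2, 1)),
    (23, "P2", "Q", date(2014, 2, 1)),
    (23, "P3", "A", date(2014, 7, 31)),
]
traces = course_traces(rows)
print(traces[23])
```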
However, as mentioned above, our BPPM model proposes a different perspective on curricular data, where each event in a student's trajectory is the backpack they have at the end of each academic term. Formally, (1) each student identifier in the BPPMt table identifies a different case <b_1, b_2, ..., b_n>; and (2) an event b_i is defined as the record of the group of failed courses that the student must retake at the end of academic period i. For example, for the student identified with s = 23 in Table 2, <AQ, A, -> represents their backpack trajectory, which is shown graphically in Figure 3. In their first academic period (for example, a semester), this student failed algebra (A) and chemistry (Q) and therefore must retake them. In their second period, the student must still retake algebra (either because the student took it and failed it again or because the student decided not to take the course in the second period) and finally, after the third academic period, the student does not have courses that should be retaken.

Figure 3. Backpack trajectory of the student with s = 23 in Table 2, according to the BPPM model.
For simplicity, the above abuses notation: the set {A, Q} is represented by the label AQ and the empty set by the label -. Similarly, an ordering between course identifiers is assumed, such that two equal sets are represented by the same label: {A, Q} = {Q, A} = AQ.
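The construction of backpack events can be sketched as follows, in Python. This is only an illustration under stated assumptions: the function name backpack_trace, the pass mark of 4.0, and all grades, periods and dates in the sample data are hypothetical, and the sketch only emits an event for periods in which the student has at least one record.

```python
from datetime import date

PASS_MARK = 4.0  # assumption: the minimum passing grade is not stated in the text


def backpack_trace(rows, pass_mark=PASS_MARK):
    """Return [(label, period_end_date), ...] for ONE student.

    rows are (p, c, g, d) tuples. The backpack is the set of courses failed
    and not yet passed; its label is the sorted concatenation of the course
    identifiers, or "-" for the empty backpack.
    """
    by_period = {}
    for p, c, g, d in rows:
        by_period.setdefault((d, p), []).append((c, g))

    backpack = set()
    trace = []
    for d, p in sorted(by_period):          # chronological order
        for c, g in by_period[(d, p)]:
            if g >= pass_mark:
                backpack.discard(c)         # passed: the stone is removed
            else:
                backpack.add(c)             # failed: the stone is added or kept
        label = "".join(sorted(backpack)) or "-"
        trace.append((label, d))
    return trace


# Hypothetical student: fails A and Q, then passes Q, then passes A.
rows = [
    ("P1", "A", 3.0, date(2013, 7, 31)),
    ("P1", "Q", 3.5, date(2013, 7, 31)),
    ("P1", "C", 5.0, date(2013, 7, 31)),
    ("P1", "D", 6.0, date(2013, 7, 31)),
    ("P2", "A", 3.8, date(2014, 2, 1)),
    ("P2", "Q", 4.5, date(2014, 2, 1)),
    ("P3", "A", 5.0, date(2014, 7, 31)),
]
labels = [label for label, _ in backpack_trace(rows)]
print(labels)
```

Sorting the course identifiers before joining them is what makes {A, Q} and {Q, A} map to the same label.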
This definition of the event log allows us to analyze the backpack trajectories. However, a student who keeps the same backpack for two academic periods will be represented by two consecutive equal events, where the first event ends in the first period and the second in the second period. To analyze how long the student maintains the same backpack, it needs to be represented as a single event, which begins at the end of the academic period in which this backpack appears and ends in the period in which the backpack changes. The BPPM model therefore post-processes the event log defined above to merge consecutive events that represent the same backpack. That is, a case <..., b_i, b_{i+1}, ..., b_{i+n}, ...> where b_i = b_{i+1} = ... = b_{i+n} will result in a case <..., b_{i:i+n}, ...>, where backpack b_{i:i+n} ranges from period i to period i + n.
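The merging step can be illustrated with a short Python sketch. The function name merge_consecutive and the use of integer period indices are assumptions made for the example; the sample trace is hypothetical.

```python
from itertools import groupby


def merge_consecutive(trace):
    """Merge runs of equal backpacks.

    trace is a list of (label, period_index) pairs in chronological order.
    Each run of equal labels becomes one (label, first_period, last_period).
    """
    merged = []
    for label, run in groupby(trace, key=lambda event: event[0]):
        periods = [period for _, period in run]
        merged.append((label, periods[0], periods[-1]))
    return merged


# Hypothetical case: backpack A is kept for periods 2 and 3.
trace = [("AQ", 1), ("A", 2), ("A", 3), ("-", 4)]
merged = merge_consecutive(trace)
print(merged)
```

Note that groupby only merges adjacent equal labels, so a backpack that reappears later in the trajectory remains a separate event.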
To easily distinguish between cases that ended in dropout or retention, for the backpack trajectories that ended with the empty backpack, the "-" label was replaced by the label "RETENTION" in the BPPM model; in the other cases, a "DROPOUT" event was added at the end of the case. In the example used above, <AQ, A, -> was replaced by <AQ, A, RETENTION> in the BPPM event log.
Finally, to obtain the event log for the BPPM-S model, each b_{i:i+n} event was labeled with BP-j, where j represents the size of the backpack. For example, the case represented by <AQ, Q, RETENTION> was replaced by <BP-2, BP-1, RETENTION> in the BPPM-S event log.
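The two labeling steps described above could look as follows in Python. The function names close_case and to_size_labels are hypothetical, and the size computation assumes one-character course identifiers (as in A, C, D, Q), so that the length of a label equals the number of courses in the backpack.

```python
END_LABELS = ("RETENTION", "DROPOUT")


def close_case(labels):
    """Replace a final empty backpack by RETENTION, else append DROPOUT."""
    if labels and labels[-1] == "-":
        return labels[:-1] + ["RETENTION"]
    return labels + ["DROPOUT"]


def to_size_labels(labels):
    """Map each backpack label to BP-j, where j is the backpack size.

    Assumes one-character course identifiers; "-" is the empty backpack.
    """
    result = []
    for label in labels:
        if label in END_LABELS:
            result.append(label)
        else:
            size = 0 if label == "-" else len(label)
            result.append(f"BP-{size}")
    return result


bppm_case = close_case(["AQ", "Q", "-"])
bppm_s_case = to_size_labels(bppm_case)
dropout_case = close_case(["Q"])
print(bppm_case, bppm_s_case, dropout_case)
```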

Discovery
In this stage, the event log generated is processed using process mining algorithms. Specifically, process discovery algorithms are used to automatically generate a model of the curricular trajectories.
In this work, the BPPM and BPPM-S models use Directly Follows Graphs (DFGs), which are built using the DFG algorithm [31]. A wide range of tools implement the DFG algorithm, both academic (e.g., PM4Py [41], bupaR [42], ProM [28]) and commercial (e.g., Disco [43], Celonis [44]). In this case, the model discovery stage was performed using bupaR, an integrated collection of R packages that creates a framework for the reproducible analysis of processes in R [42].
The DFG notation was chosen because it is one of the easiest for non-expert users of process mining to interpret [31]. Moreover, the DFG notation is especially recommended when the representation of concurrency (i.e., several events occurring simultaneously) is not necessary, as in our case, where each event corresponds to a different, disjoint period.
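To make the idea of a directly follows relationship concrete, the sketch below counts, in plain Python, how often one event label is immediately followed by another across all cases. It is not the bupaR implementation used in this work; the function name directly_follows is an assumption, and the two sample cases follow the trajectories described for students 23 and 24 in the BPPM approach section.

```python
from collections import Counter


def directly_follows(cases):
    """Count directly-follows pairs (a, b) over a list of traces."""
    edges = Counter()
    for trace in cases:
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges


cases = [
    ["AQ", "A", "RETENTION"],  # student 23 in the example of Table 1
    ["Q", "DROPOUT"],          # student 24 in the example of Table 1
]
edges = directly_follows(cases)
for (a, b), n in sorted(edges.items()):
    print(f"{a} -> {b}: {n}")
```

Each distinct label becomes a node of the graph and each counted pair becomes a directed edge weighted by its frequency.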

Analysis
In this final stage, the model analysis was also performed with bupaR [42], considering different perspectives, which are described in more detail in Table 3, including the selection of node types, the selection of transition types and applied filters. For the BPPM model, the backpack trajectories that ended in DROPOUT or RETENTION were compared, and the backpack trajectories that include the three most frequent backpacks were analyzed. For the BPPM-S model, the trajectories in relation to the backpack size that ended in DROPOUT and RETENTION were compared, as well as the backpack trajectories that start with different sizes of backpack.

Table 3. Filters and properties applied to the event logs to perform each analysis. Columns: Model, Perspective, Node Type, Transition Type, Filters, Figure.

Figure 4. The graph considers only the most frequent variants for each model, corresponding to 80% of the students. The RETENTION columns consider only students who had a backpack and emptied it (1383 cases, seven more frequent variants). The DROPOUT columns consider only students who had a backpack, were not able to empty it and finally dropped out (199 cases, 13 more frequent variants).

Figure 5. Average time (in days) that the students stay in each backpack. The graph considers only the most frequent variants for each model, corresponding to 80% of the students. The RETENTION columns consider only students who had a backpack and emptied it. The DROPOUT columns consider only students who had a backpack, were not able to empty it and finally dropped out.

In the DFG figures, the darker color of the nodes represents a higher percentage of students who went through a state. The thickness of the arrows represents the percentage of students who had transitions between both states. All values are percentages in relation to the total number of students included in each model.

Application Case: First Engineering Courses
This section illustrates the usefulness of the BPPM and BPPM-S models through a real application case that analyzes the backpack trajectories for N = 4466 engineering students from a Latin American university. Specifically, the trajectories they followed to take the first four courses of the curriculum were analyzed. The courses are calculus (C), algebra (A), chemistry (Q) and innovation (D). All four courses are automatically enrolled at the beginning of the first semester. The N = 4466 correspond to the students of the 2013 to 2019 cohorts, who passed the four courses or dropped out after having failed any of these courses. Specifically, the following three perspectives are analyzed: (P1) BPPM trajectories, ending either in retention (at the undergraduate program) or in dropout; (P2) most frequent backpacks; and (P3) the size of the backpack.
(P1) BPPM trajectories ending either in retention or in dropout
The BPPM model allows us to compare the backpack trajectories between dropout students and those who remained. In particular, the differences can be seen in the distribution of the variants, the relative frequency of each backpack, the elapsed time in the entire trajectory and the average time students spend with each backpack.
Table 4 shows three groups of trajectories. The first of these corresponds to those that include only the empty backpack (No BP). This is the most frequent variant (2504 cases) and also the shortest (a single event). The second group corresponds to those students who, having failed one or more courses, manage to empty the backpack and remain in the study program. The third group corresponds to those students who, having failed one or more courses, drop out of the study program without having managed to empty the backpack. Table 4 also shows that, for the trajectories that include backpacks, both the students who drop out and those who remain show high variability (51 and 40 variants, respectively). Likewise, the number of backpack events is similar between students who remain and those who drop out. However, the average time that students who drop out stay with a non-empty backpack is significantly lower (p < 0.01).
Figure 4 shows that the first backpacks of students who remain in their undergraduate programs are more evenly distributed than the first backpacks of students who dropped out, where close to half (50.25% of cases) began with backpack ACQ. An institution seeking to reduce the early dropout risk could use this information, for example, to implement support mechanisms for students who have failed certain courses simultaneously or to change the design of its curricula to prevent certain combinations of failed courses from occurring at high frequencies.
When the average time that the students stay with each backpack is compared, it can be seen that dropout students stay with each backpack for less time, on average, than students who remain. Figure 5 shows that the average time that students who ended in RETENTION stay with each backpack varies from 166.35 to 275.1 days. In contrast, Figure 5 also shows that the average time that students who ended in DROPOUT stay with each backpack varies from 0 to 129.6 days. In particular, the backpacks with an average duration of much less than one semester (A, AC, ACDQ, ADQ, CQ, Q) show that a significant proportion of students who fall into this situation drop out without even retaking such courses or attempting to pass them.
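A minimal Python sketch of how such average durations could be computed from merged backpack events is given below. The function name mean_days_per_backpack, the event layout (label, start date, end date) and all dates are assumptions for illustration; the values produced are not results of the study.

```python
from collections import defaultdict
from datetime import date


def mean_days_per_backpack(events):
    """Average duration in days of each backpack label.

    events is a list of (label, start, end), where start is the end date of
    the period in which the backpack appears and end is the end date of the
    period in which it changes (or the same date for an immediate dropout).
    """
    total_days = defaultdict(int)
    count = defaultdict(int)
    for label, start, end in events:
        total_days[label] += (end - start).days
        count[label] += 1
    return {label: total_days[label] / count[label] for label in total_days}


# Hypothetical merged events from several students.
events = [
    ("A", date(2014, 2, 1), date(2014, 7, 31)),   # kept for one period
    ("A", date(2014, 2, 1), date(2014, 2, 1)),    # dropped out without retaking
    ("AQ", date(2013, 7, 31), date(2014, 2, 1)),
]
means = mean_days_per_backpack(events)
print(means)
```

A duration of zero days corresponds to a student who leaves the program directly from that backpack, which pulls the average for that backpack down.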
(P2) Most frequent backpacks
The BPPM model also allows us to compare the backpack trajectories that include specific backpacks. For the three most frequent backpacks, Q (464 cases), ACQ (458 cases) and A (456 cases), Figure 6 shows the proportion of educational trajectories, according to the BPPM model, that ended either in RETENTION or in DROPOUT. While there are differences in the proportion of students who dropped out for each backpack, in all cases the majority of students remained. Next, a more fine-grained analysis is presented to illustrate the differences in the educational trajectories that include each backpack.
Figure 7 shows the 90% most frequent variants of the students who had backpacks A, ACQ and Q. Figure 7a shows that the vast majority (94.56%) of the students had backpack A as the first and only backpack in their trajectories. Figure 7b shows that the backpack ACQ was the first backpack for all of these students. Most of those who dropped out (62 of 88) had direct transitions from ACQ to DROPOUT. In contrast, the majority of students who remained emptied their backpack after several stages. The institution could then encourage students who have the ACQ backpack not to take such failed courses simultaneously by suggesting a certain sequence. In this case, the vast majority of students who defer the Q course remain.
Figure 7c shows that most students did not have the backpack Q as the first backpack, but had it after passing one or more courses they had previously failed. This behavior could reflect a form of prioritization by students who have multiple courses in their backpacks, who postpone taking course Q. Furthermore, only a small minority of the students who had this backpack ended in DROPOUT. This reinforces the idea that students who remain, and have failed a course, have given higher priority to courses other than Q. The institution should then analyze the possibility of postponing this course in the study plan.
(P3) Size of the backpack
The BPPM-S model allows us to compare the educational trajectories across different backpack sizes. This study illustrates the comparison between backpack trajectories for students who drop out and students who remain, as well as the comparison between those that start with different backpack sizes. Figure 8a shows the backpack trajectories for students who manage to empty their backpack. They mostly start with a backpack size of 1 or 2, and most students who start with a larger backpack reduce its size before emptying it, going through BP-1. In addition, the average time that students spend with a given backpack size is longer than one semester (150 days, approximately). Figure 8b shows that most backpack trajectories for dropout students start with a larger backpack size. Moreover, for backpack sizes larger than 1, there are mainly direct transitions from the nodes to dropout. Figure 9 shows the proportion of students who dropped out or remained, grouped according to the initial backpack size. Most students whose initial backpack size is less than 4 (BP-1, BP-2 and BP-3) emptied their backpack and remained in their undergraduate programs: 96.57% of students who started in BP-1, 92.63% of students who started in BP-2 and 74.69% of students who started in BP-3 ended in RETENTION. In contrast, only 35.21% of students who started in BP-4 ended in RETENTION. Figure 10 shows that, in all cases, the majority of students who dropped out have direct transitions from the initial backpack to dropout. The above shows the importance of defining support strategies for students who simultaneously fail several courses, as well as the need to review the curriculum, evaluating the placement of several courses with high failure rates in the same semester.
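The grouping by initial backpack size can be sketched in Python as follows. The function name retention_by_initial_size is hypothetical, and the four cases are toy data chosen only to exercise the code; they do not reproduce the percentages reported above.

```python
from collections import defaultdict


def retention_by_initial_size(cases):
    """Share of cases ending in RETENTION, grouped by the first BPPM-S label."""
    totals = defaultdict(int)
    retained = defaultdict(int)
    for trace in cases:
        first, last = trace[0], trace[-1]
        totals[first] += 1
        if last == "RETENTION":
            retained[first] += 1
    return {size: retained[size] / totals[size] for size in totals}


# Toy BPPM-S cases (not study data).
cases = [
    ["BP-1", "RETENTION"],
    ["BP-1", "DROPOUT"],
    ["BP-2", "BP-1", "RETENTION"],
    ["BP-4", "DROPOUT"],
]
shares = retention_by_initial_size(cases)
print(shares)
```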

Discussion
This paper presented the Backpack Process Model (BPPM), a model that systematizes the analysis of curricular trajectories using the backpack metaphor through PM techniques. Its purpose is to represent the psychological burden that students perceive while they have failed courses that they must retake. This model offers a new alternative to analyze curricular trajectories and contributes to understanding why a student remains in or drops out of their undergraduate program. This model can help to carry out timely interventions that allow the retention of students at risk of dropping out. We believe that this approach is relevant for an international audience given that participation in higher education has grown in many countries [45], with HEIs receiving more heterogeneous students in terms of prior preparation, socioeconomic background and beliefs about learning. Currently, students show more complex enrollment patterns [46] and more variability in their academic results [8]. In these contexts, curricular analytics models that are based on recent research and go beyond the analysis of curricular records can be useful.
We highlight the main findings in the application case. First, BPPM shows differences between the backpack trajectories that ended in retention or in dropout. Almost half of the students who ended up dropping out started with the ACQ backpack, and most of them did not empty their backpacks before dropping out. On the other hand, most of the students who partially emptied their backpacks in more than one stage, even over several semesters, remained. According to Stump, Husman & Corby [47], this difference in behavior could be explained by students' beliefs about the nature of intelligence and whether it can be developed or not. Students with incremental beliefs about intelligence and self-efficacy may try harder in higher education [19]. Those with less successful initial trajectories who nevertheless remain and eventually finish their undergraduate programs are termed struggling persisters [48], and they have received more attention as the proportion of less prepared students has increased in higher education [45].
Second, BPPM-S shows differences in the proportion of students who dropped out or remained, depending on the initial size of the backpack. A larger initial backpack size was associated with a higher proportion of dropouts and also with a higher proportion of direct transitions to dropout. In the case of engineering, the competitive culture and the nature of the first courses as contributors to a process of "natural selection" [49] have been used before to explain this phenomenon. The HEI in which this application case was conducted is highly selective [50], and previous self-efficacy beliefs are expected to influence the decision to stay even when the initial results are not satisfactory. This could explain why only students with the largest backpack size dropped out at a higher rate. According to Snyder et al. [19], positive beliefs about effort were moderately associated with success in the first semester in engineering, but the association between beliefs about effort and the decision to stay was found to be stronger.
Descriptive statistics on BPPM have shown interesting findings related to backpack size and the frequency of each backpack type, as well as their relationship with the students' decision to drop out or remain. Nevertheless, the expressive power of BPPM goes further, providing insights into the dynamic behavior of students when managing their backpacks. The sequence followed by students who drop out or remain when emptying their backpacks, as well as the time it takes them to do so, are good examples. PM tools, combined with domain models, provide a powerful instrument to obtain a deeper understanding of the dynamic behavior of students [30,35].
The findings in the application case should not be taken as general conclusions, but as examples of the expressive power of the BPPM and BPPM-S models.
This application case has two main limitations, and the conclusions drawn from it should take them into account. First, the conclusions derived from the BPPM, and from process mining in general, depend largely on the accuracy and completeness of the information used [51]. The BPPM uses data extracted from the curriculum dimension, so to obtain a deep understanding of students' decisions, it should be used as a complement to other information sources. Second, the conclusions derived from the application case should not be considered general findings, because this study was carried out in a specific institution and time window and only first-semester courses were considered.

Conclusions
From the PM perspective, the contribution of BPPM to CA is twofold: Firstly, it systematizes the analysis of curricular trajectories based on the backpack metaphor, characterizing students with similar behaviors in similar contexts. BPPM can improve the understanding of how students handle failed courses they must retake in each study program and in which sequences students stay or drop out most often. Furthermore, the BPPM shows how it is possible to integrate methodologies for sequence analysis and give them a specific meaning in study contexts such as higher education. It is a way of complementing other study metrics to seek an understanding of the educational trajectories that lead to dropout [52].
Secondly, BPPM can help managers and policymakers because the analysis of educational data can help to design and implement timely interventions. Backpack monitoring could be implemented in HEIs to support counseling. While academic performance is a strong predictor of retention [53], students' beliefs about the usefulness of effort have a significant influence on academic performance [19]. Good counseling services use technology to identify students at risk [6,27], and BPPM could be used as a complement to identify these students. Additionally, understanding how students handle their backpacks could be used to redesign the curriculum, to reduce the risk of students accumulating a very large backpack and to improve student satisfaction with the curriculum. These kinds of decisions could help to reduce early dropout.
We believe that the BPPM and BPPM-S models could be used to analyze longer curricular trajectories that include the entire study plan. This analysis could help to understand the impact of failed courses that students must retake on late dropout and stop-out decisions. The application of trace clustering techniques, in combination with DFGs, could be useful to reduce the complexity of longer BPPM educational trajectories, decomposing traces into smaller and more understandable backpack trajectories. In this context, hierarchical clustering [54] looks promising for future work. Furthermore, qualitative analysis could expand the understanding of how students' beliefs about effort and the nature of intelligence [19] affect decisions about course taking, stop-out and dropout.