Temporal Modeling of LMS Logs and Zero-Shot LLM Prediction: A Multi-Course Study in Moodle

Shehada, Wala’a; Ashqar, Huthaifa I.; Ewais, Ahmed; Hatzilygeroudis, Ioannis

doi:10.3390/app16062707

¹ Department of Natural, Engineering, and Technology Sciences, Arab American University, Jenin P.O. Box 240, Palestine

² Department of AI and Data Science, Arab American University, Jenin P.O. Box 240, Palestine

³ AI Program, Columbia University, New York, NY 10027, USA

⁴ eLearning Center, Arab American University, Jenin P.O. Box 240, Palestine

⁵ Department of Computer Engineering & Informatics, University of Patras, 26504 Rion, Greece

* Authors to whom correspondence should be addressed.

Appl. Sci. 2026, 16(6), 2707; https://doi.org/10.3390/app16062707

Submission received: 5 February 2026 / Revised: 4 March 2026 / Accepted: 10 March 2026 / Published: 12 March 2026

(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Learning Management Systems (LMS) generate rich activity and interaction logs that can be exploited using machine learning techniques. This study models temporal engagement patterns, such as early, middle, late, weekend, and night activity, derived from Moodle logs in multiple undergraduate courses. It constructs temporal feature vectors per student, applies k-means clustering to uncover behavioral patterns, and then uses ANOVA and Kruskal–Wallis tests to assess whether the resulting clusters differ in final grades. Results show that the predictive value of temporal patterns is highly course-dependent; in some courses, structured early engagement aligns with higher achievement, whereas in others, heavy weekend and night usage is associated with the best outcomes. To complement these quantitative analyses, a Large Language Model (LLM) (i.e., ChatGPT) is evaluated as a zero-shot classifier that receives only natural-language summaries of temporal behavior and predicts performance tiers. While accuracy is limited, the model produces coherent reasoning, indicating value as an interpretable layer on top of statistical analysis. The work demonstrates a generalizable pipeline for temporal feature engineering, unsupervised profiling, and LLM-based reasoning over LMS data for early risk detection in digital learning environments.

Keywords:

temporal engagement; learning analytics; Moodle; student performance; clustering; Large Language Models (LLMs); zero-shot prediction

1. Introduction

With the increasing reliance on LMS such as Moodle, Blackboard, and others, a massive number of students’ learning activities and their grades are available, leading researchers to propose different learning analytics approaches and technical solutions [1]. In particular, understanding how students engage with online learning environments has become a central objective in learning analytics. Temporal engagement, i.e., when students interact with course materials rather than merely how much, has emerged as a significant behavioral dimension linked to self-regulated learning, cognitive load management, and academic success [2].

Some studies report strong links between LMS activity and academic achievement, whereas others find weak or negligible associations [3]. Prior research suggests that early, distributed engagement supports deeper learning, whereas last-minute or late-night activity may indicate cognitive overload or tendencies toward procrastination [1,4]. Other behavioral indicators have also been explored, such as persistent engagement over the entire semester, which is likewise considered a good indicator of success [5,6].

Accordingly, the utilization of data-mining techniques, learning analytics approaches, and deep learning methods [7,8] has been extensively explored to support students, instructors, educational administrators, and policy makers in predicting performance, detecting at-risk learners, optimizing course design and learning materials, and informing timely interventions [9]. From a computer science perspective, LMS logs are increasingly treated as large-scale temporal event streams, enabling the application of clustering, predictive modeling, and representation-learning techniques to uncover meaningful behavioral patterns and drive data-informed decision making [10].

Furthermore, several studies have proposed tools that can be integrated into LMS to provide advanced analytics and visual dashboards [11], such as temporal learning analytics visualizations for assessment [9], a Moodle-based learning analytics dashboard for monitoring student success and system activity using machine learning techniques [12,13], and the Academic Analytics Tool, which offers course-level exploration of LMS interaction data for educators and designers. Other work has proposed aligning learning materials and content with factors related to students’ progress and behavior during the semester [3,14].

Previous studies have frequently focused on a single course or cohort, which limits the generalizability of discovered engagement profiles and predictive models to other instructional contexts [10]. In contrast, this study considers students’ activity logs from three different undergraduate courses with distinct structures and assessment schemes, thereby addressing the single-course limitation and enabling a cross-course comparison of temporal engagement patterns and their relationship with academic performance.

Beyond engagement timing within a course, broader research in educational psychology highlights the importance of time regulation and adequate night rest for cognitive performance. Studies show that irregular sleep schedules, late bedtimes, and short sleep duration negatively affect learning consolidation, attention, and academic achievement [5,6]. Effective time management and balanced study-rest cycles are therefore critical components of self-regulated learning. These findings provide additional theoretical grounding for examining night and late-period LMS activity as potential indicators of academic performance patterns.

Motivated by these factors, this study conducts an in-depth investigation of learning behaviors over time in three compulsory undergraduate courses in linguistics and the social sciences. Their obligatory nature ensures large enrollments and rich Moodle interaction logs across the semester, providing a robust basis for temporal modeling. At the same time, the courses differ substantially in learning objectives, assessment design, and use of online activities, offering diverse instructional contexts within a single institutional setting. By extracting five time-related features (early, middle, late, weekend, and night activity) from Moodle logs, applying K-Means clustering separately for each course, and testing the resulting clusters statistically, the hybrid analysis seeks to uncover meaningful engagement profiles, assess whether these temporal patterns are associated with academic performance, and evaluate how consistent or divergent temporal behavior patterns are across different instructional contexts.

To deepen the behavioral interpretation, the study complements the statistical and clustering analysis with an exploratory experiment that uses an LLM as a zero-shot predictor of student performance. Instead of depending on traditional feature-based supervised learning, GPT was prompted with natural-language summaries of each student’s activity over time. This approach examines whether an LLM can predict performance levels based solely on descriptive behavioral signals. This integration shows how modern AI reasoning can provide a narrative and interpretive way to understand engagement patterns.

The study’s contribution lies in the systematic integration and evaluation of established methods across multiple instructional contexts using a unified temporal feature framework. By applying consistent temporal segmentation and clustering procedures to three structurally distinct courses, the study explicitly examines whether engagement-performance relationships generalize or remain course-dependent. This cross-course validation provides empirical evidence that temporal effects are pedagogically mediated rather than universally predictive.

Additionally, the inclusion of an LLM is not positioned as a predictive enhancement but as an exploratory investigation into whether generative AI can meaningfully interpret structured temporal behavior summaries. The LLM experiment critically evaluates the strengths and limitations of zero-shot reasoning in quantitative educational contexts. Rather than altering clustering outcomes or improving prediction accuracy, the LLM component serves as a conceptual bridge between traditional learning analytics and emerging generative AI interpretability paradigms. This dual analytical perspective contributes to understanding both the potential and boundaries of LLM-based reasoning over structured behavioral data. Therefore, the novelty of the study lies in the systematic multi-course validation of temporal analytics and the critical evaluation of LLM interpretive capabilities within an established quantitative pipeline.

2. Background

Temporal learning analytics has gained more attention as researchers aim to understand not just how much students engage but also when they engage. A foundational study by [9] showed how visualizations of temporal trace data during assessments can improve instructors’ awareness of student behavior, especially around critical times of exam preparation. The study proposed a Temporal Learning Analytics Visualizations tool that depends on Partial Least Squares (PLS) structural equation modeling. In a similar vein, recent work by [15] identified distinct temporal engagement patterns, such as “active before quiz” or “active before exams,” through clustering and linked them to academic performance. Their findings indicated that early engagement often correlates with higher achievement.

Other scholars have investigated how students allocate their time in digital learning environments and how this affects their performance. For example, a study in Smart Learning Environments [16] used clustering and time analysis to show that how learners divide their study time among different phases of preparation (early, middle, late) is closely linked to performance. However, this connection is not always clear and varies based on the type of learning material. Shawareb et al. (2024) [11] applied data-mining techniques to Moodle log data to predict students’ performance, demonstrating that interaction traces in LMS platforms can serve as a strong basis for predictive modeling. Our work extends this perspective by focusing specifically on temporal engagement features and combining clustering, statistical testing, and LLM-based reasoning over these traces. Another study [10] applied K-Means clustering to log data to connect engagement behaviors in online courses to student outcomes and found that behavioral engagement profiles significantly relate to perceived learning and academic results.

In addition to these log-based and clustering methods, the use of LLMs in educational analytics is growing. Some recent studies investigate whether LLMs can predict learner performance. Reference [17] examined zero-shot and few-shot prompting for knowledge tracing. They showed that even without fine-tuning, LLMs can accurately predict student performance trends. Reference [18] suggested using LLMs with longitudinal experiential data, combining time-based behavioral data, handling missing values, and prompt design to predict academic pathways and possible intervention points.

Despite valuable progress in temporal learning analytics, most prior studies focus on a single course, or on either clustering or prediction in isolation. Systematic reviews in learning analytics, such as [19], highlight that clustering and statistical modeling are key to revealing engagement-performance relationships. Only a few studies have looked at temporal patterns across multiple courses using a unified feature set. Research on using Large Language Models to interpret temporal engagement is also limited. Existing LLM studies examine zero-shot reasoning for knowledge tracing, but none assess whether an LLM can directly determine performance tiers from structured temporal summaries. This study addresses this gap by applying consistent temporal clustering across three courses and introducing GPT as an exploratory zero-shot reasoning tool. It provides both quantitative and qualitative insights into how the timing of engagement connects to student performance.

3. Methodology

This study used a multi-stage analytical process to explore how Moodle engagement over time relates to academic performance. The methodology integrates steps to extract useful information from the Moodle datasets of the three compulsory courses. The core steps include data collection, preprocessing, feature engineering, unsupervised clustering, statistical testing, and an exploratory LLM-based prediction experiment. Figure 1 shows an overview of the entire workflow. These steps are explained in the following subsections.

3.1. Data Collection

Data were collected from three undergraduate courses at the Arab American University, Jenin, Palestine: Arabic Language, Fundamentals of Research Methods, and Palestinian Studies. For each course, two datasets were exported from Moodle:

Activity Logs: timestamped interaction events performed by each student.
Grade Reports: the final course grade (“Course Total”) for each student.

The datasets were cleaned, merged, and standardized to ensure consistency in student identifiers. To ensure responsible use of student data, all analyses followed strict privacy and ethical guidelines. All datasets were fully anonymized before processing. No personally identifiable information was stored or accessed during the analysis. The study met established standards for educational data mining and institutional data governance. This ensured that all procedures complied with privacy protection requirements.

3.2. Preprocessing and Feature Engineering

Timestamps in the log entries appeared in different formats. All timestamps were converted to a consistent datetime format and decomposed into derived variables: hour of day, day of week, and ISO week number.
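The normalization step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the list `KNOWN_FORMATS` and the function names are hypothetical, and the actual formats present in the Moodle exports are not stated in the paper.

```python
from datetime import datetime

# Hypothetical formats; the real exports may use others, so extend as needed.
KNOWN_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%d/%m/%y, %H:%M",
    "%d %B %Y, %I:%M %p",
)

def parse_timestamp(raw):
    """Try each known format in turn; return None if none matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    return None

def time_parts(raw):
    """Derive hour of day, day of week (Monday=0), and ISO week number."""
    ts = parse_timestamp(raw)
    if ts is None:
        return None
    return {
        "timestamp": ts,
        "hour": ts.hour,
        "day_of_week": ts.weekday(),
        "iso_week": ts.isocalendar()[1],
    }
```

Returning `None` for unparseable values keeps bad rows visible so they can be counted or inspected rather than silently dropped.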

3.2.1. Temporal Segmentation Algorithm

To analyze temporal pacing, the semester duration for each course was divided into three equal-length periods: Early (first third of active weeks), Middle (middle third), and Late (final third). The segmentation procedure is given in Algorithm 1:

Algorithm 1: Temporal Segmentation of LMS Logs

1. Extract the week number from each timestamp in the log file.
2. Identify:
    START_WEEK = earliest week with student activity
    END_WEEK = latest week with student activity
3. Compute the total duration:
    RANGE = END_WEEK - START_WEEK
4. Divide the duration into three equal segments:
    BOUNDARY_1 = START_WEEK + (RANGE/3)
    BOUNDARY_2 = START_WEEK + (2 × RANGE/3)
5. For each log entry:
    IF week ≤ BOUNDARY_1:
        assign “Early”
    ELSE IF week ≤ BOUNDARY_2:
        assign “Middle”
    ELSE:
        assign “Late”
6. For each student:
    Calculate:
        EARLY_RATIO = (# events in Early)/(total events)
        MIDDLE_RATIO = (# events in Middle)/(total events)
        LATE_RATIO = (# events in Late)/(total events)
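A minimal Python sketch of Algorithm 1 is given below. The function names and the `(student_id, week_number)` input shape are assumptions made for illustration. The sketch also assumes that week numbers increase monotonically over the semester, that is, the course does not span a year boundary where ISO week numbers restart.

```python
from collections import Counter, defaultdict

PERIODS = ("Early", "Middle", "Late")

def segment_label(week, start_week, end_week):
    """Steps 3-5: place a week into Early, Middle, or Late."""
    span = end_week - start_week
    boundary_1 = start_week + span / 3
    boundary_2 = start_week + 2 * span / 3
    if week <= boundary_1:
        return "Early"
    if week <= boundary_2:
        return "Middle"
    return "Late"

def period_ratios(events):
    """Step 6: per-student share of events in each period.

    `events` is an iterable of (student_id, week_number) pairs.
    """
    events = list(events)
    weeks = [week for _, week in events]
    start_week, end_week = min(weeks), max(weeks)  # step 2

    counts = defaultdict(Counter)
    for student_id, week in events:
        counts[student_id][segment_label(week, start_week, end_week)] += 1

    ratios = {}
    for student_id, per_period in counts.items():
        total = sum(per_period.values())
        ratios[student_id] = {
            f"{period.lower()}_ratio": per_period[period] / total
            for period in PERIODS
        }
    return ratios
```

Because the boundaries are computed from the course-wide first and last active weeks, the three ratios of every student sum to 1.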

The resulting temporal classifications were transformed into proportional engagement ratios for each student. These ratios served as the primary inputs for clustering and statistical evaluation in subsequent stages of the analysis.

It is important to note that the Middle period showed almost no student activity in all three courses. Consequently, middle_ratio values were close to zero for almost all students. This suggests that learners tend to engage more at the beginning and end of the semester, while the mid-semester period is mostly inactive. This pattern is also discussed in [20].

Although the middle period exhibited very low engagement across all three courses, the feature was retained to preserve the theoretical symmetry of the Early–Middle–Late temporal segmentation. To evaluate its impact, an ablation analysis was conducted in which the clustering procedure was repeated without the middle_ratio feature. The resulting cluster centroids, student assignments, and statistical significance results showed no meaningful differences. More than 95% of students retained identical cluster membership, and ANOVA and Kruskal–Wallis outcomes remained consistent. This indicates that middle_ratio neither materially influences the clustering nor distorts it.

3.2.2. Temporal Feature Engineering

For each student, five temporal engagement ratios were computed based on the distribution of their events: Early Ratio, Middle Ratio, Late Ratio, Weekend Ratio (the proportion of events on Friday or Saturday), and Night Ratio (the proportion of events between 18:00 and 23:59). Additionally, the total number of events was retained as a measure of overall engagement intensity. This feature engineering process produces one feature vector per student per course, which serves as input to the hybrid processing model. Before clustering, all features (the Early, Middle, Late, Weekend, and Night ratios and total events) were standardized using z-score normalization (StandardScaler). This step ensured that features with larger scales, such as total events, did not dominate the distance calculations.
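The weekend and night ratios and the z-score step can be illustrated with the short sketch below. The function names are hypothetical. The `zscore` helper divides by the population standard deviation, which is the convention scikit-learn's `StandardScaler` uses; mapping a constant column to zeros mirrors that scaler's behavior as well.

```python
from datetime import datetime
from statistics import fmean, pstdev

WEEKEND_DAYS = {4, 5}  # datetime.weekday(): Monday=0, so Friday=4, Saturday=5

def weekend_night_ratios(timestamps):
    """Share of one student's events on Fri/Sat and between 18:00 and 23:59."""
    total = len(timestamps)
    weekend = sum(ts.weekday() in WEEKEND_DAYS for ts in timestamps)
    night = sum(18 <= ts.hour <= 23 for ts in timestamps)
    return {
        "weekend_ratio": weekend / total,
        "night_ratio": night / total,
        "total_events": total,
    }

def zscore(values):
    """Standardize one feature column to mean 0 and unit variance."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (ddof = 0)
    return [0.0 if std == 0 else (v - mean) / std for v in values]
```

Standardizing matters here because `total_events` is a count that can run into the hundreds, while the five ratios lie in [0, 1].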

3.3. Hybrid Processing Model

The proposed hybrid processing model includes a machine learning technique, a statistical analysis method, and an LLM-based zero-shot prediction component. The machine learning technique is mainly used for discovering latent temporal engagement profiles by clustering students based on their time-related LMS activity features. The statistical analysis method, in turn, is used to test whether these clusters differ significantly in academic performance and to quantify the strength of the relationship between temporal behaviors and final grades. Finally, LLM-based zero-shot prediction is mainly used for mapping natural-language summaries of students’ temporal patterns to performance tiers and generating interpretable explanations that complement the quantitative findings. Given the growing interest in using LLMs for educational analytics, it is important to examine whether such models can infer student performance from temporal features alone. This serves not as a replacement for statistical analysis, but as a complementary experiment that tests the potential of LLM-based prediction in comparison with conventional methods. Therefore, after completing the core quantitative analysis, we conducted an exploratory experiment using GPT as a zero-shot and few-shot predictor to evaluate whether an LLM can correctly classify student performance tiers based solely on temporal learning behaviors.

3.3.1. Clustering of Temporal Behaviors

To reveal hidden behavior patterns in students’ engagement over time, a clustering analysis was performed using the K-Means algorithm [21]. The clustering was carried out for each course separately. This approach preserved the unique engagement structures of each course, rather than merging them into a single model. K-Means was chosen because of its effectiveness in identifying compact, spherical clusters in behavioral data and its widespread use in learning analytics.

The algorithm was configured as follows: k = 3, reflecting the study’s aim to uncover three distinct engagement profiles; n_init = 10, ensuring robustness by optimizing across multiple centroid initializations; and random_state = 42, to guarantee reproducibility. Each student was assigned to one of the three temporal-behavior clusters. It is important to note that no semantic meaning was attached to the cluster labels, since the K-Means algorithm assigns cluster indices arbitrarily. Instead, the clusters are interpreted afterward based on their temporal feature centroids.
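With scikit-learn, the configuration described above might look like the sketch below. The function name `cluster_course` and the input layout (one row per student, one column per feature) are assumptions. Mapping the centroids back to the original units is added here only to make them readable as ratios; the paper does not say how the centroids were reported.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_course(features):
    """Cluster the students of one course into three temporal profiles.

    `features` is an (n_students, n_features) array of raw feature values.
    """
    scaler = StandardScaler()
    scaled = scaler.fit_transform(np.asarray(features, dtype=float))

    # Settings taken from the text: k = 3, n_init = 10, random_state = 42.
    model = KMeans(n_clusters=3, n_init=10, random_state=42)
    labels = model.fit_predict(scaled)

    # Express centroids in the original units so they can be interpreted.
    centroids = scaler.inverse_transform(model.cluster_centers_)
    return labels, centroids
```

Calling this function once per course keeps the scaler and the centroids course-specific, in line with the per-course clustering described above.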

To empirically validate the choice of the number of clusters, Silhouette Score analysis was conducted for k values ranging from 2 to 6 for each course independently. The Silhouette Score measures how similar a student is to their assigned cluster compared to other clusters, providing an index of cluster cohesion and separation. Across the three courses, k = 3 produced either the highest or near-highest average Silhouette Score. For larger values of k, the scores decreased, indicating reduced separation and increased overlap between clusters. Although k = 2 yielded reasonable separation, it merged pedagogically distinct temporal profiles, limiting interpretability. Therefore, k = 3 was selected based on both quantitative validation and behavioral interpretability.
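The silhouette check can be sketched as below, assuming `scaled` is the standardized feature matrix of a single course. The function name is hypothetical, and the K-Means settings are simply reused from the clustering step.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_by_k(scaled, k_values=range(2, 7)):
    """Average silhouette score for each candidate number of clusters."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(scaled)
        scores[k] = silhouette_score(scaled, labels)
    return scores
```

The scores range from -1 to 1, with higher values indicating tighter and better-separated clusters, so the candidate values of k can be compared directly.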

3.3.2. Statistical Analysis

To test whether academic performance differed across clusters, two statistical tests were applied to each course.

One-way ANOVA: a parametric test that evaluates whether the mean grade differs across the three temporal clusters. The ANOVA F-statistic is computed as:

F = \frac{MSB}{MSW}

MSB = \frac{SSB}{k - 1}, \quad MSW = \frac{SSW}{N - k}

SSB = \sum_{i = 1}^{k} n_{i} (\bar{x}_{i} - \bar{x})^{2}, \quad SSW = \sum_{i = 1}^{k} \sum_{j = 1}^{n_{i}} (x_{i j} - \bar{x}_{i})^{2}

where:

F: ANOVA F-statistic
MSB (Mean Square Between): variability of mean grades between clusters
MSW (Mean Square Within): variability of grades within clusters
SSB (Sum of Squares Between): total variability between cluster means
SSW (Sum of Squares Within): total variability within clusters
k: number of clusters (groups)
N: total number of students
n_{i}: number of students in cluster i
\bar{x}_{i}: mean grade of cluster i
\bar{x}: overall mean grade of all students
x_{i j}: grade of student j in cluster i
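As a worked illustration of the F-statistic, the following sketch computes SSB, SSW, MSB, and MSW directly from lists of grades. The function name and the toy grades in the usage note are illustrative and not taken from the study's data.

```python
def one_way_anova_f(groups):
    """F = MSB / MSW for a list of groups, each a list of grades."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-cluster sum of squares, weighted by cluster size.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-cluster sum of squares around each cluster's own mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return msb / msw
```

For the toy groups `[1, 2, 3]`, `[2, 3, 4]`, and `[5, 6, 7]`, SSB = 26 and SSW = 6, so MSB = 13, MSW = 1, and F = 13. In practice, `scipy.stats.f_oneway` returns the same statistic together with its p-value.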

Kruskal–Wallis H-test: a non-parametric alternative that evaluates whether grade distributions differ across clusters. The test statistic is computed as:

H = \frac{12}{N (N + 1)} \sum_{i = 1}^{k} \frac{R_{i}^{2}}{n_{i}} - 3 (N + 1)

where:

k: number of clusters (groups)
N: total number of students
n_{i}: number of students in cluster i
R_{i}: sum of ranks of grades in cluster i
\frac{R_{i}^{2}}{n_{i}}: rank-based variance component for cluster i

These tests examined whether cluster membership was linked to differences in final grades [22].
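The H statistic can be computed by hand as in the sketch below, which ranks the pooled grades (tied values share their average rank) and applies the formula as written. The function names are hypothetical. Note that the formula in the text carries no tie correction, whereas `scipy.stats.kruskal` applies one, so the two can differ slightly when many grades are tied.

```python
def average_ranks(values):
    """1-based ranks of `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks

def kruskal_h(groups):
    """Kruskal-Wallis H without tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)

    rank_term = 0.0
    start = 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])  # rank sum of this cluster
        rank_term += r_i ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
```

For the toy groups `[1, 2, 3]`, `[4, 5, 6]`, and `[7, 8, 9]`, the rank sums are 6, 15, and 24 with N = 9, giving H = (12/90)(12 + 75 + 192) - 30 = 7.2.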

3.3.3. LLM-Based Zero-Shot Prediction

In addition to the statistical analyses, an exploratory experiment examined whether a large language model, GPT, can infer student performance from engagement patterns described in natural language. This component was not intended as a predictive model but rather as a complementary, explainable reasoning mechanism.

Zero-shot inference with LLMs has recently gained attention for its ability to reason from descriptive input without specific training. However, models often have difficulty with structured numerical features and depend more on linguistic cues (Bubeck et al., 2023 [23]).

To evaluate this capability, each student’s temporal profile, which includes the Early, Middle, Late, Weekend, and Night ratios, along with the total number of events, was transformed into a short natural-language description. GPT received this description without any prior examples and was prompted to classify the student into one of three performance tiers: High, Medium, or Low.

The procedure consisted of three steps. Step 1 is Prompt Construction where feature vectors were turned into natural-language summaries using the following template:

Student temporal activity summary:
- Early ratio: {early_ratio}
- Middle ratio: {middle_ratio}
- Late ratio: {late_ratio}
- Weekend ratio: {weekend_ratio}
- Night ratio: {night_ratio}
- Total events: {total_events}
Based on these engagement patterns, predict the student’s expected performance tier (High/Medium/Low) for the course final grade.

Step 2 is LLM Inference. GPT provided a narrative explanation and a predicted performance tier based on its interpretation of the temporal indicators. Step 3 is Evaluation. For evaluation, only the final predicted label (High, Medium, or Low) was taken from GPT’s output using simple string matching. The extracted labels were then compared directly to the true performance tiers based on students’ final grades. Accuracy was calculated separately for each course to check the model’s predictive consistency across different contexts.
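Steps 1 and 3 can be sketched without any API call, as below. The template text follows the prompt shown above, while the function names are hypothetical. The paper states only that labels were extracted by simple string matching; taking the last tier word in the reply is an assumption of this sketch, made because a narrative explanation may mention other tiers before the final verdict.

```python
import re

TEMPLATE = (
    "Student temporal activity summary:\n"
    "- Early ratio: {early_ratio}\n"
    "- Middle ratio: {middle_ratio}\n"
    "- Late ratio: {late_ratio}\n"
    "- Weekend ratio: {weekend_ratio}\n"
    "- Night ratio: {night_ratio}\n"
    "- Total events: {total_events}\n"
    "Based on these engagement patterns, predict the student’s expected "
    "performance tier (High/Medium/Low) for the course final grade."
)

def build_prompt(features):
    """Step 1: fill the template from one student's feature dict."""
    return TEMPLATE.format(**features)

def extract_tier(response):
    """Step 3: return the last tier word in the reply, or None if absent."""
    matches = re.findall(r"\b(high|medium|low)\b", response, flags=re.IGNORECASE)
    return matches[-1].capitalize() if matches else None

def accuracy(predicted, actual):
    """Share of students whose predicted tier equals the true tier."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)
```

Replies from which no tier can be extracted come back as `None` and therefore count as errors in `accuracy`, which is a conservative choice.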

To ensure reproducibility and transparency, the LLM experiment was conducted using GPT-4 (OpenAI API). All inferences were performed under deterministic settings with temperature = 0 and top_p = 1.0 to eliminate randomness in generation and ensure consistent outputs across repeated runs. Performance tiers (High/Medium/Low) were defined numerically based on course-specific grade distributions. Students were divided into three equal-sized groups using tertile splits of final grades within each course dataset. The top third of grades was labeled “High,” the middle third “Medium,” and the bottom third “Low.” This ensured balanced class representation and avoided arbitrary grade thresholds across courses.
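A tertile split of this kind can be sketched as follows. The function name is hypothetical, and the paper does not say how tied grades at a tertile boundary were handled; this sketch breaks such ties by input order so that the groups stay as equal in size as possible.

```python
def tertile_labels(grades):
    """Label each grade Low, Medium, or High by its rank within the course."""
    tiers = ("Low", "Medium", "High")
    n = len(grades)
    # Stable sort: equal grades keep their input order (an assumption).
    order = sorted(range(n), key=lambda i: grades[i])
    labels = [None] * n
    for rank, idx in enumerate(order):
        labels[idx] = tiers[rank * 3 // n]
    return labels
```

Because the split is computed per course, the same numeric grade can fall into different tiers in different courses.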

To contextualize LLM performance, two baseline accuracies were computed. First, a random baseline assuming uniform probability across three classes yields an expected accuracy of approximately 33.3%. Second, a majority-class baseline was calculated separately for each course based on the largest performance tier proportion (ranging between 34% and 38% across courses). The reported GPT overall accuracy (47.4%) exceeds both baselines, indicating performance above chance level, although still insufficient for reliable predictive deployment.
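Both baselines are simple to reproduce, as the sketch below shows. The function names and the toy label list in the usage note are illustrative only.

```python
from collections import Counter

def random_baseline(num_classes=3):
    """Expected accuracy of guessing uniformly among the classes."""
    return 1 / num_classes

def majority_baseline(true_labels):
    """Accuracy of always predicting the most frequent true tier."""
    counts = Counter(true_labels)
    return max(counts.values()) / len(true_labels)
```

For a toy course with 4 High, 3 Medium, and 3 Low students, the majority baseline is 0.4. With near-equal tertiles it stays close to one third, which is consistent with the 34% to 38% range reported above.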

This experimental phase provided a qualitative, interpretable perspective on how an LLM reasons about temporal learning behaviors. The component was exploratory in nature and was intended to complement, rather than replace, traditional statistical findings by offering interpretable, narrative-driven insights.

4. Experimental Evaluation

4.1. Experimental Protocol

This study examined learning behaviors over time in three undergraduate courses: Arabic Language, Fundamentals of Research Methods, and Palestinian Studies. It aimed to see how different patterns of Moodle engagement related to academic performance. Five normalized temporal features (early, middle, late, weekend, and night) were used. K-Means clustering (k = 3) was applied separately for each course. The clusters were then evaluated using ANOVA and Kruskal–Wallis tests to determine whether temporal patterns were associated with differences in final grades. Because K-Means assigns labels arbitrarily, and temporal behaviors differed by course, each course is interpreted independently, followed by a cross-course synthesis.

4.2. Clustering and Statistical Results by Course

4.2.1. Linguistic Course

The linguistic course is Arabic Language, whose dataset includes 241 students. Table 1 shows the average temporal feature ratios for the three groups created by the K-Means algorithm. These features reflect when learners were most active on the Moodle platform: early, middle, or late, along with their weekend and night engagement patterns.

Table 1 shows slight variation in temporal engagement patterns across the three clusters, especially in the early and late activity ratios. To further visualize these behavioral differences, Figure 2 presents boxplots of the final grade distribution in the Arabic Language course across the three clusters.

Although the median performance looks similar, the differences in variability and outliers suggest that how often students engage over time might relate to differences in their academic success. To better understand these behavior patterns and how they connect to academic performance, the three clusters are interpreted in detail below:

Cluster 0: Moderately Consistent Learners

Students in this cluster exhibited high early engagement (0.871), complemented by relatively higher weekend (0.165) and night activity (0.166) compared to the other groups. This pattern suggests a flexible but generally structured learning approach.

Cluster 1: Highly Structured, Early Learners

This group demonstrated the strongest early engagement (0.925) with very limited late (0.075), weekend (0.025), or night activity (0.037). Their behavior reflects strong self-regulation and disciplined study planning.

Cluster 2: Procrastinating Learners

Students in this cluster showed the highest late engagement ratio (0.309), indicating a tendency to complete tasks closer to deadlines. Weekend and night activities were moderate.

To examine whether these behavioral differences in the Arabic Language course are reflected in academic performance, Table 2 summarizes the final grade statistics for each cluster.

Despite the behavioral differences observed across clusters, the grade statistics presented in Table 2 show that the mean and median final grades are nearly identical among the three groups. To further examine whether these small variations represent meaningful differences, both ANOVA and Kruskal–Wallis tests were conducted. The ANOVA result (p = 0.9908) indicated no significant difference between cluster means, and the Kruskal–Wallis test (p = 0.7909) similarly showed no significant differences in grade distributions.

These findings indicate that temporal engagement patterns did not translate into measurable differences in academic performance in the Arabic Language course.

4.2.2. Fundamentals of Research Methods Course

The Fundamentals of Research Methods dataset included 157 students. Table 3 presents the mean temporal engagement ratios for the three clusters. These features capture when students were most active on the Moodle platform, including early, late, weekend, and night engagement patterns.

Table 3 shows modest but noticeable differences in temporal engagement patterns across the three clusters, particularly in early and late activity. These variations are further examined in the following grade distribution figure.

Figure 3 illustrates the distribution of final grades across the three clusters in the Research Methods course. While the clusters show some variation in grade ranges and outliers, their median performance remains broadly comparable.

To better understand these patterns, the three clusters are described in detail below:

Cluster 0: Late-Oriented, Night-Active Learners

Students in this cluster showed moderate early engagement (0.737) alongside the highest late (0.263) and night activity (0.231) among all groups. Their behavior suggests a tendency toward completing coursework later in the day and relying more heavily on nighttime study hours.

Cluster 1: Structured Early Learners

Cluster 1 demonstrated strong early engagement (0.918), low late usage (0.082), and minimal weekend activity (0.015). Their pattern reflects consistent weekday-based early study habits.

Cluster 2: Highly Early, Weekend-Active Learners

Cluster 2 exhibited the highest early engagement (0.932) and the highest weekend activity (0.129), combined with low late usage (0.068). This suggests proactive engagement during the week with additional reliance on weekend study time.

To examine whether these temporal engagement patterns are reflected in academic performance, Table 4 summarizes the final grade statistics for the three clusters in the Research Methods course.

Table 4 indicates substantial variation in academic performance across clusters, with Cluster 2 achieving the highest mean grade (87.32) and Cluster 1 the lowest (81.79). Both ANOVA (p = 0.0017) and Kruskal–Wallis (p = 5.17 × 10⁻⁶) tests show statistically significant differences, suggesting that temporal engagement behavior is meaningfully associated with achievement in the Research Methods course.

4.2.3. Social Science Course

The social science course is Palestinian Studies, whose dataset includes 175 students. As in the previous courses, temporal features were extracted to capture when students engaged with the Moodle platform, including early, late, weekend, and night activity. Applying K-Means clustering to these behavioral indicators produced three distinct temporal patterns, as summarized in Table 5.
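The clustering step can be illustrated with scikit-learn. The sketch below is hedged in several ways: the feature matrix is synthetic, standardizing the features before clustering is an assumption (this section does not state whether scaling was applied), and `n_init` and `random_state` are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic per-student ratios (NOT the study data).
# Columns: early, late, weekend, night.
rng = np.random.default_rng(42)
n_students = 175
early = rng.uniform(0.5, 1.0, size=n_students)
X = np.column_stack([
    early,
    1.0 - early,                          # late share when middle activity is ~0
    rng.uniform(0.0, 0.40, size=n_students),
    rng.uniform(0.0, 0.45, size=n_students),
])

# Assumption: features are standardized so that no single ratio dominates
# the Euclidean distances used by K-Means.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Profile each cluster by the mean of the raw (unscaled) ratios.
for c in range(3):
    members = X[labels == c]
    print(c, len(members), members.mean(axis=0).round(3))
```

The per-cluster means printed at the end play the role of a feature-average table: the cluster index itself carries no meaning, so each group has to be interpreted from these values.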

Table 5 shows noticeable variability in temporal engagement behaviors across clusters, particularly in early, late, weekend, and night activity. These differences are further illustrated in the grade distribution figure.

Figure 4 visualizes the distribution of final grades for the three clusters in the Palestinian Studies course. While all groups show comparable median performance, differences in variability and outliers reflect distinct engagement patterns.

To interpret these temporal patterns more clearly, the behavioral profiles of each cluster are outlined below.

Cluster 0: Procrastinators

Cluster 0 exhibits the highest late engagement (0.380) and moderate weekend/night usage. Their temporal pattern indicates a tendency toward last-minute study and less structured time management.

Cluster 1: Highly Structured Learners

This cluster shows very strong early engagement (0.929) and minimal late (0.071), weekend (0.023), and night activity (0.056). Their behavior reflects highly disciplined and consistent study patterns.

Cluster 2: Heavy Weekend/Night Learners

Cluster 2 is characterized by extremely high weekend activity (0.354) and the highest night usage (0.402), combined with high early activity (0.881). This group tends to study outside typical hours, relying heavily on late-night and weekend sessions.

To examine whether these temporal engagement behaviors correspond to differences in academic performance, Table 6 presents the final grade statistics for the three clusters.

The grade statistics indicate modest but noticeable performance differences among the clusters, with Cluster 2 achieving the highest mean and median grades, followed by Clusters 1 and 0. To determine whether these differences are statistically meaningful, both ANOVA and Kruskal–Wallis tests were conducted. The ANOVA result (p = 0.00099) indicates significant differences in mean grades across clusters, and the Kruskal–Wallis test (p = 1.31 × 10⁻⁶) confirms significant variation in grade distributions. These findings suggest that, in the Palestinian Studies course, temporal engagement patterns, particularly heavy weekend and late-night activity, are associated with higher academic performance. This suggests that evening or weekend study can still be effective in theoretical or reading-intensive courses.

In addition to statistical significance testing, effect sizes were computed to evaluate the practical magnitude of the observed differences. For the one-way ANOVA, eta squared (η²) was calculated as η² = SSB / SST, which represents the proportion of total grade variance explained by cluster membership. For the Kruskal–Wallis test, epsilon squared (ε²) was calculated using ε² = (H − k + 1) / (N − k). Effect sizes were interpreted using conventional thresholds: small (0.01), medium (0.06), and large (0.14).
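The two effect sizes can be computed as in the following sketch. It assumes the usual reading of the symbols (SSB and SST as the between-group and total sums of squares, H as the Kruskal–Wallis statistic, k as the number of clusters, N as the number of students). The function names and the toy groups are illustrative, not taken from the study.

```python
import numpy as np
from scipy import stats

def eta_squared(groups):
    """Proportion of total variance explained by group membership (SSB / SST)."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return ss_between / ss_total

def epsilon_squared(groups):
    """(H - k + 1) / (N - k), following the formula given in the text."""
    h_stat, _ = stats.kruskal(*groups)
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    return (h_stat - k + 1) / (n_total - k)

# Toy groups for illustration only (NOT the study data).
toy_groups = [
    np.array([1.0, 2.0, 3.0]),
    np.array([2.0, 3.0, 4.0]),
    np.array([6.0, 7.0, 8.0]),
]
eta2 = eta_squared(toy_groups)
eps2 = epsilon_squared(toy_groups)
print(f"eta squared = {eta2:.3f}, epsilon squared = {eps2:.3f}")
```

For the toy groups, SSB = 42 and SST = 48, so η² = 0.875; real grade data would typically give far smaller values.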

In the Arabic Language course, effect sizes were negligible (η² = 0.0003; ε² = 0.0002), confirming that temporal cluster membership explained virtually no variance in final grades. In the Research Methods course, effect sizes were moderate (η² = 0.084; ε² = 0.092), indicating that approximately 8–9% of grade variance was explained by temporal engagement patterns. In the Palestinian Studies course, effect sizes were small-to-moderate (η² = 0.061; ε² = 0.067), suggesting a measurable but more modest practical impact of temporal behavior on academic performance. These results confirm that temporal engagement effects are not only statistically significant in two courses but also practically meaningful, whereas no meaningful effect is observed in the Arabic Language course.

For completeness, the distribution of students across clusters is summarized below for each course. In the Arabic Language course (N = 241), Cluster 0 included 19 students (7.9%), Cluster 1 included 176 students (73.0%), and Cluster 2 included 46 students (19.1%). In the Research Methods course (N = 157), Cluster 0 included 24 students (15.3%), Cluster 1 included 95 students (60.5%), and Cluster 2 included 38 students (24.2%). In the Palestinian Studies course (N = 175), Cluster 0 included 47 students (26.9%), Cluster 1 included 110 students (62.9%), and Cluster 2 included 18 students (10.3%).

Although some imbalance is observed, particularly in the Arabic Language course where one dominant cluster appears, cluster sizes remain sufficiently large to support statistical comparison. These distributions reflect natural engagement patterns rather than enforced equal partitioning.
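The reported shares follow directly from the cluster counts. A short check, using only the counts stated above (the helper name `cluster_shares` is illustrative):

```python
# Cluster counts as reported in the text (Cluster 0, Cluster 1, Cluster 2).
cluster_counts = {
    "Arabic Language": [19, 176, 46],
    "Research Methods": [24, 95, 38],
    "Palestinian Studies": [47, 110, 18],
}

def cluster_shares(counts):
    """Percentage of students in each cluster, rounded to one decimal place."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]

shares = {course: cluster_shares(c) for course, c in cluster_counts.items()}
totals = {course: sum(c) for course, c in cluster_counts.items()}

for course in cluster_counts:
    print(course, totals[course], shares[course])
```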

4.3. Cross-Course Analysis of Temporal Behaviors

A comparison across the three courses reveals several important insights regarding temporal engagement and its relationship with academic performance.

1. Cluster meanings were not consistent across courses

The behavioral interpretation of Cluster 0, Cluster 1, and Cluster 2 changed based on the course. In some courses, early engaged clusters performed best, while in others, clusters characterized by weekend or night activity achieved higher outcomes. This inconsistency is expected because K-Means assigns cluster labels arbitrarily, and the meaning of each group depends on the specific behavioral patterns in each dataset.

2. Research Methods and Palestinian Studies demonstrated statistically significant performance differences

Both courses showed strong evidence that temporal engagement patterns are associated with final grades, as indicated by significant ANOVA and Kruskal–Wallis results. In both, the highest-scoring cluster combined high early engagement (0.932 in Research Methods; 0.881 in Palestinian Studies) with the highest weekend activity, while the cluster with the strongest late engagement did not achieve the highest grades.

3. The Arabic Language course showed no significant relationship

Unlike the other two courses, temporal engagement patterns did not significantly predict performance in Arabic Language. This may suggest that assessments in this course depend more on linguistic background, prior knowledge, or memorization instead of study timing or consistency of engagement.

4. General behavioral patterns across courses

Despite variation in cluster meanings, several consistent trends emerged:

Early and steady engagement was generally associated with higher performance (observed in two courses) (Peach et al., 2019 [21]).
Heavy late engagement was often linked to lower outcomes.
High weekend or night activity was not necessarily negative; in some cases (e.g., Palestinian Studies), it corresponded with strong academic achievement, especially in theoretical or reading-heavy courses [6].

5. The “middle period” was nearly inactive across all courses

The middle_ratio was approximately zero for nearly all students, indicating minimal engagement during the mid-semester weeks. This suggests a front-loaded and back-loaded pattern of interaction, where students engage heavily at the beginning and end of the course but reduce activity mid-term. This aligns with established theories of self-regulated learning, distributed practice, and cognitive load management.
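To make ratios such as middle_ratio concrete, the sketch below shows one way per-student temporal shares could be derived from event timestamps. It is an illustration under explicit assumptions and does not reproduce the study's segmentation: the course is split into three equal periods, the weekend is taken as Saturday and Sunday, and night is taken as 22:00 to 06:00. The function name and all parameters are hypothetical.

```python
from datetime import datetime, timedelta

def temporal_ratios(timestamps, course_start, course_end,
                    weekend_days=(5, 6), night_start=22, night_end=6):
    """Share of one student's events per course period, on weekends, at night.

    Assumptions (not from the study): three equal-length periods,
    weekend_days uses datetime.weekday() numbering (Monday = 0),
    and night spans night_start..night_end across midnight.
    """
    n = len(timestamps)
    if n == 0:
        return dict.fromkeys(
            ["early_ratio", "middle_ratio", "late_ratio",
             "weekend_ratio", "night_ratio"], 0.0)
    third = (course_end - course_start) / 3
    early = sum(t < course_start + third for t in timestamps)
    late = sum(t >= course_start + 2 * third for t in timestamps)
    weekend = sum(t.weekday() in weekend_days for t in timestamps)
    night = sum(t.hour >= night_start or t.hour < night_end for t in timestamps)
    return {
        "early_ratio": early / n,
        "middle_ratio": (n - early - late) / n,
        "late_ratio": late / n,
        "weekend_ratio": weekend / n,
        "night_ratio": night / n,
    }

# Hypothetical 90-day course starting on a Monday, with four events.
start = datetime(2025, 1, 6)
end = start + timedelta(days=90)
events = [
    start + timedelta(days=1, hours=10),   # early period, weekday, daytime
    start + timedelta(days=5, hours=23),   # early period, Saturday, night
    start + timedelta(days=40, hours=12),  # middle period, Saturday, daytime
    start + timedelta(days=80, hours=3),   # late period, weekday, night
]
example = temporal_ratios(events, start, end)
print(example)
```

Institutions with a different weekend (for example Friday and Saturday) would pass `weekend_days=(4, 5)`; the period boundaries would likewise need to follow the course calendar actually used.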

In sum, the relationship between temporal engagement and academic performance varied across the three courses. Research Methods and Palestinian Studies showed significant effects (p < 0.01), whereas no meaningful association was observed in Arabic Language (p > 0.7). Structured, early-focused learners performed best in two of the three courses, while procrastination-heavy clusters generally scored lower, with some exceptions. These results highlight that temporal engagement behaviors are course-dependent and should not be generalized across learning contexts.

4.4. LLM Zero-Shot Prediction Results

GPT responded with interpretive explanations linking behavioral patterns to performance expectations. From each generated response, the predicted tier (High, Medium, or Low) was extracted and compared against the true label in the dataset to calculate prediction accuracy across all courses. The evaluation was conducted on three courses: Research Methods, Arabic Language, and Palestinian Studies, as shown in Table 7.
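The extraction and scoring step described above could look like the following sketch. The response strings are invented for illustration, and the parsing rule (prefer an explicit "Predicted tier:" statement, otherwise take the first tier keyword) is an assumption rather than the procedure used in the study.

```python
import re

# Explicit statement such as "Predicted tier: Medium".
EXPLICIT = re.compile(r"predicted\s+tier\s*[:\-]?\s*(high|medium|low)\b", re.IGNORECASE)
# Fallback: first standalone tier keyword anywhere in the response.
ANY_TIER = re.compile(r"\b(high|medium|low)\b", re.IGNORECASE)

def extract_tier(response):
    """Return 'High', 'Medium', or 'Low', or None if no tier is found."""
    match = EXPLICIT.search(response) or ANY_TIER.search(response)
    return match.group(1).capitalize() if match else None

def accuracy(predicted, actual):
    """Fraction of predictions that equal the true label."""
    pairs = list(zip(predicted, actual))
    return sum(p == a for p, a in pairs) / len(pairs)

# Hypothetical model responses and true labels.
responses = [
    "Predicted tier: Medium. The student shows high early engagement.",
    "Given the heavy late activity, I would expect LOW performance.",
    "Predicted tier: High",
]
true_tiers = ["Medium", "Medium", "High"]

predicted_tiers = [extract_tier(r) for r in responses]
print(predicted_tiers, accuracy(predicted_tiers, true_tiers))
```

Preferring the explicit statement matters because free-text explanations often mention other tier words (as in the first response), which a naive keyword search would pick up first.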

Although the model’s quantitative performance was weak, achieving an overall accuracy of approximately 47.4%, the qualitative reasoning generated by GPT was coherent, structured, and human-like. This aligns with findings that LLMs generate human-like narrative explanations even when their quantitative predictions are unreliable (Bubeck et al., 2023 [23]).

The model frequently associated early and consistent engagement with higher academic outcomes and interpreted fragmented or imbalanced temporal patterns as indicators of lower or moderate achievement. To better understand GPT’s reasoning behavior, three representative cases were analyzed from the Research Methods dataset:

Case 1: Correct Medium Prediction (Structured Early Learner)

Behavioral profile: Early = 0.93, Middle = 0.00, Late = 0.07, Weekend = 0.03, Night = 0.10, Total events = 576

True Tier: Medium/Predicted Tier: Medium

This student showed a highly structured pattern, with 93% of activity occurring in the early period and very little late or weekend engagement. GPT correctly classified this learner as Medium and justified the decision by emphasizing strong early engagement and a sufficiently high number of events, while noting the lack of sustained activity later in the course. This case illustrates that when temporal patterns align with GPT’s internal heuristic (early = good, late gaps = risk), the model can produce reasonable and well-argued predictions.

Case 2: Overestimated Prediction (Procrastination/Late-Heavy Profile)

Behavioral profile: Early = 0.59, Middle = 0.00, Late = 0.41, Weekend = 0.07, Night = 0.18, Total events = 938

True Tier: Medium/Predicted Tier: High

This student combined moderate early activity (0.59) with a very high late ratio (0.41) and substantial total engagement (938 events). GPT focused mainly on the overall volume of activity and interpreted the extensive interaction with the platform as evidence of strong commitment, assigning a high-performance tier. However, the true label was medium. This mismatch shows that GPT sometimes over-weights activity quantity and under-weights the risks associated with heavy late engagement and possible last-minute behavior.

Case 3: Underestimated High Performer (Strong Early Engagement)

Behavioral profile: Early = 0.97, Middle = 0.00, Late = 0.03, Weekend = 0.00, Night = 0.12, Total events = 567

True Tier: High/Predicted Tier: Medium

This student exhibited extremely high early engagement (0.97) with very low late and weekend activity, and a relatively large number of events (567). GPT acknowledged the strong early participation but downplayed it due to the absence of mid- and late-course activity, arguing that a lack of “balanced” engagement could limit final achievement. It therefore predicted the medium tier, whereas the student belonged to the high group. This case illustrates GPT’s bias toward evenly distributed activity and its tendency to penalize unbalanced but still effective temporal strategies.
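Behavioral profiles like those in the three cases can be turned into a natural-language summary for a zero-shot query. The template below is purely hypothetical: the wording of the prompt used in the study is not given in this section, and the function name `describe_profile` is introduced only for illustration.

```python
def describe_profile(early, middle, late, weekend, night, total_events):
    """Render a temporal profile as a plain-language summary (hypothetical wording)."""
    return (
        f"The student logged {total_events} events. "
        f"{early:.0%} occurred in the early period, {middle:.0%} in the middle period, "
        f"and {late:.0%} in the late period. "
        f"{weekend:.0%} of activity took place on weekends and {night:.0%} at night. "
        "Predict the performance tier: High, Medium, or Low."
    )

# Values taken from Case 1 above.
case_1 = describe_profile(0.93, 0.00, 0.07, 0.03, 0.10, 576)
print(case_1)
```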

To contextualize GPT performance, additional baseline comparisons were conducted. A majority-class classifier was computed for each course based on the most frequent performance tier, yielding baseline accuracies ranging between 34% and 38%. A multinomial logistic regression model using the same temporal features achieved accuracies of 62% (Research Methods), 54% (Arabic Language), and 58% (Palestinian Studies). These results demonstrate that traditional supervised models outperform the zero-shot LLM approach for structured numerical feature prediction.
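The two baselines can be sketched with scikit-learn as follows. The data are synthetic, and the evaluation protocol (5-fold cross-validation with standardized features) is an assumption, since this section does not describe how the baseline accuracies were estimated.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic temporal features and tiers (NOT the study data).
rng = np.random.default_rng(1)
n_students = 157
X = rng.uniform(0.0, 1.0, size=(n_students, 5))   # e.g. early, middle, late, weekend, night
y = rng.choice(["High", "Medium", "Low"], size=n_students)

# Majority-class baseline: always predicts the most frequent tier.
majority = DummyClassifier(strategy="most_frequent")

# With the default lbfgs solver, scikit-learn fits a multinomial (softmax)
# logistic regression for multiclass targets.
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

acc_majority = cross_val_score(majority, X, y, cv=5, scoring="accuracy").mean()
acc_logreg = cross_val_score(logreg, X, y, cv=5, scoring="accuracy").mean()

print(f"majority-class accuracy: {acc_majority:.3f}")
print(f"logistic regression accuracy: {acc_logreg:.3f}")
```

Because the synthetic labels here are random, both accuracies should hover near chance; the point of the sketch is the comparison setup, not the numbers.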

The experiment was intentionally conducted in a strict zero-shot setting (temperature = 0) to evaluate raw reasoning capacity without calibration. Few-shot prompting and structured numeric formatting experiments were not included in this study, as the goal was not to optimize LLM accuracy but to examine whether descriptive temporal summaries alone enable meaningful performance inference. Future research may explore few-shot calibration, structured numeric encoding, or hybrid neuro-symbolic approaches to improve predictive reliability.

These comparisons confirm that while GPT provides coherent interpretive reasoning, it does not match conventional supervised models in predictive performance. Therefore, the LLM component should be viewed as an exploratory interpretive layer rather than a replacement for statistical or machine learning classifiers.

5. Discussion

The findings of this study offer valuable insights into how learning behaviors over time influence academic performance in various undergraduate courses. Results showed notable differences in how student engagement affected learning outcomes. This demonstrates that temporal analytics carry explanatory power, but the pedagogical nature of each course shapes their impact, a finding consistent with studies of online learning engagement that emphasize the predictive value of behavioral log data and time-of-day usage [24,25].

In two courses, Research Methods and Palestinian Studies, patterns of engagement over time were significantly connected to final grades. In both cases, early and organized engagement was generally linked to better performance. Students who used Moodle earlier in the semester and kept a consistent pattern of access tended to do better academically. This aligns with principles of self-regulated learning and spaced practice, and with the findings of [5,6], which show that earlier and consistent LMS engagement is linked to better performance.

The Arabic Language course, however, exhibited no statistically significant differences between clusters. Although the clusters represented different temporal behaviors, these patterns did not translate into meaningful grade variations. This suggests that performance in this course may depend more on prior linguistic proficiency, memorization, or skill-based learning rather than the timing of platform activity. Such findings reinforce the idea that temporal engagement is not universally predictive; instead, its relevance depends on the nature of the learning tasks and assessment design.

Another important outcome is that the meaning of clusters changed between courses. K-Means assigns cluster labels (0, 1, 2) arbitrarily, so each group must always be interpreted from its actual temporal feature values. In Research Methods, the highest-performing group showed highly structured, early engagement. In contrast, in Palestinian Studies, the top-performing group showed heavy night and weekend activity. This indicates that effective learning behaviors can differ by course context, a finding consistent with [26], which emphasizes that engagement strategies and path-dependency vary across course types. For theoretical subjects, flexible late-night study may still be effective, while method-focused courses may require steady, time-distributed engagement for optimal performance [27].

Although the statistical and clustering analyses offered useful insights into how engagement patterns over time connect to academic performance, these methods depend on standard numerical modeling and require explicit feature engineering and statistical testing. Recent advances in Generative AI, particularly LLMs, offer a fundamentally different approach: the ability to perform reasoning and classification directly from descriptive input without model training [28].

Despite its interpretable output, GPT struggled to accurately differentiate between performance tiers, particularly in courses where student activity patterns were highly homogeneous, such as Arabic Language and Palestinian Studies. In contrast, the Research Methods dataset, which contained greater behavioral variation, resulted in more meaningful but still limited alignment between predicted and true labels. Overall, the experiment confirms that GPT, in its zero-shot form, is not suitable as a standalone predictive model for structured numeric features. LLMs are known to struggle with numerical abstraction and structured feature-based prediction, performing better in linguistic reasoning than in quantitative modeling [29]. GPT nevertheless offers valuable explanatory insights. Its narrative-driven interpretations reveal how temporal behaviors may be understood by a human-like reasoning system, complementing rather than replacing traditional statistical and machine learning approaches. Recent studies in educational data mining similarly note that LLMs provide interpretive value but remain less reliable than statistical or machine-learning models for predictive accuracy [30].

6. Conclusions

This study investigated temporal learning behaviors across three undergraduate courses (Arabic Language, Fundamentals of Research Methods, and Palestinian Studies) to evaluate how patterns of student interaction with Moodle relate to academic performance. By extracting temporal engagement features (early, middle, late, weekend, and night activity) and applying K-Means clustering independently for each course, the analysis revealed clear behavioral patterns as well as meaningful differences in how these behaviors aligned with academic outcomes.

Across all courses, the temporal features effectively separated students into different engagement profiles, such as highly structured early learners, moderately consistent learners, and procrastinating or late-night learners. However, the link between these behavioral groups and academic performance varied among the courses.

In the Arabic Language course, temporal behaviors did not significantly predict performance, as shown by high p-values (ANOVA p = 0.99; Kruskal–Wallis p = 0.79). This suggests that assessments in this course may depend more on prior linguistic competence or memorization rather than study timing.

In contrast, the Research Methods and Palestinian Studies courses demonstrated statistically significant differences between clusters (p < 0.01), indicating that temporal engagement patterns were meaningfully associated with student achievement. In these courses, structured early engagement was generally associated with higher academic performance, supporting established theories of self-regulated learning and distributed practice. Interestingly, in Palestinian Studies, a cluster characterized by high weekend and night activity achieved the highest grades, suggesting that certain types of courses may allow for effective, flexible, or condensed learning strategies.

Beyond statistical clustering, the study also examined GPT as a zero-shot predictor of student performance. While GPT’s numeric accuracy was modest (47.4% overall, ranging from 45.2% to 51.0% across courses), the model generated clear and understandable explanations for its predictions. These narrative interpretations showed how the model reasons about temporal engagement and offered an additional qualitative viewpoint that traditional models do not provide. Therefore, GPT, in its zero-shot form, should not be seen as a dependable predictive model. However, it can function as an analytical tool for interpretation.

This study demonstrates that temporal learning behaviors offer valuable insight into student engagement and academic outcomes, particularly in courses requiring continuous analytical or reflective work. The findings highlight the pedagogical importance of encouraging sustained engagement across the semester and suggest opportunities for early identification of at-risk students using temporal analytics. Future work may integrate behavioral features with additional indicators such as demographic, cognitive, or textual engagement data and explore more advanced machine learning and fine-tuned LLM approaches to enhance prediction accuracy.

Author Contributions

Conceptualization, A.E. and H.I.A.; methodology, A.E., H.I.A. and W.S.; software, W.S. and A.E.; validation, H.I.A. and W.S.; formal analysis, H.I.A. and W.S.; investigation, I.H. and A.E.; resources, A.E.; data curation, W.S.; writing—original draft preparation, W.S., A.E. and H.I.A.; writing—review and editing, A.E. and I.H.; visualization, W.S. and H.I.A.; supervision, I.H. and A.E.; project administration, A.E., H.I.A. and I.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to other studies still being underway.

Acknowledgments

The authors would like to thank Arab American University, Palestine, for supporting the authors to obtain the dataset used in this study. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Rotelli, D.; Monreale, A. Processing and Understanding Moodle Log Data and Their Temporal Dimension. J. Learn. Anal. 2023, 10, 126–141.
2. Khan, M.; Naz, S.; Khan, Y.; Zafar, M.; Khan, M.; Pau, G. Utilizing Machine Learning Models to Predict Student Performance from LMS Activity Logs. IEEE Access 2023, 11, 86953–86962.
3. Romero, C.; Ventura, S. Educational Data Mining and Learning Analytics: An Updated Survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2020, 10, e1355.
4. McKenna, B.A.; Wehr, J.B.; Kopittke, P.M. Quantifying Online Engagement at Three Levels of Undergraduate Study. Cogent Educ. 2024, 11, 2345939.
5. Saqr, M.; López-Pernas, S.; Helske, S.; Hrastinski, S. The Longitudinal Association between Engagement and Achievement Varies by Time, Students’ Profiles, and Achievement State: A Full Program Study. Comput. Educ. 2023, 199, 104787.
6. Sher, V.; Hatala, M.; Gašević, D. When Do Learners Study?: An Analysis of the Time-of-Day and Weekday-Weekend Usage Patterns of Learning Management Systems from Mobile and Computers in Blended Learning. J. Learn. Anal. 2022, 9, 1–23.
7. Lee, J.; Rew, J. Memory-Augmented Large Language Model for Enhanced Chatbot Services in University Learning Management Systems. Appl. Sci. 2025, 15, 9775.
8. Buschetto Macarini, L.A.; Cechinel, C.; Batista Machado, M.F.; Faria Culmant Ramos, V.; Munoz, R. Predicting Students Success in Blended Learning—Evaluating Different Interactions inside Learning Management Systems. Appl. Sci. 2019, 9, 5523.
9. Papamitsiou, Z.; Economides, A.A. Temporal Learning Analytics Visualizations for Increasing Awareness during Assessment. Int. J. Educ. Technol. High. Educ. 2015, 12, 129–147.
10. Kim, S.; Cho, S.; Kim, J.Y.; Kim, D.-J. Statistical Assessment on Student Engagement in Asynchronous Online Learning Using the K-Means Clustering Algorithm. Sustainability 2023, 15, 2049.
11. Shawareb, N.; Ewais, A.; Dalipi, F. Utilizing Data Mining Techniques to Predict Students Performance Using Data Log from MOODLE. KSII Trans. Internet Inf. Syst. 2024, 18, 2564–2588.
12. You, J.W. Identifying Significant Indicators Using LMS Data to Predict Course Achievement in Online Learning. Internet High. Educ. 2016, 29, 23–30.
13. Dobashi, K.; Ho, C.P.; Fulford, C.P.; Higa, C.; Hara, K. Real-Time In-Class Data Mining of Moodle Click Logs for Learning Pattern Classification and Outlier Detection. In Proceedings of the International Conference on Intelligent Computing; Springer: Berlin/Heidelberg, Germany, 2025; pp. 213–224.
14. Dalipi, F.; Imran, A.S.; Kastrati, Z. MOOC Dropout Prediction Using Machine Learning Techniques: Review and Research Challenges. In Proceedings of the 2018 IEEE Global Engineering Education Conference (EDUCON); IEEE: New York, NY, USA, 2018; pp. 1007–1014.
15. Tempelaar, D.; Nguyen, Q.; Rienties, B. Learning Analytics and the Measurement of Learning Engagement. In Adoption of Data Analytics in Higher Education Learning and Teaching; Springer: Berlin/Heidelberg, Germany, 2020; pp. 159–176.
16. Hsu, C.-Y.; Horikoshi, I.; Li, H.; Majumdar, R.; Ogata, H. Supporting “Time Awareness” in Self-Regulated Learning: How Do Students Allocate Time during Exam Preparation? Smart Learn. Environ. 2023, 10, 21.
17. Neshaei, S.P.; Davis, R.L.; Hazimeh, A.; Lazarevski, B.; Dillenbourg, P.; Käser, T. Towards Modeling Learner Performance with Large Language Models. arXiv 2024, arXiv:2403.14661.
18. Hayat, A.; Khan, B.; Hasan, M.R. Leveraging Language Models for Analyzing Longitudinal Experiential Data in Education. arXiv 2025, arXiv:2503.21617.
19. Johar, N.A.; Kew, S.N.; Tasir, Z.; Koh, E. Learning Analytics on Student Engagement to Enhance Students’ Learning Performance: A Systematic Review. Sustainability 2023, 15, 7849.
20. Mize, M.; Park, Y.; Carter, A. Technology-based Self-monitoring System for On-task Behavior of Students with Disabilities: A Quantitative Meta-analysis of Single-subject Research. J. Comput. Assist. Learn. 2022, 38, 668–680.
21. Peach, R.L.; Yaliraki, S.N.; Lefevre, D.; Barahona, M. Data-Driven Unsupervised Clustering of Online Learner Behaviour. NPJ Sci. Learn. 2019, 4, 14.
22. Jamil, M.A.; Khanam, S. Influence of One-Way ANOVA and Kruskal–Wallis Based Feature Ranking on the Performance of ML Classifiers for Bearing Fault Diagnosis. J. Vib. Eng. Technol. 2024, 12, 3101–3132.
23. Bubeck, S.; Chandrasekaran, V.; Eldan, R.; Gehrke, J.; Horvitz, E.; Kamar, E.; Lee, P.; Lee, Y.T.; Li, Y.; Lundberg, S. Sparks of Artificial General Intelligence: Early Experiments with Gpt-4. arXiv 2023, arXiv:2303.12712.
24. Rachel, B.; Xu, D.; Park, J.; Yu, R.; Li, Q.; Cung, B.; Fischer, C.; Rodriguez, F.; Warschauer, M.; Smyth, P. The benefits and caveats of using clickstream data to understand student self-regulatory behaviors: Opening the black box of learning processes. Int. J. Educ. Technol. High. Educ. 2020, 17, 13.
25. Spitzer, M.W.H.; Gutsfeld, R.; Wirzberger, M.; Moeller, K. Evaluating Students’ Engagement with an Online Learning Environment during and after COVID-19 Related School Closures: A Survival Analysis Approach. Trends Neurosci. Educ. 2021, 25, 100168.
26. Getman, A.; Boitcov, M.; Adamovich, K.; Costley, J. The Role of Engagement Strategies and Path-Dependency in Online Learning. Innov. Educ. Teach. Int. 2025, 62, 1305–1319.
27. Gao, C.; Terlizzese, T.; Scullin, M.K. Short Sleep and Late Bedtimes Are Detrimental to Educational Learning and Knowledge Transfer: An Investigation of Individual Differences in Susceptibility. Chronobiol. Int. 2019, 36, 307–318.
28. Xu, P.; Liu, J.; Jones, N.; Cohen, J.; Ai, W. The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education. arXiv 2024, arXiv:2404.02444.
29. Floridi, L.; Chiriatti, M. GPT-3: Its Nature, Scope, Limits, and Consequences. Minds Mach. 2020, 30, 681–694.
30. Kasneci, E.; Seßler, K.; Küchemann, S.; Bannert, M.; Dementieva, D.; Fischer, F.; Gasser, U.; Groh, G.; Günnemann, S.; Hüllermeier, E. ChatGPT for Good? On Opportunities and Challenges of Large Language Models for Education. Learn. Individ. Differ. 2023, 103, 102274.

Figure 1. Overview of the sequential stages involved in the proposed methodological workflow.

Figure 2. Arabic Language course Boxplot (Grade Distribution by Cluster).

Figure 3. Boxplot of Fundamentals of Research Methods course (Grade Distribution by Cluster).

Figure 4. Palestinian Studies course Boxplot (Grade Distribution by Cluster).

Table 1. Temporal Feature Averages for Arabic Language course.

Cluster	Early	Late	Weekend	Night
0	0.871	0.129	0.165	0.166
1	0.925	0.075	0.025	0.037
2	0.691	0.309	0.069	0.099

Table 2. Grade Statistics Across Clusters for Arabic Language Course.

Cluster	Mean Grade	Median	Count
0	57.95	59	19
1	57.72	58	176
2	57.83	57.5	46

Table 3. Temporal Feature Averages for Research Methods.

Cluster	Early	Late	Weekend	Night
0	0.737	0.263	0.084	0.231
1	0.918	0.082	0.015	0.195
2	0.932	0.068	0.129	0.143

Table 4. Grade Statistics Across Clusters for Fundamentals of Research Methods Course.

Cluster	Mean Grade	Median	Count
0	86.08	86.5	24
1	81.79	83	95
2	87.32	90	38

Table 5. Temporal Feature Averages for Palestinian Studies.

Cluster	Early	Late	Weekend	Night
0	0.620	0.380	0.147	0.163
1	0.929	0.071	0.023	0.056
2	0.881	0.119	0.354	0.402

Table 6. Grade Statistics Across Clusters for Palestinian Studies Course.

Cluster	Mean Grade	Median	Count
0	69.99	70	47
1	70.51	71	110
2	72.32	73	18

Table 7. Zero-shot LLM prediction accuracy (%) across courses based on temporal engagement summaries.

Course	Accuracy (%)
Research Methods	51.0
Arabic Language	45.2
Palestinian Studies	46.0
Overall	47.4

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
