Predicting Learner Contributions in MOOC Learning Forums Using the Hidden Markov Model

Bing Wu; Ruodan Xie

doi:10.3390/app16020881


School of Economics and Management, Laboratory of High Quality Urban Development and Strategic Decision, Tongji University, Shanghai 200092, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(2), 881; https://doi.org/10.3390/app16020881


Abstract

Learner engagement is a pivotal factor affecting the effectiveness of Massive Open Online Courses (MOOCs), as it promotes collaborative learning environments. However, measuring the extent of learners’ contributions in MOOC learning forums is challenging because engagement is complex and variable. Given the limited research in this domain, further investigation is necessary. This study addresses this gap by using the Hidden Markov Model (HMM) to identify latent states of MOOC learners and improve their participation in learning forums. The study constructs a multidimensional observable signal sequence from learner-generated post data in MOOC forums, focusing on a widely attended course on a MOOC platform. To evaluate the predictive accuracy of HMM in forecasting learner contributions, the study compares it with several prominent prediction models: k-nearest neighbor, logistic regression, random forest, extreme gradient boosting tree, and the long short-term memory network. The results demonstrate that HMM predicts learner contributions more accurately than the other models. These findings not only validate the effectiveness of HMM but also offer insights and recommendations for enhancing forum management practices. This research advances the understanding of learner engagement in MOOC learning forums and underscores the potential benefits of employing the HMM approach in this context.

Keywords: MOOC learning forums; Hidden Markov Model; learner contribution; hidden state

1. Introduction

Massive Open Online Courses (MOOCs) have revolutionized the traditional educational paradigm by democratizing access to learning for a diverse global audience. The increasing popularity of MOOCs in recent years can be attributed to their cost-effectiveness and convenience [1]. These platforms have introduced innovative teaching and business models that have significantly advanced digital educational environments, reshaping the landscape of online learning.

Central to the educational experience within MOOCs is the exchange of knowledge. Learning forums play a crucial role in facilitating communication and idea exchange among learners, fostering an interactive learning community [2]. Effective communication not only enhances the acquisition of knowledge but also promotes meaningful connections among learners.

However, despite the potential benefits offered by MOOC learning forums, learners frequently demonstrate hesitancy to actively engage, leading to reduced levels of participation [3]. This phenomenon aligns with the observed “1-9-90” principle, where 1% of users initiate discussions, 9% engage through comments, and the remaining 90% passively consume content [4]. Such behavioral dynamics contribute to diminished interaction and involvement among learners [5], ultimately constraining the diversity of perspectives and viewpoints within course forums. Consequently, understanding and addressing these dynamics have become pivotal areas of focus in online learning research.

Current research on MOOC learning forums predominantly focuses on analyzing learner interactions through sentiment analysis and topic extraction from their posts. This analytical approach integrates factors such as learner characteristics, post attributes, and interaction behaviors to investigate the mechanisms of knowledge dissemination within MOOC learning forums [6,7]. However, a standardized definition of learner contribution in these forums is currently lacking. In this study, learner contribution is defined as topic initiation, replies, sub-replies, and likes—observable actions within the MOOC learning forum.

It is noteworthy that existing research primarily examines these observable contribution behaviors, which are influenced by underlying hidden states of learners. As learners engage in interactive courses on MOOC platforms, their hidden states evolve dynamically over time. Consequently, learners initially displaying negative states may transition to positive states, resulting in varying levels of contribution among individuals with diverse hidden states. Notably, the hidden state of learners in MOOC learning forums cannot be directly measured [8], underscoring the necessity for research to model and quantitatively assess these latent states.

The Hidden Markov Model (HMM) has been advocated as a potent method for modeling the contributions and latent states of learners in online forums, owing to its mathematical properties that align with observable contributions and hidden states of participants [9]. However, its application within the domain of MOOC learning forums remains constrained.

The aim of this study is to explore the latent states of learners in MOOC learning forums and examine how their dynamic evolution influences observable contributions. To achieve this objective, the course forum of the “Machine Learning” course on the widely recognized MOOC platform Coursera was selected as the primary data source for learner posts. The HMM was employed to predict learner contributions within the course forum. Additionally, benchmark models such as k-nearest neighbor regression (kNN) [10], logistic regression [11], random forest [12], extreme gradient boosting (XGBoost) [13], and long short-term memory (LSTM) networks [14] were included for comparative analysis, given their common use in predictive tasks within online forums.

By comparing the predictive outcomes of HMM with those derived from benchmark models, this study empirically validated the efficacy of HMM in forecasting learner contributions. Leveraging the dual stochastic processes inherent in HMM, the research investigated both the latent states and observable contributions of learners within the MOOC learning forum, offering theoretical insights into the underlying mechanisms that influence diverse learner engagements. Additionally, from a practical standpoint, the study aimed to optimize management strategies within the MOOC learning environment by accurately anticipating learner contributions, thereby enhancing the overall learning experience for participants [15].

2. Literature Review

This study begins with a thorough review of theoretical foundations related to learners’ contributions and their latent states within MOOC learning forums. This foundational understanding is imperative for establishing the conceptual framework of the research. Subsequently, the HMM and its pertinent applications are meticulously examined as the designated analytical methodology for this investigation.

2.1. Learner Contributions in MOOC Learning Forums

MOOC learning forums have become a foundational component of MOOCs [2], offering learners a crucial platform to exchange learning experiences and acquire the necessary knowledge and skills to navigate challenges encountered during their educational journey. However, a notable proportion of participants in these forums exhibit passive behavior, predominantly extracting information for personal benefit without actively contributing, often assuming the role of “lurkers”. This phenomenon presents a significant challenge to the sustained vitality of MOOC learning forums [16].

The viability and advancement of many MOOC learning forums currently hinge largely on the altruistic motivations of participating learners. Despite the immediate practical benefits of active engagement not always being apparent, learners can amass valuable knowledge and experience through their contributions. Such active participation ultimately enhances their sense of identity and fulfillment within the MOOC learning environment [17].

The interactivity of online social networking plays a pivotal role in influencing learners’ contributions within MOOC learning forums. Several key factors have been identified as significant in shaping these contributions, including the quality and interactivity of posts, peer recognition, and the scale of the course forum [18]. Both post interactivity and peer recognition notably enhance learners’ engagement. The encouragement and recognition received from peers serve as intrinsic motivators, inspiring individuals to participate more actively and contribute meaningfully to the courses. Additionally, guided by the principle of reciprocity, high-quality posts often prompt valuable responses from other participants [19,20].

Additionally, the size of MOOC learning forums has been observed to correlate positively with learners’ contributions, as larger forums tend to attract a greater number of active participants. This relationship underscores the importance of community scale in fostering robust and dynamic participation within these forums. Furthermore, insights derived from graph theory offer valuable perspectives on the relationship between learners’ characteristics and their contributions. By encompassing elements such as topic posts, replies, sub-replies, and likes, graph theory provides a more nuanced comprehension of the intricate dynamics of learner engagement in MOOC learning forums [7].

Given the comprehensive analysis of factors influencing contributions in MOOC learning forums, the accurate prediction of learners’ contributions holds significant importance. This predictive capability can assist forum managers in proactively formulating incentive strategies to enhance learner participation and can also aid teachers in refining their teaching methods for more effective delivery of MOOCs.

To achieve this, a machine learning model has been developed that segments learners’ post contents into topics and subtopics, enabling a comprehensive investigation of the influence of post sequence on learners’ contributions within MOOC learning forums [21]. Various machine learning models, including logistic regression, random forests, decision trees, and AdaBoost, have been employed to forecast user contributions in online forums. This exploration aims to determine the potential for predicting future posts based on users’ initial post experiences, individual preferences, and group preferences. Moreover, Latent Dirichlet allocation (LDA) topic modeling has been utilized to predict users’ subsequent behaviors on the network platform, leading to the development of a dynamic model for predicting long-term user behavior [22,23]. Additionally, HMMs have been leveraged for predicting user contributions in online forums, exhibiting superior performance compared to other machine learning models [4].

While learners’ contributions in MOOC learning forums manifest directly through their posts, it is crucial to acknowledge that these observable contributions are contingent upon the hidden states of the learners.

2.2. The Hidden State of Learners in MOOC Learning Forums

The hidden state of learners pertains to their underlying state of engagement while participating in MOOC learning forums, which remains unobservable and may undergo potential evolution over time. Depending on this hidden state, learners may demonstrate varying observable contributions. Thus, gaining insights into and comprehending the hidden state of learners becomes critical for accurately interpreting their visible behaviors and predicting their future actions within the forum.

Statistics from online forums indicate that approximately 1% of users actively generate content, 9% participate actively in discussions, while the majority, about 90%, primarily engage as passive observers known as lurkers [4]. Lurkers typically make minimal contributions and maintain a passive role within the forum. Psychological factors linked to lurking behavior include limited awareness of community dynamics, fear of self-expression, concerns regarding privacy exposure, absence of incentive structures, and tendencies towards procrastination, as identified in empirical studies [24]. It is important to note, however, that while many users in online forums tend to lurk with limited participation, certain individuals have the potential to transition from this state and become active contributors in the future, thereby significantly enhancing their engagement within the forums [25].

By modeling the hidden state of learners in MOOC learning forums and understanding the underlying transformative mechanisms, it becomes possible to identify learners who are likely to evolve into active contributors over time. This approach facilitates effective management of learners with diverse hidden states, ultimately leading to increased contributions in MOOC learning forums.

In exploring learners’ hidden states within MOOC learning forums, previous researchers have traditionally employed questionnaires and interviews to investigate this elusive construct. However, from a psychological perspective, the measurement of emotions has emerged as a valuable approach for assessing the impact of varying levels of user engagement on these hidden states. Empirical studies have underscored that individual traits such as extroversion, emotional stability, and agreeableness exert significant influence on learners’ hidden states, enabling differentiation between users with low and high levels of participation [26]. Furthermore, beyond examining learners’ behavioral patterns manifested in their posts, the content of these contributions critically shapes their engagement states [27,28]. Building on these findings, targeted intervention strategies can be devised to enhance learners’ inclination to actively participate in MOOC learning forums, thereby promoting more dynamic and interactive engagements [29].

However, given the substantial number of learners in MOOC learning forums, conducting surveys or interviews for each individual learner poses challenges due to time constraints and the evolving nature of learners’ hidden states over time. Consequently, capturing the dynamic characteristics of these hidden states through questionnaire- or interview-based approaches becomes inherently challenging in the context of MOOC learning forums. Fortunately, the HMM presents a promising solution by facilitating dynamic analysis of both hidden states and observable contributions at the individual level within MOOC learning forums.

2.3. Application of HMM

The HMM operates under the foundational assumption that the research subject possesses a latent, or hidden, state which evolves according to a Markov process within a statistical framework [30]. A defining characteristic of a Markov process is that, given the current state, future states are independent of past states. Thus, the hidden state in the next period depends only on the hidden state in the current period.

The application of HMM in online forum research has garnered attention, particularly in analyzing observable contributions of users through their posts. Transition probability in HMM refers to the likelihood of transitions between hidden states of learners, while emission probability signifies the likelihood of users generating diverse contributions from these hidden states [31]. By employing HMM to model user posts in knowledge-sharing forums, researchers can discern not only the hidden states of users and the probabilities of transitioning between them, but also the probabilities of users making varied contributions from different hidden states. Consequently, this facilitates predicting future user contributions based on HMM, thereby enhancing prediction accuracy.
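To make the roles of the transition and emission probabilities concrete, the following minimal sketch computes a one-step-ahead distribution over contribution types from a belief over hidden states. The two hidden states, their names, and every numeric value are illustrative assumptions, not estimates from this study.

```python
# Illustrative two-state HMM; every number below is a made-up assumption.
STATES = ["passive", "active"]  # hypothetical hidden states
CONTRIBUTIONS = ["lurk", "sub-reply", "reply", "initiate"]

# A[i][j]: probability of moving from hidden state i to hidden state j.
A = [[0.9, 0.1],
     [0.3, 0.7]]
# B[i][k]: probability of contribution k when the learner is in state i.
B = [[0.85, 0.05, 0.07, 0.03],
     [0.30, 0.20, 0.30, 0.20]]


def predict_next(belief, A, B):
    """Return the distribution over next-period contributions.

    belief[i] is the current probability that the learner is in state i.
    """
    n_states, n_obs = len(A), len(B[0])
    # Propagate the belief one step through the transition matrix.
    next_belief = [sum(belief[i] * A[i][j] for i in range(n_states))
                   for j in range(n_states)]
    # Mix the emission rows, weighted by the propagated belief.
    return [sum(next_belief[j] * B[j][k] for j in range(n_states))
            for k in range(n_obs)]


next_dist = predict_next([1.0, 0.0], A, B)  # learner currently "passive"
for name, p in zip(CONTRIBUTIONS, next_dist):
    print(f"{name}: {p:.3f}")
```

In practice the belief over hidden states would come from filtering the learner's past contributions rather than being fixed by hand as it is here.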

Building on this framework, empirical observations indicate that users in distinct hidden states within online forums respond differently to various incentive strategies, influencing their contributions in diverse manners [32].

2.4. Research Gaps and Opportunities

In MOOC learning forums, learners’ explicit contributions are readily observable; however, these contributions may not necessarily reflect the underlying hidden states of the learners. It is possible that reserved learners have a significant likelihood of transitioning to a positive state and subsequently making positive contributions in the future. Nevertheless, the hidden state of the learner remains unobservable.

Furthermore, through a comprehensive analysis of the hidden states of learners within MOOC learning forums and implementing appropriate incentive measures, it becomes plausible to effectively enhance learners’ motivation to contribute, thereby mitigating the issue of learner disengagement stemming from negative states. While HMMs have been employed for predicting user contributions in online forums [4], their application in analyzing learners’ contributions specifically within MOOC environments remains relatively limited. Consequently, current research in MOOC learning forums lacks an integrated exploration of observable contributions and hidden states to dynamically predict learners’ engagements.

Addressing this research gap concerning the dynamics of learners’ contributions in MOOC learning forums could offer invaluable insights into understanding learner engagement. By combining the analysis of observable contributions with the modeling of hidden states using the HMM, this research can develop a more comprehensive framework for dynamically predicting and understanding learners’ contributions in MOOC learning forums. Such an integrated approach holds promise for significantly advancing our comprehension of learner engagement and guiding the design of more effective interventions to foster and enhance learners’ participation in online learning communities, thereby potentially enhancing the quality and efficacy of online learning experiences.

3. Research Methodology

Each participant in the MOOC learning forum demonstrates a diverse range of motivations and objectives [29]. Some learners primarily use these forums to gather information, while others actively contribute by sharing knowledge and insights with their peers. Furthermore, the intrinsic motivation of participants in these learning forums fluctuates over time. Individuals who initially engage in knowledge dissemination may gradually decrease their involvement and take on a more passive role as observers, whereas those who start as observers seeking information may evolve into active contributors. This study therefore assumes that a participant’s next latent state in the MOOC learning forum depends solely on their current state, consistent with the homogeneous Markov assumption of the HMM [30].

3.1. Contribution Sequence in MOOC Learning Forums

HMMs, as probabilistic models for analyzing time series data, have evolved from the Markov model, which represents a random process. An HMM characterizes a stochastic sequence of unobservable states generated by a hidden Markov chain, referred to as the state sequence. Each state in this sequence then produces an observation, resulting in an observable random sequence known as the observation sequence. Assuming a finite set of hidden states denoted as $S = \{S_{1}, \dots, S_{k}\}$, and a corresponding set of observable emissions associated with each hidden state defined as $O = \{O_{1}, \dots, O_{k}\}$, the hidden state at any given time $t$ can be represented as $S_{t} \in S$, while the observable emissions can be defined as $O_{t} \in O$.

In the context of MOOC learning forums [33], learner contributions can be categorized into four distinct types: initiating topics of discussion by creating one or more topic posts (initiate), selecting one or more topic posts in the forum to reply to (reply), choosing one or more replies in the forum to respond to (sub-reply), and passive browsing of information without active participation (lurk). To capture the varying importance and influence of these four types of contributions, arranged in ascending order of significance, the set of possible learner contributions in the MOOC learning forum can be established as $O = \{\text{lurk}, \text{sub-reply}, \text{reply}, \text{initiate}\}$. Consequently, the contribution sequence for a specific learner, denoted as learner $i$, can be represented as shown in Equation (1), where $t_{j}$ represents the instructional week of the course.

$$O_{i, t_{j}} = \begin{pmatrix} O_{1, t_{1}} & O_{1, t_{2}} & \dots & O_{1, t_{j}} \\ O_{2, t_{1}} & O_{2, t_{2}} & \dots & O_{2, t_{j}} \\ \vdots & \vdots & \ddots & \vdots \\ O_{i, t_{1}} & O_{i, t_{2}} & \dots & O_{i, t_{j}} \end{pmatrix} \tag{1}$$
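As a rough illustration of how the matrix in Equation (1) might be assembled from raw post records, the sketch below builds one row per learner and one column per instructional week. The record layout, the learner identifiers, and the rule of keeping the highest-ranked contribution when a learner does several things in one week are assumptions made for this example; the text does not specify them.

```python
# Contribution types in ascending order of significance, as in the set O.
ORDER = ["lurk", "sub-reply", "reply", "initiate"]
RANK = {name: r for r, name in enumerate(ORDER)}


def contribution_matrix(records, learners, n_weeks):
    """Build one row per learner and one column per instructional week.

    records: iterable of (learner_id, week, contribution), week in 1..n_weeks.
    Weeks without any post default to "lurk". When a learner makes several
    kinds of contribution in one week, the highest-ranked one is kept; this
    tie-breaking rule is an assumption of the sketch.
    """
    rows = {learner: ["lurk"] * n_weeks for learner in learners}
    for learner, week, contribution in records:
        current = rows[learner][week - 1]
        if RANK[contribution] > RANK[current]:
            rows[learner][week - 1] = contribution
    return [rows[learner] for learner in learners]


# Hypothetical toy records for two learners over four weeks.
toy_records = [
    ("u1", 1, "reply"),
    ("u1", 1, "initiate"),
    ("u1", 3, "sub-reply"),
    ("u2", 2, "reply"),
]
matrix = contribution_matrix(toy_records, ["u1", "u2"], n_weeks=4)
for learner, row in zip(["u1", "u2"], matrix):
    print(learner, row)
```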

3.2. Covariates for Posts in MOOC Learning Forums

While direct observation of the hidden states linked to learners’ intrinsic motivation in the MOOC learning forum is unattainable, these states can be inferred through an analysis of the characteristics of learners’ posts. In these forums, sub-replies lack the capacity to stimulate further responses, whereas topic posts have the potential to attract likes. Accordingly, this study develops a set of covariates, $X = \{X_{1}, \dots, X_{k}\}$, for learners’ posts based on the instructional week of the course. At any given time $t$, the covariates are denoted as $X_{t} \in X$, as illustrated in Table 1.

Table 1. Covariates for learners’ post.

Feedback from participation in discussion forums has demonstrated positive outcomes for learners [29]. These outcomes can be quantified using several metrics related to forum engagement, such as the number of likes received per thread by a learner ($x_{1}$), the number of subsequent responses garnered per thread ($x_{2}$), and the total number of responses received per thread ($x_{3}$).

Observable actions of learners encompass a variety of metrics, including the number of original posts initiated by a learner ($x_{4}$) and the posts to which a learner has responded ($x_{5}$). Additionally, temporal aspects related to learner actions include the duration in weeks since a learner’s last thread initiation ($x_{6}$) and the mean interval in weeks between a learner’s thread initiations ($x_{7}$). Notably, learners’ engagement with the popularity of a topic post, indicated by the existing count of replies to the original thread before a learner responds ($x_{8}$), is significant. Furthermore, learners’ propensity to respond to posts can be assessed through metrics such as the average number of sub-replies made by a learner to each reply ($x_{9}$), the average number of replies contributed by a learner to each original thread ($x_{10}$), and the average Henri value associated with a learner’s original threads, reply threads, and sub-reply threads ($x_{11}$). The Henri cognitive level ($x_{11}$), ranging from low to high as detailed in Table 2 [34], categorizes the cognitive depth of MOOC learning forum posts. Each level corresponds to a specific cognitive characteristic with values from 0 to 5, where higher values indicate a higher level of demonstrated cognitive engagement in the posts.
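The thread-related covariates can be illustrated with a small sketch that derives $x_{4}$, $x_{6}$, and $x_{7}$ from the weeks in which a learner started threads. The function name, the cumulative counting up to the current week, and the use of `None` for undefined values are assumptions of this example rather than details given in the text.

```python
def thread_covariates(initiation_weeks, current_week):
    """Compute x4, x6 and x7 for one learner at current_week.

    initiation_weeks: weeks in which the learner started a thread.
    Returns (x4, x6, x7); x6 and x7 are None when they are undefined.
    Counting cumulatively up to current_week is an assumption of this sketch.
    """
    past = sorted(w for w in initiation_weeks if w <= current_week)
    x4 = len(past)                                   # threads initiated so far
    x6 = current_week - past[-1] if past else None   # weeks since last thread
    gaps = [b - a for a, b in zip(past, past[1:])]
    x7 = sum(gaps) / len(gaps) if gaps else None     # mean gap between threads
    return x4, x6, x7


# Hypothetical learner who started threads in weeks 2, 5 and 9.
x4, x6, x7 = thread_covariates([2, 5, 9], current_week=10)
print(x4, x6, x7)
```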

Table 2. Assignment of Henri cognitive levels for posts.

3.3. HMM Structure

In the context of the MOOC learning forum, we operate under the following assumptions. The hidden state sequence of learner $i$ is denoted as $S_{i} = \{S_{i 1}, S_{i 2}, \dots, S_{i N}\}$, where $S_{i n} \in S$; the observable contribution sequence of learner $i$ is represented as $O_{i} = \{O_{i 1}, O_{i 2}, \dots, O_{i N}\}$, where $O_{i n} \in O$. Furthermore, the covariate associated with a learner’s posts is specified as $X_{i} = \{X_{i 1}, X_{i 2}, \dots, X_{i, N - 1}\}$, with $X_{i n} \in X$. As illustrated in Figure 1, $N$ represents both the count of hidden states and the number of observed states.

Figure 1. The Hidden State Sequence and Observation Sequence in an HMM.

The parameters of the HMM utilized in this study are represented by $\theta = (A, B, \pi)$. The transition probability matrix $A$ delineates the likelihoods of transitioning between hidden states for the learner. Each element of matrix $A$ signifies the probability of transitioning from one specific state to another. Similarly, the emission probability matrix $B$ characterizes the probabilities of the learner’s observable contributions originating from each hidden state. Each element of matrix $B$ denotes the probability of observing a particular contribution given the learner’s current hidden state. The initial probability vector $\pi$ encapsulates the probabilities of the learner occupying each hidden state at the outset of the sequence. It is represented by a vector of size $1 \times N$, denoted as $\pi = \begin{bmatrix} \pi_{S_{1}} & \pi_{S_{2}} & \dots & \pi_{S_{N}} \end{bmatrix}$, where each element $\pi_{S_{i}}$, for $i = 1, 2, \dots, N$, indicates the initial likelihood of the learner being in hidden state $S_{i}$, and $\sum_{i = 1}^{N} \pi_{S_{i}} = 1$, so that $\pi$ is a valid probability distribution over the hidden states. Initially, the learner’s hidden state is determined by the probability vector $\pi$. Subsequent transitions to other states occur randomly following the first observable contribution, guided by the transition probabilities specified in matrix $A$.
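The constraints on the parameter set (every row of the transition and emission matrices, and the initial vector, must be a probability distribution) can be checked mechanically. The sketch below is one way to do so; the function names and the two-state example values are hypothetical and chosen only for illustration.

```python
import math


def is_distribution(values, tol=1e-9):
    """True when values are non-negative and sum to one."""
    return (all(v >= 0.0 for v in values)
            and math.isclose(sum(values), 1.0, abs_tol=tol))


def check_hmm_parameters(A, B, pi):
    """Raise ValueError unless (A, B, pi) is a valid discrete HMM."""
    n = len(pi)
    if len(A) != n or any(len(row) != n for row in A):
        raise ValueError("A must be an N x N matrix")
    if len(B) != n or len({len(row) for row in B}) != 1:
        raise ValueError("B must have N rows of equal length")
    if not is_distribution(pi):
        raise ValueError("pi must sum to one")
    for name, matrix in (("A", A), ("B", B)):
        for r, row in enumerate(matrix):
            if not is_distribution(row):
                raise ValueError(f"row {r} of {name} must sum to one")
    return True


# Hypothetical two-state parameters over the four contribution types.
pi = [0.8, 0.2]
A = [[0.9, 0.1], [0.3, 0.7]]
B = [[0.85, 0.05, 0.07, 0.03], [0.30, 0.20, 0.30, 0.20]]
valid = check_hmm_parameters(A, B, pi)
print(valid)
```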

The transition between hidden states of learner $i$ is governed by a defined transition probability, encapsulated within the hidden state transition probability matrix $A$. This matrix is formally expressed by Equation (2), where each row sums to one, that is, $\sum_{j = 1}^{N} a_{\theta X_{t - 1}}^{s_{i} s_{j}} = 1$ for $i = 1, 2, \dots, N$. Here, each element $a_{\theta X_{t - 1}}^{s_{i} s_{j}}$ denotes the probability of transitioning from hidden state $s_{i}$ to hidden state $s_{j}$ at time $t - 1$, conditioned on the model parameters $\theta = (A, B, \pi)$ and the covariates $X_{t - 1}$. Furthermore, the probability of the hidden state sequence $S_{i}$ of learner $i$ can be expressed using Equation (3), in which each factor is the probability of the learner’s hidden state at time $t$ given the previous hidden state and the model parameters.

$$A(\theta, X_{t - 1}) = \begin{bmatrix} a_{\theta X_{t - 1}}^{s_{1} s_{1}} & a_{\theta X_{t - 1}}^{s_{1} s_{2}} & \dots & a_{\theta X_{t - 1}}^{s_{1} s_{N}} \\ a_{\theta X_{t - 1}}^{s_{2} s_{1}} & a_{\theta X_{t - 1}}^{s_{2} s_{2}} & \dots & a_{\theta X_{t - 1}}^{s_{2} s_{N}} \\ \vdots & \vdots & \ddots & \vdots \\ a_{\theta X_{t - 1}}^{s_{N} s_{1}} & a_{\theta X_{t - 1}}^{s_{N} s_{2}} & \dots & a_{\theta X_{t - 1}}^{s_{N} s_{N}} \end{bmatrix} \tag{2}$$

$$P(S_{i} \mid A) = \pi_{S_{i 1}} \prod_{t = 2}^{N} P(S_{i t} \mid S_{i, t - 1}; A) = \pi_{S_{i 1}} \prod_{t = 2}^{N} a_{\theta X_{t - 1}}^{s_{i, t - 1} s_{i t}} \tag{3}$$

The observable contribution of learner $i$ at any given time is determined solely by the current hidden state [35], indicating a direct influence of the hidden state on the observed contribution. The probabilities governing the generation of various observed contributions across different hidden states are encapsulated in the emission probability matrix $B$, as illustrated in Equation (4). Furthermore, the conditional probability of the learner contributing $O_{i}$ under hidden state $S_{i}$ can be represented by Equation (5). These equations effectively characterize the interplay between hidden states and observable contributions within the HMM framework employed in this study.

$$B = \begin{bmatrix} b_{\text{lurk}}^{s_{1}} & b_{\text{sub-reply}}^{s_{1}} & b_{\text{reply}}^{s_{1}} & b_{\text{initiate}}^{s_{1}} \\ b_{\text{lurk}}^{s_{2}} & b_{\text{sub-reply}}^{s_{2}} & b_{\text{reply}}^{s_{2}} & b_{\text{initiate}}^{s_{2}} \\ \vdots & \vdots & \vdots & \vdots \\ b_{\text{lurk}}^{s_{N}} & b_{\text{sub-reply}}^{s_{N}} & b_{\text{reply}}^{s_{N}} & b_{\text{initiate}}^{s_{N}} \end{bmatrix} \tag{4}$$

$$P(O_{i} \mid S_{i}; B) = \prod_{t = 1}^{N} P(O_{i t} \mid S_{i t}; B) = \prod_{t = 1}^{N} b_{O_{i t}}^{s_{i t}} \tag{5}$$
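Equations (3) and (5) can be combined into the joint probability of a hidden state path and a contribution sequence. The sketch below evaluates that product in log space for a short toy sequence. It assumes a time-invariant transition matrix (the dependence on the covariates $X_{t-1}$ is dropped for brevity), and all parameter values are hypothetical.

```python
import math


def log_joint(states, observations, A, B, pi):
    """Return log P(S_i) + log P(O_i | S_i), following Equations (3) and (5).

    states and observations are equal-length lists of integer indices.
    A is held fixed over time, which is a simplifying assumption.
    """
    logp = math.log(pi[states[0]])                    # initial state term
    for t in range(1, len(states)):
        logp += math.log(A[states[t - 1]][states[t]])  # transition terms
    for s, o in zip(states, observations):
        logp += math.log(B[s][o])                     # emission terms
    return logp


# Hypothetical parameters: 2 hidden states, 4 contribution types
# indexed as 0=lurk, 1=sub-reply, 2=reply, 3=initiate.
pi = [0.8, 0.2]
A = [[0.9, 0.1], [0.3, 0.7]]
B = [[0.85, 0.05, 0.07, 0.03], [0.30, 0.20, 0.30, 0.20]]

path = [0, 0, 1]   # hidden states over three weeks
seen = [0, 0, 2]   # lurk, lurk, reply
joint = math.exp(log_joint(path, seen, A, B, pi))
print(f"{joint:.6f}")
```

Working in log space avoids numerical underflow when sequences are much longer than this three-week example.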

3.4. HMM Modeling

This study employed the HMM as a modeling approach to represent both observable and hidden states of learners within a MOOC learning forum. By analyzing sequences of observable posts made by learners, the HMM effectively revealed underlying hidden states that characterize their behaviors. Furthermore, leveraging these identified hidden states, the HMM facilitated predictions regarding learners’ future contributions in the course forum. The procedural framework for applying the HMM in this specific context is outlined as follows, providing a comprehensive depiction of the research methodology.

Initially, sequences of learner contributions and covariates extracted from the MOOC learning forum were utilized as observable sequences to determine the optimal number of hidden states among learner groups. This determination was guided by employing the HMM in conjunction with the Bayesian Information Criterion (BIC), serving as a criterion for selecting the most appropriate number of hidden states [36].
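Model selection by BIC can be sketched as follows, using the standard definition $\text{BIC} = k \ln n - 2 \ln L$. The parameter count assumes a plain discrete HMM without covariates, and the log-likelihoods and sample size are hypothetical placeholders rather than results from this study.

```python
import math


def hmm_free_parameters(n_states, n_symbols):
    """Free parameters of a discrete HMM without covariates.

    Each row of A, each row of B, and pi lose one degree of freedom
    because they must sum to one.
    """
    return (n_states * (n_states - 1)
            + n_states * (n_symbols - 1)
            + (n_states - 1))


def bic(log_likelihood, n_params, n_observations):
    """Bayesian Information Criterion; lower values indicate a better fit."""
    return n_params * math.log(n_observations) - 2.0 * log_likelihood


# Hypothetical fitted log-likelihoods for candidate numbers of hidden states.
candidate_loglik = {2: -5210.0, 3: -5105.0, 4: -5098.0}
n_obs = 4000  # hypothetical number of observations

scores = {n: bic(ll, hmm_free_parameters(n, 4), n_obs)
          for n, ll in candidate_loglik.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

With these placeholder numbers, the four-state model fits slightly better than the three-state model in raw likelihood, but the penalty for its extra parameters outweighs the gain, so three states are selected.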

Subsequently, to ensure consistency in modeling individual learners within the MOOC learning forum, the number of hidden states identified for learner groups was adopted as the number of hidden states for each individual learner. Parameter estimation for each individual learner’s HMM was then conducted using the Baum–Welch algorithm [30]. The Baum–Welch algorithm, an iterative optimization method based on the Expectation-Maximization (EM) algorithm, iteratively updates parameters using the observed sequence to enhance model effectiveness until convergence criteria are met. This iterative process ensures that HMM parameters accurately capture the unique characteristics of each learner’s contribution sequence.
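For readers unfamiliar with Baum–Welch, the following is a minimal pure-Python sketch of the EM update for a discrete HMM fitted to a single contribution sequence. It omits the covariates, uses hypothetical starting values and a made-up observation sequence, and is meant to illustrate the iteration rather than reproduce the estimation procedure used in this study.

```python
import math


def forward_backward(obs, A, B, pi):
    """Scaled forward-backward pass; returns (alpha, beta, scales)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    scales = [0.0] * T
    for t in range(T):
        for j in range(n):
            prior = pi[j] if t == 0 else sum(
                alpha[t - 1][i] * A[i][j] for i in range(n))
            alpha[t][j] = prior * B[j][obs[t]]
        scales[t] = sum(alpha[t])          # P(o_t | o_1..o_{t-1})
        alpha[t] = [a / scales[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scales[t + 1]
    return alpha, beta, scales


def baum_welch_step(obs, A, B, pi):
    """One EM update; returns new (A, B, pi) and the input log-likelihood."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scales = forward_backward(obs, A, B, pi)
    # gamma[t][i]: posterior probability of state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    new_A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        denom = sum(gamma[t][i] for t in range(T - 1))
        for j in range(n):
            # Expected number of i -> j transitions.
            num = sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                      * beta[t + 1][j] / scales[t + 1] for t in range(T - 1))
            new_A[i][j] = num / denom
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    loglik = sum(math.log(c) for c in scales)
    return new_A, new_B, gamma[0][:], loglik


# Hypothetical sequence: 0=lurk, 1=sub-reply, 2=reply, 3=initiate.
obs = [0, 0, 0, 2, 3, 2, 0, 0, 1, 2, 3, 3, 0, 0, 0, 0, 2, 1]
A = [[0.8, 0.2], [0.4, 0.6]]
B = [[0.7, 0.1, 0.1, 0.1], [0.2, 0.2, 0.3, 0.3]]
pi = [0.6, 0.4]
history = []
for _ in range(10):
    A, B, pi, loglik = baum_welch_step(obs, A, B, pi)
    history.append(loglik)
print(history[0], history[-1])
```

Each entry of `history` is the log-likelihood of the parameters before that update, so the sequence should never decrease, which is the defining property of EM.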

Finally, the HMM, along with benchmark models such as logistic regression, random forest, XGBoost, and LSTM networks, was employed to predict learner contributions in the MOOC learning forum. Prediction results were compared to assess the efficacy of the HMM relative to the benchmark models. This comparative analysis provided a thorough evaluation of the HMM’s predictive performance, thereby enhancing understanding of its utility within the context of the MOOC learning forum.

3.5. Systematic Approach

We developed a systematic approach to predict learner contributions in MOOC discussion forums. Initially, forum posts were organized chronologically to construct each learner’s contribution sequence, with irrelevant content excluded to ensure data accuracy. Key learner characteristics, including personal attributes and forum activity metrics, were identified to establish a foundation for detecting behavioral patterns.

Subsequently, Henri’s cognitive framework was applied to assess the cognitive depth of posts, classifying them into three levels: information retrieval, information integration, and analysis and reflection. This classification provided valuable insights into engagement quality. Descriptive statistical analysis was conducted to compute metrics such as average post frequency and cognitive level distribution, further enriching our understanding of participation dynamics.

Thirdly, the Baum–Welch algorithm was employed to estimate the parameters of the HMM using contribution sequences and posting data from 25,474 learners. This algorithm was chosen for its strong convergence properties with large datasets. The Bayesian Information Criterion (BIC) was utilized to determine the optimal number of hidden states, with lower BIC values indicating a better model fit.

Finally, we integrated contribution sequences, learner characteristics, cognitive levels, and hidden state data into both HMM and benchmark models to compare AUC improvement scores, evaluating the relative enhancement in prediction accuracy. This study focuses on AUC improvement, defined as the percentage by which HMM’s prediction accuracy surpasses that of the benchmark models, rather than on individual model AUC results.
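One possible reading of the AUC improvement metric is the relative gain of the HMM's AUC over a benchmark's AUC, expressed as a percentage. The sketch below computes AUC by pairwise ranking and then this relative gain. The binary labels, the scores, and the choice of a relative rather than absolute difference are assumptions for illustration only.

```python
def auc(labels, scores):
    """Rank-based AUC: P(positive score > negative score), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_improvement(auc_model, auc_benchmark):
    """Relative AUC gain over a benchmark, in percent (assumed definition)."""
    return 100.0 * (auc_model - auc_benchmark) / auc_benchmark


# Hypothetical outcomes (1 = learner contributed) and model scores.
labels = [1, 0, 1, 1, 0, 0]
hmm_scores = [0.9, 0.2, 0.8, 0.35, 0.4, 0.1]
benchmark_scores = [0.7, 0.5, 0.4, 0.8, 0.3, 0.6]

auc_hmm = auc(labels, hmm_scores)
auc_bench = auc(labels, benchmark_scores)
improvement = auc_improvement(auc_hmm, auc_bench)
print(f"{auc_hmm:.3f} {auc_bench:.3f} {improvement:.2f}%")
```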

4. Data and Method Implementation

4.1. Data Collection

This study focuses on the “Machine Learning” course taught by Andrew Ng of Stanford University and hosted on Coursera, a prominent MOOC platform. Coursera was selected as the primary platform for several reasons. Firstly, Coursera offers access to an extensive array of courses provided by more than 200 prestigious universities globally, encompassing over 6000 courses as of 2022. Moreover, the platform’s learner base has grown substantially, from 2 million users in 2012 to more than 220 million by 2022 [3]. Secondly, each course on Coursera features its own learning forum where participants engage in discussions and contribute posts, presenting a valuable dataset for analyzing learner behavior. Lastly, the “Machine Learning” course was chosen due to its widespread popularity and sustained presence on the platform. Since its inception in 2015, the course has garnered significant interest from learners, amassing a cohort of 4.4 million participants by 2022. The enduring popularity of this course ensures a comprehensive and continuous record of learner interactions. By focusing on this course, the study capitalizes on the platform’s substantial learner engagement and the extensive repository of behavioral data within its dedicated learning forum.

The “Machine Learning” course is structured into 11 weeks of instructional content, each accompanied by a dedicated sub-forum, from week 1 to week 11. Within these sub-forums, learners can initiate new topic posts, which are open to responses and likes from fellow participants. Learners can also reply to existing topic posts, and these replies can in turn receive sub-replies. Sub-replies, however, cannot receive further direct responses.

The Python (3.12) programming language was employed in this study to gather and preprocess data scraped from the dedicated forum of the “Machine Learning” course. The dataset encompassed posts contributed by a total of 104,203 learners, spanning a duration of 348 weeks, from 2 March 2015 (Monday) to 31 October 2021 (Sunday). The collected data comprised 108,772 topic posts, 195,311 reply posts, and 181,284 sub-reply posts. To provide an overview of temporal patterns, Figure 2 illustrates the monthly trends of new topic posts, replies, and sub-replies across 80 months, from March 2015 to October 2021. The analysis reveals a generally stable monthly influx of new posts, with a notable peak in topic posts, reaching 3670 in the 62nd month (April 2020), accompanied by increased levels of replies and sub-replies. This surge coincided with heightened interest in MOOCs spurred by the COVID-19 pandemic. Overall, the dataset offers valuable insights into the dynamics of learner interaction within the “Machine Learning” forum, highlighting significant trends observed over the 80-month period.

Figure 2. Monthly trends of new topic posts.

4.2. Determining the Contribution Sequence of Learners

This study aimed to investigate the depth of learner engagement within the “Machine Learning” learning forum over an 11-week period. Given the typical concentration of learner interaction within this timeframe, participants were purposefully selected based on their consistent and active involvement in discussions throughout the entire duration. Consequently, the analysis included the contributions of a cohort comprising 25,474 engaged learners, who collectively generated 50,884 topic posts, 44,989 replies, and 48,895 sub-reply posts. The substantial sample size of actively engaged participants provided a solid foundation for the study, enhancing the generalizability of its findings.

To establish a rigorous framework for analyzing the sequence of learner contributions, a structured approach was developed. This approach assigned the numerical values 0, 1, 2, and 3 to four contribution types: lurking, sub-replying, replying, and topic posting, respectively. These values were assigned in ascending order to reflect increasing levels of contribution. Adhering to this classification enabled the objective assessment of engagement levels exhibited by each participant.

To quantify the weekly contribution of individual learners, we calculated the maximum score from their recorded actions for each week, as this reflects their highest potential in specific contexts. For example, if learner $i$ posted both topic posts and replies in a given week, their contribution score for that week would be assigned a value of 3, indicating active participation. This methodical strategy facilitated the creation of a contribution sequence denoted as $O_{i, t_{j}}$ for the 25,474 learners over the 11-week duration. Here, $i \in [1, 25474]$ represents the learner index, and $j \in [1, 11]$ denotes the week index. This sequence holds substantial value as it provides insights into the varying levels of engagement exhibited by learners throughout the 11-week timeframe. By leveraging this sequence, we can analyze and interpret the patterns and trends in learner participation, offering insights into the dynamics of engagement within the “Machine Learning” learning forum. Equation (6) visually represents this contribution sequence, serving as a valuable analytical tool for understanding the distribution and progression of learner contributions over time.

$$O_{i, t_{j}} = \begin{pmatrix} O_{1, t_{1}} & O_{1, t_{2}} & \dots & O_{1, t_{11}} \\ O_{2, t_{1}} & O_{2, t_{2}} & \dots & O_{2, t_{11}} \\ \dots & \dots & \dots & \dots \\ O_{25474, t_{1}} & O_{25474, t_{2}} & \dots & O_{25474, t_{11}} \end{pmatrix} \qquad (6)$$
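As an illustration of the scoring rule above, the following minimal Python sketch builds 11-week contribution sequences by taking the weekly maximum action code. The record layout `(learner_id, week, action)`, the action labels, and the function name are assumptions made for this example; they are not taken from the authors' implementation.

```python
from collections import defaultdict

# Action codes follow the ascending ordering described in the text.
ACTION_SCORE = {"lurk": 0, "sub_reply": 1, "reply": 2, "topic": 3}
N_WEEKS = 11


def contribution_sequences(records, n_weeks=N_WEEKS):
    """Map each learner to a list of weekly scores (max action code per week).

    `records` is an iterable of (learner_id, week, action) tuples with week in
    1..n_weeks. Weeks without any recorded action keep the score 0 (lurk).
    """
    sequences = defaultdict(lambda: [0] * n_weeks)
    for learner_id, week, action in records:
        row = sequences[learner_id]
        row[week - 1] = max(row[week - 1], ACTION_SCORE[action])
    return dict(sequences)


# Hypothetical records: learner "a" posts a reply and a topic post in week 2.
example = [
    ("a", 2, "reply"),
    ("a", 2, "topic"),
    ("a", 5, "sub_reply"),
    ("b", 1, "reply"),
]
seqs = contribution_sequences(example)
```

Each value in `seqs` corresponds to one row of the matrix in Equation (6).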

4.3. Determining the Covariates of Learners’ Posts

By utilizing the contribution sequence $O_{i, t_{j}}$ of learner $i$, we can capture and depict the covariate sequence $X_{t_{j}}$ associated with the learner’s post. The covariate sequence $X_{t_{j}}$ conveys significant insights into the specific attributes and characteristics of the learner’s post at each timestamp $t_{j}$. This sequence is contingent upon the timing of the post and can be expressed as demonstrated in Equation (7). Notably, Equation (7) delineates the relational framework between the contribution sequence and the corresponding covariate sequence, elucidating the dynamic interaction between the learner actions and the attributes of their posts across the temporal span.

$$X_{t_{j}} = \begin{pmatrix} x_{1, t_{1}} & x_{1, t_{2}} & \dots & x_{1, t_{11}} \\ x_{2, t_{1}} & x_{2, t_{2}} & \dots & x_{2, t_{11}} \\ \dots & \dots & \dots & \dots \\ x_{11, t_{1}} & x_{11, t_{2}} & \dots & x_{11, t_{11}} \end{pmatrix} \qquad (7)$$

4.3.1. Evaluation Criteria for Assessing Henri’s Cognitive Level

The methodology employed to establish the criteria for determining Henri’s cognitive level $x_{11}$ is visually depicted in Figure 3, offering a clear illustration of the process. This approach enabled a comprehensive assessment of cognitive attributes within the corpus, thereby facilitating a detailed analysis of learner engagement and cognitive processes.

Figure 3. Formulate Evaluation Criteria for Assessing Henri’s Cognitive Level.

In Step 1, the evaluation criteria for assessing Henri’s cognitive level were initially established by leveraging authoritative references such as the LIWC2015 dictionary and the Google Machine Learning Glossary [37,38]. These resources categorize cognitive words into various classifications, encompassing insight words, causal words, difference words, tentative words, exact words, restriction words, inclusion words, and exclusive words.

In Step 2, a corpus was constructed for evaluating cognitive levels. This entailed randomly selecting 2000 entries from learners’ posts to keep the sample representative. The corpus was built through a combination of programming techniques and manual coding.

To assess the cognitive level of the posts extracted in Step 2, Step 3 utilized Java programming. This phase employed the predefined evaluation criteria established in Step 1. Concurrently, researchers conducted manual assessments of the cognitive level of these posts, facilitating cross-validation and ensuring the robustness of the analysis. This dual approach underscored the methodological rigor and reliability of the cognitive evaluation process.

In Step 4, the alignment between cognitive level assessments derived from programming and manual coding was scrutinized. This comparison was substantiated by calculating a consistency coefficient [39]. A coefficient below the 90% threshold indicated a requirement for refining the evaluation criteria, necessitating a repetition of Step 3. Conversely, a coefficient surpassing 90% affirmed the adequacy of the established evaluation criteria, validating their application in computing Henri’s cognitive level with confidence.

Finally, Step 5 entailed the assignment of numeric cognitive level values, ranging from 0 to 5, to each post based on the refined evaluation criteria established in Step 4. This systematic procedure ensured a uniform and standardized measurement of cognitive levels throughout the corpus.

In summary, this methodical process provided a rigorous and comprehensive approach to ascertain Henri’s cognitive level. By leveraging authoritative sources, meticulous post selection, proficient programming techniques, rigorous manual coding, and thorough cross-validation, the analysis facilitated a robust evaluation of cognitive attributes within the dataset.
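The agreement check in Step 4 can be sketched as follows. The source does not state which consistency coefficient was used, so this example shows two common choices, simple percent agreement and Cohen's kappa, as assumptions. The function names and the example codings are hypothetical.

```python
from collections import Counter


def percent_agreement(auto_codes, manual_codes):
    """Share of posts on which the two codings assign the same level."""
    if len(auto_codes) != len(manual_codes) or not auto_codes:
        raise ValueError("codings must be non-empty and of equal length")
    matches = sum(a == m for a, m in zip(auto_codes, manual_codes))
    return matches / len(auto_codes)


def cohens_kappa(auto_codes, manual_codes):
    """Chance-corrected agreement between the two codings."""
    n = len(auto_codes)
    p_observed = percent_agreement(auto_codes, manual_codes)
    auto_freq = Counter(auto_codes)
    manual_freq = Counter(manual_codes)
    # Expected agreement if both coders labelled independently at their own rates.
    p_expected = sum(auto_freq[c] * manual_freq[c] for c in auto_freq) / (n * n)
    if p_expected == 1.0:
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical codings of twenty posts on the 0-5 scale (one disagreement).
auto = [0, 1, 1, 2, 0, 3, 5, 0, 1, 2, 0, 0, 1, 4, 2, 0, 1, 0, 3, 0]
manual = [0, 1, 1, 2, 0, 3, 4, 0, 1, 2, 0, 0, 1, 4, 2, 0, 1, 0, 3, 0]

agreement = percent_agreement(auto, manual)
kappa = cohens_kappa(auto, manual)
needs_refinement = agreement < 0.90  # 90% threshold from Step 4
```

Percent agreement is easy to interpret but does not correct for chance matches, which matters when most posts fall into the lowest level; kappa is one way to account for that.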

4.3.2. Descriptive Statistics

Table 3 presents descriptive statistics for learners’ contributions and the covariates associated with their posts, analyzed across the teaching weeks. The table summarizes the statistical measures and variables that characterize learner participation and the contextual factors influencing contributions over the instructional period.

Table 3. Descriptive statistics for learners’ contributions and post covariates.

The average learner contribution ($O$) is 0.58, while the mean value for creating topic posts ($x_{4}$) is 0.18. The findings suggest a prevailing tendency among learners towards passive observation rather than active engagement in posting activities.

Furthermore, it is observed that the mean interval between thread initiations ($x_{7}$) is 1.42 weeks, while the mean duration since the last thread initiation ($x_{6}$) is 3.33 weeks. These results empirically substantiate the variability in learners’ participation levels across different weeks of the instructional period.

The mean values for the number of replies ($x_{5}$), sub-replies ($x_{9}$) and replies per original thread ($x_{10}$) are 0.23, 0.34 and 0.25, respectively. Notably, all these values exceed the average number of topic posts ($x_{4}$), indicating a prevalent tendency for topic posts to elicit responses from other participants.

The mean, maximum, and standard deviation of responses preceding each post ($x_{8}$) are computed as 17.21, 3048, and 146, respectively, indicating significant diversity in the reception of various topic posts. Similarly, the average, maximum, and standard deviation of received responses ($x_{3}$) are determined to be 0.72, 9964, and 21, respectively, underscoring substantial variability in the engagement levels surrounding topic discussions.

The average Henri cognitive level ($x_{11}$) is 0.56, indicating that the cognitive content of posts predominantly reflects lower-level engagement, characterized by basic cognitive processes and fundamental clarification. Consequently, the average number of received likes ($x_{1}$) stands modestly at 0.05.

4.4. Determining the Number of Hidden States for Learners

The Baum–Welch algorithm was employed to estimate the parameters of the HMM, utilizing the contribution sequence and posting covariates from 25,474 learners as input. This algorithm was chosen for its robust convergence properties, particularly well-suited for large sample sizes [40]. To determine the optimal number of hidden states, the Bayesian Information Criterion (BIC) was utilized. A lower BIC value indicates a better fit of the model and parameter configurations to the data.

After conducting multiple rounds of parameter optimization, the number of hidden states yielding the smallest BIC value was determined. As illustrated in Figure 4, the BIC value reaches its minimum when the number of hidden states is set to four. Accordingly, it has been established that there are four distinct hidden states corresponding to learners’ contributions in the course discussion forum. These states are ordered in ascending magnitude and characterized as follows: $S_{1}$ represents an extremely negative state, $S_{2}$ denotes a moderately negative state, $S_{3}$ signifies a moderately positive state, and $S_{4}$ represents a highly positive state. Collectively, these four hidden states are denoted as $S = \{S_{1}, S_{2}, S_{3}, S_{4}\}$, which represent latent behavioral patterns inferred from observable actions.

Figure 4. BIC values of HMM with a varying number of hidden states.
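The BIC-based choice of the number of hidden states can be sketched as below. This is a simplified illustration, not the authors' procedure: it assumes a plain categorical HMM without covariates when counting free parameters, treats every learner-week as one observation, and uses made-up log-likelihood values in place of fitted ones.

```python
import math


def hmm_free_parameters(n_states, n_symbols):
    """Free parameters of a plain categorical HMM (assumption: no covariates)."""
    initial = n_states - 1                    # initial distribution sums to 1
    transitions = n_states * (n_states - 1)   # each transition row sums to 1
    emissions = n_states * (n_symbols - 1)    # each emission row sums to 1
    return initial + transitions + emissions


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate a better fit."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_n_states(log_likelihoods, n_symbols, n_obs):
    """Return (best_k, {k: BIC}) for log-likelihoods keyed by state count."""
    scores = {
        k: bic(ll, hmm_free_parameters(k, n_symbols), n_obs)
        for k, ll in log_likelihoods.items()
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical maximized log-likelihoods for K = 2..6 (illustrative numbers only).
fitted = {2: -251000.0, 3: -243500.0, 4: -239800.0, 5: -239760.0, 6: -239740.0}
best_k, scores = select_n_states(fitted, n_symbols=4, n_obs=25474 * 11)
```

With these illustrative numbers the likelihood gain beyond four states is too small to offset the penalty term, so the minimum BIC falls at `best_k = 4`, mirroring the shape of the curve described for Figure 4.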

4.5. Predicting Learners’ Contributions in MOOC Learning Forums

Using post data collected from the MOOC learning forum spanning weeks 1 to 11, the Baum–Welch algorithm was applied for parameter estimation, leveraging the four hidden states identified in Figure 4. After conducting multiple rounds of training iterations, the parameters yielding the smallest BIC value were chosen as the HMM parameters. These parameters were then utilized to predict the contribution classes of 25,474 individual learners.
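Once parameters are estimated, a next-week prediction can be obtained with the forward recursion. The sketch below assumes a plain four-state HMM with categorical emissions over the classes 0 (lurk), 1 (sub-reply), 2 (reply), and 3 (initiate); it ignores the post covariates used in the study, and all probabilities are hypothetical values chosen for illustration.

```python
def forward_filter(obs, start, trans, emit):
    """Return P(hidden state at the last step | obs) via the scaled forward recursion."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    for o in obs[1:]:
        alpha = [
            emit[s][o] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
        total = sum(alpha)
        alpha = [a / total for a in alpha]  # rescale to avoid underflow
    return alpha


def predict_next_observation(obs, start, trans, emit):
    """Distribution over the next contribution class given the observed weeks."""
    n_states = len(start)
    n_symbols = len(emit[0])
    alpha = forward_filter(obs, start, trans, emit)
    next_state = [
        sum(alpha[r] * trans[r][s] for r in range(n_states)) for s in range(n_states)
    ]
    return [
        sum(next_state[s] * emit[s][o] for s in range(n_states))
        for o in range(n_symbols)
    ]


# Hypothetical parameters (each row sums to 1); not estimates from the paper.
start = [0.70, 0.15, 0.10, 0.05]
trans = [
    [0.85, 0.10, 0.04, 0.01],
    [0.30, 0.50, 0.15, 0.05],
    [0.10, 0.20, 0.55, 0.15],
    [0.05, 0.10, 0.25, 0.60],
]
emit = [
    [0.94, 0.03, 0.02, 0.01],
    [0.60, 0.25, 0.10, 0.05],
    [0.30, 0.25, 0.30, 0.15],
    [0.10, 0.15, 0.25, 0.50],
]

history = [0, 0, 3, 2, 3]  # one learner's scores for weeks 1-5
probs = predict_next_observation(history, start, trans, emit)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

The per-class probabilities in `probs` are the kind of scores that a threshold-free metric such as AUC can evaluate.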

To evaluate the effectiveness of models in predicting individual learner contributions within the MOOC learning forum, the Area Under the Curve (AUC) was used as the evaluation metric. Because AUC considers all classification thresholds, it provides a thorough assessment of model performance, especially under class imbalance [41,42]. For a useful classifier, the AUC value lies between 0.5 (chance-level discrimination) and 1 (perfect discrimination), with higher values indicating superior accuracy. By comparing the AUC improvement scores derived from learner contributions predicted by the HMM with those from a benchmark model, we can determine the relative enhancement in prediction accuracy [35].
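A minimal sketch of the metric follows. It computes AUC through its rank interpretation (the probability that a randomly chosen positive example is scored above a randomly chosen negative one), applies it one-vs-rest to a single contribution class, and expresses AUC improvement as a difference in percentage points. The source does not spell out whether the improvement is absolute or relative, so that definition is an assumption, as are the function names and the example values.

```python
def binary_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(true_classes, class_scores, target):
    """AUC for one contribution class against all others."""
    labels = [1 if c == target else 0 for c in true_classes]
    scores = [row[target] for row in class_scores]
    return binary_auc(labels, scores)


def auc_improvement(auc_hmm, auc_benchmark):
    """Improvement in percentage points (assumed definition)."""
    return 100.0 * (auc_hmm - auc_benchmark)


# Hypothetical binary example: five learners, score = predicted probability.
auc = binary_auc([1, 0, 1, 0, 0], [0.9, 0.3, 0.4, 0.5, 0.1])

# Hypothetical four-class example: rows are per-class scores for each learner.
true_classes = [0, 3, 0, 2]
class_scores = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.1, 0.1, 0.6],
    [0.5, 0.2, 0.2, 0.1],
    [0.3, 0.1, 0.4, 0.2],
]
auc_initiate = one_vs_rest_auc(true_classes, class_scores, target=3)
```

The pairwise loop is quadratic in the number of examples; for a cohort of this size a sort-based rank computation would be the practical choice, but the quadratic form keeps the definition visible.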

Figure 5 presents the average AUC improvement scores achieved by each method, obtained through 10-fold cross-validation. The AUC scores, which encompass all models, range from 54.1% to 87.5%, yielding a mean score of 74.3%. Remarkably, the HMM approach demonstrates superior performance compared to other methods, showing an average AUC improvement ranging from 1% to 16%, as illustrated by the mean across classes plot in Figure 5.

Figure 5. AUC improvement scores using HMM.

To optimize model performance, we employ a comprehensive approach to hyperparameter tuning for the XGBoost, KNN, and LSTM models. For XGBoost, we start with baseline settings (e.g., max_depth = 6, learning_rate = 0.1, n_estimators = 100) and perform a grid search over critical parameters such as learning_rate = [0.01, 0.1, 0.3] and max_depth = [3, 6, 10]. When grid search becomes computationally expensive, we transition to random search, which is more efficient, and use cross-validation to evaluate generalization across data subsets.

For KNN, we optimize the number of neighbors (k), the distance metric, and the weighting scheme. A grid search is conducted over values of k (e.g., 3, 5, 7, 10) and common distance metrics (e.g., Euclidean, Manhattan), with random search and k-fold cross-validation used to select k while guarding against overfitting. Feature scaling is also applied, because KNN distances are sensitive to differences in feature scale.

For LSTM models, we adopt a similar strategy, integrating grid or random search with manual adjustments based on domain expertise. Additionally, we incorporate learning rate schedulers to gradually decrease the learning rate for better convergence, while early stopping is implemented to avoid overfitting.

In terms of runtime, HMM exhibits approximately 2% superior efficiency compared to XGBoost, KNN, and LSTM. We attribute this advantage to HMM’s simpler Markov assumption and dynamic programming, which avoid the iterative training required by XGBoost and LSTM and the distance calculations required by KNN.
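The KNN part of this tuning procedure can be illustrated with a small, dependency-free sketch: a grid over k and the distance metric, scored by 5-fold cross-validated accuracy. The data are synthetic two-class points generated for the example, and the helper names are hypothetical; this is not the authors' pipeline, and a library implementation would normally replace the hand-written classifier.

```python
import itertools
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def manhattan(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))


def knn_predict(train_x, train_y, query, k, metric):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted((metric(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def cv_accuracy(xs, ys, k, metric, n_folds=5):
    """k-fold cross-validated accuracy using strided folds."""
    n = len(xs)
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        correct += sum(
            knn_predict(train_x, train_y, xs[i], k, metric) == ys[i] for i in test_idx
        )
    return correct / n


def grid_search(xs, ys, ks, metrics):
    """Evaluate every (k, metric) pair and return the best one with all scores."""
    results = {
        (k, name): cv_accuracy(xs, ys, k, metric)
        for k, (name, metric) in itertools.product(ks, metrics.items())
    }
    best = max(results, key=results.get)
    return best, results


# Synthetic data: two Gaussian clusters, 30 points each (illustrative only).
random.seed(0)
xs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(30)]
xs += [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(30)]
ys = [0] * 30 + [1] * 30

best, results = grid_search(
    xs, ys, ks=[3, 5, 7, 10], metrics={"euclidean": euclidean, "manhattan": manhattan}
)
```

The same loop structure extends to the XGBoost grid in the text by swapping the parameter lists and the scoring function.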

The HMM approach demonstrates superior performance compared to all other benchmark models in terms of per-class accuracy for the “Subreply” and “Reply” classes, showing an AUC improvement of over 4%. When predicting the “Lurk” class, the HMM approach outperforms Random Forest with an AUC improvement of less than 2%, while surpassing other approaches with improvements ranging from 2.5% to 5.9%. For predicting the “Initiate” class, the HMM approach performs competitively with logistic regression, random forest, and XGBoost models, and outperforms KNN and LSTM models.

These findings collectively highlight that the HMM, through its doubly stochastic structure (a hidden Markov chain of states combined with state-dependent random emissions), effectively learns a probabilistic state space that represents learner behavior. By leveraging the inherent time constraints of course participation, the short-term dependencies of learner actions, and the stochastic nature of these behaviors, HMM captures the dynamic patterns of learner engagement. As a result, the HMM approach generalizes effectively and provides precise predictions of learner contributions, offering valuable insights into learner interactions within online learning environments.

5. Conclusions and Implications

Although centered on a specific course, the HMM and analytical framework employed exhibit broader applicability. This model effectively captures learner behavior patterns that may be similar across various MOOCs. As a result, the findings offer valuable insights into learner contributions across diverse MOOC contexts.

5.1. Conclusions

This study leverages the unique attributes of MOOC learning forums to classify learner contributions into four distinct categories: lurk, sub-reply, reply, and initiate, each representing progressively higher levels of engagement. The research specifically examines the forum of the well-regarded “Machine Learning” course offered on Coursera, a leading MOOC platform. To facilitate this analysis, the study employs HMM as a methodological tool. The HMM framework offers a robust approach for exploring and predicting the patterns and dynamics of learner contributions within MOOC learning forums.

Utilizing the Bayesian Information Criterion (BIC), this study identifies four distinct hidden states that characterize various learner groups. These hidden states are classified as extremely negative, moderately negative, moderately positive, and highly positive. To maintain methodological rigor, the assignment of hidden states to individual learners is systematically aligned with the group-level assignments. Following this, the HMM parameters for individual learners are estimated using the Baum–Welch algorithm. This estimation process not only allows for precise predictions of learners’ contributions within the MOOC learning forum but also provides a deep understanding of their engagement patterns and behavioral dynamics.

To evaluate the effectiveness of HMM in predicting learner contributions within MOOC learning forums, this study employs five widely recognized prediction models—k-Nearest Neighbors (kNN), Logistic Regression, Random Forest, XGBoost, and Long Short-Term Memory (LSTM)—as benchmark models. The area under the curve (AUC) is used as the evaluation metric to compare the predictive performance of the benchmark models with that of the HMM. The research findings demonstrate that the HMM outperforms these mainstream prediction models in accurately forecasting learner contributions. Consequently, the study underscores the efficacy of HMM as a robust and reliable method for both classifying and predicting learner engagement within MOOC learning forums.

Moreover, the predictive capabilities of the HMM can be utilized to forecast learners’ future contributions within the MOOC learning forum. This predictive capability allows for the delivery of personalized information services that align with learners’ engagement patterns, thereby enhancing the activity and effectiveness of MOOC learning forums in a customized manner. By leveraging the forecasting capabilities inherent in the HMM, MOOCs can facilitate more effective and efficient communication among learners, instructors, and course content, ultimately leading to improved learning outcomes and a more engaging learning experience.

5.2. Theoretical Implications

This study makes several notable theoretical contributions to the understanding of learners’ participation in the MOOC learning forum. Firstly, by employing HMM, it provides insights into the underlying mechanisms that drive learners’ engagement. Secondly, the integration of Henri’s cognitive level analysis enriches our understanding of learners’ cognitive engagement and its influence on their contributions [43,44]. Finally, through empirical evidence, this study demonstrates the superior predictive capabilities of the HMM in forecasting learners’ contributions within MOOC learning forums.

5.2.1. Exploring Learners’ Contributions Through HMM

Previous research on learners within MOOC learning forums has predominantly concentrated on analyzing observable behaviors, with comparatively limited attention given to exploring the underlying hidden states that influence these behaviors. Recognizing the intricate relationship between learners’ hidden states and their observable behaviors is essential. To address this research gap, the utilization of HMM offers a compelling framework for elucidating this relationship.

By leveraging the HMM, a more profound comprehension of the dynamic nature of learners’ latent states and their influence on observable contributions can be attained. This approach allows for a comprehensive examination of the patterns and dynamics of learner participation in the forum, shedding light on the various factors that influence their level of engagement. Consequently, the application of the HMM provides a powerful methodology to explore the complex interplay between learners’ hidden states and their observable behaviors in the MOOC learning forum. Through delving into these hidden states, researchers can enhance their insights into learners’ engagement patterns and contribute to a more comprehensive understanding of learner behavior in online learning environments.

5.2.2. Introduction of Henri Cognitive Level Analysis

Existing investigations pertaining to learners in the MOOC learning forum have frequently neglected the cognitive dimensions inherent in learners’ post contents, thus failing to adequately consider their potential contributions. However, these cognitive levels are critical as they serve as valuable indicators of learners’ engagement and participation. To address this significant research gap, the Henri cognitive level analysis model—renowned for its logical division and broad applicability—has been employed in this study.

This research endeavor introduces the Henri cognitive level analysis model to rigorously scrutinize the substantive content of learners’ posts, thereby enhancing and diversifying the discernible information used to predict their contributions. By examining the cognitive processes and levels of thinking involved in learners’ interactions within the forum, this analysis not only deepens our comprehension of the cognitive aspects of learner engagement but also elucidates how these cognitive factors are manifested in their forum contributions. Such an approach provides a more nuanced understanding of how cognitive dimensions influence participation and engagement in MOOC environments.

5.2.3. Efficacy of HMM for Predicting Learners’ Contributions

In previous studies on prediction, various models such as kNN, logistic regression, Random Forest (RF), XGBoost, and LSTM have been extensively employed. Building upon this established foundation, these five models are used as benchmarks in this study. The objective of this research is to predict learners’ contributions in the MOOC learning forum by leveraging both the benchmark models and HMM for comparative analysis. The findings indicate that, overall, HMM outperforms the benchmark models in predicting learners’ contributions. This superiority can be attributed to the doubly stochastic structure inherent in HMM, which effectively reduces noise in learner post data and enhances the reliability and logical coherence of contribution predictions. These findings highlight the practical applicability of HMM in the context of MOOC learning forums. By accurately predicting learners’ future participation, HMM enables educators and platform administrators to anticipate and respond to learners’ needs more effectively, thus fostering a more engaging and personalized learning experience.

5.3. Practical Implications

The application of HMM represents a highly effective methodology for predicting learners’ future contributions in MOOC learning forums. This approach facilitates the provision of tailored services that are specifically aligned with individual learners’ engagement levels, offering significant potential to enhance overall participation and activity within these forums. To leverage this predictive capability fully, several targeted strategies are suggested for the management of MOOC learning forums, as well as for the benefit of learners and educators.

For learners who are actively involved in reply contributions, implementing push notifications from forum managers for newly published topic posts proves to be particularly advantageous. This proactive strategy enables learners to conveniently access relevant and focused topics, empowering them to deliver timely and valuable responses to peers seeking assistance. By adopting this approach, forum managers can significantly encourage active participation, thereby fostering a more robust sense of community among learners. In addition, learners who predominantly engage in sub-reply contributions stand to gain substantially from receiving proactive push notifications that spotlight highly discussed posts within the MOOC learning forum. This method allows these learners to engage more effectively in ongoing discussions, promoting deeper communication and interaction with other participants on relevant issues and topics. By staying well-informed through timely updates, these learners can contribute meaningful insights and perspectives, enhancing the quality of the dialogue. Furthermore, learners who focus on contributing topic posts can also benefit from receiving notifications about popular posts in the course forum. Such notifications provide these learners with valuable insights into prevailing discussion trends and focal points, enabling them to avoid redundant postings and instead concentrate on enriching the quality and relevance of their own contributions. This strategy not only supports knowledge sharing but also encourages a more in-depth exploration of the subject matter, fostering a collaborative and engaging learning environment.

Overall, the integration of tailored services based on HMM predictions, coupled with effective management practices, holds considerable promise for creating dynamic and collaborative learning environments. By offering targeted and proactive support, educators and forum managers can effectively enhance learners’ engagement levels, promote active participation, and build a strong sense of community within MOOC learning forums.

6. Limits and Future Research

This study explores the discussion forums of the “Machine Learning” course on Coursera, utilizing HMM to analyze learner engagement and interaction dynamics. By examining learner-generated posts, the research uncovers nuanced patterns of interaction and behavior within structured online learning environments. However, the complexity of predictive modeling algorithms may result in opaque decision-making processes, obscuring the underlying rationale. To improve the generalizability and applicability of these findings, future research should integrate discussion forum data from diverse courses across multiple MOOC platforms within a unified modeling framework. This approach will enhance insights into learner participation, broaden the analytical scope, and facilitate a comprehensive examination of commonalities and unique characteristics in learner behaviors, ultimately enriching our understanding of online collaborative learning dynamics.

While this study primarily focuses on predicting learners’ contributions within discussion forums, future research should expand its scope to include forecasting the overall success and long-term sustainability of MOOC forums. Specifically, examining the factors that contribute to the sustained vitality of these forums could provide valuable insights for administrators, enabling data-driven decisions regarding resource allocation and incentives. Such insights may help identify early warning signs of declining participation, allowing for timely interventions to maintain an active and engaged learning community. A data-driven approach to forum sustainability could mitigate the common decline in engagement often observed in MOOC forums, where participation typically decreases as courses progress. By better understanding the dynamics influencing sustained learner involvement, researchers can offer actionable recommendations for enhancing long-term learner retention.

Author Contributions

Conceptualization, B.W.; Methodology, B.W.; Software, R.X.; Validation, B.W.; Formal analysis, B.W. and R.X.; Investigation, B.W.; Resources, B.W.; Data curation, R.X.; Writing—original draft, R.X.; Writing—review & editing, B.W.; Visualization, B.W. and R.X.; Supervision, B.W.; Project administration, B.W.; Funding acquisition, B.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Chinese National Social Science Fund “Thirteenth Five-Year Plan” education topic (BFA180064).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

Perifanou, M.; Economides, A.A. The Landscape of MOOC Platforms Worldwide. Int. Rev. Res. Open Distrib. Learn. 2022, 23, 104–133.
Li, J.S.; Li, L.L.; Zhu, Z.X.; Shadiev, R. Research on the predictive model based on the depth of problem-solving discussion in MOOC forum. Educ. Inf. Technol. 2023, 28, 13053–13076.
Chen, J.; Fang, B.; Zhang, H.; Xue, X. A systematic review for MOOC dropout prediction from the perspective of machine learning. Interact. Learn. Environ. 2022, 32, 1642–1655.
Kokkodis, M.; Lappas, T.; Ransbotham, S. From Lurkers to Workers: Predicting Voluntary Contribution and Community Welfare. Inf. Syst. Res. 2020, 31, 607–626.
Rivera, D.A.; Frenay, M.; Paquot, M. The role of MOOC forum discussion tasks in learners’ cognitive engagement. J. Comput. Assist. Learn. 2024, 40, 2103–2120.
Almatrafi, O.; Johri, A. Systematic Review of Discussion Forums in Massive Open Online Courses (MOOCs). IEEE Trans. Learn. Technol. 2019, 12, 413–428.
Wu, B.; Wu, C.C. Research on the mechanism of knowledge diffusion in the MOOC learning forum using ERGMs. Comput. Educ. 2021, 173, 104295.
Chen, W.; Wei, X.; Zhu, K.X. Engaging Voluntary Contributions in Online Communities: A Hidden Markov Model. MIS Q. 2018, 42, 83–100.
Mor, B.; Garhwal, S.; Kumar, A.A. Systematic Review of Hidden Markov Models and Their Applications. Arch. Comput. Methods Eng. 2021, 28, 1429–1448. [Google Scholar] [CrossRef]
Sun, Y.B.; Pfahringer, B.; Gomes, H.M.; Bifet, A. SOKNL: A novel way of integrating K-nearest neighbours with adaptive random forest regression for data streams. Data Min. Knowl. Discov. 2022, 36, 2006–2032. [Google Scholar] [CrossRef]
Jin, J.; Liu, S.Z.; Ma, T.F. Robust and efficient subsampling algorithms for massive data logistic regression. J. Appl. Stat. 2023, 51, 1427–1445. [Google Scholar] [CrossRef] [PubMed]
Cook, J.A.; Siddiqui, S. Random forests and selected samples. Bull. Econ. Res. 2020, 72, 272–287. [Google Scholar] [CrossRef]
Ren, Q.X.; Wang, J.G. Research on Enterprise Digital-Level Classification Based on XGBoost Model. Sustainability 2023, 15, 2699. [Google Scholar] [CrossRef]
Eggebrecht, P.; Lutkebohmert, E. A hybrid convolutional neural network with long short-term memory for statistical arbitrage. Quant. Financ. 2023, 23, 595–613. [Google Scholar] [CrossRef]
Jiang, H.X.; Fan, S.K.; Zhang, N.; Zhu, B. Deep learning for predicting patent application outcome: The fusion of text and network embeddings. J. Informetr. 2023, 17, 101402. [Google Scholar] [CrossRef]
Wu, B.; Chen, X.H. Continuance intention to use MOOCs: Integrating the technology acceptance model (TAM) and task technology fit (TTF) model. Comput. Hum. Behav. 2017, 67, 221–232. [Google Scholar] [CrossRef]
Gong, J.; Liu, T.X.; Tang, J. How monetary incentives improve outcomes in MOOCs: Evidence from a field experiment. J. Econ. Behav. Organ. 2021, 190, 905–921. [Google Scholar] [CrossRef]
Yu, S.Z.; Androsov, A.; Yan, H.B.; Chen, Y. Bridging computer and education sciences: A systematic review of automated emotion recognition in online learning environments. Comput. Educ. 2024, 220, 105111. [Google Scholar] [CrossRef]
Almatrafi, O.; Johri, A. Improving MOOCs Using Information from Discussion Forums: An Opinion Summarization and Suggestion Mining Approach. IEEE Access 2022, 10, 15565–15573. [Google Scholar] [CrossRef]
Kim, C.; Feng, B. Digital inequality in online reciprocity between generations: A preliminary exploration of ability to use communication technology as a mediator. Technol. Soc. 2021, 66, 101609. [Google Scholar] [CrossRef]
Yang, B.K.; Tang, H.T.; Hao, L.; Rose, J.R. Untangling chaos in discussion forums: A temporal analysis of topic-relevant forum posts in MOOCs. Comput. Educ. 2022, 178, 104402. [Google Scholar] [CrossRef]
Nguyen, M.D.; Cho, Y.S. A Hybrid Generative Model for Online User Behavior Prediction. IEEE Access 2020, 8, 3761–3771. [Google Scholar] [CrossRef]
Mustafa, S.; Zhang, W. Predicting users knowledge contribution behaviour in technical vs non-technical online Q&A communities: SEM-Neural Network approach. Behav. Inf. Technol. 2022, 42, 2521–2544. [Google Scholar] [CrossRef]
Mousavi, S.; Roper, S. Enhancing Relationships Through Online Brand Communities: Comparing Posters and Lurkers. Int. J. Electron. Commer. 2023, 27, 66–99. [Google Scholar] [CrossRef]
Nguyen, M.; Malik, A.; Sharma, P. How to motivate employees to engage in online knowledge sharing? Differences between posters and lurkers. J. Knowl. Manag. 2021, 25, 1811–1831. [Google Scholar] [CrossRef]
Badreddine, B.; Blount, Y.; Quilter, M. The role of personality traits in participation in an Online Cancer Community. Aslib J. Inf. Manag. 2022, 75, 318–341. [Google Scholar] [CrossRef]
Kuo, T.M.; Tsai, C.C.; Wang, J.C. Linking web-based learning self-efficacy and learning engagement in MOOCs: The role of online academic hardiness. Internet High. Educ. 2021, 51, 100819. [Google Scholar] [CrossRef]
Deng, R.Q.; Benckendorff, P.; Gannaway, D. Learner engagement in MOOCs: Scale development and validation. Br. J. Educ. Technol. 2020, 51, 245–262. [Google Scholar] [CrossRef]
Wu, B. Influence of MOOC learners discussion forum social interactions on online reviews of MOOC. Educ. Inf. Technol. 2021, 26, 3483–3496. [Google Scholar] [CrossRef]
Grewal, J.K.; Krzywinski, M.; Altman, N. Markov models—Training and evaluation of hidden Markov models. Nat. Methods 2020, 17, 121–122. [Google Scholar] [CrossRef]
Elkimakh, K.; Nasroallah, A. Hidden Markov model steady-state estimation. Commun. Stat.-Simul. Comput. 2022, 51, 6792–6807. [Google Scholar] [CrossRef]
Ray, S.; Kim, S.S.; Morris, J.G. The Central Role of Engagement in Online Communities. Inf. Syst. Res. 2014, 25, 528–546. [Google Scholar] [CrossRef]
Amjad, T.; Shaheen, Z.; Daud, A. Advanced Learning Analytics: Aspect Based Course Feedback Analysis of MOOC Forums to Facilitate Instructors. IEEE Trans. Comput. Soc. Syst. 2022, 11, 4698–4706. [Google Scholar] [CrossRef]
Cohen, A.; Shimony, U.; Nachmias, R.; Soffer, T. Active learners’ characterization in MOOC forums and their generated knowledge. Br. J. Educ. Technol. 2019, 50, 177–198. [Google Scholar] [CrossRef]
Su, Z.; Yi, B. Research on HMM-Based Efficient Stock Price Prediction. Mob. Inf. Syst. 2022, 2022, 8124149. [Google Scholar] [CrossRef]
Dridi, N.; Delignon, Y.; Sawaya, W. BIC and AIC criteria for the hidden Markov chain Application to numerical communication. Trait. Signal 2014, 31, 383–400. [Google Scholar] [CrossRef]
Tausczik, Y.R.; Pennebaker, J.W. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. J. Lang. Soc. Psychol. 2010, 29, 24–54. [Google Scholar] [CrossRef]
Google Machine Learning Glossary [EB/OL]. Available online: https://developers.google.cn/machine-learning/glossary (accessed on 26 January 2024).
Neha; Kim, E. Designing effective discussion forum in MOOCs: Insights from learner perspectives. Front. Educ. 2023, 8, 1223409. [Google Scholar] [CrossRef]
Deng, Q.; Soffker, D. A Review of HMM-Based Approaches of Driving Behaviors Recognition and Prediction. IEEE Trans. Intell. Veh. 2022, 7, 21–31. [Google Scholar] [CrossRef]
Zhang, X.L.; Xu, M.L. AUC optimization for deep learning-based voice activity detection. Eurasip J. Audio Speech Music Process. 2022, 1, 27. [Google Scholar] [CrossRef]
Kim, T.; Lee, J.S. Maximizing AUC to learn weighted naive Bayes for imbalanced data classification. Expert Syst. Appl. 2023, 217, 119564. [Google Scholar] [CrossRef]
Liu, S.; Liu, S.Q.; Liu, Z.; Peng, X.; Yang, Z.K. Automated detection of emotional and cognitive engagement in MOOC discussions to predict learning achievement. Comput. Educ. 2022, 181, 104461. [Google Scholar] [CrossRef]
Liu, Z.; Mu, R.; Yang, Z.K.; Peng, X.; Liu, S.Y.Y.; Chen, J. Modeling temporal cognitive topic to uncover learners’ concerns under different cognitive engagement patterns. Interact. Learn. Environ. 2023, 31, 7196–7213. [Google Scholar] [CrossRef]

Figure 1. The Hidden State Sequence and Observation Sequence in an HMM.

Figure 2. Monthly trends of new topic posts.

Figure 3. Evaluation criteria for assessing Henri’s cognitive level.

Figure 4. BIC values of HMM with a varying number of hidden states.

Figure 5. AUC improvement scores using HMM.

Table 1. Covariates for learners’ posts.

Variables	Definition
Forum actions
$x_{1}$	Likes per thread received by a learner
$x_{2}$	Subsequent responses per thread received by a learner
$x_{3}$	Responses per thread received by a learner
Learner actions
$x_{4}$	Original posts created by a learner
$x_{5}$	Original posts replied to by a learner
$x_{6}$	Weeks since a learner’s last thread initiation
$x_{7}$	Mean interval between a learner’s thread initiations
$x_{8}$	Pre-existing count of replies to the original thread before a learner responds
$x_{9}$	Sub-replies authored by a learner per reply
$x_{10}$	Replies contributed by a learner per original thread
$x_{11}$	Average Henri value of a learner’s original, reply, and sub-reply threads

Table 2. Assignment of Henri cognitive levels for posts.

Hierarchy	Description	Indicator	Value
No Cognition	Cognitive level is not involved	Asking questions, seeking help, engaging in polite conversations	0
Basic Clarification	Observing problems, analyzing basic concepts, sorting out connections, and summarizing understanding	Identifying and defining basic concepts; Involving basic subject knowledge; Restating the problem; Asking relevant questions	1
In-depth Clarification	Analyzing problems, deeply understanding assumptions, logic, conclusions, and application value	Using and defining terminology; Establishing a reference taxonomy; Utilizing examples and analogies	2
Inference	Endorsing or presenting a point of view through induction and deduction based on accepted facts	Drawing conclusions, making inferences, and elaborating ideas based on previous statements	3
Judgment	Making decisions and expressing appreciation, criticism, or support	Assessing relevance, effectiveness, and correctness of solutions; Making value judgments; Evaluating reasonableness	4
Strategy	Proposing specific solutions or actions	Deciding to act; Proposing solutions	5
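
As an illustration of how the coding in Table 2 could feed covariate $x_{11}$ in Table 1, the following minimal sketch maps each Henri level to its value and averages over a learner’s coded posts. The names `HENRI_VALUES` and `mean_henri_value` are hypothetical, the example labels are invented for illustration, and returning 0.0 for a learner with no posts is an assumption rather than something the study specifies.

```python
from statistics import mean

# Henri cognitive levels and their numeric values, as listed in Table 2.
HENRI_VALUES = {
    "No Cognition": 0,
    "Basic Clarification": 1,
    "In-depth Clarification": 2,
    "Inference": 3,
    "Judgment": 4,
    "Strategy": 5,
}


def mean_henri_value(levels):
    """Average Henri value over a learner's coded posts (cf. x_11 in Table 1).

    `levels` holds one level name per original post, reply, or sub-reply.
    Assumption: a learner with no coded posts is assigned 0.0.
    """
    if not levels:
        return 0.0
    return mean(HENRI_VALUES[level] for level in levels)


# Illustrative labels only, not data from the study.
example_posts = ["No Cognition", "Inference", "Strategy"]
print(mean_henri_value(example_posts))
```

An unknown level name raises a `KeyError`, which makes coding mistakes visible instead of silently scoring them as zero.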

Table 3. Descriptive statistics for learners’ contributions and post covariates.

Variables	Average	Std. Deviation	Median	Max
$O$	0.58	1.11	0	3
$x_{1}$	0.05	0.60	0	193
$x_{2}$	0.16	1.13	0	74
$x_{3}$	0.72	21.11	0	9964
$x_{4}$	0.18	0.53	0	20
$x_{5}$	0.23	0.70	0	75
$x_{6}$	3.33	2.54	3	10
$x_{7}$	1.42	1.32	1	10
$x_{8}$	17.21	146.86	0	3048
$x_{9}$	0.34	1.15	0	117
$x_{10}$	0.25	0.68	0	43
$x_{11}$	0.56	1.26	0	5
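
The four columns of Table 3 can be reproduced for any single variable with a few standard-library calls. The sketch below is a hedged illustration: the function name `describe` is hypothetical, the input values are invented, and the use of the sample standard deviation is an assumption, since the article does not state which estimator was used.

```python
from statistics import mean, median, stdev


def describe(values):
    """Return the summary statistics reported per variable in Table 3."""
    return {
        "average": mean(values),
        # Assumption: sample standard deviation (n - 1 denominator).
        "std_deviation": stdev(values),
        "median": median(values),
        "max": max(values),
    }


# Invented values for illustration, not data from the study.
example_values = [0, 0, 0, 1, 4]
for name, value in describe(example_values).items():
    print(f"{name}: {value:.2f}")
```

Replacing `stdev` with `pstdev` from the same module would give the population standard deviation instead.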
