Understanding Learners’ Perception of MOOCs Based on Review Data Analysis Using Deep Learning and Sentiment Analysis

Massive open online courses (MOOCs) have exploded in popularity; course reviews are important sources for exploring learners’ perceptions about different factors associated with course design and implementation. This study aims to investigate the possibility of automatic classification for the semantic content of MOOC course reviews to understand factors that can predict learners’ satisfaction and their perceptions of these factors. To do this, this study employs a quantitative research methodology based on sentiment analysis and deep learning. Learners’ review data from Class Central are analyzed to automatically identify the key factors related to course design and implementation and the learners’ perceptions of these factors. A total of 186,738 review sentences associated with 13 subject areas are analyzed, and consequently, seven course factors that learners frequently mentioned are found. These factors include: “Platforms and tools”, “Course quality”, “Learning resources”, “Instructor”, “Relationship”, “Process”, and “Assessment”. Subsequently, each factor is assigned a sentimental value using lexicon-driven methodologies, and the topics that can influence learners’ learning experiences the most are decided. In addition, learners’ perceptions across different topics and subjects are explored and discussed. The findings of this study contribute to helping MOOC instructors in tailoring course design and implementation to bring more satisfactory learning experiences for learners.


Introduction
Massive open online courses (MOOCs) have received intensive attention since their first appearance in 2007 [1]. The popularity of MOOCs is promoted by several factors. First, the broad accessibility of the Internet makes MOOCs available for global learners [2]. Second, MOOCs are cost-efficient for everyone, particularly learners in developing countries and/or regions [3]. In addition, the diversity of MOOC resources means that there are courses to suit the taste and needs of different learners [4]. The popularity of MOOCs thus prompts many educational institutions to produce MOOCs.
However, the development and implementation of a MOOC are not cheap; thus, there is a need to justify the benefits [5]. As a result, researchers and instructors have gone to great effort to understand MOOC success and the factors that contribute to their success [6].
Learners' perceptions of and satisfaction with a MOOC are factors that are increasingly adopted for measuring MOOC success. Traditional ways of understanding learners' perceptions of a MOOC may use questionnaire survey data (e.g., [7][8][9]). However, this can obtain limited information, and the analysis results depend heavily on the questionnaire design. Additionally, it usually takes a long time to collect all necessary data, and thus a timely and dynamic analysis is impossible.
Nowadays, many MOOC providers and platforms have integrated interactive technologies to allow learners to freely express their perceptions of and satisfaction with different aspects concerning MOOC design and implementation. This source of data is essential for tracking MOOCs' performance; thus, it is crucial to exploit rich information to allow a timely, dynamic, and automatic understanding of a MOOC's performance in satisfying learners [10].
Previous studies on MOOC data analysis for understanding learner satisfaction focus mainly on learners' demographics, personal characteristics, and disposition (e.g., [11][12][13]). Research on course review data analysis is primarily conducted based on qualitative analysis methodologies. Owing to the continually growing number of learner-generated reviews, it would be very time-consuming and labor-intensive to detect topics from learner-produced review data through manual evaluation [14]. Thus, alternative analysis methodologies based on natural language processing (NLP) and machine learning should be considered. Although some studies have touched upon topic mining, learner sentiment detection, and topic classification in the context of MOOC review analysis (e.g., [15][16][17][18][19][20]), there is a lack of comprehensive and automated course review data analyses from topical and sentimental perspectives, and especially a lack of studies combining deep learning and sentiment analysis methodologies. Additionally, since learners' satisfaction can vary across subject areas due to differences in study objectives, modes of assessment, etc. [21], a comparison of learner dissatisfaction across different subjects is needed.
To that end, the present study aims to understand learner satisfaction with MOOCs based on course review data analysis using sentiment analysis and deep neural networks, with a particular focus on the factors concerning MOOC design and implementation that can lead to learner satisfaction. More specifically, the factors that concern learners and the sentiment scores for course quality, learning resources, instructors, relationship, assessment, process, and platforms and tools are investigated. We also examine the differences in learners' satisfaction with the identified factors across different subject areas. The present study is conducted to answer the following three research questions (RQs):
RQ1: Can deep learning automatically identify factors that can predict learner satisfaction in MOOCs?
RQ2: What factors are frequently mentioned by learners?
RQ3: How do learners' perceptions of the identified factors differ across subjects?
The findings of this study are helpful for MOOC educators and instructors during their design and implementation of a MOOC with a particular focus on improving learners' satisfaction. With a better understanding of learners' perceptions of different factors, instructors can tailor their course designs to produce MOOCs that can bring more satisfactory learning experiences for learners.

MOOCs
By providing free online courses, MOOCs offer an openness that enables higher education to be highly accessible worldwide [22,23]. MOOCs are an essential channel to promote the practices of ubiquitous and blended learning that have been popularly adopted in higher education settings (e.g., [24,25]). Despite the constantly growing number of MOOC learners, retention rates in MOOCs are low; thus, more research on the factors that contribute to MOOC success is needed [23,26,27,28]. Based on the analysis of a Stanford MOOC dataset, Hewawalpita et al. [29] found that many MOOC learners did not complete all course learning activities. Watted and Barak [30] found that personal interest, eagerness for self-promotion, and gamification features contributed to learners' intention to complete a MOOC. According to Milligan [31], "understanding the nature of learners and their engagement is critical to the success of any online education provision, especially those MOOCs where there is an expectation that the learners should self-motivate and self-direct their learning" (p. 1882). Similarly, Hone and El Said [32] indicated that more studies should be conducted to exploit successful MOOC design and implementation to ensure a high level of course completion.

Understanding Learners' Satisfaction with MOOCs
Satisfaction, which shows learners' perceptions of their learning experiences, is a crucial psychological factor that affects learners' learning [33]. According to Hew et al. [34], satisfaction is significantly associated with the perceived quality of instruction in conventional face-to-face classroom learning and online education [35][36][37]. In recent years, the significance of learner satisfaction for measuring MOOC success has been increasingly recognized by educators and researchers. For example, Rabin et al. [38] suggest that learner satisfaction is a more appropriate measure of MOOC success, as it primarily focuses on learners' perceptions of learning experiences. Rabin et al. also claimed that because learners hold different learning goals, MOOC success ought to be assessed by student-oriented indicators such as satisfaction, rather than outdated indicators such as dropout rates. In other words, when a learner does not intend to complete a MOOC, the completion rate seems inappropriate as a success measure. In addition, when more learners are satisfied with MOOCs, more newcomers will enroll and participate in MOOCs.

Research on MOOC Learner Satisfaction Based on Course Review Data Analysis
In analyzing the course review data regarding learners' satisfaction/dissatisfaction with their enrolled course [39], most studies have adopted qualitative manual coding methodologies (e.g., [40,41]). For instance, by qualitatively analyzing 4466 course reviews, [41] recognized seven factors that contributed to learner engagement, including "problem-centric learning, active learning supported by timely feedback, course resources that cater to participants' learning needs or preferences, and instructor attributes such as enthusiasm or humor" (p. 1). However, the reliability of the results derived from qualitative analysis methodologies depends heavily on analyst expertise. Furthermore, as manual data coding is labor-intensive, only a small dataset can be investigated, making it difficult to deal with the constantly increasing number of course reviews.
With the increasing availability of "big data" in MOOCs alongside the recent trend of applying machine learning and NLP techniques for educational purposes, there has been rapid growth in studies that adopt text mining and machine learning to gain insight into the determinants of learner satisfaction based on course review data [42,43]. For instance, ref. [34] utilized five supervised machine learning techniques to classify a random sample of 8274 MOOC review sentences into six major topical categories (i.e., structure, video, instructors, content and resources, interaction, and assessment) and found that the gradient boosting trees model achieved excellent classification performance. By training machine learning classifiers based on K-nearest neighbors, gradient boosting trees, support vector machines, logistic regression, and naive Bayes on 24,000 reflective sentences produced by 6000 MOOC learners, ref. [44] found that gradient boosting trees performed satisfactorily in understanding learners' perceptions. However, refs. [34,44] relied solely on conventional machine learning; deep neural networks, which are widely accepted as preferred solutions for various NLP tasks, were not considered. In addition, a comprehensive and automated analysis of course review data from the perspectives of both topics and sentiments is lacking.

Methods
To capitalize on the advantages of NLP-oriented text-mining methodologies, the present study incorporates different analysis methods such as TextRCNN and sentiment analysis to conduct a more thorough analysis of the textual content of learner review data. By uncovering learners' focal points and sentiments based on course review data, we aim to obtain an in-depth understanding of learners' perceptions of their learning experiences in MOOCs. Figure 1 displays the architectural schema of data collection and analysis methodologies. A more detailed description of each major step (dataset preparation, coding scheme development, data coding procedure, review topic classification, and review sentiment analysis) is given in the following sections.

Dataset Preparation
Course metadata and review data were crawled from Class Central for further processing. After excluding duplicated MOOCs, MOOCs with fewer than 20 review comments (https://www.classcentral.com/help/highest-rated-online-courses accessed on 21 June 2022), and reviews not written in English, 102,184 reviews remained and were spell-checked and corrected using TextBlob (https://textblob.readthedocs.io/en/dev/ accessed on 21 June 2022). According to Park and Nicolau [45], online reviews can be divided into helpful and unhelpful reviews. Unhelpful reviews contain little useful information and therefore contribute little to our understanding of customer satisfaction, so helpful reviews need to be distinguished from unhelpful ones before a formal analysis is conducted [46]. This study treated the top 80% of reviews ranked by helpful votes as helpful reviews. Naive Bayes was then adopted as the classification model following the suggestion of Lubis et al. [47,48], with TF-IDF (term frequency-inverse document frequency) vectors constructed from the terms in the review texts used as inputs for classifier training and testing. A total of 99,779 helpful reviews were identified and used for further analysis.
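The two ingredients named above, ranking reviews by helpful votes and building TF-IDF vectors, can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy reviews and vote counts are invented, the function names are hypothetical, and the weighting tf × log(N/df) is only one of several common TF-IDF variants, assumed here for simplicity.

```python
import math
from collections import Counter

def label_by_helpful_votes(votes, top_fraction=0.8):
    """Mark the top `top_fraction` of reviews, ranked by helpful votes, as helpful."""
    order = sorted(range(len(votes)), key=lambda i: votes[i], reverse=True)
    cutoff = int(round(top_fraction * len(votes)))
    helpful = set(order[:cutoff])
    return [i in helpful for i in range(len(votes))]

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return vectors

# Invented toy data, not taken from the Class Central corpus.
reviews = [
    "great course with clear lectures",
    "great instructor and useful quizzes",
    "great",
    "ok",
    "the quizzes were too hard",
]
helpful_votes = [12, 7, 0, 1, 3]

labels = label_by_helpful_votes(helpful_votes)
vectors = tfidf_vectors(reviews)
```

Terms that occur in many reviews (such as "great" here) receive a lower weight than terms that occur in only one review, which is the property a naive Bayes classifier trained on these vectors would rely on.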

Coding Scheme
This study develops a coding scheme for course review data analysis based on Moore's theory of transactional distance [49]. For example, Hew et al. [34] used six variables espoused in transactional distance theory to facilitate the understanding of learners' satisfaction with MOOCs, including course structure, videos, instructors, course content and learning resources, interaction, and assessment. By taking into consideration the factors in previous literature, this study designs a coding scheme with the categories "Platforms and tools", "Course quality", "Learning resources", "Instructor", "Relationship", "Process", and "Assessment". The specific descriptions and examples of different categories are given in Table 1.

Table 1. Coding scheme for course review data.

| Categories | Descriptions | Examples |
| --- | --- | --- |
| Platforms and tools | Platform use, system quality, video quality | "The video is very good and provides enough repetition to drive it home, but not so much you get bored" |
| Course quality | Content quality, course difficulty, knowledge enhancement, beginner friendliness, practicality, usefulness, helpfulness | "The information and lesson were given in chunks which is easier for all learners to chew it" |
| Learning resources | Textbooks, notes, handouts, slides | "The lecture material aligns well with the textbook he's written for the course, as well as the think python textbook" |
| Instructor | Instructor knowledge, accessibility, enthusiasm, humor, instructional pace | "The instructor was more involved than I have experienced in many MOOCs, which greatly appreciated and enhanced the learning experience" |
| Relationship | Peer interaction, learner-instructor interaction | "It was a good idea to allow users to interact, I like to read comments made by other students" |

Coding Procedure
This study used sentences as the unit of analysis. Prior to training and testing the automatic classifier, an "instructional" (i.e., labeled) dataset had to be produced. Thus, a researcher manually coded a sample of 10,000 sentences based on the seven categories. For example, the sentence "the lecture material aligns well with the textbook he has written for the course, as well as the think python textbook" was coded as "Learning resources" because it relates mainly to lecture materials and textbooks. Another sentence, "the instructor was more involved than I have experienced in many MOOCs, and this was much appreciated and enhanced the learning experience", was coded as "Instructor" since it emphasizes instructor involvement. A third sentence, "the assignment was quite difficult so we could maintain the level", was coded as "Assessment" because it focuses on the course assignment. To ensure coding reliability, 200 of the 10,000 sentences were also coded by another researcher. The agreement between the two researchers exceeded 90%, and the codes showing inconsistencies were revised after discussion.
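The reliability check described above amounts to comparing two coders' labels for the same sentences. The sketch below computes simple percent agreement, which is what the reported "above 90%" figure appears to refer to. It also computes Cohen's kappa as a chance-corrected alternative; kappa is not reported in this study and is shown only for comparison. The ten labels per coder are invented.

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Share of items that both coders assigned to the same category."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: product of each coder's marginal proportions per category.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Invented labels for ten sentences; the coders disagree on one of them.
coder_1 = ["Instructor", "Assessment", "Course quality", "Course quality", "Process",
           "Instructor", "Learning resources", "Course quality", "Assessment", "Instructor"]
coder_2 = ["Instructor", "Assessment", "Course quality", "Course quality", "Course quality",
           "Instructor", "Learning resources", "Course quality", "Assessment", "Instructor"]

agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
```

With one disagreement in ten items, percent agreement is 0.9 while kappa is somewhat lower, because part of the raw agreement is attributable to chance.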

Automatic Classification of Review Data
The training and testing of the classifier included the following steps. First, each sentence was retrieved from the textual review content as the model input alongside its topical category. Second, sentences were tokenized, with punctuation and stop words being removed. Subsequently, the corpus was randomly split into training, validation, and testing datasets.
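The preprocessing steps above can be sketched as follows. The stop-word list, the regular expression used for tokenization, and the 80/10/10 split ratio are all assumptions made for illustration; the study does not state which stop-word list or split proportions were used.

```python
import random
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "it", "in"}

def tokenize(sentence):
    """Lowercase, drop punctuation, and remove stop words."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def split_corpus(items, train=0.8, valid=0.1, seed=42):
    """Shuffle the items and split them into training, validation, and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train * len(shuffled))
    n_valid = int(valid * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

tokens = tokenize("The instructor was great, and the pace is fine!")

# Invented (sentence, label) pairs standing in for the coded corpus.
corpus = [(f"sentence {i}", "Course quality") for i in range(10)]
train_set, valid_set, test_set = split_corpus(corpus)
```

Fixing the random seed keeps the split reproducible, which matters when classifier scores are compared across runs.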
The present study adopted TextRCNN (recurrent convolutional neural network) for classifier training and testing. TextRCNN was developed by Lai et al. [50] as a deep neural model to capture text semantics. RCNN exploits the recurrent structure's capability to capture contextual information and learn text feature representations. RCNN's structure is presented in Figure 2.
RCNN defines $w_{i-1}$ and $w_{i+1}$ as the previous and next words of $w_i$. $\alpha$ is $l$ or $r$, representing left or right, and $\beta$ is $w_i$, $w_{i-1}$, $w_{i+1}$, and so on. $c_\alpha(\beta)$ represents the left or right context of the word $\beta$ as a dense vector with $|c|$ real-valued elements. $f$ denotes a non-linear activation function. $W^{(\alpha)}$ represents a matrix for transforming a hidden layer into the subsequent one. $W^{(s\alpha)}$ represents a matrix for combining the semantics of the present word with the subsequent word's left or right context. $e(\beta)$ denotes the word embedding of $\beta$ as a dense vector with $|e|$ real-valued elements. By using Equations (1) and (2), the left- and right-side context vectors $c_l(w_i)$ and $c_r(w_i)$ of the word $w_i$ can be obtained:

$$c_l(w_i) = f\left(W^{(l)} c_l(w_{i-1}) + W^{(sl)} e(w_{i-1})\right) \tag{1}$$

$$c_r(w_i) = f\left(W^{(r)} c_r(w_{i+1}) + W^{(sr)} e(w_{i+1})\right) \tag{2}$$

By using Equation (3), the representation $x_i$ of the word $w_i$ becomes the concatenation of the word vector with the forward and backward context vectors. The forward and backward recurrent neural networks (RNNs) are adopted for obtaining the representations of individual words' forward and backward contexts.

$$x_i = \left[c_l(w_i);\ e(w_i);\ c_r(w_i)\right] \tag{3}$$

In Equation (4), $W^{(2)}$ and $b^{(2)}$ denote the weight matrix and the bias vector. A linear transformation and a tanh activation function are applied to $x_i$ together. The result $y_i^{(2)}$ is then sent to the following layer.

$$y_i^{(2)} = \tanh\left(W^{(2)} x_i + b^{(2)}\right) \tag{4}$$

In the max pooling layer, the representation $y^{(3)}$ of all $n$ words is obtained by using Equation (5), where the max is an element-wise function.

$$y^{(3)} = \max_{i=1}^{n} y_i^{(2)} \tag{5}$$

To measure how the trained classifier performs, precision, recall, and F1 scores were adopted [51], as shown in Equations (6)-(8), where $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives for a category.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8}$$

The classifier was trained for 100 epochs with a batch size of 64; categorical cross-entropy was utilized for loss computation.
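To make the flow from word embeddings to a single text vector concrete, the following sketch traces the RCNN forward pass described by Equations (1) to (5) on random numbers. It is an illustration under several assumptions, not the trained model: the dimensions and weights are arbitrary, tanh is used for the activation $f$, the boundary contexts $c_l(w_1)$ and $c_r(w_n)$ are set to zero vectors rather than learned, and the final classification layer is omitted. All function and parameter names are hypothetical.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def vec_add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]

def tanh_vec(vector):
    return [math.tanh(v) for v in vector]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def rcnn_text_vector(embeddings, p):
    """Return y^(3) for one sentence given its word embeddings e(w_1), ..., e(w_n)."""
    n, c_dim = len(embeddings), len(p["W_l"])
    # Eq. (1): left contexts, scanned left to right (c_l(w_1) assumed to be zero).
    c_l = [[0.0] * c_dim]
    for i in range(1, n):
        c_l.append(tanh_vec(vec_add(matvec(p["W_l"], c_l[i - 1]),
                                    matvec(p["W_sl"], embeddings[i - 1]))))
    # Eq. (2): right contexts, scanned right to left (c_r(w_n) assumed to be zero).
    c_r = [[0.0] * c_dim for _ in range(n)]
    for i in range(n - 2, -1, -1):
        c_r[i] = tanh_vec(vec_add(matvec(p["W_r"], c_r[i + 1]),
                                  matvec(p["W_sr"], embeddings[i + 1])))
    # Eq. (3): x_i = [c_l(w_i); e(w_i); c_r(w_i)].
    x = [c_l[i] + embeddings[i] + c_r[i] for i in range(n)]
    # Eq. (4): y_i^(2) = tanh(W^(2) x_i + b^(2)).
    y2 = [tanh_vec(vec_add(matvec(p["W_2"], x_i), p["b_2"])) for x_i in x]
    # Eq. (5): element-wise max over all word positions.
    return [max(column) for column in zip(*y2)]

rng = random.Random(0)
E_DIM, C_DIM, H_DIM = 4, 3, 5  # arbitrary toy sizes for |e|, |c|, and the hidden layer
params = {
    "W_l": rand_matrix(rng, C_DIM, C_DIM),
    "W_sl": rand_matrix(rng, C_DIM, E_DIM),
    "W_r": rand_matrix(rng, C_DIM, C_DIM),
    "W_sr": rand_matrix(rng, C_DIM, E_DIM),
    "W_2": rand_matrix(rng, H_DIM, 2 * C_DIM + E_DIM),
    "b_2": [0.0] * H_DIM,
}
sentence = [[rng.uniform(-1, 1) for _ in range(E_DIM)] for _ in range(6)]
text_vector = rcnn_text_vector(sentence, params)
```

Whatever the sentence length, the max pooling step yields a fixed-size vector (here of length `H_DIM`), which is what allows a single output layer to classify sentences of different lengths.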

Sentiment Analysis of Review Data
Sentiment analysis is a "field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes" (p. 7) [52]. In the context of teaching and learning, understanding MOOC learners' positive and negative sentiments helps instructors better understand learners' needs and satisfaction. The present study calculated the sentiment value of each sentence using the syuzhet package, with four sentiment lexicons being considered: "syuzhet", "afinn", "bing", and "nrc". To be specific, the "syuzhet" lexicon is composed of 10,748 words, each associated with a sentiment value ranging between −1 (negative) and 1 (positive); 7161 of these words are negative, so negative words dominate the lexicon. The "afinn" lexicon consists of a list of 2477 Internet slang and obscene words in English, with semantic orientation ranging from −5 (negative) to 5 (positive). The "bing" lexicon includes 2006 positive and 4783 negative words. Table 2 shows the details of the "syuzhet", "afinn", and "bing" lexicons.

Table 2. Details about the "syuzhet", "afinn", and "bing" lexicons.

The "nrc" dictionary proposed by Mohammad and Turney [53] differs from the above three lexicons because it detects eight types of emotions and the associated valences instead of just reporting positive or negative words. The "nrc" lexicon comprises 13,889 words distributed among the different categories, as shown in Table 3.

Considering the differences between the four lexicons, we also considered the average value of the sentiment scores calculated based on the four lexicons. A positive sentiment about the instructor could be "the teachers are wonderful and are extremely suppurative". A negative sentiment about instructors is "the problem is that the professors seem to talk to themselves". A positive sentiment about assessment could be "the exercise is useful, and the code has some real-world applications, but I would have gotten more out of it if I was able to write some of the segments myself". A negative sentiment about assessment is "you can't submit quires to see if you got them right and all your coming problems are marked wrong".
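The idea of scoring a sentence with several lexicons and then averaging the results can be sketched in a few lines. This is not the syuzhet package (which is an R library) and does not use its real lexicons: the four dictionaries below are tiny invented stand-ins that only mimic the different value ranges, and averaging the raw scores without rescaling is an assumption, since the text does not say whether the scores were normalized first.

```python
import re

# Invented miniature lexicons; the real ones contain thousands of entries
# with different values. Only the value ranges are meant to be suggestive.
LEXICONS = {
    "syuzhet": {"wonderful": 1.0, "useful": 0.5, "problem": -0.75, "wrong": -0.75},
    "afinn": {"wonderful": 4, "useful": 2, "problem": -2, "wrong": -2},
    "bing": {"wonderful": 1, "useful": 1, "problem": -1, "wrong": -1},
    "nrc": {"wonderful": 1, "useful": 1, "problem": -1, "wrong": -1},
}

def sentence_score(sentence, lexicon):
    """Sum the lexicon values of all words in the sentence (unknown words count as 0)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(token, 0) for token in tokens)

def all_scores(sentence):
    """Score a sentence with every lexicon and add the unweighted average."""
    scores = {name: sentence_score(sentence, lexicon)
              for name, lexicon in LEXICONS.items()}
    scores["average"] = sum(scores.values()) / len(LEXICONS)
    return scores

positive = all_scores("the teachers are wonderful")
negative = all_scores("the problem is that the professors seem to talk to themselves")
```

Because the lexicons use different scales, an unweighted average is dominated by the lexicon with the widest range ("afinn" in this toy setup); rescaling each score before averaging would be one way to balance them.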

Performance of the Classification Model
The RCNN classifier's performance across different categories is presented in Figure 3. The precision values for "Course quality", "Instructor", and "Process" were 76.46%, 69.37%, and 54.93%, respectively. Regarding recall, the top three categories were "Course quality", "Instructor", and "Learning resources", with values of 75.41%, 71.73%, and 53.45%, respectively. Regarding the F1 score, "Course quality", "Instructor", and "Learning resources" achieved the highest scores of 75.93%, 70.53%, and 54.15%, respectively. In other words, 75.41% and 71.73% of the sentences belonging to "Course quality" and "Instructor" were correctly identified. To sum up, the RCNN classifier could identify course reviews in topical categories such as "Course quality" and "Instructor". However, it performed relatively poorly in classifying course reviews associated with categories such as "Platforms and tools" and "Relationship".
The trained RCNN model was then used to predict labels for unannotated sentences. For instance, when the sentence "the professors made the lessons lighthearted and understandable" was input into the RCNN classifier, the classifier analyzed its semantic content and predicted confidence levels for all categories, and the label with the highest confidence was taken as the predicted category of the input sentence. In this case, the category "Instructor" achieved a probability of 1.0, reflecting the sentence's relevance to that category. Similarly, the prediction result for the sentence "a lot of the exercises in this course are about mathematical induction, which is an extremely important skill in university mathematics" indicates its relevance to the category "Assessment" with a probability of 1.0. The prediction result for another sentence, "the only thing that I could not use was the voice recorder", shows its relevance to the category "Platforms and tools" with a probability of 1.0.
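The reported scores for the classifier can be cross-checked against the standard definition of F1 as the harmonic mean of precision and recall. The short sketch below recomputes F1 for the two categories whose precision and recall are both reported in the text; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision, recall, and F1 (in percent) as reported for the RCNN classifier.
reported = {
    "Course quality": {"precision": 76.46, "recall": 75.41, "f1": 75.93},
    "Instructor": {"precision": 69.37, "recall": 71.73, "f1": 70.53},
}

recomputed = {
    name: round(f1_score(values["precision"], values["recall"]), 2)
    for name, values in reported.items()
}
```

For both categories the recomputed value matches the reported F1 to two decimal places, so the three figures are internally consistent.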
By using the RCNN classifier, a final data corpus with complete labels for each sentence was obtained. The distribution of different topics in the data corpus is presented in Table 4. The prevalence of different topics varied considerably. For example, the category "Course quality" had the most sentences (105,130, or 56.30%), whereas "Process" had the fewest (3165, or 1.69%).

Learners' Perceptions of Different Factors
Learners' satisfaction with different factors across subjects is determined from the sentiment scores obtained via sentiment analysis. Figures 4, 5, 6, and 7 show the averaged sentiment scores for each topic in different subjects, computed using "syuzhet", "bing", "afinn", and "nrc", respectively. The distribution patterns in the four figures are similar. For example, under all four calculation methods, learners tended to show the lowest level of satisfaction towards almost all of the factors in MOOCs related to Data Science. Additionally, for MOOCs in the Humanities and Social Sciences (for example, Education and Teaching, Humanities, Personal Development, and Social Science), learners tended to show a higher level of satisfaction towards the different factors than for MOOCs in Science and Technology (for example, Programming, Mathematics, and Data Science). This finding is supported by Figure 8, which shows the averaged overall sentiment scores for each subject under the different methods. For example, the overall sentiment scores for Data Science appear low across the calculation methods. In contrast, for subjects such as Education and Teaching, Humanities, and Personal Development, the overall sentiment scores appear high across the calculation methods.
We also obtained an averaged sentiment score for each subject by averaging the averaged scores from "syuzhet", "bing", "afinn", and "nrc". The results are presented in Figure 9 and do not vary much from the previous analysis. When looking at individual factors, some results are worth noting. For example, in terms of "Process", learners in almost all subjects showed a low level of satisfaction, especially learners participating in courses related to Data Science. The low satisfaction towards "Process" is supported by Figure 10, where the overall sentiment scores for "Process" appear low under the different calculation methods. Regarding "Instructor", learners in almost all subjects showed a high level of satisfaction, which is likewise supported by Figure 10, where the overall sentiment scores for "Instructor" appear high under the different calculation methods.
Figure 9. Averaged scores calculated based on averaged sentiment scores of "syuzhet", "bing", "afinn", and "nrc".
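The two-step averaging described above (a mean per subject and topic for each method, followed by a mean across the four methods) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout, the function names, and all score values are assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-sentence records: (subject, topic, {method: score}).
# The scores are invented illustration values, not data from the study.
records = [
    ("Data Science", "Process", {"syuzhet": -0.25, "bing": -1, "afinn": -2, "nrc": 0}),
    ("Data Science", "Process", {"syuzhet": 0.25, "bing": 0, "afinn": 1, "nrc": 1}),
    ("Humanities", "Instructor", {"syuzhet": 1.5, "bing": 2, "afinn": 4, "nrc": 2}),
    ("Humanities", "Instructor", {"syuzhet": 0.5, "bing": 1, "afinn": 2, "nrc": 1}),
]

METHODS = ("syuzhet", "bing", "afinn", "nrc")


def average_by_method(records):
    """Mean score per (subject, topic), computed separately for each method."""
    buckets = defaultdict(lambda: defaultdict(list))
    for subject, topic, scores in records:
        for method in METHODS:
            buckets[(subject, topic)][method].append(scores[method])
    return {
        key: {method: mean(values[method]) for method in METHODS}
        for key, values in buckets.items()
    }


def average_across_methods(per_method):
    """Unweighted mean of the four per-method averages for each key."""
    return {key: mean(list(scores.values())) for key, scores in per_method.items()}


per_method = average_by_method(records)
combined = average_across_methods(per_method)
```

Because the lexicons use different score ranges, an unweighted mean may let the wider-ranged ones dominate the combined value; this section does not state whether the scores were rescaled before averaging, so the sketch leaves them as they are.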

Can Deep Learning Automatically Identify Factors That Can Predict Learner Satisfaction in MOOCs?
To answer RQ1, this study investigates the potential of context classification of course review data using deep neural networks. To achieve this, the classifier must be capable of understanding semantic content. Because the semantics of natural language are complicated and text representations are high-dimensional, conventional machine learning approaches usually fail to learn deep semantic information. Thus, the present study adopts RCNN, which combines the capabilities of the convolutional neural network (CNN) and the recurrent neural network (RNN), to capture and learn the deep relationships within 99,779 helpful MOOC reviews. Before classifier training, 10,000 randomly selected sentences from these reviews were manually coded for the topics they referred to, according to a coding scheme with seven categories ("Platforms and tools", "Course quality", "Learning resources", "Instructor", "Relationship", "Process", and "Assessment"). The classification performance of the RCNN classifier indicates that it can classify the topical categories that MOOC learners mention when evaluating the instructional design and implementation of the courses they attended.
The RCNN classifier's performance varied across categories. For example, the RCNN achieved a classification accuracy of higher than 70% for categories such as "Instructor" and "Course quality". In contrast, it showed a classification accuracy of lower than 50% for categories such as "Relationship" and "Platforms and tools". Sample size may explain these differences. More specifically, the categories with higher classification performance tended to have larger sample sizes than those with lower performance. This means that it was difficult for the deep neural network-based classifier to capture the deep relationships in the course review content for categories with small sample sizes. Another explanation lies in the ease of topic interpretation. Take the category "Instructor" as an example. It is relatively easy for the classifier to discriminate because learners commonly use terms such as professor, instructor, and teacher when expressing their perception of or satisfaction towards instructors. Thus, the classifier can readily associate the "Instructor" category with the appearance of these identifiable terms within a sentence. In contrast, it is much more difficult for the classifier to detect categories such as "Relationship" and "Process", as they are less likely to be associated with identifiable terms.
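The comparison between per-category performance and sample size discussed above can be reproduced with a short tabulation. The following is a minimal sketch under stated assumptions: the gold and predicted labels are invented, the function name is hypothetical, and "accuracy" for a category is taken here to mean the share of that category's manually coded sentences that the classifier labeled correctly, which may differ from the metric used in the study.

```python
from collections import Counter

# Hypothetical gold and predicted labels for a handful of coded sentences;
# they are invented for illustration and are not the study's data.
gold = ["Instructor", "Instructor", "Instructor", "Instructor",
        "Course quality", "Course quality", "Course quality",
        "Relationship", "Relationship"]
pred = ["Instructor", "Instructor", "Instructor", "Course quality",
        "Course quality", "Course quality", "Instructor",
        "Course quality", "Relationship"]


def per_category_accuracy(gold, pred):
    """Return {category: (sample_size, share_correct)} keyed by gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    totals = Counter(gold)
    correct = Counter(g for g, p in zip(gold, pred) if g == p)
    return {cat: (n, correct[cat] / n) for cat, n in totals.items()}


report = per_category_accuracy(gold, pred)

# List categories from largest to smallest sample size.
for category, (n, acc) in sorted(report.items(), key=lambda kv: -kv[1][0]):
    print(f"{category:15s} n={n:<3d} accuracy={acc:.2f}")
```

Listing sample size next to accuracy makes it easy to check whether the weaker categories are also the smaller ones, as the explanation above suggests.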

What Factors Are Frequently Mentioned by Learners?
The classification results provide answers to RQ2, showing that learners mentioned issues regarding categories such as "Course quality", "Instructor", and "Assessment" more frequently than categories such as "Relationship" and "Process". Firstly, course quality is the most frequently discussed topic among MOOC learners. The significance of course quality is also reported in prior research. For instance, according to Albelbisi and Yusop [54], content quality and easily understandable course materials are essential for encouraging self-learning in MOOCs. Similarly, our results indicate that the success of MOOCs is positively associated with the quality of course design, the quality of course content, and the ease of content understanding. This finding is also highlighted by [55,56], who reported course quality's direct effect on learners' success in online learning. As Yousef et al. [57] suggest, the quality of course content is essential in promoting global learners' motivation to enroll and participate in MOOCs. A MOOC with first-rate design enables learners to organize and plan their learning independently, promotes their motivation to set goals, and helps them identify efficient learning methodologies and attain learning success. Lin et al. [58] also verified that learners' perceptions of MOOC quality could significantly affect their acceptance of knowledge. Considering the importance of course quality, we suggest that designers and instructors of MOOCs ensure that the course materials are easy to understand and of high quality, so that learners have a genuine chance to develop responsibility for their learning.
"Instructor" was the second most frequently mentioned issue among MOOC learners. This is consistent with prior research [27,28]. According to Hew et al. [34], knowledgeable, enthusiastic, and humorous instructors can satisfy learners more easily. Watson et al. [59] also suggest that instructors should show specialization in the subjects they teach and deliver the content clearly and concisely by using case studies and examples.
Another topic frequently mentioned by MOOC learners is "Assessment", which is consistent with Jordan [60]. This suggests that designers and instructors of MOOCs ought to pay more attention to the potential of assessments to promote learner satisfaction, because assessments allow learners to verify and assess their learning against their goals. The significance of assessment in MOOCs has been highlighted by researchers. According to Bali [61], assessment tasks in MOOCs ought to offer chances for learners to apply their learned knowledge rather than merely recall it. Similarly, Hew [62] highlighted the necessity of active learning and knowledge application to improve MOOC learners' engagement. According to Hew et al.'s [34] analyses of learners' course review data, assessments that could promote learner satisfaction should be clearly stated, implicitly associated with lecture content, and capable of allowing learners to apply knowledge learned in practice.
Comparatively, learners tended to mention the "Relationship" and "Process" factors infrequently. This suggests that interaction or instructor feedback has little impact on their perceptions of learning in MOOCs. One reason is that most learners understand that MOOC instructors have little time to spend supporting individuals in a large class. As a result, learners commonly have low expectations about interaction or instructor feedback. As learners in MOOCs are mainly motivated to broaden their horizons and enhance their expertise [63], they care mainly about whether they can achieve useful learning (e.g., their expected skills or knowledge) via MOOCs rather than about how the course is taught or whether there is rich interaction.

How Do Learners' Perceptions of the Identified Factors Differ across Subjects?
The investigation into learners' perceptions of the identified factors across different subjects provides answers to RQ3, indicating that learners' experiences in MOOCs are associated with subject differences. This finding is consistent with Li et al. [18], who considered MOOCs related to Arts and Humanities, Business, and Social Science as knowledge-seeking courses with an emphasis on learning concepts or principles, checking knowledge with quizzes and assignments, and enhancing decision-making in practice. In contrast, MOOCs in Computer Science, Data Science, and Information Technology are mainly skill-seeking-driven and require learning through sampled problem-solving, laboratory tasks, projects, and assignments to promote new skill acquisition. Thus, in designing and implementing MOOCs related to different domains, designers and instructors should consider learners' different concerns and needs. For example, to satisfy learners who enroll in skill-seeking courses, more attention should be paid to problem-solving and to project and assignment design so that learners can demonstrate their acquired skills.
We also found that learners in almost all subjects showed a low level of satisfaction towards topics such as "Process". In contrast, they tended to show high satisfaction towards topics such as "Instructor". One reason for the low satisfaction with "Process" lies in the difficulty of providing inquiry support to individuals, because instructors are overwhelmed with the workload of dealing with large classes [64]. Another challenge for MOOC instructors is providing feedback on assignments in line with individual solutions [65]. Some providers have integrated interactive technologies and modules (e.g., discussion forums and live chats) into MOOC platforms to support problem-solving and knowledge inquiry, but it takes considerable time and effort to run these interactive modules [66]. Therefore, alternative tools such as video feedback can be integrated to "help MOOC instructors scale up the provision of perceived personal attention to students" (p. 15) [65]. Additionally, MOOC designers and instructors ought to pay more attention to problem-solving during instruction rather than merely focusing on information delivery [41].

Conclusions
This study adopts deep learning and sentiment analysis to explore learners' perceptions of different factors regarding MOOC design and implementation, and to understand the factors that can impact learners' satisfaction with their learning in MOOCs. The contributions of this study include: (1) providing a quantitative analysis of 102,184 records of learners' reviews on MOOCs; (2) proposing a novel deep learning and sentiment analysis methodological framework for the examination of large-scale learner-produced review content; and (3) identifying essential factors that are frequently mentioned by learners who have attended MOOCs.
This study has limitations. We only used data collected from Class Central; thus, other MOOC websites such as edX, Coursera, and FutureLearn should be considered to validate the results and findings. Another limitation lies in the use of sentences as the unit for calculating sentiment scores. This can be error-prone, since learners sometimes show contradictory attitudes towards different issues within one sentence. Thus, future work should focus on proposing fine-grained sentiment analysis models that detect learners' perceptions of a specific topic more accurately based on the associated context. In addition, the RCNN classifier's performance was especially low for some categories (e.g., "Relationship" and "Platforms and tools") compared to others. Future work should consider revising the topic categories and seek ways to improve the classification performance.
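The sentence-level limitation noted above can be illustrated with a toy example in which opposite attitudes inside one sentence cancel out. This is only a sketch under stated assumptions: the lexicon, its polarity values, the review sentence, and the naive clause split are all invented for illustration, and the split is not the fine-grained model proposed as future work.

```python
import re

# Toy lexicon with made-up polarity values; it is NOT one of the lexicons
# used in the study ("syuzhet", "bing", "afinn", "nrc").
TOY_LEXICON = {"great": 1, "clear": 1, "helpful": 1,
               "terrible": -1, "confusing": -1, "unfair": -1}


def sentence_score(sentence):
    """Sum the polarity of every lexicon word found in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(TOY_LEXICON.get(tok, 0) for tok in tokens)


def clause_scores(sentence):
    """Score each clause separately, splitting on 'but', commas, and semicolons."""
    clauses = re.split(r"\bbut\b|[,;]", sentence.lower())
    return [(c.strip(), sentence_score(c)) for c in clauses if c.strip()]


review = "The instructor was great but the quizzes were confusing"
whole = sentence_score(review)   # one score for the whole sentence
parts = clause_scores(review)    # one score per clause
```

Scored as a whole, the sentence looks neutral, while the clause-level view keeps the positive attitude towards the instructor and the negative attitude towards the quizzes apart.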