Data Mining and Computational Intelligence for E-learning and Education

A special issue of Data (ISSN 2306-5729). This special issue belongs to the section "Information Systems and Data Management".

Deadline for manuscript submissions: closed (30 April 2023) | Viewed by 86798

Special Issue Editors


Guest Editor
Facultad de Informática, Universidad Complutense de Madrid, 28040 Madrid, Spain
Interests: fuzzy logic; artificial intelligence; algorithmic optimization; databases

Special Issue Information

Dear Colleagues,

In recent decades, the rise of artificial intelligence has driven its application in many fields, including education. Some applications use intelligent algorithms to analyze data from teaching and learning activities, in both face-to-face and distance-learning environments, in order to extract information about the educational process. From this information, it is possible to infer the reasons for students' success or failure, patterns of behavior and learning, and other predictions. Other applications use intelligent algorithms to automate parts of the educational process; the development of chatbots and approaches to ethics in the use of artificial intelligence are related to this point. As a result, the application of artificial intelligence to problem solving in education has become an area of research interest. This Special Issue aims to bring together works that show the latest advances in the application of artificial intelligence to education, as well as works describing specific experiences and applications to particular problems.

This Special Issue is also intended to serve as a meeting point for all researchers working in these fields, whether on theory or on applications. The topics of interest include, but are not limited to:

  • Machine learning applied to e-learning and education;
  • Artificial intelligence applied to e-learning and education;
  • Big data and e-learning;
  • Intelligent learning systems;
  • Data analysis applied to e-learning and education;
  • Intelligent systems for e-learning;
  • Ethical aspects of the application of AI in education;
  • E-learning analytics;
  • Data mining for e-learning and education;
  • Chatbots for education.

Both review articles on the state of the art and experimental or theoretical articles are welcome.

Prof. Dr. Antonio Sarasa Cabezuelo
Dr. Ramón González del Campo Rodríguez Barbero
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Data is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • e-learning
  • machine learning
  • artificial intelligence
  • data analysis
  • algorithms
  • big data

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information is available in MDPI's Special Issue policies.

Published Papers (13 papers)

Research

14 pages, 2048 KiB  
Article
Data Balancing Techniques for Predicting Student Dropout Using Machine Learning
by Neema Mduma
Data 2023, 8(3), 49; https://doi.org/10.3390/data8030049 - 27 Feb 2023
Cited by 10 | Viewed by 5128
Abstract
Predicting student dropout is a challenging problem in the education sector. This is due to an imbalance in student dropout data, mainly because the number of registered students is always higher than the number of dropout students. Developing a model without taking the data imbalance issue into account may lead to a model that generalizes poorly. In this study, different data balancing techniques were applied to improve prediction accuracy in the minority class while maintaining a satisfactory overall classification performance. Random Over Sampling, Random Under Sampling, Synthetic Minority Over Sampling, SMOTE with Edited Nearest Neighbor and SMOTE with Tomek links were tested, along with three popular classification models: Logistic Regression, Random Forest, and Multi-Layer Perceptron. Publicly accessible datasets from Tanzania and India were used to evaluate the effectiveness of balancing techniques and prediction models. The results indicate that SMOTE with Edited Nearest Neighbor achieved the best classification performance on the 10-fold holdout sample. Furthermore, Logistic Regression correctly classified the largest number of dropout students (57,348 for the Uwezo dataset and 13,430 for the India dataset) using the confusion matrix for evaluation. The applications of these models allow for the precise prediction of at-risk students and the reduction of dropout rates.
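
As a rough illustration of the simplest balancing technique named in this abstract, random oversampling, the sketch below duplicates minority-class rows until all classes are the same size. It is not the author's code: the helper name random_oversample, the toy rows and labels, and the fixed seed are all assumptions made for this example.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of the smaller
    classes until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows that originally belong to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Made-up toy data: 6 enrolled students and 2 dropouts.
rows = [[0.9], [0.8], [0.7], [0.85], [0.75], [0.95], [0.2], [0.3]]
labels = ["enrolled"] * 6 + ["dropout"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keeps its original class proportions.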

27 pages, 1261 KiB  
Article
Multi-Level Analysis of Learning Management Systems’ User Acceptance Exemplified in Two System Case Studies
by Parisa Shayan, Roberto Rondinelli, Menno van Zaanen and Martin Atzmueller
Data 2023, 8(3), 45; https://doi.org/10.3390/data8030045 - 22 Feb 2023
Cited by 2 | Viewed by 3419
Abstract
There has recently been an increasing interest in Learning Management Systems (LMSs). It is currently unclear, however, exactly how these systems are perceived by their users. This article analyzes data on user acceptance for two LMSs (Blackboard and Canvas). The respective data are collected using a questionnaire modeled after the Technology Acceptance Model (TAM); it relates several variables that influence system acceptability, allowing for a detailed analysis of the system acceptance. We present analyses at two levels of the questionnaire data: questions and constructs (taken from TAM) as well as on different analysis levels using targeted methods. First, we investigate the differences between the above LMSs using statistical tests (t-test). Second, we provide results at the question level using descriptive indices, such as the mean and the Gini heterogeneity index, and apply methods for ordinal data using the Cumulative Link Mixed Model (CLMM). Next, we apply the same approach at the TAM construct level plus descriptive network analysis (degree centrality and bipartite motifs) to explore the variability of users’ answers and the degree of users’ satisfaction considering the extracted patterns. In the context of TAM, the statistical model is able to analyze LMS acceptance on the question level. As we are also very much interested in identifying LMS acceptance at the construct level, in this article, we provide both statistical analysis and network analysis to explore the connection between questionnaire data and relational data. A network analysis approach is particularly useful when analyzing LMS acceptance on the construct level, as this can take the structure of the users’ answers across questions per construct into account. Taken together, these results suggest a higher rate of user acceptance among Canvas users than among Blackboard users at both the question and construct levels. Likewise, the descriptive network modeling indicates a slightly higher concordance among Canvas users than among Blackboard users at the construct level.

16 pages, 1573 KiB  
Article
Federated Learning for Data Analytics in Education
by Christian Fachola, Agustín Tornaría, Paola Bermolen, Germán Capdehourat, Lorena Etcheverry and María Inés Fariello
Data 2023, 8(2), 43; https://doi.org/10.3390/data8020043 - 20 Feb 2023
Cited by 7 | Viewed by 3646
Abstract
Federated learning techniques aim to train and build machine learning models based on distributed datasets across multiple devices while avoiding data leakage. The main idea is to perform training on remote devices or isolated data centers without transferring data to centralized repositories, thus mitigating privacy risks. Data analytics in education, in particular learning analytics, is a promising scenario to apply this approach to address the legal and ethical issues related to processing sensitive data. Indeed, given the nature of the data to be studied (personal data, educational outcomes, and data concerning minors), it is essential to ensure that the conduct of these studies and the publication of the results provide the necessary guarantees to protect the privacy of the individuals involved and the protection of their data. In addition, the application of quantitative techniques based on the exploitation of data on the use of educational platforms, student performance, use of devices, etc., can account for educational problems such as the determination of user profiles, personalized learning trajectories, or early dropout indicators and alerts, among others. This paper presents the application of federated learning techniques to a well-known learning analytics problem: student dropout prediction. The experiments allow us to conclude that the proposed solutions achieve comparable results from the performance point of view with the centralized versions, avoiding the concentration of all the data in a single place for training the models.
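
The idea of training without centralizing data can be sketched with federated averaging, a common aggregation scheme that this abstract does not name and that may differ from what the authors used. In the toy example below, each client fits a small linear model on its own data and only the model parameters are averaged. The function names, the made-up data, the learning rate, and the linear model (standing in for a dropout classifier) are all assumptions.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Run a few gradient-descent steps for y ~ w*x + b on one client's data.
    The raw (x, y) pairs never leave this function."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def federated_average(client_weights, client_sizes):
    """Average client parameters, weighted by each client's sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return w, b

# Made-up data: two "schools", each holding points from y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (0.5, 2.0), (1.5, 4.0)],
]

global_weights = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])

print(global_weights)
```

Only the parameter pairs returned by local_update are shared with the aggregation step, which is the property that makes this family of methods attractive for sensitive educational data.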

18 pages, 2683 KiB  
Article
Density-Based Unsupervised Learning Algorithm to Categorize College Students into Dropout Risk Levels
by Miguel Angel Valles-Coral, Luis Salazar-Ramírez, Richard Injante, Edwin Augusto Hernandez-Torres, Juan Juárez-Díaz, Jorge Raul Navarro-Cabrera, Lloy Pinedo and Pierre Vidaurre-Rojas
Data 2022, 7(11), 165; https://doi.org/10.3390/data7110165 - 18 Nov 2022
Cited by 8 | Viewed by 3854
Abstract
Compliance with the basic conditions of quality in higher education implies the design of strategies to reduce student dropout, and Information and Communication Technologies (ICT) in the educational field have allowed directing, reinforcing, and consolidating the process of professional academic training. We propose an academic and emotional tracking model that uses data mining and machine learning to group university students according to their level of dropout risk. We worked with 670 students from a Peruvian public university, applied 5 valid and reliable psychological assessment questionnaires to them using a chatbot-based system, and then classified them using 3 density-based unsupervised learning algorithms, DBSCAN, K-Means, and HDBSCAN. The results showed that HDBSCAN was the most robust option, obtaining better validity levels in two of the three internal indices evaluated, where the performance of the Silhouette index was 0.6823, the performance of the Davies–Bouldin index was 0.6563, and the performance of the Calinski–Harabasz index was 369.6459. The best number of clusters produced by the internal indices was five. For the validation of external indices, with answers from mental health professionals, we obtained a high level of precision in the F-measure: 90.9%, purity: 94.5%, V-measure: 86.9%, and ARI: 86.5%, and this indicates the robustness of the proposed model that allows us to categorize university students into five levels according to the risk of dropping out.
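
To make the Silhouette index mentioned in this abstract more concrete, the sketch below computes the mean silhouette value for a given cluster assignment using Euclidean distance. It is a minimal illustration, not the authors' pipeline: the function name, the toy points, and both labelings are assumptions, and the code expects at least two clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points (Euclidean distance).
    For each point: a = mean distance to the rest of its own cluster,
    b = smallest mean distance to any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters score 0."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    scores = []
    for p, c in zip(points, labels):
        own = clusters[c]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # dist(p, p) is 0, so dividing by len(own) - 1 excludes the point itself.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for k, other in clusters.items()
            if k != c
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Made-up points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # matches the visible groups
bad = silhouette_score(points, [0, 1, 0, 1])   # mixes the groups
print(good, bad)
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that points sit closer to another cluster than to their own.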

41 pages, 2202 KiB  
Article
Thematic Analysis of Indonesian Physics Education Research Literature Using Machine Learning
by Purwoko Haryadi Santoso, Edi Istiyono, Haryanto and Wahyu Hidayatulloh
Data 2022, 7(11), 147; https://doi.org/10.3390/data7110147 - 28 Oct 2022
Cited by 4 | Viewed by 4285
Abstract
Abundant physics education research (PER) literature has been disseminated through academic publications. Over the years, the growing body of literature challenges Indonesian PER scholars to understand how the research community has progressed and possible future work that should be encouraged. Nevertheless, the previous traditional method of thematic analysis possesses limitations when the amount of PER literature exponentially increases. In order to deal with this plethora of publications, one of the machine learning (ML) algorithms from natural language processing (NLP) studies was employed in this paper to automate a thematic analysis of Indonesian PER literature that still needs to be explored within the community. One of the well-known NLP algorithms, latent Dirichlet allocation (LDA), was used in this study to extract Indonesian PER topics and their evolution between 2014 and 2021. A total of 852 papers (~4 to 8 pages each) were collectively downloaded from five international conference proceedings organized, peer reviewed, and published by Indonesian PER researchers. Before their topics were modeled through the LDA algorithm, our data corpus was preprocessed through several common procedures of established NLP studies. The findings revealed that LDA had thematically quantified Indonesian PER topics and described their distinct development over a certain period. The identified topics from this study recommended that the Indonesian PER community establish robust development in eight distinct topics to the present. Here, we commenced with an initial interest focusing on research on physics laboratories and followed the research-based instruction in late 2015. For the past few years, the Indonesian PER scholars have mostly studied 21st century skills which have given way to a focus on developing relevant educational technologies and promoting the interdisciplinary aspects of physics education. We suggest an open room for Indonesian PER scholars to address the qualitative aspects of physics teaching and learning that are still scant within the literature.

13 pages, 695 KiB  
Article
Advances in Contextual Action Recognition: Automatic Cheating Detection Using Machine Learning Techniques
by Fairouz Hussein, Ayat Al-Ahmad, Subhieh El-Salhi, Esra’a Alshdaifat and Mo’taz Al-Hami
Data 2022, 7(9), 122; https://doi.org/10.3390/data7090122 - 31 Aug 2022
Cited by 3 | Viewed by 7620
Abstract
Teaching and exam proctoring represent key pillars of the education system. Human proctoring, which involves visually monitoring examinees throughout exams, is an important part of assessing the academic process. The capacity to proctor examinations is a critical component of educational scalability. However, such approaches are time-consuming and expensive. In this paper, we present a new framework for the learning and classification of cheating video sequences. This kind of study aids in the early detection of students’ cheating. Furthermore, we introduce a new dataset, “actions of student cheating in paper-based exams”. The dataset consists of suspicious actions in an exam environment. Five classes of cheating were performed by eight different actors. Each pair of subjects conducted five distinct cheating activities. To evaluate the performance of the proposed framework, we conducted experiments on action recognition tasks at the frame level using five types of well-known features. The findings from the experiments on the framework were impressive and substantial.

27 pages, 2571 KiB  
Article
A Cross-Sectional Study on Mental Health of School Students during the COVID-19 Pandemic in India
by Sibnath Deb, Samarjit Kar, Shayana Deb, Sanjib Biswas, Aehsan Ahmad Dar and Tusharika Mukherjee
Data 2022, 7(7), 99; https://doi.org/10.3390/data7070099 - 18 Jul 2022
Cited by 5 | Viewed by 4540
Abstract
The broad objective of the present study is to assess the levels of anxiety and depression of school students during the COVID-19 lockdown phase and their association with students’ background, stress, concerns and social support. In this regard, the present study follows a novel two-stage approach. In the first phase, an empirical survey was carried out, based on multivariate statistical analysis, wherein a group of 273 school students participated in the study voluntarily. In the second phase, a novel Picture Fuzzy FFA (PF-FFA) method was applied for understanding the dynamics of facilitating and prohibiting factors for three categories of focus groups (FG), formulated on the basis of attendance in online classes. Findings revealed a significant impact of anxiety and depression on mental health. Further, PF-FFA examined the impact of the driving forces that steered children to attend class as contrasted to the impact of the restricting forces.

18 pages, 4276 KiB  
Article
Development of a Model Using Data Mining Technique to Test, Predict and Obtain Knowledge from the Academics Results of Information Technology Students
by Wisam Ibrahim, Sanjar Abdullaev, Hussein Alkattan, Oluwaseun A. Adelaja and Alhumaima Ali Subhi
Data 2022, 7(5), 67; https://doi.org/10.3390/data7050067 - 23 May 2022
Cited by 6 | Viewed by 3664
Abstract
Due to the huge amount of data obtained from students’ academic results in most tertiary institutions such as colleges, polytechnics, and universities, data mining has become one of the most effective tools for discovering vital knowledge from student datasets. The discovered knowledge can be productive in understanding numerous challenges in the scope of education and providing possible solutions to these challenges. The main objective of this research is to utilize the J48 decision algorithm model to test, classify and predict the students’ dataset by identifying some important attributes and instances. The analysis was conducted on the academic results of final-year students in C# programming across five universities, which were imported as a CSV file into the WEKA environment. These training datasets contained the scores obtained in the examinations, grade remarks, grades, gender, and department. The knowledge extracted for the prediction model will help both the tutors and students to determine the success grade performance in the future. Flow lines, J48 decision trees, confusion matrices and a program flowchart were generated from the students’ dataset. The KAPPA value obtained from the prediction in this research ranges from 0.9070 to 0.9582, which agrees with the standard for an ideal analysis of datasets.
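
Assuming the KAPPA value reported in this abstract is Cohen's kappa, it can be derived directly from a confusion matrix as the observed agreement corrected for the agreement expected by chance. The sketch below shows that calculation; the function name and the 2x2 matrix are made up for illustration and are not taken from the paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix
    (rows: actual classes, columns: predicted classes)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(k)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

# Made-up matrix: 90 of 100 instances classified correctly.
matrix = [
    [45, 5],
    [5, 45],
]
kappa = cohen_kappa(matrix)
print(round(kappa, 4))
```

In this toy case the observed agreement is 0.9 and the chance agreement is 0.5, so kappa is 0.8, which is lower than the raw accuracy because half of the agreement could have arisen by chance.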

Other

16 pages, 5700 KiB  
Data Descriptor
Knowledge Discovery and Dataset for the Improvement of Digital Literacy Skills in Undergraduate Students
by Pongpon Nilaphruek and Pattama Charoenporn
Data 2023, 8(7), 121; https://doi.org/10.3390/data8070121 - 20 Jul 2023
Cited by 3 | Viewed by 1917
Abstract
For over two decades, scholars and practitioners have emphasized the importance of digital literacy, yet the existing datasets are insufficient for establishing learning analytics in Thailand. Learning analytics focuses on gathering and analyzing student data to optimize learning tools and activities to improve students’ learning experiences. The main problem is that the ICT skill levels of the youth are rather low in Thailand. To facilitate research in this field, this study has compiled a dataset containing information from the IC3 digital literacy certification delivered at the Rajamangala University of Technology Thanyaburi (RMUTT) in Thailand between 2016 and 2023. This dataset is unique since it includes demographic and academic records about undergraduate students. The dataset was collected and underwent a preparation process, including data cleansing, anonymization, and release. This data enables the examination of student learning outcomes, represented by a dataset containing information about 45,603 records with students’ certification assessment scores. This compiled dataset provides a rich resource for researchers studying digital literacy and learning analytics. It offers researchers the opportunity to gain valuable insights, inform evidence-based educational practices, and contribute to the ongoing efforts to improve digital literacy education in Thailand and beyond.

6 pages, 883 KiB  
Data Descriptor
Data from Zimbabwean College Students on the Measurement Invariance of the Entrepreneurship Goal and Implementation Intentions Scales
by Takawira Munyaradzi Ndofirepi
Data 2022, 7(12), 172; https://doi.org/10.3390/data7120172 - 29 Nov 2022
Viewed by 1486
Abstract
This article analyses primary data on the entrepreneurship intentions of selected Zimbabwean college students. The goal of this study was to examine the measurement invariance of the entrepreneurship goal and implementation intention scales across gender groups in a higher education setting. Entrepreneurship goal intentions (EGI) and entrepreneurship implementation intentions (EII) are examined as separate but related constructs. To address the research goal, a positivist philosophy and quantitative research approach were used. A cross-sectional survey was used to collect data from a convenience sample of 262 college students in Zimbabwe. A researcher-administered questionnaire, written in English, was distributed to the respondents and collected after completion. Multi-group confirmatory analysis was performed on the dataset using JASP computer software. The results obtained confirmed all four levels of measurement invariance, namely configural, metric, scalar, and strict invariance. The pattern of the results validates the consistency of the measurement properties of the entrepreneurial intention instruments designed in developed countries across different contexts of use. Researchers, entrepreneurship educators, and policymakers in Zimbabwe can use the results of this analysis to quantify potential entrepreneurs among young adults and to come up with intervention measures to support future entrepreneurship.

17 pages, 6419 KiB  
Data Descriptor
Predicting Student Dropout and Academic Success
by Valentim Realinho, Jorge Machado, Luís Baptista and Mónica V. Martins
Data 2022, 7(11), 146; https://doi.org/10.3390/data7110146 - 28 Oct 2022
Cited by 25 | Viewed by 31015
Abstract
Higher education institutions record a significant amount of data about their students, representing a considerable potential to generate information, knowledge, and monitoring. Both school dropout and educational failure in higher education are an obstacle to economic growth, employment, competitiveness, and productivity, directly impacting the lives of students and their families, higher education institutions, and society as a whole. The dataset described here results from the aggregation of information from different disjointed data sources and includes demographic, socioeconomic, macroeconomic, and academic data on enrollment and academic performance at the end of the first and second semesters. The dataset is used to build machine learning models for predicting academic performance and dropout, which is part of a Learning Analytic tool developed at the Polytechnic Institute of Portalegre that provides information to the tutoring team with an estimate of the risk of dropout and failure. The dataset is useful for researchers who want to conduct comparative studies on student academic performance and also for training in the machine learning area.

17 pages, 1326 KiB  
Data Descriptor
Student Dataset from Tecnologico de Monterrey in Mexico to Predict Dropout in Higher Education
by Joanna Alvarado-Uribe, Paola Mejía-Almada, Ana Luisa Masetto Herrera, Roland Molontay, Isabel Hilliger, Vinayak Hegde, José Enrique Montemayor Gallegos, Renato Armando Ramírez Díaz and Hector G. Ceballos
Data 2022, 7(9), 119; https://doi.org/10.3390/data7090119 - 25 Aug 2022
Cited by 12 | Viewed by 8286
Abstract
High dropout rates and delayed completion in higher education are associated with considerable personal and social costs. In Latin America, 50% of students drop out, and only 50% of the remaining ones graduate on time. Therefore, there is an urgent need to identify students at risk and understand the main factors of dropping out. Together with the emergence of efficient computational methods, the rich data accumulated in educational administrative systems have opened novel approaches to promote student persistence. In order to support research related to preventing student dropout, a dataset has been gathered and curated from Tecnologico de Monterrey students, consisting of 50 variables and 143,326 records. The dataset contains non-identifiable information of 121,584 High School and Undergraduate students belonging to the seven admission cohorts from August–December 2014 to 2020, covering two educational models. The variables included in this dataset consider factors mentioned in the literature, such as sociodemographic and academic information related to the student, as well as institution-specific variables, such as student life. This dataset provides researchers with the opportunity to test different types of models for dropout prediction, so as to inform timely interventions to support at-risk students.

16 pages, 339 KiB  
Data Descriptor
A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave
by Nirmalya Thakur
Data 2022, 7(8), 109; https://doi.org/10.3390/data7080109 - 4 Aug 2022
Cited by 19 | Viewed by 5751
Abstract
The COVID-19 Omicron variant, reported to be the most immune-evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations related to online learning in the form of tweets. Mining such tweets to develop a dataset can serve as a data resource for different applications and use-cases related to the analysis of interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore, this work presents a large-scale, open-access Twitter dataset of conversations about online learning from different parts of the world since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management. The paper also briefly outlines some potential applications in the fields of Big Data, Data Mining, Natural Language Processing, and their related disciplines, with a specific focus on online learning during this Omicron wave that may be studied, explored, and investigated by using this dataset.