Dataset of Students’ Performance Using Student Information System, Moodle and the Mobile Application “eDify”

The data presented in this article comprise an educational dataset collected from the student information system (SIS), the learning management system (LMS) Moodle, and video interactions from the mobile application “eDify”. The dataset, from a higher educational institution (HEI) in the Sultanate of Oman, covers five modules from Spring 2017 to Spring 2021. It consists of 326 student records with 40 features in total: the students’ academic information from SIS (24 features), the students’ activities performed on Moodle within and outside the campus (10 features), and the students’ video interactions collected from eDify (six features). The dataset is useful for researchers who want to explore students’ academic performance in online learning environments, and will help them to build educational datamining models. Moreover, it can serve as input for predicting students’ academic performance within a module for educational datamining and learning analytics. Researchers are encouraged to refer to the original papers for more details.


Summary
Higher educational institutions (HEIs) employ a variety of learning approaches based on information and communications technology (ICT). These approaches involve different learning environments that facilitate teaching and learning and the dissemination of knowledge to learners. Moreover, these environments keep track of their users and the users' interactions for auditing and recovery purposes. These logs can provide stakeholders with valuable learning data and, when analyzed effectively, can help to provide a better learning experience to learners. Reports generated for different users and courses can be used to evaluate the efficacy of the courses and the progress of the learners. Such insights can help to cater to different learning styles, determine the complexity of courses, identify specific parts of the content that cause problems in understanding the concepts, and anticipate the future performance of learners.
Many HEIs use machine learning (ML) to discover patterns in data from these learning environments for better decision-making, and datamining (DM) to improve decision-making models using artificial intelligence (AI). HEIs require educational datamining (EDM) to better understand learners' behaviors in these learning environments, which has the potential to impact educational practices [1].
The dataset was collected from students of Middle East College (MEC), Muscat, Oman, studying a computing specialization in the sixth semester and above. It combines historical data about student academics (extracted from SIS), student logs (extracted from Moodle, where the time spent engaging in activities was considered), and video interactions with blended learning material (extracted from the logs of the mobile application eDify). For the students' academic data, the SIS parameters included student demographic data, academic data, degree plan, and academic integrity violations (AIVs); of these, the academic and AIV data were considered. For the students' activities, the Moodle log parameters included logs of course activity, logs of site activity, live logs, site administration settings, and view log capabilities, although only logs of course activity were considered. In contrast, the eDify logs consist of video interactions for each student, with attributes such as played, paused, likes, and number of segments replayed within the video, all of which were selected. The Supplementary Material provided with this paper contains the raw and filtered data.
Many studies have been carried out to predict student performance using datamining approaches [2][3][4][5][6][7]. Nevertheless, these studies have primarily focused on demographic data, with predictions based on activities performed in online environments. However, limited research has analyzed the video interactions of learners in video-assisted courses [8][9][10][11][12]. The provided dataset contains SIS, Moodle, and video-assisted course data (eDify), which can help researchers to understand video learning analytics using EDM, thereby enhancing the teaching and learning process.
The dataset was assembled to predict student performance using EDM. It contains 326 observations, each representing an individual student and having 40 attributes. The dataset can allow the research community to benchmark EDM tasks performed on longitudinal and latitudinal datasets. This can help to understand student academic performance (SAP) modeling and prediction using datamining techniques [13,14]. Furthermore, it can be combined with data from other online environments, such as Moodle and online video streaming platforms, to understand learners' behaviors [15][16][17].

Data Description
The presented dataset is organized into three categories: student academic information (Sections 2.1 and 2.2), student activity (Sections 2.3 and 2.4), and student video interactions (Section 2.5). Student academic information was collected from SIS, student activity information from the activities performed on Moodle, and student video interactions from the mobile application "eDify". Figure 1 shows how the dataset was formed.

Student Academic Information
Ten comma-separated value (CSV) files named "KMS Module <Number> <Semester>", containing "Know My Student" details, were extracted from SIS, each with 20 attributes. Table 1 summarizes these attributes, accompanied by a brief description.

ModuleCode: Code of the module in which the student has been registered, with a nominal data type such as "Module 1"
ModuleTitle: Title of the module in which the student has been registered, with a nominal data type such as "Course 2"
Session: Session in which the student has been registered, with a nominal data type such as "Session-A"
RollNumber: Identification number of the student, with a nominal data type such as "21S1234"
ApplicantName: Name of the student, with a nominal data type such as "Student 1"
ApplicantMobile: Mobile number of the student, with a discrete data type such as "12345678"
CGPA: Cumulative grade point average of the student, with a discrete data type such as "4.0"
AttemptCount: Number of attempts in the module, with a discrete data type such as "1"
RemoteStudent: Whether or not the student is in remote study mode, with a nominal data type such as "Yes/No"
Probation: Whether the student has a backlog of modules to clear, with a nominal data type such as "Yes/No"
HighRisk: Whether the module has a high failure rate, with a nominal data type such as "Yes/No"
TermExceeded: Progression rate of the student in the degree plan, with a nominal data type such as "Yes/No"
AtRisk: Whether the student has previously failed two or more modules, with a nominal data type such as "Yes/No"
AtRiskSSC: Whether the student has been registered by the student success center for any educational deficiencies, with a nominal data type such as "Yes/No"
SpecialNeeds: Whether the student has been registered by the student success center for any special needs, with a nominal data type such as "Yes/No"
OtherModules: Number of other modules in which the student is registered in the current semester, with a numeric data type such as "1"
PrerequisiteModule: Prerequisite module registration, with a nominal data type such as "Yes/No"
PlagiarismHistory: Modules in which the student has been booked for an academic integrity violation, including module and academic year, with a nominal data type such as "Module 3"
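Because every field in an SIS export is read from CSV as text, a loader has to turn the "Yes/No" flags and the numeric fields of Table 1 into typed values before modeling. The Python sketch below assumes that the column headers match the attribute names in Table 1 exactly and that flags are spelled "Yes"/"No"; the function names are hypothetical and are not part of the published dataset.

```python
import csv
import io

# Assumption: Table 1 flag columns hold the literal strings "Yes"/"No".
YES_NO_FIELDS = ("RemoteStudent", "Probation", "HighRisk", "TermExceeded",
                 "AtRisk", "AtRiskSSC", "SpecialNeeds", "PrerequisiteModule")
INT_FIELDS = ("AttemptCount", "OtherModules")


def parse_kms_row(row):
    """Return a copy of one raw KMS row with flags and numbers converted."""
    record = dict(row)
    for field in YES_NO_FIELDS:
        if field in record:
            record[field] = record[field].strip().lower() == "yes"
    for field in INT_FIELDS:
        if field in record:
            record[field] = int(record[field])
    if "CGPA" in record:
        record["CGPA"] = float(record["CGPA"])
    # Identifiers such as RollNumber and ApplicantMobile stay as strings.
    return record


def read_kms(text):
    """Parse the text of one hypothetical 'KMS Module <Number> <Semester>' file."""
    return [parse_kms_row(row) for row in csv.DictReader(io.StringIO(text))]


sample = "RollNumber,CGPA,AttemptCount,Probation\n21S1234,3.2,1,No\n"
print(read_kms(sample))
```

Empty numeric cells would raise a `ValueError` in this sketch; that is acceptable here only because the SIS extracts are described as complete, with no missing values.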

Student Academic Performance
Ten comma-separated value (CSV) files named "Result Module <Number> <Semester>", containing the overall module results, were extracted from SIS. Each file has six attributes: "RollNumber", "ApplicantName" and "Session", plus three new attributes. Table 2 summarizes the new attributes, accompanied by a brief description.

CW1: Marks obtained by the student in their first coursework, with a discrete data type such as "86.5"
CW2: Marks obtained by the student in their second coursework, with a discrete data type such as "86.5"
ESE: Marks obtained in the end semester examination, with a discrete data type such as "86.5"

Student Moodle Logs
Ten comma-separated value (CSV) files of "Moodle Module <Number> <Semester>" containing nine attributes were extracted. Table 3 summarizes these attributes, accompanied by a brief description.

Student Online Activity on Moodle
Ten comma-separated value (CSV) files named "Activity Module <Number> <Semester>", containing "RollNumber", "ApplicantName" and two new attributes, were extracted. Table 4 summarizes these attributes, accompanied by a brief description.

Table 4. Attributes and descriptions exported from Moodle.
Attribute: Description
Online C: Time spent on user-performed activities within the campus (in minutes), with a discrete data type such as "25"
Online O: Time spent on user-performed activities outside the campus (in minutes), with a discrete data type such as "25"

Student Video Interaction
Ten comma-separated value (CSV) files named "VL Module <Number> <Semester>", containing "RollNumber", "ApplicantName" and four new attributes, were extracted. Table 5 summarizes these attributes, accompanied by a brief description.

Table 5. Attributes and descriptions exported from eDify.
Played: The number of times the video has been played
Paused: The number of times the video has been paused
Likes: The number of times the student has liked the video
Segment: The number of times the student has played a specific portion of the video by using the slider

Data Pre-Processing
The pre-processing .csv file contained the consolidated data, from which the following 24 attributes were selected for this study: "ModuleCode", "ModuleTitle", "SessionName", "ApplicantName", "CGPA", "AttemptCount", "RemoteStudent", "Probation", "HighRisk", "TermExceeded", "AtRisk", "AtRiskSSC", "OtherModules", "PlagiarismHistory", "CW1", "CW2", "ESE", "Online C", "Online O", "Played", "Paused", "Likes", "Segment" and "Result" (mapped to the outcome of the student, either having passed or failed the module, based on the grading scheme shown in Table 6). Eight attributes were converted from numeric to ordinal values, as shown in Table 7, and three attributes were converted from numeric to ordinal values using different ranges, as shown in Table 8. The criteria used to convert the grading scheme and marks to ordinal values were in line with the assessment evaluation classification ranges used at MEC. This conversion was carried out to map the outcome of the target variable "Result".
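The numeric-to-ordinal conversion can be expressed as a lookup against sorted lower bounds. MEC's actual ranges (Tables 6 to 8) are not reproduced in this section, so the cut-offs, labels, and pass mark below are hypothetical placeholders to be replaced with the published ranges; the Python sketch only illustrates the mechanics.

```python
from bisect import bisect_right

# HYPOTHETICAL bands: replace with the MEC ranges from Tables 6-8.
MARK_CUTOFFS = [40.0, 55.0, 70.0]                # lower bound of each band after the first
MARK_LABELS = ["Low", "Medium", "High", "Very High"]


def to_ordinal(value, cutoffs=MARK_CUTOFFS, labels=MARK_LABELS):
    """Map a numeric value to the label of the band it falls into.

    A value equal to a cut-off belongs to the higher band.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cut-offs")
    if list(cutoffs) != sorted(cutoffs):
        raise ValueError("cut-offs must be in ascending order")
    return labels[bisect_right(cutoffs, value)]


def to_result(overall_mark, pass_mark=40.0):
    """Binary target as a one-cut-off mapping (the 40.0 pass mark is HYPOTHETICAL)."""
    return to_ordinal(overall_mark, [pass_mark], ["Fail", "Pass"])


print(to_ordinal(86.5), to_result(38.0))
```

How the overall mark is derived from "CW1", "CW2" and "ESE" (for example, their weighting) is not stated here, so `to_result` simply takes it as an input.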

Methods
Before starting the data collection, the first step was to identify the modules. The data were extracted from SIS, Moodle, and eDify. The raw data were collected from SIS in two phases. First, "Know My Student" details were extracted for the chosen modules. Second, the results of those students in the particular modules were extracted. The log files from Moodle and eDify were extracted for the selected modules and cleansed. After data cleansing, the files were merged into a single consolidated .csv file and pre-processed. Figure 2 shows the design, materials, and methods used in this process: how the data were collected, processed, and made available to the datamining tool (Orange) to predict the students' academic performance using SIS, Moodle activity data, and video interactions through the mobile application.
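The merge step can be pictured as a join on the student identifier across the per-module files (KMS, Result, Activity and VL). The Python sketch below assumes that each file has already been parsed into a list of row dictionaries sharing a "RollNumber" column, that the join is done module by module (so a roll number is unique within each input), and that only students present in every source are kept (an inner join). These are illustrative choices, not necessarily the authors' exact procedure, and the function names are hypothetical.

```python
import csv
import io


def merge_by_key(*tables, key="RollNumber"):
    """Inner-join lists of row dicts on `key`; later tables win on shared columns."""
    indexed = []
    for table in tables:
        index = {}
        for row in table:
            if row[key] in index:
                raise ValueError(f"duplicate {key} {row[key]!r} in one input")
            index[row[key]] = row
        indexed.append(index)

    # Keep only identifiers present in every source.
    common = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for student in sorted(common):
        combined = {}
        for index in indexed:
            combined.update(index[student])
        merged.append(combined)
    return merged


def to_csv_text(rows):
    """Serialize merged rows as one consolidated CSV string."""
    fieldnames = list(dict.fromkeys(col for row in rows for col in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


kms = [{"RollNumber": "S1", "CGPA": "3.1"}, {"RollNumber": "S2", "CGPA": "2.5"}]
result = [{"RollNumber": "S1", "CW1": "80"}]
print(to_csv_text(merge_by_key(kms, result)))
```

An inner join silently drops students who are missing from any source; a left join starting from the SIS file (keeping every registered student and leaving gaps empty) would be the alternative if such students should be retained.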

Module Selection
The sixth-semester modules from the computing department at MEC were selected based on their difficulty level (level 2 and 3 modules). Through sampling, it was found that 188 samples were sufficient for this study. Data were collected only from the Spring 2017 to Spring 2021 semesters in the respective modules. The next step was to obtain informed consent from the students enrolled in these modules; module leaders/instructors obtained this consent, and the study posed no potential risk or discomfort to the students. To ensure confidentiality and privacy, student identities were coded and mapped accordingly during data cleansing. Character masking and generalization were used to anonymize the data where necessary and applicable.

•
For data extracted from SIS, the data were complete and had no missing values. Of the 20 attributes in the first extraction, a few were not relevant to the study. "RollNumber", "ApplicantName" and "ApplicantMobile" were encoded as "xxxxxxxxxx", and the "Advisor" attribute was encoded as "xxxxx". In the second set, only "RollNumber" and "ApplicantName" were encoded, in the same way as in the first set, to make the data anonymous.

•
For the data extracted from Moodle, the faculty and moderator logs were filtered out, as they were not required for the study. After these entries were removed, "User Full Name" and "Affected user" were replaced with "-".
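The anonymization described above combines two ideas: character masking of direct identifiers with fixed strings, and coding of identities so that records can still be linked. A minimal Python sketch follows. The mask strings mirror the text ("xxxxxxxxxx", "xxxxx", "-"), while the "Student N" coding scheme (borrowed from the ApplicantName example in Table 1) and all function and class names are assumptions.

```python
import itertools

# Mask strings as described in the text; column names follow the source files.
SIS_MASKS = {"RollNumber": "x" * 10, "ApplicantName": "x" * 10,
             "ApplicantMobile": "x" * 10, "Advisor": "x" * 5}
MOODLE_MASKS = {"User Full Name": "-", "Affected user": "-"}


def mask_row(row, masks):
    """Character masking: overwrite each listed column with its mask string."""
    return {col: masks.get(col, value) for col, value in row.items()}


class Pseudonymizer:
    """Assign each real identifier a stable code such as 'Student 1' (hypothetical scheme)."""

    def __init__(self, prefix="Student"):
        self.prefix = prefix
        self.codes = {}
        self._next = itertools.count(1)

    def code(self, real_id):
        if real_id not in self.codes:
            self.codes[real_id] = f"{self.prefix} {next(self._next)}"
        return self.codes[real_id]


coder = Pseudonymizer()
row = {"RollNumber": "21S1234", "ApplicantName": "A. Student", "CGPA": "3.0"}
row["StudentCode"] = coder.code(row["RollNumber"])  # hypothetical link column
print(mask_row(row, SIS_MASKS))
```

The code has to be assigned before masking: once every roll number reads "xxxxxxxxxx", rows from different files can no longer be matched. The mapping table itself must then be kept out of the released data.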

Data Pre-Processing
The pre-processing .csv file combined all of the data into a single file that could be pre-processed before being used for classification in any datamining tool. The first step was to merge all data into one file, in which 24 attributes useful for this study were identified. The next step was to convert the ordinal values to nominal values, as shown in Tables 7 and 8.
For the data extracted from Moodle, "Affected user", "Event context", "Component", "Event name", "Description" and "Origin" were not relevant to this study, so they were omitted, and only the time spent by each user in Moodle courses was retained. IP addresses were used to determine whether each login was made from within or outside the campus: addresses in the 192.168.x.x range were considered on-campus, and all other addresses off-campus. The access time was then converted into minutes to capture the time spent on activities within and outside the campus.
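The text does not state exactly how access time was turned into minutes. One plausible approach, sketched below in Python, sums the gaps between a student's consecutive log events and credits each gap to "Online C" or "Online O" according to the IP address of the earlier event. The 192.168.0.0/16 campus range comes from the text; the 30-minute idle cap, the function names, and the sample events are hypothetical. Events are assumed to be parsed into `datetime` objects already, because Moodle export time formats vary.

```python
from datetime import datetime
from ipaddress import ip_address, ip_network

# From the text: 192.168.x.x addresses were treated as on-campus.
CAMPUS_NET = ip_network("192.168.0.0/16")
# HYPOTHETICAL: a gap longer than this is treated as a break, not activity.
IDLE_CAP_MINUTES = 30.0


def on_campus(ip):
    return ip_address(ip) in CAMPUS_NET


def split_minutes(events, idle_cap=IDLE_CAP_MINUTES):
    """Return (Online C, Online O) minutes for one student's (time, ip) log events.

    Each gap between consecutive events is credited to the location of the
    earlier event; gaps longer than idle_cap are skipped.
    """
    ordered = sorted(events)
    inside = outside = 0.0
    for (start, ip), (end, _) in zip(ordered, ordered[1:]):
        gap = (end - start).total_seconds() / 60
        if gap > idle_cap:
            continue
        if on_campus(ip):
            inside += gap
        else:
            outside += gap
    return round(inside), round(outside)


events = [
    (datetime(2021, 3, 1, 10, 0), "192.168.4.20"),
    (datetime(2021, 3, 1, 10, 20), "192.168.4.20"),
    (datetime(2021, 3, 1, 20, 0), "5.36.1.1"),
    (datetime(2021, 3, 1, 20, 15), "5.36.1.1"),
]
print(split_minutes(events))  # → (20, 15)
```

With this heuristic, the last event of each session contributes no time, and the choice of idle cap directly changes the totals, so any reuse should report the cap alongside the results.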
For data extracted from eDify, all four attributes were taken and no conversion was performed on the data.

Final Dataset
The final .csv dataset was the complete dataset, with 21 out of 40 attributes that could be used for this study. This dataset can be used with any datamining tool for classifying and predicting student academic performance using EDM.
From Moodle, two attributes were selected based on the activities performed on Moodle from outside or within the campus: "Online C" and "Online O".
From eDify, four attributes were selected: "Played", "Paused", "Likes" and "Segment". The final dataset can help researchers to better understand students' learning behaviors in online learning environments.

Conclusions
This article provides a dataset drawn from multiple learning environments, which will be useful for researchers who want to explore students' academic performance in online learning environments and to build educational datamining models. The dataset will also support comparative studies of student behaviors and patterns in online learning environments. It can further help to form an educational datamining model to which different classification algorithms can be applied to predict successful students. Moreover, feature selection techniques can be applied, which may improve the accuracy of predicting students' academic performance.
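Feature selection can be prototyped by ranking attributes against the target "Result". Information gain is one common criterion (Orange's Rank widget includes it among other scorers), but the text does not prescribe a particular technique, so the following stdlib Python sketch is only an example. It assumes that the attributes have already been converted to nominal or ordinal labels; the function names and the toy rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their weighted entropy within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, features, target="Result"):
    """Return (feature, gain) pairs sorted from most to least informative."""
    labels = [row[target] for row in rows]
    scores = {f: information_gain([row[f] for row in rows], labels) for f in features}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


toy_rows = [
    {"AtRisk": "Yes", "Probation": "No", "Result": "Fail"},
    {"AtRisk": "Yes", "Probation": "No", "Result": "Fail"},
    {"AtRisk": "No", "Probation": "No", "Result": "Pass"},
    {"AtRisk": "No", "Probation": "No", "Result": "Pass"},
]
print(rank_features(toy_rows, ["Probation", "AtRisk"]))
```

Plain information gain favors attributes with many distinct values (such as "ModuleTitle"), so a normalized score like gain ratio may be preferable when comparing attributes of very different cardinality.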
For future studies, weekly video interaction records could be considered to provide better insights into video learning analytics and student performance. Furthermore, the data can be used with a predictive churn model to act as an early warning system for dropouts in the course.

Patents
Hasan, Raza, Palaniappan, Sellappan, Mahmood, Salman, and Asif Hussain, Shaik. A novel method and system to enhance teaching and learning and the student evaluation process using the "eDify" mobile application. AU Patent Innovation 2021103523, filed 22 June 2021.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/data6110110/s1, Data S1: csv files.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The authors confirm that the data supporting the findings of this study are available within the article and/or its Supplementary Materials.