Early and precisely predicting the students’ dropout based on available educational data belongs to the widespread research topic of the learning analytics research field. Despite the amount of already realized research, the progress is not significant and persists on all educational data levels. Even though various features have already been researched, there is still an open question, which features can be considered appropriate for different machine learning classifiers applied to the typical scarce set of educational data at the e-learning course level. Therefore, the main goal of the research is to emphasize the importance of the data understanding, data gathering phase, stress the limitations of the available datasets of educational data, compare the performance of several machine learning classifiers, and show that also a limited set of features, which are available for teachers in the e-learning course, can predict student’s dropout with sufficient accuracy if the performance metrics are thoroughly considered. The data collected from four academic years were analyzed. The features selected in this study proved to be applicable in predicting course completers and non-completers. The prediction accuracy varied between 77 and 93% on unseen data from the next academic year. In addition to the frequently used performance metrics, the comparison of machine learning classifiers homogeneity was analyzed to overcome the impact of the limited size of the dataset on obtained high values of performance metrics. The results showed that several machine learning algorithms could be successfully applied to a scarce dataset of educational data. Simultaneously, classification performance metrics should be thoroughly considered before deciding to deploy the best performance classification model to predict potential dropout cases and design beneficial intervention mechanisms.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited