EHCFE: Enhanced Hierarchical Clustering with Feature Engineering for Automating Labeling of Student Performance and Dropout Prediction
Abstract
1. Introduction
Contribution
- The proposed EHCFE approach automates the labeling of students’ performance to facilitate performance and dropout prediction;
- Two feature selection methods, random forest and variance, are applied to identify the top t important features;
- Feature engineering is incorporated into EHCFE to create new features from the top t important features, improving the labeling process;
- EHCFE employs hierarchical clustering; it is applied to three datasets and compared against current works and state-of-the-art techniques using different evaluation metrics that measure prediction performance and clustering quality.
2. Related Works
2.1. Single Unsupervised Approaches
2.2. Hybrid Approaches
3. The Proposed Enhanced Hierarchical Clustering with Feature Engineering Approach (EHCFE)
- Finding the top t important features: The first step in the EHCFE approach is to select the t most important features by combining two methods, variance and random forest (RF), where t ranges from 3 to 5 features. Variance measures the variability of the data around the mean; low variance indicates that the attribute values are close to each other, while high variance means that the feature is informative and has high cardinality [11]. The features are ranked by variance in descending order, where rank 1 denotes the most important feature. RF is also applied; it works as an ensemble of decision trees that makes predictions with greater accuracy [19]. Two measurements are used to assess feature importance in RF: mean decrease in accuracy and the Gini index. A rank is computed for each measurement, and the average of the two ranks is then taken. The Gini index measures the degree of node impurity in each decision tree, while mean decrease in accuracy measures the drop in performance when an attribute’s values are randomly shuffled in the dataset. RF is a supervised machine learning model that requires labeled data for training [19], and the data inputted into EHCFE is unlabeled; thus, K-means clustering is used to generate synthetic labels. K-means is selected because it is a simple and widely used clustering algorithm. Combined with RF, this strategy enables the identification of the most relevant features within an unlabeled dataset by leveraging task-specific information provided through the synthetic labels. Z-score normalization is applied prior to clustering, as it is more robust to outlier values [20]:

$x' = \frac{x - \mu_X}{\sigma_X}$ (1)

where $x'$ is the normalized value of $x$, and $\mu_X$ and $\sigma_X$ are the mean and standard deviation of feature X. By combining the ranks obtained from RF and variance, the t most important features are selected to derive new features, where rank 1 denotes the most important feature.
Further analysis is performed in the ablation study (Section 6) to examine the effect of using only the top t important features when training the clustering algorithm.
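The ranking procedure described above can be sketched roughly as follows. This is a minimal illustration, not the paper’s code: it assumes scikit-learn, uses permutation importance as a stand-in for RF’s mean decrease in accuracy, and all function and variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.preprocessing import StandardScaler

def top_t_features(X, feature_names, t=5, random_state=0):
    """Select the top t features by averaging a variance rank with an
    RF rank derived from synthetic K-means labels (rank 1 = best)."""
    # Variance rank: highest variance gets rank 1.
    var_rank = np.argsort(np.argsort(-X.var(axis=0))) + 1
    # Z-score normalize before clustering, then create synthetic labels.
    Xz = StandardScaler().fit_transform(X)
    labels = KMeans(n_clusters=2, n_init=10,
                    random_state=random_state).fit_predict(Xz)
    # RF importance: Gini-based and permutation-based ranks, averaged.
    rf = RandomForestClassifier(random_state=random_state).fit(X, labels)
    gini_rank = np.argsort(np.argsort(-rf.feature_importances_)) + 1
    perm = permutation_importance(rf, X, labels, random_state=random_state)
    acc_rank = np.argsort(np.argsort(-perm.importances_mean)) + 1
    rf_rank = (gini_rank + acc_rank) / 2
    # Combine the variance and RF ranks; lowest combined rank wins.
    order = np.argsort((var_rank + rf_rank) / 2)
    return [feature_names[i] for i in order[:t]]
```

Averaging ranks rather than raw scores keeps the variance and RF criteria on a common scale before they are combined.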
- Deriving new features for feature engineering: New features are derived from the top-ranked t features selected in the previous step; multiple methods exist for deriving new features [9]. In this work, the simple ratio is selected, as it is a widely used and well-established approach in feature engineering [9]. A simple ratio is effective in revealing the relationship between two features while being computationally inexpensive and easy to interpret. For example, a ratio such as “Feature A divided by Feature B” provides a straightforward proportional comparison between the two variables and remains easy to understand. To construct these ratio-based features, one important feature is divided by another, less important feature from the selected top-ranked set. The general form of a ratio-based feature is

$F_{new} = \frac{F_i}{F_j}$ (2)

where $F_{new}$ is the new feature, $F_i$ is the more important feature, and $F_j$ is a less important feature that follows it in the ranking order. The feature $F_j$ may be any feature in the selected set, including $F_t$, the last feature in the ordered set of top-ranked features, with the requirement that $i < j$. Only a fixed number of ratio-based features is retained to avoid unnecessary feature expansion; therefore, the selection is limited to five newly created ratio-based features. Any ratio-based feature with many zero values in its denominator is excluded to prevent unstable or uninformative outputs. Further analysis is conducted on the new features to determine whether they provide unique information; if not, they are considered highly correlated and are removed. To determine this, a pairwise Pearson correlation matrix is computed, and one feature from each pair whose correlation coefficient exceeds the threshold of 0.95 is removed. Another analysis is performed in the ablation study (Section 6) to assess the importance of the feature engineering step in the proposed model.
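The derivation step can be sketched as below, assuming pandas. The 10% zero-denominator cutoff is an assumed value, since the text only specifies that denominators with many zeros are excluded; the function name and column naming scheme are likewise illustrative.

```python
import numpy as np
import pandas as pd

def ratio_features(df, ranked, max_new=5, corr_threshold=0.95):
    """Derive ratio features F_i / F_j (i more important than j) from the
    ranked top-t feature names, keeping at most max_new of them."""
    new = pd.DataFrame(index=df.index)
    for i, num in enumerate(ranked):
        for den in ranked[i + 1:]:
            # Skip denominators with many zeros (10% is an assumed cutoff).
            if (df[den] == 0).mean() > 0.10:
                continue
            new[f"{num}_over_{den}"] = df[num] / df[den].replace(0, np.nan)
            if new.shape[1] == max_new:
                break
        if new.shape[1] == max_new:
            break
    # Drop one member of each highly correlated pair (Pearson r > 0.95).
    corr = new.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    return new.drop(columns=drop)
```

Only the upper triangle of the correlation matrix is inspected, so exactly one member of each redundant pair is dropped.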
- Employing hierarchical clustering: An unsupervised learning model, hierarchical clustering (HC), is then applied to the data. HC establishes the similarity among data points and organizes the data in a tree, or hierarchical, form. HC starts by treating each data point as a single cluster, then merges the closest clusters until one cluster, or K clusters, remain. There are several methods to measure the distance between clusters, known as linkage methods, such as complete or average linkage. In this work, Ward linkage is applied, which differs from other linkages by measuring the variance of clusters [21]: at each merge, it selects the pair of clusters whose union yields the smallest increase in total within-cluster variance. In this step, all features in the dataset, together with the newly derived features from the previous step, are used for hierarchical clustering. Before applying HC, the data is normalized using min–max normalization, which adjusts feature values to a common range, ensuring that all features contribute uniformly to the distance calculations used in machine learning and data analysis [20,22]. This step helps preserve the relationships among the original feature values [23]. Min–max normalization is calculated as follows:

$x' = \frac{x - min_X}{max_X - min_X}(new\_max_X - new\_min_X) + new\_min_X$ (3)

where $x'$ is the normalized value of $x$, $min_X$ and $max_X$ are the minimum and maximum values of feature X, and $new\_min_X$ and $new\_max_X$ are the minimum and maximum values of the new range of feature X, which is [0, 1]. The similarity between two data points d and y is calculated using Euclidean distance, one of the most popular methods [24], as shown in the following equation:

$dist(d, y) = \sqrt{\sum_{i=1}^{n} (d_i - y_i)^2}$ (4)

where $d_i$ and $y_i$ represent the i-th feature values of data points d and y, respectively.
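The clustering step can be illustrated with scikit-learn’s AgglomerativeClustering, whose Ward linkage uses Euclidean distance; this is a sketch under those assumptions, not the author’s implementation, and the function name is illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import MinMaxScaler

def label_students(X, n_clusters=2):
    """Min-max scale all (original + derived) features to [0, 1], then
    cluster with Ward linkage, which implies Euclidean distance."""
    Xn = MinMaxScaler().fit_transform(X)
    hc = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    return hc.fit_predict(Xn)  # cluster ids serve as synthetic labels
```

Scaling before clustering matters here: without it, features with large ranges would dominate the Euclidean distances that Ward linkage minimizes over.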
- The algorithmic form of the proposed EHCFE approach is presented in Algorithm 1, which accepts unlabeled students’ performance data as input and generates labeled students’ performance data as output. The computational cost of EHCFE is influenced by both the number of records and the number of features in the dataset. The method includes feature selection and feature engineering steps that introduce additional processing before hierarchical clustering is applied. These steps increase the workload compared to using hierarchical clustering alone, which does not require this preprocessing. This effect is also reflected in the execution time results reported in Section 5.2.
| Algorithm 1 EHCFE: Enhanced Hierarchical Clustering using Feature Engineering Algorithm. |
|---|
| **Input:** unlabeled students’ performance data D |
| **Output:** labeled students’ performance data |
| 1: Normalize D using z-score normalization ▹ Use normalization Equation (1) |
| 2: Generate synthetic labels for D with K-means clustering |
| 3: Rank features by variance |
| 4: Rank features by RF mean decrease in accuracy and Gini index ▹ Find average rank for features |
| 5: Combine both ranks and select the top t features |
| 6: Derive up to five ratio-based features from the top t features ▹ Use simple ratio Equation (2) |
| 7: Remove derived features with pairwise correlation above 0.95 |
| 8: Normalize all original and derived features using min–max normalization ▹ Use normalization Equation (3) |
| 9: Apply hierarchical clustering with Ward linkage and Euclidean distance |
| 10: Return the cluster assignments as performance labels |
4. Experiment Setting
4.1. Datasets
- Joint Entrance Examination (JEE) dataset: This is a synthetic dataset that simulates academic and behavioral aspects of students to determine whether they will drop out after class 12 [25]. The dataset consists of 15 attributes, a mix of numeric and categorical variables, such as JEE scores (numeric) or family income (categorical, ranging from low to high). The target variable describes the student’s state, with a value of 1 indicating that the student dropped out after class 12 and 0 indicating that the student is continuing. The distribution of the data is presented in Figure 2a, with the majority of students continuing after class 12.
- Academic success and dropout (ASD) dataset: This dataset was collected from different sources and consists of 35 numeric attributes describing different properties of students, such as academic engagement, demographic, and academic data [2,26,27]. It was collected for the academic years 2008/2009 to 2018/2019 and covers different majors, such as nursing and technology. The dataset consists of 4424 records, with target values “dropout”, “enrolled”, and “graduate”. In this paper, only the “dropout” and “graduate” class labels are considered, as enrolled students’ statuses can later become either “dropout” or “graduate”. The distribution of the data is presented in Figure 2b, with the majority of students having successfully completed their studies and graduated.
- The Engineering and Educational Sciences (EES) dataset: This dataset was collected in 2019 from the Faculty of Engineering and the Faculty of Educational Sciences [28,29]. It comprises 145 records with 33 features, covering various personal, educational, and family information about the students. The target value is a grade, so it has been modified into a binary task by retaining the class “fail” and categorizing the remaining grades as “pass”. The distribution of the data is presented in Figure 2c, with the majority of students having successfully completed and passed their courses.
4.2. Evaluation Metrics
4.2.1. Ground Truth Dependent Metrics
- F1 score: This measures the agreement between the clustering algorithm’s result labels and the ground truth as the harmonic mean of recall and precision. Precision measures the proportion of correctly predicted dropout students among all students predicted to have dropout status. Recall measures the proportion of correctly predicted dropout students among all actual dropouts; higher values are better. Precision, recall, and the F1 score are calculated as follows:

$Precision = \frac{TP}{TP + FP}$, $Recall = \frac{TP}{TP + FN}$, $F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$

where TP is the number of students who have actual dropout status and were predicted as dropout. FP is the number of students who were predicted as dropout but actually have successful status, while FN is the number of students who are actual dropouts but were predicted as successful.
- Area under the receiver operating characteristic (ROC) curve (AUC): The ROC is a probability curve that shows the trade-off between recall (on the y-axis) and the false positive rate (FPR, on the x-axis). AUC measures the area under this curve, ranging over [0, 1], where a higher value is better. FPR measures the proportion of students who were incorrectly predicted as having dropout status when their actual status is successful.
- Adjusted Rand index (ARI): This adjusts the Rand index (RI) for chance agreement [31]. RI calculates the proportion of pairwise decisions on which the clustering agrees with the ground truth, i.e., it measures the correct decisions made by the clustering method compared to the ground truth, and ARI ignores differences in label names between the predicted and actual labels. The range of values is [−1, 1], where values at or below 0 indicate random or disagreeing labelings and 1 indicates perfect agreement. RI and ARI are calculated as follows:

$RI = \frac{TP + TN}{TP + TN + FP + FN}$, $ARI = \frac{RI - ERI}{max(RI) - ERI}$

where TN is the number of students who have successful status and were predicted as successful, ERI is the expected value of RI under a random clustering, and max(RI) is the maximum value of RI, which is usually 1.
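The three ground-truth-dependent metrics can be computed with scikit-learn as sketched below. Flipping arbitrary cluster ids before scoring is an illustrative way to handle the label-naming issue that ARI ignores but F1 and AUC do not; the function name is an assumption, not the paper’s code.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, f1_score, roc_auc_score

def external_scores(y_true, y_pred):
    """Score binary cluster labels against ground truth. Cluster ids are
    arbitrary, so flip them if the flipped assignment matches better."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if f1_score(y_true, 1 - y_pred) > f1_score(y_true, y_pred):
        y_pred = 1 - y_pred
    return {"F1": f1_score(y_true, y_pred),
            "AUC": roc_auc_score(y_true, y_pred),
            "ARI": adjusted_rand_score(y_true, y_pred)}
```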
4.2.2. Ground Truth Independent Metrics
- The Calinski–Harabasz index (CHI) [33] (higher is better) measures the ratio of between-cluster separation to within-cluster compactness, where a higher value indicates better-defined and well-separated clusters.
- The silhouette coefficient (SC) [34] (higher is better) measures how well data points fit their own cluster (compactness) compared to other clusters (separation) by taking the pairwise difference of within-cluster and between-cluster distances. Its range is [−1, 1], where −1 indicates data points that do not match their own cluster, and 1 indicates data points that are well matched to their own cluster.
- The Davies–Bouldin index (DBI) [35] (lower is better) computes, for each cluster, its similarity to every other cluster and keeps the highest of these values. The DBI is then the average of these per-cluster maxima, where a lower value indicates better-defined and well-separated clusters.
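A brief sketch of the ground-truth-independent metrics using scikit-learn; the wrapper name is illustrative.

```python
import numpy as np
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

def internal_scores(X, labels):
    """Cluster quality without ground truth: CHI and SC higher is better,
    DBI lower is better."""
    return {"CHI": calinski_harabasz_score(X, labels),
            "SC": silhouette_score(X, labels),
            "DBI": davies_bouldin_score(X, labels)}
```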
5. Results and Discussion
5.1. Evaluation of the Proposed Model
5.2. Discussion
6. Ablation Study
6.1. Selection of the Top t Important Features
6.2. Feature Engineering: Features Derived
7. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kehm, B.M.; Larsen, M.R.; Sommersel, H.B. Student dropout from universities in Europe: A review of empirical literature. Hung. Educ. Res. J. 2019, 9, 147–164. [Google Scholar] [CrossRef]
- Realinho, V.; Machado, J.; Baptista, L.; Martins, M.V. Predicting Student Dropout and Academic Success. Data 2022, 7, 146. [Google Scholar] [CrossRef]
- Quinn, J. Drop-out and completion in higher education in Europe among students from under-represented groups. In An Independent Report Authored for the European Commission; European Commission: Brussels, Belgium, 2013. [Google Scholar]
- Ahuja, R.; Jha, A.; Maurya, R.; Srivastava, R. Analysis of Educational Data Mining. In Proceedings of the Harmony Search and Nature Inspired Optimization Algorithms; Yadav, N., Yadav, A., Bansal, J.C., Deep, K., Kim, J.H., Eds.; Springer: Singapore, 2019; pp. 897–907. [Google Scholar]
- Llanos, J.; Bucheli, V.A.; Restrepo-Calle, F. Early prediction of student performance in CS1 programming courses. PeerJ Comput. Sci. 2023, 9, e1655. [Google Scholar] [CrossRef]
- Al-Ahmad, B.I.; Alzaqebah, A.; Alkhawaldeh, R.; Al-Zoubi, A.; Lo, H.; Ali, A. Predicting academic performance for students’ university: Case study from Saint Cloud State University. PeerJ Comput. Sci. 2025, 11, e3087. [Google Scholar] [CrossRef] [PubMed]
- Baniata, L.H.; Kang, S.; Alsharaiah, M.A.; Baniata, M.H. Advanced deep learning model for predicting the academic performances of students in educational institutions. Appl. Sci. 2024, 14, 1963. [Google Scholar] [CrossRef]
- Shou, Z.; Xie, M.; Mo, J.; Zhang, H. Predicting student performance in online learning: A multidimensional time-series data analysis approach. Appl. Sci. 2024, 14, 2522. [Google Scholar] [CrossRef]
- Heaton, J. An empirical analysis of feature engineering for predictive modeling. In Proceedings of the SoutheastCon 2016; IEEE: New York, NY, USA, 2016; pp. 1–6. [Google Scholar] [CrossRef]
- Alameri, F. Predicting Student Dropout Risk using Machine Learning; Rochester Institute of Technology: Rochester, NY, USA, 2025. [Google Scholar]
- Mohamed Nafuri, A.F.; Sani, N.S.; Zainudin, N.F.A.; Rahman, A.H.A.; Aliff, M. Clustering Analysis for Classifying Student Academic Performance in Higher Education. Appl. Sci. 2022, 12, 9467. [Google Scholar] [CrossRef]
- Pecuchova, J.; Drlik, M. Enhancing the Early Student Dropout Prediction Model Through Clustering Analysis of Students’ Digital Traces. IEEE Access 2024, 12, 159336–159367. [Google Scholar] [CrossRef]
- Oeda, S.; Hashimoto, G. Log-Data Clustering Analysis for Dropout Prediction in Beginner Programming Classes. Procedia Comput. Sci. 2017, 112, 614–621. [Google Scholar] [CrossRef]
- Palani, K.; Stynes, P.; Pathak, P. Clustering Techniques to Identify Low-engagement Student Levels. In Proceedings of the 13th International Conference on Computer Supported Education—Volume 2: CSEDU; INSTICC; SciTePress: Setúbal, Portugal, 2021; pp. 248–257. [Google Scholar] [CrossRef]
- Valles-Coral, M.A.; Salazar-Ramírez, L.; Injante, R.; Hernandez-Torres, E.A.; Juárez-Díaz, J.; Navarro-Cabrera, J.R.; Pinedo, L.; Vidaurre-Rojas, P. Density-Based Unsupervised Learning Algorithm to Categorize College Students into Dropout Risk Levels. Data 2022, 7, 165. [Google Scholar] [CrossRef]
- Ghosh, P.; Charit, A.; Banerjee, H.; Bandhu, D.; Ghosh, A.; Pal, A.; Goto, T.; Sen, S. DropWrap: A Neural Network Based Automated Model for Managing Student Dropout. Int. J. Networked Distrib. Comput. 2025, 13, 17. [Google Scholar] [CrossRef]
- UDISE+ Data Dashboard Report. 2024. Available online: https://dashboard.udiseplus.gov.in/#/reportDashboard/sReport (accessed on 10 August 2024).
- Kim, S.; Choi, E.; Jun, Y.K.; Lee, S. Student Dropout Prediction for University with High Precision and Recall. Appl. Sci. 2023, 13, 6275. [Google Scholar] [CrossRef]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Kim, Y.S.; Kim, M.K.; Fu, N.; Liu, J.; Wang, J.; Srebric, J. Investigating the impact of data normalization methods on predicting electricity consumption in a building using different artificial neural network models. Sustain. Cities Soc. 2025, 118, 105570. [Google Scholar] [CrossRef]
- Murtagh, F.; Legendre, P. Ward’s hierarchical agglomerative clustering method: Which algorithms implement Ward’s criterion? J. Classif. 2014, 31, 274–295. [Google Scholar] [CrossRef]
- Ali, P.J.M. Investigating the Impact of min-max data normalization on the regression performance of K-nearest neighbor with different similarity measurements. ARO-Sci. J. Koya Univ. 2022, 10, 85–91. [Google Scholar]
- Han, J.; Kamber, M.; Pei, J. 3—Data Preprocessing. In Data Mining, 3rd ed.; Han, J., Kamber, M., Pei, J., Eds.; The Morgan Kaufmann Series in Data Management Systems; Morgan Kaufmann: Boston, MA, USA, 2012; pp. 83–124. [Google Scholar] [CrossRef]
- Han, J.; Kamber, M.; Pei, J. 2—Getting to Know Your Data. In Data Mining: Concepts and Techniques, 3rd ed.; Han, J., Kamber, M., Pei, J., Eds.; The Morgan Kaufmann Series in Data Management Systems; Morgan Kaufmann: Boston, MA, USA, 2012; pp. 39–82. [Google Scholar] [CrossRef]
- Nath, J. Simulated Dataset: JEE Dropout After Class 12. 2023. Available online: https://www.kaggle.com/datasets/jayaantanaath/simulated-dataset-jee-dropout-after-class-12/data (accessed on 21 September 2025).
- Predict Students’ Dropout and Academic Success. 2023. Available online: https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention (accessed on 21 September 2025).
- Realinho, V.; Machado, J.; Baptista, L.; Martins, M.V. Predict Students’ Dropout and Academic Success. 2021. Available online: https://zenodo.org/records/5777340 (accessed on 21 September 2025).
- Yılmaz, N.; Sekeroglu, B. Student Performance Classification Using Artificial Intelligence Techniques. In Proceedings of the 10th International Conference on Theory and Application of Soft Computing, Computing with Words and Perceptions—ICSCCW-2019; Aliev, R.A., Kacprzyk, J., Pedrycz, W., Jamshidi, M., Babanli, M.B., Sadikoglu, F.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 596–603. [Google Scholar]
- Yilmaz, N.; Şekeroğlu, B. Higher Education Students Performance Evaluation; UCI Machine Learning Repository: Irvine, CA, USA, 2019. [Google Scholar] [CrossRef]
- Hafzan, M.Y.N.N.; Safaai, D.; Asiah, M.; Saberi, M.M.; Syuhaida, S.S. Review on Predictive Modelling Techniques for Identifying Students at Risk in University Environment. MATEC Web Conf. 2019, 255, 03002. [Google Scholar] [CrossRef]
- Hubert, L.J.; Arabie, P. Comparing partitions. J. Classif. 1985, 2, 193–218. [Google Scholar] [CrossRef]
- Liu, Y.; Li, Z.; Xiong, H.; Gao, X.; Wu, J. Understanding of Internal Clustering Validation Measures. In Proceedings of the 2010 IEEE International Conference on Data Mining; IEEE: New York, NY, USA, 2010; pp. 911–916. [Google Scholar] [CrossRef]
- Caliński, T.; Harabasz, J. A dendrite method for cluster analysis. Commun. Stat.-Theory Methods 1974, 3, 1–27. [Google Scholar] [CrossRef]
- Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef]
- Davies, D.L.; Bouldin, D.W. A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, PAMI-1, 224–227. [Google Scholar] [CrossRef]
- Kaufman, L.; Rousseeuw, P.J. Partitioning Around Medoids (Program PAM). In Finding Groups in Data; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 1990; Volume Chapter 2, pp. 68–125. [Google Scholar] [CrossRef]
- Neave, H.R.; Worthington, P.L. Distribution-Free Tests; Routledge: London, UK, 1992. [Google Scholar]
- Brazdil, P.B.; Soares, C. A Comparison of Ranking Methods for Classification Algorithm Selection. In Proceedings of the Machine Learning: ECML 2000; López de Mántaras, R., Plaza, E., Eds.; Springer: Berlin/Heidelberg, Germany, 2000; pp. 63–75. [Google Scholar]
- McInnes, L.; Healy, J.; Melville, J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426. [Google Scholar]







| Datasets | Features | New Features | Total Features |
|---|---|---|---|
| JEE | 14 | 5 | 19 |
| ASD | 34 | 2 | 36 |
| EES | 32 | 3 | 35 |
| Metric | K-Means-All | PAM-All | HC-Avg-All | EHCFE (The Proposed Model) |
|---|---|---|---|---|
| ARI | 0.0003 (2.5) | 0.02 (1) | −0.015 (4) | 0.0003 (2.5) |
| AUC | 0.508 (1.5) | 0.382 (3) | 0.313 (4) | 0.508 (1.5) |
| F1 score | 0.298 (1.5) | 0.181 (3) | 0.052 (4) | 0.298 (1.5) |
| DBI | 2.536 (1) | 2.957 (4) | 2.899 (3) | 2.572 (2) |
| SC | 0.134 (1) | 0.103 (3) | 0.1 (4) | 0.13 (2) |
| CHI | 776.377 (1) | 571.64 (3) | 545.72 (4) | 754.659 (2) |
| Avg rank | 1.42 | 2.83 | 3.83 | 1.92 |
| Metric | K-Means-All | PAM-All | HC-Avg-Subset | EHCFE (The Proposed Model) |
|---|---|---|---|---|
| ARI | 0.019 (3) | 0.02 (2) | −0.001 (4) | 0.333 (1) |
| AUC | 0.357 (4) | 0.564 (2) | 0.499 (3) | 0.766 (1) |
| F1 score | 0.111 (3) | 0.492 (2) | 0.001 (4) | 0.709 (1) |
| DBI | 2.471 (2) | 2.552 (3) | 1.258 (1) | 2.586 (4) |
| SC | 0.109 (4) | 0.141 (3) | 0.401 (1) | 0.159 (2) |
| CHI | 428.679 (3) | 555.536 (1) | 21.112 (4) | 505.395 (2) |
| Avg rank | 3.17 | 2.17 | 2.83 | 1.83 |
| Metric | K-Means-All | PAM-All | HC-Avg-All | EHCFE (The Proposed Model) |
|---|---|---|---|---|
| ARI | 0.074 (4) | 0.095 (3) | 0.201 (1) | 0.131 (2) |
| AUC | 0.773 (2) | 0.696 (3) | 0.562 (4) | 0.81 (1) |
| F1 score | 0.233 (2) | 0.222 (3.5) | 0.222 (3.5) | 0.28 (1) |
| DBI | 2.813 (4) | 2.636 (3) | 0.674 (1) | 2.635 (2) |
| SC | 0.114 (4) | 0.119 (3) | 0.205 (1) | 0.13 (2) |
| CHI | 17.023 (2) | 15.81 (3) | 2.169 (4) | 17.416 (1) |
| Avg rank | 3 | 3.08 | 2.42 | 1.5 |
| Models | ARI | AUC | F1 Score | DBI | SC | CHI |
|---|---|---|---|---|---|---|
| HC-ward-5F | 0.337 | 0.747 | 0.668 | 0.636 | 0.688 | 7937.597 |
| HC-ward-10F | 0.337 | 0.747 | 0.668 | 1.271 | 0.378 | 1684.447 |
| EHCFE (the proposed model) | 0.333 | 0.766 | 0.709 | 2.586 | 0.159 | 505.395 |
| Models | ARI | AUC | F1 Score | DBI | SC | CHI |
|---|---|---|---|---|---|---|
| HC-ward-All | 0.309 | 0.767 | 0.717 | 3.256 | 0.118 | 338.614 |
| EHCFE (the proposed model) | 0.333 | 0.766 | 0.709 | 2.586 | 0.159 | 505.395 |
| Models | ARI | AUC | F1 Score | DBI | SC | CHI |
|---|---|---|---|---|---|---|
| HC-ward-All | 0.107 | 0.703 | 0.233 | 2.478 | 0.13 | 17.22 |
| EHCFE (the proposed model) | 0.131 | 0.81 | 0.28 | 2.635 | 0.13 | 17.416 |
Share and Cite
Alghanmi, N. EHCFE: Enhanced Hierarchical Clustering with Feature Engineering for Automating Labeling of Student Performance and Dropout Prediction. Electronics 2026, 15, 1265. https://doi.org/10.3390/electronics15061265
