Predicting the Impact of Academic Key Factors and Spatial Behaviors on Students’ Performance

Abstract: Quality education is necessary as it provides the basis for equality in society. It is also important that educational institutes focus on tracking and improving the academic performance of each student. Thus, it is important to identify the key factors (i.e., diverse backgrounds, behaviors, etc.) that help students perform well. However, the increasing number of students makes this challenging, and high dropout rates have a negative impact on institutional credibility and resources. Researchers have applied a variety of statistical and machine learning techniques to predict student performance without giving much importance to spatial and behavioral factors. Therefore, there is a need to develop a method that considers the weighted key factors that affect student performance. To achieve this, we first surveyed experts’ opinions to select weighted key factors using the Fuzzy Delphi Method (FDM). Secondly, a geospatial-based machine learning technique was developed which integrated the relationship between students’ location-based features, semester-wise behavioral features, and academic features. Three different experiments were conducted to predict student performance and to compare the approaches. The experimental results reveal that Long Short-Term Memory (LSTM) achieved the highest accuracy, 90.9%, compared with other machine learning methods, namely Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB), Multilayer Perceptron (MLP), and Decision Tree (DT). A scientific analysis technique (the Fuzzy Delphi Method (FDM)) and a machine learning feature engineering technique (Variance Threshold (VT)) were used in two different experiments for selecting features, where the scientific analysis technique achieved better accuracy.
The main finding of this research is that, along with past performance and social status, semester behavior factors have a substantial impact on students’ performance. We performed spatial statistical analysis on our dataset in the context of Pakistan, which identified the spatial areas of students’ performance; the results are described in the data analysis section.


Introduction
Quality education provides the basis for equality in society. One of the most basic public services is high-quality education [1]. Quality of education is vital for every citizen [2]. For this, educational institutes should focus on improving the academic performance of every student individually. To achieve overall academic success, students need to perform well in all courses [3]. It is quite difficult for educators to keep track of their students' academic performance and improve it in each course [4]. Because they cannot manage individual course-wise records manually, they are unable to improve students' performance when required or to meet the demands of students with different attitudes [5]. Thus, an automated system is required that provides detailed information on students' progression and takes as input exam results, assignments, and performance in other course activities [6]. Researchers are working on a variety of statistical and machine learning models for predicting the impact of academic factors on different students [7].
Educational Data Mining (EDM) is a field that has been used to analyze academic data [8,9]. EDM has various applications during the development of a system, as it uses different computational methods to detect patterns in large-scale data. The prediction of student results in terms of academic performance, ratings, or grades is a well-known application of Educational Data Mining [10,11,12,13]. Predictive modeling techniques have been considered for students' academic performance [14,15]; in this regard, classification techniques have proven to be the most effective for this problem.
Data mining can be used in different fields to improve overall efficiency by using pattern analysis [16]. It is possible only by extraction of valuable information [17] from a stored dataset that is undiscovered until now. This extracted useful information will be used later, as it will help in resolving the issues that were previously faced during the development of the structural model. Today, data mining and its applications in the education sector have gained importance more than before. Thus, we can define Educational Data Mining as 'the process to transform raw data from the educational system into useful information that can be used later by the stakeholders for further applications' [18]. In the end, it will assist the educational institutes to review, improve, and strengthen students' learning process. It is obvious that to enhance the environment of any educational institute, the most important thing is to understand the learning process of students. The sheer understanding of this process has several advantages like optimization of learning outcomes for students and making the system strong enough to support weak students [19]. As a result, the rate of students failing their courses and dropping out will be decreased [20].
In the literature on geography education, spatial thinking is closely tied to spatial skills, aptitudes, and ideas [21,22]. Choosing the best route to commute to work or school is only one example of how spatial thinking is used on a daily basis. Geographical Information Systems (GIS), in particular, can improve spatial thinking because they make it possible to analyze geospatial data and find hidden patterns inside the data [23,24]. GIS is a system that is used for the management, storage, and analysis of geospatial data. Many GIS-based applications are readily available online and can be used by anyone to process data with respect to its spatial features [25].
Geospatial data is comprised of both location and characteristics of spatial features. To define a lane, for example, a reference is made to its position (i.e., where it is) and its attributes (e.g., length, name, speed limit, and direction). A GIS enables the user to handle road data and many other geospatial data, thereby separating them from non-spatial data business management systems. In addition to geospatial data, a GIS contains hardware, software, people, and organization. GIS hardware includes computers, printers, plotters, digitizers, scanners, Global Positioning System (GPS), and mobile devices. GIS software, either commercial or open-source, contains programs and applications for data management, data interpretation, data display, and other tasks to be performed by a computer [26].
Student dropout is a very important issue in higher education. If the dropout rate is high, it wastes the resources of the institution and also affects its credibility whenever an institutional evaluation is performed [27]. Consequently, there is a pressing need for a model that estimates students' final results from their previous records in order to reduce the dropout rate. This will also enhance the quality of education. For this, the faculty members, administration, and educational system of the institutions should take responsibility for designing better outlines of learning and establishing useful systems that enhance learning opportunities for the students [28].
Hence, in this paper, firstly, we identified the student performance risk factors and semester behavioral factors from the literature in order to predict their performance.
Secondly, we conducted three experiments to meet the objectives. In the first experiment, we predicted student performance using a scientific analysis technique, the Fuzzy Delphi Method (FDM), for screening and shortlisting the student performance key factors. In the second experiment, we incorporated all identified risk factors for predicting student academic performance. Finally, in the third experiment, we used a machine learning feature engineering technique, the Variance Threshold (VT), for predicting student academic performance. This paper has three objectives: first, to find and use spatial locational factors and semester behavioral factors for predicting student academic performance; second, to identify the key factors that have the most impact on student performance; and third, to analyze the spatial data through spatial statistical analysis.

Related Work
The emergence of Educational Data Mining (EDM), a recent discipline that has been used for over a decade in the development, study, and application of computerized methods for pattern detection, has helped greatly in the analysis of vast educational data [29] that would otherwise be difficult to analyze because of its large volume. The prediction of student results, where the aim is to estimate unknown values such as performance, knowledge, ratings, or grades, is considered one of the most established and well-known applications of EDM [30]. Building a predictive model of student success from historical student data is a highly recommended technique for investigating students' problems [31,32].
To increase the overall efficiency of a system, data mining (DM) can be applied in different fields. This can be achieved by extracting valuable and specific knowledge previously undiscovered from a stored data set [33]. In this way, the information learned will help solve several challenges and develop the current structure [34].
The use of DM in education is of increasing importance. In fact, conventional DM techniques can be applied to the educational data of college learners. EDM is defined as the process [18] used to transform raw data collected by education systems into useful information that teachers can use to take corrective action and answer research questions. Thus, EDM assists education centers in reviewing and strengthening students' learning processes. In enhancing an institution's educational environment, understanding how students learn plays a major part. Such awareness results in many advantages, such as optimizing learning outcomes for students and the ability to plan support for weak students. The number of students dropping out or failing classes would decrease as a consequence [35].
Estimating student performance is not an easy task, and it is important for both students and teachers to be aware of student performance. Early estimation is helpful for students and teachers. Teachers can play an important role by making students aware of the risk of dropping out of a course or subject at university. Teachers can also help students who need extra support [35].
The student dropout rate, which is one of the significant problems in higher education, affects the resources of the university and eventually affects the institutional evaluation process [27]. It is necessary to propose an evaluation model for the estimation of students' results. This will support the academic quality process and reduce dropout rates. We should give priority to education and communication in our societies. It is the responsibility of teachers and all education systems and their administrators to develop better outlines of learning and establish systems to expand learning opportunities [28].
We need to identify weak students in a class through performance prediction, using different techniques, in order to give them proper attention and prevent them from dropping out of their studies [36]. Therefore, to reduce the student dropout rate, early warning systems need to be developed [37]. Because the information systems of educational organizations are often incomplete and faulty, student behavioral characteristics are used for student performance prediction [38].
Locational features also have some impact on students' performance. In this regard, the geographic location of public schools has been considered [39], and the study concluded that the geographic location of public schools does affect students' academic performance. Another study concluded that the rural geographical location of the teaching site was associated with high student satisfaction [40]. Yet another study on location concluded that school resources vary across geographical locations, and that small rural communities have the lowest socioeconomic profile [41], lower student academic performance, and shortages of education staff and industrial material, while schools in the neighborhood of towns have more resources, greater availability of teaching staff, and higher student satisfaction [42]. Another study [43] concluded that students in remote areas do not perform as well as their country counterparts. A study [44] further concluded that students who study in city schools achieve significantly higher marks than those in other geographic areas.
Student dropout also causes financial loss for both students and their education sectors. It affects graduation rates and lowers employment opportunities in highly qualified positions. If an institution loses a student, the retention rate of the university decreases. Education for Sustainable Development is an important factor in making societies better; higher education enables any society to produce future professionals and leaders [28]. That is why predicting students' performance is a significant study area, as it can make students aware of their expected results before final exams. This prediction alerts weaker students that they have to put in more effort than before in order to achieve better results than predicted. From an institutional perspective, prediction techniques identify these at-risk students, so that their teachers can give them full attention, ease their studies, and keep them from dropping out. With the help of these predictions, we can build early warning systems to decrease student dropout rates [37].

Student Performance Factors
Comprehensive evidence through literature was gathered. Researchers have been using numerous machine learning techniques for predicting the performance of academic students. In this regard, 33 studies have been identified from 2015 to 2022. The aim of collecting these studies was to find the key factors that have mostly been used or have been important to be used for predicting the academic performance of students.
Primarily, the literature review sorted out the student risk factors that research articles have identified for the prediction of student academic performance. Secondly, the review focused on including the key factors that have been highlighted by multiple educational articles, surveys, and systematic literature review articles. Finally, the literature was reviewed to find factors that have not been used in research articles for students' performance prediction. These features are listed in the "New Features" column of the Taxonomy of Systematic Literature Review in Figure 1.

Students' Semester Behavioral Feature
Multiple semester behavioral features of students have been considered in this research. These factors are listed in Table 1 (Student performance factors and FDM output). A Google Form was designed covering the 24 semester behavioral features of students. At the end of each semester, students were asked to fill out this form. A total of 200 students took part in this survey, covering their first four semesters. Later, a survey of experts was conducted to determine the importance of these features. This process was carried out using the Fuzzy Delphi Method [45]. The features that experts rated as less important were discarded.

Fuzzy Delphi Method
A systematic review of the student performance factors revealed that there are many key factors. As it is impractical to consider all input factors for the student performance prediction process, a proper screening approach must be utilized to identify the significant factors among a larger set of inputs. Delphi is an extensively used screening method that seeks the most critical or influential elements of a phenomenon under consideration. However, the traditional Delphi method has certain demerits, including low uniformity of fine judgments, increased computational cost, and the need to revise expert judgments to attain uniform overall judgments [46]. The Fuzzy Delphi Method [47] addresses these demerits and has gained significant popularity since its inception. This research utilizes the Fuzzy Delphi Method to screen out the most significant input factors for student academic performance prediction.
The Fuzzy Delphi Method (FDM) was employed in the screening and shortlisting of input features for student performance prediction. The threshold value was determined through this process, which seeks expert judgments regarding the significance of the evaluation factors in the list [48]. A questionnaire survey was used as the main research tool for data collection in this phase. The questionnaire comprised two sections. The first section collected the basic information and demographics of the experts. The second section collected the opinions of the experts regarding the significance of each input factor for student performance prediction. Because of the large number of factors, the second section was divided into five sub-sections guided by the underlying perspectives. A Likert scale of 1-5 was used to evaluate the factors, where a higher point indicates higher significance of a factor. The research population comprised educationists from the educational domain. Following the guidelines and requirements of Fuzzy Delphi screening, questionnaires were distributed through Google Forms. A comprehensive list of 44 factors concerning student performance was formulated from existing literature, technical papers, and blogs. Expert responses for these factors were then collected from 42 educational professionals.
The fuzzy Delphi screening or evaluation phase consists of three constituent steps including: (i) conversion of judgments into estimates, (ii) defuzzification using Graded Mean Integration Representation (GMIR), and (iii) screening of the critical factors according to the threshold value. The threshold can be subjectively set according to the mean or geometric mean of all evaluation factors [45]. By referring to the geometric mean of all 44 candidate evaluation factors, the threshold value was subjectively set to 3.37. Based on results from the Fuzzy Delphi screening process, a detailed and multidisciplinary grading system was developed. Out of 44 features, 29 features crossed the threshold mark of the grading system which was 3.37 out of 5, and were considered significant.
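The three screening steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the Likert-to-triangular-fuzzy-number mapping, the aggregation rule (minimum of the lower bounds, geometric mean of the middle values, maximum of the upper bounds), and the feature names are assumptions, since the text does not state them. The GMIR of a triangular fuzzy number (l, m, u) is taken as (l + 4m + u) / 6.

```python
import math

# Assumed mapping from a 1-5 Likert rating to a triangular fuzzy number
# (low, mid, high); the source does not state the scale it used.
LIKERT_TO_TFN = {
    1: (1.0, 1.0, 2.0),
    2: (1.0, 2.0, 3.0),
    3: (2.0, 3.0, 4.0),
    4: (3.0, 4.0, 5.0),
    5: (4.0, 5.0, 5.0),
}


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def aggregate(ratings):
    """Step (i): turn one factor's expert ratings into a single fuzzy estimate.

    Assumed rule: min of lows, geometric mean of mids, max of highs.
    """
    tfns = [LIKERT_TO_TFN[r] for r in ratings]
    low = min(t[0] for t in tfns)
    mid = geometric_mean([t[1] for t in tfns])
    high = max(t[2] for t in tfns)
    return low, mid, high


def gmir(tfn):
    """Step (ii): defuzzify with Graded Mean Integration Representation."""
    low, mid, high = tfn
    return (low + 4.0 * mid + high) / 6.0


def fuzzy_delphi_screen(expert_ratings, threshold=None):
    """Step (iii): keep factors whose crisp score reaches the threshold.

    If no threshold is given, the geometric mean of all scores is used.
    """
    scores = {name: gmir(aggregate(r)) for name, r in expert_ratings.items()}
    if threshold is None:
        threshold = geometric_mean(list(scores.values()))
    kept = [name for name, score in scores.items() if score >= threshold]
    return scores, threshold, kept


# Hypothetical ratings from four experts for three illustrative factors.
example_ratings = {
    "past_performance": [5, 5, 4, 5],
    "internet_at_home": [3, 4, 3, 3],
    "teaching_quality": [2, 1, 2, 2],
}
scores, threshold, kept = fuzzy_delphi_screen(example_ratings)
```

Passing `threshold=3.37` instead reproduces the fixed cut-off used in this study; with the illustrative ratings above, only the first factor would then survive.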
Certain features, such as Family Income and Attendance, were unavailable, which left the dataset with 27 features. Features such as teaching quality would have been collected from our university only, where they were considered to be the same for each student; the Fuzzy Delphi thresholding discarded this feature as well. As a result, the dataset was left with 26 features. Table 1 presents all input features that are incorporated in this research with their number of meanings, their feature description, and their FDM Score (feature threshold value). Thirty-seven features are listed in this table. Out of these 37 features, four are used twice because they have two meanings, as indicated in the (#) and description columns of Table 1. Three features from the list of educational factors in Figure 1 were dropped in the data preprocessing phase because they also existed in the behavioral factors list of Figure 1. These three features are Self Employed, Marital Status, and Hostel Factor.

Dataset
In this research, we used the dataset of the FAST National University of Computer and Emerging Sciences (NUCES) Chiniot-Faisalabad (CFD) campus. The dataset covers 200 students in the first four semesters of a bachelor's degree in Computer Science. The data features include academic features, locational features, and semester behavioral features, listed in Table 1. The dataset consists of 47 input features and 800 records. Three new features that were not incorporated by the Fuzzy Delphi process but are included in this dataset are Guardian (Father, Mother, or Other), attempted credit hours (Multiple Values), and earned credit hours (Multiple Values). In the data cleaning process, one record had to be removed because of missing information, so the dataset was left with 799 records for this research.

Data Analysis
The histogram in Figure 2 shows the semester-wise behavior features of students. From the histogram, we can see that the scores vary between semesters. Some of the factors that have higher FDM scores are highlighted in Figures 2 and 3. Behavior features include new students' urge to make new friends in their first semester, hosteller trends in the first four semesters, personal and home issues faced by the students, students having older siblings in the same field, and some other factors. The results showed that over 80% of students make new friends in their first semester, which illustrates the kind of useful information that can be extracted from students' semester behavioral records. A comparison between the daily and weekly study habits of students is also made. Older siblings in education, internet facility at home, and guidance from seniors are shown in the histogram. The results show that the semester 4 graph has high values during the period of COVID-19. The overall behavioral data of students has also been analyzed across different input parameters, as shown in Figure 4. Some important parameters have been visualized here for a better understanding of the students' behavioral data. The percentage of male students in this dataset is higher than that of female students. A very low number of students are self-employed during their studies. Students came from different backgrounds and locations. Some other parameters that are visualized in Figure 4 are SSC (secondary school certificate, equivalent to the 10th standard), HSSC (higher secondary school certificate, equivalent to the 12th standard), O-Level (a qualification in a specific subject, equivalent to the 10th standard), and A-Level (a qualification equivalent to the 12th standard).
Geo-mapping was then performed using the coordinate data of students. Geo-mapping is the act of translating locational coordinates into a geo map that is used to visualize the location of utilities quickly and correctly. It is a method for displaying data from various geo-cultural contexts or specific geographic areas. It is also used to map the spatial locations of real-world features and to illustrate the spatial relationships between them. For this study, the student data was visualized through geo-mapping, as shown in Figure 5 (Multivariate clustering of students data). Figure 5 shows students' locality within different regions of Pakistan. The zoomed map of the Punjab province of Pakistan also indicates that the majority of students come from metropolitan areas such as Faisalabad and Lahore. Multivariate clustering within the Punjab boundary shows the clusters based on the performance of students in the region of Punjab.

Proposed Methodology and Results
The proposed methodology includes three experiments for the prediction of student performance. The first experiment uses the Fuzzy Delphi Method output for the prediction of student performance. The second experiment is applied to the entire dataset, with all academic, locational, and behavioral features. The last experiment applies different feature selection techniques to the complete dataset and then predicts student performance from the selected features.

Methodology
In the methodology phase, the data passes through multiple processing steps. In the data preprocessing phase, we started with data cleaning. A record contains a student's data for one semester; one record was removed because some important features, including its semester information, were unavailable. Data binning was performed on some columns containing numeric values, such as the feature 'Past Performance'. Categorical columns with the values 'Yes' and 'No' were mapped directly to 1 and 0 during data normalization. In the last phase of data preprocessing, label encoding was performed on columns where the count of unique values is more than two. Our dataset has a multi-class target (Good-Performer, Avg-Performer, Bad-Performer). To balance the dataset, SMOTE (Synthetic Minority Oversampling Technique) is used. Before applying SMOTE, the dataset had 799 records and 47 features; after applying SMOTE, the number of records increased to 1428 with the same 47 features (Figure 6). We performed three experiments to predict the academic performance of students (Figure 7); for all experiments we performed the same preprocessing steps (data binning, label encoding, data normalization, and data synthesis using SMOTE). The aim of the first experiment (Exp-1) is to consider only those features that have higher importance according to experts and to obtain their consensus on the sustainability of the presented item in the questionnaire. In this experiment, we used the Fuzzy Delphi Method (FDM), a scientific analysis technique, to consolidate consensus agreement within the panel of experts. FDM was used to shortlist these 47 features. After taking consensus from 42 experts, we were left with 26 features for experiment one (Exp-1).
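The encoding steps described above (binning, Yes/No mapping, and label encoding) can be illustrated with a small standard-library sketch. The column names, bin edges, and records below are hypothetical, and treating every remaining string column as a label-encoded column is a simplifying assumption rather than the authors' exact rule.

```python
def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge is >= value.

    `labels` must have one more entry than `edges`; the last label covers
    everything above the final edge.
    """
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]


def label_encode(values):
    """Map each distinct value to an integer code, in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def preprocess(records, bin_spec):
    """Apply binning, Yes/No mapping, and label encoding column by column."""
    columns = {name: [r[name] for r in records] for name in records[0]}
    encoded = {}
    for name, values in columns.items():
        if name in bin_spec:
            edges, labels = bin_spec[name]
            values = [bin_value(v, edges, labels) for v in values]
        if set(values) <= {"Yes", "No"}:
            # Binary categorical column: map directly to 1 and 0.
            encoded[name] = [1 if v == "Yes" else 0 for v in values]
        elif all(isinstance(v, str) for v in values):
            # Other categorical column: integer label encoding.
            encoded[name], _ = label_encode(values)
        else:
            encoded[name] = values
    return encoded


# Hypothetical records and bin edges, for illustration only.
records = [
    {"past_performance": 85, "hostel": "Yes", "guardian": "Father"},
    {"past_performance": 62, "hostel": "No", "guardian": "Mother"},
    {"past_performance": 40, "hostel": "No", "guardian": "Other"},
]
bin_spec = {"past_performance": ([50, 75], [0, 1, 2])}
encoded = preprocess(records, bin_spec)
```

Using integer bin labels keeps the ordering of the binned values (low, medium, high), which a plain label encoder applied to string labels would not preserve.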
Later, after performing the preprocessing steps (data binning, label encoding, data normalization, and data synthesis using SMOTE), we had two datasets for experiment one (Exp-1). Without SMOTE, the dataset had 799 records and 66 features; with SMOTE, the dataset had 1428 records and 66 features. For the second experiment (Exp-2), the aim is to consider all the features that the literature has identified as important. In this research, we collected all 47 important features, and we used all of them in this experiment. After performing the preprocessing steps, we obtained two datasets for experiment two (Exp-2) as well. Without SMOTE, the dataset had 799 records and 116 features; with SMOTE, the dataset had 1428 records and 116 features.
Lastly, the third experiment (Exp-3) aims to consider the features that have higher importance according to machine learning feature selection techniques. In this experiment, we applied three machine learning feature engineering techniques: Select K-Best, Variance Threshold, and L1-based selection. After obtaining the results from these three feature engineering techniques in the pilot experiment, we found that the Variance Threshold technique produced much better accuracies than the other two techniques, as can be seen from Figure 8. As a result, we used the Variance Threshold method. Here again, we have two datasets in the third experiment (Exp-3) after performing the preprocessing steps. Without SMOTE, the dataset has 799 records and 43 features; with SMOTE, the dataset has 1428 records and 37 features. The number of features differs between the two because the Variance Threshold feature selection technique is applied separately to each data source (without-SMOTE data and with-SMOTE data). Figure 9 elaborates the dataset properties of all the experiments.
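To make the with-SMOTE datasets more concrete, the following is a minimal pure-Python sketch of the SMOTE idea: each synthetic record is an interpolation between a minority-class sample and one of its nearest same-class neighbours. It is not the implementation used in the experiments (which the text does not name); the function names, `k`, and the toy data are assumptions. Note that 1428 = 3 × 476, which is consistent with each of the three classes being oversampled to the size of the largest class, as the `balance` helper does here.

```python
import math
import random


def smote(samples, n_new, k=5, seed=0):
    """Create n_new synthetic points from one class's numeric samples.

    Needs at least two samples whenever n_new > 0.
    """
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # k nearest same-class neighbours of the chosen base sample.
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        new_rows = smote(rows, target - len(rows), k=k, seed=seed)
        X_out.extend(new_rows)
        y_out.extend([label] * len(new_rows))
    return X_out, y_out


# Toy data: class "a" has four records, class "b" only two.
X = [[0, 0], [1, 1], [0, 1], [1, 0], [5, 5], [6, 6]]
y = ["a", "a", "a", "a", "b", "b"]
X_bal, y_bal = balance(X, y)
```

Because synthetic rows are built only from training records, oversampling is normally applied after the train/test split so that no interpolated copy of a test record leaks into training.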
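Variance-threshold selection keeps only the columns whose variance exceeds a cut-off, on the reasoning that near-constant features carry little information. The sketch below is a standard-library illustration of that rule (similar in spirit to scikit-learn's `VarianceThreshold`); the threshold values and toy rows are assumptions, as the text does not report the cut-off used.

```python
def column_variances(rows):
    """Population variance of each column of a row-major numeric table."""
    n = len(rows)
    variances = []
    for col in zip(*rows):
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / n)
    return variances


def variance_threshold(rows, threshold=0.0):
    """Drop columns whose variance is not strictly above the threshold.

    Returns the reduced table and the indices of the kept columns.
    """
    variances = column_variances(rows)
    kept = [i for i, v in enumerate(variances) if v > threshold]
    reduced = [[row[i] for i in kept] for row in rows]
    return reduced, kept


# Toy table: column 0 is constant, column 1 is binary, column 2 is numeric.
rows = [
    [1, 0, 10],
    [1, 1, 20],
    [1, 0, 30],
    [1, 1, 40],
]
reduced, kept = variance_threshold(rows, threshold=0.0)
```

Since column variances are computed from the rows actually present, adding synthetic SMOTE rows changes them, which is why the without-SMOTE and with-SMOTE datasets can end up with different numbers of selected features.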
In this research, we used different machine learning and deep learning algorithms for the prediction of students' academic performance. These algorithms are Naïve Bayes (NB), Decision Tree (DT), Long Short-Term Memory (LSTM), Multi-Layer Perceptron (MLP), Random Forest (RF), and Support Vector Machine (SVM). Hyperparameter tuning using Grid Search CV is also applied in all these experiments, for both without-SMOTE and with-SMOTE data. Different evaluation metrics have been applied to evaluate the performance of the models: Accuracy, Precision, Recall, F1-Score, and the ROC (Receiver Operating Characteristic) curve.
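For a three-class target, precision, recall, and F1-score are computed per class and then averaged. The sketch below uses macro averaging (an unweighted mean over classes); this averaging choice is an assumption, since the text does not say which one was used, and the labels and predictions are made up for illustration.

```python
def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up labels for the three performance classes.
y_true = ["Good", "Good", "Avg", "Avg", "Bad", "Bad"]
y_pred = ["Good", "Avg", "Avg", "Avg", "Bad", "Good"]
report = evaluate(y_true, y_pred)
```

Macro averaging weights each class equally, so it is sensitive to poor performance on a minority class, which matters when comparing without-SMOTE and with-SMOTE results.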
Spatial statistical analysis techniques have also been applied to find the spatial behavior of the dataset. We used two methods: Multivariate Clustering and Average Nearest Neighbor.
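The Average Nearest Neighbor statistic compares the observed mean distance from each point to its nearest neighbour with the mean distance expected under a random pattern, 0.5 / sqrt(n / A), where n is the number of points and A the study area. A ratio below 1 suggests clustering and a ratio above 1 suggests dispersion. The sketch below assumes planar (projected) coordinates and a user-supplied study area; the point sets are hypothetical and the function is an illustration, not the GIS tool used for the analysis.

```python
import math


def average_nearest_neighbor(points, area):
    """Average Nearest Neighbor ratio and z-score for planar points."""
    n = len(points)
    # Distance from every point to its closest other point.
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)
    std_error = 0.26136 / math.sqrt(n * n / area)
    return {
        "observed": observed,
        "expected": expected,
        "ratio": observed / expected,
        "z_score": (observed - expected) / std_error,
    }


# Hypothetical point sets inside a study area of 4 square units.
dispersed = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
clustered = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (0.01, 0.01)]
ann_dispersed = average_nearest_neighbor(dispersed, area=4.0)
ann_clustered = average_nearest_neighbor(clustered, area=4.0)
```

The result is sensitive to the study area value, so the same area definition should be used when comparing patterns.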

Results
We carried out three experiments to test our methodology and to compare the performance of different proposed experiments with state-of-the-art machine learning and deep learning classification algorithms.

Experiment 1 (Exp-1)
Experiment 1 is performed on the Fuzzy Delphi output: the input features shortlisted by FDM are used for student academic performance prediction. The experiment is performed both without SMOTE and with SMOTE, which is used for data balancing. In this experiment, the Support Vector Machine (SVM) achieved the best accuracy compared to Decision Tree (DT), Long Short-Term Memory (LSTM), Multi-Layer Perceptron (MLP), Naïve Bayes (NB), and Random Forest (RF). With SMOTE, SVM obtains the highest accuracy, 89.5%, compared with all other models, especially Random Forest and Long Short-Term Memory, as shown in Figure 10. Regarding precision, recall, and F1-score, SVM again performs well with SMOTE, and the Multi-Layer Perceptron also obtains good results with SMOTE compared with the other models, especially Random Forest (RF) and the deep learning model LSTM (Table 2). Hyperparameter tuning using Grid Search CV was applied to both data sources (without-SMOTE data and with-SMOTE data), and the best hyperparameters retrieved for each machine learning algorithm on both data sources are listed in Table 3.

Experiment 2 (Exp-2)
Experiment 2 is performed on the full dataset, which has 799 records and 47 features. The experiment is performed both without SMOTE and with SMOTE. Without SMOTE, Random Forest (RF) obtained higher accuracy than Decision Tree (DT), Long Short-Term Memory (LSTM), Multi-Layer Perceptron (MLP), Naïve Bayes (NB), and Support Vector Machine (SVM). With SMOTE, LSTM achieved the highest accuracy, 90.9%, compared with the other models, as shown in Figure 11. With regard to precision, recall, and F1-score, LSTM performed better than the other models on both data sources (without-SMOTE data and with-SMOTE data) (Table 4). Detailed hyperparameter tuning of LSTM was performed in this experiment. Firstly, the number of neurons of a single LSTM layer was identified with an epoch value of 70 and a batch-size value of 10. Then, the number of LSTM layers was identified using the best number of neurons and the same epoch and batch-size values. In the next phase, we added dense (fully connected) layers to the LSTM model and found the best number of them, using the best-identified LSTM layers and neurons and the same epoch and batch-size values. In the fourth phase, the best number of neurons in the dense layer(s) was identified. In the fifth and sixth phases, the best number of epochs and the best batch size were identified using the best values of LSTM layers, LSTM neurons, dense layers, and dense neurons. This experiment was performed with both data sources (without-SMOTE data and with-SMOTE data), but we visualize only the with-SMOTE run, as it achieved the highest accuracy (Figure 12).
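The six-phase tuning procedure is a stage-by-stage search: each phase varies one hyperparameter while holding the best values found so far. A minimal sketch of that control flow is given below. The candidate grids, the initial values other than epochs = 70 and batch size = 10, and the `toy_score` function are hypothetical; in practice `score` would train the LSTM with the given configuration and return its validation accuracy.

```python
def staged_search(score, stages, initial):
    """Tune one hyperparameter per stage, keeping earlier winners fixed.

    `score` maps a configuration dict to a number (higher is better).
    `stages` is an ordered list of (name, candidate_values) pairs.
    """
    best = dict(initial)
    best_score = score(best)
    history = []
    for name, candidates in stages:
        for value in candidates:
            trial = {**best, name: value}
            trial_score = score(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
        history.append((name, best[name], best_score))
    return best, best_score, history


def toy_score(cfg):
    """Hypothetical stand-in for 'train the LSTM and return accuracy'."""
    return -(
        abs(cfg["lstm_units"] - 64) / 64
        + abs(cfg["lstm_layers"] - 2)
        + abs(cfg["dense_layers"] - 1)
        + abs(cfg["dense_units"] - 32) / 32
        + abs(cfg["epochs"] - 100) / 100
        + abs(cfg["batch_size"] - 16) / 16
    )


# Phase order follows the text; the candidate values are illustrative.
stages = [
    ("lstm_units", [16, 32, 64, 128]),
    ("lstm_layers", [1, 2, 3]),
    ("dense_layers", [0, 1, 2]),
    ("dense_units", [16, 32, 64]),
    ("epochs", [50, 70, 100]),
    ("batch_size", [8, 10, 16, 32]),
]
initial = {
    "lstm_units": 16,
    "lstm_layers": 1,
    "dense_layers": 0,
    "dense_units": 16,
    "epochs": 70,       # starting value stated in the text
    "batch_size": 10,   # starting value stated in the text
}
best, best_score, history = staged_search(toy_score, stages, initial)
```

A staged search needs far fewer training runs than a full grid over all six hyperparameters, at the cost of possibly missing combinations whose benefit only appears when two settings change together.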
The LSTM loss and accuracy were visualized for both the training and test portions of the dataset, and the ROC curve was plotted to evaluate the LSTM model (Figure 13). The ROC curve shows good results for the LSTM model. The hyperparameters for this experiment are listed in Table 5.
Experiment 3 was performed on the full dataset, which initially has 799 records and 47 features. In this experiment, machine learning feature selection techniques were applied. In a pilot experiment, Variance Threshold provided the best performance with all machine learning models compared with the other feature selection techniques tested (Select K Best and L1-based feature selection). For this reason, Variance Threshold was used in Experiment 3.
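As an illustration of the Variance Threshold technique named above, the following minimal sketch keeps only the columns whose population variance exceeds a threshold. The toy rows and the threshold values are hypothetical, and the helper names are not taken from the study.

```python
from statistics import pvariance


def variance_threshold(rows, threshold=0.0):
    """Return the indices of columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]


def select_columns(rows, keep):
    """Project every row onto the kept column indices."""
    return [[row[i] for i in keep] for row in rows]


# Hypothetical toy data: column 0 is constant, column 2 barely varies.
rows = [
    [1, 0, 3.0],
    [1, 1, 3.1],
    [1, 0, 2.9],
    [1, 1, 3.0],
]
keep = variance_threshold(rows, threshold=0.01)
print(keep, select_columns(rows, keep))
```

Because variance depends on the scale of each feature, features are usually brought to a comparable scale before a single threshold is applied to all of them.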
The experiment was run both without and with SMOTE. Without SMOTE, Support Vector Machine (SVM) achieved the highest accuracy compared with Decision Tree (DT), Long Short-Term Memory (LSTM), Multilayer Perceptron (MLP), Naïve Bayes (NB), and Random Forest (RF). With SMOTE, RF achieved the highest accuracy, 88.8%, compared with the other models (Figure 14). With regard to precision, recall, and F1-score, RF again achieved the highest scores with SMOTE, as shown in Table 6. Hyperparameters were tuned with grid search CV on both data sources (without-SMOTE data and with-SMOTE data), and the best hyperparameters for each model on each data source are listed in Table 7.
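For reference, the metrics reported alongside accuracy can be computed from true and predicted labels as in the sketch below. The labels are hypothetical and the function name is illustrative; the point is only to show how precision, recall, and F1-score are defined for a chosen positive class.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1-score for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels for six students.
y_true = ["pass", "pass", "fail", "fail", "pass", "fail"]
y_pred = ["pass", "fail", "fail", "pass", "pass", "fail"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="fail")
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(precision, 3), round(recall, 3), round(f1, 3), round(accuracy, 3))
```

On imbalanced data, a model can reach high accuracy by favouring the majority class, which is why per-class precision and recall are worth reporting in both the without-SMOTE and with-SMOTE settings.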

Results Comparison
We performed three different experiments on our research problem and compared their accuracies. With the without-SMOTE data, the highest accuracy was achieved in Experiment 2 by Random Forest (RF); with the with-SMOTE data, the highest accuracy was also achieved in Experiment 2, by Long Short-Term Memory (LSTM), as shown in Figure 15 and in the compared results in Table 8.

Significance of Features (P-Value)
In our research context, eleven features have a significant p-value (p < 0.05). Features such as past performance, society status, and semester behavior have a strong impact on students' performance (Figure 16).
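As a rough illustration of how a p-value for one feature might be obtained, the sketch below runs a two-sample t-test on the scores of two groups of students and checks the result against the 0.05 level. The text states only that a t-test was used; the Welch (unequal-variance) variant, the grouping, and the numbers are assumptions made for this example. The p-value is obtained by numerically integrating the Student's t density.

```python
import math
from statistics import mean, variance


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t via Simpson integration of the density."""
    if t == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


def welch_t_test(a, b):
    """Welch's t statistic, its degrees of freedom, and the two-sided p-value."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, t_two_sided_p(t, df)


# Hypothetical final scores for students split by one binary feature.
group_a = [72, 75, 78, 80, 74, 77]
group_b = [60, 65, 63, 58, 66, 62]
t, df, p = welch_t_test(group_a, group_b)
print(round(t, 2), round(df, 2), p < 0.05)
```

With many features tested at once, some p-values below 0.05 are expected by chance alone, so a correction for multiple comparisons may be worth considering.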

Conclusions and Future Works
In the current study, our main concern was to predict students' academic performance at an early stage of the semester, so that students can be made aware of their expected results and early warning systems can be built to help reduce student dropout rates. This study also included an extensive literature review to identify the key factors that can play an important role in predicting students' academic performance. Moreover, this study focused on the locational factors of students, which can play an important role in their academic life. Using the locational features, we can identify the areas or regions that are lacking or that need an uplift, so that proper educational facilities can be provided to them.
Our study focused on finding and working with important key features from which student performance can be predicted at early stages. We tried to collect all key factors that had been highlighted in relevant articles. We then asked educational experts to score each factor on a 5-point Likert scale in order to reach a consensus. To obtain the threshold of the scores, we applied the Fuzzy Delphi Method to the scores, and we used all the factors whose scores exceeded the threshold value. The purpose of this process was to highlight all important factors that affect student performance, so that future researchers do not need to establish the importance of these educational factors again. Educational institutes can also work with these factors to predict student performance early and thereby minimize dropout rates.
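The factor-screening procedure summarized above can be sketched as follows. This is a minimal illustration of one common Fuzzy Delphi formulation, not necessarily the one used in this study: the mapping from Likert ratings to triangular fuzzy numbers, the averaging and centroid defuzzification steps, the threshold of 0.5, and the factor names and ratings are all assumptions.

```python
# Assumed mapping from a 5-point Likert rating to a triangular fuzzy number
# (lower, middle, upper); other scales appear in the literature.
LIKERT_TO_TFN = {
    1: (0.0, 0.0, 0.2),
    2: (0.0, 0.2, 0.4),
    3: (0.2, 0.4, 0.6),
    4: (0.4, 0.6, 0.8),
    5: (0.6, 0.8, 1.0),
}


def fuzzy_score(ratings):
    """Average the experts' fuzzy numbers, then defuzzify with the centroid."""
    tfns = [LIKERT_TO_TFN[r] for r in ratings]
    n = len(tfns)
    lower, middle, upper = (sum(t[i] for t in tfns) / n for i in range(3))
    return (lower + middle + upper) / 3


def select_factors(expert_ratings, threshold=0.5):
    """Keep the factors whose defuzzified score exceeds the threshold."""
    scores = {name: fuzzy_score(r) for name, r in expert_ratings.items()}
    selected = [name for name, s in scores.items() if s > threshold]
    return selected, scores


# Hypothetical ratings from four experts for three candidate factors.
expert_ratings = {
    "past_performance": [5, 5, 4, 5],
    "semester_behavior": [4, 5, 4, 4],
    "commute_mode": [1, 2, 1, 2],
}
selected, scores = select_factors(expert_ratings)
print(selected)
for name, s in scores.items():
    print(name, round(s, 3))
```

The threshold controls how strict the screening is: a higher value keeps only the factors on which experts agree strongly.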
Another main aspect of our study was the consideration of locational factors, as one of our goals was to assess the importance of locational features (e.g., student location (urban, rural), access to school distance, society status, and geo coordinates of student location). We also performed GIS analytics to discover the areas or regions that are lacking or that need an uplift, so that proper educational facilities can be provided to them.
Our study predicted student academic performance with high accuracy at early stages and highlighted the key factors that affect student performance. These features are therefore important not only for predicting the academic performance of students but also for decreasing dropout rates, increasing graduation rates, reducing the financial loss of both students and educational sectors, and, most importantly, improving employment rates.
In our findings, the deep learning model LSTM achieved the highest accuracy, 90.9%, compared with state-of-the-art machine learning algorithms. LSTM performed better when the number of features and dataset records was large. Student academic performance prediction was performed in three experiments: (a) using the Fuzzy Delphi Method; (b) using educational key factors; and (c) using machine learning feature engineering techniques. SMOTE, a data synthesizing tool, was an important method for dealing with the unbalanced nature of the dataset in all the experiments. According to this research, we can conclude that all the factors considered in the study are correlated with student performance, and that the highest accuracy is obtained when all factors are used for the prediction of students' academic performance. In this specific research context, we can also conclude that the scientific analysis technique FDM obtains better accuracy than the machine learning feature engineering technique (Variance Threshold). Along with past performance and social status, the semester behavior factors have a strong impact on students' performance (t-test). Spatial statistical analysis provided spatial information about the results (performance areas and spatial correlation of features with Average Nearest Neighbor).
Estimating student performance demands the attention of both students and teachers. Teachers can play an integral part by helping the students who need extra support and by making them aware of the risk of dropping out of their course or subject. Student dropout is a serious problem that causes financial loss to both students and education sectors, affects graduation rates, and lowers employment opportunities in highly qualified positions. We therefore proposed a model based on students' previous records that helps estimate their final results and helps reduce the dropout rate. Substantial work has been done on predicting the academic performance of students using data mining and machine learning models, but little importance has been given to locational features. In this research, we combined geospatial and machine learning tools to relate students' location factors to their academic performance. The main purpose was to establish a GIS-based system that takes the geographic location of students, evaluates various educational strategies based on machine learning techniques, and generates results from multiple input data for the prediction of their academic performance.
This study focused on finding the key factors, predicting performance, and clustering the students in different areas based on their data class. Here, we especially used the geospatial locations of students. This study could not consider geo-socio-demographic features because we did not have the data. In the future, we can extend this work and predict student academic performance by considering geo-locational attributes. Because our dataset contains the coordinates of students, we can enlarge the dataset and use the coordinates to extract geo-locational or area-specific features (i.e., number of schools, universities, hospitals, etc.) that describe the socio-demographic features of any region. Working with the geo-locations of students in this way will also help us look at student performance prediction from a new and broader perspective.