Design and Optimization of a Fuzzy Logic System for Academic Performance Prediction

Currently, in Colombia, different problems in education exist; one of them is the difficulty of tracing and controlling the learning trajectories that decide the topics taught in the country’s educational institutions. This work aims to implement a logic-based system that allows teachers and educational institutions to carry out a continuous monitoring process of students’ academic performance, facilitating early correction of errors or failures in teaching methods and promoting educational support spaces within the educational institution.


Introduction
Given its interpretability, Fuzzy Logic (FL) simplifies the design and analysis of rule-based systems in different research areas. Various proposals have arisen to improve predictability: some chose optimization algorithms; others combined Artificial Neural Networks (ANNs) with Fuzzy Inference Systems (FISs) to obtain Adaptive Neuro-Fuzzy Inference Systems (ANFISs). Today, the information from educational institutions worldwide, whether physical or virtual, is becoming an essential source for data analysis; many proposals have been made to allow students and teachers of virtual courses to monitor academic performance, taking into account the concept of competency-based learning. Teachers analyze students' competencies and see the progress made [1].
On the other hand, some researchers applied fuzzy logic to the evaluation processes of exams or activities, and the assessment of the results could be carried out linguistically [2]. Then, a fuzzy logic system was proposed to modify the evaluation of the exams, taking into account the difficulty of each question and the time it should take for it to be answered, regarding the complexity of the question. This allows obtaining the "cost" of answering a question thanks to these data. Depending on these factors, an adjusted assessment is generated.
Arsad [3] proposed a performance prediction system based on neural networks and linear regression. The case study corresponded to the Faculty of Electrical Engineering of UiTM (Universiti Teknologi MARA) in Malaysia. Different grades obtained by students in the most relevant subjects in the first semester were taken as the system's input, and the output was taken from the Cumulative Academic Average (CAA) in the last semester.
The data from the analysis of a teacher's performance or the history of grades stored in institutions' databases could be applied in other engineering fields. Lin [4] proposed regression models to make predictions from variables such as the number of hours studied or the level of literacy of the parents.
In Colombia, different attempts have been made in the use of predictive tools in education. Merchan's proposal [5] takes information from demographics and the "Saber 11th" state test to predict students' performance during the first year at the university. Merchan also developed a predictive model of the analysis of the information through

Proposed Approach and Organization
This work proposes a fuzzy logic system. Thanks to its interpretability, the system is intended to help educational institutions understand the relationships among the different school grades and, in turn, carry out corrective tasks where the impact on academic performance is highest. This article presents the design process of the prediction system, from the data analysis and system modeling phases to the optimization process and the results obtained for each algorithm, in order to identify the one that returns the highest percentage of success in the final predictions.
The motivation for this investigation is the pursuit of excellence as a way of life, which calls for academic control and follow-up of future professionals so that they develop bold and resolute decision-making processes aimed at lasting success. In line with human development, it is necessary to set the foundations for decision making at the personal and academic levels. This type of education requires monitoring, traceability, and academic controls that allow corrections and adaptation of the cognitive strategies that, in time, ensure the success of the student. Consequently, a system based on fuzzy logic is proposed, since it may help achieve the expected educational outcomes.
The document is organized as follows. Section 2 details the methodology and describes the design and optimization of the fuzzy system. Section 3 displays the results and performance metrics. Section 4 presents the discussion. Finally, the conclusions are presented in Section 5.

Methodology
The stages were applied in the following order: data collection, prediction system design, fuzzy logic system optimization, and, finally, analysis of the results using various metrics.
The prediction system takes students' grades (scores in each term) from previous years, for every subject and academic period, to predict performance in 11th grade.

Data Collection and Analysis
In this stage, the data provided by the school "Colegio de la Reina", located in Bogotá D.C., Colombia, are analyzed. The information provided by the Educational Institution (EI) contained a record for each of the subjects taken by the same group of students from 8th to 11th grades, with several academic marks for each academic period of the year, each scored between 0 and 100.
At present, the Colombian educational system is divided into five stages: preschool, with five levels (walkers, toddlers, pre-kindergarten, kindergarten, and transition); primary education, from 1st to 5th grades; secondary education (grades 6 to 9, called basic-high school); middle education (grades 10 and 11); and higher or professional education [18]. Accordingly, the data provided by "Colegio de la Reina" cover the last two years of secondary education (8th and 9th grades) and the two years of middle education (10th and 11th grades). Some subjects reported more academic marks in the same academic period than others; thus, the average of the marks in each academic period, denoted as $\bar{x}_{p,a,g}$ in Equation (1), was used, which yields 4 term values per year in each subject, each between 0 and 100. These data were obtained for each student.

$$\bar{x}_{p,a,g} = \frac{1}{N} \sum_{i=1}^{N} x_{i,p,a,g} \qquad (1)$$

In Equation (1), the respective variables are:
• i = 1, 2, 3, . . . , N: index of the academic marks (the rows in Table 1).
• N: the number of academic marks in the academic period (number of rows in Table 1).
• p = 1, 2, 3, 4: the academic period of the year.
• a = 1, 2, 3, . . . , 8: the subject, depending on the grade (see Table 2).
• g = 1, 2, 3, 4: the respective grade (level); g = 1 for 8th, g = 2 for 9th, g = 3 for 10th, and g = 4 for 11th.
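The averaging described for Equation (1) can be illustrated with a minimal Python sketch. The marks below are hypothetical values invented for illustration, and the dictionary layout keyed by (p, a, g) is an assumption, not the format of the school's records.

```python
from statistics import fmean


def period_average(marks):
    """Return the mean of the N marks recorded in one academic period."""
    if not marks:
        raise ValueError("at least one mark is required")
    if any(m < 0 or m > 100 for m in marks):
        raise ValueError("marks must lie in [0, 100]")
    return fmean(marks)


# Hypothetical marks of one student, keyed by (period p, subject a, grade g).
raw_marks = {
    (1, 1, 2): [80.0, 70.0, 90.0],  # three marks in period 1, sciences, 9th grade
    (2, 1, 2): [65.0, 75.0],        # two marks in period 2, same subject and grade
}

# One averaged term value per (p, a, g), regardless of how many marks were recorded.
averages = {key: period_average(values) for key, values in raw_marks.items()}
print(averages[(1, 1, 2)])  # 80.0
print(averages[(2, 1, 2)])  # 70.0
```

Averaging makes subjects with different numbers of marks per period comparable, since each one contributes a single term value.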
As an example, Table 1 shows the case of grade 9, corresponding to g = 2 and the subject of sciences a = 1, with the respective mark values (performance) for a student. The data $\bar{x}_{p,a,g}$ are used as the input to predict the academic performance.
Subsequently, an analysis was carried out on the subjects evaluated in the state exams "Saber 9th" [19] and "Saber 11th" [20], which assess knowledge in areas such as mathematics, natural sciences, social sciences, and English. Based on this analysis, the most relevant subjects for the prediction process were selected, yielding Table 2.
Table 2 presents the subjects of the educational institution that are evaluated through the state exams applied by ICFES (Instituto Colombiano para el Fomento de la Educación Superior). Those exams are applied when the students complete their studies at the primary and secondary education levels. Knowing the subjects evaluated in these exams makes it possible to define how actively the monitoring of students' academic performance should be carried out.
The collected data $\bar{x}_{p,a,g}$ were used as inputs to the prediction system, corresponding to the term values of each academic period; for ease of interpretation, special labels are assigned in the model, as presented in Section 2.3. Figure 1 shows the correlation matrix among the 11th grade subjects, used to identify the relationships among them. The calculus subject shows no correlation with subjects outside the mathematics area, since it is only directly related to statistics; moreover, the other cells in the calculus row show that performance in this subject is mostly inversely related to performance in the other subjects, which may imply that mathematics requires a high degree of dedication, so that when performance in it increases, other subjects are affected.
On the other hand, the area of languages, which includes English and Spanish, displays a high degree of correlation. In addition, the correlation between physics and chemistry is low, which would imply that the only relationship that can exist between these two subjects is the knowledge acquired during the 8th and 9th grades in the sciences subject. It should also be noted that the political science subject does not have any type of relationship with the others.
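As a rough illustration of how a correlation matrix such as the one in Figure 1 can be obtained, the following sketch computes Pearson coefficients between subjects. The Pearson coefficient is an assumption here (the text does not name the coefficient used), and the scores are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(scores):
    """Map every pair of subjects to the correlation of their score lists."""
    names = list(scores)
    return {r: {c: pearson(scores[r], scores[c]) for c in names} for r in names}


# Hypothetical scores of five students in three 11th grade subjects.
scores = {
    "english": [60.0, 70.0, 80.0, 90.0, 100.0],
    "spanish": [55.0, 68.0, 77.0, 88.0, 95.0],
    "calculus": [90.0, 85.0, 70.0, 65.0, 50.0],
}
matrix = correlation_matrix(scores)
for row in scores:
    print(row, [round(matrix[row][col], 3) for col in scores])
```

With these invented values, English and Spanish correlate strongly and positively, while calculus correlates negatively with both, which mirrors the kind of pattern discussed for Figure 1 without reproducing its actual values.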
As a summary of the above, Figure 1 identifies the related subjects, as observed with SB3, SB6, SB7, and SB8; however, a high correlation value does not mean that the subjects belong to the same area of knowledge. In this way, subjects that may be affected by common factors, such as previously studied subjects, can be established. Taking Table 2 and Figure 1 as a reference, the graph of Figure 2 can be built; it shows the learning trajectories for each subject throughout the educational levels considered, excluding subjects with no high degree of correlation or those whose topics are not related during the teaching process. In this way, the relationships among the subjects used to design the prediction system are established.
The information obtained permits the definition of a system where students' scores are distinguished by a unique subject identifier, so that a single prediction system can be used instead of multiple systems (one for each subject) working individually.

Optimization Algorithm Selection
The optimization algorithms described in the literature can be divided into two large groups. The first group contains the deterministic algorithms, for which N executions provide the same result both in the intermediate steps and at the end of the execution. The second group contains the non-deterministic ones, in which random factors are introduced during the solution search process, so the same result is not always achieved [21].
Within the group of non-deterministic algorithms, there are those based on behaviors found in nature, known as metaheuristic algorithms [22].
From the above, different types of optimization algorithms that allow modifying the rules, sets, or parameters of the membership functions of a fuzzy logic system were defined in order to predict academic performance. For this, the MATLAB Global Optimization Toolbox was used [23]. For the prediction system, the Fuzzy Logic Toolbox was used [24]. From the review of [23], five compatible optimization algorithms were identified, including Genetic Algorithms (GAs), Particle Swarm Optimization (PSO), Simulated Annealing (SA), and Pattern Search (PS).
The toolboxes correspond to Version 2019a, implementing new features within the Fuzzy Logic Toolbox, including the utilization of fuzzy trees as a collection of different fuzzy logic systems or the possibility of carrying out the learning and training process of fuzzy logic systems through some of the algorithms of the Global Optimization Toolbox, such as genetic algorithms, particle swarm optimization, simulated annealing, pattern search, and ANFIS [25].
Genetic algorithms emulate nature's process of improving a species over time [26,27]. Particle swarm algorithms are inspired by the collective behaviors that living beings exhibit when searching for food [28,29]. Simulated annealing is a method that aims to emulate the crystalline form of a material by heating and cooling it, thereby seeking to go from a higher to a lower energy state [30]. Pattern search is based on direct search algorithms such as Generalized Pattern Search (GPS), Generating Set Search (GSS), and Mesh Adaptive Direct Search (MADS), where in each step, a mesh pattern of points is generated and evaluated [23].
Accordingly, the optimization algorithms are used to modify the parameters of the proposed fuzzy systems used for the prediction of academic performance.
It is important to note that this work does not seek to compare the algorithms; the object of study is to establish which configuration of the fuzzy system is the most suitable for the prediction system. Under this approach, the most appropriate configuration must present the best performance across the different algorithms used.

Fuzzy Logic System Design
Fuzzy logic was first proposed in the mid-1960s by Lotfi A. Zadeh, who at that time defined the "principle of incompatibility". A fuzzy set is a class of objects with different degrees of membership; each set is characterized by a membership function, which assigns to each object a degree of membership in the range between 0 and 1. Using fuzzy logic is beneficial since it represents human reasoning, where the truth or falsity of a proposition, or the degree to which an object belongs to some class, is measured in proportions, such as "little", "greatly", "more" and "less" [31].
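To make the notion of a degree of membership concrete, the following minimal sketch fuzzifies a score in [0, 100] with triangular membership functions. The triangular shape, the three labels, and their parameters are assumptions chosen for illustration; they are not the sets used in the prediction system.

```python
def triangular(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set (feet a and c, peak b)."""
    if not a < b < c:
        raise ValueError("expected a < b < c")
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic labels for a score in [0, 100]; the outer feet lie
# outside the score range so that 0 and 100 reach full membership.
LABELS = {
    "low": (-50.0, 0.0, 50.0),
    "medium": (0.0, 50.0, 100.0),
    "high": (50.0, 100.0, 150.0),
}


def fuzzify(score):
    """Return the degree of membership of a score in every label."""
    return {name: triangular(score, *abc) for name, abc in LABELS.items()}


degrees = fuzzify(65.0)
print(degrees)  # 'low' is 0.0, 'medium' about 0.7, 'high' about 0.3
```

A score of 65 therefore belongs partly to "medium" and partly to "high", which is the graded notion of truth described above.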
After analyzing the relevant data and choosing the optimization algorithms, the logic-based system was implemented. The first system designed consisted of multiple inputs and four outputs: the inputs corresponded to each academic period taken by a student during 8th, 9th, and 10th grades, together with the number of absences during each school year and an identifier for the subject. The four outputs corresponded to the prediction of the student's marks during 11th grade.
For the implementation of the prediction system, a Takagi-Sugeno-type fuzzy inference system was employed, which uses fuzzy sets in the antecedent and, in the output, functions that depend on the input variables [32].

Takagi-Sugeno Fuzzy Systems
The Takagi-Sugeno (TS) fuzzy model was proposed to develop a systematic approach to generating fuzzy rules from a given input-output data set [33]. A typical fuzzy rule in a TS fuzzy model has the form: If x is A and y is B, then z = f(x, y), where A and B are fuzzy sets in the antecedent and z = f(x, y) is a function in the consequent. In many applications, f(x, y) is a polynomial of the inputs x and y. Some TS systems are:
• Zero-order TS fuzzy model: f(x, y) is a constant.
• First-order TS fuzzy model: f(x, y) is a first-order polynomial of the inputs.
Considering the case when a fuzzy inference system has two inputs x and y and one output z, a first-order TS fuzzy model has rules as follows:
Rule i: If x is $A_i$ and y is $B_i$, then $z_i = p_i x + q_i y + r_i$,
where $p_i$, $q_i$, and $r_i$ are the consequent parameters of rule i.
Figure 3 displays an example of a TS fuzzy system for two inputs and two memberships in each input.
Figure 4 shows the proposed Prediction Model 1 (PM1), which incorporates the inference process into a single zero-order Takagi-Sugeno fuzzy inference system. The system inputs of Figure 4 are the "term" values (student scores) for Periods 1 to 4 of 8th, 9th, and 10th grades. The number of absences "# fails" of the student at the 8th, 9th, and 10th levels is also taken as input; besides, the identifier "ID Subject 10th" is employed (according to Figure 2 and Table 2). The outputs are the predicted term values for Periods 1 to 4 of 11th grade. The "term" scores of the inputs and outputs are values between 0 and 100, the number of absences depends on the number of lectures of the academic period, and the subject identifier follows Table 2. In order to reduce the number of inputs, the value of the "final exam" for each level is not included in the prediction system; this can be considered a system complexity limitation.
Inspecting the proposed system of Figure 4 reveals an overload in the inference process (associated with the number of rules), which makes any modification difficult and increases the system's training costs.
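The inference step of a zero-order TS system with two inputs and two membership functions per input can be sketched as follows. The sketch assumes the product as the AND operator and the weighted average of rule outputs as the aggregation, which are common choices for Sugeno systems; the membership functions, rule constants, and input values are hypothetical.

```python
def low(v):
    """Hypothetical 'low' membership on [0, 100]: 1 at 0, falling linearly to 0 at 100."""
    return max(0.0, min(1.0, (100.0 - v) / 100.0))


def high(v):
    """Hypothetical 'high' membership on [0, 100]: 0 at 0, rising linearly to 1 at 100."""
    return max(0.0, min(1.0, v / 100.0))


# Each rule: (membership for x, membership for y, constant consequent z).
RULES = [
    (low, low, 20.0),
    (low, high, 50.0),
    (high, low, 50.0),
    (high, high, 90.0),
]


def sugeno_zero_order(x, y, rules):
    """Weighted average of constant consequents, weighted by rule firing strength."""
    numerator = 0.0
    denominator = 0.0
    for mu_x, mu_y, z in rules:
        weight = mu_x(x) * mu_y(y)  # product used as the AND operator (assumption)
        numerator += weight * z
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no rule fires for this input")
    return numerator / denominator


prediction = sugeno_zero_order(80.0, 60.0, RULES)
print(prediction)  # approximately 66.8
```

For x = 80 and y = 60, the four firing strengths are 0.08, 0.12, 0.32, and 0.48, so the output is 0.08·20 + 0.12·50 + 0.32·50 + 0.48·90 = 66.8. A first-order model would replace each constant z with a linear function of x and y.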

Prediction Models
Considering the above, a modular system called Prediction Model 2 (PM2), shown in Figure 5, was developed using zero-order Takagi-Sugeno systems. Each school grade (level) uses its own fuzzy inference system, which eases modifications such as adding the marks of earlier grades to the prediction system. The subsystems for each grade are arranged according to the relationships among them (from Section 2.1).

Figure 6 shows the "8th grade" subsystem; the inputs are the "term" values of students' grades for Periods 1 to 4, the mark for the "final exam", and the student's number of absences "# fails". The output corresponds to the influence of this subsystem on the prediction. The "term" inputs, the "final exam", and the output take values in the range [0, 100].
Figure 7 shows the "9th grade" subsystem, which has the same structure as the "8th grade" subsystem but uses the 9th grade data. The output "Result 9th" is a value in the range [0, 100].
Figure 8 shows the configuration of the subsystem associated with 10th grade. The system inputs are the "term" values obtained by the students in Periods 1 to 4 and the "final exam", with values between 0 and 100, together with the number of student absences "# fails" and, as an additional input, the subject type identifier "ID subject" according to Table 2. The subject identifier is included in this system since the trigonometry and statistics subjects of 10th grade have an influence on 11th grade statistics, as presented in Figure 2.
As observed in Figure 5, a combination subsystem was proposed in which the outputs of the subsystems "8th grade" and "9th grade" are used to obtain a single output. The range of values for the inputs and output of the subsystem "mixed 8th-9th" is [0, 100]. In the same way, the subsystem "11th grade" uses the outputs of the subsystems "mixed 8th-9th" and "10th grade" to obtain the "term" prediction values of 11th grade for each of Periods 1 to 4. The range of values for the inputs and output of this subsystem is [0, 100].
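The data flow through the modular tree can be sketched as below. This shows only how the subsystem outputs are chained; the functions `stand_in_grade` and `stand_in_merge` are hypothetical placeholders (plain averages), not fuzzy inference systems, and the student data are invented.

```python
from statistics import fmean


def stand_in_grade(terms, final_exam, absences):
    """Placeholder for a grade subsystem (NOT a fuzzy system): plain average.

    The absences argument mirrors the inputs listed in the text, but this
    stand-in ignores it.
    """
    return fmean(list(terms) + [final_exam])


def stand_in_merge(left, right):
    """Placeholder for a two-input subsystem: midpoint of its inputs."""
    return (left + right) / 2.0


def predict_11th(student, subject_id):
    """Chain the subsystems in the order described for the modular model."""
    result_8 = stand_in_grade(**student["8th"])
    result_9 = stand_in_grade(**student["9th"])
    mixed = stand_in_merge(result_8, result_9)      # "mixed 8th-9th" subsystem
    # The 10th grade subsystem also receives subject_id; the stand-in does not use it.
    result_10 = stand_in_grade(**student["10th"])
    final = stand_in_merge(mixed, result_10)        # "11th grade" subsystem
    return [final] * 4                              # one value per period


# Hypothetical student record.
student = {
    "8th": {"terms": [70, 70, 70, 70], "final_exam": 70, "absences": 2},
    "9th": {"terms": [80, 80, 80, 80], "final_exam": 80, "absences": 0},
    "10th": {"terms": [85, 85, 85, 85], "final_exam": 85, "absences": 1},
}
print(predict_11th(student, subject_id=1))  # [80.0, 80.0, 80.0, 80.0]
```

Because each stage exposes a single output in [0, 100], a subsystem can be replaced or retrained without changing the interfaces of the others, which is the practical benefit of the modular arrangement.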
As a working example, consider a student with the scores presented in Table 3, where the columns for 8th, 9th, and 10th grades contain the input data and the column for 11th grade contains the resulting output.

System Optimization
The implementation seeks to establish which configuration of the fuzzy system presents the best result for the different algorithms considered. Thus, the algorithms used are expected to show the same trend with respect to the number of rules used in the fuzzy systems.
For the optimization process, the four selected algorithms were applied to the fuzzy logic tree proposed in Figure 5. For PM2, 200 iterations were carried out for each algorithm, and five different scenarios were proposed, each one with a maximum number of rules (5, 10, 20, 50, and 70). This value is the maximum number of rules assigned within each of the subsystems (8th, 9th, 8th-9th, 10th, 11th); the aim is to find the point at which the number of rules ceases to strongly influence the error of the predictions.
For the optimization algorithms, some important factors can affect their efficiency. In this regard, different approaches for parameter selection can be considered as presented in [34][35][36], where the common approach was via meta-optimization. In particular, suggestions for GA parameter selection can be found in [37], as well as in [38] for PSO and in [39] for SA. However, as the objective of this work is not to carry out a study of the parameter variation of the optimization algorithms, for the configuration of these algorithms, the suggestions (default parameters) of [40] for the GA and those in [41] for PSO, SA, and PS were followed, which are presented below.
For the GA, the population size was 200, the crossover fraction was set to 0.8, and the mutation rate was equal to 0.01. For coding the solutions (chromosomes), the MATLAB "double" data type was used, which is a 64 bit word with 1 sign bit, 11 bits for the floating point exponent, and 52 bits for the mantissa; in addition, a "bit string" was used for the rule list. The scattered crossover function was used, which creates a random binary vector and selects the genes where the vector is 1 from the first parent and the genes where the vector is 0 from the second parent, combining them to create the offspring. In addition, the Gaussian mutation function was used, in which a random number taken from a Gaussian distribution is used for the mutation process; this distribution depends on the scale parameters and the population range. The algorithm stops if the average relative change in the best fitness function value is less than $1 \times 10^{-6}$.
For PSO, the swarm size was $10\,n_{var}$, where $n_{var}$ is the number of variables of the fuzzy model; the cognitive (self-adjustment) weight associated with each particle's best position was set to 1.49, and the social adjustment weight associated with the best position of the swarm was also 1.49.
For SA, the initial temperature $T_0$ was set to 100 for each dimension, while the reannealing interval was 50. The function used to update the temperature schedule was $T(k) = 0.95^{k}$, where k is the annealing parameter.
Finally, for PS, the initial mesh size was set to 10, and the mesh tolerance value used to stop the search process was $1 \times 10^{-7}$.
Tables 4-7 show a summary of the results obtained during the optimization process. Setting 200 iterations makes it possible to find the iteration after which the variation of the error between iterations is no longer significant with respect to the time it takes to execute each iteration.
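The scattered crossover described for the GA can be illustrated with a short Python sketch. This is not the MATLAB toolbox implementation; the function name, the parent vectors, and the seed are hypothetical and only follow the verbal description (genes taken from the first parent where the random binary vector is 1, from the second where it is 0).

```python
import random


def scattered_crossover(parent1, parent2, rng):
    """Build one child gene by gene from a random binary mask."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same number of genes")
    mask = [rng.randint(0, 1) for _ in parent1]
    child = [g1 if bit == 1 else g2 for bit, g1, g2 in zip(mask, parent1, parent2)]
    return child, mask


# Hypothetical parents: each gene stands for one tunable parameter of the fuzzy system.
parent1 = [10.0, 20.0, 30.0, 40.0, 50.0]
parent2 = [-1.0, -2.0, -3.0, -4.0, -5.0]
rng = random.Random(0)  # fixed seed only to make the illustration repeatable
child, mask = scattered_crossover(parent1, parent2, rng)
print(mask)
print(child)
```

Every gene of the child comes unchanged from one of the two parents; new values enter the population only through the mutation step.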
The objective function is presented in Equation (2), where i is the index associated with each input-output data pair, N the total number of data used, $Real_i$ the value of the real data, $Calculated_i(X)$ the data obtained using the prediction fuzzy system, and X the parameters of the fuzzy system, which include the parameters of the membership functions, the rules of the fuzzy system, and the parameters of the output functions of each Sugeno-type fuzzy subsystem.
As seen in Table 4, during the optimization process of the system with 5 rules, a minimum error of 11.4 points was reached. On the other hand, the system with 70 rules managed to obtain the lowest error during 200 generations, although it was not possible to capture a significant variation of this after the 150th generation. Finally, the systems with 5 and 10 rules, during optimization, presented alerts due to the lack of rules associated with system outputs.
By analyzing the results obtained during the optimization process with the particle swarm optimization shown in Table 5, it can be seen that the system with 5 rules ended prematurely in Iteration 160, in turn obtaining the worst error of the algorithm, while the system with 70 rules achieved the best result in Generation 190, with a mean error of 7.8.
When analyzing the results obtained by the optimization process using the simulated annealing algorithm (Table 6), it is seen that during several iterations, there was no reduction in error, and in turn, the algorithm attained the worst optimization result.
According to Table 7, the pattern search optimization algorithm produces the same error of 16.77 in all systems. For the adaptive mesh pattern search algorithm, regardless of the maximum number of rules, the system always ended when reaching 32 iterations because the minimum size for the mesh was reached, which is located below $1.0 \times 10^{-8}$.

Result Analysis
For the result analysis, the Mean Absolute Deviation (MAD), the Root Mean Squared Error (RMSE), and the Symmetric Mean Absolute Percentage Error (SMAPE) were used [42].
The MAD measure is the mean of the absolute deviations of a dataset about the data's mean; it corresponds to the average distance of the data set from its mean.
The mean squared error is commonly used for assessing numeric prediction. This value is computed by taking the squared differences between each predicted value and the actual value. Thus, the root mean squared error corresponds to the square root of the mean squared error.
The symmetric mean absolute percentage error is a measure of accuracy in fitted dataset values; this performance measure usually expresses accuracy as a percentage.
In these measures, i is the index of the data value, $Real_i$ corresponds to the real data, $Calculated_i$ is the value calculated with the fuzzy inference system, and N is the total number of data.
The MAD and RMSE are used to make precision comparisons between different algorithms, while the SMAPE allows evaluating the error percentage in the predictions considering only its magnitude.
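The three measures can be sketched in Python as follows. Two assumptions are made and should be read as such: the MAD is taken here as the mean absolute difference between real and calculated values (as the variable definitions suggest), and the SMAPE uses the variant whose denominator is the mean of the absolute real and calculated values. The data are hypothetical.

```python
import math


def mad(real, calculated):
    """Mean absolute difference between real and calculated values (assumed form)."""
    return sum(abs(r - c) for r, c in zip(real, calculated)) / len(real)


def rmse(real, calculated):
    """Square root of the mean squared difference."""
    return math.sqrt(sum((r - c) ** 2 for r, c in zip(real, calculated)) / len(real))


def smape(real, calculated):
    """Symmetric MAPE in percent; denominator is the mean of |real| and |calculated|."""
    total = 0.0
    for r, c in zip(real, calculated):
        denominator = (abs(r) + abs(c)) / 2.0
        if denominator > 0.0:  # a pair of zeros contributes no error
            total += abs(r - c) / denominator
    return 100.0 * total / len(real)


# Hypothetical real and predicted term scores on the 0-100 scale.
real = [80.0, 60.0, 90.0, 70.0]
calculated = [75.0, 65.0, 90.0, 80.0]
print(mad(real, calculated))              # 5.0
print(round(rmse(real, calculated), 4))   # 6.1237
print(round(smape(real, calculated), 4))  # 6.9462
```

The RMSE (6.12) exceeds the MAD (5.0) in this example because squaring gives the single 10-point error more weight, which is why the two measures are usually reported together.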
Furthermore, the predictions of each of the implemented algorithms were compared so that the best algorithm could be selected, as well as the best number of rules for the system. The following tables present the results obtained for the MAD, RMSE, and SMAPE measurements; in addition, each table indicates the maximum number of rules associated with the learning and optimization process. As a note, the SMAPE is given in terms of the percentage of error.
Table 8 presents the summary of the results obtained by the system during the optimization process. Of all the data, 72% (121 input-output data) was used to train the system, while the remaining 28% (47 input-output data) was used to evaluate it.
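The partition sizes can be checked with a short sketch. The random shuffle and the seed are assumptions, since the text does not state how the records were assigned to each subset; only the counts (121 and 47 out of 168) come from the text.

```python
import random


def split_records(n_records, n_train, seed):
    """Shuffle record indices and cut them into training and evaluation sets."""
    if not 0 < n_train < n_records:
        raise ValueError("n_train must lie strictly between 0 and n_records")
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


total = 121 + 47  # input-output data reported for training and evaluation
train_idx, test_idx = split_records(total, 121, seed=0)
print(len(train_idx), len(test_idx))        # 121 47
print(round(100 * len(train_idx) / total))  # 72
print(round(100 * len(test_idx) / total))   # 28
```

The two index sets are disjoint, so no record used for training is reused when the system is evaluated.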
The results show that the lowest error percentage was achieved when genetic algorithms were used in the optimization process; meanwhile, the worst results correspond to the simulated annealing and pattern search algorithms.
Subsequently, Table 9 describes the analysis of the results considering the maximum number of rules for the system; the system with a maximum of 70 rules presents the lowest MAD, followed by the system with 20 rules. Then, both systems were evaluated using the remaining 28% of the data, analyzing the error for each of the outputs individually.
The evaluation shows that the system with a maximum of 20 rules obtained the lowest error for the first two outputs, which correspond to the first and second academic periods of 11th grade, with errors of around 4.87 and 5.1 points. In terms of the SMAPE, these correspond to errors of 7.6% and 8.6%, respectively. On the other hand, the system with 70 rules obtained the best result for the third and fourth academic periods, with approximate error percentages of 10.8% and 5.15%; compared with the results of the system with 20 rules, the difference is not large enough to choose between the two systems without further analysis.
Besides, to determine a suitable prediction system, it is necessary to consider the time required for the system optimization process. For this, Table 10 presents the time required by each algorithm for the different numbers of rules. In order to unify the execution of the GA, PSO, and SA algorithms, two-thousand iterations were used in each case. PS stopped at Iteration 34, since the smallest mesh size of $9.537 \times 10^{-7}$ was reached. The best MAD, RMSE, and SMAPE results were obtained with 20 and 70 rules using the GA; the GA optimization of the system with 20 rules took 190.78 minutes, while that of the system with 70 rules took 1007.98 minutes. Considering this aspect, the system based on 20 rules is the suitable option, since it requires less optimization time.

Discussion
Across the different system implementations, the genetic algorithms reached the best result. Contrary to expectations, the implementation with a maximum of 20 rules per subsystem obtained a better result than the one with 70 rules; this is because the latter requires a greater number of generations to reach a better result, which entails a high computational and time cost.
On the other hand, when analyzing the number of rules generated in each subsystem, it was seen that the greatest computational load in the optimization process occurred in the first three subsystems, since they were the ones using the maximum limit of rules, while the last two used less than 50% of the maximum number.
Regarding the error measurement, a suitable performance was obtained since the MAD was located at 5.4282 and the RMSE at 7.4833. Using an evaluation scale that goes from zero to 100, the error was less than 10%, which is a positive aspect given the difficulty of predicting data such as academic performance.
The error obtained was acceptable since the system is expected to support academic control of those students displaying low performance during the last academic period. The proposed system provides a preview of students' performance; when implemented together with an alert mechanism, it notifies about a student's low performance or possible academic problems from the beginning.
The proposed model is applicable to all educational levels, which implies fine-tuned monitoring and academic control, provided that the intervention is suitable and adequate. This intervention must be based on the expected profiles of a person at each educational stage; mainly, it is necessary to define the students' progress when finishing each level. From this viewpoint, the educational strategies constitute the best answer to meet the educational needs. These may be easier to apply if the profiles are obtained in step with the learning and development of skills and competencies in building a disciplined lifestyle.
Finally, it should be considered that a direct comparison between the algorithms was not carried out since the object of study was to establish the most appropriate configuration of the fuzzy prediction system. The optimization algorithms allow observing the existence of a configuration that presents the best result for the different algorithms considered.

Conclusions
The study would allow academic control either by carrying out reinforcements at the beginning of the year or by notifying the parents about the shortcomings that the student may have throughout the year, so that some difficulties are mitigated. This would allow the institution to achieve better outcomes in the state tests and would grant it a better reputation due to its educational quality and the high effectiveness of the communication process between the institution and parents, regardless of the shortcomings of the learners.
It is also observed that students' academic performance depends on a large number of external factors that affect students in positive and negative ways, like health status or even mood. Therefore, taking into account that these factors were not considered, it is concluded that a suitable error result was achieved, which can be improved if more variables are added to the system, although these would lead to a higher computational cost.
Finally, the implementation costs were not high since they only require the energy expenditure of the remote server that performs the optimization process, and the time cost depends directly on the machine where the process is executed; as a reference, a 2.0 GHz quad-core computer spends approximately four hours on optimization using genetic algorithms with 20 rules for each subsystem and 200 records divided between training and evaluation data.
Institutional Review Board Statement: In this work, direct tests were not carried out on individuals (humans). The historical data used for this study were provided by the school "Colegio de la Reina" located in Bogotá D.C., Colombia.

Informed Consent Statement: The data used were requested from "Colegio de la Reina", Bogotá D.C., Colombia.
Data Availability Statement: The original database is at "Colegio de la Reina" located in Bogotá D.C., Colombia.