Analysis and Prediction of Engineering Student Behavior and Their Relation to Academic Performance Using Data Analytics Techniques

This study focuses on identifying personality traits in computer science students and determining whether they are related to academic performance. In addition, the importance of personality traits was measured using an educational motivation scale and the depression, anxiety, and stress scales. A sample of 188 students from the School of Computer Engineering of the Pontifical Catholic University of Valparaíso was used. Through econometric two-stage least squares and paired-sample correlation analysis, the results indicate that academic performance is related to the personality traits measured by the educational motivation scale, to the ranking of university entrance, and to gender. In addition, these results led to a characterization of students based on their personality traits and provided elements that may enhance the development of an effective personality that allows students to successfully face their environment, playing an important role in the educational process.


Introduction
Motivation has been one of the most important concepts in the educational context [1]. Several studies show that motivation is related to various aspects such as persistence, learning, and the level of execution [2][3][4]. Motivation is defined as the energy that moves us towards a goal; it is a complex and multifactorial concept, as well as a personal and individual process [5].
Motivation continues to be an essential area of work in the university environment because of its connection with students' academic behavior. It is understood as the engine that mobilizes students' disposition towards learning, so it is crucial for educators to create a context in which students can learn with motivation towards a goal that they consider personal and relevant as the main axis [6].
In the last 50 years, the concept of motivation has been widely discussed and investigated, mainly because it is regarded as one of the most important factors in the learning process. Five theories treat academic motivation systemically: attribution theory, expectancy-value theory, social cognitive theory, achievement goal theory, and self-determination theory. These approaches focus on student beliefs, values, needs, and goals [6].
This research seeks to verify whether the academic performance of the students of the School of Computer Engineering of the Pontifical Catholic University of Valparaíso is correlated with their motivations and moods. Anxiety and depression are disorders that negatively impact the development of young people; in addition, they affect various aspects of their lives and predispose them to other health problems [7]. In this sense, it has been observed that anxiety impairs the normal functioning of young people, can become chronic, and raises the risk of other diseases. Depression, in turn, significantly impairs their academic and psychosocial performance, raising the risk of other mental and physical health pathologies [7]. Regarding motivation, we want to observe how strongly this factor influences the depressive and stress disorders that students experience and whether it has an impact on academic performance.

Background
Self-determination theory holds that human beings have an inherent interest in learning and developing from the moment they are born, and that the environment and upbringing contribute positively or negatively to development as the child grows up; it is therefore imperative to consider motivation from a developmental psychology perspective [8]. The theory postulates that motivation in education cannot be understood from a one-dimensional point of view and that behavior can be intrinsically motivated, extrinsically motivated, or amotivated [1].
Intrinsic motivation (IM) is understood as a sign of competence and self-determination [9,10] and is considered a global construct in which three types can be differentiated: towards knowledge, towards achievement, and towards experiences [11]. IM towards knowledge has been related to concepts such as curiosity or motivation to learn [12] and refers to carrying out an activity for the pleasure that is experienced while learning, exploring, or trying to understand something new [13]. IM towards achievement can be defined as the commitment in an activity for the pleasure and satisfaction experienced when trying to overcome or reach a new level [1]. Finally, the IM towards stimulating experiences takes place when someone engages in an activity to have fun or experience stimulating and positive sensations derived from their own dedication to the activity [13].
On the other hand, extrinsic motivation (EM) refers to participation in an activity to obtain rewards [1]. Like IM, EM is defined as a multidimensional construct with three types that, ordered from the lowest to the highest level of self-determination, are external regulation, introjection, and identification [9][10][11][12][13][14]. External regulation is the most representative type of EM and refers to participating in an activity to get rewards or avoid punishment. In addition, the behavior is the result of experiencing external or internal pressures [1]. In introjection, although the behavior is regulated by demands, the individual begins to internalize the reasons for the action but is not yet self-determined. It may involve coercion or pressure to carry out something, which prevents individuals from making decisions about their own behavior [1]. Finally, identification is the most self-determined type of EM, since individuals value their behavior and believe that it is important. The commitment to an activity is perceived as the individual's own choice, although it is still considered EM because the behavior is an instrument to achieve something [11].
The third dimension postulated by self-determination theory is amotivation. It occurs when no contingencies are perceived between actions and their consequences. The individual is neither intrinsically nor extrinsically motivated and feels only incompetence and a lack of control. Amotivation is at the lowest level of autonomy in the continuum of the different types of motivation [1].
In the psychoeducational context, the relationship between achievement motivation and school performance has been studied to try to answer why some students succeed academically, while others who possess similar abilities do not achieve similar results. It is thought that what distinguishes high and low achieving students is that the former have a higher achievement motivation than the latter. However, it is not a clear and resolved rule, since other authors have not found differences between the motivation of students with high and low school performance. As the investigations are not conclusive, the need arises to continue with the approach to this concept that intervenes as one of the most important factors in the learning process [15].
In the absence of an instrument for measuring the different types of motivation described by self-determination theory, Vallerand, Blais, Brière, and Pelletier [16] developed and validated in French the Échelle de Motivation en Éducation (EME). Validation studies revealed that the EME had satisfactory levels of internal consistency, with a mean Cronbach's alpha of 0.80, and high temporal stability, with a mean test-retest correlation of 0.75 after a period of one month [16]. Confirmatory factor analysis supported the seven-factor structure of the EME, and construct validity was tested through correlations between the seven subscales of the instrument. In addition, the EME has been able to predict educational dropout [13].
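As an illustration of the internal consistency statistic mentioned above, the following is a minimal sketch of Cronbach's alpha using the standard formula based on item variances and the variance of the total score. The function name and the response values are hypothetical and are not data from the EME validation studies.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of respondent scores."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(item_scores[0])
    if any(len(item) != n for item in item_scores):
        raise ValueError("every item needs one score per respondent")
    # Total score of each respondent across all items.
    totals = [sum(item[r] for item in item_scores) for r in range(n)]
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical responses: 4 items (rows) answered by 5 respondents (columns).
demo_items = [
    [5, 6, 3, 7, 4],
    [4, 6, 3, 6, 5],
    [5, 7, 2, 6, 4],
    [6, 6, 3, 7, 3],
]
demo_alpha = cronbach_alpha(demo_items)
```

Values closer to 1 indicate that the items of a subscale vary together, which is the sense in which a mean alpha of 0.80 is read as satisfactory.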
The translation of the EME into Spanish was performed in two steps. First, the parallel back-translation procedure was used [17]; second, the resulting items were evaluated by a committee formed by the people who participated in the translation process and two teachers who are experts in the psychology of motivation. They selected the items that had maintained the original meaning and prepared the format and instructions of the scale in an identical way to the original version [1]. The Spanish version is called the Educational Motivation Scale (EME-E). It has been validated in Spanish-speaking countries such as Mexico [15] and Paraguay [18].
The results have revealed that the EME-E has adequate levels of reliability and factorial validity, in consonance with the results of the original version and the English version. Regarding the reliability of the scale, the internal consistency of the subscales has been adequate and very similar to that found in the original version [16] and in the English version [11], as well as in the validation performed in a US sample [19].
Youth is a very important stage in people's lives, in which key development processes occur for the transition to adulthood. During this stage, young people finish their educational training, join the world of work, can participate as citizens, and establish emotional relationships, and they also face risk behaviors such as drug and alcohol use [20]. This period can be especially challenging for individuals with a greater emotional reactivity to stress, who are more vulnerable, and this reactivity can play a significant role in the development of anxiety and depression disorders [21].
Depression and anxiety are conceptually distinct, but in clinical practice and research they overlap, since they frequently occur together [22]. To improve the distinction between anxiety and depression, the Depression, Anxiety and Stress Scales (DASS) were created [23]. The DASS is made up of three scales that measure depression, anxiety, and stress separately [24]. Considering that depression and anxiety represent dimensions and not categories, non-clinical samples were predominantly used in the development of the DASS, and preliminary tests showed that it had adequate convergent and discriminant validity [22]. Some studies have directly tested the construct validity of the DASS, both in clinical and non-clinical samples, in young people and adults [24][25][26][27]. In later years, an abbreviated version of the instrument (DASS-21) was developed for situations in which a shorter application is needed [22]. It was translated and validated in the Hispanic population [28], and its factor structure and psychometric properties have been analyzed, finding a three-factor structure through confirmatory factor analysis and exploratory factor analysis [22]. In the Hispanic population, DASS-21 has been found to show an acceptable fit for a three-factor model and important correlations between scales [28].
In Chile, a validation study of DASS-21 [29] was carried out in middle school students, with an average age of 15 years. To verify the convergent and discriminant validity, the Beck Depression Inventory-II (BDI-II) and the Beck Anxiety Inventory (BAI) [30] were used, obtaining significant correlations between DASS-21 and the BDI-II (r = 0.64), as well as with the BAI (r = 0.57). Together, previous research indicates that DASS-21 has a solid internal consistency and provides an adequate distinction between anxiety and depression, in relation to other existing measures [22].
These studies show that the comorbidity that occurs in clinical practice could have a basis in the common origin of depression, anxiety, and stress, and that this has implications for the management of treatments, since comorbidity increases clinical complexity, reduces the effectiveness of usual treatments, and worsens the prognosis [22]. That is why researchers and clinicians who are responsible for clarifying the origin of an emotional disturbance may find it useful to measure the three states separately in order to plan interventions to prevent these disorders [31]. Following this line, prevention is fundamental in the university population, especially considering the risk that young people face of developing emotional disorders [32].
Other concepts that are widely studied in the psychosocial literature, but that have not yet been developed in the academic area, are self-concept and self-esteem, two constructs linked by the psychosocial area. As a greater number of students with different abilities and knowledge enter the university, it is becoming apparent that attention should go beyond the empirical knowledge they can acquire in this space: in order to train a suitable professional for the world of work, we must also focus on the development of the personal aspects of each student.
Self-esteem and self-concept are complex terms to define because the conceptual boundary between them is not clear. Some authors even use them as synonyms, referring to the knowledge that human beings have of themselves; however, it is also stated that they are perfectly distinguishable, with self-esteem oriented toward value and self-concept toward knowledge [33]. Regarding self-esteem, there is no single definition, but rather a wide variety of definitions that are not exclusive, but are definitely not complementary; a consensus has been reached that it belongs to a dimension or aspect of the self-concept, being an orientation towards the self, the value that the person attributes to himself or herself [34].
Although these concepts have been widely investigated in the psychosocial area, they have not yet been fully developed in the academic area. In approaches where the relationship between self-esteem, self-concept, and academic performance has been evaluated using the Coopersmith Inventory, it has been concluded that self-esteem does positively influence academic performance [35,36].

Sample
The research sample comprised 188 students of the Pontifical Catholic University of Valparaíso, in particular, the School of Computer Engineering. The latter has two majors: Civil Data Engineering and Data Execution Engineering. The sample included students of both genders. The average age of students was 19 years.
Students accessed the test through an online form. Before accessing the first question, the students read a section that explained the objectives of the activity and described the confidential treatment of the data, noting its exclusive use for the study. In addition to making the voluntary nature of their participation explicit, the test required clicking on the <I Understand> button to begin answering, which represented informed consent.

Research Design and Study Variables
For the measurement of personality traits, the Educational Motivation Scale (EME-E) and the Depression, Anxiety and Stress Scales (DASS-21) were used in their Spanish versions [1,22]. The data were obtained by surveying the sample described above. For the measurement of academic performance, total accumulated registered credits were used in this research [37]. Besides this variable, the ranking of university entrance and the number of student family members (relatives) who have gone through higher education were considered. The data for these variables were obtained from the institutional analysis unit of the university.
The Spanish version of the EME, called the Educational Motivation Scale (EME-E), has 28 items that represent the reasons why students go to university; these reasons were scored on a seven-point Likert scale from 1 ("It does not correspond at all") to 7 ("It corresponds completely"), with an intermediate score of 4 ("It corresponds fairly"). The EME-E instrument measures seven dimensions: amotivation, external regulation, introjected regulation, identified regulation, motivation to knowledge, motivation to achieve, and motivation to stimulating experiences [11,16].
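To make the scoring of the seven dimensions concrete, the sketch below averages the 1 to 7 responses within each subscale, assuming four items per subscale (28 items over seven dimensions). The item-to-subscale key used here is a placeholder made of consecutive blocks of four items; it is not the published EME-E key and would need to be replaced before any real use. All names are hypothetical.

```python
from statistics import mean

SUBSCALES = [
    "amotivation",
    "external_regulation",
    "introjected_regulation",
    "identified_regulation",
    "motivation_to_knowledge",
    "motivation_to_achieve",
    "motivation_to_stimulating_experiences",
]

# PLACEHOLDER key (assumption): items 1-4 -> first subscale, 5-8 -> second, etc.
# The real questionnaire assigns items differently; supply the published key.
PLACEHOLDER_KEY = {
    name: list(range(4 * i + 1, 4 * i + 5)) for i, name in enumerate(SUBSCALES)
}


def score_eme(responses, key=PLACEHOLDER_KEY):
    """Return the mean response per subscale for 28 answers (item 1 first)."""
    if len(responses) != 28:
        raise ValueError("expected 28 item responses")
    if any(r not in range(1, 8) for r in responses):
        raise ValueError("responses must be integers from 1 to 7")
    return {
        name: mean(responses[i - 1] for i in items) for name, items in key.items()
    }


# Hypothetical respondent who answers 1 on the first block, 2 on the second, ...
demo_scores = score_eme([v for v in range(1, 8) for _ in range(4)])
```

Passing the key as a parameter keeps the scoring logic separate from the questionnaire layout, so the same function works once the actual key is supplied.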
The abbreviated version of the Depression, Anxiety and Stress Scales (DASS-21) was previously translated and adapted in Chile [38] and later modified [29]; the latter version was used. The DASS-21 has 21 items, with four response alternatives in Likert format, which range from 0 ("It does not describe anything that happened to me or felt in the week") to 3 ("Yes, this happened a lot, or almost always"). To answer, the instructions ask respondents to indicate to what extent each phrase describes what happened to them or what they felt during the last week. This instrument has the advantage of being a brief self-report scale that is easy to administer, answer, and interpret. In addition, it has presented adequate psychometric properties in previous validation studies [25,27,39] and an acceptable fit to a three-factor model in Spanish-speaking samples [28,29,40]. The DASS-21 instrument measures three dimensions: depression, anxiety, and stress.
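The three DASS-21 dimensions can be scored by summing the 0 to 3 responses of the seven items that belong to each scale. The following sketch assumes the item numbering of the commonly published DASS-21 form; whether the Chilean adaptation keeps the same item order is an assumption, and the function name is hypothetical.

```python
# Assumed item key (commonly published DASS-21 numbering, items 1..21).
DASS21_KEY = {
    "depression": [3, 5, 10, 13, 16, 17, 21],
    "anxiety": [2, 4, 7, 9, 15, 19, 20],
    "stress": [1, 6, 8, 11, 12, 14, 18],
}


def score_dass21(responses, key=DASS21_KEY):
    """Return the summed score (0 to 21) per scale for 21 answers (item 1 first)."""
    if len(responses) != 21:
        raise ValueError("expected 21 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("responses must be integers from 0 to 3")
    return {
        scale: sum(responses[i - 1] for i in items) for scale, items in key.items()
    }


# Hypothetical respondent who endorses only item 3 at the maximum level.
demo_answers = [0] * 21
demo_answers[3 - 1] = 3
demo_dass = score_dass21(demo_answers)
```

Keeping the three sums separate, rather than adding them into one total, matches the purpose of the instrument, which is to measure the three states independently.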

Data Analysis
Next, we present the results of the research, which seeks to identify the primary personality traits of engineering students and determine whether those traits are related to academic performance. Table 1 shows the exploratory analysis of the research variables: of the 188 students of the School of Computer Engineering, 90.3% were men and 9.7% were women. On average, two family members (relatives) of each student have gone through higher education; the minimum ranking of university entrance was 438 points and the maximum was 850.

Econometric Analysis
According to the purpose of this study and the literature review, we use a set of variables to determine the relation between each of the personality traits and academic performance in the sample. The general model includes academic performance, measured by total accumulated registered credits, as the endogenous variable, and the following exogenous variables: ranking of university entrance, student family members (relatives) who have gone through higher education, external regulation, introjected regulation, identified regulation, motivation to knowledge, motivation to achieve, motivation to stimulating experiences, and amotivation. Due to the nature of our variables, feedback loops are present in the model. For this reason, we used a cross-sectional model with two-stage least squares (2SLS) regression analysis for this empirical application [41][42][43][44].
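The two-stage procedure can be sketched on synthetic data. The example below is not the model estimated in this study: it uses one regressor x that shares an unobserved factor with the outcome y, and one hypothetical instrument z, since the instruments used here are not listed. The first stage regresses x on z, and the second stage regresses y on the fitted values of x.

```python
import random


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(z, x, y):
    """2SLS with one instrument z for one regressor x; returns (intercept, slope)."""
    a1, b1 = ols_simple(z, x)            # stage 1: project x on the instrument
    x_hat = [a1 + b1 * zi for zi in z]   # fitted values of x
    return ols_simple(x_hat, y)          # stage 2: regress y on the fitted values


def simulate(n=5000, seed=0):
    """Synthetic data (assumption): true effect of x on y is 2.0."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        u = rng.gauss(0, 1)              # unobserved factor affecting both x and y
        xi = zi + u + rng.gauss(0, 1)
        yi = 2.0 * xi + u + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
_, slope_ols = ols_simple(x, y)
_, slope_2sls = two_stage_least_squares(z, x, y)
```

In this setup ordinary least squares overstates the effect because x and the error term share the factor u, while the 2SLS slope stays close to the true value. Standard errors taken directly from the second-stage regression are not valid for inference; dedicated econometric software corrects them.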
The general model in its functional form is specified by Equation (1). The estimation results of the final model, with academic performance as the endogenous variable and its significant exogenous variables, are shown in Table 2, where it is possible to observe the estimation of the general model of Equation (1). Student's t-test of the individual significance of the parameters helps us to evaluate whether a variable should be included in the model specification: if the true value of a parameter were equal to zero, that variable would contribute nothing to explaining the endogenous variable, and vice versa. To test the significance of each regression coefficient, the null hypothesis H0 states that the coefficient is equal to zero, and the alternative hypothesis H1 states the opposite (the coefficient is different from zero). The p-value indicates whether a statistical hypothesis test is significant and how significant the result is: the smaller the p-value, the stronger the evidence against the null hypothesis. The results presented in Table 2 show adequate levels of individual significance (Student's t-test), as well as of joint significance (F-statistic), which assesses the joint effect of all the variables in the model. Furthermore, the model presents adequate information criteria (Akaike, Schwarz, and Hannan-Quinn).
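The individual significance test described above can be illustrated for a single regressor. This is a minimal sketch, not the estimation behind Table 2: it computes the slope, its t statistic, and a two-sided p-value for H0 (slope equal to zero). As a stated simplification, the p-value uses the standard normal distribution instead of Student's t distribution, which is a reasonable approximation only for large samples. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def slope_significance(x, y):
    """Return (slope, t_statistic, p_value) for H0: slope = 0 in y = a + b * x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares and standard error of the slope.
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = sqrt(ssr / (n - 2) / sxx)
    t_stat = slope / std_err
    # Normal approximation to the two-sided p-value (assumption, see lead-in).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return slope, t_stat, p_value


# Hypothetical data: one outcome that depends on x and one that does not.
xs = list(range(10))
noise = [1, -1] * 5
related = slope_significance(xs, [2 * xi + e for xi, e in zip(xs, noise)])
unrelated = slope_significance(xs, noise)
```

A small p-value, as in the first case, is evidence against H0 and supports keeping the variable in the specification; a large one, as in the second case, does not.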

Paired Sample Statistics Analysis
We ran a paired-samples t-test to determine whether the mean difference between total accumulated registered credits and the total accumulated registered credits estimated by Equation (1) is zero, i.e., we are interested in evaluating whether the estimates made by our predictive model are correct (see Tables 3 and 4). In Table 3, the mean of total accumulated registered credits is 4.23707 and the mean estimated by Equation (1) is 3.92668; the two values are similar, differing by only 0.31039 points. The paired samples correlations in Table 4 support this similarity (correlation coefficient: 0.358; sig. 0.002).
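The quantities reported in a paired-samples analysis can be computed as in the following sketch, which returns the mean difference, the paired t statistic, and the Pearson correlation between observed and estimated values. The function name and the small data set are hypothetical; they are not the study data.

```python
from math import sqrt


def paired_summary(observed, predicted):
    """Return (mean_difference, t_statistic, pearson_r) for paired samples."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    diffs = [o - p for o, p in zip(observed, predicted)]
    mean_diff = sum(diffs) / n
    # Sample standard deviation of the paired differences.
    sd_diff = sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t_stat = mean_diff / (sd_diff / sqrt(n))
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    ss_o = sum((o - mo) ** 2 for o in observed)
    ss_p = sum((p - mp) ** 2 for p in predicted)
    return mean_diff, t_stat, cov / sqrt(ss_o * ss_p)


# Hypothetical observed and model-estimated values.
demo_mean_diff, demo_t, demo_r = paired_summary([2, 4, 6, 8], [1, 4, 5, 8])
```

The t statistic addresses whether the mean difference departs from zero, while the correlation describes how closely the two series move together; the two answer different questions and are reported separately.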

Results
The econometric analysis shows that the significant variables are gender, ranking of university entrance, external regulation (EME-E), introjected regulation (EME-E), identified regulation (EME-E), motivation to knowledge (EME-E), and motivation to achieve (EME-E). External regulation (EME-E), introjected regulation (EME-E), and motivation to knowledge (EME-E) had a positive impact on academic performance, while identified regulation (EME-E) and motivation to achieve (EME-E) had a negative impact. Regarding gender, the result indicates that men had greater academic performance than women, and ranking of university entrance had a positive impact on academic performance. In addition, we have verified that there are no significant differences between the total accumulated registered credits and the estimated total accumulated registered credits, which validates the model used.

Conclusions
Although academic performance is one of the most commonly used ways to evaluate the progress of the development of professionals, and relations have been observed between performance and variables such as personality traits, it is necessary to reflect on the training process as such, beyond quantifiable measures, which are known to depend on a multiplicity of factors. However, attempts such as this study to know personality traits and their relations to academic performance seek to bring us closer to a better understanding of the learning process. Such understanding allows us to advance in the improvement of the quality of education, which does not imply that this work attempts to simplify the educational process as it relates to performance. The objective is not only to determine the relation between personality and academic performance, but also to identify elements that contribute to the training of well-rounded professionals [45].
The results of this study indicate that it is possible to observe a marked tendency in the sample of students toward a manner of being. The traits external regulation (EME-E), introjected regulation (EME-E), identified regulation (EME-E), motivation to knowledge (EME-E), and motivation to achieve (EME-E) are all significant and explain academic performance. For the students of the School of Computer Engineering of the Pontificia Universidad Católica de Valparaíso, Chile, men have greater academic performance than women, which could be related to psychological factors such as low self-esteem and a lack of encouragement during secondary schooling [46], and ranking of university entrance has a positive impact on academic performance.
It is striking that motivation to knowledge has a positive impact on academic performance, whereas motivation to achieve has a negative impact; this could be a result of the nature of computer science students, who are in a career to acquire more knowledge rather than to accumulate particular achievements. All of those results were validated using the paired samples statistics analysis technique, where it was possible to verify that there were no significant differences between the results of the model and the sample data.
From the literature, we could think that external regulation (EME-E), introjected regulation (EME-E), identified regulation (EME-E), and motivation to knowledge (EME-E) are protective factors against the negative impact that stress can have on the mental health of students (DASS-21 questionnaire results). We understand that the higher the motivation indicators are, the lower the probability of a negative impact on mental health due to academic load [47]. However, how can we positively impact the motivation of our students? In previous research [48], gender and the effects of collaborative learning were compared, and a series of lines of work were proposed. These are relevant because they stress the importance of intervening at the group level: we can act on motivation by strengthening positive relationships between classmates, since students develop a positive interdependence that increases motivation, i.e., it acts as positive social pressure and learning is valued positively.
Although the personality traits of the students may be considered more complex constructs, various investigations have suggested that they can be influenced by development, by learning, and by the daily experience of various skills, such as emotional intelligence. It is clear that personality is an important element to consider in improving professional training, and for that, it is necessary to address the challenge critically. The objective of the curriculum in engineering should also be to train the ethical, aesthetic, humanistic, and political person so that future engineers can participate in projects and organizations in the national setting with world-class quality in addition to mastering their specific engineering specialty. Thus, it is necessary to stop focusing excessively on specialization and allow more space for the development of integrating processes that favor communication among disciplines, professions, and the societies that support them [49].
The importance of measuring anxiety, depression, and stress comes from the fact that these are mental health disorders. If a student shows indicators that lead us to suspect such a disorder, he or she will need psychoeducational support to carry out the school-university transition properly, taking into account that depression affects the learning process in students [50]; therefore, we could consider the DASS-21 instrument as a predictor of which students will require more support in the process of integration into the university environment. Young university students are more prone to depression due to a number of personal, social, and academic factors, as demonstrated in a study [50]. We conclude that it is important to change the written and hidden curriculum to a student-friendly curriculum, in which professionals from the psychosocial areas provide support during the school-university transition mentioned above, facilitating learning spaces around self-knowledge and stress and anxiety management strategies so that students can improve their academic results.
In order to safeguard the learning and social process of young people who are immersed in university, it is imperative to carry out psychosocial interventions, mainly from the field of prevention, through an induction that facilitates the process of adapting to university. When students who may be more likely to have mental health disorders are detected, it is necessary to support them in two ways: counseling or individual therapy, and workshops for a targeted group of students at risk. Research indicates that the approaches with the best results are cognitive, behavioral, and mindfulness interventions, mainly in reducing stress in students [51]; this could be a good path to start an intervention aimed at student assets.
Therefore, as agents of change in the university environment, we can propose to positively impact the academic space by problematizing and understanding the basic difficulties that our students have, taking into consideration that the university is a new and different space that generates feelings and questions about the academic ability of students. The contribution of this research has, as its central axis, the involvement of psychosocial and academic variables, from which we can validate the importance of promoting autonomous motivation strategies in all students, understanding that higher education should focus on learning as a whole and not on grades, and should provide students with programs that guide them in vocational and psychosocial terms [52].