# Modeling E-Behaviour, Personality and Academic Performance with Machine Learning

## Abstract

## Featured Application

**The e-behaviour, personality and performance evaluation frameworks described in this article can be used by students and academic staff alike to monitor performance and online behaviour as it relates to performance. Being aware of e-behavioural patterns is a starting point to improving the academic performance of individual students and groups of students. The methodology can be used to inform the extent to which a course is to be adapted such that it encourages students to engage in behaviour that promotes better academic performance.**

## 1. Introduction

- Define a framework for personality traits and behaviour in the context of student online engagement;
- Show the relationship between Bourdieu’s Three Forms of Capital and academic performance;
- Show the relationship between personalities and academic performance through e-behaviours;
- Show that we can use e-behaviour and machine learning to predict student performance;
- Highlight the importance of the explainability of modelled personality traits and e-behaviours.

- We present a framework and methodology for arriving at predictive models for student performance starting with personality traits. These traits are the drivers of online behaviours that generate features that are predictive of performance.
- We argue for the use of online behaviours as proxies for the personality traits Conscientiousness and Extraversion.
- We demonstrate that online behaviours that are strongly associated with the identified personality traits correlate with student performance in a statistically significant way.

#### 1.1. Literature Review

#### Contribution to Existing Evaluation Systems

- The contextual framework for the research;
- The metric construct used for measuring each of these variables and outcomes;
- How the variables are used in an academic setting.

#### 1.2. Bourdieu’s Three Forms of Capital and Student Success

“[…] the application of sociological concepts and methods to the analysis of the production, distribution, exchange, and consumption of goods and services.”

#### 1.2.1. Social Capital

“The aggregate of the actual or potential resources which are linked to the possession of a durable network of more or less institutionalised relationships of mutual acquaintance and recognition.”

#### 1.2.2. Cultural Capital

- Institutionalised cultural capital (highest degree of education);
- Embodied cultural capital (values, skills, knowledge and tastes);
- Objectified cultural capital (possession of cultural goods).

#### 1.2.3. Economic Capital

## 2. Methodology

#### 2.1. Data Preprocessing

#### 2.2. Importance and Choice of Personality Traits

#### 2.3. Encoding Personality Traits

‘the relatively stable and enduring aspects of individuals which distinguish them from other people and form the basis of our predictions concerning their future behaviour’.

#### Challenges against Encoding Personality Traits

#### 2.4. Encoding Performance

#### 2.5. Student Background

#### Feature Selection Using RFE

- Optimise the Decision Tree weights with respect to their objective function on a set of features, F;
- Compute the ranking of importance for the features in F using the Decision Tree optimiser;
- Prune the features with the lowest rankings from F;
- Repeat the three steps above on the pruned set until the specified number of features is reached.
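
The elimination loop above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the pipeline used in the study: the `rfe` function, the `toy_importance` scorer, the feature names `f0`…`f6` and their weights are all hypothetical stand-ins. In practice the importance ranking would come from a Decision Tree refitted on the remaining features in each round.

```python
from typing import Callable, Dict, List


def rfe(features: List[str],
        importance_fn: Callable[[List[str]], Dict[str, float]],
        n_keep: int,
        step: int = 1) -> List[str]:
    """Recursive feature elimination: rank, prune the weakest, repeat."""
    if n_keep < 1 or step < 1:
        raise ValueError("n_keep and step must be positive")
    remaining = list(features)
    while len(remaining) > n_keep:
        # Fit on the current feature set and score every remaining feature.
        scores = importance_fn(remaining)
        ranked = sorted(remaining, key=lambda f: scores[f])  # weakest first
        # Prune the lowest-ranked features without dropping below n_keep.
        n_drop = min(step, len(remaining) - n_keep)
        dropped = set(ranked[:n_drop])
        remaining = [f for f in remaining if f not in dropped]
    return remaining


# Hypothetical importances (assumption): placeholders for the feature
# importances a fitted Decision Tree would report.
TOY_WEIGHTS = {"f0": 0.40, "f1": 0.25, "f2": 0.15, "f3": 0.10,
               "f4": 0.06, "f5": 0.03, "f6": 0.01}


def toy_importance(subset: List[str]) -> Dict[str, float]:
    """Renormalise the toy weights over the features still in play."""
    total = sum(TOY_WEIGHTS[f] for f in subset)
    return {f: TOY_WEIGHTS[f] / total for f in subset}


selected = rfe(list(TOY_WEIGHTS), toy_importance, n_keep=5)
print(selected)
```

With `step=1` the loop removes one feature per round, which is the slowest but most conservative setting; a larger `step` trades ranking fidelity for speed when the one-hot encoded feature space is large.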

#### 2.6. Extraversion and Academic Groups

#### Forum Posts and Extraversion

#### 2.7. Student Discussions

Algorithm 1: Correlation Between Mean Discussion Grade and Student Grade.
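
One way such a correlation could be computed is sketched below. It assumes, based on the columns of the corresponding results table (${d}_{i}$, ${s}_{i}$, $\mathbb{E}[G{d}_{i}]$, $G{d}_{i}({s}_{i})$), that one student ${s}_{i}$ is sampled at random from each discussion ${d}_{i}$ and that their grade is paired with the mean grade of that discussion's participants. Whether the sampled student is excluded from the mean is not stated, so the sketch leaves them in. The function names and the toy data are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def discussion_pairs(members: Dict[int, List[int]],
                     grades: Dict[int, float],
                     rng: random.Random) -> List[Tuple[float, float]]:
    """For each discussion, pair its mean grade with one random member's grade."""
    pairs = []
    for d in sorted(members):
        students = members[d]
        if not students:
            continue
        mean_grade = sum(grades[s] for s in students) / len(students)
        sampled = rng.choice(students)  # the random student s_i
        pairs.append((mean_grade, grades[sampled]))
    return pairs


# Toy data (assumption): discussion -> participants, student -> grade.
toy_members = {0: [1, 2, 3], 1: [2, 4], 2: [4, 5, 6], 3: [1, 6]}
toy_grades = {1: 82.0, 2: 75.5, 3: 90.0, 4: 48.0, 5: 55.25, 6: 61.0}

pairs = discussion_pairs(toy_members, toy_grades, random.Random(0))
r = pearson_r([m for m, _ in pairs], [g for _, g in pairs])
print(round(r, 3))
```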

#### 2.8. Student Collaboration Groups

**Collaboration group policy** specified in the paragraph below. Under this policy, not every student qualifies to host a Collaboration group.

**Collaboration group Policy:** A candidate group ${c}_{*}$ becomes a Collaboration group, ${c}_{i}$, if and only if it contains more than two students, $n\left({c}_{*}\right)>2$. For example, if a candidate host ${h}_{*}$ shares a discussion with ${s}_{1}$, ${s}_{2}$ and ${s}_{3}$, then ${h}_{*}$ qualifies as a Host, ${h}_{i}$, and ${c}_{i}=\{{s}_{1},{s}_{2},{s}_{3}\}$. If $n\left({c}_{*}\right)\le 2$, then ${h}_{*}$ remains a candidate until they share discussions with enough additional students for the group to exceed two members.
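
The policy can be read as a filter over candidate hosts. The sketch below assumes that a candidate group ${c}_{*}$ is the set of distinct students who share at least one discussion with the candidate host ${h}_{*}$, excluding the host. The function name `collaboration_groups` and the toy discussion data are hypothetical.

```python
from typing import Dict, List, Set


def collaboration_groups(discussions: Dict[int, List[int]],
                         min_size: int = 3) -> Dict[int, Set[int]]:
    """Map each qualifying host to the students they share a discussion with."""
    candidates: Dict[int, Set[int]] = {}
    for participants in discussions.values():
        unique = set(participants)
        for host in unique:
            # Everyone else in the discussion joins this host's candidate group.
            candidates.setdefault(host, set()).update(unique - {host})
    # Policy: a candidate group qualifies only if n(c*) > 2,
    # i.e. it has at least `min_size` = 3 members.
    return {h: c for h, c in candidates.items() if len(c) >= min_size}


# Toy data (assumption): discussion id -> ids of the students who posted in it.
toy_discussions = {0: [1, 2, 3], 1: [1, 4], 2: [5, 6]}
groups = collaboration_groups(toy_discussions)
print(groups)
```

In the toy data only student 1 qualifies as a Host: they share discussions with students 2, 3 and 4, whereas every other student has at most two collaborators and remains a candidate.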

#### 2.9. Logins and Conscientiousness

#### 2.10. Behaviour–Personality and Behaviour Model

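The **input** for each student was the array of sequences $[\left\{L{\left(s\right)}_{t}\right\},\left\{E{\left(s\right)}_{t}\right\},\left\{C{\left(s\right)}_{t}\right\}]$.

The **output** for each student was a Safety Score: Flagged for At-risk students, and Ignored for Safe students. These per-student B-PM **input** and **output** structures are summarised in Table 9.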
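
As a concrete illustration of the input structure, the sketch below stacks a login sequence with Extraversion-level and Conscientiousness-level sequences into a $(3\times 17)$ array. It assumes, from the example values given for the B-PM features ($[8,8,\dots ,8]$ and $[1.1,1.1,\dots ,1.1]$), that each personality level is repeated across the 17 time steps. The helper name `build_bpm_input` and the login counts are hypothetical.

```python
from typing import List

N_STEPS = 17  # sequence length, taken from the (1 x 17) feature shapes


def build_bpm_input(logins: List[int],
                    extraversion: int,
                    conscientiousness: float) -> List[List[float]]:
    """Stack the three per-student sequences into a 3 x N_STEPS array."""
    if len(logins) != N_STEPS:
        raise ValueError(f"expected {N_STEPS} login counts, got {len(logins)}")
    return [
        [float(x) for x in logins],           # {L(s)_t}: login counts per step
        [float(extraversion)] * N_STEPS,      # {E(s)_t}: level repeated (assumption)
        [float(conscientiousness)] * N_STEPS  # {C(s)_t}: level repeated (assumption)
    ]


# Hypothetical login counts for one student (not data from the study).
toy_logins = [3, 7, 5, 2, 0, 4, 6, 1, 3, 3, 8, 2, 0, 1, 5, 4, 0]
x = build_bpm_input(toy_logins, extraversion=8, conscientiousness=1.1)
print(len(x), len(x[0]))
```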

#### 2.11. Algorithms for E-Behaviour, Personality and Performance Analysis

#### 2.11.1. Decision Tree Classifier

#### DTC Architecture

#### Gini Impurity Index—Decision Factors

#### Gini Calculation

#### 2.11.2. Ordinary Least Squares Linear Regression Analysis

#### 2.11.3. Validity of OLS Regression Models

- Normality of model residuals. The residual for each point was given by ${y}_{i}-{\widehat{y}}_{i}$. ${s}^{2}+{k}^{2}$ was computed for the residuals, where s is the z-score returned by the test for skewness and k is the z-score returned by the test for kurtosis.
- Residual Independence or lack of Autocorrelation in Residuals.
- Linearity in Parameters.
- Homoscedasticity of Residuals.
- Zero Conditional Mean.
- No Multicollinearity in Independent Variables.
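
For the normality check, the statistic ${s}^{2}+{k}^{2}$ has the same form as the D'Agostino-Pearson omnibus statistic (the form computed, for instance, by `scipy.stats.normaltest`). Assuming that is the test intended here, the statistic is approximately chi-squared with two degrees of freedom under the null hypothesis of normally distributed residuals, which gives a closed-form $p$-value:

$$K^{2}={s}^{2}+{k}^{2}\sim {\chi }_{2}^{2},\qquad p=\Pr \left({\chi }_{2}^{2}\ge K^{2}\right)={e}^{-K^{2}/2}.$$

For example, $K^{2}=5.99$ gives $p\approx 0.05$, so residuals with a larger statistic would be rejected as non-normal at the 5% level.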

#### 2.12. Long Short-Term Memory

- ${\mathbf{h}}_{t-1}$ and ${\mathbf{X}}_{t}$ were fed into the forget gate (or function) $\mathbf{f}$, whose output ${\mathbf{f}}_{t}$ lay in the open interval $(0,1)$. ${\mathbf{f}}_{t}$ then interacted with the previous cell state ${\mathbf{c}}_{t-1}$ through element-wise multiplication (⨂), producing an interim cell state, ${\mathbf{f}}_{t}{\mathbf{c}}_{t-1}$. At this stage, ${\mathbf{f}}_{t}{\mathbf{c}}_{t-1}$ represented a state that had forgotten those parts of the previous cell state ${\mathbf{c}}_{t-1}$ that were deemed unimportant (importance was regulated by weight coefficients that were trained and stored in their respective weight matrices).
- Whereas the forget gate ${\mathbf{f}}_{t}$ regulated the extent to which previous data were forgotten, the input gate ${\mathbf{i}}_{t}$ regulated the extent to which new data, taken from the matrix composed of ${\mathbf{h}}_{t-1}$ and ${\mathbf{X}}_{t}$, were added, scaled by their importance.
- The $\tanh$ gate also received ${\mathbf{h}}_{t-1}$ and ${\mathbf{X}}_{t}$, but used the hyperbolic tangent function to compute its outputs (between −1 and 1).
- The outputs of the $\tanh$ gate and ${\mathbf{i}}_{t}$ were then multiplied element-wise and added (⨁) to ${\mathbf{f}}_{t}{\mathbf{c}}_{t-1}$, giving ${\mathbf{c}}_{t}$, shown in Equation (20).
- The output gate ${\mathbf{o}}_{t}$ decided which values to output, given ${\mathbf{h}}_{t-1}$ and ${\mathbf{X}}_{t}$, and also computed its exposure to the following cell state based on trained importance.
- Finally, the values of the cell state, ${\mathbf{c}}_{t}$, were passed through a $\tanh$ function and multiplied by the output gate result, ${\mathbf{o}}_{t}$, such that the LSTM unit kept in ${\mathbf{h}}_{t}$ only the output that it deemed important, as described by Equation (23).
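
The steps above can be sketched as a single forward step of an LSTM unit. This is a plain-Python illustration of the standard formulation, not the implementation used in the study. It assumes each gate applies its own weight matrix and bias to the concatenation of ${\mathbf{h}}_{t-1}$ and ${\mathbf{X}}_{t}$; the names `lstm_step`, `params` and the gate keys `"f"`, `"i"`, `"g"`, `"o"` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W: Matrix, b: Vector, v: Vector) -> Vector:
    """Compute W v + b row by row."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, Tuple[Matrix, Vector]]) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns the new hidden state h_t and cell state c_t."""
    v = h_prev + x_t  # concatenation of h_{t-1} and X_t
    f_t = [sigmoid(z) for z in affine(*params["f"], v)]    # forget gate
    i_t = [sigmoid(z) for z in affine(*params["i"], v)]    # input gate
    g_t = [math.tanh(z) for z in affine(*params["g"], v)]  # tanh gate (candidate)
    o_t = [sigmoid(z) for z in affine(*params["o"], v)]    # output gate
    # Cell-state update: forget part of the old state, add scaled new data.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # Hidden state: squash the cell state and expose what the output gate allows.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Usage with all-zero weights (hidden size 2, input size 3): every sigmoid
# gate outputs 0.5 and the tanh gate outputs 0, so half of the old cell
# state is kept and nothing new is added.
zero = ([[0.0] * 5 for _ in range(2)], [0.0, 0.0])
zero_params = {"f": zero, "i": zero, "g": zero, "o": zero}
h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], zero_params)
print(h_t, c_t)
```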

#### LSTM Problem Design

#### 2.13. Evaluation Metrics for Student Risk Classification

#### Results Summary

#### The Overall Accuracy of a Model

## 3. Results

#### 3.1. Background Data and Grade

#### Classifying a Student Based on Background Data

#### 3.2. Extraversion-Level and Grade

#### 3.3. Conscientiousness-Level and Grade

#### 3.4. Student Discussions and Grade

#### 3.5. Student Collaboration—Groups and Grade

#### 3.6. B-PM and Outcome

#### 3.7. BM and Outcome

## 4. Discussion

#### 4.1. Background and Grade

#### 4.2. Extraversion-Level and Grade

#### 4.3. Conscientiousness-Level and Grade

#### 4.4. Academic Groups and Social Capital

The quality of a student’s social capital is the quality of their Academic group’s performance.

#### Academic Group Size Constraints

#### 4.5. BM and B-PM

#### 4.6. BM and the Trade-Off Waterfall

- ${t}_{2019}^{*}$ was chosen to be the value of t that yielded the maximum value of $\kappa \left(t\right)$ on the 2018 data.
- ${t}_{2019}^{*}$ was based on exogenous considerations determined by the institution’s stakeholders. Examples of exogenous considerations were the urgency required for intervention and resources required to make interventions.
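
The first rule amounts to an argmax over the $\kappa \left(t\right)$ curve of the earlier cohort. A minimal sketch follows, assuming $\kappa \left(t\right)$ has already been computed for each candidate week t. The function name is hypothetical, the tie-breaking rule (prefer the earliest week) is an assumption, and the $\kappa$ values are made-up placeholders chosen only to mimic a peak at $t=18$; they are not results from the study.

```python
from typing import Dict


def choose_intervention_week(kappa_by_week: Dict[int, float]) -> int:
    """Return the week t that maximises kappa(t); ties go to the earliest week."""
    if not kappa_by_week:
        raise ValueError("kappa_by_week must not be empty")
    best = max(kappa_by_week.values())
    # Assumption: when several weeks tie, intervening earlier is preferred.
    return min(t for t, k in kappa_by_week.items() if k == best)


# Placeholder kappa values (assumption), not figures reported in the article.
toy_kappa = {15: 0.30, 16: 0.34, 17: 0.37, 18: 0.40, 19: 0.38, 20: 0.39, 21: 0.36}
t_star = choose_intervention_week(toy_kappa)
print(t_star)
```

The second rule has no comparable formula: when stakeholders fix ${t}_{2019}^{*}$ on exogenous grounds, the same dictionary can simply be inspected to see how much agreement is given up relative to the peak.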

#### Practical Benefits and Limitations of the Trade-Off Waterfall

- The general trade-off was that $\kappa \left(t\right)$ increased as t increased;
- The trade-off peaked at some point: in this case at $t=18$, three weeks before the examination period. It may therefore not be worth waiting for the start of an examination period (such as $t=21$) before conducting interventions. For example, the Trade-off Waterfall showed that, after $t=18$, there was no benefit in waiting an extra one, two or even three weeks to intervene, because $\kappa \left(19\right)$, $\kappa \left(20\right)$, $\kappa \left(21\right)<\kappa \left(18\right)$. This peak is specific to the login data of this cohort; different cohorts and datasets from those presented in this report may produce different peak periods.

## 5. Limitations and Future Work

#### 5.1. Alternative Formulations of Personalities

#### 5.2. Extraversion Levels

## 6. Conclusions

- Economic Capital—modelled by the Financial Assistance;
- Cultural Capital—modelled by the Quintile in Province and Township School;
- Social Capital—modelled by Academic groups.

## Author Contributions

## Funding

## Institutional Review Board Statement

## Conflicts of Interest

## References

1. Richiţeanu-Năstase, E.R.; Stăiculescu, C. University dropout. Causes and solution. Ment. Health Glob. Chall. J. **2018**, 1, 71–75.
2. Wright, D.; Taylor, A. Introducing Psychology: An Experimental Approach; Penguin Books: London, UK, 1970.
3. Heppner, P.P.; Wampold, B.E.; Owen, J.; Wang, K.T.; Thompson, M.N. Research Design in Counseling, 4th ed.; Cengage Learning: Boston, MA, USA, 2015.
4. Stone, A.A. Hove, East Sussex, United Kingdom; Lawrence Erlbaum: Mahwah, NJ, USA, 2000.
5. Northrup, D.A. The Problem of the Self-Report in Survey Research: Working Paper; Institute for Social Research, York University: North York, ON, Canada, 1997.
6. Fellegi, I.P. The Evaluation of the Accuracy of Survey Results: Some Canadian Experiences. Int. Stat. Rev./Rev. Int. Stat. **1973**, 41, 1–14.
7. Costa, P.T.; McCrae, R.R. The NEO Personality Inventory; Psychological Assessment Resources: Odessa, FL, USA, 1985.
8. Poropat, A.E. A meta-analysis of the five-factor model of personality and academic performance. Psychol. Bull. **2009**, 135, 322.
9. Furnham, A.; Nuygards, S.; Chamorro-Premuzic, T. Personality, assessment methods and academic performance. Instr. Sci. **2013**, 41, 975–987.
10. Ciorbea, I.; Pasarica, F. The study of the relationship between personality and academic performance. Procedia-Soc. Behav. Sci. **2013**, 78, 400–404.
11. Kumari, B. The correlation of Personality Traits and Academic performance: A review of literature. IOSR J. Humanit. Soc. Sci. **2014**, 19, 15–18.
12. Morris, P.E.; Fritz, C.O. Conscientiousness and procrastination predict academic coursework marks rather than examination performance. Learn. Individ. Differ. **2015**, 39, 193–198.
13. Chamorro-Premuzic, T.; Furnham, A. Personality predicts academic performance: Evidence from two longitudinal university samples. J. Res. Personal. **2003**, 37, 319–338.
14. Kim, S.; Fernandez, S.; Terrier, L. Procrastination, personality traits, and academic performance: When active and passive procrastination tell a different story. Personal. Individ. Differ. **2017**, 108, 154–157.
15. Costa, P.T., Jr.; McCrae, R.R. The Revised NEO Personality Inventory (NEO-PI-R); Sage Publications, Inc.: Newbury Park, CA, USA, 2008.
16. Wilt, J.; Revelle, W. Extraversion. In Handbook of Individual Differences in Social Behavior; The Guilford Press: New York, NY, USA, 2009; pp. 27–45.
17. APA Dictionary of Psychology–Gregariousness. 2020. Available online: https://dictionary.apa.org/gregariousness (accessed on 29 December 2020).
18. Merriam-Webster. Dutiful. Available online: https://www.merriam-webster.com/dictionary/dutifulness (accessed on 29 November 2020).
19. Akçapınar, G. Profiling students’ approaches to learning through moodle logs. In Proceedings of the Multidisciplinary Academic Conference on Education, Teaching and Learning (MAC-ETL 2015), Prague, Czech Republic, 4–6 December 2015.
20. Huang, A.Y.; Lu, O.H.; Huang, J.C.; Yin, C.J.; Yang, S.J. Predicting students’ academic performance by using educational big data and learning analytics: Evaluation of classification methods and learning logs. Interact. Learn. Environ. **2020**, 28, 206–230.
21. Khan, I.A.; Brinkman, W.P.; Fine, N.; Hierons, R.M. Measuring personality from keyboard and mouse use. In Proceedings of the 15th European Conference on Cognitive Ergonomics: The Ergonomics of Cool Interaction, Funchal, Portugal, 16–19 September 2008; pp. 1–8.
22. Fowler, G.C.; Glorfeld, L.W. Predicting aptitude in introductory computing: A classification model. AEDS J. **1981**, 14, 96–109.
23. Poh, N.; Smythe, I. To what extend can we predict students’ performance? A case study in colleges in South Africa. In Proceedings of the 2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Orlando, FL, USA, 9–12 December 2014; pp. 416–421.
24. Evans, G.E.; Simkin, M.G. What Best Predicts Computer Proficiency? Commun. ACM **1989**, 32, 1322–1327.
25. Dauter, L.G. Economic Sociology. 2016. Available online: https://www.britannica.com/topic/economic-sociology (accessed on 29 December 2020).
26. Bourdieu, P.; Richardson, J.G. The forms of capital. In Cultural Theory: An Anthology; John Wiley & Sons: Hoboken, NJ, USA, 1986.
27. Carpiano, R.M. Toward a neighborhood resource-based theory of social capital for health: Can Bourdieu and sociology help? Soc. Sci. Med. **2006**, 62, 165–175.
28. Hallinan, M.T.; Smith, S.S. Classroom characteristics and student friendship cliques. Soc. Forces **1989**, 67, 898–919.
29. Song, L. Social capital and psychological distress. J. Health Soc. Behav. **2011**, 52, 478–492.
30. Hayes, E. Elaine Hayes on “The Forms of Capital”. 1997. Available online: https://web.english.upenn.edu/~jenglish/Courses/hayes-pap.html (accessed on 2 November 2021).
31. Smith, E.; White, P. What makes a successful undergraduate? The relationship between student characteristics, degree subject and academic success at university. Br. Educ. Res. J. **2015**, 41, 686–708.
32. Caldas, S.J.; Bankston, C. Effect of School Population Socioeconomic Status on Individual Academic Achievement. J. Educ. Res. **1997**, 90, 269–277.
33. Fan, J. The Impact of Economic Capital, Social Capital and Cultural Capital: Chinese Families’ Access to Educational Resources. Sociol. Mind **2014**, 4, 272–281.
34. Ajzen, I. Attitudes, Personality, and Behavior; McGraw-Hill Education: New York, NY, USA, 2005.
35. Campbell, D.T. Social Attitudes and Other Acquired Behavioral Dispositions. In Psychology: A Study of a Science. Study II. Empirical Substructure and Relations with Other Sciences. Investigations of Man as Socius: Their Place in Psychology and the Social Sciences; McGraw-Hill: New York, NY, USA, 1963; Volume 6, pp. 94–172.
36. Hemakumara, G.; Ruslan, R. Spatial Behaviour Modelling of Unauthorised Housing in Colombo, Sri Lanka. KEMANUSIAAN Asian J. Humanit. **2018**, 25, 91–107.
37. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. **2002**, 46, 389–422.
38. Barrick, M.R.; Mount, M.K.; Strauss, J.P. Conscientiousness and performance of sales representatives: Test of the mediating effects of goal setting. J. Appl. Psychol. **1993**, 78, 715.
39. Campbell, J.P. Modeling the performance prediction problem in industrial and organizational psychology. In Handbook of Industrial and Organizational Psychology; Consulting Psychologists Press: Palo Alto, CA, USA, 1990.
40. Stigler, S.M. Gauss and the Invention of Least Squares. Ann. Stat. **1981**, 9, 465–474.
41. Blumberg, M.; Pringle, C.D. The missing opportunity in organizational research: Some implications for a theory of work performance. Acad. Manag. Rev. **1982**, 7, 560–569.
42. Leontjeva, A.; Kuzovkin, I. Combining Static and Dynamic Features for Multivariate Sequence Classification. In Proceedings of the 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Montreal, QC, Canada, 17–19 October 2016; pp. 21–30.
43. Safavian, S.R.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. **1991**, 21, 660–674.
44. Mitchell, T.M. Machine Learning, 1st ed.; McGraw-Hill, Inc.: New York, NY, USA, 1997; Chapter 10.
45. Raileanu, L.; Stoffel, K. Theoretical Comparison between the Gini Index and Information Gain Criteria. Ann. Math. Artif. Intell. **2004**, 41, 77–93.
46. Khalaf, A.; Hashim, A.; Akeel, W. Predicting Student Performance in Higher Education Institutions Using Decision Tree Analysis. Int. J. Interact. Multimed. Artif. Intell. **2018**, 5, 26–31.
47. Topîrceanu, A.; Grosseck, G. Decision tree learning used for the classification of student archetypes in online courses. Procedia Comput. Sci. **2017**, 112, 51–60.
48. Kolo, K.D.; Adepoju, S.A.; Alhassan, J.K. A decision tree approach for predicting students academic performance. Int. J. Educ. Manag. Eng. **2015**, 5, 12.
49. Gujarati, D.N.; Porter, D.C. Basic Econometrics; McGraw Hill Inc.: New York, NY, USA, 2009.
50. Liu, Z.; Sullivan, C.J. Prediction of weather induced background radiation fluctuation with recurrent neural networks. Radiat. Phys. Chem. **2019**, 155, 275–280.
51. Wang, M.; Zhang, Y.D.; Cui, G. Human motion recognition exploiting radar with stacked recurrent neural network. Digit. Signal Process. **2019**, 87, 125–131.
52. Bengio, Y.; Boulanger-Lewandowski, N.; Pascanu, R. Advances in Optimizing Recurrent Networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013.
53. Olah, C. Understanding LSTM Networks. 2015. Available online: https://colah.github.io/posts/2015-08-Understanding-LSTMs/ (accessed on 19 April 2019).
54. Hand, D.; Christen, P. A note on using the F-measure for evaluating record linkage algorithms. Stat. Comput. **2018**, 28, 539–547.
55. Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. **1960**, 20, 37–46.
56. Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics **1977**, 33, 159–174.
57. Hung, J.L.; Zhang, K. Revealing online learning behaviors and activity patterns and making predictions with data mining techniques in online teaching. MERLOT J. Online Learn. Teach. **2009**, 4, 426–437.
58. Romero, C.; Ventura, S.; Espejo, P.; Martínez, C. Data Mining Algorithms to Classify Students; Journal of Education and Data Mining: Memphis, TN, USA, 2008; pp. 8–17.
59. Bhandari, H.; Yasunobu, K. What Is Social Capital? A Comprehensive Review of the Concept. Asian J. Soc. Sci. **2009**, 37, 480–510.
60. Costa, P.; McCrae, R.; Kay, G. Persons, Places, and Personality: Career Assessment Using the Revised NEO Personality Inventory. J. Career Assess. **1995**, 3, 123–139.

**Figure 1.** The LSTM unit kept a cell state throughout its operations, which served as input in the next time step. It also output ${\mathbf{h}}_{t}$, which supplemented the input ${\mathbf{X}}_{t}$ in the following time step. From Olah [53].

**Figure 2.** Pearson Correlation Coefficients of the Chosen Features. Quintile and Township School had the highest correlation with Grade.

**Figure 6.** Random-Student’s Grades against Collaboration-group’s Grade Averages. $r=0.479$, $p=0.004$.

**Figure 8.** Crude Post Count against Student Grade. There was a positive relationship between Post Count and Grade that was not suitable for a linear OLS fit.

System | Continuously Proactive | Corrective | Easily Feasible at Scale | Reliable |
---|---|---|---|---|

Questionnaires | ✗ | ✗ | ✓ | ✗ |

Previous Grades | ✗ | ✗ | ✓ | ✓ |

Consultation | ✗ | ✓ | ✗ | ✓ |

e-behaviour Models | ✓ | ✗ | ✓ | To be shown |

Transformation Phase | Categorical Features | Total Features |
---|---|---|

Before One-hot | 169 | 176 |

After One-hot | 6616 | 6623 |

After RFE | 5 | 5 |

Feature | Description | Type | Values |
---|---|---|---|

Quintile | To which of the five categories a student’s high-school belongs under the South African Government school standards; a 6 indicates private high-schools | Categorical | 1–6 |

Gauteng Province | Whether a student completed their ultimate year of high-school at a school in GP (Gauteng Province) | Binary | No, Yes |

Gender | Whether the student was female or male | Binary | Female, Male |

Financial Assistance | Whether a student received financial aid from the National Student Financial Aid Scheme | Binary | No, Yes |

Township School | Whether a student’s high-school was situated in a township area | Binary | No, Yes |

Grade (label) | Grade points out of 100 obtained, as defined in Section 2.4 on Encoding Performance | Continuous | 0.0–100.0 |

Outcome (label) | Risk of the student based on their Grade, as defined in Section 2.4 on Encoding Performance | Binary | At-Risk, Safe |

Feature | Description | Type | Possible Values |
---|---|---|---|

Discussion | Discussion number. A message that begins a topic and the messages posted in response to it were assigned the same discussion number. | Categorical | 0–337 |

Message | Contents of each forum post. | String | - |

Time | Extracted from the Created variable. Indicates the time at which each message was posted. | yyyy-mm-dd | 2018-01-05 to 2019-01-05 |

Grade (label) | Number of points out of 100 (Section 2.4). | Number | 0.0–100.0 |

E | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

${G}_{E}$ | 55.8 | 64.4 | 68.3 | 66.1 | 69.3 | 73.3 | 66.0 | 69.7 | 71.4 | 70.2 | 81.3 | 79.4 | 75.2 | 74.4 | 79.3 |

${\mathbf{d}}_{\mathbf{i}}$ | ${\mathbf{s}}_{\mathbf{i}}$ | $\mathbb{E}\left[{\mathbf{Gd}}_{\mathbf{i}}\right]$ | ${\mathbf{Gd}}_{\mathbf{i}}\left({\mathbf{s}}_{\mathbf{i}}\right)$ |
---|---|---|---|

0 | 23 | 83.08 | 92.75 |

1 | 728 | 56.81 | 59.25 |

2 | 833 | 49.75 | 48.50 |

⋮ | ⋮ | ⋮ | ⋮ |

336 | 79 | 75.35 | 90.75 |

337 | 15 | 74.87 | 70.25 |

s | ${\mathbf{d}}_{0}$ | ${\mathbf{d}}_{1}$ | ${\mathbf{d}}_{2}$ | ${\mathbf{d}}_{3}$ | ${\mathbf{d}}_{4}$ | … | ${\mathbf{d}}_{337}$ |
---|---|---|---|---|---|---|---|

1 | 0 | 0 | 0 | 0 | 0 | … | 1 |

2 | 1 | 1 | 0 | 0 | 0 | … | 0 |

3 | 0 | 0 | 1 | 1 | 0 | … | 0 |

… | … | … | … | … | … | … | … |

1131 | 0 | 1 | 0 | 1 | 0 | … | 0 |

1132 | 1 | 0 | 0 | 0 | 0 | … | 0 |

1133 | 0 | 1 | 0 | 0 | 0 | … | 1 |

${\mathbf{h}}_{\mathbf{i}}$ | ${\mathbf{c}}_{\mathbf{i}}$ | ${\mathbf{Gc}}_{\mathbf{i}}\left({\mathbf{h}}_{\mathbf{i}}\right)$ | $\mathbb{E}\left[{\mathbf{Gc}}_{\mathbf{i}}\right]$ |
---|---|---|---|

1 | {5, 48, 3, 138} | 73.00 | 47.68 |

2 | {119, 172, 199} | 81.80 | 67.62 |

3 | {40, 35, 20, 16, 51} | 90.75 | 69.80 |

4 | {90, 200, 28, 33, 94, 142, 42, 101, 84} | 49.00 | 62.08 |

5 | {81, 209, 143, 206, 12, 150} | 98.25 | 63.04 |

6 | {142, 33, 28, 42} | 59.25 | 58.25 |

7 | {65, 190, 107, 8, 173} | 46.60 | 74.51 |

Feature | Shape | Type | Example Value |
---|---|---|---|

$\left\{L{\left(s\right)}_{t}\right\}$ Login Sequence | $(1\times 17)$ | A Sequence of Whole Numbers | $[3,7,\dots ,0]$ |

$\left\{E{\left(s\right)}_{t}\right\}$ Extraversion Sequence (Extraversion-level Sequence) | $(1\times 17)$ | A Sequence of Whole Numbers | $[8,8,\dots ,8]$ |

$\left\{C{\left(s\right)}_{t}\right\}$ Conscientiousness Sequence (Conscientiousness-level Sequence) | $(1\times 17)$ | A Sequence of Real Numbers | $[1.1,1.1,\dots ,1.1]$ |

Input: $[\left\{L{\left(s\right)}_{t}\right\},\left\{E{\left(s\right)}_{t}\right\},\left\{C{\left(s\right)}_{t}\right\}]$ | $(3\times 17)$ | An Array of Sequences of Real Numbers | $[[3,7,\dots ,0],\ [8,8,\dots ,8],\ [1.1,1.1,\dots ,1.1]]$ |

Output: Safety Score = $\{\text{Flagged},\text{Ignored}\}$ | $(1\times 1)$ | Binary | Ignored |

Outcome (True Label) | Safety Score (Prediction): Flagged | Safety Score (Prediction): Ignored | Total |
---|---|---|---|

At-risk | a | b | $a+b$ |

Safe | c | d | $c+d$ |

Total | $a+c$ | $b+d$ | $N=a+b+c+d$ |

Outcome | Precision | Recall |
---|---|---|

At-risk | $\frac{a}{a+c}$ | $\frac{a}{a+b}$ |

Safe | $\frac{d}{d+b}$ | $\frac{d}{d+c}$ |
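
The definitions above translate directly into code. The sketch below computes the per-class precision and recall from the four confusion-matrix cells, together with the standard Cohen's $\kappa$ (observed agreement corrected for chance agreement). The function name `risk_metrics` and the dictionary keys are hypothetical, and the sketch assumes every row and column total is non-zero. As a check, the counts $a=27$, $b=19$, $c=12$, $d=112$ reported in one of the result tables reproduce the precision, recall and $\kappa =0.51$ printed there.

```python
from typing import Dict


def risk_metrics(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Precision, recall and Cohen's kappa for a 2 x 2 risk confusion matrix.

    a: At-risk and Flagged    b: At-risk but Ignored
    c: Safe but Flagged       d: Safe and Ignored
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement: product of the marginal proportions, summed per class.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return {
        "precision_at_risk": a / (a + c),
        "recall_at_risk": a / (a + b),
        "precision_safe": d / (d + b),
        "recall_safe": d / (d + c),
        "kappa": (p_observed - p_expected) / (1 - p_expected),
    }


metrics = risk_metrics(a=27, b=19, c=12, d=112)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```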

$\mathbf{\kappa}$ | Level of Agreement |
---|---|

<0.00 | Worse than chance |

0.00–0.20 | Slight agreement |

0.21–0.40 | Fair agreement |

0.41–0.60 | Moderate agreement |

0.61–0.80 | Substantial agreement |

0.81–1.00 | Near-perfect agreement |

Outcome (True Label) | Safety Score (Prediction): Flagged | Safety Score (Prediction): Ignored | Total |
---|---|---|---|

At-risk | 107 | 157 | 264 |

Safe | 153 | 533 | 686 |

Total | 260 | 690 | 950 |

Outcome | Precision | Recall |
---|---|---|

At-risk | 0.41 | 0.41 |

Safe | 0.77 | 0.78 |

$\kappa $ = 0.18

Linear Equation: ${\widehat{G}}_{E}=1.269E+62.422$

Feature | Coeff. | $r$ | $p$-Value | Coeff. 95% CI |
---|---|---|---|---|

E | 1.269 | 0.846 | 0.000 | [0.771, 1.767] |

Intercept | 62.422 | - | 0.000 | [58.354, 66.491] |

Linear Equation: $\widehat{G}\left(s\right)=5.988C\left(s\right)+39.829$

Feature | Coeff. | $r$ | $p$-Value | Coeff. 95% CI |
---|---|---|---|---|

$C\left(s\right)$ | 5.988 | 0.319 | 0.000 | [4.129, 7.847] |

Intercept | 39.829 | - | 0.000 | [34.426, 45.232] |

Linear Equation: $\widehat{G}{d}_{i}\left({s}_{i}\right)=0.528\mathbb{E}\left[G{d}_{i}\right]+35.607$

Feature | Coeff. | $r$ | $p$-Value | Coeff. 95% CI |
---|---|---|---|---|

$\mathbb{E}\left[G{d}_{i}\right]$ | 0.528 | 0.421 | 0.011 | (0.131, 0.925) |

Intercept | 35.607 | - | 0.014 | (7.789, 63.425) |

**Table 16.** OLS regression summary: random student’s Grades against Collaboration group’s Grade averages.

Linear Equation: $\widehat{G}{c}_{i}\left({h}_{i}\right)=0.984\mathbb{E}\left[G{c}_{i}\right]+5.175$

Feature | Coeff. | $r$ | $p$-Value | Coeff. 95% CI |
---|---|---|---|---|

$\mathbb{E}\left[G{c}_{i}\right]$ | 0.984 | 0.479 | 0.004 | (0.334, 1.663) |

Intercept | 5.175 | - | 0.797 | (−35.501, 0.975) |

Outcome (True Label) | Safety Score (Prediction): Flagged | Safety Score (Prediction): Ignored | Total |
---|---|---|---|

At-risk | 27 | 19 | 46 |

Safe | 12 | 112 | 124 |

Total | 39 | 131 | 170 |

Outcome | Precision | Recall |
---|---|---|

At-risk | 0.69 | 0.59 |

Safe | 0.85 | 0.90 |

$\kappa $ = 0.51

Outcome | Safety Score: Flagged | Safety Score: Ignored | Total |
---|---|---|---|

At-risk | 27 (27) | 19 (19) | 46 |

Safe | 22 (12) | 102 (112) | 124 |

Total | 49 (39) | 121 (131) | 170 |

Outcome | Precision | Recall |
---|---|---|

At-risk | 0.55 (0.69) | 0.59 (0.59) |

Safe | 0.84 (0.85) | 0.82 (0.90) |

$\kappa $ = 0.40 ($\kappa $ = 0.51)

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Seota, S.B.-W.; Klein, R.; van Zyl, T.
Modeling E-Behaviour, Personality and Academic Performance with Machine Learning. *Appl. Sci.* **2021**, *11*, 10546.
https://doi.org/10.3390/app112210546
