# Academic Success Assessment through Version Control Systems


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Version Control System

#### 2.2. Input Data

- An anonymized student identifier (id), used to differentiate samples.
- Number of commit operations made by a given student (commits).
- Number of days with at least one commit (days).
- Average number of commits per day (commits/day).
- Number of lines added to the source code during the assignment (additions).
- Number of lines removed from the source code during the assignment (deletions).
- Number of issues opened during the assignment (issues-opened).
- Number of issues closed during the assignment (issues-closed).
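For illustration, aggregating raw commit records into the per-student features above can be sketched as follows (a stdlib-only Python sketch over hypothetical sample records; the field names and data are assumptions, not the study's actual schema):

```python
from collections import defaultdict

# Hypothetical raw commit records: (student id, date, lines added, lines deleted)
commits = [
    ("s1", "2019-03-01", 120, 10),
    ("s1", "2019-03-01", 30, 5),
    ("s1", "2019-03-04", 15, 40),
    ("s2", "2019-03-02", 200, 0),
]

def build_features(records):
    """Aggregate raw commit records into the per-student features listed above."""
    per_student = defaultdict(lambda: {"commits": 0, "days": set(),
                                       "additions": 0, "deletions": 0})
    for sid, day, added, deleted in records:
        f = per_student[sid]
        f["commits"] += 1       # commits counter
        f["days"].add(day)      # distinct days with at least one commit
        f["additions"] += added
        f["deletions"] += deleted
    return {
        sid: {
            "commits": f["commits"],
            "days": len(f["days"]),
            "commits_per_day": f["commits"] / len(f["days"]),
            "additions": f["additions"],
            "deletions": f["deletions"],
        }
        for sid, f in per_student.items()
    }

features = build_features(commits)
```

In practice these counters would be obtained from the VCS itself (e.g., the commit log), but the aggregation logic is the same.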

#### 2.3. Models Design and Evaluation

- Adaptive Boosting (AB). Ensemble methods are techniques that combine several base classifiers, turning weak learners into a more accurate method. Boosting is one of the most successful families of ensemble methods, and AB is one of the most popular boosting algorithms.
- Classification And Regression Tree (CART). A decision tree is a method that predicts the label associated with an instance by traveling from the root node of a tree to a leaf [19]. It is a non-parametric method in which the trees are grown in an iterative, top-down process.
- K-Nearest Neighbors (KNN). Although the nearest-neighbor concept is the foundation of many other learning methods, notably unsupervised ones, supervised neighbor-based learning is also available to classify data with discrete labels. It is a non-parametric technique that classifies new observations based on their distance to the observations in the training set. A good presentation of the analysis is given in [20,21].
- Linear Discriminant Analysis (LDA). This is a parametric method that assumes that the data follow a multivariate Gaussian distribution [21]. Furthermore, LDA assumes knowledge of the population parameters; otherwise, the maximum likelihood estimator can be used. LDA uses a Bayesian approach to select the category that maximizes the conditional probability; see [22,23,24].
- Logistic Regression (LR). Linear methods are intended for regression tasks in which the target value is expected to be a linear combination of the input variables. LR, despite its name, is a linear model for classification rather than regression. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.
- Multi-Layer Perceptron (MLP). An artificial neural network is a model inspired by the structure of the brain. Neural networks are used when the type of relationship between inputs and outputs is not known. The network is organized into layers of nodes (an input layer, an output layer, and one or more hidden layers). These layers form a directed graph in which each layer is fully connected to the next one. An MLP is a modification of the standard linear perceptron whose key advantage is its ability to distinguish data that are not linearly separable. An MLP uses back-propagation for training the network; see [25,26].
- Random Forest (RF). This is a classifier consisting of a collection of decision trees, in which each tree is constructed by applying an algorithm to the training set and an additional random vector that is sampled via bootstrap re-sampling [28].
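To make the nearest-neighbor idea above concrete, the following is a minimal, stdlib-only KNN classifier over toy two-dimensional points (an illustrative sketch with made-up data; it is not the implementation used in the study):

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k closest training points.

    `train` is a list of ((x, y), label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters labeled "fail" and "pass"
train = [((0, 0), "fail"), ((0, 1), "fail"), ((1, 0), "fail"),
         ((5, 5), "pass"), ((5, 6), "pass"), ((6, 5), "pass")]

print(knn_predict(train, (0.5, 0.5)))   # near the first cluster -> "fail"
print(knn_predict(train, (5.5, 5.5)))   # near the second cluster -> "pass"
```

The same majority-vote logic underlies library implementations; in practice one would tune k and the distance metric via cross-validation rather than fixing them as here.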

## 3. Results

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| AB | Adaptive Boosting |
| CART | Classification And Regression Tree |
| ICT | Information and Communications Technology |
| KNN | K-Nearest Neighbors |
| KPI | Key Performance Indicator |
| LA | Learning Analytics |
| LDA | Linear Discriminant Analysis |
| LMS | Learning Management System |
| LR | Logistic Regression |
| ML | Machine Learning |
| MLP | Multi-Layer Perceptron |
| MoEv | Module Evaluator |
| NB | Naive Bayes |
| RF | Random Forest |
| SIS | Student Institutional System |
| VCS | Version Control System |

## References

1. Siemens, G.; Gasevic, D. Guest editorial-Learning and knowledge analytics. *Educ. Technol. Soc.* **2012**, 15, 1–2.
2. Siemens, G.; Dawson, S.; Lynch, G. *Improving the Quality and Productivity of the Higher Education Sector: Policy and Strategy for Systems-Level Deployment of Learning Analytics*; Society for Learning Analytics Research for the Australian Office for Learning and Teaching: Canberra, Australia, 2013.
3. Gašević, D.; Dawson, S.; Rogers, T.; Gasevic, D. Learning analytics should not promote one size fits all: The effects of instructional conditions in predicting academic success. *Internet High. Educ.* **2016**, 28, 68–84.
4. Guerrero-Higueras, Á.M.; DeCastro-García, N.; Matellán, V.; Conde, M.Á. Predictive models of academic success: A case study with version control systems. In Proceedings of the Sixth International Conference on Technological Ecosystems for Enhancing Multiculturality, Salamanca, Spain, 24–26 October 2018; pp. 306–312.
5. Guerrero-Higueras, Á.M.; DeCastro-García, N.; Rodriguez-Lera, F.J.; Matellán, V.; Conde, M.Á. Predicting academic success through students' interaction with Version Control Systems. *Open Comput. Sci.* **2019**, 9, 243–251.
6. Guerrero-Higueras, Á.M.; DeCastro-García, N.; Matellán, V. Detection of cyber-attacks to indoor real time localization systems for autonomous robots. *Robot. Auton. Syst.* **2018**, 99, 75–83.
7. Kovacic, Z. Predicting student success by mining enrolment data. *Res. High. Educ. J.* **2012**, 15, 1–20.
8. Agudo-Peregrina, Á.F.; Iglesias-Pradas, S.; Conde-González, M.Á.; Hernández-García, Á. Can we predict success from log data in VLEs? Classification of interactions for learning analytics and their relation with performance in VLE-supported F2F and online learning. *Comput. Hum. Behav.* **2014**, 31, 542–550.
9. Barber, R.; Sharkey, M. Course correction: Using analytics to predict course success. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, Vancouver, BC, Canada, 29 April–2 May 2012; pp. 259–262.
10. Spinellis, D. Version control systems. *IEEE Softw.* **2005**, 22, 108–109.
11. Fischer, M.; Pinzger, M.; Gall, H. Populating a release history database from version control and bug tracking systems. In Proceedings of the International Conference on Software Maintenance, Amsterdam, The Netherlands, 22–26 September 2003; pp. 23–32.
12. Pilato, C.M.; Collins-Sussman, B.; Fitzpatrick, B.W. *Version Control with Subversion: Next Generation Open Source Version Control*; O'Reilly Media, Inc.: Sebastopol, CA, USA, 2008.
13. Torvalds, L.; Hamano, J. Git: Fast Version Control System. Available online: http://git-scm.com (accessed on 21 February 2020).
14. Guerrero-Higueras, A.M.; Matellán-Olivera, V.; Esteban-Costales, G.; Fernández-Llamas, C.; Rodríguez-Sedano, F.J.; Ángel, C.M. Model for evaluating student performance through their interaction with version control systems. In Proceedings of the Learning Analytics Summer Institute (LASI), New York, NY, USA, 11–13 June 2018.
15. Guerrero-Higueras, Á.M.; Sánchez-González, L.; Fernández-Llamas, C.; Conde, M.Á.; Lera, F.J.R.; Sedano, F.J.R.; Costales, G.E.; Matellán, V. Prediction of academic success through interaction with version control systems. In Proceedings of the Seventh International Conference on Technological Ecosystems for Enhancing Multiculturality, Salamanca, Spain, 24–26 October 2018; pp. 284–289.
16. De Alwis, B.; Sillito, J. Why are software projects moving from centralized to decentralized version control systems? In Proceedings of the 2009 ICSE Workshop on Cooperative and Human Aspects on Software Engineering, Vancouver, BC, Canada, 17 May 2009; pp. 36–39.
17. Griffin, T.; Seals, S. GitHub in the classroom: Not just for group projects. *J. Comput. Sci. Coll.* **2013**, 28, 74.
18. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. *Mach. Learn.* **2006**, 63, 3–42.
19. Friedman, J.; Hastie, T.; Tibshirani, R. *The Elements of Statistical Learning*, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2009; Volume 1.
20. Devroye, L.; Györfi, L.; Lugosi, G. *A Probabilistic Theory of Pattern Recognition*; Springer: Berlin/Heidelberg, Germany, 2013; Volume 31.
21. Duda, R.O.; Hart, P.E.; Stork, D.G. *Pattern Classification*; John Wiley & Sons: Hoboken, NJ, USA, 2012.
22. Bishop, C.M. Pattern recognition. *Mach. Learn.* **2006**, 128, 1–58.
23. Koller, D.; Friedman, N. *Probabilistic Graphical Models: Principles and Techniques*; MIT Press: Cambridge, MA, USA, 2009.
24. Murphy, K.P. *Machine Learning: A Probabilistic Perspective*; MIT Press: Cambridge, MA, USA, 2012.
25. Rummelhart, D.E. Learning internal representations by error propagation. In *Parallel Distributed Processing*; University of California, Institute for Cognitive Science: San Diego, CA, USA, 1986.
26. Cybenko, G. Approximation by superpositions of a sigmoidal function. *Math. Control Signals Syst.* **1989**, 2, 303–314.
27. Zhang, H. The optimality of naive Bayes. *AA* **2004**, 1, 3.
28. Breiman, L. Random forests. *Mach. Learn.* **2001**, 45, 5–32.

**Figure 1.** MoEv operation scheme as shown in [5].

| Subject | Year | Course | Practical Assignments | Support |
|---|---|---|---|---|
| Computer Programming I | 1st | 2018–2019 | 3 | 372 |
| Computer Organization | 1st | 2018–2019 | 5 | 240 |
| Operating Systems Extension | 2nd | 2016–2017 | 1 | 72 |
| | | 2017–2018 | 1 | 47 |

| Classifier | RF | KNN | AB | LR | LDA | CART | NB | MLP |
|---|---|---|---|---|---|---|---|---|
| Score | 0.78 | 0.72 | 0.70 | 0.67 | 0.67 | 0.67 | 0.63 | 0.63 |

| Classifier | Class | P | R | $F_1$-Score | Support |
|---|---|---|---|---|---|
| RF | P | 0.79 | 0.85 | 0.82 | 27 |
| | F | 0.76 | 0.68 | 0.72 | 19 |
| | avg/total | 0.78 | 0.78 | 0.78 | 46 |
| KNN | P | 0.72 | 0.85 | 0.78 | 27 |
| | F | 0.71 | 0.53 | 0.61 | 19 |
| | avg/total | 0.72 | 0.72 | 0.71 | 46 |
| AB | P | 0.76 | 0.70 | 0.73 | 27 |
| | F | 0.62 | 0.68 | 0.65 | 19 |
| | avg/total | 0.70 | 0.70 | 0.70 | 46 |
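The $F_1$ values follow directly from precision and recall ($F_1 = 2PR/(P+R)$), and the avg/total rows are support-weighted averages over the two classes; a quick stdlib check (note that P = 0.76, R = 0.68 for the RF class-F row implies $F_1 \approx 0.72$):

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

def weighted_avg(values, supports):
    """Support-weighted average, as in the avg/total rows."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)

# RF classifier: per-class F1 from the reported precision and recall
print(round(f1(0.79, 0.85), 2))  # class P -> 0.82
print(round(f1(0.76, 0.68), 2))  # class F -> 0.72
# Support-weighted F1 over both classes (supports 27 and 19)
print(round(weighted_avg([0.82, 0.72], [27, 19]), 2))  # -> 0.78
```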

**Table 4.** Comparison of accuracy scores between the models proposed in this work (first row) and the models proposed in [5] (second row).

| Classifier | RF | KNN | AB | LR | LDA | CART | NB | MLP |
|---|---|---|---|---|---|---|---|---|
| Models proposed in this work | 0.78 | 0.72 | 0.70 | 0.67 | 0.67 | 0.67 | 0.63 | 0.63 |
| Models proposed in [5] | 0.70 | 0.60 | 0.40 | 0.60 | 0.60 | 0.60 | 0.50 | 0.40 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Guerrero-Higueras, Á.M.; Fernández Llamas, C.; Sánchez González, L.; Gutierrez Fernández, A.; Esteban Costales, G.; Conde González, M.Á.
Academic Success Assessment through Version Control Systems. *Appl. Sci.* **2020**, *10*, 1492.
https://doi.org/10.3390/app10041492
