The Effect of Green Software: A Study of Impact Factors on the Correctness of Software

Unfortunately, sustainability is an issue that is very poorly addressed when developing software and hardware systems. Recently, in order to contribute to the sustainability of the Earth, a new concept has emerged: Green software, that is, computer software that can be developed and used efficiently and effectively with minimal or no impact on the environment. Currently, new teaching methods based on students' learning process are being developed in the European Higher Education Area. Most of them are oriented towards promoting students' interest in the course contents and offering personalized feedback. Online judging is a promising method for encouraging students' participation in the e-learning process, although it still has to be researched and developed to be used widely and more efficiently. The great amount of data available in an online judging tool makes it possible to explore some of the most indicative attributes (e.g., running time, memory) for learning programming concepts, techniques and languages. So far, the most widely applied methods for automatically gathering information from judging systems are based on statistics and, although they provide reasonable correlations, they have not been proven to provide enough information for predicting grades when dealing with a huge amount of data. Therefore, the main novelty of this paper is a data mining approach to predict program correctness as well as the grades of the students' practices. For this purpose, powerful data mining technologies taken from the artificial intelligence domain have been used. In particular, in this study, we have used logistic regression, decision trees, artificial neural networks and support vector machines, which have been identified as the most suitable ones for prediction tasks in e-learning domains. The results have achieved an accuracy of around 74%, both in the prediction of program correctness and in the prediction of practice grades.
Another relevant contribution of this paper is a comparison among these four techniques to determine which obtains the best accuracy in predicting grades, based on the availability of data as well as their taxonomy. The Decision Trees classifier has obtained the best confusion matrix, and time and memory efficiency were identified as the most important predictor variables. In view of these results, we can conclude that the development of green software leads programmers to implement correct software.


Introduction
Information and Communications Technology (ICT) plays a part in many aspects of human activity. The role of ICT in the UN Sustainable Development Goals [1] has been highlighted by many authors [2][3][4]. However, ICT plays a dual role [5]: it can help to reduce the environmental impact of other activities but, at the same time, it is a significant consumer of resources and therefore also a major contributor to the negative impact of human activities. The ICT industry represents approximately 2% of global carbon dioxide (CO2) emissions [6], an amount equivalent to that of aviation [7]. Furthermore, it is estimated that ICT will account for 3% of global emissions by 2020 [8], although recent research, based on extrapolation from current trends, argues that "for the near future (up to 2020) the current energy consumption plateau will remain while the number of users continuous to grow but at a decreasing rate" [9]. Clearly, the energy consumption caused by ICT is growing exponentially [10], which has a negative impact on sustainability. The relative contribution of ICT to Greenhouse Gas Emissions (GHGE) could grow to 14% of the 2016-level worldwide GHGE by 2040. Regarding energy efficiency, software system performance can be improved from different angles, such as coding styles and algorithms [11]. There is no standardized definition of green (or sustainable) software [12]; nevertheless, the protection of resources is a commonly addressed issue. We will focus on environmentally sustainable software that is used efficiently and effectively.
After a decade of the implementation of the European Higher Education Area (EHEA) [13,14] in Spain, we are starting to observe profound changes in teaching/learning processes, thanks to the new methodologies it has fostered. It is well known that the purpose of the EHEA was to develop new teaching methods based on the students' learning process rather than on the teacher's point of view. From this perspective, the new methods should stimulate students' interest and offer appealing material, fair assessment and relevant feedback.
Based on these ideas and principles, we developed an innovative experience in a course on "Algorithms and Data Structures" (ADS) in the second year of a Computer Science Degree, using a web-based automatic judging system called Mooshak [15]. The course was organized as a series of activities in a continuous evaluation context. Many of these activities used the online judging system; some of them were done individually, while others were done collaboratively by groups of students. These three elements (online judging, continuous evaluation and collaborative work) were the keys to generating motivation and enthusiasm among students. The result of that experience was a significant decrease in dropout rates and a general improvement in marks and pass rates [16]. The data collected were valuable material for decision-making processes. In fact, since academic performance and future professional achievements are considered to be related, analysing the success factors behind online collaborative programming activities may help to understand and improve performance in team programming within large programming projects.
During the last several years, improvements in Artificial Intelligence (AI) methods have made it possible to develop expert systems and Decision Support Systems (DSS) in many different fields such as business, health, psychology, environmental science and education [17,18]. Among AI classifiers, we highlight Artificial Neural Networks (ANNs) and Decision Trees (DT) as the most widely chosen for the construction of DSS [19,20]. In many cases, the goal is to group the data into clusters with similar features. Such tasks are unsupervised, since there are no reference labels or expected classifications, and the resulting insight is therefore data-driven.
We have previously demonstrated that AI methods are capable of improving the accuracy of the final classification as well as of selecting the best features, since the number of features to deal with is very often huge. For instance, we have experienced this in classification tasks for male fertility [21,22] and urology diagnosis [23,24]. This leads to knowledge discovery in databases, or data mining: the process of extracting patterns from large data sets.
As mentioned above, the online judging system Mooshak [25] was used to evaluate the work of the students. This learning environment is a free automatic tool for evaluating the correctness of computer programs. Although this system was designed to support programming competitions, it has also been used both to evaluate programs in computer programming courses and to assess test exams [16,26-28]. Students, instructors and system administrators have different web-based interfaces. For each problem proposed in Mooshak, the instructor introduces a set of pairs (program input, expected program output). This set is used to check the correctness of the programs submitted by the students for that problem. An important advantage of using the online judge is the availability of a lot of information about the students' submissions, which can be analyzed by the teachers in real time.
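The judging principle just described (compare a program's output with the expected output for every instructor-supplied pair) can be illustrated with a minimal sketch. It assumes, for simplicity, that a submission is a Python function mapping an input string to an output string and that outputs are compared after stripping trailing whitespace. The names `judge`, `add_two` and `subtract_two` are hypothetical. A real judge such as Mooshak compiles and executes the submitted programs and applies its own comparison rules.

```python
from typing import Callable, List, Tuple

# Hypothetical test case format: (program input, expected output).
TestCase = Tuple[str, str]


def judge(program: Callable[[str], str], tests: List[TestCase]) -> str:
    """Return "Accepted" only if the program reproduces every expected output."""
    for given_input, expected_output in tests:
        try:
            actual_output = program(given_input)
        except Exception:
            return "Rejected"  # a crash counts as a rejected submission
        # Assumed comparison rule: ignore trailing whitespace only.
        if actual_output.rstrip() != expected_output.rstrip():
            return "Rejected"
    return "Accepted"


# Toy problem: print the sum of two integers given on one line.
def add_two(line: str) -> str:
    return str(sum(int(tok) for tok in line.split()))


def subtract_two(line: str) -> str:  # a wrong submission
    a, b = (int(tok) for tok in line.split())
    return str(a - b)


tests = [("1 2", "3\n"), ("10 -4", "6\n")]
print(judge(add_two, tests))       # Accepted
print(judge(subtract_two, tests))  # Rejected
```

A submission is accepted ("A") only if every pair matches. A single mismatch or crash yields a rejection ("R"), which is the binary outcome later predicted in this study.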
Educational data mining (EDM) is an emerging research field aimed at discovering knowledge, making decisions, and providing recommendations. Although education is a novel data mining application, a recent review identified 240 EDM works describing 222 EDM approaches and 18 EDM tools between 2010 and the first quarter of 2013 [29]. Previous reviews of EDM [30,31] identified more than 250 studies carried out between 1973 and 2010. Studies were classified into various educational categories (e.g., analysis and visualization of data, providing feedback for supporting instructors, recommendations for students, predicting students' performance, student modeling, student behavior modeling, student assessment, grouping students) and different variants of e-learning systems (e.g., web-based courses, learning management systems, and adaptive and intelligent web-based educational systems).
Regarding Green Computer Science as part of ICT and sustainable development [32], ICT sustainability has regrettably not been widely addressed in education [5]. Pattinson highlights that "ICT sustainability does not feature more strongly in school or University ICT curricula, and that, where it does appear, it is often a sub-part of a broader topic" [5]. There are, however, some exceptions in which sustainability is taken into account and its benefits are explained. For instance, the objective of the paper presented in [33] is part of a wider goal: to provide recommendations regarding the redesign of pre-service teacher training curricula and learning programmes.
In [34], data from 2413 students in grades 6, 9, and 12 from 51 schools across Sweden were used to study the effectiveness of Education for Sustainable Development (ESD). The results of this study reveal the key role ESD plays in addressing sustainable development (SD), paving the way for a more sustainable future.
The review in [35] has emphasised the lack of environmental competences among pre-service teacher students and the gaps in teacher training curricula regarding environmental education (EE). The overall scarcity of research in this area, together with certain gaps and methodological limitations, confirms the need to strengthen the evidence base. Although sustainability is one of the greatest problems affecting humanity nowadays, it does not appear to receive the coverage it deserves in education [5]. The attention given to ICT sustainability as a subject in the education curriculum is insufficient, and sustainability should be considered "a core element of the teaching, education and research in ICT" [5]. However, there exist some initiatives, such as a green computer science program [36] or a master program in green ICT [37].
Two kinds of works have been carried out in EDM: approaches based on predictive models and approaches based on descriptive models. Predictive approaches, which account for around 60% of all the studies proposed in EDM in the last four years (e.g., [38,39]), usually employ supervised learning functions to estimate unknown values of dependent variables based on the values of related independent variables [40]. In contrast, descriptive models (e.g., [41,42]) frequently use unsupervised learning functions to obtain patterns that explain the structure and relations of the mined data [43]. A model is implemented by means of a task. For example, clustering [38,41], association rules [44] and correlation analysis [42] generate descriptive models, whereas classification [45], regression [39] and categorization [46] produce predictive models. Most of the studies carried out in the last four years use classification and clustering for EDM. Once a task is chosen, there are many methods and techniques to build the approach. Bayes theorem and DT [38,45], instance-based learning [47] and hidden Markov models [46] are the four most used methods in EDM, whereas LR [39], frequencies [39] and hierarchical clustering [47] are the most used techniques. Lastly, an algorithm, equation, and/or frame is implemented to mine the source data. K-means, expectation maximization, J48, and Naive Bayes are the most deployed algorithms [38,47]. Statistical equations, including descriptive statistics [41], are the most popular equations, and several versions of Bayesian networks [45] are the most used frames.
Several studies have compared the accuracy of data mining techniques [48,49]. In [49], data mining techniques were employed to study the achievements of Computer Engineering Department students at Karabük University. The parameters studied were age, gender, type of high school graduation, and whether the students were studying in distance or regular education. Two prediction/classification methods, ANN and DT, were used and compared with each other. DT algorithms produced the best prediction results, with 97.8% overall accuracy compared with 94.4% for ANN.
Various EDM studies have investigated which types of modules or features influence students when learning collaboratively [50]. Data mining techniques were used to classify chat messages in online communication with the aim of providing learners with real-time adaptive feedback while learning collaboratively [51]. A frequent sequential pattern mining algorithm was used to find the patterns characterizing some aspects of teamwork, and to identify significant sequences indicative of problems in teams of five to seven students [52]. Three types of events (wiki events, ticket events and SVN events) reflecting the students' learning process were analyzed to draw conclusions from the traces of their collaboration. A clustering technique was applied to data collected in a software development project course to characterize the collaborative work of stronger and weaker students [53]. In this work, seven groups of five to seven students (43 students in total) were clustered based on 11 attributes that seemed to capture essential aspects of team performance (e.g., average number of lines deleted per wiki edit).
To the best of our knowledge, no previous data mining study has analyzed such large data sets from an online judging tool. The only related work found in the literature is a statistical analysis (not an artificial intelligence technique) of 89,402 programs written to 60 specifications in an online judge, carried out to study the effectiveness of software diversity [54]. The results of our empirical study have been used to identify factors that have an effect on students' performance. They will also help to define future directions for improving students' knowledge and skills in programming courses.
The aim of this paper is to identify, by means of a data mining approach, the main factors for predicting program correctness, students' practice grades and students' use of computers. These factors allow us to analyze the reasons why students consume electrical resources such as computers, monitors, and so on. The consumption of computers has become an issue because of high electricity costs and problems with power supply capacity. In many cases, students use computers inefficiently, because computers draw around 70-90% of their maximum power even when doing no useful work [8]. Consequently, the objective is to generalize the lessons learned into good practices for students in Education for Sustainable Development (ESD).
One of the novel aspects of our proposal is the use of big data provided by online judging systems to obtain the predictor variables. In our approach, we make use of AI methods to tackle this problem. In particular, we use four supervised techniques: Logistic Regression (LR), DT, Multilayer Perceptron (MLP) and Support Vector Machines (SVM). Some of these techniques can reveal rules and correlations between the input variables and the predicted variables. This will help to identify the key factors for predicting program correctness, and thus the programmer's performance.
The remainder of the paper is organized as follows. Section 2 describes the materials and methods of the study, including the available data and a detailed explanation of the variables in our database. Section 3 shows the results of the experiments carried out. Finally, Section 4 analyzes the results, draws the relevant conclusions and presents future work.

Materials and Methods
The procedure followed in this work is summarized in Figure 1. This scheme consists of three steps:
1. Among the different AI techniques, we choose four methods: LR, DT, MLP and SVM.
2. These four methods are the basis for constructing the model.
3. Finally, the model is built by training, after which it is capable of predicting/classifying.
The machine learning software package used in our experiments is WEKA [55]. It includes widely used AI algorithms such as LR, C4.5, backpropagation and SVM, among others.

Data Description
The information used in our analysis was collected over a period of seven academic years, from 2008/09 to 2013/14, with similar groups of practices. It has been possible to analyze this period as a whole because these groups have similar features: the topics and difficulty of the problems proposed in the practices did not vary from year to year. During this long period, a change in the curriculum took place, motivated by the implementation of the EHEA. Thus, the first three years of the study correspond to the former Computer Science Engineering (CSE) degree and the last four to the new Degree in Computer Science (DCS). Of the 505 students analyzed, 137 (27.1%) are from CSE and 368 (72.9%) from DCS.
Students were recruited from the ADS course, which was included in both CSE and DCS with very similar content, teaching methodology and schedule of activities. In a certain sense, ADS teachers anticipated the methodological changes in the subject, so the results of all years are comparable. Basically, ADS is a course on advanced programming concepts, which stresses issues of algorithms and data representation. This course introduces topics such as data structures, abstract data types, formal specifications and graph theory. It is a second-year course, which takes place after two introductory programming courses in the first year. The programming languages used to illustrate the concepts studied are C and C++, together with Maude as a formal specification language. In both CSE and DCS, ADS is organized as weekly lectures over a semester, laboratory sessions, a programming project and a final exam.
More specifically, the course consists of four topics: T1, formal specifications; T2, sets and maps with hash tables; T3, sets and maps with trees; and T4, graph representation and problems. Each of these topics is evaluated in a continuous evaluation context by means of a specific activity. T1 is evaluated with a practical activity in the online judge using Maude. T2 and T3 are assessed by means of partial exams in the middle of the semester. Finally, T4 is evaluated with a programming activity in C/C++ in which students solve some graph problems using the online judge.
In addition, the programming project is a compulsory activity in which students have to face a problem of higher difficulty related to the concepts studied in the course, mainly to topics T2 and T3. The project varies from year to year; for example, in the 2008/09 academic year, the students had to develop the internal engine of a spellchecker. The activity is decomposed into a series of exercises that have to be programmed in C or C++ and are evaluated in the online judge.
These exercises present an increasing degree of difficulty, from basic input/output formatting up to the last problem, which corresponds to the whole project. The project has to be done cooperatively by a group of two students in a co-located software development environment [56]. However, in some cases, students are allowed to do the activity individually (for example, if they do not find a classmate, or if their partner drops out).
The information collected for the present work corresponds to the submissions made by the students to the programming project. This includes submission-specific data such as the time and date of the submission, the exercise number, the memory and time consumed by the program, and the result of the evaluation (whether the program is accepted or not). These data are augmented with information about the students and the context in which the submissions are made, such as gender, the students' marks in the course activities, the number of times the student has previously been enrolled in ADS, and whether the project is done individually or in a group of two. Table 1 presents the list of variables used in this study.
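The following sketch makes the structure of one observation concrete. It models a single submission record with a subset of the variables listed above. All field names, the 0/1 encoding of gender and group mode, and the choice of features are illustrative assumptions, not the actual encoding of Table 1. The students' activity marks are omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Submission:
    """One submission to the project (hypothetical field names)."""
    timestamp: datetime        # time and date of the submission
    exercise: int              # number of the exercise within the project
    memory_kb: float           # memory consumed by the program
    time_s: float              # execution time consumed by the program
    accepted: bool             # verdict of the online judge (A = True, R = False)
    gender: str                # "F" or "M"
    previous_enrolments: int   # times previously enrolled in ADS
    in_group: bool             # True if the project is done by a group of two

    def features(self) -> List[float]:
        """Numeric predictor vector; categorical fields are encoded as 0/1."""
        return [
            float(self.exercise),
            self.memory_kb,
            self.time_s,
            1.0 if self.gender == "F" else 0.0,
            float(self.previous_enrolments),
            1.0 if self.in_group else 0.0,
        ]

    def label(self) -> int:
        """Binary target for the correctness models: 1 = accepted, 0 = rejected."""
        return int(self.accepted)


s = Submission(datetime(2009, 3, 2, 10, 30), 3, 1024.0, 0.5, True, "F", 0, True)
print(s.features(), s.label())
```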

Logistic Regression
LR is probably one of the most commonly used techniques for prediction and, in many cases, it is compared with other usual techniques such as ANN and DT [57][58][59][60]. It is a multivariable method, which means that it tries to establish a functional relationship between the input data (predictors or independent variables) and one output (outcome or dependent variable); generally speaking, the outcome variable of an LR is categorical [61]. For the aim of this work, we focus both on binary LR, which predicts membership of one of only two categorical outcomes (whether a program is accepted, i.e., correct, or not), and on multinomial LR, which predicts four outcomes (grades A, B, C and D). The grades are defined as follows: A = mark in [9, 10]; B = mark in [7, 9); C = mark in [5, 7); and D = mark in [0, 5). The LR model estimates the probability of the event occurring with Formula (1):
$$ P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n)}} \quad (1) $$
where $P(Y)$ is the probability of $Y$ occurring, which can also be expressed as the probability of $Y$ belonging to a certain class, $x_1, \ldots, x_n$ are the predictor variables and $b_0, \ldots, b_n$ are the coefficients to be determined by the LR.
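Formula (1) and the grade bands can be written down directly. In this sketch, `lr_probability` and `grade` are hypothetical helper names, and the coefficients in the example are made-up values, not those fitted in the study. The logistic function is evaluated in a numerically stable way so that large |z| does not overflow `math.exp`.

```python
import math
from typing import Sequence


def grade(mark: float) -> str:
    """Map a mark on the 0-10 scale to the grade bands A-D."""
    if not 0 <= mark <= 10:
        raise ValueError("mark must lie in [0, 10]")
    if mark >= 9:
        return "A"  # [9, 10]
    if mark >= 7:
        return "B"  # [7, 9)
    if mark >= 5:
        return "C"  # [5, 7)
    return "D"      # [0, 5)


def lr_probability(b: Sequence[float], x: Sequence[float]) -> float:
    """Formula (1): P(Y) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))."""
    if len(b) != len(x) + 1:
        raise ValueError("expected one coefficient per predictor plus the intercept b0")
    z = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


print(grade(8.5), lr_probability([-1.0, 0.5, -2.0], [4.0, 0.5]))  # B 0.5
```

With the hypothetical coefficients b = (-1.0, 0.5, -2.0) and predictors x = (4.0, 0.5), the linear term is exactly zero, so the predicted probability is 0.5. A binary classifier would then apply a cut-off such as 0.5 to decide between "accepted" and "rejected".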

Decision Trees
A DT classifier [62] is represented in the form of a tree structure; each node is either a decision node, which specifies some test to be carried out on a single attribute-value, with one branch and sub-tree for each possible outcome of the test, or a leaf node, which indicates the value of the target attribute (class) of examples.
Among the tools for classification and prediction, DT is one of the most powerful and popular. Unlike other AI methods, such as MLP, the main advantage of DT is that they represent rules. Rules can be readily expressed so that people can understand them, or they can even be used directly in a database. A classic and recurrent example of this advantage in the literature is the weather forecast for deciding whether to play tennis [63].
For some applications, the accuracy of a classification or prediction is the most important result.In such situations, the user is not necessarily concerned about how the model works.However, in other situations, the ability to explain the reason for a decision is critical.
DT provides a good explanation applicable to the life habits of the population and therefore interprets problems very much in accordance with mathematical and statistical principles [64]. A DT may be used to classify a given example: one begins at the root and follows the path given by the answers to the questions in the internal nodes until a leaf is reached. The DT algorithm family includes well-known algorithms such as CART [65], ID3 [66], and C4.5 [67]. The algorithm chosen in our study is C4.5.
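The classification walk just described can be sketched as follows. The sketch assumes binary tests of the form `attribute <= threshold`, as in the binary method adopted in this study. The tree, the attribute names (`rel_time`, `rel_memory`) and the thresholds in the example are hypothetical. They are not the tree learned in this study.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # value of the target attribute (class)


@dataclass
class Node:
    attribute: str    # attribute tested at this decision node
    threshold: float  # binary test: go left if value <= threshold
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, example: Dict[str, float]) -> str:
    """Start at the root and follow the answers to the tests until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.left if example[tree.attribute] <= tree.threshold else tree.right
    return tree.label


# Hypothetical tree: "A" = accepted, "R" = rejected.
toy_tree = Node("rel_time", 1.5,
                left=Node("rel_memory", 2.0, left=Leaf("A"), right=Leaf("R")),
                right=Leaf("R"))
print(classify(toy_tree, {"rel_time": 1.2, "rel_memory": 1.0}))  # A
```

Each root-to-leaf path reads as a rule, for example "if rel_time <= 1.5 and rel_memory <= 2.0 then A". This is the interpretability advantage mentioned above.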
The classification and regression trees (CART or C&RT) method of Breiman, Friedman, Olshen, and Stone [65] generates binary DT. Although real-world outcomes are seldom strictly binary, the binary method facilitates interpretation and analysis [68][69][70]. Therefore, our study employs the binary method in the DT to classify the programs implemented by the students. A simple representation of a DT consists of two types of nodes:
• Decision nodes: usually represented by circles, from which arcs leave for the different possible decisions.
• Leaf nodes: indicate the value of the target attribute (class) of the examples.
DT are commonly used in decision analysis in order to help identify a strategy most likely to reach a goal.Another use of DT is as a descriptive means for calculating conditional probabilities.
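C4.5 chooses the test at each decision node according to the gain ratio: the information gain of a split divided by its split information. The sketch below computes this criterion for a single binary split of a numeric attribute. It is a simplified illustration, since C4.5 also evaluates every candidate threshold, handles missing values and prunes the tree. The function names and the toy data are our own.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """Gain ratio of the binary split `value <= threshold` (C4.5-style criterion)."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    info_gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return info_gain / split_info


rel_time = [0.8, 1.1, 2.5, 3.0]
verdict = ["A", "A", "R", "R"]
print(gain_ratio(rel_time, verdict, 1.5))  # 1.0: a perfect split
```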

ANN-Multilayer Perceptron
An MLP [71-73] is composed of three or more layers of neurons; each layer is fully connected to the next one. Usually, there are three types of layers (see Figure 2):
• An input layer receives external inputs.
• One or more hidden layers transform the inputs into something useful for the output layer.
• Finally, an output layer generates the classification results.
Every neuron in the output and hidden layers is a computational element with a nonlinear activation function. The basis of the network is that, when data are introduced at the input layer, the network neurons perform calculations in the following layers until an output value is obtained at each of the output neurons. The final output provides an appropriate class for the input data.
In an MLP, every neuron in the input and hidden layers is connected to all neurons in the next layer by weighted connections. The neurons of the hidden layers compute weighted sums of their inputs and add a threshold. The activity of the neurons is calculated from the resulting sums by applying a sigmoid activation function. The input layer represents the input data (the input data set is described in Section 2.1). The output layer represents the classification result and contains as many outputs as there are classes in the particular problem. The number of neurons in the hidden layer is determined experimentally.
The output of each neuron is defined by the following formula:
$$ \nu_j = \sum_{i=1}^{p} w_{ji} x_i + \theta_j, \qquad y_j = f_j(\nu_j) \quad (2) $$
where $p$ is the number of inputs, $\nu_j$ is the linear combination of the inputs $x_1, x_2, \ldots, x_p$ and the threshold $\theta_j$, $w_{ji}$ is the connection weight between the input $x_i$ and the neuron $j$, $f_j$ is the activation function of the $j$th neuron, and $y_j$ is its output.
The sigmoid function is a common choice of activation function. This mathematical function is defined as:
$$ f(\nu) = \frac{1}{1 + e^{-\nu}} \quad (3) $$
In an MLP, a single neuron is able to linearly separate its input space into two subspaces by a hyperplane defined by the weights and the threshold: the weights define the direction of this hyperplane and the threshold term $\theta_j$ offsets it from the origin.
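The weighted sum with threshold and the sigmoid (3) translate directly into a forward pass. The sketch assumes one hidden layer and represents each layer as a list of weight vectors plus a list of thresholds. The helper names are hypothetical and the weights in the example are arbitrary. Note that a neuron's output exceeds 0.5 exactly when $\nu_j > 0$, that is, on one side of the hyperplane $\sum_i w_{ji} x_i + \theta_j = 0$.

```python
import math
from typing import List, Sequence


def sigmoid(v: float) -> float:
    """Sigmoid activation (3), evaluated in a numerically stable way."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def neuron(x: Sequence[float], w: Sequence[float], theta: float) -> float:
    """nu_j = sum_i w_ji * x_i + theta_j, followed by y_j = f(nu_j)."""
    nu = sum(wi * xi for wi, xi in zip(w, x)) + theta
    return sigmoid(nu)


def layer(x: Sequence[float], weights: Sequence[Sequence[float]],
          thresholds: Sequence[float]) -> List[float]:
    """Fully connected layer: every neuron receives every input."""
    return [neuron(x, w, t) for w, t in zip(weights, thresholds)]


def mlp_forward(x, hidden_w, hidden_t, out_w, out_t) -> List[float]:
    """Input layer -> one hidden layer -> output layer (one value per class)."""
    return layer(layer(x, hidden_w, hidden_t), out_w, out_t)


# Arbitrary example: 2 inputs, 2 hidden neurons, 2 outputs (correct / incorrect).
print(mlp_forward([0.0, 0.0], [[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0],
                  [[1.0, 1.0], [-1.0, -1.0]], [-1.0, 1.0]))  # [0.5, 0.5]
```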
In our proposal, a supervised learning method called backpropagation [74] is used to train the MLP. This method is used in conjunction with an optimization method, such as gradient descent, to adapt the weights. The network is presented with input examples together with the corresponding desired outputs, and the algorithm carries out the following steps:
1. All the weight vectors w are initialized with small random values from a pseudorandom sequence generator.
2. Three basic steps are repeated until convergence is achieved, that is, until the error E is below a preset value:
• each input pattern is propagated forward through the network to obtain the actual outputs;
• the error E between the actual and the desired outputs is computed;
• the weight vectors $w$ are updated by
$$ w(t+1) = w(t) + \Delta w(t) \quad (4) $$
where
$$ \Delta w(t) = -\eta \, \frac{\partial E(t)}{\partial w} \quad (5) $$
and $t$ is the iteration number, $w$ is the weight vector, and $\eta$ is the learning rate.
The error function E is minimized by adapting the weights and the thresholds of the neurons:
$$ E = \frac{1}{2} \sum_{p} (y_p - d_p)^2 \quad (6) $$
In the error function, $y_p$ is the actual output and $d_p$ is the desired output for input pattern $p$. The minimization of E can be accomplished by gradient descent, a process in which the weights are adjusted to change the value of E in the direction of its negative gradient.
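The training loop can be sketched for a network with one hidden layer and a single sigmoid output. This is a minimal batch implementation under stated assumptions. The error is taken to be $E = \frac{1}{2}\sum_p (y_p - d_p)^2$, gradients are accumulated over all patterns before applying the update rule $\Delta w(t) = -\eta\, \partial E(t)/\partial w$, and training stops when E falls below a preset value or after a maximum number of epochs. The learning rate, epoch limit, initialization range and the toy AND data set are arbitrary choices, not the settings used with WEKA in this study.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(v: float) -> float:
    """Numerically stable logistic activation."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def train_mlp(data: Sequence[Tuple[Sequence[float], float]], n_hidden: int = 2,
              eta: float = 0.5, max_epochs: int = 2000, target_error: float = 1e-3,
              seed: int = 0) -> Tuple[tuple, List[float]]:
    """Batch backpropagation for a p-n_hidden-1 sigmoid MLP.
    Returns the learned parameters and the error E recorded at each epoch."""
    rng = random.Random(seed)
    p = len(data[0][0])
    # Step 1: small random weights and thresholds.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(p)] for _ in range(n_hidden)]
    t_h = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    t_o = rng.uniform(-0.5, 0.5)
    history = []
    for _ in range(max_epochs):
        error = 0.0
        g_wh = [[0.0] * p for _ in range(n_hidden)]
        g_th = [0.0] * n_hidden
        g_wo = [0.0] * n_hidden
        g_to = 0.0
        for x, d in data:
            # Forward pass: weighted sums plus thresholds, then the sigmoid.
            h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + t) for ws, t in zip(w_h, t_h)]
            y = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + t_o)
            error += 0.5 * (y - d) ** 2
            # Backward pass: dE/dnu, using f'(nu) = f(nu) * (1 - f(nu)).
            delta_o = (y - d) * y * (1.0 - y)
            g_to += delta_o
            for j in range(n_hidden):
                g_wo[j] += delta_o * h[j]
                delta_h = delta_o * w_o[j] * h[j] * (1.0 - h[j])
                g_th[j] += delta_h
                for i in range(p):
                    g_wh[j][i] += delta_h * x[i]
        history.append(error)
        if error < target_error:  # convergence: E below a preset value
            break
        # Gradient-descent update: w(t+1) = w(t) - eta * dE/dw (thresholds too).
        for j in range(n_hidden):
            w_o[j] -= eta * g_wo[j]
            t_h[j] -= eta * g_th[j]
            for i in range(p):
                w_h[j][i] -= eta * g_wh[j][i]
        t_o -= eta * g_to
    return (w_h, t_h, w_o, t_o), history


and_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
params, errors = train_mlp(and_data)
print(errors[0], errors[-1])
```

All gradients are computed with the weights of the current epoch before any weight is changed, so the update corresponds to $\partial E(t)/\partial w$ at iteration $t$. An online variant would instead update after every pattern.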
For the construction of the three-layer MLP architecture, layers 1 and 3 are the easiest to define. Layer 1 corresponds directly to the input vector, and layer 3 is the output layer, with two outputs for classification: correct and incorrect program output. Layer 2 (the hidden layer) is defined by its number of hidden neurons, which is the most delicate question in the network's architecture. The choice of the number of neurons in the hidden layer involves a trade-off between performance and the risk of overfitting [71]. The number of hidden neurons significantly affects the ability of the network to generalize from the training data to unknown examples [75]. Our experiments showed that a low number of neurons in this layer leads to poor performance on both training and test sets. At the opposite extreme, a high number of neurons maintains good generalization and thus high accuracy on training and test sets, although the risk of overfitting is high [76]. Tests carried out between these two extremes led to an optimal hidden layer of eight neurons (see Figure 2).

Support Vector Machines
Although the foundations of SVM date back to 1963, the current SVM was proposed in a seminal paper presented in 1992 [77]. A detailed description of SVM can be found in [78,79]. Predicting whether programs are correct or incorrect is a typical two-class problem.
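In a two-class classification problem, a function $f: \mathbb{R}^N \to \{\pm 1\}$ must be estimated from training data. These data are $l$ $N$-dimensional patterns $x_i$ with class labels $y_i$,
$$ (x_1, y_1), \ldots, (x_l, y_l) \in \mathbb{R}^N \times \{\pm 1\}, $$
such that $f$ will classify new samples $(x, y)$ correctly. The SVM classifier, as described by [80,81], satisfies the following conditions:
$$ \begin{cases} w^T \phi(x_i) + b \ge +1 & \text{if } y_i = +1 \\ w^T \phi(x_i) + b \le -1 & \text{if } y_i = -1 \end{cases} \quad (7) $$
which is equivalent to
$$ y_i \left( w^T \phi(x_i) + b \right) \ge 1, \qquad i = 1, \ldots, l. \quad (8) $$
Here, the training vectors $x_i$ are mapped into a higher dimensional space by the function $\phi$. Equation (8) constructs a hyperplane $w^T \phi(x) + b = 0$ in this higher dimensional space that discriminates between the two classes. Each of the two half-spaces defined by this hyperplane corresponds to one class: $H_1$ for $y_i = +1$ and $H_2$ for $y_i = -1$. The SVM classifier therefore corresponds to the decision function:
$$ f(x) = \operatorname{sign}\left( w^T \phi(x) + b \right) \quad (9) $$
The SVM finds the linear separating hyperplane with the maximal margin in this higher dimensional space. The margin is determined by the training points closest to the hyperplane (the support vectors), which lie on the canonical hyperplanes $w^T \phi(x) + b = \pm 1$ bounding $H_1$ and $H_2$. The distance between these two hyperplanes is $2/\lVert w \rVert$, so maximizing the margin is equivalent to minimizing $\lVert w \rVert$.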
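Training an SVM means solving a quadratic optimization problem, which is beyond a short example. The sketch below only illustrates the constraints (8), the decision function $\operatorname{sign}(w^T \phi(x) + b)$ and the margin $2/\lVert w \rVert$ for a linear SVM, assuming that $\phi$ is the identity (a linear kernel). The weight vectors, biases and toy points are hand-picked, hypothetical values, not the output of an SVM solver.

```python
import math
from typing import Sequence


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """w^T phi(x) + b, with phi assumed to be the identity (linear kernel)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def decide(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """sign(w^T x + b); a score of exactly 0 is assigned to +1 by convention."""
    return 1 if score(w, b, x) >= 0 else -1


def satisfies_constraints(w, b, X, y) -> bool:
    """Constraints (8): y_i (w^T x_i + b) >= 1 for every training pattern."""
    return all(yi * score(w, b, xi) >= 1 for xi, yi in zip(X, y))


def margin(w: Sequence[float]) -> float:
    """Distance between the hyperplanes w^T x + b = +1 and w^T x + b = -1, i.e. 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1, -1]
for w, b in [((0.5, 0.5), -1.0), ((1.0, 1.0), -2.0)]:
    print(w, b, satisfies_constraints(w, b, X, y), round(margin(w), 3))
```

Both candidate hyperplanes satisfy the constraints on the toy points, but the first has the larger margin (2√2 versus √2). An SVM therefore prefers it over the second.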

Evaluation
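We have used cross-validation, which is widely used to ensure the generalization of AI methods. Specifically, k-fold cross-validation has been applied to assess the performance of the methods. This experimental design, also called rotation estimation, has been widely used to minimize the bias associated with the random sampling of the data set. The cross-validation estimate of accuracy is
$$ acc_{cv} = \frac{1}{n} \sum_{(x_i, y_i) \in S} \delta\left( I(S \setminus S_{(i)}, x_i),\, y_i \right) \quad (10) $$
where $n$ is the size of the dataset $S$, $S_{(i)}$ is the subset (fold) that contains the instance $x_i$, $I(S \setminus S_{(i)}, x_i)$ is the target predicted for $x_i$ by the classifier function $I$ trained on the remaining folds, and $\delta(a, b)$ equals 1 if $a = b$ and 0 otherwise.
In our study, the value of k is set to 10; consequently, ten-fold cross-validation has been applied to assess the performance of the model. The data have been divided into ten disjoint subsets ($S_1, S_2, \ldots, S_{10}$) of approximately equal size; each subset is used once for testing while the classifier is trained on the other nine.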
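The cross-validation accuracy estimate described above can be sketched with k = 10 as follows. The sketch assumes plain (non-stratified) folds obtained by shuffling with a fixed seed, and a `train` callable that returns a prediction function. These names, and the majority-class baseline used in the example, are hypothetical. WEKA's own cross-validation may differ in details such as stratification of the folds.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k disjoint folds of almost equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_accuracy(X: Sequence, y: Sequence, train: Callable, k: int = 10, seed: int = 0) -> float:
    """Share of instances correctly labelled by a classifier trained
    on the k - 1 folds that do not contain them."""
    hits = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        model = train(train_X, train_y)  # returns a function x -> predicted label
        hits += sum(model(X[i]) == y[i] for i in fold)  # delta(I(S \ S_(i), x_i), y_i)
    return hits / len(X)


def train_majority(train_X, train_y):
    """Hypothetical baseline classifier: always predicts the most frequent label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda x: most_common


labels = ["A"] * 30 + ["R"] * 10
print(cv_accuracy(list(range(40)), labels, train_majority))
```

A baseline such as this majority-class predictor is a useful reference point: any of the four methods should clearly beat its cross-validated accuracy on the same folds.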

Results
In this section, we present the results of the experiments carried out. In classification problems, the confusion matrix is a very useful tool for examining the strengths and weaknesses of the classifiers. This matrix relates the actual classes of the samples to the classes predicted by the system. The prediction results obtained by applying 10-fold cross-validation to the four methods are presented in Table 2. The classes of the confusion matrix are accepted and not accepted, according to the result of the program output: the letter "A" means a program accepted by Mooshak and the letter "R" a program rejected by Mooshak. Two blocks of results are shown: collaborative programming activities and individual programming activities. Since the output variable has two nominal values, the confusion matrix is a 2 × 2 square matrix whose correct predictions (the shaded cells) lie on the diagonal from the upper left to the lower right corner. The rows of the confusion matrices represent the actual classes and the columns represent the predicted classes. The four AI methods were evaluated with the following measures: classification accuracy, sensitivity, specificity, positive predictive value and negative predictive value (Table 3).
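The measures listed above follow from the four cells of each 2 × 2 matrix. The sketch takes "A" (accepted) as the positive class, which is an assumption about how Table 3 is oriented. The function names are hypothetical, and the counts in the example are illustrative, not taken from Table 2.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(actual: Sequence[str], predicted: Sequence[str],
                     positive: str = "A") -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN); rows = actual class, columns = predicted class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")  # undefined when the denominator is 0


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV and NPV from a 2 x 2 confusion matrix."""
    return {
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "sensitivity": ratio(tp, tp + fn),  # accepted programs predicted as accepted
        "specificity": ratio(tn, tn + fp),  # rejected programs predicted as rejected
        "ppv": ratio(tp, tp + fp),          # predicted "A" that really are accepted
        "npv": ratio(tn, tn + fn),          # predicted "R" that really are rejected
    }


# Illustrative counts only (not the values of Table 2).
print(binary_metrics(tp=30, fn=10, fp=5, tn=55))
```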
DT obtained the best confusion matrix. Time and memory efficiency were identified as the most important predictor variables, as shown in Figure 3. As mentioned above, this is the purpose of this paper: to find the correlation between the grades and other variables. Accordingly, this finding can lead to indicators that help reduce the energy consumed by the electrical devices students use when developing practical assignments.
Figure 4 shows the execution time of the last submission of each collaborative group as a function of the final mark in the practice. Execution times are relative to the best time for each year. To quantify the correlation, a linear model is fitted using the least squares method. The fitted line has a negative slope, indicating an inverse relation between the two variables (the shorter the time, the better the mark). However, the coefficient of determination (R²), which measures the goodness of fit, has a very low value of 0.0555: the linear model explains only about 5.6% of the variance. This indicates that time efficiency is not the only parameter in the evaluation of collaborative work.
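The least squares fit and its R² can be sketched as follows; the function and struct names are hypothetical, and the inputs would be the relative times (or memory values) and the marks. In simple linear regression, R² equals the squared Pearson correlation and does not depend on which variable is placed on the horizontal axis, so R² = 0.0555 corresponds to a correlation of magnitude about 0.24, and R² = 0.0533 to about 0.23.

```cpp
#include <cstddef>
#include <vector>

struct LinearFit {
    double slope;
    double intercept;
    double r2;  // coefficient of determination
};

// Ordinary least squares fit y = intercept + slope * x, with
// R^2 = 1 - SS_res / SS_tot. Assumes at least two points and non-constant x.
LinearFit fitLeastSquares(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < n; ++i) { meanX += x[i]; meanY += y[i]; }
    meanX /= static_cast<double>(n);
    meanY /= static_cast<double>(n);

    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sxy += (x[i] - meanX) * (y[i] - meanY);
        sxx += (x[i] - meanX) * (x[i] - meanX);
    }
    const double slope = sxy / sxx;  // negative slope = inverse relation
    const double intercept = meanY - slope * meanX;

    double ssRes = 0.0, ssTot = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double residual = y[i] - (intercept + slope * x[i]);
        ssRes += residual * residual;
        ssTot += (y[i] - meanY) * (y[i] - meanY);
    }
    const double r2 = ssTot == 0.0 ? 1.0 : 1.0 - ssRes / ssTot;
    return {slope, intercept, r2};
}
```

A low R² with a clearly signed slope, as in Figures 4 and 5, means the trend exists but most of the variation in marks comes from other factors.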
Regarding memory efficiency, Figure 5 shows the memory consumed by the last submission of each collaborative group as a function of the final mark in the practice. Memory is relative to the best memory usage for each year. The correlation between memory and mark has also been studied by fitting a linear model with the least squares method, which yields a negative slope. The R² value of 0.0533 shows once again that the mark is not explained by memory efficiency alone, although the decision tree identified memory as one of the most important predictors.

Figure 6 shows two sample C++ source code fragments that exemplify the inverse relationship between execution time and code correctness. Both correspond to the implementation of the hash function in a hash table class. The fragment on the left is a good choice for the function: it uses an iterative multiplicative method with base 17. The fragment on the right, however, is an incorrect implementation: the use of the pow function produces an overflow, so the result saturates to the maximum integer value. Because many keys then map to the same hash value, they collide in the same bucket and lookups degrade towards a linear search. As a result, the code on the left takes 0.456 s on the test case for this problem, while the code on the right takes 4.964 s.
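Since the code in Figure 6 is not reproduced in the text, the following C++ sketch is a hypothetical reconstruction of the two approaches, not the students' actual code. The pow-based variant explicitly clamps to INT_MAX to model the saturation described above; in standard C++, converting an out-of-range double to int is undefined behavior, so the value a real compiler produces may differ.

```cpp
#include <climits>
#include <cmath>
#include <cstddef>
#include <string>

// Iterative multiplicative (Horner) hash with base 17. Unsigned arithmetic
// wraps modulo 2^N (N = bit width of unsigned), which is well defined, so
// every character keeps influencing the result.
unsigned hashIterative(const std::string& key, unsigned tableSize) {
    unsigned h = 0;
    for (unsigned char c : key)
        h = h * 17u + c;
    return h % tableSize;
}

// Hypothetical pow-based variant: the sum of c * 17^i is computed in double
// and clamped to INT_MAX once it overflows the int range, so all sufficiently
// long keys end up with the same hash and therefore the same bucket.
unsigned hashPowSaturated(const std::string& key, unsigned tableSize) {
    double sum = 0.0;
    for (std::size_t i = 0; i < key.size(); ++i)
        sum += static_cast<unsigned char>(key[i]) * std::pow(17.0, static_cast<double>(i));
    const int h = sum > static_cast<double>(INT_MAX) ? INT_MAX : static_cast<int>(sum);
    return static_cast<unsigned>(h) % tableSize;
}
```

Hashing hypothetical keys "student_1000" to "student_1099" into 1009 buckets spreads them over many buckets with `hashIterative`, whereas `hashPowSaturated` places all of them in a single bucket; in a separate-chaining table, every lookup then scans one long chain, which is consistent with the roughly tenfold slowdown reported above.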

Academic Findings
Teaching methods have changed dramatically in recent years. For example, Massive Open Online Courses (MOOCs) offer many advantages but also present new challenges. In a MOOC there are thousands of students, and a teacher cannot track each student individually as in traditional settings. For this reason, techniques that have been successfully applied in other domains (e.g., biology and the Internet of Things) are of particular interest in these new educational scenarios, especially when dealing with large amounts of data. In this context, the correlations obtained between the input variables and the predicted variable help identify the key factors for predicting students' performance in the practices.
One of the contributions of this work is to lay the foundation for automatically predicting programmers' performance using data provided by online judging systems. Two key factors associated with green software were identified as predictors of programming competence: execution time and memory usage. However, we did not find the programming language to be a key factor in determining the correctness of the programs. This finding contrasts with previous research showing that the errors made by programmers may depend on the programming language [54]. These results provide insights into the learning of computer programming. They suggest that teaching students how to develop green software (with low memory and time consumption) can improve their performance as programmers, since they acquire good practices for implementing correct programs. While performance considerations are a key aspect of many software systems, undergraduate curricula do not always explicitly teach students about tools and techniques for improving software performance. Although this topic is included in the IEEE/ACM 2013 Curriculum Guidelines for Undergraduate Degree Programs in Computer Science [82], it is sometimes addressed only incidentally, through issues ranging from architectural matters in machine organization courses to asymptotic complexity in algorithm design courses.
As our study shows, teaching students to design programs with performance in mind, and to identify and fix performance bottlenecks, can lead novice programmers to develop correct programs. Design patterns can also help to reuse efficient solutions [83]: they describe high-quality, practical solutions to common programming problems. Students should understand the principles of good program design in order to achieve a clearer system architecture and thus reduce resource consumption (memory usage and execution time). However, design patterns are used only marginally in programming courses [84].

Educational Data Mining Findings
The aim of this paper has been to apply four data mining techniques (DT, MLP, SVM and LR) to predict students' results in a learning environment, in particular program correctness and practice grades. Our findings show that DT is the best predictor compared with the MLP, SVM and LR classifiers. Nevertheless, prediction accuracy is not the only criterion when choosing one method over another: depending on the application context, we should also consider efficiency (the time it takes to build a model), interpretability (how easy the developed model is to understand), deployability (how easy it is to deploy the model for actual use) and theoretical justification.
In particular, we would like to emphasize the high percentages obtained by the Decision Trees for all the measures (classification accuracy, sensitivity, specificity, positive predictive value and negative predictive value). The other techniques obtain good percentages, but around 10% lower. This indicates how suitable Decision Trees are for prediction in this environment.

Sustainability Findings
ICT for sustainability is an emerging research and educational area that encompasses different fields [85]: Environmental Informatics, Computational Sustainability, Sustainable HCI, Green ICT, and ICT for Sustainability. The energy consumption of data centers is rapidly becoming an environmental problem [86]: in 2016, global data centers used roughly 416 terawatt-hours, or about 3% of the total electricity consumed worldwide, approximately 40% more than the entire electricity consumption of the United Kingdom. This consumption is forecast to double every four years.

In this paper, we have explored how EDM can be used to support Green ICT, that is, to reduce the environmental impact of ICT hardware and software. Improvements in the power usage effectiveness of data centers have mainly focused on new data center designs and more efficient components [87]. However, as power usage effectiveness approaches its theoretical limit, future efficiency gains will be minor and new approaches are needed. For example, most Green ICT practices do not yet specifically tackle software-specific aspects to improve sustainability [88], as we have proposed in this paper. Recently, artificial intelligence has been employed to reduce power consumption in data centers [89]. In a MOOC, where thousands of students can access the course at the same time, the power consumed by the data centers hosting the course can be very significant. The data mining approach presented in this paper to predict program correctness and practice grades can help reduce the usage of the data centers that support a MOOC. Using data centers efficiently and effectively, with lower memory and time consumption, minimizes their impact on the environment.
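The data center figures can be made concrete with a little arithmetic. Assuming, as a simplification, that the forecast doubling every four years holds exactly from the 2016 level of about 416 TWh, consumption grows exponentially, and the comparison with the United Kingdom implies a national consumption of roughly 297 TWh:

```latex
E(t) = E_{2016} \cdot 2^{(t - 2016)/4}, \qquad E_{2016} \approx 416~\text{TWh}
\;\Rightarrow\; E(2020) \approx 832~\text{TWh},
\qquad 2^{1/4} - 1 \approx 0.189 \;(\approx 18.9\%~\text{per year}),
\qquad E_{\mathrm{UK}} \approx \frac{416}{1.4} \approx 297~\text{TWh}.
```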
As future work, we will apply new data mining techniques (such as big-data machine learning or deep learning) to the critical problem of automatic assessment of students, in order to find stronger evidence on the impact of green software on program correctness. In addition, we plan to generalize these results to analyze other factors associated with green software, such as design patterns [90], to identify new predictor variables that affect program correctness. We hope that our future findings will help MOOC teachers and organizers to better understand their students and to improve their education programs for sustainable development.

Figure 2. The architecture of the MLP network, which consists of an input layer, a hidden layer and an output layer. The input layer represents the input data (the input data set is described in Section 2.1). The output layer represents the classification result and contains as many outputs as there are classes in the problem. The size of the hidden layer is determined experimentally.

Figure 3. Decision tree (DT) obtained in the experiments to predict the acceptance of a program.

Figure 4. Final mark of the practice of each collaborative group with respect to the execution time of the group's last submission. Execution times are relative to the best time for each year.

Figure 5. Final mark of the practice of each collaborative group with respect to the memory consumption of the group's last submission to the last problem (which contains the whole practice). Memory is relative to the best memory usage for each year.

Figure 6. Two sample code fragments showing good (left) and bad (right) programming practices.

Table 2. Definition of the confusion matrix for the MLP, SVM, DT and LR classifiers.

Table 3. Equations of the evaluation measures for the DT, MLP, SVM and LR classifiers.