Influence of Contextual Variables on Educational Performance: A Study Using Hierarchical Segmentation Trees

The general objective of this study is to explore the relationship between students’ contextual characteristics and their performance in mathematical reasoning (MR) and linguistic comprehension (LC) skills. The census data from the ESCALA (EScritura, CÁlculo y Lectura en Andalucía) tests developed by the Agencia Andaluza de Evaluación Educativa (AGAEVE) in 2017 were used. These tests are carried out in the second year of primary school in the Autonomous Community of Andalusia (Spain). The data were analysed with the data mining technique known as segmentation trees, using the CRT (Classification and regression trees) algorithm for each of the skills. This revealed the strong influence of economic, social and cultural status (ESCS) and of family expectations regarding academic performance on the results of both tests. In addition, the analysis shows that contextual characteristics interact differently in their relationship to performance in MR and in LC. These results have made it possible to establish groups of students who may be at risk of not reaching the minimum required levels. Some characteristics of at-risk students are low ESCS, low family expectations or being born in the last six months of the year. The detection of at-risk profiles could help to improve the performance of these groups through the creation of specific plans.


Introduction
Large-scale educational evaluations allow us to thoroughly understand educational reality [1], improve educational systems [2,3] and optimise the education received by students [3,4]. Their results are often used to establish changes in educational policies [3,5] or as an empirical basis on which to justify established educational reforms [6]. They also make it possible to establish reference standards for current educational trends [3].
In the particular area of school effectiveness, large-scale evaluations have made it possible to detect variables that influence student performance [7]. Economic, social and cultural status (ESCS), gender, migrant status or ethnic minority status have been topics of interest in numerous research studies on school effectiveness and educational performance.
Numerous studies indicate that the ESCS [2,7–15], the location of the centre [13,16], the parents' attendance at university [11] and the number of books at home [17] have a decisive influence on the level of performance achieved by students in large-scale assessments. It is likely that the cultural level of the families influences the value of the education and educational opportunities offered [14,18,19]. Preschool education attendance seems to benefit the low ESCS student body more [20], which could indicate that it provides compensatory educational opportunities for the most disadvantaged families.
In addition, the influence of the ESCS interacts with variables such as the gender of the student, girls being the most sensitive to its influence [21]. Therefore, the ESCS would influence student performance mediated by its interaction with other educational and contextual variables.
Some studies [14,22] present results that show that immigrant students achieve worse academic results and that the greater the presence of ethnic minorities in the school, the lower the performance [14]. Again, the cultural level of the families may be the variable that is truly driving these results, in addition to other cultural mechanisms that could limit access to resources by these families [23].
Although in primary education, there seems to be no difference between boys' and girls' results [24], in secondary education, girls tend to perform worse in mathematics and better in reading and writing [13,25]. This could be explained by the influence of gender stereotypes still present in today's society [26].
Students repeating a year [16] and a high level of religiosity in the school [27] seem to negatively influence educational performance. On the other hand, attendance, high levels of motivation [7] and discipline [28] have a positive effect.
Data Mining in Educational Research
Data mining (DM) is one of the exploratory research techniques. DM allows the exploration of patterns that explain the educational phenomenon in order to improve it [29,30]. In other words, these predictive analyses help to detect characteristics that put academic achievement at risk. This valuable information should be used by educational institutions to prevent low levels of achievement [31–33].
In the achievement of this objective, as stated by Anwar and Ahmed [29], appropriate modifications of the curriculum and performance assessment systems can have a very positive impact.
In the educational field, DM is mainly used in distance education [34,35] because e-learning platforms generate a large amount of data that can be analysed using big data techniques. Students' grades are used as the dependent variable, while the characteristics of the students [30], their context [36] and, in the case of e-learning, their interaction with the teaching platform are usually used as independent variables.
Some studies on school effectiveness that use these techniques indicate that adolescents who are in a relationship achieve worse educational performance, although this may be due to a lower level of dedication to study [37]. By contrast, higher levels of achievement are associated with students who are more aware of the seriousness of drug use and have parents with low rates of alcohol consumption and higher levels of education [30,37].
Within hierarchical segmentation, segmentation trees are a technique that provides results that are simple to interpret and allows non-linear effects and higher-order interactions to be found automatically [38,39]. Segmentation trees are appropriate for the study of large amounts of data and the detection of patterns. They allow the classification of independent variables to forecast values of a dependent variable, reduce the number of independent variables and facilitate the explanation of a phenomenon [40]. In this sense, this study has the following objectives:
- To explore the influence of contextual variables and their interaction on academic achievement obtained in mathematical reasoning (MR) and linguistic communication (LC) skills;
- To detect profiles of students at risk of school failure.

Research Sample
To carry out the study, census data from the ESCALA (diagnostic test EScritura, CÁlculo y Lectura en Andalucía [Writing, Calculating and Reading in Andalusia]) tests, administered by the Agencia Andaluza de Evaluación Educativa (AGAEVE) during the academic year 2016-2017 in Spain, were used. Once the sample was refined, the results of 75,820 students were used (50.9% male and 49.1% female). The purpose of these tests is to objectively and rigorously evaluate the competencies in LC and MR of students in the 2nd grade of primary education (7-8-year-old students). In addition, the tests are complemented with a context questionnaire, administered to families, which provides information on contextual and cultural factors. Andalusia is one of the areas with the least social and educational development in Europe [41], so it is necessary to know what factors could be influencing it from an educational point of view. This region is located in the south of Spain and is the autonomous community with the largest population in the country (8,427,000). The results in mathematics of the last Programme for International Student Assessment (PISA) (2018) placed the region almost 30 points below the OECD (Organisation for Economic Cooperation and Development) average [42]. The region's proximity to Morocco means that there is an immigrant population of around 20% in some provinces such as Málaga and Almería, and the foreign population accounts for 7.77% in Andalusia (see http://www.juntadeandalucia.es/justiciaeinterior/opam/es/node/90). Table 1 shows a description of the contextual variables extracted from the context questionnaire [43], as well as the data-cleaning treatment applied and the categorisation made.

Analysis
The analysis was carried out using the Statistical Package for the Social Sciences (SPSS) [44]. The dependent variable was LC or MR, while the independent variables were the contextual factors of the students. The analysis was initially carried out with the CHAID (Chi-square automatic interaction detector) and CRT (Classification and regression trees) algorithms. The CHAID algorithm was finally discarded, because it generated a high number of nodes that made interpretation difficult.
Therefore, the algorithm ultimately used was CRT, which is suitable for weighing the importance of contextual factors in the explanation of school performance [45] and facilitates the interpretation of the data by dividing the variables in a binary way. For this same reason, pruning was applied using SPSS commands, which avoids overfitting and simplifies the results.
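The binary division described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study: it is a plain Python illustration, assuming the usual CRT regression criterion (choose the cut that minimises the within-child sum of squares of the outcome). The function names and the toy data are hypothetical and are not taken from the ESCALA dataset.

```python
from statistics import fmean


def sum_sq(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_binary_split(x, y):
    """Return (threshold, explained_share) for the best split x <= threshold.

    Assumed CRT-style regression criterion: pick the cut that minimises
    the within-child sum of squares of the outcome y.
    """
    total = sum_sq(y)
    candidates = sorted(set(x))
    best_threshold, best_within = None, total
    # Candidate cuts lie midway between consecutive distinct predictor values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        within = sum_sq(left) + sum_sq(right)
        if within < best_within:
            best_threshold, best_within = threshold, within
    # Share of the outcome variance removed by this single split.
    explained = 0.0 if total == 0 else 1 - best_within / total
    return best_threshold, explained


# Toy, made-up data: a standardised ESCS-like index and a test score.
escs = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
score = [40, 42, 41, 43, 55, 57, 56, 58]
threshold, explained = best_binary_split(escs, score)
```

A full tree would apply the same search recursively within each child node and across all predictors, and pruning would then remove splits whose gain does not justify the added complexity.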

Results
As can be seen in Table 2, according to the means (M) and standard deviations (SD), the results in MR and LC do not present large differences, although it is noteworthy that the minimum score in MR reaches lower values than in LC. On the other hand, the variable family involvement with the student presents a low SD, which could be interpreted as a small intra-group difference. This has led us to discard it for the segmentation technique. The independent variable with the greatest intra-group variance seems to be the ESCS, if we consider the difference between the mean, the maximum and the minimum. In this case, as it is a standardised variable, SD is close to 1. The annex [46] presents the results of the CRT algorithm for MR. The tree contains 63 nodes (32 of them terminal) distributed over 5 levels. The independent variables selected by the algorithm for the construction of the segmentation tree were ESCS, Expectativas_familiares, Tiempo_tareas, MES_nacimiento, Cantidad_tareas and gender.
The variable that best predicts achievement in MR is the ESCS (nodes 1 and 2), as already pointed out in the scientific literature. It is remarkable that the two different groups that are constituted based on the ESCS interact in different ways with the contextual variables. While students with low ESCS are influenced by the expectations that the family places on them (nodes 3 and 4), those with high ESCS are affected by the time they spend on tasks (nodes 5 and 6). Students from families who expect higher education obtain the highest scores. It could be understood, therefore, that family expectations act as a protective factor in the face of unfavourable socioeconomic situations, probably because they are associated with a more positive evaluation of education and its importance.
It seems that students are negatively affected when they spend more than one hour on schoolwork (see nodes 6 and 10), while spending up to one hour seems to be beneficial (see nodes 5 and 9). This may be because students with higher needs and lower academic achievement generally need more time to complete assignments. On the other hand, being born in the last six months of the year seems to decrease MR performance (see nodes 15 and 16). The amount of homework, when families believe that they should have less (see nodes 55 and 56), is positive for performance.
The segmentation tree for LC is shown in the annex [47]. It is composed of 61 nodes, 31 of them terminal, distributed over 5 levels. The variables included by the algorithm were ESCS, Expectativas_familiares, Tiempo_tareas, gender and MES_nacimiento.
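The node counts reported for the two trees can be checked against a basic property of binary trees: when every internal node has exactly two children, as in CRT, the total number of nodes equals twice the number of terminal nodes minus one. The short sketch below applies that check to the reported figures; the function name is hypothetical.

```python
def total_nodes(terminal_nodes):
    """Total nodes in a binary tree whose internal nodes all have two children."""
    return 2 * terminal_nodes - 1


# Counts reported in the text: (total nodes, terminal nodes) for each tree.
reported = {"MR": (63, 32), "LC": (61, 31)}

# True when the reported total matches the binary-tree identity.
consistent = {
    name: total_nodes(leaves) == total
    for name, (total, leaves) in reported.items()
}
```

Both reported trees satisfy the identity, which is what a strictly binary segmentation would be expected to produce.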
In the case of LC, the variables mentioned above act in the same way as in MR. However, in this case, gender does seem to influence performance (see nodes 7 to 14). Regarding the amount of homework in the case of LC, its absence may be worse for performance than its presence (nodes 59 and 60). It is noteworthy that commitment to reading is not among the variables selected by the algorithm, which could be because the ESCS would catalyse, in some way, cultural commitment in general.
Taking as a reference the results presented to respond to the second objective of the research, both in LC (see node 31) and MR (see node 32), the student body that presents one or more of the following characteristics could be catalogued as an at-risk group: low ESCS, low family expectations, born in the last six months of the year. In the case of LC, we can also add being a male student. By contrast, having high family expectations seems to protect students with low ESCS, in both tests.
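The at-risk profile described above can be written as a simple screening rule. The sketch below is only illustrative. The parameter names are hypothetical, and two operational definitions are assumptions that the text does not specify: "low ESCS" is taken as a standardised index below 0, and "low family expectations" as the family not expecting higher education.

```python
# Assumption: the cut-off for "low ESCS" is not given in the text;
# 0.0 (the mean of a standardised index) is used purely for illustration.
LOW_ESCS_CUTOFF = 0.0


def risk_factors(escs, expects_higher_education, birth_month, is_male, skill):
    """List the risk characteristics a student presents for a skill ("MR" or "LC")."""
    factors = []
    if escs < LOW_ESCS_CUTOFF:
        factors.append("low ESCS")
    # Assumption: low expectations means the family does not expect higher education.
    if not expects_higher_education:
        factors.append("low family expectations")
    if birth_month >= 7:  # born between July and December
        factors.append("born in the last six months of the year")
    # Being a male student is reported as an additional factor only for LC.
    if skill == "LC" and is_male:
        factors.append("male student")
    return factors


# Hypothetical student: low ESCS, low expectations, born in November, male.
example_mr = risk_factors(-0.8, False, 11, True, "MR")
example_lc = risk_factors(-0.8, False, 11, True, "LC")
```

A student with one or more factors in the returned list would fall into the at-risk group as defined in the text; the list form keeps visible which characteristics triggered the flag.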

Discussion
The findings are consistent with the existing scientific literature in the field of school effectiveness and educational performance, as they confirm that the ESCS is the variable that best predicts academic success. The influence of ESCS has been confirmed in studies carried out with large-scale evaluations, such as PISA, both in a national context [10,16] and at the international level [2,7,9,11–15,19]. However, this study adds to the existing body of knowledge by showing that high family expectations may be a protective factor for students with low ESCS. This could be explained by the fact that these same students are also likely to have greater social capital and support from their families [18,48]. However, it could also be due to the reciprocity of expectations, i.e., higher expectations for students who perform well.
On the other hand, spending too much time on homework does not seem to be beneficial for elementary school students. This is probably because, as Valle et al. [49] stated, it is the students with the greatest difficulties who spend the most time on homework. However, at the secondary school level, it is the students who spend more time on homework that perform better [12,50]. This could be a consequence, once again, of ESCS, since students with a more comfortable socioeconomic position are also the ones who generally receive more support from their parents and more learning opportunities [19].
The issue of gender also deserves to be highlighted. While no interaction was found in MR, one was found in LC, with male students being disadvantaged. This partially confirms the evidence provided by the scientific literature of the better performance of female students in communication skills and higher performance of male students in MR [13,19,24,51]. Educational systems should undertake appropriate changes in educational programmes to overcome this shortcoming and achieve, at early ages, gender equity in these two basic competencies.
As some of the previous studies have shown [6,52], the results derived from educational research are not usually used to promote educational changes at the centre or teaching methodology level, even in countries with a long tradition of educational evaluation [9]. They are usually used more for political than educational purposes [6]. Even so, taking as a reference the profiles detected for students at risk of academic failure, we consider that it could be beneficial to develop workshops in schools, aimed at raising awareness among families about the importance of family expectations in student achievement. It would also be helpful for schools to offer extracurricular activities of a cultural nature for low-ESCS students. Book-lending services could also help to narrow the gap between students, as families with fewer resources have more difficulty acquiring books [53].
Educational systems must guarantee equality of opportunity, overcoming the segregation of students by contextual characteristics [54] that may have been aggravated by the coronavirus crisis [55,56]. The results of this study managed to explain about 10% of the variance in performance in LC and MR among 7- and 8-year-old schoolchildren according to their contextual factors, which is close to the results of previous studies [57]. This makes us reconsider the moment at which the influence of the ESCS begins to determine results in school effectiveness [2,7–15]. Performance at these ages could be explained more by individual characteristics, such as differences in executive functions [57–62], than by context. Although these results could be evaluated as positive, since it is not contextual factors that determine performance at early ages of schooling, they do invite reflection on the factors that could affect performance at these ages and when contextual factors begin to gain importance. Some studies suggest that students from families with lower ESCS may have lesser development of certain cognitive skills [62]. However, early interventions aimed at stimulating these skills could mitigate these inequalities [63,64]. Therefore, it is necessary to detect vulnerable students and promote actions that allow for equitable and sustainable educational systems within today's society. This also raises questions for future research. For example, at what point are the ESCS and other contextual factors decisive for school effectiveness? Could the ESCS effects be avoided with early stimulation of variables associated with executive function? What role do teachers and teaching-learning methodologies play in this effectiveness? Would the effectiveness of the educational centre have an influence in the early years of schooling?
To do this, progress must be made on the early detection of students at risk, using large-scale evaluations for educational purposes and integrating them into the daily work of educational centres. This progress should also aim to connect the information coming from the large-scale evaluation with the dialogue between different paradigms and more everyday realities [65]. Only the synergy between the different perspectives on educational research will allow progress in the identification of at-risk student profiles. According to the results of the study, in the early ages of schooling, contextual factors still do not explain to a great extent the variance in student performance.
As for the limitations of the study, variables that could be relevant to the explanation of academic performance have been left out of the research. Specifically, we are referring to immigrant status, pre-school attendance and bilingualism. These types of variables were not included in the context questionnaire administered by AGAEVE, so it was impossible to access this information. Resolving this situation would require collaboration between institutions and researchers, which, given the complex bureaucratic framework, is not always easy to achieve. In this sense, for educational science to advance, research must open its doors to both institutions and schools themselves.