Validity of the “Big Data Tendency in Education” Scale as a Tool Helping to Reach Inclusive Social Development

Abstract: Big Data technology can be a great resource for achieving the Sustainable Development Goals in a fair and inclusive manner; however, only recently have we begun to analyse its impact on education. The goal of this research was to analyse the psychometric characteristics of a scale that assesses the opinions educators in training hold about Big Data, as well as their related emotions. This is important, as it will be the educators of the future who will have to manage Big Data at school. A nonprobability sample of 337 education students from Peru and Spain was used. Internal consistency and validity were analysed through exploratory and confirmatory factor analysis. The results show good psychometric values and highlight a latent structure of six factors that includes emotional and cognitive dimensions. As a result, the profile defining the participants in relation to Big Data was identified. Finally, the implications of Big Data for Inclusive Education in a sustainable society are discussed.


Introduction
Big Data has become an emerging term and concept from a social, cultural and pedagogical point of view. Although the term has been used since the 1980s, it was in 2008 that D.J. Patil (from LinkedIn) and Jeff Hammerbacher (from Facebook) used "Big Data" to refer to a new professional activity. The paper "The Exabyte era" in "Wired" magazine in 2010, as well as similar papers in that year, marked the beginning of Big Data as a social and business phenomenon [1]. Today, Big Data admits a large number of definitions and perspectives [2], mainly related to its technological properties. In this sense, Big Data can be defined as the management, gathering and organization of large volumes of data, together with the way they are analysed and interpreted [3]. The value generated by the data can be considered the most important of the characteristics of Big Data [4].
Big Data could be the key to a new social revolution [5]. Nevertheless, much of the literature focuses on the business world [6] rather than on areas such as education. This is quite remarkable, considering that the implementation of Big Data in the educational system could be a real boost in terms of inclusion and of improving the quality of teaching and learning processes [7]. This potential goes to the core of the Inclusive School, or Inclusive Education. An inclusive school is one where all students, whatever their abilities, feel included and accepted, are recognized in their uniqueness, and are valued and able to participate in the school. In fact, the integration of Big Data in Education

Methodology
Taking into consideration the arguments presented above, a pilot study was conducted. The target population was education students from the south of Spain and the north of Peru. Peru is a region of interest because of its broad partnerships with the Andalusian universities. An incidental sample of 337 participants was recruited from Peruvian and Spanish university students enrolled in education degrees (mean age = 23.1, S.D. = 7.88, 70.03% female). The majority of the participants were from Spain (70.03%), and 29.97% were from Peru. The Spanish sample came from three universities: Malaga (107), Jaen (57) and Seville (72); the Peruvian sample came from four universities: Alas Peruanas (27), Pedro Ruiz Gallo (12), USAT (50) and Nacional de Piura (12). All Peruvian universities are in the same region (north Peru). Of the sample, 65.28% said that they had no idea what Big Data is, 31.15% said that they knew a little but not enough, and only 3.25% said they knew what it is. All participants were informed of the research goals and gave their informed consent.
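The sample composition reported above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the variable and function names are illustrative and are not part of the original study.

```python
# Per-university counts as reported in the text.
spain = {"Malaga": 107, "Jaen": 57, "Seville": 72}
peru = {"Alas Peruanas": 27, "Pedro Ruiz Gallo": 12, "USAT": 50, "Nacional de Piura": 12}

n_spain = sum(spain.values())
n_peru = sum(peru.values())
n_total = n_spain + n_peru


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(n_total, share(n_spain, n_total), share(n_peru, n_total))
```

The per-university counts add up to the 337 participants and reproduce the reported country shares.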
The Big Data Applied to Education Scale (VABIDAE) [14] is a 31-item questionnaire that gathers information about opinions and about how people face and perceive the presence of Big Data technologies in the educational system and in classrooms. The VABIDAE construction process is described by its authors on the website https://vabidae.gitlab.io/vabidae/. As part of the VABIDAE, and before the questions, the scale embeds a short video about Big Data in Education. This video is available online from Euronews: https://es.euronews.com/2015/05/22/big-data-al-servicio-de-la-educacion.
This video contextualizes the topic and gives all participants a common piece of information. According to the authors, this strategy is used to reduce misunderstandings and mistakes about what the issue is. The VABIDAE authors assume that those surveyed could have no previous idea of what Big Data is, or could hold misconceptions about it, which would bias their answers as they try to keep them coherent with their wrong assumptions (more information about this topic is available on the VABIDAE website). The authors also highlight that this video was selected because it comes from an official European mass media company and because it is centred specifically on the concept of Big Data in Education.
After watching the video, participants started answering the items. The scale contains three subscales: (1) assessment of positive aspects of Big Data applied to education, (2) assessment of negative aspects and (3) emotions that Big Data induces in those surveyed. Participants rated their agreement with each item on a 5-point scale: positive and negative issues (1 Not at all, 2 I think not, 3 I don't know, 4 I think so, and 5 I strongly agree) and emotional items (1 Nothing at all, 2 Almost nothing, 3 I don't know/I'm indifferent, 4 Something, 5 Totally).
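As an illustration of how the two response formats map onto numeric scores, the labels can be coded as follows. This is a minimal sketch; the dictionary and function names are assumptions made for the example, and the paper does not describe how the responses were actually coded in software.

```python
# Response labels and their 1-5 codes, as listed in the text.
AGREEMENT = {
    "Not at all": 1,
    "I think not": 2,
    "I don't know": 3,
    "I think so": 4,
    "I strongly agree": 5,
}
EMOTION = {
    "Nothing at all": 1,
    "Almost nothing": 2,
    "I don't know/I'm indifferent": 3,
    "Something": 4,
    "Totally": 5,
}


def code_responses(labels, scale):
    """Translate a list of response labels into their numeric codes."""
    return [scale[label] for label in labels]


print(code_responses(["I think so", "Not at all"], AGREEMENT))
print(code_responses(["Totally", "Almost nothing"], EMOTION))
```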
The instrument is completed with a series of sociodemographic questions (age, gender, residence, university, etc.) and the question "Are you inclined to use Big Data in your future job as an Educator?" The procedure started with an agreement with Spanish and Peruvian university teachers, who were contacted via email. They informed their students about this research and asked for their voluntary participation. The data were collected from March to November 2019 via an online form application by Google. The form includes instructions for respondents on how to distribute and answer it. The online form was protected by a password given to the professors and teachers of the participating students. After the data collection period, the form was closed to prevent uncontrolled access.
An exploratory factor analysis (EFA) was conducted to examine the psychometric properties of VABIDAE. The EFA was conducted after checking that the sample was big enough (more than 150 cases and at least 5 cases for each variable), as proposed by Pallant [16]. Specifically, a principal component analysis with Equamax rotation was conducted to examine whether subscales emerged and to analyse the consistency of the items, in accordance with the advice of Carretero-Dios and Pérez [17]. Equamax was developed to simplify the loadings across both components and variables. The eigenvalue-greater-than-one rule and the scree test were used as criteria for deciding the number of factors to extract. Loadings below 0.40 were considered low [18,19].
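The decision rules described above can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, and the eigenvalues and loadings in the usage lines are made-up numbers, not results from this study.

```python
def sample_is_adequate(n_cases, n_items, min_cases=150, min_ratio=5):
    """Sample-size rule quoted in the text: more than 150 cases and
    at least 5 cases per variable."""
    return n_cases > min_cases and n_cases / n_items >= min_ratio


def kaiser_count(eigenvalues):
    """Number of components with an eigenvalue greater than one."""
    return sum(1 for value in eigenvalues if value > 1.0)


def low_loading_items(loadings, threshold=0.40):
    """Items whose highest absolute loading falls below the threshold."""
    return [
        item
        for item, row in loadings.items()
        if max(abs(x) for x in row) < threshold
    ]


# The study's own figures: 337 cases and 31 items.
print(sample_is_adequate(337, 31))

# Hypothetical eigenvalues and loadings, for illustration only.
print(kaiser_count([8.2, 3.1, 1.4, 0.9]))
print(low_loading_items({"item_a": [0.72, 0.10], "item_b": [0.31, -0.35]}))
```

With 337 cases and 31 items there are roughly 10.9 cases per variable, so both conditions of the sample-size rule are met.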
Next, a confirmatory factor analysis (CFA) was conducted based on the result of the previous exploratory analysis. A CFA provides a more powerful method than EFA to determine the best-fitting factor structure of the scale, because individual items are predicted a priori to load only on their theoretically driven latent variables, rather than on all latent variables as in the exploratory factor analysis [20]. Robust maximum likelihood estimation (MLM) was used because it has been shown to perform well even under non-normal conditions [21]. This CFA was carried out without splitting the sample into two groups, since using all the data was considered more important than comparing the results of two analytical strategies. This approach was considered more coherent with the aim of this study, which is to identify the empirical validity of VABIDAE. In addition, the EFA and CFA results were contrasted with each other as a methodological triangulation, which makes it easier to identify overfitting.
The following goodness-of-fit indexes were computed: the ratio of Chi-square to degrees of freedom, where ratios less than 3 are considered acceptable [22]; the comparative fit index (CFI), obtained from a distribution-free estimation because of the ordinal scale of the observed variables, where values greater than 0.95 indicate a good fit of the model [23]; the root-mean-square error of approximation (RMSEA), where values smaller than 0.05 indicate a good fit of the model and values up to 0.08 represent a reasonable error of approximation to the population [24]; and the standardized root-mean-square residual (SRMR), where values smaller than 0.08 indicate a good fit of the model [25,26].
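The cut-offs listed above can be summarized as a small helper that labels each index. This is a sketch under stated assumptions: the function name and the wording of the labels are hypothetical, and the values in the usage line are invented for illustration rather than taken from Table 2.

```python
def assess_fit(chi2, df, cfi, rmsea, srmr):
    """Label each fit index using the cut-offs quoted in the text."""
    if rmsea < 0.05:
        rmsea_label = "good"
    elif rmsea <= 0.08:
        rmsea_label = "reasonable"
    else:
        rmsea_label = "above cut-off"
    return {
        "chi2/df": "acceptable" if chi2 / df < 3 else "not acceptable",
        "CFI": "good" if cfi > 0.95 else "below cut-off",
        "RMSEA": rmsea_label,
        "SRMR": "good" if srmr < 0.08 else "above cut-off",
    }


# Hypothetical values, for illustration only.
print(assess_fit(chi2=850.0, df=419, cfi=0.91, rmsea=0.055, srmr=0.06))
```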
Next, the reliability of the subscales was assessed using Cronbach's alpha coefficient (α) and McDonald's omega coefficient (ω). Then, all VABIDAE subscale scores were calculated by averaging the items of each subscale.
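For reference, Cronbach's alpha for k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below implements that standard formula together with subscale scoring by item averaging. The function names and the toy responses are hypothetical, and McDonald's omega is not covered because it requires a fitted factor model.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_columns = list(zip(*rows))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def subscale_score(item_scores):
    """Subscale score as the mean of its item scores."""
    return mean(item_scores)


# Toy responses (4 respondents, 3 items) in which the items move together
# perfectly, so alpha reaches its upper bound of 1.
toy = [[1, 2, 1], [2, 3, 2], [3, 4, 3], [4, 5, 4]]
print(cronbach_alpha(toy))
print(subscale_score(toy[3]))
```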
Convergent and divergent validity were not assessed, because no other scale measuring attitudes towards Big Data was found. Using any other instrument would involve making a decision without enough evidence to compare it to VABIDAE. For the analysis, R version 3.6.1 (R Core Team, 2019) and SPSS version 24 were used.

Exploratory Factor Analysis
An exploratory factor analysis was conducted using the principal component analysis approach. The Kaiser-Meyer-Olkin measure was 0.90, Bartlett's test of sphericity was statistically significant (Chi-square = 5672.291; d.f. = 465; p < 0.001) and the determinant of the correlation matrix was practically 0 (D = 2.61E-008). These results suggest that sampling adequacy was acceptable and that factor analysis is appropriate for the data.
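The reported Bartlett statistic can be approximately reproduced from the determinant, the sample size (337) and the number of items (31) using the standard formula, Chi-square = −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom. The following is a minimal sketch that assumes the software used this standard formula; the function name is illustrative.

```python
import math


def bartlett_sphericity(determinant, n_cases, n_vars):
    """Bartlett's test of sphericity from the correlation-matrix determinant.

    Returns the Chi-square statistic and its degrees of freedom.
    """
    chi2 = -(n_cases - 1 - (2 * n_vars + 5) / 6) * math.log(determinant)
    df = n_vars * (n_vars - 1) // 2
    return chi2, df


chi2, df = bartlett_sphericity(2.61e-8, n_cases=337, n_vars=31)
print(round(chi2, 1), df)
```

The degrees of freedom match the reported 465 exactly. The statistic comes out at about 5672, slightly below the reported 5672.291, which is expected because the determinant is only given to three significant figures.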
The analysis yields a six-component solution that explains 64.6% of the variance after extraction. The results after the Equamax rotation are available in Table 1. Every component was given a title and an interpretation based on the items loading on it. To elaborate the meanings, every item was interpreted within the component on which it had the highest loading.

1. Negative feelings: negative emotions and emotional states that appear when thinking about Big Data, such as guilt, anger, shame, etc.
2. Negative impacts: how Big Data could have negative social consequences, mainly related to the educational system and democracy.
3. Positive impacts: benefits in the educational results because of Big Data.
4. Educational system improvement: benefits in educational organization and teacher recruitment.
5. Positive feelings: good emotions and emotional states related to Big Data in school.
6. Privacy: concern about the loss of privacy and a possible increase in governmental control.

Confirmatory Factor Analysis
Two models were tested on the data. The first examined the fit of the six-factor structure obtained from the Principal Component Analysis (PCA); the second tested the three-component structure corresponding to the three original VABIDAE subscales. Table 2 shows the fit indexes of both models. The six-component model provides a good fit to the data according to the RMSEA and SRMR indexes, whereas the Chi-square suggests only an acceptable fit. The Bentler CFI also suggests a fit, although a weak one. For the three-component model, only the RMSEA index suggests an acceptable fit, while the rest of the indexes show a poor fit. The AIC, BIC and Chi-square/degrees of freedom ratio also indicate that the first model is better. Overall, these results indicate that the six-component model fits the data well and is significantly better than the three-component model based on the theoretical structure. The correlations between the six components are shown in Table 3, and the parameter estimates are available in Appendix A. Both models can be interpreted from psychopedagogical theories, such as the three-component model of attitude, the theory of reasoned action, etc. Nevertheless, the six-factor model is retained because of its better CFA indexes and because it offers more specificity.

Item Characteristics and Internal Consistency
Because the six-component model had the best fit, descriptive statistics and internal consistency were analysed for both the global scale and the subscales. The internal reliability was analysed using Cronbach's alpha coefficient and McDonald's omega coefficient. For the global scale, Cronbach's alpha was 0.86 and McDonald's omega was 0.873, indicating good internal consistency. Dropping any single item did not improve the consistency of the global scale. The internal consistency was also tested for every subscale (see Table 4). No subscale coefficient improved by dropping any item, so all items were kept.

Descriptive Statistics from Scale and Subscales
Finally, the descriptive statistics for the subscales were calculated by country in order to characterize the sample according to VABIDAE. Results are available in Table 5. Table 5 shows that the subscale scores are skewed and that there are statistically significant differences between countries. In general, scores from Peru are more extreme, as Table 4 shows. The positive feelings average is significantly higher in Peru than in Spain, while the negative feelings average is lower. In the same way, the educational improvement factor has a higher average in Peru than in Spain. The negative impacts factor has a similar average in both countries, although its dispersion is higher in Peru, which is why Student's t is statistically significant. Lastly, a Student's t test was conducted using gender as a factor, and no statistically significant differences between male and female participants were found at alpha 0.01.

Discussion
The results show that VABIDAE has high internal consistency and measures the effects of Big Data coherently, both positive (teaching improvement, curricular adaptation, etc.) and negative (privacy issues, isolation, etc.). Likewise, based on the exploratory factor analysis, the internal structure of the instrument can be considered valid. However, the original theoretical structure of the instrument needs to be revised, as it is the six-factor model that is supported by the confirmatory factor analysis.
The latent structure suggested by the EFA and the CFA recaps the issues about Big Data highlighted in the reviewed literature, including negative and positive issues, both in the emotional dimension and in the impacts, opportunities and threats of Big Data in society, and specifically in education [27]. However, the "improvement of education" and "privacy" emerge as new topics, suggesting that participants give special importance to these issues. It is important to point out that the "improvement of education" is a positive aspect and "privacy" is a negative one. All of this is consistent with what the literature presents about the potential of Big Data and its threats [28,29].
Moreover, the latent structure of VABIDAE is coherent with Rosenberg and Hovland's three-dimensional model of attitude [30]. According to this model, attitude is a construct based on three dimensions: cognition, affect and behaviour. VABIDAE includes questions about the participants' thoughts about Big Data (cognition), and about what and how they feel about Big Data (emotion). It also includes one question about intention; however, it would be difficult to consider it a behavioural indicator. It would be worth studying this relationship in more detail, considering that attitudes have been shown to be a key factor in the decision to use information and communication technologies in the classroom [31].
Regarding the second aim, this research also studied the profile of the participants by country, because the differences between countries are evident [32]. Nevertheless, a more positive than negative view of the potential of Big Data for educational purposes predominates. Also, considering that participants point out that Big Data could improve teaching, this predisposition, as with other technologies, was to be expected [33].
Although all results are very promising, it is important to take into consideration that the sample may be small and the results potentially unstable in replication studies. According to the results, the six-factor structure of VABIDAE should be considered. In addition, new research should be carried out with a more heterogeneous sample that includes students from other countries and educational disciplines. It would also be necessary to analyse VABIDAE including moral and ethical elements, coping values, pedagogical adjustment, etc., broadening its evaluation spectrum.
In addition, the VABIDAE structure seems to be flexible enough to measure the opinion, perspective and confrontation of educators with other emerging technologies, such as virtual reality, augmented reality or educational robotics. For this purpose, versions of VABIDAE should be developed by adapting the items to the new technology, or the scale should be modified as necessary so that it can be applied independently of the technology analysed.
Therefore, VABIDAE is valid (in terms of measurement and assessment) for evaluating the stance of future educators on Big Data. In this sense, it is an instrument that can help Big Data to be implemented correctly in schools. The data generated in the classroom has to be managed by the decision-makers, which includes school managers and teachers. The inclusive school benefits from real-time knowledge of the social dynamics that take place in each specific school. Therefore, Big Data is the optimal medium for inclusive decision-making. However, the commitment of teachers, managers and stakeholders in general is indispensable. Only in this way can the challenge of inclusion in education be efficiently addressed, and with it, sociocultural sustainability.
In short, through data collected via the different platforms and technological applications of administrations and schools, Big Data can provide valuable information to those responsible for establishing educational policies, curriculum adaptations or educational support programs. To achieve all this (social sustainability through education), it is necessary to have instruments that measure the level of knowledge, opinions and emotions that educators have in relation to this technology, and such instruments therefore have to be created and validated.

Conclusions
In conclusion, the results show that VABIDAE can be used as an instrument to measure how future educational professionals perceive and confront Big Data in Education. This scale could be useful in higher education institutions, mainly those related to teacher education, to help teachers and educational managers know the standing and attitudes of their students, teachers, professors and lecturers towards this technology, and then make decisions about how to implement it.
Likewise, it seems that a generic instrument based on VABIDAE can be developed to assess the position of educators in relation to new technologies, which is especially important in a socioeducational reality of continuous change.
Big Data is a powerful new ally that allows new knowledge and systems to be generated in real time to improve decision-making. This is fundamental in teaching and learning processes.
Finally, it is important to emphasize the willingness of students to integrate this technology into their professional reality even though they are aware of its problems and risks.

Funding: This research did not receive any external funding.