Educational Warehouse: Modular, Private and Secure Cloudable Architecture System for Educational Data Storage, Analysis and Access

Abstract: Data in the educational context are becoming increasingly important in decision-making and teaching-learning processes. Similar to the industrial context, educational institutions are adopting data-processing technologies at all levels. To achieve representative results, the processes of extraction, transformation and uploading of educational data should be ubiquitous because, without useful data, either internal or external, it is difficult to perform a proper analysis and to obtain unbiased educational results. It should be noted that the source and type of data are heterogeneous and that the analytical processes can be so diverse that it opens up a practical problem of management and access to the data generated. At the same time, ensuring the privacy, identity, confidentiality and security of students and their data is a “sine qua non” condition for complying with the legal issues involved while achieving the required ethical premises. This work proposes a modular and scalable data system architecture that solves the complexity of data management and access. On the one hand, it allows educational institutions to collect any data generated in both the teaching-learning and management processes. On the other hand, it will enable external access to this data under appropriate privacy and security conditions.


Data Revolution: Industry, Society and Education
The industrial revolution 4.0, where big data is considered the core, fosters great advances in the technification, digitalization and datafication of the business sector. As Marciano et al. said, "We are just at the beginning of this co-evolutionary path leading toward an epochal revolution. This process is affecting all sectors and all countries." [1]. Some technologies such as big data, artificial intelligence and machine learning stand out in this revolution, all becoming increasingly present in society in various forms, such as platforms or mobile apps, due to the adoption of cloud computing by the business sector. Two main reasons support this adoption, one economic and the other technological. Gong et al. set forth the economic perspective of cloud computing as an "economic pattern as the main reason why so many companies jump into the hot pool of cloud computing". They also described its main technological features as "service-oriented, loose coupling, strong fault-tolerant, business model and eas[y to] use" [2] (p. 1). The ease of use of cloud computing combined with virtualization has made it possible to reduce the costs of data access and storage, and this has been a huge game changer in the business sector. As stated by Marston et al., "one of the significant opportunities of cloud computing lies in its potential to help . . . upfront investments that have stymied past efforts . . . . Small businesses can exploit high-end applications like ERP software or business analytics that were hitherto unavailable to them." [3-5] (pp. 181-182).
The union of these technologies has directly impacted society, with the Internet of Things being a clear example of the ease of everyday tasks using thermostats, surveillance cameras and buttons to dispense toilet paper. This trend is expected to follow an exponential growth in terms of technical change and socioeconomic impact [6,7]. Natural language processor technologies based on machine learning or deep learning are examples providing correction and translation services online. Artificial intelligence-based technologies capable of recreating non-existing human faces are sometimes used to create fake videos promoting legal and moral debate. This can be seen in the works of authors such as Perc et al. regarding the juristic challenges of artificial intelligence [8], Bechmann and Kim regarding research ethics in big data [9], Dixon-Román and Parisi regarding the growth of data capitalism and ethics in artificial intelligence [10] or Coghlan et al. regarding artificial intelligence in an education context [11].
Many solution stacks are based on a cloud computing centralized structure to offer the same product to everyone. Despite that fact, these technologies cannot be placed on closed environments such as a mobile app because its development is continuous and the machine learning process requires quality data to improve, providing a big data environment and interceding for it. Machine learning provides an accurate and precise solution if the dataset on which it is based contains lots of quality data. That is why these technologies require lots of real-time data to offer a decent service, with its mistakes included, even in the research context, in the need to balance privacy when creating knowledge for society's good as stated by Bechmann and Kim [9]. Some technologies, like machine learning, are not based on new concepts. Some authors already acknowledged this concept in the 90s [12], but the amount of data creates a need for those technologies. To sum up, the business sector offers machine learning-based services on cloud computing environments, promoting a society-business interdependent cycle with data lying in its central point. In this scheme, ethics [13], data privacy and trust [14] are essential to a fair balance of power. This is defined as the main statement for data capitalism, a concept related to a series of social disconformities provoked by big technological companies. Some authors such as Bellamy [15], West [16] and Zuboff [17] give the name "Surveillance Capitalism" to this new data-based social era where this interdependency is unbalanced against the citizen. Some examples can be found in fake news [1], data leaking [18], society manipulation and digital attention economy [19] besides a lack of privacy and security.

Educational Context
Unfortunately, the education sector faces a similar situation. This context reflects the evolution of the society-industry binomial. Its growth rate is lower, but there is a trend to apply these data-based technologies coming from the business sector [20]. To be precise, the educational sector is the new target for companies to increase their profits, as many governmental initiatives such as Spain's HAZ [21] and UK's educational space standardization [22] are introducing companies to this sector as a new market [23,24]. It would not be surprising if these companies could offer big data technology applicable to the educational sector.
From this first contact between business and education, the benefits of applying big data technologies and methodologies will undoubtedly be experienced [25-27], since they are primarily accepted as positive by society. The available data should explain the evolution of the educational context, the teaching-learning processes, academic performance and even behavior, aiming at improving quality and results. Due to this fact, many business technological practices are incorporated into the educational context. These specific practices are big data, machine learning, artificial intelligence and cloud computing. Combining those technologies causes educational institutions to lose ownership of the data they generate, because digital learning environments move from the educational centers to the control of businesses. This migration of educational services is due to the resource administration and budget reduction implied by cloud computing. This new context of outsourced Educational Technology (EdTech) offers automatic processes such as educational data mining, learning analytics, academic analytics, multimodal learning analytics or a new generation of smart student advisors, merged all together as a new digital ruleset. This new EdTech, driven by biased algorithms [28,29], makes it easier to automate decisions, overriding human criteria and opening contemporary debates in a social and educational context [30].
Appl. Sci. 2021, 11, 806

Data-Related Issues
The big data technological stack applied to education implies a big revolution. Still, the use of these analytical technologies also increases distrust among educational institutions, as it creates an unreliable context and a loss of control [31]. Some examples of the problems raised by these analytical technologies are massive automatic decision-making; massive collection of sensitive data from students [32]; unauthorized access to data [33]; extensive filtering, analysis and predictive tools applied against students' will [34]; or data transfer without a legally defined relation [35]. There is an unstable situation in collecting, treating and analyzing educational data, metadata and personal data [31].
The enthusiasm for integrating big data processes, data-based decision-making, data processing and even international transfer between countries has, in some cases, led to problems of misuse, filtering and improper access [36]. This educational data revolution raises concerns both ethically and in terms of exposing students' privacy, identity, confidentiality and security of data, personal data and metadata (PICSDPDM) [32,35]. An example of this is the use of learning analytics in educational processes, which has, since its inception, aroused mistrust of data collection and processing processes [37]. We should not rule out the privacy and security weaknesses of the various EdTech companies that facilitate massive data theft, as shown in data breaches. It was experienced among US schools [38], in student loans companies [39] and even public administrations that allowed thousands of parents' and students' data to be left uncovered, as happened in Madrid [40] and Catalunya [41].
Facing this devastating picture, we focus on PICSDPDM-compliant solutions, such as encryption [32,35] or visualization [42] plugins for Learning Management Systems (LMS) or even ethics principles in educational data analytics [33]. From this point of view, the legal system is far from regulating every technological issue, since technology evolves far faster than the law. Blockchain is an example of a technology that is already in use but still far from complying with regulations such as the General Data Protection Regulation (GDPR), as shown by the work of Amo et al. [33] and of Lafarre and Van der Elst [43]. Despite these differences, the legal system is working towards standardization. Legal frameworks such as the GDPR created a base that, in the EU, makes it possible to deploy the technologies mentioned before (big data, machine learning, learning analytics and cloud computing). Due to the mass surveillance laws enacted in the USA, there are doubts about the reliability of data transfers to that country, as shown in the "Schrems I" and "Schrems II" cases [44-49]. These two resolutions invalidated the successive US-EU data transfer agreements, most recently the Privacy Shield [50], hence reinforcing the already existing GDPR legal framework in Europe [51] or Data Privacy Law in California [52] or even forcing the development of legal frameworks worldwide to assure data exchange such as the Data Protection & Privacy in Australia [53]. There is a general awareness about privacy for everyone (including students). Still, there is a long way to solving it in a legal context, as current laws are oriented towards correcting instead of preventing. We believe in the previous statement and agree that the use of technology may lead to a balance.

Balance between Punishment and Prevention
As previously stated, there is a complicated ongoing situation in data management in the educational sector, specifically in the student-university relationship (within the university level) and in the entity-administration-university relationship (interuniversity status). A solution is required that will allow data access and management across entities, considering that entry could be public and private. The fragile and sensitive nature of data and the need to protect it, while ensuring or even sharing its open core, affect its refinement, visualization, access and exchange. This requirement is the reason we created a technological structure that respects the given concerns (data analysis, access management and privacy-friendliness for this sector).
Regarding legal matters, we have stated that legislation, as of today, is very far from regulating emerging technologies. We believe that it is more corrective than preventive. There are still problems concerning trust and loss of control in managing educational data that are not avoided due to legal loopholes. This situation generates an asymmetry of power that tilts the balance in favor of the technology companies' profits, leaving users of their services at their mercy and without many desirable privacy settings. An example of this is the analysis by the Norwegian Consumer Council. It highlights the dark patterns that technology companies such as Google, Facebook or Microsoft use to reduce the privacy options on their devices [54] and to even force the users to accept being tracked continuously [55]. The guide published by the Norwegian Data Inspectorate is another example, in an educational context, of how students can be monitored and profiled when using Google tools [56].
We are aware that there are as many decrees and regulations (from now on, laws) as there are different jurisdictions, some of them more prone to protecting the citizens' PICSDPDM, including educational roles, and some less.
To reach a balance between correction and legal prevention and to avoid power asymmetries, we believe that it is necessary to have a technological stack that automates every jurisdiction's legal framework by default and by design. In this approach, control of the educational data must be the first point to solve, through pseudonymization, anonymization or encryption procedures. We affirm and demonstrate that this is made possible by developments made by some of the authors of this paper, such as the Protected Users plug-in [32] for the Moodle LMS. The Protected Users plug-in allows any user to adopt a second identity and to remain anonymous for whatever reason required, such as situations of harassment and cyberbullying, or gender violence, or for any other cause that requires him or her to remain anonymous. Regarding GDPR, students have the right to anonymize their identity in any of the courses in which they are enrolled, for the causes mentioned and for other undesired causes. The policies and data privacy plugins for LMS solve general issues raised by the GDPR but usually do not allow students to be anonymous in any enrolled course. Ensuring PICSDPDM of educational roles is possible with a technology stack that automates laws. Therefore, we are faced with a need for acceptable practices in EdTech and a data management architecture that facilitates them.
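As an illustration of the "second identity" idea, the following minimal sketch derives a per-course pseudonym with a keyed hash. It is not the implementation of the Protected Users plug-in; the function name `second_identity`, the identifier formats and the key handling are assumptions made only for this example. Keyed hashing is pseudonymization rather than full anonymization, since whoever holds the key can recompute the mapping.

```python
import hashlib
import hmac


def second_identity(user_id: str, course_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym for one user in one course (hypothetical helper)."""
    message = f"{user_id}:{course_id}".encode("utf-8")
    # HMAC-SHA256: without the key, the pseudonym cannot be recomputed from the ids.
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return "anon-" + digest[:16]


# Hypothetical key; in practice it would be held by the institution, not hard-coded.
key = b"institution-held-secret"
alias_a = second_identity("student-42", "course-math", key)
alias_b = second_identity("student-42", "course-physics", key)
print(alias_a != alias_b)
```

Because the course identifier is part of the hashed message, the same student receives unrelated pseudonyms in different courses, which limits linking activity across courses.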

A Proposal for a Modular and Scalable Architecture
Technology is always present. However, law and ethics are not found implemented together in some educational technology solutions. Therefore, we believe that the triad formed by laws, ethics and technology must always be present in order to face the problems already mentioned, now and in the future. Regulations are inevitable, but laws should not be seen as the new ethics. EdTech provides classrooms with a set of digital tools that may enhance learning but neither diminishes its possibilities nor raises privacy data problems and concerns. LegalTech offers legal protection as a default [8,57-59], which is understood in this paper as the technology to help the data privacy officer and legal department enforce the law. Hence, somehow LegalTech and EdTech should unite as a holistic and integral solution to ensure educational data privacy and security.
From our point of view, the educational sector requires a technological stack that
• allows storing, refining and analyzing educational data following the law and ethics in a way that respects all academic roles. Institutions have to be capable of applying and automating the law's rigor in their educational solutions by default and design. Nonetheless, they should also be agile and flexible in applying their moral and ethical principles, as stated in their educational project's mission, vision and scope, without breaking the legal regulation.
• gives the educational centers a choice between a local deployment or a cloud computing scheme but prioritizing local deployment. We propose that approach in detail in our Local Educational Data Analytics (LEDA) framework's principles [60]. We expose the need to improve first local ad hoc solutions as an additional solution and to acknowledge the perks of using the aforementioned technological stack (big data, machine learning, artificial intelligence and cloud computing) on education yet refusing its malpractices. The seven principles of the LEDA Framework are (1) legality; (2) transparency, information and expiration; (3) data control; (4) anonymous transactions; (5) responsibility in the code; (6) interoperability; and (7) local first, where we advocate for this principle to be considered by every institution to increase the control over educational data.
There is a need for a proposal that can draw a relationship between technologies and law in equal terms, considering a complementary ethic approach, to solve the needs for storing, analyzing and sharing educational data over a privacy-and security-compliant environment. Hence, we thought of a technological solution that can be adapted to present and future problems, incorporating premises related to privacy and security of data protection, legal protection and ethics by default and by design.
We introduce such a proposal in this document. Section 2 presents a replicable solution in detail and how this architecture aims to protect educational data. Section 3 describes some case uses for this solution deployed at La Salle, Universitat Ramon Llull (La Salle-URL). Finally, we discuss and expose ongoing projects and future lines.

Educational Warehouse
We chose the term "educational warehouse" for the fusion of EdTech and LegalTech. It can be defined as a modular platform for educational data analytics, with a "local first" approach architecture but scalable and cloudable if needed. Hence, we found it necessary for the solution to have the following technical features:
• to be modular;
• to permit both a rigid and hybrid scenario (local or cloud computing) but with a "local first" approach;
• to use a decentralized scheme, totally or partially;
• to be scalable in terms of volume of data, processing capacity, and public and private access;
• to be technically adaptable to any educational context;
• to be private, permissioned and temporalized; and
• to automate law compliance to treat safety and privacy as transversal axes of the solution by default and by design.
The goal of our proposal is to build a context of confidence and absolute control over data. For this reason and taking the technical features into account, it acts as a digital entity that should offer the following:
• total control over data, both modular and local, once they get into the architecture. Our modular architecture performs as a data bunker with private, permissioned access. A modular architecture permits the distribution of responsibilities, privatizing access to known users and creating a taxonomy of roles and temporary permissions for acting in different parts of the architecture to limit the power of access. Moreover, the "local first" principle put in the first instance places this solution on the opposite side of cloud computing, hence ensuring an airtight space. In short, it provides local control of who or what has access to data, what for and for how long.
• a set of acceptable practices. Our modular architecture facilitates the integration of good practices further than required by law and in compliance with each institution's morals and ethics. We will later describe a series of acceptable practices associated with each module of this architecture.
Therefore, we propose an architecture that facilitates regularizing and defining data management processes to ensure adequate privacy and security levels through technological automation of law and ethics.
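The idea of controlling "who or what has access to data, what for and for how long" can be sketched as a list of time-limited grants. This is only an illustrative model under assumed names (`Grant`, `is_allowed`, the part and purpose labels); the paper does not prescribe a concrete data structure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable


@dataclass(frozen=True)
class Grant:
    user: str             # who (or what) may access
    part: str             # which part of the architecture
    purpose: str          # what for
    expires_at: datetime  # for how long


def is_allowed(grants: Iterable[Grant], user: str, part: str,
               purpose: str, now: datetime) -> bool:
    """Access is allowed only if a matching, unexpired grant exists."""
    return any(
        g.user == user and g.part == part and g.purpose == purpose
        and now < g.expires_at
        for g in grants
    )


start = datetime(2021, 1, 1, tzinfo=timezone.utc)
grants = [Grant("researcher-1", "storage", "dropout-study",
                start + timedelta(days=30))]

print(is_allowed(grants, "researcher-1", "storage", "dropout-study",
                 start + timedelta(days=1)))
```

Denying by default and requiring the purpose to match keeps each permission tied to the agreement under which it was issued, and the expiry date makes the permission temporary without any manual revocation step.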

Basic System Architecture
Educational institutions are continually generating data. Some of these data result from academic management processes such as administrative procedures, qualifications or certifications. In contrast, others result from the teaching-learning processes, such as feedback related to students' tasks or comments about their academic behavior. The analytic approaches to educational data must fulfil the requirements of both kinds of data. The origin of this data can be either internal or external depending on the EdTech services that are applied [61]. For example, in times of evaluation, there is a need to know the student's status and other data coming from different academic, learning and behavioral interactions; this data can be hosted both on internal platforms and in third-party services available in the company's cloud.
In the same way, educational institutions can benefit from data generated by other institutions, as is the case in research activities, where quality data are fundamental for the correct generation of results. Educational institutions generate a series of anonymous data that can even be considered open data. Those datasets, once depersonalized, are of great value for research. Moreover, institutions are obliged to share data with public administration and government services. The possibility of automating those obligations, freeing staff from tasks that machines can now perform, facilitates data exportation or even the interoperable connection between the institution and the administration.
Based on the considerations mentioned above, we designed a modular architecture that allows for data import, storage, analysis and even external access. This modular characteristic supports such a scenario, being flexible, adaptable, decentralizable and scalable. It is flexible and adaptable so the different modules can be implemented using the technology required in every case and can be located either inside the institution, in the cloud or in a mixed local-cloud implementation. This arrangement permits adaptation to every institution's needs or even decentralization. Decentralization may allow the replication of the primary system's architecture and may modulate it as needed, whereas scalability would take place at a module level in conjunction with cloud computing. Figure 1 describes the system architecture layers implementing the aforementioned characteristics.
Figure 1. The educational warehouse architecture: data are exchanged from different sources to a collector module where it can be processed and then sent into record storage. When there is a petition to analyze data, the Data Intelligence and Visualization module (DIV) accesses the storage to obtain the required data and stores its results. To make the request, the stakeholders use the data access interface module.
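The data flow described in the Figure 1 caption can be sketched as a toy in-memory model. The class `WarehouseSketch` and its method names are hypothetical and chosen only to mirror the caption (collector, record storage, DIV, data access interface); a real deployment would place each module in separate software and hardware.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class WarehouseSketch:
    """Toy model of the data flow described in the Figure 1 caption."""

    def __init__(self) -> None:
        self.storage: List[Record] = []       # record storage
        self.results: Dict[str, object] = {}  # stored analysis results

    def collect(self, records: List[Record],
                process: Callable[[Record], Record] = lambda r: r) -> None:
        """Collector module: process incoming records, then store them."""
        for record in records:
            self.storage.append(process(record))

    def analyze(self, name: str,
                analysis: Callable[[List[Record]], object]) -> object:
        """DIV module: read from storage, run an analysis, store its result."""
        result = analysis(list(self.storage))
        self.results[name] = result
        return result

    def request(self, name: str,
                analysis: Callable[[List[Record]], object]) -> object:
        """Data access interface: the stakeholders' entry point."""
        return self.analyze(name, analysis)


warehouse = WarehouseSketch()
warehouse.collect(
    [{"course": "math", "grade": 6}, {"course": "math", "grade": 8}],
    process=lambda r: {**r, "source": "lms"},
)
average = warehouse.request(
    "math_average",
    lambda rows: sum(r["grade"] for r in rows) / len(rows),
)
print(average)  # 7.0
```

Stakeholders never touch `storage` directly in this sketch; every request goes through `request`, which is where permission checks would naturally be attached.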
Each educational warehouse integrates the following essential modules: Extract, Transform and Load (ETL), Learning Record Store (LRS), Data Intelligence and Visualization (DIV) and Data Access Interface (DAI). It is possible to customize them internally or to even add new ones according to each institution's requirements and needs. Some modules can be disabled if necessary. For each module, we proposed a series of possible actions as an example of good practice to ensure PICSDPDM. As mentioned in Section 1.5, each module may be affected by the local jurisdiction. Therefore, the technical actions and good practices associated with the ethical-legal framework should be adequate ones as determined by law.
• ETL: This is the module that allows importing data from outside. It contains the software and hardware in charge of importing data from different data sources (management and educational software of the institution, EdTech, LMS, etc.). The content imported to the ETL module can be of an encrypted and anonymized nature. Once stored, it cannot be identified by any entity but the users involved, e.g., by using Pretty Good Privacy (PGP) encryption technology [62]. It can also be managed securely by employing secure connections, such as Secure Shell (SSH), Secure Sockets Layer (SSL) or Virtual Private Network (VPN). Regarding data management, data registries can be transferred entirely and removed from their origin, just as data from users that have not offered or consented to processing their data can be excluded as well. In the LMS or EdTech, the educational warehouse administrators may not have management competencies. However, they may demand the same good practices to ensure PICSDPDM or other more convenient ones. For instance, in the Moodle LMS, some of the authors managed to encrypt the user table, since most of the personal data reside there, and used a set of views that allow decryption or encryption at convenience. This measure can be complemented by ensuring students' anonymity through second identities with the Protected Users plugin [32], thus not exporting data that can be used to identify people. The entire database can also be encrypted with a double user control system where not even the administrator can access or decrypt the data. Some of the authors of this paper conceived such a system and named it AuthChecker [35]. In addition, simple authentication, private and permissioned Application Program Interfaces (API), and Learning Tools Interoperability (LTI) technologies are recommended to ensure data access by manually specified roles.
• LRS: This is the module where data are stored. It contains all the software and hardware needed to store the data in the original format or as the result of transformations and analyses. The private and secure nature of data can be guaranteed by defining temporary and regulated accesses, with users generated by a previous contract with the institution; by storing encrypted data; or by applying change control protocols (integrations through high-level log systems, definition of non-editable data, etc.), among other practices. A set of essential, contractual permissions can be defined automatically in the user creation process. This is the case of accessing one's action history or the acceptance or rejection of data transfer agreements. Each user profile may request an extension of its capabilities under a manual review process; this process may generate a set of legal bonds that will detail the relationship between the user and the system. The reviewers, e.g., the data privacy officer in the case of GDPR, may reject the request. A set of permissions that do not require an extensive legal bond can also be offered using under-request automatic approval procedures, allowing easy access to research data and collaboration for open data consortiums.
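The ETL-side practice of excluding users who have not consented, and of not exporting data that can identify people, can be sketched as a simple filter. The field names (`user_id`, `name`, `email`) and the helper `extract_consented` are assumptions for illustration only; they do not describe the Moodle schema or the authors' tooling.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical list of directly identifying fields to drop on export.
IDENTIFYING_FIELDS = ("name", "email")


def extract_consented(rows: Iterable[Dict[str, object]],
                      consented_ids: Set[str]) -> List[Dict[str, object]]:
    """Keep only rows of consenting users and strip identifying fields."""
    exported = []
    for row in rows:
        if row["user_id"] not in consented_ids:
            continue  # no consent: the record never leaves its origin
        exported.append(
            {k: v for k, v in row.items() if k not in IDENTIFYING_FIELDS}
        )
    return exported


rows = [
    {"user_id": "u1", "name": "Ana", "email": "ana@example.org", "grade": 7},
    {"user_id": "u2", "name": "Ben", "email": "ben@example.org", "grade": 9},
]
exported = extract_consented(rows, consented_ids={"u1"})
print(exported)  # [{'user_id': 'u1', 'grade': 7}]
```

In this sketch the `user_id` is still exported; it could in turn be replaced by a second identity before the record reaches the LRS.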
Stakeholders play an essential role in the design of the architecture of the system. They help to shape it in terms of data access. From the perspective of analytics, we categorize stakeholders [63,64] in a way that allows us to embrace the most critical roles in teaching-learning processes, management and research:
• Macro-level analytics: This level tries to make analytics (data) accessible between institutions and third parties. For instance, this level's objective is for educational institutions to be able to access statistical data from state exams carried out by students throughout their lives or to facilitate administrations to be able to access statistical data from institutions. It is fed with the information generated by the meso and micro sublevels. The results at this level are reflected in a possible transformation of the institutional system (school, university, educational organization, etc.) and changes in academic models or pedagogical approaches.
• Meso-level analytics: This level operates at the institution level through business intelligence. The objective of this level is, among other institutional aspects, the improvement of the different educational processes at the institutional level and strategic business decision-making, for example, to identify those courses that are more effective or functional. Its benefits are linked to the optimization of decision-making at the administrative level, the increase of educational "production" and even improvements in resource allocation.
• Micro-level analytics: This level addresses the analysis of the interactions carried out by every student, both independently and as a part of a group. These analytical processes include personal and sensitive student data, such as book loans, geolocations, financial data, social media conversations or even the clickstream of virtual learning environments. The benefits of this level result in a system that can identify students at risk, alert of possible dropouts and even provide students with conclusions and advice that can help them improve. The micro level is intended for coordinators and instructors who deliver content to students and evaluate their work.
• Open-data analytics: All educational institutions are likely to generate anonymized data. Making them public in a raw format or even in a processed format evokes a desire for transparency and open knowledge that can be advantageous to society. This open level of data is accessible to any independent person, research group, third party or citizen who requires access to data that educational institutions, especially public ones, can make available without violating student data privacy or security.
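One way to read the four stakeholder levels above is as four views over the same records, each with a different granularity. The sketch below is an assumption-laden illustration: the `Level` enum, the `view_for` function and the record fields are hypothetical, and dropping the `student` field stands in for a proper anonymization procedure, which in practice requires more than removing one column.

```python
from enum import Enum
from typing import Dict, List


class Level(Enum):
    MACRO = "macro"
    MESO = "meso"
    MICRO = "micro"
    OPEN = "open"


def view_for(level: Level, rows: List[Dict[str, object]]) -> object:
    """Return the view of the records that a given analytics level may see."""
    if level is Level.MICRO:
        # Per-student interactions, for coordinators and instructors.
        return [dict(r) for r in rows]
    if level is Level.MESO:
        # Institution-level indicators, here the mean grade per course.
        by_course: Dict[str, List[float]] = {}
        for r in rows:
            by_course.setdefault(r["course"], []).append(r["grade"])
        return {c: sum(g) / len(g) for c, g in by_course.items()}
    if level is Level.MACRO:
        # Statistical data shared between institutions and third parties.
        grades = [r["grade"] for r in rows]
        return {"students": len({r["student"] for r in rows}),
                "mean_grade": sum(grades) / len(grades)}
    # OPEN: rows without the student identifier (placeholder anonymization).
    return [{k: v for k, v in r.items() if k != "student"} for r in rows]


rows = [
    {"student": "s1", "course": "math", "grade": 6},
    {"student": "s2", "course": "math", "grade": 8},
    {"student": "s1", "course": "art", "grade": 10},
]
print(view_for(Level.MESO, rows))  # {'math': 7.0, 'art': 10.0}
```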
Modularity constitutes the core of the educational warehouse architecture. We are not the first to propose an architecture of this kind, but our efforts to define a flexible architecture that stands up for data privacy and security go much further than previously existing solutions.
Firstly, it is differentiated from traditional database management in that it adds a data management scheme that includes several layers of privacy and security. It is also possible to distribute the architecture partially between the institution and cloud computing. It also fosters research by allowing data access under open data consortiums or by accessing data with API systems further than merely creating and sending Comma Separated Values (CSV) files or datasheets under request. Although it adds some complexity to data management, it can perform the same operations as those of a traditional architecture database. However, the set of technical actions and good practices in a modular structure define a completely different mode of operation.
Secondly and referring to authors who previously proposed similar architectures, our proposal evolves in some ways from those previous ones. Aziz et al. proposed in 2014 [65] a linear architecture to carry out business intelligence over educational data. Their educational data warehouse solution has similarities to our LRS module. Flanagan and Ogata 2017 [66] also proposed a linear typology, centered on learning analytics but not yet implemented unlike ours and constrained to a mode of operation based on unique identifiers, where data entered the system from LMSs and ended in a user dashboard in a closed environment. Our solution is more flexible regarding technical proposals since it can even apply those by Aziz et al.; by Flanagan and Ogata; by other analytical approaches beyond learning analytics; and above all, by using nonlinear open topology. We refer to a linear typology where the proposed architecture has a start in the data input and an end in the user reports. The architecture that we offer, as shown in Figure 2, adapts to any typology where the data output from an educational warehouse can be the input for another educational warehouse. To further exemplify how to organize a topology with an educational warehouse, we use a network architecture simile. Each node is represented by an educational warehouse connected with others in a star, ring, tree or any other required topology, thus creating a multidirectional interconnected network of educational warehouses, emphasizing educational data privacy and security. possible to distribute the architecture partially between the institution and cloud computing. It also fosters research by allowing data access under open data consortiums or by accessing data with API systems further than merely creating and sending Comma Separated Values (CSV) files or datasheets under request. Although it adds some complexity to data management, it can perform the same operations as those of a traditional architecture database. 
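The network simile can be made concrete with a small sketch. The Python below is illustrative only: the `Warehouse` class, its `connect`, `ingest` and `publish` methods, the `strip_identity` filter and the node names are all hypothetical and are not taken from the authors' implementation. It models each educational warehouse as a node whose output can feed other nodes, with a release filter standing in for whatever privacy rules the sending warehouse applies.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict  # one educational data record, e.g. {"student": "s1", "score": 7}


def strip_identity(record: Record) -> Record:
    """Hypothetical privacy filter: drop the direct identifier before release."""
    return {k: v for k, v in record.items() if k != "student"}


@dataclass
class Warehouse:
    """One node in the network: an educational warehouse (illustrative only)."""
    name: str
    release: Callable[[Record], Record] = strip_identity
    store: list = field(default_factory=list)
    peers: list = field(default_factory=list)

    def connect(self, other: "Warehouse") -> None:
        # Directed edge: this warehouse's output becomes the other's input.
        self.peers.append(other)

    def ingest(self, record: Record) -> None:
        self.store.append(record)

    def publish(self) -> None:
        # Every stored record passes through the release filter before leaving.
        for peer in self.peers:
            for record in self.store:
                peer.ingest(self.release(record))


# Star topology: two institutions feed a shared research hub.
hub = Warehouse("research-hub")
campus_a = Warehouse("campus-a")
campus_b = Warehouse("campus-b")
for campus in (campus_a, campus_b):
    campus.connect(hub)

campus_a.ingest({"student": "s1", "score": 7})
campus_b.ingest({"student": "s2", "score": 9})
campus_a.publish()
campus_b.publish()
print(hub.store)
```

In this sketch, a ring or tree is obtained by changing only the `connect` calls; the nodes themselves stay the same, which is the property the simile is meant to convey.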

Use Cases
This proposal is set out to be used for different purposes. In this paper, we present different approaches where an educational warehouse can facilitate the management of educational data. Specifically, we propose three use cases, although the implementation of an educational warehouse is not limited to them.
The first case that we present describes a development carried out at La Salle-URL, where all modules are used and new features are developed. We implemented this structure in a local environment to regulate data access securely, controlling the environment and its access. Since needs are growing, the modules described in the architecture are developed in separate working environments to facilitate development by other teams.
At a practical level, the state of the educational warehouse implemented at La Salle-URL is described in Figure 3. Data are extracted from educational tools such as Moodle (LMS), third-party applications (such as Kahoot [67]) and public sources (Open Data). The data are processed to reduce their volume and to better understand their nature. Once understood and adapted, they are stored in the storage module. Data related to the operation of Moodle and data associated with the validation of an educational method used at La Salle-URL [68] are currently stored. If a request is made to access the data, an access gateway or a client is enabled so that the data can be accessed securely. This case is presented in detail in Section 3.
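The extract, process and store flow described for this first case can be sketched as follows. This is a minimal illustration under stated assumptions: the event fields, the sample rows and the function names `extract`, `transform` and `load` are hypothetical, real Moodle or Kahoot exports have different schemas, and a plain list stands in for the storage module. The volume reduction shown here is a simple count of interactions per source, user and resource, which is only one of many possible reductions.

```python
from collections import Counter

# Hypothetical raw events; real Moodle or Kahoot exports have other fields.
moodle_events = [
    {"user": "u1", "resource": "quiz-1", "action": "view"},
    {"user": "u1", "resource": "quiz-1", "action": "view"},
    {"user": "u2", "resource": "forum-3", "action": "post"},
]
kahoot_events = [
    {"player": "u1", "game": "kahoot-7", "answer": "correct"},
]


def extract():
    """Normalise each source into (source, user, resource) tuples."""
    for e in moodle_events:
        yield ("moodle", e["user"], e["resource"])
    for e in kahoot_events:
        yield ("kahoot", e["player"], e["game"])


def transform(rows):
    """Reduce volume: one interaction count per (source, user, resource)."""
    return Counter(rows)


def load(counts, storage):
    """Write the aggregated rows into the storage module (here, a list)."""
    for (source, user, resource), n in sorted(counts.items()):
        storage.append(
            {"source": source, "user": user, "resource": resource, "interactions": n}
        )


storage = []
load(transform(extract()), storage)
for row in storage:
    print(row)
```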
A second possible case is based on providing and managing data for research teams that require information coming from different entities. This information may be regulated under an agreement between entities or based on open data between educational entities. A summary of the required modules can be seen in Figure 4. This second case shows how an educational data management system allows for creating synergies in research, as it offers the possibility to create a collaborative and easily accessible environment. To comply with good practice requirements, a researcher need only adopt the educational warehouse architecture model and connect the DAI and ETL modules as the output and input gates. We believe that fields such as educational data mining or learning analytics can benefit from the synergies offered by collaboration between different educational entities, which allows the state of the art to remain accessible to those interested or involved.
Finally, we present a third possible case, aimed at institutions with limited capacity to implement an educational warehouse locally. Given this architecture's modular design, it remains functional as long as the different modules operate and the established privacy principles are respected. Its implementation can therefore use cloud computing, if desired, with the benefits that this technology brings and taking into account the risks involved. In the same way, a module can be fragmented: at a given point, the volume of data may be large enough to require big data techniques in cloud computing, such as MapReduce, clustering or similar [69], to store and analyze it. Components of the same module can thus be located in different computing environments, as shown in Figure 5.
These three use cases, one of them already in execution and the other two drawing possible scenarios, demonstrate a solution that institutions such as universities can use to facilitate the administration, processing and regulation of educational data. The educational warehouse also permits collaboration in joint research and an easy way to regulate access to data, maintaining the balance between technology and the legal system.

Results
La Salle-URL implemented an initial version of the educational warehouse which permits the development of different research lines. The implementation of the architecture is progressing and being adopted in several in-house projects. It follows an end-to-end path gradually; initially, the existing data are understood and then stored so researchers can access them. Finally, tools are enabled for analysis and evaluation.
One of the projects is based on the extraction of indicators to validate the viability of the educational methodology that was applied in La Salle, Self Directed Based Learning (SDBL) [68]. Using the educational warehouse structure, data were extracted, transformed and adapted to a subsequent local-level analysis database.
Another ongoing research project aims to extract generalized indicators of LMS user interaction with the educational warehouse to its fullest, applying all ETL, LRS, DIV and DAI modules.

Moodle's Executive Interaction Board
A third project under development, named EIStudy, implements a tool to describe the students' behavior in a Moodle-based LMS. This tool provides the users of the virtual campus at La Salle-URL with a depiction of their interaction with the virtual facility. At the same time, it offers an API for the researchers to access those datasets. All functionalities of EIStudy follow a series of rules implemented in the modules of the educational warehouse architecture.
• ETL and LRS modules: Initially, the loaded data are based on the Moodle reports. Thanks to Moodle's relational model, information that describes users' interactions with the platform can be extracted at different levels of detail.
• DIV module: This solution offers a micro-level analysis using students' highly detailed, sensitive, personal and behavioral data. The data are anonymized to preserve the students' identities while presenting related information.
• DAI module: The development of this data access interface considers the LMS user roles (see Figure 6). The user is offered a set of indicators depending on whether they are a teacher or a student (see Figure 7). A teacher has access to the data of the students in the course. A student has access only to their own data and not to that of other classmates in the course, thus avoiding privacy conflicts.
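The role rule of the DAI module, combined with the identity protection of the DIV module, can be sketched in a few lines. The following Python is a hedged illustration and not EIStudy's code: the source does not say how the data are anonymized, so the keyed hash (HMAC-SHA256) used here is a swapped-in technique that is, strictly speaking, pseudonymization. The names `SECRET_KEY`, `pseudonymize`, `indicators_for` and the sample rows are hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret held by the institution and never sent to clients.
SECRET_KEY = b"replace-with-an-institution-held-secret"


def pseudonymize(student_id: str) -> str:
    """Keyed hash of the identifier: stable per student, opaque without the key."""
    digest = hmac.new(SECRET_KEY, student_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:12]


# Hypothetical indicator table for one course: interactions per student.
course_rows = [
    {"student": pseudonymize("alice"), "interactions": 40},
    {"student": pseudonymize("bob"), "interactions": 10},
]


def indicators_for(role: str, user_id: str, rows: list) -> dict:
    """Return only the indicators that the given role is allowed to see."""
    if role == "teacher":
        total = sum(r["interactions"] for r in rows)
        mean = total / len(rows) if rows else 0.0
        return {"individual": rows, "group": {"mean_interactions": mean}}
    if role == "student":
        # A student receives only the rows that match their own pseudonym.
        own = [r for r in rows if r["student"] == pseudonymize(user_id)]
        return {"individual": own}
    raise PermissionError(f"unknown role: {role}")


print(indicators_for("student", "alice", course_rows))
print(indicators_for("teacher", "prof", course_rows)["group"])
```

Filtering on the server side, before any data leave the warehouse, means that a student client never receives classmates' rows at all, rather than receiving them and hiding them in the interface.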
Thanks to this tool, teachers can detect all interactions with resources related to the subjects they teach and adapt their teaching process accordingly. Students, in turn, can see their own interaction with the platform, which helps them to anticipate difficulties in passing a subject or to ask the teacher for help based on the information displayed.
Figure 6. (a) EIStudy's client login form: the user must log in with their credentials to get data related to their profile. If the login fails, no data are sent to the user. (b) If it succeeds, the request that returns details of the courses in which the user is enrolled is enabled.
Figure 7. (a) EIStudy's visualization tool: once the user sends a request to the educational warehouse modules, it offers data related to the specified content and the user's role. (b) If the user is a teacher, he or she can see "individual" and "group" values, whereas students see only "individual" values, since they are not allowed to see data from other people in that context.

Discussion
Phrases such as "data are the new oil" or concepts such as "data capitalism" [16,70] highlight the importance of data as drivers of our knowledge-based society. From time immemorial, data have been a source of power and knowledge. That is why the first digitization of information and the subsequent automation were so important in the context of businesses and in the early conception of computers [71]. In the business world, extracting or discovering patterns from data is useful to increase productivity and as a competitive advantage. Business intelligence [72] was born from these first computerized data analyses, which evolved into a series of systematized processes to exploit data both analytically and visually in search of relevant information.
Al Gore's "information highways" [73], the digital interconnection of all continents [74] and the technological evolution of telecommunications have allowed business intelligence, in use since the 1950s [72], to take a significant step forward. These early technological revolutions in telecommunication, together with the adoption of Internet-web technology at all levels, systematically generate enormous amounts of data, thus pushing the emergence of big data technologies [1,9]. Big data is emerging in society and consolidating a fourth revolution where the Internet of Things functions as a gateway for the interconnection of virtual and physical space, blurring both realities. There is little or no difference between real and digital identity.
Regarding previous innovations, entities in the educational context are taking part in this revolution by following the same evolutionary pattern. First, Internet-web technology is adopted as virtual learning environments or learning management systems become relevant; then, analytical approaches such as educational data mining [75] or learning analytics [76] and their intersection [77,78] are adopted. Finally, big data and artificial intelligence technologies [79], such as facial recognition or brainwave analysis to gauge students' attention or emotions, are implemented. The Internet of Things is a further integration, and we already see functional solutions that combine big data, machine learning, artificial intelligence and cloud computing [6].
However, the adoption of Internet-web technology as a means of communication has also grown exponentially, reflecting what Tim Berners-Lee stated in his 1996 Request For Comments (RFC) [80]: "The Referer field allows reading patterns to be studied and reverse links drawn. Although it can be beneficial, its power can be abused if user details are not separated from the information contained in the Referer." Nowadays, many types of data are collected to improve services as commodities for society and the educational context, putting at risk the privacy and security of people's data and, consequently, both their real and digital identities. This has been the case in recent misuses, data leaks and improper accesses, driven by the high value of data in data capitalism platforms [70], where privacy and security are at risk because of daily surveillance [15]. In the educational context, maintaining the privacy, security and digital identity of students and their data is even more critical, and in many cases, data related to minors are also involved [81]. Although data analysis could bring benefits, the privacy and security of students' data are highly fragile in interconnected virtual environments, and students' real identities can be exposed to hazards such as cyberbullying or virtual harassment [32].
We need to become aware of this positive-negative dual context and develop solutions that address the different social issues beyond legality. Taking legality as the new ethic can lead us into obscure corners of humanity. This is the case of the mass failing grades of British students deprived of access to their desired university studies (A-levels) [30], or of the mass deportations of students for alleged cheating in language examinations that were evaluated automatically by algorithms full of legal rules (ETS-TOEIC) [82]. The limits of legality and technology, such as in biased results or surveillance technologies, are set by ethics.
As authors and citizens, we are aware of the problems that data pose in education, but we also know the benefits of processing these data. We believe that legality and ethics can be automated in technology. We must avoid using the law as the new ethic and instead combine the triad of legality, ethics and technology as the basis for real solutions. In this sense, it is possible to make laws that are corrective at first become preventive.
Cloud computing is a double-edged sword. On the one hand, it facilitates the collection and computation of massive data. On the other hand, it facilitates the exposure of students. We echo this and propose a framework of seven principles called LEDA to minimize this exposure, prioritizing local technologies and leaving cloud computing as a last resort [60].
In this work, we take a new step and propose a system architecture solution that complements the LEDA framework and aims to achieve a law/ethics balance in the educational context mediated by technologies. The system architecture that we propose is modular. We name it educational warehouse, as it is intended to collect, analyze and manage access to educational data of an institution.
Modularization allows educational institutions to adapt the system to their peculiarities. We define four basic modules of the system: one to import the data, one to store raw and processed data, one to perform the analysis and one to allow regulated access from the outside. We do not impose any specific storage technology, software or hardware, and leave institutions total freedom in several respects. They may integrate, for example, free or proprietary software, or storage solutions in an open format such as xAPI [83], and/or complement them with relational or document databases. They may host servers within the institution or use cloud hosting at those points they deem relevant, and enable or disable external access. They may also scale the system by replicating it as islands within the same institution or by interconnecting institutions or third parties such as other universities, educational services, administrations and governments, or even individuals if the educational institution deems it appropriate to publish its data in the open.
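The freedom to swap storage technology and to switch external access on or off can be illustrated with a small sketch. Everything named below is an assumption made for illustration: the `Storage` interface, the two stand-in backends, and the `EducationalWarehouse` class with its methods are hypothetical and do not describe the authors' implementation. The point is only that the modules depend on an interface rather than on a particular product.

```python
import json
from typing import Protocol


class Storage(Protocol):
    """Assumed interface: anything with save/load can be the storage module."""
    def save(self, key: str, record: dict) -> None: ...
    def load(self, key: str) -> dict: ...


class MemoryStorage:
    """Stand-in for a relational or document database."""
    def __init__(self) -> None:
        self._rows = {}

    def save(self, key: str, record: dict) -> None:
        self._rows[key] = dict(record)

    def load(self, key: str) -> dict:
        return dict(self._rows[key])


class JsonTextStorage:
    """Stand-in for an open text format: records are kept as JSON strings."""
    def __init__(self) -> None:
        self._docs = {}

    def save(self, key: str, record: dict) -> None:
        self._docs[key] = json.dumps(record, sort_keys=True)

    def load(self, key: str) -> dict:
        return json.loads(self._docs[key])


class EducationalWarehouse:
    """Wires the four modules around one storage backend (illustrative)."""
    def __init__(self, storage: Storage, external_access: bool = False) -> None:
        self.storage = storage
        self.external_access = external_access

    def import_record(self, key: str, record: dict) -> None:  # import module
        self.storage.save(key, record)

    def analyse(self, key: str) -> dict:  # analysis module
        record = self.storage.load(key)
        return {"fields": sorted(record)}

    def external_read(self, key: str) -> dict:  # regulated access module
        if not self.external_access:
            raise PermissionError("external access is disabled")
        return self.storage.load(key)


# The same warehouse code runs unchanged on either backend.
for backend in (MemoryStorage(), JsonTextStorage()):
    warehouse = EducationalWarehouse(backend)
    warehouse.import_record("r1", {"course": "math", "views": 3})
    print(type(backend).__name__, warehouse.analyse("r1"))
```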
As a part of the new pedagogical change that the NCA project [84] is drawing up at the La Salle institution, we implement this solution to provide technical support while complying with the law and enforcing Lasallian ethics. We carried out further developments, which we presented in Section 3, where the educational warehouse has allowed us to modulate and continue to develop new interconnected solutions to facilitate and enhance teaching and learning processes.
Our system architecture proposal fosters new research lines to identify new interoperability options between tools and institutions, to identify new storage formats to facilitate the analysis of data, to define machine learning models from specific educational data and LMS, to establish levels of data access in relation to legality or to define good data analysis practices preserving privacy and security of students' data.