Big Data and the Ethical Implications of Data Privacy in Higher Education Research
Abstract
1. Introduction
2. Big Data in HE Research
2.1. Ethical Issues of Privacy
2.2. Anonymity, Confidentiality and Privacy
3. Privacy, HE Research and the New European Perspective
- Firstly, because the GDPR is not explicit about data ownership, the relationship between data owner and data user is blurred. More specifically, there is no clear indication of who owns the new research data: the information subject (the student or teacher), the data custodian (the HE institution that collects the data), the researcher (the contributor of added value to the research), society at large, or simply the ultimate data buyer (an HE stakeholder or any third party purchasing an interest in the data). This lack of clarity also stems from the fact that some regulations treat data as information whereas others treat it as property, which complicates matters and creates data management risks, given the everlasting nature of digital data.
- Secondly, there is a conflict between the general provisions on lawful data collection, purpose limitation and personal data minimization, on the one hand, and the exemption for scientific research, on the other. In other words, whereas other research sectors (such as biomedical and health research) face stricter requirements for good data governance and rigorous data-centric security controls over how data are obtained, stored, processed and analyzed, the specific ethical issues of HE research are not addressed by the GDPR, since the EU has no conferred competence to harmonize legislation in this field. Hence, the restrictive content of the controversial Article 89, which specifies rules that must be harmonized with national or EU law, shows that the EU has merely a supporting competence, leaving room for national legislation to apply. However, how far and how well can these safeguards capture all the mechanisms of personal data processing in HEIs and in Big Data HE research? How is the tension between the public nature of HE research and corporate stakeholders alleviated when dealing with sectoral codes of conduct, binding private stakeholder rules and data protection seals? Such safeguards and tools have implications for HE research communities that seek data protection standardization while maximizing the advantages of data sharing, both within HEIs and between HEIs and third parties (public or private). For example, in research using labor market analytics, which may help universities identify jobs for their graduates and may inform institutional, national or regional graduate tracking studies, the privacy protocols governing the Big Data mined from millions of job advertisements, as well as the management of graduates’ CVs, must be harmonized across the multiple entities that use and process those data.
Furthermore, the data subjects’ interests must be weighed against those of third-party data controllers, which must ensure data protection and privacy. Thus, within a university, consent may look like an attractive legal basis for research because it is easy to apply; once third parties (public and private) or corporate stakeholders are involved, however, it may be regarded as a more optional legal ground, since the participating data subject can withdraw it at any time. This mismatch of visions, protocols and unstandardized approaches is likely to stall the research process, complicate collaboration and hinder data use and data portability.
- A third critical aspect concerns the type of consent in HE research, even though the GDPR defines and sets clear provisions for pseudonymization, encryption, informed consent and anonymized data. The broad nature of consent and the GDPR provision according to which, under certain conditions, “data subjects should have the opportunity to give their consent only to certain areas of research or parts of research projects” (Recital 33 GDPR, our emphasis) point to a problem in the data collector-subject-user relationship that we explore in this study. A large-scale Big Data project examining several educational communities of students and professors alike, for example, in which research subjects have been anonymized, the names of data subjects removed and informed consent obtained, is likely to contain pictures of buildings, streets and open spaces in addition to the research data per se. Here, broad consent may cover both the subjects’ personal data and their use in different project sections or areas. The consent regarding pictures, however, remains unclear: unless they are handled lawfully, the pictures could allow the identification of both the students included in the research and individuals outside the survey pool in identifiable settings. Additionally, in HE research informed by social media analytics, data processing and students’ personal data are manageable under the provisions of a consent that is restricted exclusively to the researchers (data collectors) and the participating subjects, and does not extend to data users (any third party, HEI or stakeholder) or to data portability.
Extending this critical issue, since Big Data analysis may lead to all sorts of discrimination, stereotype perpetuation, life-choice limitation, judgments, harassment, etc., consent should describe the individual’s agreement to personal data processing, the scope of all the activities agreed to and, most importantly, the expectations that are likely to be violated if the subject agrees to provide personal data. That is, rather than describing what will be achieved via and during the actual research, an HE informed consent should focus unambiguously on the “what?”, “how?”, “for whom?” and “for what?” (research part) of the research undertaken.
- Another important aspect relates to the fragmented nature of ethics in Big Data-based research and the uneven treatment of the data collector-subject-user relationship across the EU-28 HE systems. Sectorally, HE research is grounded in ethical guidelines, codes of conduct, standards for ethical research practice, etc., and sets of criteria are defined for proper research conduct in order to maximize research quality and safeguard research integrity. However, even a cursory look at the EU-28 National Codes of Ethics (NCE) reveals considerable variation, fragmentation and diversity of approaches. While biomedical ethics is generally included in the NCE, many countries have separate bioethics codes, with the related activities carried out by separate national ethics councils for the life sciences. The European Code of Conduct for Research Integrity, in its most recent (2017) version [52], has emerged as an attempt to unify national approaches and visions; it is relevant to both privately and publicly funded research (researchers, universities, funding institutions, academies, learned societies, publishers, etc.), despite its acknowledged limitations in use and applicability. Since not only social, economic, political and technological factors but also changes in the research environment are very likely to affect the values and principles that regulate research, the Code remains a living document that needs constant updating and harmonization.
- A fifth factor concerns data portability, use and data-sharing issues in HE Big Data research. The legitimate need (of stakeholders, HEIs, other entities, etc.) to ensure (and thereby gain from) research access to personal data often clashes with national privacy laws and Eurostat policies [49], which is why cross-border data transfer raises issues concerning the effectiveness and lawfulness of data use and reuse. As a result, more effective data management frameworks, such as FAIR, based on the four foundational principles of Findability, Accessibility, Interoperability and Reusability, have been put in place and, more recently, applied to fields such as the digital humanities [53]. Since the ongoing debate and critical reflection on data sharing and reuse is very complex and multifaceted, we will refer only to the more general data management plans for data generated in publicly funded HE experiments, which aim to strike a balance between public data openness and privacy protection. Two conflicting trends define the current data ecosystem in HE research. On the one hand, the emergence of a variety of large-scale data repositories, ranging from university repositories to global ones such as Mendeley Data, DataHub, FigShare and EUDat, which hold data of multiple formats and types, is apt to complicate the privacy protection of research subjects because of insufficient restrictions on the deposited data descriptors. On the other hand, data management in HE research, in line with Open Science and FAIR principles, fosters an increasingly swift transition from human-readable to machine-readable data, which requires the “interpretive” judgment of HE researchers to be coupled with the scientific effort of data scientists.
While the former are involved in data (re)use and the latter in data collection and curation, the participating subject’s position is somewhat unclear, being at best fragmented across wide-scoped, multi-user and multi-stage research. For example, in projects dealing with large sets of Moodle- or Blackboard-generated data on student interactions, academic learning progression, etc., responsibility for personal data protection falls to the learning management system (LMS) administrator. A researcher interested in assessing students’ choices, attitudes, learning environment or progress over a range of tasks and a period of time may use the information for general analyses; however, if more in-depth examinations are targeted and data on stigmatized or otherwise sensitive behaviors are of research interest, responsibility for the informed consent and privacy of the subjects involved will rest with the researcher as well. In this case, data privacy entails protecting the individual, whereas confidentiality entails protecting the information; the protection of a subject’s personal data, which involves both, therefore remains arbitrary and largely project-determined. Additionally, the distinction between “the information a subject provides” (such as personal data supplied directly or when creating an LMS platform account) and “the information data users (researchers, third parties, administrators, etc.) get from their use of a subject’s data” reflects a policy and practice that any Google Maps user is likely to understand. Complications may arise when research is large-scale and multi-phased, when data is shared across multiple entities, and when research involves large datasets of de-identified information, where the degree of stripping required in qualitative datasets may prove problematic for research data validation.
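The data minimization and pseudonymization that the graduate-tracking example in the second point would require before records are shared with other entities can be sketched as follows. This is a minimal illustration under stated assumptions, not a protocol prescribed by the GDPR or by any institution: the purpose name, the field names in `ALLOWED_FIELDS`, the example record and the key are all hypothetical, and keyed hashing (HMAC-SHA256 from Python's standard library) is only one possible pseudonymization technique. Because whoever holds the key can recompute the pseudonym for a known student ID, the output is still personal data under the GDPR (Recital 26).

```python
import hashlib
import hmac
from typing import Any

# Hypothetical mapping from a research purpose to the only attributes that
# purpose needs (data minimization). Field names are illustrative assumptions.
ALLOWED_FIELDS: dict[str, set[str]] = {
    "graduate_tracking": {"graduation_year", "programme", "job_sector"},
}

# Invented example record held by the data custodian (not real data).
EXAMPLE_GRADUATE: dict[str, Any] = {
    "student_id": "S-1024",
    "name": "Jane Doe",
    "email": "jane@example.org",
    "graduation_year": 2019,
    "programme": "Economics",
    "job_sector": "Finance",
}


def pseudonymize_id(student_id: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed HMAC-SHA256 pseudonym.

    The key holder (assumed here to be the data custodian) can recompute the
    pseudonym for any known student ID and so re-link records; the result is
    pseudonymized, not anonymized, data.
    """
    return hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize_record(record: dict[str, Any], purpose: str, secret_key: bytes) -> dict[str, Any]:
    """Return only the fields needed for `purpose`, keyed by a pseudonym."""
    allowed = ALLOWED_FIELDS[purpose]
    shared = {k: v for k, v in record.items() if k in allowed}
    shared["pid"] = pseudonymize_id(record["student_id"], secret_key)
    return shared


if __name__ == "__main__":
    # Placeholder key; a real key would be randomly generated and stored securely.
    key = b"custodian-held-secret"
    print(minimize_record(EXAMPLE_GRADUATE, "graduate_tracking", key))
```

Keeping the key with a single custodian is a design choice: partner entities can join records on `pid` without seeing student IDs or names, but a harmonized protocol across entities would still have to settle who holds the key and for how long.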
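The consent advocated in the third point, one that answers “what?”, “how?”, “for whom?” and “for what?”, can be represented as a machine-checkable record, which also makes explicit both the Recital 33 option of consenting only to certain parts of a project and the possibility of withdrawal. The sketch below is hypothetical: the `ConsentRecord` class, its field names and the example categories are illustrative assumptions, not a standard consent schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    """Hypothetical granular consent record built around the questions
    "what?", "how?", "for whom?" and "for what?" (research part)."""

    subject_pid: str
    data_categories: set[str]  # what? e.g. {"survey_answers"}
    methods: set[str]          # how? e.g. {"statistical_analysis"}
    recipients: set[str]       # for whom? e.g. {"research_team"}
    project_parts: set[str]    # for what? the research parts consented to
    withdrawn_at: Optional[datetime] = None

    def withdraw(self) -> None:
        """Record withdrawal; every later permission check then fails."""
        self.withdrawn_at = datetime.now(timezone.utc)

    def permits(self, data_category: str, method: str, recipient: str, project_part: str) -> bool:
        """Allow a use only if consent is active and all four answers match."""
        if self.withdrawn_at is not None:
            return False
        return (
            data_category in self.data_categories
            and method in self.methods
            and recipient in self.recipients
            and project_part in self.project_parts
        )


consent = ConsentRecord(
    subject_pid="p-001",
    data_categories={"survey_answers", "lms_logs"},
    methods={"statistical_analysis"},
    recipients={"research_team"},
    project_parts={"part_A"},
)

print(consent.permits("survey_answers", "statistical_analysis", "research_team", "part_A"))  # True
print(consent.permits("photos", "statistical_analysis", "research_team", "part_A"))          # False: pictures not covered
print(consent.permits("lms_logs", "statistical_analysis", "external_partner", "part_A"))     # False: third party not covered
consent.withdraw()
print(consent.permits("survey_answers", "statistical_analysis", "research_team", "part_A"))  # False: consent withdrawn
```

Under this design, pictures and third-party recipients are refused unless they were listed explicitly, which mirrors the gaps in broad consent discussed in the third point.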
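The trade-off raised at the end of the fifth point, between stripping identifiers from de-identified datasets and keeping them useful for validation, can be made concrete with k-anonymity. This is a standard re-identification measure that the section itself does not name; it is swapped in here purely for illustration, and differential privacy (see Dwork in the reference list) offers stronger guarantees. The sketch assumes a tabular LMS export with hypothetical column names: it removes direct identifiers and then reports k, the size of the smallest group of records sharing the same quasi-identifier values.

```python
from collections import Counter
from typing import Any

# Hypothetical direct identifiers in an LMS export; real exports differ.
DIRECT_IDENTIFIERS = {"name", "email", "student_id"}


def strip_direct_identifiers(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop direct identifiers, keeping quasi-identifiers and analytic fields."""
    return [{k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS} for row in rows]


def k_anonymity(rows: list[dict[str, Any]], quasi_identifiers: list[str]) -> int:
    """Return k: the size of the smallest group of rows that share the same
    combination of quasi-identifier values (0 for an empty dataset)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Illustrative LMS-style records (invented for this sketch, not real data).
rows = [
    {"name": "A", "email": "a@uni.example", "student_id": "1", "programme": "History", "year": 2, "gender": "F", "forum_posts": 14},
    {"name": "B", "email": "b@uni.example", "student_id": "2", "programme": "History", "year": 2, "gender": "M", "forum_posts": 3},
    {"name": "C", "email": "c@uni.example", "student_id": "3", "programme": "History", "year": 2, "gender": "F", "forum_posts": 9},
    {"name": "D", "email": "d@uni.example", "student_id": "4", "programme": "Physics", "year": 1, "gender": "M", "forum_posts": 22},
    {"name": "E", "email": "e@uni.example", "student_id": "5", "programme": "Physics", "year": 1, "gender": "M", "forum_posts": 5},
]

clean = strip_direct_identifiers(rows)
print(k_anonymity(clean, ["programme", "year", "gender"]))  # 1: the male History student is unique
print(k_anonymity(clean, ["programme", "year"]))            # 2: every record now shares its values with at least one other
```

In these invented records, removing names and e-mail addresses is not enough, since one student is still unique on programme, year and gender. Generalizing by dropping gender raises k to 2, but it also removes a variable a researcher might need, which is precisely the tension with research data validation.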
4. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| EU | European Union |
| HE | Higher Education |
| NCE | National Codes of Ethics |
| GDPR | General Data Protection Regulation |
References
1. Mehmood, A.; Natgunanathan, I.; Xiang, Y.; Hua, G.; Guo, S. Protection of Big Data Privacy. IEEE Access 2016, 4, 1821–1834.
2. Sokolova, M.; Matwin, S. Personal Privacy Protection in Time of Big Data. In Challenges in Computational Statistics and Data Mining; Springer: Berlin, Germany, 2015; pp. 365–380.
3. Jain, P.; Gyanchandani, M.; Khare, N. Big data privacy: A technological perspective and review. J. Big Data 2016, 3, 25.
4. Lane, J.; Stodden, V.; Bender, S.; Nissenbaum, H. Privacy, Big Data, and the Public Good: Frameworks for Engagement; Cambridge University Press: Cambridge, UK, 2013.
5. Hoffman, S. Medical Big Data and Big Data Quality Problems. SSRN Electron. J. 2014, 21, 289.
6. Mattioli, M. Disclosing Big Data. Minn. Law Rev. 2014, 99, 535.
7. Khan, A. Book review: Shoshana Zuboff, The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. Soc. Chang. 2019, 49, 735–738.
8. Boyd, D.; Crawford, K. Critical Questions for Big Data. Inf. Commun. Soc. 2012, 15, 662–679.
9. Housley, W.; Procter, R.; Edwards, A.; Burnap, P.; Williams, M.; Sloan, L.; Rana, O.F.; Morgan, J.; Voss, A.; Greenhill, A. Big and broad social data and the sociological imagination: A collaborative response. Big Data Soc. 2014, 1, 1–15.
10. Barocas, S.; Nissenbaum, H. Big Data’s End Run around Anonymity and Consent. In Privacy, Big Data, and the Public Good: Frameworks for Engagement; Lane, J., Stodden, V., Bender, S., Nissenbaum, H., Eds.; Cambridge University Press: Cambridge, UK, 2014; pp. 44–75.
11. Laney, D. 3D Data Management: Controlling Data Volume, Velocity and Variety. Gartner Report. 2001. Available online: https://gtnr.it/2VqBPPs (accessed on 24 April 2020).
12. Maneth, S.; Poulovassilis, A. Data Science. Comput. J. 2016, 60, 285–286.
13. Kitchin, R. Big data and human geography. Dialog. Hum. Geogr. 2013, 3, 262–267.
14. Snijders, C.; Matzat, U.; Reips, U.D. “Big Data”: Big gaps of knowledge in the field of internet science. Int. J. Internet Sci. 2012, 7, 1–5.
15. Ward, J.S.; Barker, A.D. Undefined by Data: A Survey of Big Data Definitions. arXiv 2013, arXiv:1309.5821.
16. Tolle, K.M.; Tansley, D.S.W.; Hey, T. The Fourth Paradigm: Data-Intensive Scientific Discovery [Point of View]. Proc. IEEE 2011, 99, 1334–1337.
17. Dede, C.; Ho, A.; Mitros, P. Big Data Analysis in Higher Education: Promises and Pitfalls. Educause [Review]. 2016. Available online: https://bit.ly/2HSWlk8 (accessed on 2 March 2020).
18. Waller, M.A.; Fawcett, S.E. Data Science, Predictive Analytics, and Big Data: A Revolution That Will Transform Supply Chain Design and Management. J. Bus. Logist. 2013, 34, 77–84.
19. Klašnja-Milićević, A.; Ivanović, M.; Budimac, Z. Data science in education: Big data and learning analytics. Comput. Appl. Eng. Educ. 2017, 25, 1066–1078.
20. Mayer-Schönberger, V.; Cukier, K. Big Data: A Revolution That Will Transform How We Live, Work, and Think; Houghton Mifflin Harcourt: Boston, MA, USA, 2013.
21. Greer, J.; Mark, M. Evaluation Methods for Intelligent Tutoring Systems Revisited. Int. J. Artif. Intell. Educ. 2015, 26, 387–392.
22. McKenney, S.; Mor, Y. Supporting teachers in data-informed educational design. Br. J. Educ. Technol. 2015, 46, 265–279.
23. Daniel, B.K.; Butson, R. Technology enhanced analytics (TEA) in higher education. In Proceedings of the International Conference on Educational Technologies (ICEduTech), Kuala Lumpur, Malaysia, 29 November–1 December 2013; Kommers, P., Issa, T., Sharef, N.M., Isaías, P., Eds.; IADIS Press: Lisbon, Portugal, 2013; pp. 89–96.
24. Florea, S.; Cecile, H.M. Governance and Adaptation to Innovative Modes of Higher Education Provision. Manag. Sustain. Dev. 2014, 6, 35–38.
25. Daniel, B. Big Data and analytics in higher education: Opportunities and challenges. Br. J. Educ. Technol. 2014, 46, 904–920.
26. Beneito-Montagut, R. Big Data and Educational Research. In The BERA/SAGE Handbook of Educational Research: Two Volume Set; SAGE Publications: New York, NY, USA, 2017; pp. 913–934.
27. Clow, D. An overview of learning analytics. Teach. High. Educ. 2013, 18, 683–695.
28. Daniel, B.K. (Ed.) Big Data and Learning Analytics in Higher Education: Current Theory and Practice; Springer: New York, NY, USA, 2017.
29. Prinsloo, P.; Slade, S. Mapping Responsible Learning Analytics. In Responsible Analytics and Data Mining in Education; Routledge: London, UK, 2018; pp. 63–79.
30. Campbell, J.P.; DeBlois, P.B.; Oblinger, D.G. Academic analytics: A new tool for a new era. Educ. Rev. 2007, 42, 40.
31. Shields, R. Following the leader? Network models of “world-class” universities on Twitter. High. Educ. 2015, 71, 253–268.
32. Souto-Otero, M.; Beneito-Montagut, R. From governing through data to governmentality through data: Artefacts, strategies and the digital turn. Eur. Educ. Res. J. 2015, 15, 14–33.
33. Van Harmelen, M. Analytics for Understanding Research: CETIS Analytics Series. Available online: http://publications.cetis.org.uk/wp-content/uploads/2012/12/Analytics-for-Understanding-Research-Vol1-No4.pdf (accessed on 24 April 2020).
34. Kobayashi, V.B.; Mol, S.; Kismihok, G. Labour Market Driven Learning Analytics. J. Learn. Anal. 2014, 1, 207–210.
35. Mayer-Schönberger, V. Big Data for cardiology: Novel discovery? Eur. Heart J. 2015, 37, 996–1001.
36. Greene, J.C. Engaging Critical Issues in Social Inquiry by Mixing Methods. Am. Behav. Sci. 2012, 56, 755–773.
37. Kitchin, R. Big Data, new epistemologies and paradigm shifts. Big Data Soc. 2014, 1, 205395171452848.
38. Daniel, B.K. Big Data and data science: A critical review of issues for educational research. Br. J. Educ. Technol. 2017, 50, 101–113.
39. Miyares, J.; Catalano, D. Institutional Analytics Is Hard Work: A Five-Year Journey. Educ. Rev. 2016. Available online: https://er.educause.edu/~/media/files/articles/2016/8/erm1656.pdf (accessed on 2 March 2020).
40. Nespor, J. Anonymity and Place in Qualitative Inquiry. Qual. Inq. 2000, 6, 546–569.
41. Bauman, Z.; Lyon, D. Liquid Surveillance; Polity Press: Cambridge, UK, 2013.
42. Wiles, R.; Crow, G.; Heath, S.; Charles, V. The Management of Confidentiality and Anonymity in Social Research. Int. J. Soc. Res. Methodol. 2008, 11, 417–428.
43. Walford, G.; Massey, A. (Eds.) Explorations in Methodology; Studies in Educational Ethnography; Emerald Publishing Limited: Bingley, UK, 1999.
44. Moosa, D. Challenges to anonymity and representation in educational qualitative research in a small community: A reflection on my research journey. Comp. J. Comp. Int. Educ. 2013, 43, 483–495.
45. Troman, G.; Jeffrey, B.; Walford, G. (Eds.) Methodological Issues and Practices in Ethnography; Studies in Educational Ethnography; Emerald Publishing Limited: Bingley, UK, 2005.
46. Dwork, C. Differential Privacy: A Cryptographic Approach to Private Data Analysis. In Privacy, Big Data, and the Public Good; Lane, J., Stodden, V., Bender, S., Nissenbaum, H., Eds.; Cambridge University Press: Cambridge, UK, 2014; pp. 296–322.
47. Wilbanks, J. Portable Approaches to Informed Consent and Open Data. In Privacy, Big Data, and the Public Good; Lane, J., Stodden, V., Bender, S., Nissenbaum, H., Eds.; Cambridge University Press: Cambridge, UK, 2014; pp. 234–252.
48. Nissenbaum, H. A Contextual Approach to Privacy Online. Daedalus 2011, 140, 32–48.
49. Elias, P. A European Perspective on Research and Big Data Analysis. In Privacy, Big Data, and the Public Good; Lane, J., Stodden, V., Bender, S., Nissenbaum, H., Eds.; Cambridge University Press: Cambridge, UK, 2014; pp. 173–191.
50. Chassang, G. The impact of the EU general data protection regulation on scientific research. Ecancermedicalscience 2017, 11, 709.
51. Mondschein, C.F.; Monda, C. The EU’s General Data Protection Regulation (GDPR) in a Research Context. In Fundamentals of Clinical Data Science; Kubben, P., Dumontier, M., Dekker, A., Eds.; Springer: Cham, Switzerland, 2019.
52. The European Code of Conduct for Research Integrity; All European Academies: Berlin, Germany, 2017; Available online: https://bit.ly/2VmdwlQ (accessed on 14 May 2020).
53. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Florea, D.; Florea, S. Big Data and the Ethical Implications of Data Privacy in Higher Education Research. Sustainability 2020, 12, 8744. https://doi.org/10.3390/su12208744