Article
Peer-Review Record

Technology-Enhanced Learning, Data Sharing, and Machine Learning Challenges in South African Education

Educ. Sci. 2023, 13(5), 438; https://doi.org/10.3390/educsci13050438
by Herkulaas MvE Combrink *, Vukosi Marivate and Baphumelele Masikisiki
Submission received: 14 February 2023 / Revised: 18 March 2023 / Accepted: 5 April 2023 / Published: 24 April 2023

Round 1

Reviewer 1 Report

Please find attached the file for specific areas that need corrections.

 

Title

Reduce to 15 words or less.

 

Abstract

The objective of this paper is not clear. It is, therefore, going to be difficult to compare the findings and conclusions against research objectives since they are not explicitly indicated.

 

Consider cutting down this long sentence into three.

The objective of the study needs to be clear in the abstract.

 

Research methodology

You have to include the research philosophy and approach that was used in this study. It seems your explanation of methodology is starting from data collection without paying attention to other research methodology fundamental issues (i.e., research philosophy, design, methodology, analysis).

 

Results

You have presented definitions of data sharing in the results section, and I am not sure whether this addresses the key concerns that need to be addressed for the advancement of MLER. If this does not add to the objective of the study, then this content should not be reported under results but under the preliminary literature review.

 

This sounds to me like the advantages of data sharing, but the question is: does this address the key concerns that need to be addressed for the advancement of MLER? If it does not, then it is irrelevant.

 

Inconsistency in the use of "and" vs "&" in references.

 

The aspect you are discussing here on the need for training might add to the objective.

 

So what are the key concerns that need to be addressed for the advancement of MLER?

 

Is it not advisable to have a separate section to discuss recommendations rather than in the results section?

 

Ensure that the findings that support the claim you are making here are reflected in the results section for consistency.

 

This is not true for qualitative research, where an inductive approach can be pursued without declaring a set of hypotheses. You are therefore making a deduction based solely on deductive and quantitative research.

 

Pl

Comments for author File: Comments.pdf

Author Response

Response to Reviewer 1 Comments

 

Point 1: Title Reduce to 15 words or less.

 

 

Response 1: The title was revisited and changed to:

 

“Technology-Enhanced Learning, Data Sharing, and Machine Learning Challenges in South African Education”

 

 

Point 2: Abstract The objective of this paper is not clear. It is, therefore, going to be difficult to compare the findings and conclusions against research objectives since they are not explicitly indicated. Consider cutting down this long sentence into three. The objective of the study needs to be clear in the abstract.

 

Response 2: The comment was well received and the reviewer pointed out a very important fundamental flaw that was addressed. As a result, the abstract was rewritten to address the critique from the reviewer.

 

“The objective of this paper is to scope the challenges associated with data-sharing governance for machine learning applications in education research (MLER) within the South African context. Machine learning applications have the potential to support student success and to identify areas where students require additional support. However, the implementation of these applications depends on the availability of quality data. This paper highlights the challenges in data-sharing policies across institutions and organisations, which make it difficult to standardise data-sharing practices for MLER. This poses a challenge for South African researchers in the MLER space who wish to advance and innovate. The paper proposes viewpoints that policymakers must consider to overcome these challenges in data-sharing practices, ultimately allowing South African researchers to leverage the benefits of machine learning applications in education effectively. By addressing these challenges, South African institutions and organisations can improve educational outcomes and work toward the goal of inclusive and equitable education.”

 

Point 3: Research methodology

You have to include the research philosophy and approach that was used in this study. It seems your explanation of methodology is starting from data collection without paying attention to other research methodology fundamental issues (i.e., research philosophy, design, methodology, analysis).

 

Response 3: Contextual sections were added to the methodology section to illustrate the research design, approach, and analysis. Because of this design change, the results section now includes the results mentioned in the analysis, which visualise some elements of the scoping review that were not present in the original draft. Below is the rewritten methodology section, which addresses the reviewer's comments.

 

2.1 Research Design and Methodology

To gauge the landscape of data sharing for MLER, a scoping review was used as the methodological approach [16, 17]. The scoping review was performed to assess the current information related to data sharing for MLER within the South African context. It aimed to provide a conceptual overview of the data sharing practices present within institutions of higher learning, highlighting the challenges and current practices associated with data sharing. The purpose of a scoping review is to provide clarity within a specific field of study by addressing a broad research question and aim conceptually, within a specific domain that requires further contextualisation [18]. This is performed by assessing secondary data sources, literature, and expert opinions about a particular subject matter.

The overarching question that was explored was guided by a need to assess the data sharing practices present within institutions of higher learning. This methodological approach was favoured because there are gaps in the understanding of data sharing practices that may obstruct MLER. The results were interpreted from an interpretivist paradigm. This paradigm was chosen because of the subjective nature of the information, as the bodies of knowledge included different domain-specific contexts. To minimise potential bias in the interpretation of the experts, the themes identified were used as the discussion points and the arguments were developed based on the findings. This scoping review allowed the themes to be identified from the available literature so that further discussion on the topics could commence.

 

2.2 Inclusion and Exclusion Criteria

 The scoping data collection strategy included a specific scope and selection criteria for inclusion and exclusion. Initially, more than 80 related academic articles within the local (South African) and global context were considered within the scoping review. These articles contained keywords related to data sharing, machine learning for education, challenges in obtaining training data, data sharing policies, as well as GDPR and POPIA with regard to data sharing frameworks. Furthermore, only works obtained from 2010 onward were considered for the review (Table 1).

Table 1. Inclusion and exclusion criteria in the study.

| Inclusion criteria | Exclusion criteria |
| --- | --- |
| Information after 2010 | No information prior to 2010 |
| Keywords: data sharing, machine learning for education, challenges in obtaining training data, data sharing in education, data sharing policies, GDPR, POPIA, data sharing frameworks, South Africa, education | If the body of knowledge was only about a specific organisation, and not the data sharing between organisations, or if the data sharing involved criminal activity or examples that are against the law |

 

Exclusion criteria were applied once the literature was gathered. A source was excluded if it only referred to a very specific use case within an organisation (rather than the organisation itself), whether or not the principles could be applied to a context involving multiple institutions (for example, if a specific South African institution of higher learning has a policy that is specific only to that institution, that source was removed), or if the data sharing had anything to do with criminal activity or the use of data for negative use cases (as this was not within the scope or focus of this study). Upon the initial screening of the body of text, 50 bodies of literature were excluded, given the exclusion criteria. In addition, a bibliometric analysis was not included in this paper, as the focus of the scoping review was the interpretation of the texts rather than statistics and counts of the literature prevalent within this field of research. In total, 34 bodies of literature in the form of journal articles (28), conference proceedings (5), and a book review (1) were considered for the scope [19–53].
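As an illustration only, the inclusion and exclusion criteria in Table 1 could be expressed as a small Python filter. This record does not describe any screening code, so the sketch below is not the authors' procedure. The `Source` record, its two boolean flags (assumed to be set by a human screener), the substring keyword matching, and the reading of the date criterion as "2010 onward" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List

# Keywords paraphrased from Table 1; substring matching is an assumption.
KEYWORDS = [
    "data sharing", "machine learning for education", "training data",
    "data sharing policies", "gdpr", "popia", "data sharing frameworks",
    "south africa", "education",
]


@dataclass
class Source:
    """Hypothetical record for one body of literature."""
    title: str
    year: int
    abstract: str
    single_institution_only: bool = False  # assumed flag from manual screening
    involves_unlawful_use: bool = False    # assumed flag from manual screening


def include(source: Source) -> bool:
    """Return True when a source passes the criteria as read from Table 1."""
    if source.year < 2010:  # only works from 2010 onward
        return False
    text = f"{source.title} {source.abstract}".lower()
    if not any(keyword in text for keyword in KEYWORDS):
        return False
    if source.single_institution_only or source.involves_unlawful_use:
        return False
    return True


def screen(sources: List[Source]) -> List[Source]:
    """Keep only the sources that pass every criterion."""
    return [s for s in sources if include(s)]


# Invented entries, used only to show how the filter behaves.
candidates = [
    Source("Data sharing in higher education", 2015, "POPIA and GDPR compared"),
    Source("Data sharing frameworks", 2005, "An early overview"),
    Source("One university's data policy", 2018, "education",
           single_institution_only=True),
]
for kept in screen(candidates):
    print(kept.title)
```

Keeping the judgement-based criteria as explicit flags, rather than trying to infer them from text, reflects that such exclusions normally depend on a reader's assessment of each source.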

 

2.3 Analysis

The most salient ideas were summarised from each of the literature sources and coded according to their themes. The thematic codes were identified within the scoping process and recorded in a separate data source. The thematic coding was based on the most salient ideas surrounding data sharing, governance and the challenges associated with them. The analysis was only performed on the final data included in the scoping review. An initial analysis was performed to identify the broad overarching categories, which were then subdivided to understand the nuance and gain further context on the subject matter. Each of the broadly defined categories underwent another thematic analysis to extract the themes within it. This meant that the analysis occurred at two levels of understanding of the text: a) a superficial categorisation of the broader themes within the literature, and b) a more descriptive and nuanced analysis of the text to understand the scope and context of the categories. The information was then processed using Python 3.7 to produce the frequency graphs and visualisations of the codified data. These visualisations included word clouds to illustrate the most salient synonyms and related words within the body of knowledge, grouped into the respective themes. Furthermore, the information was processed to include additional descriptive data such as word frequencies, frequency graphs and tables. These data points were then grouped into the primary overarching ideas related to data sharing in this context for further discussion. A summary of these results is outlined in the results section that follows.
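The word-frequency step described in this section could look roughly like the following Python sketch. It is a minimal illustration using only the standard library, not the authors' actual script. The theme labels, the excerpts, the stop-word list, and the function names `tokenize` and `theme_frequencies` are assumptions introduced for the example.

```python
import re
from collections import Counter
from typing import Dict, List

# A very small stop-word list, chosen only for this illustration.
STOPWORDS = {"the", "of", "and", "to", "a", "in", "for", "is", "on", "with", "that", "as"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def theme_frequencies(coded: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Count word frequencies separately for each coded theme."""
    return {
        theme: Counter(word for excerpt in excerpts for word in tokenize(excerpt))
        for theme, excerpts in coded.items()
    }


# Invented excerpts standing in for coded passages from the reviewed sources.
coded_excerpts = {
    "definitions of data sharing": [
        "Data sharing is the reuse of research data by other researchers.",
        "Sharing research data has legal consequences for institutions.",
    ],
    "upskilling and privacy": [
        "Experts need training to share data while protecting personal information.",
    ],
}

for theme, counts in theme_frequencies(coded_excerpts).items():
    print(theme, counts.most_common(3))
```

Per-theme counters of this kind can be passed to a plotting library to draw frequency graphs or word clouds; which libraries the authors used is not stated in this record.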

 

Point 4: Results

You have presented definitions of data sharing in the results section, and I am not sure whether this addresses the key concerns that need to be addressed for the advancement of MLER. If this does not add to the objective of the study, then this content should not be reported under results but under the preliminary literature review.

 

Response 4: The results section presents results from the scoping review, and the authors note that none of the tabular or visual results of the scoping review were included in the original results section. The authors further note that narratives in the results section, such as the definitions, were moved to their respective sections in the article. This change led to a much shorter results section, followed by a discussion section and a call to action section. Below is the updated Results section:

 

3. Results

Based on the first iteration of coding the data, three primary categories were identified for the scope namely: different definitions of data sharing; a need to upskill experts, share data and protect personal information; and data sharing and the impact on research.

 

3.1 Different Definitions of Data Sharing and the Impact on Research

Based on the literature used in the scoping review, the content related to the definitions contained the following keywords and themes (Figure 1).

 

Figure 1. Frequency of words and themes used to define data sharing.

Data sharing definitions differ between experts, and part of the scoping review was to define these parameters. Data sharing across various domains and contexts has an impact on the practices that influence the extent to which information is shared between different organisations in the context of MLER and other research-related activities involving the use of digital information. In the context of this scoping review, the definitions considered included a variety of keywords and concepts, some of which overlapped and some of which differed depending on the use case. For example, medical information sharing specific to patients would differ from aggregate data about education. The four subthemes identified were the most prevalent across the literature sources and covered concepts related to data reuse, legal consequences, research data as opposed to industry data, and the availability of the data.

 

3.2 A Need to Upskill Experts, Data Sharing and Protection of Personal Information

Three salient themes within the expert literature from the scoping review included a need to train and upskill experts in academe and industry, a need to share information more frequently between researchers, and a need to do this sharing under very specific frameworks to protect personal information (Figure 2).

 

Figure 2. Frequency of words used to identify the themes related to the upskilling of experts, sharing data for research and protecting personal information.

The themes identified will be unpacked in the discussion section below. After the discussion of the themes, based on the scoping review, a call to action will include the recommendations and proposed areas that require improvement for MLER to be successful in the South African context.

 

Point 5: This sounds to me like the advantages of data sharing, but the question is: does this address the key concerns that need to be addressed for the advancement of MLER? If it does not, then it is irrelevant.

Inconsistency in the use of "and" vs "&" in references.

 

Response 5: The authors welcome this viewpoint and critique. All the irrelevant sections were removed, and "&" was changed to "and" for consistency's sake.

 

Point 6:  

The aspect you are discussing here on the need for training might add to the objective.

 

Response 6: This comment was within the PDF document and referred to the mention of training and upskilling. These sections were moved to the call to action section, and an emphasis was placed on packaging the way forward for these themes. Please see the call to action section below:

  1. Call to Action

Based on the discussions, arguments and outcomes from the scoping review, we propose an adapted data sharing definition that includes MLER: “Data sharing for MLER is the practice of making research data available under different conditions, with the aim of promoting scientific advancement, social good, commercialisation, transparency, reproducibility, and/or collaboration. It involves sharing data and information under specified terms and conditions to ensure responsible use and protection of research subjects' privacy as well as take cognisance of the unintended consequences of sharing such information”. MLER may require context and data from African perspectives for each individual country and institution, and much of the data needed for MLER is not present within this network. Furthermore, even though there are some positive results with data sharing practices among different countries, data sharing practices are not yet firmly established in all academic disciplines and countries with regard to the training data needed for MLER. We therefore suggest that the education domain in South Africa has not yet fully established the practice of data sharing for MLER, because this kind of research requires expertise in identifying, storing, and sharing the information in a manner that complies with the current legal and ethical academic frameworks. We acknowledge the stringent frameworks that are currently in place, but we also suggest that there are some areas that require improvement.

In this study, we also echo Riggs et al. (2019) in saying that there is an urgent need for fair data sharing practices for MLER [22]. We argue that machine learning based research in education is impeded not only by data policies and regulations, but also by a lack of understanding among regulatory bodies of how MLER and data work together. We therefore fear that if these regulations remain at a hardware and data level, and do not include the machine learning for education components, they run the risk of stifling innovation, research and development, and transformation within the field. This is because the data sharing policies are too strict, focusing too much on the sharing of information rather than on coding and aggregating information in ways that are still meaningful for education research to take place.

With reference to Alter and Vardigan (2015), we outline some of the data types that are usually requested to conduct machine learning research in education [3]. We make this claim based on the lack of processes currently available to request and gain access to training data between different South African higher education institutions. For example, if a South African researcher wants access to the data, extensive research proposals, ethical clearance, and a series of anonymisation steps are required just to acquire the data, but companies implementing learning management systems may conduct ‘in house’ research to ‘improve customer satisfaction’ without such processes. Furthermore, the researchers need to have a very specific research outcome in mind, a clear hypothesis and a clear scope of analysis, leaving very little room to explore the data outside the well-defined scope of the approved research proposal, thereby potentially stifling innovative research, whereas industry may have more freedom to explore the data. As stated before, this happens because of different policy implementation strategies managed differently between different sectors, and between countries internationally. International organisations and international companies with enough fluid capital are able to fast-track these processes under the guise of the legal gatekeeping mechanisms as the governance of policies (such as GDPR and POPIA) can be justified. This does not mean that the ethics committees themselves are redundant in the process of protecting information, but it does pose concerns if South African researchers require ethical clearance to conduct research from local committees, but international organisations do not – based on the same data. We are not advocating MLER to be the ultimate solution to the plethora of challenges that South African education faces, but as we outline, MLER enables highly innovative solutions in this context. 
For MLER to be successful, the underlying data sharing governance structures need to support innovation in this domain. South Africa needs a shared strategy toward what is needed for a more inclusive economy, including how we share information among one another [19, 23, 29, 50, 51]. The current collaborative research model, and the current ecosystem needs to shift its paradigm to an objective, value-laden approach [53]. The set of values on a strategic level within the South African scientific community and general public (outside the sciences) should be shared between a variety of different stakeholders, and a more open network of researchers and research should be promoted for South African researchers. We propose this by strategically involving all the relevant stakeholders to agree upon the values that the scientific community should focus on if we are to remain relevant in the MLER domain. For MLER related research, we propose that the South African research community must remain objective to a set of common goals that we need to strive for, as a collective, to shift the needle for research in this domain. Secondly, the epistemic value for science and innovation in South Africa should take a stance to include a variety of different disciplines that are not traditionally seen as strictly within the domain of MLER research. This includes philosophy, general humanities, education, public relations, law, and commerce – all of which are important to drive this ecosystem. To define what the scientific community and research in the field should prioritise, and where the value of research should move toward also requires an alignment of the current strategic goals. There are a variety of different scientific goals and visions, as outlined by the 4IR commission in South Africa, the strategic vision of the Department of Science and Innovation (DSI), the Council for Scientific and Industrial Research (CSIR), and a variety of other major South African institutions [47]. 
We propose that the education and ethical specialists both locally and internationally that work with South African data come together and collaboratively contribute toward advancing the field of MLER research. As a result, we created a repository that can be accessed by any such expert with an internet connection so that these meaningful contributions can be consolidated in a collaborative manner[1]. Within the repository, the proposed set of values outlined are not based on the collaborative efforts of the various stakeholders, currently, but rather based on the survey conducted to streamline these efforts and based on the arguments made in this paper from a practitioner perspective. It is therefore intended to serve only as a starting point for these collaborative dialogues – and should be collaborative as the core objectives might change depending on how value is defined. It is also important that if we as a community of researchers strive for digital independence, that we also talk about software ownership, copyright, and intellectual property based on the data we generate. Although these constraints might seem economically driven only, the extent of innovation is dependent on the data and the skill to develop within those environments. There is also a need to specify what type of data and variables are acceptable to share and use for MLER, and what is not. For this to work, a consensus must be reached among researchers, policy makers, institutions and industry so that MLER can advance – given the data sharing governance and implementation structures. As outlined before, our arguments do not stem from the perspective to rebut data sharing policies, and we acknowledge that these policies are a vital part of the 21st century. Instead, our arguments centre around the rate of innovation and the type of innovation needed in developing countries, like South Africa, that need innovation and technology to assist the education domain.

Point 7: So what are the key concerns that need to be addressed for the advancement of MLER? Is it not advisable to have a separate section to discuss recommendations rather than in the results section?

 

Response 7: The key issues are data sharing, data sharing and its impact on research, the need to upskill experts, and the protection of personal information. A call to action section was created to include these recommendations. Please see Response 6.

 

Point 8: Ensure that the findings that support the claim you are making here are reflected in the results section for consistency.

 

Response 8: Relevant sources and arguments were added, and the discussion and call to action sections were restructured, so that the findings supporting the claims are now reflected consistently throughout the document.

[1] https://github.com/dsfsi/Higher_Education_EDA

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors raise the serious topic of the challenges associated with data sharing governance in the context of machine learning studies.

While it seems that the authors have done a good job of searching related works, the paper is rather "boringly" formatted. It is very hard to follow without any additional visual information that helps readers to "anchor" their attention. I honestly think that the paper would benefit from content such as a table that briefly summarises the reviewed work, and some other diagrams (some statistics of the possible collected data, maybe in map format or in a temporal diagram highlighting key events), because raw text is rather hard to follow.

Another point that could be improved is an additional overview of the results that could be drawn from collecting big data related to MLER. The authors mention the OULAD dataset, but the discussion of potential applications of such a dataset is rather limited. This could be implemented in the form of citing key takeaways and (if possible) applications of such results, drawn from MLER in practice.

 

 

 

Author Response

Response to Reviewer 2 Comments

 

Point 1: The authors raise the serious topic of the challenges associated with data sharing governance in the context of machine learning studies.

While it seems that the authors have done a good job of searching related works, the paper is rather "boringly" formatted. It is very hard to follow without any additional visual information that helps readers to "anchor" their attention. I honestly think that the paper would benefit from content such as a table that briefly summarises the reviewed work, and some other diagrams (some statistics of the possible collected data, maybe in map format or in a temporal diagram highlighting key events), because raw text is rather hard to follow.

 

 

Response 1: To make the paper more appealing, the authors included the visualisations used in the scoping review. This meant that the methodology and results section was adjusted:

 

 

  1. Research Design and Method

2.1 Research Design and Methodology

To gauge the landscape present within the data sharing for MLER context, scoping review as a methodological assessment was used [16, 17]. Scoping review was performed to assess the current information related to data sharing for MLER within the South African context. The scoping review aimed to provide a conceptual overview of the data sharing practices present within institutions of higher learning, highlighting the challenges and current practices associated with data sharing. The purpose of a scoping review is to provide clarity required within a specific field of study from the perspective of addressing a broad research question and aim conceptually, within a specific domain that requires further contextualisation [18]. This is performed by assessing secondary data sources, literature as well as expert opinions about a particular subject matter.

The overarching question that was explored was guided by a need to assess the data sharing practices present within institutions of higher learning. This methodological approach was favoured as there are gaps in the understanding related to data sharing practices that may have obstructing impacts on MLER research. The interpretation of the results was discussed from an interpretivist paradigm. This was proposed due to the subjective nature of the information as the bodies of knowledge included different domain specific contexts. As to minimise the potential bias in the interpretation of the experts, the themes identified were used as the discussion points and the arguments were developed based on the findings. This scoping review allowed for the themes to be identified from the literature available so that further discussion on the topics could commence.

 

2.2 Inclusion and Exclusion Criteria

 The scoping data collection strategy included a specific scope and selection criteria for inclusion and exclusion. Initially, more than 80 related academic articles within the local (South African) and global context were considered within the scoping review. These articles contained keywords related to data sharing, machine learning for education, challenges in obtaining training data, data sharing policies, as well as GDPR and POPIA with regard to data sharing frameworks. Furthermore, only works obtained from 2010 onward were considered for the review (Table 1).

Table 1. Inclusion and exclusion criteria in the study.

Inclusion criteria

Exclusion criteria

Information after 2010

No information prior to 2010

Keywords: data sharing, machine learning for education, challenges in obtaining training data, data sharing in education, data sharing policies, GDPR, POPIA, data sharing frameworks, South Africa, education

If the body of knowledge was only about a specific organization, and not the data sharing between organisations or if the data sharing involved criminal activity or examples that are against the law

 

Exclusion criteria were applied once the literature was gathered. The exclusion criteria were applied if the body of knowledge only made reference to a very specific use case within an organisation (rather than the organisation itself), whether or not the principles can be applied to a context involving multiple institutions (for example, if a specific South African institution of Higher Learning has a policy that is only specific to that institution, then it was removed from the criteria), and if the data sharing had anything to do with criminal activity or the use of data for negative use cases (as this was not the scope or the focus of this very specific study). Upon the initial screening of the body of text, 50 bodies of literature were excluded, given the exclusion criteria. In addition to this, the bibliometric analysis was not included within this specific paper, as the interpretation of the body of texts were the focus of scoping rather than the statistics and counts of the number of literature prevalent within this field of research. In total, 34 bodies of literature in the form of journal articles (28), conference proceedings (5), and a book review (1) were considered for the scope [19 – 53].

 

2.3 Analysis

The most salient ideas were summarised from each of the literature sources and coded according to their thematic themes. The thematic coding of ideas was identified within the scoping process and noted on a separate data source. The thematic coding was conducted on the premise of the most salient ideas surrounding data sharing, governance and the challenges associated with them. The analysis was only performed on the final data included in the scoping review. An initial analysis was performed to identify the broad overarching categories, which were then subdivided to understand the nuance and gain further context on the subject matter. Each of the broadly defined categories underwent another thematic analysis to extract the themes within each of these. This meant that the analysis process occurred for two different levels of understanding in the text a) a superficial categorisation of the broader themes from within the literature, and b) a more descriptive and nuanced analysis of the text to understand the scope and context of the categories. The information was then further processed for analysis using Python 3.7 for the data processing of the frequency graphs, and visualisation of the codified data. These visualisations included word clouds to illustrate the most salient synonyms and related words within the body of knowledge that were grouped into the respective themes. Furthermore, the information was processed to include additional descriptive data such as word frequencies and frequency graphs and tables. These data points were then grouped to include the primary overarching ideas related to data sharing in this context for further discussion. The summary of these results will be outlined within the result section to follow.

3. Results

Based on the first iteration of coding the data, three primary categories were identified for the scope namely: different definitions of data sharing; a need to upskill experts, share data and protect personal information; and data sharing and the impact on research.

 

3.1 Different Definitions of Data Sharing and the Impact on Research

Based on the literature used in the scoping review, the content related to the definitions contained the following keywords and themes (Figure 1).

 

Figure 1. Frequency of words and themes used to define data sharing.

Data sharing definitions differ between experts, and part of the scoping review was to define these parameters. Data sharing across various domains and contexts has an impact on the type of practices that influence to what extent information is shared between different organisations in the context of MLER and other research-related activities involving the use of digital information. In the context of this scoping review, the definitions considered included a variety of keywords and concepts, some of which overlapped and some of which differed on the basis of the use case. For example, medical information sharing specific to patients would differ from aggregate data about education. The four subthemes identified were the most prevalent across the literature sources and covered concepts related to data reuse, legal consequences, research data as opposed to industry data, and the availability of the data.

 

3.2 A Need to Upskill Experts, Data Sharing and Protection of Personal Information

Three salient themes within the expert literature from the scoping review included a need to train and upskill experts in academe and industry, a need to share information more frequently between researchers, and a need to do this sharing under very specific frameworks to protect personal information (Figure 2).

 

Figure 2. Frequency of words used to identify the themes related to the upskilling of experts, sharing data for research and protecting personal information.

The themes identified will be unpacked in the discussion section below. After the discussion of the themes, based on the scoping review, a call to action will include the recommendations and proposed areas that require improvement for MLER to be successful in the South African context.

 

 

In addition, the results section now focuses solely on the results, while the narrative is contained in the discussion and call to action.

 

 

Point 2: Another point that could be improved is an additional overview of the results that could be drawn from collecting big data related to MLER. The authors mention the OULAD dataset, but the discussion of potential applications of such a dataset is rather limited. This could be implemented in the form of citing key takeaways and (if possible) applications of such results, drawn from MLER in practice.

 

Response 2: The reviewer's comments are welcomed. As a result, a section was added after the discussion. This section, “Call to Action”, includes concrete ideas and recommendations. There were additions to this section not in the original text, and an emphasis was placed on packaging all recommendations together.

 

  1. Call to Action

Based on the discussions, arguments and outcomes from the scoping review, we propose an adapted data sharing definition that includes MLER: “Data sharing for MLER is the practice of making research data available under different conditions, with the aim of promoting scientific advancement, social good, commercialisation, transparency, reproducibility, and/or collaboration. It involves sharing data and information under specified terms and conditions to ensure responsible use and protection of research subjects' privacy as well as take cognisance of the unintended consequences of sharing such information”. MLER research may require context and data from African perspectives for each individual country and institution, and a lot of the data needed for MLER is not present within this network. Furthermore, even though there are some positive results with data sharing practices among different countries, data sharing practices are not yet firmly established in all academic disciplines and countries with regard to the training data needed for MLER. We therefore suggest that the education domain in South Africa has not yet fully established the practice of data sharing for MLER, because the data required for this kind of research demands expertise in identifying, storing, and sharing the information in a manner that complies with the current legal and ethical academic frameworks. We acknowledge the stringent frameworks that are currently in place, but we also suggest that there are some areas that require improvement.

In this study, we also echo Riggs et al. (2019) in saying that there is an urgent need for fair data sharing practices for MLER [22]. We argue that machine learning based research in education is impeded not only by data policies and regulations, but also by a lack of understanding among regulatory bodies of how MLER and data work together. We therefore fear that if these regulations remain at a hardware and data level, and do not include the machine learning for education components, then they run the risk of stifling innovation, research and development, and transformation within the field, because the data sharing policies are too strict and focus too much on the sharing of information rather than on coding and aggregating information in ways that still allow meaningful education research to take place.

With reference to Alter and Vardigan (2015), we outline some of the data types that are usually requested to conduct machine learning research in education [3]. We make this claim based on the lack of processes currently available to request and gain access to training data between different South African higher education institutions. For example, if a South African researcher wants access to the data, extensive research proposals, ethical clearance, and a series of anonymisation steps are required just to acquire the data, but companies implementing learning management systems may conduct ‘in house’ research to ‘improve customer satisfaction’ without such processes. Furthermore, the researchers need to have a very specific research outcome in mind, a clear hypothesis and a clear scope of analysis, leaving very little room to explore the data outside the well-defined scope of the approved research proposal, thereby potentially stifling innovative research, whereas industry may have more freedom to explore the data. As stated before, this happens because policy implementation strategies are managed differently between different sectors, and between countries internationally. International organisations and international companies with enough fluid capital are able to fast-track these processes under the guise of the legal gatekeeping mechanisms as the governance of policies (such as GDPR and POPIA) can be justified. This does not mean that the ethics committees themselves are redundant in the process of protecting information, but it does pose concerns if South African researchers require ethical clearance from local committees to conduct research, but international organisations do not, based on the same data. We are not advocating MLER as the ultimate solution to the plethora of challenges that South African education faces, but as we outline, MLER enables highly innovative solutions in this context.
For MLER to be successful, the underlying data sharing governance structures need to support innovation in this domain. South Africa needs a shared strategy toward what is needed for a more inclusive economy, including how we share information among one another [19, 23, 29, 50, 51]. The current collaborative research model and the current ecosystem need to shift their paradigm to an objective, value-laden approach [53]. The set of values on a strategic level within the South African scientific community and general public (outside the sciences) should be shared between a variety of different stakeholders, and a more open network of researchers and research should be promoted for South African researchers. We propose this by strategically involving all the relevant stakeholders to agree upon the values that the scientific community should focus on if we are to remain relevant in the MLER domain. For MLER related research, we propose that the South African research community must remain objective to a set of common goals that we need to strive for, as a collective, to shift the needle for research in this domain. Secondly, the epistemic value for science and innovation in South Africa should take a stance to include a variety of different disciplines that are not traditionally seen as strictly within the domain of MLER research. This includes philosophy, general humanities, education, public relations, law, and commerce, all of which are important to drive this ecosystem. Defining what the scientific community and research in the field should prioritise, and where the value of research should move toward, also requires an alignment of the current strategic goals. There are a variety of different scientific goals and visions, as outlined by the 4IR commission in South Africa, the strategic vision of the Department of Science and Innovation (DSI), the Council for Scientific and Industrial Research (CSIR), and a variety of other major South African institutions [47].
We propose that the education and ethics specialists, both local and international, who work with South African data come together and collaboratively contribute toward advancing the field of MLER research. As a result, we created a repository that can be accessed by any such expert with an internet connection so that these meaningful contributions can be consolidated in a collaborative manner [1]. Within the repository, the proposed set of values is not currently based on the collaborative efforts of the various stakeholders, but rather on the survey conducted to streamline these efforts and on the arguments made in this paper from a practitioner perspective. It is therefore intended to serve only as a starting point for these collaborative dialogues, and should be collaborative as the core objectives might change depending on how value is defined. It is also important that, if we as a community of researchers strive for digital independence, we also talk about software ownership, copyright, and intellectual property based on the data we generate. Although these constraints might seem economically driven only, the extent of innovation is dependent on the data and the skill to develop within those environments. There is also a need to specify what type of data and variables are acceptable to share and use for MLER, and what are not. For this to work, a consensus must be reached among researchers, policy makers, institutions and industry so that MLER can advance, given the data sharing governance and implementation structures. As outlined before, our arguments do not seek to rebut data sharing policies, and we acknowledge that these policies are a vital part of the 21st century. Instead, our arguments centre around the rate of innovation and the type of innovation needed in developing countries, like South Africa, that need innovation and technology to assist the education domain.

 

 

[1] https://github.com/dsfsi/Higher_Education_EDA

Author Response File: Author Response.pdf
