Article
Peer-Review Record

Federated Learning for Data Analytics in Education

by Christian Fachola 1,†, Agustín Tornaría 2,†, Paola Bermolen 2,†, Germán Capdehourat 3,4,†, Lorena Etcheverry 1,*,† and María Inés Fariello 2,†
Reviewer 1: Anonymous
Reviewer 3: Anonymous
Submission received: 30 December 2022 / Revised: 8 February 2023 / Accepted: 14 February 2023 / Published: 20 February 2023

Round 1

Reviewer 1 Report

The authors of this study have investigated whether a federated learning approach to predicting student dropout rates across institutions can match the predictive power of a centralised learning approach. Using the MOOC dataset from KDDCUP2015 in both cases, they conclude that, given enough agents and rounds, this can indeed be achieved.

I think the design of their study, even though not very novel, serves well in showcasing that such approaches have merit and can be used in cases where data of a sensitive nature are difficult or impossible to collate in a central repository due to privacy concerns. There are, however, a few points whose improvement I think would benefit the paper overall:

1. The design and explanation of the methodology could be improved. E.g., in line 247, it is not very clear why this approach is significantly different from the one in Section 4.2.1. Is it because testing is done on-site for each client? Line 251: why was this data distribution implemented, how exactly was it done, and what purpose does it serve?

2. Line 254: it is not clear what Figure 5 shows or why you decided to present the results this way. Elaborate more.

3. Line 258: why do you think the institutions trained separately outperform the federated version? Discuss this.

4. Line 265: it is not clear why you chose this split, and no details on how it was implemented are presented.

5. Lines 283-285: this needs further discussion.

6. I think the inclusion of the federated k-means algorithm does not add anything to this study, as no actual results are shown. The Jaccard similarity index is not a sufficient metric to show that the models have similar performance. I think the study would benefit from omitting the unsupervised models and concentrating on the supervised federated learning model.

7. The greatest weakness of this study is the use of a ready-made model in simulated distributed institutions, which results in nicely constructed data across the agents, a non-realistic scenario. The authors discuss this to some extent, but I think it needs to be highlighted even more. There are many potential problems that would make this kind of model unfeasible in the real world, such as heterogeneous data sampling and storage between institutions, lack of processing power at under-funded institutions (or lack of such infrastructure altogether), sampling bias from institutions in different parts of the country (or countries), etc.

Overall, the paper presents nicely how federated learning could be used in an academic environment, and I think people will find this interesting; however, it needs to present the methodology and conclusions more thoroughly. It also needs to further discuss the implications of attempting such an approach in the real world, where staffing and funding problems in such institutions could be a real hindrance to such efforts.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors applied a federated learning approach to education data. The topic and methodology are interesting, and the results are satisfying. However, there are several points that should be addressed:

1) Organization: I would prefer a cleaner separation of methods and results, since it is easier for the readers.

2) Methods: a more in-depth description of the federated aspect of the k-means algorithm is needed. E.g., in the neural network, there were some local rounds and then a global update, which was repeated for some epochs. What is the equivalent in k-means?

3) Figure 6: it is not clear what is depicted; what is the x-axis? Also, in this figure and others, the font of the labels is too small.

4) The percentages in Table 1 do not add up to 100%.

5) When you report average values, please also report standard deviations.
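To make concrete what point 2 is asking for, one common federated k-means variant mirrors the neural network's local-rounds/global-update scheme as follows: each client runs a local assignment step and sends only per-cluster sums and counts, and the server pools them into a weighted centroid update, analogous to FedAvg's weighted model averaging. The sketch below is a hypothetical illustration of that variant, not the authors' actual implementation (the function names and the two-institution toy data are invented for the example):

```python
import numpy as np

def local_kmeans_step(data, centroids):
    # One local Lloyd iteration on a client: assign each point to its
    # nearest centroid, then return per-cluster coordinate sums and counts.
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    k = centroids.shape[0]
    sums = np.zeros_like(centroids)
    counts = np.zeros(k)
    for j in range(k):
        members = data[labels == j]      # points in cluster j, possibly empty
        counts[j] = members.shape[0]
        sums[j] = members.sum(axis=0)    # zeros if the cluster is empty here
    return sums, counts

def federated_kmeans_round(client_datasets, centroids):
    # Global update: the server pools the clients' sums and counts and
    # recomputes centroids; raw data never leaves the clients.
    total_sums = np.zeros_like(centroids)
    total_counts = np.zeros(centroids.shape[0])
    for data in client_datasets:
        sums, counts = local_kmeans_step(data, centroids)
        total_sums += sums
        total_counts += counts
    new_centroids = centroids.copy()     # empty clusters keep the old centroid
    nonempty = total_counts > 0
    new_centroids[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
    return new_centroids

# Two simulated institutions, each holding only its own students' features.
clients = [np.array([[0.0, 0.0], [0.0, 1.0]]),
           np.array([[10.0, 10.0], [10.0, 11.0]])]
centroids = np.array([[1.0, 1.0], [9.0, 9.0]])
for _ in range(3):                       # a few global rounds
    centroids = federated_kmeans_round(clients, centroids)
# centroids is now approximately [[0, 0.5], [10, 10.5]]
```

A description at roughly this level of detail (what the clients compute locally, what they transmit, and how the server aggregates) would answer the question raised above.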

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

This manuscript addresses a very relevant topic, namely federated learning applied to education data; nonetheless, it fails to achieve academic relevance due to multiple issues, such as poor presentation both in terms of language and organization, unsatisfactory experimental design, and vague conclusions.

Below you may find a more detailed description of these flaws:

Presentation:

The problem motivation in the introduction is not sufficient. You address two predictive tasks: student dropout and unsupervised classification (???) of students. These are, however, simply mentioned and not built upon. Why are they important? What kind of tasks are they (binary classification, regression, survival analysis)?

Further, why is your work relevant? There is very little motivation specifically for your problem. In the current version, these points are mentioned in Section 2. Why student dropout and unsupervised classification (???) are important should be clearly stated in the introduction.

More misplaced content is present in the other sections, such as the proposal of your work. How is your proposal related work? It is your work itself! Several other cases can be found throughout the text.

Repetition of content is also an issue. I have counted the definition of vertical and horizontal federated learning at least three times. Why is this the case? It is not even the main topic of your manuscript.

Capitalization and acronyms: it is no longer common to capitalize every term, especially the less frequently used ones, such as Artificial Intelligence (AI). It is used only once, so why create an acronym for it? The same applies to acronyms without a proper definition, such as AA and FEEDAN.

Experiments:
The experiments and results are very vague. I have failed to comprehend how the authors preprocessed the raw dataset and what the final dataset actually contains. A 225,642 × 21 matrix in the format (course_id, username) sounds very strange; I suspect there is a mistake there. Furthermore, how is it in fact used? The authors should clearly specify the features extracted, the number of students per course, etc.

There are no competitor methods at all. How do you validate your experiments if you are using a single method, which is in fact not even described in this manuscript? Considering that the literature on federated learning is vast, I highly recommend that the authors compare more methods and provide a more solid evaluation of their contribution.

Further, what is unsupervised classification? To date, I have never heard of such a term. After inspecting your experiments section, it turns out that unsupervised classification is clustering. This is a very shocking mistake. More shocking is the fact that there is no discussion of these experiments. What are your findings? Everything seems to be mixed up in Section 5, which raises the question: what exactly are you doing with clustering?

Due to the rather misleading organization and experiments, it is challenging to state clearly that this manuscript proves the feasibility of your application. I would strongly suggest that the authors familiarize themselves more with the machine learning literature and with how papers are expected to be presented.

This manuscript definitely has potential, nonetheless it is, unfortunately, not a meaningful scientific contribution in its current form.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors edited the manuscript according to all the reviewers' suggestions. The manuscript is now clearer and easier for readers to understand, and thus it is suitable for publication in Data.

Reviewer 3 Report

Despite not performing more experiments as previously recommended, which would substantially improve the quality of this paper, the paper has improved since its last version.

I must insist, though, that more validation is necessary to make this paper a solid contribution. It is, in my opinion, very optimistic to attest to the applicability of federated learning in this scenario by considering only the experiments presented. A more thorough, and more realistic, analysis is recommended in the future, which should include more algorithms at the very least.

I would just like to recommend a complete proofread of this version.
