Article
Peer-Review Record

AI-Based User Empowerment for Empirical Social Research

Big Data Cogn. Comput. 2024, 8(2), 11; https://doi.org/10.3390/bdcc8020011
by Thoralf Reis *, Lukas Dumberger, Sebastian Bruchhaus, Thomas Krause, Verena Schreyer, Marco X. Bornschlegl and Matthias L. Hemmje
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 26 December 2023 / Revised: 18 January 2024 / Accepted: 18 January 2024 / Published: 23 January 2024
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this paper, the authors explore the challenges and opportunities presented by Big Data in empirical social research, particularly in the context of creating more comprehensive models of human behavior and development. While Big Data offers a wealth of information, traditional coding-based analysis methods like qualitative content analysis become impractical due to the sheer volume of data, as manual categorization is highly time-consuming. To address this challenge, the paper introduces AI2VIS4BigData, a reference model that standardizes use cases and artifacts for Big Data information systems integrating Artificial Intelligence (AI) and Machine Learning (ML) for enhancing user capability. The main contribution is the design and implementation of an AI2VIS4BigData-compliant information system, aimed at aiding social researchers in categorizing text data and generating insightful dashboards.

The paper is well-written, it flows well and it is easy to read. The contribution is solid and the study seems sound. There are some drawbacks that the authors could address in order to improve the paper’s quality:

  • The abstract could be rewritten by highlighting the motivation in a better way. Also, specific research conclusions and research significance should be presented in the abstract.
  • I appreciated the discussion and remaining challenges in the Related Work section. However, the title of the section could be changed to also accommodate this perspective.
  • Section 2.1 highlights the importance of empirical social research. By skimming through it, it appears clear that the proposed approach could also help researchers working in HCI-related fields. For instance, processes such as thematic analysis could be carried out via this approach. To this end, I suggest that the authors briefly highlight this point. The following recently published paper could also be cited [10.1016/j.ipm.2023.103516], whose thematic analysis could have been carried out via this approach. An additional recent reference that could be cited is the following one: 10.1109/ACCESS.2023.3268224
  • RC1, 2 and 3 should be highlighted better in the manuscript. How are these addressed?
  • The system is well presented but its advantages and limitations are not well illustrated. I suggest discussing them in more detail.
  • More detail on how the recommendations are made could be useful for the interested reader.
  • It seems that the number of participants in the study is not reported. Please highlight it.
  • The final discussion is obvious and not profound enough. More valuable content needs to be mined.

 

Author Response

Dear Sir or Madam,

Thank you very much for your comments. We understood them and found the criticism justified. We tried to address your comments as well as possible and aimed to add more depth to every main section of the paper.

Please find the comments to your specific remarks below:

 

Point 1: The abstract could be rewritten by highlighting the motivation in a better way. Also, specific research conclusions and research significance should be presented in the abstract.
Response 1: We rearranged the motivational aspects of the abstract and added additional explanations of the intention and results. We hope this addresses the points raised.

Point 2: I appreciated the discussion and remaining challenges in the Related Work section. However, the title of the section could be changed to also accommodate this perspective.
Response 2: True, we adjusted it.

Point 3: Section 2.1 highlights the importance of empirical social research. By skimming through it, it appears clear that the proposed approach could also help researchers working in HCI-related fields. For instance, processes such as thematic analysis could be carried out via this approach. To this end, I suggest that the authors briefly highlight this point. The following recently published paper could also be cited [10.1016/j.ipm.2023.103516], whose thematic analysis could have been carried out via this approach. An additional recent reference that could be cited is the following one: 10.1109/ACCESS.2023.3268224
Response 3: Thank you very much for this remark and the two provided sources. Indeed, they made valuable additions as cross-application domain references (the esports paper is an example of qualitative content analysis of social networks like Reddit, and the Copilot study provides additional background to our reference to GitHub Copilot as an example of user empowerment).

Point 4: RC1, 2 and 3 should be highlighted better in the manuscript. How are these addressed?
Response 4: We adjusted the phrasing of the RCs to make them more specific and easier to understand. We also adjusted the formatting to highlight the section where we describe how they are addressed (beginning of Section 3).

Point 5: The system is well presented but its advantages and limitations are not well illustrated. I suggest discussing them in more detail.

Response 5: We added a short discussion of advantages and limitations to the text categorization and data visualization component sections.

Point 6: More detail on how the recommendations are made could be useful for the interested reader.
Response 6: We added more background to the MLTC section on the algorithms used for indexing and inference (e.g., TFIDF-SVM). We also extended the text in the data visualization rule section and translated all figures. We hope this improves the overall comprehensibility of the paper and its recommendation mechanism.

Point 7: It seems that the number of participants in the study is not reported. Please highlight it.
Response 7: We added the number of participants to the abstract and the headline of the section.

Point 8: The final discussion is obvious and not profound enough. More valuable content needs to be mined.

Response 8: We added additional statements on the outcome of the study.

Reviewer 2 Report

Comments and Suggestions for Authors

This paper presents a system that uses AI and machine learning to assist non-technical users in categorizing and visualizing text data. I think the paper's strength lies in the evaluation of the system with human volunteers, and the qualitative and quantitative analysis of their experience. Although the results show that the integration of AI does not speed up performing the tasks, I think it is important to also publish negative results. My main question is about the text categorization and visualization system that the authors adopt, which I will elaborate on below.

 

The authors describe the MLTC system that they use on page 3. However, there are numerous text categorization algorithms and tools available, many of them being open-source. When there is no pre-determined set of categories, there are also many unsupervised learning methods such as clustering. Therefore, I'm doubtful that (RC1) on page 5 is true. I also think the authors should provide a more comprehensive review of text classification algorithms and systems. Moreover, I'm curious about why the authors chose to use this specific system; if there is a strong motivation, it would be nice to see it laid out. Additionally, from [8], it seems most of the classification algorithms in MLTC are "shallow" machine learning algorithms like LSA. In this case, I wonder if it is necessary to make it distributed since these algorithms are quite fast.

Comments on the Quality of English Language

Language is fine with very minor grammatical errors.

Author Response

Dear Sir or Madam,

Thank you very much for your comments. We understood them and found the criticism justified. We tried to address your comments as well as possible and aimed to add more depth to every main section of the paper.

Please find the comments to your specific remarks below:

Point 1: The authors describe the MLTC system that they use on page 3. However, there are numerous text categorization algorithms and tools available, many of them being open-source. When there is no pre-determined set of categories, there are also many unsupervised learning methods such as clustering. [...] I also think the authors should provide a more comprehensive review of text classification algorithms and systems. Moreover, I'm curious about why the authors chose to use this specific system; if there is a strong motivation, it would be nice to see it laid out.
Response 1: Indeed, a description of technical alternatives and an explanation of the selection of MLTC were missing. We extended the MLTC section with a generic introduction to the possibilities for automatic TC, indexing methods, and ML algorithms. We also added a justification of the choice of MLTC. In a nutshell, it supports simple, robust, state-of-the-art algorithms (TFIDF-SVM) and it was developed within the same research group. Algorithm quality was not part of the research goals; the aim was to select proven, working algorithms that enable the evaluation of the user empowerment impact. We now state these clearly as the two criteria for selecting it.

Point 2: Additionally, from [8], it seems most of the classification algorithms in MLTC are "shallow" machine learning algorithms like LSA. In this case, I wonder if it is necessary to make it distributed since these algorithms are quite fast.
Response 2: True, the algorithms implemented within the athletes are “shallow” and, thus, not computationally intensive. However, the main motivation of Eljasik-Swoboda’s microservices based on the trainer/athlete pattern was the possibility to combine different algorithms and use them in a composed form within an application. We extended the MLTC section and added this information to make it more transparent.

Point 3: Therefore, I'm doubtful that (RC1) on page 5 is true.
Response 3: The previous version of RC1 was too broad and was, therefore, rightly criticized. We reviewed our related work section and the cited literature again and came up with a new proposal: one RC on the application domain in combination with the reference model, one RC on MLTC user empowerment, and one RC on the data exploration phase of the reference model.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors successfully addressed all my concerns.
