Open Access Review

Peer-Review Record

Research Practice and Progress of Models and Algorithms Applied in Topic Identification and Prediction Based on the Analysis of CNKI

Appl. Sci. 2023, 13(13), 7545; https://doi.org/10.3390/app13137545

by Sicheng Guo^1,*, Li Si^1,2,* and Xianrui Liu^1,*

Reviewer 1: Fernando Almeida

Reviewer 2: Anonymous

Submission received: 8 March 2023 / Revised: 14 June 2023 / Accepted: 16 June 2023 / Published: 26 June 2023

(This article belongs to the Special Issue New Techniques of Machine Learning and Deep Learning in Text Classification)

Round 1

Reviewer 1 Report

The article is well structured and offers relevant contributions. The scientific quality of the work is remarkable. However, the methodological process presents several vulnerabilities that should be better worked on by the authors.

Improvement suggestions:

1. Research questions are relevant but their pertinence should be better justified considering their impact for the development of knowledge in the field.

2. The authors indicate that they adopt a literature research review method. This information is only partially correct because they in fact adopt a systematic literature review (SLR). Therefore, they should revise this and explain the relevance of applying an SLR.

3. The authors adopt CNKI as the data source. It would be relevant to justify this choice considering alternatives such as WoS and Scopus. Additionally, please explain whether the results would be significantly different if Scopus or WoS were adopted as the data source.

4. In my opinion, step 2 of Figure 1 is excessively simplified. Did you look at the title, abstract, or full text? Typically, the screening process is divided into two or three phases.

5. Excel is a very versatile and useful tool but not robust enough to perform an SLR. Why not adopt other software such as VOSviewer? Please address this issue.

6. Authors use several keywords to perform searches in the database. They should demonstrate the adopted strategy to identify these keywords.

7. Figure 3 is very interesting, but perhaps it was made using software other than Excel. Please mention it.

8. Academic and practical contributions should be better explored.

9. Authors should also explore the adopted strategy to reduce the risk of bias. Please read:

https://marksmanhealthcare.com/2023/05/08/risk-of-assessment-during-slr-why-and-how/

It is particularly relevant in the health field, but the risk of bias is also a major issue in other fields such as computer science.

10. I would expect a better exploration of the SLR results. For example, use VOSviewer to understand the evolution of citations, research networks, collaborations, and so on. Using only Excel to analyze the data is not enough.

Author Response

Dear Reviewer,

We highly appreciate your thorough reading, excellent comments, and suggestions. We have revised our article as best we could. Here are the comments and our responses.

Reviewer 1: The article is well structured and offers relevant contributions. The scientific quality of the work is remarkable. However, the methodological process presents several vulnerabilities that should be better worked on by the authors.

Improvement suggestions:

Research questions are relevant but their pertinence should be better justified considering their impact for the development of knowledge in the field.

Response:

Thanks for your advice. In 3.1 Research Questions, we explain the specific connection between these questions, as well as the specific content and effect of each of them. See the highlighted sentences for details:

Through Question 1, this study composes an overview of the application of algorithms and models, including all types, numbers, and the chronological distribution of algorithms and models. By exploring Question 2, essential algorithms and models are filtered out and their specific ways of application are parsed. Finally, addressing Question 3 provides a reference for selecting and evaluating algorithms and models, and for determining the most appropriate algorithms and models when analyzing topics of texts. Based on this, this paper provides a comprehensive insight into the current state of algorithms and models in the field of topic analysis and prediction.

The authors indicate that they adopt a literature research review method. This information is only partially correct because they in fact adopt a systematic literature review (SLR). Therefore, they should revise this and explain the relevance of applying an SLR.

Response:

Thanks for your helpful advice. In 3.2.1 Research Method, we confirm the use of a systematic literature review (SLR) and explain the advantages of this method and the reasons for using it in this research. See the section marked in red for details:

It is a particular type of literature review that is characterized by being methodical, comprehensive, transparent, and replicable, which consists of raising research questions on a specific topic, searching and obtaining all relevant literature in a comprehensive way and conducting a systematic assessment and integration of analysis to address the research question. This method contains five steps: defining research scope, developing selection criteria, planning search, collecting and screening literature, and presenting results.

The authors adopt CNKI as the data source. It would be relevant to justify this choice considering alternatives such as WoS and Scopus. Additionally, please explain whether the results would be significantly different if Scopus or WoS were adopted as the data source.

Response:

Thanks for your helpful suggestion. We had already mentioned the advantage of the CNKI database: it is critical when exploring the application of models and algorithms in China. We have further explained the reasons for using CNKI as the data source in 3.2.2 Research Process. Specifically, CNKI (China National Knowledge Infrastructure) is currently the most comprehensive academic repository in China, with a total of over 200 million titles, containing more than 95% of officially published Chinese academic resources. Therefore, using the CNKI database as the data source ensures adequate access to high-quality Chinese research in this field, which fits our research aims and content.

In my opinion, step 2 of Figure 1 is excessively simplified. Did you look at the title, abstract, or full text? Typically, the screening process is divided into two or three phases.

Response:

Thanks for your advice. We have further elaborated on the specifics of the screening process for the study sample. First, we read the titles, abstracts, and keywords of the obtained literature to select the samples that are relevant to this field. Then, we review the full text of these samples one by one, selecting articles that explicitly identify the use of models and algorithms for topic analysis in the research methodology, finally choosing 96 documents as the research sample of this paper.

Excel is a very versatile and useful tool but not robust enough to perform an SLR. Why not adopt other software such as VOSviewer? Please address this issue.

Response:

Thanks for your helpful suggestion. We also used VOSviewer as a tool to perform the SLR and added Figure 3 (co-occurrence of keywords, produced using VOSviewer).

Authors use several keywords to perform searches in the database. They should demonstrate the adopted strategy to identify these keywords.

Response:

Thanks for your advice. We explain why we use these keywords in 3.2.2 Research Process. Specifically, the search terms "research frontiers", "dynamic evolution", "link prediction" and "network links" have been excluded in consideration of search completeness and accuracy. After several trial searches, the advanced search is conducted with the keywords "technology identification" OR "technology prediction" OR "subversive technology", in addition to "topic identification" OR "topic prediction" OR "emerging topics" OR "topic evolution".

Figure 3 is very interesting, but perhaps it was made using software other than Excel. Please mention it.

Response:

Thanks for your advice. The Dycharts tool is used to visualize the data. Dycharts is a platform for searching, analyzing and visualizing data (https://dycharts.com/appv2/#/pages/home/index).

Academic and practical contributions should be better explored.

Response:

Thanks for your helpful advice. We have made the contribution part more specific and complete in 8.1 Academic and practical contributions. Specifically, this research contributes in three aspects, covering research methods, research content, and research implications. See the content in red.

Authors should also explore the adopted strategy to reduce the risk of bias. Please read:

https://marksmanhealthcare.com/2023/05/08/risk-of-assessment-during-slr-why-and-how/

It is particularly relevant in the health field, but the risk of bias is also a major issue in other fields such as computer science.

Response:

Thanks for your helpful suggestion. The tool you mentioned is very important, but it may not be suitable for our research, so we used Cohen's kappa to check whether the conclusions of this paper are at risk of bias. Cohen's kappa coefficient is a statistic that measures inter-rater agreement for qualitative (categorical) items. A consistency check was conducted by randomly selecting records from three researchers and calculating Cohen's kappa statistic. The Cohen's kappa scores were all above 0.8 (0.75 indicates good consistency), indicating credible results.
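As an illustration of the consistency check described in this response, the following is a minimal sketch of Cohen's kappa for one pair of raters, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the example labels are hypothetical and are not taken from the authors' data. The response does not say how the three researchers' records were compared, so the sketch assumes the statistic is computed for each pair of researchers separately.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codings of four papers by two researchers (illustrative only).
coder_1 = ["LDA", "LDA", "SVM", "SVM"]
coder_2 = ["LDA", "LDA", "SVM", "LDA"]
kappa = cohen_kappa(coder_1, coder_2)
```

In this toy example the raters agree on three of four items (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5, which would fall below the 0.75 threshold cited in the response.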

I would expect a better exploration of the SLR results. For example, use VOSviewer to understand the evolution of citations, research networks, collaborations, and so on. Using only Excel to analyze the data is not enough.

Response:

Thanks for your advice. We have added Figure 3 (co-occurrence of keywords, produced using VOSviewer).

Author Response File: Author Response.docx

Reviewer 2 Report

This paper, submitted to Applied Sciences, addresses an interesting topic, namely research practice and progress of models and algorithms applied in topic identification and prediction based on the analysis of CNKI.

However, the manuscript needs to be further developed in order to meet the expected academic requirements.

1) To improve the abstract, it would be helpful to provide more specific information about the research methodology, key findings, and practical implications of the study. Additionally, providing background information and context would enhance the overall understanding of the research topic.

2) To improve the introduction, it would be beneficial to clearly state the research objectives and questions and to provide more background information to contextualize the study. In addition, add the research gap. The contribution is not clear.

3) The related work section is weak. Add more details and studies in Related Work section.

4) Add a discussion section, and strengthen it with more details.

5) Add an Implications section.

6) The conclusions are weak. Answer your research questions in the conclusions; what did we learn compared with current, significant research (up to 2023)?

Author Response

Dear Reviewer,

We highly appreciate your thorough reading, excellent comments, and suggestions. We have revised our article as best we could. Here are the comments and our responses.

Response to Reviewer 2 Comments

Comments and Suggestions for Authors

However, the manuscript needs to be further developed in order to meet the expected academic requirements.

1) To improve the abstract, it would be helpful to provide more specific information about the research methodology, key findings, and practical implications of the study. Additionally, providing background information and context would enhance the overall understanding of the research topic.

Response:

Thanks for your advice. We have added more content about the research methodology, key findings, practical implications, background information, and context of the study to the abstract. Specifically, the research methods include a systematic literature review, bibliometric analysis, and a classification method. Through the systematic literature review, 96 publications on topic identification and evolution prediction models are selected from the CNKI database. By using VOSviewer to conduct bibliometric analysis, the key research content and themes are revealed. Through the classification method, Excel is used to comprehensively summarize the models and algorithms used in the literature. The key findings comprise four categories of topic identification models and algorithms and the common index system involved in evaluating the effectiveness of the methods. The practical implication is that the study provides a reference for choosing or evaluating models when identifying and predicting topics in the future. This research can help readers learn the overall progress in text analysis research and provides a useful reference for selecting and applying appropriate models, algorithms, and indicators.

2) To improve the introduction, it would be beneficial to clearly state the research objectives and questions and to provide more background information to contextualize the study. In addition, add the research gap. The contribution is not clear.

Response:

Thanks for your suggestion. We have rewritten the research objectives and questions in the introduction and improved the background information of this research. See the section marked in red for details. Specifically, the research objectives and questions are to reveal the scenarios and applications of algorithms and models for topic identification and prediction in China, identify the most frequently used algorithms and models for both types of systems, and provide indicators to assess their effectiveness. The research gap is that the scenarios and applications of these two types of systems need to be further explored. It is also not clear which indicators and features are used to select the appropriate algorithms and models for topic identification and prediction. At the same time, current studies focus on one or two kinds of models and algorithms and mainly use quantitative methods rather than examining the literature piece by piece, so their conclusions provide only a general reference at a more coarse-grained level. The contribution is that we systematically collected relevant research results on topic identification and evolution, and compiled statistics on and analyzed the topic identification and evolution models, methods, and indicators adopted, with a view to providing a reference for future research on topic identification and trend prediction.

3) The related work section is weak. Add more details and studies in the Related Work section.

Response:

Thanks for your advice. We have added more related literature and improved the related work section. See the section marked in red for details. We have added literature published in the last two years. Specifically, Park et al. proposed a text mining framework to extract the hot topics in speeches. Savin et al. identified around 20 topics belonging to service robotics by applying topic modelling. Luo et al. reviewed urban flood numerical simulations by summarizing the calculation methods of surface runoff, drainage systems, and coupled models. Shelton et al. provided an overview of the methods of data collection, sampling, and analysis for qualitative research. Shuoshuo et al. compared different research methods in the field of soil cracking.

4) Add a discussion section, and strengthen it with more details.

Response:

Thanks for your suggestion. We have improved the discussion section by answering the research questions, comparing with previous research, specifying the contributions, and completing the implications. See the section marked in red for details. Specifically, we summarize the findings from three aspects, namely, an overview of the application of models and algorithms in topic identification and prediction, the frequency of use of models and algorithms from high to low, and the characteristics of the indicators involved in evaluating models and algorithms.

5) Add an Implications section.

Response:

Thanks for your advice. We have added more implications in Section 8.1. See the section marked in red for details. Specifically, this research contributes in three aspects. First, this study conducts a literature review in a systematic and reproducible manner, demonstrating a standardized research process. Compared with previous studies, it follows the same standardized research process while achieving a more fine-grained, detailed, and in-depth analysis of the content of the literature. Second, this study comprehensively shows the progress of algorithm and model applications in China, which can be used as a reference for further research in multiple fields. Unlike research focusing on specific fields, this research reveals the application of models and algorithms in multiple fields, which can help readers learn the overall progress in text analysis research. Third, this study provides ideas and methods for selecting suitable models and algorithms, and also provides a reference for an indicator system to assess the effectiveness of a model or algorithm. These indicators also support reconfiguring and optimizing algorithms and models.

6) The conclusions are weak. Answer your research questions in the conclusions; what did we learn compared with current, significant research (up to 2023)?

Response:

Thanks for your suggestion. We have added more implications to answer the research questions one by one in the conclusions. To compare with current, significant research, we have added more references to explain our significant implications. Specifically, they include:

Yao, L., Lu, GF. (2023). Multi-view clustering indicator learning with scaled similarity. Pattern Anal Applic. https://doi.org/10.1007/s10044-023-01167-7

Ebrahimi, F., Dehghani, M., & Makkizadeh, F. (2023). Analysis of Persian Bioinformatics Research with Topic Modeling. BioMed research international, 2023, 3728131. https://doi.org/10.1155/2023/3728131

Khaire, U. M., & Dhanalakshmi, R. (2022). Stability of feature selection algorithm: A review. Journal of King Saud University - Computer and Information Sciences, 34(4), 1060-1073. https://doi.org/10.1016/j.jksuci.2019.06.012

Zhang, M. Z., Tang, Q., Kim, J. G., Burgstaller, B., & Kim, S. D. (2023). Adaptive Regression Prefetching Algorithm by Using Big Data Application Characteristics. Applied Sciences-Basel, 13(7), Article 4436. https://doi.org/10.3390/app13074436

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Improvement suggestions:

- The authors provide three research questions. They explore the relevance of these questions. That’s good but not enough. Also explore the relevance of these questions with regard to the current knowledge in the field.

- Theoretical contributions of this field are still not clear.

- Explore better the limitations of this study. It is not enough to say “The limitation of this paper is that the comparative analysis of various methods is still insufficient.”

Author Response

Dear Reviewer,

We highly appreciate your thorough reading, excellent comments, and suggestions. We have revised our article as best we could. Here are the comments and our responses.

Response to Reviewer 1 Comments

Reviewer 1: Improvement suggestions:

The authors provide three research questions. They explore the relevance of these questions. That’s good but not enough. Also explore the relevance of these questions with regard to the current knowledge in the field.

Response:

Thanks for your advice. In 3.1 Research Questions, we further explain the relevance of these questions by exploring their links with existing research, and illustrate their contribution to the advancement of current research in the field. See the section marked in red.

Specifically, through Question 1, this study composes an overview of the application of algorithms and models, including all types, numbers, and the chronological distribution of algorithms and models. Since current research focuses more on the application of algorithms or models in specific fields, it is essential to understand the whole landscape to better promote the effective selection and application of algorithms and models. By exploring Question 2, essential algorithms and models are filtered out and their specific ways of application are parsed. This can be correlated with previous studies to dissect how exactly they apply specific algorithms and models, to inform the subsequent selection and optimization of algorithms and models. Finally, addressing Question 3 provides indicators for selecting and evaluating algorithms and models and for determining the most appropriate algorithms and models when analyzing topics of texts, and the validity of these indicators can be verified. This allows for the optimization and evaluation of existing algorithms and models [9-10] to advance research in the field of text analysis.

Theoretical contributions of this field are still not clear.

Response:

Thanks for your helpful advice. We explain the theoretical contributions of this field in 8.1 Academic and practical contributions. See the section marked in red.

Specifically, this study can theoretically guide the classification and application of algorithms and models, and explore methods and techniques for the analysis of multiple types of text or data. Based on this, researchers are better able to cope with the emergence of big data in text classification and promote cutting-edge machine learning and deep learning techniques.

Explore better the limitations of this study. It is not enough to say “The limitation of this paper is that the comparative analysis of various methods is still insufficient.”

Response:

Thanks for your helpful suggestion. We have added more content about limitations and corresponding suggestions in 8.2 limitations and suggestions. See the section marked in red.

Specifically, the limitation of this paper is that the comparative analysis of various methods and the validation of the evaluation indicators for algorithms and models are still insufficient; addressing this would make the features of each type of algorithm and model more prominent and provide more operational guidance. At the same time, it would be useful to include more literature from other databases to enable comparison between Chinese studies and those of other countries. For future study, researchers should obtain a larger sample of literature from several databases, consider the uniqueness of the various identification and prediction models and algorithms, and compare their advantages and disadvantages, so as to choose proper methods to solve research questions. In addition, the indicators used in the models and algorithms should be further applied to algorithms and models in the future so that their validity can be verified.

Author Response File: Author Response.docx

Reviewer 2 Report

The revised paper is better than the first submission.

Author Response

Response to Reviewer 2 Comments

Comments and Suggestions for Authors

The revised paper is better than the first submission.

Response:

Thanks for your kind comments. In this review, we have further optimized the interpretation of the importance of the research questions, highlighted the theoretical contributions of this study, explained the limitations of the paper and presented future research perspectives.

Author Response File: Author Response.docx
