Review Reports
- Oleksandr Kuchanskyi 1,2,
- Yurii Andrashko 3 and
- Andrii Biloshchytskyi 4,5,*
- et al.
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Thank you for the opportunity to review the manuscript “Resilience of Scientific Collaboration Networks in Young Universities based on Bibliometric and Network Analysis.”
The findings are of broad interest and align with the aims and scope of the Data journal.
The manuscript examines the resilience of collaboration networks at four “young” universities: Astana IT University, Nazarbayev University, University of Suffolk, and ShanghaiTech University. The authors analyze the publication activity of researchers from the aforementioned institutions and apply network analysis to co-authorship relations. They construct undirected, weighted graphs, compute a range of network metrics, and assess the robustness of the collaboration networks via simulation-based node-removal scenarios.
I appreciate the authors’ effort to tackle a topic that feels both timely and underexplored. The study design and data analysis are generally sound and make a useful contribution to the discussion of scientific collaboration in young universities.
At the same time, there are several weaknesses that, in my opinion, need further refinement before the paper is ready for publication.
1. Please explain why these four universities—AITU, NU, US, and STU—were selected for the study. Were they meant to represent a wider group of “young” universities, or were they selected mainly because of data availability or other practical reasons? It would also be useful to discuss whether this selection could have introduced any bias.
2. In addition, if studies on young universities are scarce, the authors should acknowledge this gap, justify their scope, and discuss implications for interpretation and external validity.
3. I recommend that the authors maintain consistent terminology throughout the manuscript, explicitly distinguishing among resilience, robustness, and vulnerability.
4. Considering the data in Table 1 and the authors’ statement that “the largest in terms of the number of participants and overall integration is the STU network,” I wonder whether the large-scale differences (e.g. Number of nodes and edges, Size of the largest connected component)—particularly between STU and the US—might have influenced the comparison of resilience. Have the authors controlled for network size and disciplinary composition when comparing resilience, or considered normalizing some indicators or stratifying analyses by scale or discipline?
5. I suggest clarifying the procedures used to clean and standardize affiliations and to remove duplicate records, specifying how many records were manually corrected versus automatically processed. Consider providing illustrative examples or applied rules, and discuss the implications of these corrections. In general, including a graphical diagram that outlines the research procedure would substantially improve the transparency and interpretability of the study.
6. The authors should consider providing precise OpenAlex API queries, including selected fields, data collection dates, and inclusion/exclusion criteria (document types, language, and affiliation matching rules), and include scripts or pseudocode in the Supplementary Material.
7. Considering the Supplementary Material, please consider providing additional information, such as the annual counts of publications and the number of unique authors affiliated with each university by year.
8. According to the authors, “The weight of each edge, or the strength of the connection between universities, was defined as the number of joint scientific publications over the specified observation period. To reduce the influence of background connections, only pairs of universities with a connection strength greater than two were considered in the analysis”. Have the authors conducted sensitivity analyses using alternative thresholds, as the results may be sensitive to this choice?
9. The authors stated that: “Since the complete network proved to be too dense for clear visualization, a “core” of the network was extracted for presentation purposes by applying an additional filter to retain only nodes with the highest total connection strength (threshold weighted degree greater than 2000)”. Could you please explain the origin of the threshold weighted degree >2000 for visualizations. Maybe it could hide relevant structure? Have the authors explored visualizations based on alternative cutoff values or without thresholding to assess the robustness of their results?
10. According to the authors, “For each removal fraction, multiple simulations were conducted, and the results were averaged to reduce random fluctuations”. Please consider providing more details about the number of simulation runs and the measures/methods used, as this information is essential for reproducibility and for evaluating the robustness of the results.
11. As reported by the authors, “The study revealed that NU and STU have a highly resilient structure of scientific collaboration”. Considering the results presented, is the assertion that one university is “more resilient” adequately supported by statistical evidence?
12. The authors stated that: “Errors were also identified in the OpenAlex catalog, particularly in university affiliations. These mistakes were corrected prior to the resilience assessment procedure.” Please quantify the number of records that required correction and comment whether these manual corrections materially affect the results.
13. The literature in the manuscript appears relatively sparse and narrowly focused. This restricts the contextualization of the findings and makes claims regarding novelty and generalizability less robust. I suggest that the authors consider broadening the literature to include:
- comparative studies on collaboration and network resilience in higher education institutions,
- methodological literature on network robustness and appropriate models,
- bibliometric analyses of institutional collaboration and publication weighting.
In particular, please update references with recent studies (preferably last 5–7 years) and include comparative empirical works even from different country contexts or institution types.
Final comment:
The study addresses a novel and relevant topic—the resilience of collaboration networks in very young universities—using open bibliographic data from OpenAlex and describes the data pipeline that enhances reproducibility, while the application of a comprehensive set of network metrics provides a thorough and multidimensional view of the analyzed structures.
The manuscript has solid potential but requires important methodological clarifications, sensitivity checks, and greater transparency to ensure reproducibility and strengthen interpretation.
Author Response
Thank you for your valuable comments, which helped us improve the clarity and overall quality of our manuscript. We have made general revisions to the paper, combining several figures to enhance readability and avoid redundancy. In addition, we have implemented the specific corrections and adjustments according to your remarks, as detailed below:
1. Please explain why these four universities—AITU, NU, US, and STU—were selected for the study. Were they meant to represent a wider group of “young” universities, or were they selected mainly because of data availability or other practical reasons? It would also be useful to discuss whether this selection could have introduced any bias.
Thank you for this valuable comment. A combination of representativeness and practical data availability determined the selection of universities. Since the study was conducted within the framework of a grant-funded project, our primary focus was on selecting universities from the Republic of Kazakhstan that represent different stages of institutional development (AITU – 6 years and NU – 15 years). To compare the results with those of other young universities, we included a leading young Asian university — STU (12 years) — and a small young European university — US. The study considered universities founded within the past 20 years; unfortunately, the sample of such institutions is limited. The second selection criterion concerned the availability of university data in OpenAlex. Some young universities in OpenAlex are listed as subsidiary institutions of older parent universities. In such cases, it is not easy to distinguish publications and correctly attribute them to either the young subsidiary or the parent institution. For example, in OpenAlex, Cornell Tech is not represented as a separate institution — it is a campus of Cornell University located on Roosevelt Island, so affiliations such as “Cornell Tech” are parsed as Cornell University. Similarly, CEA Paris-Saclay is part of the French Alternative Energies and Atomic Energy Commission (CEA) and is affiliated with Université Paris-Saclay as its “parent institution.” Clarifications regarding your comment have been added to the manuscript in Section 3.
2. In addition, if studies on young universities are scarce, the authors should acknowledge this gap, justify their scope, and discuss implications for interpretation and external validity.
Thank you for the comment. It should be noted that there are relatively few scientific studies on the resilience of collaboration networks in young universities. Moreover, different studies use different criteria to determine what qualifies as a “young” university. For example, some works analyze institutions founded in the 1970s and still refer to them as young universities. Given that the dynamics of scientific collaboration worldwide have changed significantly over the past several decades, there is a clear need for network analysis focused on universities established in the current century. We have addressed the implications of this limitation for the interpretation of our results in the discussion section.
3. I recommend that the authors maintain consistent terminology throughout the manuscript, explicitly distinguishing among resilience, robustness, and vulnerability.
Thank you for the comment. We have made the corresponding revisions to the manuscript.
4. Considering the data in Table 1 and the authors’ statement that “the largest in terms of the number of participants and overall integration is the STU network,” I wonder whether the large-scale differences (e.g. Number of nodes and edges, Size of the largest connected component)—particularly between STU and the US—might have influenced the comparison of resilience. Have the authors controlled for network size and disciplinary composition when comparing resilience, or considered normalizing some indicators or stratifying analyses by scale or discipline?
Thank you for this insightful comment. For the four universities considered in our study, differences in network size did not significantly affect the results. Most of the indicators we used are dimensionless and therefore inherently account for network scale when calculated. For the size of the largest connected component, we reported relative values (percentages) in the discussion of the results, allowing direct comparison across universities of different sizes. That said, at extreme scales university size could critically influence the outcomes. For example, the University of Technology Nuremberg, which has data on only seven researchers and 20 publications, was intentionally excluded from the analysis. This ensured that the comparison of resilience metrics remained consistent and meaningful across all selected universities.
5. I suggest clarifying the procedures used to clean and standardize affiliations and to remove duplicate records, specifying how many records were manually corrected versus automatically processed. Consider providing illustrative examples or applied rules, and discuss the implications of these corrections. In general, including a graphical diagram that outlines the research procedure would substantially improve the transparency and interpretability of the study.
During the study, we conducted a preliminary analysis and data cleaning. For example, when analyzing data for AITU, founded in 2019, we identified 13 publications from 2018 authored by this university's faculty. These errors were corrected. However, it should be noted that, given the overall volume of publications, such inaccuracies do not significantly affect the study's results.
6. The authors should consider providing precise OpenAlex API queries, including selected fields, data collection dates, and inclusion/exclusion criteria (document types, language, and affiliation matching rules), and include scripts or pseudocode in the Supplementary Material.
Thank you for the comment. We have added information about the query to the manuscript.
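For readers who want a concrete starting point, the following is a minimal Python sketch of the kind of OpenAlex works query involved. The institution ID and date filter are illustrative placeholders, not the exact parameters of the authors' pipeline:

```python
import requests

# Hypothetical placeholder -- substitute the real OpenAlex institution ID
# (e.g., for AITU, NU, US, or STU) before running.
INSTITUTION_ID = "I0000000000"

url = "https://api.openalex.org/works"
params = {
    # Filter works by author affiliation; the date bound is illustrative.
    "filter": f"authorships.institutions.id:{INSTITUTION_ID},"
              "from_publication_date:2019-01-01",
    "per-page": 200,   # maximum page size allowed by the API
    "cursor": "*",     # cursor pagination retrieves the full result set
}

works = []
while True:
    page = requests.get(url, params=params).json()
    works.extend(page["results"])
    next_cursor = page["meta"].get("next_cursor")
    if not next_cursor:
        break
    params["cursor"] = next_cursor

print(f"Retrieved {len(works)} works")
```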
7. Considering the Supplementary Material, please consider providing additional information, such as the annual counts of publications and the number of unique authors affiliated with each university by year.
Thank you for the comment. We have added Table 1 and included additional information about the universities in the manuscript.
8. According to the authors, “The weight of each edge, or the strength of the connection between universities, was defined as the number of joint scientific publications over the specified observation period. To reduce the influence of background connections, only pairs of universities with a connection strength greater than two were considered in the analysis”. Have the authors conducted sensitivity analyses using alternative thresholds, as the results may be sensitive to this choice?
Thank you for the comment. We did not conduct a separate sensitivity analysis; however, we acknowledge that the choice of threshold significantly affects the network’s resilience — the higher the threshold, the lower the resilience. Nonetheless, we decided not to use the minimum threshold of 1, as we believe that a single joint publication between researchers does not indicate an established or stable collaboration. This clarification has been added to the Materials and Methods section of the manuscript.
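As an illustration of this thresholding step, a minimal sketch using networkx is given below; the input format (`papers` as a list of author-ID lists) is an assumption for the example, not the authors' exact data model:

```python
import itertools
import networkx as nx

def build_network(papers, min_weight=3):
    """Build a weighted co-authorship graph and drop weak ties.

    `papers` is assumed to be a list of author-ID lists, one per
    publication. Edges with weight <= 2 are removed, matching the
    "connection strength greater than two" rule quoted above.
    """
    G = nx.Graph()
    for authors in papers:
        for u, v in itertools.combinations(set(authors), 2):
            w = G[u][v]["weight"] + 1 if G.has_edge(u, v) else 1
            G.add_edge(u, v, weight=w)
    weak = [(u, v) for u, v, w in G.edges(data="weight") if w < min_weight]
    G.remove_edges_from(weak)
    G.remove_nodes_from(list(nx.isolates(G)))
    return G
```

Rerunning the same build with a different `min_weight` is the natural way to probe the sensitivity discussed in this exchange.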
9. The authors stated that: “Since the complete network proved to be too dense for clear visualization, a “core” of the network was extracted for presentation purposes by applying an additional filter to retain only nodes with the highest total connection strength (threshold weighted degree greater than 2000)”. Could you please explain the origin of the threshold weighted degree >2000 for visualizations. Maybe it could hide relevant structure? Have the authors explored visualizations based on alternative cutoff values or without thresholding to assess the robustness of their results?
This value was chosen to improve the visualization of the results. For smaller threshold values, the nodes and edges overlap, making the graph difficult to interpret. The complete dataset is provided in the Supplementary Materials at https://github.com/Andrashko/publications/tree/main. Thus, readers can, if desired, reconstruct the network using different threshold values.
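Reconstructing the visualization "core" from the shared data is a short networkx operation; the sketch below assumes `G` is the full weighted graph built from the supplementary files:

```python
import networkx as nx

def extract_core(G, min_weighted_degree=2000):
    # Keep only nodes whose total connection strength (weighted degree)
    # exceeds the cutoff; 2000 is the value used for the figures.
    wdeg = dict(G.degree(weight="weight"))
    keep = [n for n, d in wdeg.items() if d > min_weighted_degree]
    return G.subgraph(keep).copy()
```

Lowering `min_weighted_degree` reproduces denser views of the same network.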
10. According to the authors, “For each removal fraction, multiple simulations were conducted, and the results were averaged to reduce random fluctuations”. Please consider providing more details about the number of simulation runs and the measures/methods used, as this information is essential for reproducibility and for evaluating the robustness of the results.
The node-removal procedure was performed ten times for each university. A larger number of simulations was not considered, as each simulation run requires a substantial amount of time. The results were then averaged using the mean of the obtained indicators. Other metrics were not analyzed. This information has also been added to the manuscript in Section 3.2.
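A minimal sketch of such a random-failure simulation, assuming the relative size of the largest connected component is the tracked metric and ten runs per removal fraction:

```python
import random
import networkx as nx

def removal_curve(G, fractions, runs=10, seed=0):
    """Average relative LCC size after removing random node fractions."""
    rng = random.Random(seed)
    n = G.number_of_nodes()
    curve = []
    for f in fractions:
        sizes = []
        for _ in range(runs):
            H = G.copy()
            H.remove_nodes_from(rng.sample(list(H.nodes), int(f * n)))
            # Relative size of the largest connected component.
            lcc = max(nx.connected_components(H), key=len) if len(H) else set()
            sizes.append(len(lcc) / n)
        curve.append(sum(sizes) / runs)
    return curve

# Removal fractions from 1% to 50%, matching the simulation points
# described later in this report.
fractions = [i / 100 for i in range(1, 51)]
```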
11. As reported by the authors, “The study revealed that NU and STU have a highly resilient structure of scientific collaboration”. Considering the results presented, is the assertion that one university is “more resilient” adequately supported by statistical evidence?
In our study, NU and STU were found to be significantly more resilient compared to the other young universities. This is clearly illustrated in Figure 4. Naturally, this resilience remains vulnerable to the removal of key authors. Overall, these indicators are consistent with the findings of other studies analyzing more mature universities. Therefore, we draw the conclusion regarding the resilience of NU and STU based on this observation.
12. The authors stated that: “Errors were also identified in the OpenAlex catalog, particularly in university affiliations. These mistakes were corrected prior to the resilience assessment procedure.” Please quantify the number of records that required correction and comment whether these manual corrections materially affect the results.
Thank you for the comment. In our response to comment 5, we provided quantitative information regarding the detected data errors and included this clarification in the manuscript. We noted that the number of such errors was small and, given the overall dataset size, they could not have significantly affected the obtained results.
13. The literature in the manuscript appears relatively sparse and narrowly focused. This restricts the contextualization of the findings and makes claims regarding novelty and generalizability less robust. I suggest that the authors consider broadening the literature to include: comparative studies on collaboration and network resilience in higher education institutions, methodological literature on network robustness and appropriate models, bibliometric analyses of institutional collaboration and publication weighting. In particular, please update references with recent studies (preferably last 5–7 years) and include comparative empirical works even from different country contexts or institution types.
Thank you for the comment. We have added additional information to the literature review to strengthen the justification and contextual grounding of the study within the manuscript.
Reviewer 2 Report
Comments and Suggestions for Authors
- The authors need to disclose in more detail why these countries and universities were chosen, and discuss the impact of such choices on the research conclusions.
- Both Web of Science and Scopus are widely used databases, as documented by the Scientometrics community; the authors need to explain why they did not choose these two well-known databases.
- The completeness and accuracy of the author address field for the new OpenAlex (Zhang, Cao, et al., 2024) and established Web of Science (Liu, Hu, et al., 2018) and Scopus (Liu & Wang, 2025) SHOULD be disclosed or added as a limitation of your study.
- Many figures should be merged for direct comparison.
- Add some subheadings and shorten this manuscript.
- The quality of the metadata in OpenAlex and also the established databases such as Web of Science and Scopus should be discussed.
- You should also note the differences in disciplines, as well as the distinctions between multilateral co-authorship and bilateral co-authorship.
Author Response
Thank you for your valuable comments, which helped us improve the clarity and overall quality of our manuscript. We have made general revisions to the paper, combining several figures to enhance readability and avoid redundancy. In addition, we have implemented the specific corrections and adjustments according to your remarks, as detailed below:
1. The authors need to disclose in more detail why these countries and universities were chosen, and discuss the impact of such choices on the research conclusions.
Thank you for this valuable comment. A combination of representativeness and practical data availability determined the selection of universities. Since the study was carried out within the framework of grant funding, our primary interest was in selecting universities from the Republic of Kazakhstan at different stages of institutional development (AITU – 6 years and NU – 15 years). To compare the results with those of other young universities, we included a leading young Asian university — STU (12 years) — and a small young European university — US. The study focused on universities established within the past 20 years; however, the sample of such institutions is rather limited. The second selection criterion concerned the availability of data in OpenAlex. Some young universities in OpenAlex are listed as subsidiary institutions of older parent universities. In such cases, it is challenging to separate publications and correctly attribute them either to the young subsidiary or to the parent institution. For example, in OpenAlex, Cornell Tech is not treated as an independent institution — it is a campus of Cornell University located on Roosevelt Island, so affiliations such as “Cornell Tech” are parsed as Cornell University. Similarly, CEA Paris-Saclay is part of the French Alternative Energies and Atomic Energy Commission (CEA) and is affiliated with Université Paris-Saclay as its “parent institution.” Clarifications regarding this issue have been added to the manuscript in Section 3.
2. Both Web of Science and Scopus are widely used databases, as documented by the Scientometrics community; the authors need to explain why they did not choose these two well-known databases.
The total number of publications for young universities is relatively small. Consequently, their representation in Web of Science and Scopus is even more limited than in OpenAlex, which includes all publications with a DOI. Therefore, OpenAlex was chosen as the primary data source for this study.
3. The completeness and accuracy of the author address field for the new OpenAlex (Zhang, Cao, et al., 2024) and established Web of Science (Liu, Hu, et al., 2018) and Scopus (Liu & Wang, 2025) SHOULD be disclosed or added as a limitation of your study.
Thank you for the comment. We have added information about these publications to the Limitations section of the manuscript.
4. Many figures should be merged for direct comparison.
Thank you for the comment. We have revised the manuscript by combining the figures to improve the clarity of presentation and to avoid unnecessary repetition.
5. Add some subheadings and shorten this manuscript.
We have shortened the manuscript text and combined the information describing the analysis of each individual university into a single consolidated paragraph.
6. The quality of the metadata in OpenAlex and also the established databases such as Web of Science and Scopus should be discussed.
In our study, we performed a preliminary analysis and data cleaning. For example, when analyzing AITU data in OpenAlex (an institution founded in 2019), we found 13 publications dated 2018 whose authors were affiliated with the university. These inaccuracies were corrected. However, given the total number of publications, such minor discrepancies did not have a significant impact on the study’s results, as noted in the manuscript. Web of Science and Scopus were not selected for the analysis because the number of publications from such young universities in these databases is very limited, which would make the resulting collaboration networks insufficiently representative: the total number of publications for young universities is relatively small, so their representation in Web of Science and Scopus is even lower than in OpenAlex, which includes all publications with a DOI. For this reason, OpenAlex was chosen as the primary data source.
7. You should also note the differences in disciplines, as well as the distinctions between multilateral co-authorship and bilateral co-authorship.
Thank you for the remark. We did not address this question in the present article; however, it would be an interesting direction for future work.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
General comment
I appreciate the authors’ thorough and constructive responses to my comments. The revisions have clearly improved the clarity, contextualization, and methodological transparency of the paper. Most of my concerns appear to have been addressed, though a few points could still benefit from additional detail or explicit clarification in the manuscript.
Specific comments:
1. Selection of universities – The explanation is detailed and reasonable, especially the rationale combining representativeness and data availability.
2. Gap and scope of young-university research – The authors have satisfactorily acknowledged the scarcity of prior studies. It would strengthen the discussion further if the implications for external validity (i.e., transferability beyond the sample) were briefly elaborated in the paper.
3. Terminological consistency – Acknowledged and resolved; no further comments.
4. Controlling for network size and disciplinary composition – The clarification that most indicators are dimensionless and that relative measures were used is helpful. However, I suggest making this rationale explicit in the Materials and Methods section to ensure readers clearly understand how comparability was maintained.
5. Affiliation cleaning and duplicate removal – The description remains somewhat general; however, it appears sufficient for the purposes of this study.
6–7. OpenAlex query and supplementary data – The inclusion of query information and additional tables is appreciated and addresses the comment adequately.
8. Threshold sensitivity analysis – The rationale for selecting a threshold >2 is clear, though it would be valuable to briefly acknowledge this as a potential limitation (since alternative thresholds could alter results).
9. Visualization threshold (weighted degree >2000) – The reasoning for readability is acceptable, but I suggest adding one sentence in the manuscript explicitly noting that this was chosen for visual clarity and that the full data are available for replication.
10. Simulation runs – The clarification of ten iterations per university is useful. It would be helpful to mention whether variance across runs was examined to support claims of stability. This is just a suggestion for your consideration.
11. Claims of resilience differences – The justification referencing Figure 4 helps, but if possible, the authors could try to clarify whether statistical comparison (e.g., confidence intervals or effect size metrics) was considered to substantiate the “significant” difference wording.
12. Manual corrections and data errors – The clarification is acceptable. Please ensure quantitative information about the corrections is visible in the main text or supplementary materials.
13. References section – The addition of more recent and comparative works is noted and appreciated. Although the number of added publications is not substantial, it can be considered satisfactory.
Overall recommendation: The authors have made meaningful revisions that improve the paper’s rigor and readability. After minor clarifications, the manuscript should be suitable for publication.
Author Response
We have revised the manuscript in accordance with your recommendations. In addition, we conducted an additional experiment for one of the universities, using only the data included in the Web of Science Core Collection, as suggested by another reviewer.
1. Selection of universities – The explanation is detailed and reasonable, especially the rationale combining representativeness and data availability.
Thank you.
2. Gap and scope of young-university research – The authors have satisfactorily acknowledged the scarcity of prior studies. It would strengthen the discussion further if the implications for external validity (i.e., transferability beyond the sample) were briefly elaborated in the paper.
We have supplemented the discussion with an explanation of the implications for external validity.
3. Terminological consistency – Acknowledged and resolved; no further comments.
Thank you.
4. Controlling for network size and disciplinary composition – The clarification that most indicators are dimensionless and that relative measures were used is helpful. However, I suggest making this rationale explicit in the Materials and Methods section to ensure readers clearly understand how comparability was maintained.
Thank you. We have explicitly added this information in the Materials and Methods section.
5. Affiliation cleaning and duplicate removal – The description remains somewhat general; however, it appears sufficient for the purposes of this study.
6–7. OpenAlex query and supplementary data – The inclusion of query information and additional tables is appreciated and addresses the comment adequately.
Thank you.
8. Threshold sensitivity analysis – The rationale for selecting a threshold >2 is clear, though it would be valuable to briefly acknowledge this as a potential limitation (since alternative thresholds could alter results).
Thank you for the comment. We have incorporated this clarification into Section 4. Limitations and Future Research.
9. Visualization threshold (weighted degree >2000) – The reasoning for readability is acceptable, but I suggest adding one sentence in the manuscript explicitly noting that this was chosen for visual clarity and that the full data are available for replication.
Thank you. We have added this explanation to the manuscript text.
10. Simulation runs – The clarification of ten iterations per university is useful. It would be helpful to mention whether variance across runs was examined to support claims of stability. This is just a suggestion for your consideration.
11. Claims of resilience differences – The justification referencing Figure 4 helps, but if possible, the authors could try to clarify whether statistical comparison (e.g., confidence intervals or effect size metrics) was considered to substantiate the “significant” difference wording.
Each simulation included 50 points corresponding to the sequential removal of 1 to 50% of nodes in the network of researchers. For each point, the variance of the results was calculated based on ten independent experiments. The obtained values reflect the stability of the network structure under random node removal and make it possible to assess the degree of variability in the system’s response to destructive influences. Subsequently, for each university, the average variance value across all simulation points was computed.

The results revealed significant differences among the universities. In particular, the variance for AITU is 0.01232, for US – 0.02998, for NU – 0.00372, and for STU – 0.00257. As can be seen, for NU and STU, the variance is approximately an order of magnitude lower than for AITU and US, which indicates higher stability of the obtained results and a more homogeneous reaction of the network to a decrease in the number of active nodes.

One possible explanation for this difference may be the varying size of the initial samples: for example, in the case of US, the number of publications and authors is significantly smaller, which increases the model’s sensitivity to random removals. However, this factor is not the only one. The dynamics of young universities’ development may also play a significant role, as they are characterized by an as yet unstable structure of scientific collaboration, a limited number of stable inter-institutional connections, and high variability in the composition of active authors. Thus, higher variance values for AITU and US can be interpreted as an indication of greater structural instability of their scientific collaboration networks compared to NU and STU. The results of this additional analysis have been incorporated into the manuscript.
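For completeness, a minimal sketch of this stability check follows. The layout of `runs_matrix` is an assumption for the example, and population variance is shown since the response does not specify whether sample or population variance was used:

```python
import statistics

def mean_point_variance(runs_matrix):
    """Average per-point variance across runs.

    `runs_matrix[r][p]` is assumed to hold the relative LCC size for
    run r (of 10) at removal point p (of 50); the return value is the
    single stability score reported per university above.
    """
    n_points = len(runs_matrix[0])
    per_point = [
        statistics.pvariance([run[p] for run in runs_matrix])
        for p in range(n_points)
    ]
    return sum(per_point) / n_points
```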
12. Manual corrections and data errors – The clarification is acceptable. Please ensure quantitative information about the corrections is visible in the main text or supplementary materials.
13. References section – The addition of more recent and comparative works is noted and appreciated. Although the number of added publications is not substantial, it can be considered satisfactory.
Thank you.
Reviewer 2 Report
Comments and Suggestions for Authors
- The abstract should be shortened.
- The sub-datasets and coverage years of the used Web of Science Core Collection should be disclosed as suggested in a study in Scientometrics in 2019.
Author Response
We have revised the manuscript in accordance with your recommendations. We have also conducted a statistical comparison, as suggested by another reviewer.
1. The abstract should be shortened.
Thank you for the comment. The abstract has been shortened.
2. The sub-datasets and coverage years of the used Web of Science Core Collection should be disclosed as suggested in a study in Scientometrics in 2019.
Thank you for the comment. We experimented with one of the universities using only the data included in the Web of Science Core Collection. The analysis of the obtained curves showed that, despite the decrease in the absolute values of the largest connected component after restricting the dataset to Web of Science records only, the overall behavior of the functions remained practically unchanged.

To quantitatively assess the similarity between the results obtained from the full dataset and those derived from the WoS data, Pearson correlation coefficients were calculated between the corresponding curves for different scenarios. All obtained coefficients exceeded 0.85, and in most cases approached 1.0, indicating a very high correlation between the curves. This means that excluding part of the data (in particular, restricting the analysis to publications indexed in WoS) did not affect the overall dynamics of network degradation, with the changes mainly limited to the scale of the metrics. Thus, using only WoS data does not provide substantial analytical advantages; instead, it merely reduces the sample size and consequently the scale of the network, without altering the fundamental patterns of its structural behavior. The results of this experiment, along with the corresponding graphs, have been added to the manuscript.
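A minimal sketch of this curve comparison; the values below are illustrative placeholders, not the study's data:

```python
from scipy.stats import pearsonr

# Relative LCC sizes along the removal curve -- placeholder values only.
curve_full = [0.98, 0.93, 0.85, 0.74, 0.60]  # full OpenAlex dataset
curve_wos  = [0.97, 0.91, 0.82, 0.70, 0.57]  # WoS-only subset

r, p_value = pearsonr(curve_full, curve_wos)
print(f"Pearson r = {r:.3f}")  # the response reports r > 0.85 in all scenarios
```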