Review Reports - Evaluating Generative AI for Identifying Ethical, Legal, and Social Dimensions in Migration Narratives: A Case Study of Ukrainian Discourse

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript presents an interesting and thought-provoking analysis of Ukrainian migration narratives through a combination of Generative AI and human experts’ analysis. Two different data sets were collected: one consisting of official governmental publications from multiple countries hosting Ukrainian refugees, and a second one made up of Telegram messages by Ukrainian refugees staying in the countries represented in the first data set (UK, Poland, Spain, Italy, US, Canada etc.). On the basis of the central role played by ethical, legal and social issues in shaping public communication and fostering social cohesion, the author(s) address two research questions:

“Can generative AI models identify the ethical, legal, and social (ELS) components embedded in textual content?” (lines 68-69)
“How can the alignment between governmental and Ukrainian migrant narratives be evaluated across ethical, legal, and social aspects?” (lines 88-89)

Their meticulous procedure, supported by extensive literature and computational social science methods, helps them develop a framework, “which combines taxonomy-driven modelling with generative AI and expert-in-the-loop validation” (lines 18-19), with a view to distinguishing the ethical, legal, and social issues in both datasets. According to the authors, the findings highlighted by this framework point to a thematic and temporal misalignment between the two datasets: the former is more focused on legal aspects, especially between 2022 and 2024, and the latter is more focused on social aspects, as is to be expected of textual messages composed by refugees.

The manuscript is generally well written, makes good use of sources and builds a strong argument for the first research question: the framework sounds convincing as each step is extensively detailed (lines 231-605). However, a noticeable weakness, in my view, lies in the fact that no excerpts from either dataset are included to explain what is meant exactly by ethical, legal and social and give us a flavour of the differences (if any) between the datasets. This seems particularly important considering that these aspects may have an entirely different meaning for governments and individuals. Furthermore, because one of the interesting findings relates to the minimal presence of the ethical category in both datasets (lines 639-648), it is important to clarify what stands for “ethical” in this study with an appropriate definition/clarification and practical examples of ethical issues.

Table 7 does provide some excerpts from the datasets, but they are meant primarily to point to the mismatches between human and Gen AI evaluations rather than to give a flavour of the different categories (please beware that examples 2 and 3 in Dataset 1 and Dataset 2 appear to be convergent rather than mismatched). Moreover, the Telegram messages in Table 7 sound promotional and lack the kind of ‘personal touch’ we would expect of individual users’ Telegram posts. Unfortunately, the two additional sources separately attached to the manuscript are not helpful as they seem to be entirely in Ukrainian (however, I am not sure whether this may be due to my computer settings or my lack of expertise with the platform interface).

In light of the lack of examples from the two datasets, the answer to the second research question, which is much less detailed as it is included in the Discussion from line 602 to line 648, appears to be somewhat sketchy. Concrete examples of the thematic and temporal misalignment should be provided by the Author(s) to support their conclusions. In particular, as for the temporal misalignment, it is not clear what it refers to. Please check the sentence (lines 620-623): “Temporal misalignment is illustrated by two contrasting trends. Between 2022 and 2024, governmental discourse shifted toward legal concerns, increasing by 8.4%, while reducing its emphasis on social issues by 6.7%. In contrast, migrant-generated messages remained relatively stable, with only a minor decrease of 0.7% and no comparable shift toward the legal dimension.” When did the governmental discourse shift towards legal concerns? The time indication “between 2022 and 2024” sounds too broad, especially as the two datasets date back precisely to these two years.

Some minor points to take care of are the following:

Please review whether “state institutions” is the appropriate expression in this excerpt: “long-established state institutions such as the media, social surveys, and monitoring assessments, as well as more recent open government platforms (Schmidthuber et al. 2021)…” (lines 30-31);
Please revise column 2 in Table 1: The heading “Names of Countries/Websites” appears to include two different categories;
Please clarify the following sentence and provide examples of your selection, if possible: “the selection of Telegram channels was informed by migration flow statistics reported by the United Nations” (lines 275-276);
Please check whether “migration narratives” is appropriate in the title and the datasets: are narrative genres included? Is there a difference between migrants and refugees?
Please consider rephrasing “entities” in line 478;
Please check line 525.

I enjoyed reading this paper, which provides a valuable framework as long as the categories of analysis (ethical, legal, social) are properly clarified and exemplified. I would like to thank the Author(s) for their interesting research and invite them to consider including guidelines on how to use this framework with other datasets and in other contexts.

Author Response

Thank you very much for taking the time to review this manuscript. We sincerely appreciate your valuable comments and recommendations. Please find our detailed responses below. The corresponding revisions have been highlighted using track changes in the resubmitted manuscript files.

Comments 1: “However, a noticeable weakness, in my view, lies in the fact that no excerpts from either dataset are included to explain what is meant exactly by ethical, legal and social and give us a flavour of the differences (if any) between the datasets. This seems particularly important considering that these aspects may have an entirely different meaning for governments and individuals.”

Response 1: Thank you for this important observation. We have added Appendix B, which presents selected excerpts from official governmental documents and Telegram-based migrant discourse corresponding to each ELS category as examples of the model’s classification. The appendix is introduced in lines 537–539.

Comments 2: “Furthermore, because one of the interesting findings relates to the minimal presence of the ethical category in both datasets (lines 639-648), it is important to clarify what stands for “ethical” in this study with an appropriate definition/clarification and practical examples of ethical issues.”

Response 2: Thank you for this important observation. We expanded the Discussion section to clarify the meaning of the ethical dimension within the Migration ELS Taxonomy. Specifically, we added a definition explaining that the ethical category refers to normative considerations related to moral responsibility, social justice, human rights, inclusion, vulnerability, and the ethical treatment of migrants and refugees. We also incorporated practical examples of ethical issues, including fairness in refugee treatment, protection of vulnerable groups, discrimination, humanitarian responsibility, and moral obligations toward displaced populations (lines 679-687).

Comments 3: “Table 7 does provide some excerpts from the datasets, but they are meant primarily to point to the mismatches between human and Gen AI evaluations rather than to give a flavour of the different categories (please beware that examples 2 and 3 in Dataset 1 and Dataset 2 appear to be convergent rather than mismatched). Moreover, the Telegram messages in Table 7 sound promotional and lack the kind of ‘personal touch’ we would expect of individual users’ Telegram posts.”

Response 3: Agree. We supplemented the manuscript with a new appendix containing category-oriented excerpts. At the same time, we agree that many Telegram messages exhibit a promotional style, as users frequently share advertisements or reproduce information from official institutions, sometimes supplementing it with personal attitudes or commentary.

Comments 4: “Unfortunately, the two additional sources separately attached to the manuscript are not helpful as they seem to be entirely in Ukrainian (however, I am not sure whether this may be due to my computer settings or my lack of expertise with the platform interface).”

Response 4: Agree. We updated the dataset containing Telegram messages to improve its accessibility and compatibility across different platforms and interface settings (line 824).

Comments 5: “In light of the lack of examples from the two datasets, the answer to the second research question, which is much less detailed as it is included in the Discussion from line 602 to line 648, appears to be somewhat sketchy. Concrete examples of the thematic and temporal misalignment should be provided by the Author(s) to support their conclusions. In particular, as for the temporal misalignment, it is not clear what it refers to.”

Response 5: Thank you for this important observation. We revised the Discussion section to provide more concrete examples from both datasets illustrating the thematic and temporal misalignment between governmental and migrant discourses. Specifically, we added examples of social and legal themes identified in official documents and migrant-generated narratives (lines 643-648, 651-657). We also clarified the meaning of temporal misalignment by explicitly defining it as divergent trends in the emphasis on legal and social dimensions over time across institutional and migrant discourses.

Comments 6: “Please check the sentence (lines 620-623): “Temporal misalignment is illustrated by two contrasting trends. Between 2022 and 2024, governmental discourse shifted toward legal concerns, increasing by 8.4%, while reducing its emphasis on social issues by 6.7%. In contrast, migrant-generated messages remained relatively stable, with only a minor decrease of 0.7% and no comparable shift toward the legal dimension.” When did the governmental discourse shift towards legal concerns? The time indication “between 2022 and 2024” sounds too broad especially as the two datasets date back precisely to these two years.”

Response 6: Thank you for this observation. We revised the sentence to clarify that the comparison is made between data from 2022 and data collected during the 2023–Feb. 2024 period (lines 658-663).
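
As an illustrative aside, the period-to-period comparison described in Responses 5 and 6 can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: it assumes that each message or document carries a single ELS label and that the reported shifts are percentage-point differences in category shares between two periods. The function names `category_shares` and `share_shift` and the toy label lists are hypothetical and are not taken from either dataset.

```python
from collections import Counter


def category_shares(labels):
    """Return each category's share of all labels, in percent."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in counts.items()}


def share_shift(labels_early, labels_late):
    """Percentage-point change in each category's share between two periods."""
    early = category_shares(labels_early)
    late = category_shares(labels_late)
    categories = sorted(set(early) | set(late))
    return {c: late.get(c, 0.0) - early.get(c, 0.0) for c in categories}


# Hypothetical toy labels for illustration only (not real data).
gov_early = ["legal"] * 50 + ["social"] * 45 + ["ethical"] * 5
gov_late = ["legal"] * 60 + ["social"] * 36 + ["ethical"] * 4

for category, shift in share_shift(gov_early, gov_late).items():
    print(f"{category}: {shift:+.1f} percentage points")
```

Under these assumptions, temporal misalignment would show up as shifts of different sign or size when the same function is applied separately to the governmental and the Telegram labels.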

Comments 7: “Please review whether “state institutions” is the appropriate expression in this excerpt: “long-established state institutions such as the media, social surveys, and monitoring assessments, as well as more recent open government platforms (Schmidthuber et al. 2021)…” (lines 30-31);”

Response 7: Thank you for this comment. We revised the expression “state institutions” to “communication and feedback mechanisms”. We agree that this wording is more appropriate in the given context.

Comments 8: “Please revise column 2 in Table 1: The heading “Names of Countries/Websites” appears to include two different categories”

Response 8: Agree. Thank you for this observation. We revised the heading of Column 2 in Table 1 from “Names of Countries/Websites” to “Source Details”, added an explanatory table footer, and standardised several source names (e.g., changing https://www.youtube.com/ to “YouTube”) in order to avoid mixing different source categories.

Comments 9: “Please clarify the following sentence and provide examples of your selection, if possible: “the selection of Telegram channels was informed by migration flow statistics reported by the United Nations” (lines 275-276);”

Response 9: Thank you for this comment. We revised the sentence to clarify that the selection of Telegram channels was based on United Nations migration flow statistics identifying the main host countries receiving the largest numbers of Ukrainian refugees (lines 296-298). The list of these countries is provided in the Introduction section (lines 90-92).

Comments 10: “Please check whether “migration narratives” is appropriate in the title and the datasets: are narrative genres included? Is there a difference between migrants and refugees?”

Response 10: Thank you for this important observation. We clarified the meaning of the term “migration narratives” in the Introduction section by defining it as discursive and thematic representations related to forced migration experiences, refugee integration, institutional responses, and public communication surrounding the displacement of Ukrainians. We also clarified the distinction between the terms “migrants” and “refugees”, noting that although the analysed population primarily consists of refugees, the broader term “migrants” is occasionally used in relation to migration-related discourse and terminology found in prior literature and policy documents (lines 111-116).

Comments 11: “Please consider rephrasing “entities” in line 478;”

Response 11: Thank you for this observation. We replaced the term “entities” with “concepts” (lines 500-502).

Comments 12: “Please check line 525.”

Response 12: Thank you for this observation. We have removed the line.

Comments 13: “I would like to thank the Author(s) for their interesting research and invite them to consider including guidelines on how to use this framework with other datasets and in other contexts.”

Response 13: Thank you very much for your positive evaluation of our research and for this valuable suggestion. We agree that applying the proposed framework to other datasets and contexts could be a very interesting direction for future work.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

It is a very impressive submission which meets the criteria for a strong article in this journal. It combines, in an entirely convincing way, (1) a complex methodological procedure, which is fully innovative; (2) a huge dataset, which sheds light on ongoing processes; and (3) a powerful application to a topical and relevant issue such as the intersection of governmental and migrant narratives vis-à-vis the Ukrainian refugees since 2022.

Fig. 1 is a brilliant summary of the GenAI-assisted framework developed for the needs of this study.

In my view, the research design and the logic of developing the article are appropriate.

Just several remarks:

The focus of the study could be explained in more detail: what is the main focus? Is it a methodological paper (it is!) which just offers an example (the Ukrainian refugees) - or is it a migration studies analysis for which an innovative methodological framework has been employed? Both are present, but a focus on one of them, or explanation for the equal weight of both, is perhaps needed.
Can we extend conclusions from Telegram messages to a "migrant discourse" (line 574, etc.)? The authors admit the limitations, but such generalizing statements are nevertheless abundant, especially in the Discussion section, but also in the Results one.
Doesn't one need a more thorough explanation for the selection of the GPT-3.5 model? For me, "most suitable option" (lines 306-309) is not very clear. One can ask: Doesn't it weaken the claim for the use of "generative AI models", since we apply a single one?
The ELS taxonomy is a genuine contribution, in my opinion. At the same time: the authors admit that the ethical, legal, and social dimensions are overlapping (lines 653-655), but consider this just a part of the limitations of the study. But if the categories are so vague, doesn't it undermine the classification itself? It needs several sentences to explain.
The near absence of the ethical category in the results is very interesting. But can we hypothesize that ethical concerns are implicit, and thus more difficult to detect than legal or social issues? In the text, there is such a warning (lines 667-670), but perhaps it deserves deeper discussion.
Finally, in some cases template sentences are left in the text - for instance, regarding Table 3 (line 349 - "This is a table..."), or Figure 3 (line 525 - "All figures and tables...").

Author Response

Comments 1: “The focus of the study could be explained in more detail: what is the main focus? Is it a methodological paper (it is!) which just offers an example (the Ukrainian refugees) - or is it a migration studies analysis for which an innovative methodological framework has been employed? Both are present, but a focus on one of them, or explanation for the equal weight of both, is perhaps needed.”

Response 1: Thank you for this insightful observation. We clarified the dual focus of the study in the Introduction by clearly indicating that the paper combines both a methodological contribution and an empirical migration discourse analysis. We emphasised that the study develops and evaluates a GenAI-assisted framework for identifying ELS dimensions in textual data, while simultaneously demonstrating the applicability of this framework through the case study of governmental and Ukrainian migrant discourses (lines 95-101).

Comments 2: “Can we extend conclusions from Telegram messages to a "migrant discourse" (line 574, etc.)? The authors admit the limitations, but such generalizing statements are nevertheless abundant, especially in the Discussion section, but also in the Results one.”

Response 2: Thank you for this important observation. To avoid overgeneralisation, we revised several formulations throughout the Results and Discussion sections by replacing broader expressions such as “migrant discourse” with more specific formulations, including “migrant discourse observed on Telegram” (lines 593, 599, 629, 637, 641, 671). In addition, we added a methodological clarification in the Data Collection section stating that, although the analysed Telegram messages cannot fully represent all Ukrainian migrants, previous studies identify Telegram as one of the central communication platforms among displaced Ukrainian communities. Therefore, with appropriate methodological caution, the analysed Telegram discourse may provide important insights into broader communication and narrative patterns among Ukrainian migrants displaced by the war (lines 285-290).

Comments 3: “Doesn't one need a more thorough explanation for the selection of the GPT-3.5 model? For me, "most suitable option" (lines 306-309) is not very clear. One can ask: Doesn't it weaken the claim for the use of "generative AI models", since we apply a single one?”

Response 3: Thank you for this important observation. We revised the paragraph in Section 3.2 to provide a more detailed explanation of the generative AI model selection process and to clarify that the objective of the study was not to benchmark different GenAI systems, but rather to investigate the capability of a GenAI-assisted framework for mapping and analysing ethical, legal, and social dimensions in migration-related discourse (lines 322-332). In addition, we expanded the Limitations section by explicitly acknowledging the reliance on a single generative AI model (GPT-3.5) and noting that future research may extend the framework through comparative evaluations involving multiple generative AI models and architectures (lines 736-742).

Comments 4: “The ELS taxonomy is a genuine contribution, in my opinion. At the same time: the authors admit that the ethical, legal, and social dimensions are overlapping (lines 653-655), but consider this just a part of the limitations of the study. But if the categories are so vague, doesn't it undermine the classification itself? It needs several sentences to explain.”

Response 4: Thank you for this important observation and for recognising the contribution of the Migration ELS Taxonomy. We expanded the Limitations section to clarify that the overlap between ethical, legal, and social dimensions represents an inherent characteristic of complex normative concepts and does not invalidate the classification itself. We also added several sentences explaining how the four-step human-in-the-loop development pipeline for the Migration ELS Taxonomy was designed to minimise category vagueness and improve conceptual consistency, while acknowledging that some ambiguity between categories remains unavoidable in migration-related discourse (lines 709-715).

Comments 5: “The near absence of the ethical category in the results is very interesting. But can we hypothesize that ethical concerns are implicit, and thus more difficult to detect than legal or social issues? In the text, there is such a warning (lines 667-670), but perhaps it deserves deeper discussion.”

Response 5: Thank you for this insightful observation. We expanded the Discussion section by adding several sentences explaining how ethical concerns may be embedded indirectly in discussions of legal procedures and social difficulties, which may make them more difficult to identify consistently through the model and taxonomy. We also clarified that the similarly low proportion of ethics-related labels assigned by human experts supports the conclusion that explicit ethical reflection remains limited in both governmental and migrant-generated discourse (lines 679-697).

Comments 6: “Finally, in some cases template sentences are left in the text - for instance, regarding Table 3 (line 349 - "This is a table..."), or Figure 3 (line 525 - "All figures and tables...").”

Response 6: Thank you for this observation. We apologise for this oversight. We carefully reviewed the manuscript and removed the remaining template sentences related to tables and figures, including those associated with Table 3 and Figure 3 (lines 372, 550).

Author Response File: Author Response.pdf