Article

Using KeyGraph and ChatGPT to Detect and Track Topics Related to AI Ethics in Media Outlets

Department of Information Management, Tunghai University, Taichung 407224, Taiwan
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(17), 2698; https://doi.org/10.3390/math13172698
Submission received: 29 July 2025 / Revised: 16 August 2025 / Accepted: 18 August 2025 / Published: 22 August 2025
(This article belongs to the Special Issue Artificial Intelligence and Algorithms)

Abstract

This study examines the semantic dynamics and thematic shifts in artificial intelligence (AI) ethics over time, addressing a notable gap in longitudinal research within the field. In light of the rapid evolution of AI technologies and their associated ethical risks and societal impacts, the research integrates the theory of chance discovery with the KeyGraph algorithm to conduct topic detection through a keyword network built through iterative semantic exploration. ChatGPT is employed for semantic interpretation, enhancing both the accuracy and comprehensiveness of the detected topics. Guided by the double helix model of human–AI interaction, the framework incorporates a dual-layer validation process that combines cross-model semantic similarity analysis with expert-informed quality checks. An analysis of 24 authoritative AI ethics reports published between 2022 and 2024 reveals a consistent trend toward semantic stability, with high cross-model similarity across years (2022: 0.808 ± 0.023; 2023: 0.812 ± 0.013; 2024: 0.828 ± 0.015). Statistical tests confirm significant differences between single-cluster and multi-cluster topic structures (p < 0.05). The thematic findings indicate a shift in AI ethics discourse from a primary emphasis on technical risks to broader concerns involving institutional governance, societal trust, and the regulation of generative AI. Core keywords, such as bias, privacy, and ethics, recur across all years, reflecting the consolidation of an integrated governance framework that encompasses technological robustness, institutional adaptability, and social consensus. This dynamic semantic analysis framework contributes empirically to AI ethics governance and offers actionable insights for researchers and interdisciplinary stakeholders.

1. Introduction

The rapid advancement of artificial intelligence (AI) has transformed everyday life by offering novel solutions to real-world problems and has introduced revolutionary changes across application domains. Although AI systems enhance efficiency and generate value, the ethical and societal issues they raise have become global concerns. For instance, the rapid development of various technologies (e.g., autonomous driving, drones, smart health care, and generative language models) has led to ethical risks, including algorithmic bias, data privacy infringements, opacity in decision making, and ambiguous accountability. In response to these challenges, multiple international AI ethics guidelines and governance frameworks have been introduced, including the European Union’s AI Act and various national white papers on technology ethics, which aim to address the ethical and social risks posed by technological advancements through policy and institutional design [1,2]. As AI applications penetrate deeper into decision-making processes and social governance, building AI systems with ethical sensitivity and social legitimacy has become an unavoidable requirement of technological development and a cornerstone for maintaining social trust and promoting sustainable development.
Against this backdrop, academic and public interest in AI ethics has significantly increased worldwide. Systematic searches of scholarly databases reveal that existing research encompasses diverse topics, including data governance, model fairness, transparency design, and accountability ethics, forming an interdisciplinary and multifaceted body of knowledge [3,4,5,6]. Nevertheless, most prior studies have focused on static articles or in-depth analyses of individual issues, without a systematic examination of whether semantic shifts, thematic evolution, or value-focused transformations occur in AI ethics discourse over time. Given the continuous and rapid evolution of AI technology and its application contexts, the discourse on ethical issues may significantly change over time and potentially shift in thematic focus due to event-driven factors, policy interventions, and public opinion.
Motivated by these considerations, this study employs topic detection techniques in text mining to conduct an in-depth analysis of multiple unstructured articles. This work explores whether the themes addressed in AI ethics reports demonstrate stable and consistent focal points or reveal dynamic shifts and contextual changes alongside technological and temporal developments. This research aims to identify the underlying semantic shifts and trends in AI ethics topics by thoroughly examining and comparatively analyzing recent AI ethics articles published by academic institutions, media outlets, and nonprofit organizations [7,8,9].
This study integrates the KeyGraph text mining algorithm grounded in the theory of chance discovery with the generative AI capabilities of the large language model (LLM)-based ChatGPT-4o tool to address the challenges of semantic analysis in unstructured textual datasets. This integration establishes an innovative research workflow combining structured mining with semantic interpretation. KeyGraph, a graph-based method, constructs keyword networks and knowledge graphs by calculating the frequency and co-occurrence strength of keywords. Notably, KeyGraph identifies “chance” keywords that, despite their low frequency, possess significant bridging value in the network structure, revealing latent topics or emerging concepts in the dataset. This approach surpasses traditional frequency-based methods by uncovering hidden associations, offering substantial potential for analyzing thematic evolution and shifts in ethical focus [10].
Unlike standard topic modeling techniques such as latent Dirichlet allocation (LDA) [11] and non-negative matrix factorization (NMF) [12], which require predefined topic numbers or distributional assumptions, and transformer-based approaches such as BERT [13], which rely on pretrained embeddings to infer latent topics, topic detection emphasizes the identification of semantic associations and dynamic thematic changes. Topic detection is particularly suitable for analyzing rapidly evolving, cross-temporal, or problem-oriented datasets. The KeyGraph algorithm is a well-recognized topic detection tool that captures nonlinear co-occurrence relationships among keywords, making it appropriate for exploring complex ethical issues characterized by competing values and shifting contexts, as in this study [14,15,16].
Furthermore, this study employs ChatGPT as an auxiliary tool for semantic interpretation and summary generation to overcome the subjective limitations in conventional topic analysis methods that rely on expert interpretation for deep semantic understanding and domain knowledge. By applying ChatGPT’s advanced language comprehension and summarization capabilities, researchers can achieve more accurate and focused thematic interpretations of each keyword cluster generated by KeyGraph, enhancing the precision of topic detection and the overall efficiency and interpretability of the analytical process [17].
In this study, a human–AI interactive mechanism based on the double helix model is employed to adjust node parameters, reclassify keyword clusters, and dynamically evaluate the semantic consistency and logical coherence of generated summaries. Through iterative human–computer interaction (HCI), this process constructs a logically coherent thematic structure for AI ethics [18].
This study analyzes AI ethics-related articles published between 2022 and 2024 by international academic institutions, news media, and nonprofit organizations. Rigorous selection criteria were applied during data collection. Sources were limited to reputable, authoritative organizations, and articles focusing exclusively on single disciplines or industry-specific applications were excluded. Comprehensive reports reflecting global trends were prioritized to ensure that the dataset represented diverse perspectives and contentious issues in AI ethics worldwide, establishing a neutral and macro-level textual database. Eight representative documents were selected for each year as the basis for the KeyGraph algorithm-based topic detection and chance exploration.
This study conducts comparative analyses of annual thematic clusters using a multistage processing approach comprising text preprocessing, keyword network construction, topic detection, and semantic interpretation. By identifying core themes, logical transitions, and shifting focal points, the study examines chance keywords to detect potentially emerging yet significant ethical issues. Integrating the structural keyword-mining capabilities of KeyGraph with the semantic reasoning strengths of ChatGPT, this research develops a dynamic, extensible, and semantically enriched method for topic detection. This approach provides empirical evidence and strategic insights for trend monitoring, governance planning, and knowledge development in the field of AI ethics, while also enabling nonspecialist audiences to efficiently grasp the evolving landscape of AI ethics themes.
The methodological framework adopted here is comparable to the hybrid semantic analysis model proposed by Chechkin et al. [19], which addresses cybersecurity text analytics by integrating a Knowledge-Associated Network (KAN) for domain-specific feature extraction, a BiLSTM module for capturing sequential dependencies, a Transformer encoder for modeling global semantic relationships, and a Multi-Domain Dynamic Attention Network (MD-DAN) to adaptively weight features from different sources. This architecture enables the model to combine local context sensitivity with cross-domain semantic generalization, achieving high performance in identifying and classifying complex semantic patterns in cybersecurity data. While the domain focus differs—with their study centered on cybersecurity and ours on AI ethics discourse—both approaches share the aim of enhancing semantic analysis through the integration of complementary computational methods. In contrast to their use of deep neural architectures for direct text classification, our method combines the structured keyword-network modeling capabilities of KeyGraph with the semantic reasoning capacity of ChatGPT. Together with iterative human–AI interaction, this facilitates the dynamic discovery and interpretation of cross-text topics and supports the identification of latent and emerging issues in longitudinal AI ethics reports.
To guide readers, the remainder of this paper is organized as follows. Section 2 reviews the literature on AI ethics, consolidates the research gap and contributions, and formulates the research questions; the generic background and procedural details of KeyGraph are provided in Appendix A. Section 3 describes the dataset, parameter settings, and the hybrid KeyGraph–ChatGPT workflow. Section 4 presents the results alongside quantitative validation. Section 5 discusses the findings in relation to prior work and their practical implications. Section 6 concludes the study and highlights both theoretical and applied contributions. Finally, Section 7 addresses limitations and proposes directions for future research.

2. Literature Review

2.1. AI Ethics Literature Review

In recent years, ethical issues surrounding artificial intelligence have become a prominent topic of debate in both academia and the public sphere, reflecting AI’s growing role in global technological development and social governance. A systematic review of academic databases shows that the existing literature addresses a wide range of themes: some studies concentrate on specific topics (e.g., data transparency and privacy protection), while others integrate multiple dimensions of ethical concerns to enable interdisciplinary analyses [7,20,21].
However, research offering structural examinations of interrelationships among ethical issues remains limited, particularly in fast-evolving domains such as generative AI and autonomous driving. Recent studies have identified distinctive ethical challenges associated with generative AI, including concerns about content authenticity, misuse risks, and value alignment [22]. Capturing the contextual shifts and logical frameworks of ethical debates across interdisciplinary publications therefore remains an important research gap [23].
Against this background, AI ethics-related reports from news media, academic institutions, and nonprofit organizations—valued for their timeliness, accessibility, and public engagement—have become essential for tracking thematic evolution and value conflicts. These reports supplement the academic literature by clarifying practical contexts and illustrating how ethical principles are interpreted and contested in real-world applications.
AI ethics issues are inherently diverse and complex, spanning technology, law, society, and philosophy [20,23]. This study integrates the KeyGraph algorithm with text mining to construct a keyword co-occurrence network based on keyword frequency and co-occurrence patterns. This approach allows for a systematic and visual exploration of ethical themes across various sources, as well as an assessment of diffusion structures and intrinsic keyword relationships [24].
A core subtopic is AI governance, which focuses on ensuring AI’s trustworthiness and social acceptability in public decision-making [25]. In medical and public service contexts, institutionalized risk management practices have emphasized verifiable sources, content labeling, and platform accountability [26]. Trustworthy AI incorporates principles such as accuracy, robustness, transparency, accountability, fairness, interpretability, explainability, legality, redress mechanisms, and human oversight. In practice, achieving all these values simultaneously is challenging, often leading to prioritization conflicts [21]. For instance, improving accuracy often requires complex models that reduce transparency, while fairness metrics face inherent trade-offs: impossibility results (the “statistical impossibility theorem”) show that when base rates differ across groups, a classifier cannot simultaneously satisfy calibration and equal false-positive and false-negative rates [23,27].
The integration of AI technologies with the United Nations Sustainable Development Goals (SDGs) has emerged as a significant focus in AI ethics. While AI can support technological applications, social innovation, and resource efficiency, it must be developed under conditions of responsible governance and sustainable deployment. Recent frameworks for mitigating generative AI risks combine governance mechanisms, technical safeguards, and social oversight. Four primary challenges remain in aligning AI ethics with sustainable development: (1) transparency, fairness, bias, and accountability issues in automated decision-making; (2) environmental impacts from the high energy demands of large-scale models; (3) inadequate governance frameworks that often react to, rather than anticipate, technological change; and (4) technical bottlenecks, including insufficient explainability, data governance limitations, and the difficulty of designing multidimensional performance indicators [9,27].
The vast quantities of sensitive data generated by AI heighten risks to privacy and security. In AI-driven knowledge management, ethical risks related to privacy, bias, and transparency are particularly prominent. A review of 102 AI ethics articles found that privacy and algorithmic bias accounted for 27.9% and 25.6% of discussions, respectively, making them the most frequently addressed topics. Transparency, accountability, and fairness also remain central [28]. These findings mirror other frameworks, indicating that AI ethics discourse is shifting from abstract principles toward operational challenges [29], necessitating cross-disciplinary integration and technical governance [23,28,30].
Some researchers advocate for decentralized data governance, such as federated learning, to balance utility with privacy protection [31]. They also recommend adopting multidimensional performance evaluation frameworks that incorporate environmental, social, and ethical metrics. Furthermore, inclusive development and human-centered design—grounded in diverse perspectives—can help mitigate systemic bias and bolster the legitimacy of AI governance.
Although research and policy documents in AI ethics have widely articulated principles such as transparency, accountability, fairness, and privacy, operationalizing these principles within organizations remains difficult. A persistent gap exists between normative statements and practical implementation, underscoring the insufficiency of relying solely on external guidelines. Organizations should establish internal governance systems that integrate ethical principles into the technology lifecycle, decision-making processes, and risk assessments, thereby enhancing ethical sensitivity and adaptability [9,20,23,28,32].
In summary, AI ethics has progressed from high-level principle declarations toward practice-oriented institutional development and governance innovation. Future research should pursue integrative frameworks and multi-level governance models that combine design principles, regulatory mechanisms, ethical conflict detection, and public engagement—ensuring that AI development promotes innovation while safeguarding ethical values, social order, and sustainable development objectives [9,20,23,33].

2.2. Chance Discovery Theory

The theory of chance discovery is an interdisciplinary data mining framework designed to identify rare yet valuable “chances” through human–computer interaction (HCI) and structural data analysis. A “chance” is broadly defined as critical information that can guide decision-makers or automated systems toward significant actions, often revealing emerging opportunities or previously unrecognized risks [10,34,35,36,37]. Unlike random event detection, chance discovery emphasizes conscious awareness [35], focusing on the associations between keywords rather than their frequency alone. Structured keyword analysis and visualization tools, such as the Polaris system that implements the KeyGraph algorithm [34], facilitate the detection of bridging words that connect multiple clusters, which may represent potential chances [10,38,39].
In contrast to traditional data analysis methods, which typically assume stable structures and predefined variables, chance discovery targets low-frequency information nodes with significant implications. Insights often emerge gradually and may be concealed within unstructured data, requiring decision makers to actively engage in the evolving information landscape. The method integrates human contextual sensitivity with computational capabilities for data processing and visualization, enabling the detection of chance information and the formulation of strategic responses [10,38,39].
Ohsawa [10,38,39] proposed three key criteria for identifying chances:
  • Establishing and uncovering innovative models and variables: Incorporating contextual factors to identify emerging variables in specific situations, thereby improving relevance and accuracy.
  • Identifying tail events: Detecting rare but high-impact occurrences through focused observation and analysis.
  • Relying on human–AI interaction for interpretation and judgment: Leveraging human expertise to evaluate whether a tail event constitutes a genuine chance, given its rarity and inherent ambiguity.
Chance discovery thus relies on HCI to integrate computational analysis with domain expertise, enhancing both the accuracy and practical utility of chance identification. Information is generated and interpreted through a nonlinear, iterative process—represented by the double helix model—which supports ongoing exploration via computer-based mining, visual feedback, human interpretation, and parameter refinement [10,35]. The subsumption architecture cognitive model further strengthens this concurrent feedback loop, aligning it with human cognitive patterns.
The theory of chance discovery has been widely applied in fields such as health care, business innovation, marketing, disaster prediction, and risk management, demonstrating its generative capacity. It enables the detection of rare events and latent information often overlooked by conventional analyses, providing an integrated framework that combines HCI, semantic construction, and abductive reasoning to generate new hypotheses and create value [10,34,35].

2.3. Double Helix Model: Human–Machine Collaborative Framework for Chance Discovery

This study adopts the theory of chance discovery as its theoretical foundation and applies its core framework, the double helix model, to perform semantic network analysis and topic detection through an HCI process. The model comprises two interwoven components: the computer-driven process and the human-driven process. Together, they form a spiral cognitive feedback mechanism analogous to the structure of DNA [24,34,35,40,41,42]. This approach captures the dynamic interplay between data-driven analysis and knowledge interpretation, thereby enabling the identification of low-frequency but semantically significant nodes, referred to as “chances” within the keyword network.
As shown in Figure 1, the model advances through a dual-dimensional interactive spiral that reflects the iterative nature of human–AI collaboration. The HCI loop consists of four primary stages, each involving repeated cycles between human- and computer-driven processes [10,24,42,43,44]:
  • Human-driven process: Setting analysis parameters by inputting the article dataset and initializing the KeyGraph algorithm. The researcher uses the Polaris visualization tool to configure initial parameters, such as the number of bridging nodes (red nodes) and high-frequency black keyword nodes (Phase 1 in Figure 1).
  • Computer-driven process: Conducting data mining and constructing the keyword network. The system runs the KeyGraph algorithm, calculates keyword co-occurrence frequencies and structural relationships, and builds the network graph, thereby extracting latent knowledge structures from large-scale textual datasets (Phase 2 in Figure 1).
  • Computer-driven process: The algorithm generates a network graph visualization, producing an interpretable structure that illustrates keyword relationships, with red nodes serving as potential bridges for semantic interpretation (Phase 3 in Figure 1).
  • Human-driven process: Interpreting and refining results. The researcher evaluates ChatGPT’s topic detection and semantic interpretations based on the visualized graphs. If illogical results occur (e.g., “I beer”), Polaris parameters—such as the number of black or red nodes—are adjusted until coherent outputs emerge (e.g., “I love to drink beer”) (Phase 4 in Figure 1).
This iterative cycle combines the computational efficiency of the machine with the critical thinking and domain expertise of the human analyst, ensuring the continuous refinement of keyword structures and thematic interpretations [24]. Guided by expert judgment, researchers progressively transform implicit keyword relationships into explicit knowledge, enhancing decision-making and enabling the identification of emerging or evolving themes [10,41,43,44]. The double helix model thus demonstrates its value in addressing complex problem-solving tasks.

2.4. KeyGraph Algorithm Overview

The KeyGraph algorithm is a core tool for implementing chance discovery. In contrast to traditional keyword-frequency-dependent algorithms (e.g., term frequency–inverse document frequency or LDA), this graph-based text mining technique uses a keyword network model to uncover critical contexts and influential latent events hidden in texts. The KeyGraph algorithm is widely applied in topic detection research. Its operational mechanism extracts high-frequency keywords from the text as nodes and analyzes the co-occurrence relationships between these keywords to construct a keyword network graph [45,46,47,48,49].
Furthermore, the uniqueness of this algorithm lies in its ability to identify “keywords that have structural bridging value but occur with low frequency,” revealing latent, non-explicit topics and interdisciplinary conceptual connections in the text. This capability enables the derivation of core issues and underlying value perspectives in articles. The method requires no manual annotation or prior knowledge and can automatically extract representative keywords or topics from collected technical texts or academic literature. It constructs a keyword network graph to present the associative structure of keywords, enhancing the transparency and interpretability of topic detection [45,46,50,51].
The KeyGraph algorithm is suitable for applications, including knowledge structure exploration, thematic context evolution analysis, and emerging topic detection, owing to its advantages in knowledge structure construction and analysis. Ohsawa [10,47,50] first proposed this algorithm in 1998, initially as an automatic indexing technique based on keyword co-occurrence graphs in texts, and it was later developed into a core analytical tool for chance discovery applications. In Ohsawa’s [10,47,50] original conceptualization, the KeyGraph algorithm is explained using an architectural structure analogy. The literature is viewed as a building, where the foundation represents the fundamental concepts of the literature, constructed by analyzing co-occurrence relationships between high-frequency keywords. The pillars symbolize the associations between the keywords and foundational concepts, forming a structured network connecting concepts in the literature. The roof represents the core viewpoints in the literature, typically constituted by low-frequency bridging keywords strongly connected to multiple conceptual clusters, reflecting the primary perspectives or innovative points in the literature.
In terms of technical implementation, the KeyGraph algorithm produces a graph structure by analyzing co-occurrence relationships between keywords, where nodes represent keywords and edges indicate the strength of these co-occurrences. As depicted in Figure 2, the concepts of black and red nodes are further introduced to describe this structure precisely. Black nodes represent high-frequency keywords with strong connections to fundamental concepts. These nodes are responsible for the interpretability of the knowledge structure, serving as the backbone of the knowledge graph and the foundational structure of KeyGraph. Typically, multiple black nodes form clusters that contain latent topics embedded in the articles. Red nodes are keywords with lower frequencies but strong co-occurrence relationships with multiple clusters. They are often metaphorically associated with chance discovery, reflecting atypical yet potentially valuable keywords in articles, called chance nodes in this work. Black edges represent strong co-occurrence relationships between black nodes, forming stable keyword clusters via their connections. Red edges denote bridging relationships between red nodes and clusters, highlighting the value of rare events [10,46,47].
A distinctive feature of the KeyGraph algorithm is its ability to extract key concepts and their associations automatically from articles, revealing implicit relationships between these concepts. Generating visualized knowledge graphs uncovers latent and significant information in articles, enabling users without a technical background to comprehend the complex structures and interrelations of keywords intuitively [46,50,52] (Figure 2). Further details on the procedural steps and computational methods of the KeyGraph algorithm are provided in Appendix A.
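To make the structure above concrete, the following minimal Python sketch builds a sentence-level co-occurrence network, designates the most frequent terms as black nodes, and flags lower-frequency terms that touch several clusters as red-node candidates. It is an illustrative simplification of the ideas behind KeyGraph rather than Ohsawa's original scoring; the networkx representation, the thresholds, and the function name build_keygraph_sketch are assumptions introduced here for demonstration.

from collections import Counter
from itertools import combinations
import networkx as nx

def build_keygraph_sketch(sentences, n_black=10, n_red=3):
    """Toy illustration of black (high-frequency) and red (bridging) nodes."""
    tokenized = [s.lower().split() for s in sentences]
    freq = Counter(w for sent in tokenized for w in set(sent))
    black = {w for w, _ in freq.most_common(n_black)}  # backbone terms

    # Black edges: sentence-level co-occurrence counts among black nodes.
    graph = nx.Graph()
    graph.add_nodes_from(black, color="black")
    for sent in tokenized:
        present = sorted(set(sent) & black)
        for u, v in combinations(present, 2):
            weight = graph.get_edge_data(u, v, {"weight": 0})["weight"]
            graph.add_edge(u, v, weight=weight + 1, color="black")

    # Red-node candidates: lower-frequency terms that co-occur with terms
    # drawn from two or more black-node clusters (bridging behavior).
    clusters = list(nx.connected_components(graph))
    scores = Counter()
    for sent in tokenized:
        touched = [c for c in clusters if c & set(sent)]
        if len(touched) >= 2:
            for w in set(sent) - black:
                scores[w] += len(touched)
    red = [w for w, _ in scores.most_common(n_red)]

    # Red edges: link each red node to the black nodes it co-occurs with.
    for r in red:
        graph.add_node(r, color="red")
        for sent in tokenized:
            if r in sent:
                for b in set(sent) & black:
                    weight = graph.get_edge_data(r, b, {"weight": 0})["weight"]
                    graph.add_edge(r, b, weight=weight + 1, color="red")
    return graph, black, red

In terms of the building analogy, the black nodes and black edges of this sketch correspond to the foundation and pillars, while the red candidates approximate the roof-level chance keywords.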

2.5. Research Gap and Contributions

Despite extensive research on AI ethics (fairness, privacy, transparency, accountability, governance), much of the existing work is static (single snapshot), domain-specific, or based on a single source. It rarely (i) traces year-over-year semantic shifts across heterogeneous, authoritative reports; (ii) applies graph-structured topic detection to uncover low-frequency yet structurally central (“chance”) keywords [53,54,55]; or (iii) combines such detection with LLM-assisted interpretation under an explicit human-in-the-loop protocol. Furthermore, prior studies rarely (iv) quantify interpretive reliability (e.g., cross-model agreement), (v) justify network parameterization choices that affect topic separability, or (vi) translate structural patterns into governance-relevant insights.
This study addresses these gaps by contributing the following:
  • A dynamic, hybrid pipeline for topic detection. We integrate KeyGraph (co-occurrence structure and chance discovery) with LLM-assisted interpretation [54,55], which is governed by a two-stage human-in-the-loop process that constrains prompts and verifies outputs.
  • A longitudinal, cross-source corpus (2022–2024). Twenty-four authoritative reports support the comparison of stable versus shifting themes across years and sources.
  • Operationalization via chance-anchored diffusion. We formalize semantic diffusion paths from chance (bridging) keywords to clustered high-frequency terms, producing cluster-level topic summaries grounded in the source texts.
  • Dual-layer reliability checks. We combine expert-informed review (semantic logic, consistency, keyword coverage, inter-rater agreement) with cross-model semantic similarity. Summaries are independently generated by two LLMs, with sentence-level alignment measured using multiple embedding models. We also assess how structural complexity (single versus combined clusters) affects stability.
  • Transparent parameterization. We report and justify all parameter settings (e.g., number of high-frequency and chance nodes) in accordance with Zipf-like term distributions to balance network density and topic separability [56,57].
  • From structure to governance. We link detected patterns to actionable AI-governance insights (e.g., bias and privacy risk chains, transparency and explainability needs, responsibility allocation, and implications of generative AI deployment).
Collectively, these contributions establish a methodologically transparent and empirically supported approach demonstrating how KeyGraph–ChatGPT integration enables effective topic detection and credible interpretation in longitudinal, cross-source analyses of AI ethics discourse.

2.6. Research Questions

Building on the identified gaps and contributions, we assess the effectiveness and reliability of the hybrid KeyGraph–LLM (ChatGPT) pipeline for topic detection in unstructured texts through the following research questions:
  • RQ1 (Effectiveness for topic detection). Can the integrated KeyGraph–LLM workflow deliver reliable topic detection—specifically, coherent and context-faithful cluster-level summaries—without relying on domain experts?
    Operationalization: Human evaluation of semantic logic, consistency, and keyword coverage with inter-rater agreement (e.g., Cohen’s κ), supplemented by convergence evidence from cross-model semantic similarity.
  • RQ2 (Longitudinal thematic evolution). Across 2022–2024, what stable and shifting themes characterize AI ethics discourse, and how do chance (bridging) keywords reveal emerging or cross-cutting issues?
    Operationalization: Year-over-year analysis of KeyGraph-derived cluster structures and chance-anchored diffusion paths.
  • RQ3 (Reliability vs. structural complexity). Given identical inputs and prompts, to what extent do two LLMs produce convergent topic interpretations, and does topic structure complexity (single versus combined clusters) systematically affect cross-model similarity?
    Operationalization: Sentence-level cosine similarity using multiple embedding models, with statistical tests for differences by cluster configuration.
Bridge to methods. Guided by these research questions, Section 3 details the dataset, parameterization, and hybrid KeyGraph–ChatGPT workflow. Appendix A (“KeyGraph Algorithm”) consolidates the stepwise procedure and the co-occurrence strength computation used in this study to support reproducibility.
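As a concrete illustration of how RQ1 and RQ3 are operationalized, the sketch below measures sentence-level cosine similarity between two independently generated cluster summaries with a sentence-embedding model and computes Cohen's κ for two expert raters. The embedding model name, the period-based sentence split, and the aggregation rule are illustrative assumptions, not the exact configuration reported later in the paper.

from sentence_transformers import SentenceTransformer, util
from sklearn.metrics import cohen_kappa_score

# Assumed embedding model; any comparable sentence-embedding model could be used.
model = SentenceTransformer("all-MiniLM-L6-v2")

def cross_model_similarity(summary_a: str, summary_b: str) -> float:
    """Average best-matching sentence-level cosine similarity from A to B."""
    sents_a = [s.strip() for s in summary_a.split(".") if s.strip()]
    sents_b = [s.strip() for s in summary_b.split(".") if s.strip()]
    emb_a = model.encode(sents_a, convert_to_tensor=True)
    emb_b = model.encode(sents_b, convert_to_tensor=True)
    sim = util.cos_sim(emb_a, emb_b)            # |A| x |B| similarity matrix
    return float(sim.max(dim=1).values.mean())  # best match per sentence, averaged

# Inter-rater agreement for the expert-informed quality check (RQ1), with
# hypothetical labels shown purely to demonstrate the computation.
rater_1 = ["coherent", "coherent", "incoherent", "coherent"]
rater_2 = ["coherent", "incoherent", "incoherent", "coherent"]
kappa = cohen_kappa_score(rater_1, rater_2)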

3. Methodology

The research process is divided into the five stages shown in Figure 3.

3.1. Data Collection

The data collection method in this study primarily employed the search terms AI ethics, ethical AI, and responsible AI, focusing on the overall concepts and frameworks related to AI ethics, cross-industry universal guidelines, and globally influential and controversial problems, while avoiding biases toward specific industry applications. Moreover, AI ethics emphasizes moral principles at the technical level, addressing ethical issues in the design and operation of AI systems [58]. Ethical AI concerns the implementation of ethical principles in the development and application of AI technology, ensuring compliance with moral standards [59]. Responsible AI highlights the responsibilities and regulations of developers and users regarding the social influence of AI technology, promoting accountability and governance [60]. These three concepts are complementary and synergistic, facilitating an in-depth exploration of the ethical challenges faced by AI and advancing discourse on its practical applications and social responsibilities toward greater depth and breadth.
This study prioritized the selection of reports and articles related to AI ethics published in English between 2022 and 2024 via online searches to emphasize data timeliness and capture recent developments in the field. This study aimed to provide a systematic clarification of perspectives and the latest trends in AI ethics discourse, situated within the context of the rapid technological evolution of AI during this period (see Table 1, Table 2 and Table 3).
A rigorous screening process was conducted during the data collection phase to ensure the comprehensiveness and authority of the research data. Data sources were limited to internationally recognized and credible academic research institutions, news media, and nonprofit organizations. Articles focusing solely on a single-domain or industry application were excluded, prioritizing comprehensive reports reflecting international trends. This approach aimed to ensure that the data adequately represent diverse global perspectives and contentious issues regarding AI ethics, constructing a neutral and macro-level article dataset. Eight highly comprehensive and interpretative articles were selected for each year as the analytical corpus for the KeyGraph-based topic detection and chance exploration.

3.2. Data Preprocessing

This study employed the KeyGraph algorithm to extract topics from articles, based on co-occurrence relationships between keywords, and constructed a structured visual graph. However, the original articles often contain numerous stop words, irrelevant information, and punctuation marks. If the articles are analyzed directly using the KeyGraph algorithm without preprocessing, extracting the associations between keywords becomes difficult, resulting in overly cluttered or off-focus keyword networks. This study segmented sentences, tokenized words, and filtered out meaningless words and stop words prior to analysis to enhance the accuracy of keyword identification and reduce graph noise, ensuring the accuracy and visualization quality of the keyword co-occurrence network graph.
This study employed Python 3.12 combined with the Natural Language Toolkit (NLTK) for article preprocessing during the data processing phase to achieve these objectives. First, each collected article was segmented into independent sentences based on periods, and each sentence was converted to lowercase and tokenized into individual words while removing punctuation marks but retaining contractions containing apostrophes (e.g., don’t). The NLTK stop word list was used to filter out semantically insignificant words (e.g., the, is, and) to reduce data noise and highlight keywords related to AI ethics. The filtered word list was written line by line into output files with the words separated by spaces. A manual review and filtering process was also conducted to remove any remaining irrelevant words and tokenization errors to enhance the thematic relevance of the data. This process effectively streamlined the data, improved keyword prominence, and established a structured data foundation for the keyword network of the KeyGraph algorithm, ensuring the accuracy and visualization quality of the analyses.
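A minimal sketch of this preprocessing pipeline is given below. It assumes the NLTK stopwords corpus has been downloaded; the regular expression that keeps internal apostrophes and the output format of one sentence per line are illustrative choices consistent with the description above rather than the study's exact script.

import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")  # one-time download of the NLTK stop word list
STOP_WORDS = set(stopwords.words("english"))

def preprocess_article(text: str) -> list[list[str]]:
    """Split on periods, lowercase, keep apostrophes, and drop stop words."""
    processed = []
    for sentence in text.split("."):
        # Keep alphabetic tokens and internal apostrophes (e.g., "don't").
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
        tokens = [t for t in tokens if t not in STOP_WORDS]
        if tokens:
            processed.append(tokens)
    return processed

def write_corpus(articles: list[str], path: str) -> None:
    """Write one preprocessed sentence per line, tokens separated by spaces."""
    with open(path, "w", encoding="utf-8") as f:
        for article in articles:
            for tokens in preprocess_article(article):
                f.write(" ".join(tokens) + "\n")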

3.3. Construction of the Keyword Co-Occurrence Network

3.3.1. Chance Discovery in AI Ethics Using KeyGraph

This study employed the graph-based KeyGraph text mining technique as the core tool for topic detection and chance discovery in AI ethics articles to overcome the limitations of traditional text mining methods in analyzing topic evolution and identifying latent issues. As described in Section 2, KeyGraph is a keyword network analysis method that integrates lexical co-occurrence structures with chance identification. By constructing a co-occurrence graph of keywords, KeyGraph reveals the intrinsic topic structures and semantic linkages in articles, making it suitable for exploring cross-topic or dynamic contexts and emerging keywords. A distinctive feature of KeyGraph is its ability to identify low-frequency but highly connected chance nodes linked to multiple topic clusters, uncovering latent issues that traditional high-frequency analyses often fail to capture. The KeyGraph analytical process can be divided into the following five core strategies, serving as the foundation for semantic mining and topic detection in AI ethics articles:
  • Keyword frequency and co-occurrence calculation: First, the occurrence frequency of all words in the articles is calculated and sorted. The top-ranked high-frequency words are selected as keywords, representing the core foundational concepts of the articles. Using paragraphs or sentences as the calculation units, the co-occurrence relationships between all keywords are computed and applied to establish connections.
  • Node role classification and keyword clustering: Based on the frequency of keyword occurrences and their structural positions in the co-occurrence network, nodes are classified into three categories, which lays the foundation for chance discovery.
    High-frequency keywords: Keywords with high occurrence frequency that are concentrated in specific topic clusters represent the primary concepts of the topics. In this study, these are consistently represented by high-frequency black nodes.
    Chance keywords: These keywords (known as bridging words) have lower occurrence frequencies but are associated with multiple topic clusters. They typically indicate emerging concepts or interdisciplinary issues and are valuable for discovering latent topics. In this study, they are represented by red nodes.
    General terms: Keywords lacking structural significance are excluded from the visualization network.
  • Keyword co-occurrence network construction and thematic cluster identification: A keyword association graph is constructed with keywords as nodes and the co-occurrence strength as weighted edges. This method aggregates high-frequency terms and forms thematic clusters.
  • Keyword network visualization: The nodes and links are visualized using tools (e.g., Polaris) which map co-occurrence relationships between keywords to construct their association network graphs. By adjusting parameters (e.g., frequency thresholds, co-occurrence strength, and the number of nodes), different levels of keyword structures are explored to enhance the understanding of potential keyword clusters and association pathways.
In the practical implementation, this study preprocessed the articles, including word segmentation and stop word removal, and segmented them according to the time series from 2022 to 2024 to observe changes and shifts in potential topics or issues over time. The Polaris visualization tool was employed in combination with statistical methods (e.g., word frequency and co-occurrence analysis) and data mining techniques to execute the KeyGraph algorithm [61], which automatically calculated the keyword co-occurrence frequency and co-occurrence strength. This tool constructs a keyword network graph to explore keyword structures at various levels, promoting semantic interpretation and chance discovery. In the network, high-frequency black nodes represent keywords with a high frequency and stable semantic cores, whereas red nodes represent potential keywords with lower frequency but strong connections to multiple topic clusters. Black edges between nodes reflect the co-occurrence strength between keywords.
This study applied the parameter adjustment functions provided by Polaris to identify an appropriate keyword network structure and dynamically set conditions (e.g., keyword frequency thresholds, co-occurrence link strength, and maximum number of nodes), controlling the hierarchical levels and cluster partitioning of the keyword network graph. This approach enhances the structural clarity of the visualization, facilitating a comparative analysis and interpretation of the keyword networks across years and enabling further observation of the formation, expansion, and evolution of topic clusters.
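The pruning that these parameter adjustments perform can be mimicked on the toy network from Section 2.4 as follows; the default thresholds here are illustrative rather than the values used with Polaris in this study.

import networkx as nx

def filter_network(graph: nx.Graph, min_edge_weight: int = 2, max_nodes: int = 80) -> nx.Graph:
    """Drop weak co-occurrence edges and cap the node count before inspection."""
    g = graph.copy()
    weak = [(u, v) for u, v, d in g.edges(data=True) if d["weight"] < min_edge_weight]
    g.remove_edges_from(weak)
    # Retain the most connected nodes up to the configured maximum.
    ranked = sorted(g.degree, key=lambda kv: kv[1], reverse=True)[:max_nodes]
    return g.subgraph(n for n, _ in ranked).copy()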
This method reveals the explicit topic structures in AI ethics articles and can uncover potential low-frequency keywords and emerging issues that traditional techniques often fail to capture, providing a solid foundation for topic detection and chance discovery. Overall, by integrating keyword co-occurrence structures, latent chance nodes, and visualization tools, KeyGraph effectively extracts explicit and implicit issues in articles, enhancing the exploratory and strategic aspects of topic detection.
However, the interpretive precision of the keyword network graph and the clarity of topic detection often depend on the number of high-frequency black nodes and the ratio between high-frequency black nodes and red chance nodes. An excessive or insufficient number of these nodes may affect the semantic clarity and accuracy of topic detection, influencing the reliability of the analytical results. The following section analyzes the relationship between semantic node density and the effectiveness of topic detection, investigating the effect of node density on the keyword structure and topic differentiation.
Based on extensive parameter testing and iterative visualization reviews, this study found that selecting 60–80 high-frequency black nodes and 2–4 red chance nodes per year provides an optimal balance between network density and topic separability. This configuration aligns with the long-tail characteristics of Zipf’s law in term frequency distributions, where the top 5–10% of high-frequency terms are sufficient to form a stable backbone, while the inclusion of a small number of low-frequency but highly connected red nodes substantially improves the detection of potential emerging topics.

3.3.2. Analysis of Keyword Network Node Density and Topic Detection Accuracy

When conducting a KeyGraph keyword network analysis, setting too many high-frequency black nodes (e.g., designating 100 out of 1000 (10%) distinct terms as black nodes) may lead to an overly complex network structure, adversely affecting the accuracy and focus of topic detection. In KeyGraph, high-frequency black nodes represent the core terms in the keyword network, outlining the principal thematic structure of the articles. However, when the number of high-frequency black nodes is too large, the co-occurrence density of high-frequency keywords increases significantly, resulting in an overly dense network. This density can blur thematic boundaries, intensify the semantic overlap between nodes, and hinder the convergence of co-occurrence paths, weakening the ability to detect latent topics and contextual structures [62,63,64,65].
Second, an excessive number of high-frequency terms may include morphologically varied but semantically similar words (e.g., make, makes, and made). Although these high-frequency terms frequently co-occur, they may lack clear thematic referentiality, disrupting the focus of the keyword network. This disruption often leads to topic analysis results that are biased toward overly generalized dominant themes or may even trigger semantic hallucinations, limiting the ability to identify subtle, overlapping, or emerging topics [62,63,64,65].
From an operational perspective, setting too many high-frequency black nodes can lead to an overly complex graph structure, reducing the feasibility of cluster partitioning and cross-validation and increasing the difficulty of topic analysis. This outcome may result in the omission of critical information during topic summarization or affect the interpretability of the detection process, weakening the depth and novelty of the conclusions. Conversely, setting too few high-frequency black nodes may cause essential topic-related keywords to be inadequately captured. In particular, some secondary terms (although not highly frequent) may carry significant semantic meaning but be excluded from the analysis, leading to an unbalanced topic distribution. This outcome compromises the formation of structured keyword clusters, dilutes the network core, and undermines the coherence of the keyword network and the overall inference of topic evolution.
In summary, determining the optimal number of key nodes requires iterative parameter tuning and visual inspection to control node density appropriately and to identify the most suitable keyword network. This approach helps preserve the structural stability of the keyword network, mitigating the risks of overlap and semantic hallucination in topic analysis while enhancing the overall interpretability and exploratory depth of the findings.
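The node-density trade-off discussed above can be checked empirically with a short sweep over candidate black-node counts, reusing the toy builder from Section 2.4; the candidate sizes are illustrative.

import networkx as nx

def density_sweep(sentences, candidate_sizes=(40, 60, 80, 100, 120)):
    """Report how overall network density grows as more black nodes are allowed."""
    results = {}
    for n_black in candidate_sizes:
        graph, _, _ = build_keygraph_sketch(sentences, n_black=n_black)
        results[n_black] = round(nx.density(graph), 3)
    return results

A sharp rise in density without a corresponding gain in distinguishable clusters signals that the black-node count has exceeded a useful level.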

3.4. Selection of High-Frequency Keyword Clusters

After constructing the keyword network, this study conducted manual classification and clustering based on the network structure formed by the high-frequency black nodes. The initial clustering process employed the chance nodes (i.e., red nodes), identified by the KeyGraph algorithm, as the starting points for keyword diffusion. These red nodes are considered anchors for potential emerging or latent themes due to their role in bridging high-frequency keywords despite having a relatively low frequency themselves. From each red node, the diffusion extends outward to directly connected high-frequency black nodes, with the number of connected black nodes limited to a maximum of six to seven per red node. This setting helps control the cluster size and prevents excessive expansion that may blur or overgeneralize the results of the topic analysis.
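The diffusion step can be sketched as follows: starting from each red node, the directly connected black nodes are ranked by co-occurrence strength and capped at six to seven per cluster. The graph object and attribute names follow the toy sketch in Section 2.4 and are assumptions rather than the Polaris data model.

def diffuse_clusters(graph, red_nodes, black_nodes, max_neighbors=7):
    """Form one candidate cluster per red (chance) node for later interpretation."""
    clusters = {}
    for red in red_nodes:
        if red not in graph:
            continue
        # Rank neighboring black nodes by co-occurrence strength with the red node.
        neighbors = [(v, graph[red][v]["weight"])
                     for v in graph.neighbors(red) if v in black_nodes]
        neighbors.sort(key=lambda kv: kv[1], reverse=True)
        clusters[red] = [v for v, _ in neighbors[:max_neighbors]]
    return clusters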
After completing the initial clustering, each cluster was input into ChatGPT for topic detection and semantic interpretation. ChatGPT analyzed the complex relationships between the keywords in each cluster and inferred the potential ethical issues or discourse associated with the cluster. If the interpretations generated by ChatGPT lack logical coherence or sufficient semantic clarity, researchers can adjust the relevant parameters of the KeyGraph algorithm (e.g., the number of high-frequency black nodes, connection strength, or red nodes) to reconstruct a new keyword structure and cluster distribution. This dynamic human–AI collaborative adjustment mechanism was iteratively repeated until the resulting cluster division demonstrates semantic clarity and structural coherence.
Overall, this procedure embodies a human–AI collaborative mechanism for semantic construction. The KeyGraph algorithm segments clusters based on the proximity of red nodes and high-frequency black nodes, considering their co-occurrence. In contrast, ChatGPT provides complementary support for semantic interpretation, and researchers can manually adjust parameters to ensure coherence. This approach reflects the core aim of HCI in the double helix model, forming a dynamic and iterative process of topic analysis that enhances the semantic focus and ensures the structural integrity of the keyword network.

3.5. Employing ChatGPT for Topic Detection

This section elaborates on how the KeyGraph algorithm was employed to conduct topic detection and chance discovery in AI ethics articles while examining the limitations of traditional keyword network analysis methods. This work employed ChatGPT as an auxiliary tool for semantic interpretation to overcome these constraints during the potential topic detection phase of keyword clusters. The technical advantages and application strategies of this integration are detailed below.

3.5.1. Limitations of Previous Methods

Before the widespread adoption of LLMs, such as ChatGPT, article mining for keyword networks and topic detection primarily relied on interpreting the semantic relationships between node clusters and bridging chance nodes in the keyword network graph. Researchers subjectively assigned thematic meanings to the keyword structures and topics based on these relationships. Researchers needed to trace the data back to the original article and examine the contextual usage for semantic interpretation to clarify the semantic association of a bridging node (e.g., accountability) with multiple clusters [66,67,68,69]. In this analytical framework, topic detection and semantic clarification were achieved primarily through the following approaches:
  • Topic cluster identification and core concept summarization: KeyGraph identifies high-frequency keywords in articles and designates them as high-frequency nodes (i.e., black nodes) in the keyword network structure. Based on the co-occurrence relationships between these keywords, tightly connected clusters naturally form, reflecting the primary themes or subdomains in articles. Researchers can summarize representative thematic labels based on the characteristics and co-occurrence patterns of keywords in each cluster, producing an initial thematic summary and classification of the core article content.
  • Chance keyword identification and pairwise semantic relationship mining: The uniqueness of KeyGraph lies in its ability to identify chance keywords that, despite their low frequency, connect multiple thematic clusters. Although these keywords appear infrequently, they serve as bridging nodes linking thematic clusters in the keyword network. Researchers conduct in-depth analyses of these chance keywords by tracing their contextual usage back to the original articles, manually interpreting their semantic roles and how they connect with multiple thematic clusters. This process facilitates identifying emerging topics, interdisciplinary integration points, or potential trends.
However, although KeyGraph can provide structural information and identify potential chance keywords, without LLM assistance, theme summarization and semantic clarification still heavily rely on manual interpretation and domain-expert knowledge. Researchers must manually integrate the keyword network graph, centrality metrics, and the contextual usage of terms in the original articles to trace and interpret the semantic roles of potential keywords and their connections to multiple thematic clusters. This process is time-consuming, and the results are often limited by the researchers’ professional judgment, reducing the efficiency and scalability of semantic mining. These traditional methods commonly face several significant limitations [68,69] defined below:
  • Topic summarization heavily relies on manual interpretation, resulting in subjectivity and inconsistency: Although traditional keyword network graphs can visually present co-occurrence relationships between high-frequency keywords, their semantic connections often lack systematic explanatory mechanisms, typically relying on researchers’ expertise and experience for semantic interpretation and topic detection. This process is time-consuming, labor-intensive, and prone to inconsistencies due to variations in interpreters’ knowledge, affecting the objectivity of topic summarization. These problems become pronounced when analyzing multiple articles or conducting comparative analyses over time.
  • Limited ability to identify low-frequency, high-value keywords, making latent topic detection difficult: Traditional text mining methods using statistical frequency focus on topic clusters formed by high-frequency keywords, often overlooking low-frequency keywords and chance nodes that play bridging or transitional roles in the keyword structure. These low-frequency keywords often represent emerging concepts, topic intersections, or contextual shifts, holding significant value for uncovering latent research topics and policy chance information. However, traditional methods struggle to identify and interpret their semantic roles systematically, limiting the efficiency and usefulness of topic exploration.
  • Difficulty tracking dynamic contexts hinders automating topic-evolution pattern analysis: When managing cross-temporal texts, such as AI ethics articles from 2022 to 2024, traditional keyword network analysis often requires a manual comparison of keyword structural changes at various time points and cannot effectively or automatically track how topic keywords undergo semantic shifts or experience topic merging and splitting as the context evolves. This limitation hinders researchers’ understanding and forecasting of topic evolution trajectories, resulting in analyses without the capacity to present temporal and dynamic characteristics.
  • Visualization maps are challenging to convert into structured data for inference: Although keyword network graphs offer a high degree of visual intuitiveness and help reveal thematic contexts and lexical and relational structures in texts, their results are often presented as images. When the number of keyword nodes in topic clusters is high, the clarity and readability of these visuals significantly decrease, leading to blurred outcomes or difficulty in interpretation during advanced analyses (e.g., topic classification, semantic comparison, or cross-validation).

3.5.2. Technical Background: Semantic Comprehension and Topic Extraction in ChatGPT

The ChatGPT LLM is based on the deep learning transformer architecture that Vaswani et al. proposed in 2017. Unlike the widely used LDA, this model undergoes unsupervised pretraining on large-scale textual data to generate high-dimensional semantic embeddings, capturing the syntactic structures and semantic relationships in sentences. Its core self-attention mechanism captures semantic associations and contextual features in the text, transforming textual data into semantic vector representations to infer deep linguistic patterns and latent word relationships. In practical applications, ChatGPT demonstrates capabilities in processing long texts, generating summaries, and performing topic detection.
The experimental results from existing studies indicate that ChatGPT achieves high accuracy in content detection and classification tasks, significantly surpassing current benchmark methods. Notably, ChatGPT displays outstanding performance in zero-shot learning scenarios. The literature has suggested that ChatGPT can directly understand and execute most tasks without any additional training or fine-tuning, with performance typically exceeding that of other mainstream LLMs, demonstrating exceptional generalizability. Moreover, studies have found that, in specific tasks, the performance of ChatGPT surpasses even that of fine-tuned models. This finding highlights the potential of ChatGPT as a foundation model, achieving or exceeding the performance of task-specific trained models without special optimization and demonstrating stronger adaptability and broader application prospects [70,71,72,73].
This study integrated ChatGPT into the topic detection and summary generation tasks of the keyword networks produced by KeyGraph to apply the semantic analysis potential of KeyGraph fully and overcome the limitations of traditional manual interpretation. When the structure of the keyword network, particularly starting from the red nodes, is expanded layer by layer based on the connection strength and semantic distance with high-frequency black nodes, the corresponding original texts are input into ChatGPT individually. This LLM employs the following steps to conduct semantic interpretation and topic detection processes [73,74,75]:
  • Comprehension of keyword network structures and semantic interpretation: ChatGPT tokenizes the input text, including the original AI ethics articles and translated descriptions of the KeyGraph keyword network structure, and processes it via its multi-layer transformer model for deep syntactic and semantic analyses. The built-in attention mechanism in ChatGPT accurately captures complex relationships between tokens and their contextual meaning, constructing a comprehensive, detailed semantic representation. This approach enables the model to understand the meaning of individual tokens and their positions and roles in the keyword network.
  • Topic identification: The model identifies frequently recurring keywords and their semantic relationships in the text, grouping them into coherent thematic clusters. Notably, ChatGPT applies its strong contextual reasoning to generate semantically complete and representative thematic descriptions, facilitating the discovery of core concepts in the network structure.
  • Semantic interpretation and text summarization: ChatGPT extracts critical insight from text based on semantic logic and generates contextually coherent and concise summaries. Researchers can control the content and length of these summaries using precise prompt engineering (e.g., restricting the summary to the imported text) to meet specific analytical requirements. This control considerably enhances the efficiency of extracting insight from complex network graphs [75,76].
Through these steps, this study expanded the keyword network structure constructed by KeyGraph layer by layer, starting from the red nodes and analyzing their associations with the high-frequency black nodes. Then, this study employed ChatGPT to interpret the natural language and summarize the themes of the topic clusters and bridging nodes of the original AI ethics articles. This integrated process significantly enhances the analytical efficiency and accuracy of thematic induction, realizing the HCI and semantic reciprocal interpretation emphasized by the chance discovery theory and providing a more systematic and operable framework for dynamic topic detection.

3.5.3. Method: Integrating KeyGraph and ChatGPT for Topic Detection

The KeyGraph application emphasizes that mining, understanding, and topic detection should be conducted and interpreted by domain experts to ensure the contextual relevance of the keyword network interpretation. However, given the limited availability of domain experts, this study integrated the KeyGraph keyword network analysis with ChatGPT to enhance the semantic interpretability of detected topics and the ability to determine potential chances by constructing keyword association structures and exploring the topics. The analysis process began with the KeyGraph algorithm generating a keyword network graph based on the co-occurrence frequency and distance between keywords, identifying potential cross-topic connective chance nodes (red nodes), which served as the starting points for semantic expansion and topic detection.
Using the red nodes as the starting points for semantic diffusion aligns with the core aim of the chance discovery theory. Although these nodes have a relatively low frequency, they form connections with multiple clusters, displaying cross-topic bridging characteristics and capturing hidden information more effectively. Furthermore, although these nodes may not represent the focus of the texts, they can reveal potential topics at the boundaries of the keyword network, offering high informational value and strategic significance.
For example, starting from the red node R1, a strong co-occurrence relationship exists between R1 and the high-frequency black node T4, which connects to other high-frequency black nodes (e.g., T1 and T3), with T3 linking to T2. Notably, T2 forms a closed-loop connection with both T1 and T4. This hierarchical node expansion enables the construction of a structured semantic diffusion path, delineating the internal relationships and boundaries of a topic (Figure 4).
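To make this expansion procedure concrete, the following minimal sketch reproduces the Figure 4 example as a graph and derives the expansion layers from the red node; it assumes the networkx library, and the node labels simply mirror the example above rather than actual keywords.

```python
# A minimal sketch of the layer-by-layer expansion from a red (chance) node,
# assuming the networkx library; node labels mirror the Figure 4 example only.
import networkx as nx
from collections import defaultdict

# Co-occurrence edges of Cluster 1 as described in the text
edges = [("R1", "T4"), ("T4", "T1"), ("T4", "T2"), ("T4", "T3"),
         ("T1", "T2"), ("T2", "T3")]
G = nx.Graph(edges)

# The shortest-path distance from the red node defines each node's expansion layer.
depths = nx.single_source_shortest_path_length(G, "R1")
layers = defaultdict(list)
for node, depth in depths.items():
    layers[depth].append(node)

for depth in sorted(layers):
    print(f"layer {depth}: {sorted(layers[depth])}")
# layer 0: ['R1']; layer 1: ['T4']; layer 2: ['T1', 'T2', 'T3']
```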
This study integrated the ChatGPT model with an HCI mechanism to apply the semantic diffusion structure to semantic inference and topic interpretation tasks, enhancing the semantic accuracy and interpretative consistency of topic understanding. The keyword clusters identified by KeyGraph were reconstructed into logically coherent semantic diffusion paths, with red nodes serving as the starting points for interpretation. These paths were input into ChatGPT, which was guided to perform semantic judgment and topic mining based on the provided semantic diffusion path. Figure 4 presents an input example, the R1 semantic diffusion path, where the red node is R1, and the high-frequency black nodes in Cluster 1 include T1–T4.

3.6. R1 Semantic Diffusion Path

The co-occurrence structure in Cluster 1 consists of the following edges: {(R1, T4), (T4, T1), (T4, T2), (T4, T3), (T1, T2), (T2, T3)}.
Although this method integrates the KeyGraph algorithm with ChatGPT for topic detection and semantic interpretation, practical implementation still faces challenges. As an LLM, ChatGPT’s outputs may exhibit risks, including semantic hallucination, topic ambiguity, or semantic overextension. This study adopted a dual-layer control strategy to mitigate these biases effectively [77].
The first layer employs a refined prompt engineering mechanism to guide ChatGPT to focus on topic inference in a specific context, ensuring that the generated semantics rely solely on the semantic diffusion path and imported textual data. In this process, researchers initially use role-playing prompt strategies to assign ChatGPT a particular identity or perspective. This initial setting helps to converge the model’s understanding of the specific domain or context before topic inference, aligning its behavior more closely with expectations and enhancing the efficiency and accuracy of tasks. Clear task assignments are given to direct ChatGPT to perform various functions, including topic classification, semantic interpretation, or summary generation. Throughout the process, researchers control the scope of contextual input, restricting the topic summary content to the imported textual data, compensating for ChatGPT’s potential limitations in domain-specific knowledge and reducing the influence of semantic hallucinations on the interpretative results.
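The first control layer can be illustrated with a minimal prompt-engineering sketch, assuming the OpenAI Python client (v1 interface); the model name, role description, file name, and instruction wording are illustrative placeholders rather than the exact prompts used in this study.

```python
# A minimal prompt-engineering sketch for the first control layer, assuming the
# OpenAI Python client (v1 interface); model name, role description, file name,
# and instruction wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

diffusion_path = (
    "Red node: R1; Cluster 1 edges: (R1, T4), (T4, T1), (T4, T2), (T4, T3), "
    "(T1, T2), (T2, T3)"
)
with open("ai_ethics_article.txt", encoding="utf-8") as f:  # hypothetical source file
    source_text = f.read()

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    temperature=0,
    messages=[
        # Role-playing prompt: converge the model on the task context first.
        {"role": "system",
         "content": ("You are an AI ethics analyst. Base every statement only on the "
                     "supplied semantic diffusion path and article text; do not "
                     "introduce outside knowledge.")},
        # Clear task assignment with a restricted context and bounded output.
        {"role": "user",
         "content": (f"Semantic diffusion path:\n{diffusion_path}\n\n"
                     f"Article:\n{source_text}\n\n"
                     "Identify the topic represented by this keyword cluster and "
                     "summarize it in at most three sentences drawn only from the article.")},
    ],
)
print(response.choices[0].message.content)
```

Constraining the temperature and instructing the model to rely solely on the supplied material reflects the contextual restrictions described above.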
The second layer consists of a repeated manual review process. Topic interpretation results generated by ChatGPT are manually examined and validated by researchers. Two independent evaluators, each with graduate-level training in information management and prior research experience in AI ethics, reviewed every ChatGPT-generated summary. Evaluations were based on three explicit criteria: (i) semantic logic, defined as whether the summary’s reasoning is coherent and valid given the provided keyword cluster and source text; (ii) semantic consistency, defined as whether the interpretation aligns with the context of the original sources without introducing contradictions; and (iii) keyword coverage, defined as whether all key terms necessary to represent the cluster’s theme are included.
Each criterion was scored on a binary scale (1 = satisfactory, 0 = unsatisfactory). Inter-rater agreement was measured using Cohen’s κ, with κ ≥ 0.80 indicating strong agreement. Discrepancies were resolved through discussion until consensus was reached. Summaries that did not meet the agreed standards were reprocessed by adjusting parameters in the KeyGraph model and re-summarizing with ChatGPT, after which the validation cycle was repeated. The goal was not to conduct a word-for-word comparison of semantic interpretations but rather to assess and revise the logical coherence and contextual fidelity of the topic summaries. This process allowed researchers to iteratively adjust algorithm parameters based on review outcomes, progressively improving the semantic structure and topic detection results.
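The inter-rater agreement of these binary ratings can be computed as in the following sketch, assuming scikit-learn; the example scores are placeholders rather than the study’s actual review data.

```python
# A minimal sketch of the inter-rater agreement check, assuming scikit-learn;
# the binary scores (1 = satisfactory, 0 = unsatisfactory) are placeholders.
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 1, 0, 1, 1, 1, 0, 1]  # evaluator 1, one score per summary/criterion
rater_b = [1, 1, 0, 1, 1, 0, 0, 1]  # evaluator 2

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa = {kappa:.2f}")  # agreement treated as strong when kappa >= 0.80
```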
Three primary error types were identified in the ChatGPT-generated outputs: (i) semantic hallucinations, in which unrelated terms were incorrectly linked into non-existent relationships; (ii) topic ambiguity, where keywords from two distinct topic clusters were inappropriately merged into imprecise or overly broad summaries; and (iii) overextension, in which speculative information absent from the source data was incorporated into the summaries. These issues were mitigated by adjusting KeyGraph keyword frequency thresholds, constraining expansion layers initiated from red nodes, and refining prompt engineering strategies. Subsequent iterative runs showed measurable improvements in both output accuracy and thematic clarity.
This risk management mechanism balances generative capability with interpretative reliability, illustrating the dynamic optimization loop of data-driven insight combined with knowledge reasoning, as described in the double helix model, facilitating incremental knowledge construction. By integrating keyword diffusion structures with HCI workflows, this study deepens the understanding and identification of latent topics. This approach enhances the sensitivity of topic detection and the accuracy of interpretation, improving the overall explainability of semantic analysis and its capacity to support decision making.
This study conducted topic detection using keyword network graphs and keyword clusters generated by KeyGraph, supplemented by ChatGPT for topic interpretation and summary generation. Researchers interpreted the results and made logical judgments, adjusting the model parameters to guide analysis iterations. This iterative process establishes an HCI mechanism for exploring topics and constructing knowledge.
As noted above, although the combined use of KeyGraph and generative AI demonstrates strong potential for topic detection and semantic interpretation, ChatGPT’s outputs remain exposed to risks of semantic hallucination, topic ambiguity, and excessive semantic expansion.
To further enhance the reliability of topic interpretation and mitigate potential biases inherent in single-model generative summarization, this study implemented an additional cross-model validation procedure. Alongside the original prompt-engineered constraints and manual review, the identical input data—comprising (i) the KeyGraph-derived keyword co-occurrence network structure, (ii) the complete set of original AI ethics articles, and (iii) the topic-detection instructions—were independently processed by both ChatGPT and Google Gemini under identical prompt conditions. This design ensured that any differences in output could be attributed solely to the internal reasoning and generative behavior of the respective models, rather than to variations in input or prompt formulation.
To quantitatively assess the semantic alignment between the two sets of topic summaries, we conducted sentence-level semantic similarity analysis using three distinct sentence embedding models: (i) the Universal Sentence Encoder [78], designed for general-purpose semantic tasks; (ii) the MiniLMv2 [79], a lightweight transformer-based model distilled from larger BERT-family models to preserve semantic representation quality while enabling faster inference; and (iii) the Cohere Embed model [80], optimized for capturing contextual and conceptual nuances. Each model encoded the summaries into high-dimensional vectors, and cosine similarity was computed between corresponding sentences from the ChatGPT and Gemini outputs. The use of multiple embedding models minimized embedding-specific biases and provided a robust basis for assessing semantic alignment.
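The sentence-level similarity scoring can be sketched as follows, using the sentence-transformers library with a MiniLM checkpoint as a stand-in for one of the three embedding models; the Universal Sentence Encoder and the Cohere Embed model would be queried analogously through their respective interfaces, and the two summaries shown are illustrative.

```python
# A minimal sketch of the sentence-level similarity scoring, assuming the
# sentence-transformers library with a MiniLM checkpoint as a stand-in; the
# other two embedding models would be queried through their own interfaces,
# and the two summaries below are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

chatgpt_summary = "The cluster centers on data privacy and bias in generative AI training."
gemini_summary = "This topic concerns privacy protection and algorithmic bias in model training."

# Encode both summaries and compute cosine similarity between the two vectors.
embeddings = model.encode([chatgpt_summary, gemini_summary], normalize_embeddings=True)
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"cosine similarity = {similarity:.3f}")
```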
This cross-model validation complemented the original manual review process, resulting in a dual-layer verification framework that integrated qualitative and quantitative evidence. Incorporating this procedure strengthened the robustness, reproducibility, and credibility of the proposed hybrid KeyGraph–ChatGPT approach, demonstrating its ability to produce convergent and reliable topic interpretations even without domain-expert involvement.

4. Result Analysis

This section presents a systematic analysis employing the Polaris visualization tool combined with the KeyGraph algorithm to analyze English-language articles related to AI ethics. The dataset comprises representative articles published between 2022 and 2024 by reputable academic institutions, news media, and nonprofit organizations providing comprehensive discourse. Eight representative articles for each year were selected for analysis to deepen the understanding of the evolving context of AI ethics issues and construct a structured keyword network supporting topic detection and interpretation.
In KeyGraph, black nodes represent high-frequency keywords that form the backbone of the semantic network. Because the initially generated keyword network contained numerous high-frequency black nodes, the analysis focused on those with explicit connections (i.e., co-occurrence relationships represented by solid black edges). Isolated high-frequency black nodes without connections to other nodes were excluded from the keyword network. This approach improved the readability and relational strength of the keyword network and helped identify potential emerging chance trends.
Next, a structural analysis was conducted on the keyword network formed by these high-frequency black nodes, and keyword clusters were manually delineated. Clustering originated from chance nodes (red nodes) and expanded outward, with the number of linked high-frequency keywords limited to about six to seven per cluster. This process was supported by a keyword review and thematic convergence procedures to ensure semantic coherence and appropriateness within clusters. Following the preliminary clustering, the semantic diffusion paths for each keyword cluster were individually input into ChatGPT for in-depth topic detection and semantic interpretation, revealing the core topics embedded in each cluster.
Cross-cluster thematic integration analyses were conducted to explore latent common topics and interwoven keyword structures of strongly related clusters. All outputs underwent manual inspection to ensure the readability, reliability, and logical consistency of the results. This process illustrates the HCI interpretative loop in the double helix model, where dynamic cycles of semantic network visualization, topic clustering, and generative AI-assisted interpretation collectively enhance the depth and explanatory power of topic detection.

4.1. Yearly Analysis of Topic Evolution and Keyword Structures (2022–2024)

The eight articles from 2022 contained 2595 distinct terms. Accordingly, this study set the parameter to 75 high-frequency keywords, represented as black nodes and connected by 135 solid black edges, to depict the co-occurrence network of these keywords and present the core associative structure of AI ethics topics. Additionally, four red nodes were designated as the potential chance discovery nodes. Under these conditions, a keyword network was generated using automaker, dignity, behavior, and statistical as the red nodes, highlighting the key themes in the 2022 AI ethics articles (Figure 5). The following bullet points summarize the thematic content.
  • Cluster A-1: The semantic cluster around the red node automaker focuses on the implementation of autonomous driving technology and the ethical challenges faced by AI in automotive applications. This red node extends through its connection to self to include the keywords based, car, driver, vehicles, and autonomous, outlining application scenarios involving HCI. The keywords driver, task, and autonomous intertwine, reflecting issues of responsibility allocation and control authority. In situations where automated and manual control are combined, the attribution of responsibility for accidents (whether borne by the driver or system) requires further clarification via regulatory frameworks and technical design. Furthermore, task transparency and the interface design are also critical. For example, whether drivers can quickly grasp the operational status and decision rationale of the system directly affects their safety judgments and behavioral responses. Establishing trust and risk perception cannot be overlooked. An insufficient HCI design and information transmission may cause excessive driver trust or erroneous reliance, increasing safety risks. Overall, the keyword structure emphasizes several topics, including the behavior prediction of autonomous technologies, system safety, and user responsibility attribution.
  • Cluster A-2: The red node behavior forms a keyword network related to AI risk prediction, system deployment, and ethical practices. Through its connections to consequences, the network gradually expands to include the keywords risk, privacy, discrimination, design, and capabilities, reflecting the multifaceted and uncertain outcomes of AI system behavior. Notably, discrimination is intertwined with risk, indicating that failure to address data sources and algorithmic bias properly in real-world applications may reinforce existing societal inequalities and trigger ethical crises of systemic discrimination. The association between design and foundational highlights the need to judiciously consider fundamental principles and ethical values during the initial stages of AI development. Overall, this cluster maps the potential externalities that may arise during AI deployment, emphasizing that developers must assume the corresponding responsibility for the potential social and ethical consequences of system behavior.
  • Cluster A-3: The keyword network extended from the red node statistical focuses on the computational logic and algorithmic architecture of AI systems. The strong co-occurrence relationships, with the keywords computational, learning, machine, critical, and implementation, reveal core problems, including statistical biases, risk governance, and explainability in current AI technology. The direct and indirect connections between the keywords issues, concerns, ethical, implementation, and critical reflect that AI ethics is not merely a conceptual discussion but is involved in the development, design, and deployment stages of AI systems. Furthermore, the connections emphasize that the realization of AI ethics must integrate value judgments and ethical norms as essential foundations for technical practice. This cluster demonstrates the role of ethical issues in institutional frameworks, industrial applications, and technical design, indicating that ethical practice has become a critical factor that cannot be overlooked in the development of responsible technology.
  • Cluster A-4: The keyword network constructed around the red node dignity focuses on human rights protection and ethical principles. This red node displays high co-occurrence frequencies with the keywords responsibility, trust, justice, transparency, principles, and ethics, reflecting that current AI technology developers should assume the corresponding moral responsibilities to avoid problems (e.g., bias, discrimination, and structural inequality). Ensuring the transparency of algorithms and data processing allows users to understand the decision-making logic and behavioral patterns of AI systems, safeguarding human dignity and fundamental rights. The connections between justice, guidelines, and harm highlight the necessity of designing AI ethical frameworks and indicate that the lack of appropriate ethical judgment and operational guidance may harm individuals or society, causing discrimination or unfairness. Overall, this cluster focuses on protecting human rights and strengthening ethical norms and institutional justice as core principles, constructing an AI governance mechanism characterized by social legitimacy and long-term trust.
  • Combined cluster of A-2 and A-3: The keyword network reveals a significant intersection and complementary structure, highlighting the dual technological and societal dimensions of AI ethics issues. Through red chance nodes, including bias, risk, understand, issues, and implementation, cross-cluster bridging nodes emerge, uncovering a risk propagation chain that spans from statistical logic to behavioral consequences. Bias often originates from flaws in algorithm design and training data and further permeates the societal domain after system deployment, leading to concrete and potentially escalating ethical consequences. This analysis indicates that AI ethics challenges must be examined from an integrated, multi-layered perspective spanning technical construction and societal influence. Accordingly, ethical practice in AI should focus on identifying and mitigating potential ethical risks during the early stages of technological development (e.g., data preprocessing and model training). A comprehensive ethics governance framework encompassing bias detection, transparency enhancement, and regulatory mechanisms must be promoted to ensure responsible and sustainable AI applications.
  • Combined cluster of A-2 and A-4: The analysis reveals that AI behavior must be guided and constrained by ethical principles to prevent harm to human dignity and privacy, enabling the deployment of trustworthy and responsible AI. The behavioral logic of AI systems should be grounded in human rights protection and ethical values, with corresponding regulations (e.g., bias detection and privacy protection standards) introduced during the early design stages to ensure legitimacy and credibility during deployment. The consequences of AI behavior (e.g., bias and privacy infringement) must be directed by ethical principles and implemented through technical practice. This interactive relationship emphasizes that ethics should not be treated as an external constraint to technology but as an internal structure embedded throughout the life cycle of AI design, development, and application, advancing responsible and human-centered AI development. This perspective aligns with the discourse in the 2022 AI ethics articles, including The 2022 AI Index: AI’s Ethical Growing Pains and AI Ethics and AI Law: Grappling with Overlapping and Conflicting Ethical Factors Within AI, and identifies the integration of bias management and privacy protection into a unified ethical framework as an emerging research chance.
  • Combined cluster of A-2, A-3, and A-4: The semantic co-construction of these three clusters reveals that AI ethics challenges cannot be viewed as problems confined to a single level. The behavioral risks of AI systems (e.g., technical bias, discriminatory outcomes, and privacy infringement) are closely linked to their underlying statistical construction logic, indicating that once deployed, AI may produce irreversible and substantive ethical consequences. If such consequences are not addressed through institutionalized ethical safeguards that ensure prevention and accountability, AI technology risks losing social trust and legitimacy. Moreover, ethical AI practice must adopt a cross-level integration approach to address these challenges, spanning from model training and system deployment to institutional regulation, constructing a full-process ethical governance framework based on the triad of technology, behavior, and values. This structure is critical for preserving human dignity and developing trustworthy and responsible AI.
The eight articles from 2023 contained 1606 distinct terms. Based on customized parameters, 65 high-frequency keywords were selected as black nodes and were connected by 85 solid black edges to construct the co-occurrence keyword network. Two red nodes were designated as potential chance discovery nodes. Under these conditions, a keyword network was generated with machine and misuse as the red nodes, revealing the structural configuration of keywords related to AI ethics and highlighting the emerging chances (Figure 6). In this network, the red node machine forms a keyword structure divided into two fine-grained semantic clusters, connected via the terms trained and develop. The thematic clusters are summarized as follows:
  • Cluster B-1: With trained as the primary node, the network extends to the keyword data and further expands to the keyword models, privacy, and customer, reflecting early-stage concerns in AI development regarding the legitimacy of data sources and the protection of user information. The node models branch out to include intelligence, ChatGPT, generative, and bias, indicating attention to the algorithmic biases embedded in generative AI models (e.g., ChatGPT). The bidirectional links between privacy, customer, and system highlight ethical considerations regarding user privacy and data security in AI application contexts. The connections between system and the keywords customer, create, and generative reveal the interplay between system design and generative technology in practice, raising concerns about technological transparency and ethical accountability. The keyword artificial is linked to intelligence, lead, and ChatGPT, forming a semantic structure centered on AI model generation and leadership in application. This cluster reveals deep ethical concerns related to the legitimacy of data usage, model bias, privacy protection, and user participation during the training and deployment phases of AI systems.
  • Cluster B-2: The primary node develop connects with systems and human, revealing the bidirectional relationship of HCI in technological construction. Systems further expands to make and decisions, reflecting the role AI systems play in decision-making processes. Decisions links to making, humans, and believe, forming a cluster centered on how AI decision making influences human beliefs. Technology co-occurs with the terms ethics and concerns, indicating heightened attention to ethical regulations and institutional policies during AI development. Through the node concerns, the keyword ethics connects to potential, business, and responsibility, outlining the importance businesses place on ethical risks and responsibilities when applying AI technology. Overall, this semantic group illustrates the institutional and ethical challenges faced during AI development, emphasizing the importance of bias governance, technical regulation, and establishing user trust.
  • Cluster B-3: With misuse as the red node, the initial connection to government further extends to industry and society, forming a semantic cluster focusing on institutional roles. The node industry links to insurance, which connects to using, policy, and responsible, highlighting an ethical discourse focused on risk transfer mechanisms and institutional responsibility. The keyword policy is a central node connecting responsible, insurance, and using, indicating that policy should address AI misuse risks via clear responsibility allocation, technical application guidelines, and industry-level risk management, especially concerning privacy protection and social impact. The keywords ethical, ensure, responsible, and using are closely interlinked, underscoring that ethical principles must be embedded in technical usage and institutional regulation. These principles, when supported by accountability structures and protective measures, can mitigate risks of misuse, particularly in areas related to data privacy and societal consequences. The connection between impact and society further indicates the potential and far-reaching effects of technological misuse on social structures. Overall, this semantic cluster illustrates that AI ethical principles should be integrated into institutional design and technological application processes and that clear accountability and regulatory mechanisms are critical for reducing the potential negative influences of AI misuse on societal systems.
  • Combined cluster of B-1 and B-2: These two clusters, centered on the red node machine, focus on model training and system development, respectively, revealing, through the lens of practical application, the crucial ethical challenges spanning the AI life cycle, from data training and system development to deployment. Both clusters emphasize data ethics (e.g., privacy and bias) and the governance of potential negative influences of AI systems on society and humanity, including decision-making influence and responsibility attribution. Together, these semantic clusters reveal that the core of AI ethics lies in the technology itself and, more critically, in the processes of interaction between AI, humans, and society, particularly regarding risk management and the realization of accountability. The clusters collectively emphasize that achieving a vision of AI development that balances innovation and responsibility requires the parallel construction of responsible governance mechanisms throughout the innovation process.
  • Combined cluster of B-2 and B-3: In the KeyGraph keyword network, Clusters B-2 and B-3 are centered on the keywords develop and misuse, respectively, illustrating an ethical link from AI technology development to its potential misuse. The keyword structures revealed by these two clusters reflect that AI ethics challenges originate from individual acts of technical development and extend across broader societal institutions and governance dimensions. The ethical risks posed by AI technology can be effectively addressed only by constructing an integrated accountability framework encompassing development, deployment, and misuse prevention, ensuring that advancement contributes to positive and sustainable social value.
  • Combined cluster of B-1, B-2, and B-3: These three semantic clusters correspond to three stages of AI ethical risk, model training, system development, and actual misuse along with social impact, respectively, forming a progressive chain from ethical considerations to governance responses. The keyword structures reveal a trajectory that begins with micro-level concerns, including data bias and generative misinformation, and extends to challenges of decision-making and ethical design during the development process. The structures indicate misuse risks and governance responsibility at the societal level. This progression reflects that AI ethics issues are not isolated incidents but constitute a foreseeable and preventable chain of ethical risks. An integrated ethical framework must be established that encompasses data governance, technical design, and misuse prevention, enabling the realization of an AI development vision guided by social values to address multi-level challenges (e.g., bias, manipulation, and misuse).
The eight articles from 2024 encompassed 2612 distinct terms. Based on the customized parameters, 60 high-frequency keywords were selected as black nodes and were connected by 95 solid black edges to construct the co-occurrence network of keywords. Two red nodes were designated as potential chance discovery nodes. Under these conditions, keyword networks were generated using media and security as red nodes, revealing the relationships between content production, technological transparency, and institutional responsibility in AI ethics issues (Figure 7). The bullet points below summarize the thematic clusters.
  • Cluster C-1: With media as the red node, the network connects to social, which links to content, genAI, and used, revealing that generative AI technology has been widely integrated into social platforms and public communication spaces. The connection between social and ethical, further extending to risks and then deployment, challenges, technology, and responsible, indicates that societal concerns have shifted beyond technical applications to the ethical risks and responsibility attribution involved in deployment processes, especially regarding misinformation, information manipulation, and bias problems arising in social media environments. The keywords digital, technology, innovation, industry, and development converge at the nodes essential, become, and important, demonstrating that generative AI has become a core driving force behind contemporary digital innovation and industrial transformation, with its ethical challenges escalating into systemic problems. Overall, this semantic cluster highlights that AI ethics attention has moved toward ethical challenges triggered by the application of generative AI in social and media contexts, emphasizing the importance of responsible technological deployment in these settings.
  • Cluster C-2: The red node security co-occurs with technologies and extends to data, models, and training, forming a semantic cluster. The keyword biases forms a triangular co-occurrence structure with these three terms, indicating that data sources and processing methods underpin AI system security, and that biases hidden within training data influence model behavior, representing the intersection of ethics and security. Transparency and privacy connect through technologies and further co-occur with regulatory, reflecting that AI ethics discourse has reached institutional dimensions and emphasizing the reliance on and necessity of regulatory mechanisms for system transparency and privacy protection. Via ensure and tools, the keyword decision links to generative and ChatGPT and is associated with businesses and trust, revealing the critical role of explainability and trust mechanisms in generative AI decision-making processes in corporate and societal applications. Overall, this keyword network reveals the interdisciplinary interconnection of AI ethics issues in 2024, providing a structured analytical perspective for technology development, policy regulation, and industry practice.
  • Combined cluster of C-1 and C-2: The integration of these two clusters highlights the increasingly multi-layered and interdisciplinary complexity of AI ethics issues in 2024. The AI ethical themes are no longer confined to a single domain but require addressing systemic governance challenges while promoting AI development, especially generative AI, in the fields of digital content and media. These challenges include core concerns such as data privacy, algorithmic bias, social trust, lack of transparency, and regulatory compliance. The importance of achieving trustworthy and responsible AI governance via the collaborative operation of social and technical dimensions is emphasized, with collective responsibility shared by developers, businesses, policymakers, and civil society. Current AI ethical frameworks must be established under risk contexts characterized by uncertainty and by risks that remain undiscovered or unknown, guiding AI development toward a more legitimate and sustainable future.

4.2. Integrative Analysis and Trend Summary

The keyword network analysis results from 2022 to 2024 reveal that AI ethics issues exhibit an evolving trend that progressively expands from technical implementation to institutional regulation and social responsibility. Moreover, the keyword structures demonstrate increasingly complex and interdisciplinary governance concerns each year.
The 2022 AI ethics discourse emphasized the inseparable relationship between technical implementation, statistical construction logic, and the resulting social consequences. Without institutionalized ethical safeguards for prevention and accountability, AI technologies risk losing social trust and legitimacy. Therefore, the governance framework, spanning from technology development to institutional regulation, must incorporate ethical practices, addressing core problems (e.g., bias, transparency, and human rights protection) to promote the sustainable and equitable application of AI, achieving trustworthy and responsible AI development.
The 2023 AI ethics keyword structure displays a characteristic chain of ethical risk semantics linking training data, technical design, and governance policies. This structure indicates that AI ethics challenges are not isolated, discrete incidents but span from micro-level issues (e.g., data bias and generative misinformation) to decision-making effects and ethical design in the technical development process, culminating in systemic challenges of misuse risk and governance responsibility at the societal level. This keyword structure reflects the core perspective in 2023, emphasizing the establishment of an integrated ethical framework encompassing data governance, technical design, and misuse prevention. This framework ensures that responsible governance mechanisms are constructed while developing and innovating AI technology, achieving a development vision guided by social values and ensuring that AI deployment serves human well-being and social value.
The 2024 AI ethics issues present a multi-layered and interdisciplinary complexity. The scope of AI ethics has expanded from single technical applications to an integrated framework encompassing technology deployment, data sources, institutional regulation, and social trust. Although the rapid development of generative AI has promoted digital innovation and industrial upgrading, it has also introduced unprecedented ethical risks and governance pressures. As a result of systemic challenges (e.g., algorithmic bias, privacy infringement, lack of system transparency, and regulatory lag), future AI governance must be based on the collaborative operation of technological and social dimensions, emphasizing the collective responsibility of developers, businesses, policymakers, and civil society. Moreover, under the uncertain and rapidly evolving risks posed by generative AI, establishing a forward-looking ethical framework addressing uncertainty and potential risks is a critical direction indicated by the 2024 keyword network, guiding AI development toward a more legitimate and sustainable future.

4.3. Reliability Enhancement and Bias Mitigation via Cross-Model Validation

As described in Section 3.6, incorporating cross-model validation into the workflow provided an additional safeguard against interpretive biases and potential inconsistencies inherent in single-model summarization. By applying identical KeyGraph-derived inputs and topic-detection instructions to both ChatGPT and Google Gemini, and then quantifying their semantic alignment using three distinct embedding models—the Universal Sentence Encoder, MiniLMv2, and the Cohere Embed model—paired with cosine similarity scoring, the study objectively assessed the convergence of independent generative processes. The semantic similarity between ChatGPT- and Gemini-generated AI ethics themes from 2022 to 2024 was evaluated with these embedding models, as shown in Figure 8, Figure 9 and Figure 10. Similarity scores ranged from 0.687 to 0.899. Following prior work on semantic textual similarity [81,82], scores above 0.80 were interpreted as indicating a high degree of thematic agreement between the two systems, while scores between 0.75 and 0.79 were considered to represent upper-moderate agreement. These results suggest that the thematic content distilled by the two systems remained largely consistent despite differences in phrasing or lexical choice.
Each cluster generated by the KeyGraph algorithm represents a distinct semantic topic, defined as a group of highly co-occurring keywords and their contextual relationships. Cross-model similarity for a given cluster was computed as the cosine similarity between the ChatGPT- and Gemini-generated summaries of that topic, measured separately with the three sentence embedding models (the Universal Sentence Encoder, MiniLMv2, and the Cohere Embed model) and averaged across them. This metric reflects the degree to which the two generative systems, viewed through different embedding models, produce consistent semantic representations of the same topic, with higher scores indicating greater interpretative stability and lower scores suggesting potential ambiguity or model-dependent variation.
To summarize annual trends in semantic stability, we computed the mean ± SD of cross-model similarity values for all clusters within each year. The results showed consistently high semantic agreement across models: 2022, 0.808 ± 0.023; 2023, 0.812 ± 0.013; 2024, 0.828 ± 0.015. While stability increased slightly over time, detailed analysis revealed systematic variations across three intersecting dimensions: (1) year-to-year trends, (2) cluster structure complexity, and (3) model-specific characteristics. In 2022, single-cluster topics exhibited strong convergence (USE: 0.824 ± 0.014; MiniLMv2: 0.796 ± 0.007; Cohere: 0.797 ± 0.045). However, two- and three-cluster combinations produced lower agreement, particularly in the MiniLMv2 for Cluster A-2 + A-4 (0.687). In 2023, the highest consistency again occurred in single clusters (USE: 0.818; MiniLMv2: 0.829; Cohere: 0.808), while MiniLMv2 showed reduced scores for multi-cluster combinations, with the lowest for the three-cluster combination B-1 + B-2 + B-3 (0.764). In 2024, all three models converged closely for single- and two-cluster combinations, with peak agreement between MiniLMv2 and Cohere in Cluster C-1 (0.875 vs. 0.871). Nevertheless, for three-cluster combinations, Cohere displayed the largest deviation, scoring 0.770 in Cluster C-2. Please refer to Table 4 for the summary of these results.
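The yearly figures reported above follow from a straightforward aggregation of per-cluster similarity scores; the sketch below illustrates the computation with NumPy, using placeholder values rather than the actual scores behind Table 4.

```python
# A minimal sketch of the yearly aggregation (mean ± SD of per-cluster
# cross-model similarity), assuming NumPy; the values are placeholders, not
# the actual scores behind Table 4.
import numpy as np

cluster_similarity = {
    2022: [0.824, 0.796, 0.797, 0.81, 0.80, 0.82],
    2023: [0.818, 0.829, 0.808, 0.80, 0.81],
    2024: [0.875, 0.871, 0.83, 0.81, 0.79],
}

for year, values in cluster_similarity.items():
    arr = np.asarray(values)
    print(f"{year}: {arr.mean():.3f} ± {arr.std(ddof=1):.3f}")
```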
These results demonstrate that single-cluster topics consistently yield the highest cross-model similarity, indicating stable semantic representation when thematic focus is narrow. In contrast, combined clusters—especially those merging three distinct topics—tend to amplify inter-model differences due to greater semantic diversity. The observed discrepancies also reveal model-specific tendencies: MiniLMv2 often produces lower similarity for multi-cluster combinations, while Cohere appears more sensitive to certain high-complexity semantic aggregations. The year-to-year increase in average similarity suggests a slight improvement in thematic stability over time, potentially reflecting the convergence of AI ethics discourse around core concepts. This multidimensional analysis highlights the value of using diverse embedding models in longitudinal studies and supports a dual-layer validation framework that integrates quantitative cross-model metrics with qualitative review to mitigate model-specific interpretive bias.

5. Discussion

This section interprets the research findings and situates them within the context of existing academic studies for comparative evaluation. While prior research on AI ethics has addressed issues such as bias prevention, privacy protection, transparency, accountability, and multi-level governance, most studies adopt a static, cross-sectional perspective or focus narrowly on specific domains and technologies. The existing literature still lacks systematic examinations of temporal changes in AI ethics thematic structures based on consistent and authoritative report data, as well as scholarly analyses offering in-depth interpretations for specific years.
To address this gap, this study compared AI ethics thematic clusters generated through KeyGraph–ChatGPT analysis for 2022–2024 with the literature review in Section 2.1 and the core concepts in Section 1 and Section 2. This cross-year, cross-source comparison identified (i) clusters that reaffirm and extend existing perspectives and (ii) emergent or structurally integrated themes not yet widely documented. This approach moves beyond descriptive annual keyword networks toward interpretive synthesis, linking computationally derived semantic structures to normative and practical concerns in AI ethics research. It verifies the robustness of the analytical framework and, by highlighting persistent priorities and shifting emphases over the three-year period, enriches academic discourse on AI governance. This study should therefore be regarded as an intra-family improvement within KeyGraph-based approaches rather than a cross-family performance competition with other topic modeling methods such as LDA.

5.1. Year 2022: Comparative Interpretation

Cluster A-1 addresses human–computer interaction design, responsibility allocation, and safety in autonomous driving—aligning with the literature emphasizing transparency, accountability, and trust [1,4,25]—and also resonates with perspectives on verifiable sources and platform accountability in medical and public services [26]. Cluster A-2 outlines ethical risks from AI behavior, particularly bias and discrimination in deployment, corresponding to calls for addressing fairness conflicts and embedding ethics in design [21,23]. Cluster A-3 focuses on technical bias, explainability, and the institutionalization of ethics, reflecting arguments for integrating ethics throughout the development process [7,32] and linking to data governance challenges. Cluster A-4 underscores human rights protection, institutional justice, and transparency, consistent with emphases on dignity, rights, and procedural justice [20,23,25,29].
Combined analysis reveals (i) the risk chain from algorithmic and data deficiencies to societal impacts (A-2 + A-3); (ii) integration of bias management and human rights protection (A-2 + A-4); and (iii) a triadic governance framework encompassing technology, behavior, and values (A-2 + A-3 + A-4), aligning with multi-level governance models [25,33].

5.2. Year 2023: Comparative Interpretation

Cluster B-1, centered on trained, addresses data legitimacy, privacy protection, model bias, and user participation, aligning with discussions on bias and privacy in generative AI [21]. Cluster B-2 focuses on systems and human–machine interactions linked to development, emphasizing bias governance, institutional regulation, and trust—echoing calls to embed ethical principles and regulation in AI development [7,25]. Cluster B-3, centered on misuse, highlights institutional and policy roles in preventing AI misuse, consistent with frameworks for integrating responsibility systems, risk management, and social impact assessment [1].
Combined analysis reveals (i) full-chain ethical challenges from data governance to decision-making impacts (B-1 + B-2); (ii) integrated responsibility frameworks for development, deployment, and misuse prevention (B-2 + B-3); and (iii) a chain-linking model training, system development, and misuse prevention (B-1 + B-2 + B-3), corroborating prior discussions of chain-linked ethical risks [25,33].

5.3. Year 2024: Comparative Interpretation

Cluster C-1 addresses the integration of generative AI into social media and public communication, extending to risks of disinformation, manipulation, and bias—aligning with the literature on ethical risks and responsibility in deployment [22,26]. Cluster C-2, centered on security, integrates data sources, model training, and bias management into AI safety and ethics, emphasizing regulatory mechanisms via connections among transparency, privacy, and compliance—consistent with balancing these principles with legality [21,25,33].
The combined cluster (C-1 + C-2) reflects a cross-disciplinary, multi-level approach to AI ethics in 2024, integrating generative AI development in media and digital content with system-level governance challenges (privacy, bias, transparency, compliance). This integration echoes calls for collaborative socio-technical governance with shared stakeholder responsibility [23,33].

5.4. Cross-Year Synthesis

From 2022 to 2024, core issues persisted, demonstrating stability and sustained attention despite rapid technological evolution. Data legitimacy and privacy protection remained central—progressing from governance-focused discussions in 2022, to parallel technological and institutional governance in 2023, to integration within AI safety frameworks in 2024—all reinforcing their role in legitimacy and trust. Algorithmic bias and fairness remained constant challenges, spanning prevention, institutional implementation, and trust impacts. Transparency and explainability evolved from system interpretability (2022), to integration with corporate responsibility and policy (2023), to association with compliance and trust (2024). Responsibility allocation and institutional governance progressed from liability distribution to collective responsibility, encompassing misuse prevention and cross-level governance structures.
Across the three years, the interplay between technology and society is evident—from the triadic governance model of 2022, to chain-linked risks in 2023, to coordinated socio-technical governance in 2024—highlighting privacy, bias governance, transparency, responsibility, and socio-technical integration as enduring focal themes in AI ethics discourse.

6. Conclusions

Building upon the results presented in Section 4.3, the high similarity scores observed in the cross-model validation confirm that the hybrid KeyGraph–ChatGPT approach can produce consistent and convergent topic interpretations across different generative AI systems. This finding suggests that the method is not only effective in the absence of domain experts but also resilient to model-specific interpretive tendencies. Moreover, combining quantitative similarity metrics with qualitative manual review provides a dual-layer validation framework, thereby strengthening the robustness, credibility, and generalizability of the proposed methodology.
This study adopts the KeyGraph algorithm as its core analytical method to examine the evolving semantic structures in AI ethics discourse. By constructing keyword co-occurrence networks, this work offers an in-depth understanding of the multi-layered thematic architecture and conceptual transitions in this domain. The systematic semantic analysis of representative articles from 2022 to 2024 revealed that the KeyGraph algorithm identifies core nodes and semantic linkages over time. The clustering of keywords and identification of chance nodes facilitated the mapping of the developmental trajectory and internal logic of AI ethics over time.
This analysis of time-specific keyword networks demonstrates that the evolution of AI ethics discourse is not static. Instead, AI ethics discourse dynamically evolves in response to the interplay between technological advancement, regulatory change, and societal interaction, prompting continuous semantic reconstruction and thematic reorientation. The findings reveal latent interconnections between technology, law, and society, offering a novel perspective for understanding the complexity of AI ethics. These results validate the effectiveness of KeyGraph in processing large-scale unstructured textual data, extracting critical information, and constructing meaningful keyword networks for semantic exploration.
The text-mining results produced by the KeyGraph algorithm were integrated with the generative AI ChatGPT-based text summarization technique to enhance the analytical rigor and accuracy of the study. This integration allows for guided semantic interpretation and topic detection in co-occurrence-driven keyword clusters. The combined approach strengthens the depth, accuracy, and interpretive consistency of latent topic detection from complex articles. The cross-validation between the keyword co-occurrence network analysis and generative language models improves the scientific robustness and credibility of our understanding of AI ethics discourse and its semantic evolution.
The proposed method has broad potential for cross-domain applications. Beyond the AI ethics domain, integrating the KeyGraph algorithm with temporal analysis strategies can be extended to issue exploration and trend tracking in various fields (e.g., social media, e-commerce, and news media). The dynamic analysis of keyword node structures and co-occurrence intensities over time can reveal shifts in public attention and capture the evolving focus of technology and topics with greater precision. This approach enables the early identification of emerging concepts that may pose potential risks or offer innovative value (e.g., market reactions to new products, the progression of societal hotspots, or changes in user sentiment).
Such dynamic tracking capabilities offer valuable insight for corporate decision making, public policy formulation, and academic research, supporting more accurate forecasting and timely responses to evolving challenges. Overall, this study demonstrates the applicability of the keyword co-occurrence network analysis in AI ethics and proposes a transferable framework that can be extended to other interdisciplinary issues. With the continued refinement of the KeyGraph algorithm and the integration of advanced LLMs for summarization, this approach is promising for advancing a deeper understanding and anticipatory reflection of the ethical implications of AI technology.

7. Limitations

Although this study demonstrates the feasibility and potential of integrating the KeyGraph algorithm with large language models (LLMs) for semantic analysis of AI ethics discourse, several limitations should be noted. First, the dataset comprises only 24 reports published over a three-year period, which limits the generalizability of the findings and may not fully capture the breadth and evolving diversity of global AI ethics debates. Second, while the deliberate selection of authoritative sources ensures a high degree of credibility, it may inadvertently introduce selection bias by underrepresenting perspectives from smaller institutions, grassroots organizations, or non-English publications. Third, the use of LLMs such as ChatGPT carries inherent risks of semantic hallucination, topic drift, and overgeneralization. Although the dual-layer control strategy and cross-model validation employed in this study substantially mitigate these risks, complete elimination remains unattainable. Fourth, the KeyGraph parameterization—including the number of high-frequency keywords and the number of “chance” keywords—was determined empirically. Alternative configurations may produce different network structures and topic interpretations, potentially influencing downstream thematic synthesis. Finally, the absence of systematic domain-expert validation for all identified clusters limits the extent to which the semantic interpretations presented herein can be considered definitive. Future research should address these limitations by expanding datasets across languages and domains, applying systematic parameter optimization, and incorporating iterative validation with domain experts to enhance both reliability and interpretive depth.

Author Contributions

Conceptualization, W.-H.L. and H.-C.Y.; methodology, W.-H.L. and H.-C.Y.; software, W.-H.L.; validation, W.-H.L. and H.-C.Y.; formal analysis, W.-H.L. and H.-C.Y.; investigation, W.-H.L.; resources, W.-H.L.; data curation, W.-H.L.; writing—original draft preparation, W.-H.L.; writing—review and editing, W.-H.L. and H.-C.Y.; visualization, W.-H.L. and H.-C.Y.; supervision, H.-C.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
HCI: Human-Computer Interaction
LDA: Latent Dirichlet Allocation
LLM: Large Language Model
SDGs: Sustainable Development Goals

Appendix A. KeyGraph Algorithm

1. Data preprocessing:
Data preprocessing forms the foundation for constructing a keyword network in the KeyGraph algorithm. In this study, preprocessing involves several steps, including tokenization, normalization, stop-word removal, and part-of-speech filtering. Tokenization and normalization establish a stable keyword base, while stop-word removal and part-of-speech filtering reduce semantic noise, enhancing the accuracy of co-occurrence analysis and the quality of the network structure. These steps optimize the performance of topic detection and chance node identification [47,56,57].
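A minimal preprocessing sketch is given below, assuming spaCy with its small English pipeline; the retained part-of-speech tags are an assumption about the filtering step rather than the study’s exact configuration.

```python
# A minimal preprocessing sketch, assuming spaCy and its small English pipeline
# (en_core_web_sm must be downloaded beforehand); the retained part-of-speech
# tags are an assumed content-word filter, not the study's exact configuration.
import spacy

nlp = spacy.load("en_core_web_sm")
KEEP_POS = {"NOUN", "PROPN", "ADJ", "VERB"}

def preprocess(text):
    """Return one keyword list per sentence: tokenized, lemmatized, lowercased,
    with stop words and non-content parts of speech removed."""
    doc = nlp(text)
    dataset = []
    for sent in doc.sents:
        tokens = [t.lemma_.lower() for t in sent
                  if t.is_alpha and not t.is_stop and t.pos_ in KEEP_POS]
        if tokens:
            dataset.append(tokens)
    return dataset

D = preprocess("AI systems raise privacy concerns. Developers must address bias.")
print(D)
```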
2.
High-frequency keyword extraction:
Based on the preprocessed data, a new dataset D is generated, consisting of sentences, each representing a set of keywords. All keywords are ranked according to their frequency in D , and the highest-frequency keywords are selected to form the high-frequency keyword set. These keywords serve as the nodes of the network cluster G [47,53,56].
3. Calculation of keyword network co-occurrence:
In the KeyGraph algorithm, the co-occurrence relationship between keywords is the core basis for constructing the keyword network. Each keyword is treated as a network node. When two keywords co-occur in the same semantic unit (e.g., a sentence or paragraph), a link (edge) is formed between them. The co-occurrence relationship is quantified using the co-occurrence strength [47,54,55], which is calculated as follows in Equation (A1):
assoc(w_i, w_j) = \sum_{s \in D} \min(|w_i|_s, |w_j|_s)   (A1)
where assoc(w_i, w_j) represents the co-occurrence strength between keywords w_i and w_j across all semantic units s in dataset D. The measure is obtained by summing the minimum occurrence frequencies of the two keywords within each semantic unit, reflecting their co-occurrence count. Here, |w_i|_s and |w_j|_s denote the frequencies of keywords w_i and w_j, respectively, in the semantic unit s, and min(|w_i|_s, |w_j|_s) is the smaller of the two frequencies in that unit. Aggregating these minimal counts across semantic units captures the overall semantic linkage strength between the keyword pair. This approach helps construct a semantic backbone comprising high-frequency terms and reveals latent nodes that may have lower surface frequencies yet critical semantic significance.
In addition to absolute co-occurrence values, normalized measures, such as the Jaccard similarity coefficient, are often used [56], as calculated in Equation (A2):
Jaccard(w_i, w_j) = \frac{Freq(w_i \wedge w_j)}{Freq(w_i \vee w_j)}   (A2)
where Freq(w_i ∧ w_j) is the number of semantic units in which w_i and w_j co-occur, and Freq(w_i ∨ w_j) is the number of semantic units in D containing either keyword. The Jaccard coefficient ranges from 0 to 1, with higher values indicating a stronger association between the keywords and, by implication, greater semantic similarity. As a normalized similarity measure, the Jaccard coefficient can be employed to adjust the edge weights in the network generated by the KeyGraph algorithm, mitigating the linking bias caused by high-frequency terms and enhancing the performance of the keyword network in topic detection and keyword identification. This helps reveal the relational paths in the deeper semantic structure more effectively.
4. Co-occurrence between keywords and keyword clusters:
The KeyGraph algorithm measures the connection strength between a keyword w and a single keyword cluster g using the co-occurrence strength index [38,47,48], which is defined in Equation (A3) as:
based(w, g) = \sum_{s \in D} |w|_s \, |g - w|_s   (A3)
where w is a retained keyword in the preprocessed dataset D and g is the target cluster. The co-occurrence strength is calculated over sentences s, which are treated as the fundamental semantic units and are typically represented as sets of keywords that define co-occurrence relationships. Here, |w|_s denotes the frequency of keyword w in sentence s, while |g - w|_s represents the total occurrences in sentence s of all keywords in cluster g excluding w; if no other keywords from the cluster appear in the sentence, this value is zero.
This formula evaluates each sentence in dataset D to determine whether the target keyword w appears. When it does, the frequency of keyword w is multiplied by the total frequency of the other keywords in cluster g (excluding w) in that sentence. The resulting products are then summed across all sentences to quantify the overall co-occurrence strength between keyword w and cluster g.
5. Calculating the co-occurrence potential of all keywords in cluster g:
In keyword network analysis, the association between a keyword and a cluster depends on both the degree of co-occurrence and the contextual interactions among the cluster and other keywords. The KeyGraph algorithm provides a standardized metric for evaluating these associations through a cluster-level semantic quantification measure called neighbors(g), which estimates the potential of a specific keyword cluster g to interact semantically with other keywords across the dataset [47,54,55], as defined in Equation (A4):
neighbors(g) = \sum_{s \in D} \sum_{w \in S} |w|_s \, |g - w|_s   (A4)
where s represents each sentence in dataset D, regarded as a set of co-occurring keywords; S is the set of all keywords; w ∈ S is any keyword in S; |w|_s denotes the frequency of w in sentence s; and |g - w|_s denotes the total frequency, in the same sentence s, of all other keywords in cluster g (excluding w). If none of the keywords in the cluster appear in the sentence, this value is zero.
This formula iterates through each sentence s in dataset D and, for each keyword in the high-frequency keyword set S, calculates its co-occurrence strength with the other keywords in cluster g when w appears. These values are aggregated to measure the total co-occurrence strength between cluster g and the keywords in S across the dataset. A higher neighbors(g) value indicates that cluster g has stronger connections with high-frequency keywords, suggesting its potential significance in the semantic structure.
6. Evaluation of keyword potential across clusters:
To evaluate the connective role of a keyword w in the overall keyword network graph and determine whether it bridges clusters, this study employs the keyness calculation from the KeyGraph algorithm [47,54,55], which is defined in Equation (A5):
key(w) = 1 - \prod_{g \in G} \left( 1 - \frac{based(w, g)}{neighbors(g)} \right)   (A5)
where key(w) represents the importance score of keyword w, ranging from 0 to 1. G is the set of all keyword clusters, with g representing an individual cluster. The product over g ∈ G indicates that each cluster in G is evaluated individually to compute the semantic relevance of w to that cluster. Here, based(w, g) is the co-occurrence strength between w and cluster g, and neighbors(g) represents the total co-occurrence strength of cluster g.
This formula multiplies the co-occurrence complements of w with respect to each cluster g ∈ G, representing the overall probability that w has no co-occurrence with any of the clusters. Subtracting this product from 1 yields key(w), reflecting the importance of w in bridging multiple clusters.
When w exhibits significant co-occurrence with multiple keyword clusters, key(w) approaches 1, indicating that w is strongly associated with multiple keyword clusters and may represent a potential chance node. Conversely, when key(w) approaches 0, w has minimal co-occurrence with other clusters, reflecting lower importance within the keyword network structure. A consolidated computational sketch of Equations (A1)–(A5) is given after this list.
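To make Equations (A1)–(A5) concrete, the following minimal Python sketch reimplements the scoring functions over preprocessed sentences represented as lists of keyword tokens. It is an illustrative reconstruction based on the definitions above rather than the implementation used in this study; the toy sentences, keyword set, and cluster assignments are hypothetical.

# Minimal sketch of the KeyGraph scoring in Equations (A1)-(A5).
# Assumptions: sentences are already preprocessed into lists of keyword
# tokens; the keyword set and clusters below are hypothetical toy data.

def assoc(sentences, wi, wj):
    # Equation (A1): sum over sentences of min(|wi|_s, |wj|_s).
    return sum(min(s.count(wi), s.count(wj)) for s in sentences)

def jaccard(sentences, wi, wj):
    # Equation (A2): co-occurring sentences / sentences containing either keyword.
    both = sum(1 for s in sentences if wi in s and wj in s)
    either = sum(1 for s in sentences if wi in s or wj in s)
    return both / either if either else 0.0

def based(sentences, w, cluster):
    # Equation (A3): sum over sentences of |w|_s * |g - w|_s.
    return sum(s.count(w) * sum(s.count(k) for k in cluster if k != w)
               for s in sentences)

def neighbors(sentences, keywords, cluster):
    # Equation (A4): total co-occurrence potential of cluster g over all keywords.
    return sum(based(sentences, w, cluster) for w in keywords)

def key_score(sentences, keywords, w, clusters):
    # Equation (A5): key(w) = 1 - product over clusters of (1 - based/neighbors).
    prod = 1.0
    for g in clusters:
        n = neighbors(sentences, keywords, g)
        if n:
            prod *= 1.0 - based(sentences, w, g) / n
    return 1.0 - prod

# Toy usage with hypothetical data: rank keywords by their keyness score.
sentences = [["ai", "ethics", "bias"], ["privacy", "bias", "regulation"],
             ["ai", "privacy", "trust"], ["ethics", "regulation", "trust"]]
keywords = ["ai", "ethics", "bias", "privacy", "regulation", "trust"]
clusters = [["ai", "ethics", "bias"], ["privacy", "regulation"]]
ranked = sorted(keywords, reverse=True,
                key=lambda w: key_score(sentences, keywords, w, clusters))
print(ranked)

In this sketch, keywords whose co-occurrences spread across both clusters receive scores closer to 1, matching the interpretation of chance nodes described in step 6.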

References

  1. Shetty, D.K.; Arjunan, R.V.; Cenitta, D.; Makkithaya, K.; Hegde, N.V.; Bhatta, B.S.R.; Salu, S.; Aishwarya, T.R.; Bhat, P.; Pullela, P.K. Analyzing AI Regulation through Literature and Current Trends. J. Open Innov. Technol. Mark. Complex. 2025, 11, 100508. [Google Scholar] [CrossRef]
  2. Tallberg, J.; Lundgren, M.; Geith, J. AI Regulation in the European Union: Examining Non-State Actor Preferences. Bus. Politics 2024, 26, 218–239. [Google Scholar] [CrossRef]
  3. Ong, J.C.L.; Chang, S.Y.; William, W.; Butte, A.J.; Shah, N.H.; Chew, L.S.T.; Liu, N.; Doshi-Velez, F.; Lu, W.; Savulescu, J.; et al. Ethical and Regulatory Challenges of Large Language Models in Medicine. Lancet Digit. Health 2024, 6, e428–e432. [Google Scholar] [CrossRef]
  4. Huang, C.; Zhang, Z.; Mao, B.; Yao, X. An Overview of Artificial Intelligence Ethics. IEEE Trans. Artif. Intell. 2023, 4, 799–819. [Google Scholar] [CrossRef]
  5. Tabassum, A.; Elmahjub, E.; Padela, A.I.; Zwitter, A.; Qadir, J. Generative AI and the Metaverse: A Scoping Review of Ethical and Legal Challenges. IEEE Open J. Comput. Soc. 2025, 6, 348–359. [Google Scholar] [CrossRef]
  6. Taeihagh, A. Governance of Generative AI. Policy Soc. 2025, 44, 1–22. [Google Scholar] [CrossRef]
  7. Morley, J.; Elhalal, A.; Garcia, F.; Kinsey, L.; Mökander, J.; Floridi, L. Ethics as a Service: A Pragmatic Operationalisation of AI Ethics. Minds Mach. 2021, 31, 239–256. [Google Scholar] [CrossRef]
  8. Mittelstadt, B.D. Principles alone cannot guarantee ethical AI. Nat. Mach. Intell. 2019, 1, 501–507. [Google Scholar] [CrossRef]
  9. Cath, C.; Wachter, S.; Mittelstadt, B.; Taddeo, M.; Floridi, L. Artificial intelligence and the ‘Good Society’: The US, EU, and UK Approach. Sci. Eng. Ethics 2018, 24, 505–528. [Google Scholar]
  10. Ohsawa, Y.; McBurney, P. Chance Discovery; Springer: Berlin/Heidelberg, Germany, 2003. [Google Scholar]
  11. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet Allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. [Google Scholar]
  12. Vayansky, I.; Kumar, S.A.P. A Review of Topic Modeling Methods. Inf. Syst. 2020, 94, 101582. [Google Scholar] [CrossRef]
  13. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  14. Sayyadi, H.; Raschid, L. A Graph Analytical Approach for Topic Detection. ACM Trans. Internet Technol. 2013, 13, 1–23. [Google Scholar] [CrossRef]
  15. Hayashi, T.; Ohsawa, Y. Information Retrieval System and Knowledge Base on Diseases Using Variables and Contexts in the Texts. Procedia Comput. Sci. 2019, 159, 1662–1669. [Google Scholar] [CrossRef]
  16. Wang, J.; Lai, J.Y.; Lin, Y.H. Social Media Analytics for Mining Customer Complaints to Explore Product Opportunities. Comput. Ind. Eng. 2023, 178, 109104. [Google Scholar] [CrossRef]
  17. Guler, N.; Kirshner, S.N.; Vidgen, R. A Literature Review of Artificial Intelligence Research in Business and Management Using Machine Learning and ChatGPT. Data Inf. Manag. 2024, 8, 100076. [Google Scholar] [CrossRef]
  18. Nissen, H.E. Using Double Helix Relationships to Understand and Change Information Systems. Informing Sci. Int. J. Emerg. Transdiscip. J. 2007, 10, 21–62. [Google Scholar]
  19. Chechkin, A.; Pleshakova, E.; Gataullin, S. A Hybrid KAN-BiLSTM Transformer with Multi-Domain Dynamic Attention Model for Cybersecurity. Technologies 2025, 13, 223. [Google Scholar] [CrossRef]
  20. Fjeld, J.; Achten, N.; Hilligoss, H.; Nagy, A.; Srikumar, M. Principled Artificial Intelligence: Mapping Consensus in Ethical and Rights-Based Approaches to Principles for AI. Berkman Klein Cent. Res. Publ. 2020, 2020, 39. [Google Scholar] [CrossRef]
  21. Khan, A.A.; Badshah, S.; Liang, P.; Waseem, M.; Khan, B.; Ahmad, A.; Fahmideh, M.; Niazi, M.; Akbar, M.A. Ethics of AI: A Systematic Literature Review of Principles and Challenges. In Proceedings of the 26th International Conference on Evaluation and Assessment in Software Engineering (EASE 2022), Gothenburg, Sweden, 13–15 June 2022; pp. 383–392. [Google Scholar]
  22. Kirova, V.D.; Ku, C.S.; Laracy, J.R.; Marlowe, T.J. The ethics of artificial intelligence in the era of generative AI. J. Syst. Cybern. Inform. 2023, 21, 42–50. [Google Scholar] [CrossRef]
  23. De Fine Licht, K. Resolving Value Conflicts in Public AI Governance: A Procedural Justice Framework. Gov. Inf. Q. 2025, 42, 102033. [Google Scholar] [CrossRef]
  24. Fruchter, R.; Ohsawa, Y.; Matsumura, N. Knowledge Reuse through Chance Discovery from an Enterprise Design-Build Enterprise Data Store. New Math. Nat. Comput. 2005, 1, 393–406. [Google Scholar] [CrossRef]
  25. Kieslich, K.; Keller, B.; Starke, C. Artificial intelligence ethics by design. Evaluating public perception on the importance of ethical design principles of artificial intelligence. Big Data Soc. 2022, 9, 20539517221092956. [Google Scholar] [CrossRef]
  26. Inglada Galiana, L.; Corral Gudino, L.; Miramontes González, P. Ethics and artificial intelligence. Rev. Clin. Esp. 2024, 224, 178–186. [Google Scholar] [CrossRef]
  27. Buolamwini, J.; Gebru, T. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, New York, NY, USA, 23–24 February 2018; Friedler, S.A., Wilson, C., Eds.; PMLR: New York, NY, USA, 2018; Volume 81, pp. 77–91. [Google Scholar]
  28. Njiru, D.K.; Mugo, D.M.; Musyoka, F.M. Ethical considerations in AI-based user profiling for knowledge management: A critical review. Telemat. Inform. Rep. 2025, 18, 100205. [Google Scholar] [CrossRef]
  29. Heilinger, J.C. The Ethics of AI Ethics. A Constructive Critique. Philos. Technol. 2022, 35, 61. [Google Scholar] [CrossRef]
  30. Luomala, M.; Naarmala, J.; Tuomi, V. Technology-Assisted Literature Reviews with Technology of Artificial Intelligence: Ethical and Credibility Challenges. Procedia Comput. Sci. 2025, 256, 378–387. [Google Scholar] [CrossRef]
  31. Hermansyah, M.; Najib, A.; Farida, A.; Sacipto, R.; Rintyarna, B.S. Artificial intelligence and ethics: Building an artificial intelligence system that ensures privacy and social justice. Int. J. Sci. Soc. 2023, 5, 154–168. [Google Scholar] [CrossRef]
  32. Chen, F.; Zhou, J.; Holzinger, A.; Fleischmann, K.R.; Stumpf, S. Artificial Intelligence Ethics and Trust: From Principles to Practice. IEEE Intell. Syst. 2023, 38, 5–8. [Google Scholar] [CrossRef]
  33. Gupta, A.; Raj, A.; Puri, M.; Gangrade, J. Ethical Considerations in the Deployment of AI. J. Propul. Technol. 2024, 45, 1001–4055. [Google Scholar]
  34. Ohsawa, Y. Data Crystallization: A Project beyond Chance Discovery for Discovering Unobservable Events. In Proceedings of the 2005 IEEE International Conference on Granular Computing, Beijing, China, 25–27 July 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 1, pp. 51–56. [Google Scholar]
  35. Holzinger, A. Human-Computer Interaction and Knowledge Discovery (HCI-KDD): What Is the Benefit of Bringing Those Two Fields to Work Together? In Availability, Reliability, and Security in Information Systems and HCI; Cuzzocrea, A., Kittl, C., Simos, D.E., Weippl, E., Xu, L., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8127. [Google Scholar]
  36. Ohsawa, Y.; Fukuda, H. Chance discovery by stimulated groups of people: Application to understanding consumption of rare food. J. Contingencies Crisis Manag. 2002, 10, 129–138. [Google Scholar] [CrossRef]
  37. Ko, N.; Jeong, B.; Choi, S.; Yoon, J. Identifying Product Opportunities Using Social Media Mining: Application of Topic Modeling and Chance Discovery Theory. IEEE Access 2018, 6, 1680–1693. [Google Scholar] [CrossRef]
  38. Ohsawa, Y. Chance Discoveries for Making Decisions in Complex Real World. New Gener. Comput. 2002, 20, 143–163. [Google Scholar] [CrossRef]
  39. Ohsawa, Y.; Nishihara, Y. Innovators’ Marketplace: Using Games to Activate and Train Innovators; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  40. Ho, T.B.; Nguyen, D.D. Chance Discovery and Learning Minority Classes. New Gener. Comput. 2003, 21, 149–161. [Google Scholar] [CrossRef]
  41. Ohsawa, Y.; Tsumoto, S. Chance Discoveries in Real World Decision Making: Data-Based Interaction of Human Intelligence and Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2006; Volume 30. [Google Scholar]
  42. Ohsawa, Y.; Nara, Y. Modeling the Process of Chance Discovery by Chance Discovery on Double Helix. In Proceedings of the AAAI Fall Symposium on Chance Discovery, North Falmouth, MA, USA, 15–17 November 2002; AAAI Press: Arlington, VA, USA, 2002; pp. 33–40. [Google Scholar]
  43. Wang, H.; Ohsawa, Y.; Nishihara, Y. Innovation Support System for Creative Product Design Based on Chance Discovery. Expert. Syst. Appl. 2012, 39, 4890–4897. [Google Scholar] [CrossRef]
  44. Wang, H.; Ohsawa, Y. Idea discovery: A Scenario-Based Systematic Approach for Decision Making in Market Innovation. Expert. Syst. Appl. 2013, 40, 429–438. [Google Scholar] [CrossRef]
  45. Yang, S.; Sun, Q.; Zhou, H.; Gong, Z.; Zhou, Y.; Huang, J. A Topic Detection Method Based on KeyGraph and Community Partition. In Proceedings of the 2018 International Conference on Computing and Artificial Intelligence (ICCAI 2018), Chengdu, China, 12–14 May 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 30–34. [Google Scholar]
  46. Ohsawa, Y. KeyGraph as Risk Explorer in Earthquake–Sequence. J. Contingencies Crisis Manag. 2002, 10, 119–128. [Google Scholar] [CrossRef]
  47. Ohsawa, Y.; Benson, N.E.; Yachida, M. KeyGraph: Automatic indexing by co-occurrence graph based on building construction metaphor. In Proceedings of the IEEE International Forum on Research and Technology Advances in Digital Libraries, Santa Barbara, CA, USA, 22–24 April 1998; pp. 12–18. [Google Scholar]
  48. Wanchia, K.; Yufei, J.; Hsinchun, Y. Discovering Emerging Financial Technological Chances of Investment Management in China via Patent Data. Int. J. Bus. Econ. Aff. 2020, 5, 1–8. [Google Scholar] [CrossRef]
  49. Geum, Y.; Kim, M. How to Identify Promising Chances for Technological Innovation: Keygraph-Based Patent Analysis. Adv. Eng. Inform. 2020, 46, 101155. [Google Scholar] [CrossRef]
  50. Sakakibara, T.; Ohsawa, Y. Gradual-Increase Extraction of Target Baskets as Preprocess for Visualizing Simplified Scenario Maps by KeyGraph. Soft Comput. 2007, 11, 783–790. [Google Scholar] [CrossRef]
  51. Kim, K.-J.; Jung, M.-C.; Cho, S.-B. KeyGraph-Based Chance Discovery for Mobile Contents Management System. Int. J. Knowl. Based Intell. Eng. Syst. 2007, 11, 313–320. [Google Scholar] [CrossRef]
  52. Perera, K.; Karunarathne, D. KeyGraph and WordNet Hypernyms for Topic Detection. In Proceedings of the 2015 12th International Joint Conference on Computer Science and Software Engineering (JCSSE), Chonburi, Thailand, 22–24 July 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 303–308. [Google Scholar]
  53. Beliga, S.; Meštrović, A.; Martinčić-Ipšić, S. An overview of graph-based keyword extraction methods and approaches. J. Inf. Organ. Sci. 2015, 39, 1–20. [Google Scholar]
  54. Pan, R.C.; Hong, C.F.; Huang, N.; Hsu, F.C.; Wang, L.H.; Chi, T.H. One-Scan KeyGraph Implementation. In Proceedings of the 3rd Conference on Evolutionary Computation Applications & 2005 International Workshop on Chance Discovery, Taichung, Taiwan, 3 December 2005. [Google Scholar]
  55. Nezu, Y.; Miura, Y. Extracting Keywords on SNS by Successive KeyGraph. In Proceedings of the 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Baltimore, MD, USA, 9–11 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 997–1003. [Google Scholar]
  56. Manning, C.D.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008. [Google Scholar]
  57. Liu, B. Sentiment Analysis and Opinion Mining. Synth. Lect. Hum. Lang. Technol. 2012, 5, 1–167. [Google Scholar] [CrossRef]
  58. Jobin, A.; Ienca, M.; Vayena, E. The Global Landscape of AI Ethics Guidelines. Nat. Mach. Intell. 2019, 1, 389–399. [Google Scholar] [CrossRef]
  59. Mittelstadt, B.; Allo, P.; Taddeo, M.; Wachter, S.; Floridi, L. The Ethics of Algorithms: Mapping the Debate. Big Data Soc. 2016, 3, 1–21. [Google Scholar] [CrossRef]
  60. Dignum, V. Responsible Artificial Intelligence: How to Develop and Use AI in a Responsible Way. In Artificial Intelligence: Foundations, Theory, and Algorithms; Springer: Cham, Switzerland, 2019. [Google Scholar]
  61. Okazaki, N.; Ohsawa, Y. Polaris: An Integrated Data Miner for Chance Discovery. In Proceedings of the Third International Workshop on Chance Discovery and Its Management, Crete, Greece, 22–27 June 2003. [Google Scholar]
  62. Sayyadi, H.; Hurst, M.; Maykov, A. Event Detection and Tracking in Social Streams. Proc. Int. AAAI Conf. Web Soc. Media 2009, 3, 311–314. [Google Scholar] [CrossRef]
  63. Jo, Y.; Lagoze, C.; Giles, C.L. Detecting Research Topics via the Correlation between Graphs and Texts. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07), San Jose, CA, USA, 12–15 August 2007; ACM: New York, NY, USA, 2007; pp. 370–379. [Google Scholar]
  64. Lozano, S.; Calzada-Infante, L.; Adenso-Díaz, B.; García, S. Complex Network Analysis of Keywords Co-Occurrence in the Recent Efficiency Analysis Literature. Scientometrics 2019, 120, 609–629. [Google Scholar] [CrossRef]
  65. Zhou, Z.; Zou, X.; Lv, X.; Hu, J. Research on Weighted Complex Network Based Keywords Extraction. In Proceedings of the 7th International Conference on Advanced Data Mining and Applications (ADMA 2013), Wuhan, China, 14–16 May 2013; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2013; Volume 8229, pp. 442–452. [Google Scholar]
  66. Grimmer, J.; Stewart, B.M. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Polit. Anal. 2013, 21, 267–297. [Google Scholar] [CrossRef]
  67. Firoozeh, N.; Nazarenko, A.; Alizon, F.; Daille, B. Keyword extraction: Issues and methods. Nat. Lang. Eng. 2020, 26, 259–291. [Google Scholar] [CrossRef]
  68. De Graaf, R.; van der Vossen, R. Bits versus brains in content analysis. Comparing the advantages and disadvantages of manual and automated methods for content analysis. Communications 2013, 38, 433–443. [Google Scholar] [CrossRef]
  69. Lewis, S.C.; Zamith, R.; Hermida, A. Content analysis in an era of big data: A hybrid approach to computational and manual methods. J. Broadcast. Electron. Media 2013, 57, 34–52. [Google Scholar] [CrossRef]
  70. Feng, Y. Semantic Textual Similarity Analysis of Clinical Text in the Era of LLM. In Proceedings of the 2024 IEEE Conference on Artificial Intelligence, (CAI), Singapore, 22–24 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1284–1289. [Google Scholar]
  71. Papageorgiou, E.; Chronis, C.; Varlamis, I.; Himeur, Y. A Survey on the Use of Large Language Models (LLMs) in Fake News. Future Internet 2024, 16, 298. [Google Scholar] [CrossRef]
  72. Maktabdar Oghaz, M.; Babu Saheer, L.; Dhame, K.; Singaram, G. Detection and classification of ChatGPT-generated content using deep transformer models. Front. Artif. Intell. 2025, 8, 1458707. [Google Scholar] [CrossRef]
  73. Wu, J.; Yang, S.; Zhan, R.; Yuan, Y.; Chao, L.S.; Wong, D.F. A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions. Comput. Linguist. 2025, 51, 275–338. [Google Scholar] [CrossRef]
  74. Domínguez-Diaz, A.; Goyanes, M.; de-Marcos, L. Automating Content Analysis of Scientific Abstracts Using ChatGPT: A Methodological Protocol and Use Case. MethodsX 2025, 15, 103431. [Google Scholar] [CrossRef] [PubMed]
  75. Ma, X.; Zhang, Y.; Ding, K.; Yang, J.; Wu, J.; Fan, H. On Fake News Detection with LLM Enhanced Semantics Mining. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), Miami, FL, USA, 12–16 November 2024; Al-Onaizan, Y., Bansal, M., Chen, Y.-N., Eds.; Association for Computational Linguistics: Miami, FL, USA, 2024; pp. 508–521. [Google Scholar]
  76. Yang, X.; Li, Y.; Zhang, X.; Chen, H.; Cheng, W. Exploring the Limits of ChatGPT for Query or Aspect-Based Text Summarization. arXiv 2023, arXiv:2302.08081. [Google Scholar]
  77. Bang, Y.; Cahyawijaya, S.; Lee, N.; Dai, W.; Su, D.; Wilie, B.; Lovenia, H.; Ji, Z.; Yu, T.; Chung, W.; et al. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL 2023), Nusa Dua, Indonesia, 1–4 November 2023; Park, J.C., Arase, Y., Hu, B., Lu, W., Wijaya, D., Purwarianti, A., Krisnadhi, A.A., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 675–718. [Google Scholar]
  78. Cer, D.; Yang, Y.; Kong, S.-Y.; Hua, N.; Limtiaco, N.; St. John, R.; Constant, N.; Guajardo-Céspedes, M.; Yuan, S.; Tar, C.; et al. Universal sentence encoder. arXiv 2018, arXiv:1803.11175. [Google Scholar]
  79. Wang, W.; Bao, H.; Huang, S.; Dong, L.; Wei, F. MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 2140–2151. [Google Scholar]
  80. Singh, D.K. Unraveling Enterprise Large Language Model Platform—Cohere. Int. J. Sci. Res. Publ. 2025, 15, 219–223. [Google Scholar] [CrossRef]
  81. Cann, T.J.B.; Dennes, B.; Coan, T.; O’Neill, S.; Williams, H.T.P. Using Semantic Similarity to Measure the Echo of Strategic Communications. EPJ Data Sci. 2025, 14, 20. [Google Scholar] [CrossRef]
  82. Cer, D.; Yang, Y.; Kong, S.-Y.; Hua, N.; Limtiaco, N.; John, R.S.; Constant, N.; Guajardo-Céspedes, M.; Yuan, S.; Tar, C.; et al. Universal Sentence Encoder. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium, 31 October–4 November 2018; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 169–174. [Google Scholar]
Figure 1. Schematic diagram of the double helix model.
Figure 2. Illustration of the KeyGraph algorithm.
Figure 3. Research process flowchart.
Figure 4. Co-occurrence relationship mapping in the association graph.
Figure 5. KeyGraph co-occurrence network of AI ethics articles in 2022.
Figure 6. KeyGraph co-occurrence network of AI ethics articles in 2023.
Figure 7. KeyGraph co-occurrence network of AI ethics articles in 2024.
Figure 8. Semantic similarity between ChatGPT-generated and Gemini-generated AI ethics themes in 2022, evaluated using three independent sentence embedding models.
Figure 9. Semantic similarity between ChatGPT-generated and Gemini-generated AI ethics themes in 2023, evaluated using three independent sentence embedding models.
Figure 10. Semantic similarity between ChatGPT-generated and Gemini-generated AI ethics themes in 2024, evaluated using three independent sentence embedding models.
Table 1. Data sources for international reports and articles in 2022.
No. | Original Title | Data Sources | Publication Date
1 | The 2022 AI Index: Industrialization of AI and Mounting Ethical Concerns | Stanford HAI | March 2022
2 | AI Ethics And AI Law Grappling With Overlapping And Conflicting Ethical Factors Within AI | Forbes | November 2022
3 | The 2022 AI Index: AI's Ethical Growing Pains | Stanford HAI | March 2022
4 | Prioritising AI & Ethics: A perspective on change | Deloitte | May 2022
5 | Top Nine Ethical Issues In Artificial Intelligence | Forbes | October 2022
6 | AI Ethics And AI Law Are Moving Toward Standards That Explicitly Identify And Manage AI Biases | Forbes | October 2022
7 | Evaluating Ethical Challenges in AI and ML | ISACA Journal | July 2022
8 | We're failing at the ethics of AI. Here's how we make real impact | World Economic Forum (WEF) | January 2022
Table 2. Data sources for international reports and articles in 2023.
No. | Original Title | Data Sources | Publication Date
1 | The Ethics Of AI: Navigating Bias, Manipulation And Beyond | Forbes | June 2023
2 | The Ethics Of AI: Balancing Innovation And Responsibility | Forbes | December 2023
3 | AI Ethics In The Age Of ChatGPT—What Businesses Need To Know | Forbes | July 2023
4 | 96% Of People Consider Ethical And Responsible AI To Be Important | Forbes | April 2023
5 | How Businesses Can Ethically Embrace Artificial Intelligence | Forbes | May 2023
6 | Experts call for more diversity to combat bias in artificial intelligence | CNN | December 2023
7 | 5 AI Ethics Concerns the Experts Are Debating | Georgia Tech | August 2023
8 | Ethical Concerns Are Playing Catch-Up in Companies' AI Arms Race: Equality | Bloomberg | June 2023
Table 3. Data sources for international reports and articles in 2024.
No. | Original Title | Data Sources | Publication Date
1 | AI's Trust Problem | Harvard Business Review | May 2024
2 | 'Uncovered, unknown, and uncertain': Guiding ethics in the age of AI | Yale News | February 2024
3 | AI Regulation Is Evolving Globally and Businesses Need to Keep Up | Bloomberg Law | December 2024
4 | AI is not ready for primetime | CNN Business | March 2024
5 | With AI warning, Nobel winner joins ranks of laureates who've cautioned about the risks of their own work | CNN | October 2024
6 | Navigating The Ethics Of AI: Is It Fair And Responsible Enough To Use? | Forbes | November 2024
7 | AI And Ethics: A Collective Responsibility For A Safer Future | Forbes | October 2024
8 | AI Started as a Dream to Save Humanity. Then, Big Tech Took Over. | Bloomberg | September 2024
Table 4. Cross-model similarity (mean ± SD) for each cluster, where each cluster represents an identified topic. Raw similarity scores per model are reported in Figure 8, Figure 9 and Figure 10.
Year | Cluster | Cross-Model Similarity (Mean ± SD)
2022 | A-1 | 0.840 ± 0.054
2022 | A-2 | 0.791 ± 0.022
2022 | A-3 | 0.829 ± 0.034
2022 | A-4 | 0.813 ± 0.014
2022 | A-2 + A-3 | 0.801 ± 0.019
2022 | A-2 + A-4 | 0.765 ± 0.078
2022 | A-2 + A-3 + A-4 | 0.819 ± 0.045
2023 | B-1 | 0.818 ± 0.014
2023 | B-2 | 0.829 ± 0.027
2023 | B-3 | 0.808 ± 0.037
2023 | B-1 + B-2 | 0.824 ± 0.069
2023 | B-2 + B-3 | 0.797 ± 0.050
2023 | B-1 + B-2 + B-3 | 0.796 ± 0.033
2024 | C-1 | 0.807 ± 0.006
2024 | C-2 | 0.843 ± 0.025
2024 | C-1 + C-2 | 0.833 ± 0.056
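As a point of reference for Table 4, the sketch below illustrates one plausible way to obtain a cross-model similarity score as the mean ± SD of similarities between a ChatGPT-generated and a Gemini-generated theme description, each computed by an independent sentence embedding model. It assumes cosine similarity between sentence embeddings; the theme strings and sentence-transformers checkpoints named below are placeholders for illustration, not the study's exact configuration.

# Illustrative sketch: cross-model similarity as mean +/- SD of cosine
# similarities from several embedding models (placeholder checkpoints/texts).
import numpy as np
from sentence_transformers import SentenceTransformer

chatgpt_theme = "Governance of generative AI, institutional trust, and accountability."  # placeholder
gemini_theme = "Regulating generative AI to sustain societal trust and accountability."  # placeholder

model_names = ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]  # hypothetical embedding models

scores = []
for name in model_names:
    model = SentenceTransformer(name)
    a, b = model.encode([chatgpt_theme, gemini_theme], normalize_embeddings=True)
    scores.append(float(np.dot(a, b)))  # cosine similarity of unit-normalized vectors

print(f"cross-model similarity: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")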
