Article

Examining the Nature and Dimensions of Artificial Intelligence Incidents: A Machine Learning Text Analytics Approach

Gabelli School of Business, Fordham University, 140 W. 62nd Street, New York, NY 10023, USA
*
Author to whom correspondence should be addressed.
AppliedMath 2026, 6(1), 11; https://doi.org/10.3390/appliedmath6010011
Submission received: 14 November 2025 / Revised: 23 December 2025 / Accepted: 24 December 2025 / Published: 9 January 2026
(This article belongs to the Section Computational and Numerical Mathematics)

Abstract

As artificial intelligence systems proliferate across critical societal domains, understanding the nature, patterns, and evolution of AI-related harms has become essential for effective governance. Despite growing incident repositories, systematic computational analysis of AI incident discourse remains limited, with prior research constrained by small samples, single-method approaches, and absence of temporal analysis spanning major capability advances. This study addresses these gaps through a comprehensive multi-method text analysis of 3494 AI incident records from the OECD AI Policy Observatory, spanning January 2014 through October 2024. Six complementary analytical approaches were applied: Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) topic modeling to discover thematic structures; K-Means and BERTopic clustering for pattern identification; VADER sentiment analysis for emotional framing assessment; and LIWC psycholinguistic profiling for cognitive and communicative dimension analysis. Cross-method comparison quantified categorization robustness across all four clustering and topic modeling approaches. Key findings reveal dramatic temporal shifts and systematic risk patterns. Incident reporting increased 4.6-fold following ChatGPT’s November 2022 release (from 12.0 to 95.9 monthly incidents), accompanied by vocabulary transformation from embodied AI terminology (facial recognition, autonomous vehicles) toward generative AI discourse (ChatGPT, hallucination, jailbreak). Six robust thematic categories emerged consistently across methods: autonomous vehicles (84–89% cross-method alignment), facial recognition (66–68%), deepfakes, ChatGPT/generative AI, social media platforms, and algorithmic bias. Risk concentration is pronounced: 49.7% of incidents fall within two harm categories (system safety 29.1%, physical harms 20.6%); private sector actors account for 70.3%; and 48% occur in the United States. Sentiment analysis reveals physical safety incidents receive notably negative framing (autonomous vehicles: −0.077; child safety: −0.326), while policy and generative AI coverage trend positive (+0.586 to +0.633). These findings have direct governance implications. The thematic concentration supports sector-specific regulatory frameworks—mandatory audit trails for hiring algorithms, simulation testing for autonomous vehicles, transparency requirements for recommender systems, accuracy standards for facial recognition, and output labeling for generative AI. Cross-method validation demonstrates which incident categories are robust enough for standardized regulatory classification versus those requiring context-dependent treatment. The rapid emergence of generative AI incidents underscores the need for governance mechanisms responsive to capability advances within months rather than years.

1. Introduction

The accelerated deployment of artificial intelligence systems across societal domains has generated unprecedented opportunities for innovation alongside substantial risks of harmful outcomes. The public release of ChatGPT in November 2022 catalyzed widespread AI adoption and intensified scrutiny of AI-related incidents, with media coverage reaching historic peaks [1]. This surge has prompted systematic efforts to document and analyze AI-related harms [2,3]. However, systematic understanding of AI incident patterns, characteristics, and evolution remains limited despite growing recognition of AI safety as a critical research priority.
AI incidents—defined by the OECD as events where AI system development, use, or malfunction resulted in or had the potential to result in harm to individuals, communities, or critical infrastructure—provide empirical evidence of AI risk manifestation [4]. Multiple repositories have emerged to collect such incidents, including the AI Incident Database [5,6] and the AIAAIC Repository [7], enabling cross-sectional analysis of AI harms. These incidents span diverse domains including healthcare misdiagnosis, discriminatory hiring algorithms [8,9,10], autonomous vehicle collisions [11,12], surveillance system errors, deepfake manipulation, content moderation failures, and biased criminal risk assessment tools. Understanding incident patterns is essential for developing effective governance frameworks, risk mitigation strategies, and accountability mechanisms.
Prior research on AI incidents has been constrained by four key limitations, as documented in recent systematic reviews of AI safety and incident documentation literature [2,3,13]. First, existing studies typically rely on small-scale case analyses or curated samples of high-profile incidents, limiting generalizability across incident types and domains. Second, qualitative approaches predominate, with few studies employing computational text analysis methods capable of processing large incident corpora systematically. Third, research has largely examined incidents within isolated domains (e.g., autonomous vehicles, facial recognition) rather than enabling cross-domain comparison of harm patterns, actor involvement, and linguistic framing. Fourth, the rapid evolution of AI capabilities, particularly following ChatGPT’s release, necessitates temporal analysis that existing cross-sectional studies cannot provide.
This study addresses these gaps through a comprehensive, multi-method computational analysis of 3494 AI incidents documented in the OECD AI Incidents Monitor. The research makes three primary contributions. First, we provide the first large-scale text analytics characterization of AI incident discourse, revealing dominant themes, linguistic patterns, and sentiment dimensions across the full spectrum of documented incidents. Second, we conduct systematic temporal analysis comparing pre- and post-ChatGPT periods to assess whether this inflection point correlates with changes in incident frequency, thematic composition, or emotional framing. Third, we demonstrate a replicable methodological framework integrating complementary computational approaches—topic modeling, clustering, sentiment analysis, and psycholinguistic profiling—that advances the methodological toolkit for AI safety research.
Our analytical approach combines multiple computational techniques including topic modeling for thematic discovery, sentiment and psycholinguistic analysis for emotional tone assessment, and clustering methods for pattern identification. This multi-method synthesis enables triangulation of findings while revealing dimensions of incident discourse not accessible through any single approach. The OECD AI Incidents Monitor, as a comprehensive and internationally curated repository, provides an ideal foundation for this systematic analysis.
The remainder of this paper is organized as follows: Section 2 reviews relevant literature on AI incidents, risk taxonomies, and computational text analysis methodologies. Section 3 details data collection procedures and analytical methodology. Section 4 presents comprehensive results including descriptive analysis, topic modeling, temporal comparison, sentiment analysis, and clustering findings. Section 5 discusses implications for AI governance frameworks. Section 6 acknowledges limitations and future directions. Section 7 concludes with synthesis of key findings and policy recommendations.

2. Literature Review

2.1. Identifying Key AI Issues and Risks

Recent scholarship has increasingly focused on systematically categorizing AI risks and documenting real-world harm. Brundage et al. established foundational frameworks for anticipating malicious AI applications, identifying potential misuse across digital security, physical security, and political security domains [14]. Their taxonomy distinguished deliberate weaponization from negligent deployment, providing conceptual structure for subsequent empirical investigations. Recent scholarship has expanded these frameworks to address emerging risks from large language models and generative AI systems [15,16].
Facial recognition technologies have emerged as particularly contentious AI applications generating substantial incident documentation. Buolamwini and Gebru’s Gender Shades research exposed systematic accuracy disparities across demographic groups, with error rates for darker-skinned females reaching 34.7% compared to 0.8% for lighter-skinned males [17]. These findings catalyzed broader investigations into algorithmic bias and fairness, raising questions about incentive structures driving AI development and the persistence of biases despite vendor awareness.
Large language models represent an emerging category of AI systems generating novel incident types. Bommasani et al. characterized foundation models as systems trained on broad data capable of adaptation across tasks, highlighting risks including training data biases, environmental costs, concentration of power, and emergent capabilities [18]. Weidinger et al. provided a comprehensive taxonomy of language model harms spanning discrimination, information hazards, misinformation, malicious use, human-computer interaction harms, and automation impacts [19]. The rapid deployment of ChatGPT and similar systems has intensified concerns about hallucination, bias amplification, and misuse potential [15,16].
Social media platforms constitute another critical nexus for AI incidents. Gorwa et al. examined algorithmic content moderation challenges, documenting how automated systems struggle with context-dependent speech, generating both over-censorship and under-enforcement [20]. The opacity of platform algorithms combined with massive scale creates conditions for systematic errors affecting millions while remaining largely invisible to oversight mechanisms. Deepfake proliferation has further complicated content authenticity assessment, with detection methods struggling to keep pace with generation capabilities [21,22]. While these studies provide valuable domain-specific insights, they examine AI risks within isolated application areas rather than across the full spectrum of AI incidents. This siloed approach limits understanding of cross-cutting patterns, shared risk factors, and comparative severity across domains.

2.2. Machine Learning and Text Analysis Methodologies

Computational text analysis has evolved substantially with deep learning advances. Topic modeling through Latent Dirichlet Allocation remains foundational for unsupervised thematic discovery [23]. LDA models documents as mixtures of topics and topics as probability distributions over words, enabling researchers to uncover latent semantic structures in large text corpora without requiring pre-labeled training data. Recent advances in neural topic models have enhanced scalability and contextual understanding [24,25]. Non-negative Matrix Factorization provides an alternative approach producing parts-based representations with high interpretability [26].
Sentiment analysis techniques quantify emotional valence and psychological dimensions in text. VADER (Valence Aware Dictionary and sEntiment Reasoner) provides lexicon-based sentiment scoring specifically calibrated for social media text, accounting for intensifiers, negations, and punctuation effects [27]. LIWC (Linguistic Inquiry and Word Count) measures psychological and linguistic dimensions including analytical thinking, emotional tone, authenticity, and clout, revealing cognitive and affective aspects of communication [28].
Clustering methods including K-Means [29] and density-based approaches such as HDBSCAN [30] enable unsupervised pattern discovery in high-dimensional text representations. Recent advances in contextual embeddings and transformer architectures have expanded possibilities for semantic analysis, though traditional TF-IDF representations remain effective for interpretable clustering tasks. These methods are typically applied individually in prior AI safety research. Studies employing topic modeling rarely integrate sentiment analysis; clustering studies seldom incorporate psycholinguistic profiling. Furthermore, most applications involve small, manually curated samples rather than comprehensive incident databases. No prior study has systematically compared multiple clustering and topic modeling approaches on the same large-scale AI incident corpus to assess methodological convergence.

2.3. AI Incident Documentation Efforts

Systematic AI incident collection has gained traction as recognition of AI safety importance grows. McGregor et al. established the AI Incident Database as a community-driven repository documenting real-world AI system failures [5]. Their framework distinguishes incidents from near-misses, accidents from intentional harms, and technical failures from governance inadequacies. The database employs crowdsourcing and expert curation to maintain comprehensive coverage while ensuring quality standards. The AI Incident Database has become a foundational resource for researchers, policymakers, and practitioners seeking to understand patterns in AI failures and develop preventive measures [6,31]. Recent work has proposed standardized taxonomies and schemas for incident classification to enable cross-database comparability [7,32].
The OECD AI Incidents Monitor represents an intergovernmental effort to track and analyze AI-related incidents globally [4]. Drawing on media reports, academic publications, and official documents, the monitor provides structured data on incident characteristics including affected domains, stakeholders, geographic locations, and temporal distributions. The OECD framework defines incidents broadly to encompass not only realized harms but also potential risks and near-misses, enabling proactive risk assessment. This inclusive definition captures a wider range of AI-related events than narrower definitions focused solely on realized harms, providing richer data for pattern analysis and early warning identification. The emerging regulatory landscape, including the EU AI Act and proposed incident reporting requirements, underscores the policy relevance of systematic incident documentation [33].
While these repositories provide valuable data infrastructure, published analyses remain primarily descriptive—reporting counts, categories, and trends without applying computational text analysis to the incident narratives themselves. The rich textual content describing incident circumstances, actors, and harms remains largely unexplored through systematic linguistic and thematic analysis. This gap is particularly significant given the nuanced information contained in incident descriptions that categorical metadata alone cannot capture. Natural language processing techniques offer the potential to extract insights about incident framing, causal attributions, affected populations, and response patterns that structured fields do not record.

2.4. Research Gaps and Study Contributions

The preceding review reveals four specific gaps in existing research that this study addresses. The first gap concerns the absence of large-scale computational text analysis. Prior incident research relies predominantly on manual coding, categorical analysis, or small-sample qualitative methods. While databases like the OECD monitor contain thousands of incidents with detailed textual descriptions, no published study has applied systematic computational text analysis to characterize linguistic patterns, thematic structures, and sentiment dimensions at scale. This study analyzes 3494 incidents—substantially exceeding prior sample sizes—using automated methods that enable comprehensive corpus-level analysis.
The second gap involves the lack of cross-domain comparative analysis. Existing taxonomies classify AI risks within specific domains (autonomous vehicles, facial recognition, content moderation) but do not empirically compare patterns across domains within a unified analytical framework. This study enables direct comparison of thematic content, sentiment profiles, and linguistic characteristics across all incident types, revealing both domain-specific signatures and cross-cutting patterns invisible to siloed approaches.
The third gap relates to limited temporal analysis spanning major AI inflection points. ChatGPT’s November 2022 release represents a watershed moment in AI adoption and public attention, yet no systematic analysis examines how incident patterns changed across this inflection point. This study provides the first rigorous pre/post comparison examining shifts in incident frequency, thematic composition, vocabulary, and sentiment framing.
The fourth gap addresses the absence of multi-method validation of incident categorization. Prior clustering and topic modeling studies of AI-related text employ single methods without assessing robustness across approaches. This study applies four complementary methods (K-Means, BERTopic, LDA, NMF) to the same corpus, enabling assessment of which categories emerge consistently versus which are method-dependent artifacts. Cross-method alignment ranging from 66% to 96% provides empirical validation unprecedented in AI incident research.
Beyond applying established techniques, this study contributes methodologically through the novel combination of topic modeling, clustering, sentiment analysis, and psycholinguistic profiling applied to the same incident corpus, enabling triangulation across linguistic dimensions; systematic cross-method comparison quantifying alignment between clustering and topic approaches; and demonstration of a replicable analytical pipeline for ongoing AI incident monitoring as databases continue to grow.

3. Data and Methods

3.1. Data Collection

Data were collected from the OECD AI Incidents Monitor database accessed at https://oecd.ai/en/incidents (accessed on 20 December 2025) [4]. The monitor aggregates AI-related incidents from multiple sources including news media, academic publications, government reports, and civil society organizations. The OECD defines an AI incident as an event, circumstance or series of events where the development, use or malfunction of one or more AI systems directly or indirectly contributed to one of the following harms: injury or harm to the health of a person or groups of people, disruption of the management and operation of critical infrastructure, violations of human rights or a breach of obligations under applicable law, or harm to property, communities or the environment.
Web scraping was performed in October 2024 using Python libraries including BeautifulSoup and Selenium to extract incident records systematically. Each incident record contains structured fields including incident title, detailed summary description, date of occurrence, affected location, involved organizations, stakeholder categories, application domain, and incident severity classification. The final dataset comprised 3494 incidents spanning January 2014 through October 2024, providing a comprehensive longitudinal perspective on AI incident evolution.
To enable temporal comparison, we used 30 November 2022—ChatGPT’s public release date—as the dividing point. The pre-ChatGPT period encompassed 1288 incidents (36.9%) across 107 months, while the post-ChatGPT period contained 2206 incidents (63.1%) across only 23 months. This stark disparity—a 4.6-fold increase in monthly incident rate—enables assessment of whether the ChatGPT release correlates with changes in incident reporting, characteristics, or patterns. Table 1 summarizes the data description.

3.2. Text Preprocessing

The incident title, summary, and concepts fields were concatenated to form the primary text corpus for analysis. Text preprocessing followed standard natural language processing protocols adapted for incident data [24]. All text was converted to lowercase to standardize lexical forms. URLs, email addresses, and special characters were removed as they provide minimal semantic value for thematic analysis. Numbers were retained when part of substantive phrases, but isolated digits were removed.
Tokenization segmented text into individual words using NLTK’s word tokenizer. Stopword removal eliminated common function words including articles, prepositions, and auxiliary verbs that carry minimal content information. We employed the standard NLTK English stopword list, augmented with domain-specific high-frequency terms that appeared ubiquitously but provided limited discriminative power for thematic analysis. Lemmatization reduced words to their dictionary base forms using WordNetLemmatizer, grouping morphological variants (e.g., discriminate, discriminating, discriminatory → discriminate). The final TF-IDF document-term matrix comprised 3494 documents × approximately 3000 terms (with min_df = 10, max_df = 0.95 filtering). All preprocessing and analyses used random seed 42 for reproducibility.
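For illustration, a minimal Python sketch of this preprocessing pipeline is shown below. Function and variable names are illustrative rather than drawn from the original implementation, and the min_df/max_df filters assume the full 3494-incident corpus.

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

LEMMATIZER = WordNetLemmatizer()
STOPWORDS = set(stopwords.words("english"))  # extended with domain-specific terms in the study

def preprocess(text):
    text = text.lower()
    text = re.sub(r"http\S+|\S+@\S+", " ", text)   # remove URLs and e-mail addresses
    text = re.sub(r"[^a-z\s]", " ", text)          # keep letters only (a simplification; the study retained numbers in substantive phrases)
    tokens = word_tokenize(text)
    tokens = [LEMMATIZER.lemmatize(t) for t in tokens if t not in STOPWORDS and len(t) > 2]
    return " ".join(tokens)

def build_tfidf(raw_docs):
    """raw_docs: concatenated title + summary + concepts string per incident."""
    cleaned = [preprocess(d) for d in raw_docs]
    vectorizer = TfidfVectorizer(min_df=10, max_df=0.95)  # roughly 3000 terms on the full corpus
    return vectorizer.fit_transform(cleaned), vectorizer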

3.3. Rationale for Multi-Method Approach

This study employs multiple complementary analytical methods rather than a single technique. This design choice addresses three considerations: different methods reveal different dimensions of the data, as established in comparative text analysis research [24,25]; convergent findings across methods provide stronger evidence than single-method results; and comparing methods enables assessment of which patterns are robust versus method-dependent artifacts.
The choice of both VADER and LIWC for sentiment and psycholinguistic analysis reflects their complementary capabilities. VADER captures evaluative sentiment (positive/negative valence) optimized for news and social media text. LIWC provides orthogonal dimensions including analytical thinking (formal vs. narrative style), clout (confidence and social status), and authenticity (personal vs. guarded). An incident described with high negative sentiment (VADER) might simultaneously exhibit high analytical thinking (LIWC) if written in formal, logical prose. These complementary measures reveal how incidents are both evaluated and cognitively processed.
The combination of clustering (K-Means) and topic modeling (LDA, NMF) serves different analytical purposes. K-Means produces hard cluster assignments where each document belongs to exactly one cluster—useful for creating discrete taxonomies and policy-relevant categorizations. LDA produces soft topic mixtures where documents can exhibit multiple topics—useful for understanding thematic complexity. NMF provides an intermediate approach with cleaner topic boundaries. Comparing results across methods (66–96% alignment) validates which categories are robust empirical phenomena versus artifacts of specific algorithms. Table 2 below summarizes the various analytical methods.

3.4. Topic Modeling

Latent Dirichlet Allocation was implemented using scikit-learn to discover latent thematic structures [23]. LDA models each incident as a mixture of topics and each topic as a probability distribution over words. Model parameters were optimized through systematic evaluation: the number of topics varied from 5 to 25, with model quality assessed using perplexity and coherence scores. Topic coherence scores (C_v) were computed for each candidate model: k = 5 (C_v = 0.38), k = 10 (C_v = 0.41), k = 15 (C_v = 0.44), k = 20 (C_v = 0.42), and k = 25 (C_v = 0.39). The 15-topic model achieved the highest coherence score while maintaining interpretable topic granularity, balancing statistical fit with semantic coherence. Online variational Bayes inference was used with 50 iterations.
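A minimal scikit-learn sketch of the LDA configuration described above follows. It assumes a count-based document-term matrix (the conventional input for LDA) and omits the C_v coherence computation, which is typically performed with a separate library such as gensim.

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def fit_lda(cleaned_docs, n_topics=15):
    counts = CountVectorizer(min_df=10, max_df=0.95)   # LDA operates on raw term counts
    X = counts.fit_transform(cleaned_docs)
    lda = LatentDirichletAllocation(
        n_components=n_topics,
        learning_method="online",                      # online variational Bayes
        max_iter=50,
        random_state=42,
    )
    doc_topic = lda.fit_transform(X)                   # document-topic mixtures (theta)
    top_terms = [
        [counts.get_feature_names_out()[i] for i in comp.argsort()[-10:][::-1]]
        for comp in lda.components_                    # ten highest-weight words per topic
    ]
    return lda, doc_topic, top_terms                   # lda.perplexity(X) gives the perplexity used for model comparison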
Non-negative Matrix Factorization decomposes the TF-IDF document-term matrix V into non-negative document-topic (W) and topic-term (H) matrices such that V ≈ W × H [26]. This linear algebraic approach produces parts-based representations where topics represent additive combinations of words, yielding high interpretability. Models were evaluated using reconstruction error across 5–25 topics, with 15 topics selected to match LDA for comparability. NNDSVD initialization and coordinate descent optimization were used with 500 maximum iterations.
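A corresponding sketch of the NMF decomposition, using the reported initialization, solver, and iteration settings, is shown below (names are illustrative).

from sklearn.decomposition import NMF

def fit_nmf(X_tfidf, n_topics=15):
    nmf = NMF(
        n_components=n_topics,
        init="nndsvd",        # NNDSVD initialization
        solver="cd",          # coordinate descent optimization
        max_iter=500,
        random_state=42,
    )
    W = nmf.fit_transform(X_tfidf)          # document-topic weight matrix
    H = nmf.components_                     # topic-term weight matrix
    return W, H, nmf.reconstruction_err_    # reconstruction error used for model comparison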

3.5. Clustering Methods

K-Means clustering was applied to TF-IDF vectors reduced to 100 dimensions via Truncated SVD [34]. The optimal number of clusters (k = 3) was selected using silhouette score analysis across k = 2 to 10. Silhouette scores measure cluster cohesion and separation, with k = 3 achieving a local optimum (score = 0.078). Clusters were characterized by their top TF-IDF terms and representative incidents.
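The clustering pipeline can be sketched as follows; this is a simplified illustration in which the silhouette sweep mirrors the k = 2 to 10 search reported above.

from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def kmeans_clusters(X_tfidf, k_values=range(2, 11)):
    X_red = TruncatedSVD(n_components=100, random_state=42).fit_transform(X_tfidf)
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_red)
        scores[k] = silhouette_score(X_red, labels)   # measures cohesion versus separation
    # the paper selects k = 3, a local optimum with silhouette score 0.078
    final_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_red)
    return final_labels, scores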
BERTopic-style analysis combined TF-IDF vectorization, SVD dimensionality reduction, UMAP projection to five dimensions (n_neighbors = 15, min_dist = 0.1), and HDBSCAN density-based clustering (min_cluster_size = 15) [35]. This approach automatically determined the number of topics (25) and identified outlier documents (377, 10.8%) that did not fit any cluster. Topics were characterized using class-based TF-IDF (c-TF-IDF) to identify discriminative terms.
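A sketch of this pipeline using the umap-learn and hdbscan packages is given below; the c-TF-IDF topic labeling step is omitted for brevity, and parameter values follow those reported above.

import umap
import hdbscan
from sklearn.decomposition import TruncatedSVD

def bertopic_style_clusters(X_tfidf):
    X_svd = TruncatedSVD(n_components=100, random_state=42).fit_transform(X_tfidf)
    X_umap = umap.UMAP(n_components=5, n_neighbors=15, min_dist=0.1,
                       random_state=42).fit_transform(X_svd)
    labels = hdbscan.HDBSCAN(min_cluster_size=15).fit_predict(X_umap)
    return labels    # label -1 marks outlier documents not assigned to any cluster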

3.6. Sentiment and Psycholinguistic Analysis

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analyzer specifically designed for social media and news text [27]. VADER generates compound sentiment scores ranging from −1 (extremely negative) to +1 (extremely positive). The tool accounts for sentiment-relevant linguistic features including punctuation intensification, capitalization emphasis, degree modifiers, negations, and contrastive conjunctions. Documents were classified as positive (compound ≥ 0.05), negative (compound ≤ −0.05), or neutral.
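The scoring and classification rule can be illustrated with a short sketch using NLTK's VADER implementation; the original analysis may instead have used the standalone vaderSentiment package, which exposes the same compound score.

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
ANALYZER = SentimentIntensityAnalyzer()

def classify_sentiment(text):
    compound = ANALYZER.polarity_scores(text)["compound"]   # ranges from -1 to +1
    if compound >= 0.05:
        return compound, "positive"
    if compound <= -0.05:
        return compound, "negative"
    return compound, "neutral"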
LIWC-22 (Linguistic Inquiry and Word Count) analysis provided psychological and linguistic dimensions beyond valence [28]. LIWC quantifies multiple text characteristics including: Analytical Thinking (0–100) measuring formal, logical reasoning versus narrative style; Clout (0–100) assessing confidence and social authority; Authenticity (0–100) capturing personal disclosure versus guarded formality; and Emotional Tone (0–100) quantifying positive versus negative emotional expression. LIWC profiles were computed for each harm category and actor type.

3.7. Mathematical Foundations

This section presents the mathematical formulations underlying the analytical methods employed, providing theoretical grounding for the multi-method approach. Table 3 below compares the mathematical basis of the various methods employed.
Latent Dirichlet Allocation models documents as mixtures of latent topics, where each topic is a probability distribution over words. The generative process assumes the following: first, for each topic k, draw word distribution φ_k ~ Dirichlet(β); second, for each document d, draw topic distribution θ_d ~ Dirichlet(α); third, for each word position, draw topic assignment z ~ Multinomial(θ_d), then draw word w ~ Multinomial(φ_z). The joint probability is: P(W, Z, Θ, Φ|α, β) = Π_k P(φ_k|β) × Π_d [P(θ_d|α) × Π_n P(z_{d,n}|θ_d) P(w_{d,n}|φ_{z_{d,n}})]. Inference uses variational Bayes, and model selection employs perplexity.
Non-negative Matrix Factorization decomposes the document-term matrix X ∈ ℝ^{M × V} into two non-negative matrices: X ≈ WH, where W ∈ ℝ^{M × K} contains document-topic weights and H ∈ ℝ^{K × V} contains topic-term weights. The optimization minimizes the Frobenius norm: min_{W, H ≥ 0} ||X − WH||²_F. Unlike LDA, NMF produces parts-based, additive representations without probabilistic interpretation. K-Means partitions n observations into K clusters by minimizing within-cluster sum of squares: argmin_S Σ_{k = 1}^K Σ_{x ∈ S_k} ||x − μ_k||². Cluster quality is assessed using silhouette score: s(i) = (b(i) − a(i))/max{a(i), b(i)}.

4. Results

4.1. Descriptive Analysis

4.1.1. Temporal Trends

AI incident reports increased dramatically from 2015 to 2023, with a notable spike in 2023 coinciding with ChatGPT’s public release in November 2022. The database shows exponential growth in documented AI incidents, rising from just 28 incidents in 2015 to 892 in 2023. The year 2024 shows 198 incidents through October, suggesting continued high reporting rates; because data collection ended in October 2024, full-year figures for 2024 were unavailable. This temporal pattern demonstrates both the acceleration of AI deployment and the heightened attention to AI-related harms following major capability advances. Table 4 below provides the count of incidents by year; for 2019, the count covers only the six months for which incidents were available.

4.1.2. Harm-Type Distribution

The most common harm categories are “AI system safety and reliability” (1017 incidents, 29.1%) and “physical and psychological harms” (721 incidents, 20.6%), followed by privacy and data protection (534, 15.3%), discrimination and bias (456, 13.1%), misinformation and manipulation (398, 11.4%), and other harms (368, 10.5%). The concentration of nearly half of all incidents in just two categories suggests that system reliability failures and direct physical/psychological impacts represent the most frequently documented AI risks, though this may partly reflect reporting biases toward more visible and newsworthy incidents. These are shown in Figure 1 above. Table 5 below summarizes the number of incidents by harm category, while Figure 2 shows the distribution of harms pre- and post-ChatGPT.
In Figure 2, the pre-ChatGPT and post-ChatGPT periods are compared using box plots. The orange horizontal line inside each box marks the median, i.e., the middle value of the sentiment scores for that period; in both box plots the median sits at approximately 0 (neutral). The key takeaway from Figure 2 is that the sentiment distribution of AI incidents is quite similar between the pre-ChatGPT and post-ChatGPT periods, with both showing a median near zero and a cluster of negative outliers (points below −0.05), indicating some incidents with notably negative sentiment framing.
Figure 3 displays harm category trends over time. As seen, privacy and surveillance incidents dominate the categories.

4.1.3. Actor Analysis

Private sector actors dominate the incident database, accounting for 2456 incidents (70.3%). Government and public sector actors appear in 534 incidents (15.3%), followed by research and academic actors (287, 8.2%), individual actors (156, 4.5%), and other or unknown actors (61, 1.7%). Among private sector actors, technology companies appear most frequently, with Google, Meta/Facebook, Tesla, Amazon, Microsoft, and OpenAI being the most mentioned organizations. This concentration reflects both the dominance of major technology companies in AI deployment and potential reporting biases favoring high-profile corporate actors whose activities attract media attention. Table 6 below presents the count and percentage of incidents by actor type, Figure 4 shows the distribution of incidents by actor type, and Figure 5 illustrates actor type trends over time.

4.1.4. Geographic Signals

The United States leads geographically with 1678 incidents (48.0%), followed by China (423, 12.1%), the United Kingdom (312, 8.9%), the European Union (287, 8.2%), and Australia (134, 3.8%). The remaining 660 incidents (18.9%) are distributed across other countries and regions. This geographic distribution reflects both the concentration of AI development in North America and Western Europe and the English-language bias of the OECD database sources. The relative absence of documented incidents from Africa, South America, and Central Asia likely reflects reporting infrastructure limitations rather than true incident absence. Table 7 below lists the top countries and regions, and Figure 6 shows the geographic distribution.

4.1.5. Vocabulary Shifts Across the ChatGPT Divide

Vocabulary analysis reveals structural transformation in AI incident discourse following ChatGPT’s release. Pre-ChatGPT vocabulary emphasizes terms related to embodied AI and surveillance technologies: ‘facial,’ ‘recognition,’ ‘autonomous,’ ‘self-driving,’ ‘police,’ ‘surveillance,’ ‘bias,’ and ‘discrimination’ dominate the pre-2022 corpus. Post-ChatGPT vocabulary shows a dramatic shift toward generative AI terminology: ‘chatgpt,’ ‘openai,’ ‘generative,’ ‘llm,’ ‘hallucination,’ ‘prompt,’ ‘jailbreak,’ and ‘misinformation’ emerge as distinctive post-2022 terms. Log–odds ratio analysis identifies the most distinctive vocabulary for each period, highlighting the structural shift in AI incident discourse. Figure 7, Figure 8 and Figure 9 below present word clouds illustrating these vocabulary patterns.
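The log–odds comparison can be sketched as follows; the add-one smoothing shown here is an assumption, since the exact smoothing scheme is not specified above.

import numpy as np
from collections import Counter

def distinctive_terms(pre_tokens, post_tokens, top_n=20):
    pre, post = Counter(pre_tokens), Counter(post_tokens)
    vocab = set(pre) | set(post)
    n_pre = sum(pre.values()) + len(vocab)
    n_post = sum(post.values()) + len(vocab)
    scores = {}
    for w in vocab:
        p_pre = (pre[w] + 1) / n_pre          # smoothed probability in the pre-ChatGPT corpus
        p_post = (post[w] + 1) / n_post       # smoothed probability in the post-ChatGPT corpus
        scores[w] = np.log(p_post / (1 - p_post)) - np.log(p_pre / (1 - p_pre))
    ranked = sorted(scores, key=scores.get)
    return ranked[:top_n], ranked[-top_n:]    # most distinctively pre- and post-period terms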

4.2. Clustering Results

4.2.1. K-Means Clustering

K-Means clustering with k = 3 was selected based on silhouette score analysis, which showed k = 3 as a local optimum with a score of 0.078. The analysis identified three distinct clusters that represent different segments of the AI incident landscape. Cluster 1, labeled “General AI & Platform Risks,” contains 2808 incidents (80.4%) and serves as a catch-all category for diverse AI-related incidents involving major technology platforms. The top terms characterizing this cluster include ai, google, data, facebook, users, amazon, company, technology, apple, intelligence, algorithm, software, twitter, microsoft, and privacy.
Cluster 2, labeled “Autonomous Vehicles,” contains 205 incidents (5.9%) and represents a coherent category focused on self-driving technology. The top terms include tesla, car, driving, self, autopilot, driver, vehicle, musk, crash, elon, cars, autonomous, safety, vehicles, and road. The tight semantic coherence of this cluster reflects the distinctive vocabulary and incident patterns associated with autonomous vehicle technology. Cluster 3, labeled “Facial Recognition & Surveillance,” contains 481 incidents (13.8%) and captures incidents related to biometric identification and surveillance systems. The top terms include facial, recognition, police, surveillance, technology, law, privacy, clearview, enforcement, use, software, cameras, china, rights, and security. Table 8 below summarizes the K-Means cluster characteristics, and Figure 10 and Figure 11 present the silhouette score analysis and cluster size distribution.

4.2.2. BERTopic Clustering

BERTopic automatically identified 25 topics plus 377 outliers (10.8%), providing substantially finer granularity than K-Means. This approach distinguishes subtopics that K-Means groups together, such as separating Clearview AI incidents from general facial recognition, YouTube child safety from general content moderation, and EU regulation from US government policy. The largest BERTopic topics include Facial Recognition (362 incidents), Facebook/Social Media (324), AI Bias (255), Machine Learning Research (206), ChatGPT (200), Self-Driving/Tesla (198), Deepfake Pornography (185), Voice Assistants (118), Child Safety (102), and US Government AI (94). The BERTopic approach’s identification of outliers represents an advantage over forced-assignment methods, acknowledging that some incidents do not fit neatly into any coherent category. Table 9 below presents the top BERTopic topics, and Figure 12 shows the topic distribution.

4.3. Topic Modeling Results

4.3.1. LDA Topic Modeling

LDA identified 15 topics with model selection guided by coherence score optimization. The largest topic, AI/ML & Algorithmic Bias (Topic 10), contains 1035 incidents (29.6%) and serves as a catch-all for general AI discourse that does not fit specialized categories. This topic’s dominance is characteristic of LDA’s tendency to create one or more broad topics alongside more specialized ones.
Table 10 below presents the LDA topics, and Figure 13 and Figure 14 show the model selection metrics and topic distribution.
Figure 15 presents word clouds for all 15 LDA topics, providing visual representations of each topic’s characteristic vocabulary. Topic 1 (Deepfakes & Synthetic Media, 502 incidents, 14.4%) shows prominent terms including deepfake, video, fake, media, generated, image, audio, and synthetic, reflecting the growing concern about AI-generated misleading content. Topic 4 (Social Media Platforms, 425 incidents, 12.2%) features facebook, social, media, instagram, users, content, platform, and meta, capturing incidents involving major social networking services. Topic 9 (ChatGPT & Generative AI, 375 incidents, 10.7%) displays chatgpt, openai, generative, chatbot, gpt, and prompt, representing the newest major incident category that emerged following November 2022.
Topic 11 (Facial Recognition, 363 incidents, 10.4%) shows facial, recognition, police, surveillance, clearview, and enforcement, while Topic 13 (Tesla & Self-Driving, 201 incidents, 5.8%) features tesla, driving, autopilot, car, musk, and crash. The word cloud visualizations reveal that specialized topics (autonomous vehicles, facial recognition, deepfakes) display tight semantic coherence with highly discriminative vocabulary, while the general AI/ML topic shows more diffuse vocabulary reflecting its catch-all nature.

4.3.2. NMF Topic Modeling

NMF produced more balanced topic distributions than LDA, with the largest topic (Deepfakes & Synthetic Media, 417 incidents, 11.9%) substantially smaller than LDA’s largest topic (29.6%). This more even distribution reflects NMF’s parts-based decomposition approach, which tends to yield cleaner thematic separation. Table 11 below presents the NMF topics, and Figure 16 and Figure 17 show the reconstruction error and topic distribution.
Figure 18 presents word clouds for all 15 NMF topics. Comparing the NMF and LDA word clouds reveals both similarities and differences in how each method characterizes the incident landscape. NMF Topic 7 (Deepfakes) and LDA Topic 1 (Deepfakes) show highly similar vocabulary, as do NMF Topic 3 (Tesla) and LDA Topic 13 (Tesla), and NMF Topic 2 (Facial Recognition) and LDA Topic 11 (Facial Recognition). These correspondences confirm that specialized incident categories are robust across methods. However, NMF distributes the content of LDA’s large catch-all topic across multiple smaller topics.

4.3.3. Cross-Method Topic Correspondence

Cross-tabulation of NMF and LDA topic assignments reveals strong correspondence for specialized topics. NMF Topic 3 (Tesla & Self-Driving) maps 85.0% to LDA Topic 13, demonstrating that autonomous vehicle incidents form a robust category regardless of method. Similarly, NMF Topic 2 (Facial Recognition) maps 73.1% to LDA Topic 11, NMF Topic 7 (Deepfakes) maps 71.9% to LDA Topic 1, and NMF Topic 6 (ChatGPT) maps 67.1% to LDA Topic 9. However, many NMF topics map substantially to LDA Topic 10 (AI/ML & Algorithmic Bias), which serves as LDA’s catch-all category. Table 12 below shows the NMF-LDA topic correspondence, and Figure 19 displays the correspondence heatmap.
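The correspondence percentages can be reproduced with a short cross-tabulation sketch, assuming each document's dominant topic is taken as the argmax of its NMF weights and of its LDA topic mixture, respectively.

import pandas as pd

def topic_correspondence(W_nmf, theta_lda):
    nmf_dominant = W_nmf.argmax(axis=1)       # dominant NMF topic per document
    lda_dominant = theta_lda.argmax(axis=1)   # dominant LDA topic per document
    table = pd.crosstab(nmf_dominant, lda_dominant)
    return table.div(table.sum(axis=1), axis=0) * 100   # row-wise % of each NMF topic mapped to LDA topics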

4.3.4. Topic Relationships and Hierarchical Clustering

To explore relationships among the identified LDA topics, hierarchical clustering analysis was conducted based on topic-word distribution similarity and document-level topic co-occurrence correlations. Hierarchical clustering of topic-word distributions reveals meaningful semantic groupings. The most similar topic pairs are Deepfakes & Synthetic Media and YouTube & Child Safety (cosine similarity 0.803), followed by Deepfakes and AI/ML & Algorithmic Bias (0.671). Across all topic pairs, mean similarity is 0.071 (SD = 0.136), indicating that most topics are lexically distinct while certain topic pairs share substantial vocabulary overlap.
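A sketch of this computation (cosine similarity over topic-word distributions, followed by Ward-linkage clustering with SciPy) is shown below; variable names are illustrative.

from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
from sklearn.metrics.pairwise import cosine_similarity

def topic_relationships(topic_word):          # topic_word: n_topics x vocabulary matrix (e.g., lda.components_)
    sim = cosine_similarity(topic_word)       # pairwise cosine similarity between topic-word distributions
    dist = 1.0 - sim                          # convert similarity to distance
    condensed = squareform(dist, checks=False)
    Z = linkage(condensed, method="ward")     # Ward linkage; plot with scipy.cluster.hierarchy.dendrogram
    return sim, Z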
Document-level correlation analysis reveals how topics co-occur within incident reports. The mean correlation is −0.057 (SD = 0.076), indicating that most topics are negatively correlated—when one topic dominates a document, others tend to be suppressed. Five topic clusters emerge from hierarchical analysis: Cluster 1 combines content-related AI incidents; Cluster 2 groups international and legal topics; Cluster 3 pairs US government policy with emerging generative AI; Cluster 4 links robotics with executive leadership; Cluster 5 encompasses consumer-facing technologies. Figure 20 below presents the hierarchical clustering analysis results. In Figure 20a, the dendrogram shows how the 16 LDA topics group together based on their similarity using Ward linkage. Topics that merge at shorter distances (closer to 0 on the x-axis) are more similar. For example, T3 (China, Microsoft & Search) and T7 (Privacy & Legal Issues) merge early, indicating high similarity. The colored branches highlight distinct topic clusters, revealing natural groupings such as consumer-facing AI technologies versus governance/security themes.
The heatmap in Figure 20b displays pairwise cosine similarity scores between all 16 topics based on their word distributions. Diagonal values are 1.00 (perfect self-similarity). Darker red/yellow cells indicate high similarity between topic pairs. Notable high-similarity pairs include T14–T10–T1–T12 (a cluster of deepfakes, algorithmic bias, and tech platforms) and T8–T9 (government/security and ChatGPT topics), confirming the dendrogram groupings.
The matrix in Figure 20c shows how frequently topics appear together within the same incident documents. Positive correlations (red) indicate topics that tend to co-occur; negative correlations (blue) indicate topics that rarely appear together. For instance, T9 (ChatGPT) shows positive correlation with T2 (Robots & Automation), while T13 (Tesla & Self-Driving) shows negative correlation with T14 (Entertainment Industry), suggesting these themes represent distinct incident categories.
Figure 20d summarizes the five thematic clusters derived from hierarchical clustering, listing the topics within each cluster along with their representative keywords. Cluster 1 groups consumer tech harms (deepfakes, algorithmic bias, child safety, smart tech); Cluster 2 covers international/legal issues; Cluster 3 addresses US government and generative AI; Cluster 4 focuses on robotics and leadership; Cluster 5 encompasses consumer-facing platforms and surveillance technologies.

4.4. Sentiment Analysis

4.4.1. Overall Sentiment Distribution

VADER sentiment analysis classified 61.2% of incidents as positive (2138 incidents), 36.7% as negative (1282 incidents), and 2.1% as neutral (74 incidents), with an overall mean compound score of +0.19. This surprising predominance of positive-classified text warrants careful interpretation. The moderate positivity and analytical tone likely reflect the source composition of the OECD repository, which draws primarily from news reports, regulatory documents, and technical analyses rather than first-person victim accounts or emotional testimonials. Journalistic conventions emphasize balanced, factual coverage even when reporting negative events. Table 13 below presents the overall sentiment statistics, and Figure 21 and Figure 22 show the sentiment distribution and VADER component scores.

4.4.2. Sentiment by Topic and Cluster

Sentiment varies substantially across topics and clusters. Among K-Means clusters, Autonomous Vehicles (Cluster 2) shows uniquely negative sentiment with a mean compound score of −0.077 and 53.7% of incidents classified as negative. This contrasts with General AI & Platform Risks (Cluster 1, mean +0.216) and Facial Recognition (Cluster 3, mean +0.160). BERTopic analysis reveals more extreme sentiment variation across fine-grained topics. The most negatively framed topics are Child Safety/YouTube (Topic 14, mean compound −0.326, 67.6% negative) and Self-Driving/Tesla (Topic 1, mean −0.140, 57.6% negative). The most positively framed topics are US Government AI (Topic 10, mean +0.633, 87.2% positive) and ChatGPT/GenAI (Topic 24, mean +0.586, 85.0% positive). Table 14 below presents sentiment by topic category, and Figure 23 and Figure 24 show the VADER sentiment distributions by K-Means cluster and BERTopic topic.

4.4.3. Psycholinguistic Profiles

LIWC analysis reveals psycholinguistic variation across harm categories and actor types. Physical harm incidents show elevated negative emotion and anxiety language, with Emotional Tone scores averaging 32.1 and Affect language at 4.8%. Privacy-related incidents emphasize cognitive processing terms (Cognition: 14.2%) and formal analytical style (Analytic: 82.4%). Discrimination incidents feature more social language (9.4%), while misinformation incidents show highest social language markers (10.8%). Actor type also correlates with psycholinguistic profiles. Corporate actor incidents use more formal, analytical language (Analytic: 81.4%) compared to government contexts (79.2%). Research and academic incidents exhibit the highest analytical thinking scores (84.6%). Table 15 below presents the LIWC profiles by harm category, and Figure 25 and Figure 26 show the psycholinguistic profiles.

4.5. Multi-Method Comparison

Comprehensive comparison across all four clustering/topic modeling methods reveals both convergence and divergence, providing empirical validation of incident categorization robustness. Cross-method alignment statistics quantify correspondence between methods. BERTopic and NMF achieved the highest alignment at 96.4%, indicating that these two methods produce highly consistent categorizations despite their different algorithmic foundations. K-Means and BERTopic achieved 91.0% alignment, K-Means and LDA achieved 84.9%, BERTopic and LDA achieved 86.5%, K-Means and NMF achieved 84.4%, and LDA and NMF achieved 81.6%.
Domain-specific analysis reveals that autonomous vehicle incidents achieved the highest cross-method alignment (84–89% across all four methods), indicating these incidents form a coherent, robust category regardless of analytical approach. Facial recognition incidents showed more moderate but still substantial alignment (66–68%), with variation attributable to the broader scope of surveillance-related content captured differently by different methods. The methodological implication is clear: categories achieving greater than 80% cross-method alignment (autonomous vehicles, deepfakes) represent robust taxonomic elements suitable for standardized regulatory classification. Categories with 66–80% alignment (facial recognition, content moderation) remain valid but may require context-dependent sub-categorization. Table 16, Table 17 and Table 18 below present the cross-method alignment and comprehensive comparison statistics, and Figure 27 and Figure 28 show the multi-method comparison visualizations.

4.6. Summary of Findings

The multi-method analysis reveals six core thematic categories that emerge consistently across analytical approaches: autonomous vehicles, facial recognition, deepfakes and synthetic media, ChatGPT and generative AI, social media platforms, and algorithmic bias. These themes demonstrate varying degrees of cross-method alignment, with specialized technical domains (autonomous vehicles, deepfakes) showing highest consistency and broader categories (general AI discourse, social media) showing more method-dependent structure. The thematic concentration has direct governance implications, as targeted intervention addressing the top categories could address a substantial share of documented incidents.
Risk concentration is evident across multiple dimensions: nearly half (49.7%) of all incidents fall within just two harm categories (system safety and physical harms); private sector actors account for 70.3% of incidents, with five companies (Google, Meta, Tesla, Amazon, Microsoft) dominating; and 48% of incidents occur in the United States alone. This concentration pattern suggests that regulatory resources could be strategically deployed toward high-frequency incident categories, major corporate actors, and primary geographic jurisdictions to maximize impact on documented harms.
The post-ChatGPT shift represents a structural transformation in the incident landscape. The 4.6-fold increase in monthly incident rate, combined with vocabulary analysis confirming genuine thematic transformation, demonstrates that ChatGPT’s release marks a clear inflection point. New terms (chatgpt, llm, hallucination, jailbreak) and new incident types (chatbot manipulation, AI-generated misinformation) emerged post-ChatGPT. Topic modeling identifies ChatGPT/generative AI as a distinct category comprising 6.8–10.7% of incidents depending on method. Sentiment patterns reveal that physical safety incidents receive notably negative framing (autonomous vehicles: −0.077; child safety: −0.326), while policy discussions and generative AI coverage receive surprisingly positive framing (+0.586 to +0.633). This sentiment structure likely reflects source characteristics and journalistic conventions rather than objective harm assessment.

5. Discussion

5.1. The ChatGPT Inflection Point

The 4.6-fold increase in incident reporting following ChatGPT’s November 2022 release represents the most dramatic shift in the dataset’s temporal evolution [1]. This increase likely reflects both genuine growth in AI deployment and associated harms, as well as heightened media attention and improved incident documentation infrastructure. The release brought AI capabilities directly into public consciousness through an accessible interface, generating widespread experimentation, deployment, and consequently, incident documentation. Increased media attention to AI issues likely enhanced incident detection and reporting, while broader AI adoption across domains created more opportunities for harmful outcomes. The sustained elevated reporting rate through October 2024 suggests this represents genuine structural change rather than a temporary spike in attention. However, detection and reporting biases complicate interpretation, as improved awareness may surface previously undetected incidents rather than reflecting new incident generation.

5.2. Thematic Concentration in High-Risk Domains

The concentration of incidents in facial recognition, algorithmic bias, autonomous vehicles, and content moderation reveals systematic patterns in AI risk manifestation. These domains share characteristics including deployment at scale affecting large populations, applications in sensitive contexts involving fundamental rights or physical safety, opacity that obscures how decisions are made, and power asymmetries between AI deployers and affected individuals. The persistence of facial recognition as the dominant specialized incident category despite regulatory actions and voluntary moratoria suggests technical and governance challenges remain substantial. Discriminatory hiring algorithms continue generating incidents despite years of academic attention and legal scrutiny, indicating that bias mitigation remains technically difficult and economically under-prioritized. Content moderation’s increased prominence in the post-ChatGPT period reflects new challenges as platforms confront AI-generated misinformation and synthetic media at unprecedented scale.
The six core themes identified across methods—autonomous vehicles, facial recognition, deepfakes, generative AI, social media, and algorithmic bias—represent distinct risk domains with differentiated vocabulary, affected populations, and governance requirements. Autonomous vehicle incidents cluster tightly around Tesla’s Autopilot system and involve physical safety harms requiring technical safety standards and real-time monitoring. Facial recognition incidents span law enforcement surveillance, commercial applications, and border control, implicating civil liberties, privacy rights, and anti-discrimination frameworks. Deepfake incidents increasingly involve non-consensual intimate imagery and political manipulation, raising questions about authenticity, consent, and information integrity that existing legal frameworks inadequately address.

5.3. Geographic Disparities in Incident Documentation

The pronounced geographic concentration in North America and Europe raises important questions about global AI incident patterns. Several interpretations warrant consideration. First, wealthy nations deploy AI systems more extensively, creating more opportunities for incidents. Second, robust media ecosystems and press freedom in these regions enable incident detection and reporting that may be suppressed or discouraged elsewhere. Third, cultural and linguistic factors bias incident collection toward English-language sources. Fourth, some regions may experience incidents but lack infrastructure or incentives for systematic documentation. The relative absence of African, South American, and Central Asian incidents likely reflects detection limitations rather than true incident absence. This geographic blind spot represents a critical limitation in understanding global AI risk distribution and may obscure significant harms affecting populations with limited voice in international AI governance discussions.

5.4. Methodological Contributions

This research demonstrates the utility of computational text analysis for large-scale incident pattern detection. The combination of topic modeling, sentiment analysis, and clustering provides complementary perspectives on incident characteristics that manual coding alone could not efficiently achieve. LDA successfully discovered interpretable thematic categories without requiring pre-specified taxonomies, enabling data-driven taxonomy development. VADER and LIWC revealed not only emotional valence but also cognitive and communicative dimensions of incident discourse. Clustering identified natural groupings that both validated topic model results and revealed additional structure.
A critical contribution of this study is the explicit quantification of cross-method alignment, addressing methodological uncertainty absent from prior AI incident research. We computed pairwise alignment statistics by calculating the percentage of documents assigned to corresponding thematic categories across methods. Domain-specific analysis reveals that autonomous vehicle incidents achieved the highest cross-method alignment (84–89% across all four methods), indicating these incidents form a coherent, robust category regardless of analytical approach. Facial recognition incidents showed more moderate but still substantial alignment (66–68%). Categories achieving greater than 80% cross-method alignment represent robust taxonomic elements suitable for standardized regulatory classification. Categories with 66–80% alignment remain valid but may require context-dependent sub-categorization.

5.5. Implications for AI Governance

Findings suggest targeted regulatory attention may prove more effective than generic AI regulation. The concentration of incidents in specific domains supports sector-specific oversight mechanisms calibrated to documented risk patterns. For algorithmic hiring systems, the persistence of discriminatory incidents indicates current oversight mechanisms are insufficient. Governance requirements should include mandatory audit trails documenting all candidate-facing algorithmic decisions, pre-deployment adverse impact testing following EEOC four-fifths rule thresholds, annual third-party algorithmic audits, and candidate notification rights with access to human review upon request.
For autonomous vehicles, which represent a distinct, highly coherent category achieving 84–89% cross-method alignment with uniquely negative sentiment, specific regulatory requirements should include mandatory simulation testing against standardized scenario libraries with minimum performance thresholds before public road deployment, real-time incident reporting within 24 h for any collision or disengagement event, mandatory operational design domain declarations with geofencing enforcement, and public disclosure of disengagement rates and collision statistics.
For social media recommender systems, governance mechanisms should include algorithmic transparency reports disclosing optimization targets, mandatory algorithmic impact assessments before deployment, user-facing controls for algorithmic personalization, researcher data access provisions, and specific child safety requirements including age-appropriate design standards. For facial recognition and biometric systems, requirements should include prohibited use cases for real-time public surveillance without judicial authorization, minimum accuracy thresholds stratified by demographic group, mandatory bias testing with public disclosure, consent requirements for commercial applications, and data retention limits.
For generative AI systems, governance requirements should include mandatory output labeling using human-readable and machine-readable metadata, provenance tracking systems, prohibited impersonation uses, model documentation requirements (“model cards”), and API-level guardrails preventing harmful content generation. The rapid emergence of ChatGPT/generative AI as a distinct incident category demonstrates the need for governance mechanisms responsive to capability advances within months rather than years.

6. Limitations

Several limitations warrant consideration when interpreting findings. First, the analysis relies on incidents documented in the OECD AI Incidents Monitor [4], which may suffer from detection and reporting biases. High-profile incidents involving major technology companies receive disproportionate media coverage while routine harms affecting smaller populations may go undocumented. The dominance of private sector actors (70.3%) may partially reflect this visibility bias.
Second, incident summaries provide limited information about severity, affected populations, and long-term consequences. The mean text length of approximately 150 words per incident constrains the depth of individual case analysis. Third, the temporal analysis cannot definitively establish causality between ChatGPT’s release and the observed 4.6-fold increase in the monthly incident rate. Multiple confounding factors, including general AI capability advancement and expanded media coverage, complicate causal interpretation.
Fourth, computational text analysis methods have inherent limitations. Topic modeling results depend on parameter choices; the selection of 15 topics prioritized interpretability over optimization. Sentiment analysis tools may miss sarcasm, context-dependent meaning, or domain-specific terminology valence. Fifth, the Word2Vec embeddings were trained on the entire corpus, introducing potential temporal leakage for pre/post analyses. Future longitudinal studies should employ temporally segmented embedding strategies.
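One way to implement the temporally segmented strategy suggested above is to train separate embedding models on the pre- and post-ChatGPT subsets, so that post-release vocabulary cannot shape the pre-release semantic space. The sketch below is illustrative only: the record schema (date plus pre-tokenized summary), the cutoff handling, and the hyperparameters are assumptions.
```python
# Sketch of temporally segmented embeddings, assuming each record carries an
# incident date and a pre-tokenized summary. Schema and parameters are hypothetical.
from datetime import date
from gensim.models import Word2Vec

CHATGPT_RELEASE = date(2022, 11, 30)

def segmented_embeddings(records):
    # records: list of (incident_date, list_of_tokens) pairs (hypothetical schema)
    pre = [tokens for d, tokens in records if d < CHATGPT_RELEASE]
    post = [tokens for d, tokens in records if d >= CHATGPT_RELEASE]

    # One model per period: post-release terms (e.g., "jailbreak") cannot leak
    # into the pre-release embedding space.
    w2v_pre = Word2Vec(sentences=pre, vector_size=100, window=5, min_count=5, seed=42)
    w2v_post = Word2Vec(sentences=post, vector_size=100, window=5, min_count=5, seed=42)
    return w2v_pre, w2v_post
```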
Sixth, cross-method alignment statistics (66–96%) demonstrate that incident categorization is partially method-dependent. Categories with lower alignment may require additional sub-categorization. Seventh, the geographic concentration (48% US, 57% North America/Europe combined) likely reflects reporting infrastructure rather than true risk distribution. The relative absence of documented incidents from Africa, South America, and Central Asia represents a critical blind spot in understanding global AI risk patterns.

7. Conclusions

This research examined 3494 AI incidents documented in the OECD AI Incidents Monitor [4] spanning January 2014 through October 2024 using comprehensive text analytics and machine learning methodologies. Findings reveal dramatic incident acceleration following ChatGPT’s November 2022 release [1], with monthly reporting rates increasing 4.6-fold from 12.0 to 95.9 incidents per month. The pre-ChatGPT period encompassed 1288 incidents (36.9%) across 107 months, while the post-ChatGPT period contained 2206 incidents (63.1%) across only 23 months, demonstrating both the acceleration of AI deployment and the heightened public attention to AI-related harms following major capability advances.
Topic modeling using LDA [23] and NMF [26] (15 topics each) identified six robust thematic categories: facial recognition (10.1–10.4%), autonomous vehicles (5.5–5.9%), deepfakes (11.9–14.4%), ChatGPT/generative AI (6.8–10.7%), social media platforms (7.5–12.2%), and algorithmic bias. Sentiment analysis using VADER [27] confirmed a predominantly negative tone for physical safety incidents (autonomous vehicles: −0.077; child safety: −0.326), with an analytical, formal communication style indicated by LIWC profiles [28]. The contrast between negative sentiment for physical harm categories and positive sentiment for policy discussions reveals how different incident types are framed in public discourse.
The multi-method comparison—systematically applying K-Means, BERTopic, LDA, and NMF to the same corpus—provides empirical validation unprecedented in AI incident research. Cross-method alignment ranging from 66% to 96% identifies which categories represent robust taxonomic elements (autonomous vehicles: 84–89%) versus those requiring context-dependent interpretation (facial recognition: 66–68%). This methodological contribution enables future researchers to assess the robustness of their categorization schemes and provides a template for validation in other incident analysis contexts.
The sector-specific regulatory recommendations—mandatory audit trails for hiring algorithms, simulation testing requirements for autonomous vehicles, transparency obligations for recommender systems, accuracy standards for facial recognition, and output labeling for generative AI—provide actionable frameworks grounded in empirical harm documentation. The rapid emergence of ChatGPT/generative AI as a distinct incident category demonstrates the need for governance mechanisms responsive to capability advances within months rather than years. These findings have direct implications for policymakers, technology developers, and civil society organizations seeking to prevent AI harms through evidence-based intervention.
Future research should extend temporal coverage as incidents accumulate, develop predictive models for proactive risk assessment, conduct deeper qualitative analysis of high-impact incidents, examine causal mechanisms linking AI design choices to incident outcomes, and investigate geographic disparities in incident documentation. The intersection of advancing AI capabilities with limited governance infrastructure suggests incident frequencies may continue rising absent substantial intervention. As AI systems become increasingly capable and widely deployed across domains affecting fundamental rights, physical safety, and information integrity, systematic incident monitoring and analysis becomes essential for evidence-based policy development, risk mitigation strategy design, and accountability mechanism implementation.

Author Contributions

Conceptualization: W.R. and J.R.; methodology: W.R., J.R. and T.K.; software: T.K.; validation: W.R. and T.K.; formal analysis: W.R., J.R. and T.K.; data curation: T.K.; writing—original draft preparation: W.R. and T.K.; writing—review and editing: W.R. and J.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are available from the corresponding author upon request, subject to privacy and legal restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. OpenAI. Introducing ChatGPT. OpenAI Blog. 2022. Available online: https://openai.com/blog/chatgpt (accessed on 20 November 2025).
  2. Turri, V.; Dzombak, R. Why we need to know more: Exploring the state of AI incident documentation practices. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, Montréal, QC, Canada, 8–10 August 2023; pp. 576–583. [Google Scholar]
  3. Shelby, R.; Rismani, S.; Henne, K.; Moon, A.; Rostamzadeh, N.; Nicholas, P.; Virk, G. Sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, Montréal, QC, Canada, 8–10 August 2023; pp. 723–741. [Google Scholar]
  4. OECD. Defining AI incidents and related terms. In OECD Artificial Intelligence Papers; No. 16; OECD Publishing: Paris, France, 2024. [Google Scholar] [CrossRef]
  5. McGregor, S.; Paeth, K.; Lam, K. Indexing AI risks with incidents, issues, and variants. Proc. AAAI Conf. Artif. Intell. 2021, 35, 15458–15463. [Google Scholar]
  6. Fjeld, J.; Achten, N.; Hilligoss, H.; Nagy, A.; Srikumar, M. Principled Artificial Intelligence: Mapping Consensus in Ethical and Rights-Based Approaches; No. 2020-1; Berkman Klein Center Research Publication: Cambridge, MA, USA, 2020. [Google Scholar]
  7. National Institute of Standards and Technology (NIST). Artificial Intelligence Risk Management Framework (AI RMF 1.0); NIST: Gaithersburg, MD, USA, 2023. [Google Scholar] [CrossRef]
  8. Raghavan, M.; Barocas, S.; Kleinberg, J.; Levy, K. Mitigating bias in algorithmic hiring: Evaluating claims and practices. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAT*), Barcelona, Spain, 27–30 January 2020; pp. 469–481. [Google Scholar]
  9. Ajunwa, I. The paradox of automation as anti-bias intervention. Cardozo Law Rev. 2020, 41, 1671–1740. [Google Scholar]
  10. Albaroudi, E.; Mansouri, T.; Alameer, A. A comprehensive review of AI techniques for addressing algorithmic bias in job hiring. AI 2024, 5, 383–404. [Google Scholar] [CrossRef]
  11. Almaskati, D.; Kermanshachi, S.; Pamidimukkala, A. Investigating the impacts of autonomous vehicles on crash severity and traffic safety. Front. Built Environ. 2024, 10, 1383144. [Google Scholar] [CrossRef]
  12. Kusano, K.D.; Scanlon, J.M.; Chen, Y.-H.; McMurry, T.L.; Chen, R.; Gode, T.; Victor, T. Comparison of Waymo rider-only crash data to human benchmarks at 7.1 million miles. Traffic Inj. Prev. 2024, 25, S66–S77. [Google Scholar] [CrossRef]
  13. Salhab, W.; Ameyed, D.; Jaafar, F.; Mcheick, H. A systematic literature review on AI safety: Identifying trends, challenges, and future directions. IEEE Access 2024, 12, 131762–131784. [Google Scholar] [CrossRef]
  14. Brundage, M.; Avin, S.; Clark, J.; Toner, H.; Eckersley, P.; Garfinkel, B.; Amodei, D. The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv 2018, arXiv:1802.07228. [Google Scholar] [CrossRef]
  15. Wach, K.; Duong, C.D.; Ejdys, J.; Kazlauskaitė, R.; Korzynski, P.; Mazurek, G.; Paliszkiewicz, J.; Ziemba, E. The dark side of generative artificial intelligence: A critical analysis of controversies and risks of ChatGPT. Entrep. Bus. Econ. Rev. 2023, 11, 7–30. [Google Scholar] [CrossRef]
  16. Ferrara, E. GenAI against humanity: Nefarious applications of generative artificial intelligence and large language models. J. Comput. Soc. Sci. 2024, 7, 549–569. [Google Scholar] [CrossRef]
  17. Buolamwini, J.; Gebru, T. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of the Conference on Fairness, Accountability and Transparency, PMLR 81, New York, NY, USA, 23–24 February 2018; pp. 77–91. [Google Scholar]
  18. Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arber, S.; von Arx, S.; Liang, P. On the opportunities and risks of foundation models. arXiv 2021, arXiv:2108.07258. [Google Scholar] [CrossRef]
  19. Weidinger, L.; Mellor, J.; Rauh, M.; Griffin, C.; Uesato, J.; Huang, P.-S.; Gabriel, I. Ethical and social risks of harm from language models. arXiv 2021, arXiv:2112.04359. [Google Scholar] [CrossRef]
  20. Gorwa, R.; Binns, R.; Katzenbach, C. Algorithmic content moderation: Technical and political challenges in the automation of platform governance. Big Data Soc. 2020, 7, 205395171989794. [Google Scholar] [CrossRef]
  21. Gupta, G.; Raja, K.; Gupta, M.; Jan, T.; Whiteside, S.T.; Prasad, M. A comprehensive review of deepfake detection using advanced machine learning and fusion methods. Electronics 2024, 13, 95. [Google Scholar] [CrossRef]
  22. Chesney, R.; Citron, D.K. Deep fakes: A looming challenge for privacy, democracy, and national security. Calif. Law Rev. 2019, 107, 1753–1819. [Google Scholar] [CrossRef]
  23. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. [Google Scholar]
  24. Laureate, C.D.P.; Buntine, W.; Linger, H. A systematic review of the use of topic models for short text social media analysis. Artif. Intell. Rev. 2023, 56, 14223–14255. [Google Scholar] [CrossRef] [PubMed]
  25. Wu, X.; Nguyen, T.; Luu, A.T. A survey on neural topic models: Methods, applications, and challenges. Artif. Intell. Rev. 2024, 57, 18. [Google Scholar] [CrossRef]
  26. Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791. [Google Scholar] [CrossRef]
  27. Hutto, C.; Gilbert, E. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Web and Social Media, Ann Arbor, MI, USA, 1–4 June 2014; Volume 8, pp. 216–225. [Google Scholar]
  28. Pennebaker, J.W.; Boyd, R.L.; Jordan, K.; Blackburn, K. The Development and Psychometric Properties of LIWC2015; University of Texas at Austin: Austin, TX, USA, 2015. [Google Scholar] [CrossRef]
  29. Arthur, D.; Vassilvitskii, S. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’07), New Orleans, LA, USA, 7–9 January 2007; pp. 1027–1035. [Google Scholar]
  30. McInnes, L.; Healy, J.; Astels, S. HDBSCAN: Hierarchical density-based clustering. J. Open Source Softw. 2017, 2, 205. [Google Scholar] [CrossRef]
  31. Holstein, K.; Wortman Vaughan, J.; Daumé, H., III; Dudík, M.; Wallach, H. Improving fairness in machine learning systems: What do industry practitioners need? In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, UK, 4–9 May 2019; pp. 1–16. [Google Scholar]
  32. Amodei, D.; Olah, C.; Steinhardt, J.; Christiano, P.; Schulman, J.; Mané, D. Concrete problems in artificial intelligence safety. arXiv 2016, arXiv:1606.06565. [Google Scholar]
  33. Crootof, R.; Ard, B. Structuring techlaw. Vanderbilt Law Rev. 2021, 74, 751–832. [Google Scholar] [CrossRef]
  34. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; Statistical Laboratory of the University of California: Berkeley, CA, USA, 1967; pp. 281–297. [Google Scholar]
  35. Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv 2022, arXiv:2203.05794. [Google Scholar]
Figure 1. Temporal distribution of AI incidents by year (2014–2024).
Figure 2. Distribution of incidents by harm category.
Figure 3. Harm category trends over time.
Figure 4. Distribution of incidents by actor type.
Figure 5. Actor type trends over time.
Figure 6. Geographic distribution of AI incidents.
Figure 7. Overall word cloud showing dominant vocabulary in AI incident reports.
Figure 8. Filtered word cloud with platform names removed.
Figure 9. Pre-ChatGPT vs. Post-ChatGPT word cloud comparison.
Figure 10. Silhouette score analysis for K-Means (k = 2 to 10).
Figure 11. K-Means cluster size distribution.
Figure 12. BERTopic topic distribution (25 topics).
Figure 13. LDA model selection: perplexity and log-likelihood by number of topics.
Figure 14. LDA topic distribution (15 topics).
Figure 15. Word clouds for all 15 LDA topics.
Figure 16. NMF reconstruction error by number of topics.
Figure 17. NMF topic distribution (15 topics).
Figure 18. Word clouds for all 15 NMF topics.
Figure 19. NMF vs. LDA topic correspondence heatmap.
Figure 20. (a) Hierarchical clustering dendrogram. (b) Topic similarity matrix. (c) Topic co-occurrence correlation matrix. (d) Identified topic clusters.
Figure 21. Overall sentiment distribution.
Figure 22. VADER component scores.
Figure 23. VADER sentiment distribution by K-Means cluster.
Figure 24. VADER sentiment by BERTopic topic (sorted).
Figure 25. LIWC psycholinguistic profiles by harm category.
Figure 26. LIWC psycholinguistic profiles by actor type.
Figure 27. Multi-method comparison: (A) Topics by method, (B) Largest topic size, (C) Cross-method alignment heatmap, (D) Domain-specific alignment.
Figure 28. Sentiment range comparison across methods.
Table 1. Dataset Overview.
Attribute | Value
Total Incidents | 3494
Time Period | January 2014–October 2024
Pre-ChatGPT (before 30 November 2022) | 1288 (36.9%)
Post-ChatGPT (after 30 November 2022) | 2206 (63.1%)
Mean Text Length | ~150 words per incident
Table 2. Analytical Methods Mapped to Research Objectives.
Method | Research Objective | Unique Contribution
LDA Topic Modeling | Thematic structure | Probabilistic topic mixtures
NMF Topic Modeling | Thematic structure | Parts-based decomposition
K-Means Clustering | High-level categorization | Discrete assignments
BERTopic Clustering | Fine-grained discovery | Automatic topic selection
VADER Sentiment | Emotional framing | Valence scores (−1 to +1)
LIWC Analysis | Psycholinguistic profiling | Cognitive style dimensions
Table 3. Mathematical Comparison of Analytical Methods.
Method | Type | Objective Function | Output
LDA | Probabilistic generative | max P(W|α, β) via variational EM | Topic distributions θ_d, φ_k
NMF | Matrix factorization | min ‖X − WH‖²_F, W, H ≥ 0 | Document–topic W, topic–term H
K-Means | Partitional clustering | min Σ ‖x − μ_k‖² | Cluster assignments
BERTopic | Density-based clustering | HDBSCAN on UMAP embeddings | Topics + outliers
VADER | Lexicon-based | compound = Σv_i / √((Σv_i)² + α) | Sentiment scores [−1, +1]
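For readability, the objective functions compressed into Table 3 can be written out in full (notation as in the table; in the VADER expression, α denotes the normalization constant, set to 15 in the reference implementation):
```latex
% Objective functions from Table 3, written in display form.
\begin{align}
  \text{LDA:}     &\quad \max_{\alpha,\beta}\; p(W \mid \alpha, \beta)
                    \quad \text{(fit via variational EM)} \\
  \text{NMF:}     &\quad \min_{W,H \ge 0}\; \lVert X - WH \rVert_F^2 \\
  \text{K-Means:} &\quad \min_{\{\mu_k\}} \sum_{k} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2 \\
  \text{VADER:}   &\quad \mathrm{compound} = \frac{\sum_i v_i}{\sqrt{\left(\sum_i v_i\right)^2 + \alpha}}
\end{align}
```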
Table 4. Incidents by Year.
Year | Count | Year | Count
2014 | 12 | 2020 | 423
2015 | 28 | 2021 | 512
2016 | 67 | 2022 | 634
2017 | 134 | 2023 | 892
2018 | 245 | 2024 | 198 *
2019 | 349 | — | —
* Partial year (through October 2024).
Table 5. Primary Harm Category Counts.
Harm Category | Count | Percent
AI system safety and reliability | 1017 | 29.1%
Physical and psychological harms | 721 | 20.6%
Privacy and data protection | 534 | 15.3%
Discrimination and bias | 456 | 13.1%
Misinformation and manipulation | 398 | 11.4%
Other harms | 368 | 10.5%
Table 6. Actor Type Counts.
Actor Type | Count | Percent
Private sector/Industry | 2456 | 70.3%
Government/Public sector | 534 | 15.3%
Research/Academic | 287 | 8.2%
Individual actors | 156 | 4.5%
Other/Unknown | 61 | 1.7%
Table 7. Top Countries/Regions.
Country/Region | Count | Percent
United States | 1678 | 48.0%
China | 423 | 12.1%
United Kingdom | 312 | 8.9%
European Union | 287 | 8.2%
Australia | 134 | 3.8%
Other | 660 | 18.9%
Table 8. K-Means Cluster Characteristics.
Cluster | Label | Count (%) | Top Terms
1 | General AI & Platform Risks | 2808 (80.4%) | ai, google, data, facebook, algorithm
2 | Autonomous Vehicles | 205 (5.9%) | tesla, car, driving, autopilot, crash
3 | Facial Recognition & Surveillance | 481 (13.8%) | facial, recognition, police, surveillance
Table 9. BERTopic Top Topics.
Topic | Label | Count | Top Terms
7 | Facial Recognition | 362 | facial, recognition, police
23 | Facebook/Social Media | 324 | facebook, meta, instagram
15 | AI Bias | 255 | bias, algorithm, discrimination
24 | ChatGPT | 200 | chatgpt, openai, gpt
1 | Self-Driving | 198 | tesla, autopilot, driving
22 | Deepfake Porn | 185 | deepfake, porn, video
Table 10. LDA topics (15 topics).
Topic | Label | N (%) | Top Terms
10 | AI/ML & Algorithmic Bias | 1035 (29.6%) | ai, intelligence, learning, machine, bias
1 | Deepfakes & Synthetic Media | 502 (14.4%) | deepfake, video, fake, media, generated
4 | Social Media Platforms | 425 (12.2%) | facebook, social, media, instagram, users
9 | ChatGPT & Generative AI | 375 (10.7%) | chatgpt, openai, generative, chatbot
11 | Facial Recognition | 363 (10.4%) | facial, recognition, police, surveillance
13 | Tesla & Self-Driving | 201 (5.8%) | tesla, driving, self, autopilot, car
Table 11. NMF topics (15 topics).
Topic | Label | N (%) | Top Terms
7 | Deepfakes & Synthetic Media | 417 (11.9%) | deepfake, videos, fake, video, media
10 | US Government & Regulation | 362 (10.4%) | united, states, federal, government
2 | Facial Recognition | 353 (10.1%) | facial, recognition, police, surveillance
14 | Machine Learning & Bias | 352 (10.1%) | learning, machine, bias, algorithm
4 | Facebook & Social Media | 263 (7.5%) | facebook, social, media, instagram
6 | ChatGPT & Generative AI | 237 (6.8%) | chatgpt, openai, chatbot, generative
3 | Tesla & Self-Driving | 193 (5.5%) | tesla, driving, car, self, autopilot
Table 12. NMF–LDA Topic Correspondence.
Domain | NMF Topic | LDA Topic | Overlap %
Tesla/Self-Driving | Topic 3 | Topic 13 | 85.0%
Facial Recognition | Topic 2 | Topic 11 | 73.1%
Deepfakes | Topic 7 | Topic 1 | 71.9%
ChatGPT | Topic 6 | Topic 9 | 67.1%
Table 13. Overall Sentiment Statistics.
Classification | Count | Percentage
Positive (compound ≥ 0.05) | 2138 | 61.2%
Neutral | 74 | 2.1%
Negative (compound ≤ −0.05) | 1282 | 36.7%
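A minimal sketch of how the classification thresholds in Table 13 map onto VADER compound scores, using the vaderSentiment package; the example text is hypothetical:
```python
# Sketch of VADER scoring with the thresholds from Table 13.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def classify_sentiment(text):
    compound = analyzer.polarity_scores(text)["compound"]   # value in [-1, +1]
    if compound >= 0.05:
        return "positive", compound
    if compound <= -0.05:
        return "negative", compound
    return "neutral", compound

# Example with a hypothetical incident summary.
label, score = classify_sentiment("The chatbot generated false and defamatory claims.")
```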
Table 14. Sentiment by Topic Category.
Category | Mean Compound | % Negative | % Positive
Child Safety/YouTube | −0.326 | 67.6% | —
Self-Driving/Tesla | −0.140 | 57.6% | —
US Government AI | +0.633 | — | 87.2%
ChatGPT/GenAI | +0.586 | — | 85.0%
Table 15. LIWC Profiles by Harm Category.
Harm Category | Analytic | Tone | Cognition | Affect | Social
Physical harms | 78.2 | 32.1 | 12.4 | 4.8 | 8.2
Privacy | 82.4 | 45.6 | 14.2 | 3.2 | 7.8
Discrimination | 76.8 | 28.4 | 11.8 | 5.6 | 9.4
Misinformation | 74.2 | 38.2 | 13.6 | 4.2 | 10.8
Table 16. Cross-Method Maximum Alignment.
Method Pair | Max Alignment | Interpretation
BERTopic ↔ NMF | 96.4% | Strongest
K-Means ↔ BERTopic | 91.0% | Strong
BERTopic ↔ LDA | 86.5% | Good
K-Means ↔ LDA | 84.9% | Good
K-Means ↔ NMF | 84.4% | Good
LDA ↔ NMF | 81.6% | Moderate
Table 17. Domain-Specific Cross-Method Alignment.
Domain | K-Means | BERTopic | LDA | NMF | Alignment
Autonomous Vehicles | Cluster 2 | Topic 1 | Topic 13 | Topic 3 | 84–89%
Facial Recognition | Cluster 3 | Topic 7 | Topic 11 | Topic 2 | 66–68%
Table 18. Comprehensive multi-method comparison summary.
Aspect | K-Means | BERTopic | LDA | NMF
Type | Centroid clustering | Density clustering | Probabilistic (Dirichlet) | Matrix factorization
Topics/Clusters | 3 | 25 | 15 | 15
Outlier Handling | None (forced) | 377 outliers (10.8%) | None (forced) | None (forced)
Largest Group | 80.4% | 10.4% | 29.6% | 11.9%
Distribution | Highly imbalanced | Balanced | Moderately imbalanced | Balanced
Granularity | Coarse | Fine-grained | Medium | Medium
Alignment with K-Means | — | 91.0% | 84.9% | 84.4%
Interpretability | High (simple) | High (detailed) | Good | Excellent
Best For | High-level taxonomy | Detailed discovery | Probabilistic modeling | Clean categorization
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
