Does the Choice of Topic Modeling Technique Impact the Interpretation of Aviation Incident Reports? A Methodological Assessment
Abstract
1. Introduction
2. Related Work
Authors [Ref.] | Topic Modeling Method(s) | Data Source(s) | Application Domain | Key Findings/Contributions |
---|---|---|---|---|
Xing et al. [38] | Structural Topic Model (STM), TF-IDF, Word Co-occurrence Network (WCN) | National Transportation Safety Board (NTSB) accident reports | Aviation safety analysis | STM provided granular topic partitioning; WCN identified key risk factors like “inspection of equipment” and “take off” |
Nanyonga et al. [32] | LDA, NMF, LSA, PLSA, K-means clustering | NTSB aviation incident narratives | Comparative NLP analysis in aviation safety | LDA achieved highest coherence; clustering revealed thematic commonalities in incident narratives |
Liu et al. [39] | LDA, BERT-based Semantic Network (BSN) | Air Traffic Control (ATC) incident reports | Air traffic control risk analysis | Identified 17 risk topics; human factors and operational procedures were prominent; BSN highlighted inter-topic correlations |
Kuhn [21] | STM | ASRS reports | Aviation incident trend analysis | STM uncovered issues like fuel pump and landing gear problems; highlighted specific approach path concerns at SFO |
Rose et al. [31] | STM | ASRS and NTSB reports | Aviation safety data analysis | STM effectively identified themes within technical datasets; performance improved with specific corpora |
Xu et al. [40] | Text Classification, BERTopic | Chinese civil aviation safety oversight reports | Safety oversight automation | Proposed method improved classification accuracy; reduced manual workload in analyzing oversight reports |
Luo and Shi [13] | lda2vec | ASRS reports | Aviation safety report analysis | Unsupervised approach identified latent topics with higher interpretability; reduced reliance on manual labeling |
Robinson [20] | LDA | ASRS reports | Temporal trend analysis in aviation safety | Identified temporal trends in safety concerns; demonstrated effectiveness of NLP in sensemaking |
Ahadh et al. [19] | Semi-supervised keyword extraction, Topic Modeling | ASRS and Pipeline and Hazardous Materials Safety Administration (PHMSA) | Cross-domain accident analysis | Achieved 80% classification accuracy; method effective with limited manual intervention |
3. Materials and Methods
3.1. Data Collection and Preprocessing
3.2. Topic Modeling Techniques
- LDA
- NMF
- pLSA
- BERTopic
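To make one of the four techniques concrete: NMF factorizes a non-negative document-term matrix V into non-negative factors W (document-topic weights) and H (topic-term weights) such that V ≈ WH. The following is a minimal sketch of the multiplicative update rules of Lee and Seung in plain Python. The toy vocabulary, the counts in `V`, and the helper names (`nmf`, `top_words`, `frobenius_error`) are illustrative assumptions, not the implementation used in this study.

```python
import random

def matmul(A, B):
    """Multiply two dense matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x k) and H (k x terms)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

def top_words(H, vocab, n=3):
    """Return the n highest-weighted terms for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in H]

# Toy document-term counts (invented for illustration).
vocab = ["bird", "struck", "runway", "clearance", "taxi", "landing"]
V = [
    [2, 1, 0, 0, 0, 1],
    [1, 2, 0, 0, 0, 1],
    [0, 0, 2, 1, 1, 0],
    [0, 0, 1, 2, 1, 0],
]

W, H = nmf(V, k=2)
print(top_words(H, vocab))
print(frobenius_error(V, W, H))
```

Because the updates only multiply non-negative quantities, W and H stay non-negative throughout, which is what makes the rows of H readable as weighted word lists.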
3.3. Model Evaluation
3.4. Implementation Framework
4. Results
4.1. Coherence Score and Perplexity
4.2. Interpretability Assessment
4.2.1. Topic Distribution
4.2.2. Topic Wordcloud
4.2.3. Topic Word Scores
4.3. Model Comparison
4.3.1. Top 10 Words per Model
4.3.2. Model Strengths and Limitations
4.3.3. Model Evaluation
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
ATSB | Australian Transport Safety Bureau |
ASN | Aviation Safety Network |
BERTopic | Bidirectional Encoder Representations from Transformers Topic Modeling |
LDA | Latent Dirichlet Allocation |
ML | Machine Learning |
NMF | Non-Negative Matrix Factorization |
NLP | Natural Language Processing |
pLSA | Probabilistic Latent Semantic Analysis |
References
- Wild, G. Airbus A32x Versus Boeing 737 Safety Occurrences. IEEE Aerosp. Electron. Syst. Mag. 2023, 38, 4–12.
- Australian Transport Safety Bureau. Investigation Report; Australian Transport Safety Bureau: Canberra, ACT, Australia, 1999.
- Nanyonga, A.; Wasswa, H.; Wild, G. Phase of Flight Classification in Aviation Safety Using LSTM, GRU, and BiLSTM: A Case Study with ASN Dataset. In Proceedings of the 2023 International Conference on High Performance Big Data and Intelligent Systems (HDIS), Macau, China, 6–8 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 24–28.
- Nanyonga, A.; Wasswa, H.; Wild, G. Aviation Safety Enhancement via NLP & Deep Learning: Classifying Flight Phases in ATSB Safety Reports. In Proceedings of the 2023 Global Conference on Information Technologies and Communications (GCITC), Bangalore, India, 1–3 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5.
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
- Lee, D.D.; Seung, H.S. Algorithms for non-negative matrix factorization. In Proceedings of the 14th International Conference on Neural Information Processing Systems (NIPS’00), Denver, CO, USA, 27 November–2 December 2000.
- Hofmann, T. Probabilistic Latent Semantic Analysis; UAI: Rio de Janeiro, Brazil, 1999; Volume 99, pp. 289–296.
- Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv 2022, arXiv:2203.05794.
- Stoltz, D.S.; Taylor, M.A. text2map: R tools for text matrices. J. Open Source Softw. 2022, 7, 3741.
- Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791.
- Nanyonga, A.; Joiner, K.; Turhan, U.; Wild, G. Applications of natural language processing in aviation safety: A review and qualitative analysis. In Proceedings of the AIAA SCITECH 2025 Forum, Orlando, FL, USA, 6–10 January 2025; p. 2153.
- Yang, C.; Huang, C. Natural language processing (NLP) in aviation safety: Systematic review of research and outlook into the future. Aerospace 2023, 10, 600.
- Luo, Y.; Shi, H. Using lda2vec topic modeling to identify latent topics in aviation safety reports. In Proceedings of the 2019 IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS), Beijing, China, 17–19 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 518–523.
- Xu, J.; Li, T. Application of multimodal NLP instruction combined with speech recognition in oral English practice. Mob. Inf. Syst. 2022, 2022, 2262696.
- Ricketts, J.; Barry, D.; Guo, W.; Pelham, J. A scoping literature review of natural language processing application to safety occurrence reports. Safety 2023, 9, 22.
- Morais, C.; Yung, K.L.; Johnson, K.; Moura, R.; Beer, M.; Patelli, E. Identification of human errors and influencing factors: A machine learning approach. Saf. Sci. 2022, 146, 105528.
- Jiao, Y.; Dong, J.; Han, J.; Sun, H. Classification and causes identification of Chinese civil aviation incident reports. Appl. Sci. 2022, 12, 10765.
- Robinson, S.D. Visual representation of safety narratives. Saf. Sci. 2016, 88, 123–128.
- Ahadh, A.; Binish, G.V.; Srinivasan, R. Text mining of accident reports using semi-supervised keyword extraction and topic modeling. Process Saf. Environ. Prot. 2021, 155, 455–465.
- Robinson, S.D. Temporal topic modeling applied to aviation safety reports: A subject matter expert review. Saf. Sci. 2019, 116, 275–286.
- Kuhn, K.D. Using structural topic modeling to identify latent topics and trends in aviation incident reports. Transp. Res. Part C Emerg. Technol. 2018, 87, 105–122.
- Zhong, B.; Pan, X.; Love, P.E.; Sun, J.; Tao, C. Hazard analysis: A deep learning and text mining framework for accident prevention. Adv. Eng. Inform. 2020, 46, 101152.
- Krishnan, A. Exploring the power of topic modeling techniques in analyzing customer reviews: A comparative analysis. arXiv 2023, arXiv:2308.11520.
- Datchanamoorthy, K.J. Text mining: Clustering using bert and probabilistic topic modeling. Soc. Inform. J. 2023, 2, 1–13.
- Bellaouar, S.; Bellaouar, M.M.; Ghada, I.E. Topic modeling: Comparison of LSA and LDA on scientific publications. In Proceedings of the 2021 4th International Conference on Data Storage and Data Engineering, Barcelona, Spain, 18–20 February 2021; pp. 59–64.
- Nanayakkara, A.C.; Thennakoon, G.J. Enhancing Social Media Content Analysis with Advanced Topic Modeling Techniques: A Comparative Study. Int. J. Adv. ICT Emerg. Reg. 2024, 17, 40–47.
- Kaur, A.; Wallace, J.R. Moving Beyond LDA: A Comparison of Unsupervised Topic Modelling Techniques for Qualitative Data Analysis of Online Communities. arXiv 2024, arXiv:2412.14486.
- Bagheri, R.; Entezarian, N.; Sharifi, M.H. Topic Modeling on System Thinking Themes Using Latent Dirichlet Allocation, Non-Negative Matrix Factorization and BERTopic. J. Syst. Think. Pract. (JSTINP) 2023, 2, 33–56.
- Abuzayed, A.; Al-Khalifa, H. BERT for Arabic topic modeling: An experimental study on BERTopic technique. Procedia Comput. Sci. 2021, 189, 191–194.
- Mihajlov, T.; Nešić, M.I.; Stanković, R.; Kitanović, O. Topic Modeling of the SrpELTeC Corpus: A Comparison of NMF, LDA, and BERTopic. In Proceedings of the 2024 19th Conference on Computer Science and Intelligence Systems (FedCSIS), Belgrade, Serbia, 8–11 September 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 649–653.
- Rose, R.L.; Puranik, T.G.; Mavris, D.N.; Rao, A.H. Application of structural topic modeling to aviation safety data. Reliab. Eng. Syst. Saf. 2022, 224, 108522.
- Nanyonga, A.; Wasswa, H.; Turhan, U.; Joiner, K.; Wild, G. Exploring Aviation Incident Narratives Using Topic Modeling and Clustering Techniques. In Proceedings of the 2024 IEEE Region 10 Symposium (TENSYMP), New Delhi, India, 27–29 September 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
- Nanyonga, A.; Wasswa, H.; Wild, G. Topic Modeling Analysis of Aviation Accident Reports: A Comparative Study between LDA and NMF Models. In Proceedings of the 2023 3rd International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), Bangalore, India, 29–31 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–2.
- Nanyonga, A.; Wasswa, H.; Turhan, U.; Joiner, K.; Wild, G. Comparative Analysis of Topic Modeling Techniques on ATSB Text Narratives Using Natural Language Processing. In Proceedings of the 2024 3rd International Conference for Innovation in Technology (INOCON), Bangalore, India, 1–3 March 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–7.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019.
- Mbaye, S.; Walsh, H.S.; Jones, G.; Davies, M. BERT-based Topic Modeling and Information Retrieval to Support Fishbone Diagramming for Safe Integration of Unmanned Aircraft Systems in Wildfire Response. In Proceedings of the 2023 IEEE/AIAA 42nd Digital Avionics Systems Conference (DASC), Barcelona, Spain, 1–5 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–7.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
- Xing, Y.; Wu, Y.; Zhang, S.; Wang, L.; Cui, H.; Jia, B.; Wang, H. Discovering latent themes in aviation safety reports using text mining and network analytics. Int. J. Transp. Sci. Technol. 2024, 16, 292–316.
- Liu, W.; Zhang, H.; Shi, Z.; Wang, Y.; Chang, J.; Zhang, J. Risk topics discovery and trend analysis in air traffic control operations—air traffic control incident reports from 2000 to 2022. Sustainability 2023, 15, 12065.
- Xu, Y.; Gan, Z.; Guo, R.; Wang, X.; Shi, K.; Ma, P. Hazard Analysis for Massive Civil Aviation Safety Oversight Reports Using Text Classification and Topic Modeling. Aerospace 2024, 11, 837.
- Paul, S.; Purkaystha, B.S.; Das, P.J. NLP tools used in civil aviation: A survey. Int. J. Adv. Res. Comput. Sci. 2018, 9, 109–114.
- Blair, S.J.; Bi, Y.; Mulvenna, M.D. Aggregated topic models for increasing social media topic coherence. Appl. Intell. 2020, 50, 138–156.
- Wang, Y.-X.; Zhang, Y.-J. Nonnegative matrix factorization: A comprehensive review. IEEE Trans. Knowl. Data Eng. 2012, 25, 1336–1353.
- Eggert, J.; Korner, E. Sparse coding and NMF. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat No 04CH37541), Budapest, Hungary, 25–29 July 2004; IEEE: Piscataway, NJ, USA, 2004; pp. 2529–2533.
- Song, H.A.; Lee, S.-Y. Hierarchical representation using NMF. In Proceedings of the Neural Information Processing: 20th International Conference, ICONIP 2013, Daegu, Republic of Korea, 3–7 November 2013; Proceedings, Part I 20. Springer: Berlin/Heidelberg, Germany, 2013; pp. 466–473.
- Tijare, P.; Rani, P.J. Exploring popular topic models. J. Phys. Conf. Ser. 2020, 1706, 012171.
- Galli, C.; Colangelo, M.T.; Meleti, M.; Guizzardi, S.; Calciolari, E. Topic Analysis of the Literature Reveals. Big Data Cogn. Comput. 2024, 9, 7.
- Vorontsov, K.; Potapenko, A. Additive regularization of topic models. Mach. Learn. 2015, 101, 303–323.
- Wang, R.-S.; Zhang, S.; Wang, Y.; Zhang, X.-S.; Chen, L. Clustering complex networks and biological networks by nonnegative matrix factorization with various similarity measures. Neurocomputing 2008, 72, 134–141.
- Egger, R.; Yu, J. A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify Twitter posts. Front. Sociol. 2022, 7, 886498.
- Mbaye, S.; Walsh, H.S.; Davies, M.; Infeld, S.I.; Jones, G. From BERTopic to SysML: Informing Model-Based Failure Analysis with Natural Language Processing for Complex Aerospace Systems. In Proceedings of the AIAA SCITECH 2024 Forum, Orlando, FL, USA, 8–12 January 2024; p. 2700.
- Griffiths, T.L.; Steyvers, M. Finding scientific topics. Proc. Natl. Acad. Sci. USA 2004, 101 (Suppl. 1), 5228–5235.
- Shastry, P.; Prakash, C. Comparative analysis of LDA, LSA and NMF topic modelling for web data. AIP Conf. Proc. 2023, 2901, 060006.
- Blei, D.M.; Lafferty, J.D. A correlated topic model of science. Ann. Appl. Stat. 2007, 1, 17–35.
- Bosch, A.; Zisserman, A.; Munoz, X. Scene classification via pLSA. In Proceedings of the Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; Proceedings, Part IV 9. Springer: Berlin/Heidelberg, Germany, 2006; pp. 517–530.
- Gaussier, E.; Goutte, C. Relation between PLSA and NMF and implications. In Proceedings of the 28th annual international ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil, 15 August 2005; pp. 601–602.
Ref. | Date/Time | Location | State | Phase of Flight | Summary | Injury Level | Departure/Destination |
---|---|---|---|---|---|---|---|
OA2013-00142 | 1/1/2013 12:01 a.m. | near Port Hedland Aerodrome | WA | Descent | During the descent, the aircraft was struck by lightning… | Nil | Perth [YPPH] → Port Hedland [YPPD] |
OA2013-00167 | 1/1/2013 1:00 a.m. | near Williamtown Aerodrome | NSW | Initial Climb | During the initial climb, the aircraft encountered windshear… | Unknown | Williamtown [YWLM] → Melbourne [YMML] |
OA2013-00196 | 1/1/2013 1:00 a.m. | Bali International Airport | Other | Standing | Passenger declared undeclared fireworks on board… | Nil | Bali [WADD] → Perth [YPPH] |
OA2013-00053 | 1/1/2013 8:40 a.m. | Groote Eylandt Aerodrome | NT | Take-off | During take-off, the aircraft struck a bird… | Nil | Groote Eylandt [YGTE] → Cairns [YBCS] |
OA2013-00087 | 1/5/2013 12:01 a.m. | Toowoomba Aerodrome | QLD | Unknown | During a runway inspection, ground staff retrieved a bird carcass… | Nil | - |
OA2013-00045 | 1/5/2013 6:55 a.m. | Darwin Aerodrome | NT | Initial Climb | During the initial climb, the aircraft struck a … | Nil | Darwin [YPDN] → Dili [WPDL] |
OA2013-00248 | 1/5/2013 8:00 a.m. | Isisford (ALA) | QLD | Landing | Aircraft bounced on one wheel in crosswind landing. Gear detached during go-around and subsequent landing caused substantial damage… | Substantial | Isisford [YISF] → Isisford [YISF] |
OA2013-00055 | 1/5/2013 8:30 a.m. | Ballina/Byron Gateway Aerodrome | NSW | Approach | During final approach, the aircraft struck a swallow… | Nil | Sydney [YSSY] → Ballina/Byron [YBNA] |
OA2013-00245 | 1/5/2013 8:30 a.m. | Sydney Aerodrome | NSW | Standing | Ground staff observed smoke from the APU; engineers identified oil leak as the source… | Nil | Sydney [YSSY] |
OA2013-00067 | 1/5/2013 8:35 a.m. | Perth Aerodrome | WA | Unknown | During a runway inspection, the safety officer retrieved a kestrel carcass… | Nil | - |
OA2013-00051 | 1/5/2013 9:30 a.m. | Parafield Aerodrome | SA | Take-off | During take-off run, the aircraft struck a magpie. | Nil | Parafield [YPPF] → Parafield [YPPF] |
OA2013-00634 | 1/5/2013 10:15 a.m. | Perth Aerodrome | WA | Initial Climb | Crew received pitot static system warnings and returned. Engineering found no faults. | Nil | Perth [YPPH] → Sydney [YSSY] |
OA2013-00233 | 1/5/2013 10:40 a.m. | near Karratha Aerodrome | WA | Cruise | Crew received GPU warning during cruise and returned. Inspection found GPU door not latched correctly. | Nil | Karratha [YPKA] → Karratha [YPKA] |
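The Summary column above is the free text that the topic models consume. As a hedged illustration of how such narratives might be turned into token lists, the sketch below lowercases each summary, keeps alphabetic tokens, and drops a small stopword set. The `STOPWORDS` set and the `tokenize` helper are hypothetical and chosen only for this example; the preprocessing actually applied in the study is not reproduced here.

```python
import re

# Hypothetical stopword set for illustration only.
STOPWORDS = {"the", "a", "an", "during", "was", "by", "of", "in", "and", "to", "on"}

def tokenize(summary):
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", summary.lower())
    return [t for t in tokens if t not in STOPWORDS]

# Two summaries copied from the sample records above (truncated as in the source).
summaries = [
    "During the descent, the aircraft was struck by lightning…",
    "During take-off, the aircraft struck a bird…",
]
docs = [tokenize(s) for s in summaries]
print(docs[0])  # ['descent', 'aircraft', 'struck', 'lightning']
print(docs[1])  # ['take', 'off', 'aircraft', 'struck', 'bird']
```

Splitting on non-alphabetic characters breaks "take-off" into "take" and "off", which is one plausible reason a bare "take" could appear among a model's top words.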
Models | Coherence Score | Perplexity |
---|---|---|
pLSA | 0.7634 | −4.6237 |
LDA | 0.4394 | −6.471 |
NMF | 0.7987 | 2.0739 |
BERTopic | 0.264 | −4.638 |
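Coherence scores such as those above summarize how often a topic's top words co-occur in the corpus. The sketch below computes the UMass variant, chosen here only because it is simple to state: for a ranked word list it sums log((D(w_i, w_j) + 1) / D(w_j)) over pairs where w_j is ranked above w_i, with D counting documents. This is not necessarily the coherence measure behind the reported values, and the four toy documents and the helper name `umass_coherence` are invented for illustration.

```python
import math
from itertools import combinations

def umass_coherence(top_words, docs):
    """UMass coherence of a ranked word list against tokenized documents.

    Assumes every word in top_words occurs in at least one document,
    otherwise the document frequency in the denominator would be zero.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for j, i in combinations(range(len(top_words)), 2):  # j < i
        w_j, w_i = top_words[j], top_words[i]
        score += math.log((doc_freq(w_i, w_j) + 1) / doc_freq(w_j))
    return score

# Toy tokenized corpus (invented for illustration).
docs = [
    ["aircraft", "struck", "bird", "landing"],
    ["aircraft", "struck", "bird", "approach"],
    ["crew", "received", "gpws", "alert"],
    ["crew", "received", "alert", "climb"],
]

coherent = umass_coherence(["bird", "struck", "aircraft"], docs)
mixed = umass_coherence(["bird", "gpws", "alert"], docs)
print(coherent > mixed)  # True
```

Words that always appear together score higher than a list mixing unrelated themes, which is the intuition behind ranking models by coherence.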
LDA Model | BERTopic Model | pLSA Model | NMF Model | Theme | Topic No. |
---|---|---|---|---|---|
fumes, detected, failed, cabin, cruise, crew, descent, engineering, inspection, source | bird, landing, struck, butcherbird, parrot, turkey, aircraft, bundey, during, pygmy | aircraft, struck, landing, take, bird, approach, runway, multiple, taxi, magpie, entered, without | bird, struck, aircraft, approach, climb, kite, initial, takeoff, landing, run | Bird Strikes and Landing Issues | 0 |
aircraft, crew, separation, runway, ATC, approach, resulting, observed, Cessna, loss | bird, approach, struck, pale, fantail, durig, frigatebird, stilt, aircraft, blackbird | damage, aircraft, resulting, minor, landing, pilot, sustained, collided, terrain, substantial, operations, control | approach, missed, windshear, conducted, encountered, crew, aircraft, final, flap, ft | Approach and Airspace Separation | 1 |
take, rejected, crew, swallow, rough, martin, fairy, WINDSHEAR, lapwing, masked | entered, taxi, without, clearance, runway, duty, runways, strip, transmission, comply | approach, crew, aircraft, missed, conducted, encountered, flap, windshear, final, runway, PA28, turbulence | strike, evidence, occurred, determined, birdstrike, flight, post, detected, inspection, pre | Clearance and Taxiway Incidents | 2 |
engine, inspection, flight, detected, climb, post, fuel, routine, revealed, determined | initial, cocos, ngukurr, durign, durring, climb, bird, maryborough, denpasar, parrot | received, landing, crew, gear, alert, approach, GPWS, E, indication, unsafe, climb, failed | landing, struck, aircraft, multiple, magpie, roll, gear, bat, galah, swallow | Engine and Mechanical Failures | 3 |
pilot, aircraft, flight, helicopter, terrain, control, damage, sustained, increase, collided | windscreen, cracked, shattered, window, pane, windshield, outer, arcing, layer, heating | pilot, ft, aircraft, flight, Passing, helicopter, runway, observed, VH, circuit, registered, normal | retrieved, officer, safety, carcass, runway, routine, inspection, fox, flying, magpie | Pilot Operations and Mid-Air Collisions | 4 |
RPA, operations, aircraft, aerial, ATC, normal, resulting, collided, door, communications | takeoff, dave, fortescue, bird, forrest, nadi, swan, turkey, lilydale, winged | crew, inspection, engineering, detected, revealed, returned, replaced, fumes, Engineers, engine, aircraft, climb | engine, failed, cruise, crew, returned, engineering, revealed, climb, inspection, gear | RPA and ATC Operations | 5 |
approach, aircraft, crew, encountered, alert, missed, received, conducted, GPWS, clearance | pre, birdstrike, evidence, strike, could, occurred, determined, deteremined, flight, bridstrike | inspection, safety, officer, flight, runway, retrieved, post, carcass, routine, detected, determined, could | resulting, damage, minor, encountered, aircraft, turbulence, pilot, substantial, sustained, collided | Approach and Safety Warnings | 6 |
aircraft, struck, landing, bird, damage, minor, resulting, approach, runway, multiple | final, stint, bird, edinburgh, necked, raven, approach, struck, feet, red | fuel, flight, issue, aircraft, destroyed, investigation, pre, crew, balloon, due, pitot, Jandakot | fumes, cabin, detected, source, descent, cockpit, engineering, did, reveal, inspection | Landing and Fuel System Issues | 7 |
crew, received, landing, gear, approach, engineering, returned, replaced, Engineers, aircraft | climbing, turn, bank, angle, gpws, alert, anlge, received, recieved, crew | engine, crew, pilot, RPA, aircraft, observed, radio, failed, TCAS, cruise, RA, received | gpws, received, alert, bank, angle, crew, climb, warning, climbing, turn | GPWS and Flight Alerts | 8 |
runway, safety, officer, inspection, retrieved, carcass, forced, Australian, partial, determine | plover, spur, winged, landing, struck, aircraft, minor, damage, resulting | separation, ATC, aircraft, resulting, crew, runway, clearance, Cessna, loss, track, without, controller | runway, clearance, entered, taxi, aircraft, atc, airspace, controlled, incorrect, separation | Runway Incursions | 9 |
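One simple way to quantify how far two models agree on a topic is the Jaccard overlap of their top-word lists. The sketch below applies it to word lists copied from the table above; the `jaccard` helper is an illustrative assumption and not part of the study's evaluation.

```python
def jaccard(words_a, words_b):
    """Jaccard overlap of two word lists, ignoring case."""
    a = {w.lower() for w in words_a}
    b = {w.lower() for w in words_b}
    return len(a & b) / len(a | b)

# Top words copied from the table (topic numbers as listed there).
nmf_8 = "gpws, received, alert, bank, angle, crew, climb, warning, climbing, turn".split(", ")
bertopic_8 = "climbing, turn, bank, angle, gpws, alert, anlge, received, recieved, crew".split(", ")
lda_8 = "crew, received, landing, gear, approach, engineering, returned, replaced, Engineers, aircraft".split(", ")
lda_6 = "approach, aircraft, crew, encountered, alert, missed, received, conducted, GPWS, clearance".split(", ")

print(round(jaccard(nmf_8, bertopic_8), 3))  # 0.667
print(round(jaccard(nmf_8, lda_8), 3))       # 0.111
print(round(jaccard(nmf_8, lda_6), 3))       # 0.25
```

In this example the NMF and BERTopic lists in row 8 share most of their vocabulary, whereas the LDA list in the same row overlaps less with NMF than LDA's row 6 does. Topic numbers are therefore not guaranteed to align across models, and any cross-model comparison should match topics by content rather than by index.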
Model | Strengths | Limitations |
---|---|---|
LDA | - Provides interpretable topics. - Generates distinct word clusters for each topic. - Works well for short and long texts. | - Requires manual tuning of the number of topics. - Struggles with overlapping topics. - Topics can be less coherent for complex datasets. |
BERTopic | - Uses word embeddings, making it context-aware. - Provides dynamic topic reduction. - Can handle large datasets efficiently. - Offers visualizations (e.g., topic evolution, similarity graphs). | - Computationally expensive due to transformers. - Requires fine-tuning of hyperparameters. - Less interpretable than LDA. |
pLSA | - Good for small datasets. - Finds latent structures in data. - Works well with document similarity tasks. | - Suffers from overfitting. - Does not generalize well to new data. - Lacks a probabilistic prior, leading to instability. |
NMF | - Produces coherent topics. - Works well for short documents. - More deterministic (less randomness). | - Requires normalized data. - Less flexible for diverse document structures. - Can be sensitive to noise in data. |
Aspect | LDA (Latent Dirichlet Allocation) | BERTopic (Bidirectional Encoder Representations from Transformers Topic Modeling) | pLSA (Probabilistic Latent Semantic Analysis) | NMF (Non-Negative Matrix Factorization) |
---|---|---|---|---|
Interpretability | Topics are easy to interpret | Less interpretable, depends on embeddings | Moderate, lacks probabilistic priors | Interpretable, but depends on preprocessing |
Granularity | May mix topics if not well-tuned | Fine-grained topic separation | Decent granularity, but can mix topics | Creates distinct topics, good separation |
Scalability | Scales well, but is slow on large data | Scales well, but computationally expensive | Struggles with large datasets | Scales well but is sensitive to noise |
Topic Coherence | Good but requires tuning | Leverages contextual embeddings | Can generate less coherent topics | Produces clear topics with distinct words |
Computational Cost | Moderate, but increases with more topics | High due to transformer embeddings | Computationally expensive, not scalable | Moderate, needs matrix factorization |
Flexibility | Requires parameter tuning for coherence | Highly flexible, allows dynamic topic modeling | Less flexible, predefined number of topics | Requires non-negative constraints |
Best Use Case | General topic modeling, balanced performance | Complex text, fine-grained topics, embeddings-based | Small datasets, early-stage analysis | Text data with clear structure, document clustering |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Nanyonga, A.; Joiner, K.; Turhan, U.; Wild, G. Does the Choice of Topic Modeling Technique Impact the Interpretation of Aviation Incident Reports? A Methodological Assessment. Technologies 2025, 13, 209. https://doi.org/10.3390/technologies13050209