Convex Hull-Based Topic Similarity Mapping in Multidimensional Data
Abstract
1. Introduction
2. Literature Review
3. Experimental Research
3.1. Dataset Characteristics
3.2. Preprocessing
3.3. Labeling Strategy
3.4. Labeling Quality Metrics
3.5. Convex Hull Construction
4. Research Results
4.1. Labeling Evaluation and K-Selection Analysis
4.2. Full Run Analysis (K = 3000)
4.3. Convex Hull Results and Interpretation
4.4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References





| Year of Completion | Bachelor | Master | Dissertation | Habilitation | Total |
|---|---|---|---|---|---|
| 2006 | 401 | 1313 | 0 | 0 | 1714 |
| 2007 | 820 | 1920 | 0 | 0 | 2740 |
| 2008 | 2253 | 1771 | 1 | 0 | 4025 |
| 2009 | 2585 | 2164 | 0 | 0 | 4749 |
| 2010 | 2719 | 2097 | 168 | 2 | 4986 |
| 2011 | 2672 | 2447 | 139 | 1 | 5259 |
| 2012 | 2523 | 2408 | 182 | 9 | 5122 |
| 2013 | 2162 | 2362 | 172 | 36 | 4732 |
| 2014 | 2082 | 2226 | 169 | 24 | 4501 |
| 2015 | 1482 | 1971 | 146 | 27 | 3626 |
| 2016 | 1411 | 1875 | 126 | 38 | 3450 |
| 2017 | 1291 | 1366 | 122 | 22 | 2801 |
| 2018 | 1206 | 1313 | 113 | 18 | 2650 |
| 2019 | 1224 | 1274 | 114 | 17 | 2629 |
| 2020 | 1272 | 1214 | 118 | 21 | 2625 |
| 2021 | 1413 | 1241 | 120 | 31 | 2805 |
| 2022 | 1456 | 1097 | 80 | 7 | 2640 |
| 2023 | 1183 | 1122 | 129 | 16 | 2450 |
| 2024 | 1259 | 1151 | 80 | 8 | 2498 |
| Total | 31,414 | 32,332 | 1979 | 277 | 66,002 |
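
The Total column and the Total row can be cross-checked mechanically. The following is a minimal sketch that copies the yearly counts from the table above and recomputes both sums; the names `COUNTS`, `row_totals`, and `column_totals` are illustrative and do not come from the source.

```python
# Yearly thesis counts copied from the table above.
# Mapping: year -> (bachelor, master, dissertation, habilitation)
COUNTS = {
    2006: (401, 1313, 0, 0),
    2007: (820, 1920, 0, 0),
    2008: (2253, 1771, 1, 0),
    2009: (2585, 2164, 0, 0),
    2010: (2719, 2097, 168, 2),
    2011: (2672, 2447, 139, 1),
    2012: (2523, 2408, 182, 9),
    2013: (2162, 2362, 172, 36),
    2014: (2082, 2226, 169, 24),
    2015: (1482, 1971, 146, 27),
    2016: (1411, 1875, 126, 38),
    2017: (1291, 1366, 122, 22),
    2018: (1206, 1313, 113, 18),
    2019: (1224, 1274, 114, 17),
    2020: (1272, 1214, 118, 21),
    2021: (1413, 1241, 120, 31),
    2022: (1456, 1097, 80, 7),
    2023: (1183, 1122, 129, 16),
    2024: (1259, 1151, 80, 8),
}


def row_totals(counts):
    """Return {year: number of theses summed over the four thesis types}."""
    return {year: sum(row) for year, row in counts.items()}


def column_totals(counts):
    """Return the per-type sums over all years, in column order."""
    return tuple(sum(column) for column in zip(*counts.values()))


totals = row_totals(COUNTS)
by_type = column_totals(COUNTS)

print(totals[2011])   # 5259
print(by_type)        # (31414, 32332, 1979, 277)
print(sum(by_type))   # 66002
```

The recomputed sums agree with the table: 31,414 bachelor, 32,332 master, 1979 dissertation, and 277 habilitation theses, for 66,002 documents overall.
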

| K | Coherence (C_v) | Num Topics | Docs per Topic | Improvement (%) |
|---|---|---|---|---|
| 1000 | 0.3893 | 999 | 66.0 | — |
| 1500 | 0.3895 | 1499 | 44.0 | +0.19 |
| 2000 | 0.3919 | 1999 | 33.0 | +2.45 |
| 2500 | 0.3959 | 2498 | 26.4 | +3.92 |
| 3000 | 0.4000 | 2998 | 22.0 | +4.17 |
| 3500 | 0.4034 | 3497 | 18.9 | +3.35 |
| 4000 | 0.4058 | 3997 | 16.5 | +2.38 |
| … | … | … | … | … |
| 10,000 | 0.4330 | 9997 | 6.6 | +1.35 |
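
The Docs per Topic column is consistent with dividing the 66,002 documents of the corpus by K and rounding to one decimal place. This is an inference from the tabulated numbers, not a formula stated in the source, and the helper name `docs_per_topic` is hypothetical. A minimal sketch:

```python
CORPUS_SIZE = 66_002  # total number of theses, taken from the corpus table


def docs_per_topic(k, corpus_size=CORPUS_SIZE):
    """Average documents per topic, assuming the corpus is split evenly into k topics."""
    return round(corpus_size / k, 1)


# K values listed in the table above
for k in (1000, 1500, 2000, 2500, 3000, 3500, 4000, 10_000):
    print(k, docs_per_topic(k))
```

Dividing by the Num Topics column instead would give 66.1 for the first row (66,002 / 999), which does not match the tabulated 66.0, so K appears to be the divisor.
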

| Statistic | Value |
|---|---|
| Total Topics | 2999 |
| Mean Coherence | 0.433 |
| Median Coherence | 0.371 |
| Standard Deviation | 0.159 |
| Minimum | 0.082 |
| Maximum | 1.000 |

| Quality Category | Coherence Range | Topics | Percentage |
|---|---|---|---|
| Excellent | ≥0.6 | ~450 | ~15% |
| Good | 0.5–0.6 | ~400 | ~13% |
| Moderate | 0.4–0.5 | ~500 | ~17% |
| Poor | <0.4 | ~1650 | ~55% |
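
The quality categories above amount to a threshold lookup on the C_v score. The sketch below is one way to express it; the function name `quality_category` is hypothetical, and treating the lower bound of each range as inclusive (0.5 counts as Good, 0.4 as Moderate) is an assumption, since the table fixes only the 0.6 boundary.

```python
from collections import Counter


def quality_category(c_v):
    """Map a C_v coherence score to the quality label used in the table above.

    Assumption: lower bounds are inclusive for every category.
    """
    if c_v >= 0.6:
        return "Excellent"
    if c_v >= 0.5:
        return "Good"
    if c_v >= 0.4:
        return "Moderate"
    return "Poor"


# Maximum, mean, median, and minimum from the summary statistics table
sample = [1.000, 0.433, 0.371, 0.082]
labels = [quality_category(score) for score in sample]
print(labels)
print(Counter(labels))
```

Under these thresholds the reported median of 0.371 falls in the Poor range, which agrees with roughly 55% of topics being listed as Poor.
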

| Rank | Topic ID | Coherence (C_v) |
|---|---|---|
| 1 | 2100 | 1.000 |
| 2 | 1183 | 0.985 |
| 3 | 1332 | 0.982 |
| 4 | 3 | 0.979 |
| 5 | 1828 | 0.978 |
| 6 | 653 | 0.972 |
| 7 | 6 | 0.971 |
| 8 | 767 | 0.961 |
| 9 | 84 | 0.951 |
| 10 | 621 | 0.951 |

| Rank | Topic ID | Coherence (C_v) |
|---|---|---|
| 2995 | 2960 | 0.123 |
| 2996 | 1216 | 0.110 |
| 2997 | 2787 | 0.108 |
| 2998 | 1022 | 0.089 |
| 2999 | 2942 | 0.082 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Pohorenec, M.; Vavrák, V.; Behúnová, A.; Behún, M.; Ennert, M. Convex Hull-Based Topic Similarity Mapping in Multidimensional Data. Information 2026, 17, 180. https://doi.org/10.3390/info17020180

