Open Government Data Topic Modeling and Taxonomy Development
Abstract
:1. Introduction
Background and Motivation
2. Literature Review
2.1. Benchmarking Open Government Data
2.2. OGD Impact Evaluation Benchmarks
2.3. Impact Assessment—Methods Used
3. Materials and Methods
3.1. Data Extraction and Preprocessing
3.2. Topic Modeling
3.2.1. Generative Pre-Trained Transformer
3.2.2. Non-Negative Matrix Factorization
3.2.3. Latent Dirichlet Allocation
- Choose a multinomial distribution ϕt for topic t(t ∈ {1, …, T}) from a Dirichlet distribution with parameter β;
- Choose a multinomial distribution θd for document d(d ∈ {1, …, M}) from a Dirichlet distribution with parameter α;
- For a word wn (n ∈ {1, …, Nd}) in document d, select a topic zn from θd and select a word wn from ϕzn.
3.2.4. Hierarchical Dirichlet Process
3.2.5. Latent Semantic Analysis
- U contains the left singular vectors;
- Σ a diagonal matrix with singular values; and
- VT is the transpose of the matrix with the right singular vectors.
3.3. Taxonomy Development
3.3.1. Meta-Characteristic
- Purpose of the taxonomy: The meta-characteristic should align with the objective of the taxonomy and must reflect the core purpose or objective of the taxonomy.
- Audience and use cases: The meta-characteristic should be relevant to the stakeholders and use cases, guiding the classification in a way that meets their needs.
- Scope of the domain: The meta-characteristic should capture the essential scope of the domain being studied. It should be broad enough to encompass all relevant dimensions but specific enough to ensure focus.
- Mutually exclusive and collectively exhaustive: The meta-characteristic should help guide the development of dimensions and characteristics that are mutually exclusive (no overlap between categories) and collectively exhaustive (covering all relevant aspects).
3.3.2. Ending Conditions
3.3.3. Approach Selection
3.3.4. Validation
3.3.5. Process Schema
4. Results
4.1. Iteration 1—Initial Taxonomy Draft Based on Predefined Domains
Step | Action | Outcome |
---|---|---|
3ec | Approach: empirical-to-conceptual | Base taxonomy on existing classifications. |
4ec | Defined OGD use approaches | Identified 4 types of use: Everyday use, long-term use, direct use, indirect use |
5ec | Classified impact areas | Grouped into 4 broad dimensions: Governmental, Social, Environmental, Economic |
6ec | Created taxonomy matrix | Mapped 13 thematic areas to dimensions (see Table 3) |
7 | Evaluated taxonomy | Found lack of granularity, prompting machine learning integration |
4.2. Iteration 2—Refinement Using GPT-Based Topic Modeling
4.3. Iteration 3—Refinement Through Additional Topic Modeling Techniques
4.4. Iteration 4—Final Refinements and Conceptual Validation
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Organisation for Economic Co-operation and Development. Open Government Data Report: Enhancing Policy Maturity for Sustainable Impact; OECD Publishing: Paris, France, 2018; pp. 3–4. [Google Scholar] [CrossRef]
- Attard, J.; Orlandi, F.; Scerri, S.; Auer, S. A systematic review of open government data initiatives. Gov. Inf. Q. 2015, 32, 399–418. [Google Scholar] [CrossRef]
- Attard, J.; Orlandi, F.; Auer, S. Value Creation on Open Government Data. In Proceedings of the 2016 49th Hawaii International Conference on System Sciences (HICSS), Koloa, HI, USA, 5–8 January 2016; pp. 2605–2614. [Google Scholar] [CrossRef]
- Safarov, I.; Meijer, A.; Grimmelikhuijsen, S. Utilization of open government data: A systematic literature review of types, conditions, effects and users. Inf. Polity 2017, 22, 1–24. [Google Scholar] [CrossRef]
- Ubaldi, B. Open Government Data: Towards Empirical Analysis of Open Government Data Initiatives; OECD Publishing: Paris, France, 2013; pp. 4–17. [Google Scholar] [CrossRef]
- Yan, A.; Weber, N. Mining Open Government Data Used in Scientific Research. In Proceedings of the 13th International Conference, iConference 2018, Transforming Digital Worlds, Lecture Notes in Computer Science, Sheffield, UK, 25–28 March 2018; Volume 10766, pp. 303–313. [Google Scholar] [CrossRef]
- Jaeger, P.T.; Bertot, J.C. Transparency and technological change: Ensuring equal and sustained public access to government information. Gov. Inf. Q. 2010, 27, 371–376. [Google Scholar] [CrossRef]
- Buttow, C.V.; Weerts, S. Open Government Data: The OECD’s Swiss army knife in the transformation of government. Policy Internet 2022, 14, 219–234. [Google Scholar] [CrossRef]
- Fan, B.; Meng, X. Moderating Effects of Governance on Open Government Data Quality and Open Government Data Utilization: Analysis Based on the Resource Complementarity Perspective. J. Glob. Inf. Technol. Manag. 2023, 26, 300–322. [Google Scholar] [CrossRef]
- Nikiforova, A. Smarter Open Government Data for Society 5.0: Are Your Open Data Smart Enough? Sensors 2021, 21, 5204. [Google Scholar] [CrossRef]
- Jiang, H.; Duan, Y.; Zhu, Y. Citizens’ Continuous-Use Intention to Open Government Data: Empirical Evidence from China. In Proceedings of the 10th International Conference on Big Data, Big Data 2021, Virtual Event, 10–14 December 2021; Volume 12988, pp. 62–79. [Google Scholar] [CrossRef]
- Ruijer, E.H.J.M.; Martinius, E. Researching the democratic impact of open government data: A systematic literature review. Inf. Polity 2017, 22, 233–250. [Google Scholar] [CrossRef]
- Open Data Institute. Available online: https://theodi.org/ (accessed on 7 March 2025).
- European Commission. Danish Basic Data Program. Available online: https://ec.europa.eu/digital-building-blocks/sites/pages/viewpage.action?pageId=533365971 (accessed on 7 March 2025).
- Farhadloo, M.; Rosso, M.; Animesh, A. Open government data, innovation and diversification: The pursuit of economic value. Transform. Gov. People Process Policy 2024, 18, 722–743. [Google Scholar] [CrossRef]
- González-Zapata, F.; Rivera, A.; Chauvet, L.; Emilsson, C.; Zahuranec, A.; Young, A.; Verhulst, S. Open Data in Action: Initiatives during the Initial Stage of the COVID-19 Pandemic (Report); GovLab, Organisation for Economic Co-operation and Development (OECD): Paris, France, 2021. [Google Scholar] [CrossRef]
- Roa, H.N.; Loza-Aguirre, E.; Flores, P. A Survey on the Problems Affecting the Development of Open Government Data Initiatives. In Proceedings of the Sixth International Conference on eDemocracy & eGovernment (ICEDEG), Quito, Ecuador, 24–26 April 2019; pp. 157–163. [Google Scholar] [CrossRef]
- Zuiderwijk, A.; Janssen, M. Open data policies, their implementation and impact: A framework for comparison. Gov. Inf. Q. 2014, 31, 17–29. [Google Scholar] [CrossRef]
- Jamieson, D.; Wilson, R.; Martin, M. The (im)possibilities of open data? Public Money Manag. 2019, 39, 364–368. [Google Scholar] [CrossRef]
- Zhang, L.; Sok, S. The impact of government open data platform construction on corporate capital market performance: Evidence from stock liquidity. Pac. Basin Financ. J. 2025, 90, 102667. [Google Scholar] [CrossRef]
- Tan, L.; Pei, J. Open government data and the urban–rural income divide in China: An exploration of data inequalities and their consequences. Sustainability 2023, 15, 9867. [Google Scholar] [CrossRef]
- Peng, X.; Xiao, D. Can open government data improve city green land-use efficiency? Evidence from China. Land 2024, 13, 1891. [Google Scholar] [CrossRef]
- European Commission. 2024 Open Data Maturity Report; Publications Office of the European Union: Luxembourg, 2024; pp. 5–15. [Google Scholar] [CrossRef]
- European Commission. European Data Portal Report; Capgemini Invent: Luxembourg, 2020; pp. 10–16. [Google Scholar] [CrossRef]
- Tai, K. Open government research over a decade: A systematic review. Gov. Inf. Q. 2021, 38, 101566. [Google Scholar] [CrossRef]
- Ali Hassan, M.; Twinomurinzi, H. A Systematic Literature Review of Open Government Data Research: Challenges, Opportunities and Gaps. In Proceedings of the 2018 Open Innovations Conference (OI), Johannesburg, South Africa, 3–5 October 2018; pp. 299–304. [Google Scholar] [CrossRef]
- Zhang, H.; Zheng, L. Research on Open Government Data in China:: A Critical Assessment of 587 Papers. In Proceedings of the 15th International Conference on Theory and Practice of Electronic Governance, Guimaraes, Portugal, 4–7 October 2022; pp. 516–521. [Google Scholar] [CrossRef]
- Young, A.; Verhulst, S. The Global Impact of Open Data: Key Findings from Detailed Case Studies Around the World; O’Reilly Media: Sebastopol, CA, USA, 2016. [Google Scholar]
- Publications Office of the European Union. Use Cases. Available online: https://data.europa.eu/en/publications/use-cases (accessed on 15 February 2025).
- Crusoe, J.; Clarinval, A. Classification of Open Government Data Solutions’ Help: A Novel Taxonomy and Cluster Analysis. Electron. Gov. 2023, 14130, 230–245. [Google Scholar] [CrossRef]
- Zeleti, A.F. Analytical Frame for Open Data Impact Assessment—An Exploratory Research. SSRN 2023. [Google Scholar] [CrossRef]
- Cagliero, L.; Garza, P. Improving classification models with taxonomy information. Data Knowl. Eng. 2013, 86, 85–101. [Google Scholar] [CrossRef]
- Charalabidis, Y.; Alexopoulos, C.; Loukis, E. A taxonomy of open government data research areas and topics. J. Organ. Comput. Electron. Commer. 2016, 26, 41–63. [Google Scholar] [CrossRef]
- Mohamad, A.N.; Sylvester, A.; Campbell-Meier, J. Towards a taxonomy of research areas in open government data. Online Inf. Rev. 2024, 48, 67–83. [Google Scholar] [CrossRef]
- Zuiderwijk, A.; Reuver, M.D. Why open government data initiatives fail to achieve their objectives: Categorizing and prioritizing barriers through a global survey. Transform. Gov. People Process Policy 2021, 15, 377–395. [Google Scholar] [CrossRef]
- Hao-En, K. Between International Practice and Academia: Review and integration of Open Government Data Benchmarks. In Proceedings of the 24th Annual International Conference on Digital Government Research, Gdansk, Poland, 11–14 July 2023; pp. 73–89. [Google Scholar] [CrossRef]
- Zuiderwijk, A.; Pirannejad, A.; Susha, I. Comparing open data benchmarks: Which metrics and methodologies determine countries’ positions in the ranking lists? Telemat. Inform. 2021, 62, 101634. [Google Scholar] [CrossRef]
- Open Knowledge Foundation. Global Open Data Index. Available online: http://index.okfn.org/ (accessed on 8 July 2024).
- Open Data Economy. Available online: https://www.opendataeconomy.org/ (accessed on 8 July 2024).
- Open Data Inventory Network. Available online: https://odin.opendatawatch.com/ (accessed on 8 July 2024).
- World Bank Group. Readiness Assessment Tool. Available online: https://opendatatoolkit.worldbank.org/en/data/opendatatoolkit/odra (accessed on 8 July 2024).
- World Wide Web Foundation. Open Data Barometer. Available online: https://opendatabarometer.org/ (accessed on 9 July 2024).
- Janssen, M.; Charalabidis, Y.; Zuiderwijk, A. Benefits, Adoption Barriers and Myths of Open Data and Open Government. Inf. Syst. Manag. 2012, 29, 258–268. [Google Scholar] [CrossRef]
- Stuermer, M.; Dapp, M. Measuring the Promise of Open Data: Development of the Impact Monitoring Framework. In Proceedings of the 2016 Conference for E-Democracy and Open Government (CeDEM), Krems, Austria, 18–20 May 2016; pp. 197–203. [Google Scholar] [CrossRef]
- Jetzek, T.; Avital, M.; Bjorn-Andersen, N. The Sustainable Value of Open Government Data. J. Assoc. Inf. Syst. 2019, 20, 702–734. [Google Scholar] [CrossRef]
- Jetzek, T.; Avital, M.; Bjørn-Andersen, N. Generating Value from Open Government Data. In Proceedings of the International Conference on Information Systems, ICIS 2013, Milano, Italy, 15–18 December 2013. [Google Scholar]
- Jetzek, T.; Avital, M.; Bjørn-Andersen, N. Data-Driven Innovation through Open Government Data. J. Theor. Appl. Electron. Commer. Res. 2014, 9, 100–120. [Google Scholar] [CrossRef]
- Donker, F.; van Loenen, B. How to assess the success of the open data ecosystem? Int. J. Digit. Earth 2016, 10, 284–306. [Google Scholar] [CrossRef]
- Kao, H. Societal, Economic, Political and Environmental: A Review of Benchmarks and AI-assisted Systematic Literature Review of Impact of Open Government Data. In Proceedings of the 25th Annual International Conference on Digital Government Research, dg.o ’24, Taipei, Taiwan, 11–14 June 2024; pp. 574–593. [Google Scholar] [CrossRef]
- Dawes, S.; Vidiasova, L.; Trutnev, D. Approaches to assessing open government data programs: Comparison of common traits and differences at global context. In Proceedings of the 2015 2nd International Conference on Electronic Governance and Open Society: Challenges in Eurasia, EGOSE ’15, St. Petersburg, Russia, 24–25 November 2015; pp. 24–28. [Google Scholar] [CrossRef]
- Zheng, L.; Kwok, W.; Aquaro, V.; Qi, X.; Lyu, W. Evaluating global open government data: Methods and status. In Proceedings of the 13th International Conference on Theory and Practice of Electronic Governance, ICEGOV ’20, Athens, Greece, 23–25 September 2020; pp. 381–391. [Google Scholar] [CrossRef]
- Kawashita, I.M.; Baptista, A.A.; Soares, D.S. An Assessment of Open Government Data Benchmark Instruments. In Proceedings of the IFIP WG 8.5 International Conference EGOV-CeDEM-ePart, Linkoping, Sweden, 31 August–2 September 2020; pp. 29–40. [Google Scholar]
- Osorio-Sanabria, M.; Brito-Carvajal, J.; Astudillo, H.; Amaya-Fernàndez, F.; Gonzalez-Zabala, M. Evaluating Open Government Data Programs: A Systematic Mapping Study. In Proceedings of the 2020 Seventh International Conference on eDemocracy & eGovernment, ICEDEG, Buenos Aires, Argentina, 22–24 April 2020; pp. 157–164. [Google Scholar] [CrossRef]
- Ferencek, A. Impact assessment of open government data. In Proceedings of the 34th Bled eConference Digital Support from Crisis to Progressive Change, Bled, Slovenia, 27–30 June 2021; pp. 783–791. [Google Scholar] [CrossRef]
- Afful-Dadzie, E.; Afful-Dadzie, A. Liberation of public data: Exploring central themes in open government data and freedom of information research. Int. J. Inf. Manag. 2017, 37, 664–672. [Google Scholar] [CrossRef]
- Bilkova, R.; Machova, R.; Lnenicka, M. Evaluating the Impact of Open Data Using Partial Least Squares Structural Equation Modeling. Sci. Pap. Univ. Pardubic. Ser. D 2015, 22, 29–41. [Google Scholar]
- Machova, R.; Lnenicka, M. Modelling E-Government Development through the Years Using Cluster Analysis. JeDEM 2016, 8, 62–83. [Google Scholar] [CrossRef]
- Yerden, X.; Luna-Reyes, L. Promoting Government Impacts through Open Data: Key Influential Factors. In Proceedings of the 22nd Annual International Conference on Digital Government Research, dg.o ’21, Omaha, NE, USA, 9–11 June 2021; pp. 180–188. [Google Scholar] [CrossRef]
- Buteau, S.; Rao, P.; Mehta, A.; Kadirvell, V. Developing a Framework to Assess Socio-Economic Value of Open Data in India. In Proceedings of the 14th International Symposium on Open Collaboration, OpenSym ’18, Paris, France, 22–24 August 2018; pp. 1–6. [Google Scholar] [CrossRef]
- Viscusi, G.; Castelli, M.; Batini, C. Assessing Social Value in Open Data Initiatives: A Framework. Future Internet 2014, 6, 498–517. [Google Scholar] [CrossRef]
- dos Santos Brito, K.; da Silva Costa, M.; Cardoso Garcia, V.; de Lemos Meira, S. Assessing the benefits of open government data: The case of Meu Congresso Nacional in Brazilian elections 2014. In Proceedings of the 16th Annual International Conference on Digital Government Research, dg.o ’15, Phoenix, AZ, USA, 27–30 May 2015; pp. 89–96. [Google Scholar] [CrossRef]
- Tinati, R.; Carr, L.; Halford, S.; Pope, C. Exploring the impact of adopting open data in the UK government. In Proceedings of the Digital Futures 2012, Aberdeen, UK, 23–25 October 2012; p. 3. [Google Scholar]
- Meng, A. Investigating the Roots of Open Data’s Social Impact. JeDEM 2014, 6, 1–13. [Google Scholar] [CrossRef]
- Silva, P.N.; Pinheiro, M.M.K. DGABr: Metric for Evaluating Brazilian Open Government Data. Inf. Soc. 2018, 28, 225–243. [Google Scholar] [CrossRef]
- Vracic, T.; Varga, M.; Curko, K. Effects and evaluation of open government data initiative in Croatia. In Proceedings of the 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO, Opatija, Croatia, 30 May–3 June 2016; pp. 1521–1526. [Google Scholar] [CrossRef]
- Silva, P.N.; Pinheiro, M.M.K. Metricas para Dados Governamentais Abertos. In Proceedings of the Encontro Nacional De Pesquisa Em Ciência Da Informação, ENANCIB, Sao Paulo, Brasil, 22–26 October 2018; pp. 314–332. [Google Scholar] [CrossRef]
- Wilson, B.; Cong, C. Beyond the supply side: Use and impact of municipal open data in the U.S. Telemat. Inform. 2021, 58, 101526. [Google Scholar] [CrossRef]
- Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P. Natural Language Processing (Almost) from Scratch. J. Mach. Learn. 2011, 12, 2493–2537. [Google Scholar] [CrossRef]
- Hevner, A.; March, S.; Park, J.; Ram, S. Design Science in Information Systems Research. MIS Q. 2004, 28, 75–105. [Google Scholar] [CrossRef]
- Azevedo, A.; Santos, M. KDD, SEMMA and CRISP-DM: A parallel overview. In Proceedings of the IADIS European Conference on Data Mining, Amsterdam, The Netherlands, 24–26 July 2008; pp. 182–185. [Google Scholar]
- Nickerson, R.; Varshney, U.; Muntermann, J. A method for taxonomy development and its application in information systems. Eur. J. Inf. Syst. 2013, 22, 336–359. [Google Scholar] [CrossRef]
- Miner, G.D.; Elder, J.F.; Nisbet, R. Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications, 1st ed.; Academic Press: New York, NY USA, 2012. [Google Scholar]
- Textract. Available online: https://textract.readthedocs.io/en/stable/ (accessed on 24 September 2024).
- Bird, S.; Loper, E.; Klein, E. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, 1st ed.; O’Reilly Media: Sebastopol, CA, USA, 2009. [Google Scholar]
- Barde, B.V.; Bainwad, A.M. An overview of topic modeling methods and tools. In Proceedings of the 2017 International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 15–16 June 2017; pp. 745–750. [Google Scholar] [CrossRef]
- OpenAI. ChatGPT. Available online: https://openai.com/chatgpt (accessed on 5 March 2024).
- Lund, B.; Wang, T. Chatting about ChatGPT: How may AI and GPT impact academia and libraries? Libr. Hi Tech News 2023, 40, 26–29. [Google Scholar] [CrossRef]
- Roumeliotis, K.I.; Tselikas, N.D. ChatGPT and Open-AI Models: A Preliminary Review. Future Internet 2023, 15, 192. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010. [Google Scholar]
- Ambartsoumian, A.; Popowich, F. Self-attention: A better building block for sentiment analysis neural network classifiers. In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Brussels, Belgium, 31 October 2018; pp. 130–139. [Google Scholar]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015. [Google Scholar] [CrossRef]
- OpenAI. API Reference Introduction. Available online: https://platform.openai.com/docs/api-reference/introduction (accessed on 5 March 2024).
- Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791. [Google Scholar] [CrossRef]
- Carbonetto, P.; Sarkar, A.K.; Wang, Z.; Stephens, M. Non-negative matrix factorization algorithms greatly improve topic model fits. arXiv 2021, arXiv:2105.13440. [Google Scholar] [CrossRef]
- Purpura, A. Non-negative Matrix Factorization for Topic Modeling. In Proceedings of the Biennial Conference on Design of Experimental Search & Information Retrieval Systems, Bertinoro, Italy, 28–31 August 2018. [Google Scholar]
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. [Google Scholar]
- Ostrowski, D. Using latent dirichlet allocation for topic modelling in twitter. In Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing, Anaheim, CA, USA, 7–9 February 2015; pp. 493–497. [Google Scholar] [CrossRef]
- Christy, A.; Praveena, A.; Shabu, J. A Hybrid Model for Topic Modeling Using Latent Dirichlet Allocation and Feature Selection Method. J. Comput. Theor. Nanosci. 2019, 16, 3367–3371. [Google Scholar] [CrossRef]
- Muchene, L.; Safari, W. Two-stage topic modelling of scientific publications: A case study of University of Nairobi, Kenya. PLoS ONE 2021, 16, e0243208. [Google Scholar] [CrossRef]
- Teh, Y.; Jordan, M.; Beal, M.; Blei, D. Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 2006, 101, 1566–1581. [Google Scholar] [CrossRef]
- Zhang, M.; He, T.; Li, F.; Peng, L. Incorporating Hierarchical Dirichlet Process into Tag Topic Model. Chin. Lex. Semant. 2013, 8229, 368–377. [Google Scholar] [CrossRef]
- Dai, A.; Storkey, A. The Supervised Hierarchical Dirichlet Process. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 243–255. [Google Scholar] [CrossRef]
- Fan, W.; Bouguila, N. Online Data Clustering Using Variational Learning of a Hierarchical Dirichlet Process Mixture of Dirichlet Distributions. In Proceedings of the 19th International Conference, DASFAA 2014, International Workshops: BDMA, DaMEN, SIM3, UnCrowd, Bali, Indonesia, 21–24 April 2014; pp. 18–32. [Google Scholar] [CrossRef]
- Deerwester, S.; Dumais, S.T.; Furnas, G.W.; Landauer, T.K.; Harshman, R. Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 1990, 41, 391–407. [Google Scholar] [CrossRef]
- Valdez, D.; Pickett, A.C.; Goodson, P. Topic Modeling: Latent Semantic Analysis for the Social Sciences. Soc. Sci. Q. 2018, 99, 1665–1679. [Google Scholar] [CrossRef]
- Gupta, I.; Chatterjee, I.; Gupta, N. Latent Semantic Analysis based Real-world Application of Topic Modeling: A Review Study. In Proceedings of the 2022 Second International Conference on Artificial Intelligence and Smart Energy (ICAIS), Coimbatore, India, 23–25 February 2022; pp. 1142–1149. [Google Scholar] [CrossRef]
- Alghamdi, R.; Alfalqi, K. A Survey of Topic Modeling in Text Mining. Int. J. Adv. Comput. Sci. Appl. 2015, 6, 147–153. [Google Scholar] [CrossRef]
- Glass, R.; Vessey, I. Contemporary application-domain taxonomies. IEEE Softw. 1995, 12, 63–76. [Google Scholar]
- Miller, J.; Roth, A. A taxonomy of manufacturing strategies. Manag. Sci. 1994, 40, 285–304. [Google Scholar]
- Bailey, K. Typologies and Taxonomies—An Introduction to Classification Techniques; Sage: Thousand Oaks, CA, USA, 1994. [Google Scholar]
- Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186. [Google Scholar] [CrossRef]
- Gunawan, D.; Sembiring, C.A.; Budiman, M.A. The implementation of cosine similarity to calculate text relevance between two documents. J. Phys. Conf. Ser. 2018, 978, 012120. [Google Scholar] [CrossRef]
- Li, B.; Han, L. Distance weighted cosine similarity measure for text classification. In Proceedings of the Intelligent Data Engineering and Automated Learning–IDEAL 2013: 14th International Conference, Hefei, China, 20–23 October 2013; pp. 611–618. [Google Scholar] [CrossRef]
- Muflikhah, L.; Baharudin, B. Document clustering using concept space and cosine similarity measurement. In Proceedings of the 2009 International Conference on Computer Technology and Development, Kota Kinabalu, Malaysia, 13–15 November 2009; Volume 1, pp. 58–62. [Google Scholar] [CrossRef]
Ending Condition | Explanation | Subjective/Objective |
---|---|---|
All objects have been classified | Every object under consideration is classified into one and only one characteristic for each dimension. | Objective |
No new dimensions or characteristics emerge | In recent iterations, no new dimensions or characteristics have emerged. This indicates that the taxonomy has reached a point of saturation and further iterations are unlikely to add value. | Objective |
At least one object is classified under every characteristic of every dimension | For every dimension in taxonomy, each characteristic must have at least one object that can be placed under that characteristic. | Objective |
Dimensions are mutually exclusive and collectively exhaustive | The dimensions in the taxonomy do not overlap, and together, they account for all the possibilities within the scope of the meta-characteristic. | Objective |
All dimensions and characteristics are unique | Each dimension and characteristic must be distinct from one another to ensure the clarity and utility of the taxonomy. | Objective |
Conciseness | The taxonomy should not be overly complex. It should be as simple as possible while still fulfilling its purpose. Overly detailed taxonomies might be difficult to use and understand. | Subjective |
Usefulness | The taxonomy should provide value for the intended purpose. | Subjective |
Comprehensiveness | The taxonomy should be complete in the sense that it includes all necessary dimensions and characteristics that are important for the classification. | Subjective |
Extendibility | The taxonomy should be designed so that it can be expanded to accommodate new categories or characteristics that may arise as the field evolves. | Subjective |
Explanatory power | The taxonomy should clearly show the relationships between the different dimensions and characteristics. It should help users see how one category relates to another, and why certain objects are grouped together. | Subjective |
# | Use Case Sector | Abbreviation |
---|---|---|
1 | Agriculture, Fisheries, Forestry and Foods | AFF |
2 | Economy and Finance | EFI |
3 | Education, Culture and Sport | ECS |
4 | Energy | ENR |
5 | Environment | ENV |
6 | Government and Public Sector | GPS |
7 | Health | HLT |
8 | International Issues | IIS |
9 | Justice, Legal System and Public Safety | JLP |
10 | Population and Society | PSO |
11 | Regions and Cities | RCI |
12 | Science and Technology | STE |
13 | Transport | TRS |
Initiative Focus | Governmental | Social | Environmental | Economic | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GPS | RCI | IIS | ECS | HLT | PSO | JLP | AFF | ENR | ENV | EFI | STE | TRS | |
Everyday use | x | x | x | x | x | x | x | x | x | x | x | x | |
Long-term use | x | x | x | x | x | x | x | x | x | x | x | x | x |
Direct use | x | x | x | x | x | x | x | x | x | x | x | x | |
Indirect use | x | x | x | x | x | x | x | x |
Step | Action | Outcome |
---|---|---|
3ec | Approach: empirical-to-conceptual | Extend taxonomy based on findings derived from GPT topic modeling |
4ec | Expanded OGD use approaches | Added Collaborative and Individual Use |
5ec | GPT-based topic modeling | Extracted 210 unique keywords from OGD use cases |
6ec | Grouped keywords into impact dimensions | Identified 7 refined dimensions (see Table 6) based on hierarchical clustering New dimensions and approaches were integrated in existing taxonomy |
7 | Evaluated taxonomy | Some terms were too abstract, requiring additional refinement using multiple topic modeling techniques |
# | Dimension | Subcategories |
---|---|---|
1 | Society | Community, Public governance, Social justice |
2 | Health | Public health, Wellness |
3 | Infrastructure | Transport, Urban development |
4 | Education | Skill development, Knowledge dissemination |
5 | Innovation | Technology, Research, Sustainability |
6 | Governance | Policy, Public spending, Crisis management |
7 | Environment | Climate action, Pollution, Sustainability |
Step | Action | Outcome |
---|---|---|
3ec | Approach: empirical-to-conceptual | Extend taxonomy based on findings derived from other topic modeling techniques |
4ec | Kept existing use approaches | No new ways of use were identified |
5ec | Topic modeling | We used NMF, LDA, HDP, LSA and extracted 453 additional unique keywords from OGD use cases |
6ec | Merged and refined categories | Many terms overlapped, some merged, some removed |
7 | Evaluated taxonomy | Found no new dimensions or subdimensions, only new characteristics. Confirming taxonomy structure |
Technique | Strength | Weakness |
---|---|---|
Generative Pre-Trained Transformer (GPT) | Broad contextual awareness and ability to recognize high-level themes from textual data | Generated abstract terms, some lacking specific real-world relevance |
Non-Negative Matrix Factorization (NMF) | Clear topic separation, effectively distinguishing different thematic clusters in the dataset | Required manual tuning of parameters, and results varied depending on preprocessing steps |
Latent Dirichlet Allocation (LDA) | Probabilistic topic assignment allowed for identification of nuanced topics across multiple documents | Lacked coherence in some topic groupings |
Hierarchical Dirichlet Process (HDP) | Adapted dynamically to the data, allowing for a flexible number of topics instead of requiring a predefined topic count | Less control over output, making it harder to fine-tune and interpret results |
Latent Semantic Analysis (LSA) | Found latent relationships between words, improving identification of synonyms and related concepts | Required dimensionality reduction, which sometimes led to loss of meaningful information |
Step | Action | Outcome |
---|---|---|
3ce | Approach: conceptual-to-empirical | Extend taxonomy based on findings derived from other topic modeling techniques |
4ce | Conceptual validation | Evaluated against taxonomy development criteria No additional approaches for using OGD were identified |
5ce, 6ce | Conceptual validation | No additional characteristics or dimensions were identified |
7 | Assessed taxonomy against ending conditions | Verified coverage, exclusivity, simplicity, practicality, and extendibility |
Initiative Focus | Everyday Use | Long-term Use | Direct Use | Indirect Use | Collaborat. Use | Individual Use |
---|---|---|---|---|---|---|
Society | ||||||
Community dynamics | ||||||
Community | x | x | x | x | x | |
Participation | x | x | x | x | x | |
Civic engagement | x | x | x | |||
Culture | x | x | x | x | ||
Public services | ||||||
Social justice | ||||||
Cohesion | x | x | x | x | ||
Justice | x | x | x | |||
Inequality | x | x | x | |||
Homelessness | x | x | x | x | ||
Poverty | x | x | x | |||
Gender equality | x | x | x | |||
Health | ||||||
Public health | ||||||
Health | x | x | x | x | ||
Sanitation | x | x | x | |||
Food | x | x | x | x | ||
Public health services | x | x | x | x | ||
Wellness | ||||||
Healthy lifestyle | x | x | x | x | ||
Well-being | x | x | x | x | x | |
Infrastructure | ||||||
Transportation systems | ||||||
Transport | x | x | x | x | ||
Public transport | x | x | x | x | ||
Parking | x | x | x | |||
Cycling | x | x | x | |||
Urbanism | x | x | x | x | ||
Urban development | ||||||
Engineering | x | x | ||||
Infrastructure | x | x | ||||
Logistics | x | x | ||||
City information | x | x | x | x | ||
Energy | x | x | x | x | ||
Education | ||||||
Skill development | ||||||
Education | x | x | x | x | ||
Data literacy | x | x | x | x | ||
Science | x | x | x | |||
Technology | x | x | x | |||
Knowledge dissemination | ||||||
History | x | x | x | |||
Trends | x | x | x | |||
Journalism | x | x | x | |||
Statistics | x | x | x | |||
Research | x | x | x | |||
Innovation | ||||||
Technology and research | ||||||
Innovation | x | x | x | x | x | |
Technology innovation | x | x | x | |||
Industrial innovation | x | x | x | |||
Engineering | x | x | x | |||
Data science | x | x | x | |||
Geospatial science | x | x | x | x | ||
Sustainability | ||||||
Governance | ||||||
Public governance | ||||||
Government | x | x | x | |||
Policy | x | x | x | |||
Transparency | x | x | x | |||
Public spending | x | x | x | |||
Consulting | x | x | x | |||
Politics | x | x | x | |||
Crisis management | ||||||
Crisis | x | x | x | x | ||
Natural disasters | x | x | x | |||
Economic crises | x | x | x | |||
Public health emergencies | x | x | x | |||
Environment | ||||||
Pollution | x | x | x | x | ||
Climate action | x | x | x | |||
Environmental sustainability | x | x | x | |||
Agriculture | x | x | x | x |
Taxons | Description | Level |
---|---|---|
Society | 1 | |
Community dynamics | 2 | |
Community | Refers to the people, groups, and organizations in a particular area or interest, focusing on their interaction and relationships. | 3 |
Participation | The active involvement of individuals in community activities, decision-making, or civic processes. | 3 |
Civic engagement | Activities that connect individuals with public life and governance, such as voting, volunteering, or advocacy. | 3 |
Culture | The shared values, customs, beliefs, and practices within a community that shape collective identity. | 3 |
Public services | Services provided by the government for the welfare of the public, such as healthcare, education, and sanitation. | 2 |
Social justice | 2 | |
Cohesion | The strength and unity of a society, ensuring members feel part of the community. | 3 |
Justice | The fair and impartial treatment of individuals, ensuring equity in laws and social systems. | 3 |
Inequality | The disparities and unequal distribution of resources, rights, and opportunities in society. | 3 |
Homelessness | The issue of individuals lacking permanent housing and the societal response to this problem. | 3 |
Poverty | The condition where individuals or groups lack the financial resources to meet basic needs. | 3 |
Gender equality | The fair treatment and equal opportunities for all genders. | 3 |
Health | 1 | |
Public health | 2 | |
Health | The overall state of physical, mental, and social well-being of individuals and populations. | 3 |
Sanitation | Measures to promote hygiene and prevent disease through clean water, waste disposal, etc. | 3 |
Food | Access to nutritious food and addressing food security in the population. | 3 |
Public health services | Health services provided or regulated by the government for public benefit. | 3 |
Wellness | 2 | |
Healthy lifestyle | Practices that promote physical, mental, and emotional well-being. | 3 |
Well-being | A broader concept encompassing quality of life, including happiness, health, and life satisfaction. | 3 |
Infrastructure | 1 | |
Transportation systems | 2 | |
Transport | General movement of goods and people, including roads, railways, and other means. | 3 |
Public transport | Publicly accessible transport services like buses, trains, subways, etc. | 3 |
Parking | Infrastructure related to vehicle parking and its availability in urban environments. | 3 |
Cycling | Bicycle transportation, infrastructure, and the promotion of cycling as a mode of transport. | 3 |
Urban development | 2 | |
Engineering | The application of science and technology in designing and building infrastructure. | 3 |
Infrastructure | The fundamental facilities and systems serving a city or country, including transport, utilities, and buildings. | 3 |
Logistics | The coordination and movement of resources, goods, and people. | 3 |
City information | Systems and technologies providing information and services to urban residents. | 3 |
Energy | Energy production, distribution, and sustainability in cities. | 3 |
Education | 1 | |
Skill development | 2 | |
Education | Formal and informal teaching and learning processes. | 3 |
Data literacy | The ability to understand, interpret, and use data in various contexts. | 3 |
Science | Scientific education and research, promoting the understanding of natural and social sciences. | 3 |
Technology | Teaching and learning related to technological innovations and applications. | 3 |
Knowledge dissemination | 2 | |
History | The study and sharing of past events and historical information. | 3 |
Trends | Analyzing current and emerging trends in various fields for educational purposes. | 3 |
Journalism | The profession and practice of reporting and disseminating news and information. | 3 |
Statistics | The science of collecting, analyzing, and interpreting data. | 3 |
Research | Systematic investigation to establish facts, theories, or new knowledge. | 3 |
Innovation | 1 | |
Technology and research | 2 | |
Other innovation | The development and application of new ideas, methods, or technologies in fields not explicitly classified. | 3 |
Technology innovation | Advancements in technology aimed at improving systems, products, or services. | 3 |
Industrial innovation | Innovation within the manufacturing and industrial sectors. | 3 |
Engineering innovation | Applying scientific and technical knowledge to innovate and improve infrastructure and technology. | 3 |
Data science | The application of data-driven techniques to extract insights and foster innovation. | 3 |
Geospatial science | Using technologies like GIS and remote sensing to collect, analyze, and interpret spatial data. | 3 |
Sustainability | Practices and policies designed to maintain long-term environmental, social, and economic well-being. | 2 |
Governance | 1 | |
Public governance | 2 | |
Government | The institution responsible for creating and enforcing laws and policies. | 3 |
Policy | The principles and strategies used by the government to govern public affairs. | 3 |
Transparency | Openness and accountability in governance to ensure public trust. | 3 |
Public spending | Allocation of public funds to various sectors like education, healthcare, etc. | 3 |
Consulting | Advisory services for government and public institutions to improve efficiency. | 3 |
Politics | The activities and decisions related to governance, power, and public administration. | 3 |
Crisis management | 2 | |
Crisis | Managing and responding to critical situations affecting public safety or well-being. | 3 |
Natural disasters | Strategies for preparing for and responding to environmental crises like floods, earthquakes, etc. | 3 |
Economic crises | Addressing financial downturns and maintaining economic stability. | 3 |
Public health emergencies | Responding to large-scale health crises, such as pandemics. | 3 |
Environment | 1 | |
Pollution | The contamination of air, water, and land due to human activity and its impact on health and ecosystems. | 2 |
Climate action | Efforts to reduce or mitigate the effects of climate change through policy and technological innovation. | 2 |
Environmental sustainability | Practices aimed at balancing economic growth with the preservation of natural resources. | 2 |
Agriculture | Sustainable farming practices and the management of natural resources in food production. | 2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ferencek, A.; Kljajić Borštnar, M. Open Government Data Topic Modeling and Taxonomy Development. Systems 2025, 13, 242. https://doi.org/10.3390/systems13040242
Ferencek A, Kljajić Borštnar M. Open Government Data Topic Modeling and Taxonomy Development. Systems. 2025; 13(4):242. https://doi.org/10.3390/systems13040242
Chicago/Turabian StyleFerencek, Aljaž, and Mirjana Kljajić Borštnar. 2025. "Open Government Data Topic Modeling and Taxonomy Development" Systems 13, no. 4: 242. https://doi.org/10.3390/systems13040242
APA StyleFerencek, A., & Kljajić Borštnar, M. (2025). Open Government Data Topic Modeling and Taxonomy Development. Systems, 13(4), 242. https://doi.org/10.3390/systems13040242