Towards a Sustainable Workforce in Big Data Analytics: Skill Requirements Analysis from Online Job Postings Using Neural Topic Modeling
Abstract
1. Introduction
1.1. Research Background and Related Work
1.2. Motivation and Contribution of the Study
- Identification of key expertise roles and task definitions in big data analytics.
- Determination of essential skills and competencies required for big data analytics.
- Development of a skill taxonomy based on technical background.
- Taxonomic distribution and correlation analysis of skills according to expertise roles.
- Analysis of job types and requirements for education and experience levels.
- Highlighting key tasks, techniques, and tools utilized in big data analytics workflows.
2. Materials and Methods
2.1. Data Collection
2.2. Data Preprocessing
2.3. Expertise Roles and Job Title Classification
2.4. Skill and Competency Extraction Using Neural Topic Modeling
2.5. Taxonomic Classification of Skills
2.6. Analysis of the Taxonomic Distribution of Skills Across Expertise Roles
2.7. Analysis of Education, Experience, and Job Type Requirements
2.8. Identification of Tasks and Related Tools in Big Data Analytics
2.9. Validity and Reproducibility
3. Results
3.1. Expertise Roles, Job Titles, and Task Definitions
3.2. Knowledge-Domains, Skills, and Competencies Required for Big Data Analytics
3.3. Taxonomy of the Skills by Technical Knowledge and Background
3.4. Taxonomic Distribution of Skills According to Expertise Roles
3.5. Distribution of Skills by Expertise Roles
3.6. Requirements for Job Type, Education, and Experience
3.7. Key Tasks and Related Tools in Big Data Analytics
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Topic Name | Keywords | % |
---|---|---|
Cloud Services | cloud, aws, azure, gcp, serverless, network, security, storage, architecture, devops | 8.89 |
ETL Processes | data, pipeline, etl, ingestion, transformation, warehouse, integration, extraction, processing, automation | 8.56 |
Machine Learning | machine, model, feature, learning, algorithm, class, regression, training, dataset, prediction | 6.57 |
Software Engineering | software, development, coding, testing, debugging, design, version, deployment, integration, scalability | 5.85 |
Spark Development | spark, rdd, dataframe, dataset, cluster, parallel, execution, optimization, partition, transformation | 5.74 |
Business Analytics | business, insight, analytic, machine, learning, predict, python, metric, forecast, scenario | 4.96 |
Education | degree, science, bachelor, master, related, course, faculty, field, equivalent, engineering | 4.93 |
Agile Development | agile, scrum, kanban, backlog, sprint, iteration, planning, velocity, story, release | 4.69 |
Communication Skills | communication, speaking, strong, verbal, english, listening, writing, presentation, argument, persuasion | 4.64 |
Large-Scale Analytics | scale, processing, framework, parallel, distributed, large, optimization, computation, data, modeling | 4.36 |
Analytical Thinking | reasoning, logic, deduction, induction, evaluation, inference, problem, interpretation, decision, synthesis | 4.21 |
Hadoop Development | hadoop, hive, spark, hdfs, mapreduce, yarn, hbase, pig, sqoop, tez | 3.73 |
Project Management | project, planning, scope, timeline, risk, stakeholder, requirement, deliverable, budget, execution | 3.58 |
Distributed Programming | programming, distributed, parallel, concurrency, communication, develop, process, fault, replication | 3.24 |
Decision-Making | decision, strategy, insight, risk, evaluation, analysis, choice, priority, tradeoff, impact | 3.10 |
Troubleshooting | troubleshooting, issue, problem, debugging, diagnosis, resolution, failure, testing, recovery, log | 3.09 |
ETL Tools | airflow, talend, informatica, pentaho, databricks, fivetran, matillion, ssis, dbt, stitch | 2.78 |
Streaming Analytics | streaming, event, real-time, ingestion, processing, window, computation, change, aggregation, low-latency | 2.60 |
Business Intelligence | business, dashboard, intelligence, insight, metric, visualization, kpi, trend, reporting, strategy | 2.39 |
Problem Solving | problem, solve, solution, logic, reasoning, creativity, innovation, challenge, approach, analysis | 2.23 |
Risk Analysis | risk, security, threat, mitigation, exposure, assessment, compliance, fraud, vulnerability, impact | 2.11 |
Testing | test, validation, verification, unit, integration, regression, automation, bug, performance, acceptance | 2.08 |
Advertising Intelligence | advertising, campaign, targeting, stream, marketing, conversion, optimization, bidding, strategy, message | 1.52 |
Mobile Applications | mobile, app, interface, usability, performance, optimization, framework, backend, frontend, security | 1.50 |
Team Working | collaboration, coordination, leadership, team, relation, engagement, interaction, alignment, teamwork | 1.37 |
Data Analytics Platforms | aws, azure, google, platform, warehouse, lake, analytics, integration, query, scalability, computation | 1.28 |
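Keyword lists like those in the table above are typically obtained by scoring terms within each topic. The sketch below illustrates one such scoring, a class-based TF-IDF of the form weight(t, c) = tf(t, c) · log(1 + A / tf(t)), as described for BERTopic. It is an illustration under stated assumptions, not the pipeline used in this study: the snippets and topic labels are hypothetical, and raw term counts are used as a simplification.

```python
import math
from collections import Counter


def class_tfidf(docs_by_topic):
    """Score terms per topic with a class-based TF-IDF weighting.

    weight(t, c) = tf(t, c) * log(1 + A / tf(t)), where tf(t, c) is the count
    of term t in topic c, tf(t) is its count over all topics, and A is the
    average number of words per topic.
    """
    counts = {
        topic: Counter(" ".join(docs).split())
        for topic, docs in docs_by_topic.items()
    }
    total = Counter()
    for topic_counts in counts.values():
        total.update(topic_counts)
    avg_words = sum(total.values()) / len(counts)
    return {
        topic: {
            term: n * math.log(1 + avg_words / total[term])
            for term, n in topic_counts.items()
        }
        for topic, topic_counts in counts.items()
    }


def top_terms(scores, topic, k=3):
    """Return the k highest-scoring terms of one topic, best first."""
    ranked = sorted(scores[topic].items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:k]]


# Hypothetical, already-preprocessed snippets grouped by an assumed topic label.
TOY = {
    "cloud": ["aws cloud data storage", "azure cloud security"],
    "etl": ["etl pipeline data", "data pipeline ingestion"],
}
SCORES = class_tfidf(TOY)
```

In this toy input, "data" occurs in both topics, so its weight is discounted relative to "pipeline", which occurs only in the "etl" snippets.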
Role (%) | Top Five Job Titles | Task Definition |
---|---|---|
Analyst (29.14) | | Works principally with big datasets, performing analysis, modeling, testing, and visualization on them, and assists big data scientists with essential tasks. |
Engineer (28.20) | | Creates and manages big data infrastructure and tools across business domains, and develops, tests, and evaluates big data solutions within industries. |
Developer (27.25) | | Develops big data services and applications on multiple platforms, using software engineering processes to support business strategies and big data analytics processes. |
Architect (15.41) | | Creates and maintains the infrastructure and architecture needed for big data analytics processes, and plans big data workflows, strategies, and solutions. |
Skill/Role | Analyst | Engineer | Developer | Architect | Total |
---|---|---|---|---|---|
Cloud Services | 2.39 | 2.04 | 2.09 | 2.37 | 8.89 |
ETL Processes | 1.22 | 2.79 | 2.38 | 2.17 | 8.56 |
Machine Learning | 2.42 | 1.65 | 2.11 | 0.38 | 6.57 |
Software Engineering | 1.12 | 1.82 | 2.44 | 0.47 | 5.85 |
Spark Development | 1.17 | 1.39 | 1.63 | 1.55 | 5.74 |
Business Analytics | 1.96 | 1.48 | 1.03 | 0.49 | 4.96 |
Education | 1.28 | 1.61 | 1.30 | 0.73 | 4.93 |
Agile Development | 0.92 | 1.46 | 1.94 | 0.36 | 4.69 |
Communication Skills | 1.30 | 1.50 | 1.29 | 0.56 | 4.64 |
Large-Scale Analytics | 1.69 | 1.41 | 0.81 | 0.44 | 4.36 |
Analytical Thinking | 1.64 | 1.34 | 0.80 | 0.42 | 4.21 |
Hadoop Development | 0.92 | 1.10 | 1.13 | 0.57 | 3.73 |
Project Management | 1.11 | 0.75 | 0.79 | 0.93 | 3.58 |
Distributed Programming | 0.56 | 1.03 | 1.47 | 0.18 | 3.24 |
Decision-Making | 1.48 | 0.62 | 0.71 | 0.29 | 3.10 |
Troubleshooting | 0.63 | 1.42 | 0.54 | 0.49 | 3.09 |
ETL Tools | 0.56 | 0.72 | 0.78 | 0.72 | 2.78 |
Streaming Analytics | 1.30 | 0.56 | 0.61 | 0.14 | 2.60 |
Business Intelligence | 1.39 | 0.41 | 0.41 | 0.18 | 2.39 |
Problem Solving | 0.80 | 0.68 | 0.46 | 0.29 | 2.23 |
Risk Analysis | 0.96 | 0.36 | 0.40 | 0.39 | 2.11 |
Testing | 0.58 | 0.62 | 0.66 | 0.22 | 2.08 |
Advertising Intelligence | 0.59 | 0.39 | 0.37 | 0.17 | 1.52 |
Mobile Applications | 0.28 | 0.38 | 0.60 | 0.25 | 1.50 |
Team Working | 0.33 | 0.38 | 0.31 | 0.35 | 1.37 |
Data Analytics Platforms | 0.51 | 0.27 | 0.22 | 0.29 | 1.28 |
Total | 29.14 | 28.20 | 27.25 | 15.41 | 100 |
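As a quick consistency check, the percentages in the table above can be re-aggregated in a few lines. The sketch below copies the published values, confirms that each row's four role values add up to its reported total within rounding, and derives a per-role ranking of skills by share. The function names and the tolerance are illustrative assumptions. Because the values are rounded to two decimals, ties (for example 1.30 twice in the Analyst column) may be ordered differently than in a ranking computed from unrounded data.

```python
ROLES = ("Analyst", "Engineer", "Developer", "Architect")

# Values copied from the table above: (Analyst, Engineer, Developer, Architect, Total).
SHARES = {
    "Cloud Services": (2.39, 2.04, 2.09, 2.37, 8.89),
    "ETL Processes": (1.22, 2.79, 2.38, 2.17, 8.56),
    "Machine Learning": (2.42, 1.65, 2.11, 0.38, 6.57),
    "Software Engineering": (1.12, 1.82, 2.44, 0.47, 5.85),
    "Spark Development": (1.17, 1.39, 1.63, 1.55, 5.74),
    "Business Analytics": (1.96, 1.48, 1.03, 0.49, 4.96),
    "Education": (1.28, 1.61, 1.30, 0.73, 4.93),
    "Agile Development": (0.92, 1.46, 1.94, 0.36, 4.69),
    "Communication Skills": (1.30, 1.50, 1.29, 0.56, 4.64),
    "Large-Scale Analytics": (1.69, 1.41, 0.81, 0.44, 4.36),
    "Analytical Thinking": (1.64, 1.34, 0.80, 0.42, 4.21),
    "Hadoop Development": (0.92, 1.10, 1.13, 0.57, 3.73),
    "Project Management": (1.11, 0.75, 0.79, 0.93, 3.58),
    "Distributed Programming": (0.56, 1.03, 1.47, 0.18, 3.24),
    "Decision-Making": (1.48, 0.62, 0.71, 0.29, 3.10),
    "Troubleshooting": (0.63, 1.42, 0.54, 0.49, 3.09),
    "ETL Tools": (0.56, 0.72, 0.78, 0.72, 2.78),
    "Streaming Analytics": (1.30, 0.56, 0.61, 0.14, 2.60),
    "Business Intelligence": (1.39, 0.41, 0.41, 0.18, 2.39),
    "Problem Solving": (0.80, 0.68, 0.46, 0.29, 2.23),
    "Risk Analysis": (0.96, 0.36, 0.40, 0.39, 2.11),
    "Testing": (0.58, 0.62, 0.66, 0.22, 2.08),
    "Advertising Intelligence": (0.59, 0.39, 0.37, 0.17, 1.52),
    "Mobile Applications": (0.28, 0.38, 0.60, 0.25, 1.50),
    "Team Working": (0.33, 0.38, 0.31, 0.35, 1.37),
    "Data Analytics Platforms": (0.51, 0.27, 0.22, 0.29, 1.28),
}


def top_skills(role, k=10):
    """Return the k skills with the largest share for one role, highest first."""
    col = ROLES.index(role)
    ranked = sorted(SHARES, key=lambda skill: SHARES[skill][col], reverse=True)
    return ranked[:k]


def row_gaps(tol=0.025):
    """Return skills whose role values miss the reported total by more than tol."""
    gaps = {}
    for skill, values in SHARES.items():
        diff = sum(values[:4]) - values[4]
        if abs(diff) > tol:
            gaps[skill] = round(diff, 2)
    return gaps
```

For example, `top_skills("Architect", 4)` lists Cloud Services, ETL Processes, Spark Development, and Project Management, and `row_gaps()` returns an empty dictionary, which indicates that the rows agree with their totals up to rounding.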
Skill/Role | Analyst | Engineer | Developer | Architect |
---|---|---|---|---|
Cloud Services | 2 | 2 | 4 | 1 |
Education | 10 | 5 | 8 | 5 |
Communication Skills | 9 | 6 | 9 | 8 |
ETL Processes | * | 1 | 2 | 2 |
Machine Learning | 1 | 4 | 3 | * |
Business Analytics | 3 | 7 | * | 10 |
Software Engineering | * | 3 | 1 | * |
Agile Development | * | 8 | 5 | * |
Spark Development | * | * | 6 | 3 |
Large-Scale Analytics | 4 | 10 | * | * |
Troubleshooting | * | 9 | * | 9 |
Hadoop Development | * | * | 10 | 7 |
Analytical Thinking | 5 | * | * | * |
Business Intelligence | 7 | * | * | * |
Decision-Making | 6 | * | * | * |
Distributed Programming | * | * | 7 | * |
ETL Tools | * | * | * | 6 |
Project Management | * | * | * | 4 |
Streaming Analytics | 8 | * | * | * |
Task | Use Case | % |
---|---|---|
Analytics | Extracting insights from large datasets | 7.75 |
Processing | Cleaning, transforming, and structuring data | 7.04 |
Modeling | Creating predictive and prescriptive models | 6.37 |
Integration | Merging multiple data sources efficiently | 6.06 |
Optimization | Enhancing system performance and resource use | 5.51 |
Prediction | Identifying trends and future outcomes | 4.89 |
Classification | Categorizing data into predefined groups | 4.56 |
Management | Ensuring data governance and regulatory compliance | 4.18 |
Forecasting | Predicting future trends based on historical data | 3.82 |
Visualization | Representing complex data with interactive dashboards | 3.62 |
Mining | Discovering hidden patterns in large datasets | 3.46 |
Recognition | Identifying images, text, and speech patterns | 3.22 |
Automation | Streamlining workflows with AI-driven solutions | 2.92 |
Deployment | Implementing models into real-world applications | 2.65 |
Segmentation | Dividing users or markets based on behavior | 2.46 |
Tools | Use Case | % |
---|---|---|
Python, Spark, SQL | Data processing, analytics, and querying with Spark and SQL. | 6.73 |
Python, TensorFlow, SQL | Machine learning with TensorFlow and SQL for data storage. | 6.04 |
Java, Kafka, Flink | Real-time stream processing with Kafka and Flink. | 5.62 |
Python, Snowflake, Tableau | Cloud data warehousing and visualization. | 5.38 |
Scala, Spark, Hadoop | Big data processing with Spark and Hadoop. | 5.07 |
Python, Databricks, SQL | Unified analytics with Databricks and SQL. | 4.94 |
Java, HBase, Spark | NoSQL data storage with HBase and Spark for processing. | 4.73 |
Python, D3.js, SQL | Data visualization with D3.js and SQL for querying. | 4.47 |
Python, Airflow, Kubernetes | Workflow orchestration and cloud-native development. | 4.31 |
R, SQL, Tableau | Statistical analysis and visualization. | 4.12 |
Java, Kubernetes, Python | Cloud-native development with Kubernetes and Python. | 3.78 |
Python, PyTorch, SQL | Deep learning with PyTorch and SQL for data storage. | 3.54 |
Python, Snowflake, Power BI | Cloud data warehousing and business intelligence. | 3.37 |
Java, Spark, SQL | Big data processing with Spark and SQL. | 3.21 |
Python, Flask, Docker | Web development and containerization with Flask and Docker. | 3.06 |
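The table above reports shares per tool combination rather than per tool. A rough per-tool view can be obtained by summing the shares of every listed combination that contains a given tool, as sketched below. This assumes the combinations are mutually exclusive categories, which the table does not state. Since the listed rows cover only part of the total, the sums are best read as lower bounds. The function and variable names are illustrative.

```python
from collections import defaultdict

# (tool combination, % share) pairs copied from the table above.
COMBOS = [
    ("Python, Spark, SQL", 6.73),
    ("Python, TensorFlow, SQL", 6.04),
    ("Java, Kafka, Flink", 5.62),
    ("Python, Snowflake, Tableau", 5.38),
    ("Scala, Spark, Hadoop", 5.07),
    ("Python, Databricks, SQL", 4.94),
    ("Java, HBase, Spark", 4.73),
    ("Python, D3.js, SQL", 4.47),
    ("Python, Airflow, Kubernetes", 4.31),
    ("R, SQL, Tableau", 4.12),
    ("Java, Kubernetes, Python", 3.78),
    ("Python, PyTorch, SQL", 3.54),
    ("Python, Snowflake, Power BI", 3.37),
    ("Java, Spark, SQL", 3.21),
    ("Python, Flask, Docker", 3.06),
]


def tool_totals(combos):
    """Sum the share of every listed combination in which each tool appears."""
    totals = defaultdict(float)
    for names, share in combos:
        for tool in names.split(", "):
            totals[tool] += share
    return {tool: round(total, 2) for tool, total in totals.items()}


TOTALS = tool_totals(COMBOS)
# Tools ordered by summed share, largest first.
RANKED = sorted(TOTALS.items(), key=lambda kv: kv[1], reverse=True)
```

Under these assumptions, Python, SQL, and Spark come out as the three tools with the largest summed shares among the listed combinations.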
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gurcan, F.; Soylu, A.; Khan, A.Q. Towards a Sustainable Workforce in Big Data Analytics: Skill Requirements Analysis from Online Job Postings Using Neural Topic Modeling. Sustainability 2025, 17, 9293. https://doi.org/10.3390/su17209293