Future-Ready Skills Across Big Data Ecosystems: Insights from Machine Learning-Driven Human Resource Analytics
Abstract
1. Introduction
2. Background and Related Work
3. Materials and Methods
3.1. Data Collection and Preprocessing
3.2. Data Analysis and Interpretation
4. Results
4.1. Identification of Expertise Roles
4.2. Identification of Skill Sets
4.3. Taxonomy of the Skill Sets by Competency Areas
4.4. Mapping Skill Sets to Expertise Roles
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Role | Related Titles | Rate (%) |
---|---|---|
Developer | Big Data Developer; Big Data Software Engineer; Big Data Spark Developer; Big Data Hadoop Developer; Java Big Data Developer | 34.69 |
Engineer | Big Data Engineer; Senior Big Data Engineer; Lead Big Data Engineer; Principal Big Data Engineer; Big Data Platform Engineer | 33.99 |
Architect | Big Data Architect; Big Data Solution Architect; Google Cloud Big Data Architect; Senior Big Data Architect | 10.48 |
Analyst | Big Data Analyst; Big Data Analytics; Business Analyst; BI Analyst; AWS Big Data Analytics | 9.05 |
Manager | Big Data Lead; Big Data Technical Lead; Big Data Program Manager; Big Data Solution Manager; Big Data Product Manager | 5.18 |
Administrator | Big Data DBA; Big Data Admin; Big Data Administrator; Big Data Hadoop Administrator; Big Data Platform Administrator | 3.39 |
Consultant | Big Data Consultant; Big Data Hadoop Consultant; Big Data Solution Consultant; Big Data Technology Consultant; AWS Big Data Consultant | 3.21 |
Topic Name | Keywords | Rate (%) |
---|---|---|
Big Data Processing | data pipeline process build engineer platform analytic etl lake develop | 6.82 |
Big Data Tools | spark hadoop hive hbase java knowledge scala developer python sql | 5.93 |
Communication Skills | skill strong ability communication write environment excellent problem good service | 5.02 |
Remote Development | remote developer warehouse software reliably telecommuting connect good location part-time | 4.69 |
Big Data Architecture | solution design architecture architect technical pipeline system structure enterprise implement | 4.64 |
Programming Languages | software solution development programming tool practice java python language scala | 4.38 |
Agile Development | software development design agile scrum product modeling team cross-functional customer | 4.03 |
Information Security | information safety security privacy data financial banking risk prevent threat | 3.75 |
Project Management | project management technical plan identify lead manage process manager program | 3.38 |
Scalable Systems | scalability platform engineering engineer deliver structure scale service analytic healthcare | 3.28 |
Amazon EMR | amazon system design software service emr aws engineer development distribute | 3.23 |
Data Analytics | data analysis model business develop tool sql quality analytic knowledge | 3.16 |
Business Applications | application development business support integration deploy delivery software agility design | 3.09 |
Analytical Skills | analysis analytics critical prediction capability inferential analytical problem-solving report | 3.05 |
Business Analytics | business analytics drive product customer leadership partner organization role strategy | 3.01 |
Google Cloud | cloud platform build google service gcp infrastructure kubernete docker engine | 2.99 |
Technical Knowledge | technical knowledge expert strong skill background look grow diversity professional | 2.98 |
Database | database nosql sql stream management system distribute relational kafka mongodb | 2.93 |
Bachelor Degree | science computer degree engineering relate field security system minimum qualification | 2.73 |
AWS Data Services | aws cloud redshift sql emr python tool lambda glue engineer | 2.53 |
Distributed Systems | distribute apache system kafka process parallel hdfs storm computing oracle | 2.39 |
Data Warehousing | warehouse data warehousing process storage repository store tool model server | 2.33 |
Hadoop Ecosystem | system hadoop performance issue cluster support database infrastructure security admin | 2.28 |
Machine Learning | learn machine data science ml build model algorithm scientist learning | 2.25 |
Troubleshooting | customer support service troubleshooting aws help technical engineer application amazon | 2.19 |
Application Development | application develop process system software programming platform qualify language code | 2.11 |
Data Streaming | data processing spark hadoop kafka real-time frameworks streaming apache storm | 1.94 |
Testing | test testing code quality design unit automation etl qa agile | 1.94 |
Data Visualization | data visual report visualization graph view time tableau chart infogram design | 1.78 |
Azure Cloud | client azure solution consult delivery consultant databrick professional microsoft sql | 1.77 |
Team Working | teamwork lead member join independently collaborate member contact solidarity | 1.76 |
Decision-making | data business work decision-making core key judgment successful system | 1.63 |
Topics (Skills) | Developer | Engineer | Architect | Analyst | Manager | Administrator | Consultant | Rate (%) |
---|---|---|---|---|---|---|---|---|
Big Data Processing | 1.90 | 3.26 | 0.90 | 0.29 | 0.26 | 0.10 | 0.11 | 6.82 |
Big Data Tools | 2.93 | 1.88 | 0.50 | 0.09 | 0.22 | 0.16 | 0.15 | 5.93 |
Communication Skills | 1.93 | 1.69 | 0.46 | 0.29 | 0.30 | 0.12 | 0.22 | 5.02 |
Remote Development | 2.17 | 1.55 | 0.45 | 0.12 | 0.17 | 0.11 | 0.11 | 4.69 |
Big Data Architecture | 1.01 | 1.46 | 1.42 | 0.17 | 0.23 | 0.17 | 0.18 | 4.64 |
Programming Languages | 2.22 | 1.43 | 0.30 | 0.16 | 0.12 | 0.08 | 0.07 | 4.38 |
Agile Development | 1.88 | 1.43 | 0.28 | 0.19 | 0.14 | 0.07 | 0.05 | 4.03 |
Information Security | 2.01 | 0.88 | 0.28 | 0.16 | 0.11 | 0.12 | 0.19 | 3.75 |
Project Management | 0.92 | 1.00 | 0.35 | 0.26 | 0.55 | 0.06 | 0.23 | 3.38 |
Scalable Systems | 1.20 | 1.37 | 0.20 | 0.23 | 0.17 | 0.05 | 0.07 | 3.28 |
Amazon EMR | 1.86 | 0.47 | 0.41 | 0.19 | 0.06 | 0.01 | 0.22 | 3.23 |
Data Analytics | 0.74 | 1.56 | 0.27 | 0.39 | 0.10 | 0.03 | 0.07 | 3.16 |
Business Applications | 1.17 | 1.00 | 0.29 | 0.17 | 0.24 | 0.07 | 0.15 | 3.09 |
Analytical Skills | 0.61 | 0.27 | 0.35 | 1.15 | 0.26 | 0.23 | 0.17 | 3.05 |
Business Analytics | 0.65 | 0.36 | 0.47 | 1.04 | 0.30 | 0.04 | 0.16 | 3.01 |
Google Cloud | 0.81 | 1.28 | 0.53 | 0.10 | 0.08 | 0.05 | 0.14 | 2.99 |
Technical Knowledge | 1.08 | 1.15 | 0.26 | 0.20 | 0.12 | 0.07 | 0.10 | 2.98 |
Database | 1.17 | 1.22 | 0.27 | 0.08 | 0.09 | 0.05 | 0.05 | 2.93 |
Bachelor Degree | 0.81 | 1.19 | 0.19 | 0.27 | 0.13 | 0.08 | 0.07 | 2.73 |
AWS Data Services | 0.81 | 1.20 | 0.28 | 0.07 | 0.09 | 0.04 | 0.04 | 2.53 |
Distributed Systems | 0.70 | 0.97 | 0.16 | 0.17 | 0.14 | 0.12 | 0.13 | 2.39 |
Data Warehousing | 0.43 | 1.03 | 0.26 | 0.05 | 0.06 | 0.47 | 0.02 | 2.33 |
Hadoop Ecosystem | 0.51 | 0.83 | 0.25 | 0.15 | 0.05 | 0.46 | 0.04 | 2.28 |
Machine Learning | 0.71 | 0.50 | 0.08 | 0.76 | 0.05 | 0.02 | 0.13 | 2.25 |
Troubleshooting | 0.34 | 0.96 | 0.14 | 0.30 | 0.07 | 0.34 | 0.05 | 2.19 |
Application Development | 1.11 | 0.64 | 0.12 | 0.09 | 0.08 | 0.04 | 0.04 | 2.11 |
Data Streaming | 0.34 | 0.96 | 0.34 | 0.10 | 0.07 | 0.08 | 0.05 | 1.94 |
Testing | 0.94 | 0.69 | 0.07 | 0.08 | 0.09 | 0.03 | 0.04 | 1.94 |
Data Visualization | 0.52 | 0.12 | 0.06 | 0.87 | 0.17 | 0.02 | 0.02 | 1.78 |
Azure Cloud | 0.40 | 0.65 | 0.33 | 0.10 | 0.15 | 0.03 | 0.11 | 1.77 |
Team Working | 0.56 | 0.85 | 0.09 | 0.11 | 0.07 | 0.05 | 0.03 | 1.76 |
Decision-making | 0.24 | 0.12 | 0.14 | 0.65 | 0.42 | 0.03 | 0.02 | 1.63 |
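Each row of the table above can be sanity-checked: the seven role shares appear to add up to the value in the last column, up to rounding in the second decimal. The following is a minimal Python sketch over a few rows copied from the table. The names `ROWS`, `row_gaps`, and `TOLERANCE` are hypothetical, and the tolerance of 0.03 is an assumption chosen to absorb rounding, not a value taken from the study.

```python
# Rows copied from the table above: topic -> (role shares, reported rate).
# Role order: Developer, Engineer, Architect, Analyst, Manager,
# Administrator, Consultant.
ROWS = {
    "Big Data Processing": ([1.90, 3.26, 0.90, 0.29, 0.26, 0.10, 0.11], 6.82),
    "Big Data Tools": ([2.93, 1.88, 0.50, 0.09, 0.22, 0.16, 0.15], 5.93),
    "Communication Skills": ([1.93, 1.69, 0.46, 0.29, 0.30, 0.12, 0.22], 5.02),
    "Analytical Skills": ([0.61, 0.27, 0.35, 1.15, 0.26, 0.23, 0.17], 3.05),
    "Decision-making": ([0.24, 0.12, 0.14, 0.65, 0.42, 0.03, 0.02], 1.63),
}

TOLERANCE = 0.03  # assumed allowance for two-decimal rounding


def row_gaps(rows):
    """Return {topic: reported rate minus the sum of its role shares}."""
    return {
        topic: round(rate - sum(shares), 2)
        for topic, (shares, rate) in rows.items()
    }


gaps = row_gaps(ROWS)
for topic, gap in gaps.items():
    status = "ok" if abs(gap) <= TOLERANCE else "check"
    print(f"{topic}: gap {gap:+.2f} ({status})")
```

A gap larger than the tolerance would point to a transcription error in a row rather than to rounding.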
Topics (Skills) | Developer | Engineer | Architect | Analyst | Manager | Administrator | Consultant | Total |
---|---|---|---|---|---|---|---|---|
Communication Skills | 5 | 3 | 6 | 9 | 3 | 7 | 3 | 7 |
Big Data Tools | 1 | 2 | 4 |  | 9 | 6 | 9 | 6 |
Analytical Skills |  |  | 10 | 1 | 5 | 4 | 6 | 5 |
Big Data Architecture |  | 6 | 1 |  | 8 | 5 | 5 | 5 |
Big Data Processing | 6 | 1 | 2 | 8 | 6 |  |  | 5 |
Remote Development | 3 | 5 | 7 |  |  | 10 |  | 4 |
Business Analytics |  |  | 5 | 2 | 4 |  | 7 | 4 |
Business Applications | 10 |  |  |  | 7 |  | 8 | 3 |
Google Cloud |  | 10 | 3 |  |  |  | 10 | 3 |
Amazon EMR | 8 |  | 8 |  |  |  | 2 | 3 |
Information Security | 4 |  |  |  |  | 8 | 4 | 3 |
Project Management |  |  | 9 |  | 1 |  | 1 | 3 |
Scalable Systems | 9 | 9 |  |  |  |  |  | 2 |
Agile Development | 7 | 8 |  |  |  |  |  | 2 |
Data Visualization |  |  |  | 3 | 10 |  |  | 2 |
Troubleshooting |  |  |  | 7 |  | 3 |  | 2 |
Data Analytics |  | 4 |  | 6 |  |  |  | 2 |
Programming Languages | 2 | 7 |  |  |  |  |  | 2 |
Decision-making |  |  |  | 5 | 2 |  |  | 2 |
Bachelor Degree |  |  |  | 10 |  |  |  | 1 |
Distributed Systems |  |  |  |  |  | 9 |  | 1 |
Machine Learning |  |  |  | 4 |  |  |  | 1 |
Hadoop Ecosystem |  |  |  |  |  | 2 |  | 1 |
Data Warehousing |  |  |  |  |  | 1 |  | 1 |
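The ranks in the table above are consistent with ordering each role's column of the topic-by-role table by share and keeping the ten largest values; the last column then counts the roles in which a topic reaches the top ten. The sketch below assumes exactly that procedure, which is inferred from the numbers rather than stated in the text. The names `SHARES`, `top_k_ranks`, and `TOP_K` are hypothetical. The shares are copied from the topic-by-role table, and because they are rounded to two decimals, tied values may be ordered differently from the published ranks.

```python
ROLES = ["Developer", "Engineer", "Architect", "Analyst",
         "Manager", "Administrator", "Consultant"]

# Topic -> share per role (%), in ROLES order, copied from the
# topic-by-role table.
SHARES = {
    "Big Data Processing": [1.90, 3.26, 0.90, 0.29, 0.26, 0.10, 0.11],
    "Big Data Tools": [2.93, 1.88, 0.50, 0.09, 0.22, 0.16, 0.15],
    "Communication Skills": [1.93, 1.69, 0.46, 0.29, 0.30, 0.12, 0.22],
    "Remote Development": [2.17, 1.55, 0.45, 0.12, 0.17, 0.11, 0.11],
    "Big Data Architecture": [1.01, 1.46, 1.42, 0.17, 0.23, 0.17, 0.18],
    "Programming Languages": [2.22, 1.43, 0.30, 0.16, 0.12, 0.08, 0.07],
    "Agile Development": [1.88, 1.43, 0.28, 0.19, 0.14, 0.07, 0.05],
    "Information Security": [2.01, 0.88, 0.28, 0.16, 0.11, 0.12, 0.19],
    "Project Management": [0.92, 1.00, 0.35, 0.26, 0.55, 0.06, 0.23],
    "Scalable Systems": [1.20, 1.37, 0.20, 0.23, 0.17, 0.05, 0.07],
    "Amazon EMR": [1.86, 0.47, 0.41, 0.19, 0.06, 0.01, 0.22],
    "Data Analytics": [0.74, 1.56, 0.27, 0.39, 0.10, 0.03, 0.07],
    "Business Applications": [1.17, 1.00, 0.29, 0.17, 0.24, 0.07, 0.15],
    "Analytical Skills": [0.61, 0.27, 0.35, 1.15, 0.26, 0.23, 0.17],
    "Business Analytics": [0.65, 0.36, 0.47, 1.04, 0.30, 0.04, 0.16],
    "Google Cloud": [0.81, 1.28, 0.53, 0.10, 0.08, 0.05, 0.14],
    "Technical Knowledge": [1.08, 1.15, 0.26, 0.20, 0.12, 0.07, 0.10],
    "Database": [1.17, 1.22, 0.27, 0.08, 0.09, 0.05, 0.05],
    "Bachelor Degree": [0.81, 1.19, 0.19, 0.27, 0.13, 0.08, 0.07],
    "AWS Data Services": [0.81, 1.20, 0.28, 0.07, 0.09, 0.04, 0.04],
    "Distributed Systems": [0.70, 0.97, 0.16, 0.17, 0.14, 0.12, 0.13],
    "Data Warehousing": [0.43, 1.03, 0.26, 0.05, 0.06, 0.47, 0.02],
    "Hadoop Ecosystem": [0.51, 0.83, 0.25, 0.15, 0.05, 0.46, 0.04],
    "Machine Learning": [0.71, 0.50, 0.08, 0.76, 0.05, 0.02, 0.13],
    "Troubleshooting": [0.34, 0.96, 0.14, 0.30, 0.07, 0.34, 0.05],
    "Application Development": [1.11, 0.64, 0.12, 0.09, 0.08, 0.04, 0.04],
    "Data Streaming": [0.34, 0.96, 0.34, 0.10, 0.07, 0.08, 0.05],
    "Testing": [0.94, 0.69, 0.07, 0.08, 0.09, 0.03, 0.04],
    "Data Visualization": [0.52, 0.12, 0.06, 0.87, 0.17, 0.02, 0.02],
    "Azure Cloud": [0.40, 0.65, 0.33, 0.10, 0.15, 0.03, 0.11],
    "Team Working": [0.56, 0.85, 0.09, 0.11, 0.07, 0.05, 0.03],
    "Decision-making": [0.24, 0.12, 0.14, 0.65, 0.42, 0.03, 0.02],
}

TOP_K = 10  # the table above lists ranks 1 to 10 per role


def top_k_ranks(shares, roles, k=TOP_K):
    """Return {topic: {role: rank}} for the k largest shares per role."""
    ranks = {topic: {} for topic in shares}
    for col, role in enumerate(roles):
        # sorted() is stable, so tied values keep their table order.
        ordered = sorted(shares, key=lambda t: shares[t][col], reverse=True)
        for rank, topic in enumerate(ordered[:k], start=1):
            ranks[topic][role] = rank
    return ranks


ranks = top_k_ranks(SHARES, ROLES)
for topic, by_role in ranks.items():
    if by_role:  # skip topics that reach no role's top ten
        print(f"{topic}: {by_role} (roles in top {TOP_K}: {len(by_role)})")
```

Topics that never reach a top ten, such as Testing or Team Working, produce an empty mapping and are skipped, which matches their absence from the table.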
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gurcan, F.; Gudek, B.; Menekse Dalveren, G.G.; Derawi, M. Future-Ready Skills Across Big Data Ecosystems: Insights from Machine Learning-Driven Human Resource Analytics. Appl. Sci. 2025, 15, 5841. https://doi.org/10.3390/app15115841