Review of Maturity Models for Data Mining and Proposal of a Data Preparation Maturity Model Prototype for Data Mining
Abstract
1. Introduction
2. Searching for Data Mining Maturity Models
2.1. Overview of Maturity Models in the Field of Data Mining
2.2. Maturity Models and Process Models for Data Mining
2.3. Process Mining Maturity Models
2.4. Maturity Models for Machine Learning
2.5. Maturity Models for Data
2.6. Ethics Maturity Models
3. Data Preparation for Data Mining
3.1. Overview of Data Preparation in the Context of Data Mining
3.2. Data Integration
3.3. Data Cleansing
3.4. Data Transformation
3.5. Data Reduction
3.6. Models Targeting Data Preparation
4. Maturity Model Implications for Data Preparation
4.1. Description of the Data Preparation Pipeline
4.2. Influences on the Data Preparation Pipeline
4.2.1. General Influences on the Data Preparation Pipeline
4.2.2. Direct Influences on the Data Preparation Pipeline
4.2.3. Indirect Influences on the Data Preparation Pipeline
4.3. Derivation of Influence Groups
4.4. Requirements for the Data Preparation Maturity Model Prototype
4.5. Prototype of the Data Preparation Maturity Model
- The dimensions of domain knowledge and the analysis question could be relevant for target definition.
- The dimensions of data, technical infrastructure, decisions made for data collection, domain knowledge, and analysis question could be relevant for data selection.
- The dimensions of data (in the sense of the prepared data), technical infrastructure, input requirements for data mining, data mining knowledge, domain knowledge, and analysis question could be relevant for data mining.
- The dimensions of data mining knowledge, domain knowledge, and the analysis question could be relevant for the post-processing phase.
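The four assignments above can be read as a lookup from process phase to potentially relevant dimensions. The following is a minimal sketch of that reading, not part of the prototype itself: the identifiers `PHASE_DIMENSIONS`, `phases_for`, and `shared_dimensions` are hypothetical, and the sketch assumes that technical infrastructure and input requirements for data mining are two separate dimensions in the data mining item.

```python
# Illustrative sketch only: phase names and dimension labels follow the list
# above; all identifiers are hypothetical and not taken from the prototype.
PHASE_DIMENSIONS = {
    "target definition": {"domain knowledge", "analysis question"},
    "data selection": {
        "data",
        "technical infrastructure",
        "decisions for data collection",
        "domain knowledge",
        "analysis question",
    },
    "data mining": {
        "data",  # here: the prepared data
        "technical infrastructure",  # assumed to be separate from the next entry
        "input requirements for data mining",
        "data mining knowledge",
        "domain knowledge",
        "analysis question",
    },
    "post-processing": {
        "data mining knowledge",
        "domain knowledge",
        "analysis question",
    },
}


def phases_for(dimension: str) -> list[str]:
    """Return the phases for which a dimension could be relevant, in listed order."""
    return [phase for phase, dims in PHASE_DIMENSIONS.items() if dimension in dims]


def shared_dimensions() -> set[str]:
    """Return the dimensions that are named for every listed phase."""
    return set.intersection(*PHASE_DIMENSIONS.values())
```

Under these assumptions, `shared_dimensions()` returns domain knowledge and the analysis question, the two dimensions named in all four items.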
5. Discussion of the Necessity of a New Maturity Model for Data Mining
6. Conclusions and Outlook
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
AI | Artificial intelligence |
CMMI | Capability Maturity Model Integration |
GDPR | General Data Protection Regulation |
KDD | Knowledge discovery in databases |
PM | Process mining |
References
- Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P. From Data Mining to Knowledge Discovery in Databases. AI Mag. 1996, 17, 37–54.
- Scheidler, A.A.; Rabe, M. Integral Verification and Validation for Knowledge Discovery Procedure Models. Int. J. Bus. Intell. Data Min. 2021, 18, 73–87.
- García, S.; Ramírez-Gallego, S.; Luengo, J.; Benítez, J.M.; Herrera, F. Big Data Preprocessing: Methods and Prospects. Big Data Anal. 2016, 1, 9.
- Moges, H.-T.; van Vlasselaer, V.; Lemahieu, W.; Baesens, B. Determining the Use of Data Quality Metadata (DQM) for Decision Making Purposes and its Impact on Decision Outcomes—An Exploratory Study. Decis. Support Syst. 2016, 83, 32–46.
- Sonntag, M.; Mehmann, S.; Mehmann, J.; Teuteberg, F. Development and Evaluation of a Maturity Model for AI Deployment Capability of Manufacturing Companies. Inf. Sys. Manag. 2024, 42, 37–67.
- Sadiq, R.B.; Safie, N.; Abd Rahman, A.H.; Goudarzi, S. Artificial Intelligence Maturity Model: A Systematic Literature Review. PeerJ Comput. Sci. 2021, 7, e661.
- ISO/TR 13054:2012; International Organization for Standardization. Knowledge Management of Health Information Standards. ISO: Geneva, Switzerland, 2012.
- Schumacher, A.; Erol, S.; Sihn, W. A Maturity Model for Assessing Industry 4.0 Readiness and Maturity of Manufacturing Enterprises. Procedia CIRP 2016, 52, 161–166.
- Becker, J.; Knackstedt, R.; Pöppelbuß, J. Developing Maturity Models for IT Management. Bus. Inf. Syst. Eng. 2009, 1, 213–222.
- Han, J.; Pei, J.; Tong, H. Data Mining: Concepts and Techniques, 4th ed.; Elsevier: Cambridge, MA, USA, 2023; ISBN 978-0-128-11760-6.
- Accenture. The Art of AI Maturity: Advancing from Practice to Performance. 2022. Available online: https://www.accenture.com/content/dam/accenture/final/a-com-migration/manual/r3/pdf/pdf-4/Accenture-Art-of-AI-Maturity-Report.pdf#zoom=40 (accessed on 10 June 2024).
- Redaktion DIGITAL X. Digital Maturity Model: Company Digitisation Put to the Test. Available online: https://www.digital-x.eu/en/magazine/article/dx-xplain/digital-maturity-model (accessed on 1 July 2024).
- Britze, N.; Schulze, A.; Fenge, K.; Woltering, M.; Gross, M.; Menge, F.; Mucke, A.; Ensinger, A.; Keller, H.; Oldenburg, L.; et al. Reifegradmodell Digitale Geschäftsprozesse. 2022. Available online: https://www.bitkom.org/sites/main/files/2020-04/200406_lf_reifegradmodell_digitale-geschaftsprozesse_final.pdf (accessed on 6 March 2025).
- Schuster, T.; Waidelich, L.; Volz, R. Reifegradmodelle zur Bewertung Künstlicher Intelligenz in kleinen und mittleren Unternehmen. In Proceedings of Informatik 2021, Berlin, Germany, 27 September–1 October 2021; Gesellschaft für Informatik: Bonn, Germany, 2021; pp. 1237–1246.
- Abdullah, M.F.; Ahmad, K. Business Intelligence Model for Unstructured Data Management. In Proceedings of the 2015 International Conference on Electrical Engineering and Informatics. ICEEI, Legian-Bali, Indonesia, 4 October–4 November 2015; IEEE: New York, NY, USA, 2015; pp. 473–477, ISBN 978-1-4673-7319-7.
- Hein-Pensel, F.; Winkler, H.; Brückner, A.; Wölke, M.; Jabs, I.; Mayan, I.J.; Kirschenbaum, A.; Friedrich, J.; Zinke-Wehlmann, C. Maturity Assessment for Industry 5.0: A Review of Existing Maturity Models. J. Manuf. Syst. 2023, 66, 200–210.
- Grossman, R.L. A Framework for Evaluating the Analytic Maturity of an Organization. Int. J. Inf. Manag. 2018, 38, 45–51.
- Davenport, T.H.; Harris, J.G.; Morison, R. Analytics at Work; Harvard Business Review Press: Brighton, UK, 2010; ISBN 978-1-422-15712-1.
- Lismont, J.; Vanthienen, J.; Baesens, B.; Lemahieu, W. Defining Analytics Maturity Indicators: A Survey Approach. Int. J. Inf. Manag. 2017, 37, 114–124.
- Cosic, R.; Shanks, G.; Maynard, S. Towards a Business Analytics Capability Maturity Model. In Proceedings of the ACIS 2012 Proceedings, Melbourne, Australia, 3–5 December 2012; Volume 14.
- Król, K.; Zdonek, D. Analytics Maturity Models: An Overview. Information 2020, 11, 142.
- Azevedo, A.; Santos, M.F. KDD, SEMMA and CRISP-DM: A Parallel Overview. In Proceedings of the IADIS European Conference on Data Mining. IADIS 2008, Amsterdam, The Netherlands, 22–27 July 2008; Abraham, A.P., Ed.; pp. 182–185, ISBN 978-972-8924-63-8.
- Wirth, R.; Hipp, J. CRISP-DM: Towards a Standard Process Model for Data Mining. In Proceedings of the Fourth International Conference on the Practical Application of Knowledge Discovery and Data Mining, Manchester, UK, 11–13 April 2000; Mackin, N., Ed.; Practical Application Company: Blackpool, UK, 2000; pp. 29–39, ISBN 1902426088.
- Jacobi, C.; Meier, M.; Herborn, L.; Furmans, K. Maturity Model for Applying Process Mining in Supply Chains: Literature Overview and Practical Implications. Logist. J. Proc. 2020, 2020, 1–16.
- Samalikova, J.; Kusters, R.J.; Trienekens, J.J.M.; Weijters, A.J.M.M. Process Mining Support for Capability Maturity Model Integration-Based Software Process Assessment, in Principle and in Practice. J. Softw. Evol. Process. 2014, 26, 714–728.
- Heap. The Four Stages of Data Maturity—And How to Ace Them. Available online: https://www.heap.io/blog/the-four-stages-of-data-maturity (accessed on 7 March 2024).
- Curry, E.; Tuikka, T. An Organizational Maturity Model for Data Spaces: A Data Sharing Wheel Approach. In Data Spaces: Design, Deployment and Future Directions; Curry, E., Scerri, S., Tuikka, T., Eds.; Springer Nature: Cham, Switzerland, 2022; pp. 21–42. ISBN 978-3-030-98636-0.
- Belghith, O.; Skhiri, S.; Zitoun, S.; Ferjaoui, S. A Survey of Maturity Models in Data Management. In Proceedings of the 2021 IEEE 12th International Conference on Mechanical and Intelligent Manufacturing Technologies. ICMIMT, Cape Town, South Africa, 13–15 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 298–309, ISBN 978-1-6654-1453-1.
- Alation. Discover the Depth of Your Data Culture Maturity. Available online: https://www.alation.com/dcmm-assessment/ (accessed on 10 June 2024).
- Snowplow Team. The Snowplow Data Maturity Model. Available online: https://snowplow.io/blog/the-snowplow-data-maturity-model/ (accessed on 10 June 2024).
- Office of Data Governance. Data Management Maturity Model. Available online: https://www.dol.gov/agencies/odg/data-management-maturity-model (accessed on 10 June 2024).
- Krijger, J.; Thuis, T.; de Ruiter, M.; Ligthart, E.; Broekman, I. The AI Ethics Maturity Model: A Holistic Approach to Advancing Ethical Data Science in Organizations. AI Ethics 2023, 3, 355–367.
- Vakkuri, V.; Jantunen, M.; Halme, E.; Kemell, K.-K.; Nguyen-Duc, A.; Mikkonen, T.; Abrahamsson, P. Time for AI (Ethics) Maturity Model Is Now. In Proceedings of the Workshop on Artificial Intelligence Safety 2021, SafeAI 2021, Virtual, 8 February 2021; Espinoza, H., McDermid, J., Huang, X., Castillo-Effen, M., Chen, X.C., Hernández-Orallo, J., Ó hÉigeartaigh, S., Mallah, R., Eds.; CEUR Workshop Proceedings: Aachen, Germany, 2021.
- GDPR EU. GDPR: General Data Protection Regulation. Available online: https://www.gdpreu.org/ (accessed on 6 March 2025).
- Cortina, S.; Valoggia, P.; Barafort, B.; Renault, A. Designing a Data Protection Process Assessment Model Based on the GDPR. In Systems, Software and Services Process Improvement, Proceedings of the European Conference on Software Process Improvement, Edinburgh, UK, 18–20 September 2019; Walker, A., O’Connor, R.V., Messnarz, R., Eds.; Springer: Cham, Switzerland, 2019; pp. 136–148. ISBN 978-3-030-28005-5.
- Future of Life Institute. The EU Artificial Intelligence Act. Available online: https://artificialintelligenceact.eu/ (accessed on 6 March 2025).
- Tawakuli, A.; Engel, T. Make your Data Fair: A Survey of Data Preprocessing Techniques that Address Biases in Data Towards Fair AI. J. Eng. Res. 2024; in press.
- Huang, J.; Galal, G.; Etemadi, M.; Vaidyanathan, M. Evaluation and Mitigation of Racial Bias in Clinical Machine Learning Models: Scoping Review. JMIR Med. Inform. 2022, 10, e36388.
- Nazer, L.H.; Zatarah, R.; Waldrip, S.; Ke, J.X.C.; Moukheiber, M.; Khanna, A.K.; Hicklen, R.S.; Moukheiber, L.; Moukheiber, D.; Ma, H.; et al. Bias in Artificial Intelligence Algorithms and Recommendations for Mitigation. PLoS Digit. Health 2023, 2, e0000278.
- Chen, Z.; Zhang, J.M.; Sarro, F.; Harman, M. A Comprehensive Empirical Study of Bias Mitigation Methods for Machine Learning Classifiers. ACM Trans. Softw. Eng. Methodol. 2023, 32, 106.
- Akbarighatar, P.; Pappas, I.; Vassilakopoulou, P. A Sociotechnical Perspective for Responsible AI Maturity Models: Findings from a Mixed-method Literature Review. Int. J. Inf. Manag. Data Insights 2023, 3, 100193.
- Mylrea, M.; Robinson, N. Artificial Intelligence (AI) Trust Framework and Maturity Model: Applying an Entropy Lens to Improve Security, Privacy, and Ethical AI. Entropy 2023, 25, 1429.
- Reuel, A.; Connolly, P.; Meimandi, K.J.; Tewari, S.; Wiatrak, J.; Venkatesh, D.; Kochenderfer, M. Responsible AI in the Global Context: Maturity Model and Survey. Comput. Soc. 2024, arXiv:2410.09985.
- García, S.; Luengo, J.; Herrera, F. Data Preprocessing in Data Mining; Springer International: Cham, Switzerland, 2015; ISBN 978-3-319-10246-7.
- Restat, V.; Klettke, M.; Störl, U. Towards a Holistic Data Preparation Tool. In Proceedings of the Workshop Proceedings of the EDBT/ICDT2022 Joint Conference. EDBT/ICDT2022, Edinburgh, UK, 29 March 2022; Ramanath, M., Palpanas, T., Eds.; CEUR Workshop Proceedings: Aachen, Germany, 2022.
- Acito, F. Predictive Analytics with KNIME: Analytics for Citizen Data Scientists; Springer Nature: Cham, Switzerland, 2023; ISBN 978-3-031-45629-9.
- Mazilu, L.; Paton, N.W.; Konstantinou, N.; Fernandes, A.A.A. Fairness-aware Data Integration. J. Data Inf. Qual. 2022, 14, 28.
- Azzalini, F. Data Integration and Ethical Quality: Fundamental Steps of the Data Analysis Pipeline. Ph.D. Thesis, Polytechnic University of Milan, Milan, Italy, 2022.
- Abdallah, Z.S.; Du, L.; Webb, G.I. Data Preparation. In Encyclopedia of Machine Learning and Data Mining, 2nd ed.; Sammut, C., Webb, G.I., Eds.; Springer Nature: New York, NY, USA, 2017; pp. 318–327. ISBN 978-1-4899-7687-1.
- Ilyas, I.F.; Chu, X. Data Cleaning; Association for Computing Machinery: New York, NY, USA, 2019; ISBN 978-1-450-37155-1.
- Teng, C.M. A Comparison of Noise Handling Techniques. In Proceedings of the Fourteenth International Florida Artificial Intelligence Research Society Conference, Key West, FL, USA, 21–23 May 2001; Russel, I., Kolen, J.F., Eds.; AAAI Press: Washington, DC, USA, 2001; pp. 269–273, ISBN 978-1-57735-133-7.
- Xia, W.; Jiang, H.; Feng, D.; Douglis, F.; Shilane, P.; Hua, Y.; Fu, M.; Zhang, Y.; Zhou, Y. A Comprehensive Study of the Past, Present, and Future of Data Deduplication. Proc. IEEE 2016, 104, 1681–1710.
- Allison, P.D. Missing Data. In The SAGE Handbook of Quantitative Methods in Psychology; Millsap, R.E., Maydeu-Olivares, A., Eds.; SAGE Publications Ltd.: London, UK, 2009; pp. 72–90. ISBN 978-1-4129-3091-8.
- Seu, K.; Kang, M.-S.; Lee, H. An Intelligent Missing Data Imputation Techniques: A Review. Int. J. Inform. Vis. 2022, 6, 278–283.
- Hosseinzadeh, M.; Azhir, E.; Ahmed, O.H.; Ghafour, M.Y.; Ahmed, S.H.; Rahmani, A.M.; Vo, B. Data Cleansing Mechanisms and Approaches for Big Data Analytics: A Systematic Study. J. Ambient Intell. Hum. Comput. 2021, 14, 99–111.
- Aggarwal, C.C. Outlier Analysis, 2nd ed.; Springer International Publishing: Cham, Switzerland, 2017; ISBN 978-3-319-47577-6.
- Prajapati, P.; Shah, P. A Review on Secure Data Deduplication: Cloud Storage Security Issue. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 3996–4007.
- Caton, S.; Malisetty, S.; Haas, C. Impact of Imputation Strategies on Fairness in Machine Learning. J. Artif. Intell. Res. 2022, 74, 1011–1035.
- Hochkamp, F.; Rabe, M. Outlier Detection in Data Mining: Exclusion of Errors or Loss of Information? In Proceedings of the Changing Tides: The New Role of Resilience and Sustainability in Logistics and Supply Chain Management—Innovative Approaches for the Shift to a New Era, Hamburg International Conference of Logistics (HICL), Hamburg, Germany, 21–23 September 2022; Kersten, W., Jahn, C., Blecker, T., Ringle, C.M., Eds.; epubli GmbH: Berlin, Germany, 2022; pp. 91–117, ISBN 978-3-756541-95-9.
- Wainer, H. Robust Statistics: A Survey and Some Prescriptions. J. Educ. Stat. 1976, 1, 285–312.
- Cleve, J.; Lämmel, U. Data Mining, 3rd ed.; De Gruyter: Berlin, Germany; Boston, MA, USA, 2020; ISBN 978-3-110-67627-3.
- Singh, D.; Singh, B. Investigating the Impact of Data Normalization on Classification Performance. Appl. Soft Comput. 2020, 97, 105524.
- Kaur, M.; Munjal, A. Data Aggregation Algorithms for Wireless Sensor Network: A Review. Ad Hoc Netw. 2020, 100, 102083.
- Simonoff, J.S.; Tutz, G. Smoothing Methods for Discrete Data. In Smoothing and Regression: Approaches, Computation, and Application; Schimek, M.G., Ed.; John Wiley & Sons: New York, NY, USA, 2013.
- Ramírez-Gallego, S.; García, S.; Mouriño-Talín, H.; Martínez-Rego, D.; Bolón-Canedo, V.; Alonso-Betanzos, A.; Benítez, J.M.; Herrera, F. Data Discretization: Taxonomy and Big Data Challenge. WIREs Data Min. Knowl. Discov. 2016, 6, 5–21.
- Zhou, K.; Liu, Z.; Qiao, Y.; Xiang, T.; Loy, C.C. Domain Generalization: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 4396–4415.
- Chernick, M.R. Resampling Methods. WIREs Data Min. Knowl. Discov. 2012, 2, 255–262.
- Ahsan, M.M.; Mahmud, M.A.P.; Saha, P.K.; Gupta, K.D.; Siddique, Z. Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance. Technologies 2021, 9, 52.
- Gal, M.; Rubinfeld, D.L. Data Standardization. N. Y. Univ. Law Rev. 2018, 94, 737–770.
- Maharana, K.; Mondal, S.; Nemade, B. A Review: Data Pre-processing and Data Augmentation Techniques. Glob. Transit. Proc. 2022, 3, 91–99.
- Patel, K.M.A.; Thakral, P. The Best Clustering Algorithms in Data Mining. In Proceedings of the International Conference on Communication and Signal Processing 2016, Batu, India, 6–8 April 2016; Iyer, B., Nalbalwar, S.L., Pawade, R.S., Eds.; IEEE: Piscataway, NJ, USA, 2016; pp. 2042–2046.
- Biswas, A.; Dutta, S.; Turton, T.L.; Ahrens, J. Sampling for Scientific Data Analysis and Reduction. In In Situ Visualization for Computational Science; Childs, H., Bennett, J.C., Garth, C., Eds.; Springer Nature Switzerland AG: Cham, Switzerland, 2022; pp. 11–36. ISBN 978-3-030-81626-1.
- Jayasankar, U.; Thirumal, V.; Ponnurangam, D. A Survey on Data Compression Techniques: From the Perspective of Data Quality, Coding Schemes, Data Type and Applications. J. King Saud Univ. Comput. Inf. Sci. 2021, 33, 119–140.
- Zebari, R.; Abdulazeez, A.; Zeebaree, D.; Zebari, D.; Saeed, J. A Comprehensive Review of Dimensionality Reduction Techniques for Feature Selection and Feature Extraction. J. Appl. Sci. Technol. Trends 2020, 1, 56–70.
- Munson, M.A. A Study on the Importance of and Time Spent on Different Modeling Steps. ACM SIGKDD Explor. Newsl. 2012, 13, 65–71.
- Mishra, P.; Biancolillo, A.; Roger, J.M.; Marini, F.; Rutledge, D.N. New Data Preprocessing Trends Based on Ensemble of Multiple Preprocessing Techniques. Trends Anal. Chem. 2020, 132, 116045.
- Corrales, D.C.; Ledezma, A.; Corrales, J.C. A Conceptual Framework for Data Quality in Knowledge Discovery Tasks (FDQ-KDT): A Proposal. J. Comp. 2015, 10, 396–405.
- Taleb, I.; Serhani, M.A.; Bouhaddioui, C.; Dssouli, R. Big Data Quality Framework: A Holistic Approach to Continuous Quality Management. J. Big Data 2021, 8, 76.
- Vetrò, A.; Canova, L.; Torchiano, M.; Minotas, C.O.; Iemma, R.; Morando, F. Open Data Quality Measurement Framework: Definition and Application to Open Government Data. Gov. Inf. Q. 2016, 33, 325–337.
- Gupta, M.K.; Chandra, P. A Comprehensive Survey of Data Mining. Int. J. Inf. Tecnol. 2020, 12, 1243–1257.
- Aggarwal, C.C. Data Mining: The Textbook; Springer International Publishing: Cham, Switzerland, 2015; ISBN 978-3-319-14141-1.
- Gentsch, P. AI Business: Framework and Maturity Model. In AI in Marketing, Sales and Service: How Marketers Without a Data Science Degree Can Use AI, Big Data and Bots; Gentsch, P., Ed.; Springer Nature: Cham, Switzerland, 2019; pp. 27–78. ISBN 978-3-319-89957-2.
- Coates, D.L.; Martin, A. An Instrument to Evaluate the Maturity of Bias Governance Capability in Artificial Intelligence Projects. IBM J. Res. Dev. 2019, 63, 7:1–7:15.
Requirement | Short Name | Description |
---|---|---|
R1 | Comparison | Deriving need for development by comparison with existing models |
R2 | Iterative procedure | Development of the model is conducted iteratively |
R3 | Evaluation | Evaluation of all objects of observation and the development process |
R4 | Multi-methodological procedure | Development of maturity models with a variety of research methods |
R5 | Identification of relevance | Demonstration of the problem solution relevance |
R6 | Problem definition | Definition of the scope of the maturity model and conditions prior to the design |
R7 | Result presentation | Determination of a suitable form of representation for the model |
R8 | Documentation | Documentation of the maturity model design process |
Company | Number of Areas of Expertise | Example Areas of Expertise | Source |
---|---|---|---|
Accenture | 4 | Strategy, data, talent | [11] |
Digital X | 8 | Operation, people, technology | [12] |
Bitkom | 4 | Data, quality, organization | [13] |
Company | Number of Areas of Expertise | Example Areas of Expertise | Source |
---|---|---|---|
Alation | 4 | Data literacy, data governance, data search | [29] |
Snowplow | 3 | Data team, data challenges, analytic challenges | [30] |
DOL | 5 | Data, analytics, culture | [31] |
Maturity Model | Number of Areas of Expertise | Example Areas of Expertise | Source |
---|---|---|---|
AI Ethics Maturity Model | 6 | Policy, governance, tooling | [32] |
AI Trust Framework and Maturity Model | 7 | Explainability, data privacy, societal well-being | [42] |
Responsible AI Maturity Model | 7 | Reliability, transparency, human interaction | [43] |
Influence | Group |
---|---|
Decisions for data collection | Expert knowledge |
Data preparation knowledge | Expert knowledge |
Data mining knowledge | Expert knowledge |
Domain knowledge | Expert knowledge |
Analysis question | Expert knowledge |
Data | Technical requirements |
Input requirements for data preparation | Technical requirements |
Input requirements for data mining | Technical requirements |
Technical infrastructure | Technical requirements |
Privacy and ethics | Administration |
People | Administration |
Organizational structure | Administration |
Governance | Administration |
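The assignment of influences to groups in the table above can be inverted to list the members of each group. The following is a minimal sketch under the assumption that each influence belongs to exactly one group, as the table suggests; the names `INFLUENCE_GROUP` and `influences_by_group` are hypothetical and do not come from the paper.

```python
from collections import defaultdict

# Influence-to-group assignments transcribed from the table above.
# The identifiers are hypothetical and only serve this illustration.
INFLUENCE_GROUP = {
    "Decisions for data collection": "Expert knowledge",
    "Data preparation knowledge": "Expert knowledge",
    "Data mining knowledge": "Expert knowledge",
    "Domain knowledge": "Expert knowledge",
    "Analysis question": "Expert knowledge",
    "Data": "Technical requirements",
    "Input requirements for data preparation": "Technical requirements",
    "Input requirements for data mining": "Technical requirements",
    "Technical infrastructure": "Technical requirements",
    "Privacy and ethics": "Administration",
    "People": "Administration",
    "Organizational structure": "Administration",
    "Governance": "Administration",
}


def influences_by_group(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Invert an influence -> group mapping into group -> list of influences."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for influence, group in mapping.items():
        grouped[group].append(influence)
    return dict(grouped)


GROUPS = influences_by_group(INFLUENCE_GROUP)
```

With the thirteen influences of the table, `GROUPS` holds five entries under expert knowledge and four each under technical requirements and administration.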
Requirement | Short Name | Fulfilment |
---|---|---|
R1 | Comparison | Fulfilled |
R2 | Iterative procedure | Partly fulfilled |
R3 | Evaluation | Fulfilled |
R4 | Multi-methodological procedure | Fulfilled |
R5 | Identification of relevance | Fulfilled |
R6 | Problem definition | Fulfilled |
R7 | Result presentation | Partly fulfilled |
R8 | Documentation | Fulfilled |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hochkamp, F.; Scheidler, A.A.; Rabe, M. Review of Maturity Models for Data Mining and Proposal of a Data Preparation Maturity Model Prototype for Data Mining. Computers 2025, 14, 146. https://doi.org/10.3390/computers14040146