Data Quality Affecting Big Data Analytics in Smart Factories: Research Themes, Issues and Methods
Abstract
1. Introduction
2. Definition and Related Studies
2.1. Digital Symmetry
2.2. Data Quality
2.3. Smart Factory
2.4. Related Studies on Data Quality in Smart Factory
- Timeframe of review: Our review covers publications up to 2020 in order to collect all relevant studies published during this period.
- Included studies: We included only empirical studies that used research methods such as case study, survey, and experiment to examine DQ affecting BDA in SF.
- Focus of review: Our study focuses on DQ for BDA in the SF context, analysing and synthesising research themes, issues of DQ for BDA, and methods used to address DQ issues in the SF context, as well as establishing relationships between these results from the reviewed studies to better understand this phenomenon.
3. Research Methods
3.1. Defining the Scope of the Review
3.1.1. Establishing Inclusion and Exclusion Criteria
3.1.2. Identifying Fields of Research
3.1.3. Selecting Databases and Outlets
3.1.4. Formulating Search Terms
3.2. Searching the Initial List of Articles
3.3. Selecting Relevant Papers
3.4. Analysing Data from the Included Articles
4. Results
4.1. Demographics of the Included Studies
4.1.1. Publication Trends
4.1.2. Research Methods
4.1.3. Study Contexts of DQ Affecting BDA in SF
4.2. Findings for the RQs
4.2.1. Research Themes of DQ Affecting BDA in the SF Context
4.2.2. DQ Issues for BDA in the SF Context
4.2.3. Methods Used to Address DQ Issues for BDA in the SF Context
- Data imputation. This method replaces missing or inaccurate data with plausible values derived from the sample data [S1, S2, S7, S12, S21, S25, S26, S27]. Plausible values were identified through a k-nearest neighbour (kNN) algorithm [S1, S12], a decision tree approach [S12], or the last observation carried forward method [S7] applied to a raw dataset; algorithms (e.g., kNN, naïve Bayes classifier, association rule induction algorithm) applied to a clean sample dataset [S26]; a combination of domain knowledge and a clean sample dataset [S2]; sequence patterns learned from the whole dataset [S3]; seasonal-trend decomposition and recomposition [S21]; cold deck imputation [S25]; and multiple imputation [S27]. These methods produce a best-guess value and were commonly used to fill in data and minimise bias, addressing DQ issues such as missing data, anomalies/noisy data, and data inconsistencies. However, imputed values are not real data, and their uncertainty needs to be estimated [39].
- Case deletion. In this study, case deletion is defined as deleting the case of interest from a dataset. For example, null columns [S9] and cases with missing values [S19, S20, S27] were removed directly from the dataset. Values that fell outside the potential ranges defined from domain knowledge were treated as noise and deleted [S6, S28, S30]. This method was also used to address duplicates by deleting old data and replacing it whenever a newer update occurred [S18]. Case deletion is a simple way of dealing with DQ issues for BDA in SF, but discarding incomplete data may lose potentially useful information and bias the results of BDA [S27].
- Anomaly detection. This method aims to identify patterns in a dataset that deviate from expected patterns [S6]. For example, unexpected changes in data values or patterns were recognised as anomalies based on an understanding of machine working conditions [S2, S15]. A few researchers flagged an anomaly when a value fell outside defined ranges [S4, S6, S8, S30] or when a data point lay at a large distance from normal ones [S28]. This method is commonly used to reveal noisy data and its root causes so that they can be resolved. However, in the SF context, effective use of the method relies heavily on knowledge of the manufacturing process [S2].
- Data visualisation. This method enables users to visualise the data production process. For example, the authors of [S2] developed a module that allows users to visualise detected anomalies and interact with them (i.e., modify and label anomalies and upload manual repair results). This method was also adopted to discover anomalies [23] and observe missing data [S31]. Data visualisation contributes to monitoring and controlling DQ [S31], although displaying the large amount of information extracted from a dataset remains challenging [40].
- Clustering analysis. This method groups data based on data characteristics. Using clustering algorithms, two studies [S16, S19] divided anomalies into groups by data similarity and analysed the causes of these anomalies to distinguish indicators of an event from noisy data. The identified fault data was then filtered out before BDA. Clustering analysis helps classify data without references and disclose DQ issues, but it performs well only with the guidance of industrial domain knowledge [S16].
- Database commit was used to update data in the database when new records were received, helping to address issues of old data [S18].
- Matching inconsistent values focused on constructing a classifier from a training dataset to obtain matching patterns and repairing inconsistent data based on the learned patterns [S2].
- Ontology-based semantic enrichment relied on an ontology created for both the input and output flows of a data inventory [S11]. This method addressed differences in data interpretation across multiple sources through semantic enrichment, which helps avoid creating repetitive data.
- Computational conformance checking aimed to automatically identify and diagnose root causes of data errors such as sensor faults, human faults and system/equipment faults, and to send alerts to system managers so that they could deal with these issues [S14].
- An adaptive correction threshold method was used to identify the impact features of noise and reduce invalid data for BDA [S5].
- Organisational structure design referred to creating an organisational unit for data governance and dividing responsibilities within an organisation. According to the studies [S13, S29], to better deal with DQ issues for BDA, a group of people in an organisation was defined and they had responsibilities for implementing data governance. Every employee also had defined responsibilities to complete tasks of addressing DQ. In this light, actors worked toward a common goal to achieve high-quality data for BDA in the SF context.
- Organisational culture cultivation concerned a cultural change across all levels and a shift in attitude towards the transformation from traditional manufacturing to data-driven SF [S29]. When top management and staff members understand the impacts of DQ issues and the added value of using BDA for SF, they are more likely to put effort into dealing with DQ. Furthermore, creating problem awareness and motivation could help drive employees to follow defined rules in routine data practices and support the changes required in this transformation.
- Regulation formulation addressed defining rules and conventions for data practices. As noted in [S13, S29], universal, easy-to-follow and detailed rules for creating data in line with DQ requirements, together with policies on standardised documentation and storage locations, helped govern data collection and storage. This reduced missing data and data inconsistencies and prepared quality-assured data for later BDA in SF.
- Data architecture standardisation dealt with defining an architecture for combining different data sources and structures. Manufacturing data is collected from multiple sources and in diverse structures, and it requires a well-defined architecture for data organisation and integration in order to address heterogeneity problems [S29]. Such a data architecture also helps decrease DQ issues such as missing data, data inconsistencies and redundancies by clarifying the naming of data objects [S13, S29].
- Process management pertained to monitoring and controlling data practices in manufacturing processes. As mentioned, the intelligent transformation of companies brings changes, and these changes need to be communicated and documented during the process [S13]. Such process management activities could help employees better understand the benefits of SF and the difficulties of implementing it, and guide them in addressing DQ issues in practice. Furthermore, monitoring the production and distribution process also helped reveal DQ issues and their root causes so that they could be addressed in time [S14].
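Three of the technical methods listed above (last observation carried forward imputation, case deletion, and range-based anomaly detection) are simple enough to illustrate in a few lines. The following is a minimal sketch using only the Python standard library; the function names, the toy temperature series and the 0 to 100 range are hypothetical and are not taken from any of the reviewed studies.

```python
from typing import Dict, List, Optional, Sequence

Reading = Optional[float]  # None marks a missing sensor value (assumption)


def locf_impute(series: Sequence[Reading]) -> List[Reading]:
    """Last observation carried forward: fill each gap with the latest seen value.

    Leading gaps stay None because there is no earlier observation to carry.
    """
    filled: List[Reading] = []
    last: Reading = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def delete_incomplete_cases(rows: Sequence[Dict[str, Reading]]) -> List[Dict[str, Reading]]:
    """Case deletion: drop every record that contains at least one missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def flag_out_of_range(series: Sequence[Reading], low: float, high: float) -> List[bool]:
    """Range-based anomaly check: True where a present value lies outside [low, high]."""
    return [v is not None and not (low <= v <= high) for v in series]


# Hypothetical temperature readings with gaps and one implausible value.
temperatures = [20.5, None, None, 21.0, 250.0, None]
print(locf_impute(temperatures))
print(flag_out_of_range(temperatures, low=0.0, high=100.0))
print(delete_incomplete_cases([{"temp": 20.5, "rpm": None}, {"temp": 21.0, "rpm": 900.0}]))
```

In this toy series the implausible reading of 250.0 is carried forward into the final gap, which illustrates the caution noted above: imputed values are not real data, so the order in which anomaly checks and imputation are applied matters.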
5. Discussion
5.1. Trends of DQ Affecting BDA in SF
5.2. Findings Addressing Research Questions
5.3. Theoretical and Practical Implications
5.3.1. Theoretical Implications
5.3.2. Practical Implications
5.4. Limitations of the Study
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Study Number | Reference |
---|---|
S1 | Chien, C.F.; Chen, Y.J.; Wu, J.Z. Big data analytics for modeling WAT parameter variation induced by process tool in semiconductor manufacturing and empirical study. In Proceedings of the 2016 Winter Simulation Conference; IEEE Press: Piscataway, United States, 2016, pp.2512–2522. |
S2 | Ding, X.; Wang, H.; Su, J.; Li, Z.; Li, J.; Gao, H. Cleanits: A data cleaning system for industrial time series. In Proceedings of the VLDB Endowment; VLDB Endowment: Los Angeles, United States, 2019, 12(12), pp.1786–1789. |
S3 | McGinnis, L.F.; Rose, O. History and perspective of simulation in manufacturing. In 2017 Winter Simulation Conference; IEEE Press: Piscataway, United States, 2017, pp. 385–397. |
S4 | Huang, J.; Kong, L.; Dai, H.N.; Ding, W.; Cheng, L.; Chen, G.; Jin, X.; Zeng, P. Blockchain-based mobile crowd sensing in industrial systems. IEEE Transactions on Industrial Informatics 2020, 16, 10, 6553–6563. |
S5 | Zhao, H.; Zhang, J.; Jiang, Z.; Wei, D.; Zhang, X.; Mao, Z. A new fault diagnosis method for a diesel engine based on an optimized vibration MEL frequency under multiple operation conditions. Sensors 2019, 19, 11, p.2590. |
S6 | Yu, W.; Dillon, T.; Mostafa, F.; Rahayu, W.; Liu, Y. Implementation of industrial cyber physical system: Challenges and solutions. In Proceedings of the 2019 IEEE International Conference on Industrial Cyber Physical Systems; IEEE Press: Piscataway, United States, 2019, pp.173–178. |
S7 | Zhou, H.; Yu, K.M.; Lee, M.G.; Han, C.C. The application of last observation carried forward method for missing data estimation in the context of industrial wireless sensor networks. In 2018 IEEE Asia-Pacific Conference on Antennas and Propagation; IEEE Press: Piscataway, United States, 2018, pp.130–131. |
S8 | Moyne, J.; Iskandar, J. Big data analytics for smart manufacturing: Case studies in semiconductor manufacturing. Processes 2017, 5, 3, 39. |
S9 | Taetragool, U.; Achalakul, T. Method for failure pattern analysis in disk drive manufacturing. International Journal of Computer Integrated Manufacturing, 2011, 24, 9, 834–846. |
S10 | Vitolo, F.; Franciosa, P.; Ceglarek, D.; Patalano, S.; De Martino, M. A generalised multi-attribute task sequencing approach for robotics optical inspection systems. In Proceedings of the 2019 II Workshop on Metrology for Industry 4.0 and IoT; IEEE Press: Piscataway, United States, 2019, pp.117–122. |
S11 | Jayapal, J.; Kumaraguru, S. Real-time linked open data for life cycle inventory. In Proceedings of the IFIP International Conference on Advances in Production Management Systems, Springer: Cham, Switzerland, 2018, pp.249–254. |
S12 | Reuter, C.; Brambring, F.; Weirich, J.; Kleines, A. Improving data consistency in production control by adaptation of data mining algorithms. Procedia CIRP 2016, 56, 545–550. |
S13 | Krumay, B.; Rueckel, D. Data governance and digitalization–A case study in a manufacturing company. Paper presented at the 24th Pacific Asia Conference on Information Systems, Dubai, United Arab Emirates, 22–24 Jun 2020. |
S14 | Wang, Y.; Hulstijn, J.; Tan, Y.H. Towards smart manufacturing: Compliance monitoring for computational auditing. Paper presented at the 26th European Conference on Information Systems, Portsmouth, United Kingdom, 23–28 Jun 2018. |
S15 | Utz, F.; Neumann, C.; Tafreschi, O. How to discover knowledge for improving availability in the manufacturing domain. Paper presented at the 51st Hawaii International Conference on System Sciences, Hilton Waikoloa Village, United States, 3–6 Jan 2018. |
S16 | Li, X.; Tu, Z.; Jia, Q.; Man, X.; Wang, H.; Zhang, X. Deep-level quality management based on big data analytics with case study. In Proceedings of the 2017 Chinese Automation Congress; IEEE Press: Piscataway, United States, 2017, pp.4921–4926. |
S17 | Michaloski, J.; Lee, B.E.; Proctor, F.; Venkatesh, S. Web-enabled real-Time quality feedback for factory systems using MTConnect. In Proceedings of International Design Engineering Technical Conferences and Computers and Information in Engineering Conference; American Society of Mechanical Engineers: Montreal, Canada, 2012, 45011, pp.403–409. |
S18 | Lynn, R.; Louhichi, W.; Parto, M.; Wescoat, E.; Kurfess, T. Rapidly deployable MTConnect-based machine tool monitoring systems. In Proceedings of International Manufacturing Science and Engineering Conference; American Society of Mechanical Engineers: Montreal, Canada, 2017, 50749, p.V003T04A046. |
S19 | Cheng, Y.; Shang, W.; Zhu, L.; Zhang, D.; Feng, D. Items analysis of postal supervision. Paper presented at the 15th International Conference on Computer and Information Science, Okayama, Japan, 26–29 June 2016. |
S20 | Chien, C.F.; Wang, W.C.; Cheng, J.C. Data mining for yield enhancement in semiconductor manufacturing and an empirical study. Expert Systems with Applications 2007, 33, 1, 192–198. |
S21 | Liu, Y.; Dillon, T.; Yu, W.; Rahayu, W.; Mostafa, F. Missing value imputation for Industrial IoT sensor data with large gaps. IEEE Internet of Things Journal 2020, 7, 8, 6855–6867. |
S22 | Li, S.; Peng, G.C.; Xing, F. Barriers of embedding big data solutions in smart factories: insights from SAP consultants. Industrial Management & Data Systems 2019, 119, 5, 1147–1164. |
S23 | Iftikhar, N.; Baattrup-Andersen, T.; Nordbjerg, F.E.; Bobolea, E.; Radu, P.B. Data Analytics for Smart Manufacturing: A Case Study. In Proceedings of the 8th International Conference on Data Science, Technology and Applications; SciTePress: Setúbal, Portugal, 2019, pp.392–399. |
S24 | Hui, K.; Ke, L.; Sheen, S.Y. Forging basic elements of cyber-physical systems in industry 4.0 with parametric characterization for FDC. In Proceedings of the 29th Annual SEMI Advanced Semiconductor Manufacturing Conference; IEEE Press: Piscataway, United States, 2018, pp.111–116. |
S25 | Schuh, G.; Potente, T.; Thomas, C.; Brambring, F. Improving scheduling accuracy by reducing data inconsistencies in production control. Paper presented at the 25th Annual Conference of the Production and Operations Management Society “Reaching New Heights”, Atlanta, United States, 9–12 May 2014. |
S26 | Reuter, C.; Brambring, F. Improving data consistency in production control. Procedia CIRP 2016, 41, 51–56. |
S27 | Kwak, D.S.; Kim, K.J. A data mining approach considering missing values for the optimization of semiconductor-manufacturing processes. Expert Systems with Applications 2012, 39, 3, 2590–2596. |
S28 | Liu, Y.; Dillon, T.; Yu, W.; Rahayu, W.; Mostafa, F. Noise removal in the presence of significant anomalies for Industrial IoT sensor data in manufacturing. IEEE Internet of Things Journal 2020, 7, 8, 7084–7096. |
S29 | Marx, E.; Stierle, M.; Weinzierl, S.; Matzner, M. Closing the Gap between Smart Manufacturing Applications and Data Management. Paper presented at the 15th International Conference on Wirtschaftsinformatik, Potsdam, Germany, 8–11 March 2020. |
S30 | Yu, W.; Dillon, T.; Mostafa, F.; Rahayu, W.; Liu, Y. A global manufacturing big data ecosystem for fault detection in predictive maintenance. IEEE Transactions on Industrial Informatics 2019, 16, 1, 183–192. |
S31 | Hazen, B.T.; Boone, C.A.; Ezell, J.D.; Jones-Farmer, L.A. Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications. International Journal of Production Economics 2014, 154, 72–80. |
Appendix B
Acronym | Full Form | Definition | Reference(s) |
---|---|---|---|
BDA | Big data analytics | the use of advanced analytic techniques to discover patterns, trends and relationships from large datasets | [3] |
DQ | Data quality | defined both as ‘fitness for use’ and as ‘conformance to requirements’ that can be divided into dimensions for describing different DQ aspects | [7] |
EC | Exclusion criteria | the characteristics that prospective subjects must have if they are to be removed from the study sample | Not applicable |
IC | Inclusion criteria | the characteristics that prospective subjects must have if they are to be included in the study sample | Not applicable |
IM | Information management | a discipline that deals with the collection, management and distribution of information | Not applicable |
IS | Information system | a discipline/a system used to collect, process, store, and distribute information | Not applicable |
IT | Information technologies | a discipline/the systems, software, and networks dealing with data processing and distribution | Not applicable |
NM | Not mentioned | the concerned term/content that has not been mentioned in the studies | Not applicable |
RQ | Research question | an inquiry that a study or research project aims to answer | Not applicable |
SF | Smart factories | the use of advanced technologies and data exchange in the manufacturing process to realise intelligent production | [20,21,22,23] |
SLR | Systematic literature review | a means of the selection, analysis and interpretation of the available literature in relation to addressing a specific research topic, a RQ or a phenomenon of interest | [10] |
SM | Smart manufacturing | the use of data analytics and information and communication technologies to govern and optimise manufacturing operations | [27] |
References
- Bagozi, A.; Bianchini, D.; De Antonellis, V.; Marini, A.; Ragazzi, D. Summarisation and Relevance Evaluation Techniques for Big Data Exploration: The Smart Factory Case Study. In International Conference on Advanced Information Systems Engineering; Springer: Cham, Switzerland, 2017; pp. 264–279.
- Qi, Q.; Tao, F. Digital twin and big data towards smart manufacturing and industry 4.0: 360 degree comparison. IEEE Access 2018, 6, 3585–3593.
- Luo, S.; Hongwei, L.; Ershi, Q. Big data analytics–enabled cyber-physical system: Model and applications. Ind. Manag. Data Syst. 2019, 119, 1072–1088.
- Chien, C.F.; Chen, Y.J.; Wu, J.Z. Big data analytics for modeling WAT parameter variation induced by process tool in semiconductor manufacturing and empirical study. In Proceedings of the 2016 Winter Simulation Conference, Washington, DC, USA, 11–14 December 2016; pp. 2512–2522.
- Moyne, J.; Iskandar, J. Big data analytics for smart manufacturing: Case studies in semiconductor manufacturing. Processes 2017, 5, 39.
- Hazen, B.T.; Boone, C.A.; Ezell, J.D.; Jones-Farmer, L.A. Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications. Int. J. Prod. Econ. 2014, 154, 72–80.
- De Feo, J.A.; Juran, J.M. Juran’s Quality handbook: The Complete Guide to Performance Excellence, 7th ed.; McGraw-Hill: New York, NY, USA, 2017.
- Marx, E.; Stierle, M.; Weinzierl, S.; Matzner, M. Closing the Gap between Smart Manufacturing Applications and Data Management. In Proceedings of the 15th International Conference on Wirtschaftsinformatik, Potsdam, Germany, 8–11 March 2020.
- Cui, Y.; Kara, S.; Chan, K. Manufacturing big data ecosystem: A systematic literature review. Robot. Comput. Manuf. 2020, 62, 101861.
- Safaei, M.; Asadi, S.; Driss, M.; Boulila, W.; Alsaeedi, A.; Chizari, H.; Abdullah, R.; Safaei, M. A systematic literature review on outlier detection in wireless sensor networks. Symmetry 2020, 12, 328.
- Wolfswinkel, J.F.; Furtmueller, E.; Wilderom, C.P. Using grounded theory as a method for rigorously reviewing literature. Eur. J. Inf. Syst. 2013, 22, 45–55.
- Shangguan, D.; Chen, L.; Ding, J. A digital twin-based approach for the fault diagnosis and health monitoring of a complex satellite system. Symmetry 2020, 12, 1307.
- Ghita, M.; Siham, B.; Hicham, M. Digital Twins Development Architectures and Deployment Technologies: Moroccan use Case. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 468–478.
- Wang, R.Y.; Strong, D.M. Beyond Accuracy: What Data Quality Means to Data Consumers. J. Manag. Inf. Syst. 1996, 12, 5–33.
- Tilly, R.; Oliver, P.; Kai, F.; Detlef, S. Towards a conceptualization of data and information quality in social information systems. Bus. Inf. Syst. Eng. 2017, 59, 3–21.
- Côrte-Real, N.; Pedro, R.; Tiago, O. Leveraging internet of things and big data analytics initiatives in European and American firms: Is data quality a way to extract business value? Inf. Manag. 2020, 57, 103141.
- Mikalef, P.; Pappas, I.; Krogstie, J.; Giannakos, M. Big data analytics capabilities: A systematic literature review and research agenda. Inf. Syst. e-Bus. Manag. 2017, 16, 547–578.
- Chen, D.Q.; Preston, D.S.; Swink, M. How the use of big data analytics affects value creation in supply chain management. J. Manag. Inf. Syst. 2015, 32, 4–39.
- Büchi, G.; Cugno, M.; Castagnoli, R. Smart factory performance and Industry 4.0. Technol. Forecast. Soc. Chang. 2020, 150, 119790.
- Hrustek, L.; Vrcek, N.; Furjan, M.T. ERP systems in the context of smart factories. In Proceedings of the 62nd International Scientific Conference on Economic and Social Development, Budapest, Hungary, 4–5 September 2020.
- Gunal, M.M.; Mumtaz, K. Industry 4.0, digitisation in manufacturing, and simulation: A review of the literature. In Simulation for Industry 4.0; Springer: Berlin/Heidelberg, Germany, 2019; pp. 19–37.
- Mabkhot, M.M.; Al-Ahmari, A.M.; Salah, B.; Alkhalefah, H. Requirements of the Smart Factory System: A Survey and Perspective. Machines 2018, 6, 23.
- Strozzi, F.; Claudia, C.; Alessandro, C.; Carlo, N. Literature review on the ‘Smart Factory’ concept using bibliometric tools. Int. J. Prod. Res. 2017, 55, 6572–6591.
- O’Donovan, P.; Leahy, K.; Bruton, K.; O’Sullivan, D.T. An industrial big data pipeline for data-driven analytics maintenance applications in large-scale smart manufacturing facilities. J. Big Data 2015, 2, 25.
- Chopra, S. Designing the distribution network in a supply chain. Transp. Res. Part E Logist. Transp. Rev. 2003, 39, 123–140.
- Rushton, A.; Phil, C.; Peter, B. The Handbook of Logistics and Distribution Management: Understanding the Supply Chain; Kogan Page Publishers: London, UK, 2014.
- Thoben, K.D.; Wiesner, S.; Wuest, T. “Industrie 4.0” and smart manufacturing-a review of research issues and application examples. Int. J. Autom. Technol. 2017, 11, 4–16.
- Sundarraj, M.; Rajkamal, M.N. Data governance in smart factory: Effective metadata management. Int. J. Adv. Res. Ideas Innov. Technol. 2019, 5, 798–804.
- Mäkinen, M.V. Data Quality in Smart Manufacturing. Master’s Thesis, University of Vaasa, Vaasa, Finland, 2020.
- Krumay, B.; Rueckel, D. Data governance and digitalization-A case study in a manufacturing company. In Proceedings of the 24th Pacific Asia Conference on Information Systems, Dubai, United Arab Emirates, 22–24 June 2020.
- Wang, Y.; Hulstijn, J.; Tan, Y.H. Towards smart manufacturing: Compliance monitoring for computational auditing. In Proceedings of the 26th European Conference on Information Systems, Portsmouth, UK, 23–28 June 2018.
- Utz, F.; Neumann, C.; Tafreschi, O. How to discover knowledge for improving availability in the manufacturing domain. In Proceedings of the 51st Hawaii International Conference on System Sciences, Waikoloa Village, HI, USA, 3–6 January 2018.
- Sadiq, S.; Naiem, K.Y.; Marta, I. 20 years of data quality research: Themes, trends and synergies. In Proceedings of the 22nd Australasian Database Conference, Perth, Australia, 17–20 January 2011.
- Van Nguyen, T.; Zhou, L.; Spiegler, V.; Ieromonachou, P.; Lin, Y. Big data analytics in supply chain management: A state-of-the-art literature review. Comput. Oper. Res. 2018, 98, 254–264.
- Shelley, M.; Krippendorff, K. Content Analysis: An Introduction to its Methodology. J. Am. Stat. Assoc. 1984, 79, 240.
- Boyatzis, R.E. Transforming Qualitative Information: Thematic Analysis and Code Development; Sage Publications: New York, NY, USA, 1998.
- Zhang, R.; Marta, I.; Shazia, S. Discovering data quality problems. Bus. Inf. Syst. Eng. 2019, 61, 575–593.
- Zhang, Y.; Wang, W.; Du, W.; Qian, C.; Yang, H. Coloured Petri net-based active sensing system of real-time and multi-source manufacturing information for smart factory. Int. J. Adv. Manuf. Technol. 2017, 94, 3427–3439.
- Scheffer, J. Dealing with missing data. In Research Letters in the Information and Mathematical Sciences; Institute of Information and Mathematical Sciences: Auckland, New Zealand, 2002; pp. 153–160.
- Descrimes, M.; Ben Zouari, Y.; Wery, M.; Legendre, R.; Gautheret, D.; Morillon, A. VING: A software for visualization of deep sequencing signals. BMC Res. Notes 2015, 8, 419.
- Wang, R.Y. A product perspective on total data quality management. Commun. ACM 1998, 41, 58–65.
- Wahyudi, A.; Kuk, G.; Janssen, M. A Process Pattern Model for Tackling and Improving Big Data Quality. Inf. Syst. Front. 2018, 20, 457–469.
- Daraio, C.; Lenzerini, M.; Leporelli, C.; Naggar, P.; Bonaccorsi, A.; Bartolucci, A. The advantages of an Ontology-Based Data Management approach: Openness, interoperability and data quality. Scientometrics 2016, 108, 441–455.
- Choi, T.-M. Blockchain-technology-supported platforms for diamond authentication and certification in luxury supply chains. Transp. Res. Part E Logist. Transp. Rev. 2019, 128, 17–29.
- Choi, T.-M.; Luo, S. Data quality challenges for sustainable fashion supply chain operations in emerging markets: Roles of blockchain, government sponsors and environment taxes. Transp. Res. Part E Logist. Transp. Rev. 2019, 131, 139–152.
- Xuan, S.; Zhang, Y.; Tang, H.; Chung, I.; Wang, W.; Yang, W. Hierarchically Authorized Transactions for Massive Internet-of-Things Data Sharing Based on Multilayer Blockchain. Appl. Sci. 2019, 9, 5159.
Citation | Timeframe of Reviewed Publications | Number of Papers Reviewed | Number of RQs Proposed | Context | DQ Focus? |
---|---|---|---|---|---|
Thoben et al. [27] | NM | NM | NM | Industrie 4.0 and SM | No |
Sundarraj et al. [28] | NM | NM | NM | SF | No |
Cui et al. [9] * | 2008–2017 | 128 | 4 | SM | No |
Marx et al. [8] | NM | 148 | 1 | SM | No |
Mäkinen [29] | NM | NM | NM | SM | Yes |
Criteria | Number | Description |
---|---|---|
Inclusion criteria (IC) | IC1 | The articles included are published in English. |
Inclusion criteria (IC) | IC2 | The articles included are published up to 2020. |
Inclusion criteria (IC) | IC3 | The article has a topic on DQ affecting BDA in the context of SF. |
Exclusion criteria (EC) | EC1 | The articles are duplicates. |
Exclusion criteria (EC) | EC2 | The articles are not peer-reviewed research publications. |
Exclusion criteria (EC) | EC3 | The articles cannot be accessed online. |
Exclusion criteria (EC) | EC4 | The researchers of the articles do not present empirical findings themselves. |
Major Search Terms | Data Quality | Big Data Analytics | Smart Factory |
---|---|---|---|
Synonyms and alternative terms | Quality of data | Big Data; Data analytics; Data mining; Machine learning; Descriptive analytics; Predictive analytics; Prescriptive analytics | Smart factories; Intelligent factor* (factory, factories); Ubiquitous factor* (factory, factories); Real-time factor* (factory, factories); Smart manufacturing; Intelligent manufacturing; Ubiquitous manufacturing; Real-time manufacturing; Factory-of-things |
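The search terms in the table above can be combined into a single Boolean search string. The sketch below shows one common way to do this: synonyms are joined with OR within a concept, and the three concepts are joined with AND. The exact search string and database syntax used in the review are not given here, so the grouping, the quoting and the `*` wildcard handling are assumptions, and `build_query` is a hypothetical helper.

```python
from typing import Dict, List

# Terms copied from the search-term table; combining them with OR/AND is an assumption.
SEARCH_TERMS: Dict[str, List[str]] = {
    "Data Quality": ["data quality", "quality of data"],
    "Big Data Analytics": [
        "big data analytics", "big data", "data analytics", "data mining",
        "machine learning", "descriptive analytics", "predictive analytics",
        "prescriptive analytics",
    ],
    "Smart Factory": [
        "smart factory", "smart factories", "intelligent factor*",
        "ubiquitous factor*", "real-time factor*", "smart manufacturing",
        "intelligent manufacturing", "ubiquitous manufacturing",
        "real-time manufacturing", "factory-of-things",
    ],
}


def build_query(concepts: Dict[str, List[str]]) -> str:
    """Join synonyms with OR inside each concept, then join the concepts with AND."""
    groups = []
    for terms in concepts.values():
        quoted = " OR ".join(f'"{term}"' for term in terms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


query = build_query(SEARCH_TERMS)
print(query)
```

Wildcard and phrase syntax differs between databases, so a string built this way would typically need adapting to each outlet before use.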
Research Methods | Number of the Reviewed Studies (Percent) | References |
---|---|---|
Case study | 15 (48%) | [S4, S8, S10, S11, S13, S14, S15, S16, S17, S23, S24, S25, S27, S29, S31] |
Experiment | 14 (45%) | [S1, S2, S5, S6, S7, S9, S12, S18, S19, S20, S21, S26, S28, S30] |
Survey | 2 (7%) | [S3, S22] |
Total | 31 |
Product Stages | Number of Reviewed Studies | References | Systems of Data Sources (References) |
---|---|---|---|
Production | 27 | [S1, S2, S5–S18, S20–S30] | Temperature control system [S2], Fans group system [S2], Fault detection and diagnosis system [S5], Data warehouse [S9, S20, S21], Optical inspection system [S10], Cloud storage platform [S11], Planning and scheduling systems [S12], Information systems [S13], Compliance management system [S14], Manufacturing execution systems [S15], Enterprise management systems [S16, S23, S26, S29], Machine monitoring systems [S18], Production data acquisition systems [S25, S29] |
Distribution | 6 | [S3, S4, S13, S14, S19, S31] | Manufacturing systems [S3], Logistics distribution system [S4], Information systems [S13], Compliance management system [S14], Data warehouse [S19], Data management system [S31] |
Research Themes | Description | References |
---|---|---|
Production scheduling | Preparing quality-assured data used in BDA to plan tasks in the manufacturing process | [S10, S12, S22, S25, S26, S29] |
Process monitoring | Preparing quality-assured data used in BDA to understand the operating status of machines/ process | [S4, S7, S11, S13, S19, S20, S21, S26, S28, S31] |
Quality tracing | Preparing quality-assured data used in BDA to identify root causes of product quality failures | [S1, S9, S16, S17, S18, S20, S26] |
Fault detection | Preparing quality-assured data used in BDA to discover faults in the performance of machines | [S2, S5, S6, S8, S14, S15, S20, S23, S24, S29, S30] |
Predictive maintenance | Preparing quality-assured data used in BDA to estimate when machine maintenance should be deployed before any downtime | [S8, S15, S23, S29, S30] |
Process optimisation | Preparing quality-assured data used in BDA to improve efficiency and effectiveness of process | [S3, S27, S29] |
DQ Issues (Alternative Terms) | Description | Associated DQ Dimension Used in ISO (the Terms of DQ Dimension Used in The Reviewed Studies) | Definition of the Associated DQ Dimension Presented in ISO | Definition of the Associated DQ Dimension Presented in the Reviewed Studies | References of DQ Issues |
---|---|---|---|---|---|
Missing data/values [S1, S2, S7, S12, S15, S19, S20, S21, S24, S25, S26, S27] (null column [S9], incomplete data [S23, S31], inadequate data availability [S3]) | Data values are null or deficient [S7] | Completeness (completeness [S8, S12, S14, S15, S21, S26, S29, S31], integrity [S7, S22]) | ‘The degree to which subject data associated with an entity has values for all expected attributes and related entity instances in a specific context of use.’ | ‘no data which should have been gathered is missing’ [S26]; ‘are necessary data missing?’ [S31] | [S1, S2, S3, S7, S9, S12, S15, S19, S20, S21, S23, S24, S25, S26, S27, S31] |
Anomalies/abnormal data [S2, S4, S6, S8, S28] (imprecise data [S2], incorrect data [S12, S23], fault data [S16], outliers [S4, S16, S24, S30], data errors [S26]) | Data deviates from the patterns of normal data [S6] | Accuracy (accuracy [S8, S10, S11, S15, S22, S28, S29], correctness [S12, S14, S26], validity [S29]) | ‘The degree to which data has attributes that correctly represent the true value of the intended attribute of a concept or event in a specific context of use.’ | ‘reality is reflected correctly by the data’ [S26]; ‘Are the data free of errors?’ [S31] | [S2, S4, S6, S8, S12, S16, S23, S24, S26, S28, S30] |
Noisy data/noise [S5, S6, S15, S19, S20, S28] (dirty data/values [S2, S30], invalid data [S5, S11]) | Data is out of all potential values [S6] | Accuracy (same as the row above) | Same as the row above | Same as the row above | [S2, S5, S6, S11, S15, S19, S20, S28, S30] |
Data inconsistencies [S12, S13, S25, S26] (inconsistent values/data [S2, S20, S22]) | Data elements from different data sources contradict each other [S12] | Consistency (consistency [S12, S15, S22, S25, S26, S29]) | ‘The degree to which data has attributes that are free from contradiction and are coherent with other data in a specific context of use.’ | ‘information can be won from the data without any contradictions surfacing’ [S26]; ‘Are the data presented in the same format?’ [S31] | [S2, S12, S13, S20, S22, S25, S26] |
Data redundancies [S4, S13] (repetitive data [S11], duplicates [S18, S23]) | Data for the same observation appears in many places [S9] | Consistency (repeatability [S10]) | Same as the row above | Same as the row above | [S4, S9, S11, S13, S18, S23] |
Old data [S18] (outdated data [S11], antiquated data [S17], time-alignment issue [S24]) | Data is out of date [S11, S18] | Currentness (timeliness [S15, S31]) | ‘The degree to which data has attributes that are of the right age in a specific context of use.’ | ‘Are the data up-to-date?’ [S31] | [S11, S17, S18, S24] |
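
As a rough illustration of how four of the DQ issues tabulated above could be counted in practice, the following minimal Python sketch profiles a small list of machine readings. The record fields (`machine_id`, `timestamp`, `value`), the value range, and the age threshold are hypothetical choices made for this example and do not come from the reviewed studies. Anomalies and data inconsistencies are left out because they require a model of normal behaviour or a second data source.

```python
from collections import Counter


def profile_dq_issues(records, value_range, now, max_age):
    """Count four of the tabulated DQ issues in a list of reading dicts."""
    low, high = value_range
    # How often each (machine, timestamp) observation occurs.
    key_counts = Counter((r["machine_id"], r["timestamp"]) for r in records)
    return {
        # Missing data: the value is null.
        "missing": sum(r["value"] is None for r in records),
        # Noisy data: the value lies outside all potential values.
        "noisy": sum(
            r["value"] is not None and not (low <= r["value"] <= high)
            for r in records
        ),
        # Data redundancies: the same observation appears more than once.
        "redundant": sum(n - 1 for n in key_counts.values()),
        # Old data: the reading is older than the allowed age.
        "old": sum(now - r["timestamp"] > max_age for r in records),
    }


# Hypothetical readings; timestamps are plain integers (e.g., seconds).
readings = [
    {"machine_id": "M1", "timestamp": 100, "value": 71.5},
    {"machine_id": "M1", "timestamp": 100, "value": 71.5},   # duplicate
    {"machine_id": "M1", "timestamp": 160, "value": None},   # missing
    {"machine_id": "M2", "timestamp": 170, "value": 950.0},  # out of range
    {"machine_id": "M2", "timestamp": 10, "value": 70.2},    # stale
]

report = profile_dq_issues(
    readings, value_range=(0.0, 150.0), now=200, max_age=120
)
print(report)
```

In a real factory setting the range and age thresholds would have to be defined from domain knowledge, as several of the reviewed studies did for noise removal.
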

Method Category | Method | DQ Issue: Missing Data | DQ Issue: Anomalies/Noisy Data | DQ Issue: Data Inconsistencies | DQ Issue: Data Redundancies | DQ Issue: Old Data |
---|---|---|---|---|---|---|
Technical methods | Data imputation | [S1, S2, S7, S12, S21, S25, S26, S27] | [S25] | [S12, S25, S26] | ||
Case deletion | [S9, S19, S20, S27] | [S6, S28, S30] | [S18] | |||
Anomaly detection | [S2, S4, S6, S8, S15, S28, S30] | |||||
Data visualisation control | [S31] | [S2, S23] | ||||
Clustering analysis | [S16, S19] | |||||
Database commit | [S18] | |||||
Matching values method | [S2] | |||||
Ontology-based semantic enrichment | [S11] | |||||
Computational conformance checking | [S14] | |||||
Adaptive correction threshold method | [S5] | |||||
Non-technical methods | Organisational structure design | [S29] | [S13, S29] | [S13] | ||
Organisational culture cultivation | [S29] | [S29] | ||||
Regulation formulation | [S29] | [S13, S29] | [S13] | |||
Data architecture standardisation | [S29] | [S13, S29] | [S13] | |||
Process management | [S14] | [S13] | [S13] |
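
To make the first two technical methods in the table more concrete, the sketch below shows one possible form of data imputation (last observation carried forward, the variant reported for [S7]) and of case deletion (dropping missing values and values outside a range set from domain knowledge). The function names, the sample temperatures, and the 0 to 150 range are assumptions made for illustration only, not implementations taken from the reviewed studies.

```python
def locf_impute(values):
    """Impute each None with the last observed value.

    Gaps at the very start stay None because nothing has been observed yet.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def delete_cases(values, low, high):
    """Case deletion: drop missing values and values outside [low, high]."""
    return [v for v in values if v is not None and low <= v <= high]


# Hypothetical temperature series with gaps.
gaps = [70.1, None, 70.4, None, None, 70.9]
imputed = locf_impute(gaps)
print(imputed)

# Hypothetical series with one gap and one out-of-range reading.
noisy = [70.1, None, 70.4, 999.0, 70.9]
cleaned = delete_cases(noisy, low=0.0, high=150.0)
print(cleaned)
```

The two outputs show the trade-off noted for these methods: imputation keeps the series length but inserts values that were never measured, whereas case deletion keeps only measured values but shortens the series and may discard useful information.
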
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, C.; Peng, G.; Kong, Y.; Li, S.; Chen, S. Data Quality Affecting Big Data Analytics in Smart Factories: Research Themes, Issues and Methods. Symmetry 2021, 13, 1440. https://doi.org/10.3390/sym13081440