A Review of Data Mining with Big Data towards Its Applications in the Electronics Industry

Featured Application: This review helps researchers develop strong research themes and identify gaps in the field, and helps practitioners develop DM and Big Data application systems.

Abstract: Data mining (DM) with Big Data has been widely used in the lifecycle of electronic products, from the design and production stages to the service stage. A comprehensive analysis of DM with Big Data and a review of its application in the stages of the product lifecycle will help researchers develop strong research themes and identify gaps in the field, and will help practitioners develop DM application systems. In this paper, a brief clarification of DM-related topics is presented first. A flowchart of DM and the main content of the flowchart steps are then given, in which commonly used data preparation and preprocessing approaches, DM functions and techniques, and performance indicators are summarized. Then, a comprehensive review covering 105 articles from 2007 to 2017 on DM or Big Data applications in the electronics industry is provided according to the flowchart from various points of view, such as data handling, applications of DM or Big Data at different lifecycle stages, and the software used in the applications. On this basis, a diagram of data content for different knowledge areas and a framework for DM and Big Data applications in the electronics industry are established. Finally, conclusions and future research directions are given.

The application of DM with Big Data for process monitoring and control, PMO, and quality improvement in the stage of production has been the focus of most research. On the one hand, sophisticated DM and Big Data related techniques such as FDC and R2R have been developed for wafer production process monitoring and control to reduce defects and improve the quality/yield based on the data collected from manufacturing processes, equipment/tool/environment statuses, and process parameters.
The functions of classification and clustering were widely used for FDC based on related DMTs such as DT, SVM, ANN, k-means, and SOM, while the prediction function was widely used for VM based on ANN, regression, and SVM. On the other hand, prediction, clustering, and the combination of the two are the most frequently employed functions for optimizing the scheduling plan and predicting the cycle time/due date based on ANN, FCM, SOM, and a hybrid of fuzzy logic and ANN. Additionally, post hoc diagnosis, quality prediction, and classification were conducted based on the functions of prediction, classification, clustering, and association for future production quality improvement.


Introduction
Since the Internet of Things and advanced information technologies (for example, radio frequency identification (RFID) tags and smart sensors) are widely used in manufacturing enterprises for their daily production and management, the product lifecycle management (PLM) processes produce a huge amount of data [1]. Furthermore, the accumulation of historical data in enterprise resource planning (ERP), supply chain management (SCM), customer relationship management (CRM), and order management systems (OMS), together with the data collected in a timely manner by the widely used manufacturing execution systems (MES) and distributed control systems (DCS), has contributed to the sharp increase of data over the decades. The era of industrial Big Data has come.
Leaders of manufacturing enterprises are becoming increasingly interested in benefiting their companies by effectively using Big Data [1]. Big Data related technologies such as knowledge discovery in databases (KDD) and data mining (DM) have been widely employed to enhance the intelligence and efficiency of the design, production, and service processes in many manufacturing scenarios such as product design improvement, manufacturing process optimization, production management and optimization (PMO), production process monitoring and control, quality management, CRM, SCM, and so forth. Intel employs Big Data for predictive maintenance of equipment and greatly reduces unnecessary equipment stops and idle time. Taiwan Semiconductor Manufacturing Company adopts Big Data based advanced equipment control/advanced process control (AEC/APC) to improve production efficiency and wafer yield. Many reviews of these applications in the manufacturing industry have been reported and are summarized in Table 1, from which we can see that most of the achievements before 2015 relate to DM application in manufacturing [2][3][4][5][6], and that many researchers have started to adopt the concept of Big Data [7][8][9][10][11] in smart manufacturing since then. However, to the best of our knowledge, the aforementioned reviews provide neither a comprehensive analysis of DM with Big Data nor a summary of its applications in the electronics industry from the view of the product lifecycle, considering the special requirements of this manufacturing industry.

Table 1. The reviews of data mining and big data application in the smart manufacturing industry.

| Reference | Main Review Content | Year |
|---|---|---|
| Choudhary et al. [2] | Application of KDD and DM in manufacturing, the kinds of patterns to be mined, and data mining techniques (DMTs) | 2009 |
| Ngai et al. [3] | DM application in customer identification, attraction, retention, and development | 2009 |
| Gulser et al. [4] | DM application for product quality improvement tasks including quality description/predicting/classification and parameter optimization | 2011 |
| Liao et al. [5] | DMTs applications in CRM, product development, and fault pattern analysis | 2012 |
| Hamidey et al. [6] | Support vector machine (SVM) application in quality assessment in manufacturing | 2015 |
| Donovan et al. [7] | Application of Big Data in the area of design, process and planning, quality management, maintenance and diagnosis, scheduling, control, environment, and so forth | 2015 |
| Li et al. [8] | Concept, characteristics, and potential application of Big Data in PLM | 2015 |
| Zhong et al. [9] | Big Data applications in finance, economics, healthcare, SCM, and the manufacturing sector; current movements on Big Data for SCM in service and manufacturing | 2016 |
| Nagorny et al. [10] | Big Data in smart manufacturing systems including related research roadmaps and projects in Europe, the infrastructures, Big Data analysis process, algorithms and tools, and so forth | 2017 |
| Cheng et al. [11] | Development of DMTs, major functions of DMTs, applications of DMTs to production management in the Big Data era | 2017 |

Electronics is one of the fastest evolving, most innovative, and most competitive industries. The research and development of new and improved products are of great importance, and companies often compete fiercely to bring the newest technology to the market first. The past five years, from 2012 to 2017, have been characterized by growth in emerging markets and the introduction of new products, leading more people to buy consumer electronics. The global consumer electronics industry was valued at $283 billion in 2015 [12].
Grand View Research predicted that the global consumer electronics market would reach $838.85 billion by 2020 [13]. The newly developed products are characterized by high precision, long and complex manufacturing/test processes in high-purity environments, diverse and high-quality requirements from customers, and a large amount of data generated at different stages of their lifecycle from design and production to sale and service. Thus, the electronics industry is currently in the midst of a data-driven revolution [7], which has pushed forward much data mining related research over the past decades for the better utilization of these data to facilitate quality or service improvement, production optimization, and so forth [14]. A review of DM with Big Data application in the electronics industry not only helps researchers develop strong research themes and identify gaps in the field but also helps practitioners develop DM application systems.
In the following sections, DM with Big Data and related techniques are given in Section 2 in which a brief introduction of the concepts of DM and Big Data is presented, and also the flowchart and the main content of the flowchart steps are summarized. In Section 3, the article selection condition and distribution of the selected articles in different years and different lifecycle stages are discussed. A comprehensive analysis of the reviewed literature from various points of view is provided subsequently, in Section 4, which summarizes data handling, discusses the DM with Big Data application in different stages of the product lifecycle, and surveys the software used in these applications. On this basis, the data content and a framework for DM application in the electronics industry are established in Section 5. Finally, the conclusions and future research directions are given in Section 6.

Concepts of Data Mining and Big Data
There are many concepts such as DM, KDD, and Big Data that are closely related to each other. DM, as an interdisciplinary subject including database design, statistics, pattern recognition, machine learning, and data visualization [6], can be defined in many different ways. Romero and Ventura [15] specified DM as "the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data". Han et al. [16] defined DM as "the process of discovering interesting patterns and knowledge from large amounts of data".
Many researchers and practitioners treat DM as a synonym for KDD; for example, IBM [17] deems KDD and DM the same, as "an interdisciplinary area focusing on methodologies for extracting useful knowledge from data". However, others think that "KDD refers to the overall process of discovering knowledge from data while DM (in a narrow sense) refers to application of algorithms for extracting patterns from data without the additional steps of the KDD process" [16], in which the additional steps include data preparation, preprocessing, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining [16]. Here, we take DM as a synonym for KDD, whereas DM in a narrow sense refers only to the step that generates a specific pattern using a particular algorithm within an acceptable computational efficiency limit [11,16].
There are various definitions of Big Data, from 3Vs to 4Vs [18]. Volume, velocity, and variety are the well-known 3Vs, and the fourth V can be value, variability, or virtual [8,18]. Wikipedia specifies that "Big Data is data sets that are so voluminous and complex that traditional data processing methods are inadequate to deal with them" [19]. Gartner gives a more detailed definition as follows: "Big Data is high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization" [20]. Big Data analysis is strongly connected with classical data analysis and DM approaches in order to access and process these amounts of data very fast [2,10].
The flowchart of DM with Big Data is illustrated in Figure 1. The main content of each step includes data preparation, preprocessing, DM in a narrow sense, and evaluation. The interpretation of the results will be discussed in the following sections.

Data Preparation and Preprocessing
Data preparation includes problem clarification and collection of the targeted data. Problem clarification means understanding the industry domain, including the relevant prior knowledge related to different applications and targeted goals [4]. The targeted data can be obtained from experimental observations, historical accumulated records, online sensor measurements, the real-time status of RFID tags, and simulation results. These data sets can be stored in different formats such as data warehouses, data marts, databases, files, and so on [4,16], and the data relevant to the mining tasks are retrieved and selected before data preprocessing. Preprocessing consists of data cleaning, transformation, reduction, and discretization. Data cleaning involves techniques for filling in missing values, smoothing out noise, handling outliers, and detecting and removing redundant data. Data transformation puts the data into appropriate forms for mining when necessary. Data reduction is performed to obtain a smaller representation of the original data without sacrificing its integrity. Dimensionality reduction, numerosity reduction, and data compression are the three ways to reduce data. Dimensionality reduction detects and removes irrelevant, weakly relevant, or redundant attributes [16]. Numerosity reduction replaces the original data volume with alternative, smaller forms of data representation. In data compression, transformations such as principal components analysis (PCA) are applied to obtain a reduced or compressed representation of the original data. Discretization reduces the number of levels of an attribute by collecting and replacing low-level concepts with high-level concepts [4].
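The preprocessing steps described above can be sketched in a few lines. The following is a minimal illustration only, assuming NumPy is available; the function names, the choice of mean imputation and min-max scaling, and the sample readings are all hypothetical and are not taken from the reviewed articles.

```python
import numpy as np

def impute_mean(X):
    """Data cleaning: fill missing values (NaN) with the column mean."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def min_max_normalize(X):
    """Data transformation: rescale each column to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (X - lo) / span

def pca_reduce(X, n_components):
    """Data compression: project centered data onto leading principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Hypothetical sensor readings: 5 samples, 3 attributes, one missing value.
raw = np.array([
    [1.0, 10.0, 100.0],
    [2.0, np.nan, 110.0],
    [3.0, 30.0, 120.0],
    [4.0, 40.0, 130.0],
    [5.0, 50.0, 140.0],
])
clean = impute_mean(raw)
scaled = min_max_normalize(clean)
reduced = pca_reduce(scaled, n_components=1)
```

Cleaning is applied before scaling so that the imputed value is rescaled together with the observed ones; other orderings and other imputation rules are equally plausible in practice.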

Data Mining in a Narrow Sense
Data mining in a narrow sense, as the core of DM, derives the model and mines the patterns/knowledge in the data. The patterns to be mined determine the DM functions to be performed, which are generally divided into descriptive and predictive DM. Descriptive DM characterizes properties of the data in a target data set and mainly includes the functions of summarization, clustering, and association/sequential pattern mining. Predictive DM performs induction on the current data in order to make predictions and mainly consists of the functions of classification, prediction, outlier detection (anomaly detection), and time series analysis [4,11,16]. The corresponding data mining techniques (DMTs) that realize the different functions can be categorized into statistical analysis-oriented (SA-oriented) and knowledge discovery-oriented (KD-oriented) techniques. SA-oriented techniques make assumptions in advance about the data distribution and the relationships between variables based on prior knowledge, and then verify or reject these assumptions. Common SA-oriented DMTs include algorithms such as regression, k-nearest neighbor (k-NN), k-means, the Bayesian classifier [21], and so on. In contrast, KD-oriented DMTs search for relationships automatically without clear assumptions [11]. The details of the DM functions and the related DMTs are summarized in Table 2 [4,11,16].

Performance Indicators
The knowledge extracted should be evaluated and interpreted correctly to obtain reliable results. The evaluation of the DM methods to reach a final decision requires a comparison of results obtained from various DM methods using several measures [4]. The performance indicators employed to evaluate classifiers based on a confusion matrix are illustrated in Figure 2 [22]. The indicators widely used for the measurement of the prediction, clustering, and association DM functions are summarized in Tables 3-5, respectively.

Table 3. The performance indicators for the prediction function [23].

Note: $y_i$ and $\hat{y}_i$ are the observed and predicted values of sample $i$, respectively; $\bar{y}$ is the average result of the samples; $\sum_i (\hat{y}_i - \bar{y})^2$ is the explained sum of squares; $E$ is the expectation value.

Table 4. The performance indicators for the clustering function [24].

| Indicators | Description |
|---|---|
| DBI | $n$ is the number of clusters, $c_x$ is the centroid of cluster $x$, $\sigma_x$ is the average distance of all elements in cluster $x$ to centroid $c_x$, and $d(c_i, c_j)$ is the distance between centroids $c_i$ and $c_j$. |
| F measure | The definitions of TP, TN, FP, FN, precision, and recall are the same as the specifications given in Figure 2; $\beta$ is the penalty coefficient. |
| QE, TE | $N$ refers to the number of original data vectors, and $r_\beta$ is the best matching unit of the data vector $x_i$; $u(x)$ takes the value 1 if the best and the second best matching units of the input vector are non-adjacent, and 0 otherwise. |

Table 5. The performance indicators for the association function.

| Indicators | Equation | Description |
|---|---|---|
| Support | $\mathrm{sup}(X) = \dfrac{\lvert \{ t \in T;\ X \subseteq t \} \rvert}{\lvert T \rvert}$ | $X$ is an item set, $X \rightarrow Y$ is an association rule, and $T$ is a set of transactions. The support of $X$, $\mathrm{sup}(X)$, with respect to $T$ is defined as the proportion of transactions $t$ in the data set that contain the item set $X$. |
| Confidence | $\mathrm{conf}(X \rightarrow Y) = \dfrac{\mathrm{sup}(X \cup Y)}{\mathrm{sup}(X)}$ | $\mathrm{conf}(X \rightarrow Y)$ is the proportion of the transactions containing $X$ that also contain $Y$. |
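As a small worked illustration of the support and confidence definitions, the sketch below computes both for one rule over a toy transaction set. The defect codes and the rule are hypothetical examples, not data from the reviewed studies.

```python
def support(itemset, transactions):
    """sup(X): proportion of transactions that contain every item in X."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """conf(X -> Y) = sup(X ∪ Y) / sup(X); returns 0.0 if X never occurs."""
    sup_x = support(antecedent, transactions)
    if sup_x == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / sup_x

# Hypothetical records: the set of defect codes observed on each wafer lot.
transactions = [
    {"scratch", "particle"},
    {"scratch", "particle", "void"},
    {"scratch"},
    {"particle", "void"},
    {"scratch", "particle"},
]
sup_scratch = support({"scratch"}, transactions)            # 4 of 5 lots
conf_rule = confidence({"scratch"}, {"particle"}, transactions)  # 3 of those 4 lots
```

Here the rule scratch → particle holds in three of the four lots that contain a scratch, so a high support does not by itself imply a high confidence.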
Accuracy (ACC), precision, sensitivity or recall, specificity, and so forth, given in Figure 2, are the commonly employed indicators for classification. Meanwhile, the receiver operating characteristic (ROC) curve, created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings, is commonly used to illustrate the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
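The classification indicators and the ROC construction can be sketched as follows. Figure 2 is not reproduced in this text, so the formulas below are the standard confusion-matrix definitions and are assumed to match it; the labels, scores, and thresholds are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def classification_indicators(tp, tn, fp, fn):
    """Standard indicators derived from the confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,       # sensitivity, TPR
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }

def roc_points(y_true, scores, thresholds):
    """(FPR, TPR) pairs obtained by sweeping the decision threshold."""
    points = []
    for th in thresholds:
        y_pred = [1 if s >= th else 0 for s in scores]
        m = classification_indicators(*confusion_counts(y_true, y_pred))
        points.append((m["fpr"], m["recall"]))
    return points

# Hypothetical fault labels (1 = faulty) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.8, 0.2, 0.1, 0.7, 0.3, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = classification_indicators(*confusion_counts(y_true, y_pred))
points = roc_points(y_true, scores, thresholds=[0.0, 0.5, 1.0])
```

Lowering the threshold moves the operating point toward (1, 1) on the ROC curve, and raising it moves the point toward (0, 0).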
The performance indicators for prediction mainly include the mean absolute percentage error (MAPE), the mean squared error (MSE), the mean absolute error (MAE), the root-mean-square error (RMSE), the root absolute error (RAE), the mean error (ME), the variance of errors (VARER), the relative error (RE), the goodness of fit ($R^2$), the index of agreement (IA), and so on. Typical objective functions to assess the quality of clustering include internal and external criteria. The internal quality of a clustering can be evaluated by the Davies-Bouldin index (DBI), the Dunn index (DI), and so on, while the most used external criteria include purity, normalized mutual information (NMI), the Rand index (RI), the F measure, and so on. Meanwhile, some indicators such as the quantization error (QE) and the topographic error (TE) are specific to a particular algorithm such as the self-organizing map (SOM). Support, confidence, lift, and conviction are pervasive performance indicators for association. Outlier detection can be treated as a binary classification, so the performance indicators for classification can be used to evaluate its results. Time series analysis can be used for clustering, classification, and anomaly detection, as well as forecasting; therefore, the related performance can be verified by the corresponding indicators for clustering, classification, and prediction.
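A few of the prediction indicators and the DBI can be computed as in the sketch below. Because the equations of Table 3 did not survive in this text, the formulas are the common textbook forms and are an assumption here; in particular, $R^2$ is written as one minus the ratio of the residual to the total sum of squares, which coincides with the explained-sum-of-squares form only for least-squares fits with an intercept. The DBI follows the symbol definitions given for Table 4. All sample values are hypothetical.

```python
import numpy as np

def prediction_indicators(y, y_hat):
    """MAE, MSE, RMSE, MAPE (in percent) and R^2 for observed y and predicted y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    err = y - y_hat
    mse = np.mean(err ** 2)
    return {
        "MAE": np.mean(np.abs(err)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAPE": 100.0 * np.mean(np.abs(err / y)),  # assumes no observed value is zero
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
    }

def davies_bouldin(X, labels):
    """DBI: mean over clusters of the worst (sigma_i + sigma_j) / d(c_i, c_j)."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
    # sigma_x: average distance of the members of cluster x to its centroid.
    sigma = np.array([
        np.mean(np.linalg.norm(X[labels == k] - c, axis=1))
        for k, c in zip(ids, centroids)
    ])
    worst = []
    for i in range(len(ids)):
        ratios = [
            (sigma[i] + sigma[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(ids)) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))

# Hypothetical cycle times (observed vs. predicted) and a two-cluster layout.
ind = prediction_indicators([100, 200, 300, 400], [110, 190, 310, 390])
dbi = davies_bouldin([[0, 0], [0, 2], [10, 0], [10, 2]], [0, 0, 1, 1])
```

A lower DBI indicates compact, well-separated clusters, whereas lower MAE, MSE, RMSE, and MAPE and a higher $R^2$ indicate better predictions.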

Article Selection and Distribution
The electronics industry is composed of organizations involved in the design, development, manufacture, assembly, and service of electronic equipment and components. These organizations offer a wide variety of products in four primary segments: government products, industrial products, consumer products, and electronic components. Each segment serves a specific market, which allows it to focus on components and products geared toward its customers. The government segment primarily covers aircraft and military products, as well as communication technology and medical devices. Industrial products include large-scale computers, radio and television broadcasting equipment, telecommunications equipment, and electronic office equipment, while consumer products are the well-known televisions, cell phones, DVD players, smartphones, radios, video game systems, personal computers, electronic ovens, and home intercommunication and alarm systems. The final segment, electronic components, includes electron tubes, semiconductors, printed circuit boards (PCB), and passive components [25].
Based on the initial search from databases with keywords such as DM, Big Data, and electronics, we found that most of the articles were related to consumer products and components. Therefore, articles related to DM with Big Data applications in consumer electronics and components were selected here. On this basis, the article selection was conducted, in which the period of interest for this literature survey ranges from 2007 to 2017. In October 2017, a search was made according to the following conditions:
(1) Databases: Science Direct, IEEE Xplore Digital Library, Springer Link, Taylor & Francis Online, Wiley Online Library, SAGE Journal, Web of Science, and Google Scholar.
(2) Stages: design, production, sale, service, and recycling.
(3) Products: electronic products, integrated circuit, wafer, semiconductor, PCB, phone, and computer.
(4) DM-related concepts: data mining, Big Data, and knowledge discovery.
(5) DM functions: prediction, classification, clustering, association, product/process characterization, time series analysis, outlier detection, and anomaly detection.
A total of 105 application studies within the scope of this review were found. The distribution of the selected articles in different years and different stages is illustrated in Figure 3. It can be seen that about 16% (17 articles) were related to the stage of product and manufacturing process design [26][27][28][29][30][31][32][33][34][35][36][37][38][39][40][41][42], and more than 75% (80 articles) applied DM and Big Data to production management and control in the stage of production, but less than 8% (8 articles) of applications focused on the stage of sale, service, and recycling [123][124][125][126][127][128][129][130]. The fluctuation in the number of selected articles in different years presents no obvious tendency; however, it indicates that the topic has attracted ongoing attention and research during the past decade, that the application areas have been extended, and that many new approaches have been developed.

Data Mining with Big Data Applications in the Electronics Industry
In the following, we examine and discuss the reviewed literature from various points of view based on the flowchart given in Figure 1. Data handling, or more specifically, data preparation and data preprocessing before performing the DM functions, are discussed first. Next, DM with Big Data applications in different stages of the electronics industry, including the knowledge area, DM functions, developed DMTs, and performance indicators, are summarized. In addition, findings of these applications in each knowledge area are given, and the summarization of these reviews is also presented. Finally, the software tools used in these applications are examined.

Data Handling
Data preparation is the initial step of DM to collect the necessary data recording the feature values directly from the experimental data and historical observations or indirectly from the simulation results [4], in which the experimental data are the records of full factorials or fractional factorials while historical observations can be obtained either through online measurements or from historical accumulated records. The data preparation from the reviewed literature is summarized in Table 6.
From Table 6, we can see that the data for the verification of product design improvement and manufacturing process optimization were mainly based on experimental observations and historical records. The DM application in production process monitoring and control for the tasks of fault detection and classification (FDC), run to run (R2R), statistical process control (SPC), and so on, worked mainly on the data obtained through online measurements, while DM in production and quality management for tasks such as scheduling, yield/cost/cycle time prediction, and so forth was conducted mainly based on historical records from ERP and MES along with some process simulation. The tasks of SCM and CRM were conducted mainly based on interactions and transaction records accumulated in the SCM, OMS, and CRM systems.

The data cleaning from the reviewed literature is summarized in Table 7, from which we can see that most of the cleaning techniques were used for the observational data sets. Some imputation techniques such as the missing values-patient rule induction method (m-PRIM) [30], k-NN [84,87], syndromes imputation [109], and so on were developed for filling in missing values. SVM [54], moving average smoothing [90], King-move neighborhood [93], Winters' exponential smoothing [126,127], and so on were employed for noise smoothing. Meanwhile, the methods of box plot [79,88], PCA [97], clustering [122], and so forth were applied for outlier detection. However, the missing values, noise, outliers, and redundant data were omitted directly in most cases.

Data transformation is the process of converting data from one format or structure into another. Normalization is the most pervasive method in the selected articles, while a few studies used variance scaling [35,63], text mining [42,123], Fisher Z-transformation [44,47], binary vector transformation [91,93,117], Box-Cox transformation [84], and numerical-to-binary conversion [85].
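Two of the cleaning techniques named above, box-plot outlier detection and moving average smoothing, can be sketched as follows. This is a minimal illustration under common conventions (a whisker length of 1.5 times the interquartile range and an unweighted window), which are assumptions rather than settings reported in the cited articles; the readings are hypothetical.

```python
import numpy as np

def boxplot_outliers(x, whisker=1.5):
    """Flag values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    x = np.asarray(x, float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - whisker * iqr) | (x > q3 + whisker * iqr)

def moving_average(x, window=3):
    """Smooth noise with an unweighted moving average (fully covered windows only)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(x, float), kernel, mode="valid")

# Hypothetical thickness readings with one gross measurement error.
readings = np.array([10.1, 9.9, 10.0, 10.2, 25.0, 9.8, 10.1, 10.0])
mask = boxplot_outliers(readings)
kept = readings[~mask]                  # drop the flagged reading
smoothed = moving_average(kept, window=3)
```

Dropping flagged readings mirrors the direct omission that most reviewed studies applied; replacing them by an imputed value would be an alternative design.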
Dimensionality reduction, as one of the important approaches to data reduction, removes the irrelevant and redundant variables to reduce the complexity of the analysis and of the generated models, and also to improve the efficiency of the whole modeling process. The widely used approaches in the reviewed articles include regression [34,44,45,83,86,89,110,126,127], analysis of variance (ANOVA) [27,79], genetic algorithm (GA) [55,83], Las Vegas filter [60,78], Pearson coefficient [79], Cramer's V correlation coefficients [85,87], and so on. Clustering [38,90,101,126], aggregation [34,87], and sampling [82] based approaches were applied to reduce the data numerosity. PCA or modified PCA [64,65,83,92,94] and multi-dimensional scaling [84] were employed to compress the representation of the original data. Only a few of the researchers conducted discretization for continuous attributes at the stage of preprocessing.
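As one concrete instance of dimensionality reduction, a Pearson-coefficient filter keeps only the attributes that correlate strongly with the target. The sketch below is illustrative; the threshold of 0.5 and the sample attributes are hypothetical choices, not values from the cited study.

```python
import numpy as np

def pearson_filter(X, y, threshold=0.5):
    """Return indices of attributes with |Pearson r| >= threshold, and all r values."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    keep = np.where(np.abs(r) >= threshold)[0]
    return keep, r

# Hypothetical process attributes (columns) and a quality response y.
X = np.array([
    [1.0,  1.0, 6.0],
    [2.0, -1.0, 5.0],
    [3.0, -1.0, 4.0],
    [4.0,  1.0, 3.0],
    [5.0,  1.0, 2.0],
    [6.0, -1.0, 1.0],
])
y = 2.0 * X[:, 0] + 1.0          # response driven by the first attribute only
keep, r = pearson_filter(X, y)
```

Attributes 0 and 2 are both retained here even though they carry the same information, which shows that a univariate filter removes irrelevant attributes but not redundant ones.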

Application of DM with Big Data in Different Stages
DM with Big Data has been applied in different stages, including design, production, sale, service, and recycling, for different scenarios such as product design improvement, manufacturing process optimization, PMO, production process monitoring and control, quality management, CRM, SCM, and so forth. The application of DM with Big Data for the procurement of electronics components at the production stage has not been studied in the reviewed articles. Meanwhile, few reviewed articles address product distribution and logistics, which mainly include order processing, inventory management, and product transportation at the stage of sale and service; thus, we treat them as part of SCM. Order management, as an extension of CRM, is also considered as CRM. Quality improvement (QI), development time/cost estimation (DTCE), PMO, AEC/APC, CRM, and SCM considered in the review are the typical knowledge areas to enhance the intelligence and efficiency of lifecycle management and control, in which the data-driven QI is closely related to product design improvement, manufacturing process optimization, and quality management. AEC/APC, as the core of production process monitoring and control in the electronics industry, is also used to enhance product quality or yield. The task of AEC/APC is usually conducted online during the manufacturing process and has attracted a lot of research. The description of these knowledge areas and their tasks is summarized in Table 8.
In the following sections, from Section 4.2.1 to Section 4.2.3, the summarization will not be taken as a function alone because it is employed to characterize the product/process and then to facilitate the functions of prediction, classification, clustering, and so forth. The SA-oriented and/or KD-oriented categories of different DMTs in an article will also be included.

Table 8. Description of the knowledge areas and their tasks.

| Knowledge area | Task | Description | Stage |
|---|---|---|---|
| QI | Description of product/process | (1) Identifying attributes that affect quality significantly; (2) comparing the end result of the whole process with the desired specifications and analyzing the root causes of low yield in order to adjust the process parameters to ensure future quality [102], which we call post hoc (fault) diagnosis here. | Design and production stage |
| QI | Quality classification | For a given set of input parameters, predicting the class of the quality output. | Design and production stage |
| QI | Quality prediction | Predicting what the resulting quality (yield) characteristic will be for a given set of input parameters or process values. | Design and production stage |
| QI | Parameter optimization | Based on the learned features of the cases yielding high quality, finding optimal levels of process/product parameters that consistently yield target performance. | Design and production stage |
| AEC/APC [4,11,86] | Fault detection and classification | Fault detection (FD) is to monitor and analyze the variation in equipment, tool, or process data and detect anomalies, and fault classification is to determine the root cause. | Production stage |
| AEC/APC [4,11,86] | R2R | Modifying recipe parameters or the selection of control parameters between runs to improve performance. | Production stage |
| AEC/APC [4,11,86] | Virtual metrology (VM) | Prediction of post-process metrology variables using process and wafer state. | Production stage |
| AEC/APC [4,11,86] | Equipment health monitoring (EHM) | Monitoring tool parameters to assess the tool health as a function of deviation from normal behavior. | Production stage |
| AEC/APC [4,11,86] | Statistical process control | Using statistical methods to analyze processes or products to take appropriate actions to achieve a state of statistical control and continuously improve the process capability. | Production stage |
| CRM [3] | Customer identification, attraction, retention, and development | Analyzing and understanding customers' behaviors and characteristics. | Sale, service, and recycling stage |
| SCM | - | DM application for the management of the flow of goods and services. | Sale, service, and recycling stage |
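To make the R2R task in Table 8 concrete, the sketch below simulates one common controller of this kind, an exponentially weighted moving average (EWMA) controller on a linear process. This particular controller, the noise-free process model, and every parameter value are assumptions chosen for illustration; the reviewed articles may use different R2R schemes.

```python
def ewma_r2r(target, gain, true_offset, runs=20, weight=0.4, initial_offset=0.0):
    """Simulate an EWMA run-to-run controller.

    Assumed process model: y = true_offset + gain * u, where u is the recipe
    setting. The controller knows `gain` but must learn the offset from the
    measurements taken after each run.
    """
    offset_est = initial_offset
    outputs = []
    for _ in range(runs):
        u = (target - offset_est) / gain       # recipe chosen for this run
        y = true_offset + gain * u             # measured output (noise-free here)
        # EWMA update of the offset estimate from the latest measurement.
        offset_est = weight * (y - gain * u) + (1.0 - weight) * offset_est
        outputs.append(y)
    return outputs

# Hypothetical process: target thickness 100, unknown offset 5.
outputs = ewma_r2r(target=100.0, gain=2.0, true_offset=5.0)
```

With these settings the deviation from the target shrinks by the factor (1 - weight) after every run; a larger weight reacts faster to drift but would pass more measurement noise into the recipe.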

Application of DM and Big Data for Design
The design stage includes product design followed by process planning. Product design is to create a new product, while process planning translates product design requirements into manufacturing process details and thus acts as a bridge between product design and manufacturing. Capodieci [14] presented a review of data analysis and machine learning for design process yield optimization in electronic design and semiconductor manufacturing. Another 17 articles related to DM application in the stage of design have been retrieved and summarized in Table 9, and the following findings can be obtained:
(1) The quality improvement of product design [28], the prediction of development cost and time [34,37], and product customization [38,42] are the main applications of DM in product design.
(3) KD-oriented ANN is the most widely used DMT. The most pervasive function is prediction, which has been widely employed for parameter optimization and the determination of its effect [26,27,32,33,34,35,36], followed by clustering and association. Clustering, that is, unsupervised classification, was mainly employed to identify similar products, process plans, and parameters to support more efficient and reasonable manufacturing [39][40][41]. Association was mainly used to identify purchase behavior and thereby develop market-competitive products [38,42].

Application of DM and Big Data for Production
The product in its final shape is obtained in the production phase. The knowledge areas of DM with Big Data application in the stage of production include PMO, AEC/APC, and quality improvement. The reviewed studies are summarized in Tables 10-12 for PMO, AEC/APC, and quality improvement, respectively. The following conclusions can be obtained for the application of DM with Big Data for PMO: (1) The scheduling optimization, cycle time, complete time, and output time prediction for wafer fabs have attracted most of the research. The reason may be that wafer fab usually takes several months and is the top priority for improvement. Therefore, cycle time reduction is always an important task in controlling a wafer fab factory. To become an agile supplier, shortening the cycle time of every operation is critical [51].
(2) Hybrid approaches combining fuzzy logic/clustering with ANN have been developed for different applications because of the un-deterministic characteristic factors that require fuzzy expressions, such as the release time, average fab utilization, total queue length on the processing route, and cycle time. Since they cannot be determined accurately, a certain probability distribution is needed. The fuzzy based DM approaches facilitate more realistic pattern extraction.
(3) Ensemble approaches combining fuzzy c-means (FCM) or SOM-based clustering with ANN-based prediction were pervasive. The purpose of clustering is to group objects according to their similarity across various features and thereby improve the accuracy of prediction. The results show that hybrid approaches with clustering-based pre-classification or post-classification are among the most accurate approaches for estimating the cycle/lead time or the completion date and for obtaining an optimized scheduling plan [51].
Table 10 (fragment): Association (2): FACRs, KD-oriented [50]; FNN + ANN + Apriori, cycle time prediction, support and confidence [51]. FACRs: fuzzy association classification rules; GNR: Gauss-Newton regression; RBFNN: radial basis function neural network.
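The pre-classification idea in item (3) can be sketched in a few lines. This is only an illustration under simplifying assumptions: plain k-means stands in for FCM or SOM, a per-cluster least-squares line stands in for the ANN predictor, and the queue-length and cycle-time values are made up.

```python
from statistics import mean


def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))


def kmeans_1d(values, k=2, iters=20):
    """Plain k-means on scalars (k >= 2); centroids start evenly spaced over the data range."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v, centroids)].append(v)
        # Keep the old centroid when a cluster received no points.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def train(xs, ys, k=2):
    """Pre-classify jobs by clustering x, then fit one predictor per cluster."""
    centroids = kmeans_1d(xs, k)
    models = []
    for c in range(k):
        pairs = [(x, y) for x, y in zip(xs, ys) if nearest(x, centroids) == c]
        if pairs:
            models.append(fit_line([p[0] for p in pairs], [p[1] for p in pairs]))
        else:
            models.append(fit_line(xs, ys))  # empty cluster: fall back to the global line
    return centroids, models


def predict(x, centroids, models):
    """Route the job to its cluster and apply that cluster's predictor."""
    a, b = models[nearest(x, centroids)]
    return a + b * x


# Hypothetical jobs: queue length on the route and observed cycle time.
queue = [1, 2, 3, 4, 20, 21, 22, 23]
cycle = [12, 14, 16, 18, 85, 89, 93, 97]
centroids, models = train(queue, cycle)
print(predict(3, centroids, models), predict(22, centroids, models))
```

A single global line fitted to all eight jobs would miss the two regimes; splitting the jobs first lets each predictor specialize, which is the motivation the reviewed studies give for clustering before prediction.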
Tens of thousands of monitoring and online detection measurement values, together with hundreds of electrical test parameters measured in a timely manner at different positions on a wafer during the fab process, facilitate the Big Data application for production control. The typical knowledge area of these applications is AEC/APC, a collection of tasks including FDC, R2R control, SPC, and VM that reduce process variation and meet the process target for yield (quality) enhancement.
The related literature is summarized in Table 11, from which we can see that outlier detection was conducted online and time series analysis was employed for anomaly detection, while the prediction function was mainly used for VM and R2R. Classification and clustering have been widely used for FDC. Some preprocessing such as regression [87-89] was conducted to identify the main effects on observation variables before the classification or clustering model was established.
The data-driven mechanism is one of the pervasive approaches to FDC [114], and the summary in Table 11 also indicates that FDC (or FD only) is the most researched task of AEC/APC [84-93,95,97-99]. The wafer fab is a complex and lengthy process that involves hundreds of process steps, and early FD gives engineers more time to respond appropriately and avoid serious equipment abnormalities [84-86]. Fault classification can be considered a combination of fault identification and diagnosis: it identifies the main effects on observation variables, concentrates on the process variables related to diagnosing abnormalities, and then determines the cause of the observed out-of-control status. Removing that cause facilitates process recovery and reduces yield loss [86,90].
R2R control consists of several levels including real-time control, single-process R2R control, inter-process R2R control, and factory-level R2R [79]. FDC is a representative technique of real-time control. Single-process R2R control focuses on an individual process module, whereas the selected R2R-related articles concentrate mainly on inter-process R2R, which deals with the process control of two or more inter-related process modules [80,86], combined with other tasks like FDC. Factory-level R2R has been considered by only a few research papers [79], where it is used to enhance the results of electronic tests in wafer acceptance tests and yield circuit probe tests.
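The text above does not tie R2R control to a specific control law. As an illustration of the single-process level only, the sketch below uses an exponentially weighted moving average (EWMA) controller, a common textbook choice that is an assumption of this example and is not attributed to the reviewed articles. The linear process model, the gain, and all numeric values are hypothetical.

```python
def ewma_r2r(target, gain, intercept_guess, weight, process, runs):
    """Simulate a single-input EWMA run-to-run controller.

    Assumed process model: y = intercept + gain * u, where `gain` is known
    and the intercept is unknown (or drifting) and re-estimated after each run.
    """
    a_hat = intercept_guess
    outputs = []
    for _ in range(runs):
        u = (target - a_hat) / gain  # recipe expected to hit the target
        y = process(u)               # measured result of this run
        # EWMA update of the intercept estimate from the latest observation.
        a_hat = weight * (y - gain * u) + (1 - weight) * a_hat
        outputs.append(y)
    return outputs


def toy_process(u):
    """Hypothetical noise-free process with true intercept 5.0 and gain 2.0."""
    return 5.0 + 2.0 * u


ys = ewma_r2r(target=10.0, gain=2.0, intercept_guess=0.0,
              weight=0.5, process=toy_process, runs=20)
print(ys[0], ys[-1])
```

In this noise-free toy case the deviation from the target shrinks by the factor (1 - weight) after every run; with measurement noise, a smaller weight trades response speed for smoother recipe adjustments.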
The reviewed VM-related literature utilized MLR [80], ANN [81], SVM(R), and DT [82,83] to predict every wafer's metrology measurements from the production equipment data and preceding metrology results. Prediction compensates for the lack of physical measurement and enables the measurement of every wafer for every process step on all capable equipment available in the fab, thus allowing significant improvement of process control and product quality and reduction of operational cost and production cycle time [81,83].
In SPC, significant characteristics are monitored, such as the failure percentage of wafer bin maps [93] and the soldering quality [94]. The process control chart, a widely used approach to SPC [93,94], has been used to diagnose and identify the variability of the fab process. A statistical process control system can help detect defects that might originate from the process steps, which improves quality and eliminates the need for expensive post inspections [94,96]. With the increasing demand for high-quality products and reliable processes, multivariate statistical process control (MSPC) has been developed to ensure that equipment is "statistically controlled" by monitoring two or more related quality characteristics simultaneously [105].
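A univariate control chart of the kind mentioned above can be sketched as follows. This is a simplified illustration: the limits are set at three sample standard deviations around the mean of an in-control reference run, whereas production charts often estimate the spread from moving ranges or subgroups. The measurement values are hypothetical.

```python
from statistics import mean, stdev


def control_limits(reference, sigmas=3.0):
    """Return (LCL, center line, UCL) estimated from in-control reference data."""
    center = mean(reference)
    spread = stdev(reference)  # sample standard deviation (simplifying assumption)
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(samples, limits):
    """Indices of samples that fall outside the control limits."""
    lcl, _, ucl = limits
    return [i for i, x in enumerate(samples) if x < lcl or x > ucl]


# Hypothetical in-control measurements of one quality characteristic.
reference = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9]
limits = control_limits(reference)

# New production measurements to be monitored.
new_samples = [10.05, 10.6, 9.95, 9.3]
flagged = out_of_control(new_samples, limits)
print(limits, flagged)
```

MSPC generalizes this idea by replacing the single characteristic with a vector of correlated characteristics and the three-sigma band with a multivariate distance statistic.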
The above review indicates that AEC/APC monitors online measurements of specific process steps and undertakes corrective action to ensure that the parameter being measured remains within the desired limits. However, the integration of FDC, R2R, SPC, and VM has been considered in only a few research papers [86] and requires further research on aspects such as the consistency and integration of data, unified frameworks, and high-efficiency algorithms and platforms.
The application of DM with Big Data for quality improvement of electronic products, especially for wafer fab at the production stage, is summarized in Table 12. One stream of research deals with predicting the performance (yield) of a manufacturing process or system in terms of critical functional characteristics. Months may pass before a chip is completed; hence, there is great interest in mining production data to predict its performance prior to the final testing of the wafers [100-108]. To infer the possible causes of faults and manufacturing process variations in semiconductor manufacturing after the whole fab process is completed, clustering, classification, and association analyses are conducted based on different DMTs such as k-means, SOM, SVM, and decision tree to identify critical poor-yield factors and determine the root cause of low yield. On this basis, the related process parameters can be adjusted to ensure future quality based on post hoc diagnosis [110-118,121]. Some studies incorporated sequential pattern mining to identify sequential associations between events in different operations during manufacturing [119,120].
Table 12 (fragment): decision correlation rules, KD-oriented [120]; time series analysis (1): co-clustering, SA-oriented, quality prediction, RMSE [121]; outlier detection (1): hierarchical clustering and DT, SA-oriented and KD-oriented, post hoc diagnosis [122].
Moreover, more than a hundred test items and millions of rows of wafer data are generated per day after testing. According to the basic requirements of quality management, an essential task is to analyze these test items one by one according to different specifications and requirements.
In the traditional mode of work, more than a hundred process capability indexes must be calculated step by step and the quality characteristics evaluated one by one, which involves enormous and complicated operations. Meanwhile, it is difficult to determine the association between these indexes and to present a comprehensive summary of the overall performance of the product. The application of Big Data to quality management and analysis can easily generate a traditional single-index process capability analysis report. More importantly, it can uncover many new results from the Big Data set [114].
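The single-index process capability report mentioned above is commonly built from the indexes Cp = (USL - LSL) / (6σ) and Cpk = min(USL - μ, μ - LSL) / (3σ), where LSL and USL are the lower and upper specification limits. The following minimal sketch computes both for a batch of test items; the item names, measurements, and specification limits are hypothetical.

```python
from statistics import mean, stdev


def capability(values, lsl, usl):
    """Return (Cp, Cpk) for one test item given its specification limits."""
    mu, sigma = mean(values), stdev(values)
    cp = (usl - lsl) / (6 * sigma)               # potential capability (spread only)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)  # actual capability (spread and centering)
    return cp, cpk


# Hypothetical test items: (measurements, lower spec limit, upper spec limit).
test_items = {
    "film_thickness": ([9.9, 10.0, 10.1], 9.4, 10.6),
    "film_resistance": ([9.9, 10.0, 10.1], 9.4, 10.3),
}

report = {name: capability(*spec) for name, spec in test_items.items()}
for name, (cp, cpk) in report.items():
    print(f"{name}: Cp={cp:.2f}, Cpk={cpk:.2f}")
```

Looping over a dictionary of items replaces the item-by-item manual calculation; a Cpk lower than Cp, as for the second item, signals that the process is off-center relative to its specification window.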

Application of DM and Big Data for Sale, Service, and Recycling
In the stage of sale, service, and recycling (SSR), finished products are stored in a warehouse and transported to customers through logistics; customers then use the product while the manufacturer provides remote service. When a product can no longer be used, it reaches the end of its life, which involves remanufacturing or disposal [8].
The DM applications in the SSR stage are summarized in Table 13, and it can be seen that most of the applications related to CRM involve marketing and sales prediction [125-127], customer service [129], and SCM to achieve greater efficiency and effectiveness in delivering customer value [130]. In detail, one research direction is to mine the behavioral characteristics of customers regarding the product and its maintenance and thereby identify customer requirements for customer attraction and retention [123,129]. Another is to predict market demand and price for customer identification and development and thereby facilitate the optimization of production, procurement, and resource plans [125-127]. Only one article is related to the recycling of electronic products, considering the storage behavior of customers [124]. Table 13 also shows that predicting market requirements and determining a more reasonable price are the main functions, while clustering and classification have been used to classify products and customer requirements and to identify the purchase features of different customers. Text mining was utilized to extract knowledge from interaction records in some cases [123].
Figure 4 illustrates the different functions used by the selected articles in different stages. It can be seen that prediction, classification, and clustering are the top three functions employed for mining patterns at different stages. All six functions have been used in the production stage, which indicates that there are diverse requirements for DM and Big Data application at this stage, while time series analysis and outlier detection have seldom been used in the design and SSR stages.
Figure 5 illustrates the distribution of different knowledge areas considering the tasks of QI for design/production, DTCE, PTP, FDC, VM, R2R, SPC, CRM, and SCM according to Tables 9-13. The frequencies in Figure 5 indicate that QI for design, scheduling optimization, production time prediction, FDC, post hoc diagnosis, production yield/quality prediction, and optimization of sale and service for CRM are the pervasive knowledge areas and tasks.
The categories of DMTs adopted in the 105 articles were counted for the different knowledge areas, and the results are illustrated in Figure 6. The most pervasively used DMTs are hybrids or integrations of SA-oriented and KD-oriented DMTs, especially for the knowledge areas of PMO, AEC/APC, and QI for production, followed by combinations of different KD-oriented DMTs or a single KD-oriented DMT. However, only one SA-oriented approach and the ensemble of SA-oriented DMTs have been widely adopted by researchers compared to other approaches. It can be seen that the top DMT used is ANN, followed by fuzzy logic, because many ANNs are combined with fuzzy logic to solve scheduling optimization and production time prediction. ANN has been applied in eight of the above-mentioned knowledge areas, that is, all except SPC and R2R. Fuzzy logic has been used mainly for production time prediction, yield/quality prediction, and the optimization of CRM/SCM. DT has been widely employed for FDC and post hoc diagnosis. Regression has been pervasively used for feature selection and for the prediction of quality, yield, development cost, VM, and so on. SOM, k-means, and FCM have been used for clustering, especially for the pre-classification of jobs in scheduling optimization and production time prediction.
GA has been used to find optimal levels of process/product parameters [26,27,31-36] and can also be used to optimize the parameters of DMTs such as SVM [55,83] and fuzzy clustering [105].

Software Used for the Selected Articles
Many algorithm engines, tools, and platforms have been developed to implement functions and related DMTs. Predictive Analytics Today summarized the top 50 free DM software tools [131], including Orange, RapidMiner, Weka, KNIME, SpagoBI, Anaconda, Octave, and so forth. Some commercial software including Sisense, Oracle Data Mining, Microsoft SharePoint, IBM Cognos, Dundas BI, SAP Business Objects, Matlab, Statistic, SAS EM, SPSS Clementine (IBM SPSS Modeler after 2009), Tanagra, Qlik Sense, and so forth has also been widely used by researchers and practitioners.
The different tools shown in Table 14 have been used for various purposes in mining applications in the reviewed literature. The software in the reviewed articles can be categorized into spreadsheets, statistical software packages, DM software packages, general purpose software, special purpose tools, and high-level languages.
Some statistical software packages such as MiniTab, SAS, SPSS, and Statistics were preferred for implementing SA-oriented methods such as MRA and ANOVA. The spreadsheet application Excel was mainly used for data preparation and preprocessing. However, commercial software packages such as SPSS Modeler and SAS Enterprise Miner (SAS EM) were used in only a few of the applications.
The general purpose software Matlab and special purpose packages based on Matlab were used in various applications for QI in design and production, PMO, and CRM. They were mostly utilized to realize ANN, fuzzy logic, SVM, and SOM, supported by several open source toolboxes such as NeuroSolutions, Neural Network, NeuralPower, Fuzzy Logic, LibSVM, and SOM. The association, outlier detection, and time series analysis functions were mainly conducted with commercial software packages such as SAS EM [38] and RapidMiner [42].
Some high-level languages such as C/C++ [94,120] and Visual Basic [70-73,75-77] were used for SOM, fuzzy c-means, fuzzy logic, ANN, and combinations of these approaches because of their flexibility, which lets researchers design or combine particular methodologies while considering domain knowledge in handling and analyzing the data. Meanwhile, some platforms such as the online system [79], fab-wide FDC [80], VM system [83], online time series prediction system [88], and wafer bin map clustering and classification systems [117] have been developed for different tasks of AEC/APC based on high-level languages. However, the commonly used platforms for developing DM or Big Data application systems such as WEKA [28], RapidMiner [42], the R software environment [122], and Python [84] have been utilized by only a few of the researchers, indicating that the systematized application of these results still requires further development by practitioners.

Diagram of Data Content for Different Knowledge Areas and DM Framework for the Electronics Industry
The product lifecycle processes carry a huge amount of structured, semi-structured, and unstructured data. Big Data analytics and DM technology can be used to make a deep analysis of historical lifecycle data, to discover knowledge, and to optimize the process of PLM. A framework with four modules including data sensing and acquisition, data processing and storage, DM model development, and Big Data application in PLM was presented by Zhang et al. [1]. However, the summarization and classification of lifecycle-related data and its utilization by different knowledge areas have not been discussed. Meanwhile, the special application scheme for electronics manufacturing has not been considered. Therefore, establishing a diagram of data content for different knowledge areas and a DM with Big Data framework for the electronics industry can guide companies to accumulate related data and develop a DM strategy from the view of the lifecycle and the overall business chain, and can also help researchers and practitioners select appropriate techniques and make better use of data for knowledge discovery.

Diagram of Data Content for Different Knowledge Areas
From the view of the electronics lifecycle, the main data for different knowledge areas can be divided into engineering data, enterprise resource and environment data, production plan and arrangement data, manufacturing result data, and transaction and interaction related data. Figure 9 illustrates the main content of each category and its application for different knowledge areas. The detailed description of each category is given as follows.
Figure 9. The data content for different knowledge areas.
(1) Engineering data: It includes product structure and function, manufacturing process plans, and quality requirements that define what is to be manufactured and how to manufacture it. Relevant DM with Big Data applications have been conducted to improve product quality and customer satisfaction or to optimize process parameters. This data can be stored in different systems such as PLM and computer-aided process planning systems in structured (bill of materials), semi-structured (requirement reports), and unstructured (design models or drawings) styles.
[...] a structured style, which has been widely used for the optimization of PMO tasks such as production time prediction and scheduling optimization at the production stage.
(4) Manufacturing result records: Result records define the quality and quantity of products at a certain time and workplace. They are always accumulated in MES, quality management systems, ERP, and storage management systems in a structured style. RFID has been widely used for product lifecycle management in recent years, and the traced data generated automatically at different stages through RFID placed on materials, semi-products, and finished products can also be taken as manufacturing result data.
Taking the data involved in the wafer fab as an example, it is generated at various steps, including inline metrology steps that measure test wafers and product wafers for parameters such as critical dimension, film thickness, and film resistance. It also includes electrical test and final yield data. DM-based post hoc diagnosis, yield prediction, and parameter adjustment to ensure future quality have been conducted based on the results of different steps. They can also be combined with enterprise resource and environment data for AEC/APC.
(5) Interaction and transaction data: Owing to the fast development of online trading and electronic commerce in the past decades, a large number of records related to transactions and online interactions between upstream suppliers, intermediate collaborators, and downstream customers have been accumulated. The structured transaction data and the semi-structured or unstructured interaction data have been widely used for the optimization of SCM and CRM, such as marketing analyses and product design improvement based on customer feedback at the design stage, procurement and inventory optimization at the production stage, and price and demand prediction and customer identification, attraction, retention, and development at the SSR stage. Text mining techniques have also been used to extract patterns from the interaction text and were combined with DM for the final knowledge discovery [123]. Meanwhile, RFID-based records can be used for product tracing in transaction, service, and recycling.

Data Mining with Big Data Frameworks for the Electronics Industry
On the basis of the aforementioned review, a framework of DM with Big Data applications in the electronics industry is presented in Figure 10, in which the design and production stages correspond to the beginning of the lifecycle, sale and service can be taken as the middle of the lifecycle, and recycling is at the end of the lifecycle [1,8]. Each stage of the lifecycle corresponds to different application scenarios. The DM application in product and manufacturing process design mainly includes product design, manufacturing process design, and marketing, with relevant knowledge areas such as quality improvement, development cost and time prediction, product customization, manufacturing parameter optimization, and SCM. Equipment management, production management, and procurement are the main application areas of DM with Big Data in the production stage, with typical knowledge areas such as AEC/APC, PMO, QI, and SCM. The application of DM with Big Data for sale and service can not only support quality improvement and customization design but also optimize logistics and facilitate customer service and maintenance. Recycling has attracted less attention in DM with Big Data applications in the electronics industry; such applications could be used in remanufacturing, reuse, and environmental protection, considering the knowledge areas of product recovery, remaining life prediction, and reverse logistics optimization.
The details of the knowledge areas of different stages have been summarized in Section 4.2. The quality improvement for design and production can be further divided into quality (yield) prediction, classification, description, and parameter optimization. Post hoc diagnosis can be taken as the quality description at the production stage, with the purpose of adjusting process parameters to ensure future quality. The tasks of AEC/APC, which consist of FDC, R2R control, SPC, VM, and so forth, are also for quality enhancement; therefore, quality improvement at the production stage and AEC/APC are not disjoint here. The DM and Big Data application in PMO is a collection of scheduling optimization, cost/time prediction, and so on.
SCM is used to optimize the logistics for material supply at the beginning stage of the lifecycle and it can also be used to achieve greater efficiencies and effectiveness in delivering customer value at the end of the lifecycle. The application of DM or Big Data tools in CRM is an emerging trend in the global economy. Analyzing and understanding customer behaviors and characteristics is the foundation of the development of a competitive CRM strategy so as to acquire and retain potential customers and maximize customer value [3]. The tasks of customer identification, attraction, retention, and development of CRM can be realized through Big Data-based marketing prediction, personalized service, predictive maintenance, remote online diagnosis and so on.
Data preparation such as data acquisition, accumulation, and storage for different knowledge areas and applications can be guided by the diagram of data content for different knowledge areas given in Section 5.1. The commonly used data preprocessing techniques, including data cleaning, transformation, reduction, and discretization, can utilize the preprocessing approaches summarized in Sections 2.2 and 4.1, based on the requirements of the application area and the quality of the data. DM in a narrow sense can be implemented for each function based on the pervasive DMTs summarized in Section 4.2.4. Interpretation and evaluation can be conducted by combining experts' knowledge with the performance indicators given in Section 2.4. The implementation software is not given in the framework because there are many choices in practice. The final purpose of the DM application has been proved by many researchers and practitioners. This framework provides an option for different types of companies and is open to further extension.

Conclusions and Future Research Directions
This paper presents a comprehensive review of DM with Big Data towards its applications in the electronics industry. We can see that DM with Big Data has been applied to different scenarios including product design improvement, manufacturing process optimization, PMO, production process monitoring and control, quality improvement, CRM, and so forth.
Customer-oriented product development and process plan optimization are the main applications for product design improvement and manufacturing process optimization in the design stage. Prediction was the most frequently used DM function observed in the reviewed articles. ANN and regression were widely used DMTs for prediction.
The application of DM with Big Data for process monitoring and control, PMO, and quality improvement in the production stage has attracted the interest of most research. On the one hand, sophisticated DM and Big Data related techniques such as FDC and R2R have been developed for wafer production process monitoring and control to reduce defects and improve quality/yield based on the data collected from manufacturing processes, equipment/tool/environment statuses, and process parameters. Classification and clustering were widely used for FDC based on related DMTs such as DT, SVM, ANN, k-means, and SOM, while prediction was widely used for VM based on ANN, regression, and SVM. On the other hand, prediction, clustering, and the combination of the two are the most frequently employed functions for optimizing scheduling plans and predicting cycle time/due date based on ANN, FCM, SOM, and a hybrid of fuzzy logic and ANN. Additionally, post hoc diagnosis, quality prediction, and classification were conducted based on the functions of prediction, classification, clustering, and association for future production quality improvement.
Most of the DM applications at the SSR stage are related to CRM, with the purpose of acquiring and retaining potential customers and maximizing customer value based on transaction records and online feedback from customers. Prediction, classification, clustering, and time series analysis functions were conducted based on ANN, regression, and SVM for sale and service to mine consumption habits and predict the market price.
The achievements of the reviewed articles facilitate the theoretical study and practical application of DM with Big Data in the electronics industry. Nevertheless, limitations and challenges remain for future research.
(1) Data preparation and preprocessing. The data of the product lifecycle are characterized by multisource (for example, design, production, and service data), heterogeneity (for example, structured, semi-structured, and unstructured data), and "noise" (for example, incomplete, incorrect, redundant, and inconsistent data) [1]. These problems increase the difficulties of data preparation, preprocessing, and subsequent mining, and also generate misleading patterns. However, little effort has been devoted to handling these problems. Manufacturing organizations with well-established and integrated data collection systems would benefit from a larger application of DM and Big Data [4]. Unified management and storage of the multi-source and heterogeneous data are necessary, and this motivates enterprises to develop DM strategies with dedicated consideration to data accumulation, integration, and consistency. Multi-business requirements integration, concept standardization, unified model establishment, and data/system interface development should be conducted collaboratively to facilitate data utilization. The standardization of operations such as data entry, storage, and maintenance should also be conducted accordingly to ensure the data quality and reduce data redundancy.
(2) The knowledge area of DM application. DM has been widely used in the design and production stages, especially for wafer fab and PCB assembly, and the pervasive knowledge areas include QI, PMO, AEC/APC, and so forth. However, potential applications such as customization production, procurement, warehouse management and inventory balance, and equipment maintenance and repair require more relevant data accumulation and extended mining. The global logistics industry has a large and ever-growing amount of Big Data and is flooded with real-time data from sources such as smartphones, sensors, and digital machines [9]. However, the application of DM with Big Data in SCM and logistics for electronic products has attracted little dedicated discussion. Meanwhile, little effort has been put into CRM and order management that take into account the features of electronics, such as a large number of consumers, fast replacement by new products, and fierce market competition.
The patterns and knowledge hidden in Big Data are multidimensional (for example, spread across various departments and lifecycle stages) and scattered, which hinders the effective mining and utilization of the knowledge. Therefore, further studies can be conducted to mine consumer habits and market characteristics to support more reasonable decisions for customized product development, market pricing, and maintenance based on the association, prediction, and time series analysis functions. The fast upgrading of electronic products results in a large amount of e-waste, and using DM and Big Data to improve the efficiency and effectiveness of energy saving, recycling, and reverse logistics and to reduce environmental risks is a worthwhile attempt. More importantly, a macro strategy for integrated mining and integrated applications across the whole lifecycle should be considered and developed by enterprises.
(3) DM functions and DMTs. Prediction, classification, and clustering are the most frequently used DM functions, while the other three functions (outlier detection, association, and time series analysis) have been used in only a few situations. Extended investigation of outlier detection, sequential pattern mining, and time series analysis that considers time information for online model development and updating could enable companies to respond promptly to dynamic and emerging situations. For DMTs, the parameter optimization of DMTs such as ANN and SVM requires continuous study. While FCM and fuzzy logic have been combined with ANN to handle uncertainty, they might also be combined with other related mechanisms such as SVM and regression. Additionally, these approaches should handle Big Data with easy implementation and high performance, and more deliberate consideration of industrial applications is required.
(4) Algorithm performance. In general, it is difficult for a single existing algorithm to obtain results with an obvious competitive advantage. A hybrid mining algorithm generally needs to be constructed based on the characteristics of the problem by integrating different functions and different DMTs so as to ensure the validity and advantage of the algorithm. How to set and optimize algorithm parameters, such as the parameters of ANN and SVM, also remains to be further studied. Meanwhile, how to evaluate the advantages and disadvantages of the developed algorithm dynamically and ensure its robustness under a certain degree of data loss and redundancy needs further study. How to evaluate the under-fitting and overfitting of algorithms and balance the two has received less attention and requires further consideration.
(5) Software and implementation. Many researchers employed special purpose tools such as NeuroSolutions, Neural Network Toolbox, LibSVM, Fuzzy Logic Toolbox, and SOM toolbox to implement the developed algorithms. Meanwhile, many approaches were developed in Matlab. Dedicated software packages and the basic engines integrated in Matlab allowed researchers to implement the proposed algorithms and verify the results more easily. FDC was always conducted on independently developed platforms for online analysis because of its high-efficiency requirements for data preprocessing and algorithm execution. However, application-oriented software platforms such as Orange, IBM SPSS Modeler, WEKA, and RapidMiner were employed by only a few researchers in the reviewed articles. In order to strengthen the connection between enterprises and research, one important direction is to develop the application platform directly and then validate and optimize the results through practical feedback. In addition, DM technology should be combined with data management and visualization tools that help users understand, operate on, and utilize data efficiently.
(6) Knowledge maintenance and updating. Most of the mining was conducted statically, and the corresponding data handling was based on batch data. These approaches could hardly learn by themselves, and the patterns obtained were often difficult to update dynamically based on newly accumulated data. Nowadays, data is generated continuously and typically arrives as small records sent simultaneously. This data needs to be processed sequentially and incrementally, on a record-by-record basis or over sliding time windows, and used for a wide variety of analytics and mining. Online mining and learning will be an important challenge for further research.
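The two processing modes named in item (6), record-by-record updating and sliding windows, can be sketched as follows. This is a minimal illustration, not a method from the reviewed articles: the incremental statistics use Welford's algorithm, the window uses a fixed record count instead of a time span, and the stream values are hypothetical.

```python
from collections import deque


class RunningStats:
    """Mean and sample variance updated one record at a time (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def sliding_means(stream, window):
    """Yield the mean of the most recent `window` records after each new record."""
    buf = deque(maxlen=window)  # the oldest record is dropped automatically
    for x in stream:
        buf.append(x)
        yield sum(buf) / len(buf)


# Hypothetical sensor stream processed without storing the full history.
stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)

recent = list(sliding_means([1, 2, 3, 4], window=2))
print(stats.mean, stats.variance, recent)
```

Neither structure needs to revisit old records, so a model that depends on such statistics can be refreshed as new data arrives instead of being rebuilt from a batch.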

Conflicts of Interest:
The authors declare that they have no conflict of interest.