Multivariate Control Chart Based on Kernel PCA for Monitoring Mixed Variable and Attribute Quality Characteristics

The need for a control chart that can visualize and recognize the symmetric or asymmetric pattern of a monitored process with more than one type of quality characteristic is a necessity in the era of Industry 4.0. In the past, control charts were developed to monitor only one kind of quality characteristic, and several mixed control charts were later created to deal with this limitation. However, the conventional mixed charts have some problems and drawbacks. In this study, another approach is used to monitor mixed quality characteristics by applying the Kernel Principal Component Analysis (KPCA) method. Using Hotelling's $T^2$ statistic, the kernel PCA mix chart is proposed to simultaneously monitor variable and attribute quality characteristics. Because kernel density estimation (KDE) is able to estimate the asymmetric pattern of the mixed process, the KDE used in the proposed chart successfully estimates control limits that produce an ARL$_0$ of about 370 for $\alpha = 0.00273$. Through several experiments based on the proportion of the attribute characteristics and the kernel functions, the proposed chart demonstrates better performance in detecting outliers and shifts in the process. When it is applied to monitor synthetic data, the proposed chart detects the shift accurately. Additionally, the proposed chart outperforms the conventional mixed chart based on PCA mix by producing fewer false alarms with more accurate detection of out-of-control processes.


Introduction
The control chart can visualize the quality characteristics in graphical form and calculate its control limit based on the symmetric or asymmetric distribution of the monitored processes. In statistical process control (SPC), two types of control chart have been developed based on the monitored quality characteristics, namely the variable and attribute charts [1]. The variable control chart is developed to monitor metric quality characteristics (interval or ratio scale) such as length or height. On the other hand, the attribute chart is used to monitor nonmetric quality characteristics (categorical scale). Several works have developed variable and attribute charts, especially to monitor more than one characteristic (multivariable or multiattribute characteristics). The Shewhart, multivariate exponentially weighted moving average (MEWMA), and multivariate cumulative sum (MCUSUM) type charts were developed to accommodate multivariable characteristics [1].
The recent development of the Shewhart type chart includes the $T^2$-PCA chart [2], the robust $T^2$ chart [3,4], the variable parameters (VP)-$T^2$ Hotelling chart [5], and the $T^2$ chart for short-run

$$C = \frac{1}{n}\sum_{j=1}^{n} x_j x_j^T. \quad (1)$$

The new coordinate, the principal component, is calculated based on the eigenvector projection of the input data. PCA works under the assumption that the data have a linear relationship. However, in complex cases such as the chemical industry or biology, the relationship in the data is not always linear. As a consequence, conventional PCA performs poorly in such cases [26].
To overcome the nonlinearity problem, Schölkopf et al. [22] introduced the kernel PCA method. The basic idea of this method is to calculate the PCs in feature space by conducting a nonlinear mapping $\Phi : \mathbb{R}^p \to F$, $x \mapsto X$ (see Figure 1). This can be done by involving kernel functions known from SVM [27]. In other words, PCA can be executed in feature space $F$ by employing the kernel function.
Assume that the centered input data are mapped to feature space $F$ as $\Phi(x_1), \ldots, \Phi(x_n)$. The covariance matrix in feature space can be written as:

$$C^F = \frac{1}{n}\sum_{j=1}^{n} \Phi(x_j)\Phi(x_j)^T. \quad (2)$$

The next step is finding the eigenvalues $\lambda \geq 0$ and eigenvectors $V \in F \setminus \{0\}$ that satisfy:

$$\lambda V = C^F V. \quad (3)$$

By substituting $C^F$ from Equation (2) into Equation (3), it can be found that:

$$C^F V = \frac{1}{n}\sum_{j=1}^{n} \langle \Phi(x_j), V \rangle \, \Phi(x_j), \quad (4)$$

where $\langle \Phi(x_j), V \rangle$ is the dot product between $\Phi(x_j)$ and $V$. As a consequence, all solutions $V$ with $\lambda > 0$ lie in the span of $\Phi(x_1), \ldots, \Phi(x_n)$. Therefore, $\lambda V = C^F V$ is equivalent to:

$$\lambda \langle \Phi(x_k), V \rangle = \langle \Phi(x_k), C^F V \rangle, \quad k = 1, \ldots, n, \quad (5)$$

and there are coefficients $\alpha_1, \ldots, \alpha_n$ such that:

$$V = \sum_{i=1}^{n} \alpha_i \Phi(x_i). \quad (6)$$

By combining Equations (5) and (6), we found that:

$$\lambda \sum_{i=1}^{n} \alpha_i \langle \Phi(x_k), \Phi(x_i) \rangle = \frac{1}{n}\sum_{i=1}^{n} \alpha_i \left\langle \Phi(x_k), \sum_{j=1}^{n} \Phi(x_j) \langle \Phi(x_j), \Phi(x_i) \rangle \right\rangle. \quad (7)$$

In general, the mapping $\Phi(\cdot)$ cannot always be calculated explicitly. To solve this problem, we only need to calculate the dot product of two vectors in feature space. Let the $n \times n$ matrix $K$ be defined as $K_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle$. By replacing the left-hand side of Equation (7) with matrix $K$ we found:

$$\lambda \sum_{i=1}^{n} \alpha_i K_{ki}, \quad (8)$$

and the right-hand side of Equation (7) becomes:

$$\frac{1}{n}\sum_{i=1}^{n} \alpha_i \sum_{j=1}^{n} K_{kj} K_{ji}. \quad (9)$$

By combining Equations (8) and (9), we found the following expression:

$$\lambda \sum_{i=1}^{n} \alpha_i K_{ki} = \frac{1}{n}\sum_{i=1}^{n} \alpha_i \sum_{j=1}^{n} K_{kj} K_{ji}. \quad (10)$$

If we write Equation (10) in matrix form with $\alpha = (\alpha_1, \ldots, \alpha_n)^T$, we found:

$$n\lambda K \alpha = K^2 \alpha. \quad (11)$$

The solution of Equation (11) can be found by solving the eigenvalue problem:

$$n\lambda \alpha = K \alpha \quad (12)$$

for non-zero eigenvalues. In other words, conducting PCA in feature space is equivalent to solving the eigenvalue problem in Equation (12).
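The claim that only dot products in feature space are needed can be checked numerically. The sketch below is illustrative only: the explicit feature map `phi` and the homogeneous degree-2 polynomial kernel are assumptions chosen because their feature space is small enough to write down, and they are not the kernels tuned in this paper.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the homogeneous degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly2_kernel(x, y):
    """k(x, y) = <x, y>^2, evaluated without mapping to feature space."""
    return float(np.dot(x, y)) ** 2

def gram_matrix(X, kernel):
    """Build the n x n matrix K with K[i, j] = kernel(x_i, x_j)."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))

K_kernel = gram_matrix(X, poly2_kernel)      # kernel evaluations only
Phi = np.array([phi(x) for x in X])          # explicit mapping
K_explicit = Phi @ Phi.T                     # dot products in feature space

max_gap = float(np.max(np.abs(K_kernel - K_explicit)))
print("largest difference between the two Gram matrices:", max_gap)
```

Both routes give the same matrix $K$ up to rounding, which is why the eigenvalue problem can be solved from kernel evaluations alone.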
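After solving the eigenvalue problem, the eigenvectors $\alpha^1, \alpha^2, \ldots, \alpha^n$ and eigenvalues $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n$ can be determined. The dimension reduction is conducted by taking the first $l$ eigenvectors. Further, $\alpha^1, \alpha^2, \ldots, \alpha^l$ are normalized so that the corresponding $V_v$ satisfy $\langle V_v, V_v \rangle = 1$ for all $v = 1, 2, \ldots, l$. From Equation (6), we found that:

$$\langle V_v, V_v \rangle = \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i^v \alpha_j^v \langle \Phi(x_i), \Phi(x_j) \rangle = \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i^v \alpha_j^v K_{ij}. \quad (13)$$

Thus, $\langle V_v, V_v \rangle = 1$ can be written as:

$$\langle \alpha^v, K\alpha^v \rangle = 1. \quad (14)$$

The principal component score $t$ is calculated by projecting $\Phi(x_k)$ onto the eigenvector $V_v$, where $v = 1, 2, \ldots, l$, as follows:

$$t_{k,v} = \langle V_v, \Phi(x_k) \rangle = \sum_{i=1}^{n} \alpha_i^v \langle \Phi(x_i), \Phi(x_k) \rangle. \quad (15)$$

To solve the eigenvalue problem in Equation (12) and to calculate the principal components, the nonlinear mapping does not need to be carried out explicitly. Instead, a kernel function $K(x, y) = \langle \Phi(x), \Phi(y) \rangle$ can be used. In this work, the kernel matrix is centered before it is applied in KPCA using the following expression:

$$\tilde{K} = K - 1_n K - K 1_n + 1_n K 1_n, \quad (16)$$

where $1_n$ is the $n \times n$ matrix whose entries are all equal to $1/n$.

Symmetry 2020, 12, 1838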
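The steps of this subsection (kernel centering, eigendecomposition, normalization, and projection) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the kernlab implementation used in the paper: the function names `center_kernel` and `kernel_pca`, the linear kernel, and the random data are all hypothetical choices made for the example.

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space: K - 1K - K1 + 1K1, with 1 = ones/n."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

def kernel_pca(K, n_components):
    """Return scores, leading eigenvalues of the centered kernel, and coefficients."""
    Kc = center_kernel(K)
    eigvals, eigvecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    mu = eigvals[order]                              # leading eigenvalues of Kc
    # Rescale unit eigenvectors so that <alpha, Kc alpha> = 1 (unit-norm V in feature space).
    alpha = eigvecs[:, order] / np.sqrt(mu)
    scores = Kc @ alpha                              # projections of the training points
    return scores, mu, alpha

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
K = X @ X.T                                          # linear kernel, for illustration only

scores, mu, alpha = kernel_pca(K, n_components=2)
feature_space_norms = np.einsum("iv,ij,jv->v", alpha, center_kernel(K), alpha)
print("squared norms of V_v:", feature_space_norms)
```

Note that `mu` holds eigenvalues of the centered kernel matrix itself; how these relate to $\lambda$ depends on the scaling convention of the eigenvalue problem.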

Kernel PCA Mix Control Chart Procedures
After explaining the KPCA algorithm in the previous subsection, the KPCA mix chart procedure is presented in this subsection. The main idea of the KPCA mix chart is to construct the matrix $Z$ representing the metric and nonmetric variables. There are two steps in the KPCA mix chart procedure. First, the $T^2$ statistics are calculated from matrix $Z$. Then, the control limit of the proposed chart is calculated using the KDE approach. These procedures are illustrated by the flowchart in Figure 2. The procedures are detailed as follows:
a. $Z_1$, of size $n \times p$, is the centered version of matrix $X_1$, which contains the metric data.
b. $Z_2$, of size $n \times m$, is the centered version of matrix $G$, which contains the binary coding of every level of the nonmetric data $X_2$. For example, suppose $X_2$ has three categories, "no defect", "minor defect", and "major defect", represented as 1, 2, and 3, respectively. The dummy variable for "no defect" (1) is 1 0 0, the dummy variable for "minor defect" (2) is 0 1 0, and the dummy variable for "major defect" (3) is 0 0 1.
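Items a and b can be sketched in a few lines. The function name `build_mixed_matrix` and the toy data are hypothetical, and the sketch covers only the centering and dummy coding described in a and b; any weighting or scaling applied in steps not shown here is deliberately left out.

```python
import numpy as np

def build_mixed_matrix(X1, X2_labels, levels):
    """Stack centered metric data Z1 and centered dummy-coded nonmetric data Z2."""
    X1 = np.asarray(X1, dtype=float)
    labels = np.asarray(X2_labels)
    # G: one column per level, 1 where the observation has that level, else 0.
    G = (labels[:, None] == np.asarray(levels)[None, :]).astype(float)
    Z1 = X1 - X1.mean(axis=0)      # item a: centered metric data
    Z2 = G - G.mean(axis=0)        # item b: centered indicator matrix
    return np.hstack([Z1, Z2]), G

X1 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
X2 = np.array([1, 3, 2])           # 1 = no defect, 2 = minor defect, 3 = major defect

Z, G = build_mixed_matrix(X1, X2, levels=[1, 2, 3])
print(G)   # rows are the dummy codes 1 0 0, 0 0 1, 0 1 0
print(Z)
```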
6. From the first $l$ principal components $t$, calculate the $T^2$ statistic using the following equation:

$$T_k^2 = \sum_{v=1}^{l} \frac{t_{k,v}^2}{\lambda_v}, \quad (19)$$

where $v = 1, 2, \ldots, l$ and $\lambda_v$ is the eigenvalue that corresponds to the $v$-th PC.

KDE control limit calculation
1. Estimate the empirical density of the $T_k^2$ statistics using the following equation:

$$\hat{f}_h(T^2) = \frac{1}{nh}\sum_{k=1}^{n} K\!\left(\frac{T^2 - T_k^2}{h}\right), \quad (20)$$

where $K$ is the kernel smoothing function and $h$ is the bandwidth.
2. Calculate the cumulative distribution of $T_k^2$ using the trapezoid rule as follows:

$$\hat{F}(t) = \int_{\pi_{\min}}^{t} \hat{f}_h(T^2)\, dT^2, \quad \pi_{\min} \leq t \leq \pi_{\max}, \quad (21)$$

where $\pi_{\min}$ and $\pi_{\max}$ are the minimum and maximum values of $T_k^2$.
3. Calculate the KDE control limit using the following expression:

$$CL = \hat{F}^{-1}(1 - \alpha). \quad (22)$$

In this paper, R statistical software was used to create the proposed KPCA mix chart and conduct the simulation studies. The Kernel-Based Machine Learning Lab (kernlab) package was used to perform the KPCA algorithm.
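The $T^2$ statistic and the three KDE steps can be sketched as below. Several choices here are assumptions made for the illustration and are not taken from the paper: a Gaussian smoothing kernel, a normal-reference rule-of-thumb bandwidth, an evenly spaced grid, renormalization of the truncated density, and chi-square draws standing in for in-control $T^2$ values.

```python
import numpy as np

def hotelling_t2(scores, eigenvalues):
    """T^2 per observation: sum over components of score^2 / eigenvalue."""
    return np.sum(scores ** 2 / eigenvalues, axis=1)

def kde_control_limit(t2, alpha=0.00273, grid_size=2048):
    """Control limit as the (1 - alpha) quantile of a Gaussian KDE of t2."""
    t2 = np.asarray(t2, dtype=float)
    n = t2.size
    h = 1.06 * t2.std(ddof=1) * n ** (-1.0 / 5.0)   # bandwidth: assumed rule of thumb
    grid = np.linspace(t2.min(), t2.max(), grid_size)
    # Step 1: kernel density estimate on the grid.
    u = (grid[:, None] - t2[None, :]) / h
    density = np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))
    # Step 2: cumulative distribution by the trapezoid rule.
    steps = 0.5 * (density[1:] + density[:-1]) * np.diff(grid)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]                                  # renormalize on [min, max]
    # Step 3: invert the cumulative distribution at 1 - alpha.
    return float(np.interp(1.0 - alpha, cdf, grid))

rng = np.random.default_rng(2)
t2_in_control = rng.chisquare(df=2, size=2000)      # stand-in for in-control T^2 values
limit = kde_control_limit(t2_in_control)
exceed_fraction = float(np.mean(t2_in_control > limit))
print("control limit:", limit, "fraction above limit:", exceed_fraction)
```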

KDE Control Limit
In this section, the KDE control limit of the $T_k^2$ statistics is presented for various kernel functions. Three types of kernel functions are used in this paper, namely the linear, polynomial, and RBF kernels. The continuous or metric quality characteristic $X_1$ is generated from the multivariate normal distribution. In this research, the number of metric quality characteristics $p$ is 5. Meanwhile, the nonmetric or categorical quality characteristics are generated from a multinomial distribution $X_2 \sim M(n, \theta_1, \theta_2, \theta_3)$ with three types of parameters as follows:
a. $\theta_1 = \theta_2 = 0.05$ and $\theta_3 = 0.9$ (extreme imbalanced case).
Table 1 reports the KDE control limit for the linear kernel when the number of continuous characteristics $p$ is 5 and the number of PCs is $l$ = 2, 3, and 5. From the table, it can be seen that the KDE control limit produces a stable ARL$_0$ of about 370 for $\alpha = 0.00273$. Additionally, the larger the number of PCs $l$ used, the larger the KDE control limit produced.
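The data-generating setup and the target in-control ARL can be illustrated with a short sketch. It assumes that each observation carries one categorical draw with probabilities $(\theta_1, \theta_2, \theta_3)$, which is one reading of the multinomial notation above; the function name `generate_mixed` and the sample size are hypothetical. The target value follows from ARL$_0 = 1/\alpha$ for independent in-control observations.

```python
import numpy as np

def generate_mixed(n, p, theta, rng):
    """Draw n observations: p-variate standard normal plus one 3-level category."""
    X1 = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
    one_hot = rng.multinomial(1, theta, size=n)     # one categorical draw per row
    X2 = one_hot.argmax(axis=1) + 1                 # levels coded 1, 2, 3
    return X1, X2

rng = np.random.default_rng(3)
theta_extreme = [0.05, 0.05, 0.9]                   # extreme imbalanced case
X1, X2 = generate_mixed(n=1000, p=5, theta=theta_extreme, rng=rng)

alpha = 0.00273
arl0_target = 1.0 / alpha                           # about 366, reported as roughly 370
print("share of level 3:", float(np.mean(X2 == 3)))
print("theoretical ARL0:", arl0_target)
```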

Polynomial Kernel
The KDE control limits of the polynomial kernel for various cases are reported in Tables 2-4. According to the results, the larger the degree $d$ used, the larger the ARL$_0$ produced. In this case, the ARL$_0$ closest to the theoretical value is achieved when the parameter of the polynomial kernel is 1 ($d = 1$). Moreover, as with the linear kernel, the KDE control limit is larger when more principal components are used.

RBF Kernel
Tables 5-7 present the KDE control limit of the proposed chart for $p = 5$ and various proportions of nonmetric data. From the tables, it can be seen that the smaller the hyperparameter $\sigma^*$ used, the closer the ARL$_0$ is to the theoretical value (370 in this case). In general, the ARL$_0$ is close to the theoretical value when $\sigma^* = 0.001$. Thus, for the same cases in this work, the hyperparameter $\sigma^*$ is set to 0.001.
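For reference, the three kernel families compared above can be written as small functions. The parameterizations below follow the conventions of kernlab's `vanilladot`, `polydot`, and `rbfdot` kernels, which is an assumption about how the paper's kernels are defined; the `scale` and `offset` defaults are likewise assumed, while `degree=1` and `sigma=0.001` mirror the values selected in the text.

```python
import numpy as np

def linear_kernel(X, Y):
    """k(x, y) = <x, y>."""
    return X @ Y.T

def polynomial_kernel(X, Y, degree=1, scale=1.0, offset=1.0):
    """k(x, y) = (scale * <x, y> + offset)^degree."""
    return (scale * (X @ Y.T) + offset) ** degree

def rbf_kernel(X, Y, sigma=0.001):
    """k(x, y) = exp(-sigma * ||x - y||^2)."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * (X @ Y.T)
    return np.exp(-sigma * np.maximum(sq_dist, 0.0))   # clip tiny negative rounding errors

rng = np.random.default_rng(4)
Z = rng.normal(size=(5, 3))

K_lin = linear_kernel(Z, Z)
K_poly = polynomial_kernel(Z, Z, degree=1)
K_rbf = rbf_kernel(Z, Z, sigma=0.001)
print(np.round(K_rbf, 4))
```

With a small `sigma`, the RBF kernel matrix is close to a matrix of ones, so the structure used by KPCA comes from small deviations between entries.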

Performance of the Proposed Chart
In this paper, the performance of the proposed chart in detecting outliers and in detecting a shift in the process is evaluated for several scenarios. Similar to the previous section, the variable quality characteristics are generated from a multivariate normal distribution and the attribute quality characteristics are generated from a multinomial distribution.

Simulation Setup
In this part, the performance of the proposed chart in detecting the presence of outliers is presented. Using the same algorithm as in Ahsan et al. [20], the simulation study was repeated 1000 times to calculate the hit rate, the FN (false negative) rate, and the FP (false positive) rate. The metric data $X_1$ are generated from the multivariate normal distribution $X_1 \sim N_p(0, I)$. Meanwhile, the nonmetric data are generated from the multinomial distribution $X_2 \sim M(n, \theta_1, \theta_2, \theta_3)$. The percentage of outliers $\varepsilon$ added to the clean or in-control data is set to 5%, 10%, 20%, 30%, 40%, and 50% of the total observations. Furthermore, Table 8 shows the scenarios used to assess the proposed chart performance.
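The three performance measures can be computed from the true outlier labels and the chart signals. The sketch below assumes the usual confusion-matrix definitions (hit rate as overall accuracy, FP rate relative to in-control observations, FN rate relative to true outliers); the exact definitions in Ahsan et al. [20] may differ, and the function name `detection_rates` and the toy labels are hypothetical.

```python
import numpy as np

def detection_rates(is_outlier, is_flagged):
    """Return (hit rate, FP rate, FN rate) for boolean truth and signal vectors."""
    truth = np.asarray(is_outlier, dtype=bool)
    flag = np.asarray(is_flagged, dtype=bool)
    tp = int(np.sum(truth & flag))      # outliers that signalled
    tn = int(np.sum(~truth & ~flag))    # in-control points that did not signal
    fp = int(np.sum(~truth & flag))     # false alarms
    fn = int(np.sum(truth & ~flag))     # missed outliers
    hit_rate = (tp + tn) / truth.size
    fp_rate = fp / (fp + tn) if (fp + tn) > 0 else 0.0
    fn_rate = fn / (fn + tp) if (fn + tp) > 0 else 0.0
    return hit_rate, fp_rate, fn_rate

# Toy example: 10 observations, the first two are true outliers.
truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
flags = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
hit_rate, fp_rate, fn_rate = detection_rates(truth, flags)
print(hit_rate, fp_rate, fn_rate)
```

In a simulation study these rates would be averaged over the replications for each outlier percentage.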

The performance of the proposed chart with the linear kernel in detecting outliers is presented in Figure 3 (see Appendix A Table A1 for the detailed results). According to the results, an increase in the proportion of outliers added to the clean data causes a decrease in performance, which can be seen from the decrease in the hit rate. Moreover, for this case, the use of the linear kernel in the kernel PCA mix chart is still reasonable for 30% outliers added to the clean data, which can be seen from the high hit rate produced (around 0.85-0.9). The performance of the proposed chart with the polynomial kernel in detecting outliers is presented in Figure 4 (see Appendix A Table A2 for detailed results). In this case, the parameter of the polynomial kernel is 1 (based on the result from the previous section). Similar to the previous results, the larger the proportion of outliers added to the clean data, the smaller the hit rate. According to its hit rate, the polynomial kernel still performs well for 30% outliers added to the in-control data. Similar results also occur for the RBF kernel (see Figure 5 and Appendix A Table A3). Using the hyperparameter $\sigma^* = 0.001$, the performance of the proposed chart is still good when less than 40% outliers are added. When more than 40% outliers are added to the in-control data, misdetection occurs because of the large number of false alarms produced: the proposed chart declares actual in-control observations to be outliers. Thus, to improve the performance of the proposed chart in detecting outliers, a new method is needed to overcome this issue.

Detecting Shift in the Process
The performance of the proposed chart is evaluated to detect a shift in the process using the average run length (ARL) criterion. This chart is also evaluated using several scenarios based on the proportion of the nonmetric parameter and kernel function. Moreover, the control limits used in this simulation are taken from the previous section.
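The ARL criterion can be illustrated with a small Monte Carlo sketch. To keep it self-contained, a plain sum-of-squares statistic on standard normal data stands in for the KPCA mix $T^2$ statistic, and the limit is an empirical quantile rather than a KDE limit; both are simplifying assumptions, as are the function names, the shift size, and the number of runs.

```python
import numpy as np

def run_length(shift, limit, p, rng, max_len=100_000):
    """Count observations until the monitoring statistic first exceeds the limit."""
    mean = np.zeros(p)
    mean[0] = shift                       # shift applied to the first variable only
    for t in range(1, max_len + 1):
        x = rng.normal(size=p) + mean
        if x @ x > limit:                 # stand-in statistic: squared distance from 0
            return t
    return max_len

def estimate_arl(shift, limit, p=5, n_runs=200, seed=0):
    """Average run length over independent simulated runs."""
    rng = np.random.default_rng(seed)
    return float(np.mean([run_length(shift, limit, p, rng) for _ in range(n_runs)]))

# Empirical in-control limit at alpha = 0.00273 from a large reference sample.
rng = np.random.default_rng(42)
reference = (rng.normal(size=(200_000, 5)) ** 2).sum(axis=1)
limit = float(np.quantile(reference, 1.0 - 0.00273))

arl0 = estimate_arl(shift=0.0, limit=limit)   # in-control ARL, expected near 1/alpha
arl1 = estimate_arl(shift=2.0, limit=limit)   # out-of-control ARL, expected much smaller
print("ARL0:", arl0, "ARL1:", arl1)
```

A chart that detects shifts well shows an ARL$_1$ that drops quickly as the shift grows while ARL$_0$ stays near its target.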

Extreme Imbalanced
In this subsection, the performance of the proposed chart is evaluated when the variable characteristics are generated from the multivariate normal distribution $N_p(0, I)$ and the attribute characteristics are generated from a multinomial distribution with extreme imbalanced parameters ($\theta_1 = \theta_2 = 0.05$ and $\theta_3 = 0.9$). For $p = 5$ and $l = 2$, the evaluation results for the various kernel functions are visualized in Figure 6a. From the results, it can be seen that ARL$_0$ for all cases is around 370. Additionally, the proposed chart can detect the shift in the process, which can be seen from the smaller ARL$_1$ value for the larger shift. According to the figure, the ARL values for the RBF and linear kernels are not significantly different. Furthermore, the polynomial kernel does not perform as well as these two kernel functions in this case.
Figure 6b depicts the performance evaluation results of the kernel PCA mix control chart for $p = 5$ and $l = 3$ with various kernel functions and the extreme imbalanced proportion of categorical quality characteristics. It can be seen from the figure that the proposed chart can detect a shift in the process, as shown by the smaller ARL$_1$ for the larger shift. For the smaller shift, the polynomial kernel performs better than the RBF kernel, as shown by its smaller ARL$_1$. On the other hand, for the larger shift, the RBF kernel outperforms the polynomial kernel. For this case, the linear kernel does not perform well compared to the other kernel functions.
Figure 6c presents the ARLs of the proposed chart for $\theta_1 = \theta_2 = 0.05$ and $\theta_3 = 0.9$ with $p = 5$ and $l = 4$. In general, for all kernel functions used, the proposed chart can detect the shift in the process, which can be seen from the smaller ARL$_1$ value for the larger shift. In this case, all kernel functions show similar performance. However, for the small shift, the linear kernel performs slightly better than the other kernels. The detailed results for this case can be found in Appendix A Tables A4-A6.

Imbalanced
In this part, the performance of the proposed chart for the imbalanced parameter of the nonmetric characteristics with various kernel functions is presented. Figure 7a shows the ARLs of the proposed chart for $\theta_1 = \theta_2 = 0.1$, $\theta_3 = 0.8$, $p = 5$, and $l = 2$ with the various kernel functions used. According to the figure, the proposed chart is able to detect a shift in the process, as indicated by the smaller ARL$_1$ when the process shift gets larger. For this case, the best performance is produced by the linear and polynomial kernels. On the other hand, the RBF kernel does not perform well for this scenario.
The performance of the proposed chart in detecting the shift in the process for $\theta_1 = \theta_2 = 0.1$, $\theta_3 = 0.8$, $p = 5$, and $l = 3$ is presented in Figure 7b. In general, the proposed chart can detect the shift for all kernel functions used. According to the figure, the linear and polynomial kernels have similar performance. For this case, these two kernel functions outperform the RBF kernel. Furthermore, Figure 7c reports the performance of the proposed chart with various kernel functions for $\theta_1 = \theta_2 = 0.1$, $\theta_3 = 0.8$, $p = 5$, and $l = 3$. According to the figure, the best performance for this case is given by the linear and polynomial kernels. The detailed results for this case can be found in Appendix A Tables A7-A9.

Balanced
In this subsection, the performance of the proposed chart in detecting a shift in the process for the balanced proportion of the nonmetric data is presented. Several scenarios based on the kernel function and the number of PCs $l$ are used to assess the performance of the proposed chart. For the balanced nonmetric data with $p = 5$ and $l = 2$, the proposed chart can detect the shift in the process according to its ARL$_1$ values for all kernel functions (see Figure 8a). For this case, the linear and polynomial kernels outperform the RBF kernel, and the best performance is given by the polynomial kernel.
Figure 8b shows the ARLs of the proposed chart for a balanced proportion of the nonmetric parameter with $p = 5$ and $l = 3$. For this case, all kernel functions demonstrate good performance. For a small shift, the polynomial kernel performs well, as shown by the smaller ARL$_1$ produced. On the other hand, the RBF kernel outperforms the other kernel functions for the large shift. Similar performance is also observed for $p = 5$ and $l = 3$, which can be seen in Figure 8c. According to the figure, the polynomial kernel performs slightly better for the small shift than the other kernel functions. The detailed results for this case can be found in Appendix A Tables A10-A12.

Summary and Discussion
In this section, a summary of the simulation studies and a discussion of the performance of the proposed KPCA mix chart are presented. The simulation studies were conducted to evaluate the performance of the proposed chart in detecting outliers and process shifts. In detecting outliers, the KPCA mix chart still performs well for 30% outliers added to the clean data. In general, for more than 30% outliers added, the misdetection is mainly caused by the high FP rate (see Appendix A Tables A1-A3). Table 9 summarizes the proposed KPCA mix chart performance in detecting process shifts for all scenarios. The sign • symbolizes the better performance for a small shift, while the sign ⁂ represents the better performance for a large shift. Based on the results, the polynomial kernel demonstrates good performance in the balanced and imbalanced cases for both small and large shifts in the process. On the other hand, for the extreme imbalanced parameter of the nonmetric data, the RBF and linear kernels show better performance when used to monitor a small shift.

• can be found from the smaller ARL1 produced. On the other hand, the RBF kernel outperforms the other kernel functions for the large shift. This similar performance also happens for p = 5 and l = 3, which can be seen in Figure 8c. According to the figure, the kernel polynomial has a slightly better performance for the small shift compared to the other kernel functions. The detailed results for this case can be found in Appendix A Tables A10-A12.

Summary and Discussion
In this section, the summary of simulation studies and discussion about the performance of the proposed KPCA mix chart are presented. The simulation studies were conducted to evaluate the performance of the proposed chart in detecting outlier and process shift. In detecting outliers, it can be found that the KPCA mix chart still has better performance for 30% outlier added to the clean data. In general, for more than 30% outlier added, the misdetection is mainly caused by the high FP rate value (see Appendix A Tables A1-A3). Table 9 summarizes the proposed KPCA mix chart performance in detecting process shift for all scenarios. The sign • symbolizes the better performance for a small shift while the sign ⁂ represents the better performance for the large shift. Based on the results, the polynomial kernel demonstrates good performance in the balanced and imbalanced cases for both small and large shifts in the process. On the other hand, for the extreme imbalanced parameter of the nonmetric data, the RBF and linear kernels show a better performance when it is used to monitor a small shift.
• Imbalanced 2 can be seen in Figure 8b. For a small shift, the polynomial kernel shows a great performance which can be found from the smaller ARL1 produced. On the other hand, the RBF kernel outperforms the other kernel functions for the large shift. This similar performance also happens for p = 5 and l = 3, which can be seen in Figure 8c. According to the figure, the kernel polynomial has a slightly better performance for the small shift compared to the other kernel functions. The detailed results for this case can be found in Appendix A Tables A10-A12.

Summary and Discussion
In this section, the summary of simulation studies and discussion about the performance of the proposed KPCA mix chart are presented. The simulation studies were conducted to evaluate the performance of the proposed chart in detecting outlier and process shift. In detecting outliers, it can be found that the KPCA mix chart still has better performance for 30% outlier added to the clean data. In general, for more than 30% outlier added, the misdetection is mainly caused by the high FP rate value (see Appendix A Tables A1-A3). Table 9 summarizes the proposed KPCA mix chart performance in detecting process shift for all scenarios. The sign • symbolizes the better performance for a small shift while the sign ⁂ represents the better performance for the large shift. Based on the results, the polynomial kernel demonstrates good performance in the balanced and imbalanced cases for both small and large shifts in the process. On the other hand, for the extreme imbalanced parameter of the nonmetric data, the RBF and linear kernels show a better performance when it is used to monitor a small shift.
• 3 parameter with p = 5 and l = 3. For this case, all kernel functions demonstrate good performance as can be seen in Figure 8b. For a small shift, the polynomial kernel shows a great performance which can be found from the smaller ARL1 produced. On the other hand, the RBF kernel outperforms the other kernel functions for the large shift. This similar performance also happens for p = 5 and l = 3, which can be seen in Figure 8c. According to the figure, the kernel polynomial has a slightly better performance for the small shift compared to the other kernel functions. The detailed results for this case can be found in Appendix A Tables A10-A12.

Summary and Discussion
In this section, the summary of simulation studies and discussion about the performance of the proposed KPCA mix chart are presented. The simulation studies were conducted to evaluate the performance of the proposed chart in detecting outlier and process shift. In detecting outliers, it can be found that the KPCA mix chart still has better performance for 30% outlier added to the clean data. In general, for more than 30% outlier added, the misdetection is mainly caused by the high FP rate value (see Appendix A Tables A1-A3). Table 9 summarizes the proposed KPCA mix chart performance in detecting process shift for all scenarios. The sign • symbolizes the better performance for a small shift while the sign ⁂ represents the better performance for the large shift. Based on the results, the polynomial kernel demonstrates good performance in the balanced and imbalanced cases for both small and large shifts in the process. On the other hand, for the extreme imbalanced parameter of the nonmetric data, the RBF and linear kernels show a better performance when it is used to monitor a small shift.
• parameter with p = 5 and l = 3. For this case, all kernel functions demonstrate good performance as can be seen in Figure 8b. For a small shift, the polynomial kernel shows a great performance which can be found from the smaller ARL1 produced. On the other hand, the RBF kernel outperforms the other kernel functions for the large shift. This similar performance also happens for p = 5 and l = 3, which can be seen in Figure 8c. According to the figure, the kernel polynomial has a slightly better performance for the small shift compared to the other kernel functions. The detailed results for this case can be found in Appendix A Tables A10-A12.

Summary and Discussion
In this section, the summary of simulation studies and discussion about the performance of the proposed KPCA mix chart are presented. The simulation studies were conducted to evaluate the performance of the proposed chart in detecting outlier and process shift. In detecting outliers, it can be found that the KPCA mix chart still has better performance for 30% outlier added to the clean data. In general, for more than 30% outlier added, the misdetection is mainly caused by the high FP rate value (see Appendix A Tables A1-A3). Table 9 summarizes the proposed KPCA mix chart performance in detecting process shift for all scenarios. The sign • symbolizes the better performance for a small shift while the sign ⁂ represents the better performance for the large shift. Based on the results, the polynomial kernel demonstrates good performance in the balanced and imbalanced cases for both small and large shifts in the process. On the other hand, for the extreme imbalanced parameter of the nonmetric data, the RBF and linear kernels show a better performance when it is used to monitor a small shift.  Figure 8b shows the ARLs of the proposed chart for a balanced proportion of the nonmetric parameter with p = 5 and l = 3. For this case, all kernel functions demonstrate good performance as can be seen in Figure 8b. For a small shift, the polynomial kernel shows a great performance which can be found from the smaller ARL1 produced. On the other hand, the RBF kernel outperforms the other kernel functions for the large shift. This similar performance also happens for p = 5 and l = 3, which can be seen in Figure 8c. According to the figure, the kernel polynomial has a slightly better performance for the small shift compared to the other kernel functions. The detailed results for this case can be found in Appendix A Tables A10-A12.

Summary and Discussion
In this section, the summary of simulation studies and discussion about the performance of the proposed KPCA mix chart are presented. The simulation studies were conducted to evaluate the performance of the proposed chart in detecting outlier and process shift. In detecting outliers, it can be found that the KPCA mix chart still has better performance for 30% outlier added to the clean data. In general, for more than 30% outlier added, the misdetection is mainly caused by the high FP rate value (see Appendix A Tables A1-A3). Table 9 summarizes the proposed KPCA mix chart performance in detecting process shift for all scenarios. The sign • symbolizes the better performance for a small shift while the sign ⁂ represents the better performance for the large shift. Based on the results, the polynomial kernel demonstrates good performance in the balanced and imbalanced cases for both small and large shifts in the process. On the other hand, for the extreme imbalanced parameter of the nonmetric data, the RBF and linear kernels show a better performance when it is used to monitor a small shift.
Figure 8b shows the ARLs of the proposed chart for a balanced proportion of the nonmetric parameter with p = 5 and l = 3. In this case, all kernel functions perform well. For a small shift, the polynomial kernel performs best, as indicated by its smaller ARL1. For a large shift, on the other hand, the RBF kernel outperforms the other kernel functions. A similar pattern is observed for p = 5 and l = 3 in Figure 8c, where the polynomial kernel performs slightly better than the other kernel functions for the small shift. The detailed results for this case can be found in Appendix A, Tables A10-A12.
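As a reading aid for the ARL comparisons, the standard relation between the per-observation signal probability and the average run length can be written out. This is a textbook identity rather than a derivation from this paper, and it assumes the plotted statistics are independent so that the run length is geometric. The symbols q and β are introduced here for illustration only, and α = 0.00273 is the value quoted in the abstract.

```latex
\mathrm{ARL} = \frac{1}{q}, \qquad
\mathrm{ARL}_0 = \frac{1}{\alpha} = \frac{1}{0.00273} \approx 366.3, \qquad
\mathrm{ARL}_1 = \frac{1}{1-\beta}
```

Here q is the probability that a single observation signals, and β is the probability that a shifted observation is not flagged. Under these assumptions, a smaller ARL1 means that a shift is expected to be flagged after fewer observations.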

Summary and Discussion
This section summarizes the simulation studies and discusses the performance of the proposed KPCA mix chart. The simulation studies were conducted to evaluate the performance of the proposed chart in detecting outliers and process shifts. In detecting outliers, the KPCA mix chart still performs well when up to 30% outliers are added to the clean data. In general, when more than 30% outliers are added, the misdetection is mainly caused by the high FP rate (see Appendix A, Tables A1-A3). Table 9 summarizes the performance of the proposed KPCA mix chart in detecting process shifts for all scenarios. The sign • marks the better performance for a small shift, while the sign ⁂ marks the better performance for a large shift. Based on the results, the polynomial kernel performs well in the balanced and imbalanced cases for both small and large shifts in the process. On the other hand, for the extremely imbalanced parameter of the nonmetric data, the RBF and linear kernels perform better when they are used to monitor a small shift.
The simulation studies summarized above reveal some limitations. First, the proposed KPCA mix chart produces more false alarms as the proportion of outliers added in the simulations increases. Second, no single kernel function is superior in all cases. Third, executing the KPCA mix chart requires more computational time because of the complexity of the kernel function. To overcome these problems, new methods for calculating the control limit and a robust estimator are needed to reduce the false alarms when more outliers are added. Additionally, discovering new kernel functions and using the Fast KPCA method may improve the accuracy and the computation speed of the proposed chart.

Applications
In this section, the proposed chart is applied to simulated and real data. First, several data scenarios are generated to assess the ability of the proposed chart to detect a mean shift. Second, the proposed chart is applied to monitor real data, and its monitoring result is compared with that of the PCA mix chart [19].
Table 10 shows the three data scenarios used in this application. The linear, polynomial, and RBF kernels are employed. The first 70 metric observations are generated from the multivariate normal distribution with µ = 0 and Σ = I. The remaining 30 shifted observations are generated from a multivariate normal distribution with µ_shift = 2 and Σ = I. Furthermore, the nonmetric data are generated from the multinomial distribution with the parameters (θ1, θ2, and θ3) given in Table 10.
Table 10. Scenarios of simulated data for proposed chart application.
Figures 9-11 illustrate the application of the proposed chart to the simulated data for the RBF, polynomial, and linear kernels, respectively. For all kernel functions used, the proposed chart correctly detects the shift at the 71st observation. However, when the RBF kernel is used with an imbalanced proportion of nonmetric data (see scenarios 2 and 3), the shift is not as clearly visible as in the balanced case (see Figure 9). On the other hand, the polynomial kernel performs well for the imbalanced and extremely imbalanced cases, as depicted in Figure 10. Furthermore, compared to the polynomial kernel, the linear kernel performs better for the balanced and imbalanced cases, as presented in Figure 11.
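The data-generating scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `simulate_mixed_data`, the dimension p = 5, the seed, and the balanced θ values are assumptions, since the contents of Table 10 are not reproduced here. It also assumes one multinomial trial per observation and leaves the nonmetric data unshifted.

```python
import numpy as np

def simulate_mixed_data(theta, p=5, n_in=70, n_shift=30, shift=2.0, seed=0):
    """Generate metric and nonmetric observations for one hypothetical scenario."""
    rng = np.random.default_rng(seed)
    # In-control metric data: multivariate normal with mean 0 and identity covariance.
    x_in = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n_in)
    # Shifted metric data: every mean component set to `shift`, same covariance.
    x_out = rng.multivariate_normal(np.full(p, shift), np.eye(p), size=n_shift)
    metric = np.vstack([x_in, x_out])
    # Nonmetric data: one categorical draw per observation (a multinomial with
    # a single trial), with category probabilities theta. Assumed unshifted.
    categories = rng.choice(len(theta), size=n_in + n_shift, p=theta)
    return metric, categories

# Hypothetical balanced scenario with three categories.
metric, categories = simulate_mixed_data(theta=[1 / 3, 1 / 3, 1 / 3])
```

Rows 0 to 69 of `metric` are the in-control observations and rows 70 to 99 are the shifted ones, so a chart that works as described should first signal at the 71st observation.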

Real Data
In this subsection, the proposed chart is applied to monitor the machine failure data used by Ahsan et al. [19] in evaluating the mixed chart based on PCA mix. The machine failure dataset has a balanced proportion of the categorical characteristics (see the complete description in [20]). Therefore, the RBF kernel is used in this application. Table 11 compares the performance of the proposed KPCA mix chart and the PCA mix chart in monitoring the machine failure dataset. Based on the monitoring results, the proposed chart detects all out-of-control observations. Moreover, the proposed KPCA mix chart produced fewer false alarms than the PCA mix chart.
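A comparison of this kind reduces to counting false alarms and detections against the known status of each observation. The sketch below shows one way to do that counting. The function name, the toy labels, and the two rate definitions (false alarms over in-control observations, detections over out-of-control observations) are assumptions for illustration and may differ from the measures reported in Table 11.

```python
import numpy as np

def detection_summary(is_out_of_control, signaled):
    """Count false alarms and detections for one chart (hypothetical helper)."""
    truth = np.asarray(is_out_of_control, dtype=bool)
    alarm = np.asarray(signaled, dtype=bool)
    false_alarms = int(np.sum(alarm & ~truth))  # signal on an in-control point
    detected = int(np.sum(alarm & truth))       # signal on an out-of-control point
    missed = int(np.sum(~alarm & truth))        # out-of-control point not flagged
    n_in = int(np.sum(~truth))
    n_out = int(np.sum(truth))
    return {
        "false_alarms": false_alarms,
        "detected": detected,
        "missed": missed,
        # Assumed definitions; the paper's exact measures may differ.
        "fp_rate": false_alarms / n_in if n_in else float("nan"),
        "detection_rate": detected / n_out if n_out else float("nan"),
    }

# Toy labels, not the machine failure data.
summary = detection_summary([0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 1])
```

Running the same helper on the signals of two charts over the same labeled observations gives directly comparable false alarm counts.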

Managerial Implication
In the Industry 4.0 era, monitoring products with control charts plays a crucial role in enhancing process quality. The main purpose of a control chart is to monitor and improve the process by reducing its variability. Traditional control charts monitor only one type of quality characteristic. For instance, numerical measurements such as length or weight are monitored using a variable control chart, whereas categorical data such as defect, color, or softness are monitored using an attribute control chart. Thus, a corporation that wants to monitor numerical and categorical data simultaneously needs to run two types of chart (variable and attribute) separately, which is inefficient.
The findings of this paper are in line with the concept of continuous quality enhancement and adaptive process monitoring. The mixed monitoring scheme proposed in this paper covers not just one type of quality characteristic but mixed variable and attribute quality characteristics in a single chart. In the simulation studies, this chart was shown to be effective in monitoring shifts in the mixed process. Because the mixed monitoring scheme is sensitive, the administrator can take fast corrective action on any assignable cause. Additionally, the control limits need to be readjusted at certain time intervals. The historical in-control observations can be used to calculate new control limits by estimating their empirical distribution (asymmetric or even unknown) using the KDE method. The adjusted control limits will help the company adapt to new production data behavior in the future.
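Re-estimating a control limit from historical in-control statistics can be sketched as below. This is an illustration under stated assumptions rather than the paper's procedure: it uses SciPy's `gaussian_kde` with its default bandwidth, takes the limit as the point where the KDE-based cumulative probability reaches 1 - α, and uses chi-square draws as a stand-in for in-control T2 values. The function name `kde_control_limit` is hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import gaussian_kde

def kde_control_limit(t2_in_control, alpha=0.00273):
    """Upper control limit at the (1 - alpha) quantile of a KDE fit."""
    t2 = np.asarray(t2_in_control, dtype=float)
    kde = gaussian_kde(t2)  # Gaussian kernel, default (Scott) bandwidth
    bandwidth = float(np.sqrt(kde.covariance[0, 0]))

    def excess(x):
        # KDE-estimated P(T2 <= x) minus the target coverage.
        return kde.integrate_box_1d(-np.inf, x) - (1.0 - alpha)

    # Bracket the root: well below the target at the minimum, and
    # essentially full coverage ten bandwidths above the maximum.
    lower = t2.min()
    upper = t2.max() + 10.0 * bandwidth
    return brentq(excess, lower, upper)

# Stand-in for historical in-control T2 statistics (assumption).
rng = np.random.default_rng(1)
t2_history = rng.chisquare(df=3, size=2000)
limit = kde_control_limit(t2_history)
```

Calling the function again on a newer window of in-control statistics gives the readjusted limit. A Gaussian kernel on a nonnegative statistic places some mass below zero, so a boundary-corrected estimator may be preferable in practice.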

Conclusions and Future Works
In this paper, a new control chart based on kernel PCA for monitoring mixed variable (continuous data) and attribute (categorical data) quality characteristics was proposed. In constructing the proposed method, the principal component scores (PCs) were transformed into T 2 statistics. Kernel density estimation (KDE) was employed to calculate an accurate control limit. To evaluate the performance of the proposed chart, several scenarios with various kernel functions, namely the linear, polynomial, and radial basis function kernels, were used. For the in-control condition, using the KDE control limit, the proposed chart produces ARL 0 at about 370 (α = 0.00273) for all scenarios. For the shifted process, the control chart was evaluated in monitoring outliers in phase I and process shifts in phase II. In monitoring outliers, the proposed chart was successful in detecting outliers mixed with clean data. In general, the proposed chart still performs well in detecting up to 30% outliers added in the simulations. In monitoring process shifts, the proposed control chart based on kernel PCA also performed well, although different kernel functions produced different results. The polynomial kernel performed well for both small and large shifts with the balanced and imbalanced proportions of categorical data, as can be concluded from its high hit rate. On the other hand, for a small shift in the process, the linear and RBF kernels showed good detection accuracy for an extremely imbalanced proportion of categorical data. Furthermore, the proposed chart was applied to monitor simulated and real data. The proposed chart performs well in monitoring the simulated data in terms of successful detection of the out-of-control observations. Meanwhile, in monitoring the real data, the proposed chart outperforms the conventional PCA mix chart by producing fewer false alarms.
In future work, the bootstrap resampling method [28] can be employed to estimate the control limit of the proposed method. Developing a mixed kernel function may also be a good alternative to the conventional kernels used in this study. Finally, the use of fast kernel PCA [29] can reduce the computational time.