Residual Control Chart for Binary Response with Multicollinearity Covariates by Neural Network Model

Abstract: Quality control studies have mostly dealt with symmetrical data, whose distribution has the same shape on the left and right. In this research, we propose a residual (r) control chart for binary asymmetrical (non-symmetric) data with multicollinearity between input variables by combining principal component analysis (PCA), functional PCA (FPCA), the generalized linear model with probit and logit link functions, and a neural network regression model. The motivation is that the proposed control chart method can deal with both high-dimensional correlated multivariate data and high-frequency functional multivariate data by means of the neural network model and FPCA. We show that the neural network r control chart is relatively efficient for monitoring simulated and real binary response data, with narrow control limits.


Introduction
The currently available quality control research has focused on symmetrical data, whose distribution has the same shape on the left and right. However, data sets are getting bigger, their variables are highly correlated with each other, and they often have asymmetric (non-symmetric) distributions. Quality control therefore has difficulty handling highly correlated data, and it is hard to get accurate information from the currently available control charts. To monitor a process mean vector, a number of multivariate control charts have been proposed, including charts based on the Hotelling $T^2$ distribution [1], the multivariate CUSUM [2] and the multivariate EWMA [3]. These multivariate control charts are limited in handling high-dimensional data because of the complexity of the covariance structure. Neural network-based methods have been applied in quality control research, but no research is available on residual (r) control charts for binary asymmetrical data with highly correlated multivariate covariates using a neural network regression model. This motivates us to propose the r control chart for binary asymmetrical data with multicollinearity between input variables via principal component analysis (PCA), functional PCA (FPCA), and a neural network model. To deal with high correlations among independent variables, [4] proposed Poisson, negative binomial, and COM-Poisson principal component regression-based r-control charts for monitoring dispersed count data. More detailed information on diverse control charts can be found in [5,6]. This research has two novelties with respect to existing strategies. First, the proposed control chart method can deal with high-dimensional correlated multivariate data by means of the neural network model. Second, the proposed control chart method can deal with high-frequency functional multivariate data by means of FPCA.
One application of the proposed statistical process control is monitoring clinical performance, which is measured by binary asymmetrical data such as mortality together with patient medical information.

Statistical Methods
In this research, we present regression-based r-control charts combining principal component analysis methods (PCA and FPCA) and binary response regression models (generalized linear model and neural network regression model) for binary asymmetrical data with multicollinearity among independent variables.

Generalized Linear Model and Neural Network Model for Binary Response Data
To introduce the binary response regression models, the generalized linear model (GLM) should be considered first, because the GLM is a general and flexible model that can accommodate binary asymmetrical data. The GLM has the following probability density function, which comes from the exponential family:

$$f(y; \lambda, \delta) = \exp\left\{ \frac{y\lambda - a_2(\lambda)}{a_1(\delta)} + a_3(y, \delta) \right\}, \tag{1}$$

where we denote the response variable by $y$, the location parameter by $\lambda$, the dispersion parameter by $\delta$, and arbitrary functions by $a_1(\cdot)$, $a_2(\cdot)$, and $a_3(\cdot)$. We denote by $\zeta$ the linear predictor for the response $y$, so that $\zeta$ is a linear combination of the unknown parameters $b = (b_0, b_1, \cdots, b_p)^\top$ and the input variables $x = (1, x_1, \cdots, x_p)^\top$. A link function $f$ such that $E(y) = f^{-1}(\zeta)$ provides the relationship between the linear predictor and the mean of the distribution function. The link function $f(\cdot)$ specifies how to convert the expected value $\mu = E(y)$ to the linear predictor $\zeta$, i.e.,

$$\zeta = f(\mu) = x^\top b. \tag{2}$$

Using a logit model as an example, we have

$$\log\left(\frac{\mu}{1-\mu}\right) = x^\top b, \tag{3}$$

where $b$ is the column vector of the fixed-effects regression coefficients. Equation (3) can be written as

$$\mu = \frac{\exp(x^\top b)}{1+\exp(x^\top b)}. \tag{4}$$

With (4), we can derive the likelihood function for the GLM as follows:

$$L(b) = \prod_{i=1}^{n} \mu_i^{y_i}(1-\mu_i)^{1-y_i}, \tag{5}$$

where $\mu_i$ is (4) evaluated at the inputs $x_i$ of the $i$-th observation, and so the log-likelihood function is given by

$$\ell(b) = \sum_{i=1}^{n}\left[ y_i \log \mu_i + (1-y_i)\log(1-\mu_i) \right]. \tag{6}$$

Then the maximum-likelihood estimating equation for $b$ is easily solved via standard software (e.g., SAS or R) using the Fisher scoring or Newton-Raphson method.
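As an illustration of the estimation step described above, the following is a minimal sketch of Newton-Raphson for a logit model with an intercept and a single covariate. It is not the routine used in SAS or R; the function names (`inv_logit`, `log_likelihood`, `fit_logit`) and the toy data are hypothetical and chosen only for this example. For the logit link, Newton-Raphson and Fisher scoring give the same update.

```python
import math

def inv_logit(z):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of a logit model with one covariate."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = inv_logit(b0 + b1 * x)
        total += y * math.log(mu) + (1 - y) * math.log(1.0 - mu)
    return total

def fit_logit(xs, ys, iterations=25):
    """Newton-Raphson updates for (b0, b1), starting from zero."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = inv_logit(b0 + b1 * x)
            w = mu * (1.0 - mu)          # Bernoulli variance, the IRLS weight
            g0 += y - mu                 # score for the intercept
            g1 += (y - mu) * x           # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy data (hypothetical), deliberately not perfectly separable.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logit(xs, ys)
```

At convergence the score is zero, so the fitted probabilities sum to the number of observed ones; this is a convenient sanity check on any implementation.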
The GLM with probit link function requires the assumptions that the response is binary and that an underlying latent variable governing the binary process follows a normal distribution. In some cases, the GLM with probit link function probably gives the best goodness-of-fit where the response variables are assumed to have normal distributions [7], because the response probability distribution of the GLM belongs to an exponential family of distributions, which employs methods analogous to the normal linear methods for normal data [8,9]. Therefore, for asymmetrically (non-normally) distributed data, the GLM with probit link function may not be the best model. This is another motivation to propose an r control chart based on a neural network model, for better predictive accuracy with non-normal data.
An artificial neural network (ANN) was originally inspired by the human brain and resembles biological neural networks, imitating human brain activity through computer simulation [10][11][12]. Physically, an ANN contains neurons connected by synapses. The ANN learning process relies heavily on both the weights of the connections between the neurons, which specify which variables are involved in the network, and the activities of the neurons. The weights are computed by optimizing a learning algorithm. The ANN uses the concept of competition to select the highest probability of inhibiting all neurons [10][11][12]. The most basic form of an ANN is a single-layer feedforward connection among neurons. More generally, ANNs have an input layer and one or more hidden layers, and the hidden layers are connected to the output layer, which produces the outputs. In the last decade, ANN-based statistical process control has been actively studied. For example, [13] proposed pattern recognition for bivariate process mean shifts using a feature-based ANN, and [14] proposed control chart pattern recognition using Radial Basis Function (RBF) neural networks. Recently, [15] proposed statistical process control with intelligence based on the deep learning model and reviewed neural network-based statistical process control. In this paper, we used the 'nnet' R package [16], which fits feed-forward neural networks with a single hidden layer as well as multinomial log-linear models. For both the repeated simulated data (samples of size 1000, with 10,000 replications for the case of PCA and 30 replications for the case of FPCA) and the real data, we employed a single hidden layer with 30 neurons.
The r-control charts for binary response data based on the GLMs with logit and probit link functions and on the neural network model employ deviance residuals, which are independent and asymptotically normally distributed with zero mean and unit variance, i.e., $r_i \sim N(0, 1)$ for $i = 1, \ldots, n$. In this research, we chose the deviance residual for the GLMs with logit and probit link functions and for the neural network model because the R packages for these models have a command for producing the deviance residual. This makes it easy to compare the residuals from the GLM-based models and the neural network model. Ref. [17] proposed the following Shewhart control limits for the deviance residuals:

$$E(r_i) \pm k\sqrt{\mathrm{Var}(r_i)} \approx \pm k, \tag{7}$$

where $k$ is determined by the false alarm probability, $\alpha = 1/\mathrm{ARL}_0$, and $\mathrm{ARL}_0$ is the average run length (ARL) when the process is in control. The ARL is a measure of the performance of control charts for monitoring a process.
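To make the residual and control-limit calculation concrete, here is a small sketch. The deviance residual formula used below is the standard one for a binary response and is an assumption, since the text does not write it out; the function names are hypothetical. The in-control ARL calculation additionally assumes independent, exactly N(0, 1) residuals and two-sided limits at plus and minus k.

```python
import math

def deviance_residual(y, mu):
    """Signed deviance residual for a binary observation y with fitted mean mu."""
    if y == 1:
        return math.sqrt(-2.0 * math.log(mu))
    return -math.sqrt(-2.0 * math.log(1.0 - mu))

def control_limits(residuals, k):
    """Shewhart limits: sample mean minus/plus k sample standard deviations."""
    n = len(residuals)
    mean = sum(residuals) / n
    var = sum((r - mean) ** 2 for r in residuals) / (n - 1)
    sd = math.sqrt(var)
    return mean - k * sd, mean + k * sd

def in_control_arl(k):
    """ARL_0 = 1 / alpha, with alpha = P(|Z| > k) for a standard normal Z."""
    alpha = math.erfc(k / math.sqrt(2.0))
    return 1.0 / alpha
```

Under these assumptions, k = 3 corresponds to an in-control ARL of roughly 370, the familiar value for three-sigma Shewhart charts.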

Dimension Reduction by Principal Component Analysis
Principal component analysis (PCA) is a statistical orthogonal transformation method that converts a multivariate data set of correlated variables into a set of linearly uncorrelated variables called principal components (PCs). PCA is the most common statistical method for dimension reduction: it reduces the dimensionality of multivariate data to a smaller number of uncorrelated principal components that account for the variation of the original data. The r control chart for a binary response regression model with the primary principal components from PCA is a new statistical process control method that monitors the binary response variable as a function of uncorrelated PCs. In this paper, we propose binary response regression model-based r control charts for binary response data that overcome the multicollinearity issue among independent variables.
To verify our proposed method in terms of the model flexibility and performance, we have run simulations for various circumstances: in-control, one inflated, or zero inflated binary data.
Through both simulation study and a real data example, we illustrated the good performance of our proposed method.
Following Ref. [4], let $X$ be an $n \times q$ matrix of independent variables and $W$ be the standardized matrix of $X$, so that each column has mean 0 and standard deviation 1. Let $Q$ be the matrix of eigenvectors of $W^\top W$ and $D$ be the diagonal matrix with the eigenvalues of $W^\top W$ on the diagonal and zeros everywhere else. We sort the eigenvalues from largest to smallest, $\lambda_1 > \lambda_2 > \cdots > \lambda_q$, sort the eigenvectors in $Q$ accordingly, and let $Q^*$ be this sorted matrix of eigenvectors. Then the principal components, $W^*$, are given by

$$W^* = W Q^*. \tag{8}$$

In this way, we reduce the original highly correlated multivariate data to the uncorrelated variables $W^*$. Our proposed procedure uses these uncorrelated variables $W^*$ to fit the GLM with probit link, the GLM with logit link, and the neural network regression model.
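The construction above can be sketched in a few lines. The example below standardizes a small hypothetical data matrix, forms the cross-product matrix of the standardized data, and extracts only the first principal component by power iteration. Power iteration is a swapped-in technique chosen to keep the sketch dependency-free; a full eigendecomposition, as a statistical package would compute, yields all components at once. All names and data are illustrative.

```python
import math

def standardize(X):
    """Center each column to mean 0 and scale it to standard deviation 1."""
    n, q = len(X), len(X[0])
    cols = []
    for j in range(q):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        cols.append([(v - mean) / sd for v in col])
    return [[cols[j][i] for j in range(q)] for i in range(n)]

def cross_product(W):
    """Return the q x q matrix W'W."""
    q = len(W[0])
    return [[sum(row[a] * row[b] for row in W) for b in range(q)] for a in range(q)]

def leading_eigenpair(A, iterations=500):
    """Power iteration for the largest eigenvalue of A and its unit eigenvector."""
    q = len(A)
    v = [1.0 / math.sqrt(q)] * q
    for _ in range(iterations):
        w = [sum(A[a][b] * v[b] for b in range(q)) for a in range(q)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    Av = [sum(A[a][b] * v[b] for b in range(q)) for a in range(q)]
    eigenvalue = sum(v[a] * Av[a] for a in range(q))
    return eigenvalue, v

# Hypothetical data: the first two columns are almost collinear.
X = [[2.0, 4.1, 1.0], [3.0, 6.2, 0.5], [4.0, 7.9, 1.5],
     [5.0, 10.1, 0.7], [6.0, 12.0, 1.2]]
W = standardize(X)
lam1, q1 = leading_eigenpair(cross_product(W))
# Scores of the first principal component: the first column of W Q*.
scores = [sum(w_ij * q_j for w_ij, q_j in zip(row, q1)) for row in W]
```

Because the scores are a projection of centered data onto a unit eigenvector, their sum of squares equals the leading eigenvalue, which shows how much of the total variation the first component carries.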

Dimension Reduction by Functional Principal Component Analysis
Ref. [18] proposed functional PCA (FPCA) for functional data. FPCA is another statistical dimension reduction method, which explains the variance of components using non-linear eigenfunctions. It is suited to highly correlated multivariate data because it overcomes the high-dimensionality difficulty and efficiently examines the sample covariance structure.
The functional form of $y_i(t)$ is given by the sum of the weighted basis functions, $\phi_p(t)$, across the set of times $T$:

$$y_i(t) = \sum_{p=1}^{P} c_{ip}\,\phi_p(t), \quad t \in T,$$

where $c_{ip}$ is the weight of the $p$-th basis function for subject $i$ and $P$ is the number of basis functions. In this study, a Fourier basis is used to represent smooth functions because of its flexibility and computational advantages. Our goal is to obtain a smooth function that fits the observed time series, $y_i(t_j)$, well. For calculating functional PCA, we employ the 'fdapace' R package [19]. This package performs functional principal component analysis (FPCA) via the principal analysis by conditional estimation (PACE) algorithm, which yields covariance and mean functions, eigenfunctions and principal component scores. PACE provides fitted continuous trajectories with confidence bands [20,21].
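The weighted sum of basis functions can be illustrated with a short sketch. It only evaluates a curve from given weights in an unnormalized Fourier basis; it does not implement the PACE algorithm or reproduce the 'fdapace' package. The ordering of the basis functions, the `period` parameter, and the function names are assumptions made for this example.

```python
import math

def fourier_basis(t, P, period=1.0):
    """Evaluate the first P (unnormalized) Fourier basis functions at time t.

    Assumed ordering: 1, sin(wt), cos(wt), sin(2wt), cos(2wt), ...
    """
    omega = 2.0 * math.pi / period
    values = [1.0]
    harmonic = 1
    while len(values) < P:
        values.append(math.sin(harmonic * omega * t))
        if len(values) < P:
            values.append(math.cos(harmonic * omega * t))
        harmonic += 1
    return values

def evaluate_curve(coefficients, t, period=1.0):
    """Weighted sum of basis functions: sum over p of c_p * phi_p(t)."""
    phis = fourier_basis(t, len(coefficients), period)
    return sum(c * phi for c, phi in zip(coefficients, phis))

# Example: weights (1.0, 2.0, 0.5) evaluated at a quarter of the period.
value = evaluate_curve([1.0, 2.0, 0.5], 0.25)
```

In practice the weights are estimated from the observed series, for instance by least squares, before any functional principal components are computed.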

New Binary Response Statistical Process Control Procedure
A new binary response statistical process control procedure for the deviance residuals, r, from binary response regression models with highly correlated multivariate covariates is proposed through the following steps:

1. Apply the (functional) principal component analysis to the input variables X and obtain the principal components $W^*$ from (8).
2. Fit the binary response regression model using the binary response variable y and the (functional) principal components $W^*$ through the probit link function, the logit link function, and the neural network regression model, respectively.
3. Obtain the deviance residuals from each model.
4. Set the k value and obtain the lower and upper control limits of the r-charts using (7).

Illustrated Examples
With the proposed method in Section 2, we perform the efficiency comparison among the proposed methods with simulated data and real data.

Simulation Study
In order to compare the r-charts based on binary regression models, we generate simulated data in which the input variables $x = (x_1, x_2, x_3, x_4)$ are drawn from the multivariate normal distribution with mean (1, 2, 3, 4) and covariance matrix $M$. The probability P(y = 1) is obtained by passing the linear predictor in $x$ through an inverse logit function. Then, we generate the response variable y randomly from the Bernoulli distribution with probability P(y = 1) and sample size 1000. For the one ('1') inflated case of binary response data, we added 0.1 to the probability, giving P(y = 1) + 0.1, and for the zero ('0') inflated case, we subtracted 0.1 from the probability, giving P(y = 1) − 0.1. The unmodified P(y = 1) is used for the in-control dispersion case. In each setup, we perform 10,000 replications of sample size 1000. Table 1 shows the simulation results. Using the deviance residuals for each model and (7) for k = 1, 2, 3, we compute the lower control limit (LCL) and upper control limit (UCL) for the process. The expected length of the confidence interval (CI) is computed as the average length of the control limits. The coverage probability is the proportion of the deviance residuals contained in the control limits. The lower and upper control limits of the r-chart are calculated as the mean of y minus and plus one, two and three of its standard deviations.
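The three simulation scenarios and the two summary measures can be sketched as follows. The baseline probabilities here are hypothetical placeholders drawn uniformly between 0.35 and 0.85; they do not reproduce the multivariate normal design of the study. The clipping of shifted probabilities to [0, 1] is an added safeguard, and all function names are illustrative.

```python
import random

def simulate_responses(probabilities, shift=0.0, seed=0):
    """Draw y_i ~ Bernoulli(p_i + shift); shift = +0.1 or -0.1 gives the inflated cases."""
    rng = random.Random(seed)
    responses = []
    for p in probabilities:
        p_shifted = min(1.0, max(0.0, p + shift))  # safeguard, not from the paper
        responses.append(1 if rng.random() < p_shifted else 0)
    return responses

def coverage_and_length(residuals, lcl, ucl):
    """Proportion of residuals inside [lcl, ucl] and the length of the limits."""
    inside = sum(1 for r in residuals if lcl <= r <= ucl)
    return inside / len(residuals), ucl - lcl

# Hypothetical baseline probabilities standing in for P(y = 1); sample size 1000.
base_rng = random.Random(1)
base = [base_rng.uniform(0.35, 0.85) for _ in range(1000)]
y_in_control = simulate_responses(base)
y_one_inflated = simulate_responses(base, shift=0.1)
y_zero_inflated = simulate_responses(base, shift=-0.1)
```

Averaging the length returned by `coverage_and_length` over many replications would give the expected CI length reported in the tables, under the assumptions stated above.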
The summary statistics of P(y = 1) in Figure 3.1 are as follows: the minimum is 0.3450, the first quartile is 0.5723, the median is 0.6222, the mean is 0.6172, the third quartile is 0.6654 and the maximum is 0.8477. The skewness of P(y = 1) is −0.3272, which indicates that the simulated data are asymmetric, with a longer tail toward small probabilities, that is, toward zero-valued rather than one-valued responses.

Table 1. Based on PCA, the coverage probability, expected confidence interval (CI) length, and control limits for the simulated in-control, one inflated-, and zero inflated-dispersion binary data via various r-charts based on GLM with probit, GLM with logit, and neural network models. The neural network model used a single hidden layer with 30 neurons. 'NA' in the table means that there are no points outside the control limits. The number of simulations is 10,000 replications of sample size 1000.

Based on PCA, Table 1 presents the average run length (ARL) results for the simulated in-control, one inflated-, and zero inflated-dispersion data via r-charts based on the GLMs with probit and logit link functions and on the neural network model. Changes in w that resulted in one, two and three standard deviations from the mean of y are considered. From Table 1, we can see that the coverage probabilities of the GLMs with probit and logit link functions are slightly greater than or equal to the coverage probabilities of the neural network regression model. However, the confidence intervals (CIs) of the neural network regression model are much shorter than those of the GLMs with probit and logit link functions, and, because of the shorter CIs, the ARLs of the neural network regression model are much smaller than the ARLs of the GLMs with probit and logit link functions.
From Figures 2-4, in the in-control dispersion case based on PCA, we can observe that the residuals of the neural network regression model are much closer to zero than the residuals of the GLMs with probit and logit link functions. Therefore, it is not surprising that, in Table 1, the r-chart based on the neural network model is superior in all cases in terms of the expected length of the confidence interval. The r-chart based on the neural network model has the smallest expected length of the confidence interval with a reasonable coverage probability.
We also see that the r-charts based on the neural network model perform well in following (7), $E(r_i) \pm k\sqrt{\mathrm{Var}(r_i)} \approx \pm k$, whereas the r-charts based on the GLMs with logit and probit link functions do not. The reason for the difference is probably the tails of the distribution in the GLMs.
For calculating FPCA, we employ the 'fdapace' R package [19] to represent smooth functions with a Fourier basis. To generate the functional data, we set the number of subjects to N = 1000 and the number of measurements per subject to M = 1000. We define the four covariates (x) with four eigencomponents, and we set the coefficients (β's) to $\beta_0 = 0.1$, $\beta_1 = 0.25$, $\beta_2 = -0.5$, $\beta_3 = 0.25$, $\beta_4 = 0.1$, so that

$$P(y = 1) = \frac{1}{1+\exp(-0.1 - 0.25x_1 + 0.5x_2 - 0.25x_3 - 0.1x_4)},$$

which passes the linear predictor through an inverse logit function. To apply the proposed methods, we generate 30 replications of functional simulated data, each of sample size 1000. Based on FPCA, Table 2 presents the average run length (ARL) results for the simulated in-control, one inflated-, and zero inflated-dispersion data via r-charts based on the GLMs with probit and logit link functions and on the neural network model. Changes in w that resulted in one, two and three standard deviations from the mean of y are considered.

Table 2. Based on FPCA, the coverage probability, expected confidence interval (CI) length, and control limits for the simulated in-control, one inflated-, and zero inflated-dispersion binary data via various r-charts based on GLM with probit, GLM with logit, and neural network models. The neural network model used a single hidden layer with 30 neurons. 'NA' in the table means that there are no points outside the control limits. The number of simulations is 30 replications of sample size 1000.
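As a quick check of the success probability used in the FPCA simulation, the sketch below evaluates the inverse logit of the linear predictor with the stated coefficients. The function name and the example inputs are hypothetical; only the coefficient values come from the text.

```python
import math

# beta_0, ..., beta_4 as stated in the simulation setup.
BETA = (0.1, 0.25, -0.5, 0.25, 0.1)

def success_probability(x, beta=BETA):
    """P(y = 1) for covariates x = (x1, x2, x3, x4) under the logit model."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))

p_origin = success_probability((0.0, 0.0, 0.0, 0.0))  # linear predictor 0.1
p_ones = success_probability((1.0, 1.0, 1.0, 1.0))    # linear predictor 0.2
```

With all covariates at zero the linear predictor is 0.1, giving a probability of about 0.525; with all covariates at one the coefficients sum to 0.2, giving about 0.550.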

From Table 2, which is based on FPCA, we see the same pattern as in Table 1 (based on PCA) for the coverage probabilities and CI lengths. The coverage probabilities of the GLMs with probit and logit link functions are slightly greater than or equal to the coverage probabilities of the neural network regression model, and the confidence intervals (CIs) of the neural network regression model are much shorter than those of the GLMs with probit and logit link functions. However, the ARLs of the neural network regression model are not smaller than the ARLs of the GLMs with probit and logit link functions. In this respect, the result based on FPCA differs from the result based on PCA.
From Figures 5-7, in the in-control dispersion case based on FPCA, we can observe that the residuals of the neural network regression model are much closer to zero than the residuals of the GLMs with probit and logit link functions, which agrees with the figures based on PCA.
From Tables 1 and 2, we can compare the results for the neural network regression model and the GLMs with probit and logit link functions based on PCA and FPCA. We found that, for the in-control and one-inflated cases, the ARLs based on PCA are smaller than the ARLs based on FPCA, but for the zero-inflated case, the ARLs based on FPCA are smaller than the ARLs based on PCA. Another interesting result is that, in terms of the ARLs, the GLMs with probit and logit link functions based on FPCA are overall superior to the neural network regression model. This is the opposite of the result based on PCA.

Real Data Analysis
Ref. [22] provides the R package "mlbench", which includes the Wisconsin breast cancer database under the name Breast Cancer. We used Breast Cancer to illustrate the real data analysis in this paper.
The objective of the Wisconsin breast cancer database is to identify each of a number of benign or malignant classes, which are binary data ('0' and '1'). Samples arrive periodically as Dr. Wolberg reports his clinical cases, so the database reflects the chronological grouping of the data. It is a data frame with 699 observations on 11 variables: one character variable, nine ordered or nominal variables, and one target class. In this paper, we used the nine covariates Cl.thickness (Clump Thickness), Cell.size (Uniformity of Cell Size), Cell.shape (Uniformity of Cell Shape), Marg.adhesion (Marginal Adhesion), Epith.c.size (Single Epithelial Cell Size), Bare.nuclei (Bare Nuclei), Bl.cromatin (Bland Chromatin), Normal.nucleoli (Normal Nucleoli) and Mitoses, and the target binary variable Class (Y) ('0' = benign and '1' = malignant). Table 3 presents the Pearson correlation coefficients for the Breast Cancer real data. It shows that the nine covariates have strong positive correlation coefficients, which means that the nine covariates in the breast cancer real data are positively correlated with each other. Figure 8 also shows highly correlated pairwise scatter plots of the nine covariates in the breast cancer real data.
Table 4 shows the PCA summary for the nine covariates in the breast cancer real data. To avoid multicollinearity among the nine covariates, we used principal components for the binary response data (Y=Class) via various r-charts based on GLM with probit, GLM with logit, and neural network models. Based on PCA, Table 5 shows that the r-chart based on the neural network model has narrow control limits compared with the r-charts based on the GLMs with probit and logit link functions. From Figures 9-11, we can observe that the residuals of the neural network regression model are much closer to zero than the residuals of the GLMs with probit and logit link functions.
With the narrow control limits of the neural network model, we can monitor the class of the patients with breast cancer by the r control chart using the important covariates' information.

Table 5. Based on PCA, control limits for binary response data (Y=Class) via various r-charts based on GLM with probit, GLM with logit, and neural network models. The neural network model used a single hidden layer with 30 neurons.

Figures 9-11. Based on PCA, r control charts ($E(r_i) \pm 3\sqrt{\mathrm{Var}(r_i)}$) for probit, logit and neural network.

Figure 12 shows the plots of FPCA with the nine covariates in the Breast Cancer real data; two main components explain 94% of the variance by FPCA. Hence, we used two main components for the r-chart based regression models. Based on FPCA, Table 6 shows that the r-chart based on the neural network model has narrow control limits compared with the r-charts based on the GLMs with probit and logit link functions. From Figures 13-15, we can observe that the residuals of the neural network regression model are much closer to zero than the residuals of the GLMs with probit and logit link functions, which concurs with the result based on PCA. With the narrow control limits of the neural network model, we can monitor the class of the patients with breast cancer by the r control chart using the important covariates' information.
Similar to the simulation data analysis, the r-chart based regression models based on PCA have narrower control limits than the r-chart based regression models based on FPCA.

Table 6. Based on FPCA, control limits for binary response data (Y=Class) via various r-charts based on GLM with probit, GLM with logit, and neural network models. The neural network model used a single hidden layer with 30 neurons.

Conclusions
In this research, we have presented binary response regression model-based statistical process control r-charts for dispersed binary asymmetrical data with multicollinearity among input variables. We have demonstrated the model flexibility and performance of the proposed method by running simulations for various circumstances: in-control, one inflated-, or zero inflated-dispersion data. With both simulated data and real data, our proposed method has shown superior performance. Furthermore, we compared PCA-based and FPCA-based binary response regression model-based statistical control r-charts using the GLMs with probit and logit link functions and the neural network model. In the case of dimension reduction by PCA, our proposed neural network approach is superior in handling dispersed binary asymmetrical data with multicollinearity among explanatory variables. In the case of dimension reduction by FPCA, the neural network approach is also superior in handling such data in terms of control limit length, but it is not more efficient than the proposed method with dimension reduction by PCA in this research.
The conclusion in this research is that for the high-dimensional correlated multivariate covariate data, the binary control chart by neural network model is a good statistical process control method and, for high-frequency functional multivariate data, the proposed GLM-based control charts by FPCA are good statistical process control methods. Hence, a cancer clinical study can be investigated by the proposed statistical process control.
Our future research will address the following topics. More general versions of binary asymmetric data will be considered with other machine learning models such as deep learning or multi-layer neural network model. Instead of the deviance residual, quantile residual can also be considered in the binary response regression control charts. Lastly, we need to consider other types of bases for constructing FPCA for further comparison studies with PCA-based models.
Author Contributions: J.-M.K. designed the model, analyzed the data and wrote the paper. N.W. formulated the conceptual framework, designed the model, obtained inference and wrote the paper. Y.L. formulated the conceptual framework, designed the model, obtained inference and wrote the paper. K.P. analyzed the data and provided editorial support. All the authors cooperated to revise the paper. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.