Deep Learning-Based Residual Control Chart for Binary Response

Abstract: A residual (r) control chart for an asymmetrical and non-normal binary response variable with highly correlated explanatory variables is proposed in this research. To avoid multicollinearity between multiple explanatory variables, we employ and compare a neural network regression model and a deep learning regression model using Bayesian variable selection (BVS), principal component analysis (PCA), nonlinear PCA (NLPCA), or the whole set of multiple explanatory variables. The advantage of our r control chart is that it can process both non-normal and correlated multivariate explanatory variables by employing a neural network model and a deep learning model. We show that the deep learning r control chart is relatively efficient for monitoring simulated and real binary response asymmetric data compared with the r control chart of the generalized linear model (GLM) with probit and logit link functions and the neural network r control chart.


Introduction
The COVID-19 pandemic started at the end of 2019. It has dramatically changed social life and human activity, as people fight the spread of COVID-19 by wearing masks and practicing social distancing. Artificial intelligence (AI) platform-based contact-free human activity has become more common in our society since the start of the COVID-19 pandemic. Therefore, deep learning and machine learning methods for artificial intelligence have recently been developed at an exponential pace by software engineers. However, applying deep learning methods to the quality control area has not yet been deeply considered, even though AI-based products such as smart glasses with optical head-mounted displays have been developed rapidly.
Quality improvement is an endless objective in manufacturing industries. To improve the quality of a product, researchers try to reduce process variation by using statistical process control (SPC), which has been an essential statistical method for achieving this objective and for monitoring industrial processes in the quality control community. Walter A. Shewhart developed the control chart in 1924; it has since been named the Shewhart control chart and is a graphical display of a quality characteristic used for monitoring a process. Shewhart's main creative idea was to consider the variability of a production process from a statistical viewpoint and to decompose the variation of a process into common and special causes. Many variants of SPC have been developed since then; many diverse control charts can be found in [1,2]. So many SPC methods are now available that the choice of an appropriate SPC method for symmetric or asymmetric data has become a prime research question among quality control professionals in manufacturing industries. Asymmetric, big, and highly correlated datasets are produced in our modern society, and such data follow asymmetric and non-normal distributions. It is therefore difficult for quality control researchers to handle highly correlated and asymmetric data, because the currently available quality control charts cannot handle asymmetrical data, and it is common to obtain inaccurate quality control information from them. Numerous multivariate control charts, such as the Hotelling T² chart [3], the multivariate CUSUM [4], and the multivariate EWMA [5], have been proposed to monitor a process mean vector. However, these multivariate control charts have difficulty handling non-normal and asymmetric data because of the estimation issue of the unknown covariance structure. Neural network-based approaches have also been widely applied in quality control research.
Recently, [6] proposed r control charts for a binary asymmetrical response variable with highly correlated multivariate covariates by using a single hidden layer neural network regression model.
In this research, we extend the single hidden layer neural network regression-based r control charts for binary asymmetrical data to a deep learning regression model with multiple hidden layers via Bayesian variable selection (BVS), principal component analysis (PCA) and nonlinear PCA (NLPCA) so that our r control chart can solve a multicollinearity problem among independent variables. Reference [7] also proposed Poisson, negative binomial and COM-Poisson-based principal component regression-based r-control charts for monitoring dispersed count data to avoid the multicollinearity problem.
Our research shows that our deep learning r control chart is more efficient than the current methods [6] while overcoming the multicollinearity issue of high-dimensional correlated multivariate data. Our deep learning r control chart is evaluated with simulated data and the Cleveland heart disease real data found in the UCI machine learning repository.

Statistical Methods
This research presents deep learning regression-based r control charts for binary asymmetrical data with multicollinearity among independent variables. We compare deep learning regression-based r control charts built on the whole data (i.e., without applying any of BVS, PCA, or NLPCA) with deep learning regression-based r control charts built on dimension-reduced data obtained by applying one of BVS, PCA, or NLPCA to the whole data. In addition, we compare our proposed control chart with the binary response regression models (GLM with logit and probit link functions, and the neural network regression model) proposed by [6].

Bayesian Variable Selection and Dimension Reduction by Principal Component Analysis
Before we apply the proposed control chart to a multivariate dataset, we employ the Bayesian variable selection and PCA methods to avoid the multicollinearity issue of the multivariate dataset. First, we introduce the objective Bayesian variable selection in linear models proposed in [8]. We used the GibbsBvs function with the gZellner prior in the BayesVarSel R package [9], running 10,000 iterations with a burn-in of 100, to apply the Bayesian variable selection method to simulated data and real data, the Cleveland heart disease data [10].
In this paper, the r control chart for the binary response regression model with the important variables selected by the BVS is a newly proposed SPC method that monitors the binary response variable. The BVS method will be applied to the GLM with probit link, the GLM with logit link, and the neural network and deep learning regression models with simulated and real data. PCA is a statistical dimension reduction method converting a multivariate dataset of correlated variables into a set of values of linearly uncorrelated variables, called principal components, which account for the variation of the original data.
References [6,7,11] considered the PCA method for SPC with multivariate highly correlated data. Reference [12] proposed nonlinear principal component analysis (NLPCA) as a kernel eigenvalue problem. Unlike linear PCA, the nonlinear kernel PCA (NLPCA) method performs a nonlinear form of principal component analysis. To extract five principal components in high-dimensional feature spaces using kernel PCA, we used the 'kernlab' R package [13], which provides the most popular kernel functions. We used the Gaussian radial basis kernel function with hyperparameter sigma = 0.2, the inverse kernel width for the radial basis kernel function.
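Since the paper's computations use the 'kernlab' R package, the following is only an illustrative Python sketch of the kernel PCA step: an RBF kernel matrix with inverse width sigma = 0.2 is built, centered in feature space, and its leading eigenvectors give five nonlinear principal components. The function name and the toy data are our own assumptions, not the authors' code.

```python
import numpy as np

def rbf_kernel_pca(X, n_components=5, sigma=0.2):
    """Kernel PCA with Gaussian RBF kernel k(x, z) = exp(-sigma * ||x - z||^2).

    Here sigma plays the role of kernlab's 'inverse kernel width'.
    Returns projections onto the first n_components nonlinear PCs.
    """
    # Pairwise squared Euclidean distances
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    K = np.exp(-sigma * np.clip(d2, 0.0, None))

    # Center the kernel matrix in feature space: Kc = K - 1K - K1 + 1K1
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one

    # Eigendecomposition; eigh returns eigenvalues in ascending order
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]

    # Scale eigenvectors by sqrt(eigenvalue) to obtain the PC scores
    return vecs * np.sqrt(np.clip(vals, 1e-12, None))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
Z = rbf_kernel_pca(X, n_components=5, sigma=0.2)
```

Because the kernel matrix is centered, the extracted nonlinear PC scores are mean-zero and mutually uncorrelated, which is the property the r chart later exploits.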
The r control chart for the binary response regression model with primary principal components by PCA was introduced in [6]. However, in this paper, the r control chart for the binary response regression model with primary principal components by the NLPCA is a new statistical process control method that monitors the binary response variable as a function of uncorrelated PCs, overcoming the multicollinearity issue among independent variables. The new method will be applied to the GLM with probit link, the GLM with logit link, and the neural network and deep learning regression models with simulated and real data.

Generalized Linear Model and Neural Network Model for Binary Response Data
The r control charts for binary response regression models, such as the GLM with logit and probit link functions and the neural network regression model, were proposed by [6]. The GLM has the following probability density function, which comes from the exponential family:

f(y; λ, δ) = exp{ [yλ − a2(λ)] / a1(δ) + a3(y, δ) },

where we denote the response variable by y, the location parameter by λ, the dispersion parameter by δ, and arbitrary functions by a1(·), a2(·), and a3(·). In particular, a1(δ) is commonly of the form a1(δ) = δ or a1(δ) = δ/w with a known weight w, a2(λ) is a cumulant function of λ, and a3(y, δ) is a function of y and δ; for various forms of the three functions, see Section 2.2.2 in [14]. We denote by ζ = x′b the linear predictor for the response y, so that ζ is a linear combination of the unknown parameters b = (b0, b1, ..., bp)′ and input variables x = (1, x1, ..., xp)′. A link function g, such that E(y) = g⁻¹(ζ), provides the relationship between the linear predictor and the mean of the distribution function. The link function g(·) specifies how to convert the expected value µ = E(y) to the linear predictor ζ, i.e.,

g(µ) = ζ = x′b. (1)

As an example, when the response variable y follows a Bernoulli distribution with success probability p, we have p = E(y) = µ. If in (1) we take the logit link function g(p) = Logit(p) = log{p/(1 − p)}, where p = P(y = 1|x), the logit (or logistic) model is given by

log{p/(1 − p)} = x′b, (2)

where b is the column vector of the fixed-effects regression coefficients. Equation (2) can be written as

p = exp(x′b) / {1 + exp(x′b)}. (3)

The response probability distributions of GLMs belong to an exponential family of distributions, which allows methods analogous to normal linear methods for normal data [15,16]. Therefore, for asymmetrically (non-normally) distributed data, the GLM with probit link function may not be the best model. This was the motivation for [6] to propose a neural network model-based r control chart for better predictive accuracy with non-normal data.
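The logit model in (2) and its deviance residuals can be illustrated with a short sketch. The paper fits its GLMs in R; the Python code below, with hypothetical data and function names of our own, fits the logit link by iteratively reweighted least squares (equivalent to Newton's method for this likelihood) and computes the deviance residuals that the r chart monitors.

```python
import numpy as np

def fit_logit_irls(X, y, n_iter=25):
    """Fit the logit model P(y=1|x) = exp(x'b)/(1+exp(x'b)) by IRLS."""
    Xd = np.column_stack([np.ones(len(y)), X])        # add intercept column
    b = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        eta = Xd @ b
        p = 1.0 / (1.0 + np.exp(-eta))                # inverse logit link
        W = p * (1.0 - p)                             # Bernoulli variance weights
        z = eta + (y - p) / np.clip(W, 1e-10, None)   # working response
        b = np.linalg.solve(Xd.T @ (W[:, None] * Xd), Xd.T @ (W * z))
    return b

def deviance_residuals(X, y, b):
    """r_i = sign(y_i - p_i) * sqrt(-2[y_i log p_i + (1 - y_i) log(1 - p_i)])."""
    Xd = np.column_stack([np.ones(len(y)), X])
    p = np.clip(1.0 / (1.0 + np.exp(-(Xd @ b))), 1e-10, 1 - 1e-10)
    d = -2.0 * (y * np.log(p) + (1 - y) * np.log(1 - p))
    return np.sign(y - p) * np.sqrt(d)

# Illustrative data: true coefficients (0.5, 1, -1, 0.5)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
p_true = 1.0 / (1.0 + np.exp(-(0.5 + X @ np.array([1.0, -1.0, 0.5]))))
y = rng.binomial(1, p_true)
b_hat = fit_logit_irls(X, y)
r = deviance_residuals(X, y, b_hat)
```

The same residual definition applies regardless of whether the fitted success probability comes from a GLM, a single hidden layer neural network, or a deep learning model.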
Artificial neural networks (ANNs) are modeled on biological neural networks and imitate human brain activity through computer simulations [17-19]. An ANN uses the concept of weights to select the highest probability while inhibiting all other neurons [17-19]. The basic form of an ANN is a single layer feedforward type of connection among neurons. ANNs have an input layer and multiple hidden layers; the last hidden layer is connected to the output layer, which produces the outputs. Reference [20] proposed pattern recognition for bivariate process mean shifts using a feature-based ANN, and [21] proposed control chart pattern recognition using radial basis function (RBF) neural networks. Recently, Reference [22] proposed statistical process control with intelligence based on a deep learning model and reviewed neural network-based statistical process control. In this paper, we used the 'nnet' R package [23] for feed-forward neural networks with a single hidden layer and, for the deep learning model, the 'deepnet' R package [24] with the backpropagation (BP) algorithm for training feed-forward neural networks, using the 'nn.predict' command.
Based on the setup of [6], the r control charts for binary response data use GLM models with logit and probit link functions and neural network models, employing deviance residuals that are independent and asymptotically normally distributed with zero mean and unit variance, i.e., r_i ∼ N(0, 1) for i = 1, . . . , n. In this research, we chose the deviance residual for the GLM models with logit and probit link functions and the neural network model because the corresponding R packages have commands for producing the deviance residual, which makes it easy to compare the residuals from all of the models: the GLM-based models, the single hidden layer neural network model, and the multiple hidden layer deep learning model. Reference [25] proposed Shewhart control limits for the deviance residuals as follows:

LCL = r̄ − k s_r, CL = r̄, UCL = r̄ + k s_r, (4)

where r̄ and s_r are the mean and standard deviation of the deviance residuals, and k is determined by the false alarm probability α = 1/ARL0, where ARL0 is the average run length (ARL) when the process is in control. The ARL is a measure of the performance of control charts for monitoring a process.
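The link between k and ARL0 can be made concrete: under the in-control assumption r_i ∼ N(0, 1), the false-alarm probability of symmetric k-sigma limits is α = 2{1 − Φ(k)}, so ARL0 = 1/α. A minimal standard-library sketch (the function names are ours):

```python
from statistics import NormalDist

def arl0_for_k(k):
    """In-control ARL for k-sigma limits on r ~ N(0, 1):
    alpha = P(|r| > k) = 2 * (1 - Phi(k)),  ARL0 = 1 / alpha."""
    alpha = 2.0 * (1.0 - NormalDist().cdf(k))
    return 1.0 / alpha

def k_for_arl0(arl0):
    """Invert the relation: choose k so that alpha = 1 / ARL0."""
    alpha = 1.0 / arl0
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

# Classical 3-sigma Shewhart limits give an in-control ARL of about 370
print(round(arl0_for_k(3.0), 1))   # 370.4
```

This is why k = 3 is the conventional default: an in-control process signals a false alarm only about once every 370 observations.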

New Binary Response Statistical Process Control Procedure
Our r control chart for binary response uses the following statistical process control procedure for the deviance residuals:

1. Apply the BVS, PCA, or NLPCA to the input variables X and obtain the important selected variables or principal components.

2. Fit the binary response regression model using the binary response variable y and the important selected variables or the principal components through the probit link function, logit link function, and neural network or deep learning regression models, respectively.

3. Obtain the deviance residuals from each model.

4. Set the k value and obtain the lower and upper control limits of the r-charts using (4).
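The four-step procedure can be sketched end to end. The sketch below is a simplified Python illustration, not the paper's R implementation: it uses linear PCA for step 1 and a logit GLM fitted by Newton's method for step 2 (the paper's neural network and deep learning fits would take that model's place); the simulated data and the 90% variance cutoff are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Illustrative correlated inputs and a binary response ---
n, p = 400, 10
base = rng.normal(size=(n, 1))
X = base + 0.3 * rng.normal(size=(n, p))           # highly correlated columns
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - X[:, 1]))))

# Step 1: PCA on standardized X; keep PCs explaining 90% of the variance
Xs = (X - X.mean(0)) / X.std(0)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
ncomp = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), 0.90)) + 1
Z = Xs @ Vt[:ncomp].T                               # uncorrelated PC scores

# Step 2: fit a logit GLM on the PCs by Newton's method
Zd = np.column_stack([np.ones(n), Z])
b = np.zeros(Zd.shape[1])
for _ in range(25):
    prob = 1 / (1 + np.exp(-(Zd @ b)))
    W = prob * (1 - prob)
    b += np.linalg.solve(Zd.T @ (W[:, None] * Zd), Zd.T @ (y - prob))

# Step 3: deviance residuals from the fitted model
prob = np.clip(1 / (1 + np.exp(-(Zd @ b))), 1e-10, 1 - 1e-10)
r = np.sign(y - prob) * np.sqrt(-2 * (y * np.log(prob) + (1 - y) * np.log(1 - prob)))

# Step 4: k-sigma control limits on the residuals and out-of-control flags
k = 3.0
lcl, ucl = r.mean() - k * r.std(), r.mean() + k * r.std()
out_of_control = (r < lcl) | (r > ucl)
```

Points flagged in `out_of_control` would be the signals plotted outside the limits on the r chart.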

Illustrated Examples
Using the methods proposed in Section 2, we compare the efficiency of the proposed methods with simulated data and real data.

Simulation Study
With highly correlated and non-normal simulated data, we compare the r control charts for the binary response regression models introduced in Section 2. We therefore need to generate highly correlated and non-normal simulated data, denoting the input variables by X.
Because they relax the assumptions of normality, linearity, and independence, copulas have been popular in the research areas of biostatistics, econometrics, finance, and statistics over the last three decades. A copula is a statistical tool for describing the dependence structure of multivariate data. By using a copula, we can separate the marginal behavior of each random variable from the joint dependence between random variables. By Sklar's theorem, every joint distribution H can be expressed as H(x, y) = C(F(x), G(y)), where F and G are the marginal distribution functions and C is a copula. A bivariate copula is a function C : [0, 1]² → [0, 1] whose domain is the entire unit square and which satisfies three defining properties; see [26,27] for the definitions of the copula in detail. To construct a highly correlated dependence structure of input variables, we employed two Archimedean copula functions: the Clayton copula with dependence parameter equal to 3 and dimension equal to 30, and the Gumbel copula with dependence parameter equal to 30 and dimension equal to 30.
The Clayton and Gumbel copulas were chosen for generating the simulated data because both are asymmetric Archimedean copulas: the Clayton copula exhibits greater dependence in the negative tail than in the positive tail, whereas the Gumbel copula exhibits greater dependence in the positive tail than in the negative tail.
We generate a random sample of 1000 observations from each copula and assign the random sample to X as input variables. With each simulated random sample X, we set the coefficient β0 = −0.17186 and obtain the success probability P(y = 1) through the inverse logit function. Then, we generate the response variable y randomly from the Bernoulli distribution with probability P(y = 1) and sample size 1000. For the one ('1')-inflated case of binary response data, we added 0.1 to the probability, P(y = 1) + 0.1, and for the zero ('0')-inflated case of binary response data, we subtracted 0.1 from the probability, P(y = 1) − 0.1. P(y = 1) itself is used for the in-control case.
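The data-generating scheme can be sketched for the Clayton case. The Python code below, an illustration rather than the paper's R code, samples a 30-dimensional Clayton copula with θ = 3 by the Marshall-Olkin gamma-frailty construction and then draws the Bernoulli response through the inverse logit. β0 = −0.17186 is taken from the text, while the slope vector is a purely illustrative assumption; Gumbel sampling is omitted here because it requires positive stable variates.

```python
import numpy as np

rng = np.random.default_rng(7)

def clayton_sample(n, d, theta):
    """Draw n observations of a d-dimensional Clayton copula (theta > 0)
    by the Marshall-Olkin frailty method: U_j = (1 + E_j / V)^(-1/theta),
    with frailty V ~ Gamma(1/theta) and E_j ~ Exp(1)."""
    V = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))
    E = rng.exponential(size=(n, d))
    return (1.0 + E / V) ** (-1.0 / theta)

# Highly correlated uniforms: dependence parameter 3, dimension 30 (as in the text)
U = clayton_sample(1000, 30, theta=3.0)

# Binary response through the inverse logit; beta0 is from the text,
# the slope vector is a hypothetical choice for illustration
beta0 = -0.17186
beta = np.full(30, 0.1)
p1 = 1.0 / (1.0 + np.exp(-(beta0 + U @ beta)))
y = rng.binomial(1, p1)

# One-inflated / zero-inflated shifts used for the out-of-control cases
p_one = np.clip(p1 + 0.1, 0, 1)
p_zero = np.clip(p1 - 0.1, 0, 1)
```

Each column of U is marginally uniform, while neighboring columns are strongly positively dependent with heavier lower-tail dependence, which is exactly the asymmetric structure the simulation study targets.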
In each setup, we perform 1000 different replications of sample size of 1000. Table 1 shows the simulation results. With the simulated data, 70% of data were assigned to the training data and 30% of data were assigned to the test data.
We apply the BVS, PCA, and NLPCA to the input variables X and then fit the binary response regression model using the binary response variable y and the important selected variables, the principal components, or the whole data through the probit link function, logit link function, and neural network or deep learning regression models, respectively. We used the 'nnet' R package [23] for feed-forward neural networks with a single hidden layer with 30 neurons via the 'predict' command and, for the deep learning model, the 'deepnet' R package [24] with the backpropagation (BP) algorithm for training feed-forward neural networks with two hidden layers of (15, 15) neurons via the 'nn.predict' command.
The root mean squared error is

Root MSE = sqrt( (1/n) Σ_{i=1}^{n} (x_i − x̂_i)² ), (5)

where n is the number of observations, x_i is the actual observation, and x̂_i is the predicted value of x_i. Using the Root MSE Formula (5), we computed the Root MSE of each simulated in-control dataset of sample size 1000 with 1000 repetitions in Table 1. Remarkably, the r-charts based on the deep learning models with BVS, PCA, NLPCA, and whole data for both the Clayton and the Gumbel copulas are superior to all other cases in Table 1 in terms of the accuracy and precision measured by the mean, median, and interquartile range (IQR). From Figures 1 and 2, for the in-control, over-dispersion, and under-dispersion cases, we can observe that the residuals of the deep learning regression models with BVS, PCA, NLPCA, and whole data for both the Clayton and Gumbel copulas are superior to those of the neural network regression models with BVS, PCA, NLPCA, and whole data in Table 1 in terms of precision as measured by the spread (IQR). For the three cases of in-control, over-dispersion, and under-dispersion in Tables 2-7, we apply the BVS, PCA, and NLPCA to the input variables X and then fit the binary response regression model using the binary response variable y and the important selected variables or the principal components through the neural network or deep learning regression models, respectively. Using the deviance residuals from each model and (4) with k = 1, 2, 3, we compute the lower control limit (LCL) and upper control limit (UCL) for the process. The expected length of the confidence interval is computed as the average length of the control limits. The coverage probability is the proportion of the deviance residuals contained within the control limits. The lower and upper control limit values for the r-chart are calculated as the mean of the deviance residuals minus and plus one, two, and three of its standard deviations.
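Formula (5) is straightforward to compute; a minimal standard-library sketch (the function name and the toy inputs are ours):

```python
from math import sqrt

def root_mse(actual, predicted):
    """Root MSE = sqrt( (1/n) * sum_i (x_i - xhat_i)^2 )  -- Formula (5)."""
    n = len(actual)
    return sqrt(sum((a, ) and (a - b) ** 2 for a, b in zip(actual, predicted)) / n)

# Toy example: binary observations vs. fitted success probabilities
print(root_mse([1.0, 0.0, 1.0, 1.0], [0.8, 0.2, 0.6, 1.0]))  # ~0.2449
```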
Mainly, we compare the results for the deep learning regression model and the neural network regression model based on BVS, PCA, NLPCA, and whole data, because [6] showed that the neural network regression model (Nnet) outperformed the GLMs with probit and logit link functions based on PCA. We found that, for the in-control, one-inflated, and zero-inflated cases, the expected lengths of the confidence interval for the deep learning regression model (DL) based on BVS, PCA, NLPCA, and whole data are shorter than in all other cases of the Nnet in Tables 2-5, while, in terms of the coverage probability, the DL remains overall higher than the Nnet. In terms of the ARLs, the coverage probability, and the expected length of the confidence interval, we note that the r-chart based on the DL with whole data for monitoring observations is about the same as the r-chart based on the DL with BVS, PCA, and NLPCA.

Real Data Analysis
For the real data application, we used the Wisconsin breast cancer data in the R package 'mlbench' [23]. The objective of collecting the data was to identify benign or malignant classes. Samples arrived periodically as Dr. Wolberg reported his clinical cases, so the database reflects this chronological grouping of the data; this grouping information has been removed from the data itself. The data contain 11 variables, the last two being (10) Mitoses and (11) Class: one character identifier, nine ordered or nominal variables, and one target class. By using the R package 'missForest' [28], we imputed the missing data in the Wisconsin breast cancer data. We set the target variable (y) to be Class ("malignant" = 0, "benign" = 1) and used the 9 input variables other than the Id and Class variables in the Wisconsin breast cancer data. We again used the 'nnet' R package [23] with a single hidden layer with 30 neurons via the 'predict' command and, for the deep learning model, the 'deepnet' R package [24] with two hidden layers of (15, 15) neurons via the 'nn.predict' command on the Wisconsin breast cancer data. Using the Root MSE Formula (5), we computed the Root MSE of each random sample of size 478 (≈ 0.7 × 683, the total number of observations) with 1000 repetitions in Table 6. It confirms that the r-charts based on the DL models with BVS, PCA, NLPCA, and whole data are superior to all the Nnet models in Table 6 in terms of the accuracy and precision measured by the mean and interquartile range (IQR).
The expected lengths of the confidence interval for the DL based on BVS, PCA, NLPCA, and the whole real data are shorter than in all other cases of the Nnet in Table 7, while, in terms of the coverage probability, the DL remains overall higher than the Nnet. In terms of the ARLs, the coverage probability, and the expected length of the confidence interval, we note that the r-chart based on the DL with the whole real data is about the same as the r-chart based on the DL with BVS, PCA, and NLPCA.
Therefore, from the simulation study and the real data analysis, we confirmed that the DL-based r control charts for binary response data with BVS, PCA, NLPCA, and whole data are superior to the Nnet-based r control charts for binary response data with BVS, PCA, NLPCA, and whole data in terms of accuracy, precision, coverage probability, and expected length of the confidence interval.

Conclusions
In this research, we have presented the binary response DL regression model-based statistical process control r-charts for dispersed binary asymmetrical data with multicollinearity among input variables. We have demonstrated the flexibility and performance of the proposed DL method by running simulations for various circumstances: in-control, one-inflated, and zero-inflated dispersion data. With both simulated data and real data, our proposed DL methods based on BVS, PCA, NLPCA, and whole data have shown superior performance compared with the binary response regression model-based statistical process control r-charts of the GLM with probit and logit link functions and the Nnet based on BVS, PCA, NLPCA, and whole data. We also showed that the binary response DL regression model-based r-charts do not need dimension reduction methods such as BVS, PCA, and NLPCA, because the results with these dimension reduction methods are essentially the same as the results without them. Our proposed deep learning approach is superior in handling dispersed binary asymmetrical data with multicollinearity among explanatory variables. The conclusion of this research is that, for high-dimensional correlated multivariate covariate data, the binary control chart by DL is a good statistical process control method. Our proposed binary control chart by DL can be applied to improve the quality control of visual fault detection in medical equipment devices such as a full-body X-ray scanner, a brain functional magnetic resonance imaging scanner, or a computed tomography (CT) scanner for detecting cancers. Our future research will be a general version of DL-based SPC for categorical data, continuous data, or mixed categorical and continuous data.
We will also apply our proposed method to a multi-stage SPC for binary outcome variables given the covariates.