A Novel Semi-Supervised Feature Extraction Method and Its Application in Automotive Assembly Fault Diagnosis Based on Vision Sensor Data

The fault diagnosis of dimensional variation plays an essential role in the production of automotive bodies. However, it is difficult to identify faults from a small number of labeled samples using traditional supervised learning methods. The present study proposes a novel feature extraction method named semi-supervised complete kernel Fisher discriminant analysis (SS-CKFDA) and, based on this method, introduces a new fault diagnosis flow for automotive assembly. SS-CKFDA combines traditional complete kernel Fisher discriminant analysis (CKFDA) with semi-supervised learning: it adjusts the Fisher criterion using the global data structure extracted from a large number of unlabeled samples. When labeled samples are scarce, the global structure present in the measured data can effectively improve the extraction of the projection vectors. Experimental results on Tennessee Eastman Process (TEP) data demonstrate that the proposed method improves diagnostic performance compared with other Fisher discriminant algorithms. Finally, experimental results on optical coordinate data show that the method can be applied to the automotive assembly process and achieves better diagnostic performance.


Introduction
As is well known, an automotive body is a complex structure consisting of a large number of thin stamped sheets with different geometric shapes welded together. Dimensional variation produced during the welding process affects not only the appearance and performance of automotive products but also the personal and property safety of users. Thus, the fault diagnosis of dimensional variation is vital to improving vehicle quality. In recent years, to meet the needs of different customer groups, automobile manufacturers have had to develop more detailed product lines, which inevitably shortens the product development cycle. This places higher demands on fault diagnosis, which must be fast, accurate and intelligent.
The traditional source of coordinate data is the coordinate measuring machine (CMM), which has been widely installed in welding workshops to measure the dimensional features of automotive bodies [1]. Although a CMM offers high accuracy and reliability, it also has obvious disadvantages, such as low efficiency and strict environmental requirements. According to statistics, fewer than 3% of automotive bodies can be measured in most workshops [2], which is far from sufficient for quick fault discovery, let alone fault diagnosis. In recent years, with the rapid development of optical imaging devices, various vision-based systems have been widely applied in modern industries [3,4]. In automotive welding workshops, more and more hybrid visual inspection systems have been deployed on production lines for 100% online inspection [4]. By combining the high flexibility of industrial robots with high-speed visual inspection technology, a hybrid visual inspection system can measure up to 80 points per minute, which is much more efficient than a CMM and provides favorable conditions for data-driven fault diagnosis.
One important class of fault diagnosis methods for assembly dimensional variation is the model-based method, which diagnoses faults by constructing a mathematical model and evaluating the residual between estimates and measurements. The classical model-based method is the state space modeling approach [5,6]. In this approach, dimensional variation propagation is described by a state space model, together with an observation equation that relates the observation vector to the state vector. State space modeling thus provides a way to describe variation propagation over the whole body assembly. However, as the number of welding stations increases, the modeling process becomes extremely difficult and may even cause the diagnosis to fail. Unlike model-based methods, data-driven methods extract the necessary process information directly from large amounts of recorded process data. Hu was the first to introduce principal component analysis (PCA) into fixture fault diagnosis, extracting the deviation pattern and identifying the fault source by comparing this pattern with potential failure modes. Jang and Yang [7] adopted artificial neural networks (ANNs) to deal with noisy or missing data. Lian [8] extracted the variations of the measured data by correlation analysis and located the root cause of faults through the maximal tree method. For small data sets, Liu [2] established a Bayesian network model that successfully overcame this limitation.
As a well-known feature extraction method, Fisher discriminant analysis (FDA) and its extensions have been applied in various fields [9][10][11][12]. Chen [9] introduced local FDA and support vector machines for hepatitis diagnosis. Wen [10] applied weighted KFDA to extract features from E-nose data and achieved good results. Direct linear discriminant analysis was proposed for view-angle-invariant gait recognition [11]. Unlike PCA, which is an unsupervised method, FDA makes full use of sample label information and is therefore generally regarded as superior to PCA for classification [13]. However, these supervised methods need sufficient labeled samples to achieve good results, a requirement that is hard to satisfy in practice. In the automobile industry especially, labeling fault data by cause is costly and time-consuming, whereas unlabeled data are easy to obtain. For example, if a fixture failure occurs at one assembly station, the corresponding dimensional deviation appears only after a delay, which makes it difficult even for professional technicians to trace back.
To address the shortage of labeled data, the present study proposes a novel semi-supervised complete kernel Fisher discriminant analysis method for fault diagnosis, called SS-CKFDA. The basic idea of SS-CKFDA is to extract the global structure from unlabeled samples to enhance the generalization performance of FDA. Meanwhile, the kernel method is retained to handle nonlinearity by mapping the data from a linearly inseparable input space into a high-dimensional feature space, where they can become linearly separable. The contributions of the present study are summarized as follows: (1) A novel semi-supervised feature extraction method, SS-CKFDA, is proposed. To confirm its effectiveness, a simulation experiment based on TEP data was performed, and the results were compared with three other FDA-based methods. (2) SS-CKFDA is introduced to mine fault information from the historical data of the hybrid visual inspection system, and on this basis a novel fault diagnosis flow is put forward for automotive body assembly. (3) A real experimental system for the automotive body assembly process was realized, and the results show that the proposed method can greatly enhance diagnosis accuracy, especially when few labeled data are available. (4) Two representative classifiers, the k-nearest neighbor (KNN) classifier and the minimum distance (MD) classifier, are also discussed.
The present study is organized as follows: Section 2 presents the details of the proposed SS-CKFDA algorithm for fault diagnosis. Section 3 introduces the two experimental datasets used in the present study. In Section 4, simulations on TEP data are carried out to verify the effectiveness of the proposed SS-CKFDA algorithm and to analyze the effects of the parameters and classifiers, after which the experimental results on the automotive assembly data are analyzed. Finally, conclusions are presented in Section 5.

Principle of the Complete Kernel Fisher Discriminant Analysis (CKFDA)
The CKFDA algorithm was proposed within a novel framework, KPCA plus LDA, which makes full use of the irregular discriminant information that exists in the null space of the within-class scatter matrix [14]. The irregular discriminant subspace, which is neglected by earlier KFDA algorithms, is as powerful as the regular discriminant subspace for feature recognition. By combining discriminant vectors from the regular and irregular subspaces, CKFDA can outperform other FDA-based algorithms. Although CKFDA was proposed as an effective tool for the "small sample size" problem, it has also been shown to be suitable for "large sample size" problems. The principle of CKFDA is as follows. First, KPCA [15] is performed to transform the original input space $\Re^n$ into an m-dimensional space $\Re^m$. We therefore first describe the KPCA algorithm in detail.
Assume that a set of M training samples $x_1, x_2, \ldots, x_M$ in the space $\Re^n$ is mapped into a feature space $H$ by a given nonlinear mapping $\Phi: x \mapsto \Phi(x)$. The covariance matrix in the feature space is then given by:
$$S_t^{\Phi} = \frac{1}{M}\sum_{j=1}^{M}\left(\Phi(x_j) - m_0^{\Phi}\right)\left(\Phi(x_j) - m_0^{\Phi}\right)^{T},$$
where $m_0^{\Phi}$ is the mean of the mapped samples. In addition, every eigenvector $\eta$ of $S_t^{\Phi}$ can be linearly expanded by the mapped samples:
$$\eta = \sum_{j=1}^{M} a_j\,\Phi(x_j).$$
To obtain the expansion coefficients, denote $Q = [\Phi(x_1), \ldots, \Phi(x_M)]$ and the Gram matrix $R = Q^{T}Q$, whose elements can be computed with the kernel trick:
$$R_{ij} = \Phi(x_i)^{T}\Phi(x_j) = k(x_i, x_j).$$
The kernel function can be any function that satisfies Mercer's condition [16]. The most common kernel functions are the Gaussian, polynomial and sigmoid kernels.
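Before the following operations, the Gram matrix R should be centered [16]. The orthonormal eigenvectors $\gamma_1, \gamma_2, \ldots, \gamma_m$ of R corresponding to the m largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$ are calculated. The eigenvectors $\eta_j$ of $S_t^{\Phi}$ are then obtained by:
$$\eta_j = \frac{1}{\sqrt{\lambda_j}}\,Q\gamma_j, \quad j = 1, \ldots, m.$$
Lastly, the KPCA-transformed feature vector $y = (y_1, y_2, \ldots, y_m)^{T}$ is obtained by:
$$y_j = \eta_j^{T}\Phi(x) = \frac{1}{\sqrt{\lambda_j}}\,\gamma_j^{T}\left[k(x_1, x), k(x_2, x), \ldots, k(x_M, x)\right]^{T}.$$
Next, LDA is performed in the KPCA-transformed space, and the regular and irregular discriminant features are extracted. Suppose that there are c pattern classes and n training samples, with $n_k$ samples in class k. The between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ in the KPCA-transformed space can be defined through the following pairwise forms [17]:
$$S_b = \frac{1}{2}\sum_{i,j=1}^{n} W^{b}_{i,j}\,(y_i - y_j)(y_i - y_j)^{T} \qquad (5)$$
$$S_w = \frac{1}{2}\sum_{i,j=1}^{n} W^{w}_{i,j}\,(y_i - y_j)(y_i - y_j)^{T} \qquad (6)$$
where:
$$W^{b}_{i,j} = \begin{cases} 1/n - 1/n_k & \text{if } y_i \in \text{class } k \text{ and } y_j \in \text{class } k \\ 1/n & \text{otherwise} \end{cases} \qquad (7)$$
and:
$$W^{w}_{i,j} = \begin{cases} 1/n_k & \text{if } y_i \in \text{class } k \text{ and } y_j \in \text{class } k \\ 0 & \text{otherwise} \end{cases}$$
$W^{b}_{i,j}$ and $W^{w}_{i,j}$ are the corresponding coefficient matrices of $S_b$ and $S_w$. Assume that $\alpha_1, \ldots, \alpha_m$ are the orthonormal eigenvectors of $S_w$ and that the first q of them correspond to positive eigenvalues, where $q = \operatorname{rank}(S_w)$. Then the irregular space of $S_w$ (its null space) is the subspace $\Theta_w = \operatorname{span}\{\alpha_{q+1}, \ldots, \alpha_m\}$, and its orthogonal complement $\Theta_w^{\perp} = \operatorname{span}\{\alpha_1, \ldots, \alpha_q\}$ is the regular space.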
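The KPCA step and the pairwise scatter matrices can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the Gaussian kernel scaling $\exp(-\lVert a-b\rVert^2/(2\sigma^2))$, the function names `gaussian_gram`, `kpca_fit`, `kpca_transform` and `pairwise_scatter`, and the toy data are assumptions. The pairwise sums are evaluated through the identity $\tfrac{1}{2}\sum_{i,j} W_{ij}(y_i-y_j)(y_i-y_j)^T = Y^T(D-W)Y$, where D is the diagonal matrix of row sums of the symmetric weight matrix W.

```python
import numpy as np


def gaussian_gram(A, B, sigma):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).

    The 2*sigma^2 scaling is an assumption; the text only names a kernel parameter sigma.
    """
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def kpca_fit(X, sigma, m):
    """Centre the Gram matrix R and keep its m leading eigenpairs (gamma_j, lambda_j)."""
    R = gaussian_gram(X, X, sigma)
    col_mean = R.mean(axis=0)
    Rc = R - col_mean[None, :] - col_mean[:, None] + R.mean()  # centred R (R is symmetric)
    lam, gam = np.linalg.eigh(Rc)                              # ascending eigenvalues
    top = np.argsort(lam)[::-1][:m]
    return dict(X=X, sigma=sigma, col_mean=col_mean, grand_mean=R.mean(),
                lam=lam[top], gam=gam[:, top])


def kpca_transform(model, X_new):
    """y_j = gamma_j^T k_c(x) / sqrt(lambda_j), centring k(x) consistently with R."""
    K = gaussian_gram(X_new, model["X"], model["sigma"])
    Kc = (K - K.mean(axis=1, keepdims=True)
          - model["col_mean"][None, :] + model["grand_mean"])
    return Kc @ model["gam"] / np.sqrt(model["lam"])


def pairwise_scatter(Y, labels):
    """S_b and S_w from the pairwise forms with weights W^b and W^w."""
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    n_k = counts[inverse][:, None]               # class size of sample i, one per row
    Wb = np.where(same, 1.0 / n - 1.0 / n_k, 1.0 / n)
    Ww = np.where(same, 1.0 / n_k, 0.0)

    def scatter(W):
        # 0.5 * sum_ij W_ij (y_i - y_j)(y_i - y_j)^T == Y^T (D - W) Y, D = diag(row sums)
        return Y.T @ (np.diag(W.sum(axis=1)) - W) @ Y

    return scatter(Wb), scatter(Ww)


rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 10)
X = rng.normal(size=(30, 5)) + labels[:, None]   # toy data: three shifted classes
model = kpca_fit(X, sigma=2.0, m=6)
Y = kpca_transform(model, X)                     # KPCA features of the training samples
Sb, Sw = pairwise_scatter(Y, labels)
```

With these weights, $S_w$ coincides with the classical within-class scatter and $S_b$ with the classical between-class scatter, which makes a convenient sanity check on any implementation.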
In the regular space, let $P_1 = (\alpha_1, \ldots, \alpha_q)$. The optimal regular projection vectors are obtained by maximizing the Fisher criterion
$$J(\xi) = \frac{\xi^{T}\tilde{S}_b\,\xi}{\xi^{T}\tilde{S}_w\,\xi}, \qquad \tilde{S}_b = P_1^{T}S_bP_1, \quad \tilde{S}_w = P_1^{T}S_wP_1,$$
where $\arg\max_{\xi} J(\xi)$ represents the optimal vectors of the Fisher criterion. This optimization problem is equivalent to the generalized eigenvalue problem $\tilde{S}_b\,\xi = \lambda\,\tilde{S}_w\,\xi$. After computing $u_1, \ldots, u_d$ $(d \le c - 1)$, the eigenvectors corresponding to the d largest positive eigenvalues of this problem, the optimal regular discriminant feature vector is obtained as follows:
$$z^{1} = U^{T}P_1^{T}y, \qquad U = (u_1, \ldots, u_d). \qquad (9)$$
Similarly, in the irregular space, let $P_2 = (\alpha_{q+1}, \ldots, \alpha_m)$. Because $\xi^{T}P_2^{T}S_wP_2\,\xi = 0$ there, the criterion is converted into $\hat{J}(\xi) = \xi^{T}\hat{S}_b\,\xi$ with $\lVert\xi\rVert = 1$, where $\hat{S}_b = P_2^{T}S_bP_2$. The optimal vectors $v_1, \ldots, v_d$ $(d \le c - 1)$ of $\hat{J}(\xi)$ are the orthonormal eigenvectors of $\hat{S}_b$ corresponding to its d largest eigenvalues. The optimal irregular discriminant feature vector is then obtained as follows:
$$z^{2} = V^{T}P_2^{T}y, \qquad V = (v_1, \ldots, v_d).$$
Finally, the two kinds of discriminant features are fused for classification. The normalized distance between a sample $z = (z^{1}, z^{2})$ and a training sample $z_i = (z_i^{1}, z_i^{2})$ is defined as:
$$g(z, z_i) = \theta\,\frac{\lVert z^{1} - z_i^{1}\rVert}{\sum_{j=1}^{M}\lVert z^{1} - z_j^{1}\rVert} + \frac{\lVert z^{2} - z_i^{2}\rVert}{\sum_{j=1}^{M}\lVert z^{2} - z_j^{2}\rVert},$$
where θ is the fusion coefficient, which determines the weight of the regular discriminant information at the decision level. In this study, θ was set to 1. This completes the feature extraction process of CKFDA: the original features are reduced to two kinds of discriminant features, $z^{1}$ and $z^{2}$, and the normalized distance can be used for subsequent classification with a distance classifier.
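The regular and irregular projections and the normalized distance can be sketched as below, assuming $S_b$ and $S_w$ have already been computed in the KPCA space. Because $P_1$ consists of eigenvectors of $S_w$, $P_1^T S_w P_1$ is diagonal, so the sketch whitens it and solves an ordinary symmetric eigenproblem instead of calling a generalized solver. The function names, the relative rank tolerance `tol` and the toy data (9 samples in 10 dimensions, so that $S_w$ is singular) are hypothetical.

```python
import numpy as np


def ckfda_projections(Sb, Sw, d, tol=1e-10):
    """Regular (P1 U) and irregular (P2 V) projection matrices of CKFDA."""
    w, A = np.linalg.eigh(Sw)
    order = np.argsort(w)[::-1]
    w, A = w[order], A[:, order]
    q = int(np.sum(w > tol * w.max()))           # numerical rank of S_w
    P1, P2 = A[:, :q], A[:, q:]

    # Regular space: P1^T Sw P1 = diag(w[:q]); whitening with s = w^(-1/2) turns
    # the generalized problem into an ordinary symmetric eigenproblem.
    s = 1.0 / np.sqrt(w[:q])
    lr, Vr = np.linalg.eigh(s[:, None] * (P1.T @ Sb @ P1) * s[None, :])
    U = s[:, None] * Vr[:, np.argsort(lr)[::-1][:d]]

    # Irregular space: maximise xi^T (P2^T Sb P2) xi subject to ||xi|| = 1.
    if P2.shape[1] == 0:
        V = np.zeros((0, 0))
    else:
        li, Vi = np.linalg.eigh(P2.T @ Sb @ P2)
        V = Vi[:, np.argsort(li)[::-1][:d]]
    return P1 @ U, P2 @ V


def normalized_distance(z1, z2, Z1, Z2, theta=1.0):
    """g(z, z_i) against every row of (Z1, Z2); an empty or degenerate part is skipped."""
    g = np.zeros(len(Z1))
    d1 = np.linalg.norm(Z1 - z1, axis=1)
    d2 = np.linalg.norm(Z2 - z2, axis=1)
    if d1.sum() > 0:
        g += theta * d1 / d1.sum()
    if d2.sum() > 0:
        g += d2 / d2.sum()
    return g


rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 3)                 # 9 samples in a 10-D space
Y = rng.normal(size=(9, 10)) + 3.0 * labels[:, None]
mu = Y.mean(axis=0)
Sb, Sw = np.zeros((10, 10)), np.zeros((10, 10))
for k in range(3):
    Yk = Y[labels == k]
    mk = Yk.mean(axis=0)
    Sb += len(Yk) * np.outer(mk - mu, mk - mu)
    Sw += (Yk - mk).T @ (Yk - mk)
W1, W2 = ckfda_projections(Sb, Sw, d=2)
Z1, Z2 = Y @ W1, Y @ W2                          # z^1 = U^T P1^T y, z^2 = V^T P2^T y
g = normalized_distance(Z1[0], Z2[0], Z1, Z2)
```

Samples of one class collapse to a single point in the irregular space, because every direction in the null space of $S_w$ removes all within-class variation; the discriminant information there comes entirely from the between-class scatter.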

Semi-Supervised Learning
Traditional supervised learning uses only labeled data for training and therefore tends to perform poorly due to overfitting when labeled samples are inadequate. In some cases, such as automotive assembly quality monitoring, labeled samples are hard to obtain, while unlabeled data are abundant. Semi-supervised learning is therefore an effective way to reduce human labor and improve identification accuracy. The goal of semi-supervised learning is to change the learning behavior of traditional supervised methods by combining labeled data with unlabeled data, and to design algorithms that take advantage of this combination [18].
SELF is an effective graph-based semi-supervised learning algorithm [17][18][19]. The basic idea of SELF is that the information extracted from the limited labeled samples can be adjusted using the global structure of the unlabeled samples. The algorithm flow of SELF is as follows.
Assume that the input samples are $X = \{X_L, X_U\}$, where $X_L$ denotes the labeled samples, which correspond to the samples of the c classes above, and $X_U$ denotes the unlabeled samples. SELF defines the regularized between-class and within-class scatter matrices as follows:
$$S_{rb} = (1 - \beta)\,S_b + \beta\,S_t \qquad (12)$$
$$S_{rw} = (1 - \beta)\,S_w + \beta\,I \qquad (13)$$
where $S_b$ and $S_w$ are the between-class and within-class scatter matrices of the labeled samples $X_L$ given by Equations (5) and (6), I is the identity matrix, and
$$S_t = \frac{1}{2}\sum_{i,j=1}^{N} W^{t}_{i,j}\,(y_i - y_j)(y_i - y_j)^{T}$$
is the total scatter matrix of all N samples in X (labeled and unlabeled), with $W^{t}_{i,j} = 1/N$. β is a trade-off parameter whose value lies in [0, 1].
In SELF, $S_t$ represents the global structure, and its influence is controlled by the parameter β. When β = 0, SELF reduces to FDA, which is a supervised method; when β = 1, SELF reduces to PCA, which is an unsupervised method. When β ∈ (0, 1), SELF is a semi-supervised method.
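The SELF regularization can be sketched in a few lines of NumPy. This sketch assumes the form $S_{rb} = (1-\beta)S_b + \beta S_t$ and $S_{rw} = (1-\beta)S_w + \beta I$ with $W^t_{ij} = 1/N$ over all N samples; it computes $S_b$ and $S_w$ in their classical (mathematically equivalent) forms, and the names `class_scatter`, `total_scatter` and `self_regularized` are hypothetical.

```python
import numpy as np


def class_scatter(Y, labels):
    """Classical S_b and S_w of the labeled samples (equal to the pairwise forms)."""
    mu = Y.mean(axis=0)
    Sb = np.zeros((Y.shape[1], Y.shape[1]))
    Sw = np.zeros_like(Sb)
    for k in np.unique(labels):
        Yk = Y[labels == k]
        mk = Yk.mean(axis=0)
        Sb += len(Yk) * np.outer(mk - mu, mk - mu)
        Sw += (Yk - mk).T @ (Yk - mk)
    return Sb, Sw


def total_scatter(Y):
    """S_t with W^t_ij = 1/N reduces to sum_i (y_i - mean)(y_i - mean)^T."""
    Yc = Y - Y.mean(axis=0)
    return Yc.T @ Yc


def self_regularized(Y_lab, labels, Y_all, beta):
    """Assumed SELF form: S_rb = (1-beta) S_b + beta S_t, S_rw = (1-beta) S_w + beta I."""
    Sb, Sw = class_scatter(Y_lab, labels)
    St = total_scatter(Y_all)
    I = np.eye(Y_all.shape[1])
    return (1 - beta) * Sb + beta * St, (1 - beta) * Sw + beta * I


rng = np.random.default_rng(2)
Y_lab = rng.normal(size=(12, 4))
labels = np.repeat([0, 1, 2], 4)
Y_all = np.vstack([Y_lab, rng.normal(size=(40, 4))])   # labeled + unlabeled samples
Srb, Srw = self_regularized(Y_lab, labels, Y_all, beta=0.5)
```

At β = 1, $S_{rw}$ is the identity, so maximizing $\xi^T S_{rb}\xi$ under $\lVert\xi\rVert = 1$ picks the leading eigenvectors of $S_t$, i.e. the PCA directions, which matches the limiting case described above.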

The SS-CKFDA Algorithm
Based on the above idea of semi-supervised learning, a novel semi-supervised CKFDA algorithm (SS-CKFDA), which combines CKFDA with SELF, is proposed. Although CKFDA can achieve good performance with labeled samples, it cannot exploit the information contained in unlabeled samples. From the description of CKFDA, it can be observed that the double discriminant subspaces (DDS), i.e., the regular and irregular subspaces spanned in the CKFDA feature space, are learned only from the labeled samples and can therefore be inaccurate when labeled samples are insufficient. By introducing semi-supervised learning in the feature space, the DDS can be corrected with abundant unlabeled samples. This is the basic idea of SS-CKFDA. The steps of the proposed SS-CKFDA algorithm are as follows:
Step 1: KPCA is performed on both labeled and unlabeled samples, yielding the KPCA-transformed data $Y = \{Y_L, Y_U\}$ in $\Re^m$.
Step 2: SELF is performed in $\Re^m$. First, the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ are constructed from the labeled data $Y_L$ by Equations (5) and (6). Then, the regularized between-class scatter matrix $S_{rb}$ and the regularized within-class scatter matrix $S_{rw}$ are constructed from $S_b$, $S_w$ and Y by Equations (12) and (13).
Step 3: The orthonormal eigenvectors $\alpha_1, \ldots, \alpha_m$ of $S_{rw}$ are calculated, where the first q of them ($q = \operatorname{rank}(S_{rw})$) correspond to positive eigenvalues.
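Step 4: The regular discriminant features (Equation (9)) and the irregular discriminant features are extracted in the regular and irregular subspaces of $S_{rw}$, as in CKFDA, with $S_{rb}$ and $S_{rw}$ taking the place of $S_b$ and $S_w$.
Step 5: The regular and irregular discriminant features are fused using Equation (11) for classification.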
The proposed SS-CKFDA algorithm has two advantages. First, it uses a kernel method to capture nonlinear structure: a nonlinear mapping takes the input data into an implicit feature space, where the data obtain a nonlinear representation, and the kernel trick avoids computing the implicit features explicitly by evaluating their dot products through a kernel function. Second, SELF is applied in the DDS, so the global structure of the unlabeled samples is incorporated into the Fisher criterion, which mitigates the overfitting caused by the lack of labeled samples.
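Steps 1 to 4 can be combined into one hypothetical class, sketched below under stated assumptions: a Gaussian kernel with $2\sigma^2$ scaling, classical (equivalent) forms of $S_b$ and $S_w$, and the SELF regularization $S_{rb} = (1-\beta)S_b + \beta S_t$, $S_{rw} = (1-\beta)S_w + \beta I$. The class name `SSCKFDA`, its parameters and defaults are illustrative, not the authors' code; Step 5 is left to a distance classifier. With this form of $S_{rw}$, any β > 0 makes $S_{rw}$ full rank, so the computed irregular subspace is empty; the code handles that case explicitly and then returns zero irregular features.

```python
import numpy as np


def rbf(A, B, sigma):
    # Gaussian kernel; the exp(-d^2 / (2 sigma^2)) scaling is an assumption
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


class SSCKFDA:
    """Hypothetical sketch of Steps 1-4; names and defaults are illustrative only."""

    def __init__(self, sigma=7.0, beta=0.5, m=10, tol=1e-10):
        self.sigma, self.beta, self.m, self.tol = sigma, beta, m, tol

    def _kpca(self, X_new):
        K = rbf(X_new, self.X, self.sigma)
        Kc = K - K.mean(axis=1, keepdims=True) - self.col_mean + self.grand_mean
        return Kc @ self.gam / np.sqrt(self.lam)

    def fit(self, X_lab, y_lab, X_unl):
        # Step 1: KPCA on labeled and unlabeled samples together
        self.X = np.vstack([X_lab, X_unl])
        R = rbf(self.X, self.X, self.sigma)
        self.col_mean, self.grand_mean = R.mean(axis=0), R.mean()
        Rc = R - self.col_mean - self.col_mean[:, None] + self.grand_mean
        lam, gam = np.linalg.eigh(Rc)
        top = np.argsort(lam)[::-1][: self.m]
        self.lam, self.gam = lam[top], gam[:, top]
        Y = self._kpca(self.X)
        Y_lab = Y[: len(X_lab)]
        # Step 2: scatter matrices of Y_L, then the SELF-regularised versions
        mu = Y_lab.mean(axis=0)
        Sb, Sw = np.zeros((self.m, self.m)), np.zeros((self.m, self.m))
        classes = np.unique(y_lab)
        for k in classes:
            Yk = Y_lab[y_lab == k]
            mk = Yk.mean(axis=0)
            Sb += len(Yk) * np.outer(mk - mu, mk - mu)
            Sw += (Yk - mk).T @ (Yk - mk)
        Yc = Y - Y.mean(axis=0)
        Srb = (1 - self.beta) * Sb + self.beta * (Yc.T @ Yc)
        Srw = (1 - self.beta) * Sw + self.beta * np.eye(self.m)
        # Step 3: eigenvectors of S_rw, split into regular / irregular parts
        w, A = np.linalg.eigh(Srw)
        order = np.argsort(w)[::-1]
        w, A = w[order], A[:, order]
        q = int(np.sum(w > self.tol * w.max()))
        P1, P2 = A[:, :q], A[:, q:]
        # Step 4: regular (whitened generalised eigenproblem) and irregular projections
        d = len(classes) - 1
        s = 1.0 / np.sqrt(w[:q])
        lr, Vr = np.linalg.eigh(s[:, None] * (P1.T @ Srb @ P1) * s)
        self.W1 = P1 @ (s[:, None] * Vr[:, np.argsort(lr)[::-1][:d]])
        if P2.shape[1] > 0:
            li, Vi = np.linalg.eigh(P2.T @ Srb @ P2)
            self.W2 = P2 @ Vi[:, np.argsort(li)[::-1][:d]]
        else:
            self.W2 = np.zeros((self.m, 0))      # empty irregular subspace
        self.Z1, self.Z2, self.y_lab = Y_lab @ self.W1, Y_lab @ self.W2, y_lab
        return self

    def transform(self, X_new):
        """Return the regular and irregular discriminant features (z^1, z^2)."""
        Y = self._kpca(X_new)
        return Y @ self.W1, Y @ self.W2


rng = np.random.default_rng(3)
labels = np.repeat([0, 1, 2], 5)
X_lab = rng.normal(size=(15, 4)) + 2.0 * labels[:, None]
X_unl = rng.normal(size=(45, 4)) + 2.0 * rng.integers(0, 3, size=45)[:, None]
model = SSCKFDA(sigma=3.0, beta=0.5, m=8).fit(X_lab, labels, X_unl)
z1, z2 = model.transform(X_lab)
```

Setting `beta=0.0` turns the sketch back into plain CKFDA on KPCA features (with KPCA still fitted on all samples), which is a simple way to compare the two under identical preprocessing.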

Comparison of SS-CKFDA with Other FDA Algorithms
FDA and SELF are implemented in the input space. Compared with FDA, SELF uses unlabeled data to improve its learning ability. However, as shown in Figure 1a, the improvement offered by SELF is limited when dealing with nonlinear data. CKFDA and SS-CKFDA are nonlinear extensions of FDA and SELF, respectively; in essence, they implement FDA and SELF in a kernel feature space. Through the nonlinear mapping $\Phi(X)$, sample data that are linearly inseparable in the input space can become separable in the feature space, as shown in Figure 1b. By using the global structure information extracted from the unlabeled data, SS-CKFDA can find a better discriminant vector than CKFDA, one that separates the unlabeled data well.

SS-CKFDA for Fault Diagnosis
A detailed flowchart of the SS-CKFDA modeling method for online fault diagnosis is presented in Figure 2. For a new data sample $x_{\text{new}}$, the first step is to extract its discriminant feature $z_{\text{new}}$ using the SS-CKFDA model. The next step is to conduct a discriminant analysis on $z_{\text{new}}$ to determine which fault type $x_{\text{new}}$ belongs to. In the present study, two distance classifiers, the KNN classifier and the MD classifier, were designed for fault classification.
The KNN classifier, a common nonparametric classification method, has been widely studied and applied in pattern recognition, machine learning and data mining because it is intuitive, simple, effective and easy to implement [20]. It first determines the k nearest neighbors of the sample to be classified in the training set and then assigns the sample to the class that receives the majority vote among those neighbors. The distance metric is a key part of the KNN classifier and plays an important role in its performance. In SS-CKFDA, the normalized distance $g(z, z_i)$ defined by Equation (16) is used as the distance metric. The MD classifier is another simple and easy-to-use distance classifier. First, the mean vector $u_i = (u_i^{1}, u_i^{2})$ of class i in the training samples is calculated. Then, a new online sample z is assigned to class k if $g(z, u_k) = \min_i g(z, u_i)$.
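The two classifiers can be sketched as follows, assuming the regular and irregular features of the training samples are already available as arrays `Z1` and `Z2`. Normalizing the MD distance over the class mean vectors (rather than over the training samples) is an assumption, as are the function names, the tie-breaking rule (ties go to the smallest label) and the toy data.

```python
import numpy as np


def normalized_distance(z1, z2, R1, R2, theta=1.0):
    """g(z, r_i) = theta*||z1 - r1_i||/sum_j||z1 - r1_j|| + ||z2 - r2_i||/sum_j||z2 - r2_j||."""
    g = np.zeros(len(R1))
    for z, R, weight in ((z1, R1, theta), (z2, R2, 1.0)):
        d = np.linalg.norm(R - z, axis=1)
        if d.sum() > 0:                          # skip an empty or degenerate part
            g += weight * d / d.sum()
    return g


def knn_predict(z1, z2, Z1, Z2, labels, k=9, theta=1.0):
    """Majority vote among the k training samples with the smallest g(z, z_i)."""
    nearest = np.argsort(normalized_distance(z1, z2, Z1, Z2, theta))[:k]
    classes, votes = np.unique(labels[nearest], return_counts=True)
    return classes[np.argmax(votes)]             # ties go to the smallest label


def md_predict(z1, z2, Z1, Z2, labels, theta=1.0):
    """Assign z to the class whose mean vector u_i = (u_i^1, u_i^2) minimises g(z, u_i)."""
    classes = np.unique(labels)
    U1 = np.array([Z1[labels == c].mean(axis=0) for c in classes])
    U2 = np.array([Z2[labels == c].mean(axis=0) for c in classes])
    return classes[np.argmin(normalized_distance(z1, z2, U1, U2, theta))]


rng = np.random.default_rng(4)
labels = np.repeat([0, 1, 2], 20)
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
Z1 = centres[labels] + rng.normal(scale=0.5, size=(60, 2))            # toy z^1
Z2 = centres[labels][:, :1] + rng.normal(scale=0.5, size=(60, 1))     # toy z^2
z1, z2 = np.array([4.8, 0.2]), np.array([5.0])                        # query near class 1
print(knn_predict(z1, z2, Z1, Z2, labels), md_predict(z1, z2, Z1, Z2, labels))
```

KNN keeps every training sample and votes locally, whereas MD only stores one mean vector per class; MD is cheaper online but assumes each fault class forms a single compact cluster in the discriminant space.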

Experiment Description
To verify the improvement in fault diagnosis provided by the SS-CKFDA algorithm, two experiments from different applications were studied. The Tennessee Eastman Process (TEP) was used to compare the proposed method with other FDA-based algorithms when only a few labeled samples are available, and the method was further validated on fault diagnosis in an actual automotive assembly process.

Experiment I: Tennessee Eastman Process
TEP, which has been widely used as a benchmark for evaluating fault diagnosis methods, consists of five main units: a reactor, a condenser, a stripper, a separator and a compressor [21]. Figure 3 presents the schematic of the TEP. A, C, D and E denote the four gaseous reactants, G and H the two liquid products, F the byproduct and B an inert component. The reactor transforms the gaseous feed reactants into the liquid products. In the present study, the TEP simulator downloaded from http://www.brahms.scs.uiuc.edu was adopted; it can simulate a normal operating condition and 21 faulty conditions.
In the present study, the TEP experimental data consisted of a normal operation dataset and four faulty datasets, labeled 1 to 5. The datasets used in the present experiment are described in Table 1. The training set contained 500 normal operation samples and 100 samples of each fault type. The proportion of labeled data in the training set was varied, with the remaining training samples treated as unlabeled; the algorithm was tested with the proportion of unlabeled data ranging from 10% to 90%. The testing dataset contained 100 normal operation samples and 200 faulty samples, split equally among the four fault types (50 samples each). For each sample, all 41 measured variables were selected as monitored variables.

Experiment II: Automotive Assembly Process
As mentioned at the beginning, visual inspection systems have been widely used to monitor the quality of automotive assemblies. As shown in Figure 4, a typical visual inspection system mainly consists of a six-degree-of-freedom industrial robot and a flexible vision sensor [3]. The vision sensor measures by the optical triangulation principle [22]. The coordinate systems of the visual inspection system comprise a workpiece frame ($O_w - X_wY_wZ_w$), a robot base frame, an end-effector frame ($O_h - X_hY_hZ_h$) and a vision sensor frame ($O_s - X_sY_sZ_s$). After global calibration of the system, the original coordinate data collected by the vision sensor were unified in the workpiece frame $O_w - X_wY_wZ_w$.
In the present experiment, the measured values of the rear combination assembly area were chosen to evaluate the proposed fault diagnosis method. The rear combination, which is assembled from the side panel, floor and trunk lid, is an important area that is more likely to expose problems.
To reduce measurement noise, three statistics were calculated for each coordinate variable: the mean value … with k = 5. There were 13 measured coordinates in this area, so 39 features (13 × 3) were finally obtained to monitor this assembly. In the rear assembly station, the nylon blocks on the fixture play an important role in adjusting the position of the auto-body part. Four typical fault modes arise from nylon block deviation: Y-direction deviation on the right side, X-direction deviation on the right side, Y-direction deviation on the left side, and X-direction deviation on the left side. For simplicity, the labels Y-R, X-R, Y-L and X-L are used to denote these four fault modes. After a period of data collection and class labeling, a total of 100 samples were acquired for each fault mode. In addition, 100 normal-condition samples were included in the experiment. As before, the dataset was divided into a training set and a test set to evaluate the performance of the various algorithms.

Results and Discussion
SS-CKFDA was used to extract the feature information of the two datasets. FDA, SELF and CKFDA were used as comparison methods to demonstrate the validity of the proposed method. In addition, BP-ANN, KNN and SVM were employed to examine whether SS-CKFDA achieves good performance in fault diagnosis for the automotive assembly process.

Parameter Selection Strategy
As described above, the tuning parameter β and the kernel parameter σ play important roles in the SS-CKFDA algorithm. We therefore first analyze the effects of the two parameters separately, and then propose a parameter selection strategy based on repeated grid search cross-validation to find the optimal parameters.

The Effect of Kernel Parameter σ and Tuning Parameter β
First, the effect of the kernel parameter for SS-CKFDA with KNN was evaluated. The other parameters were chosen as follows: the labeled rate and β were set to 0.5; the parameter of KNN is set to 9. Figure 5a presents the performance of SS-CKFDA with the different kernel parameter σ, which ranged from 1 to 50. It was obvious that classification accuracy varies with different kernel values. Generally, the performance content initially increased as a result of the increase in kernel parameters. However, this decreased thereafter when the values of the kernel parameters arrived at a certain stage.

Then, we evaluated the effect of the tuning parameter β with the kernel parameter σ set to 7 and the other parameters unchanged. Figure 5b shows the performance of SS-CKFDA for β ranging from 0 to 1. The semi-supervised algorithm clearly outperformed both the unsupervised algorithm (β = 0) and the supervised algorithm (β = 1). Thus, the performance of SS-CKFDA is affected by both the kernel parameter σ and the tuning parameter β.
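How β bridges the unsupervised (β = 0) and supervised (β = 1) extremes can be illustrated with a linear, SELF-style regularization, in which the between-class and within-class scatter matrices of the labeled samples are blended with the total scatter of all (labeled and unlabeled) samples and with the identity matrix. This is a sketch under that assumption only: it uses ordinary (not local) Fisher scatter, it is not the kernel formulation of SS-CKFDA, and the function names and toy data are hypothetical. At β = 1 the problem reduces to FDA on the labeled data; at β = 0 it reduces to PCA on all data.

```python
import numpy as np

def scatter_matrices(X_lab, y_lab):
    """Between-class and within-class scatter of the labeled samples."""
    mean_all = X_lab.mean(axis=0)
    d = X_lab.shape[1]
    Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y_lab):
        Xc = X_lab[y_lab == c]
        diff = (Xc.mean(axis=0) - mean_all)[:, None]
        Sb += len(Xc) * diff @ diff.T
        centered = Xc - Xc.mean(axis=0)
        Sw += centered.T @ centered
    return Sb / len(X_lab), Sw / len(X_lab)

def blended_projection(X_lab, y_lab, X_all, beta, n_components=1, ridge=1e-8):
    """Leading directions of the beta-blended Fisher problem (SELF-style assumption):
    Sb(beta) = (1 - beta) * St + beta * Sb_lab,  Sw(beta) = (1 - beta) * I + beta * Sw_lab,
    where St is the total scatter of ALL samples (labeled + unlabeled)."""
    Sb_lab, Sw_lab = scatter_matrices(X_lab, y_lab)
    centered_all = X_all - X_all.mean(axis=0)
    St = centered_all.T @ centered_all / len(X_all)
    d = X_all.shape[1]
    Sb = (1 - beta) * St + beta * Sb_lab
    Sw = (1 - beta) * np.eye(d) + beta * Sw_lab + ridge * np.eye(d)
    # Solve Sb w = lambda Sw w by whitening with the Cholesky factor Sw = L L^T.
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    evals, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(evals)[::-1][:n_components]
    W = L_inv.T @ evecs[:, order]
    return W / np.linalg.norm(W, axis=0)

rng = np.random.default_rng(1)
# Hypothetical data: classes separated along x0, large nuisance variance along x1.
X0 = np.column_stack([rng.normal(-2, 0.3, 100), rng.normal(0, 5, 100)])
X1 = np.column_stack([rng.normal(2, 0.3, 100), rng.normal(0, 5, 100)])
X_all = np.vstack([X0, X1])
y_all = np.array([0] * 100 + [1] * 100)
labeled = rng.choice(200, size=20, replace=False)  # only 10% of samples labeled
for beta in (0.0, 0.5, 1.0):
    w = blended_projection(X_all[labeled], y_all[labeled], X_all, beta)[:, 0]
    print(f"beta = {beta}: direction = {np.round(w, 3)}")
```

At β = 0 the direction follows the largest overall variance (x1), while at β = 1 it follows the class separation (x0); intermediate values mix label information with the global structure of the unlabeled data.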

Parameter Selection Strategy
When SS-CKFDA is applied to the TEP data and the automotive assembly process data, the two parameters σ and β must be determined. For the KNN classifier, the choice of K is also important. In this study, cross-validation was used for parameter tuning. An initial set of candidate parameter values was chosen first, and then the repeated grid search cross-validation algorithm [23] was performed to find the optimal parameters for SS-CKFDA. Cross-validation is repeated N times, producing N cross-validation errors for each grid point, and the parameter combination with the minimal mean cross-validation error is chosen as optimal. An overview of the parameter selection strategy based on repeated grid search cross-validation is presented in Figure 6.
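The repeated grid search can be sketched as below. The helper names are hypothetical, and the error function `knn_error` is only a stand-in (plain 1-D KNN) for the full SS-CKFDA plus KNN pipeline; in practice `param_grid` would hold candidate values of σ, β and K.

```python
import itertools
import random
import statistics

def k_fold_indices(n, n_folds, rng):
    """Shuffle the indices 0..n-1 and deal them into n_folds folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def repeated_grid_search(X, y, param_grid, cv_error, n_repeats=10, n_folds=10, seed=0):
    """Repeated grid-search cross-validation.

    param_grid: dict mapping a parameter name to its candidate values.
    cv_error:   callable(params, X_train, y_train, X_val, y_val) -> error rate.
    Returns the grid point with the smallest mean CV error and all mean errors.
    """
    rng = random.Random(seed)
    names = sorted(param_grid)
    grid = [dict(zip(names, values))
            for values in itertools.product(*(param_grid[name] for name in names))]
    # One shuffled fold assignment per repeat, shared by every grid point.
    splits = [k_fold_indices(len(X), n_folds, rng) for _ in range(n_repeats)]
    mean_errors = {}
    for params in grid:
        repeat_errors = []  # N errors for this grid point, one per repeat
        for folds in splits:
            fold_errors = []
            for i, val_idx in enumerate(folds):
                train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
                fold_errors.append(cv_error(params,
                                            [X[j] for j in train_idx], [y[j] for j in train_idx],
                                            [X[j] for j in val_idx], [y[j] for j in val_idx]))
            repeat_errors.append(statistics.mean(fold_errors))
        mean_errors[tuple(sorted(params.items()))] = statistics.mean(repeat_errors)
    best_key = min(mean_errors, key=mean_errors.get)
    return dict(best_key), mean_errors

def knn_error(params, X_tr, y_tr, X_val, y_val):
    """Hypothetical stand-in for 'SS-CKFDA + KNN': plain 1-D KNN error rate."""
    k = params["k"]
    wrong = 0
    for x, label in zip(X_val, y_val):
        nearest = sorted(range(len(X_tr)), key=lambda j: abs(X_tr[j] - x))[:k]
        votes = [y_tr[j] for j in nearest]
        wrong += max(sorted(set(votes)), key=votes.count) != label
    return wrong / len(X_val)

X_demo = [i / 10 for i in range(40)]
y_demo = [0 if x < 2.0 else 1 for x in X_demo]
best, errors = repeated_grid_search(X_demo, y_demo, {"k": [1, 3, 5, 7, 9]}, knn_error,
                                    n_repeats=3, n_folds=5)
print(best, errors)
```

Sharing the same fold assignments across grid points means every candidate is judged on identical splits, so differences in mean error reflect the parameters rather than the random partition.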

Selection of Classifier
First, the distance classifier to be used had to be determined. SS-CKFDA was combined with KNN and with MD to determine which classifier is better suited to the present situation. The labeled rate was set to 0.5 and the other parameters were optimized as follows: β was set to 0.5, the Gaussian kernel parameter σ was set to 7.2, and the number of neighbors k of KNN was set to 5. The test was run as 10-fold cross-validation repeated ten times. The average classification accuracy of each repetition was recorded and is shown in Figure 7, and the minimum, maximum, average and standard deviation statistics are given in Table 2.

SS-CKFDA with KNN clearly performs better than SS-CKFDA with MD. This is because KNN with an optimal k value is more effective than MD at avoiding the influence of outliers. Therefore, KNN is selected as the distance classifier for SS-CKFDA.
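The repeated 10-fold cross-validation behind Figure 7 and Table 2 can be sketched as follows. Here "MD" is assumed to mean a minimum-distance (nearest class mean) classifier, which this section does not define; the 2-D features are hypothetical stand-ins for SS-CKFDA projections, and the use of the sample standard deviation is also an assumption. The sketch only shows how the per-repetition accuracies and their min/max/mean/std are obtained; it does not reproduce the reported numbers.

```python
import random
import statistics

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def knn_predict(X_tr, y_tr, x, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(range(len(X_tr)), key=lambda j: sq_dist(X_tr[j], x))[:k]
    votes = [y_tr[j] for j in nearest]
    return max(sorted(set(votes)), key=votes.count)

def min_distance_predict(X_tr, y_tr, x):
    """Assign x to the class with the closest mean (assumed reading of 'MD')."""
    classes = sorted(set(y_tr))
    means = {}
    for c in classes:
        members = [X_tr[j] for j in range(len(X_tr)) if y_tr[j] == c]
        means[c] = [sum(col) / len(members) for col in zip(*members)]
    return min(classes, key=lambda c: sq_dist(means[c], x))

def repeated_cv_accuracy(X, y, predict, n_repeats=10, n_folds=10, seed=0):
    """Average accuracy of each repetition of n_folds-fold CV (one value per repeat)."""
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(n_repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        folds = [idx[i::n_folds] for i in range(n_folds)]
        fold_acc = []
        for i, val in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            X_tr, y_tr = [X[j] for j in train], [y[j] for j in train]
            hits = sum(predict(X_tr, y_tr, X[j]) == y[j] for j in val)
            fold_acc.append(hits / len(val))
        per_repeat.append(statistics.mean(fold_acc))
    return per_repeat

def summarize(acc):
    """Min, max, mean and sample standard deviation of the per-repeat accuracies."""
    return {"min": min(acc), "max": max(acc),
            "mean": statistics.mean(acc), "std": statistics.stdev(acc)}

rng = random.Random(42)
X = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
     + [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(50)])
y = [0] * 50 + [1] * 50
knn_stats = summarize(repeated_cv_accuracy(X, y, lambda Xt, yt, x: knn_predict(Xt, yt, x, k=5)))
md_stats = summarize(repeated_cv_accuracy(X, y, min_distance_predict))
print("KNN:", knn_stats)
print("MD: ", md_stats)
```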

Results Comparison of SS-CKFDA with Other FDA Algorithms
As a novel FDA-based feature extraction algorithm, SS-CKFDA should be compared with other FDA algorithms. Thus, FDA, SELF and CKFDA were selected for comparison at different unlabeled rates. The unlabeled rate was varied from 10% to 90%, while the total number of labeled and unlabeled samples was kept constant. For each unlabeled rate and each algorithm, 10-fold cross-validation was repeated ten times. The average accuracy and standard deviation of each algorithm under different unlabeled rates are listed in Table 3, and Table 3 together with Figures 8 and 9 presents the classification accuracy at different unlabeled rates. FDA and SELF were included as the linear supervised and semi-supervised methods, respectively. Both clearly fail to handle these nonlinear data, and their classification accuracy was the lowest. It can also be concluded that SELF cannot use unlabeled data to improve its performance when a nonlinear relationship exists between variables. SS-CKFDA, on the other hand, achieves higher accuracy than CKFDA overall. When the unlabeled rate was relatively low, SS-CKFDA had no advantage over CKFDA, and CKFDA even performed better at unlabeled rates of 10% and 20%. However, as the unlabeled rate increased, the performance of CKFDA declined more rapidly than that of SS-CKFDA. When the unlabeled rate reached 80% and 90%, the classification accuracy of SS-CKFDA remained at 0.84 and 0.78, respectively, markedly higher than that of CKFDA. This is because, as the number of labeled samples decreased, the model precision of CKFDA decreased, whereas SS-CKFDA could use more and more unlabeled samples to correct its model, keeping its classification accuracy at a high level. In addition, the comparison of standard deviations shows that SS-CKFDA achieves more stable performance than CKFDA.
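The unlabeled-rate sweep keeps the total number of samples fixed and changes only how many of them carry labels. A minimal sketch of such a split follows; stratifying by class (so that every class keeps at least one labeled sample) is an assumption not stated in the text, and `split_by_unlabeled_rate` and the demo labels are hypothetical.

```python
import random

def split_by_unlabeled_rate(y, unlabeled_rate, seed=0):
    """Hide the labels of a fraction of the samples; the total stays fixed.

    Returns (labeled_idx, unlabeled_idx), which together cover every sample.
    Stratified per class (an assumption) so each class keeps >= 1 labeled sample.
    """
    rng = random.Random(seed)
    labeled, unlabeled = [], []
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        n_unlab = min(round(unlabeled_rate * len(idx)), len(idx) - 1)
        unlabeled += idx[:n_unlab]
        labeled += idx[n_unlab:]
    return sorted(labeled), sorted(unlabeled)

# Hypothetical 3-class data set with 100 samples per class.
y = [c for c in range(3) for _ in range(100)]
for step in range(1, 10):
    rate = step / 10
    lab, unlab = split_by_unlabeled_rate(y, rate)
    print(f"unlabeled rate {rate:.0%}: {len(lab)} labeled, {len(unlab)} unlabeled")
```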

Results of the Automotive Assembly Process Data
Similarly, the collected data were divided into a training set and a test set. One small adjustment was made: the test set was also added to the training procedure as unlabeled data for the semi-supervised algorithms. Although the test data are the data to be predicted, their measured values still help differentiate their fault categories. In the present study, 10% of the samples were taken as the training set and the remaining samples were used as the test set, which reflects the realistic situation in which labeled samples are scarce. The test was run ten times to reduce random error. Furthermore, BP-ANN [24], KNN [20] and SVM [25] were selected for comparison.
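The transductive protocol above, in which the test features are reused as unlabeled training data while the test labels are used only for scoring, can be sketched as follows. The function name and demo data are hypothetical; only the 10% training fraction and the ten repeated runs come from the text.

```python
import random

def transductive_split(X, y, train_fraction=0.1, seed=0):
    """Small labeled training set; the remaining samples form the test set.

    The test features are ALSO handed to the semi-supervised learner as
    unlabeled data; the test labels are kept for scoring only.
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_train = max(1, round(train_fraction * len(X)))
    train, test = idx[:n_train], idx[n_train:]
    X_lab = [X[i] for i in train]
    y_lab = [y[i] for i in train]
    X_test = [X[i] for i in test]
    y_test = [y[i] for i in test]   # never passed to the learner
    X_unlab = list(X_test)          # test features reused as unlabeled input
    return X_lab, y_lab, X_unlab, X_test, y_test

# Hypothetical data: 200 samples with five condition labels (0..4).
X = [[float(i), float(i % 7)] for i in range(200)]
y = [i % 5 for i in range(200)]
# Ten independent runs with different random splits.
runs = [transductive_split(X, y, train_fraction=0.1, seed=s) for s in range(10)]
for X_lab, y_lab, X_unlab, X_test, y_test in runs[:2]:
    print(len(X_lab), "labeled,", len(X_unlab), "unlabeled,", len(X_test), "test")
```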
For BP-ANN, the parameters were set as follows: three layers (an input layer, a hidden layer and an output layer) were used; the number of neurons in the hidden layer was set to 10; the learning rate and the momentum factor were both set to 0.1; the initial weights were set to 0.3; and the learning algorithm was gradient descent backpropagation. For the kernel parameter of SVM and the k value of KNN, the same repeated grid search cross-validation algorithm described in Section 4.1 was used for parameter selection. Figure 10 shows the score plots of the proposed SS-CKFDA, revealing that the five different kinds of samples are well separated. Table 4 lists the results of SS-CKFDA and the other classical methods. SS-CKFDA achieves the highest average classification accuracy and the lowest standard deviation. For most conditions, SS-CKFDA performed better than the other methods. Although SVM performs best in classifying Normal and X_L, it performs poorly on Y_R and X_R, where SS-CKFDA is significantly better. Over all five conditions, SS-CKFDA performs better than SVM and the other algorithms.
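The BP-ANN settings listed above (learning rate 0.1, momentum factor 0.1) correspond to gradient descent with a momentum term. The sketch below shows one common form of that update, dw_t = -η ∇E(w_t) + μ dw_{t-1}; whether the original implementation used exactly this variant is an assumption, and the function name and toy objective are hypothetical. In the network itself, `grad` would be the backpropagated gradient of the training error.

```python
def momentum_descent(grad, w0, learning_rate=0.1, momentum=0.1, n_steps=200):
    """Gradient descent with momentum.

    Update rule (one common form, assumed here):
        dw_t    = -learning_rate * grad(w_t) + momentum * dw_{t-1}
        w_{t+1} = w_t + dw_t
    """
    w = list(w0)
    dw = [0.0] * len(w)
    for _ in range(n_steps):
        g = grad(w)
        dw = [-learning_rate * gi + momentum * di for gi, di in zip(g, dw)]
        w = [wi + di for wi, di in zip(w, dw)]
    return w

# Toy objective E(w) = 0.5 * sum((w_i - t_i)^2); its gradient is w - t.
target = [1.0, -2.0]

def grad(w):
    return [wi - ti for wi, ti in zip(w, target)]

# Start every weight at 0.3, echoing the stated initial weight value.
w_final = momentum_descent(grad, w0=[0.3, 0.3])
print([round(v, 4) for v in w_final])
```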

Furthermore, SS-CKFDA was compared with the other methods in terms of training time; the results are presented in Table 5. BP-ANN has the longest training time, while FDA has the shortest. Among these five methods, SS-CKFDA also performs well: although its training time is slightly longer than that of the other FDA methods, it achieves a better accuracy rate when labeled samples are scarce.

Conclusions
The present study investigates a novel semi-supervised learning method, SS-CKFDA, which combines the advantages of semi-supervised learning and CKFDA. Its main idea is that the projection vectors extracted from labeled data can be adjusted by integrating the global structure information provided by all labeled and unlabeled samples. We also analyzed the effects of the two important parameters β and σ separately and proposed a parameter selection strategy based on repeated grid search cross-validation to find the optimal parameters. The experimental results on the TEP data show that SS-CKFDA coupled with the KNN classifier achieves better and more stable performance overall than FDA, SELF and CKFDA, and that its advantage becomes more pronounced as the unlabeled rate increases.
Automotive assembly process data are collected from vision sensors, and a lack of labeled samples is typical in this setting. Hence, the proposed method was applied to such data. The experimental results on the automotive assembly process data show that the new algorithm can be successfully used in automotive manufacturing. Compared with other classical methods, SS-CKFDA achieved better average classification accuracy for both the training set and the test set when labeled samples were limited, while keeping the training time acceptable for practical use.