Quality Prediction and Yield Improvement in Process Manufacturing Based on Data Analytics

: Quality management is important for maximizing yield in continuous-ﬂow manufacturing. However, it is more di ﬃ cult to manage quality in continuous-ﬂow manufacturing than in discrete manufacturing because partial defects can signiﬁcantly a ﬀ ect the quality of an entire lot of ﬁnal product. In this paper, a comprehensive framework that consists of three steps is proposed to predict defects and improve yield by using semi-supervised learning, time-series analysis, and classiﬁcation model. In Step 1, semi-supervised learning using both labeled and unlabeled data is applied to generate quality values. In addition, feature values are predicted in time-series analysis in Step 2. Finally, in Step 3, we predict quality values based on the data obtained in Step 1 and Step 2 and calculate yield values with the use of the predicted value. Compared to a conventional production plan, the suggested plan increases yield by up to 8.7%. The production plan proposed in this study is expected to contribute to not only the continuous manufacturing process but the discrete manufacturing process. In addition, it can be used in early diagnosis of equipment failure.


Introduction
In the manufacturing industry, quality management is a key to competitiveness, productivity, and profit of companies because poor quality management can damage the trust and good image which a company has built up for a long time [1]. For this reason, the importance of quality management in various industries has emerged early on. In early 1950, Juran introduced the idea of Total Quality Control (TQC) to the overall Japanese industry. Lillrank underlined that maintenance is the most important activity in quality management [2]. Bergman and Klefsgo emphasized the role of quality maintenance and repair in production [3]. Especially in continuous-flow process manufacturing, it is very significant to manage quality and defect rate because some defects in a part is directly related to the quality of all subsequent processes.
Recently, with technical advances of the Internet of Things (IoT) technology, the amount of industrial data from sensors is surging and this has promoted the level of automation significantly. From a perspective of quality management, the increasing size of data can be utilized to predict defects so that the quality of final products can be improved. For this reason, there has been a wide range of research on data mining that helps to decide through the modeling of the knowledge extracted from data relations, rules, patterns, and information hidden from the database. In addition, big data analysis, artificial intelligence, machine learning, and deep learning make it possible to conduct quality management more usefully and accurately. big data analysis, artificial intelligence, machine learning, and deep learning make it possible to conduct quality management more usefully and accurately.
Specifically, this study conducts the data analysis on the continuous manufacturing process with the use of the sensing data of the plastic core extrusion process of a company. Based on the analytic results, yield can be improved by predicting defects in advance and updating production plans. First, for the effective quality management of the continuous manufacturing process, we analyzed the conventional manufacturing process. The problems of it are as follows. Even though well-designed production and quality data are required for better quality management, collecting the data takes a lot of time and cost. For this reason, small and medium-sized enterprises (SMEs) have difficulty for obtaining such data. For example, as shown in Figure 1a, in the extraction process for manufacturing a cylinder product, it is necessary to measure both ends of the product in terms of quality. Therefore, the quality in the middle of the product is not guaranteed. If those defects in the process can be predicted with data analytics preemptively, it enables companies to not only manage a device and product quality but increase a yield. For instance, in Case 1 presented in Figure 1b, if a product is manufactured according to a certain length, two out of five items are defective and therefore a yield is 60%. If the product quality can be predicted before a defect occurs as shown in Case 2, the finished product can be produced after excluding the defective part. Therefore, unlike Case 1 it can be produced the 3rd and 4th finished products and be manufactured 4 items with the same amount of raw materials. Accordingly, the research questions and the corresponding approaches in this study are presented in Table 1.

Question Approach
How will quality data be collected? Quality data generation through device data through semisupervised learning. What techniques will you use to conduct your research?
Using semi-supervised learning, time series prediction, and machine learning.
How do you evaluate performance? Use evaluation indicators considering characteristics of unbalanced data. If those defects in the process can be predicted with data analytics preemptively, it enables companies to not only manage a device and product quality but increase a yield. For instance, in Case 1 presented in Figure 1b, if a product is manufactured according to a certain length, two out of five items are defective and therefore a yield is 60%. If the product quality can be predicted before a defect occurs as shown in Case 2, the finished product can be produced after excluding the defective part. Therefore, unlike Case 1 it can be produced the 3rd and 4th finished products and be manufactured 4 items with the same amount of raw materials. Accordingly, the research questions and the corresponding approaches in this study are presented in Table 1.

Question Approach
How will quality data be collected? Quality data generation through device data through semi-supervised learning.
What techniques will you use to conduct your research? Using semi-supervised learning, time series prediction, and machine learning.
How do you evaluate performance? Use evaluation indicators considering characteristics of unbalanced data.
In order to answer all questions, the comprehensive framework with semi-supervised learning, time series prediction, and machine learning is proposed for utilizing unlabeled data in the continuous manufacturing process. For verification of the proposed framework, two experiments were conducted for quality prediction and yield improvement. In the first experiment, after data preprocessing, deep learning and multiple classifiers are applied to predict product quality. In order to handle multiple unlabeled data in continuous flow process manufacturing, the second experiment is designed to predict product quality with the streamline of three models (semi-supervised learning, time series prediction, and multiple classifiers). The detailed procedures are explained in Sections 4 and 5. Finally, based on the experimental results, an efficient production plan for the plastic extrusion process is derived for improving quality and yield in the continuous-flow process.

Data-Driven Quality Management
There have been many studies on data-driven quality prediction and process improvement. Angun researched the neural network-based statistics quality management [3]. Schnell analyzed and improved the production process in the lithium-ion cell manufacturing line by comparing and evaluating the neural network technique and data-mining technique [4]. Ramana predicted quality in the plastic injection molding process as a continuous-flow process by finding data patterns and abnormal symptoms with the use of the data mining technique [5]. With the use of an artificial neural network (ANN) method, Wang predicted the quality of the product in the powder metallurgy compression process which is not a continuous-flow process [6]. Ogorodnyk conducted ANN and decision tree-based classification research in order to predict parts quality in the thermoplastic resin injection molding process [7]. Although many studies have been conducted on the quality prediction of various processes, the quality prediction in process manufacturing has been a little researched. In addition, most studies related to the continuous-flow process focused on the prediction of influential factors on quality. Li studies the ANN model to predict extrusion pressure [8]. Lela researched the linear regression mathematical model for predicting an aluminum molding temperature by using the data recorded continuously in the manufacturing process [9].

Semi-Supervised Learning
For accurate learning in conventional supervised learning, it is necessary to apply accurate labeling to learning data. A traditional classifier executes learning by using the data with class only. In fact, in the manufacturing process, labeling millions of data costs a lot and requires skillful experts' long work. However, it is relatively easy to obtain unlabeled data [10]. In the circumstance, it is more important to apply a technology that makes it possible to achieve high performance with the use of a small amount of labeled data. One of the algorithms that can analyze data using a small amount of labeled data is semi-supervised learning (SSL). SSL falls between supervised learning and unsupervised learning. It can be used when data with both input value (X i ) and the target value (y i ), or data with labeled data {(X i , y i )} and data with unlabeled data {X i } are made as a model [11].
Semi-supervised learning can be divided into graph-based and pseudo labeling. In graph-based algorithm, each data sample is expressed as a vertex in a network graph, and links are the measured value of similarity of each vertex [12]. In this algorithm, the label is propagated from the labeled point to the unlabeled point [13]. Typical examples are label propagation (LP) and label spreading (LS). In LP and LS, kernel derivation by the mapping is performed to classify high-dimensional data [14]. The kernels used are the radial basis function (RBF) and k-nearest neighbor (KNN) algorithms, RBF is also called Gaussian kernels. Pseudo-labeling is a model that uses unlabeled data like actual labeled data. Pseudo-labeling produces a pseudo-labeled data by training a classification model with a small amount of labeled data and then predicting the unlabeled data. Then, the pseudo-labeled data and the existing labeled data are trained on the model again to generate a classification model [15].
Today, SSL was applied to a variety of relevant studies, helping to increase research results. Yan utilized SSL in order for the early diagnosis and detection of the air handling unit and showed high accuracy of defect diagnosis [16]. Ellefsen also applied SSL to predict the effective life for engine performance lowering of turbo fan [17]. Sen used SSL to validate data in order to recognize the authenticity of sensor data which is used to judge a defect in the pipe process [18].

Time-Series Prediction
This study utilizes time-series data which are collected from multiple sensors in a plastic extrusion process of a company. A deep learning technique suitable for time-series analysis is used. Previous studies have used recurrent neural network (RNN), long short-term memory (LSTM) models), and gated recurrent unit (GRU) to predict time-series data. Since the studies using the advantages of RNN are actively conducted, this study applies RNN. Maknickienė utilized RNN as a methodology for predicting the financial market [19]. Zhu used RNN to analyze the components and structural change of soil [20]. In addition, defects in electrochemical sensors or polymer were diagnosed using machine learning models such as RNN, LSTM, and CNN [21,22]. Aside from that, RNN is applied to the prediction for temperature control of the variable frequency based oil cooler in the industrial process. As such, RNN is used to predict a time-series process [23].

RNN
RNN is a type of ANN. As one of the deep learning techniques, it recirculates the output of a hidden layer as an input to learn sequential (time-series) data for prediction or classification. RNN is an extension of the existing feed-forward neural network using a loop that iterates the previous input into the output. RNN has the iterative state in which the activation in each step relies on the activation in its previous step, in order to process sequential data input. When the sequence data x = (x 1 , x 2 , x 3 , · · · , x t ) (x t is ith time data) is given, RNN updates hidden state h t repeatedly, where ∅ is a nonlinear function like logistic sigmoid function or tangent function [12]:

Classification
Machine learning is a method inferring a system function based on a pair of data and labels. An inference process is dependent on input features. If the prediction result of new input data is a continuous value, it is used for regression analysis; if a discrete value, it is used for classification [24]. This study uses logistic regression, decision tree, random forest, linear discriminant analysis (LDA), Gaussian naive Bayes (GNB), K-nearest Neighbor (KNN), and support vector classifier (SVC) among machine learning classification techniques, and ANN as a deep learning technique.

Differences from Previous Studies
Although many studies have performed quality prediction and used SSL or RNN, some features can be found when classifying related studies by process. First, in studies on quality prediction, research on discrete processes is more abundant than research on continuous processes. In addition, most studies related to the quality of the continuous process predicted the factors of facilities that affect the quality overall, not the quality of the products themselves. Second, studies using SSL have been applied to prediction in various fields, but most of them have been used to generate data in image processing. Studies related to manufacturing processes mainly focused on defect detection and monitoring systems and did not consider quality prediction. In addition, in the studies using RNN, although various predictions were made for time series data, the number of papers predicting the quality in a continuous process was very small. Therefore, this study proceeds with the prediction of the quality of the continuous process, which was not done well in the previous study. Quality prediction is carried out by collecting equipment data. In the previous study, sensor data is used to predict and maintain the quality of the process, and there are studies using deep learning and machine learning for sensor-based prediction [21,22]. For improving the quality prediction of small data, we apply SSL to labeling and RNN to generating predictive feature data. Finally, we classify the data, which are SSL and RNN applied, through classification techniques for quality prediction. It is intended to efficiently operate the production in a continuous process and use the quality prediction results to check how much the yield can be finally improved.

Data Introduction and Statistical Analysis
This study collected the data in a plastic extrusion process as a continuous-flow manufacturing process. In the plastic extrusion process, 20,801 instances were collected from each device sensor at an interval of one second. Among attributes in the dataset, we considered 26 attributes such as extrusion temperature, screw speed, external temperature, external humidity, and water temperature because some attributes have the same values during the collection period. Table 2 summarizes the attributes used in this study. If a final product exceeds a given tolerance, the product is marked as defective. Based on the product specification with an inner diameter of 6 inches and a thickness of 8 cm, the allowable error of process is determined as ±0.2 mm in thickness and ±0.5 mm in thickness. For the application of a classifier, the quality value for products was coded as a binary value (1 for defective and 0 for acceptable). In this study, 3839 defective products (about 18%) of 20,801 data were used. Note that quality value means not the quality of a single piece of product, but the quality of a part of a product manufactured every second.
To find the characteristics of each data, we conducted a statistical analysis of the line graph and box plot. Figure 2 illustrates the line graphs of the features of continuous data. TIEXT1~3 and Water_1~4 followed a similar trend by the sensor. Nevertheless, a temperature was changed depending on a sensor position so that it went up or down overall. However, Outtemp and Outwet are changing irregularly. Through this, it can be seen that TIEXT and Water are control variables and Outtemp and Outwet are measurement variables. To find the characteristics of each data, we conducted a statistical analysis of the line graph and box plot. Figure 2 illustrates the line graphs of the features of continuous data. TIEXT1~3 and Water_1~4 followed a similar trend by the sensor. Nevertheless, a temperature was changed depending on a sensor position so that it went up or down overall. However, Outtemp and Outwet are changing irregularly. Through this, it can be seen that TIEXT and Water are control variables and Outtemp and Outwet are measurement variables. In outliner analysis, features of continuous data were checked with the use of the box plot. Box plot is one of the various statistical techniques named exploratory data analysis. It is used to visually identify a pattern that can be hidden in a data set. Figure 3 represents the illustrative box plots of continuous data. The darker the point representing an outlier is, the more outliers are distributed in a relevant range. According to the analysis, Outtemp and Outwet had no outliers, and TIEXT2~3 and Water_1~4 had sixty outliers of each one of 329, 5, 315, 237, and 3174. Although there were a significant number of outliers, these outliers were concentrated in a few particular sections. Taking this into consideration, this study did not regard the outliers as outliers caused by data errors. In outliner analysis, features of continuous data were checked with the use of the box plot. Box plot is one of the various statistical techniques named exploratory data analysis. It is used to visually identify a pattern that can be hidden in a data set. Figure 3 represents the illustrative box plots of continuous data. The darker the point representing an outlier is, the more outliers are distributed in a relevant range. According to the analysis, Outtemp and Outwet had no outliers, and TIEXT2~3 and Water_1~4 had sixty outliers of each one of 329, 5, 315, 237, and 3174. Although there were a significant number of outliers, these outliers were concentrated in a few particular sections. Taking this into consideration, this study did not regard the outliers as outliers caused by data errors. The distribution of discrete data is shown in Table 3. In the case of dice temperature and cylinder temperature, sensors are located differently so that data values and sections are different. The number of data related to the speed of RPM1_SCREW and RPM2_EXT is small. Table 3. Data sets of discrete data.

Attribute
Data Set TI_DIES1 {190,191,192,193,194,195,196,197,198,199,200} TI_DIES2 {191,192,193,194,195,196,197,198,199,200,201,202} TI_DIES3 {192,193,194,195,196,197,198,199,200 190,191,192,193} TI_CYL4 {190,191,192,193,194,195,196,197,198} TI_CYL5 {192,193,194,195,196,197,198} TI_CYL6 {194,195,196,197,198,199} The results from the correlation analysis on features are shown in Figure 4. Pearson correlation coefficient was applied, in which a correlation coefficient is a value between-1 and +1. The closer the coefficient is to ±1, the more there is a correlation. A correlation coefficient was presented with different color temperature. The larger a correlation coefficient is, the darker the color temperatures are. According to the analysis, the correlation coefficients of the sensors in close proximity, such as extrusion temperatures (TIEXT1~3) and dice temperatures (TI_DIES2~3), were mostly high. Aside from that, the correlation coefficients between TIEXT, TI_DIES4, Outwet, and Outtemp1 were high.
Additionally, the correlation coefficients between quality values and features were analyzed. In Table 4, the analysis results are presented in descending sort order after taking absolute values. Given that all of the correlation coefficients between quality values and features are small, a particular feature does not determine a quality value. The distribution of discrete data is shown in Table 3. In the case of dice temperature and cylinder temperature, sensors are located differently so that data values and sections are different. The number of data related to the speed of RPM1_SCREW and RPM2_EXT is small. Table 3. Data sets of discrete data.

Attribute
Data Set {190,191,192,193,194,195,196,197,198,199,200} TI_DIES2 {191,192,193,194,195,196,197,198,199,200,201,202} TI_DIES3 {192,193,194,195,196,197,198,199,200 190,191,192,193} TI_CYL4 {190,191,192,193,194,195,196,197,198} TI_CYL5 {192,193,194,195,196,197,198} TI_CYL6 {194,195,196,197,198,199} The results from the correlation analysis on features are shown in Figure 4. Pearson correlation coefficient was applied, in which a correlation coefficient is a value between-1 and +1. The closer the coefficient is to ±1, the more there is a correlation. A correlation coefficient was presented with different color temperature. The larger a correlation coefficient is, the darker the color temperatures are. According to the analysis, the correlation coefficients of the sensors in close proximity, such as extrusion temperatures (TIEXT1~3) and dice temperatures (TI_DIES2~3), were mostly high. Aside from that, the correlation coefficients between TIEXT, TI_DIES4, Outwet, and Outtemp1 were high.

Quality Prediction in Process Manufacturing
In this chapter is described the first experiment for predicting product quality with the uses of machine learning classification models. In all the analyses, Python 3.6.10 was applied and the software libraries used are listed in Table 5.  Additionally, the correlation coefficients between quality values and features were analyzed. In Table 4, the analysis results are presented in descending sort order after taking absolute values. Given that all of the correlation coefficients between quality values and features are small, a particular feature does not determine a quality value.

Quality Prediction in Process Manufacturing
In this chapter is described the first experiment for predicting product quality with the uses of machine learning classification models. In all the analyses, Python 3.6.10 was applied and the software libraries used are listed in Table 5.

Data Preprocessing
For the machine learning based quality prediction, this study applied ANN, Logistic Regression, Decision Tree, Random Forest, LDA, Gaussian NB, KNN, and SVC. In consideration of the imbalanced data (with different class ratio) used in this study, learning data sets were preprocessed before a classification model was created. For preprocessing, resampling and feature scaling were used. As resampling techniques for adjusting imbalanced data, there are undersampling, oversampling, and hybrid methods [25], each of which is described in Table 6. Table 6. Three resampling techniques.

Methods Description
Undersampling Reduces the imbalance rate by randomly removing some elements of the majority class from the data set.
Oversampling It takes the opposite approach to Undersampling. Instead of reducing the majority class, add the minority class to the training set as a cloned or disturbed variant. Hybrid method Combine Undersampling and Oversampling in a data balancing approach.
Of 20,801 original data, 18,218 are non-defective product data, and 2583 are defective product data. In this study, the set ratio of training data to test data is 5 to 5. Of 10,401 training data, 9131 are non-defective product data (about 88%), and 1270 are defective product data (about 12%). Since there was a large difference between the majority class and minority class, Oversampling was applied to set the class ratio. After resampling, 9131 defective product data and 9131 non-defective product data were created equally. In addition, since feature values vary and each data scale is irrelevant, feature scaling is applied to unify a range of features. Feature scaling is used to compare and analyze multi-dimensional values easily, and to improve the stability and convergence speed in the optimization process [26]. As described in Section 4.1, features had some data including outlier. In consideration of the point, RobustScaler of the Python scikit-learn library was used as a preprocessing method. RobustScaler is the technique of removing a median value, scaling data in the range of IQR (Inter-quartile range), and therefore minimizing the influence of outliers. The processing method of RubustScaler ensure that each feature has a median of 0 and a quartile of 1 so that all features are on the same magnitude [27].

Results of Quality Prediction through the Classifier
Cross-validation was applied to select a model fitting process data. Cross-validation can prevent overfitting and data sampling bias and can be applied if there are a small number of data records, and can be used to check the generality of a model [28]. K-fold cross-validation was applied. The technique splits the total data into k sets, selects each set as a test set at a time, and executes cross-validation a total of k times. At this time, k-1 data sets except for the data selected as a test set are used as a training set [29]. Performance indices for model evaluation generally use the confusion matrix as shown in Table 7. Among the indices, balanced accuracy is applied. General accuracy does not support a significant confidence interval and especially does not provide the safety device for biased data like imbalanced data. Therefore, to overcome the problem, balanced accuracy is replaced [30]. Balanced accuracy is applied to the binary and multiple class classification for processing imbalanced data sets and is defined as the mean of the recalls obtained from classes: accuracy =  Of the applied classifiers, ANN was designed to be a model with two hidden layers, each of which has 100 neurons and 80 neurons, respectively. Rectified linear unit (ReLU) and sigmoid were used as activation functions, and Adam was used as an optimizer. ReLU is the most frequently applied activation function. A final classification result should be a binary number: either non-defective product (0) or defective product (1). Therefore, sigmoid was applied to an output layer. The count of repetition was set to 1000, and the batch size was set to 10. For model learning, validation data was set to 30%, and then early stopping was applied. Early stopping can prevent underfitting and overfitting. It can stop learning even if the repetition count has yet to be reached, only if the repetition count, in which no more performance is improved, exceeds a particular count. In this study, the count was set to '5 . The results of cross-validation (k = 5) are presented in Table 8. It was found that a model has more generality in the order of decision tree, LDA, and logistic regression. For the evaluation of the performance of the classification model, log loss, and receiver operating characteristic (ROC) curves were analyzed as well as the performance found in cross-validation. The two performance indexes can be checked after data fitting in the model so that they are not used in cross-validation. Log loss is called cross-entropy and can be compared with accuracy. Accuracy means that a prediction value is equal to an actual value. Log loss takes into account uncertainty depending on how different the label is based on probability. Therefore, with log loss, it is possible to obtain a delicate view of the model performance. The lower a log loss is, the better value is. In the ROC curve, false positive rate (FPR) is displayed on the x-axis and true positive rate (TPR) on the y-axis. At this time, FPR is a rate of negative cases wrongly classified as positive ones, and TPR is a rate of positive cases that have labels specified correctly [31]. ROC can calculate the area under the curve (AUC) and provides the score that can be used to compare the AUC model. Generally, a ROC curve is effective at severe class imbalance when minority class has a small number of data [32]. An unskilled classifier scores 0.5, whereas a perfect classifier scores 1.0. The classification results after the consideration of the added performance evaluation measures are presented in Table 9. According to the final classification performance analysis, the best model in terms of balanced accuracy and log loss was decision tree, and the best model in terms of the ROC curve was random forest. We compared two models concerning the confusion matrix as shown in Table 10. In the case of random forest, FP (the case that was predicted to be defective though non-defective) was very low, but FN (the case that was predicted to non-defective though defective actually) was high. In the case of the decision tree, each error was distributed in a balanced way. In terms of the number of incorrectly classified items, decision tree 365 + 286 = 651, and random forest 2 + 1215 = 1217. Given that, we may conclude that decision tree has better performance.

Simple Classification Feature Importance
Feature Importance was analyzed in order to find which features were considered to be critical factors in data classification by decision tree as shown in Figure 5. In terms of feature importance, the top five features of all were TI_CYL5, TI_DIES1, and TIEXT1~3, each of which scored 0.1896, 0.1757, 0.1101, 0.1085, and 0.1046. The top five of 25 features accounted for 69% influence. According to the final classification performance analysis, the best model in terms of balanced accuracy and log loss was decision tree, and the best model in terms of the ROC curve was random forest. We compared two models concerning the confusion matrix as shown in Table 10. In the case of random forest, FP (the case that was predicted to be defective though non-defective) was very low, but FN (the case that was predicted to non-defective though defective actually) was high. In the case of the decision tree, each error was distributed in a balanced way. In terms of the number of incorrectly classified items, decision tree 365 + 286 = 651, and random forest 2 + 1215 = 1217. Given that, we may conclude that decision tree has better performance.

Simple Classification Feature Importance
Feature Importance was analyzed in order to find which features were considered to be critical factors in data classification by decision tree as shown in Figure 5. In terms of feature importance, the top five features of all were TI_CYL5, TI_DIES1, and TIEXT1~3, each of which scored 0.1896, 0.1757, 0.1101, 0.1085, and 0.1046. The top five of 25 features accounted for 69% influence.

Yield Improvement Method
Labeled data requires expensive labor and a lot of time so that it is hard to collect plenty of labeled data. On the contrary, it is far easy to obtain a lot of unlabeled data. For example, in text classification, it is possible to access a large document database easily through web crawling, and

Yield Improvement Method
Labeled data requires expensive labor and a lot of time so that it is hard to collect plenty of labeled data. On the contrary, it is far easy to obtain a lot of unlabeled data. For example, in text classification, it is possible to access a large document database easily through web crawling, and some of the collected data is classified manually [33]. In the second experiment, on the assumption that there are numerous unlabeled data in the product manufacturing process, the quality of the continuous-flow process is predicted with the use of SSL and time-series prediction.

Research Framework
The framework in the second experiment is illustrated in Figure 6. The whole process is divided into Step 1, Step 2, and Step 3. In Step 1, on the assumption that unlabeled data and labeled data coexist, unlabeled data is labeled through SSL. In Step 2, the future value of feature data is generated through time-series prediction. In Step 3, quality is finally predicted and classified on the basis of the labeled data created in Step 1 and the data created through time-series prediction in Step 2.

Research Framework
The framework in the second experiment is illustrated in Figure 6. The whole process is divided into Step 1, Step 2, and Step 3. In Step 1, on the assumption that unlabeled data and labeled data coexist, unlabeled data is labeled through SSL. In Step 2, the future value of feature data is generated through time-series prediction. In Step 3, quality is finally predicted and classified on the basis of the labeled data created in Step 1 and the data created through time-series prediction in Step 2.

STEP1-Semi-Supervised Learning
When a small amount of data is resampled or scaled, bias can occur. Therefore, the data is analyzed without data preprocessing. The ratio of training data to test data was set to 5:5. SSL was conducted only with the use of 10,400 training data. Of the training data, 6281 (about 60%) were randomly deleted, and 40% (4119 data) was used to label the deleted data. The SSL method applied pseudo labeling and graph-based classifier. Pseudo labeling applied seven classifiers except for ANN among the classifiers used in Section 4.2. Label spreading and label propagation as graph-based SSL classifiers applied radial basis function (RBF) and KNN kernels, respectively. Accordingly, SSL was applied to a total of eleven cases, and cross-validation (k = 5) was conducted. Table 11 presents the results. Given the results of cross-validation, decision tree and random forest had the most generality of the model. Table 11. Results of cross-validation applied with semi-supervised learning (k = 5).

STEP1-Semi-Supervised Learning
When a small amount of data is resampled or scaled, bias can occur. Therefore, the data is analyzed without data preprocessing. The ratio of training data to test data was set to 5:5. SSL was conducted only with the use of 10,400 training data. Of the training data, 6281 (about 60%) were randomly deleted, and 40% (4119 data) was used to label the deleted data. The SSL method applied pseudo labeling and graph-based classifier. Pseudo labeling applied seven classifiers except for ANN among the classifiers used in Section 4.2. Label spreading and label propagation as graph-based SSL classifiers applied radial basis function (RBF) and KNN kernels, respectively. Accordingly, SSL was applied to a total of eleven cases, and cross-validation (k = 5) was conducted. Table 11 presents the results. Given the results of cross-validation, decision tree and random forest had the most generality of the model. The results from the application to a real model are shown in Table 12. In this case, SVC classifies all values as the non-defective product so that it is meaningless in comparison to different performance evaluation measures. For this reason, it was excluded. For the comparison of model performance, balanced accuracy, log loss, and AUC of the ROC curve were used. In respect of balanced accuracy and log loss, the top five models were equal. In terms of the ROC curve, decision tree and random forest were excellent. Confusion matrices were analyzed in order to select a final model. As a result, random forest was found to have good performance as shown in Table 13. On balance, labeled data through random forest-based pseudo labeler are trained in the final classification process of Step 3, therefore, quality is predicted.

STEP2-RNN Time Series Analysis
In Step 2, a test data set is predicted with the use of the training data set of features. For time-series prediction, the RNN model was applied, and a model with two hidden layers was designed. One hidden layer has 500 neurons, and the other has 1000 neurons. As an activation function, ReLU was used. Adam was used as an optimizer. Individual learning was conducted for each feature. The count of repetition was set to 1000, and the batch size was set to 10. For model learning, validation data was set to 30%, and then early stopping was applied. Early stopping can prevent underfitting and overfitting. It can stop learning even if the repetition count has yet to be reached, only if the repetition count, in which no more performance is improved, exceeds a particular count. In this study, the count was set to '5 . For the features expressed as discrete data in Table 2, the output value was rounded off and then changed to an integer number. In comparison between a predicted value and a real value, the performance was evaluated with mean absolute percentage error (MAPE). MAPE is a measure of prediction accuracy of a forecasting method in statistics. It usually expresses the accuracy as a ratio defined in Equation (2). A is the actual value and F is the predicted value. Smaller MAPE means better predicted performance. The mean, standard deviation, minimum value, and maximum value of MAPEs of all features are 0.2530, 0.5239, 0, and 2.1950, respectively. More details are shown in Table 14:

STEP3-Classification
In Step 3, the final classification data is obtained for yield improvement. In Step 1, loss data were generated by random forest-based pseudo labeler. In Step 2, feature data were predicted with RNN. In the last step, classification is performed on the basis of the data generated in each previous step, and a future quality value is finally predicted. The classifiers used in Section 4.2 were also applied in this step. Tables 15 and 16 present the final classification performance and results after random forest (SSL) was applied. As shown in Table 15, decision tree and random forest had excellent performance in terms of all performance indexes. The final objective of this study is to increase the yield through quality prediction. For the reason, based on the quality value predicted in Step 3, how much more the process method can increase a yield than a conventional process method will be calculated.

Yield Improvement Results
Yield is calculated as 'the number of non-defective products (a) divided by total output (A)'. Typically, products are manufactured in the equidistant interval method in which products are cut at a given interval according to continuous production. In this case, if any part of a product is defective, the product is judged to be a defective product. Therefore, this study proposes a new production method that predicts quality through SSL, time-series prediction, and classifier and performs product cutting in consideration of the part predicted to be defective. In the proposed new method, the value calculated with the formula 'the number of non-defective products (b, c, d)/total output (A)' is measured as a yield for comparison. Total output is the value calculated in the conventional equidistant interval production method. Two models of decision tree and random forest were used for the final classification.
(1) If the predicated quality of all items is a non-defective product in a production length during a unit time, the products are manufactured. (2) If the manufactured products include any defective product, they are regarded as defective products.
Without the application of SSL and RNN, a small amount of data was applied to quality prediction. Of all the data (10,401 data), 60% were deleted randomly, and 40% (4120 data) were trained with Decision Tree, which showed the best performance in Section 4.2. Based on these small numbers of data, 10,395 data were predicted again. In addition, the quality of the Step 3 results was predicted with 40% of the data, and the results are shown in Section 5.4. Confusion matrix, which is the result of each prediction, is shown in Table 17.  Table 18 presents the numbers of the products generated according to the prediction results and the proposed method. Given the final number of non-defective products, there was no yield improvement effect even if product cutting was executed in consideration of defect occurrence. However, as shown in the study results, the proposed method increased to yield more than a conventional production method when a production unit time was larger than six seconds (c, d). In the case of decision tree, a yield increased in all the unit time slots after six seconds and went up by a maximum of about 8.70% (c). In the case of random forest, the maximum yield rise rate was 6.76%, and a yield sometimes dropped more than a conventional method sporadically (d). Given that, it is effective to apply the proposed manufacturing method through the decision tree (c) based quality prediction.

Conclusions
Most of the studies related to continuous-flow process quality focused on the prediction of equipment quality or particular factors. In this study, statistical analysis was conducted on basic data, and quality was predicted with the use of a classifier. In addition, a small amount of quality data was taken into account. A process of predicting quality efficiently with the use of the small data and increasing yield was researched. The process consists of three steps. In Step 1, unlabeled data were labeled by random forest-based pseudo labeling. At this time, the whole quality was predicted with the use of a small amount of labeled data. It does not require the cost and effort for additional data collection. In Step 2, a feature value was generated through time-series prediction-based RNN as a deep learning technique. In Step 3, the final quality was predicted through the decision tree. Finally, this study proposed a method for cutting a product in consideration of the defect occurrence point through quality prediction and thereby improving the yield of the continuous-flow process. SSL, RNN, and classification algorithms were widely used in previous studies. However, unlike previous studies, this study used these algorithms to predict the quality of a continuous process. In addition, the proposed framework improved the yield by 8.7% in a continuous process. If a small amount of data is used for quality prediction with no separate data processing even in consideration of defects, the yield can become lower than that of a conventional equidistant production method. However, the proposed method using semi-supervised learning and RNN showed better performance than the existing production method even when there is a small amount of data.
The production method proposed in this study applies to various areas. For example, Case 2 in Figure 1b can occur usually in the continuous-flow process industry, so applying a quality prediction can help improve yield. Also, it can be applied to the case where quality needs to be predicted even if there is not much data in various areas. If the proposed production method is applied as a model whose prediction ability will be improved with a large amount of data, it is expected to contribute to saving raw materials and improving quality greatly.